12 May 2026 · 5 min read

A virtual-cell benchmark with teeth

Nº 01 · The Lede · arXiv · Cell biology · Funding

Virtual-cell benchmark gets real

AssayBench scores LLMs on assay-level virtual-cell tasks — predicting readouts from perturbation experiments rather than retrieving textbook facts. The benchmark pits agents against real assay data across dose-response, viability, and transcriptomic endpoints, and current frontier models clear the easy splits but stall on anything requiring quantitative extrapolation. Anchors a new reference floor for virtual-cell AI claims: vendors pitching cell-scale prediction now have a public score to beat, and the gap between "reads biology" and "predicts biology" finally has a number.

Read the source →
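A toy illustration of why extrapolation is the hard split. This is a minimal sketch, not AssayBench's scoring code. It assumes a synthetic four-parameter Hill dose-response curve, a hypothetical naive baseline (`loglinear_predict`) that interpolates linearly in log-dose, and mean absolute error as the metric. The same baseline that looks excellent on held-out doses inside the training range fails on doses above it, because a straight-line extension cannot anticipate the curve's plateau.

```python
import math

def dose_response(dose, r0=1.0, rinf=0.1, ec50=5.0, slope=1.5):
    """Synthetic four-parameter Hill curve: r0 at zero dose, rinf at saturating dose."""
    return rinf + (r0 - rinf) / (1.0 + (dose / ec50) ** slope)

def loglinear_predict(train_doses, train_resp, dose):
    """Hypothetical naive baseline: piecewise-linear in log10(dose).
    Beyond the training range it keeps extending the nearest segment."""
    xs = [math.log10(d) for d in train_doses]
    x = math.log10(dose)
    i = 0
    while i < len(xs) - 2 and x > xs[i + 1]:
        i += 1
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = train_resp[i], train_resp[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def mae(doses, train_doses, train_resp):
    """Mean absolute error of the baseline against the synthetic ground truth."""
    errs = [abs(loglinear_predict(train_doses, train_resp, d) - dose_response(d))
            for d in doses]
    return sum(errs) / len(errs)

train_doses = [0.1, 0.3, 1.0, 3.0, 10.0]   # sorted; hypothetical units
train_resp = [dose_response(d) for d in train_doses]

interp_err = mae([0.5, 2.0, 5.0], train_doses, train_resp)   # inside training range
extrap_err = mae([30.0, 100.0], train_doses, train_resp)     # above max training dose
print(f"interpolation MAE {interp_err:.3f}, extrapolation MAE {extrap_err:.3f}")
```

With these synthetic parameters the extrapolation error comes out more than an order of magnitude larger than the interpolation error. The linear extension even predicts negative viability at the highest dose.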

Nº 02 · bioRxiv · Agents · Infrastructure

Agents pull QSP from papers

Talk2QSP turns literature into executable quantitative systems pharmacology scenarios, with a human-in-the-loop agent extracting parameters, compartments, and rate equations directly from unstructured text. QSP modeling has historically been a weeks-long manual reading job before a single simulation runs. Collapses one of the slowest handoffs in mechanistic pharmacology, moving model-building from artisanal to agent-assisted.
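What an "executable scenario" means in practice: once compartments, parameters, and rate equations are pulled from a paper, they can be run directly. The sketch below assumes a hypothetical one-compartment model with first-order oral absorption. The `scenario` layout, parameter values, and function names are illustrative, not Talk2QSP's format. Forward Euler with a small step keeps it dependency-free; a real QSP model would normally use an adaptive ODE solver.

```python
import math

# Hypothetical scenario as an extraction agent might emit it:
# compartments, parameters (units in comments), and a dose.
scenario = {
    "compartments": ["gut", "central"],
    "params": {"ka": 1.0, "ke": 0.2, "V": 10.0},   # 1/h, 1/h, L
    "dose_mg": 100.0,                               # oral bolus into the gut
}

def rates(state, p):
    """First-order rate equations: d/dt of drug amounts (mg) per compartment."""
    gut, central = state
    absorption = p["ka"] * gut
    elimination = p["ke"] * central
    return (-absorption, absorption - elimination)

def simulate(s, t_end=24.0, dt=0.001):
    """Forward-Euler integration; returns times (h) and central concentration (mg/L)."""
    p = s["params"]
    state = (s["dose_mg"], 0.0)
    times, conc = [0.0], [0.0]
    for k in range(1, int(round(t_end / dt)) + 1):
        d_gut, d_central = rates(state, p)
        state = (state[0] + dt * d_gut, state[1] + dt * d_central)
        times.append(k * dt)
        conc.append(state[1] / p["V"])
    return times, conc

times, conc = simulate(scenario)
i_max = max(range(len(conc)), key=conc.__getitem__)
print(f"Cmax ~ {conc[i_max]:.2f} mg/L at t ~ {times[i_max]:.2f} h")
```

For first-order absorption and elimination the peak time has a closed form, t_max = ln(ka/ke) / (ka - ke), about 2.01 h for these parameters. That gives a quick sanity check on the simulated curve.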

Nº 03 · bioRxiv · Field report

LLMs guess synthetic lethals cold

Zero-shot reasoning reproduces CRISPR-screen synthetic-lethal predictions using open-weights LLMs with no fine-tuning and no screen data, just gene-pair prompts. The reproductions aren't perfect, but they recover known hits well above chance. Reopens a debate we've been tracking: how much functional-genomics signal is already latent in pretraining corpora, and whether expensive screens are validating LLM priors as often as they're discovering new biology.
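"Well above chance" is usually quantified as enrichment of known hits near the top of a model's ranking, relative to their base rate. A minimal sketch follows. It assumes a hypothetical ranked list of 1,000 candidate gene pairs and 50 screen-validated hits, using placeholder IDs rather than data from the preprint. It reports precision@k, fold enrichment over the base rate, and a one-sided hypergeometric p-value.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n of N items without replacement, K of which are hits."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def enrichment_at_k(ranked, hits, k):
    """Precision@k, fold enrichment over the base rate, and hypergeometric p-value."""
    found = sum(1 for pair in ranked[:k] if pair in hits)
    base_rate = len(hits) / len(ranked)
    precision = found / k
    p_value = hypergeom_sf(found, len(ranked), len(hits), k)
    return precision, precision / base_rate, p_value

# Hypothetical toy data: the model's ranking (best first) over 1,000 pairs,
# with 20 known hits among the top 100 and 30 more further down.
ranked = [f"pair_{i:04d}" for i in range(1000)]
hits = ({f"pair_{i:04d}" for i in range(0, 100, 5)}
        | {f"pair_{i:04d}" for i in range(500, 530)})

precision, fold, p_value = enrichment_at_k(ranked, hits, k=100)
print(f"precision@100 = {precision:.2f}, {fold:.1f}x over base rate, p = {p_value:.1e}")
```

In this toy setup 20 of the top 100 pairs are hits, against 5 expected by chance: a fourfold enrichment.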

Also Filed · Three Briefs from the queue

Nº 04 · arXiv · Field report

Steerable molecule editing

SLIM steers molecular edits through sparse latent directions in an LLM, letting chemists nudge generated molecules toward specific properties (solubility, logP, toxicity flags) without retraining. Moves property-directed generation from black-box sampling toward interpretable knobs — narrowing the gap between generative chemistry and the medicinal-chemistry review it has to survive.

Read →
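Steering along a latent direction generally means adding a scaled feature direction to a model's hidden activations at generation time. The sketch below shows that generic operation. It uses a hypothetical two-feature, sparse-autoencoder-style dictionary over a 4-dimensional hidden state, with tied encoder and decoder weights. SLIM's actual layer choice, dictionary, and scaling are not described here and may differ. Because each direction is unit-norm and the weights are tied, the targeted feature's pre-activation rises by exactly alpha, while a feature orthogonal to it is untouched.

```python
import math

def unit(v):
    """Return v scaled to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Hypothetical dictionary: one unit direction per feature in a 4-d hidden space.
# Tied weights: the encoder reads each feature along the same direction.
DIRECTIONS = [unit([1.0, 0.0, 1.0, 0.0]),    # feature 0 (illustrative property knob)
              unit([0.0, 1.0, 0.0, -1.0])]   # feature 1, orthogonal to feature 0
BIAS = [-0.5, -0.2]

def activation(hidden, f):
    """ReLU readout of feature f from a hidden state."""
    pre = sum(w * h for w, h in zip(DIRECTIONS[f], hidden)) + BIAS[f]
    return max(0.0, pre)

def steer(hidden, f, alpha):
    """Shift the hidden state by alpha along feature f's direction."""
    return [h + alpha * d for h, d in zip(hidden, DIRECTIONS[f])]

hidden = [0.1, 0.4, 0.2, -0.3]
steered = steer(hidden, f=0, alpha=2.0)
for f in (0, 1):
    print(f"feature {f}: {activation(hidden, f):.3f} -> {activation(steered, f):.3f}")
```

Here feature 0 goes from inactive to strongly active, and feature 1 reads the same before and after. Keeping the edit confined to one feature in this way is what makes sparse directions attractive as interpretable knobs.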

Nº 05 · Anthropic · Field report

Claude Opus 4.7 ships

Anthropic released Claude Opus 4.7 with longer-horizon agent work, self-verification before reporting back, and file-system memory across sessions. The system card discloses bio evals — LAB-Bench, VCT, WMDP-Bio, GPQA-Bio — without naming training corpora. Raises the floor for what a frontier agent should ship; biology evals are now standard disclosure even when training data isn't.

Read →

Nº 06 · Hacker News · Agents · Infrastructure

Reproducible tests for browser agents

Resurf open-sourced a test framework that records realistic browser sessions and replays them deterministically against AI agents — closing the reproducibility hole that has made browser-agent evaluation a coin flip. Relevant wherever agents drive web-based lab tools, ELNs, or public bio databases, where flaky tests have masked real regressions.

Read →
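Record-and-replay is the general technique that makes this kind of test deterministic. A minimal sketch of the idea follows; it is not Resurf's API. A hypothetical `Recorder` wraps a live fetch function and saves each response keyed by request. A `Replayer` serves those responses back and fails loudly on any request that was never recorded, so a test cannot silently fall through to the changing live site. The example.org URL and the fetch signature are placeholders.

```python
import hashlib
import itertools
import json

def request_key(method, url, body=""):
    """Stable lookup key: HTTP method, URL, and a SHA-256 hash of the body."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{method.upper()} {url} {digest}"

class Recorder:
    """Wraps a live fetch function and saves every response to a tape."""
    def __init__(self, fetch):
        self.fetch = fetch      # callable (method, url, body) -> str
        self.tape = {}

    def __call__(self, method, url, body=""):
        response = self.fetch(method, url, body)
        self.tape[request_key(method, url, body)] = response
        return response

    def dump(self):
        return json.dumps(self.tape, sort_keys=True)

class Replayer:
    """Serves recorded responses; refuses requests that are not on the tape."""
    def __init__(self, tape_json):
        self.tape = json.loads(tape_json)

    def __call__(self, method, url, body=""):
        key = request_key(method, url, body)
        if key not in self.tape:
            raise KeyError(f"unrecorded request: {key}")
        return self.tape[key]

# Stand-in for a live page whose content changes on every visit.
visits = itertools.count()
def live_fetch(method, url, body=""):
    return f"<html>{url} visit {next(visits)}</html>"

URL = "https://example.org/eln/entries"   # placeholder URL
recorder = Recorder(live_fetch)
recorder("GET", URL)
replay = Replayer(recorder.dump())
print(replay("GET", URL) == replay("GET", URL))   # identical on every run
```

Keying purely by request means a page fetched twice during recording keeps only its last response. A fuller version would also need to record call order.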

Reply with your discoveries. A human reads them. Forward freely.

Agentic Discovery · Nº Thirteen · 12 May 2026

Editor's Note

Tuesday's haul: a real virtual-cell benchmark lands, agents start reading QSP papers, and zero-shot LLMs guess CRISPR hits without ever seeing a screen.

Nº 01 · The Lede — arXiv — Cell biology · Funding

Virtual-cell benchmark gets real

Fig. I arXiv, 12 May 2026.


Why it matters

Virtual-cell AI has been pitched on vibes and cherry-picked demos for two years; AssayBench drops a falsifiable target into that conversation and resets what counts as evidence in a space CZ Biohub is funding at the half-billion-dollar level.


