09 Jun 2026 5 min read

Biology breaks frontier agents

Nº 01 · The Lede · Anthropic · Agents · Infrastructure

Anthropic maps biology agent gap

Anthropic published a research note arguing that biology is the hardest agentic domain yet: tasks that look like simple database retrievals turn nondeterministic when frontier models run them. The post pairs with a public benchmark showing Claude Sonnet 4 returning 106, 15, then 5 viral sequences from the same NCBI query across three runs. The note lays out why coding-agent playbooks don't transfer cleanly to wet-adjacent work.

Read the source →

Also discussed on X.

Nº 02 · X · Field report

NCBI retrieval test breaks Claude

Same NCBI query, three runs, three answers: 106 viral sequences, then 15, then 5. Bo Wang's thread surfaced the Anthropic result, which is now circulating as the cleanest demonstration yet of agent nondeterminism on a task that should be deterministic. The thread ties directly to the Anthropic post in Nº 01, but the failure mode is what's resetting expectations about where the floor actually sits: not reasoning, not tool use, just fetching records.
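For readers who want to check this kind of drift themselves, here is a minimal sketch of how run-to-run agreement could be measured. It is not Anthropic's benchmark harness. The endpoint is the public NCBI E-utilities `esearch` service; the helper names (`esearch_url`, `run_agreement`, `count_spread`), the `nuccore` database choice, the `retmax` value, and the record IDs in the usage lines are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint. Everything else here is illustrative.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nuccore", retmax=500):
    """Build a fully pinned search URL so each run issues the same request."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # Sort the parameters so the URL string itself is reproducible.
    return f"{ESEARCH}?{urlencode(sorted(params.items()))}"


def run_agreement(runs):
    """Jaccard overlap of record IDs across runs; 1.0 means identical sets."""
    sets = [set(r) for r in runs]
    union = set().union(*sets)
    if not union:
        return 1.0  # every run returned nothing, which is still consistent
    return len(set.intersection(*sets)) / len(union)


def count_spread(counts):
    """Ratio of the largest to the smallest result count across runs."""
    low = min(counts)
    return float("inf") if low == 0 else max(counts) / low


# Usage with made-up record IDs standing in for three agent runs.
runs = [["ID1", "ID2", "ID3"], ["ID1", "ID2"], ["ID1"]]
print(run_agreement(runs))
print(count_spread([106, 15, 5]))
```

On the counts reported above, `count_spread([106, 15, 5])` comes out to 21.2, so the largest run returned roughly twenty times as many records as the smallest. A scripted query with pinned parameters should score 1.0 on `run_agreement` across repeats, which is the baseline an agent would be compared against.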

Nº 03 · X · Structural biology · Protein design

Vermeer predicts protein localization

Vermeer generates microscopy images autoregressively to predict where proteins localize in cells; the Microsoft Research and Insitro collaboration was posted to bioRxiv. The model treats microscopy as a generative target rather than a classification input. That moves protein-localization prediction away from labeled-dataset bottlenecks and toward image-native foundation models, where the training signal is the pixel itself.

Also Filed · Four Briefs from the queue

Nº 04 · bioRxiv · Field report

Off-target foundation model

A drug-target specificity foundation model predicts off-target binding across the proteome, with the same weights doing repurposing and generative design. Raises the floor on what a single specificity model is expected to cover — separate off-target, repurposing, and de novo pipelines start looking redundant.

Read →

Nº 05 · arXiv · Field report

AI scientists rely on private data

Drug-asset valuation agents lose most of their edge when stripped of proprietary datasets, a stratified ablation finds. Reasoning skill alone doesn't carry the task — evidence access does. Reframes the AI-scientist debate: the differentiator is data licensing, not model choice.

Read →

Nº 06 · arXiv · Field report

Self-reflective molecular design loop

An LLM molecular-design system closes the prior-posterior loop by analyzing its own generated candidates and revising the next batch. Moves iterative molecule generation past one-shot prompting toward something closer to a working design-build-test cycle inside the model.

Read →

Nº 07 · bioRxiv · Cell biology · Funding

In-context learning for single cells

Stack does in-context learning on single-cell data — few-shot conditioning rather than fine-tuning per dataset. Lowers the friction tax on adapting foundation models to new scRNA-seq cohorts.

Read →

Reply with your discoveries. A human reads them. Forward freely.

Agentic Discovery · Nº 32 · 09 Jun 2026

Editor's Note

Today's theme: frontier models keep tripping on the same biology bench that vision and code mastered years ago.

Nº 01 · The Lede — Anthropic — Agents · Infrastructure

Anthropic maps biology agent gap

Fig. I Anthropic · Filed 09 Jun 2026.

Read the source →

Why it matters

The note anchors a reference argument that biology is its own frontier, not a coding-agent transfer problem, and that changes which benchmarks vendors have to clear before claiming bio-readiness.
