Biology breaks frontier agents


Agentic Discovery  ·  Nº 32  ·  09 Jun 2026

Editor's Note

Today's theme: frontier models keep tripping on the same biology bench that vision and code mastered years ago.

 

Nº 01 · The Lede  —  Anthropic  —  Agents · Infrastructure

Anthropic maps biology agent gap

Fig. I  Anthropic · Filed 09 Jun 2026.

Anthropic published a research note arguing that biology is the hardest agentic domain yet: tasks that look like simple database retrievals become nondeterministic when frontier models run them. The post pairs with a public benchmark in which Claude Sonnet 4 returned 106, 15, then 5 viral sequences for the same NCBI query across three runs. It explains why coding-agent playbooks don't transfer cleanly to wet-lab-adjacent work.

Read the source →

Why it matters

Gives the field a reference argument that biology is its own frontier, not a coding-agent transfer problem, and that changes which benchmarks vendors have to clear before claiming bio-readiness.

 

Nº 02  —  X  —  Field report

NCBI retrieval test breaks Claude

Fig. II  X · Filed 09 Jun 2026.

Same NCBI query, three runs, three answers: 106 viral sequences, then 15, then 5. Bo Wang's thread surfaced the Anthropic result, which is now circulating as the cleanest demonstration yet of agent nondeterminism on a task that should be deterministic. It ties directly to the Anthropic post in Nº 01, but the failure mode is the story: the agent is not failing at reasoning or tool use, just at fetching records, and that is resetting expectations about where the floor actually sits.
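For readers who want to measure this kind of drift in their own agent runs, here is a minimal sketch of a run-to-run consistency check. It is not the benchmark's method, which the post does not describe here. The function names (`jaccard`, `consistency_report`) are hypothetical, the record IDs are synthetic placeholders sized to match the reported counts, and the nesting of the smaller runs inside the larger one is purely an assumption made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two ID collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty result sets are trivially identical
    return len(a & b) / len(a | b)


def consistency_report(runs):
    """Summarize how much repeated runs of one query agree.

    `runs` is a list of ID lists, one per run of the same query.
    """
    counts = [len(set(r)) for r in runs]
    pairwise = [jaccard(a, b) for a, b in combinations(runs, 2)]
    return {
        "counts": counts,
        # Worst pairwise overlap; 1.0 means every run returned the same set.
        "min_jaccard": min(pairwise) if pairwise else 1.0,
        "stable": all(set(r) == set(runs[0]) for r in runs),
    }


# Synthetic placeholder IDs (NOT real accessions). Only the counts
# 106, 15 and 5 come from the reported result; the overlap pattern
# (each smaller run nested inside the larger) is assumed.
runs = [
    [f"SEQ{i:04d}" for i in range(106)],
    [f"SEQ{i:04d}" for i in range(15)],
    [f"SEQ{i:04d}" for i in range(5)],
]

report = consistency_report(runs)
print(report["counts"], report["stable"], round(report["min_jaccard"], 3))
```

Comparing ID sets rather than counts alone matters: two runs can return the same number of records and still disagree on which records they are.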

Read more →

 

Nº 03  —  X  —  Structural biology · Protein design

Vermeer predicts protein localization

Fig. III  X · Filed 09 Jun 2026.

Vermeer generates microscopy images autoregressively to predict where proteins localize in cells, in a Microsoft Research and Insitro collaboration posted to bioRxiv. The model treats microscopy as a generative target rather than a classification input, which moves protein-localization prediction away from labeled-dataset bottlenecks and toward image-native foundation models, where the training signal is the pixel itself.

Read more →

 

Also Filed  ·  Four Briefs from the queue

Nº 04  —  bioRxiv  —  Field report

Off-target foundation model

A drug-target specificity foundation model predicts off-target binding across the proteome, with the same weights doing repurposing and generative design. Raises the floor on what a single specificity model is expected to cover — separate off-target, repurposing, and de novo pipelines start looking redundant.

Read →

Nº 05  —  arXiv  —  Field report

AI scientists rely on private data

Drug-asset valuation agents lose most of their edge when stripped of proprietary datasets, a stratified ablation finds. Reasoning skill alone doesn't carry the task — evidence access does. Reframes the AI-scientist debate: the differentiator is data licensing, not model choice.

Read →

Nº 06  —  arXiv  —  Field report

Self-reflective molecular design loop

An LLM molecular-design system closes the prior-posterior loop by analyzing its own generated candidates and revising the next batch. Moves iterative molecule generation past one-shot prompting toward something closer to a working design-build-test cycle inside the model.

Read →

Nº 07  —  bioRxiv  —  Cell biology · Funding

In-context learning for single cells

Stack does in-context learning on single-cell data — few-shot conditioning rather than fine-tuning per dataset. Lowers the friction tax on adapting foundation models to new scRNA-seq cohorts.

Read →

 

· · ·

Reply with your discoveries. A human reads them. Forward freely.