Agents tackle biology's hardest problems
Good Morning!
Monday morning: a $500M cell-model bet, a Claude bioinformatics benchmark, and a clean argument for where agents actually earn their keep in the lab.
1. Agents excel at verifiable bio tasks

Agentic systems reliably solve well-scoped, verifiable computational biology problems — sequence alignment, variant calling, metabolic flux analysis — but struggle where ground truth is ambiguous or experimental validation is slow, according to a bioRxiv preprint from Nair et al. The finding maps a practical frontier for bioinformatics teams building autonomous pipelines: a decision rule for which tasks to hand off now versus which still need a human in the loop. The deciding variable is not model size. It is problem verifiability — whether a correct answer can be checked automatically at run time. Read More →
Why it matters: Any group evaluating agents for genomic or structural biology workflows now has a principled framing for scoping those deployments. Start with problems that have a checkable answer, and treat ambiguous interpretation tasks as out of scope until evaluation methods catch up.
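For teams that want to turn that framing into a triage step, here is a minimal sketch. It is an illustration, not the preprint's method: the `Task` class, the `route` function, and the alignment check are all hypothetical names made up for this example. The rule is simply that a task goes to an agent only if something can check its answer automatically.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Task:
    name: str
    # Hypothetical field: a function that checks a candidate answer automatically.
    # None means no run-time check exists for this task.
    checker: Optional[Callable[[object], bool]] = None


def route(task: Task) -> str:
    """Hand a task to an agent only when its answer can be checked at run time."""
    return "agent" if task.checker is not None else "human"


def alignment_is_consistent(aligned: Tuple[str, str], seq_a: str, seq_b: str) -> bool:
    """Cheap structural check: rows have equal length and removing gaps recovers the inputs."""
    row_a, row_b = aligned
    return (
        len(row_a) == len(row_b)
        and row_a.replace("-", "") == seq_a
        and row_b.replace("-", "") == seq_b
    )


tasks = [
    Task(
        "pairwise alignment",
        checker=lambda ans: alignment_is_consistent(ans, "GATTACA", "GATACA"),
    ),
    # No automatic check is available for this one, so it stays with a person.
    Task("interpret variant of uncertain significance"),
]

routing = {t.name: route(t) for t in tasks}
```

The check shown here only confirms that an alignment is well-formed, not that it is optimal. A real deployment would want a stronger checker, but the routing logic stays the same.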
2. Claude benchmarked on expert bioinformatics
Anthropic's BioMysteryBench is 99 expert-written bioinformatics puzzles with experimentally validated ground truth. Claude's latest models match human experts on solvable problems and crack roughly 30% of cases that stumped specialist panels. For groups building AI-assisted genomics or structural-biology workflows, the benchmark gives you a concrete performance floor to demand from any vendor rather than accepting self-reported accuracy claims. Answers are experimentally anchored, which makes the dataset harder to game than text-prediction benchmarks — a direct parallel to the verifiability argument Nair et al. make. Read More →
3. Biohub bets $500M on cell AI

Chan Zuckerberg Biohub is committing $500M over five years to build predictive AI models of the human cell, targeting an order-of-magnitude scale-up past today's roughly one-billion-cell datasets. Alex Rives leads the effort, with Nvidia, the Allen Institute, the Human Cell Atlas, and the Human Protein Atlas all signed on as partners. The split: $400M for internal work, $100M for external collaborators. At that level of funding, the project is likely to reset compute and data expectations for cell-scale biology across the field. Read More →
4. LLMs cluster SARS-CoV-2 sequences unsupervised

No labels, no problem: a new bioRxiv preprint shows that LLM embeddings separate SARS-CoV-2 protein sequence variants without any labeled training data. The result suggests that protein language model representations carry enough evolutionary signal to power phylogenetic and variant-surveillance workflows where curated labels are scarce or lagging, which describes most outbreak situations. Read More →
5. Agent sandbox gains checkpoint-restore

Crab now supports semantics-aware checkpointing in agent sandboxes, the isolated execution environments where agents run code or call tools. Long-running jobs can pause, be inspected, and resume mid-task rather than restart from scratch. Multi-step bioinformatics pipelines and docking sweeps that routinely fail late in execution are the obvious beneficiaries — fewer wasted compute cycles on retries that start at zero. Read More →
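To make the pause-and-resume idea concrete, here is a generic sketch of step-level checkpointing in plain Python. It does not use or represent Crab's API, and it is far simpler than semantics-aware checkpointing of a whole sandbox. The `run_pipeline` function, the step names, and the `fail_at` switch are hypothetical, and the state is assumed to be JSON-serialisable.

```python
import json
import os
import tempfile


def run_pipeline(steps, checkpoint_path, fail_at=None):
    """Run named steps in order, saving progress after each one.

    `steps` is a list of (name, function) pairs. Each function takes and
    returns a JSON-serialisable state dict. `fail_at` simulates a late crash.
    """
    state, done = {}, []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            saved = json.load(fh)
        state, done = saved["state"], saved["done"]

    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run, so do not repeat it
        if name == fail_at:
            raise RuntimeError(f"simulated failure in {name}")
        state = fn(state)
        done.append(name)
        # Write to a temp file, then rename, so a crash never leaves a
        # half-written checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"state": state, "done": done}, fh)
        os.replace(tmp, checkpoint_path)
    return state


calls = []  # records which steps actually executed, across both runs


def make_step(name):
    def step(state):
        calls.append(name)
        return {**state, name: True}
    return step


steps = [(n, make_step(n)) for n in ("align", "call_variants", "annotate")]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_pipeline(steps, path, fail_at="annotate")  # first run dies on the last step
except RuntimeError:
    pass

final = run_pipeline(steps, path)  # second run resumes at "annotate"
```

Across both runs, `align` and `call_variants` each execute once. Only the step that failed is run again.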
6. Live benchmark tracks real-world agent drift

Static benchmarks go stale. Claw-Eval-Live addresses that by continuously updating its agent tasks to track real-world workflow changes. Clinical and bioinformatics deployments are the target use case: underlying APIs, schemas, and data formats shift on production timelines, and fixed test sets stop reflecting actual failure modes well before anyone notices. Read More →
Reply with your discoveries. A human reads them. Forward freely.