Agents tackle biology's hardest problems

Good Morning!

Monday morning: a $500M cell-model bet, a Claude bioinformatics benchmark, and a clean argument for where agents actually earn their keep in the lab.


1. Agents excel at verifiable bio tasks

Agentic systems reliably solve well-scoped, verifiable computational biology problems — sequence alignment, variant calling, metabolic flux analysis — but struggle where ground truth is ambiguous or experimental validation is slow, according to a bioRxiv preprint from Nair et al. The finding maps a practical frontier for bioinformatics teams building autonomous pipelines: a decision rule for which tasks to hand off now versus which still need a human in the loop. The deciding variable is not model size. It is problem verifiability — whether a correct answer can be checked automatically at run time. Read More →

Why it matters: Any group evaluating agents for genomic or structural biology workflows now has a principled framing for scoping those deployments. Start with problems that have a checkable answer, and treat ambiguous interpretation tasks as out of scope until evaluation methods catch up.
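Here is what a "checkable answer" can look like in code. The sketch below is a minimal Python routing rule, not the preprint's method. It assumes a pairwise global alignment task and a simple scoring scheme of our choosing (match +1, mismatch -1, gap -1). The names `verify_alignment` and `route` are hypothetical. An agent's proposed alignment is auto-accepted only if stripping gaps recovers both inputs and the rescored alignment equals the Needleman-Wunsch optimum. Anything the checker cannot confirm is escalated to a human.

```python
from typing import Optional, Tuple

# Scoring scheme is an assumption for illustration: match +1, mismatch -1, gap -1.
MATCH, MISMATCH, GAP = 1, -1, -1


def nw_optimal_score(a: str, b: str) -> int:
    """Needleman-Wunsch optimal global alignment score (O(len(a)*len(b)) time, linear memory)."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(diag, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev[len(b)]


def score_alignment(row_a: str, row_b: str) -> Optional[int]:
    """Score an explicit gapped alignment; return None if it is malformed."""
    if len(row_a) != len(row_b):
        return None
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            return None  # a column of two gaps is never valid
        if x == "-" or y == "-":
            total += GAP
        else:
            total += MATCH if x == y else MISMATCH
    return total


def verify_alignment(a: str, b: str, row_a: str, row_b: str) -> bool:
    """Run-time check: the alignment must reproduce both inputs and be optimal."""
    if row_a.replace("-", "") != a or row_b.replace("-", "") != b:
        return False
    score = score_alignment(row_a, row_b)
    return score is not None and score == nw_optimal_score(a, b)


def route(a: str, b: str, proposal: Optional[Tuple[str, str]]) -> str:
    """Hypothetical hand-off rule: accept verified agent output, otherwise escalate."""
    if proposal is not None and verify_alignment(a, b, *proposal):
        return "auto-accept"
    return "human-review"


accepted = route("ACGT", "AGT", ("ACGT", "A-GT"))   # -> 'auto-accept' (score 2, optimal)
escalated = route("ACGT", "AGT", ("ACGT", "AG-T"))  # -> 'human-review' (score 0, suboptimal)
```

The checker is cheap relative to the task, and it is fully automatic. That combination is what makes alignment a safe hand-off under this framing. An open-ended interpretation task has no equivalent `verify_alignment` to call.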


2. Claude benchmarked on expert bioinformatics

Anthropic's BioMysteryBench is a set of 99 expert-written bioinformatics puzzles with experimentally validated ground truth. Claude's latest models match human experts on solvable problems and crack roughly 30% of cases that stumped specialist panels. For groups building AI-assisted genomics or structural-biology workflows, the benchmark sets a concrete performance floor to demand from any vendor, rather than accepting self-reported accuracy claims. Because the answers are experimentally anchored, the dataset is harder to game than text-prediction benchmarks, a direct parallel to the verifiability argument Nair et al. make. Read More →
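One way to make a "performance floor" operational is to score vendor answers yourself against ground truth. The Python sketch below does this, broken out by category. It does not use BioMysteryBench's actual format or grading procedure, neither of which is described here. The `Puzzle` record, the category labels, the exact-match grading after whitespace and case normalization, and the toy answers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Puzzle:
    """Hypothetical record: one puzzle with an experimentally validated answer."""
    puzzle_id: str
    category: str  # assumed labels, e.g. "solvable" or "stumped_experts"
    ground_truth: str


def normalize(answer: str) -> str:
    """Crude normalization so case and spacing differences are not counted as errors."""
    return " ".join(answer.strip().lower().split())


def accuracy_by_category(puzzles, predictions):
    """Fraction of puzzles per category whose prediction matches ground truth.

    Missing predictions count as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p in puzzles:
        total[p.category] += 1
        pred = predictions.get(p.puzzle_id)
        if pred is not None and normalize(pred) == normalize(p.ground_truth):
            correct[p.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


def meets_floor(scores, floors):
    """True only if every category reaches its required accuracy floor."""
    return all(scores.get(cat, 0.0) >= floor for cat, floor in floors.items())


# Toy data, invented purely to exercise the functions.
puzzles = [
    Puzzle("p1", "solvable", "BRCA1"),
    Puzzle("p2", "solvable", "TP53"),
    Puzzle("p3", "stumped_experts", "KRAS G12C"),
]
predictions = {"p1": "brca1", "p3": " KRAS  g12c "}  # p2 left unanswered
scores = accuracy_by_category(puzzles, predictions)  # -> {'solvable': 0.5, 'stumped_experts': 1.0}
```

Reporting per category matters here. A vendor could post a strong overall number while doing poorly on the hard subset, which is the part of the benchmark that tells you the most.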


3. Biohub bets $500M on cell AI

Chan Zuckerberg Biohub is committing $500M over five years to build predictive AI models of the human cell, targeting an order-of-magnitude scale-up past today's roughly one-billion-cell datasets. Alex Rives leads the effort, with Nvidia, the Allen Institute, the Human Cell Atlas, and the Human Protein Atlas all signed on as partners. The split: $400M for internal work, $100M for external collaborators. At that level of funding, the project is likely to reset compute and data expectations for cell-scale biology across the field. Read More →


4. LLMs cluster SARS-CoV-2 sequences unsupervised

No labels, no problem: a new bioRxiv preprint shows that unsupervised LLM embeddings separate SARS-CoV-2 protein sequence variants without any labeled training data. Protein language model representations carry enough evolutionary signal to power phylogenetic and variant-surveillance workflows where curated labels are scarce or lagged — which describes most outbreak situations. Read More →
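"Unsupervised" here means the grouping comes from the geometry of the embeddings alone. The Python sketch below shows that step under stated assumptions. Per-sequence embeddings are taken as already computed by some protein language model (not shown). Plain k-means is our choice of clustering method, and the preprint may use something different. The 3-dimensional toy vectors are invented stand-ins for real, much higher-dimensional embeddings.

```python
def dist2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Returns one cluster label per point. No labels are used at any stage.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from all current centroids.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Hypothetical embeddings for six sequences drawn from two variant lineages.
embeddings = [
    [0.1, 0.0, 0.2], [0.0, 0.2, 0.1], [0.2, 0.1, 0.0],
    [5.0, 5.1, 4.9], [5.2, 4.8, 5.0], [4.9, 5.0, 5.1],
]
labels = kmeans(embeddings, 2)  # -> [0, 0, 0, 1, 1, 1]
```

In a surveillance setting, the number of clusters is usually unknown in advance. In practice you would sweep `k` or switch to a density-based method, but the key point is the same: no curated labels enter the loop.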


5. Agent sandbox gains checkpoint-restore

Crab now supports semantics-aware checkpointing in agent sandboxes, the isolated execution environments where agents run code or call tools. Long-running jobs can pause, be inspected, and resume mid-task rather than restart from scratch. Multi-step bioinformatics pipelines and docking sweeps that routinely fail late in execution are the obvious beneficiaries — fewer wasted compute cycles on retries that start at zero. Read More →
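Crab's own checkpoint API is not described here. The Python sketch below is therefore a generic illustration of the resume-instead-of-restart idea, not Crab's implementation. It checkpoints only at step boundaries, which is coarser than the semantics-aware checkpointing the item describes. `run_pipeline`, the step names, and the JSON checkpoint file are all hypothetical.

```python
import json
import os
import tempfile


def run_pipeline(steps, checkpoint_path):
    """Run named steps in order, checkpointing state after each completed step.

    steps: list of (name, fn) pairs; each fn takes and returns a JSON-serializable dict.
    If a checkpoint exists, completed steps are skipped and their saved state reused.
    """
    state, done = {}, []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            saved = json.load(fh)
        state, done = saved["state"], saved["done"]
    for name, fn in steps:
        if name in done:
            continue  # already finished before the interruption
        state = fn(state)
        done.append(name)
        # Write to a temp file, then rename, so a crash mid-write
        # cannot corrupt the last good checkpoint.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"state": state, "done": done}, fh)
        os.replace(tmp, checkpoint_path)
    return state


# Demo: the second step fails once, late in the run.
calls = {"align": 0, "call_variants": 0, "annotate": 0}
fail_once = {"call_variants": True}


def make_step(name):
    def step(state):
        calls[name] += 1
        if fail_once.get(name):
            fail_once[name] = False
            raise RuntimeError(f"{name} failed late")
        return {**state, name: "ok"}
    return step


steps = [(n, make_step(n)) for n in ("align", "call_variants", "annotate")]
ckpt = os.path.join(tempfile.mkdtemp(), "pipeline.json")
try:
    run_pipeline(steps, ckpt)
except RuntimeError:
    pass  # first attempt dies partway through
final = run_pipeline(steps, ckpt)  # resumes after "align" instead of starting over
```

On the second run, `align` is not repeated. In a real pipeline, that skipped step is the compute the item says gets wasted when retries start from zero.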


6. Live benchmark tracks real-world agent drift

Static benchmarks go stale. Claw-Eval-Live addresses that by continuously updating agent tasks against real-world workflow changes. Clinical and bioinformatics deployments are the target use case: underlying APIs, schemas, and data formats shift on production timelines, and fixed test sets stop reflecting actual failure modes well before anyone notices. Read More →
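The failure mode described here can be caught mechanically, at least for data formats. The Python sketch below is not Claw-Eval-Live's mechanism, which is not detailed in this item. It compares a schema snapshot recorded when a fixed test was written against a fresh response from the live system, and flags the test as suspect once anything has moved. The functions and the variant-record field names are hypothetical.

```python
def schema_of(record):
    """Map each top-level field of a JSON-like record to its Python type name."""
    return {key: type(value).__name__ for key, value in record.items()}


def schema_drift(snapshot, live_record):
    """Compare a recorded schema against a fresh response from the live system."""
    live = schema_of(live_record)
    return {
        "removed": sorted(set(snapshot) - set(live)),
        "added": sorted(set(live) - set(snapshot)),
        "retyped": sorted(k for k in set(snapshot) & set(live) if snapshot[k] != live[k]),
    }


def is_stale(drift):
    """A fixed test built on the snapshot is suspect once any field has moved."""
    return any(drift.values())


# Snapshot taken when the test was written; hypothetical field names.
snapshot = schema_of({"variant_id": "rs123", "pos": 100, "ref": "A"})
# Later, the upstream API renames a field and starts returning positions as strings.
drift = schema_drift(snapshot, {"variant_id": "rs123", "pos": "100", "ref_allele": "A"})
```

A check like this only detects that a test may have gone stale. Refreshing the task itself, which is what a live benchmark does continuously, is the harder part.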


Reply with your discoveries. A human reads them. Forward freely.