18 May 2026 4 min read

Open LLMs run the lab bench

Nº 01 · The Lede · bioRxiv · Field report

Open LLMs tested as lab orchestrators

A new bioRxiv preprint from the Galaxy team evaluates open-weight LLMs as agentic orchestrators for routine biomedical analysis, benchmarking models that a typical lab can actually self-host against the closed frontier on multi-step pipeline planning. The work scores how reliably each model chains tool calls, recovers from errors, and produces analyses a bench scientist would accept. The results map which open checkpoints clear the bar today and where they still drop calls. This gives self-hosted agentic analysis in biology its first concrete reference point: the debate over whether labs need frontier API access for orchestration now has numbers attached.

Read the source →

Nº 02 · bioRxiv · Field report

Physics scoring rescues AI binders

Statistical physics scoring filters hallucinated protein binders out of generative AI pipelines, using a zero-shot ensemble approach that needs no task-specific training. The method flags designs that look plausible to the generator but fail thermodynamic sanity checks. By attacking the false-positive rate that has dogged every public benchmark so far, it moves AI-designed binder workflows closer to being deployment-viable.

Nº 03 · arXiv · Field report

LLMs describe monkey visual neurons

Language models characterize what individual monkey visual neurons respond to, generating natural-language descriptions of tuning properties directly from neural recordings. The pipeline turns hours of electrophysiology interpretation into automated captions a neuroscientist can read. It narrows the gap between raw recording data and shareable functional annotation, a workflow that has resisted automation for decades.

Also Filed · One Brief from the queue

Nº 04 · arXiv · Benchmarks · Evaluation

Multi-hop disease reasoning benchmark drops

MedHopQA tests multi-hop biomedical reasoning on disease-centered questions that require chaining facts across sources, going beyond single-lookup QA benchmarks. It sets a tougher reference point for LLM-based clinical question answering, where most published scores still come from the kind of one-hop retrieval that flatters the models, a pattern BiomniBench exposed last week.

Read →

Reply with your discoveries. A human reads them. Forward freely.

Agentic Discovery · Nº 17 · 18 May 2026

Editor's Note

Monday opens with a quiet but consequential question: can the open-weight models running in your own basement actually orchestrate real biology yet?

Open LLMs tested as lab orchestrators

Why it matters

Self-hosted agentic analysis in biology shifts from aspiration to a measurable gap; vendors pitching closed-model dependence for lab orchestration now have a published yardstick working against them.
