10 Jun 2026 · 5 min read

Biosecurity becomes the agent benchmark

Nº 01 · The Lede · arXiv · Agents · Infrastructure

ABC-Bench scores bio-risk agents

ABC-Bench arrives as the first agentic bio-capabilities benchmark built specifically for biosecurity, measuring whether autonomous agents can plan, retrieve, and execute the multi-step tasks that constitute uplift risk. Where past evaluations stuck to text-only Q&A on virology trivia, ABC-Bench scores agents on tool-using workflows: protocol retrieval, reagent sourcing, and troubleshooting loops. The framework gives policymakers and frontier labs a shared yardstick where previously there was only vibes-based assessment.

Read the source →

Nº 02 · X · Computational biology

AI erodes tacit bio knowledge

A new essay argues that AI is converting the tacit knowledge of biology and chemistry, the bench intuition that used to gate dangerous capability, into explicit, transferable instructions. The piece reframes the biosecurity debate around knowledge type rather than information access, raising the stakes for the kind of agent evaluation that ABC-Bench (Nº 01 above) just operationalized.

Nº 03 · arXiv · Field report

LLMs check stroke care

LLM-orchestrated conformance checking evaluates stroke-care decisions against clinical guidelines that were never formalized as computer-interpretable rules — the usual prerequisite for automated audit. The approach lets agents reason directly over natural-language guideline text, opening guideline-adherence monitoring to the ~90% of clinical protocols that never got machine-readable versions.

Also Filed · Three Briefs from the queue

Nº 04 · bioRxiv · Field report

LLMs read metabolic models

A comprehensive evaluation tests whether LLMs can interpret genome-scale metabolic models (GEMs) for metabolic engineering: flux balance analysis, reaction essentiality, and pathway design. The results map where current models help and where they mislead, extending earlier work that wired agents to GEMs for hypothesis testing and moving toward a clearer picture of LLM-assisted strain design.

Read →

Nº 05 · bioRxiv · Field report

SLiMNet finds linear motifs

SLiMNet uses protein-language-model embeddings plus paired inputs to detect short linear motifs — the disordered-region binding sites that drive signaling and have long resisted sequence-only prediction. Extends PLM utility from structured domains into the intrinsically disordered fraction of the proteome.

Read →

Nº 06 · X · Field report

Self-evolving AI scientists claim

An X thread flagged a category-theoretic framework letting AI systems rewrite their own reasoning rules, pitched as self-evolving AI scientists. Claim is strong, peer review is absent — file under watch-this-space until the math meets a benchmark.

Read →

Reply with your discoveries. A human reads them. Forward freely.

Agentic Discovery · Nº 33 · 10 Jun 2026

Editor's Note

Today: biosecurity moves from worry to measurement, and a stroke-care LLM tackles guidelines that nobody bothered to formalize.

Nº 01 · The Lede · arXiv · Agents · Infrastructure

ABC-Bench scores bio-risk agents

Fig. I arXiv · Filed 10 Jun 2026.

Why it matters

Biosecurity evaluation now has a reference benchmark agents can be scored against — collapsing the 'trust us, we red-teamed it' posture that has dominated frontier-lab safety claims into something auditable.
