Biosecurity becomes the agent benchmark

Agentic Discovery  ·  Nº 33  ·  10 Jun 2026

Editor's Note

Today: biosecurity moves from worry to measurable, and a stroke-care LLM tackles guidelines that nobody bothered to formalize.

 

Nº 01 · The Lede  —  arXiv  —  Agents · Infrastructure

ABC-Bench scores bio-risk agents

Fig. I  arXiv · Filed 10 Jun 2026.

ABC-Bench arrives as the first agentic bio-capabilities benchmark built specifically for biosecurity — measuring whether autonomous agents can plan, retrieve, and execute the multi-step tasks that constitute uplift risk. Where past evaluations stuck to text-only Q&A on virology trivia, ABC-Bench scores agents on tool-using workflows: protocol retrieval, reagent sourcing, troubleshooting loops. The framework gives policymakers and frontier labs a shared yardstick where there was only vibes-based assessment before.

Read the source →

Why it matters

Biosecurity evaluation now has a reference benchmark that agents can be scored against, which turns the 'trust us, we red-teamed it' posture that has dominated frontier-lab safety claims into something auditable.

 

Nº 02  —  X  —  Computational biology

AI erodes tacit bio knowledge

Fig. II  X · Filed 10 Jun 2026.

A new essay argues AI is converting biology and chemistry's tacit knowledge (the bench intuition that used to gate dangerous capability) into explicit, transferable instructions. The piece reframes the biosecurity debate around knowledge type rather than information access, raising the stakes for the kind of agent evaluation that ABC-Bench (Nº 01 above) just operationalized.

Read more →

 

Nº 03  —  arXiv  —  Field report

LLMs check stroke care

Fig. III  arXiv · Filed 10 Jun 2026.

LLM-orchestrated conformance checking evaluates stroke-care decisions against clinical guidelines that were never formalized as computer-interpretable rules — the usual prerequisite for automated audit. The approach lets agents reason directly over natural-language guideline text, opening guideline-adherence monitoring to the ~90% of clinical protocols that never got machine-readable versions.

Read more →
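
The shape of the idea in Nº 03 can be sketched in a few lines: guidelines stay as free text, and a pluggable judge decides each one against the care log. This is a minimal sketch under loud assumptions, not the paper's pipeline. The names (Verdict, check_conformance, stub_judge), the toy guideline sentences, and the toy care log are all invented for illustration, and stub_judge is a crude keyword heuristic standing in for the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    guideline: str    # the free-text guideline that was checked
    conformant: bool  # the judge's decision
    rationale: str    # short explanation, useful for audit


def _words(text: str) -> set:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,").lower() for w in text.split()}


def check_conformance(guidelines: List[str], care_events: List[str],
                      judge: Callable[[str, List[str]], Verdict]) -> List[Verdict]:
    """Ask the judge about each free-text guideline in turn."""
    return [judge(text, care_events) for text in guidelines]


def stub_judge(guideline: str, care_events: List[str]) -> Verdict:
    """Hypothetical stand-in for an LLM call.

    Crude heuristic: the guideline counts as met when any logged event
    shares a word of 7+ letters with it. It ignores timing and ordering,
    which a real judge would have to reason about.
    """
    key_terms = {w for w in _words(guideline) if len(w) >= 7}
    for event in care_events:
        shared = key_terms & _words(event)
        if shared:
            return Verdict(guideline, True, f"event mentions {sorted(shared)[0]}")
    return Verdict(guideline, False, "no matching event found")


# Toy inputs, invented for this sketch (not quoted from any real guideline).
guidelines = [
    "Brain imaging should be completed before treatment starts.",
    "Swallowing should be assessed before oral intake.",
]
care_events = ["09:02 CT imaging completed", "09:40 treatment started"]

report = check_conformance(guidelines, care_events, stub_judge)
for verdict in report:
    print(verdict.conformant, "|", verdict.rationale)
```

Swapping stub_judge for a function that prompts a model keeps the rest unchanged, which is the design point: no guideline has to be hand-translated into rules before it can be checked.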

 

Also Filed  ·  Three Briefs from the queue

Nº 04  —  bioRxiv  —  Field report

LLMs read metabolic models

A comprehensive evaluation tests whether LLMs can interpret genome-scale metabolic models (GEMs) for metabolic engineering: flux balance analysis, reaction essentiality, pathway design. The results map where current models help and where they mislead, building on earlier work that wired agents to GEMs for hypothesis testing, and they move toward a clearer picture of LLM-assisted strain design.

Read →

Nº 05  —  bioRxiv  —  Field report

SLiMNet finds linear motifs

SLiMNet uses protein-language-model embeddings plus paired inputs to detect short linear motifs — the disordered-region binding sites that drive signaling and have long resisted sequence-only prediction. Extends PLM utility from structured domains into the intrinsically disordered fraction of the proteome.

Read →
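
For a sense of why sequence-only prediction struggles, as Nº 05 notes, here is a minimal sketch of the classic baseline: scanning for a motif written as a regular expression. It uses the well-known PxxP core recognized by SH3 domains. This is not SLiMNet's method; the function name scan_motif and the toy sequence are made up for illustration.

```python
import re

# PxxP core written as a regex: proline, any two residues, proline.
# The lookahead wrapper makes overlapping matches visible.
PXXP = re.compile(r"(?=(P..P))")


def scan_motif(sequence: str, pattern: re.Pattern = PXXP) -> list:
    """Return (start_index, matched_text) for every overlapping hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Made-up 20-residue sequence, not a real protein.
toy_sequence = "MSPAAPLPPRPGSTAPPLPK"

hits = scan_motif(toy_sequence)
for start, text in hits:
    print(start, text)
```

Four hits in twenty residues shows the problem: a pattern this short and degenerate matches often by chance, so the regex alone cannot say which sites are functional. That gap is what context-aware embeddings are meant to close.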

Nº 06  —  X  —  Field report

Self-evolving AI scientists claim

An X thread flagged a category-theoretic framework letting AI systems rewrite their own reasoning rules, pitched as self-evolving AI scientists. Claim is strong, peer review is absent — file under watch-this-space until the math meets a benchmark.

Read →

 

· · ·

Reply with your discoveries. A human reads them. Forward freely.