
Agents start curating the omics stack


Agentic Discovery  ·  Nº 27  ·  01 Jun 2026

Editor's Note

Monday opens with agents reaching for the unglamorous middle of biology: curation, ontologies, FHIR — the plumbing that decides what AI actually gets to learn from.

 

Nº 01 · The Lede  —  bioRxiv  —  Agents · Infrastructure

Agent curates a decade of spatial omics

Fig. I  bioRxiv · Filed 01 Jun 2026.

SpatialDataAgent autonomously ingests ten years of spatial-omics studies, harmonizing metadata, tissue ontologies, and assay types across hundreds of datasets without hand-curation. The agent pairs an LLM planner with structured extraction tools and reports curation accuracy competitive with expert annotators on held-out benchmarks. Spatial omics (transcriptomics and proteomics measured with spatial coordinates intact) has been bottlenecked for years by inconsistent metadata, with most pan-study analyses dying at the harmonization step. The work moves agentic curation from toy demos to decade-scale corpora, the first credible signal that the metadata wall in multi-omics integration is breachable.

Read the source →

Why it matters

Curation has been the silent ceiling on every cross-study omics ambition — atlas projects, foundation models, meta-analyses all hit the same wall. A working autonomous curator at this scale resets what's affordable to attempt and forces foundation-model groups to answer why their training corpora still stop at whatever a postdoc had time to clean.
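
For readers who have not hit the harmonization wall themselves, here is a minimal sketch of the kind of mapping a curator has to perform on every dataset. It is not SpatialDataAgent's method; the lookup tables, the `EX:` identifiers, and the `harmonize` helper are all hypothetical, and a real pipeline would resolve tissue names against an actual ontology such as UBERON rather than a hand-written dictionary.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables. The "EX:" identifiers are placeholders, not real
# ontology terms; a production curator would resolve against a real ontology.
TISSUE_SYNONYMS = {
    "brain": "EX:0001",
    "cerebral cortex": "EX:0002",
    "cortex": "EX:0002",
    "liver": "EX:0003",
    "hepatic tissue": "EX:0003",
}
ASSAY_SYNONYMS = {
    "visium": "spatial transcriptomics",
    "10x visium": "spatial transcriptomics",
    "merfish": "spatial transcriptomics",
    "codex": "spatial proteomics",
    "imaging mass cytometry": "spatial proteomics",
}


@dataclass
class HarmonizedRecord:
    dataset_id: str
    tissue_id: Optional[str]
    assay_type: Optional[str]
    needs_review: bool  # True when any field could not be mapped


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so spelling variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def harmonize(record: dict) -> HarmonizedRecord:
    """Map free-text tissue and assay fields onto controlled vocabularies."""
    tissue_id = TISSUE_SYNONYMS.get(normalize(record.get("tissue", "")))
    assay_type = ASSAY_SYNONYMS.get(normalize(record.get("assay", "")))
    return HarmonizedRecord(
        dataset_id=record["id"],
        tissue_id=tissue_id,
        assay_type=assay_type,
        needs_review=tissue_id is None or assay_type is None,
    )


# Invented example records showing the inconsistency typical across studies.
raw_records = [
    {"id": "study_a", "tissue": "Cerebral-Cortex", "assay": "10x Visium"},
    {"id": "study_b", "tissue": "Hepatic tissue", "assay": "CODEX"},
    {"id": "study_c", "tissue": "organoid (unspecified)", "assay": "MERFISH"},
]
harmonized = [harmonize(r) for r in raw_records]
```

The `needs_review` flag is the interesting part: a dictionary lookup fails on anything it has not seen, which is exactly where an LLM planner with extraction tools would presumably take over from static rules.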

 

Nº 02  —  bioRxiv  —  Structural biology · Protein design

Protein LM predicts host-pathogen binding

Fig. II  bioRxiv · Filed 01 Jun 2026.

A proteome-scale language model predicts host-pathogen protein interactions directly from sequence, without docking or co-evolution signals. The model generalizes across viral and bacterial pathogens on held-out species, including zoonotic pairs with no training overlap. The result pushes protein language models from single-protein property prediction into cross-organism interaction inference, a shift comparable to AlphaFold's extension from single-chain structures to pairwise interaction calls, and opens a tractable computational front on outbreak preparedness, where wet-lab interactome mapping has always been the bottleneck.

Read more →
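
The "held-out species" claim rests on how the evaluation split is built. The sketch below shows one common way to do it, assuming each interaction pair carries a pathogen species label; the tuple layout, the identifiers, and `split_by_species` are hypothetical and not taken from the preprint.

```python
from typing import Iterable, List, Set, Tuple

# (host_protein, pathogen_protein, pathogen_species); layout is an assumption.
Pair = Tuple[str, str, str]


def split_by_species(
    pairs: Iterable[Pair], held_out: Set[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Send every pair from a held-out pathogen species to the test set."""
    train: List[Pair] = []
    test: List[Pair] = []
    for pair in pairs:
        species = pair[2]
        (test if species in held_out else train).append(pair)
    return train, test


# Invented identifiers for illustration only.
pairs = [
    ("host_A", "path_X1", "species_X"),
    ("host_B", "path_X2", "species_X"),
    ("host_A", "path_Y1", "species_Y"),
    ("host_C", "path_Z1", "species_Z"),
]
train, test = split_by_species(pairs, held_out={"species_Z"})

# No pathogen species may appear on both sides of the split.
train_species = {p[2] for p in train}
test_species = {p[2] for p in test}
```

Splitting by species rather than by random pair is what keeps near-identical proteins from the same pathogen out of both sets. It does not by itself rule out sequence similarity between related species, which is a separate check.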

 

Nº 03  —  arXiv  —  Clinical AI · Evaluation

FHIR benchmark stress-tests clinical reasoning

Fig. III  arXiv · Filed 01 Jun 2026.

MedCase-Structured converts clinical vignettes into FHIR (the HL7 standard for exchanging structured health-record data), then asks LLMs to diagnose from the structured record rather than free text. Accuracy drops sharply versus narrative inputs, exposing how much current clinical-LLM performance leans on prose framing. The benchmark sets a new reference point for clinical reasoning under realistic EHR conditions: vendor claims of 'GPT-4 passes USMLE' lose force once the input looks like what hospitals actually store.

Read more →
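
To make the narrative-versus-structured gap concrete, here is a rough sketch of what a vignette such as "a 36-year-old woman presents with fever and a rapid pulse" might look like as a FHIR Bundle. The patient and values are invented, the helper functions are hypothetical, and none of this is drawn from MedCase-Structured; the field names follow the FHIR Patient and Observation resources.

```python
import json

# Invented example record; not data from the benchmark.
bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {
            "resourceType": "Patient",
            "id": "p1",
            "gender": "female",
            "birthDate": "1990-04-02",
        }},
        {"resource": {
            "resourceType": "Observation",
            "status": "final",
            "code": {
                "coding": [{"system": "http://loinc.org", "code": "8310-5"}],
                "text": "Body temperature",
            },
            "subject": {"reference": "Patient/p1"},
            "valueQuantity": {"value": 39.1, "unit": "Cel"},
        }},
        {"resource": {
            "resourceType": "Observation",
            "status": "final",
            "code": {
                "coding": [{"system": "http://loinc.org", "code": "8867-4"}],
                "text": "Heart rate",
            },
            "subject": {"reference": "Patient/p1"},
            "valueQuantity": {"value": 118, "unit": "/min"},
        }},
    ],
}


def resources(bundle: dict, resource_type: str) -> list:
    """Return every resource of the given type contained in the Bundle."""
    return [
        entry["resource"]
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == resource_type
    ]


def summarize_observations(bundle: dict) -> list:
    """Flatten Observations into (label, value, unit) tuples."""
    rows = []
    for obs in resources(bundle, "Observation"):
        quantity = obs.get("valueQuantity", {})
        label = obs.get("code", {}).get("text", "unknown")
        rows.append((label, quantity.get("value"), quantity.get("unit")))
    return rows


# What a model would receive instead of the prose vignette.
model_input = json.dumps(bundle, indent=2)
observation_rows = summarize_observations(bundle)
```

Note what the structured form leaves out: words like "presents with" and "rapid" that tell a reader which findings matter. The model has to infer salience from codes and numbers alone.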

 

Also Filed  ·  Three Briefs from the queue

Nº 04  —  arXiv  —  Field report

Hypothesis generation under partial info

ProjectionBench tests whether LLMs can generate scientific hypotheses as information is progressively disclosed, mimicking how working scientists update mid-investigation. Frontier models often anchor on early evidence and fail to revise. The benchmark establishes hypothesis revision, not one-shot generation, as the relevant capability for autonomous discovery agents, complicating Nº 01's optimism about agents running long-horizon scientific work.

Read →
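
A progressive-disclosure evaluation can be sketched in a few lines. The loop below is an assumption about the general shape of such a harness, not ProjectionBench's actual protocol; the two `*_model` functions are deterministic stand-ins for LLM calls, and the evidence strings are invented.

```python
from typing import Callable, List, Sequence

Proposer = Callable[[Sequence[str]], str]


def run_progressive(evidence: Sequence[str], propose: Proposer) -> List[str]:
    """Ask for a hypothesis after each newly disclosed piece of evidence."""
    return [propose(evidence[: i + 1]) for i in range(len(evidence))]


def revision_steps(hypotheses: Sequence[str]) -> List[int]:
    """Indices at which the hypothesis differs from the previous step."""
    return [
        i for i in range(1, len(hypotheses))
        if hypotheses[i] != hypotheses[i - 1]
    ]


# Invented evidence stream in which the last item contradicts the first reading.
evidence = [
    "gene X is upregulated in affected tissue",
    "knocking out gene X leaves the phenotype intact",
    "knocking out gene Y abolishes the phenotype",
]


def anchoring_model(seen: Sequence[str]) -> str:
    """Stand-in for a model that never moves off its first impression."""
    return "gene X drives the phenotype"


def updating_model(seen: Sequence[str]) -> str:
    """Stand-in for a model that revises once decisive evidence arrives."""
    if "knocking out gene Y abolishes the phenotype" in seen:
        return "gene Y drives the phenotype"
    return "gene X drives the phenotype"


anchored_trace = run_progressive(evidence, anchoring_model)
updated_trace = run_progressive(evidence, updating_model)
```

Comparing `revision_steps(anchored_trace)` with `revision_steps(updated_trace)` separates the two behaviors: the first never revises, the second revises at the step where the contradicting evidence lands. Exact string comparison is a simplification; scoring free-text hypotheses from a real model would need a semantic equivalence check.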

Nº 05  —  OpenAI  —  Field report

Boston Children's logs 40+ rare diagnoses

Boston Children's Hospital credits OpenAI-powered tooling with surfacing 40+ rare-disease diagnoses that had escaped standard workups. Moves rare-disease AI from conference posters to named institutional deployment, giving pediatric genetics a concrete reference site competitors and payers will now point to.

Read →

Nº 06  —  OpenAI  —  Computational biology

OpenAI opens biodefense access tier

OpenAI launched Rosalind Biodefense, a vetted-access program giving U.S. government partners and screened developers a biosecurity-tuned model variant. The program formalizes a two-tier access model for frontier bio capabilities: the debate over who gets access to models that are both highly capable and potentially dangerous now has a working precedent rather than a hypothetical.

Read →

 

· · ·

Reply with your discoveries. A human reads them. Forward freely.