Microbial genomes get a foundation model

Agentic Discovery  ·  Nº 22  ·  25 May 2026

Editor's Note

Monday open: a foundation model for the gut microbiome lands, single-cell multiomics gets a transformer makeover, and Karpathy switches jerseys.

 

Nº 01 · The Lede  —  bioRxiv  —  Computational biology

Foundation model for the microbiome

Fig. I  bioRxiv · Filed 25 May 2026.

Genos-m trains a foundation model on human-associated microbial genomes, extending the protein-language-model playbook to the bacterial DNA that lives in and on us. The bioRxiv preprint pitches Genos-m as a general-purpose backbone for downstream microbiome tasks (strain identification, gene-function prediction, host association) that currently require a bespoke pipeline per question. Microbiome work has lagged the foundation-model wave largely because its reference databases are messier than UniProt; Genos-m is the first serious attempt to absorb that mess into pretraining weights.

Read the source →

Why it matters

Resets the reference architecture for microbiome AI: a field that ran on a patchwork of task-specific classifiers now has a candidate backbone to fine-tune against, the same shift that protein-language models triggered for sequence biology three years ago.

 

Nº 02  —  bioRxiv  —  Cell biology · Funding

Transformer for single-cell multiomics

Fig. II  bioRxiv · Filed 25 May 2026.

scDynOmics applies an optimized transformer to joint single-cell RNA and ATAC data, learning shared representations across modalities rather than stitching them post-hoc. The architecture targets a long-standing pain point: multiomic integration has been the domain of bespoke graph methods and VAEs that don't transfer between datasets. Moves single-cell multiomics one step closer to the plug-and-play embedding model that scRNA-seq alone already has.

Read more →

 

Nº 03  —  arXiv  —  Field report

Molecular plugins for LLMs

Fig. III  arXiv · Filed 25 May 2026.

SciCore-Mol bolts molecular cognition modules onto general LLMs — small specialist networks that handle SMILES parsing, property prediction, and reaction logic, swapped in via adapters rather than rebaked into pretraining. The approach narrows the gap between general-purpose chat models and chemistry-native tools, and raises the question of whether every scientific domain ends up shipping a pluggable cognition layer instead of a full domain model.

Read more →

 

Also Filed  ·  Four Briefs from the queue

Nº 04  —  arXiv  —  Field report

Generative re-ranking for entity linking

BeLink pairs biomedical entity linking with a generative re-ranker, using an LLM to break ties that retrieval-only systems get wrong on rare gene and disease mentions. Pushes biomedical entity linking closer to the accuracy floor that clinical and curation workflows actually need before they'll let an agent touch records.

Read →

Nº 05  —  Anthropic  —  Field report

Anthropic updates Project Glasswing

Anthropic posted an update on Project Glasswing, its interpretability-meets-safety research program, sharing early findings on what mechanistic analysis catches that black-box evals miss. It sits alongside a broader shift we've tracked: interpretability moving from research curiosity to a vendor checkbox for high-stakes deployments, including biomedical agents touching patient data.

Read →

Nº 06  —  OpenAI  —  Field report

AdventHealth deploys ChatGPT in clinics

AdventHealth rolled out ChatGPT for Healthcare across its system to handle documentation and admin load, OpenAI announced. Signals that LLM deployment inside large hospital networks has moved past the pilot phase, reaching the kind of footprint that starts shaping which AI vendors clinical IT departments default to.

Read →

Nº 07  —  Axios  —  Field report

Karpathy joins Anthropic

Andrej Karpathy joined Anthropic's pre-training team, leaving a quiet post-OpenAI stretch to work on Claude's core training runs. Concentrates more of the field's top pretraining talent at the lab whose models biomedical agent builders increasingly default to.

Read →

 

· · ·

Reply with your discoveries. A human reads them. Forward freely.