OpenAI ships a biology frontier model

Agentic Discovery  ·  Nº Fourteen  ·  13 May 2026

Editor's Note

Five releases, one theme: agents are being graded on real biology now, not toy tasks.

 

Nº 01 · The Lede  —  OpenAI  —  Field report

OpenAI launches GPT-Rosalind

Fig. I  OpenAI, 13 May 2026.

OpenAI shipped GPT-Rosalind, a frontier reasoning model tuned for chemistry, protein engineering, and genomics, with trusted-access deployments at Amgen, Moderna, the Allen Institute, and Thermo Fisher. It beats GPT-5.4 on 6 of 11 LABBench2 tasks — the biggest jump comes on CloningQA — and clears the 95th-percentile human-expert score on Dyno's RNA prediction task. A free Codex "Life Sciences research" plugin ships alongside, exposing 50+ scientific tools. The frontier-lab playbook for biology is now explicit: train a domain reasoning model, wire it to lab-relevant tools, partner with named pharma and research institutes.

Read the source →

Why it matters

This is the first time a top frontier lab has shipped a biology-specific reasoning model with named pharma deployments on day one. It resets what "general-purpose LLM in the lab" means and forces every bio-AI vendor to answer how their stack compares to GPT-Rosalind plus 50+ tools.

 

Nº 02  —  bioRxiv  —  Agents · Infrastructure

ClaroAI-Bench grades reproducibility agents

Fig. II  bioRxiv, 13 May 2026.

ClaroAI-Bench scores agents on whether they can actually reproduce results from real biomedical papers — code execution, data wrangling, the full pipeline, not just Q&A. The benchmark joins a thin shelf of evaluations stressing agents on end-to-end science rather than isolated reasoning, and anchors a concrete reference for the reproducibility claims every bio-agent vendor now makes.

Read more →

 

Nº 03  —  arXiv  —  Agents · Infrastructure

Agents extend PROTAC databases

Fig. III  arXiv, 13 May 2026.

Agentic literature extraction augments targeted protein degradation databases past what manual curation can sustain, pulling PROTAC and molecular glue data straight from papers into structured records. Curation throughput, long the binding constraint on degrader chemistry catalogs, stops being the bottleneck. It is the curation-side counterpart to the reproducibility benchmark in Nº 02 above.

Read more →

 

Also Filed  ·  Two Briefs from the queue

Nº 04  —  arXiv  —  Field report

MolDeTox tests fragment-by-fragment edits

MolDeTox benchmarks LLMs on stepwise fragment editing for molecular detoxification — can a model walk a toxic molecule to a safe analog one substructure at a time? Establishes a checkable yardstick for agentic medchem, where "the model suggested a fix" has been hard to grade objectively.

Read →

Nº 05  —  bioRxiv  —  Field report

Generative platform targets RNA chemistry

A generative chemistry platform for small molecules targeting RNA gets a case study in chemical optimization, pushing generative design past the protein-target comfort zone. RNA-targeted small molecules — a notoriously hard modality — gain a deployable design loop.

Read →

 

· · ·

Reply with your discoveries. A human reads them. Forward freely.