Can agents read a genome end to end?

Good morning!

Eight papers walked in; here's what actually moves the needle for builders in biomedical AI today.


1. LAFA benchmarks protein annotation models over time

Phan et al. built a reproducible framework for longitudinal evaluation of protein function annotation models, tracking how model performance drifts as UniProt and Gene Ontology annotations are updated. LAFA exposes a chronic blind spot: most benchmark comparisons freeze the annotation database at training time, so published numbers are not comparable across papers or years. The authors release splits, evaluation code, and a leaderboard designed to stay current as ground-truth annotations evolve, a structural fix for a field where benchmark inflation has quietly distorted model selection.

Why it matters: Protein function annotation is exactly the kind of task where agents make consequential decisions. A framework that catches model decay before deployment changes how teams should think about production monitoring. Heureka Bench addresses this same class of longitudinal evaluation gap for biomedical agents.
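
For builders, here is a minimal sketch of the time-split idea behind longitudinal evaluation. It is not LAFA's code or data format: the record layout, the toy annotations, and the helper names (`snapshot`, `new_terms`, `precision_recall`) are all assumptions made for illustration. Predictions are frozen at a training cutoff and then scored only against terms that entered the annotation set afterwards.

```python
from datetime import date

# Toy, made-up annotation records: (protein_id, go_term, date_added).
ANNOTATIONS = [
    ("P1", "GO:0003824", date(2020, 1, 10)),
    ("P1", "GO:0016787", date(2022, 6, 1)),
    ("P2", "GO:0005515", date(2021, 3, 5)),
    ("P2", "GO:0003677", date(2023, 2, 14)),
    ("P3", "GO:0003824", date(2023, 8, 30)),
]


def snapshot(annotations, cutoff):
    """Return {protein: set of terms} known on or before the cutoff date."""
    known = {}
    for protein, term, added in annotations:
        if added <= cutoff:
            known.setdefault(protein, set()).add(term)
    return known


def new_terms(annotations, start, end):
    """Return {protein: terms} added after `start` and on or before `end`."""
    before = snapshot(annotations, start)
    after = snapshot(annotations, end)
    gained = {}
    for protein, terms in after.items():
        diff = terms - before.get(protein, set())
        if diff:
            gained[protein] = diff
    return gained


def precision_recall(predictions, truth):
    """Micro-averaged precision and recall over (protein, term) pairs."""
    pred_pairs = {(p, t) for p, ts in predictions.items() for t in ts}
    true_pairs = {(p, t) for p, ts in truth.items() for t in ts}
    hits = len(pred_pairs & true_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Hypothetical predictions from a model trained on data up to the cutoff.
train_cutoff = date(2021, 12, 31)
predictions = {"P1": {"GO:0016787"}, "P2": {"GO:0003677"}, "P3": {"GO:0005515"}}

# Score the same frozen predictions against two later annotation releases.
p_2022, r_2022 = precision_recall(
    predictions, new_terms(ANNOTATIONS, train_cutoff, date(2022, 12, 31))
)
p_2023, r_2023 = precision_recall(
    predictions, new_terms(ANNOTATIONS, train_cutoff, date(2023, 12, 31))
)
print(f"2022 release: precision={p_2022:.2f} recall={r_2022:.2f}")
print(f"2023 release: precision={p_2023:.2f} recall={r_2023:.2f}")
```

The same predictions get different scores depending on which release supplies the ground truth, which is why numbers reported against different database snapshots are hard to compare.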


2. PLMs crack enzyme function without labels

Penner et al. show that unsupervised protein language models trained purely on sequence recover structured patterns of enzyme function (EC hierarchy, substrate scope, and catalytic site geometry) without any task-specific supervision. The result strengthens the case that large-scale pretraining on raw sequence data encodes functional chemistry well enough to narrow the gap to curated enzyme databases.


3. ModernGENA resets the DNA model baseline

Aspidova et al. argue that ModernGENA, a modernized BERT-style architecture, matches or beats recent large DNA foundation models on standard genomics benchmarks while training faster and using less compute. The paper is a pointed challenge to the assumption that scale alone differentiates DNA models, and it supplies a rigorous open baseline the field has lacked.


4. RNABag unifies biopsy types for oncology

Luo et al. released RNABag, a transcriptome foundation model trained to generalize across fresh-frozen, FFPE, and liquid biopsy modalities for precision oncology tasks. Poor cross-modality generalization has been a persistent failure mode for clinical RNA models.


5. Agent framework segments capsule endoscopy video

Liu et al. proposed Divide-then-Diagnose, a clinician-inspired agent pipeline that splits ultra-long capsule endoscopy videos into semantically coherent segments before diagnosis, sidestepping the context-length limits that cripple single-pass video models on hour-long GI studies.
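
One simple way to picture the divide step is sketched below. This is not the authors' method: it assumes each frame already has an embedding vector, and it uses a plain similarity threshold as a stand-in for "semantically coherent". The names `split_segments`, `diagnose`, and `classify_segment` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_segments(embeddings, threshold=0.8):
    """Return (start, end) index pairs, end exclusive.

    A new segment starts wherever the similarity between two
    consecutive frame embeddings drops below the threshold.
    """
    if not embeddings:
        return []
    segments = []
    start = 0
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(embeddings)))
    return segments


def diagnose(embeddings, classify_segment, threshold=0.8):
    """Apply a per-segment classifier, so no call sees the whole video."""
    return [
        ((start, end), classify_segment(embeddings[start:end]))
        for start, end in split_segments(embeddings, threshold)
    ]


# Toy 2-D "embeddings": two similar frames, then a scene change.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
segments = split_segments(frames)
# Placeholder classifier that only reports segment length.
results = diagnose(frames, classify_segment=len)
print(segments)
print(results)
```

Each classifier call only receives one segment, which is how a pipeline of this shape keeps every model input short regardless of total video length.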


6. CHRep predicts spatial gene expression from histology

Wang et al. introduced CHRep, a cross-modal representation model that predicts spatial gene expression from H&E histology images and adds post-hoc calibration to reduce overconfident predictions, a known reliability problem for spatial transcriptomics proxies.


7. Dynamic tool gating cuts MCP context overhead

Sadani et al. describe a dynamic tool-gating and lazy schema-loading scheme that reduces the token overhead of large MCP tool registries in agentic workflows. It is relevant for any bio pipeline hitting context limits with dozens of registered tools.
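
A rough sketch of the general idea follows, not the paper's implementation and not the MCP SDK. The `LazyToolRegistry` class, its word-overlap gating rule, and the three example tools are assumptions made for illustration. The registry keeps only a one-line summary per tool in memory and loads a tool's full schema the first time that tool is selected.

```python
import json
from typing import Callable, Dict, List


class LazyToolRegistry:
    """Toy registry: short summaries up front, full schemas on demand."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}
        self._loaders: Dict[str, Callable[[], dict]] = {}
        self._cache: Dict[str, dict] = {}

    def register(self, name: str, summary: str, loader: Callable[[], dict]) -> None:
        """Store a summary and a loader; the schema itself is not built yet."""
        self._summaries[name] = summary
        self._loaders[name] = loader

    def gate(self, query: str, limit: int = 3) -> List[str]:
        """Pick tools whose summary shares the most words with the query."""
        words = set(query.lower().split())
        scored = []
        for name, summary in self._summaries.items():
            overlap = len(words & set(summary.lower().split()))
            if overlap:
                scored.append((overlap, name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:limit]]

    def schema(self, name: str) -> dict:
        """Load a tool's full schema on first use, then reuse the cached copy."""
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def context_for(self, query: str, limit: int = 3) -> str:
        """Serialize only the gated tools' schemas for the prompt."""
        return json.dumps([self.schema(n) for n in self.gate(query, limit)])


loaded = []  # records which schemas were actually materialized


def make_loader(name: str) -> Callable[[], dict]:
    def load() -> dict:
        loaded.append(name)
        return {"name": name, "parameters": {"type": "object", "properties": {}}}
    return load


registry = LazyToolRegistry()
registry.register("blast_search", "align a protein or nucleotide sequence against a database", make_loader("blast_search"))
registry.register("fetch_pdb", "download a protein structure by PDB id", make_loader("fetch_pdb"))
registry.register("plot_volcano", "plot differential expression results", make_loader("plot_volcano"))

selected = registry.gate("align protein sequence", limit=1)
prompt_context = registry.context_for("align protein sequence", limit=1)
print(selected, loaded)
```

A real system would likely gate with embeddings or a learned router instead of word overlap. The point of the sketch is only that unselected tools never contribute schema tokens to the prompt.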


8. Argument for epistemic guardrails in agentic coding

Palmblad argues that agentic AI coding assistants in scientific software development should encode epistemic provenance, linking generated code to the assumptions and data sources that justify it, before bad practices calcify.


Reply to talk back — this email comes to a human (newsletter@heurekalabs.co). Forward freely.

Agentic Discovery is a project of Heureka Labs