06 Jun 2026 6 min read

Protocols quietly become the new moat

Last week's issue argued that the agent stack had matured faster than the science riding on it, and that the artifact under review was becoming the trajectory rather than the result. This week sharpens that claim in an unexpected direction: the trajectories are converging on shared rails. LAP proposed a wire protocol for agents to drive lab instruments. mcp-proto-okn routed federated scientific knowledge graphs through MCP. A graph-based planner replaced prompt chains with MCP-native plans. The story is no longer that agents are running pipelines. It is that the connectors underneath those pipelines are becoming standardized, and the choice of standard is starting to look like the choice of scientific instrument.

The protocol layer hardens

Three converging releases name the same shift. LAP does for instruments what MCP did for software: a wire protocol that lets an agent address a mass spectrometer, a sequencer, or a liquid handler without bespoke glue code. mcp-proto-okn extends MCP to federated scientific knowledge graphs, so a biomedical agent can query the Open Knowledge Network the same way it queries a local tool. The MCP-native planner goes further still, replacing prompt-chained orchestration with a graph of tool calls discoverable at runtime. Each release on its own reads like infrastructure. Together they describe a stack: instruments at the bottom, knowledge graphs in the middle, planners on top, and a single protocol family carrying state between them.

What gets standardized is what gets reproducible

Last week's accountability question was where scrutiny lives when an agent runs the chain end-to-end. The answer was the trajectory: memory state, tool versions, parameter choices. This week makes that answer operational. A trajectory recorded across MCP calls is portable; one recorded inside a bespoke prompt chain is not. CodeCytos makes the stakes concrete: instead of selecting from canned spatial-imaging analyses, the agent writes and executes code against multiplexed tissue images on the fly. The result is more flexible and far harder to audit, unless the code, the tool calls, and the intermediate states all travel through a protocol the next reviewer can replay. The protocol is the audit log.
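What "the protocol is the audit log" could mean in practice is easier to see in a few lines. The sketch below is a minimal illustration under stated assumptions, not how CodeCytos or any MCP SDK records trajectories. It assumes each tool call is logged in the shape of a JSON-RPC 2.0 `tools/call` request (a tool `name` plus `arguments`), which is the message shape MCP uses, and that tool results are JSON-serializable. The class `TrajectoryLog`, its `record`/`verify`/`replay` methods, and the `count_cells` tool are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable

# A tool is any callable taking keyword arguments and returning a JSON-serializable result.
# (Assumption: in a real deployment these would be tools exposed by an MCP server.)
ToolFn = Callable[..., Any]


@dataclass
class TrajectoryLog:
    """Append-only record of tool calls, each entry hash-chained to the one before it."""

    entries: list[dict[str, Any]] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict[str, Any], prev_hash: str) -> str:
        # Canonical JSON (sorted keys) so an identical entry always hashes identically.
        blob = json.dumps(body, sort_keys=True).encode() + prev_hash.encode()
        return hashlib.sha256(blob).hexdigest()

    def record(self, tools: dict[str, ToolFn], name: str, arguments: dict[str, Any]) -> Any:
        # Shape the request like a JSON-RPC 2.0 "tools/call" message.
        request = {
            "jsonrpc": "2.0",
            "id": len(self.entries) + 1,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
        result = tools[name](**arguments)
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {"request": request, "result": result}
        self.entries.append({**body, "hash": self._digest(body, prev_hash)})
        return result

    def verify(self) -> bool:
        # Recompute the chain; editing any entry's request or result breaks its stored hash.
        prev_hash = ""
        for entry in self.entries:
            body = {"request": entry["request"], "result": entry["result"]}
            if self._digest(body, prev_hash) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

    def replay(self, tools: dict[str, ToolFn]) -> list[int]:
        # Reissue every logged call against a (possibly updated) tool set and
        # return the ids of calls whose results no longer match the record.
        mismatches = []
        for entry in self.entries:
            params = entry["request"]["params"]
            if tools[params["name"]](**params["arguments"]) != entry["result"]:
                mismatches.append(entry["request"]["id"])
        return mismatches


# Hypothetical cell-counting tool, then a later version whose behavior changed.
tools_v1 = {"count_cells": lambda image_id, threshold: 120 if threshold < 0.5 else 80}
tools_v2 = {"count_cells": lambda image_id, threshold: 120 if threshold < 0.5 else 95}

log = TrajectoryLog()
log.record(tools_v1, "count_cells", {"image_id": "tile-7", "threshold": 0.4})
log.record(tools_v1, "count_cells", {"image_id": "tile-7", "threshold": 0.6})

print(log.verify())          # True
print(log.replay(tools_v2))  # [2]
```

The two checks answer different questions a reviewer cares about. `verify` asks whether the record itself was altered after the fact; `replay` asks whether the tools still behave the way they did when the agent ran. A hash chain only detects edits, not a wholesale rewrite with recomputed hashes, so a production log would also need signing or an external anchor.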

The instrument is the protocol

CERN's Archi system ran agentic operations on the CMS detector this week, with agents making real decisions on one of the largest scientific instruments in existence, under reliability constraints biology rarely matches. The template it offers is not the model or the planner; it is the interface contract between agent and instrument. Once that contract is standardized, swapping the model behind it costs less than swapping the protocol underneath. This is the inversion worth naming: in a stack where foundation models are commoditizing and benchmarks are migrating toward process evaluation, the durable layer is the wire. The protocol decides what an agent can see, what it can act on, and what a future reviewer can reconstruct. That is what an instrument does.

The open question

Standards converge through either coordination or capture. MCP arrived from Anthropic; LAP arrived from an academic preprint; the Open Knowledge Network sits inside a federation of public scientific databases. None of these are neutral. The protocol that wins will encode someone's assumptions about what an agent should be allowed to do — what tools it can discover, what state it must log, what failures it must surface. The benchmarks arc covered earlier this year argued that the yardstick designer sets the ceiling on progress. The protocol designer sets something stronger: the floor of what is recordable, and therefore what is reproducible. The next quarter is going to be about who writes those specifications, and which labs adopt which stack before the question is settled by inertia.

Still tracking

Benchmarks as audits: PromptBio-Bench grades full bioinformatics pipelines, MedCase-Structured exposes how LLMs fail on FHIR records, ProjectionBench tests hypothesis revision under partial information, and a process-level chemistry benchmark scores intermediate reasoning. Watch for the first model claim rejected on trajectory grounds rather than on final-answer accuracy.
The open stack and its liabilities: Rosalind Biodefense formalizes a vetted-access tier for biosecurity-sensitive capabilities while GPT-Rosalind ships to the general tier. Watch which academic groups accept the gated terms and which route around them to open-weight alternatives.

Reply with what you're seeing. A human reads every reply. Forward freely.

AGENTIC ARC

Nº III · week of 01 Jun 2026 · from Agentic Discovery