Research agenda — R1–R7, on the record

Seven directions, three frontiers, one question.

Ephemerent studies how useful intelligence emerges, briefly, from systems of small agents — many minds spun up for a moment, cooperating, judged, and gone, leaving only the result. Our seven research directions are not seven bets; they are three fundable frontiers, each with a product that expresses it and a public artifact that proves it. The discipline that keeps the lab honest is one line: LLMs now, layers later. We ship LLM-powered, legible coding agents today, and research the layers that integrate alongside them — never as a replacement, always as a measured addition with an eval that proves it earned its place.


Directions: R1–R7
Frontiers: Coordination · Verification · World models
Discipline: LLMs now, layers later
Horizon: Eight quarters, 2026–2028

Frontier A

Emergent multi-agent coordination.

The open gap: today's interoperability protocols assume the coordination problem is already solved. A2A v1.0 and MCP both encode static roles and pairwise hand-offs; neither provides a framework for spontaneous collaboration, distributed consensus, or emergent task-sharing, and gossip-protocol-style decentralized coordination remains an explicitly named open gap. For a lab called Ephemerent, this is the natural home and the most distinctive claim we can stake. Over eight quarters we will instrument Orrery's existing parallel-worktree runs as coordination traces, prototype a gossip-style consensus layer in which agents exchange intermediate diffs and converge without a central judge, and scale that layer up to fleet size in Colony. The credible public output is an open multi-agent coordination dataset (anonymized Orrery traces) and a tech report comparing gossip with central dispatch. Gossip coordination is marked as a bet: it runs advisory-first and is promoted only when it beats central dispatch on the benchmark.

R1 · Emergent orchestration
Decomposing a goal into parallel agents, running each in isolation, and merging only what works. Orrery already does this with a central dispatcher; the research question is what coordination looks like with no dispatcher at all: agents that share partial results, vote, and re-allocate work via gossip. Product: Colony.
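One way to picture convergence without a dispatcher is randomized pairwise gossip averaging, a classic decentralized technique. The sketch below is an illustration under assumptions, not Orrery's or Colony's code: each agent is assumed to hold its own score for every shared candidate, and the names `gossip_consensus` and `local_choice` are hypothetical. In each round two random peers average their views; because pairwise averaging preserves the total, every view drifts toward the global mean, and all agents end up choosing the same candidate with no central judge.

```python
import random


def gossip_consensus(local_scores, rounds=200, seed=0):
    """Randomized pairwise gossip averaging with no central dispatcher.

    local_scores: one score vector per agent, one float per shared candidate.
    Each round, two distinct agents chosen at random replace their vectors
    with the element-wise average. Pairwise averaging preserves the sum, so
    every agent's view converges toward the global mean.
    """
    views = [list(v) for v in local_scores]  # copy: leave the caller's data untouched
    n = len(views)
    if n < 2:
        return views  # nothing to gossip with
    rng = random.Random(seed)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)  # two distinct peers
        avg = [(a + b) / 2 for a, b in zip(views[i], views[j])]
        views[i], views[j] = avg, list(avg)
    return views


def local_choice(view):
    """Each agent independently picks the candidate it currently rates highest."""
    return max(range(len(view)), key=view.__getitem__)
```

With scores `[[1, 0, 0], [0, 1, 0], [1, 0, 1], [1, 0, 0]]`, every view converges toward the mean `[0.75, 0.25, 0.25]`, so all four agents independently pick candidate 0. A real layer would also have to cope with peers joining and leaving mid-run, which this sketch ignores.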

R7 · Distributed compute mesh
A hive-style network in which contributors parallelize training and inference across their own GPUs (async gradient averaging, fault-tolerant participation, credit-backed rewards), so coordination is tested at fleet scale, not just on one laptop. Datacenter-scale ambition without datacenter monopoly. Product: Colony (marked as a bet).
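Asynchronous averaging across contributors' GPUs has to tolerate nodes that drop out or report late. A minimal sketch, assuming a bounded-staleness rule (one common approach, not necessarily what Colony would use): each round, the aggregating node averages whichever gradients arrived, weighted by sample count, and skips any computed against a model more than `max_staleness` versions old. `Update`, `aggregate`, and `max_staleness` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Update:
    contributor: str
    base_version: int   # model version the contributor computed against
    num_samples: int    # examples behind this gradient
    grad: list[float]


def aggregate(updates, current_version, max_staleness=2):
    """Sample-weighted average of the updates that are fresh enough.

    Contributors that dropped out simply send nothing; updates computed
    against a model more than `max_staleness` versions old are skipped.
    Returns None when no usable update arrived this round.
    """
    fresh = [u for u in updates if current_version - u.base_version <= max_staleness]
    total = sum(u.num_samples for u in fresh)
    if total == 0:
        return None
    avg = [0.0] * len(fresh[0].grad)
    for u in fresh:
        weight = u.num_samples / total
        for k, g in enumerate(u.grad):
            avg[k] += weight * g
    return avg
```

A contributor that disappears just contributes nothing that round, which is the fault-tolerance property in miniature; credit-backed rewards are out of scope here.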

Frontier B

Verification & evaluation as the hard problem.

What is missing is not another polished completion but evidence that a system did the requested work inside the intended boundary. Multi-agent, long-horizon, and self-improving systems widen that gap because more steps can fail silently. Over eight quarters, Arbiter becomes the through-line: we ship execution-based scoring as Orrery's default judge, build a public eval harness with step-level and outcome-level scoring, and stand up a drift record that re-measures solve rate as hosted models change. Vellum is the spatial expression: it replaces theatrical proof with a real software rasterizer, so “prove, don't eyeball” is literal, with no GPU in the loop. These are commitments, not benchmark claims. The intended public output is a reproducible eval harness, a multi-agent verification benchmark, and a short paper on execution-based selection versus model-judge review.
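The drift record amounts to re-running a fixed task set against each hosted-model snapshot and watching the solve rate over time. A minimal sketch, assuming the task set stays frozen between runs and that any drop larger than a chosen tolerance should be flagged; `DriftEntry`, `flag_drift`, the snapshot labels, and the 0.02 tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftEntry:
    model_id: str   # hypothetical snapshot label for the hosted model
    solved: int     # tasks solved on the frozen task set
    attempted: int  # tasks attempted (the same set every run)

    @property
    def solve_rate(self) -> float:
        return self.solved / self.attempted if self.attempted else 0.0


def flag_drift(history, tolerance=0.02):
    """Yield (previous_id, current_id, drop) wherever the solve rate fell
    by more than `tolerance` between consecutive snapshots."""
    for prev, cur in zip(history, history[1:]):
        drop = prev.solve_rate - cur.solve_rate
        if drop > tolerance:
            yield prev.model_id, cur.model_id, drop
```

Comparing consecutive snapshots catches sudden regressions; a slow slide across many small steps would need a comparison against a pinned reference run as well.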

R2 · Verifiable selection
Choosing the best attempt by execution, not impression: panels, metrics, and judges instead of guesswork, with step-level scoring (was each split, tool call, and hand-off sound?) and outcome-level scoring (did the merged result work?). Products: Arbiter (the verification layer) and Vellum (spatial verification).
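Execution-based selection with both scoring levels can be sketched as a ranking over executed attempts. This is a hypothetical illustration, not Arbiter's interface: `Attempt`, `select_best`, and the choice to rank lexicographically (outcome first, step soundness only as a tie-breaker) are assumptions. The outcome score is the fraction of tests the merged result passed; the step score is the fraction of recorded splits, tool calls, and hand-offs judged sound.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    name: str
    tests_passed: int  # outcome level: did the merged result work?
    tests_total: int
    # step level: one verdict per split, tool call, or hand-off
    step_checks: list[bool] = field(default_factory=list)

    def outcome_score(self) -> float:
        return self.tests_passed / self.tests_total if self.tests_total else 0.0

    def step_score(self) -> float:
        return sum(self.step_checks) / len(self.step_checks) if self.step_checks else 0.0


def select_best(attempts):
    """Pick by executed outcome first; step soundness only breaks ties."""
    return max(attempts, key=lambda a: (a.outcome_score(), a.step_score()))
```

Lexicographic ranking keeps the executed outcome decisive, so a tidy-looking process never outranks a result that actually passes more tests; a weighted sum would be the alternative if step quality should trade off against outcome.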

Frontier C

World models as a planning substrate.

The open gap: 2026 mainstreamed world models (Genie 3, World Labs / Marble, LeCun's AMI direction). For code, the analogue is an agent that simulates execution and repo state before acting, which makes it both a planning primitive and a verification primitive: predict the consequence of a patch and prune the bad branches before spending tokens on them. Over eight quarters, Seed is this frontier's bet, and it is labeled as one. We will collect patch-consequence data from Orrery runs (proposed diff → test/build outcome), train a small predictor that estimates a patch's outcome before execution and use it to prune best-of-N candidates, and explore latent and multimodal reasoning, plus spatial grounding where Vellum's geometry provides a signal. Seed runs advisory-only at first: it ranks candidates, while the LLM and Arbiter still decide. A pre-execution patch predictor lands behind a flag by Q1 2028, integrated alongside (never replacing) the LLM agents, and is promoted only if it improves solve-rate-per-token on Arbiter's harness. The credible public output is an open patch-consequence dataset and a tech report on pre-execution pruning. R6 (wave, RF, and photonic analog compute) is the longest-horizon bet and stays explicitly research-only across this window.
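Advisory-first pruning can be expressed as a single switch. A minimal sketch, assuming the predictor returns an estimated probability that a patch passes its tests; `prune_candidates`, `predict_pass`, `keep`, and the lookup-table stub standing in for Seed's predictor are all hypothetical. In advisory mode the predictor only reorders candidates and every one is still executed; with the flag off, only the top `keep` go to execution, which is where any token savings would come from.

```python
from typing import Callable, Sequence


def prune_candidates(
    candidates: Sequence[str],
    predict_pass: Callable[[str], float],
    keep: int,
    advisory: bool = True,
) -> list[str]:
    """Order candidate patches by predicted pass probability, highest first.

    advisory=True: nothing is dropped; the predictor only reorders, and the
    execution judge still runs every candidate.
    advisory=False (behind a flag): only the top `keep` reach execution.
    """
    ranked = sorted(candidates, key=predict_pass, reverse=True)
    return ranked if advisory else ranked[:keep]


# Stub predictor for illustration: a fixed table of estimated pass rates.
ESTIMATES = {"patch-a": 0.2, "patch-b": 0.9, "patch-c": 0.5}


def stub_predictor(patch_id: str) -> float:
    return ESTIMATES[patch_id]
```

Keeping both modes behind one argument makes the advisory-to-pruning promotion a configuration change rather than a code change, which suits a layer that must first prove itself on the eval harness.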

R3 · Latent world models
Computational state (repos, runtimes, proof states) encoded and rolled forward in embedding space, scoring candidates and imagining outcomes before real execution. Product: Seed (marked as a bet).

R4 · Multimodal latent reasoning
Text, code, and vision sharing a representation space for planning and verification, alongside the LLMs that still emit the final code and proofs. Probes of multimodal latent reasoning are scheduled mid-window; the layer is integrated only when it is measured to earn its place.

R5 · Spatial & embodied agents
Agents that build and check things you can see (shaders, parts, and scenes), verified without a GPU. Vellum's geometry gives a grounded signal; visual context is a future layer on the same agent stack.

R6 · Wave compute substrates
RF and photonic analog accelerators: matrix operations carried by electromagnetic waves instead of shuttled electrons. Algorithm–hardware co-design, not bigger GPUs alone. The lab's longest-horizon bet; explicitly research-only across this window.

On the record

What is shipped versus what is a bet.

To stay honest: execution-based verification, the eval harness, the drift leaderboard, and Vellum's real rasterizer are commitments — extensions of work already running in Orrery and Vellum, and they fund themselves by making the product more trustworthy. Gossip-style emergent coordination, Colony at fleet scale, Seed's software world model, and every line of R6 are bets — flagged, advisory-first, and promoted only when an Arbiter eval proves they beat the LLM-only baseline on solve-rate-per-token. Nothing replaces the LLM. Each layer is added when, and only when, it is measured to earn its place. That is the whole discipline: layers later, but only the layers that pass the test.
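The promotion rule reduces to a comparison on one ratio. A minimal sketch, assuming solve-rate-per-token means tasks solved divided by total tokens spent on the same fixed task set, with an optional required relative margin `min_gain`; `EvalRun` and `should_promote` are hypothetical names. For example, a baseline that solves 50 tasks for 1,000,000 tokens scores 5.0e-5; a layer that solves the same 50 for 800,000 tokens scores 6.25e-5 and passes, while a layer that solves 55 for 1,200,000 tokens scores about 4.6e-5 and fails despite solving more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    solved: int  # tasks solved on the fixed task set
    tokens: int  # total tokens spent across that task set

    @property
    def solve_rate_per_token(self) -> float:
        return self.solved / self.tokens if self.tokens else 0.0


def should_promote(baseline: EvalRun, with_layer: EvalRun, min_gain: float = 0.0) -> bool:
    """True only if the layered run beats the LLM-only baseline on
    solve-rate-per-token by more than the relative margin `min_gain`."""
    return with_layer.solve_rate_per_token > baseline.solve_rate_per_token * (1.0 + min_gain)
```

A tie does not promote: a layer that merely matches the baseline has not earned its place.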
