The Rerun

On July 1, 2026, we deleted our own research history and made the discovery loop start over from the rules alone — twice. One arm climbed back to roughly where we had started. The other found, in its third experiment, a rule 115 experiments had missed, and its champion beat ours.

Research frozen — results as of July 2, 2026

Saturation is evidence about the search, not the game

The previous chapter ended with a claim we could not distinguish from its own shadow. The discovery loop had saturated: on July 1, 2026, a final session pushed the journal from entry 092 to entry 115 and kept nothing. Every sweep around The Shape Reader (X188) came back flat. One reading was that the champion sat near the ceiling of hand-coded cricket. The other reading was quieter and worse: that 115 experiments of accumulated doctrine (every kept mechanic, every annotated dead end, every threshold tuned against the same growing opponent pool) had walked the search into one basin and could no longer see out of it. Path-dependence and a true ceiling look identical from the inside.

There is only one experiment that separates them, and it is an uncomfortable one: delete the history and run the search again. If a fresh loop, starting from nothing but the rules of the game, climbs to the same champion and stalls at the same height, the plateau belongs to cricket. If it finds things we never found, the plateau was ours. So on the same day the old loop's final session produced zero keeps, we started the rerun.

The clean room: two arms, zero memory

The protocol's whole job was to prevent leakage. Each rerun arm ran in an isolated git worktree checked out at the loop's birth commit, the April 20 scaffold, created before the first experiment was ever journaled. At that commit the simulator contains only the classic bots; the journal contains only the objective, the original 11-bot opponent pool, and the E12 baseline to beat. No X-series strategy exists in the engine, no entry exists in the journal, and the agent driving each arm started with a fresh context: no access to the main tree, its journal, or the project's memory notes. Each arm had the same engine family, the same bench harness, and the same pre-registered keep rules, and none of the answers.

We ran two arms. A control arm, driven by a different model (Opus), was given a fixed ten-iteration budget, enough to test whether the protocol itself reproduces the original loop's opening trajectory. A treatment arm (Fable 5) was allowed to keep running, and eventually wrote 114 journal entries across nine sessions on July 1–2. The two arms differ in both model and budget, so they are not a controlled comparison of models; they are two independent replays of the same search, one short and one long.

The control arm reproduced our start — and stopped where its budget did

Ten iterations in, the control arm's champion was X107_PSChaseMild8: a 55.2% mean win rate, 51.7% minimum, beating all 11 pool opponents, against an E12 baseline of 54.6% mean and a 49.5% worst matchup. That is almost exactly where the treatment arm stood at the same iteration count: its tenth-iteration champion X109 sat at 55.0% on the same pool. Two independent agents, same rules, same first plateau.

The arm's own closing reflection is worth quoting in substance: its final gains came from tuning a scoring threshold whose response curve was single-peaked — 54.7%, 54.9%, 55.2%, 55.1%, 54.5% across the five multipliers it swept — and anything further would need a genuinely new mechanic the session did not reach. We are careful with the word "plateau" here: ten iterations is a budget, not an asymptote. What the control arm establishes is the baseline shape of the search — from the rules alone, ten experiments buy you roughly 55%, by either model's hand.
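The single-peaked claim can be checked mechanically. A minimal sketch, using only the five means quoted above: the function name `is_single_peaked` is ours for illustration, and because the multiplier values themselves are not given, only their sweep order is used.

```python
def is_single_peaked(values):
    """Return True if values rise strictly to one maximum, then fall strictly."""
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


# Mean win rates (%) across the five swept multipliers, in sweep order.
sweep_means = [54.7, 54.9, 55.2, 55.1, 54.5]
peak_index = sweep_means.index(max(sweep_means))

print(is_single_peaked(sweep_means), peak_index)
```

A curve with one interior peak means the sweep has bracketed its optimum, which is why further gains would need a new mechanic rather than more tuning of the same threshold.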

It also put a price on our history. Ported onto the main lineage's expanded 14-bot pool, the control champion scored a 50.9% mean and lost to The Shape Reader head-to-head 30–39% across skill profiles: 89 iterations of accumulated lineage are worth roughly 10pp of mean. The lineage was real capital. The question was whether it was also a cage.

The treatment arm found our biggest blind spot in three iterations

At iteration 3, its third experiment ever, the treatment arm kept X102_WinTap, and with it a rule the original lineage never wrote down in 115 entries: the win-tap. With six lanes closed and any non-negative lead, aim the last unclosed target, and take the minimum-miss single when one mark away, because under our rules a tie is a winning position for the player who closes out. The old champion never internalized this. The Shape Reader's tie handling aims a double at the bull, and when the exact endgame solve later audited it (see Ground Truth), the missing win-tap turned out to be its single biggest blunder class: in a tied endgame one single from victory, it forfeits 0.542 of win probability on one dart. A fresh agent, reading the rules with no inherited doctrine, saw in its third experiment what 115 experiments of composition had built on top of and around.
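The win-tap rule as stated can be sketched as a small decision function. This is an illustration under assumptions, not the code of X102_WinTap: the state encoding (a dict of marks per target), the function name `win_tap`, and the `"caller_choice"` placeholder are all hypothetical, and standard cricket targets with three marks to close are assumed.

```python
from typing import Optional

TARGETS = (20, 19, 18, 17, 16, 15, "bull")  # assumed: standard cricket targets
MARKS_TO_CLOSE = 3                          # assumed: three marks close a lane


def win_tap(my_marks: dict, my_score: int, opp_score: int) -> Optional[tuple]:
    """Return (target, aim) if the win-tap applies, else None.

    Hypothetical encoding: my_marks maps each target to its mark count (0..3).
    """
    open_lanes = [t for t in TARGETS if my_marks.get(t, 0) < MARKS_TO_CLOSE]
    if len(open_lanes) != 1:       # the rule needs six of seven lanes closed
        return None
    if my_score - opp_score < 0:   # any non-negative lead qualifies, ties included
        return None
    target = open_lanes[0]
    marks_needed = MARKS_TO_CLOSE - my_marks.get(target, 0)
    # One mark away: take the minimum-miss single. The text does not say which
    # aim to use when further away, so this sketch leaves that to the caller.
    aim = "single" if marks_needed == 1 else "caller_choice"
    return (target, aim)


# Tied game, bull one mark from closed: the case the text describes.
marks = {20: 3, 19: 3, 18: 3, 17: 3, 16: 3, 15: 3, "bull": 2}
print(win_tap(marks, my_score=120, opp_score=120))
```

The tie check is the whole point: the guard is `lead >= 0`, not `lead > 0`, because a tie is a winning position for the player who closes out.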

Eight iterations later came the largest single mechanic found anywhere in this project. Entry 011, X110_FaucetShut, added faucet denial: while covering at a crushing lead, shut the opponent's income lanes (the numbers they have closed and you have not) before claiming empty real estate. The champion's mean jumped from 55.0% to 64.4%, +9.4pp in one step. The differential signature says why: the gain concentrates against opponents that farm points while ahead, with +17.9pp against PS, +15.7pp against S6, and +13.9pp against E1. Every strategy in the classic field decides what to do with its own lanes; faucet denial is about the opponent's.
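The definition of an income lane is a simple set condition, and a sketch makes the ordering concrete. Everything named here is an assumption for illustration: the function `closing_order`, the `crushing_lead` parameter (the real threshold is not given in this chapter), the dict-of-marks encoding, and the choice to keep lanes in descending target order within each group.

```python
TARGETS = (20, 19, 18, 17, 16, 15, "bull")  # assumed: standard cricket targets
MARKS_TO_CLOSE = 3                          # assumed: three marks close a lane


def closing_order(my_marks, opp_marks, lead, crushing_lead):
    """Order my unclosed lanes, putting opponent income lanes first
    when the lead is at least crushing_lead (a hypothetical threshold)."""

    def closed(marks, target):
        return marks.get(target, 0) >= MARKS_TO_CLOSE

    mine_open = [t for t in TARGETS if not closed(my_marks, t)]
    if lead < crushing_lead:
        return mine_open  # denial inactive: this sketch applies no reordering

    # Income lanes: the opponent has closed them and I have not, so they score there.
    income = [t for t in mine_open if closed(opp_marks, t)]
    # Empty real estate: lanes the opponent has not closed either.
    empty = [t for t in mine_open if not closed(opp_marks, t)]
    return income + empty


my_marks = {20: 3, 19: 3}
opp_marks = {20: 3, 17: 3, 15: 3}
print(closing_order(my_marks, opp_marks, lead=100, crushing_lead=60))
```

With the example state, lanes 17 and 15 move to the front because the opponent can still score on them, while 18, 16, and the bull wait.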

Key finding

Faucet denial (shut the opponent's scoring lanes before claiming empty ones) was worth +9.4pp in a single iteration (55.0% → 64.4% pool mean), the largest one-step gain in the project. The original 89-iteration lineage never found it.

The climb: from X109 to The Shutout

From there the treatment arm did what the original loop had done (compose, sweep, expand its own opponent pool, prune), except along a different ridge. Seven champions later it arrived at The Shutout (X181): race-gated banking, faucet denial, a hoard brake, and minimum-miss single bull taps, the first strategy to sweep all 18 of its pool opponents (64.40% mean, 50.44% minimum; a seed-7 re-bench reproduces it at 64.40/50.41). The home-pool numbers in the table below are not comparable across rows, because the arm's pool grew from 11 to 18 opponents as it went, so the right column places every champion on the project's unified Elo ladder (27 artifacts, anchor S1 = 1000; see One Ladder for the method).

The rerun champion climb, plotted on the project's unified Elo scale (the one measure comparable across all six champions — home-pool win rates shift as the opponent pool grows from 11 to 18, so they are not). The win-tap arrived at iteration 3; faucet denial was the +9.4pp jump. The climb runs X109 (1113) to The Shutout (X181, 1176).
The treatment arm's champion trajectory: each kept champion, its new mechanic, its home-pool bench at keep (pool size grew from 11 to 18 opponents, so home means are not comparable across rows), and its unified-ladder Elo (Bradley–Terry fit over 311 pairings; C-engine pairings 4,000 games each). The Shape Reader (X188) is shown for reference.
Champion | New mechanic | Home pool | Home mean / min | Unified Elo
X109 | tie semantics + win-tap | 11 bots | 55.0% / 49.6% | 1113
X123 | faucet denial + unified descending walk + extra-darts redirect | 11 bots | 66.0% / 56.6% | 1107
X129 | endgame-gated extra-darts redirect | - | - | 1135
X147 | hoard brake at −45 + min-miss taps | 16 bots | 65.7% / 53.9% | 1162
X159 | the anti-champion (beats X147 head-to-head +1.9pp) | - | 65.1% / - | 1167
X168 | sixth extra-darts site | 17 bots | 64.9% / 49.3% | 1173
The Shutout (X181) | race-gated banking + single-aim bull taps; first 18/18 sweep | 18 bots | 64.40% / 50.44% | 1176
The Shape Reader (X188), for reference | the original lineage's champion | - | - | 1118
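For readers unfamiliar with how a Bradley–Terry fit turns pairwise results into one ladder, here is a minimal sketch. It is not the project's fitting code, which this chapter does not show: it uses the textbook MM (Zermelo) iteration, assumes a conventional 400-point logistic Elo scale, and runs on invented toy counts for three hypothetical players. Only the anchoring convention (S1 pinned at 1000) comes from the text.

```python
import math


def fit_bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths by MM iteration.

    wins[(a, b)] is the number of games a won against b.
    Assumes every player has at least one win (otherwise a strength hits zero).
    """
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 for p in players}
    for _ in range(iters):
        updated = {}
        for i in players:
            total_wins = sum(w for (a, _), w in wins.items() if a == i)
            denom = 0.0
            for j in players:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        # Strengths are only defined up to scale, so renormalize each pass.
        norm = sum(updated.values())
        strength = {p: v * len(players) / norm for p, v in updated.items()}
    return strength


def to_elo(strength, anchor, anchor_elo=1000.0):
    """Map strengths onto an assumed 400-point Elo scale, pinning the anchor."""
    return {p: anchor_elo + 400.0 * math.log10(s / strength[anchor])
            for p, s in strength.items()}


# Invented toy results: 4,000 games per pairing among hypothetical players.
toy_wins = {
    ("A", "S1"): 2400, ("S1", "A"): 1600,
    ("B", "S1"): 2600, ("S1", "B"): 1400,
    ("B", "A"): 2200, ("A", "B"): 1800,
}
ladder = to_elo(fit_bradley_terry(toy_wins), anchor="S1")
for name, elo in sorted(ladder.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {elo:.0f}")
```

Because every rating is fitted against the whole field at once, a strategy's ladder position reflects all of its pairings, not any single matchup.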

The cross-lineage verdict came in stages. The first two rerun champions were ported verbatim into the main tree and benched on the main lineage's expanded 14-bot pool, same seed and protocol as every X188 bench. X123 scored 63.3% mean against The Shape Reader's 60.65% baseline (+2.66pp same-seed, all 14 opponents beaten), the first candidate to pass the main keep rule since X188 itself. X129 pushed it to 64.8% (+4.13pp, minimum 56.7%) and beat The Shape Reader head-to-head at every skill profile: 57.5%, 56.2%, 55.6% across amateur, good, and pro. No equivalent fixed-reference bench was run for X147, X168, or X181; their superiority rests on the unified ladder, where, as of July 2, 2026, every rerun champion from X129 up (Elo 1135–1176) rates above The Shape Reader (1118).

Head-to-head win rates of three rerun champions against The Shape Reader (X188), from the unified-ladder match data, 4,000 games per pairing, both players pro skill.
Rerun champion | Win rate vs The Shape Reader | Games
X123 | 50.6% | 4,000
X129 | 57.0% | 4,000
The Shutout (X181) | 54.6% | 4,000

The rerun's verdict

As of July 2, 2026, the best hand-coded strategy in this project is not the champion of 115 accumulated experiments; it is The Shutout (X181), from a rerun that started over from the rules alone. On the unified ladder it rates ~58 Elo above The Shape Reader (1176 vs 1118) and beats it head-to-head with a 54.6% win rate over 4,000 games. Our plateau was path-dependence, not the game's ceiling.

One honest wrinkle the ladder adds: X123, despite its +2.66pp cross-bench and its head-to-head edge, rates at 1107 on the unified field, slightly below The Shape Reader's 1118. A head-to-head win and whole-field strength are different quantities, and against the full 27-strategy field the early rerun champions were merely competitive. The decisive break comes at X129 and holds through The Shutout. The Shape Reader also still edged the rerun's very first champion, beating X109 with a 50.8% win rate: the rerun did not start ahead of us. It ended ahead of us.
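The gap between ladder rating and head-to-head result can be made concrete by asking what win rate each Elo difference would imply on its own. The sketch below assumes the conventional 400-point logistic Elo scale, which this chapter does not confirm for the project's ladder; the function name `expected_win_rate` is ours. The ratings and observed win rates are the ones quoted above.

```python
def expected_win_rate(elo_a, elo_b, scale=400.0):
    """Win probability for A over B implied by an Elo gap (assumed logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


SHAPE_READER_ELO = 1118

# (unified-ladder Elo, observed head-to-head win rate vs The Shape Reader)
rerun = {
    "X123": (1107, 0.506),
    "X129": (1135, 0.570),
    "The Shutout (X181)": (1176, 0.546),
}

for name, (elo, observed) in rerun.items():
    implied = expected_win_rate(elo, SHAPE_READER_ELO)
    print(f"{name}: ladder implies {implied:.1%}, observed {observed:.1%}")
```

Under that assumed scale the implied and observed rates disagree in both directions: X123's 11-point deficit implies slightly under 50% against an observed 50.6%, while The Shutout's 58-point lead implies roughly 58% against an observed 54.6%. That is the sense in which a single pairing and whole-field strength measure different things.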

Same basin, different doctrine

The rerun could have diverged for uninteresting reasons, such as a different random walk through the same landscape. Two observations say something more structured happened. First, where the landscape is real, the arms converged: the treatment arm independently re-derived findings the main lineage had proved from the other side, including the dead-code result on The Shape Reader's mr≤9 endgame gate (their entry 037, our entry 107) and the value-ordered lane-selection law. When two searches with no shared history land on the same mechanism, the basin belongs to the game.

Second, where the arms disagreed, the disagreements traced back to inherited doctrine, and the fresh arm was right. The bull tap is the cleanest case: The Shape Reader aims HIT_DOUBLE at the bull, a choice inherited from the third experiment the original loop ever ran; the rerun derived the minimum-miss single instead and measured it as +0.27pp with no losing matchup. The exact solve later sided with the rerun. The original lineage's own late history points the same direction: its two "keep" refinements X112 and X165, each tuned against a fixed frontier pool, rate roughly 40 Elo below their ancestor X109 on the unified field (1071 and 1073 vs 1113). Tuning against the pool you evolved is how a lineage converts its history into overfitting, which is the mechanism behind the path-dependence this chapter set out to test.

Burning your history is a method

Here is the accounting. The control arm shows the protocol reproduces the search's opening: from the rules alone, ten iterations reach ~55% by two different models' hands. The treatment arm shows the original plateau was not the game's: given the same starting line and room to run, a history-free search found the win-tap at iteration 3, faucet denial worth +9.4pp at iteration 11, and a champion that, as of July 2, 2026, out-rates ours by ~58 Elo. The convergent findings say the rerun was exploring the same game; the divergent ones say our journal (the loop's greatest strength, the thing that made 89 iterations compound in two days) had also frozen iteration-3 guesses into permanent doctrine.

So we take the lesson as method, not metaphor: a research lineage should periodically be made to justify itself against a clean restart. The journal made discards cheap and composition fast; what it could not do was re-litigate its own foundations, because every later experiment stood on them. One clean rerun did in a day what the 24 entries of the final saturating session (092–115) could not: it told us which of our beliefs were about cricket and which were about us. The obvious limit of the claim is that this is a single replication, two arms, one game, and a hand-coded search, however fresh, still only explores strategies a person could write down. What happens when the search is not hand-coded at all, when a policy learns the game from millions of self-played darts, is the next chapter.