The False Summit

The first strategy this project invented looked like a champion at nine of eleven skill levels — until two simulator bugs were fixed. This chapter is kept because it failed: the retraction became the discipline every later result had to survive.

Research frozen — results as of July 2, 2026

Retraction · April 20, 2026

The Phase Switch strategy was originally reported as the #1 strategy at 9 of 11 skill levels. Two simulator bugs — a 200-turn cap that truncated long casual-skill games, and a tally that credited truncated games to the losing side — had inflated its apparent win rate at low skill. After the fixes it sits mid-pack, ranking #8–#10 across the 31-strategy field. The design story below is preserved as written; every number on this page comes from the post-fix, frozen record.

The idea was real: commitment beats oscillation

The Phase Switch (PS) began with a genuine observation about the published strategies. Frongello’s S5 sounds formidable on paper: it keeps scoring until its lead reaches nine times the value of the highest open number, then switches to covering. More points, more insurance. Yet in round-robin play S5 loses to S2, a far humbler rule that starts covering the moment it has any lead at all.

The mechanism behind that upset is oscillation. S5 re-evaluates its score-versus-cover decision on every dart. Whenever the opponent scores enough to pull S5’s lead back under the threshold, S5 reverts to scoring; once the lead climbs back, it switches to covering again. It never commits to a plan, and the back-and-forth costs darts and tempo at exactly the stage of the game where darts are scarcest.
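The oscillation is easy to reproduce in a few lines. This is a minimal sketch and not the project's simulator: the two rules follow the descriptions above, while the function names and the lead trace are hypothetical and chosen only so that the lead hovers around S5's threshold.

```python
def s2_wants_to_cover(lead: int) -> bool:
    """S2 as described in the text: cover as soon as there is any lead."""
    return lead > 0


def s5_wants_to_cover(lead: int, highest_open: int, multiplier: int = 9) -> bool:
    """S5 as described in the text: cover once the lead reaches
    multiplier x the value of the highest open number."""
    return lead >= multiplier * highest_open


def count_mode_flips(decisions: list[bool]) -> int:
    """Count how often the score/cover decision changes between darts."""
    return sum(1 for prev, cur in zip(decisions, decisions[1:]) if prev != cur)


# Hypothetical lead, sampled dart by dart, with 20 still open (threshold 180).
lead_trace = [150, 185, 165, 182, 160, 190]
HIGHEST_OPEN = 20

s5_modes = [s5_wants_to_cover(lead, HIGHEST_OPEN) for lead in lead_trace]
s2_modes = [s2_wants_to_cover(lead) for lead in lead_trace]

print("S5 flips:", count_mode_flips(s5_modes))  # 5: it changes plan on every dart
print("S2 flips:", count_mode_flips(s2_modes))  # 0: it stays in cover mode
```

On this trace S5 changes its mind on every dart, while S2, with the same lead, never leaves cover mode.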

The Phase Switch made the transition one-way. Score aggressively until a trigger condition is met, then cover permanently — no re-evaluation, no going back. The trigger was a two-part “combo” condition found by grid search: switch when you have three or fewer targets left to close and nine or fewer total marks remaining. The second clause matters because three untouched targets (nine marks to go) is a very different position from three targets with two marks each (three to go); marks remaining measures actual progress, not just a count of open numbers.
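The one-way switch amounts to a latch around the combo trigger. The sketch below assumes standard cricket (three marks close a number) and a board held as a dict of marks per target; the class and function names are hypothetical, not the project's code. Both thresholds are parameters because, at three marks per target, three open targets can never need more than nine marks, so the marks clause only adds information when the two thresholds are varied independently.

```python
from dataclasses import dataclass

MARKS_TO_CLOSE = 3  # assumption: standard cricket, three marks close a number


def combo_trigger(marks_by_target: dict[int, int],
                  max_open: int = 3,
                  max_marks_left: int = 9) -> bool:
    """True when few targets remain open AND few marks remain in total."""
    marks_left_per_open = [
        MARKS_TO_CLOSE - marks
        for marks in marks_by_target.values()
        if marks < MARKS_TO_CLOSE
    ]
    return (len(marks_left_per_open) <= max_open
            and sum(marks_left_per_open) <= max_marks_left)


@dataclass
class PhaseSwitch:
    """Score until the trigger fires once, then cover permanently."""
    covering: bool = False

    def choose_mode(self, marks_by_target: dict[int, int]) -> str:
        # The latch: once covering is set, the trigger is never consulted again.
        if not self.covering and combo_trigger(marks_by_target):
            self.covering = True
        return "cover" if self.covering else "score"


ps = PhaseSwitch()
four_open = {20: 3, 19: 0, 18: 0, 17: 0, 16: 0, 15: 3, 25: 3}
three_open = {20: 3, 19: 3, 18: 2, 17: 2, 16: 2, 15: 3, 25: 3}
print(ps.choose_mode(four_open))   # score
print(ps.choose_mode(three_open))  # cover
print(ps.choose_mode(four_open))   # cover (the switch does not revert)
```

The last call feeds the earlier board back in on purpose: a per-dart re-evaluation would return to scoring, while the latch stays in cover mode.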

The grid search itself behaved encouragingly. Across every lead-threshold multiplier tested, the combo trigger beat every fixed switch-at alternative, and a wide middle band of thresholds performed almost identically, so the design looked robust rather than tuned. In head-to-head play the finished strategy beat S2, the strongest classic S-bot in the sweep, winning 55.8% in the 50,000-game grid search, which ran at pro skill only. (The post-fix sweep, at 20,000 games per matchup, shows how narrow that edge is: it holds only from MPR 2.5 up, at 50.5–52.9%, and reverses at MPR 0.8–1.5, at 49.1–49.6%.) Everything a researcher wants to see from a new mechanism was there: a plausible causal story, insensitivity to exact parameter choice, and a win against the reigning baseline.

The summit: #1 at nine of eleven skill levels

Dropped into the full tournament — every strategy against every other, across eleven simulated skill levels from beginner to professional — the Phase Switch appeared to top the table at nine of the eleven levels. For a first original design, measured against seventeen published strategies and a growing set of experimental ones, that read as a breakthrough: the one-way commitment insight seemed to be worth several percentage points against the entire field, not just against S2.

It wasn’t. The effect was concentrated at casual and amateur skill — precisely the levels where games run longest, because weak simulated players miss often and take many turns to close. That concentration should have been the tell. It was, instead, the bug’s signature.

The tighter benchmark: two bugs, one deflation

The claim collapsed under a closer look at the simulator, not at the strategy. Two defects compounded:

  1. A 200-turn cap truncated long games. At casual skill, games regularly ran past the simulator’s MAX_TURNS=200 limit and were cut off unfinished — disproportionately the slow, grinding games that low-skill play produces.
  2. Truncated games were tallied for the wrong side. The tournament matrix attributed cut-off games in a way that systematically favored strategies whose games ran long — and the Phase Switch, which spends its first phase piling up points instead of closing, runs long at low skill.
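One way to keep this class of bug visible is to make truncation a separate outcome instead of assigning it to a side. The sketch below is an assumption about how such a tally could look, not the project's actual fix: only MAX_TURNS=200 comes from the text, and `toy_game`, `tally_matchup`, and `win_rate` are hypothetical stand-ins.

```python
import random
from collections import Counter
from typing import Callable, Optional

MAX_TURNS = 200  # the cap named in the text

# A game returns "A" or "B" for the winner, or None if it hit the turn cap.
Game = Callable[[random.Random, int], Optional[str]]


def toy_game(rng: random.Random, max_turns: int) -> Optional[str]:
    """Hypothetical stand-in for a real game: random length, coin-flip winner."""
    turns_needed = rng.randint(50, 300)
    if turns_needed > max_turns:
        return None  # truncated: nobody won
    return "A" if rng.random() < 0.5 else "B"


def tally_matchup(play_game: Game, n_games: int, seed: int,
                  max_turns: int = MAX_TURNS) -> Counter:
    """Count wins per side and keep truncated games in their own bucket."""
    rng = random.Random(seed)
    tally: Counter = Counter()
    for _ in range(n_games):
        winner = play_game(rng, max_turns)
        tally[winner if winner is not None else "unfinished"] += 1
    return tally


def win_rate(tally: Counter, side: str) -> float:
    """Win rate over decided games only; truncated games credit neither side."""
    decided = tally["A"] + tally["B"]
    return tally[side] / decided if decided else float("nan")


tally = tally_matchup(toy_game, n_games=1000, seed=1)
print("unfinished share:", tally["unfinished"] / sum(tally.values()))
print("A win rate over decided games:", win_rate(tally, "A"))
```

Reporting the unfinished share next to every win rate means a matchup where many games hit the cap is flagged before its number is trusted.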

Fix both, re-run everything, and the summit evaporates. In the corrected 31-strategy sweep (20,000 games per matchup, equal skill both sides, 11 skill levels from MPR 0.8 to 5.6), the Phase Switch ranks #8–#10 at every level. It still beats S2 head-to-head at pro skill, the only level the grid search ran at: 51.9% on the frozen ladder (4,000 games). But in the corrected sweep the edge vanishes at low skill (49.1–49.6% at MPR 0.8–1.5), and it never generalized to the field. The strategies above it, led by E12 (#2 at all 11 levels, average win rate 57.8–59.5%) and eventually The Shape Reader (X188), beat a wider field.

Key finding

The Phase Switch’s apparent dominance was an artifact of game truncation at low skill. Post-fix, it ranks #8–#10 within the 31-strategy field at all 11 skill levels (20,000 games/matchup) — a real but unremarkable strategy carrying one real insight.

What did the corrected summit look like? The same sweep that demoted the Phase Switch crowned the Shape Reader #1 at all eleven skill levels — the subject of the next chapter. Its average win rate rises steadily with skill, from 59.0% at MPR 0.8 to 68.0% at MPR 5.6 — a champion within that 31-strategy field as of July 2, 2026, though later arcs of this project produced strategies that beat it.

The Shape Reader (X188) average win rate across the 31-strategy field at each of 11 skill levels (MPR 0.8 to 5.6), 20,000 games per matchup — the post-fix sweep in which the Phase Switch ranks #8–#10.
MPR | 0.8 | 1.0 | 1.2 | 1.5 | 2.0 | 2.5 | 3.0 | 3.6 | 4.0 | 4.9 | 5.6
X188 avg WR | 59.0% | 59.1% | 59.5% | 59.7% | 60.7% | 61.6% | 62.5% | 64.1% | 65.3% | 66.1% | 68.0%

Average win rate of the post-fix sweep’s actual #1, The Shape Reader (X188), across the 31-strategy field at each skill level (20,000 games/matchup, equal skill both sides). The Phase Switch sits #8–#10 in every column.

Where the Phase Switch stands on the frozen field

The project’s later instruments — a unified Elo ladder spanning all three research lineages, and an exact endgame tablebase — give the Phase Switch its final, frozen assessment. As of July 2, 2026, on the unified ladder it rates Elo 986.7 ±1.7 (88,000 games), 26th of 27 rated artifacts — below even the S1 anchor at 1000, the pure closer that never intentionally scores a point. Against the modern champions it is not close: The Shutout (X181) beats it 84.4% head-to-head on the ladder’s round robin (4,000 games per pairing).

And yet the endgame tablebase audit hands the design one quiet vindication. Scored against exact win probabilities in the late-game slice (13 participants, 6,000 games each), the Phase Switch ranks 7th of 13, forfeiting 0.0475 win-probability points per game (95% CI 0.0436–0.0511, 2.48% blunder rate) — ahead of The Shape Reader (X188) itself, which ranks 10th at 0.0519 per game with a 3.42% blunder rate. Once the Phase Switch locks into its closing phase, it plays the endgame more cleanly than the strategy that dethroned it. The one-way commitment insight was sound; it just wasn’t worth a championship.

The Phase Switch measured by the project’s three frozen instruments as of July 2, 2026: the 31-strategy skill sweep, the 27-artifact unified Elo ladder, and the 13-participant endgame tablebase audit.
Instrument | Field | Phase Switch result
Skill sweep (11 MPR levels) | 31 strategies, 20,000 games/matchup | Rank #8–#10 at every level
Unified Elo ladder | 27 artifacts, all three lineages | Elo 986.7 ±1.7, rank 26 of 27 (88,000 games)
Endgame tablebase audit | 13 participants, 6,000 games each | Rank 7 of 13; 0.0475 EV lost/game, 2.48% blunders

Ladder and audit details: One Ladder and Ground Truth. Raw data ships with this site in /data/ladder.json and /data/endgame_audit.json.
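For readers who want to translate ladder ratings into expected head-to-head results, the conventional Elo logistic curve can be sketched as follows. The 400-point scale is the textbook convention and an assumption here; this page does not state which scale or fitting method the unified ladder uses, so treat the outputs as rough readings and not as the ladder's own numbers.

```python
import math


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Conventional Elo expectation for A against B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def implied_gap(win_rate: float, scale: float = 400.0) -> float:
    """Rating gap that would produce the given win rate under the same curve."""
    return scale * math.log10(win_rate / (1.0 - win_rate))


# Phase Switch (986.7) against the S1 anchor (1000), ratings from the text.
print(round(expected_score(986.7, 1000.0), 3))

# Gap implied by The Shutout's 84.4% head-to-head result, under this assumption.
print(round(implied_gap(0.844), 1))
```

Under this assumption, a 13.3-point deficit to the S1 anchor corresponds to an expected score a little under 50%, while an 84.4% head-to-head result corresponds to a gap of several hundred points.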

What the machinery learned: the replication gate

The Phase Switch retraction cost the project its first headline, and in exchange it bought the thing every later result stands on. The failure mode was specific: a result that flattered the hypothesis, concentrated where the simulator was weakest, accepted without an independent check. So the fix was made mechanical rather than moral. From the autoresearch loop onward, no result was kept on a single measurement: a candidate had to clear a pre-registered win bar on its benchmark, and champions had to reproduce under a fresh random seed before any claim was recorded. A number that appears once is an anecdote; a number that survives replication is a result.

The frozen record shows that discipline holding across every later arc of the project. When the clean-slate lineage crowned The Shutout on its first 18-of-18 sweep at 64.40% mean / 50.44% minimum, the keep was confirmed by a seed-7 re-bench that reproduced it at 64.40 / 50.41. When the reinforcement-learning arc reported its PPO policy at 75.79% mean over an 11-bot pool (2,000 games/matchup), an independent-seed replication returned 76.01% (1,000 games/matchup); the deterministic it2 milestone’s 72.89% greedy mean replicated at 73.03%. Even the underdog specialist’s modest headline — The Grinder (X214_UD) at 17.63% mean / 13.31% minimum win rate when outgunned — carries its seed-7 validation (17.65 / 13.22) in the journal beside it.
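The gate itself is small enough to write down. The sketch below is one reading of it, under stated assumptions: the bar values and the drift tolerance are hypothetical (this page does not give them), and "reproduce" is interpreted as the fresh-seed run clearing the same bar with a mean close to the first. Only The Shutout's two benchmark results are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bench:
    """One benchmark run: mean and minimum win rate across opponents, in percent."""
    mean: float
    worst: float


@dataclass(frozen=True)
class Bar:
    """A win bar fixed before the benchmark is run."""
    min_mean: float
    min_worst: float

    def cleared_by(self, bench: Bench) -> bool:
        return bench.mean >= self.min_mean and bench.worst >= self.min_worst


def passes_gate(bar: Bar, first: Bench, replication: Bench,
                max_mean_drift: float = 0.5) -> bool:
    """Keep only if both runs clear the bar and their means agree closely."""
    return (bar.cleared_by(first)
            and bar.cleared_by(replication)
            and abs(first.mean - replication.mean) <= max_mean_drift)


# Hypothetical bar; the two Shutout results are the ones quoted above.
bar = Bar(min_mean=60.0, min_worst=50.0)
first_run = Bench(mean=64.40, worst=50.44)
seed7_run = Bench(mean=64.40, worst=50.41)
print(passes_gate(bar, first_run, seed7_run))  # True
```

Making the check a function rather than a habit is the point: a result that clears the bar once but not under a second seed never reaches the record.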

The rule the false summit wrote

No keep without a pre-registered bar; no champion without an independent-seed replication. Every headline number in the chapters that follow — and on this page — survived that gate before it was frozen on July 2, 2026.

That is why this chapter stays on the site. The Phase Switch is the project’s honesty template: the full anatomy of a wrong result — plausible mechanism, confirming grid search, flattering tournament, hidden artifact — followed by the correction, published in place rather than quietly deleted. Read the chapters that follow with this one in mind. Each of them made a stronger claim than the Phase Switch ever did, and each one had to earn it twice.