Reference — Darts Cricket Research

Every number on this site traces to one facts sheet

This site makes quantitative claims — rankings, win rates, Elo ratings, head-to-head results — and every one of them was transcribed from a single facts sheet, itself read from committed research artifacts on July 2, 2026, the project's freeze date. Each fact carries a stable ID (F-ELO-IT3, F-TB-TIEBLUNDER, …). When a chapter states a load-bearing number, it links that ID to this page; the anchors land either on the row of a table below or on the entry in the fact register at the bottom. If a claim has no fact ID, it is context, not evidence.

Two conventions apply everywhere. First, results are frozen: "champion" means champion as of July 2, 2026, and no page updates that date. Second, every win rate carries its sample size, because a win rate without an n is a rumor.

The catalog: 27 strategies, three lineages, one scale

The table below is the project's roster, ordered by the unified Elo ladder that put all three lineages on one scale: the classic catalog, the hand-coded discovery champions, and the neural policies (One Ladder tells the story). Ratings are Bradley–Terry maximum-likelihood fits over 253 C-engine pairings (4,000 games each) plus 58 Python-engine pairings (1,500 games each), anchored at S1 = 1000. As of July 2, 2026, the top of the ladder belongs to the reinforcement-learning policies: The Closer (it3) rates 1222, about 47 Elo clear of the best hand-coded strategy, The Shutout (X181), which it beats 60.0% head-to-head (1,500 games).
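To make an Elo gap concrete, the following minimal sketch converts a rating difference into the win rate a Bradley–Terry model on the 400/ln 10 scale would predict. The function name `expected_win_rate` is illustrative and not part of the project's code; the two ratings are taken from the ladder table.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Predicted probability that A beats B under a Bradley-Terry model
    expressed on the usual Elo scale (400 points per factor of 10 in odds)."""
    return 1.0 / (1.0 + 10.0 ** (-(elo_a - elo_b) / 400.0))


# Ratings from the ladder table: The Closer (it3) and The Shutout (X181).
closer, shutout = 1222.0, 1175.5
predicted = expected_win_rate(closer, shutout)
print(f"Elo gap {closer - shutout:.1f} -> predicted win rate {predicted:.1%}")
```

The model prediction for this pair comes out a few points below the observed 60.0%, which is the kind of residual the fit statistics further down quantify. With S10 (1031.7) and E1 (992.4) the same function reproduces the 55.6% prediction quoted for the worst-fitting pairing.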

A scope note on coverage: the classic catalog is larger than its ladder representation. The project's pre-discovery field held 30 strategies, the 17 Frongello S-bots plus experimental E-variants, but only the representatives that later work trained against or audited were rated. Per the naming convention, five artifacts carry style-of-play names; everything else goes by its research ID.

The unified Elo ladder: all 27 rated strategies with lineage, Elo rating, standard error, rated games, and a one-line description, as of July 2, 2026.
#	Strategy	Lineage	Elo	±SE	Games	One line
1	The Closer (it3)	RL	1222.0	2.8	24,000	PPO league self-play policy; strongest artifact in the project; 74.0% agreement with the exact endgame solve
2	it2_det	RL	1215.7	2.7	24,000	Determinism milestone: step-penalty fine-tune that ended point-hoarding — 0 stalls in 26,000 eval games
3	PPO_run2	RL	1205.8	2.7	24,000	Stochastic PPO policy (75.79% pool mean); its argmax deadlocks — randomness was load-bearing
4	The Shutout (X181)	Clean slate	1175.5	1.7	94,000	Best hand-coded strategy: race-gated banking, faucet denial, min-miss bull taps; first 18/18 home-pool sweep
5	X168	Clean slate	1173.3	1.7	94,000	X168_LastLaneED — added the lineage's sixth extra-darts redirect site
6	X159	Clean slate	1167.1	1.7	94,000	X159_BrokeDenyED, the anti-champion: beat then-champion X147 head-to-head
7	X147	Clean slate	1162.2	1.7	94,000	X147_BrakeTap45 — hoard brake at −45 plus min-miss taps
8	X129	Clean slate	1135.1	1.7	94,000	Endgame-gated extra-darts redirect; beat X188 at every skill profile on the cross-bench
9	The Shape Reader (X188)	Main	1117.5	1.7	88,000	The 89-iteration main-lineage champion and the old site's hero; detector-branch chase denial
10	X109	Main	1113.0	1.7	88,000	The main-lineage keep X188 improved on; owner of the audit's largest catalogued blunder (0.563 win prob at a tie)
11	X123	Clean slate	1107.3	1.6	94,000	Denial + descending-value walk + extra-darts redirect; the first cross-lineage milestone (+2.66pp over X188, same seed)
12	The Textbook (DIST)	RL distillation	1085.2	2.6	24,000	The neural champion's doctrine as human-readable rules; beat all seven classics it played, loses ~62% to the X1xx champions
13	X165	Main	1072.7	1.7	88,000	Home-pool keep that overfit its frontier pool — ~40 Elo below X109 on the unified field
14	X112	Main	1070.8	1.7	88,000	Home-pool keep, same overfit pattern as X165
15	H1_Hoarder	Clean slate	1052.2	1.7	88,000	Point-hoarder injected as a "weak" pool trainer; outrates every classic baseline
16	S14	Classic	1036.6	1.6	94,000	Chase variant; top classic S-bot on the unified ladder
17	S10	Classic	1031.7	1.6	94,000	Chase variant, second of the S-family pair atop the classics
18	E12	Classic	1018.3	1.6	94,000	FinishOppClosed1: best classic bot, seed of the main lineage and the RL teacher — and the most exploitable audited artifact (82.1%)
19	X103	Main	1018.3	1.7	88,000	The main lineage's founding entry, seeded from E12
20	E10	Classic	1013.7	1.7	88,000	Experimental variant of the classic catalog
21	S2	Classic	1013.6	1.6	94,000	Score/cover at lead 0; strongest classic S-bot at every skill level of the old MPR sweep
22	E3	Classic	1012.3	1.6	94,000	Experimental variant of the classic catalog
23	S1	Classic	1000.0	—	88,000	Pure closer; the ladder's fixed anchor (Elo 1000 by definition)
24	E1	Classic	992.4	1.7	94,000	Experimental variant of the classic catalog
25	E11	Classic	989.7	1.7	88,000	Experimental variant of the classic catalog
26	PS	Classic	986.7	1.7	88,000	Points-first baseline
27	S6	Classic	949.5	1.7	94,000	Extra-darts classic; bottom of the rated field

The fit is honest about its own error: the mean absolute gap between observed head-to-head results and the Bradley–Terry prediction is 5.08pp, with a worst residual of 16.53pp (S10 vs E1: observed 39.1% where the model predicts 55.6%). And the field is unusually well-behaved: an intransitivity scan over every ordered triple found zero rock-paper-scissors cycles in which every edge clears 52% (≈2.5σ at 4,000 games). To measurement precision, the 27 strategies form a total order.
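The intransitivity scan described above can be sketched as a brute-force pass over triples. This is an illustration under stated assumptions, not the project's scanner: the `win_rate` mapping, the function names, and the toy rates are hypothetical, and the σ calculation assumes a binomial standard error at p = 0.5.

```python
from itertools import permutations
from math import sqrt


def find_cycles(win_rate, threshold=0.52):
    """Return triples (a, b, c) where a beats b, b beats c and c beats a,
    each at a rate strictly above `threshold`.
    `win_rate[(x, y)]` is the observed rate at which x beats y."""
    names = sorted({n for pair in win_rate for n in pair})
    cycles = []
    for a, b, c in permutations(names, 3):
        if a != min(a, b, c):
            continue  # keep one rotation of each cycle
        edges = ((a, b), (b, c), (c, a))
        if all(win_rate.get(e, 0.0) > threshold for e in edges):
            cycles.append((a, b, c))
    return cycles


def edge_sigma(threshold, games):
    """How many binomial standard errors (at p = 0.5) the threshold sits above 50%."""
    return (threshold - 0.5) / sqrt(0.25 / games)


# Hypothetical toy field containing one rock-paper-scissors cycle.
toy = {("A", "B"): 0.55, ("B", "A"): 0.45,
       ("B", "C"): 0.56, ("C", "B"): 0.44,
       ("C", "A"): 0.54, ("A", "C"): 0.46}
print(find_cycles(toy))
print(round(edge_sigma(0.52, 4000), 2))
```

Raising the threshold trades sensitivity for robustness to noise: at 4,000 games a 52% bar sits about two and a half standard errors above a coin flip, which matches the figure quoted above.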

Selected head-to-heads

A sample of observed pairwise results from the ladder's match data; row beats column at the stated rate. Raw win/loss counts for all 311 pairings are in ladder.json.

Selected head-to-head win rates from the unified ladder match data: 4,000-game C-engine pairings or 1,500-game Python pairings; cell color encodes the winner's rate.
Matchup (winner first)	Win rate	Source
The Closer (it3) vs S6	90.4%	F-ELO-H2H
The Shutout (X181) vs PS	84.4%	F-ELO-H2H
The Closer (it3) vs S2	81.7%	F-ELO-H2H
The Shutout (X181) vs X165	74.5%	F-ELO-H2H
The Closer (it3) vs X123	65.3%	F-ELO-H2H
The Closer (it3) vs The Shutout (X181)	60.0%	F-ELO-RLTOP (1,500 games)
X129 vs The Shape Reader (X188)	57.0%	F-ELO-H2H
The Shutout (X181) vs The Shape Reader (X188)	54.6%	F-ELO-LINEAGEGAP
The Closer (it3) vs it2_det	54.5%	F-ELO-RLTOP (1,500 games)
it2_det vs PPO_run2	52.9%	F-ELO-RLTOP (1,500 games)
The Shape Reader (X188) vs X109	50.8%	F-ELO-H2H
X123 vs The Shape Reader (X188)	50.6%	F-ELO-H2H

Shading: head-to-head win rate 51% → 90%.

Off the ladder: rated by other instruments

Five artifacts matter to the story but do not appear on the unified ladder, either because they compete on a different grid (skill-mismatched play), because no Elo fit was ever committed, or because they were superseded before the ladder existed. Their headline numbers come from their own benchmark protocols and are not directly comparable to ladder Elo.

Artifacts not on the unified Elo ladder, with their arm, headline benchmark, and the reason they are unrated.
Artifact	Arm	Headline number	Why it is not on the ladder
The Grinder (X214_UD)	Clean-slate underdog program	17.63% mean underdog win rate vs the X181-clone baseline's 15.31% — a 15% relative gain; seed-7 validation 17.65%	Specialist for playing up a skill class; rated on the underdog gap grid, not the even-skill ladder
AZ value_iter1 (run 2)	AlphaZero cell	65.6% mean over the 11-bot pool with depth-3 expectimax (22,000 games/config, SE ≈ 0.32pp)	No committed Elo fit exists; the cell is absent from the ladder's match data
X107_PSChaseMild8	Opus control arm	55.2% mean / 51.7% min on its 11-bot pool, beating all 11	10-iteration budget champion; cross-benched at 50.9% mean on the main expanded pool
bc_e12	RL arc (behavior cloning)	55.10% greedy pool mean — a 99.7%-accurate clone that exactly recovers its teacher E12 (55.13%)	Stepping stone to PPO, superseded within the arc
A2C v19	Historical RL (Feb 2026)	49.3% stable win rate vs 11 hard bots	The pre-PPO ceiling; superseded by the 2026-07 BC→PPO→it3 arc

Methodology: how the numbers were made

Benchmarks and error bars

Two simulators produced every result: a C engine (fast_sim.c) for high-volume tournaments and the original Python engine. Games alternate the first thrower and use the pro/pro skill profile unless a fact says otherwise. The standard cell sizes and their 95% Monte-Carlo intervals are:

Benchmark cell sizes and their 95 percent Monte-Carlo intervals.
Cell	Games	95% MC interval
C-engine ladder pairing	4,000	±0.8pp
Python ladder pairing	1,500	±1.3pp
RL / exploitability eval	2,000	±1.1pp
Hybrid endgame match	50,000	±0.44pp
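For readers who want to check the interval column, here is a minimal sketch of a binomial half-width calculation. It assumes the worst case p = 0.5 and a normal approximation; the function name `half_width_pp` is illustrative, and the site does not state which formula the facts sheet used.

```python
from math import sqrt


def half_width_pp(games, z=1.96, p=0.5):
    """Half-width, in percentage points, of a normal-approximation interval
    for a win rate estimated from `games` independent games."""
    return 100.0 * z * sqrt(p * (1.0 - p) / games)


CELLS = {
    "C-engine ladder pairing": 4_000,
    "Python ladder pairing": 1_500,
    "RL / exploitability eval": 2_000,
    "Hybrid endgame match": 50_000,
}

for name, games in CELLS.items():
    one_se = half_width_pp(games, z=1.0)
    ci95 = half_width_pp(games)
    print(f"{name:26s} n={games:6d}  1 SE = ±{one_se:.2f}pp  1.96 SE = ±{ci95:.2f}pp")
```

Under these assumptions the 50,000-game row matches the 1.96-standard-error half-width, while the tabulated values for the three smaller cells are closer to one standard error. The frozen facts sheet remains the authority on which convention each row uses.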

Key results were replicated under a second seed before being trusted. The ladder ran on seed 42; The Shutout's home-pool sweep reproduced under a seed-7 re-bench (64.40% / 50.41% vs 64.40% / 50.44%), as did The Grinder's underdog result (17.65 / 13.22 vs 17.63 / 13.31), and the PPO pool mean replicated on an independent seed at 76.01% vs 75.79%.

The ladder fit

The unified ladder is a Bradley–Terry maximum-likelihood fit with anchor S1 = 1000 and scale 400/ln 10, over 253 C-engine pairings (a full round robin of the 23 C-engine strategies at 4,000 games each) plus 58 Python-engine pairings (1,500 games each, 0 stalls). Standard errors come from inverse Fisher information with the anchor fixed. Cross-engine trust rests on port validation: clean-slate champions ported into the C engine were verified bit-identical to their home engine, and Python ports agreed within 1.5pp of C against a common opponent. Independent corroboration: the RL arc's own keep-rule protocol pegged it3 at Elo 1215.9 from different match data, a 6-Elo (≈0.9pp) gap from the ladder's 1222.0.
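As an illustration of what such a fit involves, the sketch below estimates Bradley–Terry strengths from win counts and expresses them on the anchored Elo scale. It is not the project's fitting code: the minorization-maximization update is one standard way to reach the maximum-likelihood solution and is an assumption here, the function name and toy match records are hypothetical, and the standard-error step (inverse Fisher information) is omitted.

```python
from math import log10


def fit_bradley_terry(matches, anchor, anchor_elo=1000.0, iters=500):
    """Fit Bradley-Terry strengths by MM iteration and return Elo ratings.
    `matches` is a list of (a, b, wins_a, games) tuples with no draws."""
    players = sorted({p for a, b, _, _ in matches for p in (a, b)})
    wins = dict.fromkeys(players, 0.0)
    for a, b, wins_a, games in matches:
        wins[a] += wins_a
        wins[b] += games - wins_a
    if min(wins.values()) <= 0:
        raise ValueError("every strategy needs at least one win for a finite rating")

    strength = dict.fromkeys(players, 1.0)
    for _ in range(iters):
        updated = {}
        for p in players:
            # Sum games / (s_a + s_b) over every pairing that involves p.
            denom = sum(games / (strength[a] + strength[b])
                        for a, b, _, games in matches if p in (a, b))
            updated[p] = wins[p] / denom
        total = sum(updated.values())
        strength = {p: s / total for p, s in updated.items()}  # fix the free scale

    # Elo = anchor + 400 * log10(strength ratio), i.e. scale 400 / ln 10.
    return {p: anchor_elo + 400.0 * log10(strength[p] / strength[anchor])
            for p in players}


# Hypothetical toy data generated from true strengths 4 : 2 : 1.
toy = [("A", "B", 2000, 3000), ("B", "C", 2000, 3000), ("A", "C", 2400, 3000)]
ratings = fit_bradley_terry(toy, anchor="C")
print({name: round(r, 1) for name, r in ratings.items()})
```

Anchoring one strategy removes the arbitrary offset in the ratings, which is why S1 sits at exactly 1000 on the ladder and carries no standard error.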

Stall accounting

Cricket has a degenerate corner: a player can refuse to close its last targets and farm points forever. The RL arc's first PPO run found and abused it: 13–34% of greedy-eval games deadlocked, and counting those stalls as losses dropped the run's mean to 46.29%. Every frozen result since uses the fix and reports the count: a 900-dart episode cap with a stall scored as a loss. The ladder's Python pairings ran 0 stalls; it2_det ran 0 stalls in 26,000 eval games (0 in 52,000 across two seeds), and it3 ran 0 stalls; the exploitability probes ran 0 stalls in all cells. A "0 stalls" note next to a result is that accounting, not a boast.
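The accounting rule can be sketched as follows. Only the 900-dart cap and the stall-equals-loss rule come from the text above; the `GameResult` record, the `summarize` helper, and the toy results are hypothetical stand-ins for whatever the evaluation harness actually records.

```python
from dataclasses import dataclass
from typing import Optional

DART_CAP = 900  # episode cap from the stall fix


@dataclass
class GameResult:
    winner: Optional[str]  # None when the cap was reached with no winner
    darts: int             # total darts thrown in the game


def summarize(results, player):
    """Win rate for `player` with stalls counted as losses, plus the stall count."""
    wins = sum(1 for g in results if g.winner == player)
    stalls = sum(1 for g in results if g.winner is None and g.darts >= DART_CAP)
    decided = len(results) - stalls
    return {
        "games": len(results),
        "stalls": stalls,
        "win_rate": wins / len(results),  # stalls sit in the denominator as losses
        "win_rate_excluding_stalls": wins / decided if decided else 0.0,
    }


# Hypothetical toy sample: 6 wins, 2 losses, 2 stalled games.
toy = ([GameResult("policy", 60)] * 6
       + [GameResult("opponent", 70)] * 2
       + [GameResult(None, DART_CAP)] * 2)
print(summarize(toy, "policy"))
```

Reporting both the stall count and the stalls-as-losses rate makes it visible when a policy's headline number is being propped up by games that never finished.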

The tablebase solve and its gates

The exact endgame solve (Ground Truth) covers the slice of cricket with at most 6 of 14 lanes still open and score difference within ±250, both players on the pro profile: 2,767,444 mark patterns × 3 darts × 501 score diffs = 4.16 billion states, 16.6 GB of exact win probabilities, solved shell-by-shell so each openness shell is a usable checkpoint on its own. Four validation gates all passed: mechanical equivalence against the game engine, analytic spot-checks, a boundary-bracket re-solve of the inner shells, and a Monte-Carlo gate of 20 solved states × 100,000 games each (worst |z| = 2.63, 0 fails). Validation earned its keep: it caught one real bug, biased first-dart RNG seeding under small sequential seeds, which was fixed by splitmix64 hardening (commit 8b49948).
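The size arithmetic and the Monte-Carlo gate can be checked with a few lines. The state counts and the f32 storage width come from this page and the fact register; the z-score formula is an assumption about how the gate compares a simulated win rate with the exact value, and `mc_gate_z` is an illustrative name.

```python
from math import sqrt

MARK_PATTERNS = 2_767_444
DARTS_LEFT = 3
SCORE_DIFFS = 2 * 250 + 1   # every integer diff in the band -250..+250
BYTES_PER_VALUE = 4         # one f32 win probability per state

states = MARK_PATTERNS * DARTS_LEFT * SCORE_DIFFS
size_gb = states * BYTES_PER_VALUE / 1e9
print(f"{states:,} states, {size_gb:.1f} GB")


def mc_gate_z(exact_p, wins, games):
    """z-score of a simulated win rate against the solved win probability.
    Assumes 0 < exact_p < 1 so the binomial standard error is non-zero."""
    se = sqrt(exact_p * (1.0 - exact_p) / games)
    return (wins / games - exact_p) / se


# Hypothetical gate state: solved value 0.5, 50,400 wins in 100,000 simulated games.
print(round(mc_gate_z(0.5, 50_400, 100_000), 2))
```

With 20 states tested, a worst |z| around 2.6 is within what chance alone would produce occasionally, which is presumably why the gate reports it as a pass rather than a failure.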

The endgame audit

With the solve in hand, 13 strategies were audited: 6,000 games each (2,000 vs each of X188, CS_X181, and S2), with every in-slice decision (52,000 to 75,000 per participant) scored against the exact win probabilities. "Agreement" is tie-aware (a choice within 10⁻⁶ of the best move counts as agreeing); a "blunder" forfeits more than 0.05 win probability in one decision. Together with the unified ladder, this audit is the evidence base for the neural policies; there is no policy tournament grid.
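The two definitions above translate directly into a per-decision scoring rule. The sketch below is illustrative only: the thresholds come from the text, while the function names, the dictionary-of-options input, and the aggregate helper are assumptions about how such an audit could be organized.

```python
TIE_EPS = 1e-6        # a choice this close to the best counts as agreement
BLUNDER_LOSS = 0.05   # forfeiting more than this in one decision is a blunder


def score_decision(option_values, chosen):
    """Score one decision against exact win probabilities.
    `option_values` maps each legal aim to its exact win probability."""
    best = max(option_values.values())
    loss = best - option_values[chosen]
    return {"loss": loss, "agrees": loss < TIE_EPS, "blunder": loss > BLUNDER_LOSS}


def audit_rates(scored):
    """Agreement rate and blunder rate over a list of scored decisions."""
    n = len(scored)
    return {
        "agreement": sum(d["agrees"] for d in scored) / n,
        "blunder_rate": sum(d["blunder"] for d in scored) / n,
    }


# The largest catalogued forfeit (F-TB-TIEBLUNDER): double bull played, S16 optimal.
tie_blunder = score_decision({"DB": 0.403, "S16": 0.966}, chosen="DB")
# A hypothetical near-tie: a different move, but effectively no value lost.
near_tie = score_decision({"T20": 0.7000004, "T19": 0.7}, chosen="T19")
print(tie_blunder, near_tie, audit_rates([tie_blunder, near_tie]))
```

Separating agreement from loss is what lets a strategy disagree with the solve often and still leak little, as long as its disagreements are near-ties.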

Endgame audit ranking of 13 strategies by expected value lost per game against the exact tablebase, with agreement rate and blunder rate; 52,000 to 75,000 scored decisions per participant. Color in the agreement column encodes agreement magnitude.
Rank	Participant	Lineage	Agreement	EV lost/game (95% CI)	Blunder rate
1	The Closer (it3)	RL	74.0%	0.0272 (0.0261–0.0283)	0.38%
2	X123	Clean slate	58.0%	0.0319 (0.0300–0.0339)	1.15%
3	H1_Hoarder	Clean slate	39.0%	0.0369 (0.0342–0.0398)	2.07%
4	The Shutout (X181)	Clean slate	67.7%	0.0400 (0.0379–0.0421)	1.17%
5	X129	Clean slate	57.2%	0.0428 (0.0407–0.0450)	1.67%
6	X168	Clean slate	62.9%	0.0430 (0.0406–0.0453)	1.09%
7	PS	Baseline	51.9%	0.0475 (0.0436–0.0511)	2.48%
8	X147	Clean slate	63.5%	0.0497 (0.0472–0.0521)	1.69%
9	X159	Clean slate	63.9%	0.0507 (0.0484–0.0532)	1.85%
10	The Shape Reader (X188)	Main	61.6%	0.0519 (0.0494–0.0547)	3.42%
11	X109	Main	57.3%	0.0583 (0.0556–0.0611)	4.04%
12	S2	Baseline	43.8%	0.0990 (0.0943–0.1040)	4.63%
13	E12	Main	43.2%	0.1039 (0.0990–0.1090)	4.68%

Shading: tablebase agreement 39% → 74%.

Read the ranking by EV lost, not by agreement: H1_Hoarder agrees with the solve only 39% of the time yet leaks less per game than The Shutout at 67.7% agreement, because most of its disagreements are near-ties. A second instrument from the same solve: hybrid matches, where a champion plays itself except that in-slice decisions are replaced by tablebase-greedy play (50,000 games each, ±0.44pp; plain-vs-plain is 50% by symmetry):

Hybrid bot versus plain champion win rates over 50,000 games each: the endgame headroom of four champions.
Champion	Hybrid win rate	Endgame gain	Tablebase darts/game
The Shape Reader (X188)	54.40%	+4.4pp	7.5
The Shutout (X181)	53.54%	+3.5pp	11.0
The Closer (it3)	52.76%	+2.8pp	9.5
X123	51.40%	+1.4pp	9.5

Exploitability probes

How much can a dedicated adversary punish each strategy? For five frozen targets, a PPO best-response was trained against that target alone (warm-started from the champion) and evaluated at 2,000 games per mode, stalls counted as losses, 0 stalls observed. Lower is harder to exploit: the attack barely beats a coin flip against The Closer but wins 82% against E12.

Trained best-response win rates against five frozen targets, greedy and sampled evaluation, 2,000 games per mode; cell color encodes the adversary's win rate.
Target	Best-response WR (greedy)	(sampled)
The Closer (it3)	53.2%	53.0%
The Shutout (X181)	59.2%	57.3%
X129	61.1%	60.8%
The Shape Reader (X188)	72.1%	69.7%
E12	82.1%	81.2%

Shading: best-response win rate 53% → 82%.

Data downloads

Two data files ship with this site, both copied verbatim from committed research artifacts on the freeze date. Everything a chapter claims about the ladder or the endgame audit can be recomputed from them.

Data files shipped with the site, their contents, and provenance.
File	Contents	Provenance
ladder.json	Ladder metadata (protocol, port validation), the 27-entry rating table with standard errors, and all 311 raw match records (strategies, wins, games, engine, seed)	Verbatim copy of `autoresearch_strategies/elo_matches.json`, 2026-07-02, unmodified
endgame_audit.json	The 13-participant audit ranking (agreement, EV loss with 95% CI, blunder rate, breakdowns by darts/diff/openness) plus the 50,000-game hybrid match results	Field values assembled unmodified from `tablebase/audit_out/*.json`; lineage labels per `tablebase/AUDIT.md`

The previous site's data trees are retained at their old paths for link stability; the chapters on this site transcribe only from the frozen facts sheet, not from legacy JSON.
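As a starting point for recomputing head-to-head rates from the match records, here is a hedged sketch. The key names (`strategies`, `wins`, `games`) follow the field list in the table above, but the exact JSON layout is an assumption, as is the convention that `wins` counts wins for the first listed strategy. The sample records are illustrative counts chosen to match rounded rates on this page, not values read from ladder.json.

```python
def head_to_head(records):
    """Map (winner_candidate, opponent) -> observed win rate, in both directions.
    Assumes each record lists two strategies and counts wins for the first one."""
    rates = {}
    for rec in records:
        a, b = rec["strategies"]
        rate = rec["wins"] / rec["games"]
        rates[(a, b)] = rate
        rates[(b, a)] = 1.0 - rate
    return rates


# Illustrative records only; the real file's keys and counts may differ.
sample = [
    {"strategies": ["it3", "X181"], "wins": 900, "games": 1500, "engine": "python"},
    {"strategies": ["X181", "X188"], "wins": 2184, "games": 4000, "engine": "c"},
]
rates = head_to_head(sample)
print(f"it3 vs X181: {rates[('it3', 'X181')]:.1%}")
print(f"X181 vs X188: {rates[('X181', 'X188')]:.1%}")
```

Before relying on a helper like this, inspect one record in the downloaded file to confirm the actual key names and which side the win count refers to.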

The fact register

Every citable fact on this site, by stable ID. Chapter links land here (or on the catalog and audit rows above, which carry their own IDs). Statements are condensed; the frozen facts sheet in the repository holds the full wording and the source path for each. All results as of July 2, 2026.

Named artifacts

Fact register: the five named artifacts.
Fact ID	Statement
F-NAME-SHUTOUT	The Shutout = X181 (clean-slate `X181_RaceBull`): best hand-coded strategy — race-gated banking, faucet denial, min-miss bull taps.
F-NAME-CLOSER	The Closer = it3 (`it3_league_ckpt0600`, PPO league iteration 3, greedy): strongest artifact in the project.
F-NAME-TEXTBOOK	The Textbook = DIST: interpretable rule-bot distilled from the PPO policy.
F-NAME-GRINDER	The Grinder = X214_UD (`X214_UD_Brake4`): underdog (outgunned-player) champion.
F-NAME-SHAPEREADER	The Shape Reader = X188 (`X188_X186DetectorChaseDenial`): 89-iteration main-lineage champion, the old site's hero.

The unified ladder

Fact register: unified Elo ladder facts. Per-strategy ratings anchor to the catalog rows above.
Fact ID	Statement
F-ELO-METHOD	27 artifacts rated by Bradley–Terry MLE, anchor S1 = 1000, scale 400/ln 10; 253 C-engine pairings (4,000 games each, pro/pro, seed 42) + 58 Python pairings (1,500 games each, 0 stalls); C ports validated bit-identical, Python ports within 1.5pp; SEs from inverse Fisher information.
F-ELO-FIT	Model fit: mean |observed − predicted| edge error 5.08pp; max residual 16.53pp (S10 vs E1: observed 39.1%, predicted 55.6%).
F-ELO-RLTOP	The RL policies top the ladder; it3 clears X181 by ~47 Elo and beats it 60.0% head-to-head (1,500 games); it3 > it2_det 54.5%, it2_det > PPO_run2 52.9%.
F-ELO-LINEAGEGAP	Every clean-slate champion from X129 up (1135–1176) rates above main-lineage champion X188 (1118); X181 beats X188 54.6% (top-of-lineage gap ~58 Elo).
F-ELO-OVERFIT	Within the main lineage only X188 improved on X109 on the unified field; X112 (1071) and X165 (1073), both home-pool "keeps", rate ~40 Elo below X109 (1113) — overfit to a fixed frontier pool.
F-ELO-DIST-STORY	DIST (1085) recovers about a third of the Elo gap from the classic baselines to its teacher PPO_run2 (1206); beats all seven classic representatives it played (55.7–70.9%), loses ~62% to the X1xx champions.
F-ELO-H1-STORY	H1_Hoarder (1052), injected as a "weak" pool trainer, outrates every Frongello baseline.
F-ELO-CYCLES	Zero intransitive cycles clear a >52% bar (≈2.5σ at 4,000 games): the 27-strategy field is, to measurement precision, a total order. 30 near-cycles exist inside the >50% noise band; strongest: E11 → S1 → S14 → E11 (weakest edge 51.9%).

Endgame audit

Fact register: endgame audit facts. The full ranking and hybrid tables anchor above.
Fact ID	Statement
F-TB-METHOD	13 participants × 6,000 games each (2,000 vs each of X188, CS_X181, S2; pro/pro, alternating starters); every in-slice decision (openness ≤ 6) scored against exact win probabilities; agreement tie-aware (< 10⁻⁶); blunder = > 0.05 forfeited; 52k–75k decisions per participant.
F-TB-IT3	The Closer (it3) has the best endgame of anything built: 74.0% agreement, 2.7pp/game leak (71,961 decisions), 0.38% blunder rate — half X188's leak, one-ninth its blunder rate, with no endgame-specific training.
F-TB-X188WEAK	X188 ranks 10/13 in the endgame slice, forfeiting 5.2 win-probability points per game (52,415 decisions, 3.42% blunder rate); every clean-slate champion beats it there. Its rating is carried by opening/midgame play.
F-TB-AHEAD	Agreement collapses when ahead: 52–62% for everyone (vs up to 90.2% when behind — it3's behind-agreement). Behind, chase heuristics coincide with optimal play; ahead, optimal play keeps scoring and denying while champions race to close.
F-TB-TIEBLUNDER	Largest catalogued single-decision forfeit: X109 gives up 0.563 win prob at a tied score with one dart (played double bull EV 0.403; optimal S16 EV 0.966 — the single wins because ties go to the closer). X188 has the identical blind spot (forfeits 0.542: played DB, optimal S15).
F-TB-MOTIF	Of 260 catalogued exemplar blunders (loss > 0.05, top-20 per participant): closed-when-should-score 188, scored-when-should-close 55, closed wrong lane 17. Dominant leak: throwing at one's own last open lane (usually the bull) when behind or level.
F-TB-AGREEMENT-VS-EV	Agreement % and EV lost measure different things: H1_Hoarder agrees only 39% yet loses less per game than X181 at 67.7% agreement — most of its disagreements are near-ties. Rank by EV, not match rate.

The tablebase solve

Fact register: exact tablebase solve facts.
Fact ID	Statement
F-TBSOLVE-SIZE	Solved slice: openness ≤ 6, diff band ±250, pro/pro — 2,767,444 mark patterns × 3 darts × 501 diffs = 4.16 billion states, 16.6 GB of f32 values, solved shell-by-shell.
F-TBSOLVE-GATES	All validation gates passed: mechanical equivalence, analytic spot-checks, boundary-bracket re-solve of shells ≤ 4, and a Monte-Carlo gate (20 states × 100,000 games; worst |z| = 2.63, 0 fails).
F-TBSOLVE-SEEDBUG	One real bug found in validation: raw xoshiro seeding biased the first dart under small sequential seeds; fixed by splitmix64 hardening (commit `8b49948`).

Exploitability

Fact register: exploitability probe facts. Per-target rates anchor to the exploitability table above.
Fact ID	Statement
F-EXP-READING	Lower best-response win rate = harder to exploit. A PPO adversary trained solely against it3 reaches 53.2% — barely better than a coin flip; the same attack beats X188 72% of the time and E12 82%. (2,000 games/mode, stalls-as-losses, 0 stalls in all cells.)

The RL arc

Fact register: reinforcement learning arc facts.
Fact ID	Statement
F-RL-BASELINE	Best heuristic on the 11-bot hard pool: E12 at 55.13% mean (2,000 games/matchup, pro/pro).
F-RL-BC	BC clone of E12 (99.7% action accuracy): 55.10% greedy mean — imitation exactly recovers its teacher.
F-RL-RUN1	PPO run 1 failed by reward hacking: refusing to close and hoarding points indefinitely; 13–34% of greedy games deadlocked; stalls-as-losses mean 46.29%. Fix: 900-dart cap, stall = loss.
F-RL-PPO	PPO run 2 (stochastic): 75.79% mean over the 11-bot pool (2,000 games/matchup, 0 stalls in 22,000 games, every matchup > 50%); seed replication 76.01%. Caveats: its argmax deadlocks, and strength is partly pool overfitting.
F-RL-HELDOUT	Vs held-out X123 (never in training): PPO 64.8%, BC clone 29.5% (2,000 games) — fine-tuning worth +35pp on a structurally novel denial opponent.
F-RL-IT2	it2_det (step-penalty fine-tune): greedy mean 72.89%, min 57.65%, 0 stalls in 26,000 games (0 in 52,000 across two seeds); hoarding gone — median game vs S2 is 52 darts, was 88.
F-RL-IT3	it3 (league self-play): 15-member league mean 70.41% greedy, min 52.45% (vs its parent it2), 0 stalls; floor gain +6.45pp on the 14-member league. Anchored Elo 1215.9 on the keep-rule protocol, consistent with the ladder's 1222.0 (Δ6 Elo ≈ 0.9pp).
F-RL-SKILLGRID	The champion transfers across skill without retraining: league mean 70.41% pro / 70.28% good / 66.56% amateur, 0 stalls, beats every member at every profile (500 games/matchup).
F-RL-PARETO	A 3×256 capacity jump moved the hard band +2–4pp but paid it back on the easy band — league mean flat over 5 evals. The ceiling is a hard/easy Pareto frontier (~70.4 league / ~1216 Elo for this policy class); extra exploration was net-negative (−1pp).
F-RL-INFERENCE	Opponent inference closed by measurement: features carry identity (37–38% top-1 over 15 classes vs 6.7% chance), the policy conditions on it (12.3% of greedy actions flip), yet league mean stays flat (70.41 / 70.9 / 70.4).
F-RL-BRENVELOPE	The per-opponent best-response envelope above the champion sums to ≈+1.8pp league mean (≈+13–14 Elo) under oracle assumptions — below the +2pp/+15-Elo keep bar. Measured: BR-vs-S14 +3.95pp, BR-vs-S16 +2.5pp, BR-vs-X129 +1.6pp.
F-RL-A2C	Historical A2C ceiling (Feb 2026): 49.3% stable WR vs 11 hard bots (v19, entropy floor 0.01). Superseded by the 2026-07 arc: 49.3% → 75.79% pool mean → 70.41% deterministic league champion.

The clean-slate arc

Fact register: clean-slate discovery arc facts.
Fact ID	Statement
F-CS-WINTAP	The win-tap (6 lanes closed, lead ≥ 0: aim the last unclosed target, min-miss single when 1 away — a tie is a winning position) was found at iteration 3 (`X102_WinTap`). The main lineage never found the full mechanic in 115 entries; its absence is X188's biggest blunder class.
F-CS-FAUCET	Faucet denial (`X110_FaucetShut`: while covering at a crushing lead, shut the opponent's income lanes before claiming empty ones) jumped the champion 55.0% → 64.4% mean — +9.4pp in one step, the largest single mechanic found in the project. Signature: PS +17.9, S6 +15.7, E1 +13.9pp.
F-CS-TRAJ	Champion trajectory (home-pool mean/min at keep; pool grew 11 → 18): X109 55.0/49.6 → X123 66.0/56.6 → X129 → X147 65.7/53.9 → X159 65.1 → X168 64.9/49.3 → X181 64.40/50.44, the first 18/18 sweep (seed-7 re-bench 64.40/50.41). Home means are not comparable across pool sizes.
F-CS-XBENCH-X123	Cross-lineage milestone 1: X123 on the main repo's expanded 14-bot pool scores 63.3% mean vs X188's 60.65% baseline (+2.66pp same-seed, 14/14 beaten); head-to-head vs X188: 55.8 / 52.4 / 50.0% (amateur/good/pro).
F-CS-XBENCH-X129	Cross-lineage milestone 2: X129 scores 64.8% (+4.13pp vs X188, same seed), min 56.7%; head-to-head vs X188 57.5 / 56.2 / 55.6% — beats X188 at every skill profile.
F-CS-XBENCH-LATER	No committed fixed-reference cross-bench exists for X147/X168/X181 on the main pool; their superiority over X188 is evidenced by the unified ladder (1162/1173/1176 vs 1118) and the direct head-to-head (X181 beats X188 54.6%, 4,000 games).
F-CS-OVERFIT-ECHO	The clean lineage independently re-derived main-lineage findings (the mr≤9 dead-code result; the value-ordered lane-selection law) — basin convergence, not path artifact.
F-CS-UD	Underdog program champion: The Grinder (X214_UD) at 17.63% mean / 13.31% min underdog WR vs the X181-clone baseline's 15.31% — a 15% relative gain (seed-7: 17.65/13.22). Recalibration: brake at any deficit, farm cap 80 → 10; the farm/deny boundaries are skill-relative.
F-CS-BULLTAP	Min-miss single-aim bull taps (`X179_BullTapSingle`) worth +0.27pp with no losing matchup; composed into X181. Corrects the main lineage's double-at-bull tap doctrine.

The Opus control arm

Fact register: Opus control arm facts.
Fact ID	Statement
F-OPUS-RESULT	Second clean-room arm (10-iteration budget, original 11-bot pool) ended at `X107_PSChaseMild8`: 55.2% mean / 51.7% min, beats all 11 opponents (E12 baseline: 54.6% mean).
F-OPUS-VS-FABLE	At the same iteration count the other arm was at ~55.0% on the same pool — comparable; it then ran 100+ more iterations and found faucet denial. X107 cross-benched on the main expanded pool: 50.9% mean, 30–39% head-to-head vs X188.
F-OPUS-PLATEAU	The arm's own reflection: threshold tuning saturated (single-peaked mean-vs-multiplier curve, best 55.2% at mult 8); further gains would need a genuinely new mechanic the session did not reach.

The AlphaZero cell

Fact register: AlphaZero cell facts.
Fact ID	Statement
F-AZ-RESULT	Final artifact `run2/value_iter1.pt` with depth-3 batched-leaf expectimax: 65.6% mean over the 11-bot pool (22,000 games/config, SE ≈ 0.32pp); net-only 63.4%, depth-2 64.3%.
F-AZ-CEILING	It did not beat the search-only ceiling: shallow expectimax with a hand-rolled race-heuristic leaf scores 67.8% (depth 2, 500 games/bot). The value net re-derived, but did not surpass, an explicit darts-to-victory computation.
F-AZ-REPLAYBUG	Most consequential training decision: replay-buffer recency, worth +9pp. Run 1's buffer never evicted, so every epoch re-fit near-random iter-0 self-play; the ".55 plateau" was stale-data anchoring. An empty-buffer restart jumped the pool mean +7–12pp in one iteration.
F-AZ-ELO	No anchored Elo for the AZ artifact exists in any committed file; the cell is absent from the ladder's match data. Chapters state its 65.6% pool mean instead.

The main lineage and the classics

Fact register: main lineage and classic catalog facts.
Fact ID	Statement
F-MAIN-X188	X188 kept 2026-04-21 at entry 089 (detector-branch chase denial). Baseline on the expanded 14-bot pool: 60.65% mean / 51.8% min, sweeping 14/14. Champion through entry 115; the 2026-07-01 session produced zero keeps — the sweep saturated.
F-MAIN-BULLTAP	X188's tap doctrine aims HIT_DOUBLE at the bull, inherited from entry 003. Two independent lines proved it wrong: the clean-slate min-miss single bull tap (+0.27pp) and the tablebase blunder catalog (0.542 forfeited playing DB at a tie).
F-MAIN-DEADCODE	Entry 107 proved X188's mr≤9 endgame-gate clause is dead code; entry 115 proved its tie-chase is unreachable (> vs ≥ bit-identical).
F-MAIN-LINEAGEVALUE	Cross-bench of the Opus arm's 10-iteration champion on the expanded pool (50.9% mean, 30–39% vs X188) quantifies 89 iterations of lineage at ~10pp of mean.
F-CLASSIC-E12	E12 (`FinishOppClosed1`) was the best classic bot: #1 of the 30 pre-X strategies, seed of the main lineage (X103) and the RL arc's BC teacher. Unified ladder: 1018 (mid-table); most exploitable audited artifact (82.1%).
F-CLASSIC-SBOTS	Of the 17 Frongello S-bots: S2 (score/cover at lead 0) is the strongest at every MPR level of the sweep; S14/S10 (chase variants) top the S-family on the ladder (1037/1032 vs S2's 1014); S1 (pure closer, anchor) and S6 (949.5, extra darts) bracket the bottom.
F-MPR-SWEEP	X188 MPR sweep (31 strategies, 20,000 games/matchup, equal skill, 11 MPR levels 0.8–5.6): X188 ranks #1 at all 11 levels, average WR 59.0% → 68.0%; E12 #2 at all 11 (57.8–59.5%); best S-bot S2 (57.4–59.2%). Scope caveat: within its 31-strategy field only — the sweep predates the clean-slate and RL artifacts, which beat X188. The old site's "never loses a head-to-head" claim is falsified on the unified field.

Timeline

Fact register: project timeline facts.
Fact ID	Statement
F-TIME-SESSIONS	36 research sessions, 2026-02-05 → 2026-07-02. By month: Feb 23, Mar 4, Apr 4, Jul 5 — the project slept ~10 weeks between the X188 keep and the clean-slate/RL/tablebase burst.
F-TIME-BIRTH	Project birth 2026-02-05 (the Q-learning/A2C era); first public commit 2026-03-19.
F-TIME-AUTORESEARCH	The autoresearch loop was born 2026-04-20; X188 was kept 2026-04-21 — 89 entries in two days.
F-TIME-CLEANSLATE	2026-07-01: the clean-slate arm (114 entries), the Opus control arm, and the RL arc's BC→PPO run all started the same day. 2026-07-02: unified ladder, tablebase solve + audit, exploitability audit, underdog program — and the freeze.
F-TIME-FREEZE	Site facts frozen as of July 2, 2026. Every champion claim on the site carries this stamp.