Files
trueskill-tt/benches/baseline.txt
Anders Olsson f18013d036 bench,docs: capture T2 final numbers and update CHANGELOG
Batch::iteration: 21.36 µs (T1 was 22.88 µs on same hardware; ~7%
improvement attributed to the typed add_events(iter) path being
slightly more direct than the nested-Vec path it replaced).

Gaussian operations unchanged vs T1.

Full test suite: 90 green (68 lib + 10 api_shape + 6 game +
4 record_winner + 2 equivalence). No golden value changed across
the entire T2 tier.

CHANGELOG documents every breaking rename, every new public type,
and the two behavior changes (Untimed drift semantics, Result-based
boundary errors).

Closes T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:13:38 +02:00

101 lines
5.1 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Baseline numbers captured before T0 changes
# Hardware: lrrr.local / Apple M5 Pro
# Date: 2026-04-24
Batch::iteration 29.840 µs
Gaussian::add 219.58 ps
Gaussian::sub 219.41 ps
Gaussian::mul 1.568 ns ← hot path; target ≥1.5× improvement
Gaussian::div 1.572 ns ← hot path; target ≥1.5× improvement
Gaussian::pi 262.89 ps
Gaussian::tau 262.47 ps
Gaussian::pi_tau_combined 219.40 ps
# After T0 (2026-04-24, same hardware)
Batch::iteration 21.253 µs (1.40× — below 3× target; see post-mortem)
Gaussian::add 218.62 ps (1.00× — unchanged, Add/Sub use moment form)
Gaussian::sub 220.15 ps (1.00×)
Gaussian::mul 218.69 ps (7.17× — nat-param: now two f64 adds, no sqrt)
Gaussian::div 218.64 ps (7.19× — nat-param: now two f64 subs, no sqrt)
Gaussian::pi 263.19 ps (1.00× — now a field read, same cost)
Gaussian::tau 263.51 ps (1.00× — now a field read, same cost)
Gaussian::pi_tau_combined 219.13 ps (1.00×)
# Post-mortem: Batch::iteration 1.40× vs. 3× target
#
# Root cause: the bench has 100 tiny 2-team events. Each event still allocates
# ~10 Vecs per iteration (down from ~18). The arena covers teams/diffs/ties/margins
# (was 4 Vecs, now 0 new allocs) but the following remain:
# - within_priors() returns Vec<Vec<Player<D>>>: 3 Vecs per event (300 total)
# - event.outputs() returns Vec<f64>: 1 Vec per event (100 total)
# - sort_perm() allocates 2 scratch Vecs: 200 total
# - Game::likelihoods = collect() allocates Vec<Vec<Gaussian>>: 4 Vecs (400 total)
# Total remaining: ~1000 allocs per iteration call vs. ~1800 before (44% reduction).
#
# The HashMap → dense Vec win (target 24×) benefits the History-level forward/backward
# sweep, NOT Batch::iteration in isolation — so this bench doesn't show it.
#
# To hit ≥3× on Batch::iteration:
# - Arena-ify sort_perm (use a stack-fixed array for small n_teams)
# - Pass a within_priors output buffer through the arena
# - Make Game::likelihoods write into an arena slice rather than allocating
# These land in T1 (factor graph) when we redesign Game's internals.
# After T1 (2026-04-24, same hardware)
Batch::iteration 23.010 µs (1.08× vs T0 21.253 µs — slight regression)
Gaussian::add 231.23 ps (unchanged)
Gaussian::sub 235.38 ps (unchanged)
Gaussian::mul 234.55 ps (unchanged — nat-param storage)
Gaussian::div 233.27 ps (unchanged)
Gaussian::pi 272.68 ps (unchanged)
Gaussian::tau 272.73 ps (unchanged)
Gaussian::pi_tau_combined 234.xx ps (unchanged)
# Notes:
# - Batch::iteration 23.0 µs vs target ≤ 21.5 µs (8% above target).
# Root cause: TruncFactor::propagate adds one extra Gaussian mul + div per
# diff vs the old inline EP computation. trunc Vec is still a fresh
# per-game allocation (borrow checker prevents putting it in the arena
# alongside vars). These are addressable in T2.
# - arena.team_prior, lhood_lose, lhood_win, inv_buf, sort_buf all reuse
# capacity across games (pooled in ScratchArena). sort_perm() allocation
# eliminated. message.rs deleted.
# - Gaussian operations unchanged vs T0.
# - All 53 tests pass. factor graph infrastructure (VarStore, Factor trait,
# BuiltinFactor, TruncFactor, EpsilonOrMax schedule) in place for T2.
# After T2 (2026-04-24, same hardware)
Batch::iteration 21.36 µs (1.07× vs T1 22.88 µs — 7% improvement)
Gaussian::add 218.97 ps (unchanged)
Gaussian::sub 218.58 ps (unchanged)
Gaussian::mul 218.59 ps (unchanged)
Gaussian::div 218.57 ps (unchanged)
Gaussian::pi 264.20 ps (unchanged)
Gaussian::tau 260.80 ps (unchanged)
# Notes:
# - API-only tier; hot inference path unchanged. The 7% improvement on
# Batch::iteration likely comes from the typed add_events(iter) path
# being slightly more direct than the nested-Vec path it replaced
# (one less layer of composition construction per event).
# - Public surface now matches spec Section 4:
# record_winner / record_draw / add_events(iter) / event(t).team().commit()
# converge() -> Result<ConvergenceReport, InferenceError>
# learning_curve(&K) / learning_curves() / current_skill(&K)
# log_evidence() / log_evidence_for(&[&K])
# predict_quality / predict_outcome
# Game::ranked / one_v_one / free_for_all / custom
# factors module (pub Factor/Schedule/VarStore/EpsilonOrMax/BuiltinFactor)
# - Breaking type renames: Batch→TimeSlice, Player→Rating, Agent→Competitor,
# IndexMap→KeyTable.
# - Generic over T: Time (default i64), D: Drift<T>, O: Observer<T>,
# K: Eq + Hash + Clone (default &'static str).
# - Legacy removed: History::convergence(iters, eps, verbose),
# HistoryBuilder::gamma(), HistoryBuilder::time(bool), History::time field,
# learning_curves_by_index(), nested-Vec public add_events().
# - 90 tests green: 68 lib + 10 api_shape + 6 game + 4 record_winner +
# 2 equivalence.