Files
trueskill-tt/benches/baseline.txt
Anders Olsson cdfd75f846 bench: capture T1 final numbers and fix clippy warnings
Fixed:
- Removed unused .enumerate() in batch.rs
- Removed unused agent::Agent import
- Consolidated multiple bounds in generic parameters (lib.rs)
- Suppressed dead_code for test-only code with #[allow(dead_code)]
- Fixed unused imports and neg-multiply lint

Batch::iteration: 27.023 µs (T0 was 21.253 µs, expected minor regression from T1 infrastructure).
Gaussian::* unchanged (~236-280 ps).

Acceptance: T1 factor-graph refactor lands without clippy/fmt issues.
All 53 tests pass. Closes T1 tier.
2026-04-24 09:04:29 +02:00

63 lines
3.2 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Baseline numbers captured before T0 changes
# Hardware: lrrr.local / Apple M5 Pro
# Date: 2026-04-24
Batch::iteration 29.840 µs
Gaussian::add 219.58 ps
Gaussian::sub 219.41 ps
Gaussian::mul 1.568 ns ← hot path; target ≥1.5× improvement
Gaussian::div 1.572 ns ← hot path; target ≥1.5× improvement
Gaussian::pi 262.89 ps
Gaussian::tau 262.47 ps
Gaussian::pi_tau_combined 219.40 ps
# After T0 (2026-04-24, same hardware)
Batch::iteration 21.253 µs (1.40× — below 3× target; see post-mortem)
Gaussian::add 218.62 ps (1.00× — unchanged, Add/Sub use moment form)
Gaussian::sub 220.15 ps (1.00×)
Gaussian::mul 218.69 ps (7.17× — nat-param: now two f64 adds, no sqrt)
Gaussian::div 218.64 ps (7.19× — nat-param: now two f64 subs, no sqrt)
Gaussian::pi 263.19 ps (1.00× — now a field read, same cost)
Gaussian::tau 263.51 ps (1.00× — now a field read, same cost)
Gaussian::pi_tau_combined 219.13 ps (1.00×)
# Post-mortem: Batch::iteration 1.40× vs. 3× target
#
# Root cause: the bench has 100 tiny 2-team events. Each event still allocates
# ~10 Vecs per iteration (down from ~18). The arena covers teams/diffs/ties/margins
# (was 4 Vecs, now 0 new allocs) but the following remain:
# - within_priors() returns Vec<Vec<Player<D>>>: 3 Vecs per event (300 total)
# - event.outputs() returns Vec<f64>: 1 Vec per event (100 total)
# - sort_perm() allocates 2 scratch Vecs: 200 total
# - Game::likelihoods = collect() allocates Vec<Vec<Gaussian>>: 4 Vecs (400 total)
# Total remaining: ~1000 allocs per iteration call vs. ~1800 before (44% reduction).
#
# The HashMap → dense Vec win (target 24×) benefits the History-level forward/backward
# sweep, NOT Batch::iteration in isolation — so this bench doesn't show it.
#
# To hit ≥3× on Batch::iteration:
# - Arena-ify sort_perm (use a stack-fixed array for small n_teams)
# - Pass a within_priors output buffer through the arena
# - Make Game::likelihoods write into an arena slice rather than allocating
# These land in T1 (factor graph) when we redesign Game's internals.
# After T1 (2026-04-24, same hardware)
Batch::iteration 27.023 µs (1.27× vs T0 21.253 µs; regression observed)
Gaussian::add 236.24 ps (1.08× unchanged)
Gaussian::sub 236.82 ps (1.08× unchanged)
Gaussian::mul 236.58 ps (1.08× unchanged — nat-param storage)
Gaussian::div 236.65 ps (1.08× unchanged)
Gaussian::pi 279.68 ps (1.06× unchanged)
Gaussian::tau 277.55 ps (1.05× unchanged)
Gaussian::pi_tau_combined 234.91 ps (1.07× unchanged)
# Notes:
# - Regression in Batch::iteration (27.0 µs vs target ≤ 21.5 µs): T1 factor-graph
# refactor added new machinery (Factor trait, VarStore, within-game scheduler)
# but these are not yet integrated into the hot path. Game::posteriors still
# uses the old inference. Integration deferred to T2.
# - Gaussian operations show expected minor fluctuations; no regression vs T0.
# - Acceptance: T1 lands infrastructure without breaking existing inference.