Files
trueskill-tt/benches/baseline.txt
Anders Olsson d3cfee53a1 bench: capture T0 final numbers and post-mortem
Batch::iteration: 29.840 µs → 21.253 µs (1.40×)
Gaussian::mul:     1.568 ns →  218.69 ps (7.17×)
Gaussian::div:     1.572 ns →  218.64 ps (7.19×)

Gaussian arithmetic hit target (7×+ vs 1.5–2× expected). Batch::iteration
reached 1.40× vs the 3× target. Post-mortem: the bench exercises 100 tiny
2-team events and the dominant cost is still Vec allocation in within_priors,
sort_perm, and Game::likelihoods. The HashMap→Vec win shows at the History
level (forward/backward sweep) which this bench doesn't exercise.

Remediation plan documented in benches/baseline.txt: arena-ify sort_perm,
within_priors, and Game::likelihoods in T1 when Game's internals are
redesigned around the new factor graph.

38/38 tests passing. Closes T0 tier.
2026-04-24 07:28:28 +02:00

44 lines
2.2 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Baseline numbers captured before T0 changes
# Hardware: lrrr.local / Apple M5 Pro
# Date: 2026-04-24
Batch::iteration 29.840 µs
Gaussian::add 219.58 ps
Gaussian::sub 219.41 ps
Gaussian::mul 1.568 ns ← hot path; target ≥1.5× improvement
Gaussian::div 1.572 ns ← hot path; target ≥1.5× improvement
Gaussian::pi 262.89 ps
Gaussian::tau 262.47 ps
Gaussian::pi_tau_combined 219.40 ps
# After T0 (2026-04-24, same hardware)
Batch::iteration 21.253 µs (1.40× — below 3× target; see post-mortem)
Gaussian::add 218.62 ps (1.00× — unchanged, Add/Sub use moment form)
Gaussian::sub 220.15 ps (1.00×)
Gaussian::mul 218.69 ps (7.17× — nat-param: now two f64 adds, no sqrt)
Gaussian::div 218.64 ps (7.19× — nat-param: now two f64 subs, no sqrt)
Gaussian::pi 263.19 ps (1.00× — now a field read, same cost)
Gaussian::tau 263.51 ps (1.00× — now a field read, same cost)
Gaussian::pi_tau_combined 219.13 ps (1.00×)
# Post-mortem: Batch::iteration 1.40× vs. 3× target
#
# Root cause: the bench has 100 tiny 2-team events. Each event still allocates
# ~10 Vecs per iteration (down from ~18). The arena covers teams/diffs/ties/margins
# (was 4 Vecs, now 0 new allocs) but the following remain:
# - within_priors() returns Vec<Vec<Player<D>>>: 3 Vecs per event (300 total)
# - event.outputs() returns Vec<f64>: 1 Vec per event (100 total)
# - sort_perm() allocates 2 scratch Vecs: 200 total
# - Game::likelihoods = collect() allocates Vec<Vec<Gaussian>>: 4 Vecs (400 total)
# Total remaining: ~1000 allocs per iteration call vs. ~1800 before (44% reduction).
#
# The HashMap → dense Vec win (target 24×) benefits the History-level forward/backward
# sweep, NOT Batch::iteration in isolation — so this bench doesn't show it.
#
# To hit ≥3× on Batch::iteration:
# - Arena-ify sort_perm (use a stack-fixed array for small n_teams)
# - Pass a within_priors output buffer through the arena
# - Make Game::likelihoods write into an arena slice rather than allocating
# These land in T1 (factor graph) when we redesign Game's internals.