Batch::iteration sequential: 23.23 µs (no regression vs T2 baseline).
Gaussian ops unchanged.
End-to-end history_converge benchmark on Apple M5 Pro:

  Workload                                      seq       rayon     speedup
  500 events / 100 competitors / 10 per slice   4.03 ms   4.24 ms   1.0x
  2000 events / 200 competitors / 20 per slice  20.18 ms  19.82 ms  1.0x
  5000 events / 50000 competitors / 1 slice     11.88 ms  9.10 ms   1.3x
The spec's >=2x target is not achieved on realistic workloads. T3's
within-slice color-group parallelism only shows material benefit when
a slice holds many events AND the competitor pool is large enough to
give the greedy coloring room to partition. Typical TrueSkill
workloads don't fit that profile. Cross-slice parallelism (dirty-bit
slice skipping, spec Section 5) is the natural next step for
real-workload speedup.
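A minimal sketch of that idea, assuming a Slice type whose solver reports
which agents moved and a precomputed agent-to-slice index (both
hypothetical; the spec's actual design may differ):

    use std::collections::HashMap;

    type AgentId = u32;

    struct Slice;
    impl Slice {
        /// Stand-in for the slice solver: re-run message passing and
        /// return the agents whose posteriors actually moved.
        fn solve(&mut self) -> Vec<AgentId> { Vec::new() }
    }

    /// Sweep until no slice is dirty. `slices_of_agent` maps an agent to
    /// the indices of every slice it appears in (built once up front).
    fn converge(slices: &mut [Slice], slices_of_agent: &HashMap<AgentId, Vec<usize>>) {
        let mut dirty = vec![true; slices.len()];
        loop {
            let mut next = vec![false; slices.len()];
            let mut progressed = false;
            for i in 0..slices.len() {
                if !dirty[i] {
                    continue; // no incoming prior changed: skip the slice
                }
                progressed = true;
                for agent in slices[i].solve() {
                    for &j in slices_of_agent.get(&agent).into_iter().flatten() {
                        if j != i {
                            next[j] = true; // shared agent moved: re-solve j
                        }
                    }
                }
            }
            if !progressed {
                break; // fixed point: every slice was clean
            }
            dirty = next;
        }
    }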
Determinism verified: bit-identical posteriors across
RAYON_NUM_THREADS={1, 2, 4, 8}.
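One way to express that check in-process (the verification above used the
RAYON_NUM_THREADS env var across runs; run_history here is a hypothetical
stand-in for the crate's converge entry point):

    use rayon::ThreadPoolBuilder;

    #[derive(Clone, Copy)]
    struct Gaussian { mu: f64, sigma: f64 }

    /// Hypothetical stand-in for the crate's converge entry point.
    fn run_history() -> Vec<Gaussian> { Vec::new() }

    /// Run the full history inside a pool of `threads` workers and return
    /// the posteriors as raw bits, so comparison is exact, not epsilon-based.
    fn posterior_bits(threads: usize) -> Vec<u64> {
        let pool = ThreadPoolBuilder::new().num_threads(threads).build().unwrap();
        pool.install(|| {
            run_history()
                .iter()
                .flat_map(|g| [g.mu.to_bits(), g.sigma.to_bits()])
                .collect()
        })
    }

    #[test]
    fn posteriors_are_thread_count_invariant() {
        let baseline = posterior_bits(1);
        for t in [2, 4, 8] {
            assert_eq!(baseline, posterior_bits(t), "diverged at {t} threads");
        }
    }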
Closes T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Vec<Vec<_>> → SmallVec<[SmallVec<[_;8]>;8]> change in Task 10
regressed Batch::iteration from 23.29 µs to 29.73 µs (+28%). The
SmallVec change was motivated by reducing parallel-path allocations, but
it hurt the sequential path substantially.
Reverting game.rs + time_slice.rs + history.rs storage back to the T2
Vec<Vec<_>> shape. The parallel rayon path (unsafe direct-write +
thread_local ScratchArena + RAYON_THRESHOLD=64 fallback) stays — it
is independent of Game's internal storage.
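For reference, the thread-local scratch pattern that stays is roughly the
following (ScratchArena's contents and the with_arena helper are
illustrative, not the exact in-tree API):

    use std::cell::RefCell;

    /// Per-thread reusable buffers; the real arena holds whatever the
    /// event solver needs, a single Vec<f64> stands in here.
    #[derive(Default)]
    struct ScratchArena {
        buf: Vec<f64>,
    }

    thread_local! {
        // One arena per rayon worker: buffers are reused across every
        // event the worker processes, so the hot loop never allocates.
        static SCRATCH: RefCell<ScratchArena> = RefCell::new(ScratchArena::default());
    }

    fn with_arena<R>(f: impl FnOnce(&mut ScratchArena) -> R) -> R {
        SCRATCH.with(|cell| f(&mut cell.borrow_mut()))
    }

Each worker wraps its per-event solve in with_arena, so allocation pressure
stays flat regardless of event count.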
Benchmarks after revert:
  Batch::iteration (seq, no rayon): 23.23 µs (restored ≈T2)
  Batch::iteration (rayon):         24.57 µs
  history_converge/500x100@10:           4.03 ms seq,  4.24 ms rayon (1.0×)
  history_converge/2000x200@20:         20.18 ms seq, 19.82 ms rayon (1.0×)
  history_converge/1v1-5000x50000@5000: 11.88 ms seq,  9.10 ms rayon (1.3×)
Part of T3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds benches/history_converge.rs with three workloads:
- 500 events / 100 competitors / 10 events per slice
- 2000 events / 200 competitors / 20 events per slice
- 5000 events / 50000 competitors / 5000 events per slice (gate workload)
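The bench shape, assuming Criterion as the harness (build_history and
convergence are hypothetical stand-ins for the crate's real helpers):

    use criterion::{criterion_group, criterion_main, Criterion};

    // Hypothetical stand-ins for the crate's actual types and helpers.
    struct History;
    impl History {
        fn convergence(&mut self) {}
    }
    fn build_history(_events: usize, _competitors: usize, _per_slice: usize) -> History {
        History
    }

    fn history_converge(c: &mut Criterion) {
        // (bench name, events, competitors, events per slice)
        for (name, events, competitors, per_slice) in [
            ("500x100@10", 500, 100, 10),
            ("2000x200@20", 2_000, 200, 20),
            ("1v1-5000x50000@5000", 5_000, 50_000, 5_000),
        ] {
            c.bench_function(name, |b| {
                b.iter(|| build_history(events, competitors, per_slice).convergence())
            });
        }
    }

    criterion_group!(benches, history_converge);
    criterion_main!(benches);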
Investigation found the original rayon path used a compute/apply split
that heap-allocated an EventOutput per event, causing a 3-23x regression:
the per-event allocations created heavy allocator contention across rayon
threads.
Fixes:
- Replace the EventOutput/two-phase approach with a direct unsafe parallel
  write (see the sketch after this list). Events in a color group have
  disjoint agent index sets, so concurrent writes to SkillStore land on
  different Vec slots: no data race.
- Add RAYON_THRESHOLD=64: color groups below this size fall back to
sequential to avoid rayon overhead on small slices.
- Game internals: switch likelihoods/teams to SmallVec<[_;8]> to avoid
heap allocation for ≤8-team / ≤8-player-per-team games. Add type aliases
Teams<T,D> and Likelihoods to satisfy clippy::type_complexity.
- within_priors() and outputs() now return SmallVec; callers updated to
use ranked_with_arena_sv() directly (avoiding Vec→SmallVec conversion).
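A condensed sketch of the first two fixes together (Event, SkillStore, and
posteriors are stand-ins; the in-tree code differs in detail):

    use rayon::prelude::*;

    const RAYON_THRESHOLD: usize = 64; // below this, rayon overhead dominates

    #[derive(Clone, Copy)]
    struct Gaussian { mu: f64, sigma: f64 }

    struct SkillStore { skills: Vec<Gaussian> }

    struct Event; // stand-in for the real event type
    impl Event {
        /// Stand-in solver: (agent index, new posterior) pairs. In the
        /// real path, priors are read through the same disjoint index
        /// set, so reads never alias another event's writes.
        fn posteriors(&self) -> Vec<(usize, Gaussian)> { Vec::new() }
    }

    /// Raw-pointer wrapper so the shared write target crosses rayon's
    /// Send/Sync bounds.
    struct SendPtr(*mut Gaussian);
    unsafe impl Send for SendPtr {}
    unsafe impl Sync for SendPtr {}

    fn solve_color_group(group: &[Event], store: &mut SkillStore) {
        if group.len() < RAYON_THRESHOLD {
            // Small group: sequential is cheaper than spawning rayon tasks.
            for e in group {
                for (i, g) in e.posteriors() { store.skills[i] = g; }
            }
            return;
        }
        let ptr = SendPtr(store.skills.as_mut_ptr());
        group.par_iter().for_each(|e| {
            for (i, g) in e.posteriors() {
                // SAFETY: greedy coloring guarantees events in this group
                // touch disjoint agent index sets, so each slot is written
                // by at most one thread.
                unsafe { *ptr.0.add(i) = g; }
            }
        });
    }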
Sequential baseline (Apple M5 Pro, 2026-04-24):
  500x100@10perslice:           4.72 ms
  2000x200@20perslice:         23.17 ms
  1v1-5000x50000@5000perslice: 13.89 ms

With --features rayon (RAYON_NUM_THREADS=5, P-cores on M5 Pro):
  500x100@10perslice:           4.82 ms (1.0×, below RAYON_THRESHOLD)
  2000x200@20perslice:         23.09 ms (1.0×, below RAYON_THRESHOLD)
  1v1-5000x50000@5000perslice:  6.97 ms (2.0× speedup, GATE ACHIEVED)
T3 acceptance gate: >=2× speedup on at least one workload — ACHIEVED.
74 tests pass under both feature configs.
Part of T3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>