Implements tiers T0, T1, and T2 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`. All three tiers landed together on this branch because they build on one another; this PR rolls them up for a single review pass.

Per-tier plans:

- T0: `docs/superpowers/plans/2026-04-23-t0-numerical-parity.md`
- T1: `docs/superpowers/plans/2026-04-24-t1-factor-graph.md`
- T2: `docs/superpowers/plans/2026-04-24-t2-new-api-surface.md`

## Summary

### T0: Numerical parity (internal)

- `Gaussian` switched to natural-parameter storage `(pi, tau)`; mul/div now ~7× faster (218 ps vs 1.57 ns).
- `HashMap<Index, _>` → dense `Vec<_>` keyed by `Index.0` (via `AgentStore<D>`, `SkillStore`).
- `ScratchArena` eliminates per-event allocations in `Game::likelihoods`.
- `InferenceError` seed type added (1 variant).
- 38 → 53 tests passing through T1.
- Benchmark: `Batch::iteration` 29.84 → 21.25 µs.

### T1: Factor graph machinery (internal)

- `Factor` trait + `BuiltinFactor` enum (TeamSum / RankDiff / Trunc) driving within-game inference.
- `VarStore` flat storage for variable marginals.
- `Schedule` trait + `EpsilonOrMax` impl replacing the hand-rolled EP loop.
- `Game::likelihoods` rebuilt on the factor-graph machinery; iteration counts and goldens preserved to within 1e-6.
- 53 tests passing.
- Benchmark: `Batch::iteration` 23.01 µs (slight regression absorbed in T2).

### T2: New API surface (breaking)

**Renames:**

- `IndexMap → KeyTable`, `Player → Rating`, `Agent → Competitor`, `Batch → TimeSlice`

**New types:**

- `Time` trait with `Untimed` ZST and `i64` impls; `Drift<T>`, `Rating<T, D>`, `Competitor<T, D>`, `TimeSlice<T>`, `History<T, D, O, K>` all generic.
- `Event<T, K>`, `Team<K>`, `Member<K>`, `Outcome` (`Ranked` variant; `#[non_exhaustive]`).
- `Observer<T>` trait + `NullObserver`.
- `ConvergenceOptions`, `ConvergenceReport`.
- `GameOptions`, `OwnedGame<T, D>`.
**Three-tier ingestion:**

- `history.record_winner(&K, &K, T)` / `record_draw(&K, &K, T)`: 1v1 convenience.
- `history.add_events(iter)`: typed bulk.
- `history.event(T).team([...]).weights([...]).ranking([...]).commit()`: fluent.

**Query API:** `current_skill`, `learning_curve`, `learning_curves` (keyed on `K`), `log_evidence`, `log_evidence_for`, `predict_quality`, `predict_outcome`.

**Game constructors:** `ranked`, `one_v_one`, `free_for_all`, `custom`, all returning `Result<_, InferenceError>`.

**`factors` module:** `Factor`, `Schedule`, `VarStore`, `VarId`, `BuiltinFactor`, `EpsilonOrMax`, `ScheduleReport`, `TeamSumFactor`, `RankDiffFactor`, `TruncFactor` now public.

**Errors:** `InferenceError` gains `MismatchedShape`, `InvalidProbability`, `ConvergenceFailed`; boundary panics converted to `Result`.

**Removed (breaking):** `History::convergence(iters, eps, verbose)`, `HistoryBuilder::gamma(f64)`, `HistoryBuilder::time(bool)`, `History.time: bool`, `learning_curves_by_index`, nested-Vec public `add_events`.

## Behavior change (documented in CHANGELOG)

`Time = Untimed` has `elapsed_to → 0`, so no drift accumulates between slices. The old `time=false` mode implicitly forced `elapsed=1` on reappearance via an `i64::MAX` sentinel; that quirk is not reproducible under a typed time axis. Tests that depended on it now use `History::<i64, _>` with explicit `1..=n` timestamps. One test (`test_env_ttt`) had 3 Gaussian goldens updated to reflect the corrected semantics; documented in commit `33a7d90`.

## Final numbers

| Metric | Before T0 | After T2 | Delta |
|---|---|---|---|
| `Batch::iteration` | 29.84 µs | 21.36 µs | **-28%** |
| `Gaussian::mul` | 1.57 ns | 219 ps | **-86%** |
| `Gaussian::div` | 1.57 ns | 219 ps | **-86%** |
| Tests passing | 38 | 90 | +52 |

All other Gaussian ops unchanged (~219 ps add/sub, ~264 ps pi/tau reads).
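The `Gaussian::mul` / `Gaussian::div` numbers follow from natural-parameter storage: with precision `pi = 1/sigma^2` and precision-adjusted mean `tau = mu/sigma^2`, multiplying two Gaussian densities adds the parameters and dividing subtracts them, so neither needs a square root or a reciprocal. A minimal sketch of the idea, assuming a hypothetical `NatGaussian` struct; the crate's real `Gaussian` fields and method names may differ:

```rust
/// Hypothetical stand-in for the crate's `Gaussian` (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
struct NatGaussian {
    pi: f64,  // precision: 1 / sigma^2
    tau: f64, // precision-adjusted mean: mu / sigma^2
}

impl NatGaussian {
    fn from_mu_sigma(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Self { pi, tau: mu * pi }
    }

    /// Mean; only meaningful when `pi` is non-zero.
    fn mu(&self) -> f64 {
        self.tau / self.pi
    }

    /// Standard deviation; only meaningful when `pi` is positive.
    fn sigma(&self) -> f64 {
        (1.0 / self.pi).sqrt()
    }

    /// Product of two densities (unnormalized): parameters add.
    fn mul(self, other: Self) -> Self {
        Self { pi: self.pi + other.pi, tau: self.tau + other.tau }
    }

    /// Quotient of two densities (unnormalized): parameters subtract.
    fn div(self, other: Self) -> Self {
        Self { pi: self.pi - other.pi, tau: self.tau - other.tau }
    }
}

fn main() {
    let a = NatGaussian::from_mu_sigma(2.0, 1.0);
    let b = NatGaussian::from_mu_sigma(4.0, 0.5);
    let product = a.mul(b);
    println!("product: mu = {}, sigma = {}", product.mu(), product.sigma());
    // Dividing the product by `b` recovers `a` up to rounding.
    let back = product.div(b);
    println!("recovered: mu = {}, sigma = {}", back.mu(), back.sigma());
}
```

The trade-off is that reading `mu` or `sigma` now costs a division, which matches the table's slightly slower pi/tau-derived reads being the only place conversion work shows up.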
## Test plan

- [x] `cargo test --features approx`: 90/90 pass (68 lib + 10 api_shape + 6 game + 4 record_winner + 2 equivalence)
- [x] `cargo clippy --all-targets --features approx -- -D warnings`: clean
- [x] `cargo +nightly fmt --check`: clean
- [x] `cargo bench --bench batch`: 21.36 µs
- [x] `cargo bench --bench gaussian`: unchanged from T1
- [x] `cargo run --example atp --features approx`: rewritten in new API, runs clean
- [x] Historical Game-level goldens preserved in `tests/equivalence.rs`
- [x] Public API matches spec Section 4 (verified by integration tests in `tests/api_shape.rs`)

## Commit history

~45 commits total across T0 + T1 + T2. Each task is self-contained and individually tested; the branch is bisectable. See `git log main..t2-new-api-surface` for the full list.

## Deferred to later tiers

- `Outcome::Scored` + `MarginFactor`: T4
- `Damped` / `Residual` schedules: T4
- `Send + Sync` bounds + Rayon parallelism: T3
- N-team `predict_outcome`: T4
- `Game::custom` full ergonomics: T4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #1
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>
# Changelog

All notable changes to this project will be documented in this file.

## Unreleased: T2 new API surface

Breaking: every renamed type and the new public API land together per `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md` Section 7 "T2".

### Breaking renames

- `Batch` → `TimeSlice`
- `Player` → `Rating` (and the `.player` field on `Competitor` is now `.rating`)
- `Agent` → `Competitor`
- `IndexMap` → `KeyTable`
- `History` field `.batches` → `.time_slices`

### New types

- `Time` trait with `Untimed` ZST and `i64` impls (generic time axis).
- `Drift<T: Time>`: generified from the old `Drift` trait.
- `Event<T, K>`, `Team<K>`, `Member<K>`: typed bulk-ingest event shape.
- `Outcome` (`#[non_exhaustive]`): `Ranked(SmallVec<[u32; 4]>)` with convenience constructors `winner`, `draw`, `ranking`. `Scored` lands in T4.
- `Observer<T: Time>` trait + `NullObserver` ZST: structured progress callbacks.
- `ConvergenceOptions`, `ConvergenceReport`: configuration and post-hoc summary.
- `GameOptions`, `OwnedGame<T, D>`: ergonomic Game constructors without lifetime gymnastics.
- `factors` module: re-exports `Factor`, `BuiltinFactor`, `VarId`, `VarStore`, `Schedule`, `EpsilonOrMax`, `ScheduleReport`, and the three built-in factor types (`TeamSumFactor`, `RankDiffFactor`, `TruncFactor`) as public API.
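
The name `EpsilonOrMax` suggests a schedule that keeps sweeping until the largest update drops below a threshold or an iteration cap is hit. The sketch below illustrates that stopping rule only, under that assumption. The struct fields, the closure-based `run` method, and the `Report` type are hypothetical and are not the crate's `Schedule` trait or `ScheduleReport`.

```rust
/// Hypothetical stopping rule; the crate's `EpsilonOrMax` may be shaped differently.
struct EpsilonOrMax {
    epsilon: f64,
    max_iterations: usize,
}

/// Hypothetical summary of a run.
#[derive(Debug, PartialEq)]
struct Report {
    iterations: usize,
    last_step: f64,
    converged: bool,
}

impl EpsilonOrMax {
    /// Calls `sweep` repeatedly. `sweep` performs one pass over the factors
    /// and returns the largest change it made to any variable.
    fn run<F: FnMut() -> f64>(&self, mut sweep: F) -> Report {
        let mut last_step = f64::INFINITY;
        let mut iterations = 0;
        while iterations < self.max_iterations && last_step > self.epsilon {
            last_step = sweep();
            iterations += 1;
        }
        Report { iterations, last_step, converged: last_step <= self.epsilon }
    }
}

fn main() {
    // Toy fixed-point iteration standing in for message passing:
    // x <- (x + 2/x) / 2 settles at sqrt(2).
    let mut x = 1.0_f64;
    let schedule = EpsilonOrMax { epsilon: 1e-9, max_iterations: 30 };
    let report = schedule.run(|| {
        let next = 0.5 * (x + 2.0 / x);
        let step = (next - x).abs();
        x = next;
        step
    });
    println!("{report:?}, x = {x}");
}
```

Returning the step size and iteration count instead of a bare `bool` lets the caller decide whether hitting the cap is an error, which is the kind of information a `ConvergenceFailed { last_step, iterations }` variant would need.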

### New History API

- Three-tier ingestion:
  - Tier 1 (bulk): `add_events<I: IntoIterator<Item = Event<T, K>>>(events) -> Result`
  - Tier 2 (one-off): `record_winner(&K, &K, T)`, `record_draw(&K, &K, T)`
  - Tier 3 (fluent): `event(T).team([...]).weights([...]).ranking([...]).commit()`
- `converge() -> Result<ConvergenceReport, InferenceError>`: replaces `convergence(iters, eps, verbose)`.
- `current_skill(&K)`, `learning_curve(&K)`, `learning_curves()` (now keyed on `K`).
- `log_evidence()` zero-arg, `log_evidence_for(&[&K])`.
- `predict_quality(&[&[&K]])`, `predict_outcome(&[&[&K]])` (2-team only in T2; N-team deferred to T4).
- `intern(&Q)` / `lookup(&Q)` expose the internal `KeyTable<K>` for power users.
- `History<T, D, O, K>` is now fully generic with defaults `<i64, ConstantDrift, NullObserver, &'static str>`.
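
The `intern` / `lookup` pair is the usual key-interning pattern: each user-facing key is mapped once to a small dense index, so per-competitor state can live in a plain `Vec` instead of a hash map. The sketch below shows the pattern with a hypothetical stand-in type. The crate's `KeyTable<K>` takes a borrowed `&Q`, and its exact signatures and return types are not reproduced here.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for the crate's `KeyTable<K>` (signatures are assumptions).
struct KeyTable<K> {
    index_of: HashMap<K, usize>,
    keys: Vec<K>,
}

impl<K: Eq + Hash + Clone> KeyTable<K> {
    fn new() -> Self {
        Self { index_of: HashMap::new(), keys: Vec::new() }
    }

    /// Returns the existing index for `key`, or assigns the next dense index.
    fn intern(&mut self, key: &K) -> usize {
        if let Some(&i) = self.index_of.get(key) {
            return i;
        }
        let i = self.keys.len();
        self.keys.push(key.clone());
        self.index_of.insert(key.clone(), i);
        i
    }

    /// Read-only query: `None` if the key was never interned.
    fn lookup(&self, key: &K) -> Option<usize> {
        self.index_of.get(key).copied()
    }
}

fn main() {
    let mut table = KeyTable::new();
    let alice = table.intern(&"alice");
    let bob = table.intern(&"bob");

    // Dense indices let per-competitor state live in a Vec.
    let mut wins = vec![0u32; 2];
    wins[alice] += 1;
    println!("alice = {alice}, bob = {bob}, wins = {wins:?}");
    println!("carol known? {:?}", table.lookup(&"carol"));
}
```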

### New Game API

- `Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
- `Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>`.
- `Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
- `Game::custom(...)`: minimal escape hatch for user-defined factor graphs (`#[doc(hidden)]`; full ergonomics in T4).
- `Game::log_evidence()` and `OwnedGame::log_evidence()` accessors.

### Errors

- `InferenceError` now carries `MismatchedShape { kind, expected, got }`, `InvalidProbability { value }`, `ConvergenceFailed { last_step, iterations }`, and `NegativePrecision { pi }`. Shape and bounds validation at the API boundary now returns `Err` rather than panicking.
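
As a rough illustration of what "returns `Err` rather than panicking" looks like at a boundary, the sketch below uses the variant and field names listed above. The field types, the `Display` messages, and the `validate` helper with its two rules are assumptions made for the example, not the crate's actual checks.

```rust
use std::fmt;

/// Variant and field names follow the changelog entry; field types are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum InferenceError {
    MismatchedShape { kind: &'static str, expected: usize, got: usize },
    InvalidProbability { value: f64 },
    ConvergenceFailed { last_step: f64, iterations: usize },
    NegativePrecision { pi: f64 },
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MismatchedShape { kind, expected, got } => {
                write!(f, "{kind}: expected length {expected}, got {got}")
            }
            Self::InvalidProbability { value } => write!(f, "invalid probability {value}"),
            Self::ConvergenceFailed { last_step, iterations } => {
                write!(f, "no convergence after {iterations} iterations (last step {last_step})")
            }
            Self::NegativePrecision { pi } => write!(f, "negative precision {pi}"),
        }
    }
}

impl std::error::Error for InferenceError {}

/// Hypothetical boundary check: one weight per team member, and a draw
/// probability inside [0, 1]. NaN fails the range test and is rejected too.
fn validate(members: usize, weights: &[f64], p_draw: f64) -> Result<(), InferenceError> {
    if weights.len() != members {
        return Err(InferenceError::MismatchedShape {
            kind: "weights",
            expected: members,
            got: weights.len(),
        });
    }
    if !(0.0..=1.0).contains(&p_draw) {
        return Err(InferenceError::InvalidProbability { value: p_draw });
    }
    Ok(())
}

fn main() {
    match validate(2, &[1.0], 0.1) {
        Ok(()) => println!("accepted"),
        Err(e) => println!("rejected: {e}"),
    }
}
```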

### Removed (breaking)

- `History::convergence(iters, eps, verbose)`: use `converge()`.
- `HistoryBuilder::gamma(f64)`: use `.drift(ConstantDrift(g))`.
- `HistoryBuilder::time(bool)` and `History.time: bool`: use the `Time` type parameter.
- The nested-`Vec<Vec<Vec<_>>>` public `add_events` signature: use typed `add_events(iter)`.
- `learning_curves_by_index()`: use `learning_curves()`.

### Performance

`Batch::iteration` bench: 21.36 µs (T1 was 22.88 µs on the same hardware, a ~7% improvement because the typed path is slightly more direct). Gaussian operations unchanged.

### Notes

- `Time = Untimed` returns `elapsed_to → 0`. This is a behavior change from the old `time=false` mode, which implicitly generated `elapsed=1` per event via an `i64::MAX` sentinel in `Agent.last_time`. Tests that relied on the old `time=false` semantics now use `History::<i64, _>` with explicit `1..=n` timestamps.
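
The consequence for drift can be seen in a small sketch. Only the `Time`, `Untimed`, and `elapsed_to` names come from the note above. The trait signature, the `drift_variance` helper, and the `gamma^2 * elapsed` variance model are assumptions chosen to illustrate why an untimed history accumulates no drift.

```rust
/// Hypothetical shape of the `Time` trait; the real signature may differ.
trait Time: Copy {
    /// Elapsed time from `self` to `later`.
    fn elapsed_to(&self, later: &Self) -> i64;
}

/// Zero-sized marker for histories without a time axis.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Untimed;

impl Time for Untimed {
    fn elapsed_to(&self, _later: &Self) -> i64 {
        0
    }
}

impl Time for i64 {
    fn elapsed_to(&self, later: &Self) -> i64 {
        later - self
    }
}

/// Variance added between two appearances under a constant drift `gamma`
/// (assumed model: gamma^2 per unit of elapsed time).
fn drift_variance<T: Time>(gamma: f64, from: T, to: T) -> f64 {
    gamma * gamma * from.elapsed_to(&to) as f64
}

fn main() {
    // Untimed: nothing elapses, so no variance is added between slices.
    println!("untimed: {}", drift_variance(0.5, Untimed, Untimed));
    // Explicit 1..=n timestamps give one unit of drift per step.
    for t in 1_i64..3 {
        println!("t={t} -> t={}: {}", t + 1, drift_variance(0.5, t, t + 1));
    }
}
```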

## 0.1.0 - 2026-04-23

### Features

- feat: added a Drift trait and a "default" ConstantDrift implementation

### Miscellaneous Tasks

- chore: added cliff.toml, release.toml and rustfmt.toml
- chore: clean up

### Other (unconventional)

- Initial commit.
- Begin working on batch.
- Passing tests for Batch
- Working on History struct. First test is passing.
- More test passing for History
- Added more functions to History
- Remove Display impl, better to use Debug
- Use flatten instead of flat_map
- Handle case where there is no time
- It works, or so it seems
- Use PlayerIndex instead of String
- Inline a lot of functions
- Refactor some code
- Refactor some stuff
- Port from julia version instead
- More things, better things, awesome
- More tests, more code
- More things, more tests
- Fix tests
- More tests
- More tests
- Added builder for History, and start migrating test to use builder instead.
- Update test to use builder
- Remove unused code
- Use an Index struct instead of str and String for player id
- Update example so now it works, and that's, well, good
- Update test to use assert_ulps_eq
- Fixed test
- Change time to use i64 instead of u64
- Small change
- Clean up example
- Update crates and added methods to get a key or all keys in an IndexMap
- Added a get function to IndexMap
- Agents don't have to be behind a mutable reference in within_prior
- Agents don't have to be behind a mutable reference in within_priors
- Refactor so we can see if there is any way to improve the performance
- Fix clippy warning
- More refactoring
- Remove warnings and refactor some code
- Added benchmark for Batch
- Added default implementation for TeamMessage
- Remove unused mut reference
- Make it more rusty
- More rustifying
- Small refactor
- Rename d to diff, and t to team
- Added more links to readme
- Fix broken link in README
- Update crates
- Clean up
- Dry my eyes
- Remove unnecessary allocations
- Fix clippy warning
- Refactor history
- Rename variables
- Move stuff around
- Added quality function
- Make quality a free standing function instead
- Improve performance
- Change assert to debug_assert
- Added todo to readme, and documentation for quality function
- Basic test for quality
- Ignore temp folder
- Update edition
- Small changes for new 2024 edition
- remove notepad
- added benchmark

### Styling

- style: cargo fmt