Anders Olsson d2aab82c1e T0 + T1 + T2: engine redesign through new API surface (#1)

Implements tiers T0, T1, T2 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`. All three tiers have landed together on this branch because they build on one another; this PR rolls them up for a single review pass.

Per-tier plans:
- T0: `docs/superpowers/plans/2026-04-23-t0-numerical-parity.md`
- T1: `docs/superpowers/plans/2026-04-24-t1-factor-graph.md`
- T2: `docs/superpowers/plans/2026-04-24-t2-new-api-surface.md`

## Summary

### T0 — Numerical parity (internal)

- `Gaussian` switched to natural-parameter storage `(pi, tau)`; mul/div now ~7× faster (218 ps vs 1.57 ns).
- `HashMap<Index, _>` → dense `Vec<_>` keyed by `Index.0` (via `AgentStore<D>`, `SkillStore`).
- `ScratchArena` eliminates per-event allocations in `Game::likelihoods`.
- `InferenceError` seed type added (1 variant).
- 38 → 53 tests passing through T1.
- Benchmark: `Batch::iteration` 29.84 → 21.25 µs.
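
The mul/div speed-up in the first bullet follows from the representation: in natural parameters, multiplying two Gaussian densities adds their `pi` and `tau`, and dividing subtracts them, with no round trip through mean and standard deviation. A minimal sketch, assuming a stand-in `Gaussian` struct and a hypothetical `from_mu_sigma` constructor (neither is copied from the crate):

```rust
use std::ops::{Div, Mul};

/// Hypothetical stand-in for the crate's `Gaussian`, stored in natural parameters:
/// pi = 1 / sigma^2 (precision) and tau = mu / sigma^2 (precision-adjusted mean).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Gaussian {
    pi: f64,
    tau: f64,
}

impl Gaussian {
    /// Assumed constructor name; converts from (mu, sigma) once, up front.
    fn from_mu_sigma(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Gaussian { pi, tau: mu * pi }
    }

    fn mu(&self) -> f64 {
        self.tau / self.pi
    }

    fn sigma(&self) -> f64 {
        (1.0 / self.pi).sqrt()
    }
}

impl Mul for Gaussian {
    type Output = Gaussian;
    /// Product of two Gaussian densities: natural parameters add.
    fn mul(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

impl Div for Gaussian {
    type Output = Gaussian;
    /// Quotient of two Gaussian densities: natural parameters subtract.
    fn div(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}

fn main() {
    let prior = Gaussian::from_mu_sigma(25.0, 5.0);
    let message = Gaussian::from_mu_sigma(30.0, 10.0);

    let posterior = prior * message;
    let recovered = posterior / message;

    println!("posterior: mu={:.3} sigma={:.3}", posterior.mu(), posterior.sigma());
    println!("recovered: mu={:.3} sigma={:.3}", recovered.mu(), recovered.sigma());
}
```

Both operators reduce to two floating-point additions or subtractions, which is consistent with mul, div, add and sub all landing at roughly the same cost in the benchmarks.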

### T1 — Factor graph machinery (internal)

- `Factor` trait + `BuiltinFactor` enum (TeamSum / RankDiff / Trunc) driving within-game inference.
- `VarStore` flat storage for variable marginals.
- `Schedule` trait + `EpsilonOrMax` impl replacing the hand-rolled EP loop.
- `Game::likelihoods` rebuilt on the factor-graph machinery; iteration counts and goldens preserved to within 1e-6.
- 53 tests passing.
- Benchmark: `Batch::iteration` 23.01 µs (a slight regression from T0's 21.25 µs, absorbed in T2).
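
To illustrate what an epsilon-or-max stopping rule does, here is a minimal sketch. The trait shape, the `run` method, the closure-based sweep, and the `ScheduleReport` fields are all assumptions made for this example; only the names `Schedule`, `EpsilonOrMax` and `ScheduleReport` come from the PR.

```rust
/// Summary of a schedule run (field names are assumptions).
#[derive(Debug, PartialEq)]
struct ScheduleReport {
    iterations: usize,
    last_step: f64,
    converged: bool,
}

/// Hypothetical schedule trait: repeats sweeps until a stopping rule fires.
trait Schedule {
    /// `sweep` performs one pass over the factors and returns the largest
    /// change it made to any message during that pass.
    fn run(&self, sweep: &mut dyn FnMut() -> f64) -> ScheduleReport;
}

/// Stop when the last step is at most `epsilon`, or after `max_iterations` sweeps.
struct EpsilonOrMax {
    epsilon: f64,
    max_iterations: usize,
}

impl Schedule for EpsilonOrMax {
    fn run(&self, sweep: &mut dyn FnMut() -> f64) -> ScheduleReport {
        let mut last_step = f64::INFINITY;
        let mut iterations = 0;
        while iterations < self.max_iterations && last_step > self.epsilon {
            last_step = sweep();
            iterations += 1;
        }
        ScheduleReport {
            iterations,
            last_step,
            converged: last_step <= self.epsilon,
        }
    }
}

fn main() {
    // Toy sweep: the step size starts at 1.0 and halves on every pass.
    let mut step = 2.0;
    let mut sweep = || {
        step *= 0.5;
        step
    };

    let schedule = EpsilonOrMax { epsilon: 1e-3, max_iterations: 30 };
    let report = schedule.run(&mut sweep);
    println!("{report:?}");
}
```

Keeping the stopping rule behind a trait is what would let alternative schedules (such as the `Damped` / `Residual` ones deferred to T4) slot in without touching the inference code.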

### T2 — New API surface (breaking)

**Renames:**
- `IndexMap → KeyTable`, `Player → Rating`, `Agent → Competitor`, `Batch → TimeSlice`

**New types:**
- `Time` trait with `Untimed` ZST and `i64` impls; `Drift<T>`, `Rating<T, D>`, `Competitor<T, D>`, `TimeSlice<T>`, `History<T, D, O, K>` all generic.
- `Event<T, K>`, `Team<K>`, `Member<K>`, `Outcome` (`Ranked` variant; `#[non_exhaustive]`).
- `Observer<T>` trait + `NullObserver`.
- `ConvergenceOptions`, `ConvergenceReport`.
- `GameOptions`, `OwnedGame<T, D>`.

**Three-tier ingestion:**
- `history.record_winner(&K, &K, T)` / `record_draw(&K, &K, T)` — 1v1 convenience.
- `history.add_events(iter)` — typed bulk.
- `history.event(T).team([...]).weights([...]).ranking([...]).commit()` — fluent.

**Query API:** `current_skill`, `learning_curve`, `learning_curves` (keyed on `K`), `log_evidence`, `log_evidence_for`, `predict_quality`, `predict_outcome`.

**Game constructors:** `ranked`, `one_v_one`, `free_for_all`, `custom` — all returning `Result<_, InferenceError>`.

**`factors` module:** `Factor`, `Schedule`, `VarStore`, `VarId`, `BuiltinFactor`, `EpsilonOrMax`, `ScheduleReport`, `TeamSumFactor`, `RankDiffFactor`, `TruncFactor` now public.

**Errors:** `InferenceError` gains `MismatchedShape`, `InvalidProbability`, `ConvergenceFailed`; boundary panics converted to `Result`.

**Removed (breaking):** `History::convergence(iters, eps, verbose)`, `HistoryBuilder::gamma(f64)`, `HistoryBuilder::time(bool)`, `History.time: bool`, `learning_curves_by_index`, nested-Vec public `add_events`.

## Behavior change (documented in CHANGELOG)

`Time = Untimed` has `elapsed_to → 0`, so no drift accumulates between slices. The old `time=false` mode implicitly forced `elapsed=1` on reappearance via an `i64::MAX` sentinel — that quirk is not reproducible under a typed time axis. Tests that depended on it now use `History::<i64, _>` with explicit `1..=n` timestamps. One test (`test_env_ttt`) had 3 Gaussian goldens updated to reflect the corrected semantics; documented in commit `33a7d90`.
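
The effect can be sketched with stand-in types. The `Time` trait signature below, the `variance_delta` helper, and the assumption that constant drift adds `gamma^2` of variance per elapsed time unit are illustrative only and may not match the crate.

```rust
/// Hypothetical shape of the `Time` trait; the real signature may differ.
trait Time: Copy {
    /// Time units elapsed from `self` to a later point `later`.
    fn elapsed_to(&self, later: &Self) -> i64;
}

/// Zero-sized marker for histories without a time axis.
#[derive(Clone, Copy, Debug)]
struct Untimed;

impl Time for Untimed {
    /// No time axis, so nothing ever elapses.
    fn elapsed_to(&self, _later: &Self) -> i64 {
        0
    }
}

impl Time for i64 {
    fn elapsed_to(&self, later: &Self) -> i64 {
        later - self
    }
}

/// Constant drift with parameter gamma.
/// Assumption: variance grows by gamma^2 per elapsed time unit.
struct ConstantDrift(f64);

impl ConstantDrift {
    /// Hypothetical helper: variance added between two time points.
    fn variance_delta<T: Time>(&self, from: &T, to: &T) -> f64 {
        from.elapsed_to(to) as f64 * self.0 * self.0
    }
}

fn main() {
    let drift = ConstantDrift(0.5);

    // Untimed: zero elapsed time, so no drift accumulates between slices.
    println!("untimed: {}", drift.variance_delta(&Untimed, &Untimed));

    // Explicit consecutive timestamps give one unit of elapsed time per slice,
    // which is how the migrated tests get the old behavior back.
    println!("i64 1 -> 2: {}", drift.variance_delta(&1_i64, &2_i64));
}
```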

## Final numbers

| Metric | Before T0 | After T2 | Delta |
|---|---|---|---|
| `Batch::iteration` | 29.84 µs | 21.36 µs | **-28%** |
| `Gaussian::mul` | 1.57 ns | 219 ps | **-86%** |
| `Gaussian::div` | 1.57 ns | 219 ps | **-86%** |
| Tests passing | 38 | 90 | +52 |

All other Gaussian ops unchanged (~219 ps add/sub, ~264 ps pi/tau reads).

## Test plan

- [x] `cargo test --features approx` — 90/90 pass (68 lib + 10 api_shape + 6 game + 4 record_winner + 2 equivalence)
- [x] `cargo clippy --all-targets --features approx -- -D warnings` — clean
- [x] `cargo +nightly fmt --check` — clean
- [x] `cargo bench --bench batch` — 21.36 µs
- [x] `cargo bench --bench gaussian` — unchanged from T1
- [x] `cargo run --example atp --features approx` — rewritten in new API, runs clean
- [x] Historical Game-level goldens preserved in `tests/equivalence.rs`
- [x] Public API matches spec Section 4 (verified by integration tests in `tests/api_shape.rs`)

## Commit history

~45 commits total across T0 + T1 + T2. Each task is self-contained and individually tested; the branch is bisectable. See `git log main..t2-new-api-surface` for the full list.

## Deferred to later tiers

- `Outcome::Scored` + `MarginFactor` — T4
- `Damped` / `Residual` schedules — T4
- `Send + Sync` bounds + Rayon parallelism — T3
- N-team `predict_outcome` — T4
- `Game::custom` full ergonomics — T4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #1
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>

2026-04-24 11:20:04 +00:00

Changelog

All notable changes to this project will be documented in this file.

Unreleased — T2 new API surface

Breaking: every renamed type and the new public API land together per docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md Section 7 "T2".

Breaking renames

Batch → TimeSlice
Player → Rating (and the .player field on Competitor is now .rating)
Agent → Competitor
IndexMap → KeyTable
History field .batches → .time_slices

New types

Time trait with Untimed ZST and i64 impls (generic time axis).
Drift<T: Time> — generified from the old Drift trait.
Event<T, K>, Team<K>, Member<K> — typed bulk-ingest event shape.
Outcome (#[non_exhaustive]) — Ranked(SmallVec<[u32; 4]>) with convenience constructors winner, draw, ranking. Scored lands in T4.
Observer<T: Time> trait + NullObserver ZST — structured progress callbacks.
ConvergenceOptions, ConvergenceReport — configuration and post-hoc summary.
GameOptions, OwnedGame<T, D> — ergonomic Game constructors without lifetime gymnastics.
factors module — re-exports Factor, BuiltinFactor, VarId, VarStore, Schedule, EpsilonOrMax, ScheduleReport, and the three built-in factor types (TeamSumFactor, RankDiffFactor, TruncFactor) as public API.
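
As a rough picture of how the `Outcome` convenience constructors could relate to the `Ranked` variant, consider the sketch below. It substitutes `Vec<u32>` for `SmallVec<[u32; 4]>` to stay dependency-free, and the constructor signatures, the `ranks` accessor, and the rank convention (lower is better, equal ranks mean a draw) are assumptions rather than the crate's documented behavior.

```rust
/// Hypothetical mirror of `Outcome`, using `Vec<u32>` instead of `SmallVec`.
#[non_exhaustive]
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    /// One rank per team. Assumption: lower is better, equal ranks are a draw.
    Ranked(Vec<u32>),
}

impl Outcome {
    /// Two teams, the first one listed wins (assumed semantics).
    fn winner() -> Self {
        Outcome::Ranked(vec![0, 1])
    }

    /// Two teams sharing the same rank (assumed semantics).
    fn draw() -> Self {
        Outcome::Ranked(vec![0, 0])
    }

    /// Arbitrary per-team ranks for multi-team games.
    fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Self {
        Outcome::Ranked(ranks.into_iter().collect())
    }

    /// Hypothetical accessor for the per-team ranks.
    fn ranks(&self) -> &[u32] {
        // Irrefutable inside the defining crate; downstream crates would need
        // a wildcard arm because the enum is #[non_exhaustive].
        let Outcome::Ranked(ranks) = self;
        ranks
    }
}

fn main() {
    println!("winner:  {:?}", Outcome::winner().ranks());
    println!("draw:    {:?}", Outcome::draw().ranks());
    println!("3 teams: {:?}", Outcome::ranking([2, 0, 1]).ranks());
}
```

Marking the enum `#[non_exhaustive]` is what allows a later `Scored` variant to be added without another breaking change for downstream `match` expressions.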

New `History` API

Three-tier ingestion:
- Tier 1 (bulk): add_events<I: IntoIterator<Item = Event<T, K>>>(events) -> Result
- Tier 2 (one-off): record_winner(&K, &K, T), record_draw(&K, &K, T)
- Tier 3 (fluent): event(T).team([...]).weights([...]).ranking([...]).commit()

converge() -> Result<ConvergenceReport, InferenceError> replaces convergence(iters, eps, verbose).
current_skill(&K), learning_curve(&K), learning_curves() (now keyed on K).
log_evidence() zero-arg, log_evidence_for(&[&K]).
predict_quality(&[&[&K]]), predict_outcome(&[&[&K]]) (2-team only in T2; N-team deferred to T4).
intern(&Q) / lookup(&Q) expose the internal KeyTable<K> for power users.
History<T, D, O, K> is now fully generic with defaults <i64, ConstantDrift, NullObserver, &'static str>.
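
The `intern` / `lookup` pair is easiest to picture as a key table that hands out dense indices, which can then address plain `Vec` storage. The sketch below is a simplified stand-in: it takes `&K` rather than a borrowed `&Q`, and the struct layout and the `key` accessor are assumptions.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Dense index handed out by the table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Index(usize);

/// Hypothetical key table: maps user keys to dense indices and back.
struct KeyTable<K> {
    by_key: HashMap<K, Index>,
    keys: Vec<K>,
}

impl<K: Eq + Hash + Clone> KeyTable<K> {
    fn new() -> Self {
        KeyTable { by_key: HashMap::new(), keys: Vec::new() }
    }

    /// Returns the existing index for `key`, or assigns the next free one.
    fn intern(&mut self, key: &K) -> Index {
        if let Some(&idx) = self.by_key.get(key) {
            return idx;
        }
        let idx = Index(self.keys.len());
        self.keys.push(key.clone());
        self.by_key.insert(key.clone(), idx);
        idx
    }

    /// Read-only lookup; never allocates a new index.
    fn lookup(&self, key: &K) -> Option<Index> {
        self.by_key.get(key).copied()
    }

    /// Reverse mapping from index to key (assumed accessor).
    fn key(&self, idx: Index) -> &K {
        &self.keys[idx.0]
    }
}

fn main() {
    let mut table = KeyTable::new();
    let alice = table.intern(&"alice");
    let bob = table.intern(&"bob");

    // Per-competitor data lives in a dense Vec addressed by `Index.0`.
    let mut games_played = vec![0_u32; 2];
    games_played[alice.0] += 1;
    games_played[bob.0] += 1;
    games_played[alice.0] += 1;

    for idx in [alice, bob] {
        println!("{} played {} games", table.key(idx), games_played[idx.0]);
    }
    println!("carol known? {}", table.lookup(&"carol").is_some());
}
```

Because indices are assigned contiguously from zero, a `Vec` indexed by `Index.0` can replace a `HashMap<Index, _>`, which is the storage change described for T0.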

New `Game` API

Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result<OwnedGame, _>.
Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>.
Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result<OwnedGame, _>.
Game::custom(...) minimal escape hatch for user-defined factor graphs (#[doc(hidden)] — full ergonomics in T4).
Game::log_evidence() and OwnedGame::log_evidence() accessors.

Errors

InferenceError now carries MismatchedShape { kind, expected, got }, InvalidProbability { value }, ConvergenceFailed { last_step, iterations }, and NegativePrecision { pi }. Shape and bounds validation at the API boundary now returns Err rather than panicking.
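
A sketch of what boundary validation with these variants might look like. The variant and field names follow the entry above, but the field types, the `Display` wording, the accepted probability range, and the helper functions `check_probability` and `check_shape` are assumptions for illustration.

```rust
use std::fmt;

/// Variant and field names follow the changelog; field types are assumptions.
#[allow(dead_code)] // not every variant is constructed in this sketch
#[derive(Clone, Debug, PartialEq)]
enum InferenceError {
    MismatchedShape { kind: &'static str, expected: usize, got: usize },
    InvalidProbability { value: f64 },
    ConvergenceFailed { last_step: f64, iterations: usize },
    NegativePrecision { pi: f64 },
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InferenceError::MismatchedShape { kind, expected, got } => {
                write!(f, "{kind}: expected {expected}, got {got}")
            }
            InferenceError::InvalidProbability { value } => {
                write!(f, "invalid probability {value}")
            }
            InferenceError::ConvergenceFailed { last_step, iterations } => {
                write!(f, "no convergence after {iterations} iterations (last step {last_step})")
            }
            InferenceError::NegativePrecision { pi } => {
                write!(f, "negative precision {pi}")
            }
        }
    }
}

impl std::error::Error for InferenceError {}

/// Hypothetical helper: accept only finite values in [0, 1] (assumed range).
fn check_probability(value: f64) -> Result<f64, InferenceError> {
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(InferenceError::InvalidProbability { value })
    }
}

/// Hypothetical helper: report a length mismatch instead of panicking.
fn check_shape(kind: &'static str, expected: usize, got: usize) -> Result<(), InferenceError> {
    if expected == got {
        Ok(())
    } else {
        Err(InferenceError::MismatchedShape { kind, expected, got })
    }
}

fn main() {
    for p in [0.25, 1.5] {
        match check_probability(p) {
            Ok(v) => println!("accepted {v}"),
            Err(e) => println!("rejected: {e}"),
        }
    }
    if let Err(e) = check_shape("weights", 2, 3) {
        println!("rejected: {e}");
    }
}
```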

Removed (breaking)

History::convergence(iters, eps, verbose) — use converge().
HistoryBuilder::gamma(f64) — use .drift(ConstantDrift(g)).
HistoryBuilder::time(bool) and History.time: bool — use the Time type parameter.
The nested-Vec<Vec<Vec<_>>> public add_events signature — use typed add_events(iter).
learning_curves_by_index() — use learning_curves().

Performance

Batch::iteration bench: 21.36 µs (T1 was 22.88 µs on the same hardware; the ~7% improvement comes from the typed path being slightly more direct). Gaussian operations unchanged.

Notes

Time = Untimed returns elapsed_to → 0 — behavior change from the old time=false mode, which implicitly generated elapsed=1 per event via an i64::MAX sentinel in Agent.last_time. Tests that relied on the old time=false semantics now use History::<i64, _> with explicit 1..=n timestamps.

0.1.0 - 2026-04-23

Features

feat: added a Drift trait and a "default" ConstantDrift implementation

Miscellaneous Tasks

chore: added cliff.toml, release.toml and rustfmt.toml
chore: clean up

Other (unconventional)

Initial commit.
Begin working on batch.
Passing tests for Batch
Working on History struct. First test is passing.
More test passing for History
Added more functions to History
Remove Display impl, better to use Debug
Use flatten instead of flat_map
Handle case where there is no time
It works, or so it seems
Use PlayerIndex instead of String
Inline a lot of functions
Refactor some code
Refactor some stuff
Port from julia version instead
More things, better things, awesome
More tests, more code
More things, more tests
Fix tests
More tests
More tests
Added builder for History, and start migrating test to use builder instead.
Update test to use builder
Remove unused code
Use an Index struct instead of str and String for player id
Update example so now it works, and that's, well, good
Update test to use assert_ulps_eq
Fixed test
Change time to use i64 instead of u64
Small change
Clean up example
Update crates and added methods to get a key or all keys in an IndexMap
Added a get function to IndexMap
Agents don't have to be behind a mutable reference in within_prior
Agents don't have to be behind a mutable reference in within_priors
Refactor so we can see if there is any way to improve the performance
Fix clippy warning
More refactoring
Remove warnings and refactor some code
Added benchmark for Batch
Added default implementation for TeamMessage
Remove unused mut reference
Make it more rusty
More rustifying
Small refactor
Rename d to diff, and t to team
Added more links to readme
Fix broken link in README
Update crates
Clean up
Dry my eyes
Remove unnecessary allocations
Fix clippy warning
Refactor history
Rename variables
Move stuff around
Added quality function
Make quality a free standing function instead
Improve performance
Change assert to debug_assert
Added todo to readme, and documentation for quality function
Basic test for quality
Ignore temp folder
Update edition
Small changes for new 2024 edition
remove notepad
added benchmark

Styling

style: cargo fmt
