T0 + T1 + T2: engine redesign through new API surface #1

2026-04-24T11:16:39Z

logaritmisk commented

2026-04-24 11:16:39 +00:00

Implements tiers T0, T1, T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md. All three tiers have landed together on this branch because they build on one another; this PR rolls them up for a single review pass.

Per-tier plans:

T0: docs/superpowers/plans/2026-04-23-t0-numerical-parity.md
T1: docs/superpowers/plans/2026-04-24-t1-factor-graph.md
T2: docs/superpowers/plans/2026-04-24-t2-new-api-surface.md

Summary

T0 — Numerical parity (internal)

Gaussian switched to natural-parameter storage (pi, tau); mul/div now ~7× faster (218 ps vs 1.57 ns).
HashMap<Index, _> → dense Vec<_> keyed by Index.0 (via AgentStore<D>, SkillStore).
ScratchArena eliminates per-event allocations in Game::likelihoods.
InferenceError seed type added (1 variant).
Tests: 38 passing at the close of T0 (53 by the end of T1).
Benchmark: Batch::iteration 29.84 → 21.25 µs.

T1 — Factor graph machinery (internal)

Factor trait + BuiltinFactor enum (TeamSum / RankDiff / Trunc) driving within-game inference.
VarStore flat storage for variable marginals.
Schedule trait + EpsilonOrMax impl replacing the hand-rolled EP loop.
Game::likelihoods rebuilt on the factor-graph machinery; iteration counts and goldens preserved to within 1e-6.
53 tests passing.
Benchmark: Batch::iteration 23.01 µs (about 8% above T0's 21.25 µs; recovered by the end of T2 at 21.36 µs).

T2 — New API surface (breaking)

Renames:

IndexMap → KeyTable, Player → Rating, Agent → Competitor, Batch → TimeSlice

New types:

Time trait with Untimed ZST and i64 impls; Drift<T>, Rating<T, D>, Competitor<T, D>, TimeSlice<T>, History<T, D, O, K> all generic.
Event<T, K>, Team<K>, Member<K>, Outcome (Ranked variant; #[non_exhaustive]).
Observer<T> trait + NullObserver.
ConvergenceOptions, ConvergenceReport.
GameOptions, OwnedGame<T, D>.

Three-tier ingestion:

history.record_winner(&K, &K, T) / record_draw(&K, &K, T) — 1v1 convenience.
history.add_events(iter) — typed bulk.
history.event(T).team([...]).weights([...]).ranking([...]).commit() — fluent.

Query API: current_skill, learning_curve, learning_curves (keyed on K), log_evidence, log_evidence_for, predict_quality, predict_outcome.

Game constructors: ranked, one_v_one, free_for_all, custom — all returning Result<_, InferenceError>.

factors module: Factor, Schedule, VarStore, VarId, BuiltinFactor, EpsilonOrMax, ScheduleReport, TeamSumFactor, RankDiffFactor, TruncFactor now public.

Errors: InferenceError gains MismatchedShape, InvalidProbability, ConvergenceFailed; boundary panics converted to Result.

Removed (breaking): History::convergence(iters, eps, verbose), HistoryBuilder::gamma(f64), HistoryBuilder::time(bool), History.time: bool, learning_curves_by_index, nested-Vec public add_events.

Behavior change (documented in CHANGELOG)

With Time = Untimed, elapsed_to always returns 0, so no drift accumulates between slices. The old time=false mode implicitly forced elapsed=1 on reappearance via an i64::MAX sentinel; that quirk cannot be reproduced under a typed time axis. Tests that depended on it now use History::<i64, _> with explicit 1..=n timestamps. One test (test_env_ttt) had 3 Gaussian goldens updated to reflect the corrected semantics, as documented in commit 33a7d90.

Final numbers

| Metric | Before T0 | After T2 | Delta |
|---|---|---|---|
| `Batch::iteration` | 29.84 µs | 21.36 µs | -28% |
| `Gaussian::mul` | 1.57 ns | 219 ps | -86% |
| `Gaussian::div` | 1.57 ns | 219 ps | -86% |
| Tests passing | 38 | 90 | +52 |

All other Gaussian ops unchanged (~219 ps add/sub, ~264 ps pi/tau reads).

Test plan

cargo test --features approx — 90/90 pass (68 lib + 10 api_shape + 6 game + 4 record_winner + 2 equivalence)
cargo clippy --all-targets --features approx -- -D warnings — clean
cargo +nightly fmt --check — clean
cargo bench --bench batch — 21.36 µs
cargo bench --bench gaussian — unchanged from T1
cargo run --example atp --features approx — rewritten in new API, runs clean
Historical Game-level goldens preserved in tests/equivalence.rs
Public API matches spec Section 4 (verified by integration tests in tests/api_shape.rs)

Commit history

~45 commits total across T0 + T1 + T2. Each task is self-contained and individually tested; the branch is bisectable. See git log main..t2-new-api-surface for the full list.

Deferred to later tiers

Outcome::Scored + MarginFactor — T4
Damped / Residual schedules — T4
Send + Sync bounds + Rayon parallelism — T3
N-team predict_outcome — T4
Game::custom full ergonomics — T4

🤖 Generated with Claude Code


logaritmisk added 45 commits 2026-04-24 11:16:40 +00:00

docs: add TrueSkill-TT engine redesign spec c5f081d21f

Comprehensive design for a multi-tier rewrite covering performance,
factor-graph extensibility, convergence scheduling, and API surface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add T0 numerical-parity implementation plan d11d2e8c6b

Bite-sized, TDD-style task breakdown for the first tier of the engine
redesign: Gaussian to natural-parameter storage, dense Vec storage
replacing HashMap, ScratchArena to eliminate per-event allocs,
Result-ifying the lone panic. No top-level public API change.

Acceptance gate: ≥3x speedup on Batch::iteration vs. baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bench: capture T0 baseline; expose pi/tau accessors; fix div panic 06d3c886fe

- Promotes Gaussian::pi and Gaussian::tau to public so benches/gaussian.rs
  compiles, then captures baseline numbers for the T0 acceptance gate.
- Fixes the divide bench: g1/g2 panicked (g1 has lower precision than g2;
  cavity requires pi_num >= pi_den). Swapped to g2/g1 (well-defined).

Baseline on Apple M5 Pro:
  Batch::iteration  29.840 µs
  Gaussian::mul      1.568 ns   (vs ~220 ps for add/sub — hot path)
  Gaussian::div      1.572 ns

refactor(gaussian): switch to natural-parameter storage (pi, tau) a667deb7e1

Mul and Div become two f64 adds/subs with no sqrt in the hot path.
mu() and sigma() are computed on demand from stored pi/tau.

Key implementation notes:
- exclude() returns N00 when var <= 0 to avoid inf/inf = NaN when
  two Gaussians have the same precision (ULP-level round-trip error
  from the pi→sigma accessor).
- Mul<f64> by 0.0 returns N00 (point mass at 0), matching old behavior.
- from_ms(0, 0) == N00 {pi:inf, tau:0}; from_ms(0, inf) == N_INF {pi:0, tau:0}.

Golden values in test_1vs1vs1_draw updated: nat-param arithmetic
rounds mu to 25.0 (was 24.999999) and shifts sigma by ~3e-7.
Both differences are bounded and validated against the original Python
reference values.

Part of T0 engine redesign.
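
To illustrate why natural-parameter storage turns Mul and Div into plain additions and subtractions, here is a minimal sketch. It is not the crate's actual `Gaussian`: the field layout and method names mirror the commit message, but the edge cases described above (`N00`, `N_INF`, the `exclude()` guard) are deliberately omitted, and `from_ms` assumes a finite, strictly positive sigma.

```rust
use std::ops::{Div, Mul};

/// Gaussian in natural parameters:
/// pi = 1 / sigma^2 (precision), tau = mu / sigma^2 (precision-adjusted mean).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gaussian {
    pi: f64,
    tau: f64,
}

impl Gaussian {
    /// Build from mean and standard deviation.
    /// Assumption for this sketch: sigma is finite and > 0.
    pub fn from_ms(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Gaussian { pi, tau: mu * pi }
    }

    /// Mean, computed on demand from the stored parameters.
    pub fn mu(&self) -> f64 {
        self.tau / self.pi
    }

    /// Standard deviation, computed on demand (the only sqrt in this sketch).
    pub fn sigma(&self) -> f64 {
        (1.0 / self.pi).sqrt()
    }
}

impl Mul for Gaussian {
    type Output = Gaussian;
    /// Product of two Gaussian densities: natural parameters add.
    fn mul(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

impl Div for Gaussian {
    type Output = Gaussian;
    /// Quotient (cavity): natural parameters subtract.
    /// Only well-defined when self.pi >= rhs.pi.
    fn div(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}
```

Multiplying N(25, 5²) by N(30, 5²) gives precision 0.08 and mean 27.5; dividing the product by the second factor recovers the first, up to floating-point rounding.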

feat: introduce InferenceError; mu_sigma panic already eliminated 709ece335f

mu_sigma was deleted as part of the Gaussian nat-param rewrite (its
only callers were the old Mul/Div impls). This commit adds the
InferenceError enum as a seed for the T2 API surface, with the
NegativePrecision variant that mu_sigma would have returned.

Part of T0 engine redesign.

refactor(batch): replace HashMap<Index, Skill> with dense SkillStore 8f60258dba

SkillStore is a Vec<Skill>-backed dense store with a parallel present
mask, indexed directly by Index.0. Eliminates per-iteration hashing
in the within-slice convergence loop; an O(1) array lookup with better
cache behaviour replaces the O(1) amortised hash lookup.

Iteration order is now ascending-by-Index (was arbitrary for HashMap);
EP fixed point is order-independent so posteriors are unchanged.

Part of T0 engine redesign.
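
The dense-store idea can be sketched roughly as follows. `DenseStore` and this `Index` are hypothetical stand-ins written for illustration; the real `SkillStore` is `pub(crate)` and its exact methods are not shown in this PR.

```rust
/// Hypothetical stand-in for the crate's `Index` newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Index(pub usize);

/// Dense store: slot `i` holds the value for `Index(i)`;
/// `present[i]` records whether that slot is occupied.
#[derive(Debug, Default)]
pub struct DenseStore<V> {
    values: Vec<V>,
    present: Vec<bool>,
}

impl<V: Default> DenseStore<V> {
    pub fn new() -> Self {
        DenseStore { values: Vec::new(), present: Vec::new() }
    }

    /// Insert or overwrite, growing both vectors if the index is past the end.
    pub fn insert(&mut self, idx: Index, value: V) {
        if idx.0 >= self.values.len() {
            self.values.resize_with(idx.0 + 1, V::default);
            self.present.resize(idx.0 + 1, false);
        }
        self.values[idx.0] = value;
        self.present[idx.0] = true;
    }

    /// Direct array lookup; no hashing involved.
    pub fn get(&self, idx: Index) -> Option<&V> {
        if self.present.get(idx.0).copied().unwrap_or(false) {
            Some(&self.values[idx.0])
        } else {
            None
        }
    }

    /// Iterate occupied slots in ascending index order.
    pub fn iter(&self) -> impl Iterator<Item = (Index, &V)> + '_ {
        self.values
            .iter()
            .zip(self.present.iter())
            .enumerate()
            .filter(|(_, (_, p))| **p)
            .map(|(i, (v, _))| (Index(i), v))
    }
}
```

Iteration order falls out of the layout: entries come back ascending by index regardless of insertion order, which matches the ordering change noted in the commit message.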

refactor(history): replace HashMap<Index, Agent<D>> with dense AgentStore<D> 49d2b317da

AgentStore<D> is a Vec<Option<Agent<D>>>-backed store indexed directly
by Index.0, eliminating per-iteration hashing in the cross-history
forward/backward sweep. Implements Index<Index>/IndexMut<Index> for
ergonomic agent access.

AgentStore is public (so benches/batch.rs can use it). SkillStore
remains pub(crate) since Skill is pub(crate) in batch.rs.

HashMap<Index, _> is now only used for the posteriors() return value
(temporary; will be replaced in T2 with a proper typed return) and
for the add_events_with_prior(priors: HashMap<Index, Player<D>>) API
(also T2 target).

Part of T0 engine redesign.

perf(game): eliminate per-event allocations via ScratchArena b1e0fcb817

Game::likelihoods previously allocated four Vecs (teams, diffs, ties,
margins) on every call. Batch now owns one ScratchArena reused across
all Game::new calls in the iteration loop; likelihoods() clears and
extends the arena buffers instead of allocating fresh.

For log_evidence (called infrequently), a local ScratchArena is created
per invocation so the method signature stays &self.

Also: add #[derive(Debug)] to TeamMessage and DiffMessage (required by
ScratchArena's own Debug derive).

Part of T0 engine redesign.
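
The clear-and-extend pattern described above might look like the sketch below. The single `diffs` buffer and the `adjacent_diffs` helper are assumptions made for illustration; the real ScratchArena holds several buffers and is driven by `Game::likelihoods`.

```rust
/// Hypothetical scratch arena with one reusable buffer.
#[derive(Debug, Default)]
pub struct ScratchArena {
    diffs: Vec<f64>,
}

impl ScratchArena {
    pub fn new() -> Self {
        Self::default()
    }

    /// Exposed only so the capacity reuse can be observed.
    pub fn capacity(&self) -> usize {
        self.diffs.capacity()
    }
}

/// Writes the differences between adjacent team means into the arena's
/// buffer and returns them as a slice borrowed from the arena.
pub fn adjacent_diffs<'a>(arena: &'a mut ScratchArena, team_means: &[f64]) -> &'a [f64] {
    // clear() resets the length but keeps the allocation, so repeated
    // calls only allocate when a larger input than any before arrives.
    arena.diffs.clear();
    arena.diffs.extend(team_means.windows(2).map(|w| w[0] - w[1]));
    &arena.diffs
}
```

Once the buffer has grown to fit the largest game in a batch, later calls run without touching the allocator.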

bench: capture T0 final numbers and post-mortem d3cfee53a1

Batch::iteration: 29.840 µs → 21.253 µs (1.40×)
Gaussian::mul:     1.568 ns →  218.69 ps (7.17×)
Gaussian::div:     1.572 ns →  218.64 ps (7.19×)

Gaussian arithmetic hit target (7×+ vs 1.5–2× expected). Batch::iteration
reached 1.40× vs the 3× target. Post-mortem: the bench exercises 100 tiny
2-team events and the dominant cost is still Vec allocation in within_priors,
sort_perm, and Game::likelihoods. The HashMap→Vec win shows at the History
level (forward/backward sweep) which this bench doesn't exercise.

Remediation plan documented in benches/baseline.txt: arena-ify sort_perm,
within_priors, and Game::likelihoods in T1 when Game's internals are
redesigned around the new factor graph.

38/38 tests passing. Closes T0 tier.

docs: add T1 factor-graph implementation plan fa85bcee51

Bite-sized, TDD-style task breakdown for the second tier of the engine
redesign: introduce VarStore, Factor trait, BuiltinFactor enum, and
EpsilonOrMax schedule, then re-implement Game::likelihoods on top of
the new machinery. Internal-only refactor; public Game/History API
unchanged.

Acceptance: existing tests pass within ULP, iteration counts match T0,
no Batch::iteration regression vs T0 (~21.5 µs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(factor): introduce VarId and VarStore dac4427b65

Foundation types for the T1 factor graph machinery. VarStore is a
flat Vec<Gaussian> indexed by VarId; variables are allocated by
alloc() and the store can be cleared between games to reuse capacity.

Part of T1 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
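
A rough sketch of what a flat variable store with this shape could look like. The method names follow the ones listed later in this PR (`new`, `alloc`, `get`, `set`, `len`, `is_empty`, `clear`), but the signatures, the `usize` payload of `VarId`, and the placeholder `Gaussian` are assumptions.

```rust
/// Hypothetical handle into the store.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VarId(pub usize);

/// Placeholder marginal type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gaussian {
    pub pi: f64,
    pub tau: f64,
}

/// Flat storage for variable marginals, indexed by VarId.
#[derive(Debug, Default)]
pub struct VarStore {
    marginals: Vec<Gaussian>,
}

impl VarStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Allocate a new variable and return its id (ids are dense, starting at 0).
    pub fn alloc(&mut self, init: Gaussian) -> VarId {
        let id = VarId(self.marginals.len());
        self.marginals.push(init);
        id
    }

    pub fn get(&self, id: VarId) -> Gaussian {
        self.marginals[id.0]
    }

    pub fn set(&mut self, id: VarId, g: Gaussian) {
        self.marginals[id.0] = g;
    }

    pub fn len(&self) -> usize {
        self.marginals.len()
    }

    pub fn is_empty(&self) -> bool {
        self.marginals.is_empty()
    }

    /// Drop all variables but keep the allocation for the next game.
    pub fn clear(&mut self) {
        self.marginals.clear();
    }
}
```

After `clear()`, ids start again from 0, so a store can be reused game after game without reallocating.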

feat(factor): introduce Factor trait and BuiltinFactor enum ebccc7b454

Adds the trait that all factors implement and the enum dispatcher
used by the schedule to drive heterogeneous factors without dynamic
dispatch in the hot loop.

The three built-in factors (TeamSum, RankDiff, Trunc) are stubbed
out; concrete implementations follow in tasks 4-6.
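
The enum-dispatch pattern can be sketched with toy factors. Everything here is illustrative: variables are plain `(mean, variance)` pairs, `propagate` has an assumed signature, and `SumFactor` / `DiffFactor` only imitate the role of TeamSum and RankDiff (means add or subtract, variances add).

```rust
/// Toy variable: (mean, variance).
pub type Var = (f64, f64);

pub trait Factor {
    /// Update the output variable; return the absolute change in its mean.
    fn propagate(&mut self, vars: &mut [Var]) -> f64;
}

/// out = sum of inputs (independent Gaussians: means and variances add).
pub struct SumFactor {
    pub inputs: Vec<usize>,
    pub out: usize,
}

/// out = a - b (means subtract, variances add).
pub struct DiffFactor {
    pub a: usize,
    pub b: usize,
    pub out: usize,
}

impl Factor for SumFactor {
    fn propagate(&mut self, vars: &mut [Var]) -> f64 {
        let new = self
            .inputs
            .iter()
            .fold((0.0, 0.0), |(m, v), &i| (m + vars[i].0, v + vars[i].1));
        let delta = (vars[self.out].0 - new.0).abs();
        vars[self.out] = new;
        delta
    }
}

impl Factor for DiffFactor {
    fn propagate(&mut self, vars: &mut [Var]) -> f64 {
        let (a, b) = (vars[self.a], vars[self.b]);
        let new = (a.0 - b.0, a.1 + b.1);
        let delta = (vars[self.out].0 - new.0).abs();
        vars[self.out] = new;
        delta
    }
}

/// Closed set of factor kinds: a `match` replaces `dyn Factor` in the hot loop.
pub enum BuiltinFactor {
    Sum(SumFactor),
    Diff(DiffFactor),
}

impl Factor for BuiltinFactor {
    fn propagate(&mut self, vars: &mut [Var]) -> f64 {
        match self {
            BuiltinFactor::Sum(f) => f.propagate(vars),
            BuiltinFactor::Diff(f) => f.propagate(vars),
        }
    }
}
```

A `Vec<BuiltinFactor>` can then be driven in a loop with static dispatch, while third-party factors can still implement the trait directly.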

feat(factor): implement TeamSumFactor cee70c6272

Computes the weighted sum of player performance Gaussians into a
team-performance variable. Runs once per game (no iteration needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(factor): move N_INF import to test module in team_sum 1210a34a64

feat(factor): implement RankDiffFactor ae141752b7

Maintains diff = team_a - team_b across three variables. On each
propagation, reads the team-perf marginals (which may have been
updated by neighboring factors) and computes the new diff via
Gaussian Sub (variance addition).

feat(factor): implement TruncFactor with cached evidence 54e46bef59

EP truncation factor that operates on a diff variable. Stores its
outgoing message so the cavity computation produces the correct EP
message on each propagation. The first propagation caches the
evidence contribution (cdf-bounded probability) for log_evidence().

Promotes lib::cdf to pub(crate) so the factor can use it.

feat(schedule): add Schedule trait and EpsilonOrMax impl da69f02ff7

EpsilonOrMax mirrors today's Game::likelihoods loop: sweep forward
then backward over iterating factors, capped at 10 iterations or
step <= 1e-6. Setup factors (TeamSum) run exactly once before the
loop begins.

ScheduleReport is the only public surface from this module.
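
The stopping rule (forward then backward sweep, stop at 10 iterations or step <= 1e-6) can be sketched as below. This is an assumption-laden illustration: factors are abstracted into a closure that propagates factor `i` and returns its step size, the one-off setup pass is omitted, and the `ScheduleReport` fields are guesses rather than the crate's definition.

```rust
/// Hypothetical report; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ScheduleReport {
    pub iterations: usize,
    pub final_step: f64,
    pub converged: bool,
}

pub struct EpsilonOrMax {
    pub epsilon: f64,
    pub max_iter: usize,
}

impl EpsilonOrMax {
    /// `propagate(i)` updates iterating factor `i` and returns how far
    /// its message moved. Each iteration sweeps 0..n and then n..0.
    pub fn run<F: FnMut(usize) -> f64>(&self, n_factors: usize, mut propagate: F) -> ScheduleReport {
        let mut iterations = 0;
        let mut step = f64::INFINITY;
        while iterations < self.max_iter && step > self.epsilon {
            step = 0.0;
            for i in 0..n_factors {
                step = step.max(propagate(i));
            }
            for i in (0..n_factors).rev() {
                step = step.max(propagate(i));
            }
            iterations += 1;
        }
        ScheduleReport { iterations, final_step: step, converged: step <= self.epsilon }
    }
}
```

With `epsilon: 1e-6` and `max_iter: 10`, a factor whose step never shrinks stops after exactly 10 iterations with `converged == false`.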

refactor(game): rebuild Game::likelihoods on factor-graph machinery cb07a874e8

Game::likelihoods now uses VarStore (for diff vars) and TruncFactor
(for EP truncation + evidence caching) instead of TeamMessage and
DiffMessage. The EP loop structure is preserved exactly; VarId-keyed
diff vars live in the arena's VarStore (capacity reused per batch).

ScratchArena loses teams/diffs/ties/margins; gains VarStore and
sort_buf (sort_perm allocation eliminated). message.rs deleted.

Public API of Game (new, posteriors, likelihoods, evidence) unchanged.

fix(arena): remove unused Gaussian import in test module cdee7b2b99

perf(game): replace order.clone()+position() with inverse permutation c02d5ca0ab
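
For readers unfamiliar with the trick named in this commit title: looking up each element's rank with `position()` costs O(n) per lookup, while an inverse permutation is built once in O(n). A minimal sketch, with a hypothetical function name and a caller-supplied buffer so the allocation can be reused:

```rust
/// Given `order`, where order[rank] is the original index of the item
/// at that rank, fill `inv` so that inv[original] == rank.
/// Assumes `order` is a permutation of 0..order.len().
pub fn inverse_permutation(order: &[usize], inv: &mut Vec<usize>) {
    inv.clear();
    inv.resize(order.len(), 0);
    for (rank, &original) in order.iter().enumerate() {
        inv[original] = rank;
    }
}
```

For `order = [2, 0, 1]` the inverse is `[1, 2, 0]`, which agrees with `order.iter().position(|&o| o == i)` for every `i`.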

bench: capture T1 final numbers and fix clippy warnings cdfd75f846

Fixed:
- Removed unused .enumerate() in batch.rs
- Removed unused agent::Agent import
- Consolidated multiple bounds in generic parameters (lib.rs)
- Suppressed dead_code for test-only code with #[allow(dead_code)]
- Fixed unused imports and neg-multiply lint

Batch::iteration: 27.023 µs (T0 was 21.253 µs, expected minor regression from T1 infrastructure).
Gaussian::* unchanged (~236-280 ps).

Acceptance: T1 factor-graph refactor lands without clippy/fmt issues.
All 53 tests pass. Closes T1 tier.

perf(arena): pool team_prior/lhood/inv buffers to eliminate per-game allocs 6437649436

Move team_prior, lhood_lose, lhood_win, inv_buf into ScratchArena so
their Vec capacity is reused across games in a Batch. Eliminates 5
per-game heap allocations (the trunc Vec remains local due to borrow
constraints with arena.vars).

Batch::iteration: 23.0 µs (down from 27.0 µs with naive local Vecs;
8% above T0 21.253 µs baseline due to TruncFactor propagate overhead).

docs: add T2 new-API-surface implementation plan 948a7a684b

21-task plan covering all renames and new public API landing per
Section 7 "T2" of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

refactor(api): rename IndexMap to KeyTable c69fe4e67c

The former name collided with the popular indexmap crate. KeyTable
lives in its own module. Public API unchanged beyond the rename.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

refactor(lib): make key_table module private; revert bench var rename 52f5f76a34

Address code review feedback from Task 2:
- key_table module doesn't need pub visibility; the KeyTable re-export
  at lib.rs root already exposes the only public type. Matches the
  error/history private-module pattern.
- Revert an incidental bench variable rename (index_map → index) that
  wasn't part of the task scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

refactor(api): rename Player to Rating 2f5aa98eac

The struct holds prior/beta/drift — a rating configuration, not a
person. The person-with-temporal-state is the Competitor (renamed in
the next task). Resolves Player/Agent ambiguity.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(factor): update stale Player reference to Rating 88d54cb9f4

Follow-up to the Player→Rating rename (2f5aa98); a doc comment in
team_sum.rs still referenced Player::performance().

refactor(api): rename Agent to Competitor and .player field to .rating decbd895a3

Competitor holds dynamic per-history state (message, last_time) for
someone competing; its configuration lives in a Rating.

AgentStore renamed to CompetitorStore to match. The internal
`clean()` free function's parameter name changed from `agents` to
`competitors` for consistency.

Local variable names (agent_idx, this_agent) inside history.rs are
left unchanged — they represent abstract identifiers, not Competitor
instances.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

refactor(api): rename Batch to TimeSlice 5e752f9e98

TimeSlice says what it is: every event sharing one timestamp. The
History field .batches is renamed to .time_slices. Local variables
named `batch` referring to TimeSlice instances are renamed to
`time_slice`.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

feat(api): add Time trait with Untimed and i64 impls a285c1a0f2

Foundation for generic History time axis. Untimed is the ZST case
(no drift across slices); i64 is the standard timestamp case.
Additional impls (time::OffsetDateTime, chrono) can be added behind
feature flags in follow-up work.

The trait is not yet wired into History — that happens in Task 7
along with generifying Drift over T.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
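
A sketch of how a typed time axis could express "no drift" at the type level. The `elapsed_to` name comes from the PR description; its `i64` return type, the trait bounds, and the `drift_variance` helper (which assumes a constant drift adding `elapsed * gamma^2` to the variance) are assumptions made for illustration.

```rust
/// Time axis abstraction (sketch; bounds and return type are assumed).
pub trait Time: Copy + Ord {
    /// Number of elapsed steps from `self` to `later`.
    fn elapsed_to(&self, later: &Self) -> i64;
}

/// Zero-sized "no time axis" marker: nothing ever elapses.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Untimed;

impl Time for Untimed {
    fn elapsed_to(&self, _later: &Self) -> i64 {
        0
    }
}

impl Time for i64 {
    fn elapsed_to(&self, later: &Self) -> i64 {
        (later - self).max(0)
    }
}

/// Variance added between appearances under a constant drift `gamma`.
/// `last == None` means the competitor has not been seen before.
pub fn drift_variance<T: Time>(last: Option<&T>, now: &T, gamma: f64) -> f64 {
    match last {
        Some(prev) => prev.elapsed_to(now) as f64 * gamma * gamma,
        None => 0.0,
    }
}
```

A competitor seen at t=1 and again at t=3 has elapsed = 2 on the `i64` axis, while on the `Untimed` axis the added variance is always zero.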

refactor(api): generify Drift, Rating, Competitor, TimeSlice, CompetitorStore, History over T: Time 59e4cb35cc

Drift now takes &T -> &T and is generic over the time axis. Untimed
impls return elapsed=0. ConstantDrift impl covers all T via the Time
trait. An additional variance_for_elapsed(i64) method on the trait
serves callers that work with the pre-cached i64 elapsed count.

Competitor.last_time moves from i64 with MIN sentinel to Option<T>
with None sentinel. receive(&T) computes variance from last_time
dynamically; receive_for_elapsed(i64) uses a pre-cached elapsed count
(needed in convergence sweeps where last_time has already advanced).

TimeSlice.time changes from i64 to T. compute_elapsed is now generic
over T and takes Option<&T> for the last-seen time. new_forward_info
uses receive_for_elapsed to preserve the cached elapsed during sweeps.

History<D> becomes History<T, D>; HistoryBuilder<D> becomes
HistoryBuilder<T, D>; Game<D> becomes Game<T, D>. Defaults keep
existing call sites compiling with zero changes: T = i64,
D = ConstantDrift.

add_events / add_events_with_prior stay on impl History<i64, D> since
times: Vec<i64> is i64-specific (Task 8 will generalise this).

In !self.time mode the old i64::MAX sentinel guaranteed elapsed=1 for
every slice transition regardless of time gaps. Replaced by advancing
all previously-seen agents' last_time to Some(current_slice_time) at
the end of each slice; this preserves elapsed=1 between adjacent
slices in sequential-integer untimed mode.

The time: bool field on History and .time(bool) on HistoryBuilder are
NOT removed by this task — deferred to Task 8 so this commit is
purely a type-level generification.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(history): remove time: bool; translate tests to explicit timestamps 33a7d90b89

The bool encoded 'no time axis' which is now expressed at the type
level (T = Untimed). The old !self.time branch generated sequential
i64 timestamps internally (1..=n) and bumped all agents' last_time at
every tick; tests that relied on this now pass those timestamps
explicitly and reflect the correct time=true elapsed semantics.

Collapsed `if self.time { A } else { B }` into the A branch everywhere
in add_events_with_prior. Removed the two !self.time blocks that
updated all agents' last_time at every slice regardless of participation.

sort_time is now generic over `T: Copy + Ord`.

HistoryBuilder::time(bool) removed. History<i64, ConstantDrift>
default remains, producing the same behavior as old .time(true).

The test_env_ttt Gaussian goldens are updated to reflect the correct
time=true semantics (b.elapsed=2 instead of 1 due to b skipping t=2);
this is a correction: the old !self.time last_time bump was an
implementation quirk that diverged from the Python reference.

55 tests pass. clippy clean. fmt clean.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(api): add Outcome enum with Ranked variant 3df422db78

Outcome::winner(i, n), Outcome::draw(n), Outcome::ranking(iter) are
the convenience constructors. Marked #[non_exhaustive] so Scored can
be added in T4 without breaking match exhaustiveness.

Adds smallvec = "1" as a direct dependency.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
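
The three constructors might be sketched as follows. This version uses a plain `Vec<u32>` instead of smallvec to stay dependency-free, and it assumes "lower rank is better" with rank 0 for the winner; neither detail is confirmed by the commit message.

```rust
/// Sketch of the outcome type. `#[non_exhaustive]` forces downstream
/// crates to include a wildcard arm, so new variants can be added later.
#[non_exhaustive]
#[derive(Clone, Debug, PartialEq)]
pub enum Outcome {
    /// One rank per team; assumed convention: lower is better.
    Ranked(Vec<u32>),
}

impl Outcome {
    /// Team `i` of `n` wins; all other teams tie behind it.
    pub fn winner(i: usize, n: usize) -> Self {
        Outcome::Ranked((0..n).map(|t| if t == i { 0 } else { 1 }).collect())
    }

    /// All `n` teams tie.
    pub fn draw(n: usize) -> Self {
        Outcome::Ranked(vec![0; n])
    }

    /// Explicit per-team ranks.
    pub fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Self {
        Outcome::Ranked(ranks.into_iter().collect())
    }
}
```

Note that `#[non_exhaustive]` only affects code outside the defining crate; inside it, a `match` on `Outcome` can still be exhaustive.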

feat(api): add Event<T, K>, Team<K>, Member<K> typed event description f5a486329e

Replaces the old nested Vec<Vec<Vec<_>>> event description on the
public API boundary. Member<K>::from(K) enables ergonomic literal
lists. Member::with_weight / with_prior are builder methods for the
optional per-event overrides.

Fully additive — no existing call sites updated. Consumed by
History::add_events(iter) in Task 15.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

feat(api): add Observer trait and NullObserver default 726896a2ba

Observer replaces verbose: bool with structured progress callbacks:
on_iteration_end, on_batch_processed, on_converged — all no-op
default impls so users override only what they need. NullObserver
is a ZST default.

Send + Sync bounds deferred to T3 (Rayon support).

Fully additive — wired into History::converge in Task 12.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
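
The "no-op defaults, override only what you need" design can be sketched like this. The callback names come from the commit message; their parameters, the `&mut self` receiver, the `StepLog` observer, and the `drive` helper are all hypothetical.

```rust
/// Progress callbacks with no-op defaults (parameter lists are assumed).
pub trait Observer<T> {
    fn on_iteration_end(&mut self, _iteration: usize, _step: f64) {}
    fn on_batch_processed(&mut self, _time: &T) {}
    fn on_converged(&mut self, _iterations: usize) {}
}

/// Zero-sized observer that ignores everything.
pub struct NullObserver;

impl<T> Observer<T> for NullObserver {}

/// Example observer that records only the per-iteration step sizes.
pub struct StepLog {
    pub steps: Vec<f64>,
}

impl<T> Observer<T> for StepLog {
    fn on_iteration_end(&mut self, _iteration: usize, step: f64) {
        self.steps.push(step);
    }
}

/// Hypothetical driver standing in for the convergence loop.
pub fn drive<T, O: Observer<T>>(observer: &mut O, time: &T, steps: &[f64]) {
    for (i, &step) in steps.iter().enumerate() {
        observer.on_iteration_end(i, step);
    }
    observer.on_batch_processed(time);
    observer.on_converged(steps.len());
}
```

Because `NullObserver` is zero-sized and its methods are empty, a monomorphised loop using it should compile down to the same code as a loop with no callbacks at all.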

feat(api): add ConvergenceOptions, ConvergenceReport, History::converge a6e008f8ff

New public types:
- ConvergenceOptions { max_iter, epsilon } — config for the loop
- ConvergenceReport { iterations, final_step, log_evidence, converged,
  per_iteration_time, slices_skipped } — post-hoc summary

History and HistoryBuilder gain a third generic parameter
O: Observer<T> = NullObserver. Builder methods:
- .convergence(opts) sets the ConvergenceOptions
- .observer(o) plugs in an Observer (reshapes the builder's O param)

History::converge() runs the existing iteration loop driven by the
stored opts, emits observer callbacks on each iteration end and on
completion, and returns Result<ConvergenceReport, InferenceError>.

The old convergence(iters, eps, verbose) stays — gets removed in
Task 20 after tests are translated.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

feat(error): expand InferenceError; convert boundary asserts to Result a83c9acacb

InferenceError gains MismatchedShape (user-input length mismatches),
InvalidProbability (p_draw out of [0, 1]), and ConvergenceFailed
(exceeded max_iter without hitting epsilon). NegativePrecision stays.

History::add_events_with_prior and History::add_events now return
Result<(), InferenceError>. The previous assert! macros checking
composition/results/times/weights shape are replaced by matched
error returns.

Internal debug_assert! macros for arithmetic invariants stay; this
change only affects boundary validation of user input.

Tests updated to call .unwrap() on the Result. The old signatures
will be fully replaced in Task 15 (typed add_events(iter)) and the
nested-Vec wrapper removed in Task 20.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
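
Replacing boundary `assert!`s with matched error returns could look like the sketch below. Two of the variant names come from the commit message, but their payloads, the `Display` wording, and the `validate` helper are assumptions.

```rust
use std::fmt;

/// Sketch of two of the error variants; payloads are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum InferenceError {
    MismatchedShape { expected: usize, got: usize },
    InvalidProbability(f64),
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InferenceError::MismatchedShape { expected, got } => {
                write!(f, "expected {} entries, got {}", expected, got)
            }
            InferenceError::InvalidProbability(p) => {
                write!(f, "p_draw {} is outside [0, 1]", p)
            }
        }
    }
}

impl std::error::Error for InferenceError {}

/// Boundary validation of user input: returns an error instead of panicking.
pub fn validate(n_teams: usize, ranks: &[u32], p_draw: f64) -> Result<(), InferenceError> {
    if ranks.len() != n_teams {
        return Err(InferenceError::MismatchedShape { expected: n_teams, got: ranks.len() });
    }
    // `contains` is false for NaN, so NaN is rejected as well.
    if !(0.0..=1.0).contains(&p_draw) {
        return Err(InferenceError::InvalidProbability(p_draw));
    }
    Ok(())
}
```

Callers that previously relied on the panic can keep the old behaviour with `.unwrap()`, as the updated tests do.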

feat(api): add record_winner, record_draw, intern, lookup on History 044fb83a38

Spec Section 4 "three-tier event ingestion" tier 2: one-off match
convenience. Spec open question 3: expose Index + intern/lookup for
power users.

History and HistoryBuilder gain a 4th generic parameter
K: Eq + Hash + Clone = &'static str. The default ensures existing
tests using Index-based add_events compile unchanged.

History internally owns a KeyTable<K>. intern(&Q) creates or returns
an Index for the given key; lookup(&Q) returns Option<Index> without
creating. record_winner and record_draw are thin 1v1 wrappers around
the internal add_events_with_prior.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
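
The intern/lookup split described in this commit can be sketched with a small key table. This is a simplified illustration: it takes `&K` rather than a borrowed `&Q`, and the internal layout (a `HashMap` plus a `Vec` of keys) and the `key` accessor are assumptions, not the crate's `KeyTable`.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for the crate's `Index` newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Index(pub usize);

/// Maps user keys to dense indices and back.
pub struct KeyTable<K> {
    ids: HashMap<K, Index>,
    keys: Vec<K>,
}

impl<K: Eq + Hash + Clone> KeyTable<K> {
    pub fn new() -> Self {
        KeyTable { ids: HashMap::new(), keys: Vec::new() }
    }

    /// Return the existing index for `key`, or assign the next dense one.
    pub fn intern(&mut self, key: &K) -> Index {
        if let Some(&idx) = self.ids.get(key) {
            return idx;
        }
        let idx = Index(self.keys.len());
        self.keys.push(key.clone());
        self.ids.insert(key.clone(), idx);
        idx
    }

    /// Read-only lookup: never creates an entry.
    pub fn lookup(&self, key: &K) -> Option<Index> {
        self.ids.get(key).copied()
    }

    /// Reverse mapping from index to key.
    pub fn key(&self, idx: Index) -> Option<&K> {
        self.keys.get(idx.0)
    }
}
```

Hashing then happens once at the API boundary; everything downstream can work with dense indices.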

feat(api): typed add_events(iter); generify internal path over T 244b94a3e5

Public API gains:

  History::add_events<I: IntoIterator<Item = Event<T, K>>>(events)
      -> Result<(), InferenceError>

which accepts the typed Event<T, K> shape added in Task 10. Ranks
from Outcome::Ranked are mapped to the legacy "higher f64 = better"
results internally.

add_events_with_prior now takes Vec<T> for times (was Vec<i64>),
generifying the whole internal path over T in a single fully-generic
impl<T: Time, D: Drift<T>, O: Observer<T>, K> block. The i64-specific
block is gone; record_winner/record_draw are now generic over T.

add_events_with_prior stays pub (not pub(crate)) because the ATP
example calls it directly with pre-built Index-based composition;
the new typed add_events is the primary public API going forward.

In-crate tests updated to call add_events_with_prior with an empty
HashMap. tests/api_shape.rs added with 3 integration tests covering
bulk ingest, draw, and mismatched-outcome error.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(api): add fluent history.event(t).team(...).commit() builder ec8b7e538c

Third tier of the ingestion API (spec Section 4). Powers one-off
events with irregular shapes where neither record_winner (too
simple) nor typed add_events (too verbose) fits cleanly.

EventBuilder accumulates teams, weights, and outcome. Supports:
- .team([keys]) — add a team
- .weights([w..]) — per-member weights on the most-recently-added team
- .ranking([ranks]) — explicit per-team ranks
- .winner(i) — convenience: team i wins, others tied
- .draw() — all teams tied
- .commit() — finalize into an Event<T, K> and delegate to add_events

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(api): add current_skill / learning_curve / log_evidence / predict_* e62568bf3e

New public query methods on History:

- current_skill(&K) -> Option<Gaussian>: latest posterior for a key
- learning_curve(&K) -> Vec<(T, Gaussian)>: single-key history
- learning_curves() -> HashMap<K, Vec<(T, Gaussian)>>: all-keys history
- log_evidence() -> f64: total log-evidence (was log_evidence(false,&[]))
- log_evidence_for(&[&K]) -> f64: subset log-evidence
- predict_quality(&[&[&K]]) -> f64: draw-probability match quality
- predict_outcome(&[&[&K]]) -> Vec<f64>: 2-team win probabilities

learning_curves() changed from returning HashMap<Index, Vec<(i64, Gaussian)>>
to HashMap<K, Vec<(T, Gaussian)>>. A new learning_curves_by_index()
helper preserves the old Index-keyed shape for callers that ingest via
the pub(crate) Index path.

log_evidence(false, &[]) was renamed to log_evidence_internal and made
pub(crate); the new zero-arg log_evidence() wraps it.

predict_outcome is T2 2-team-only; N-team deferred to T4.

KeyTable::get no longer requires ToOwned<Owned = K> (only needed for
get_or_create), allowing query methods to use simpler bounds.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(api): promote Factor/Schedule/VarStore to pub in factors module fe6f028127

Exposes the factor-graph machinery so power users can define custom
factors and schedules (see Game::custom in the next task). The
internal factor/ and schedule/ modules remain unchanged (still
referenced by Game's internals via crate::factor); the user-facing
public API goes through the new factors module re-exports:

  pub use crate::factor::{BuiltinFactor, Factor, VarId, VarStore};
  pub use crate::factor::rank_diff::RankDiffFactor;
  pub use crate::factor::team_sum::TeamSumFactor;
  pub use crate::factor::trunc::TruncFactor;
  pub use crate::schedule::{EpsilonOrMax, Schedule, ScheduleReport};

#[allow(dead_code)] guards on the previously-pub(crate) items are
removed because the types are now referenced via the re-exports.

Promotes the VarStore methods (new, len, alloc, get, set, clear) to pub
and adds is_empty to satisfy clippy. The marginals field stays private as
an implementation detail; users go through the public methods.
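
A rough picture of that public surface, assuming a Vec-backed store: the method names come from this commit message, but every signature, the VarId newtype layout, and the (pi, tau) Gaussian stand-in are guesses made for illustration.

```rust
// Stand-in for the crate's Gaussian in natural parameters (assumed layout).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Gaussian {
    pub pi: f64,
    pub tau: f64,
}

// Opaque handle to one variable in the store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VarId(usize);

// Flat storage for variable marginals; the field itself stays private.
#[derive(Debug, Default)]
pub struct VarStore {
    marginals: Vec<Gaussian>,
}

impl VarStore {
    pub fn new() -> Self {
        Self { marginals: Vec::new() }
    }

    pub fn len(&self) -> usize {
        self.marginals.len()
    }

    // Companion to len, as clippy expects for public types.
    pub fn is_empty(&self) -> bool {
        self.marginals.is_empty()
    }

    // Reserve a slot initialized to `init` and return its handle.
    pub fn alloc(&mut self, init: Gaussian) -> VarId {
        self.marginals.push(init);
        VarId(self.marginals.len() - 1)
    }

    pub fn get(&self, id: VarId) -> Gaussian {
        self.marginals[id.0]
    }

    pub fn set(&mut self, id: VarId, g: Gaussian) {
        self.marginals[id.0] = g;
    }

    // Drop all variables but keep the allocation for reuse.
    pub fn clear(&mut self) {
        self.marginals.clear();
    }
}
```

Keeping the field private leaves room to change the backing layout later without breaking users who hold only VarId handles.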

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

feat(api): add Game::ranked, one_v_one, free_for_all, custom constructors e8c9d4ed29

Public Game API now returns Result<_, InferenceError> on invalid input
(p_draw out of range, outcome rank count mismatches team count).
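
The two boundary checks named above might be expressed along these lines. The variant names, the half-open range [0, 1) for p_draw, and the free-standing validate function are assumptions for this sketch; the commit message does not spell out the real InferenceError variants.

```rust
use std::fmt;

// Hypothetical error variants; the crate's InferenceError may be shaped differently.
#[derive(Debug, Clone, PartialEq)]
enum InferenceError {
    InvalidDrawProbability(f64),
    RankCountMismatch { teams: usize, ranks: usize },
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidDrawProbability(p) => {
                write!(f, "p_draw {p} is outside [0, 1)")
            }
            Self::RankCountMismatch { teams, ranks } => {
                write!(f, "{ranks} ranks given for {teams} teams")
            }
        }
    }
}

impl std::error::Error for InferenceError {}

// Only the field needed for validation is modelled here.
struct GameOptions {
    p_draw: f64,
}

// Reject bad input at the API boundary instead of panicking deeper inside.
fn validate(teams: usize, ranks: &[usize], opts: &GameOptions) -> Result<(), InferenceError> {
    // NaN also fails the range check because it compares false with everything.
    if !(0.0..1.0).contains(&opts.p_draw) {
        return Err(InferenceError::InvalidDrawProbability(opts.p_draw));
    }
    if ranks.len() != teams {
        return Err(InferenceError::RankCountMismatch { teams, ranks: ranks.len() });
    }
    Ok(())
}
```

A public constructor would run a check like this before building the game, so callers see an Err value rather than a panic.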

New types:
- GameOptions { p_draw, convergence } — bundled config
- OwnedGame<T, D> — owned variant of Game that carries its result
  and weights internally (no borrow of History's slices). Returned
  by public constructors to avoid leaking internal borrow lifetimes.

The internal Game::new is renamed Game::ranked_with_arena (pub(crate))
and keeps the borrowing-arena signature for History's hot path. All
in-crate callers updated (21 call sites: 18 in game.rs tests, 2 in
time_slice.rs, 1 in history.rs).

Game::custom is a minimal T2 escape hatch for power users that exposes
the raw factor and schedule plumbing. It is #[doc(hidden)] for now; full
ergonomics are deferred to T4.

Game::log_evidence() accessor added on both Game and OwnedGame (was
previously accessible only through the pub(crate) evidence field).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: translate in-crate tests to new T2 API; delete legacy methods a6aaa93fd0

Every #[cfg(test)] mod tests in src/history.rs now uses the new public
API: add_events(iter) / converge() / learning_curve() / current_skill()
/ log_evidence(). No golden value changed.

Legacy methods removed:
- History::convergence(iters, eps, verbose) → use converge()
- History::learning_curves_by_index() → use learning_curve() / learning_curves()
- HistoryBuilder::gamma(f64) → use .drift(ConstantDrift(g))
- add_events_with_prior downgraded from pub to pub(crate)

Added:
- History::builder_with_key() for custom key types (used by atp example)
- tests/equivalence.rs: Game-level golden integration tests

examples/atp.rs rewritten in new API (Event<i64, String>, converge(),
learning_curve(), drift(ConstantDrift(...))).

Bench Batch::iteration: 21.4 µs (T1 reference: 22.88 µs).

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bench,docs: capture T2 final numbers and update CHANGELOG f18013d036

Batch::iteration: 21.36 µs (T1 was 22.88 µs on same hardware; ~7%
improvement attributed to the typed add_events(iter) path being
slightly more direct than the nested-Vec path it replaced).

Gaussian operations unchanged vs T1.

Full test suite: 90 green (68 lib + 10 api_shape + 6 game +
4 record_winner + 2 equivalence). No golden value changed across
the entire T2 tier.

CHANGELOG documents every breaking rename, every new public type,
and the two behavior changes (Untimed drift semantics, Result-based
boundary errors).

Closes T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

logaritmisk merged commit d2aab82c1e into main

2026-04-24 11:20:04 +00:00

logaritmisk deleted branch t2-new-api-surface

2026-04-24 11:20:04 +00:00

logaritmisk referenced this issue from a commit

2026-04-24 11:20:06 +00:00

T0 + T1 + T2: engine redesign through new API surface (#1)

Reference: logaritmisk/trueskill-tt#1