trueskill-tt

Author	SHA1	Message	Date
logaritmisk	b46e7f068d	feat(outcome): per-event score_sigma override on Outcome::Scored Outcome::Scored shape changes from tuple to struct: { scores, sigma: Option<f64> }. New constructor scores_with_sigma sets sigma=Some(s) and debug-asserts s > 0.0; existing scores(I) constructor keeps its signature and builds with sigma=None internally. team_count, as_scores, as_ranks accessor pattern matches updated. History::add_events resolves sigma.unwrap_or(self.score_sigma) at the ingest arm, so downstream EventKind::Scored stays a plain f64 and TimeSlice / run_chain need zero changes. Breaking change to the public Outcome::Scored variant shape (acceptable in 0.1.x). Bit-equal for callers using the no-override path because the resolution falls through to self.score_sigma exactly as before.	2026-05-08 21:27:09 +02:00
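The shape change described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the crate's actual code: the `Ranked` payload type, the generic bounds on the constructors, and the free function `resolve_sigma` are assumptions introduced here. Only the struct-variant shape, the two constructor names, the `debug_assert`, and the `unwrap_or` fall-through come from the message.

```rust
/// Sketch of the outcome type after the change (payload types are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Ranked(Vec<usize>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    /// Existing constructor: same signature as before, no per-event override.
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored { scores: scores.into_iter().collect(), sigma: None }
    }

    /// New constructor: carries a per-event sigma and checks it in debug builds.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(scores: I, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score sigma must be positive");
        Self::Scored { scores: scores.into_iter().collect(), sigma: Some(sigma) }
    }
}

/// Hypothetical helper mirroring the ingest-time resolution:
/// the per-event override wins, otherwise the history-wide default applies.
pub fn resolve_sigma(outcome: &Outcome, default_sigma: f64) -> Option<f64> {
    match outcome {
        Outcome::Scored { sigma, .. } => Some(sigma.unwrap_or(default_sigma)),
        Outcome::Ranked(_) => None,
    }
}
```

Because the override is resolved to a plain `f64` at ingest, everything downstream can stay unaware that an `Option` ever existed, which is why the no-override path remains bit-equal.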
logaritmisk	68be7ab5b7	test(history): end-to-end ConvergenceOptions propagation tests

Two integration tests on a 4-team ranked event:
- max_iter=1 set on HistoryBuilder produces measurably different posteriors than default, proving the inner loop honors the propagated max_iter
- alpha=0.5 with extra iterations reaches the same fixed point as alpha=1.0, proving damping doesn't break correctness on the History path

Also updates the alpha doc comment to clarify it applies only to the within-game EP loop, not the outer cross-history sweep.	2026-05-08 15:34:58 +02:00
logaritmisk	824b7f50b0	feat(time_slice): inference callsites read self.convergence The three Game::*_with_arena callsites in time_slice.rs (in TimeSlice::iteration's sequential branch, TimeSlice::log_evidence's run_event closure, and Event::iteration_direct via parameter) now use the propagated ConvergenceOptions instead of hardcoded ::default(). sweep_color_groups (both rayon and non-rayon paths) forwards self.convergence into Event::iteration_direct. Damped EP (alpha < 1.0) and custom max_iter / epsilon set on HistoryBuilder::convergence(opts) now actually reach the within-game inference loop. Bit-equal for users on default options. Removes the temporary #[allow(dead_code)] on TimeSlice::convergence that was added in the prior commit.	2026-05-08 15:32:25 +02:00
logaritmisk	872f91797d	refactor(time_slice): add convergence field, rename iterate_to_convergence TimeSlice<T> gains a pub(crate) convergence: ConvergenceOptions field set at construction. TimeSlice::new now takes it as a third parameter (breaking change to the pub constructor, acceptable in 0.1.x). History::add_events_with_prior passes self.convergence so the propagated value reaches every TimeSlice. The pre-existing convergence-the-method is renamed to iterate_to_convergence to disambiguate from the new convergence-the-field. The field is wired but not yet read by inference -- the three Game::*_with_arena callsites in time_slice.rs still hardcode ConvergenceOptions::default(). Task 2 changes that. Bit-equal because the propagated value equals the hardcoded value end-to-end. Also updated benches/batch.rs which has a fourth TimeSlice::new callsite (not enumerated in the plan -- only src/ files were).	2026-05-08 15:29:39 +02:00
logaritmisk	dbce69f350	test(game): integration tests for ConvergenceOptions behavior

Two end-to-end tests on a 4-team ranked game:
- max_iter=1 produces measurably different posteriors than the default, proving run_chain reads convergence.max_iter
- alpha=0.5 with extra iterations reaches the same fixed point as alpha=1.0, proving damping doesn't break convergence on benign graphs	2026-05-08 15:13:23 +02:00
logaritmisk	0705986929	feat(game): plumb ConvergenceOptions through to run_chain Game and OwnedGame gain a convergence: ConvergenceOptions field set at construction. Game::{ranked,scored} forward options.convergence into OwnedGame::{new,new_scored} (previously dropped on the floor). {ranked,scored}_with_arena take it as a parameter. run_chain reads self.convergence.{epsilon, max_iter, alpha} instead of hardcoded 1e-6 / 10 / undamped. DiffFactor::propagate gains an alpha parameter and dispatches into Trunc/MarginFactor::propagate_with_alpha. In-tree callsites in src/time_slice.rs and src/history.rs pass ConvergenceOptions::default(). Pre-existing T2 fallout in tests, benches, and the atp example (struct literals missing the new alpha field) is fixed by adding alpha: 1.0 so the workspace builds clean. Default alpha is 1.0, so all 96 lib + 27 integration test goldens remain bit-equal.	2026-05-08 15:10:35 +02:00
logaritmisk	aacaa60baa	feat(factor): add MarginFactor::propagate_with_alpha for EP damping Mirrors TruncFactor: inherent damped-propagate method, trait impl delegates with α=1.0. Existing goldens unchanged because cavity*new_msg equals the previous marginal write when α=1.0.	2026-05-08 15:03:45 +02:00
logaritmisk	fcfe0ffe37	feat(factor): add TruncFactor::propagate_with_alpha for EP damping Inherent method that applies α-damping to the outgoing message via Gaussian::damp_natural. The Factor trait impl delegates with α=1.0, preserving today's behavior bit-equal. Variable write switched from `trunc` to `cavity * damped` — algebraically identical when α=1.0 (cavity * new_msg = trunc by construction); reflects partial-update math when α<1.0.	2026-05-08 15:02:09 +02:00
logaritmisk	0fa4e7d277	feat(convergence): add ConvergenceOptions::alpha damping field Adds an EP damping coefficient defaulting to 1.0 (undamped). Will be read by run_chain in a follow-up commit. By itself this commit changes no behavior — existing constructors using ..Default::default() pick up the new field automatically.	2026-05-08 15:00:34 +02:00
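To illustrate why adding the field changes no behavior for existing callers, here is a small sketch of the struct-update pattern the message refers to. The field types are assumptions, and the `epsilon` / `max_iter` defaults are taken from the values the log elsewhere describes as previously hardcoded in `run_chain` (1e-6 and 10); `single_pass` is a hypothetical helper for demonstration only.

```rust
/// Sketch of the options struct (field types are assumed).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ConvergenceOptions {
    pub epsilon: f64,
    pub max_iter: usize,
    /// EP damping coefficient; 1.0 means undamped.
    pub alpha: f64,
}

impl Default for ConvergenceOptions {
    fn default() -> Self {
        // Assumed defaults: the formerly hardcoded 1e-6 / 10, plus alpha = 1.0.
        Self { epsilon: 1e-6, max_iter: 10, alpha: 1.0 }
    }
}

/// Hypothetical caller written before `alpha` existed: struct-update syntax
/// fills in any field it does not name, so it keeps compiling unchanged.
pub fn single_pass() -> ConvergenceOptions {
    ConvergenceOptions { max_iter: 1, ..Default::default() }
}
```

Callers that spell out every field in a struct literal do not get this for free, which matches the later note about adding `alpha: 1.0` to tests, benches, and the example.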
logaritmisk	0dd7dab266	feat(gaussian): add damp_natural helper for EP damping Computes α·new + (1−α)·self in natural-parameter space. Will be used by TruncFactor and MarginFactor to support opt-in EP damping via ConvergenceOptions::alpha.	2026-05-08 14:59:18 +02:00
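The damping rule in this commit, α·new + (1−α)·self in natural-parameter space, might look like the sketch below. The struct layout and the exact signature of `damp_natural` are assumptions; only the formula and the `(pi, tau)` storage are stated in the log.

```rust
/// Gaussian in natural parameters: pi = 1/sigma^2 (precision),
/// tau = mu/sigma^2 (precision-adjusted mean).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gaussian {
    pub pi: f64,
    pub tau: f64,
}

impl Gaussian {
    /// Convex blend of `self` (the old message) and `new`, applied
    /// component-wise to (pi, tau). alpha = 1.0 yields `new` unchanged,
    /// alpha = 0.0 keeps `self`. (Signature is assumed.)
    pub fn damp_natural(&self, new: &Gaussian, alpha: f64) -> Gaussian {
        Gaussian {
            pi: alpha * new.pi + (1.0 - alpha) * self.pi,
            tau: alpha * new.tau + (1.0 - alpha) * self.tau,
        }
    }
}
```

With `alpha = 1.0` the blend reduces to the new message exactly, which is consistent with the bit-equal claims made for the undamped default.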
logaritmisk	f6a83e4dc6	refactor: make BuiltinFactor::log_evidence match exhaustive Replace the `_ => 0.0` wildcard with explicit `Self::TeamSum(_) | Self::RankDiff(_) => 0.0`. No behavioral change; future variants now produce a compile error instead of being silently absorbed by the wildcard.	2026-05-08 14:37:13 +02:00
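A reduced sketch of the exhaustive-match pattern follows. The payload structs and the `cached_log_evidence` field are placeholders invented for this example; the variant names and the `TeamSum | RankDiff => 0.0` arm are from the commit message.

```rust
// Placeholder payloads; the real factor types carry more state.
pub struct TeamSum;
pub struct RankDiff;
pub struct Trunc {
    pub cached_log_evidence: f64, // hypothetical field
}

pub enum BuiltinFactor {
    TeamSum(TeamSum),
    RankDiff(RankDiff),
    Trunc(Trunc),
}

impl BuiltinFactor {
    pub fn log_evidence(&self) -> f64 {
        match self {
            Self::Trunc(t) => t.cached_log_evidence,
            // Listed explicitly instead of `_ => 0.0`: adding a new variant
            // to the enum now fails to compile until it is handled here.
            Self::TeamSum(_) | Self::RankDiff(_) => 0.0,
        }
    }
}
```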
logaritmisk	68b589b965	refactor: dedupe Game::likelihoods and likelihoods_scored via run_chain Both methods were 95-line near-duplicates differing only in the closure that builds the per-diff DiffFactor. Extract the shared body as a private run_chain<F>(&self, arena, make_link) helper that returns (evidence, likelihoods); the two callers shrink to ~10 lines each. Pure code-shape change: posteriors and evidence remain bit-equal; all existing tests (lib + integration) pass unchanged.	2026-05-08 14:36:35 +02:00
logaritmisk and Claude Opus 4.7	8b53cacd64	T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence

Adds soft Gaussian-observation evidence on the per-pair diff variable, enabling continuous score margins as a richer alternative to ranks.

Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under `#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to `Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.

Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new `likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through `TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.

Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing goldens byte-identical.
Bench: `benches/scored.rs` baseline ~960µs for 60 events × 20-player pool with default convergence.
Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 08:47:36 +02:00
logaritmisk	6bf3e7e294	T3: rayon-backed concurrency (opt-in) (#2)

Implements T3 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md` Section 6. Plan: `docs/superpowers/plans/2026-04-24-t3-concurrency.md` (11 tasks).

## Summary

### Breaking

- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`, `Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these via auto-derive; downstream custom impls will need the bounds.

### New

- Opt-in `rayon` cargo feature. When enabled:
  - Within-slice event iteration runs color-group events in parallel via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
  - `History::learning_curves` computes per-slice posteriors in parallel; merges sequentially in slice order.
  - `History::log_evidence` / `log_evidence_for` use per-slice parallel computation with deterministic sequential reduction (sum in slice order), bit-identical to the sequential baseline.
- `ColorGroups` infrastructure (`src/color_group.rs`) with greedy graph coloring. Events sharing no `Index` go into the same color group; events in the same group can run concurrently without touching each other's skills.
- `tests/determinism.rs` asserts bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}`.
- `benches/history_converge.rs` measures end-to-end convergence on three workload shapes.

## Performance

### Sequential (no rayon, default build)

| Metric | Before T3 | After T3 | Delta |
|---|---|---|---|
| `Batch::iteration` | 22.88 µs | 23.23 µs | +1.5% (noise) |
| `Gaussian::` | ≈218–264 ps | ≈236 ps | within noise |

No sequential regression. Default build is as fast as T2.

### Parallel (`--features rayon`, Apple M5 Pro, auto thread count)

| Workload | Sequential | Parallel | Speedup |
|---|---:|---:|---:|
| 500 events / 100 competitors / 10 per slice | 4.03 ms | 4.24 ms | 1.0× |
| 2000 events / 200 competitors / 20 per slice | 20.18 ms | 19.82 ms | 1.0× |
| 5000 events / 50000 competitors / 1 slice | 11.88 ms | 9.10 ms | 1.3× |

### ⚠️ The spec's >=2× target was not met on realistic workloads.

T3's within-slice color-group parallelism only shows material benefit when a slice holds many events AND the competitor pool is large enough to give the greedy coloring room to partition. Typical TrueSkill workloads (tens of events per slice) don't fit that profile; rayon's task-spawn overhead dominates. Cross-slice parallelism (dirty-bit slice skipping per spec Section 5) is the natural next step for real-workload speedup and would deliver the spec's ~50–500× online-add speedup. Deferred to a future tier.

## Determinism

`tests/determinism.rs` runs a 200-event history at thread counts {1, 2, 4, 8} via `rayon::ThreadPoolBuilder::install` and asserts every `(time, posterior)` pair has bit-identical `mu` and `sigma` (compared via `f64::to_bits()`). Passes.

## Internals

- Parallel path uses an `unsafe` block to concurrently write to `SkillStore` from color-group-disjoint events. Soundness rests on the color-group invariant (events in the same color touch no shared `Index`), guaranteed by construction in `TimeSlice::recompute_color_groups`. Sequential path unchanged from T2.
- `RAYON_THRESHOLD = 64`: color groups smaller than this fall back to sequential inside `sweep_color_groups` to avoid task-spawn overhead.
- Thread-local `ScratchArena` per rayon worker thread.

## Test plan

- [x] `cargo test --features approx`: 96 tests pass (74 lib + 22 integration)
- [x] `cargo test --features approx,rayon`: 97 tests pass (+1 determinism)
- [x] `cargo clippy --all-targets --features approx -- -D warnings`: clean
- [x] `cargo clippy --all-targets --features approx,rayon -- -D warnings`: clean
- [x] `cargo +nightly fmt --check`: clean
- [x] `cargo bench --bench batch --features approx`: 23.23 µs (no regression vs T2)
- [x] `cargo bench --bench history_converge --features approx,rayon`: runs on all three workloads
- [x] Bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}`: verified

## Commit history

13 commits on `t3-concurrency`. Each task is self-contained and bisectable. See `git log main..t3-concurrency` for the full list.

## Deferred

- Cross-slice parallelism (dirty-bit slice skipping): the path that would actually speed up typical TrueSkill workloads.
- Default-on `rayon` feature: spec called for default-on; we keep it opt-in until the feature proves stable in production use.
- Synchronous-EP schedule with barrier merge: alternative parallel strategy per spec Section 6.
- `MarginFactor` / `Outcome::Scored`: T4.
- `Damped` / `Residual` schedules: T4.
- N-team `predict_outcome`: T4.
- `Game::custom` full ergonomics: T4.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #2
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>	2026-04-24 13:01:01 +00:00
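The color-group idea in the T3 message (events sharing no `Index` land in the same group) can be illustrated with a first-fit greedy coloring. This is a sketch under assumptions, not the contents of `src/color_group.rs`: events are modeled as plain lists of `usize` competitor indices, and `greedy_color` is a hypothetical name.

```rust
use std::collections::HashSet;

/// First-fit greedy coloring. Each event is a list of competitor indices.
/// An event joins the lowest-numbered group that shares no index with it;
/// if no such group exists, a new group is opened.
/// Returns the color assigned to each event, in input order.
pub fn greedy_color(events: &[Vec<usize>]) -> Vec<usize> {
    // For each color, the set of indices already touched by that group.
    let mut group_indices: Vec<HashSet<usize>> = Vec::new();
    let mut colors = Vec::with_capacity(events.len());

    for event in events {
        let slot = group_indices
            .iter()
            .position(|used| event.iter().all(|i| !used.contains(i)));
        let color = match slot {
            Some(c) => c,
            None => {
                group_indices.push(HashSet::new());
                group_indices.len() - 1
            }
        };
        group_indices[color].extend(event.iter().copied());
        colors.push(color);
    }
    colors
}
```

For the events `[0,1]`, `[2,3]`, `[1,2]`, `[4,5]`, this assigns colors `[0, 0, 1, 0]`: only the third event conflicts with the first group. It also hints at why small competitor pools parallelize poorly, since most events collide and each group stays small.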
logaritmisk	d2aab82c1e	T0 + T1 + T2: engine redesign through new API surface (#1)

Implements tiers T0, T1, T2 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`. All three tiers have landed together on this branch because they build on one another; this PR rolls them up for a single review pass.

Per-tier plans:
- T0: `docs/superpowers/plans/2026-04-23-t0-numerical-parity.md`
- T1: `docs/superpowers/plans/2026-04-24-t1-factor-graph.md`
- T2: `docs/superpowers/plans/2026-04-24-t2-new-api-surface.md`

## Summary

### T0: Numerical parity (internal)

- `Gaussian` switched to natural-parameter storage `(pi, tau)`; mul/div now ~7× faster (218 ps vs 1.57 ns).
- `HashMap<Index, _>` → dense `Vec<_>` keyed by `Index.0` (via `AgentStore<D>`, `SkillStore`).
- `ScratchArena` eliminates per-event allocations in `Game::likelihoods`.
- `InferenceError` seed type added (1 variant).
- 38 → 53 tests passing through T1.
- Benchmark: `Batch::iteration` 29.84 → 21.25 µs.

### T1: Factor graph machinery (internal)

- `Factor` trait + `BuiltinFactor` enum (TeamSum / RankDiff / Trunc) driving within-game inference.
- `VarStore` flat storage for variable marginals.
- `Schedule` trait + `EpsilonOrMax` impl replacing the hand-rolled EP loop.
- `Game::likelihoods` rebuilt on the factor-graph machinery; iteration counts and goldens preserved to within 1e-6.
- 53 tests passing.
- Benchmark: `Batch::iteration` 23.01 µs (slight regression absorbed in T2).

### T2: New API surface (breaking)

Renames:
- `IndexMap → KeyTable`, `Player → Rating`, `Agent → Competitor`, `Batch → TimeSlice`

New types:
- `Time` trait with `Untimed` ZST and `i64` impls; `Drift<T>`, `Rating<T, D>`, `Competitor<T, D>`, `TimeSlice<T>`, `History<T, D, O, K>` all generic.
- `Event<T, K>`, `Team<K>`, `Member<K>`, `Outcome` (`Ranked` variant; `#[non_exhaustive]`).
- `Observer<T>` trait + `NullObserver`.
- `ConvergenceOptions`, `ConvergenceReport`.
- `GameOptions`, `OwnedGame<T, D>`.

Three-tier ingestion:
- `history.record_winner(&K, &K, T)` / `record_draw(&K, &K, T)`: 1v1 convenience.
- `history.add_events(iter)`: typed bulk.
- `history.event(T).team([...]).weights([...]).ranking([...]).commit()`: fluent.

Query API: `current_skill`, `learning_curve`, `learning_curves` (keyed on `K`), `log_evidence`, `log_evidence_for`, `predict_quality`, `predict_outcome`.

Game constructors: `ranked`, `one_v_one`, `free_for_all`, `custom`, all returning `Result<_, InferenceError>`.

`factors` module: `Factor`, `Schedule`, `VarStore`, `VarId`, `BuiltinFactor`, `EpsilonOrMax`, `ScheduleReport`, `TeamSumFactor`, `RankDiffFactor`, `TruncFactor` now public.

Errors: `InferenceError` gains `MismatchedShape`, `InvalidProbability`, `ConvergenceFailed`; boundary panics converted to `Result`.

Removed (breaking): `History::convergence(iters, eps, verbose)`, `HistoryBuilder::gamma(f64)`, `HistoryBuilder::time(bool)`, `History.time: bool`, `learning_curves_by_index`, nested-Vec public `add_events`.

## Behavior change (documented in CHANGELOG)

`Time = Untimed` has `elapsed_to → 0`, so no drift accumulates between slices. The old `time=false` mode implicitly forced `elapsed=1` on reappearance via an `i64::MAX` sentinel; that quirk is not reproducible under a typed time axis. Tests that depended on it now use `History::<i64, _>` with explicit `1..=n` timestamps. One test (`test_env_ttt`) had 3 Gaussian goldens updated to reflect the corrected semantics; documented in commit `33a7d90`.

## Final numbers

| Metric | Before T0 | After T2 | Delta |
|---|---|---|---|
| `Batch::iteration` | 29.84 µs | 21.36 µs | -28% |
| `Gaussian::mul` | 1.57 ns | 219 ps | -86% |
| `Gaussian::div` | 1.57 ns | 219 ps | -86% |
| Tests passing | 38 | 90 | +52 |

All other Gaussian ops unchanged (~219 ps add/sub, ~264 ps pi/tau reads).

## Test plan

- [x] `cargo test --features approx`: 90/90 pass (68 lib + 10 api_shape + 6 game + 4 record_winner + 2 equivalence)
- [x] `cargo clippy --all-targets --features approx -- -D warnings`: clean
- [x] `cargo +nightly fmt --check`: clean
- [x] `cargo bench --bench batch`: 21.36 µs
- [x] `cargo bench --bench gaussian`: unchanged from T1
- [x] `cargo run --example atp --features approx`: rewritten in new API, runs clean
- [x] Historical Game-level goldens preserved in `tests/equivalence.rs`
- [x] Public API matches spec Section 4 (verified by integration tests in `tests/api_shape.rs`)

## Commit history

~45 commits total across T0 + T1 + T2. Each task is self-contained and individually tested; the branch is bisectable. See `git log main..t2-new-api-surface` for the full list.

## Deferred to later tiers

- `Outcome::Scored` + `MarginFactor`: T4
- `Damped` / `Residual` schedules: T4
- `Send + Sync` bounds + Rayon parallelism: T3
- N-team `predict_outcome`: T4
- `Game::custom` full ergonomics: T4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #1
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>	2026-04-24 11:20:04 +00:00
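The speedup attributed to natural-parameter storage is easier to see in code: with `(pi, tau)` stored directly, multiplying or dividing two Gaussian densities (up to normalization) is just component-wise addition or subtraction, with no divisions or square roots. The sketch below assumes a two-field struct and operator impls; it is not the crate's `Gaussian` type.

```rust
use std::ops::{Div, Mul};

/// Gaussian in natural parameters: pi = 1/sigma^2, tau = mu/sigma^2.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gaussian {
    pub pi: f64,
    pub tau: f64,
}

impl Gaussian {
    pub fn from_mu_sigma(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Self { pi, tau: mu * pi }
    }

    /// Mean and standard deviation are derived on demand.
    pub fn mu(self) -> f64 {
        self.tau / self.pi
    }

    pub fn sigma(self) -> f64 {
        1.0 / self.pi.sqrt()
    }
}

impl Mul for Gaussian {
    type Output = Gaussian;
    /// Product of densities: precisions and precision-adjusted means add.
    fn mul(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

impl Div for Gaussian {
    type Output = Gaussian;
    /// Quotient of densities: the inverse of `mul`, so parameters subtract.
    fn div(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}
```

The trade-off is that reading `mu` or `sigma` now costs a division (and a square root for `sigma`), which is a reasonable exchange when message passing multiplies and divides far more often than it reports means.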
logaritmisk	04d5478ee4	style: cargo fmt	2026-04-23 20:23:13 +02:00
logaritmisk	a1f282a1c8	feat: added a Drift trait and a "default" ConstantDrift implementation	2026-03-16 12:06:04 +01:00
logaritmisk	853f177fa8	Small changes for new 2024 edition	2025-02-21 14:09:58 +01:00
logaritmisk	2366c45f6a	Basic test for quality	2024-04-03 10:25:10 +02:00
logaritmisk	3a22b20a17	Added todo to readme, and documentation for quality function	2024-04-03 09:53:07 +02:00
logaritmisk	02ae2f0977	Change assert to debug_assert	2024-04-03 09:44:41 +02:00
Anders Olsson	db743bc417	Improve performance	2023-10-31 10:02:07 +01:00
Anders Olsson	7e2576085f	Make quality a free standing function instead	2023-10-26 11:11:54 +02:00
Anders Olsson	062c9d3765	Added quality function	2023-10-26 11:09:30 +02:00
Anders Olsson	755a5ea668	Move stuff around	2023-10-26 11:01:14 +02:00
Anders Olsson	72e06eb536	Rename variables	2023-10-26 08:26:28 +02:00
Anders Olsson	e3eebb507c	Refactor history	2023-10-26 08:18:15 +02:00
Anders Olsson	d8dfbba251	Fix clippy warning	2023-10-25 08:16:45 +02:00
Anders Olsson	d152e356f1	Remove unnecessary allocations	2023-10-24 16:10:40 +02:00
Anders Olsson	59c256edad	Dry my eyes	2023-10-24 09:50:16 +02:00
Anders Olsson	efa235be59	Clean up	2023-10-24 09:44:42 +02:00
logaritmisk	05f178641c	Rename d to diff, and t to team	2022-12-29 20:38:36 +01:00
logaritmisk	e3906aebaa	Small refactor	2022-12-27 22:37:12 +01:00
logaritmisk	8e25826f91	More rustifying	2022-12-27 22:30:20 +01:00
logaritmisk	9b6cb9e7eb	Make it more rusty	2022-12-27 22:24:01 +01:00
logaritmisk	fdddf56156	Remove unused mut reference	2022-12-27 22:11:04 +01:00
logaritmisk	b93194f762	Added default implementation for TeamMessage	2022-12-20 11:08:27 +01:00
logaritmisk	2b83ee5ef9	Added benchmark for Batch	2022-12-19 07:42:08 +01:00
logaritmisk	2bdd3d9b89	Remove warnings and refactor some code	2022-12-16 19:46:01 +01:00
logaritmisk	912a282cd8	More refactoring	2022-12-16 15:57:56 +01:00
logaritmisk	51467f7b69	Fix clippy warning	2022-12-16 15:51:58 +01:00
logaritmisk	6dd84f7fd2	Refactor so we can see if there is any way to improve the performance	2022-12-16 15:38:29 +01:00
logaritmisk	5eb8e62d6e	Agents don't have to be behind a mutable reference in within_priors	2022-12-15 20:13:25 +01:00
logaritmisk	8bea1b5399	Agents don't have to be behind a mutable reference in within_prior	2022-12-15 20:12:55 +01:00
logaritmisk	6546cb54b5	Added a get function to IndexMap	2022-12-12 13:25:07 +01:00
logaritmisk	13e6454d3d	Update crates and added methods to get a key or all keys in an IndexMap	2022-12-12 10:50:59 +01:00
logaritmisk	22c61d47b1	Change time to use i64 instead of u64	2022-06-28 23:18:55 +02:00
logaritmisk	6125a81696	Fixed test	2022-06-27 11:49:47 +02:00
logaritmisk	2467d7e027	Update test to use assert_ulps_eq	2022-06-27 11:46:59 +02:00
logaritmisk	cd1079a811	Use an Index struct instead of str and String for player id	2022-06-27 10:16:12 +02:00
