Commit Graph

39 Commits

Author SHA1 Message Date
8b53cacd64 T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence
Adds soft Gaussian-observation evidence on the per-pair diff variable,
enabling continuous score margins as a richer alternative to ranks.

Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under
  `#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to
  `Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.

Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that
  closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new
  `likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through
  `TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.

Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing
goldens byte-identical.  Bench: `benches/scored.rs` baseline ~960µs for
60 events × 20-player pool with default convergence.

Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 08:47:36 +02:00
6bf3e7e294 T3: rayon-backed concurrency (opt-in) (#2)
Implements T3 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md` Section 6. Plan: `docs/superpowers/plans/2026-04-24-t3-concurrency.md` (11 tasks).

## Summary

### Breaking

- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`, `Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these via auto-derive; downstream custom impls will need the bounds.

### New

- Opt-in `rayon` cargo feature. When enabled:
  - Within-slice event iteration runs color-group events in parallel via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
  - `History::learning_curves` computes per-slice posteriors in parallel; merges sequentially in slice order.
  - `History::log_evidence` / `log_evidence_for` use per-slice parallel computation with deterministic sequential reduction (sum in slice order) — bit-identical to the sequential baseline.
- `ColorGroups` infrastructure (`src/color_group.rs`) with greedy graph coloring. Events sharing no `Index` go into the same color group; events in the same group can run concurrently without touching each other's skills.
- `tests/determinism.rs` asserts bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}`.
- `benches/history_converge.rs` measures end-to-end convergence on three workload shapes.

## Performance

### Sequential (no rayon, default build)

| Metric | Before T3 | After T3 | Delta |
|---|---|---|---|
| `Batch::iteration` | 22.88 µs | 23.23 µs | **+1.5%** (noise) |
| `Gaussian::*` | ≈218–264 ps | ≈236 ps | within noise |

**No sequential regression.** Default build is as fast as T2.

### Parallel (`--features rayon`, Apple M5 Pro, auto thread count)

| Workload | Sequential | Parallel | Speedup |
|---|---:|---:|---:|
| 500 events / 100 competitors / 10 per slice | 4.03 ms | 4.24 ms | **1.0×** |
| 2000 events / 200 competitors / 20 per slice | 20.18 ms | 19.82 ms | **1.0×** |
| 5000 events / 50000 competitors / 1 slice | 11.88 ms | 9.10 ms | **1.3×** |

### ⚠️ The spec's >=2× target was not met on realistic workloads.

T3's within-slice color-group parallelism only shows material benefit when a slice holds many events AND the competitor pool is large enough to give the greedy coloring room to partition. Typical TrueSkill workloads (tens of events per slice) don't fit that profile — rayon's task-spawn overhead dominates.

**Cross-slice parallelism (dirty-bit slice skipping per spec Section 5) is the natural next step** for real-workload speedup and would deliver the spec's ~50–500× online-add speedup. Deferred to a future tier.

## Determinism

`tests/determinism.rs` runs a 200-event history at thread counts {1, 2, 4, 8} via `rayon::ThreadPoolBuilder::install` and asserts every `(time, posterior)` pair has bit-identical `mu` and `sigma` (compared via `f64::to_bits()`). Passes.

## Internals

- Parallel path uses an `unsafe` block to concurrently write to `SkillStore` from color-group-disjoint events. Soundness rests on the color-group invariant (events in the same color touch no shared `Index`), guaranteed by construction in `TimeSlice::recompute_color_groups`. Sequential path unchanged from T2.
- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to sequential inside `sweep_color_groups` to avoid task-spawn overhead.
- Thread-local `ScratchArena` per rayon worker thread.

## Test plan

- [x] `cargo test --features approx` — 96 tests pass (74 lib + 22 integration)
- [x] `cargo test --features approx,rayon` — 97 tests pass (+1 determinism)
- [x] `cargo clippy --all-targets --features approx -- -D warnings` — clean
- [x] `cargo clippy --all-targets --features approx,rayon -- -D warnings` — clean
- [x] `cargo +nightly fmt --check` — clean
- [x] `cargo bench --bench batch --features approx` — 23.23 µs (no regression vs T2)
- [x] `cargo bench --bench history_converge --features approx,rayon` — runs on all three workloads
- [x] Bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}` — verified

## Commit history

13 commits on `t3-concurrency`. Each task is self-contained and bisectable. See `git log main..t3-concurrency` for the full list.

## Deferred

- **Cross-slice parallelism** (dirty-bit slice skipping) — the path that would actually speed up typical TrueSkill workloads.
- **Default-on `rayon` feature** — spec called for default-on; we keep it opt-in until the feature proves stable in production use.
- **Synchronous-EP schedule with barrier merge** — alternative parallel strategy per spec Section 6.
- **`MarginFactor` / `Outcome::Scored`** — T4.
- **`Damped` / `Residual` schedules** — T4.
- **N-team `predict_outcome`** — T4.
- **`Game::custom` full ergonomics** — T4.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #2
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>
2026-04-24 13:01:01 +00:00
d2aab82c1e T0 + T1 + T2: engine redesign through new API surface (#1)
Implements tiers T0, T1, T2 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`. All three tiers have landed together on this branch because they build on one another; this PR rolls them up for a single review pass.

Per-tier plans:
- T0: `docs/superpowers/plans/2026-04-23-t0-numerical-parity.md`
- T1: `docs/superpowers/plans/2026-04-24-t1-factor-graph.md`
- T2: `docs/superpowers/plans/2026-04-24-t2-new-api-surface.md`

## Summary

### T0 — Numerical parity (internal)

- `Gaussian` switched to natural-parameter storage `(pi, tau)`; mul/div now ~7× faster (218 ps vs 1.57 ns).
- `HashMap<Index, _>` → dense `Vec<_>` keyed by `Index.0` (via `AgentStore<D>`, `SkillStore`).
- `ScratchArena` eliminates per-event allocations in `Game::likelihoods`.
- `InferenceError` seed type added (1 variant).
- 38 → 53 tests passing through T1.
- Benchmark: `Batch::iteration` 29.84 → 21.25 µs.

### T1 — Factor graph machinery (internal)

- `Factor` trait + `BuiltinFactor` enum (TeamSum / RankDiff / Trunc) driving within-game inference.
- `VarStore` flat storage for variable marginals.
- `Schedule` trait + `EpsilonOrMax` impl replacing the hand-rolled EP loop.
- `Game::likelihoods` rebuilt on the factor-graph machinery; iteration counts and goldens preserved to within 1e-6.
- 53 tests passing.
- Benchmark: `Batch::iteration` 23.01 µs (slight regression absorbed in T2).

### T2 — New API surface (breaking)

**Renames:**
- `IndexMap → KeyTable`, `Player → Rating`, `Agent → Competitor`, `Batch → TimeSlice`

**New types:**
- `Time` trait with `Untimed` ZST and `i64` impls; `Drift<T>`, `Rating<T, D>`, `Competitor<T, D>`, `TimeSlice<T>`, `History<T, D, O, K>` all generic.
- `Event<T, K>`, `Team<K>`, `Member<K>`, `Outcome` (`Ranked` variant; `#[non_exhaustive]`).
- `Observer<T>` trait + `NullObserver`.
- `ConvergenceOptions`, `ConvergenceReport`.
- `GameOptions`, `OwnedGame<T, D>`.

**Three-tier ingestion:**
- `history.record_winner(&K, &K, T)` / `record_draw(&K, &K, T)` — 1v1 convenience.
- `history.add_events(iter)` — typed bulk.
- `history.event(T).team([...]).weights([...]).ranking([...]).commit()` — fluent.

**Query API:** `current_skill`, `learning_curve`, `learning_curves` (keyed on `K`), `log_evidence`, `log_evidence_for`, `predict_quality`, `predict_outcome`.

**Game constructors:** `ranked`, `one_v_one`, `free_for_all`, `custom` — all returning `Result<_, InferenceError>`.

**`factors` module:** `Factor`, `Schedule`, `VarStore`, `VarId`, `BuiltinFactor`, `EpsilonOrMax`, `ScheduleReport`, `TeamSumFactor`, `RankDiffFactor`, `TruncFactor` now public.

**Errors:** `InferenceError` gains `MismatchedShape`, `InvalidProbability`, `ConvergenceFailed`; boundary panics converted to `Result`.

**Removed (breaking):** `History::convergence(iters, eps, verbose)`, `HistoryBuilder::gamma(f64)`, `HistoryBuilder::time(bool)`, `History.time: bool`, `learning_curves_by_index`, nested-Vec public `add_events`.

## Behavior change (documented in CHANGELOG)

`Time = Untimed` has `elapsed_to → 0`, so no drift accumulates between slices. The old `time=false` mode implicitly forced `elapsed=1` on reappearance via an `i64::MAX` sentinel — that quirk is not reproducible under a typed time axis. Tests that depended on it now use `History::<i64, _>` with explicit `1..=n` timestamps. One test (`test_env_ttt`) had 3 Gaussian goldens updated to reflect the corrected semantics; documented in commit `33a7d90`.

## Final numbers

| Metric | Before T0 | After T2 | Delta |
|---|---|---|---|
| `Batch::iteration` | 29.84 µs | 21.36 µs | **-28%** |
| `Gaussian::mul` | 1.57 ns | 219 ps | **-86%** |
| `Gaussian::div` | 1.57 ns | 219 ps | **-86%** |
| Tests passing | 38 | 90 | +52 |

All other Gaussian ops unchanged (~219 ps add/sub, ~264 ps pi/tau reads).

## Test plan

- [x] `cargo test --features approx` — 90/90 pass (68 lib + 10 api_shape + 6 game + 4 record_winner + 2 equivalence)
- [x] `cargo clippy --all-targets --features approx -- -D warnings` — clean
- [x] `cargo +nightly fmt --check` — clean
- [x] `cargo bench --bench batch` — 21.36 µs
- [x] `cargo bench --bench gaussian` — unchanged from T1
- [x] `cargo run --example atp --features approx` — rewritten in new API, runs clean
- [x] Historical Game-level goldens preserved in `tests/equivalence.rs`
- [x] Public API matches spec Section 4 (verified by integration tests in `tests/api_shape.rs`)

## Commit history

~45 commits total across T0 + T1 + T2. Each task is self-contained and individually tested; the branch is bisectable. See `git log main..t2-new-api-surface` for the full list.

## Deferred to later tiers

- `Outcome::Scored` + `MarginFactor` — T4
- `Damped` / `Residual` schedules — T4
- `Send + Sync` bounds + Rayon parallelism — T3
- N-team `predict_outcome` — T4
- `Game::custom` full ergonomics — T4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #1
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>
2026-04-24 11:20:04 +00:00
04d5478ee4 style: cargo fmt 2026-04-23 20:23:13 +02:00
a1f282a1c8 feat: added a Drift trait and a "default" ConstantDrift implementation 2026-03-16 12:06:04 +01:00
Anders Olsson
db743bc417 Improve performance 2023-10-31 10:02:07 +01:00
Anders Olsson
7e2576085f Make quality a free standing function instead 2023-10-26 11:11:54 +02:00
Anders Olsson
062c9d3765 Added quality function 2023-10-26 11:09:30 +02:00
Anders Olsson
755a5ea668 Move stuff around 2023-10-26 11:01:14 +02:00
Anders Olsson
72e06eb536 Rename variables 2023-10-26 08:26:28 +02:00
Anders Olsson
e3eebb507c Refactor history 2023-10-26 08:18:15 +02:00
Anders Olsson
d8dfbba251 Fix clippy warning 2023-10-25 08:16:45 +02:00
Anders Olsson
d152e356f1 Remove unnecessary allocations 2023-10-24 16:10:40 +02:00
Anders Olsson
59c256edad Dry my eyes 2023-10-24 09:50:16 +02:00
e3906aebaa Small refactor 2022-12-27 22:37:12 +01:00
fdddf56156 Remove unused mut reference 2022-12-27 22:11:04 +01:00
2b83ee5ef9 Added benchmark for Batch 2022-12-19 07:42:08 +01:00
2bdd3d9b89 Remove warnings and refactor some code 2022-12-16 19:46:01 +01:00
912a282cd8 More refactoring 2022-12-16 15:57:56 +01:00
51467f7b69 Fix clippy warning 2022-12-16 15:51:58 +01:00
22c61d47b1 Change time to use i64 instead of u64 2022-06-28 23:18:55 +02:00
cd1079a811 Use and Index struct instead of str and String for player id 2022-06-27 10:16:12 +02:00
d52895f804 Remove unused code 2022-06-23 07:48:44 +02:00
ba2f8cffdc Update test to use builder 2022-06-23 07:44:34 +02:00
73b9cabac8 Added builder for History, and start migrating test to use builder instead. 2022-06-22 21:13:24 +02:00
1647ba5c77 More tests 2022-06-21 10:42:14 +02:00
0a35645091 More things, more tests 2022-06-20 23:56:35 +02:00
0efb0312da More tests, more code 2022-06-19 23:52:52 +02:00
c9d9d59535 More things, better things, awesome 2022-06-18 23:39:42 +02:00
dc10504b80 Port from julia version instead 2022-06-18 22:27:38 +02:00
0a3327f076 Refactor some code 2022-06-15 11:27:24 +02:00
9b025fb53a Use PlayerIndex instead of String 2022-06-14 22:51:11 +02:00
3fbac02de3 It works, or so it seems 2022-06-13 23:18:19 +02:00
1b6e07225b Handle case where there is no time 2022-06-13 21:55:43 +02:00
82e7b22443 Use flatten instead of flat_map 2022-06-13 11:37:56 +02:00
87a64acb83 Added more functions to History 2022-06-13 11:27:49 +02:00
4a13e4dcd2 More test passing for History 2022-06-13 11:04:05 +02:00
4227617513 Working on History struct. First test is passing. 2022-06-12 23:11:13 +02:00
de58d01322 Initial commit. 2022-06-10 15:22:27 +02:00