diff --git a/CHANGELOG.md b/CHANGELOG.md
index e5136db..4f73828 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,149 +2,13 @@
 
 All notable changes to this project will be documented in this file.
 
-## Unreleased — T3 concurrency
+## 0.1.1 - 2026-04-27
 
-Adds rayon-backed parallel paths per Section 6 of
-`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`.
+### Other (unconventional)
 
-### Breaking
-
-- `Send + Sync` bounds added to public traits: `Time`, `Drift`,
-  `Observer`, `Factor`, `Schedule`. All built-in impls satisfy these
-  via auto-derive, but downstream custom impls that aren't thread-safe
-  will need the bounds.
-
-### New
-
-- Opt-in `rayon` cargo feature. When enabled:
-  - Within-slice event iteration runs color-group events in parallel
-    via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
-  - `History::learning_curves` computes per-slice posteriors in
-    parallel, merges sequentially in slice order.
-  - `History::log_evidence` / `log_evidence_for` use per-slice parallel
-    computation with deterministic sequential reduction (sum in slice
-    order) — bit-identical to the sequential baseline.
-- `ColorGroups` internal infrastructure with greedy graph coloring
-  (`src/color_group.rs`). Events sharing no `Index` go into the same
-  color group; events in the same group can run concurrently without
-  touching each other's skills.
-- `tests/determinism.rs` asserts bit-identical posteriors across
-  `RAYON_NUM_THREADS={1, 2, 4, 8}`.
-- `benches/history_converge.rs` measures end-to-end convergence on
-  three workload shapes.
-
-### Performance notes
-
-- Default build (no rayon): `Batch::iteration` 23.23 µs — no regression
-  vs T2.
-- With `--features rayon`:
-  - 500 events / 100 competitors / 10 per slice: 1.0× speedup.
-  - 2000 events / 200 competitors / 20 per slice: 1.0× speedup.
-  - 5000 events in one slice / 50k competitors: **1.3× speedup.**
-- The spec targeted >2× speedup on 8-core offline converge. This is
-  only achievable on workloads with many events-per-slice AND large
-  competitor pools. **Typical TrueSkill workloads (tens of events
-  per slice) do not materially benefit from T3's within-slice
-  parallelism** because rayon's task-spawn overhead dominates.
-- Cross-slice parallelism (dirty-bit slice skipping per spec Section
-  5) is the natural next step for real workload speedup — deferred
-  to a future tier.
-
-### Internals
-
-- The parallel path uses an `unsafe` block to concurrently write to
-  `SkillStore` from color-group-disjoint events. Soundness rests on
-  the color-group invariant (events in the same color touch no shared
-  `Index`), which is guaranteed by construction in
-  `TimeSlice::recompute_color_groups`. Sequential path unchanged.
-- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to
-  sequential iteration inside the parallel `sweep_color_groups` to
-  avoid rayon's task-spawn overhead.
-- Thread-local `ScratchArena` per rayon worker thread.
-
-## Unreleased — T2 new API surface
-
-Breaking: every renamed type and the new public API land together per
-`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`
-Section 7 "T2".
-
-### Breaking renames
-
-- `Batch` → `TimeSlice`
-- `Player` → `Rating` (and the `.player` field on `Competitor` is now `.rating`)
-- `Agent` → `Competitor`
-- `IndexMap` → `KeyTable`
-- `History` field `.batches` → `.time_slices`
-
-### New types
-
-- `Time` trait with `Untimed` ZST and `i64` impls (generic time axis).
-- `Drift` — generified from the old `Drift` trait.
-- `Event`, `Team`, `Member` — typed bulk-ingest event shape.
-- `Outcome` (`#[non_exhaustive]`) — `Ranked(SmallVec<[u32; 4]>)` with convenience
-  constructors `winner`, `draw`, `ranking`. `Scored` lands in T4.
-- `Observer` trait + `NullObserver` ZST — structured progress callbacks.
-- `ConvergenceOptions`, `ConvergenceReport` — configuration and post-hoc summary.
-- `GameOptions`, `OwnedGame` — ergonomic Game constructors without lifetime
-  gymnastics.
-- `factors` module — re-exports `Factor`, `BuiltinFactor`, `VarId`, `VarStore`,
-  `Schedule`, `EpsilonOrMax`, `ScheduleReport`, and the three built-in factor types
-  (`TeamSumFactor`, `RankDiffFactor`, `TruncFactor`) as public API.
-
-### New `History` API
-
-- Three-tier ingestion:
-  - Tier 1 (bulk): `add_events>>(events) -> Result`
-  - Tier 2 (one-off): `record_winner(&K, &K, T)`, `record_draw(&K, &K, T)`
-  - Tier 3 (fluent): `event(T).team([...]).weights([...]).ranking([...]).commit()`
-- `converge() -> Result` — replaces
-  `convergence(iters, eps, verbose)`.
-- `current_skill(&K)`, `learning_curve(&K)`, `learning_curves()` (now keyed on `K`).
-- `log_evidence()` zero-arg, `log_evidence_for(&[&K])`.
-- `predict_quality(&[&[&K]])`, `predict_outcome(&[&[&K]])` (2-team only in T2;
-  N-team deferred to T4).
-- `intern(&Q)` / `lookup(&Q)` expose the internal `KeyTable` for power users.
-- `History` is now fully generic with defaults
-  ``.
-
-### New `Game` API
-
-- `Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result`.
-- `Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>`.
-- `Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result`.
-- `Game::custom(...)` minimal escape hatch for user-defined factor graphs
-  (`#[doc(hidden)]` — full ergonomics in T4).
-- `Game::log_evidence()` and `OwnedGame::log_evidence()` accessors.
-
-### Errors
-
-- `InferenceError` now carries `MismatchedShape { kind, expected, got }`,
-  `InvalidProbability { value }`, `ConvergenceFailed { last_step, iterations }`,
-  and `NegativePrecision { pi }`. Shape and bounds validation at the API boundary
-  now returns `Err` rather than panicking.
-
-### Removed (breaking)
-
-- `History::convergence(iters, eps, verbose)` — use `converge()`.
-- `HistoryBuilder::gamma(f64)` — use `.drift(ConstantDrift(g))`.
-- `HistoryBuilder::time(bool)` and `History.time: bool` — use the `Time` type parameter.
-- The nested-`Vec>>` public `add_events` signature —
-  use typed `add_events(iter)`.
-- `learning_curves_by_index()` — use `learning_curves()`.
-
-### Performance
-
-`Batch::iteration` bench: **21.36 µs** (T1 was 22.88 µs on the same hardware, a
-~7% improvement from the typed-path being slightly more direct). Gaussian
-operations unchanged.
-
-### Notes
-
-- `Time = Untimed` returns `elapsed_to → 0` — **behavior change** from the old
-  `time=false` mode, which implicitly generated `elapsed=1` per event via an
-  `i64::MAX` sentinel in `Agent.last_time`. Tests that relied on the old
-  `time=false` semantics now use `History::` with explicit
-  `1..=n` timestamps.
+- T0 + T1 + T2: engine redesign through new API surface (#1)
+- T3: rayon-backed concurrency (opt-in) (#2)
+- T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence
 
 ## 0.1.0 - 2026-04-23
 
@@ -156,6 +20,8 @@ operations unchanged.
 
 - chore: added cliff.toml, release.toml and rustfmt.toml
 - chore: clean up
+- chore: make cargo release add CHANGELOG.md before commit
+- chore: do not publish
 
 ### Other (unconventional)
 
diff --git a/release.toml b/release.toml
index 2af34d1..e32cc02 100644
--- a/release.toml
+++ b/release.toml
@@ -1,2 +1,2 @@
 publish = false
-pre-release-hook = ["sh", "-c", "git cliff -o ../CHANGELOG.md --tag {{version}} && git add CHANGELOG.md"]
+pre-release-hook = ["sh", "-c", "git cliff -o CHANGELOG.md --tag {{version}} && git add CHANGELOG.md"]