Batch::iteration sequential: 23.23 µs (no regression vs T2 baseline).
Gaussian ops unchanged.
End-to-end history_converge benchmark on Apple M5 Pro:
Workload seq rayon speedup
500 events / 100 competitors / 10 per slice 4.03 ms 4.24 ms 1.0x
2000 events / 200 competitors / 20 per slice 20.18 ms 19.82 ms 1.0x
5000 events / 50000 competitors / 1 slice 11.88 ms 9.10 ms 1.3x
The spec's >=2x target is not achieved on realistic workloads. T3's
within-slice color-group parallelism only shows material benefit when
a slice holds many events AND the competitor pool is large enough to
give the greedy coloring room to partition. Typical TrueSkill
workloads don't fit that profile. Cross-slice parallelism (dirty-bit
slice skipping, spec Section 5) is the natural next step for
real-workload speedup.
Determinism verified: bit-identical posteriors across
RAYON_NUM_THREADS={1, 2, 4, 8}.
Closes T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
235 lines
8.8 KiB
Markdown
235 lines
8.8 KiB
Markdown
# Changelog
|
||
|
||
All notable changes to this project will be documented in this file.
|
||
|
||
## Unreleased — T3 concurrency
|
||
|
||
Adds rayon-backed parallel paths per Section 6 of
|
||
`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`.
|
||
|
||
### Breaking
|
||
|
||
- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`,
|
||
`Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these
|
||
via auto-derive, but downstream custom impls that aren't thread-safe
|
||
will need the bounds.
|
||
|
||
### New
|
||
|
||
- Opt-in `rayon` cargo feature. When enabled:
|
||
- Within-slice event iteration runs color-group events in parallel
|
||
via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
|
||
- `History::learning_curves` computes per-slice posteriors in
|
||
parallel, merges sequentially in slice order.
|
||
- `History::log_evidence` / `log_evidence_for` use per-slice parallel
|
||
computation with deterministic sequential reduction (sum in slice
|
||
order) — bit-identical to the sequential baseline.
|
||
- `ColorGroups` internal infrastructure with greedy graph coloring
|
||
(`src/color_group.rs`). Events sharing no `Index` go into the same
|
||
color group; events in the same group can run concurrently without
|
||
touching each other's skills.
|
||
- `tests/determinism.rs` asserts bit-identical posteriors across
|
||
`RAYON_NUM_THREADS={1, 2, 4, 8}`.
|
||
- `benches/history_converge.rs` measures end-to-end convergence on
|
||
three workload shapes.
|
||
|
||
### Performance notes
|
||
|
||
- Default build (no rayon): `Batch::iteration` 23.23 µs — no regression
|
||
vs T2.
|
||
- With `--features rayon`:
|
||
- 500 events / 100 competitors / 10 per slice: 1.0× speedup.
|
||
- 2000 events / 200 competitors / 20 per slice: 1.0× speedup.
|
||
- 5000 events in one slice / 50k competitors: **1.3× speedup.**
|
||
- The spec targeted >2× speedup on 8-core offline converge. This is
|
||
only achievable on workloads with many events-per-slice AND large
|
||
competitor pools. **Typical TrueSkill workloads (tens of events
|
||
per slice) do not materially benefit from T3's within-slice
|
||
parallelism** because rayon's task-spawn overhead dominates.
|
||
- Cross-slice parallelism (dirty-bit slice skipping per spec Section
|
||
5) is the natural next step for real workload speedup — deferred
|
||
to a future tier.
|
||
|
||
### Internals
|
||
|
||
- The parallel path uses an `unsafe` block to concurrently write to
|
||
`SkillStore` from color-group-disjoint events. Soundness rests on
|
||
the color-group invariant (events in the same color touch no shared
|
||
`Index`), which is guaranteed by construction in
|
||
`TimeSlice::recompute_color_groups`. Sequential path unchanged.
|
||
- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to
|
||
sequential iteration inside the parallel `sweep_color_groups` to
|
||
avoid rayon's task-spawn overhead.
|
||
- Thread-local `ScratchArena` per rayon worker thread.
|
||
|
||
## Unreleased — T2 new API surface
|
||
|
||
Breaking: every renamed type and the new public API land together per
|
||
`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`
|
||
Section 7 "T2".
|
||
|
||
### Breaking renames
|
||
|
||
- `Batch` → `TimeSlice`
|
||
- `Player` → `Rating` (and the `.player` field on `Competitor` is now `.rating`)
|
||
- `Agent` → `Competitor`
|
||
- `IndexMap` → `KeyTable`
|
||
- `History` field `.batches` → `.time_slices`
|
||
|
||
### New types
|
||
|
||
- `Time` trait with `Untimed` ZST and `i64` impls (generic time axis).
|
||
- `Drift<T: Time>` — generified from the old `Drift` trait.
|
||
- `Event<T, K>`, `Team<K>`, `Member<K>` — typed bulk-ingest event shape.
|
||
- `Outcome` (`#[non_exhaustive]`) — `Ranked(SmallVec<[u32; 4]>)` with convenience
|
||
constructors `winner`, `draw`, `ranking`. `Scored` lands in T4.
|
||
- `Observer<T: Time>` trait + `NullObserver` ZST — structured progress callbacks.
|
||
- `ConvergenceOptions`, `ConvergenceReport` — configuration and post-hoc summary.
|
||
- `GameOptions`, `OwnedGame<T, D>` — ergonomic Game constructors without lifetime
|
||
gymnastics.
|
||
- `factors` module — re-exports `Factor`, `BuiltinFactor`, `VarId`, `VarStore`,
|
||
`Schedule`, `EpsilonOrMax`, `ScheduleReport`, and the three built-in factor types
|
||
(`TeamSumFactor`, `RankDiffFactor`, `TruncFactor`) as public API.
|
||
|
||
### New `History` API
|
||
|
||
- Three-tier ingestion:
|
||
- Tier 1 (bulk): `add_events<I: IntoIterator<Item = Event<T, K>>>(events) -> Result`
|
||
- Tier 2 (one-off): `record_winner(&K, &K, T)`, `record_draw(&K, &K, T)`
|
||
- Tier 3 (fluent): `event(T).team([...]).weights([...]).ranking([...]).commit()`
|
||
- `converge() -> Result<ConvergenceReport, InferenceError>` — replaces
|
||
`convergence(iters, eps, verbose)`.
|
||
- `current_skill(&K)`, `learning_curve(&K)`, `learning_curves()` (now keyed on `K`).
|
||
- `log_evidence()` zero-arg, `log_evidence_for(&[&K])`.
|
||
- `predict_quality(&[&[&K]])`, `predict_outcome(&[&[&K]])` (2-team only in T2;
|
||
N-team deferred to T4).
|
||
- `intern(&Q)` / `lookup(&Q)` expose the internal `KeyTable<K>` for power users.
|
||
- `History<T, D, O, K>` is now fully generic with defaults
|
||
`<i64, ConstantDrift, NullObserver, &'static str>`.
|
||
|
||
### New `Game` API
|
||
|
||
- `Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
|
||
- `Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>`.
|
||
- `Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
|
||
- `Game::custom(...)` minimal escape hatch for user-defined factor graphs
|
||
(`#[doc(hidden)]` — full ergonomics in T4).
|
||
- `Game::log_evidence()` and `OwnedGame::log_evidence()` accessors.
|
||
|
||
### Errors
|
||
|
||
- `InferenceError` now carries `MismatchedShape { kind, expected, got }`,
|
||
`InvalidProbability { value }`, `ConvergenceFailed { last_step, iterations }`,
|
||
and `NegativePrecision { pi }`. Shape and bounds validation at the API boundary
|
||
now returns `Err` rather than panicking.
|
||
|
||
### Removed (breaking)
|
||
|
||
- `History::convergence(iters, eps, verbose)` — use `converge()`.
|
||
- `HistoryBuilder::gamma(f64)` — use `.drift(ConstantDrift(g))`.
|
||
- `HistoryBuilder::time(bool)` and `History.time: bool` — use the `Time` type parameter.
|
||
- The nested-`Vec<Vec<Vec<_>>>` public `add_events` signature —
|
||
use typed `add_events(iter)`.
|
||
- `learning_curves_by_index()` — use `learning_curves()`.
|
||
|
||
### Performance
|
||
|
||
`Batch::iteration` bench: **21.36 µs** (T1 was 22.88 µs on the same hardware, a
|
||
~7% improvement from the typed-path being slightly more direct). Gaussian
|
||
operations unchanged.
|
||
|
||
### Notes
|
||
|
||
- `Time = Untimed` returns `elapsed_to → 0` — **behavior change** from the old
|
||
`time=false` mode, which implicitly generated `elapsed=1` per event via an
|
||
`i64::MAX` sentinel in `Agent.last_time`. Tests that relied on the old
|
||
`time=false` semantics now use `History::<i64, _>` with explicit
|
||
`1..=n` timestamps.
|
||
|
||
## 0.1.0 - 2026-04-23
|
||
|
||
### Features
|
||
|
||
- feat: added a Drift trait and a "default" ConstantDrift implementation
|
||
|
||
### Miscellaneous Tasks
|
||
|
||
- chore: added cliff.toml, release.toml and rustfmt.toml
|
||
- chore: clean up
|
||
|
||
### Other (unconventional)
|
||
|
||
- Initial commit.
|
||
- Begin working on batch.
|
||
- Passing tests for Batch
|
||
- Working on History struct. First test is passing.
|
||
- More test passing for History
|
||
- Added more functions to History
|
||
- Remove Display impl, better to use Debug
|
||
- Use flatten instead of flat_map
|
||
- Handle case where there is no time
|
||
- It works, or so it seems
|
||
- Use PlayerIndex instead of String
|
||
- Inline a lot of functions
|
||
- Refactor some code
|
||
- Refactor some stuff
|
||
- Port from julia version instead
|
||
- More things, better things, awesome
|
||
- More tests, more code
|
||
- More things, more tests
|
||
- Fix tests
|
||
- More tests
|
||
- More tests
|
||
- Added builder for History, and start migrating test to use builder instead.
|
||
- Update test to use builder
|
||
- Remove unused code
|
||
- Use and Index struct instead of str and String for player id
|
||
- Update example so now it works, and thats, well, good
|
||
- Update test to use assert_ulps_eq
|
||
- Fixed test
|
||
- Change time to use i64 instead of u64
|
||
- Small change
|
||
- Clean up example
|
||
- Update crates and added methods to get a key or all keys in an IndexMap
|
||
- Added a get function to IndexMap
|
||
- Agents doens't have to be behind a mutable reference in within_prior
|
||
- Agents doens't have to be behind a mutable reference in within_priors
|
||
- Refactor so we can see if there is any way to improve the performance
|
||
- Fix clippy warning
|
||
- More refactoring
|
||
- Remove warnings and refactor some code
|
||
- Added benchmark for Batch
|
||
- Added default implementation for TeamMessage
|
||
- Remove unused mut reference
|
||
- Make it more rusty
|
||
- More rustifying
|
||
- Small refactor
|
||
- Rename d to diff, and t to team
|
||
- Added more links to readme
|
||
- Fix broken link in README
|
||
- Update crates
|
||
- Clean up
|
||
- Dry my eyes
|
||
- Remove unnecessary allocations
|
||
- Fix clippy warning
|
||
- Refactor history
|
||
- Rename variables
|
||
- Move stuff around
|
||
- Added quality function
|
||
- Make quality a free standing function instead
|
||
- Improve performance
|
||
- Change assert to debug_assert
|
||
- Added todo to readme, and documentation for quality function
|
||
- Basic test for quality
|
||
- Ignore temp folder
|
||
- Update edition
|
||
- Small changes for new 2024 edition
|
||
- remove notepad
|
||
- added benchmark
|
||
|
||
### Styling
|
||
|
||
- style: cargo fmt
|
||
|
||
<!-- generated by git-cliff -->
|