trueskill-tt/CHANGELOG.md
Anders Olsson db633bdafe bench,docs: capture T3 final numbers and update CHANGELOG
Batch::iteration sequential: 23.23 µs (no regression vs T2 baseline).
Gaussian ops unchanged.

End-to-end history_converge benchmark on Apple M5 Pro:
  Workload                                        seq       rayon    speedup
  500 events / 100 competitors / 10 per slice     4.03 ms   4.24 ms  1.0x
  2000 events / 200 competitors / 20 per slice   20.18 ms  19.82 ms  1.0x
  5000 events / 50000 competitors / 1 slice      11.88 ms   9.10 ms  1.3x

The spec's >=2x target is not achieved on realistic workloads. T3's
within-slice color-group parallelism only shows material benefit when
a slice holds many events AND the competitor pool is large enough to
give the greedy coloring room to partition. Typical TrueSkill
workloads don't fit that profile. Cross-slice parallelism (dirty-bit
slice skipping, spec Section 5) is the natural next step for
real-workload speedup.

Determinism verified: bit-identical posteriors across
RAYON_NUM_THREADS={1, 2, 4, 8}.

Closes T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:58:24 +02:00


Changelog

All notable changes to this project will be documented in this file.

Unreleased — T3 concurrency

Adds rayon-backed parallel paths per Section 6 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Breaking

  • Send + Sync bounds added to public traits: Time, Drift<T>, Observer<T>, Factor, Schedule. All built-in impls satisfy these automatically (Send and Sync are auto traits), but downstream custom impls that aren't thread-safe will no longer compile until they satisfy the bounds.
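
As a rough illustration of what the new bounds ask of downstream code, the sketch below uses a hypothetical stand-in for the Drift trait (the method name `variance_delta` and the `CountingDrift` type are assumptions, not the crate's API). A custom impl that kept its counter in `Rc<Cell<u64>>` would be rejected by the `Send + Sync` supertrait bound; moving the shared state to `Arc<AtomicU64>` is one way to satisfy it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the crate's Drift trait; the real method set may differ.
trait Drift: Send + Sync {
    fn variance_delta(&self, elapsed: i64) -> f64;
}

// A custom drift that counts how often it is consulted.
// Arc<AtomicU64> is Send + Sync; Rc<Cell<u64>> would not compile here.
struct CountingDrift {
    gamma: f64,
    calls: Arc<AtomicU64>,
}

impl Drift for CountingDrift {
    fn variance_delta(&self, elapsed: i64) -> f64 {
        self.calls.fetch_add(1, Ordering::Relaxed);
        (elapsed as f64) * self.gamma * self.gamma
    }
}

// Calls the drift from several scoped threads and sums the results in spawn order.
fn run_on_threads(drift: &dyn Drift, threads: usize) -> f64 {
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|i| s.spawn(move || drift.variance_delta(i as i64)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    })
}

fn main() {
    let calls = Arc::new(AtomicU64::new(0));
    let drift = CountingDrift { gamma: 0.5, calls: Arc::clone(&calls) };
    let total = run_on_threads(&drift, 4);
    println!("total = {total}, calls = {}", calls.load(Ordering::Relaxed));
}
```

Because the bound sits on the trait itself, `&dyn Drift` can be shared across threads without any extra annotation at the call site.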

New

  • Opt-in rayon cargo feature. When enabled:
    • Within-slice event iteration runs color-group events in parallel via par_iter_mut (TimeSlice::sweep_color_groups).
    • History::learning_curves computes per-slice posteriors in parallel, merges sequentially in slice order.
    • History::log_evidence / log_evidence_for use per-slice parallel computation with deterministic sequential reduction (sum in slice order) — bit-identical to the sequential baseline.
  • ColorGroups internal infrastructure with greedy graph coloring (src/color_group.rs). Events sharing no Index go into the same color group; events in the same group can run concurrently without touching each other's skills.
  • tests/determinism.rs asserts bit-identical posteriors across RAYON_NUM_THREADS={1, 2, 4, 8}.
  • benches/history_converge.rs measures end-to-end convergence on three workload shapes.
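
The color-group idea can be sketched as a first-fit greedy pass. This is a minimal sketch under assumptions, not the code in src/color_group.rs: events are modeled as plain lists of competitor indices, and `greedy_color` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Assigns each event to the first color group that shares no competitor with it.
/// `events[i]` lists the competitor indices touched by event `i`.
/// Returns the event ids in each group.
fn greedy_color(events: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    // used[g] holds every competitor index already claimed by group g.
    let mut used: Vec<HashSet<usize>> = Vec::new();

    for (event_id, competitors) in events.iter().enumerate() {
        // First group whose claimed indices are disjoint from this event.
        let slot = used
            .iter()
            .position(|seen| competitors.iter().all(|c| !seen.contains(c)));
        let g = match slot {
            Some(g) => g,
            None => {
                // No compatible group exists, so open a new color.
                groups.push(Vec::new());
                used.push(HashSet::new());
                groups.len() - 1
            }
        };
        groups[g].push(event_id);
        used[g].extend(competitors.iter().copied());
    }
    groups
}

fn main() {
    let events = vec![vec![0, 1], vec![2, 3], vec![1, 2], vec![0, 3]];
    // Events 0 and 1 are disjoint; events 2 and 3 each conflict with group 0.
    println!("{:?}", greedy_color(&events));
}
```

With a small competitor pool most events conflict and the groups stay small, which is consistent with the performance notes: parallelism only pays off when the pool is large enough for the coloring to produce large groups.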

Performance notes

  • Default build (no rayon): Batch::iteration 23.23 µs — no regression vs T2.
  • With --features rayon:
    • 500 events / 100 competitors / 10 per slice: 1.0× speedup.
    • 2000 events / 200 competitors / 20 per slice: 1.0× speedup.
    • 5000 events in one slice / 50k competitors: 1.3× speedup.
  • The spec targeted >2× speedup on 8-core offline converge. This is achievable only on workloads with many events per slice AND a large competitor pool. Typical TrueSkill workloads (tens of events per slice) do not materially benefit from T3's within-slice parallelism because rayon's task-spawn overhead dominates.
  • Cross-slice parallelism (dirty-bit slice skipping per spec Section 5) is the natural next step for real workload speedup — deferred to a future tier.

Internals

  • The parallel path uses an unsafe block to concurrently write to SkillStore from color-group-disjoint events. Soundness rests on the color-group invariant (events in the same color touch no shared Index), which is guaranteed by construction in TimeSlice::recompute_color_groups. Sequential path unchanged.
  • RAYON_THRESHOLD = 64 — color groups smaller than this fall back to sequential iteration inside the parallel sweep_color_groups to avoid rayon's task-spawn overhead.
  • Thread-local ScratchArena per rayon worker thread.
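
The threshold fallback can be illustrated with the standard library alone. Note the substitutions: this sketch uses `std::thread::scope` instead of rayon, and it avoids the unsafe concurrent writes described above by computing each event's updates in parallel and applying them afterwards. `Event`, `updates`, and `sweep_group` are hypothetical names; only the threshold value 64 comes from the changelog.

```rust
use std::thread;

const PAR_THRESHOLD: usize = 64; // same value as RAYON_THRESHOLD above

// Hypothetical event: the skill slots it touches and the amount added to each.
struct Event {
    indices: Vec<usize>,
    delta: f64,
}

// Read-only step: the (slot, new value) pairs one event would write.
fn updates(skills: &[f64], ev: &Event) -> Vec<(usize, f64)> {
    ev.indices.iter().map(|&i| (i, skills[i] + ev.delta)).collect()
}

// Sweeps one color group. Events in a group are assumed to touch disjoint slots,
// so the order in which their updates are applied does not change the result.
fn sweep_group(skills: &mut [f64], group: &[Event], workers: usize) {
    let pending: Vec<(usize, f64)> = if group.len() < PAR_THRESHOLD || workers < 2 {
        // Small group: spawning threads would cost more than the work itself.
        group.iter().flat_map(|ev| updates(skills, ev)).collect()
    } else {
        let shared: &[f64] = skills;
        let chunk = (group.len() + workers - 1) / workers;
        thread::scope(|s| {
            let handles: Vec<_> = group
                .chunks(chunk)
                .map(|part| {
                    s.spawn(move || {
                        part.iter()
                            .flat_map(|ev| updates(shared, ev))
                            .collect::<Vec<_>>()
                    })
                })
                .collect();
            // Join in spawn order so the result is independent of thread timing.
            handles
                .into_iter()
                .flat_map(|h| h.join().unwrap())
                .collect::<Vec<_>>()
        })
    };
    for (i, v) in pending {
        skills[i] = v;
    }
}

fn main() {
    let group: Vec<Event> = (0..100)
        .map(|i| Event { indices: vec![2 * i, 2 * i + 1], delta: i as f64 })
        .collect();
    let mut skills = vec![1.0; 200];
    sweep_group(&mut skills, &group, 4);
    println!("{:?}", &skills[..4]);
}
```

The compute-then-apply split costs an extra buffer; writing in place, as the real parallel path does, avoids that copy but has to rely on the color-group invariant for soundness.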

Unreleased — T2 new API surface

Breaking: every renamed type and the new public API land together per docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md Section 7 "T2".

Breaking renames

  • Batch → TimeSlice
  • Player → Rating (and the .player field on Competitor is now .rating)
  • Agent → Competitor
  • IndexMap → KeyTable
  • History field .batches → .time_slices

New types

  • Time trait with Untimed ZST and i64 impls (generic time axis).
  • Drift<T: Time> — generified from the old Drift trait.
  • Event<T, K>, Team<K>, Member<K> — typed bulk-ingest event shape.
  • Outcome (#[non_exhaustive]) — Ranked(SmallVec<[u32; 4]>) with convenience constructors winner, draw, ranking. Scored lands in T4.
  • Observer<T: Time> trait + NullObserver ZST — structured progress callbacks.
  • ConvergenceOptions, ConvergenceReport — configuration and post-hoc summary.
  • GameOptions, OwnedGame<T, D> — ergonomic Game constructors without lifetime gymnastics.
  • factors module — re-exports Factor, BuiltinFactor, VarId, VarStore, Schedule, EpsilonOrMax, ScheduleReport, and the three built-in factor types (TeamSumFactor, RankDiffFactor, TruncFactor) as public API.

New History API

  • Three-tier ingestion:
    • Tier 1 (bulk): add_events<I: IntoIterator<Item = Event<T, K>>>(events) -> Result
    • Tier 2 (one-off): record_winner(&K, &K, T), record_draw(&K, &K, T)
    • Tier 3 (fluent): event(T).team([...]).weights([...]).ranking([...]).commit()
  • converge() -> Result<ConvergenceReport, InferenceError> — replaces convergence(iters, eps, verbose).
  • current_skill(&K), learning_curve(&K), learning_curves() (now keyed on K).
  • log_evidence() zero-arg, log_evidence_for(&[&K]).
  • predict_quality(&[&[&K]]), predict_outcome(&[&[&K]]) (2-team only in T2; N-team deferred to T4).
  • intern(&Q) / lookup(&Q) expose the internal KeyTable<K> for power users.
  • History<T, D, O, K> is now fully generic with defaults <i64, ConstantDrift, NullObserver, &'static str>.

New Game API

  • Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result<OwnedGame, _>.
  • Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>.
  • Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result<OwnedGame, _>.
  • Game::custom(...) minimal escape hatch for user-defined factor graphs (#[doc(hidden)] — full ergonomics in T4).
  • Game::log_evidence() and OwnedGame::log_evidence() accessors.

Errors

  • InferenceError now carries MismatchedShape { kind, expected, got }, InvalidProbability { value }, ConvergenceFailed { last_step, iterations }, and NegativePrecision { pi }. Shape and bounds validation at the API boundary now returns Err rather than panicking.
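
To make the "Err rather than panic" behavior concrete, here is a minimal sketch of boundary validation. The variant and field names follow the entry above, but the field types, the derive list, and the two `check_*` helpers are assumptions for illustration only.

```rust
use std::fmt;

// Variant and field names as listed in the changelog; field types are assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum InferenceError {
    MismatchedShape { kind: &'static str, expected: usize, got: usize },
    InvalidProbability { value: f64 },
    ConvergenceFailed { last_step: f64, iterations: usize },
    NegativePrecision { pi: f64 },
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MismatchedShape { kind, expected, got } => {
                write!(f, "{kind}: expected {expected}, got {got}")
            }
            Self::InvalidProbability { value } => write!(f, "invalid probability {value}"),
            Self::ConvergenceFailed { last_step, iterations } => {
                write!(f, "no convergence after {iterations} iterations (step {last_step})")
            }
            Self::NegativePrecision { pi } => write!(f, "negative precision {pi}"),
        }
    }
}

impl std::error::Error for InferenceError {}

// Hypothetical helper: rejects values outside [0, 1), including NaN.
fn check_draw_probability(p: f64) -> Result<f64, InferenceError> {
    if (0.0..1.0).contains(&p) {
        Ok(p)
    } else {
        Err(InferenceError::InvalidProbability { value: p })
    }
}

// Hypothetical helper: reports a length mismatch instead of panicking on it.
fn check_shape(kind: &'static str, expected: usize, got: usize) -> Result<(), InferenceError> {
    if expected == got {
        Ok(())
    } else {
        Err(InferenceError::MismatchedShape { kind, expected, got })
    }
}

fn main() {
    match check_shape("weights", 2, 3) {
        Ok(()) => println!("shape ok"),
        Err(e) => println!("rejected: {e}"),
    }
    println!("{:?}", check_draw_probability(1.5));
}
```

Callers can then propagate failures with `?` and match on the variant when they want to recover.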

Removed (breaking)

  • History::convergence(iters, eps, verbose) — use converge().
  • HistoryBuilder::gamma(f64) — use .drift(ConstantDrift(g)).
  • HistoryBuilder::time(bool) and History.time: bool — use the Time type parameter.
  • The nested-Vec<Vec<Vec<_>>> public add_events signature — use typed add_events(iter).
  • learning_curves_by_index() — use learning_curves().

Performance

Batch::iteration bench: 21.36 µs (T1 was 22.88 µs on the same hardware, a ~7% improvement attributed to the typed path being slightly more direct). Gaussian operations unchanged.

Notes

  • Time = Untimed returns elapsed_to → 0. This is a behavior change from the old time=false mode, which implicitly generated elapsed=1 per event via an i64::MAX sentinel in Agent.last_time. Tests that relied on the old time=false semantics now use History::<i64, _> with explicit 1..=n timestamps.
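
The difference between the two time axes can be sketched as follows. Only the names Time, Untimed, and elapsed_to come from this changelog; the trait signature and the `total_elapsed` helper are assumptions made for the example.

```rust
// Hypothetical shape of the Time trait; the real signature may differ.
trait Time: Copy {
    fn elapsed_to(&self, later: &Self) -> i64;
}

// Zero-sized marker for histories without a time axis.
#[derive(Clone, Copy, Debug, Default)]
struct Untimed;

impl Time for Untimed {
    // No time axis: nothing ever elapses between events.
    fn elapsed_to(&self, _later: &Self) -> i64 {
        0
    }
}

impl Time for i64 {
    fn elapsed_to(&self, later: &Self) -> i64 {
        later - self
    }
}

// Sums the elapsed time between consecutive events.
fn total_elapsed<T: Time>(stamps: &[T]) -> i64 {
    stamps.windows(2).map(|w| w[0].elapsed_to(&w[1])).sum()
}

fn main() {
    // Untimed: four events, zero elapsed time in total.
    println!("{}", total_elapsed(&[Untimed; 4]));
    // Explicit 1..=n timestamps: one unit elapses between successive events.
    let stamps: Vec<i64> = (1..=4).collect();
    println!("{}", total_elapsed(&stamps));
}
```

In this model, explicit consecutive timestamps are what give each successive event one unit of elapsed time, while Untimed never does.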

0.1.0 - 2026-04-23

Features

  • feat: added a Drift trait and a "default" ConstantDrift implementation

Miscellaneous Tasks

  • chore: added cliff.toml, release.toml and rustfmt.toml
  • chore: clean up

Other (unconventional)

  • Initial commit.
  • Begin working on batch.
  • Passing tests for Batch
  • Working on History struct. First test is passing.
  • More test passing for History
  • Added more functions to History
  • Remove Display impl, better to use Debug
  • Use flatten instead of flat_map
  • Handle case where there is no time
  • It works, or so it seems
  • Use PlayerIndex instead of String
  • Inline a lot of functions
  • Refactor some code
  • Refactor some stuff
  • Port from julia version instead
  • More things, better things, awesome
  • More tests, more code
  • More things, more tests
  • Fix tests
  • More tests
  • More tests
  • Added builder for History, and start migrating test to use builder instead.
  • Update test to use builder
  • Remove unused code
  • Use an Index struct instead of str and String for player id
  • Update example so now it works, and that's, well, good
  • Update test to use assert_ulps_eq
  • Fixed test
  • Change time to use i64 instead of u64
  • Small change
  • Clean up example
  • Update crates and added methods to get a key or all keys in an IndexMap
  • Added a get function to IndexMap
  • Agents don't have to be behind a mutable reference in within_prior
  • Agents don't have to be behind a mutable reference in within_priors
  • Refactor so we can see if there is any way to improve the performance
  • Fix clippy warning
  • More refactoring
  • Remove warnings and refactor some code
  • Added benchmark for Batch
  • Added default implementation for TeamMessage
  • Remove unused mut reference
  • Make it more rusty
  • More rustifying
  • Small refactor
  • Rename d to diff, and t to team
  • Added more links to readme
  • Fix broken link in README
  • Update crates
  • Clean up
  • Dry my eyes
  • Remove unnecessary allocations
  • Fix clippy warning
  • Refactor history
  • Rename variables
  • Move stuff around
  • Added quality function
  • Make quality a free standing function instead
  • Improve performance
  • Change assert to debug_assert
  • Added todo to readme, and documentation for quality function
  • Basic test for quality
  • Ignore temp folder
  • Update edition
  • Small changes for new 2024 edition
  • remove notepad
  • added benchmark

Styling

  • style: cargo fmt