Batch::iteration sequential: 23.23 µs (no regression vs T2 baseline).
Gaussian ops unchanged.
End-to-end history_converge benchmark on Apple M5 Pro:
Workload                                      seq       rayon     speedup
500 events / 100 competitors / 10 per slice   4.03 ms   4.24 ms   1.0x
2000 events / 200 competitors / 20 per slice  20.18 ms  19.82 ms  1.0x
5000 events / 50000 competitors / 1 slice     11.88 ms  9.10 ms   1.3x
The spec's >=2x target is not achieved on realistic workloads. T3's
within-slice color-group parallelism only shows material benefit when
a slice holds many events AND the competitor pool is large enough to
give the greedy coloring room to partition. Typical TrueSkill
workloads don't fit that profile. Cross-slice parallelism (dirty-bit
slice skipping, spec Section 5) is the natural next step for
real-workload speedup.
Determinism verified: bit-identical posteriors across
RAYON_NUM_THREADS={1, 2, 4, 8}.
Closes T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Changelog
All notable changes to this project will be documented in this file.
Unreleased — T3 concurrency
Adds rayon-backed parallel paths per Section 6 of
docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Breaking
- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`, `Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these automatically (both are auto traits), but downstream custom impls that aren't thread-safe will need to be made `Send + Sync` to satisfy the new bounds.
New
- Opt-in `rayon` cargo feature. When enabled:
  - Within-slice event iteration runs color-group events in parallel via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
  - `History::learning_curves` computes per-slice posteriors in parallel, then merges sequentially in slice order.
  - `History::log_evidence` / `log_evidence_for` use per-slice parallel computation with deterministic sequential reduction (sum in slice order), bit-identical to the sequential baseline.
- `ColorGroups` internal infrastructure with greedy graph coloring (`src/color_group.rs`). Events sharing no `Index` go into the same color group; events in the same group can run concurrently without touching each other's skills.
- `tests/determinism.rs` asserts bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}`.
- `benches/history_converge.rs` measures end-to-end convergence on three workload shapes.
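
The "parallel compute, sequential reduce" pattern described for `log_evidence` can be sketched as follows. This is a minimal illustration only, assuming a stand-in `slice_log_evidence` function and `std::thread::scope` in place of rayon; none of these names come from the crate. The point it shows is that per-slice values may be computed on any number of threads, as long as the final floating-point sum always runs in slice order.

```rust
use std::thread;

/// Hypothetical stand-in for one slice's contribution: the sum of
/// log-probabilities of its events.
fn slice_log_evidence(event_probs: &[f64]) -> f64 {
    let mut total = 0.0;
    for p in event_probs {
        total += p.ln();
    }
    total
}

/// Sequential baseline: visit slices in order and accumulate in order.
fn log_evidence_seq(slices: &[Vec<f64>]) -> f64 {
    let mut total = 0.0;
    for slice in slices {
        total += slice_log_evidence(slice);
    }
    total
}

/// Parallel variant: per-slice values are computed on worker threads and
/// written into a buffer indexed by slice position; the reduction then runs
/// on one thread, in slice order, so the rounding sequence matches the baseline.
fn log_evidence_par(slices: &[Vec<f64>], workers: usize) -> f64 {
    let mut per_slice = vec![0.0_f64; slices.len()];
    let workers = workers.max(1);
    let chunk = ((slices.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        for (out, input) in per_slice.chunks_mut(chunk).zip(slices.chunks(chunk)) {
            scope.spawn(move || {
                for (slot, slice) in out.iter_mut().zip(input) {
                    *slot = slice_log_evidence(slice);
                }
            });
        }
    });
    let mut total = 0.0;
    for value in &per_slice {
        total += value;
    }
    total
}
```

Comparing `log_evidence_seq(&slices).to_bits()` with `log_evidence_par(&slices, n).to_bits()` for several values of `n` is one way to check the bit-identical property; summing partial results per thread instead would generally change the rounding and break it.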
Performance notes
- Default build (no rayon): `Batch::iteration` 23.23 µs, no regression vs T2.
- With `--features rayon`:
  - 500 events / 100 competitors / 10 per slice: 1.0× speedup.
  - 2000 events / 200 competitors / 20 per slice: 1.0× speedup.
  - 5000 events in one slice / 50k competitors: 1.3× speedup.
- The spec targeted >2× speedup on 8-core offline converge. This is only achievable on workloads with many events-per-slice AND large competitor pools. Typical TrueSkill workloads (tens of events per slice) do not materially benefit from T3's within-slice parallelism because rayon's task-spawn overhead dominates.
- Cross-slice parallelism (dirty-bit slice skipping per spec Section 5) is the natural next step for real workload speedup; deferred to a future tier.
Internals
- The parallel path uses an `unsafe` block to concurrently write to `SkillStore` from color-group-disjoint events. Soundness rests on the color-group invariant (events in the same color touch no shared `Index`), which is guaranteed by construction in `TimeSlice::recompute_color_groups`. Sequential path unchanged.
- `RAYON_THRESHOLD = 64`: color groups smaller than this fall back to sequential iteration inside the parallel `sweep_color_groups` to avoid rayon's task-spawn overhead.
- Thread-local `ScratchArena` per rayon worker thread.
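
The color-group invariant that the soundness argument rests on can be illustrated with a small greedy coloring sketch. This is not the code in `src/color_group.rs`: the `Event` alias, `color_groups`, and `groups_are_disjoint` are hypothetical names, and each event is assumed to list every competitor index at most once.

```rust
use std::collections::HashSet;

/// Hypothetical event shape: only the competitor indices the event touches.
type Event = Vec<usize>;

/// Greedy coloring: each event goes into the first group whose competitors
/// it does not overlap; a new group is opened when none fits.
/// Returns, per group, the positions of its events in `events`.
fn color_groups(events: &[Event]) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    // Competitor indices already claimed by each group.
    let mut touched: Vec<HashSet<usize>> = Vec::new();
    for (pos, event) in events.iter().enumerate() {
        let slot = touched
            .iter()
            .position(|used| event.iter().all(|c| !used.contains(c)));
        let g = match slot {
            Some(g) => g,
            None => {
                groups.push(Vec::new());
                touched.push(HashSet::new());
                groups.len() - 1
            }
        };
        groups[g].push(pos);
        touched[g].extend(event.iter().copied());
    }
    groups
}

/// Checks the invariant: no two events in one group share a competitor.
fn groups_are_disjoint(events: &[Event], groups: &[Vec<usize>]) -> bool {
    groups.iter().all(|group| {
        let mut seen = HashSet::new();
        group
            .iter()
            .flat_map(|&pos| events[pos].iter())
            .all(|c| seen.insert(*c))
    })
}
```

With a small competitor pool the groups stay tiny: three events among three competitors, `[0, 1]`, `[0, 2]`, `[1, 2]`, end up in three groups of one event each, which leaves nothing to run in parallel and falls below any threshold such as `RAYON_THRESHOLD`.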
Unreleased — T2 new API surface
Breaking: every renamed type and the new public API land together per
docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md
Section 7 "T2".
Breaking renames
- `Batch` → `TimeSlice`
- `Player` → `Rating` (and the `.player` field on `Competitor` is now `.rating`)
- `Agent` → `Competitor`
- `IndexMap` → `KeyTable`
- `History` field `.batches` → `.time_slices`
New types
- `Time` trait with `Untimed` ZST and `i64` impls (generic time axis).
- `Drift<T: Time>`: generified from the old `Drift` trait.
- `Event<T, K>`, `Team<K>`, `Member<K>`: typed bulk-ingest event shape.
- `Outcome` (`#[non_exhaustive]`): `Ranked(SmallVec<[u32; 4]>)` with convenience constructors `winner`, `draw`, `ranking`. `Scored` lands in T4.
- `Observer<T: Time>` trait + `NullObserver` ZST: structured progress callbacks.
- `ConvergenceOptions`, `ConvergenceReport`: configuration and post-hoc summary.
- `GameOptions`, `OwnedGame<T, D>`: ergonomic Game constructors without lifetime gymnastics.
- `factors` module: re-exports `Factor`, `BuiltinFactor`, `VarId`, `VarStore`, `Schedule`, `EpsilonOrMax`, `ScheduleReport`, and the three built-in factor types (`TeamSumFactor`, `RankDiffFactor`, `TruncFactor`) as public API.
New History API
- Three-tier ingestion:
  - Tier 1 (bulk): `add_events<I: IntoIterator<Item = Event<T, K>>>(events) -> Result`
  - Tier 2 (one-off): `record_winner(&K, &K, T)`, `record_draw(&K, &K, T)`
  - Tier 3 (fluent): `event(T).team([...]).weights([...]).ranking([...]).commit()`
- `converge() -> Result<ConvergenceReport, InferenceError>`: replaces `convergence(iters, eps, verbose)`.
- `current_skill(&K)`, `learning_curve(&K)`, `learning_curves()` (now keyed on `K`).
- `log_evidence()` zero-arg, `log_evidence_for(&[&K])`.
- `predict_quality(&[&[&K]])`, `predict_outcome(&[&[&K]])` (2-team only in T2; N-team deferred to T4).
- `intern(&Q)` / `lookup(&Q)` expose the internal `KeyTable<K>` for power users.
- `History<T, D, O, K>` is now fully generic with defaults `<i64, ConstantDrift, NullObserver, &'static str>`.
New Game API
- `Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
- `Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>`.
- `Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
- `Game::custom(...)` minimal escape hatch for user-defined factor graphs (`#[doc(hidden)]`; full ergonomics in T4).
- `Game::log_evidence()` and `OwnedGame::log_evidence()` accessors.
Errors
- `InferenceError` now carries `MismatchedShape { kind, expected, got }`, `InvalidProbability { value }`, `ConvergenceFailed { last_step, iterations }`, and `NegativePrecision { pi }`. Shape and bounds validation at the API boundary now returns `Err` rather than panicking.
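
A rough sketch of what "returns `Err` rather than panicking" can look like at an API boundary. The variant and field names follow the entry above, but every field type, the `Display` wording, and the helpers `check_probability` and `check_shape` are assumptions made for illustration, not the crate's definitions.

```rust
use std::fmt;

/// Variant names as listed in the changelog; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
enum InferenceError {
    MismatchedShape { kind: &'static str, expected: usize, got: usize },
    InvalidProbability { value: f64 },
    ConvergenceFailed { last_step: f64, iterations: usize },
    NegativePrecision { pi: f64 },
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MismatchedShape { kind, expected, got } => {
                write!(f, "{kind}: expected {expected}, got {got}")
            }
            Self::InvalidProbability { value } => write!(f, "invalid probability {value}"),
            Self::ConvergenceFailed { last_step, iterations } => {
                write!(f, "no convergence after {iterations} iterations (last step {last_step})")
            }
            Self::NegativePrecision { pi } => write!(f, "negative precision {pi}"),
        }
    }
}

impl std::error::Error for InferenceError {}

/// Hypothetical bounds check: accepts only finite values in [0, 1].
fn check_probability(value: f64) -> Result<f64, InferenceError> {
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(InferenceError::InvalidProbability { value })
    }
}

/// Hypothetical shape check, e.g. one weight list per team.
fn check_shape(kind: &'static str, expected: usize, got: usize) -> Result<(), InferenceError> {
    if expected == got {
        Ok(())
    } else {
        Err(InferenceError::MismatchedShape { kind, expected, got })
    }
}
```

Callers can then propagate failures with `?` or match on the variant, for example treating `MismatchedShape` as a caller bug and `ConvergenceFailed` as a recoverable condition.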
Removed (breaking)
- `History::convergence(iters, eps, verbose)`: use `converge()`.
- `HistoryBuilder::gamma(f64)`: use `.drift(ConstantDrift(g))`.
- `HistoryBuilder::time(bool)` and `History.time: bool`: use the `Time` type parameter.
- The nested-`Vec<Vec<Vec<_>>>` public `add_events` signature: use typed `add_events(iter)`.
- `learning_curves_by_index()`: use `learning_curves()`.
Performance
Batch::iteration bench: 21.36 µs (T1 was 22.88 µs on the same hardware, a
~7% improvement, attributed to the typed path being slightly more direct).
Gaussian operations unchanged.
Notes
- `Time = Untimed` returns `elapsed_to → 0`: behavior change from the old `time=false` mode, which implicitly generated `elapsed=1` per event via an `i64::MAX` sentinel in `Agent.last_time`. Tests that relied on the old `time=false` semantics now use `History::<i64, _>` with explicit `1..=n` timestamps.
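
The behavior change can be pictured with a stripped-down time axis. The trait shape, the `elapsed_to` signature, and the `elapsed_steps` helper below are guesses made for illustration; only the names `Time`, `Untimed`, and `elapsed_to` and the "returns 0" behavior come from the note above.

```rust
/// Hypothetical reconstruction of a generic time axis.
trait Time: Copy {
    /// Elapsed units between `self` and a later point on the same axis.
    fn elapsed_to(&self, later: &Self) -> i64;
}

/// Zero-sized marker for histories without timestamps.
#[derive(Clone, Copy, Debug, Default)]
struct Untimed;

impl Time for Untimed {
    // No time axis: nothing ever elapses between events.
    fn elapsed_to(&self, _later: &Self) -> i64 {
        0
    }
}

impl Time for i64 {
    fn elapsed_to(&self, later: &Self) -> i64 {
        later - self
    }
}

/// Elapsed time between each pair of consecutive events.
fn elapsed_steps<T: Time>(times: &[T]) -> Vec<i64> {
    times.windows(2).map(|w| w[0].elapsed_to(&w[1])).collect()
}
```

Under these assumptions five untimed events give steps of `0`, while the timestamps `1..=5` on an `i64` axis give steps of `1`, which is the per-event elapsed time the old `time=false` mode produced implicitly.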
0.1.0 - 2026-04-23
Features
- feat: added a Drift trait and a "default" ConstantDrift implementation
Miscellaneous Tasks
- chore: added cliff.toml, release.toml and rustfmt.toml
- chore: clean up
Other (unconventional)
- Initial commit.
- Begin working on batch.
- Passing tests for Batch
- Working on History struct. First test is passing.
- More test passing for History
- Added more functions to History
- Remove Display impl, better to use Debug
- Use flatten instead of flat_map
- Handle case where there is no time
- It works, or so it seems
- Use PlayerIndex instead of String
- Inline a lot of functions
- Refactor some code
- Refactor some stuff
- Port from julia version instead
- More things, better things, awesome
- More tests, more code
- More things, more tests
- Fix tests
- More tests
- More tests
- Added builder for History, and start migrating test to use builder instead.
- Update test to use builder
- Remove unused code
- Use an Index struct instead of str and String for player id
- Update example so now it works, and that's, well, good
- Update test to use assert_ulps_eq
- Fixed test
- Change time to use i64 instead of u64
- Small change
- Clean up example
- Update crates and added methods to get a key or all keys in an IndexMap
- Added a get function to IndexMap
- Agents don't have to be behind a mutable reference in within_prior
- Agents don't have to be behind a mutable reference in within_priors
- Refactor so we can see if there is any way to improve the performance
- Fix clippy warning
- More refactoring
- Remove warnings and refactor some code
- Added benchmark for Batch
- Added default implementation for TeamMessage
- Remove unused mut reference
- Make it more rusty
- More rustifying
- Small refactor
- Rename d to diff, and t to team
- Added more links to readme
- Fix broken link in README
- Update crates
- Clean up
- Dry my eyes
- Remove unnecessary allocations
- Fix clippy warning
- Refactor history
- Rename variables
- Move stuff around
- Added quality function
- Make quality a free standing function instead
- Improve performance
- Change assert to debug_assert
- Added todo to readme, and documentation for quality function
- Basic test for quality
- Ignore temp folder
- Update edition
- Small changes for new 2024 edition
- remove notepad
- added benchmark
Styling
- style: cargo fmt