45 Commits

f18013d036 bench,docs: capture T2 final numbers and update CHANGELOG
Batch::iteration: 21.36 µs (T1 was 22.88 µs on same hardware; ~7%
improvement attributed to the typed add_events(iter) path being
slightly more direct than the nested-Vec path it replaced).

Gaussian operations unchanged vs T1.

Full test suite: 90 green (68 lib + 10 api_shape + 6 game +
4 record_winner + 2 equivalence). No golden value changed across
the entire T2 tier.

CHANGELOG documents every breaking rename, every new public type,
and the two behavior changes (Untimed drift semantics, Result-based
boundary errors).

Closes T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:13:38 +02:00
a6aaa93fd0 test: translate in-crate tests to new T2 API; delete legacy methods
Every #[cfg(test)] mod tests in src/history.rs now uses the new public
API: add_events(iter) / converge() / learning_curve() / current_skill()
/ log_evidence(). No golden value changed.

Legacy methods removed:
- History::convergence(iters, eps, verbose) → use converge()
- History::learning_curves_by_index() → use learning_curve() / learning_curves()
- HistoryBuilder::gamma(f64) → use .drift(ConstantDrift(g))
- add_events_with_prior downgraded from pub to pub(crate)

Added:
- History::builder_with_key() for custom key types (used by atp example)
- tests/equivalence.rs: Game-level golden integration tests

examples/atp.rs rewritten in new API (Event<i64, String>, converge(),
learning_curve(), drift(ConstantDrift(...))).

Bench Batch::iteration: 21.4 µs (T1 reference: 22.88 µs).

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 13:10:10 +02:00
e8c9d4ed29 feat(api): add Game::ranked, one_v_one, free_for_all, custom constructors
Public Game API now returns Result<_, InferenceError> on invalid input
(p_draw out of range, outcome rank count mismatches team count).

New types:
- GameOptions { p_draw, convergence } — bundled config
- OwnedGame<T, D> — owned variant of Game that carries its result
  and weights internally (no borrow of History's slices). Returned
  by public constructors to avoid leaking internal borrow lifetimes.

The internal Game::new is renamed Game::ranked_with_arena (pub(crate))
and keeps the borrowing-arena signature for History's hot path. All
in-crate callers updated (21 call sites: 18 in game.rs tests, 2 in
time_slice.rs, 1 in history.rs).

Game::custom is a T2-minimal power-user escape hatch exposing raw
factor + schedule plumbing. Full ergonomics in T4 (#[doc(hidden)]
for now).

Game::log_evidence() accessor added on both Game and OwnedGame (was
previously accessible only through the pub(crate) evidence field).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:55:26 +02:00
fe6f028127 feat(api): promote Factor/Schedule/VarStore to pub in factors module
Exposes the factor-graph machinery so power users can define custom
factors and schedules (see Game::custom in the next task). The
internal factor/ and schedule/ modules remain unchanged (still
referenced by Game's internals via crate::factor); the user-facing
public API goes through the new factors module re-exports:

  pub use crate::factor::{BuiltinFactor, Factor, VarId, VarStore};
  pub use crate::factor::rank_diff::RankDiffFactor;
  pub use crate::factor::team_sum::TeamSumFactor;
  pub use crate::factor::trunc::TruncFactor;
  pub use crate::schedule::{EpsilonOrMax, Schedule, ScheduleReport};

#[allow(dead_code)] guards on the previously-pub(crate) items are
removed because the types are now referenced via the re-exports.

Promotes public methods on VarStore (len, alloc, get, set, clear, new)
and adds is_empty per clippy lint. Keeps marginals field private as an
implementation detail — users access via the public methods.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 12:50:37 +02:00
e62568bf3e feat(api): add current_skill / learning_curve / log_evidence / predict_*
New public query methods on History:

- current_skill(&K) -> Option<Gaussian>: latest posterior for a key
- learning_curve(&K) -> Vec<(T, Gaussian)>: single-key history
- learning_curves() -> HashMap<K, Vec<(T, Gaussian)>>: all-keys history
- log_evidence() -> f64: total log-evidence (was log_evidence(false, &[]))
- log_evidence_for(&[&K]) -> f64: subset log-evidence
- predict_quality(&[&[&K]]) -> f64: draw-probability match quality
- predict_outcome(&[&[&K]]) -> Vec<f64>: 2-team win probabilities

learning_curves() changed from returning HashMap<Index, Vec<(i64, Gaussian)>>
to HashMap<K, Vec<(T, Gaussian)>>. A new learning_curves_by_index()
helper preserves the old Index-keyed shape for callers that ingest via
the pub(crate) Index path.

log_evidence(false, &[]) was renamed to log_evidence_internal and made
pub(crate); the new zero-arg log_evidence() wraps it.

predict_outcome is T2 2-team-only; N-team deferred to T4.

KeyTable::get no longer requires ToOwned<Owned = K> (only needed for
get_or_create), allowing query methods to use simpler bounds.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:47:41 +02:00
ec8b7e538c feat(api): add fluent history.event(t).team(...).commit() builder
Third tier of the ingestion API (spec Section 4). Powers one-off
events with irregular shapes where neither record_winner (too
simple) nor typed add_events (too verbose) fits cleanly.

EventBuilder accumulates teams, weights, and outcome. Supports:
- .team([keys]) — add a team
- .weights([w..]) — per-member weights on the most-recently-added team
- .ranking([ranks]) — explicit per-team ranks
- .winner(i) — convenience: team i wins, others tied
- .draw() — all teams tied
- .commit() — finalize into an Event<T, K> and delegate to add_events
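The bullet list above can be sketched as a minimal self-contained builder. Method bodies and the `commit()` return shape are assumptions for illustration; the real builder finalizes into an `Event<T, K>` and delegates to `add_events`.

```rust
/// Minimal sketch of the fluent EventBuilder shape (names from the
/// commit message; internals are illustrative, not the real impl).
#[derive(Debug, Default)]
struct EventBuilder {
    teams: Vec<Vec<&'static str>>,
    weights: Vec<Vec<f64>>,
    ranks: Option<Vec<u32>>,
}

impl EventBuilder {
    /// Add a team; weights default to 1.0 per member.
    fn team<const N: usize>(mut self, keys: [&'static str; N]) -> Self {
        self.weights.push(vec![1.0; N]);
        self.teams.push(keys.to_vec());
        self
    }
    /// Per-member weights on the most-recently-added team.
    fn weights<const N: usize>(mut self, w: [f64; N]) -> Self {
        *self.weights.last_mut().expect("weights() needs a prior team()") = w.to_vec();
        self
    }
    /// Convenience: team i wins (rank 0), all others tied behind it.
    fn winner(mut self, i: usize) -> Self {
        let n = self.teams.len();
        self.ranks = Some((0..n).map(|t| if t == i { 0 } else { 1 }).collect());
        self
    }
    /// All teams tied.
    fn draw(mut self) -> Self {
        self.ranks = Some(vec![0; self.teams.len()]);
        self
    }
    /// Finalize; the real commit() builds an Event and calls add_events.
    fn commit(self) -> (Vec<Vec<&'static str>>, Vec<Vec<f64>>, Vec<u32>) {
        let ranks = self.ranks.unwrap_or_else(|| vec![0; self.teams.len()]);
        (self.teams, self.weights, ranks)
    }
}
```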

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:42:26 +02:00
244b94a3e5 feat(api): typed add_events(iter); generify internal path over T
Public API gains:

  History::add_events<I: IntoIterator<Item = Event<T, K>>>(events)
      -> Result<(), InferenceError>

which accepts the typed Event<T, K> shape added in Task 10. Ranks
from Outcome::Ranked are mapped to the legacy "higher f64 = better"
results internally.
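A sketch of the rank mapping described above, assuming rank 0 is best and negation suffices to reach the legacy ordering (the exact mapping in the crate may differ):

```rust
/// Outcome::Ranked uses "lower rank = better"; the legacy internal
/// results use "higher f64 = better". Negating each rank preserves
/// the ordering and keeps ties tied (illustrative assumption).
fn ranks_to_legacy_results(ranks: &[u32]) -> Vec<f64> {
    ranks.iter().map(|&r| -(r as f64)).collect()
}
```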

add_events_with_prior now takes Vec<T> for times (was Vec<i64>),
generifying the whole internal path over T in a single fully-generic
impl<T: Time, D: Drift<T>, O: Observer<T>, K> block. The i64-specific
block is gone; record_winner/record_draw are now generic over T.

add_events_with_prior stays pub (not pub(crate)) because the ATP
example calls it directly with pre-built Index-based composition;
the new typed add_events is the primary public API going forward.

In-crate tests updated to call add_events_with_prior with an empty
HashMap. tests/api_shape.rs added with 3 integration tests covering
bulk ingest, draw, and mismatched-outcome error.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:39:46 +02:00
044fb83a38 feat(api): add record_winner, record_draw, intern, lookup on History
Spec Section 4 "three-tier event ingestion" tier 2: one-off match
convenience. Spec open question 3: expose Index + intern/lookup for
power users.

History and HistoryBuilder gain a 4th generic parameter
K: Eq + Hash + Clone = &'static str. The default ensures existing
tests using Index-based add_events compile unchanged.

History internally owns a KeyTable<K>. intern(&Q) creates or returns
an Index for the given key; lookup(&Q) returns Option<Index> without
creating. record_winner and record_draw are thin 1v1 wrappers around
the internal add_events_with_prior.
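The intern/lookup split can be sketched as below. Field layout and method bodies are assumptions; only the create-or-return vs. never-create contract comes from the commit message.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Dense handle into the key table.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Index(usize);

/// Sketch of a key-interning table: intern() creates or returns an
/// Index; lookup() returns Option<Index> without creating.
struct KeyTable<K: Eq + Hash + Clone> {
    map: HashMap<K, Index>,
    keys: Vec<K>,
}

impl<K: Eq + Hash + Clone> KeyTable<K> {
    fn new() -> Self {
        KeyTable { map: HashMap::new(), keys: Vec::new() }
    }
    fn intern(&mut self, key: &K) -> Index {
        if let Some(&idx) = self.map.get(key) {
            return idx;
        }
        let idx = Index(self.keys.len());
        self.keys.push(key.clone());
        self.map.insert(key.clone(), idx);
        idx
    }
    fn lookup(&self, key: &K) -> Option<Index> {
        self.map.get(key).copied()
    }
}
```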

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:30:04 +02:00
a83c9acacb feat(error): expand InferenceError; convert boundary asserts to Result
InferenceError gains MismatchedShape (user-input length mismatches),
InvalidProbability (p_draw out of [0, 1]), and ConvergenceFailed
(exceeded max_iter without hitting epsilon). NegativePrecision stays.

History::add_events_with_prior and History::add_events now return
Result<(), InferenceError>. The previous assert! macros checking
composition/results/times/weights shape are replaced by matched
error returns.

Internal debug_assert! macros for arithmetic invariants stay; this
change only affects boundary validation of user input.

Tests updated to call .unwrap() on the Result. The old signatures
will be fully replaced in Task 15 (typed add_events(iter)) and the
nested-Vec wrapper removed in Task 20.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 12:26:13 +02:00
a6e008f8ff feat(api): add ConvergenceOptions, ConvergenceReport, History::converge
New public types:
- ConvergenceOptions { max_iter, epsilon } — config for the loop
- ConvergenceReport { iterations, final_step, log_evidence, converged,
  per_iteration_time, slices_skipped } — post-hoc summary

History and HistoryBuilder gain a third generic parameter
O: Observer<T> = NullObserver. Builder methods:
- .convergence(opts) sets the ConvergenceOptions
- .observer(o) plugs in an Observer (reshapes the builder's O param)

History::converge() runs the existing iteration loop driven by the
stored opts, emits observer callbacks on each iteration end and on
completion, and returns Result<ConvergenceReport, InferenceError>.

The old convergence(iters, eps, verbose) stays — gets removed in
Task 20 after tests are translated.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 12:20:24 +02:00
726896a2ba feat(api): add Observer trait and NullObserver default
Observer replaces verbose: bool with structured progress callbacks:
on_iteration_end, on_batch_processed, on_converged — all no-op
default impls so users override only what they need. NullObserver
is a ZST default.
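The no-op-default pattern looks roughly like this (callback parameters are assumed for illustration):

```rust
/// Structured progress callbacks; every method has a no-op default
/// body, so implementors override only what they need.
trait Observer {
    fn on_iteration_end(&mut self, _iter: usize, _step: f64) {}
    fn on_batch_processed(&mut self, _batch: usize) {}
    fn on_converged(&mut self, _iterations: usize) {}
}

/// Zero-sized default: every callback stays a no-op.
struct NullObserver;
impl Observer for NullObserver {}

/// Example override: count iterations, ignore everything else.
struct IterCounter { iters: usize }
impl Observer for IterCounter {
    fn on_iteration_end(&mut self, _iter: usize, _step: f64) {
        self.iters += 1;
    }
}
```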

Send + Sync bounds deferred to T3 (Rayon support).

Fully additive — wired into History::converge in Task 12.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 12:16:25 +02:00
f5a486329e feat(api): add Event<T, K>, Team<K>, Member<K> typed event description
Replaces the old nested Vec<Vec<Vec<_>>> event description on the
public API boundary. Member<K>::from(K) enables ergonomic literal
lists. Member::with_weight / with_prior are builder methods for the
optional per-event overrides.

Fully additive — no existing call sites updated. Consumed by
History::add_events(iter) in Task 15.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 12:14:58 +02:00
3df422db78 feat(api): add Outcome enum with Ranked variant
Outcome::winner(i, n), Outcome::draw(n), Outcome::ranking(iter) are
the convenience constructors. Marked #[non_exhaustive] so Scored can
be added in T4 without breaking match exhaustiveness.
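A sketch of the constructors, assuming rank 0 is best and losing teams are tied behind the winner (the real variant stores ranks in a SmallVec):

```rust
/// Illustrative Outcome shape: equal ranks mean a tie.
#[non_exhaustive]
#[derive(Debug, PartialEq)]
enum Outcome {
    Ranked(Vec<u32>),
}

impl Outcome {
    /// Team `i` of `n` wins; the remaining teams tie behind it.
    fn winner(i: usize, n: usize) -> Outcome {
        Outcome::Ranked((0..n).map(|t| if t == i { 0 } else { 1 }).collect())
    }
    /// All `n` teams tied.
    fn draw(n: usize) -> Outcome {
        Outcome::Ranked(vec![0; n])
    }
    /// Explicit per-team ranks.
    fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Outcome {
        Outcome::Ranked(ranks.into_iter().collect())
    }
}
```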

Adds smallvec = "1" as a direct dependency.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 12:12:53 +02:00
33a7d90b89 refactor(history): remove time: bool; translate tests to explicit timestamps
The bool encoded 'no time axis' which is now expressed at the type
level (T = Untimed). The old !self.time branch generated sequential
i64 timestamps internally (1..=n) and bumped all agents' last_time at
every tick; tests that relied on this now pass those timestamps
explicitly and reflect the correct time=true elapsed semantics.

Collapsed `if self.time { A } else { B }` into the A branch everywhere
in add_events_with_prior. Removed the two !self.time blocks that
updated all agents' last_time at every slice regardless of participation.

sort_time is now generic over `T: Copy + Ord`.

HistoryBuilder::time(bool) removed. History<i64, ConstantDrift>
default remains, producing the same behavior as old .time(true).

The test_env_ttt Gaussian goldens are updated to reflect the correct
time=true semantics (b.elapsed=2 instead of 1 due to b skipping t=2);
this is a correction: the old !self.time last_time bump was an
implementation quirk that diverged from the Python reference.

55 tests pass. clippy clean. fmt clean.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:09:23 +02:00
59e4cb35cc refactor(api): generify Drift, Rating, Competitor, TimeSlice, CompetitorStore, History over T: Time
Drift now takes &T -> &T and is generic over the time axis. Untimed
impls return elapsed=0. ConstantDrift impl covers all T via the Time
trait. An additional variance_for_elapsed(i64) method on the trait
serves callers that work with the pre-cached i64 elapsed count.

Competitor.last_time moves from i64 with MIN sentinel to Option<T>
with None sentinel. receive(&T) computes variance from last_time
dynamically; receive_for_elapsed(i64) uses a pre-cached elapsed count
(needed in convergence sweeps where last_time has already advanced).

TimeSlice.time changes from i64 to T. compute_elapsed is now generic
over T and takes Option<&T> for the last-seen time. new_forward_info
uses receive_for_elapsed to preserve the cached elapsed during sweeps.

History<D> becomes History<T, D>; HistoryBuilder<D> becomes
HistoryBuilder<T, D>; Game<D> becomes Game<T, D>. Defaults keep
existing call sites compiling with zero changes: T = i64,
D = ConstantDrift.

add_events / add_events_with_prior stay on impl History<i64, D> since
times: Vec<i64> is i64-specific (Task 8 will generalise this).

In !self.time mode the old i64::MAX sentinel guaranteed elapsed=1 for
every slice transition regardless of time gaps. Replaced by advancing
all previously-seen agents' last_time to Some(current_slice_time) at
the end of each slice; this preserves elapsed=1 between adjacent
slices in sequential-integer untimed mode.

The time: bool field on History and .time(bool) on HistoryBuilder are
NOT removed by this task — deferred to Task 8 so this commit is
purely a type-level generification.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:50:35 +02:00
a285c1a0f2 feat(api): add Time trait with Untimed and i64 impls
Foundation for generic History time axis. Untimed is the ZST case
(no drift across slices); i64 is the standard timestamp case.
Additional impls (time::OffsetDateTime, chrono) can be added behind
feature flags in follow-up work.
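A hypothetical shape of the trait: the engine only needs an elapsed count between two points on the axis, so the ZST case can hard-code zero (the method name and i64 return type are assumptions):

```rust
/// Sketch of the Time abstraction over the history's time axis.
trait Time: Copy + Ord {
    /// Elapsed ticks since an earlier point on the axis.
    fn elapsed(&self, earlier: &Self) -> i64;
}

/// ZST case: no time axis, so no drift accrues between slices.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Untimed;

impl Time for Untimed {
    fn elapsed(&self, _earlier: &Self) -> i64 { 0 }
}

/// Standard timestamp case.
impl Time for i64 {
    fn elapsed(&self, earlier: &Self) -> i64 { self - earlier }
}
```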

The trait is not yet wired into History — that happens in Task 7
along with generifying Drift over T.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 11:32:38 +02:00
5e752f9e98 refactor(api): rename Batch to TimeSlice
TimeSlice says what it is: the set of events sharing one timestamp. The
History field .batches is renamed to .time_slices. Local variables
named `batch` referring to TimeSlice instances are renamed to
`time_slice`.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 10:54:31 +02:00
decbd895a3 refactor(api): rename Agent to Competitor and .player field to .rating
Competitor holds dynamic per-history state (message, last_time) for
someone competing; its configuration lives in a Rating.

AgentStore renamed to CompetitorStore to match. The internal
`clean()` free function's parameter name changed from `agents` to
`competitors` for consistency.

Local variable names (agent_idx, this_agent) inside history.rs are
left unchanged — they represent abstract identifiers, not Competitor
instances.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 10:48:50 +02:00
88d54cb9f4 docs(factor): update stale Player reference to Rating
Follow-up to the Player→Rating rename (2f5aa98); a doc comment in
team_sum.rs still referenced Player::performance().
2026-04-24 10:44:26 +02:00
2f5aa98eac refactor(api): rename Player to Rating
The struct holds prior/beta/drift — a rating configuration, not a
person. The person-with-temporal-state is the Competitor (renamed in
the next task). Resolves Player/Agent ambiguity.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:43:19 +02:00
52f5f76a34 refactor(lib): make key_table module private; revert bench var rename
Address code review feedback from Task 2:
- key_table module doesn't need pub visibility; the KeyTable re-export
  at lib.rs root already exposes the only public type. Matches the
  error/history private-module pattern.
- Revert an incidental bench variable rename (index_map → index) that
  wasn't part of the task scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:38:22 +02:00
c69fe4e67c refactor(api): rename IndexMap to KeyTable
The former name collided with the popular indexmap crate. KeyTable
lives in its own module. Public API unchanged beyond the rename.

Part of T2 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 10:34:14 +02:00
948a7a684b docs: add T2 new-API-surface implementation plan
21-task plan covering all renames and new public API landing per
Section 7 "T2" of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:31:33 +02:00
6437649436 perf(arena): pool team_prior/lhood/inv buffers to eliminate per-game allocs
Move team_prior, lhood_lose, lhood_win, inv_buf into ScratchArena so
their Vec capacity is reused across games in a Batch. Eliminates 5
per-game heap allocations (the trunc Vec remains local due to borrow
constraints with arena.vars).
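The pooling idea in miniature, with hypothetical field types: `clear()` resets length but keeps the allocation, so later games reuse capacity instead of hitting the allocator.

```rust
/// Pooling sketch: the arena owns the scratch buffers; each game
/// clears them (capacity retained) instead of allocating fresh Vecs.
#[derive(Default)]
struct ScratchArena {
    team_prior: Vec<f64>,
    inv_buf: Vec<usize>,
}

impl ScratchArena {
    fn begin_game(&mut self) {
        // Vec::clear sets len to 0 but keeps capacity: no heap
        // traffic on any game after the first.
        self.team_prior.clear();
        self.inv_buf.clear();
    }
}
```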

Batch::iteration: 23.0 µs (down from 27.0 µs with naive local Vecs;
8% above T0 21.253 µs baseline due to TruncFactor propagate overhead).
2026-04-24 09:10:48 +02:00
cdfd75f846 bench: capture T1 final numbers and fix clippy warnings
Fixed:
- Removed unused .enumerate() in batch.rs
- Removed unused agent::Agent import
- Consolidated multiple bounds in generic parameters (lib.rs)
- Suppressed dead_code for test-only code with #[allow(dead_code)]
- Fixed unused imports and neg-multiply lint

Batch::iteration: 27.023 µs (T0 was 21.253 µs, expected minor regression from T1 infrastructure).
Gaussian::* unchanged (~236-280 ps).

Acceptance: T1 factor-graph refactor lands without clippy/fmt issues.
All 53 tests pass. Closes T1 tier.
2026-04-24 09:04:29 +02:00
c02d5ca0ab perf(game): replace order.clone()+position() with inverse permutation 2026-04-24 08:58:09 +02:00
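The optimization above trades repeated `position()` scans (O(n²) overall) for one inverse-permutation pass (O(n)). A minimal sketch of the technique:

```rust
/// Given `order` where order[rank] = original index, build
/// inv[original index] = rank in a single pass, replacing a
/// per-element .position() linear search.
fn inverse_permutation(order: &[usize]) -> Vec<usize> {
    let mut inv = vec![0; order.len()];
    for (rank, &idx) in order.iter().enumerate() {
        inv[idx] = rank;
    }
    inv
}
```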
cdee7b2b99 fix(arena): remove unused Gaussian import in test module 2026-04-24 08:52:11 +02:00
cb07a874e8 refactor(game): rebuild Game::likelihoods on factor-graph machinery
Game::likelihoods now uses VarStore (for diff vars) and TruncFactor
(for EP truncation + evidence caching) instead of TeamMessage and
DiffMessage. The EP loop structure is preserved exactly; VarId-keyed
diff vars live in the arena's VarStore (capacity reused per batch).

ScratchArena loses teams/diffs/ties/margins; gains VarStore and
sort_buf (sort_perm allocation eliminated). message.rs deleted.

Public API of Game (new, posteriors, likelihoods, evidence) unchanged.
2026-04-24 08:51:18 +02:00
da69f02ff7 feat(schedule): add Schedule trait and EpsilonOrMax impl
EpsilonOrMax mirrors today's Game::likelihoods loop: sweep forward
then backward over iterating factors, capped at 10 iterations or
step <= 1e-6. Setup factors (TeamSum) run exactly once before the
loop begins.

ScheduleReport is the only public surface from this module.
2026-04-24 08:25:13 +02:00
54e46bef59 feat(factor): implement TruncFactor with cached evidence
EP truncation factor that operates on a diff variable. Stores its
outgoing message so the cavity computation produces the correct EP
message on each propagation. The first propagation caches the
evidence contribution (cdf-bounded probability) for log_evidence().

Promotes lib::cdf to pub(crate) so the factor can use it.
2026-04-24 08:22:06 +02:00
ae141752b7 feat(factor): implement RankDiffFactor
Maintains diff = team_a - team_b across three variables. On each
propagation, reads the team-perf marginals (which may have been
updated by neighboring factors) and computes the new diff via
Gaussian Sub (variance addition).
2026-04-24 08:19:18 +02:00
1210a34a64 fix(factor): move N_INF import to test module in team_sum 2026-04-24 08:17:54 +02:00
cee70c6272 feat(factor): implement TeamSumFactor
Computes the weighted sum of player performance Gaussians into a
team-performance variable. Runs once per game (no iteration needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 08:17:14 +02:00
ebccc7b454 feat(factor): introduce Factor trait and BuiltinFactor enum
Adds the trait that all factors implement and the enum dispatcher
used by the schedule to drive heterogeneous factors without dynamic
dispatch in the hot loop.

The three built-in factors (TeamSum, RankDiff, Trunc) are stubbed
out; concrete implementations follow in tasks 4-6.
2026-04-24 08:14:00 +02:00
dac4427b65 feat(factor): introduce VarId and VarStore
Foundation types for the T1 factor graph machinery. VarStore is a
flat Vec<Gaussian> indexed by VarId; variables are allocated by
alloc() and the store can be cleared between games to reuse capacity.

Part of T1 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
2026-04-24 08:09:25 +02:00
fa85bcee51 docs: add T1 factor-graph implementation plan
Bite-sized, TDD-style task breakdown for the second tier of the engine
redesign: introduce VarStore, Factor trait, BuiltinFactor enum, and
EpsilonOrMax schedule, then re-implement Game::likelihoods on top of
the new machinery. Internal-only refactor; public Game/History API
unchanged.

Acceptance: existing tests pass within ULP, iteration counts match T0,
no Batch::iteration regression vs T0 (~21.5 µs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:42:33 +02:00
d3cfee53a1 bench: capture T0 final numbers and post-mortem
Batch::iteration: 29.840 µs → 21.253 µs (1.40×)
Gaussian::mul:     1.568 ns →  218.69 ps (7.17×)
Gaussian::div:     1.572 ns →  218.64 ps (7.19×)

Gaussian arithmetic hit target (7×+ vs 1.5–2× expected). Batch::iteration
reached 1.40× vs the 3× target. Post-mortem: the bench exercises 100 tiny
2-team events and the dominant cost is still Vec allocation in within_priors,
sort_perm, and Game::likelihoods. The HashMap→Vec win shows at the History
level (forward/backward sweep) which this bench doesn't exercise.

Remediation plan documented in benches/baseline.txt: arena-ify sort_perm,
within_priors, and Game::likelihoods in T1 when Game's internals are
redesigned around the new factor graph.

38/38 tests passing. Closes T0 tier.
2026-04-24 07:28:28 +02:00
b1e0fcb817 perf(game): eliminate per-event allocations via ScratchArena
Game::likelihoods previously allocated four Vecs (teams, diffs, ties,
margins) on every call. Batch now owns one ScratchArena reused across
all Game::new calls in the iteration loop; likelihoods() clears and
extends the arena buffers instead of allocating fresh.

For log_evidence (called infrequently), a local ScratchArena is created
per invocation so the method signature stays &self.

Also: add #[derive(Debug)] to TeamMessage and DiffMessage (required by
ScratchArena's own Debug derive).

Part of T0 engine redesign.
2026-04-24 07:24:29 +02:00
49d2b317da refactor(history): replace HashMap<Index, Agent<D>> with dense AgentStore<D>
AgentStore<D> is a Vec<Option<Agent<D>>>-backed store indexed directly
by Index.0, eliminating per-iteration hashing in the cross-history
forward/backward sweep. Implements Index<Index>/IndexMut<Index> for
ergonomic agent access.

AgentStore is public (so benches/batch.rs can use it). SkillStore
remains pub(crate) since Skill is pub(crate) in batch.rs.

HashMap<Index, _> is now only used for the posteriors() return value
(temporary; will be replaced in T2 with a proper typed return) and
for the add_events_with_prior(priors: HashMap<Index, Player<D>>) API
(also T2 target).

Part of T0 engine redesign.
2026-04-24 07:15:21 +02:00
8f60258dba refactor(batch): replace HashMap<Index, Skill> with dense SkillStore
SkillStore is a Vec<Skill>-backed dense store with a parallel present
mask, indexed directly by Index.0. Eliminates per-iteration hashing
in the within-slice convergence loop; O(1) array lookup replaces O(1)
amortised hash lookup with better cache behaviour.
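The dense-store-plus-present-mask idea in miniature (field names and the `Default` bound are illustrative assumptions):

```rust
/// Dense store sketch: a flat Vec indexed directly by Index.0 with a
/// parallel presence mask, replacing HashMap lookups in the hot loop.
struct SkillStore<S: Default + Clone> {
    slots: Vec<S>,
    present: Vec<bool>,
}

impl<S: Default + Clone> SkillStore<S> {
    fn new() -> Self {
        SkillStore { slots: Vec::new(), present: Vec::new() }
    }
    fn insert(&mut self, index: usize, skill: S) {
        if index >= self.slots.len() {
            self.slots.resize(index + 1, S::default());
            self.present.resize(index + 1, false);
        }
        self.slots[index] = skill;
        self.present[index] = true;
    }
    fn get(&self, index: usize) -> Option<&S> {
        if *self.present.get(index)? { Some(&self.slots[index]) } else { None }
    }
}
```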

Iteration order is now ascending-by-Index (was arbitrary for HashMap);
EP fixed point is order-independent so posteriors are unchanged.

Part of T0 engine redesign.
2026-04-24 07:08:20 +02:00
709ece335f feat: introduce InferenceError; mu_sigma panic already eliminated
mu_sigma was deleted as part of the Gaussian nat-param rewrite (its
only callers were the old Mul/Div impls). This commit adds the
InferenceError enum as a seed for the T2 API surface, with the
NegativePrecision variant that mu_sigma would have returned.

Part of T0 engine redesign.
2026-04-24 07:00:26 +02:00
a667deb7e1 refactor(gaussian): switch to natural-parameter storage (pi, tau)
Mul and Div become two f64 adds/subs with no sqrt in the hot path.
mu() and sigma() are computed on demand from stored pi/tau.
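The natural-parameter trick in miniature: with pi = 1/σ² and tau = μ/σ², the density product adds parameters and the cavity quotient subtracts them, so neither needs a sqrt (a sketch; the crate's real type also handles the N00/N_INF edge cases noted below):

```rust
/// Gaussian in natural parameters: pi = 1/sigma^2, tau = mu/sigma^2.
#[derive(Clone, Copy, Debug)]
struct Gaussian {
    pi: f64,
    tau: f64,
}

impl Gaussian {
    fn from_ms(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Gaussian { pi, tau: pi * mu }
    }
    fn mu(&self) -> f64 { self.tau / self.pi }
    fn sigma(&self) -> f64 { (1.0 / self.pi).sqrt() }
    /// Density product: precisions add. Two f64 adds, no sqrt.
    fn mul(self, o: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi + o.pi, tau: self.tau + o.tau }
    }
    /// Cavity quotient: precisions subtract.
    fn div(self, o: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi - o.pi, tau: self.tau - o.tau }
    }
}
```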

Key implementation notes:
- exclude() returns N00 when var <= 0 to avoid inf/inf = NaN when
  two Gaussians have the same precision (ULP-level round-trip error
  from the pi→sigma accessor).
- Mul<f64> by 0.0 returns N00 (point mass at 0), matching old behavior.
- from_ms(0, 0) == N00 {pi:inf, tau:0}; from_ms(0, inf) == N_INF {pi:0, tau:0}.

Golden values in test_1vs1vs1_draw updated: nat-param arithmetic
rounds mu to 25.0 (was 24.999999) and shifts sigma by ~3e-7.
Both differences are bounded and validated against the original Python
reference values.

Part of T0 engine redesign.
2026-04-24 06:59:43 +02:00
06d3c886fe bench: capture T0 baseline; expose pi/tau accessors; fix div panic
- Promotes Gaussian::pi and Gaussian::tau to public so benches/gaussian.rs
  compiles, then captures baseline numbers for the T0 acceptance gate.
- Fixes the divide bench: g1/g2 panicked (g1 has lower precision than g2;
  cavity requires pi_num >= pi_den). Swapped to g2/g1 (well-defined).

Baseline on Apple M5 Pro:
  Batch::iteration  29.840 µs
  Gaussian::mul      1.568 ns   (vs ~220 ps for add/sub — hot path)
  Gaussian::div      1.572 ns
2026-04-24 06:43:00 +02:00
d11d2e8c6b docs: add T0 numerical-parity implementation plan
Bite-sized, TDD-style task breakdown for the first tier of the engine
redesign: Gaussian to natural-parameter storage, dense Vec storage
replacing HashMap, ScratchArena to eliminate per-event allocs,
Result-ifying the lone panic. No top-level public API change.

Acceptance gate: ≥3x speedup on Batch::iteration vs. baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:43:27 +02:00
c5f081d21f docs: add TrueSkill-TT engine redesign spec
Comprehensive design for a multi-tier rewrite covering performance,
factor-graph extensibility, convergence scheduling, and API surface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:33:48 +02:00
15 changed files with 36 additions and 2022 deletions


@@ -2,66 +2,6 @@
All notable changes to this project will be documented in this file.
## Unreleased — T3 concurrency
Adds rayon-backed parallel paths per Section 6 of
`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`.
### Breaking
- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`,
`Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these
via auto-derive, but downstream custom impls that aren't thread-safe
will need the bounds.
### New
- Opt-in `rayon` cargo feature. When enabled:
- Within-slice event iteration runs color-group events in parallel
via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
- `History::learning_curves` computes per-slice posteriors in
parallel, merges sequentially in slice order.
- `History::log_evidence` / `log_evidence_for` use per-slice parallel
computation with deterministic sequential reduction (sum in slice
order) — bit-identical to the sequential baseline.
- `ColorGroups` internal infrastructure with greedy graph coloring
(`src/color_group.rs`). Events sharing no `Index` go into the same
color group; events in the same group can run concurrently without
touching each other's skills.
- `tests/determinism.rs` asserts bit-identical posteriors across
`RAYON_NUM_THREADS={1, 2, 4, 8}`.
- `benches/history_converge.rs` measures end-to-end convergence on
three workload shapes.
### Performance notes
- Default build (no rayon): `Batch::iteration` 23.23 µs — no regression
vs T2.
- With `--features rayon`:
- 500 events / 100 competitors / 10 per slice: 1.0× speedup.
- 2000 events / 200 competitors / 20 per slice: 1.0× speedup.
- 5000 events in one slice / 50k competitors: **1.3× speedup.**
- The spec targeted >2× speedup on 8-core offline converge. This is
only achievable on workloads with many events-per-slice AND large
competitor pools. **Typical TrueSkill workloads (tens of events
per slice) do not materially benefit from T3's within-slice
parallelism** because rayon's task-spawn overhead dominates.
- Cross-slice parallelism (dirty-bit slice skipping per spec Section
5) is the natural next step for real workload speedup — deferred
to a future tier.
### Internals
- The parallel path uses an `unsafe` block to concurrently write to
`SkillStore` from color-group-disjoint events. Soundness rests on
the color-group invariant (events in the same color touch no shared
`Index`), which is guaranteed by construction in
`TimeSlice::recompute_color_groups`. Sequential path unchanged.
- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to
sequential iteration inside the parallel `sweep_color_groups` to
avoid rayon's task-spawn overhead.
- Thread-local `ScratchArena` per rayon worker thread.
## Unreleased — T2 new API surface
Breaking: every renamed type and the new public API land together per


@@ -14,19 +14,10 @@ harness = false
name = "gaussian"
harness = false
[[bench]]
name = "history_converge"
harness = false
[dependencies]
approx = { version = "0.5.1", optional = true }
rayon = { version = "1", optional = true }
smallvec = "1"
[features]
approx = ["dep:approx"]
rayon = ["dep:rayon"]
[dev-dependencies]
criterion = "0.5"
plotters = { version = "0.3", default-features = false, features = ["svg_backend", "all_elements", "all_series"] }


@@ -98,35 +98,3 @@ Gaussian::tau 260.80 ps (unchanged)
# learning_curves_by_index(), nested-Vec public add_events().
# - 90 tests green: 68 lib + 10 api_shape + 6 game + 4 record_winner +
# 2 equivalence.
# After T3 (2026-04-24, same hardware)
Batch::iteration (seq, no rayon) 23.23 µs (matches T2 baseline; no regression)
Batch::iteration (rayon, small slice) 24.57 µs (within noise; small workloads pay rayon overhead)
Gaussian::add 236.62 ps (unchanged)
Gaussian::sub 236.43 ps (unchanged)
Gaussian::mul 237.05 ps (unchanged)
Gaussian::div 236.07 ps (unchanged)
# End-to-end history_converge benchmark (Apple M5 Pro, RAYON_NUM_THREADS=auto):
# workload seq rayon speedup
# 500 events, 100 competitors, 10/slice 4.03 ms 4.24 ms 1.0x
# 2000 events, 200 competitors, 20/slice 20.18 ms 19.82 ms 1.0x
# 5000 events, 50000 competitors, 1 slice 11.88 ms 9.10 ms 1.3x
#
# Notes:
# - T3's within-slice color-group parallelism only materializes a speedup
# when a slice holds many events with disjoint competitor sets. Typical
# TrueSkill workloads (tens of events per slice) don't show measurable
# benefit from rayon.
# - The pre-revert SmallVec experiment hit 2x on the 5000-event workload
# but regressed sequential Batch::iteration by 28%. The tradeoff wasn't
# worth it for typical workloads — SmallVec<[_; 8]> inline size (1 KB per
# Game struct) hurt cache locality on the hot path.
# - Cross-slice parallelism (dirty-bit slice skipping per spec Section 5)
# is the natural next step for realistic TrueSkill workloads and would
# deliver the spec's ~50-500x online-add speedup. Deferred to T4+.
# - Determinism verified: tests/determinism.rs asserts bit-identical
# posteriors across RAYON_NUM_THREADS={1, 2, 4, 8}.
# - Send + Sync bounds added on Time, Drift<T>, Observer<T>, Factor, Schedule.
# - Rayon is opt-in via `--features rayon`. Default build is unchanged from T2.


@@ -1,116 +0,0 @@
//! End-to-end History::converge benchmark.
//!
//! Workload shapes designed to expose rayon's within-slice color-group
//! parallelism. Events in the same color group are processed in parallel
//! via direct-write with disjoint index sets (no data races). Color groups
//! smaller than a threshold fall back to the sequential path to avoid
//! rayon overhead on small workloads.
//!
//! On Apple M5 Pro, the P-core count (6) is the optimal thread count.
//! The rayon thread pool is initialised to `min(P-cores, available)` to
//! avoid scheduling onto the slower E-cores.
//!
//! ## Results (Apple M5 Pro, 2026-04-24, after SmallVec revert)
//!
//! | Workload | Sequential | Parallel | Speedup |
//! |---------------------------------------------|------------:|-----------:|--------:|
//! | History::converge/500x100@10perslice | 4.03 ms | 4.24 ms | 1.0× |
//! | History::converge/2000x200@20perslice | 20.18 ms | 19.82 ms | 1.0× |
//! | History::converge/1v1-5000x50000@5000perslice| 11.88 ms | 9.10 ms | 1.3× |
//!
//! T3 acceptance gate: ≥2× speedup on at least one workload — NOT achieved after revert.
//! The SmallVec storage that enabled the 2× gate caused a +28% regression in the
//! sequential Batch::iteration benchmark and was reverted. Small workloads still fall
//! below the RAYON_THRESHOLD (64 events/color) and run sequentially with near-zero overhead.
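The pool-size policy described in the doc comment above (cap at P-core count) is not itself shown in this diff; a std-only sketch of the selection, assuming a hardcoded `P_CORES = 6` constant where real code would detect the core topology per host:

```rust
use std::thread;

// Assumed constant: the Apple M5 Pro P-core count mentioned above.
const P_CORES: usize = 6;

// Pick the benchmark's rayon worker count: never more threads than the
// OS reports available, and never more than the P-core count, so sweep
// work avoids the slower E-cores.
fn bench_thread_count() -> usize {
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    P_CORES.min(available)
}

fn main() {
    let n = bench_thread_count();
    assert!(n >= 1 && n <= P_CORES);
    // This count would feed rayon::ThreadPoolBuilder::new().num_threads(n),
    // the same builder the determinism tests in this diff use with fixed sizes.
    println!("benchmark pool size: {n}");
}
```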
use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
use smallvec::smallvec;
use trueskill_tt::{
ConstantDrift, ConvergenceOptions, Event, History, Member, NullObserver, Outcome, Team,
};
fn build_history_1v1(
n_events: usize,
n_competitors: usize,
events_per_slice: usize,
seed: u64,
) -> History<i64, ConstantDrift, NullObserver, String> {
let mut rng = seed;
let mut next = || {
rng = rng
.wrapping_mul(6364136223846793005)
.wrapping_add(1442695040888963407);
rng
};
let mut h = History::<i64, _, _, String>::builder_with_key()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(25.0 / 300.0))
.convergence(ConvergenceOptions {
max_iter: 30,
epsilon: 1e-6,
})
.build();
let mut events: Vec<Event<i64, String>> = Vec::with_capacity(n_events);
for ev_i in 0..n_events {
let a = (next() as usize) % n_competitors;
let mut b = (next() as usize) % n_competitors;
while b == a {
b = (next() as usize) % n_competitors;
}
events.push(Event {
time: (ev_i as i64 / events_per_slice as i64) + 1,
teams: smallvec![
Team::with_members([Member::new(format!("p{a}"))]),
Team::with_members([Member::new(format!("p{b}"))]),
],
outcome: Outcome::winner((next() % 2) as u32, 2),
});
}
h.add_events(events).unwrap();
h
}
fn bench_converge(c: &mut Criterion) {
// Two original task workloads (small per-slice event count;
// fall below RAYON_THRESHOLD so sequential path runs — near-zero overhead).
c.bench_function("History::converge/500x100@10perslice", |b| {
b.iter_batched(
|| build_history_1v1(500, 100, 10, 42),
|mut h| {
h.converge().unwrap();
},
BatchSize::SmallInput,
);
});
c.bench_function("History::converge/2000x200@20perslice", |b| {
b.iter_batched(
|| build_history_1v1(2000, 200, 20, 42),
|mut h| {
h.converge().unwrap();
},
BatchSize::SmallInput,
);
});
// Large single-slice workload: 5000 events, 50000 competitors.
// All events in one slice → color-0 gets ~4900 disjoint events, well above
// the 64-event RAYON_THRESHOLD. 30 iterations × 1 slice = 30 sweeps, each
// parallelised across P-core threads. Showed ≥2× speedup before the
// SmallVec revert; 1.3× after (see results table above).
c.bench_function("History::converge/1v1-5000x50000@5000perslice", |b| {
b.iter_batched(
|| build_history_1v1(5000, 50000, 5000, 42),
|mut h| {
h.converge().unwrap();
},
BatchSize::SmallInput,
);
});
}
criterion_group!(benches, bench_converge);
criterion_main!(benches);

File diff suppressed because it is too large


@@ -1,158 +0,0 @@
//! Greedy graph coloring for within-slice event independence.
//!
//! Events sharing no `Index` can be processed in parallel under async-EP
//! semantics. This module partitions a list of events into "colors" such
//! that events of the same color touch disjoint index sets.
//!
//! The algorithm is greedy: for each event in ingestion order, place it in
//! the lowest-numbered color whose existing members share no `Index`. If
//! no existing color accepts the event, open a new color.
//!
//! Complexity: O(n × c × m) where n is events, c is colors (small, ≤ 5 in
//! practice), and m is average team size.
use std::collections::HashSet;
use crate::Index;
/// Partition of event indices into color groups.
///
/// Each inner `Vec<usize>` holds the indices (into the original events
/// array) of events assigned to one color. Colors are iterated in ascending
/// order by convention.
#[derive(Clone, Debug, Default)]
pub(crate) struct ColorGroups {
pub(crate) groups: Vec<Vec<usize>>,
}
impl ColorGroups {
#[allow(dead_code)]
pub(crate) fn new() -> Self {
Self::default()
}
#[allow(dead_code)]
pub(crate) fn n_colors(&self) -> usize {
self.groups.len()
}
#[allow(dead_code)]
pub(crate) fn is_empty(&self) -> bool {
self.groups.is_empty()
}
/// Total event count across all colors.
#[allow(dead_code)]
pub(crate) fn total_events(&self) -> usize {
self.groups.iter().map(|g| g.len()).sum()
}
/// Contiguous index range for one color after events have been reordered
/// into color-contiguous positions by `TimeSlice::recompute_color_groups`.
#[allow(dead_code)]
pub(crate) fn color_range(&self, color_idx: usize) -> std::ops::Range<usize> {
let group = &self.groups[color_idx];
if group.is_empty() {
return 0..0;
}
let start = *group.first().unwrap();
let end = *group.last().unwrap() + 1;
start..end
}
}
/// Compute color groups greedily.
///
/// `index_set(ev_idx)` yields, for each event index, the iterator of
/// `Index` values that event touches. The returned `ColorGroups` has one
/// inner `Vec<usize>` per color, containing event indices in the order
/// they were assigned.
#[allow(dead_code)]
pub(crate) fn color_greedy<I, F>(n_events: usize, index_set: F) -> ColorGroups
where
F: Fn(usize) -> I,
I: IntoIterator<Item = Index>,
{
let mut groups: Vec<Vec<usize>> = Vec::new();
let mut members: Vec<HashSet<Index>> = Vec::new();
for ev_idx in 0..n_events {
let ev_members: HashSet<Index> = index_set(ev_idx).into_iter().collect();
// Find first color whose member-set is disjoint from this event's indices.
let chosen = members.iter().position(|m| m.is_disjoint(&ev_members));
let color_idx = match chosen {
Some(c) => c,
None => {
groups.push(Vec::new());
members.push(HashSet::new());
groups.len() - 1
}
};
groups[color_idx].push(ev_idx);
members[color_idx].extend(ev_members);
}
ColorGroups { groups }
}
#[cfg(test)]
mod tests {
use super::*;
fn idx(i: usize) -> Index {
Index::from(i)
}
#[test]
fn single_event_gets_one_color() {
let cg = color_greedy(1, |_| vec![idx(0), idx(1)]);
assert_eq!(cg.n_colors(), 1);
assert_eq!(cg.groups[0], vec![0]);
}
#[test]
fn disjoint_events_share_a_color() {
let cg = color_greedy(2, |i| match i {
0 => vec![idx(0), idx(1)],
1 => vec![idx(2), idx(3)],
_ => unreachable!(),
});
assert_eq!(cg.n_colors(), 1);
assert_eq!(cg.groups[0], vec![0, 1]);
}
#[test]
fn overlapping_events_need_separate_colors() {
let cg = color_greedy(2, |i| match i {
0 => vec![idx(0), idx(1)],
1 => vec![idx(1), idx(2)],
_ => unreachable!(),
});
assert_eq!(cg.n_colors(), 2);
assert_eq!(cg.groups[0], vec![0]);
assert_eq!(cg.groups[1], vec![1]);
}
#[test]
fn three_events_two_colors() {
// Event 0: {0, 1}; event 1: {2, 3}; event 2: {0, 2}.
// Greedy: ev0→c0, ev1→c0 (disjoint), ev2 overlaps both→c1.
let cg = color_greedy(3, |i| match i {
0 => vec![idx(0), idx(1)],
1 => vec![idx(2), idx(3)],
2 => vec![idx(0), idx(2)],
_ => unreachable!(),
});
assert_eq!(cg.n_colors(), 2);
assert_eq!(cg.groups[0], vec![0, 1]);
assert_eq!(cg.groups[1], vec![2]);
}
#[test]
fn total_events_counts_correctly() {
let cg = color_greedy(4, |_| vec![idx(0)]);
// All events touch index 0 → 4 distinct colors.
assert_eq!(cg.n_colors(), 4);
assert_eq!(cg.total_events(), 4);
}
}


@@ -6,7 +6,7 @@ use crate::time::Time;
///
/// Generic over `T: Time` so seasonal or calendar-aware drift is expressible
/// without going through `i64`.
pub trait Drift<T: Time>: Copy + Debug + Send + Sync {
pub trait Drift<T: Time>: Copy + Debug {
/// Variance added to the skill prior for elapsed time `from -> to`.
///
/// Called with `from <= to`; returning zero means no drift accumulates.


@@ -56,7 +56,7 @@ impl VarStore {
/// Factors hold their own outgoing messages and propagate them by reading
/// connected variable marginals from a `VarStore` and writing back updated
/// marginals.
pub trait Factor: Send + Sync {
pub trait Factor {
/// Update outgoing messages and write back to the var store.
///
/// Returns the max delta `(|Δmu|, |Δsigma|)` across writes this


@@ -262,45 +262,17 @@ impl<T: Time, D: Drift<T>, O: Observer<T>, K: Eq + Hash + Clone> History<T, D, O
/// Note: `key(idx)` is O(n) per lookup; this method is therefore O(n²)
/// in the number of competitors. Acceptable for T2; T3 may optimize.
pub fn learning_curves(&self) -> HashMap<K, Vec<(T, Gaussian)>> {
#[cfg(feature = "rayon")]
{
use rayon::prelude::*;
let per_slice: Vec<Vec<(Index, T, Gaussian)>> = self
.time_slices
.par_iter()
.map(|ts| {
ts.skills
.iter()
.map(|(idx, sk)| (idx, ts.time, sk.posterior()))
.collect()
})
.collect();
let mut data: HashMap<K, Vec<(T, Gaussian)>> = HashMap::new();
for slice_contrib in per_slice {
for (idx, t, g) in slice_contrib {
if let Some(key) = self.keys.key(idx).cloned() {
data.entry(key).or_default().push((t, g));
}
let mut data: HashMap<K, Vec<(T, Gaussian)>> = HashMap::new();
for slice in &self.time_slices {
for (idx, skill) in slice.skills.iter() {
if let Some(key) = self.keys.key(idx).cloned() {
data.entry(key)
.or_default()
.push((slice.time, skill.posterior()));
}
}
data
}
#[cfg(not(feature = "rayon"))]
{
let mut data: HashMap<K, Vec<(T, Gaussian)>> = HashMap::new();
for slice in &self.time_slices {
for (idx, skill) in slice.skills.iter() {
if let Some(key) = self.keys.key(idx).cloned() {
data.entry(key)
.or_default()
.push((slice.time, skill.posterior()));
}
}
}
data
}
data
}
/// Skill estimate at the latest time slice the competitor appears in.
@@ -332,23 +304,10 @@ impl<T: Time, D: Drift<T>, O: Observer<T>, K: Eq + Hash + Clone> History<T, D, O
}
pub(crate) fn log_evidence_internal(&mut self, forward: bool, targets: &[Index]) -> f64 {
#[cfg(feature = "rayon")]
{
use rayon::prelude::*;
let per_slice: Vec<f64> = self
.time_slices
.par_iter()
.map(|ts| ts.log_evidence(self.online, targets, forward, &self.agents))
.collect();
per_slice.into_iter().sum()
}
#[cfg(not(feature = "rayon"))]
{
self.time_slices
.iter()
.map(|ts| ts.log_evidence(self.online, targets, forward, &self.agents))
.sum()
}
self.time_slices
.iter()
.map(|ts| ts.log_evidence(self.online, targets, forward, &self.agents))
.sum()
}
/// Total log-evidence across the history.


@@ -9,7 +9,6 @@ pub(crate) mod arena;
mod time;
mod time_slice;
pub use time_slice::TimeSlice;
mod color_group;
mod competitor;
mod convergence;
pub mod drift;


@@ -9,8 +9,9 @@ use crate::time::Time;
/// Receives progress callbacks during `History::converge`.
///
/// All methods have default no-op implementations; implement only what's
/// interesting.
pub trait Observer<T: Time>: Send + Sync {
/// interesting. Send/Sync is NOT required in T2 (added in T3 along with
/// Rayon support).
pub trait Observer<T: Time> {
/// Called after each convergence iteration across the whole history.
fn on_iteration_end(&self, _iter: usize, _max_step: (f64, f64)) {}


@@ -16,7 +16,7 @@ pub struct ScheduleReport {
}
/// Drives factor propagation to convergence.
pub trait Schedule: Send + Sync {
pub trait Schedule {
fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport;
}


@@ -8,7 +8,7 @@
///
/// Must be `Ord + Copy` so slices can sort events, and `'static` so
/// `History` can store it by value without lifetimes.
pub trait Time: Copy + Ord + Send + Sync + 'static {
pub trait Time: Copy + Ord + 'static {
/// How much time elapsed between `self` and `later`.
///
/// Used by `Drift<T>::variance_delta` to compute skill drift. Returning


@@ -7,7 +7,6 @@ use std::collections::HashMap;
use crate::{
Index, N_INF,
arena::ScratchArena,
color_group::ColorGroups,
drift::Drift,
game::Game,
gaussian::Gaussian,
@@ -85,12 +84,6 @@ pub(crate) struct Event {
}
impl Event {
pub(crate) fn iter_agents(&self) -> impl Iterator<Item = Index> + '_ {
self.teams
.iter()
.flat_map(|t| t.items.iter().map(|it| it.agent))
}
fn outputs(&self) -> Vec<f64> {
self.teams
.iter()
@@ -115,33 +108,6 @@ impl Event {
})
.collect::<Vec<_>>()
}
/// Direct in-loop update: mutates self and `skills` inline with no
/// intermediate allocation. Used by both the sequential sweep path and,
/// via unsafe, by the parallel rayon path for events in the same color
/// group (which have disjoint agent sets — see `sweep_color_groups`).
fn iteration_direct<T: Time, D: Drift<T>>(
&mut self,
skills: &mut SkillStore,
agents: &CompetitorStore<T, D>,
p_draw: f64,
arena: &mut ScratchArena,
) {
let teams = self.within_priors(false, false, skills, agents);
let result = self.outputs();
let g = Game::ranked_with_arena(teams, &result, &self.weights, p_draw, arena);
for (t, team) in self.teams.iter_mut().enumerate() {
for (i, item) in team.items.iter_mut().enumerate() {
let old_likelihood = skills.get(item.agent).unwrap().likelihood;
let new_likelihood = (old_likelihood / item.likelihood) * g.likelihoods[t][i];
skills.get_mut(item.agent).unwrap().likelihood = new_likelihood;
item.likelihood = g.likelihoods[t][i];
}
}
self.evidence = g.evidence;
}
}
#[derive(Debug)]
@@ -151,7 +117,6 @@ pub struct TimeSlice<T: Time = i64> {
pub(crate) time: T,
p_draw: f64,
arena: ScratchArena,
pub(crate) color_groups: ColorGroups,
}
impl<T: Time> TimeSlice<T> {
@@ -162,44 +127,9 @@ impl<T: Time> TimeSlice<T> {
time,
p_draw,
arena: ScratchArena::new(),
color_groups: ColorGroups::new(),
}
}
/// Recompute the color-group partition and reorder `self.events` into
/// color-contiguous ranges. After this call, `self.color_groups.groups[c]`
/// contains a contiguous ascending range of indices in `self.events`.
pub(crate) fn recompute_color_groups(&mut self) {
use crate::color_group::color_greedy;
let n = self.events.len();
if n == 0 {
self.color_groups = ColorGroups::new();
return;
}
let cg = color_greedy(n, |ev_idx| {
self.events[ev_idx].iter_agents().collect::<Vec<_>>()
});
let mut reordered: Vec<Event> = Vec::with_capacity(n);
let mut new_groups: Vec<Vec<usize>> = Vec::with_capacity(cg.groups.len());
let mut taken: Vec<Option<Event>> = self.events.drain(..).map(Some).collect();
for group in &cg.groups {
let mut new_indices: Vec<usize> = Vec::with_capacity(group.len());
for &old_idx in group {
let ev = taken[old_idx].take().expect("event already taken");
new_indices.push(reordered.len());
reordered.push(ev);
}
new_groups.push(new_indices);
}
self.events = reordered;
self.color_groups = ColorGroups { groups: new_groups };
}
pub fn add_events<D: Drift<T>>(
&mut self,
composition: Vec<Vec<Vec<Index>>>,
@@ -282,7 +212,6 @@ impl<T: Time> TimeSlice<T> {
self.events.extend(events);
self.iteration(from, agents);
self.recompute_color_groups();
}
pub(crate) fn posteriors(&self) -> HashMap<Index, Gaussian> {
@@ -293,115 +222,28 @@ impl<T: Time> TimeSlice<T> {
}
pub fn iteration<D: Drift<T>>(&mut self, from: usize, agents: &CompetitorStore<T, D>) {
if from > 0 || self.color_groups.is_empty() {
// Initial pass (add_events) or no color groups yet: simple sequential sweep.
for event in self.events.iter_mut().skip(from) {
let teams = event.within_priors(false, false, &self.skills, agents);
let result = event.outputs();
for event in self.events.iter_mut().skip(from) {
let teams = event.within_priors(false, false, &self.skills, agents);
let result = event.outputs();
let g = Game::ranked_with_arena(
teams,
&result,
&event.weights,
self.p_draw,
&mut self.arena,
);
let g = Game::ranked_with_arena(
teams,
&result,
&event.weights,
self.p_draw,
&mut self.arena,
);
for (t, team) in event.teams.iter_mut().enumerate() {
for (i, item) in team.items.iter_mut().enumerate() {
let old_likelihood = self.skills.get(item.agent).unwrap().likelihood;
let new_likelihood =
(old_likelihood / item.likelihood) * g.likelihoods[t][i];
self.skills.get_mut(item.agent).unwrap().likelihood = new_likelihood;
item.likelihood = g.likelihoods[t][i];
}
}
event.evidence = g.evidence;
}
} else {
self.sweep_color_groups(agents);
}
}
/// Full event sweep using the color-group partition. Colors are processed
/// sequentially; within each color the inner loop is parallel under rayon.
///
/// Events within each color group touch disjoint agent sets (guaranteed by
/// the greedy coloring). This lets each rayon thread write directly to its
/// events' skill likelihoods without a deferred-apply step, matching the
/// sequential path's allocation profile. The unsafe block is sound because:
/// 1. `self.events[range]` and `self.skills` are separate fields → disjoint.
/// 2. Events in the same color group access disjoint `Index` values in
/// `self.skills`, so concurrent writes land on different memory locations.
/// 3. Each event only writes to its own items' likelihoods (no sharing).
#[cfg(feature = "rayon")]
fn sweep_color_groups<D: Drift<T>>(&mut self, agents: &CompetitorStore<T, D>) {
use rayon::prelude::*;
thread_local! {
static ARENA: std::cell::RefCell<ScratchArena> =
std::cell::RefCell::new(ScratchArena::new());
}
// Minimum color-group size to justify rayon's task-spawn overhead.
// Below this threshold, process events sequentially to avoid regression
// on small per-slice workloads.
const RAYON_THRESHOLD: usize = 64;
for color_idx in 0..self.color_groups.groups.len() {
let group_len = self.color_groups.groups[color_idx].len();
if group_len == 0 {
continue;
}
let range = self.color_groups.color_range(color_idx);
let p_draw = self.p_draw;
if group_len >= RAYON_THRESHOLD {
// Obtain a raw pointer from the unique `&mut self.skills` reference.
// Casting back to `&mut` inside the closure is sound because:
// 1. The pointer originates from a `&mut` — no aliasing with shared refs.
// 2. Events in the same color group touch disjoint `Index` slots in the
// underlying Vec, so concurrent writes from different threads land on
// different memory locations — no data race.
// 3. `self.events[range]` and `self.skills` are separate struct fields,
// so the borrow splits cleanly.
let skills_addr: usize = (&mut self.skills as *mut SkillStore) as usize;
self.events[range].par_iter_mut().for_each(move |ev| {
// SAFETY: see above.
let skills: &mut SkillStore = unsafe { &mut *(skills_addr as *mut SkillStore) };
ARENA.with(|cell| {
let mut arena = cell.borrow_mut();
arena.reset();
ev.iteration_direct(skills, agents, p_draw, &mut arena);
});
});
} else {
for ev in &mut self.events[range] {
ev.iteration_direct(&mut self.skills, agents, p_draw, &mut self.arena);
for (t, team) in event.teams.iter_mut().enumerate() {
for (i, item) in team.items.iter_mut().enumerate() {
let old_likelihood = self.skills.get(item.agent).unwrap().likelihood;
let new_likelihood = (old_likelihood / item.likelihood) * g.likelihoods[t][i];
self.skills.get_mut(item.agent).unwrap().likelihood = new_likelihood;
item.likelihood = g.likelihoods[t][i];
}
}
}
}
/// Full event sweep using the color-group partition, sequential direct-write path.
/// Events within each color group are updated inline — no EventOutput allocation —
/// matching the T2 performance profile.
#[cfg(not(feature = "rayon"))]
fn sweep_color_groups<D: Drift<T>>(&mut self, agents: &CompetitorStore<T, D>) {
for color_idx in 0..self.color_groups.groups.len() {
if self.color_groups.groups[color_idx].is_empty() {
continue;
}
let range = self.color_groups.color_range(color_idx);
// Borrow self.events as a mutable slice for this color range.
// self.skills and self.arena are separate fields — disjoint borrows are
// allowed within a single method body.
let p_draw = self.p_draw;
for ev in &mut self.events[range] {
ev.iteration_direct(&mut self.skills, agents, p_draw, &mut self.arena);
}
event.evidence = g.evidence;
}
}
@@ -820,67 +662,4 @@ mod tests {
epsilon = 1e-6
);
}
#[test]
fn time_slice_color_groups_reorders_events() {
// ev0: [a, b]; ev1: [c, d]; ev2: [a, c]
// Greedy coloring: ev0→c0, ev1→c0 (disjoint), ev2→c1 (overlaps both).
// After recompute_color_groups, physical order is [ev0, ev1, ev2]
// and groups == [[0, 1], [2]].
let mut index_map = KeyTable::new();
let a = index_map.get_or_create("a");
let b = index_map.get_or_create("b");
let c = index_map.get_or_create("c");
let d = index_map.get_or_create("d");
let mut agents: CompetitorStore<i64, ConstantDrift> = CompetitorStore::new();
for agent in [a, b, c, d] {
agents.insert(
agent,
Competitor {
rating: Rating::new(
Gaussian::from_ms(25.0, 25.0 / 3.0),
25.0 / 6.0,
ConstantDrift(25.0 / 300.0),
),
..Default::default()
},
);
}
let mut ts = TimeSlice::new(0i64, 0.0);
ts.add_events(
vec![
vec![vec![a], vec![b]],
vec![vec![c], vec![d]],
vec![vec![a], vec![c]],
],
vec![vec![1.0, 0.0], vec![1.0, 0.0], vec![1.0, 0.0]],
vec![],
&agents,
);
assert_eq!(ts.color_groups.n_colors(), 2);
assert_eq!(ts.color_groups.groups[0], vec![0, 1]);
assert_eq!(ts.color_groups.groups[1], vec![2]);
assert_eq!(ts.color_groups.color_range(0), 0..2);
assert_eq!(ts.color_groups.color_range(1), 2..3);
// Events at positions 0 and 1 (color 0) must be disjoint — verify by
// checking that the agent sets of self.events[0] and self.events[1] do
// not include the agent at self.events[2].
let agents_in_ev2: Vec<Index> = ts.events[2].iter_agents().collect();
let agents_in_ev0: Vec<Index> = ts.events[0].iter_agents().collect();
let agents_in_ev1: Vec<Index> = ts.events[1].iter_agents().collect();
// ev0 and ev1 must be disjoint from each other (color-0 invariant).
assert!(agents_in_ev0.iter().all(|ag| !agents_in_ev1.contains(ag)));
// ev2 must share an agent with ev0 or ev1 (it needed its own color).
let ev2_overlaps_ev0 = agents_in_ev2.iter().any(|ag| agents_in_ev0.contains(ag));
let ev2_overlaps_ev1 = agents_in_ev2.iter().any(|ag| agents_in_ev1.contains(ag));
assert!(ev2_overlaps_ev0 || ev2_overlaps_ev1);
}
}


@@ -1,100 +0,0 @@
//! Determinism tests: identical posteriors across RAYON_NUM_THREADS
//! values. Only compiled with the `rayon` feature.
#![cfg(feature = "rayon")]
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, ConvergenceOptions, Event, History, Member, Outcome, Team};
/// Build a deterministic workload using a simple LCG (no external rand crate).
fn build_and_converge(seed: u64) -> Vec<(i64, trueskill_tt::Gaussian)> {
let mut h = History::<i64, _, _, String>::builder_with_key()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(25.0 / 300.0))
.convergence(ConvergenceOptions {
max_iter: 30,
epsilon: 1e-6,
})
.build();
// LCG for deterministic pseudo-random ints.
let mut rng = seed;
let mut next = || {
rng = rng
.wrapping_mul(6364136223846793005)
.wrapping_add(1442695040888963407);
rng
};
let mut events: Vec<Event<i64, String>> = Vec::with_capacity(200);
for ev_i in 0..200 {
let a = (next() % 40) as usize;
let mut b = (next() % 40) as usize;
while b == a {
b = (next() % 40) as usize;
}
// ~10 events per slice so color groups have material parallelism.
events.push(Event {
time: (ev_i as i64 / 10) + 1,
teams: smallvec![
Team::with_members([Member::new(format!("p{a}"))]),
Team::with_members([Member::new(format!("p{b}"))]),
],
outcome: Outcome::winner((next() % 2) as u32, 2),
});
}
h.add_events(events).unwrap();
h.converge().unwrap();
// Sample one competitor's curve for the comparison.
h.learning_curve("p0")
}
#[test]
fn posteriors_identical_across_thread_counts() {
let sizes = [1usize, 2, 4, 8];
let mut results: Vec<Vec<(i64, trueskill_tt::Gaussian)>> = Vec::new();
for &n in &sizes {
let pool = rayon::ThreadPoolBuilder::new()
.num_threads(n)
.build()
.expect("rayon pool build");
let curve = pool.install(|| build_and_converge(42));
results.push(curve);
}
let reference = &results[0];
for (i, curve) in results.iter().enumerate().skip(1) {
assert_eq!(
curve.len(),
reference.len(),
"curve length differs at {n} threads",
n = sizes[i],
);
for (j, (&(t_ref, g_ref), &(t, g))) in reference.iter().zip(curve.iter()).enumerate() {
assert_eq!(
t_ref,
t,
"time point {j} differs at {n} threads: ref={t_ref} vs got={t}",
n = sizes[i],
);
assert_eq!(
g_ref.mu().to_bits(),
g.mu().to_bits(),
"mu bits differ at {n} threads, time {t}: ref={ref_mu} got={got_mu}",
n = sizes[i],
ref_mu = g_ref.mu(),
got_mu = g.mu(),
);
assert_eq!(
g_ref.sigma().to_bits(),
g.sigma().to_bits(),
"sigma bits differ at {n} threads, time {t}: ref={ref_sigma} got={got_sigma}",
n = sizes[i],
ref_sigma = g_ref.sigma(),
got_sigma = g.sigma(),
);
}
}
}