tests/determinism.rs runs the same deterministic 200-event history
at thread counts {1, 2, 4, 8} via rayon::ThreadPoolBuilder::install
and asserts every (time, posterior) pair has bit-identical mu and
sigma across all configurations.
The test is cfg-gated on the rayon feature and compiles to nothing
under --features approx alone.
Verifies the T3 determinism invariant that the ordered-reduce
strategy (per-slice parallel, sequential sum) produces thread-count-
independent results.
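
A minimal sketch of the bit-identity check described above, assuming a
hypothetical run_history workload and using std::thread::scope in place of
rayon::ThreadPoolBuilder so the example has no dependencies. The names
Posterior, posterior_for, run_history, and bit_identical are illustrative
and are not taken from tests/determinism.rs.

```rust
use std::thread;

/// Hypothetical stand-in for one result row: (time, mu, sigma).
type Posterior = (i64, f64, f64);

/// Hypothetical deterministic per-event computation.
fn posterior_for(i: usize) -> Posterior {
    let x = i as f64;
    (i as i64, (x * 0.1).sin() * 25.0, (25.0 / 3.0) / (1.0 + x).sqrt())
}

/// Computes one posterior per event, splitting the work across `threads`
/// workers (assumed >= 1) and concatenating the chunks in event order.
fn run_history(events: usize, threads: usize) -> Vec<Posterior> {
    let ids: Vec<usize> = (0..events).collect();
    let chunk = ((events + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Spawn every worker first, then join in spawn order.
        let handles: Vec<_> = ids
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|&i| posterior_for(i)).collect::<Vec<_>>()))
            .collect();
        let mut out = Vec::with_capacity(events);
        for h in handles {
            out.extend(h.join().expect("worker panicked"));
        }
        out
    })
}

/// Compares floats by bit pattern, which is stricter than `==`
/// (it distinguishes 0.0 from -0.0 and treats equal NaN payloads as equal).
fn bit_identical(a: &[Posterior], b: &[Posterior]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            x.0 == y.0 && x.1.to_bits() == y.1.to_bits() && x.2.to_bits() == y.2.to_bits()
        })
}
```

A test in this style would take run_history(200, 1) as the baseline and
assert bit_identical against the runs at 2, 4, and 8 threads.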
Part of T3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-slice log_evidence contribution computed in parallel under
--features rayon; final reduction is sequential .into_iter().sum()
on Vec<f64>, preserving slice order so the sum is bit-identical to
the sequential T2 baseline.
Essential for the T3 acceptance criterion of identical posteriors
across RAYON_NUM_THREADS values.
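
The ordered-reduce idea can be sketched as follows. This is an
illustration under assumptions, not the crate's code: slice_log_evidence is
a hypothetical per-slice computation, and std::thread::scope stands in for
the rayon parallel map. The point is that only the per-slice work runs
concurrently; the floating-point additions happen in slice order.

```rust
use std::thread;

/// Hypothetical per-slice contribution: sum of ln(p) over the slice.
fn slice_log_evidence(slice: &[f64]) -> f64 {
    slice.iter().map(|p| p.ln()).sum()
}

/// Sequential baseline: one pass, left-to-right sum.
fn log_evidence_sequential(slices: &[Vec<f64>]) -> f64 {
    slices.iter().map(|s| slice_log_evidence(s)).sum()
}

/// Parallel map, sequential reduce. Each slice is evaluated on its own
/// scoped thread (wasteful for real workloads, fine for a sketch), and the
/// results are joined in slice order before being summed.
fn log_evidence_ordered(slices: &[Vec<f64>]) -> f64 {
    let per_slice: Vec<f64> = thread::scope(|scope| {
        let handles: Vec<_> = slices
            .iter()
            .map(|s| scope.spawn(move || slice_log_evidence(s)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    // Same addition order as the baseline, so the same rounding.
    per_slice.into_iter().sum()
}
```

A parallel tree reduction would be faster for many slices, but float
addition is not associative, so its result could depend on how the work
was split.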
Part of T3.
Per-slice posterior collection runs in parallel via par_iter; merge
into the per-key HashMap is sequential in slice order so iteration
order and HashMap insertion order are identical to the sequential
impl. Preserves deterministic output across thread counts.
Default-feature (no rayon) build unchanged — uses the T2 sequential
impl.
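
A rough sketch of the collect-then-merge pattern, with assumed types: the
key type, the Posterior alias, and collect_slice are hypothetical, and
scoped threads stand in for par_iter. Collection is concurrent; the merge
loop runs on one thread and visits slices in index order.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical posterior: (mu, sigma).
type Posterior = (f64, f64);

/// Hypothetical per-slice collection: one posterior per key.
fn collect_slice(slice_id: usize) -> Vec<(String, Posterior)> {
    ["a", "b"]
        .iter()
        .map(|k| (k.to_string(), (slice_id as f64, 1.0)))
        .collect()
}

/// Collects every slice concurrently, then merges sequentially so each
/// key's history is appended in slice order.
fn merge_in_slice_order(n_slices: usize) -> HashMap<String, Vec<(usize, Posterior)>> {
    let per_slice: Vec<Vec<(String, Posterior)>> = thread::scope(|scope| {
        let handles: Vec<_> = (0..n_slices)
            .map(|i| scope.spawn(move || collect_slice(i)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    let mut by_key: HashMap<String, Vec<(usize, Posterior)>> = HashMap::new();
    for (slice_id, entries) in per_slice.into_iter().enumerate() {
        for (key, posterior) in entries {
            by_key.entry(key).or_default().push((slice_id, posterior));
        }
    }
    by_key
}
```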
Part of T3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The compute/apply split introduced in 3680c54 was always active — the
sequential build paid EventOutput heap-alloc overhead even without
rayon, regressing Batch::iteration from 23.46 µs to 33.79 µs (+44%).
This commit makes the split feature-gated: under cfg(feature = "rayon")
the compute/apply pattern stays (needed for par_iter); under
cfg(not(feature = "rayon")) events update SkillStore inline via
Event::iteration_direct, matching the T2 performance profile.
EventOutput, Event::compute, and Event::apply_output are now
cfg(feature = "rayon")-only. TimeSlice::sweep_color_groups has two
cfg-gated implementations sharing the same signature.
Batch::iteration is back to 23.29 µs on the sequential build; the rayon
build measures 34.31 µs. That overhead is expected on a workload this
small, since the rayon thread pool only amortizes at larger scales.
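
The shape of the gating, as a simplified sketch. The identifiers
SkillStore, Event, EventOutput, compute, apply_output, and iteration_direct
come from this commit; every field, signature, and body below is assumed
for illustration, and the free function sweep stands in for
TimeSlice::sweep_color_groups. The rayon branch uses a plain iterator where
the real code would use par_iter.

```rust
/// Hypothetical store: one mean per agent index.
struct SkillStore {
    mu: Vec<f64>,
}

/// Hypothetical event touching a single agent.
struct Event {
    agent: usize,
    delta: f64,
}

/// Owned result of the compute phase; only exists in the rayon build.
#[cfg(feature = "rayon")]
struct EventOutput {
    agent: usize,
    new_mu: f64,
}

#[cfg(feature = "rayon")]
impl Event {
    fn compute(&self, store: &SkillStore) -> EventOutput {
        EventOutput { agent: self.agent, new_mu: store.mu[self.agent] + self.delta }
    }

    fn apply_output(out: EventOutput, store: &mut SkillStore) {
        store.mu[out.agent] = out.new_mu;
    }
}

#[cfg(not(feature = "rayon"))]
impl Event {
    /// Updates the store inline: no intermediate allocation.
    fn iteration_direct(&self, store: &mut SkillStore) {
        store.mu[self.agent] += self.delta;
    }
}

/// Rayon build: compute into a Vec, then apply sequentially.
#[cfg(feature = "rayon")]
fn sweep(events: &[Event], store: &mut SkillStore) {
    let snapshot: &SkillStore = store;
    let outputs: Vec<EventOutput> = events.iter().map(|e| e.compute(snapshot)).collect();
    for out in outputs {
        Event::apply_output(out, store);
    }
}

/// Default build: same signature, direct in-place updates.
#[cfg(not(feature = "rayon"))]
fn sweep(events: &[Event], store: &mut SkillStore) {
    for e in events {
        e.iteration_direct(store);
    }
}
```

In this sketch the two branches agree only when the events passed to one
call touch disjoint agents, which is the property the color groups provide.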
Part of T3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Under #[cfg(feature = "rayon")], the per-iteration event sweep
processes events color-by-color: within a color, events touch
disjoint Index values by construction, so par_iter is safe.
Across colors, sequential ordering preserves async-EP semantics.
Event::compute() is a pure function returning an owned EventOutput
(new per-item likelihoods, evidence, and pre-computed new skill
likelihoods). The apply phase runs sequentially after the parallel
map, writing EventOutput values back to SkillStore and each event's
item likelihoods. This avoids shared mutable state in the hot loop.
Default build (no rayon) uses a sequential fallback that traverses
the same color-group order — behaviorally identical to the parallel
path. This keeps goldens bit-identical across feature configurations.
Scenario 3b applied: event updates read from and write to the shared
SkillStore, so the compute/apply split (Option A) was necessary.
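
A simplified sketch of the color-by-color sweep, under stated assumptions:
the struct fields and the update rule inside compute are invented for
illustration, the color layout is passed in as ranges, and scoped std
threads replace par_iter. It shows the two phases per color: a read-only
parallel compute against the store, then a sequential apply.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical store: one mean per agent index.
struct SkillStore {
    mu: Vec<f64>,
}

/// Hypothetical event: the agent indices it touches.
struct Event {
    agents: Vec<usize>,
}

/// Owned result of the compute phase: (agent, new mean) pairs.
struct EventOutput {
    updates: Vec<(usize, f64)>,
}

impl Event {
    /// Pure: reads the store and returns an owned output.
    /// The update rule (pull each agent halfway toward the event mean)
    /// is a placeholder, not the real message-passing step.
    fn compute(&self, store: &SkillStore) -> EventOutput {
        let mean =
            self.agents.iter().map(|&a| store.mu[a]).sum::<f64>() / self.agents.len() as f64;
        let updates = self
            .agents
            .iter()
            .map(|&a| (a, 0.5 * (store.mu[a] + mean)))
            .collect();
        EventOutput { updates }
    }
}

/// Colors run one after another; events inside a color run concurrently.
fn sweep_color_groups(events: &[Event], colors: &[Range<usize>], store: &mut SkillStore) {
    for range in colors {
        let group = &events[range.clone()];
        // Compute phase: shared read-only borrow of the store.
        let snapshot: &SkillStore = store;
        let outputs: Vec<EventOutput> = thread::scope(|scope| {
            let handles: Vec<_> = group
                .iter()
                .map(|e| scope.spawn(move || e.compute(snapshot)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("worker panicked"))
                .collect()
        });
        // Apply phase: sequential writes, in event order.
        for out in outputs {
            for (agent, mu) in out.updates {
                store.mu[agent] = mu;
            }
        }
    }
}
```

Because events in one color touch disjoint agents, no event in the compute
phase can observe a value that another event in the same color is about to
write, so the result does not depend on thread scheduling.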
Part of T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
TimeSlice gains a color_groups field of type ColorGroups, recomputed
whenever events change. After recompute, self.events is physically
reordered so color-0 events are first, then color-1, etc. Each color
is therefore a contiguous range of indices in self.events —
the invariant that Task 6's parallel par_iter_mut exploits.
Greedy coloring via crate::color_group::color_greedy; agent indices
come from Event::iter_agents. ColorGroups gains a color_range helper
that returns the contiguous Range<usize> for a given color.
Numerical behavior unchanged: async-EP is order-independent at
convergence, so event reordering does not affect goldens.
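
One possible way to get the contiguous layout, sketched with an assumed
representation: here ColorGroups stores prefix offsets, which may differ
from the real struct, and reorder_by_color is a hypothetical helper that
takes a precomputed color per event. Only the name color_range comes from
this commit.

```rust
use std::ops::Range;

/// Assumed layout: color c occupies offsets[c]..offsets[c + 1].
struct ColorGroups {
    offsets: Vec<usize>,
}

impl ColorGroups {
    /// Contiguous index range of the given color in the reordered events.
    fn color_range(&self, color: usize) -> Range<usize> {
        self.offsets[color]..self.offsets[color + 1]
    }

    fn n_colors(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }
}

/// Moves events into color order (color 0 first, then color 1, ...),
/// keeping the original relative order inside each color.
fn reorder_by_color<E>(events: Vec<E>, colors: &[usize]) -> (Vec<E>, ColorGroups) {
    assert_eq!(events.len(), colors.len());
    let n_colors = colors.iter().max().map_or(0, |&m| m + 1);
    let mut buckets: Vec<Vec<E>> = (0..n_colors).map(|_| Vec::new()).collect();
    for (event, &color) in events.into_iter().zip(colors) {
        buckets[color].push(event);
    }

    let mut offsets = vec![0];
    let mut reordered = Vec::new();
    for bucket in buckets {
        reordered.extend(bucket);
        offsets.push(reordered.len());
    }
    (reordered, ColorGroups { offsets })
}
```

With this layout a sweep can slice `&mut events[groups.color_range(c)]` and
hand that slice to a parallel iterator without any index indirection.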
Part of T3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ColorGroups holds a partition of event indices into color groups such
that events of the same color touch no shared Index. Computed greedily
in ingestion order: each event goes into the first color whose existing
members are disjoint from the event's indices.
Used in T3 for safe within-slice parallelism — events in the same
color can run concurrently without touching each other's skills.
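
The greedy rule above can be sketched as follows. The function name,
signature, and return type are assumptions for illustration; the real
crate::color_group::color_greedy may differ. Each event is given as the
list of indices it touches.

```rust
use std::collections::HashSet;

/// Greedy first-fit coloring in ingestion order.
/// Returns, per color, the ids of the events assigned to it.
fn color_greedy_sketch(events: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    // used[c] holds every index touched by some event of color c.
    let mut used: Vec<HashSet<usize>> = Vec::new();

    for (event_id, indices) in events.iter().enumerate() {
        // First color whose members share no index with this event.
        let fit = (0..groups.len()).find(|&c| indices.iter().all(|i| !used[c].contains(i)));
        let color = fit.unwrap_or_else(|| {
            // No existing color fits: open a new one.
            groups.push(Vec::new());
            used.push(HashSet::new());
            groups.len() - 1
        });
        groups[color].push(event_id);
        used[color].extend(indices.iter().copied());
    }
    groups
}
```

For the events [0,1], [2,3], [1,2], [0,3], [0,1] this yields three colors:
events 0 and 1 share color 0, events 2 and 3 share color 1, and event 4
conflicts with both and opens color 2. First-fit does not guarantee the
minimum number of colors.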
Part of T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Required for T3 rayon-based parallelism. Affected traits:
- Time (+ Send + Sync + 'static)
- Drift<T> (+ Send + Sync)
- Observer<T> (+ Send + Sync)
- Factor (+ Send + Sync)
- Schedule (+ Send + Sync)
All built-in impls (i64, Untimed, ConstantDrift, NullObserver,
EpsilonOrMax, TeamSumFactor, RankDiffFactor, TruncFactor,
BuiltinFactor) satisfy these bounds automatically, because Send and
Sync are auto traits.
Minor breaking change: downstream custom impls on types that are not
already Send + Sync will no longer compile until the type is made
thread-safe.
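
What the supertrait bound means for implementors, as a small sketch. The
trait name Drift and the type name ConstantDrift come from this commit;
the variance method, the tuple-struct layout, and the helper functions are
hypothetical.

```rust
use std::thread;

/// Sketch of a trait with the new supertrait bounds.
/// The `variance` method is invented for this example.
trait Drift<T>: Send + Sync {
    fn variance(&self, elapsed: T) -> f64;
}

/// Plain data, so Send and Sync are implemented automatically.
struct ConstantDrift(f64);

impl Drift<i64> for ConstantDrift {
    fn variance(&self, elapsed: i64) -> f64 {
        self.0 * self.0 * elapsed as f64
    }
}

/// Compile-time check: only instantiable for thread-safe types.
fn assert_send_sync<T: Send + Sync>() {}

/// Shares one drift object across scoped threads. This compiles only
/// because `dyn Drift<i64>` is Sync through the supertrait bound.
fn total_variance(drift: &dyn Drift<i64>, gaps: &[i64]) -> f64 {
    thread::scope(|scope| {
        let handles: Vec<_> = gaps
            .iter()
            .map(|&g| scope.spawn(move || drift.variance(g)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum()
    })
}
```

A downstream type holding, say, an Rc or a RefCell would be rejected at its
impl block under these bounds; switching to Arc or a Mutex is the usual
fix.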
Part of T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Opt-in feature flag — users who want parallel paths build with
--features rayon. Default build remains single-threaded.
Spec Section 6 calls for default-on; we defer that flip until the
feature is stable under field use.
Part of T3 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
11-task plan for rayon-backed within-slice parallelism per
Section 6 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>