Outcome::Scored becomes a struct variant with an Option<f64> sigma
field. None inherits HistoryBuilder::score_sigma; Some(s) overrides
per event. Resolved at ingest time so EventKind::Scored stays a plain
f64 and TimeSlice/run_chain need zero changes. New constructors
Outcome::scores_with_sigma and EventBuilder::scores_with_sigma cover
the override path; existing scores(..) keeps its signature with
sigma=None internally.
Breaking change to Outcome::Scored variant shape (tuple → struct);
acceptable in 0.1.x. Closes the last item from the T4-MarginFactor
deferred wishlist.
Closes the gap between HistoryBuilder::convergence(opts) and the
within-game inference loop. TimeSlice gains a convergence field;
History passes self.convergence at construction; the three
Game::*_with_arena callsites in time_slice.rs read it. Also renames
TimeSlice::convergence the method (now iterate_to_convergence) to
disambiguate from the new field.
Pure plumbing — no new public API, no behavioral change for users on
default options. Makes Damped EP reachable through the History path.
Six tasks: Gaussian::damp_natural helper, ConvergenceOptions::alpha
field, TruncFactor and MarginFactor propagate_with_alpha pair, DiffFactor
+ Game integration (the big task — must land atomically), and
end-to-end tests for max_iter and alpha behavior.
Smallest-scope realisation of spec §"Built-in schedules" Damped: a
ConvergenceOptions::alpha field plumbed through run_chain to a new
Gaussian::damp_natural helper applied inside TruncFactor and
MarginFactor's propagate. alpha=1.0 default keeps every existing
golden bit-equal; alpha<1.0 stabilises oscillating fixed-point loops
on hard graphs.
Defers Schedule trait integration, nat-param convergence switch,
oscillation auto-detect, Residual/OneShot, and Synergy/ScoreFactor —
each gets its own future plan.
The plan's prose quoted Z_cav ≈ 0.046827 and log_evidence ≈ -3.0613,
which diverged from the values asserted by the shipped test in
src/factor/mod.rs (-3.062235327364623). Update prose and the matching
code comment to 0.04678 / -3.0622.
Three independent cleanups: dedupe Game::likelihoods and likelihoods_scored
via a run_chain helper taking a make_link closure, make BuiltinFactor's
log_evidence match exhaustive, and fix stale numerics in the T4 plan doc.
Implements T3 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md` Section 6. Plan: `docs/superpowers/plans/2026-04-24-t3-concurrency.md` (11 tasks).
## Summary
### Breaking
- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`, `Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these via auto-derive; downstream custom impls will need the bounds.
### New
- Opt-in `rayon` cargo feature. When enabled:
- Within-slice event iteration runs color-group events in parallel via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
- `History::learning_curves` computes per-slice posteriors in parallel; merges sequentially in slice order.
- `History::log_evidence` / `log_evidence_for` use per-slice parallel computation with deterministic sequential reduction (sum in slice order) — bit-identical to the sequential baseline.
- `ColorGroups` infrastructure (`src/color_group.rs`) with greedy graph coloring. Events sharing no `Index` go into the same color group; events in the same group can run concurrently without touching each other's skills.
- `tests/determinism.rs` asserts bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}`.
- `benches/history_converge.rs` measures end-to-end convergence on three workload shapes.
## Performance
### Sequential (no rayon, default build)
| Metric | Before T3 | After T3 | Delta |
|---|---|---|---|
| `Batch::iteration` | 22.88 µs | 23.23 µs | **+1.5%** (noise) |
| `Gaussian::*` | ≈218–264 ps | ≈236 ps | within noise |
**No sequential regression.** Default build is as fast as T2.
### Parallel (`--features rayon`, Apple M5 Pro, auto thread count)
| Workload | Sequential | Parallel | Speedup |
|---|---:|---:|---:|
| 500 events / 100 competitors / 10 per slice | 4.03 ms | 4.24 ms | **1.0×** |
| 2000 events / 200 competitors / 20 per slice | 20.18 ms | 19.82 ms | **1.0×** |
| 5000 events / 50000 competitors / 1 slice | 11.88 ms | 9.10 ms | **1.3×** |
### ⚠️ The spec's >=2× target was not met on realistic workloads.
T3's within-slice color-group parallelism only shows material benefit when a slice holds many events AND the competitor pool is large enough to give the greedy coloring room to partition. Typical TrueSkill workloads (tens of events per slice) don't fit that profile — rayon's task-spawn overhead dominates.
**Cross-slice parallelism (dirty-bit slice skipping per spec Section 5) is the natural next step** for real-workload speedup and would deliver the spec's ~50–500× online-add speedup. Deferred to a future tier.
## Determinism
`tests/determinism.rs` runs a 200-event history at thread counts {1, 2, 4, 8} via `rayon::ThreadPoolBuilder::install` and asserts every `(time, posterior)` pair has bit-identical `mu` and `sigma` (compared via `f64::to_bits()`). Passes.
## Internals
- Parallel path uses an `unsafe` block to concurrently write to `SkillStore` from color-group-disjoint events. Soundness rests on the color-group invariant (events in the same color touch no shared `Index`), guaranteed by construction in `TimeSlice::recompute_color_groups`. Sequential path unchanged from T2.
- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to sequential inside `sweep_color_groups` to avoid task-spawn overhead.
- Thread-local `ScratchArena` per rayon worker thread.
## Test plan
- [x] `cargo test --features approx` — 96 tests pass (74 lib + 22 integration)
- [x] `cargo test --features approx,rayon` — 97 tests pass (+1 determinism)
- [x] `cargo clippy --all-targets --features approx -- -D warnings` — clean
- [x] `cargo clippy --all-targets --features approx,rayon -- -D warnings` — clean
- [x] `cargo +nightly fmt --check` — clean
- [x] `cargo bench --bench batch --features approx` — 23.23 µs (no regression vs T2)
- [x] `cargo bench --bench history_converge --features approx,rayon` — runs on all three workloads
- [x] Bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}` — verified
## Commit history
13 commits on `t3-concurrency`. Each task is self-contained and bisectable. See `git log main..t3-concurrency` for the full list.
## Deferred
- **Cross-slice parallelism** (dirty-bit slice skipping) — the path that would actually speed up typical TrueSkill workloads.
- **Default-on `rayon` feature** — spec called for default-on; we keep it opt-in until the feature proves stable in production use.
- **Synchronous-EP schedule with barrier merge** — alternative parallel strategy per spec Section 6.
- **`MarginFactor` / `Outcome::Scored`** — T4.
- **`Damped` / `Residual` schedules** — T4.
- **N-team `predict_outcome`** — T4.
- **`Game::custom` full ergonomics** — T4.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Reviewed-on: #2
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>