
Anders Olsson 6bf3e7e294 T3: rayon-backed concurrency (opt-in) (#2)

Implements T3 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md` Section 6. Plan: `docs/superpowers/plans/2026-04-24-t3-concurrency.md` (11 tasks).

## Summary

### Breaking

- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`, `Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these automatically because `Send` and `Sync` are auto traits; downstream custom impls must satisfy the bounds as well.
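
A minimal sketch of what the new bounds mean for a downstream impl. The `Observer` trait below is a hypothetical stand-in with an assumed method name, not the crate's actual `Observer<T>` definition; the point is only that any state held by the implementor must itself be thread-safe.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical trait mirroring the shape of the new bounds; the method name
/// is an assumption, not the crate's real API.
trait Observer: Send + Sync {
    fn on_iteration(&self, step: usize);
}

/// Interior mutability must be thread-safe: `AtomicUsize` is `Sync`, whereas
/// a `Cell<usize>` or `Rc<_>` field would make the impl below fail to compile.
#[derive(Default)]
struct CountingObserver {
    calls: AtomicUsize,
}

impl Observer for CountingObserver {
    fn on_iteration(&self, _step: usize) {
        self.calls.fetch_add(1, Ordering::Relaxed);
    }
}

/// Compile-time check that a type can be shared across worker threads.
fn assert_send_sync<T: Send + Sync>() {}
```

Calling `assert_send_sync::<CountingObserver>()` in a test is a cheap way to catch a custom impl that silently stopped being `Send + Sync`.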

### New

- Opt-in `rayon` cargo feature. When enabled:
  - Within-slice event iteration runs color-group events in parallel via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
  - `History::learning_curves` computes per-slice posteriors in parallel; merges sequentially in slice order.
  - `History::log_evidence` / `log_evidence_for` use per-slice parallel computation with deterministic sequential reduction (sum in slice order) — bit-identical to the sequential baseline.
- `ColorGroups` infrastructure (`src/color_group.rs`) with greedy graph coloring. Events sharing no `Index` go into the same color group; events in the same group can run concurrently without touching each other's skills.
- `tests/determinism.rs` asserts bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}`.
- `benches/history_converge.rs` measures end-to-end convergence on three workload shapes.
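
The greedy coloring idea can be sketched as follows. This is an illustration under assumptions, not the code in `src/color_group.rs`: `Index` is modeled as a plain `usize`, events are lists of competitor indices, and the function name is hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's competitor `Index`.
type Index = usize;

/// Greedily assign each event to the first color group whose events share no
/// `Index` with it. Returns, per color, the positions of the events in it.
fn greedy_color_groups(events: &[Vec<Index>]) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    // Competitors already claimed by each color group.
    let mut used: Vec<HashSet<Index>> = Vec::new();

    for (event_id, competitors) in events.iter().enumerate() {
        // First color in which none of this event's competitors appear.
        let slot = used
            .iter()
            .position(|seen| competitors.iter().all(|c| !seen.contains(c)));
        let color = match slot {
            Some(color) => color,
            None => {
                // No compatible group exists yet: open a new color.
                groups.push(Vec::new());
                used.push(HashSet::new());
                groups.len() - 1
            }
        };
        groups[color].push(event_id);
        used[color].extend(competitors.iter().copied());
    }
    groups
}
```

For the events `[0,1]`, `[2,3]`, `[1,2]`, `[0,3]`, the first two land in one color and the last two in a second color, since each pair shares no competitor.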

## Performance

### Sequential (no rayon, default build)

| Metric | Before T3 | After T3 | Delta |
|---|---|---|---|
| `Batch::iteration` | 22.88 µs | 23.23 µs | **+1.5%** (noise) |
| `Gaussian::*` | ≈218–264 ps | ≈236 ps | within noise |

**No sequential regression.** Default build is as fast as T2.

### Parallel (`--features rayon`, Apple M5 Pro, auto thread count)

| Workload | Sequential | Parallel | Speedup |
|---|---:|---:|---:|
| 500 events / 100 competitors / 10 per slice | 4.03 ms | 4.24 ms | **1.0×** |
| 2000 events / 200 competitors / 20 per slice | 20.18 ms | 19.82 ms | **1.0×** |
| 5000 events / 50000 competitors / 1 slice | 11.88 ms | 9.10 ms | **1.3×** |

### ⚠️ The spec's ≥2× target was not met on realistic workloads.

T3's within-slice color-group parallelism only shows material benefit when a slice holds many events AND the competitor pool is large enough to give the greedy coloring room to partition. Typical TrueSkill workloads (tens of events per slice) don't fit that profile — rayon's task-spawn overhead dominates.

**Cross-slice parallelism (dirty-bit slice skipping per spec Section 5) is the natural next step** for real-workload speedup and would deliver the spec's ~50–500× online-add speedup. Deferred to a future tier.

## Determinism

`tests/determinism.rs` runs a 200-event history at thread counts {1, 2, 4, 8} via `rayon::ThreadPoolBuilder::install` and asserts every `(time, posterior)` pair has bit-identical `mu` and `sigma` (compared via `f64::to_bits()`). Passes.
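
The pattern behind this guarantee (parallel per-slice work, then a reduction in fixed slice order) can be sketched with the standard library alone. The sketch uses `std::thread::scope` instead of rayon to stay dependency-free, and `slice_log_evidence` is a hypothetical placeholder for the real per-slice computation.

```rust
use std::thread;

/// Hypothetical per-slice computation standing in for a slice's log-evidence.
fn slice_log_evidence(slice: &[f64]) -> f64 {
    slice.iter().map(|p| p.ln()).sum()
}

/// Compute each slice's value on its own thread, then add the results in
/// slice order so the floating-point sum does not depend on scheduling.
fn log_evidence_parallel(slices: &[Vec<f64>]) -> f64 {
    let per_slice: Vec<f64> = thread::scope(|s| {
        let handles: Vec<_> = slices
            .iter()
            .map(|slice| s.spawn(move || slice_log_evidence(slice)))
            .collect();
        // Joining in spawn order keeps results aligned with slice order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    per_slice.iter().sum()
}

/// Sequential baseline with the same summation order.
fn log_evidence_sequential(slices: &[Vec<f64>]) -> f64 {
    slices.iter().map(|s| slice_log_evidence(s)).sum()
}
```

Because floating-point addition is not associative, it is the fixed reduction order, not the parallel computation itself, that makes the `to_bits()` comparison hold.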

## Internals

- Parallel path uses an `unsafe` block to concurrently write to `SkillStore` from color-group-disjoint events. Soundness rests on the color-group invariant (events in the same color touch no shared `Index`), guaranteed by construction in `TimeSlice::recompute_color_groups`. Sequential path unchanged from T2.
- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to sequential inside `sweep_color_groups` to avoid task-spawn overhead.
- Thread-local `ScratchArena` per rayon worker thread.
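
A rough sketch of the threshold fallback, assuming a simplified workload. It is not the body of `sweep_color_groups`: it uses `std::thread::scope` and `chunks_mut` (which gives disjoint mutable access safely) rather than rayon and the `unsafe` store access described above, and `sweep_group`, `workers` and `update` are hypothetical names.

```rust
use std::thread;

/// Mirrors the `RAYON_THRESHOLD = 64` cutoff described above.
const THRESHOLD: usize = 64;

/// Apply `update` to every item, splitting the work across scoped threads
/// only when the group is large enough to amortize the spawn cost.
fn sweep_group(items: &mut [f64], workers: usize, update: fn(&mut f64)) {
    if items.len() < THRESHOLD || workers <= 1 {
        // Small group: stay on the calling thread.
        items.iter_mut().for_each(update);
        return;
    }
    let chunk_len = items.len().div_ceil(workers);
    thread::scope(|s| {
        // Each chunk is a disjoint `&mut` slice, so no two threads alias.
        for chunk in items.chunks_mut(chunk_len) {
            s.spawn(move || chunk.iter_mut().for_each(update));
        }
    });
}
```

Both branches produce the same result; the cutoff only decides whether the spawn overhead is worth paying.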

## Test plan

- [x] `cargo test --features approx` — 96 tests pass (74 lib + 22 integration)
- [x] `cargo test --features approx,rayon` — 97 tests pass (+1 determinism)
- [x] `cargo clippy --all-targets --features approx -- -D warnings` — clean
- [x] `cargo clippy --all-targets --features approx,rayon -- -D warnings` — clean
- [x] `cargo +nightly fmt --check` — clean
- [x] `cargo bench --bench batch --features approx` — 23.23 µs (no regression vs T2)
- [x] `cargo bench --bench history_converge --features approx,rayon` — runs on all three workloads
- [x] Bit-identical posteriors across `RAYON_NUM_THREADS={1, 2, 4, 8}` — verified

## Commit history

13 commits on `t3-concurrency`. Each task is self-contained and bisectable. See `git log main..t3-concurrency` for the full list.

## Deferred

- **Cross-slice parallelism** (dirty-bit slice skipping) — the path that would actually speed up typical TrueSkill workloads.
- **Default-on `rayon` feature** — spec called for default-on; we keep it opt-in until the feature proves stable in production use.
- **Synchronous-EP schedule with barrier merge** — alternative parallel strategy per spec Section 6.
- **`MarginFactor` / `Outcome::Scored`** — T4.
- **`Damped` / `Residual` schedules** — T4.
- **N-team `predict_outcome`** — T4.
- **`Game::custom` full ergonomics** — T4.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #2
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>

2026-04-24 13:01:01 +00:00

benches

T3: rayon-backed concurrency (opt-in) (#2)

2026-04-24 13:01:01 +00:00

docs/superpowers

T3: rayon-backed concurrency (opt-in) (#2)

2026-04-24 13:01:01 +00:00

examples

T0 + T1 + T2: engine redesign through new API surface (#1)

2026-04-24 11:20:04 +00:00

src

T3: rayon-backed concurrency (opt-in) (#2)

2026-04-24 13:01:01 +00:00

tests

T3: rayon-backed concurrency (opt-in) (#2)

2026-04-24 13:01:01 +00:00

.gitignore

remove notepad

2026-03-23 14:21:23 +01:00

Cargo.toml

T3: rayon-backed concurrency (opt-in) (#2)

2026-04-24 13:01:01 +00:00

CHANGELOG.md

T3: rayon-backed concurrency (opt-in) (#2)

2026-04-24 13:01:01 +00:00

CLAUDE.md

feat: added a Drift trait and a "default" ConstantDrift implementation

2026-03-16 12:06:04 +01:00

cliff.toml

chore: added cliff.toml, release.toml and rustfmt.toml

2026-04-23 20:22:27 +02:00

Justfile

chore: clean up

2026-04-23 20:24:10 +02:00

README.md

feat: added a Drift trait and a "default" ConstantDrift implementation

2026-03-16 12:06:04 +01:00

release.toml

chore: do not publish

2026-04-23 20:26:52 +02:00

rustfmt.toml

chore: added cliff.toml, release.toml and rustfmt.toml

2026-04-23 20:22:27 +02:00

README.md

TrueSkill - Through Time

Rust port of TrueSkillThroughTime.py.

Other implementations

Drift

Skill drift models how a player's true skill can change between appearances. Each time a player reappears after a gap, their skill uncertainty is widened by the drift model before the new evidence is incorporated.

Drift is represented by the Drift trait:

pub trait Drift: Copy + Debug {
    fn variance_delta(&self, elapsed: i64) -> f64;
}

variance_delta returns the amount to add to σ² given the elapsed time since the player last played. Internally, Gaussian::forget uses this to compute the new sigma: σ_new = sqrt(σ² + variance_delta).

ConstantDrift

The built-in ConstantDrift implements a random walk with constant rate: skill variance grows in proportion to elapsed time, so σ grows with its square root:

variance_delta = elapsed * γ²

This is the standard TrueSkill Through Time model. Use it by passing a ConstantDrift(gamma) when constructing a Player:

use trueskill_tt::{Player, Gaussian, drift::ConstantDrift};

// gamma = 0.1 adds gamma² = 0.01 to the skill variance per time unit
let player = Player::new(Gaussian::from_ms(0.0, 6.0), 1.0, ConstantDrift(0.1));
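
To make the arithmetic concrete, here is a standalone sketch of the two formulas above. The function names are hypothetical and do not call into the crate; they only restate `variance_delta = elapsed * γ²` and `σ_new = sqrt(σ² + variance_delta)`.

```rust
/// Variance added by a constant drift: elapsed * gamma^2.
fn constant_variance_delta(gamma: f64, elapsed: i64) -> f64 {
    elapsed as f64 * gamma * gamma
}

/// Widened sigma after forgetting: sqrt(sigma^2 + variance_delta).
fn forget_sigma(sigma: f64, variance_delta: f64) -> f64 {
    (sigma * sigma + variance_delta).sqrt()
}
```

With σ = 6.0, γ = 0.1 and 10 elapsed time units, the added variance is 0.1 and σ grows from 6.0 to sqrt(36.1), roughly 6.008.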

Custom drift

Implement Drift to express any other model. For example, a drift whose growth slows after a long absence (variance grows with the square root of elapsed time instead of linearly):

use trueskill_tt::{Player, Gaussian, drift::Drift};

#[derive(Clone, Copy, Debug)]
struct SqrtDrift {
    gamma: f64,
}

impl Drift for SqrtDrift {
    fn variance_delta(&self, elapsed: i64) -> f64 {
        // Clamp at zero so a negative elapsed value cannot produce NaN.
        (elapsed.max(0) as f64).sqrt() * self.gamma * self.gamma
    }
}

let player = Player::new(Gaussian::from_ms(0.0, 6.0), 1.0, SqrtDrift { gamma: 0.5 });

To use a custom drift type with History, use the .drift() builder method instead of .gamma():

let h = History::builder()
    .drift(SqrtDrift { gamma: 0.5 })
    .build();

Todo

Implement approx for Gaussian
Add more tests from TrueSkillThroughTime.jl
Add tests for quality() (Use sublee/trueskill as reference)
Benchmark Batch::iteration()
Time needs to be an enum so we can have multiple states (see batch::compute_elapsed())
Add examples (use same TrueSkillThroughTime.(py|jl))
Add Observer (see argmin for inspiration)
