trueskill-tt/docs/superpowers/plans/2026-04-24-t1-factor-graph.md
Anders Olsson d2aab82c1e T0 + T1 + T2: engine redesign through new API surface (#1)
Implements tiers T0, T1, T2 of `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`. All three tiers have landed together on this branch because they build on one another; this PR rolls them up for a single review pass.

Per-tier plans:
- T0: `docs/superpowers/plans/2026-04-23-t0-numerical-parity.md`
- T1: `docs/superpowers/plans/2026-04-24-t1-factor-graph.md`
- T2: `docs/superpowers/plans/2026-04-24-t2-new-api-surface.md`

## Summary

### T0 — Numerical parity (internal)

- `Gaussian` switched to natural-parameter storage `(pi, tau)`; mul/div now ~7× faster (218 ps vs 1.57 ns).
- `HashMap<Index, _>` → dense `Vec<_>` keyed by `Index.0` (via `AgentStore<D>`, `SkillStore`).
- `ScratchArena` eliminates per-event allocations in `Game::likelihoods`.
- `InferenceError` seed type added (1 variant).
- 38 → 53 tests passing through T1.
- Benchmark: `Batch::iteration` 29.84 → 21.25 µs.
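
For context, the mul/div speedup follows from the algebra: in natural parameters `pi = 1/sigma²`, `tau = mu/sigma²`, Gaussian multiplication and division are componentwise adds and subtracts, with no square roots or divisions in the hot path. A minimal sketch (assumed shape for illustration, not the crate's actual `Gaussian`):

```rust
// Natural-parameter Gaussian: multiply/divide are componentwise add/sub.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Gaussian {
    pi: f64,  // precision, 1/sigma^2
    tau: f64, // precision-adjusted mean, mu/sigma^2
}

impl Gaussian {
    fn from_ms(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Self { pi, tau: mu * pi }
    }
    fn mul(self, rhs: Self) -> Self {
        Self { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
    fn div(self, rhs: Self) -> Self {
        Self { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
    fn mu(self) -> f64 { self.tau / self.pi }
    fn sigma(self) -> f64 { (1.0 / self.pi).sqrt() }
}
```

Moment-form reads (`mu`, `sigma`) pay a division/sqrt only when asked for, which is why they show up separately in the benchmark table below.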

### T1 — Factor graph machinery (internal)

- `Factor` trait + `BuiltinFactor` enum (TeamSum / RankDiff / Trunc) driving within-game inference.
- `VarStore` flat storage for variable marginals.
- `Schedule` trait + `EpsilonOrMax` impl replacing the hand-rolled EP loop.
- `Game::likelihoods` rebuilt on the factor-graph machinery; iteration counts and goldens preserved to within 1e-6.
- 53 tests passing.
- Benchmark: `Batch::iteration` 23.01 µs (slight regression absorbed in T2).

### T2 — New API surface (breaking)

**Renames:**
- `IndexMap → KeyTable`, `Player → Rating`, `Agent → Competitor`, `Batch → TimeSlice`

**New types:**
- `Time` trait with `Untimed` ZST and `i64` impls; `Drift<T>`, `Rating<T, D>`, `Competitor<T, D>`, `TimeSlice<T>`, `History<T, D, O, K>` all generic.
- `Event<T, K>`, `Team<K>`, `Member<K>`, `Outcome` (`Ranked` variant; `#[non_exhaustive]`).
- `Observer<T>` trait + `NullObserver`.
- `ConvergenceOptions`, `ConvergenceReport`.
- `GameOptions`, `OwnedGame<T, D>`.

**Three-tier ingestion:**
- `history.record_winner(&K, &K, T)` / `record_draw(&K, &K, T)` — 1v1 convenience.
- `history.add_events(iter)` — typed bulk.
- `history.event(T).team([...]).weights([...]).ranking([...]).commit()` — fluent.

**Query API:** `current_skill`, `learning_curve`, `learning_curves` (keyed on `K`), `log_evidence`, `log_evidence_for`, `predict_quality`, `predict_outcome`.

**Game constructors:** `ranked`, `one_v_one`, `free_for_all`, `custom` — all returning `Result<_, InferenceError>`.

**`factors` module:** `Factor`, `Schedule`, `VarStore`, `VarId`, `BuiltinFactor`, `EpsilonOrMax`, `ScheduleReport`, `TeamSumFactor`, `RankDiffFactor`, `TruncFactor` now public.

**Errors:** `InferenceError` gains `MismatchedShape`, `InvalidProbability`, `ConvergenceFailed`; boundary panics converted to `Result`.

**Removed (breaking):** `History::convergence(iters, eps, verbose)`, `HistoryBuilder::gamma(f64)`, `HistoryBuilder::time(bool)`, `History.time: bool`, `learning_curves_by_index`, nested-Vec public `add_events`.

## Behavior change (documented in CHANGELOG)

`Time = Untimed` has `elapsed_to → 0`, so no drift accumulates between slices. The old `time=false` mode implicitly forced `elapsed=1` on reappearance via an `i64::MAX` sentinel — that quirk is not reproducible under a typed time axis. Tests that depended on it now use `History::<i64, _>` with explicit `1..=n` timestamps. One test (`test_env_ttt`) had 3 Gaussian goldens updated to reflect the corrected semantics; documented in commit `33a7d90`.
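
A sketch of the typed time axis this describes (trait and method shapes are assumptions inferred from this summary, not the crate's verbatim definitions):

```rust
// Hypothetical sketch of the Time trait described above. `Untimed` is a ZST
// whose elapsed time is always zero, so no drift accumulates between slices;
// an i64 axis yields real elapsed deltas.
trait Time: Copy {
    /// Drift-time elapsed between two appearances of a competitor.
    fn elapsed_to(self, later: Self) -> f64;
}

#[derive(Copy, Clone, Debug)]
struct Untimed;

impl Time for Untimed {
    fn elapsed_to(self, _later: Self) -> f64 {
        0.0 // no drift ever accumulates
    }
}

impl Time for i64 {
    fn elapsed_to(self, later: Self) -> f64 {
        (later - self) as f64
    }
}
```

Under this shape, the old `time=false` behavior (forced `elapsed=1` on reappearance) has no representation, which is why the affected tests migrated to explicit `i64` timestamps.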

## Final numbers

| Metric | Before T0 | After T2 | Delta |
|---|---|---|---|
| `Batch::iteration` | 29.84 µs | 21.36 µs | **-28%** |
| `Gaussian::mul` | 1.57 ns | 219 ps | **-86%** |
| `Gaussian::div` | 1.57 ns | 219 ps | **-86%** |
| Tests passing | 38 | 90 | +52 |

All other Gaussian ops unchanged (~219 ps add/sub, ~264 ps pi/tau reads).

## Test plan

- [x] `cargo test --features approx` — 90/90 pass (68 lib + 10 api_shape + 6 game + 4 record_winner + 2 equivalence)
- [x] `cargo clippy --all-targets --features approx -- -D warnings` — clean
- [x] `cargo +nightly fmt --check` — clean
- [x] `cargo bench --bench batch` — 21.36 µs
- [x] `cargo bench --bench gaussian` — unchanged from T1
- [x] `cargo run --example atp --features approx` — rewritten in new API, runs clean
- [x] Historical Game-level goldens preserved in `tests/equivalence.rs`
- [x] Public API matches spec Section 4 (verified by integration tests in `tests/api_shape.rs`)

## Commit history

~45 commits total across T0 + T1 + T2. Each task is self-contained and individually tested; the branch is bisectable. See `git log main..t2-new-api-surface` for the full list.

## Deferred to later tiers

- `Outcome::Scored` + `MarginFactor` — T4
- `Damped` / `Residual` schedules — T4
- `Send + Sync` bounds + Rayon parallelism — T3
- N-team `predict_outcome` — T4
- `Game::custom` full ergonomics — T4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Reviewed-on: #1
Co-authored-by: Anders Olsson <anders.e.olsson@gmail.com>
Co-committed-by: Anders Olsson <anders.e.olsson@gmail.com>
2026-04-24 11:20:04 +00:00


T1 — Factor Graph Machinery Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Re-implement the within-game inference (Game::likelihoods) on top of an explicit factor-graph data structure (VarStore, Factor trait, BuiltinFactor enum, Schedule trait), without changing observable behavior. T1 is a pure internal restructure — public API of Game and Batch is untouched.

Architecture: Variables hold their current Gaussian marginals in a flat VarStore indexed by VarId. Factors hold their outgoing messages and propagate them via Factor::propagate(&mut VarStore) -> (f64, f64), returning the max delta. The three built-in factors (TeamSum, RankDiff, Trunc) — wrapped in a BuiltinFactor enum to avoid dyn dispatch in the hot path — exactly reproduce the current EP algorithm. A Schedule trait drives factor propagation; the default EpsilonOrMax schedule matches today's hard-coded forward+backward sweep (same fixed point, identical iteration counts).

Tech Stack: Rust 2024 edition, criterion benchmarks, approx crate for floating-point comparisons. Builds on T0 (natural-parameter Gaussian, dense Vec storage, ScratchArena).

Acceptance criteria

  • All existing tests pass (cargo test --features approx). Tolerance bounded by 1e-6 (same as T0); if a test drifts beyond that, the implementation is wrong.
  • Within-game iteration counts match T0 (the EpsilonOrMax schedule should produce identical iteration counts on identical inputs).
  • cargo bench --bench batch shows no regression vs T0 (Batch::iteration ≤ 21.5 µs on Apple M5 Pro). Aim for parity or modest improvement.
  • cargo clippy --all-targets --features approx clean.
  • cargo +nightly fmt --check clean.
  • Public API of History, HistoryBuilder, Game::new, Game::posteriors, Player, Gaussian, quality() unchanged. Internal types (Factor, VarStore, Schedule, BuiltinFactor) are pub(crate).

Background — the algorithm we're refactoring

The current Game::likelihoods implements EP for ranked teams with optional draws. For an n-team game (sorted by result, descending), the algorithm:

  1. Build team performances (one-shot): for each team, compute the weighted sum of player performance Gaussians by folding player.perf() * weight over the team's members. Stored as team[i].prior.

  2. Initialise diff priors: diff[i].prior = team[i].prior - team[i+1].prior for adjacent ranked pairs.

  3. EP loop (max 10 iterations or until step ≤ 1e-6):

    • Forward sweep: for each diff e:
      • Recompute diff[e].prior = team[e].posterior_win() - team[e+1].posterior_lose()
      • On iter 0, accumulate evidence: self.evidence *= the probability mass of diff.prior admitted by the margin (1 - cdf(margin) for a win; cdf(margin) - cdf(-margin) for a tie)
      • Compute diff[e].likelihood = approx(diff[e].prior, margin, tie) / diff[e].prior
      • Update team[e+1].likelihood_lose = team[e].posterior_win() - diff[e].likelihood
    • Backward sweep: for each diff e (reverse):
      • Same recompute, then update team[e].likelihood_win = team[e+1].posterior_lose() + diff[e].likelihood
    • Track max delta across all writes.
  4. Special-case 2-team games (diff.len() == 1): the loop above doesn't execute (range is empty), so a single direct propagation is done.

  5. Final boundary updates that close the chain.

  6. Compute likelihoods: for each team, the team's "likelihood" is likelihood_win * likelihood_lose * likelihood_draw. From this, derive per-player likelihoods via ((m - performance.exclude(p.perf() * w)) * (1.0 / w)).forget(p.beta²).

In factor-graph terms:

  • team[i].prior is the marginal at the team-perf variable, before any messages.
  • team[i].likelihood_lose is the outgoing message FROM the RankDiff factor TO team[i] (left-to-right).
  • team[i].likelihood_win is the outgoing message FROM the RankDiff factor TO team[i-1] (right-to-left).
  • diff[i].likelihood is the outgoing message FROM the Trunc factor TO diff[i].
  • diff[i].prior is the marginal at diff[i] from the RankDiff factor.

T1 makes these correspondences explicit by introducing factor objects that own these messages. The math stays identical; the organization changes.

File map

Created:

  • src/factor/mod.rs — module root: VarId, VarStore, Factor trait, BuiltinFactor enum, ScheduleReport
  • src/factor/team_sum.rs — TeamSumFactor (one-shot weighted sum)
  • src/factor/rank_diff.rs — RankDiffFactor (linear-combination factor between two team-perf vars and a diff var)
  • src/factor/trunc.rs — TruncFactor (EP truncation factor on a diff var)
  • src/schedule.rs — Schedule trait + EpsilonOrMax impl

Modified:

  • src/lib.rs — add pub(crate) mod factor; and pub(crate) mod schedule;
  • src/game.rs — Game becomes { vars: VarStore, factors: Vec<BuiltinFactor>, ... }; Game::likelihoods builds the graph and runs the schedule
  • src/message.rs — TeamMessage and DiffMessage deleted (replaced by factors and VarStore)
  • src/arena.rs — ScratchArena loses the teams/diffs/ties/margins fields; gains a VarStore, Vec<BuiltinFactor>, and a sort_buf: Vec<usize> for sort_perm scratch

Touched (test-only):

  • src/game.rs (test module) — no API change, but per-iteration goldens may drift by a ULP

Design

Variables and VarStore

// src/factor/mod.rs

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub(crate) struct VarId(pub(crate) u32);

/// Flat storage for all variable marginals in one game's factor graph.
#[derive(Debug, Default)]
pub(crate) struct VarStore {
    marginals: Vec<Gaussian>,
}

VarStore is reset and re-populated for each Game::new call. It lives in the ScratchArena so we don't reallocate the backing buffer.

Factor trait + BuiltinFactor enum

pub(crate) trait Factor {
    /// Update outgoing messages and write back to the var store.
    /// Returns max delta `(|Δmu|, |Δsigma|)` across writes this propagation.
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64);

    /// Optional log-evidence contribution. Default: 0.0 (no contribution).
    /// Only TruncFactor has a non-trivial impl.
    fn log_evidence(&self, _vars: &VarStore) -> f64 {
        0.0
    }
}

#[derive(Debug)]
pub(crate) enum BuiltinFactor {
    TeamSum(TeamSumFactor),
    RankDiff(RankDiffFactor),
    Trunc(TruncFactor),
}

impl Factor for BuiltinFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        match self {
            Self::TeamSum(f) => f.propagate(vars),
            Self::RankDiff(f) => f.propagate(vars),
            Self::Trunc(f) => f.propagate(vars),
        }
    }
    fn log_evidence(&self, vars: &VarStore) -> f64 {
        match self {
            Self::Trunc(f) => f.log_evidence(vars),
            _ => 0.0,
        }
    }
}

TeamSumFactor

Computes the weighted sum of player performances into a team-perf var. Inputs (player priors with beta-noise applied) are stored at construction. Runs once per game (no iteration needed).

#[derive(Debug)]
pub(crate) struct TeamSumFactor {
    /// Player performance Gaussians (pre-computed with beta noise) and their weights.
    inputs: Vec<(Gaussian, f64)>,
    /// Output: team performance variable.
    out: VarId,
}

impl Factor for TeamSumFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let perf = self
            .inputs
            .iter()
            .fold(N00, |acc, (g, w)| acc + (*g * *w));
        let old = vars.get(self.out);
        vars.set(self.out, perf);
        old.delta(perf)
    }
}

RankDiffFactor

Maintains the constraint diff = team_a - team_b between three variables. In EP, this means:

  • The marginal at diff is the convolution of the marginals at team_a and team_b (in moment form: subtraction).
  • Outgoing messages to_team_a, to_team_b, to_diff are stored on the factor.

In our specific algorithm (which mirrors today's code), each iteration's RankDiff propagation:

  1. Reads the team-perf marginals (which already incorporate to_team_a / to_team_b from previous iterations and any draw factors).
  2. Computes diff_prior = team_a_marginal - team_b_marginal (in EP terms, this is the cavity for the diff direction).
  3. Writes the new diff_prior to the diff variable.
  4. Returns delta against the previous diff value.
#[derive(Debug)]
pub(crate) struct RankDiffFactor {
    pub team_a: VarId,
    pub team_b: VarId,
    pub diff: VarId,
}

impl Factor for RankDiffFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let a = vars.get(self.team_a);
        let b = vars.get(self.team_b);
        let new_diff = a - b;
        let old = vars.get(self.diff);
        vars.set(self.diff, new_diff);
        old.delta(new_diff)
    }
}

TruncFactor

Applies the truncation constraint to a diff variable. Stores its outgoing message so the cavity computation gives the correct EP message.

#[derive(Debug)]
pub(crate) struct TruncFactor {
    pub diff: VarId,
    pub margin: f64,
    pub tie: bool,
    /// Outgoing message to diff var (stored for cavity computation).
    pub msg: Gaussian,
    /// Cached log-evidence contribution computed on first propagation.
    evidence_cached: Option<f64>,
}

impl Factor for TruncFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        // Compute cavity (current diff marginal divided by our outgoing).
        let marginal = vars.get(self.diff);
        let cavity = marginal / self.msg;

        // First pass: cache the evidence contribution from the cavity.
        if self.evidence_cached.is_none() {
            self.evidence_cached = Some(evidence_value(cavity, self.margin, self.tie));
        }

        // Apply the truncation approximation to the cavity.
        let trunc = approx(cavity, self.margin, self.tie);

        // New outgoing = trunc / cavity (so marginal = cavity * new = trunc).
        let new_msg = trunc / cavity;

        let old_msg = self.msg;
        self.msg = new_msg;

        // Update marginal: marginal_new = cavity * new_msg = trunc.
        vars.set(self.diff, trunc);

        old_msg.delta(new_msg)
    }
    fn log_evidence(&self, _vars: &VarStore) -> f64 {
        // evidence_cached stores raw probability mass; convert to log here.
        self.evidence_cached.unwrap_or(1.0).ln()
    }
}

fn evidence_value(diff: Gaussian, margin: f64, tie: bool) -> f64 {
    if tie {
        cdf(margin, diff.mu(), diff.sigma()) - cdf(-margin, diff.mu(), diff.sigma())
    } else {
        1.0 - cdf(margin, diff.mu(), diff.sigma())
    }
}

This is the important departure from the current code: today, each team tracks three separate messages (likelihood_win and likelihood_lose for the two neighbor directions, plus likelihood_draw). In the cleaner factor-graph model, each RankDiff and Trunc factor holds its own outgoing messages. The algorithm becomes "for each iteration, propagate every factor in order, accumulate deltas." The fixed point is the same.
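The bookkeeping that makes stored outgoing messages work can be checked in isolation: with natural-parameter multiply/divide, dividing the marginal by the factor's stored message yields the cavity, and after writing new_msg = trunc / cavity, the identity marginal = cavity * new_msg = trunc holds exactly. A self-contained illustration (minimal stand-in type, not the crate's Gaussian):

```rust
// EP bookkeeping invariant used by TruncFactor::propagate, shown with a
// minimal natural-parameter type (pi = 1/sigma^2, tau = mu/sigma^2).
#[derive(Copy, Clone, Debug, PartialEq)]
struct G { pi: f64, tau: f64 }

impl G {
    fn mul(self, o: G) -> G {
        G { pi: self.pi + o.pi, tau: self.tau + o.tau }
    }
    fn div(self, o: G) -> G {
        G { pi: self.pi - o.pi, tau: self.tau - o.tau }
    }
}
```

Because mul/div are componentwise add/sub, cavity.mul(new_msg) recovers trunc with no rounding at all, which is why replacing the diff marginal with trunc after each propagation is safe.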

Caveat: this is only equivalent to today's algorithm if our schedule visits factors in an equivalent order. Today's code does forward-then-backward sweeps over diff factors, with team-perf marginals updated implicitly via posterior_win()/posterior_lose() accessors that re-multiply the priors and likelihoods. In the new code, we maintain the team-perf marginals explicitly in VarStore, with each RankDiff/Trunc/draw factor's outgoing message stored on the factor.

Schedule

// src/schedule.rs

#[derive(Debug, Clone, Copy)]
pub struct ScheduleReport {
    pub iterations: usize,
    pub final_step: (f64, f64),
    pub converged: bool,
}

pub(crate) trait Schedule {
    fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport;
}

#[derive(Debug, Clone, Copy)]
pub(crate) struct EpsilonOrMax {
    pub eps: f64,
    pub max: usize,
}

impl Default for EpsilonOrMax {
    fn default() -> Self {
        Self { eps: 1e-6, max: 10 }
    }
}

impl Schedule for EpsilonOrMax {
    fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport {
        // Special case: TeamSum factors run exactly once (their inputs don't change).
        // RankDiff + Trunc factors iterate to convergence.
        // The Game builder lays out factors so all TeamSums come first, followed by
        // alternating RankDiff/Trunc pairs.
        let n_setup = factors
            .iter()
            .position(|f| !matches!(f, BuiltinFactor::TeamSum(_)))
            .unwrap_or(factors.len());

        // One-shot setup phase.
        for f in &mut factors[..n_setup] {
            f.propagate(vars);
        }

        let mut iterations = 0;
        let mut final_step = (f64::INFINITY, f64::INFINITY);
        let mut converged = false;

        for _ in 0..self.max {
            let mut step = (0.0_f64, 0.0_f64);

            // Forward sweep through iterating factors.
            for f in factors[n_setup..].iter_mut() {
                let d = f.propagate(vars);
                step.0 = step.0.max(d.0);
                step.1 = step.1.max(d.1);
            }

            // Backward sweep (reverse order).
            for f in factors[n_setup..].iter_mut().rev() {
                let d = f.propagate(vars);
                step.0 = step.0.max(d.0);
                step.1 = step.1.max(d.1);
            }

            iterations += 1;
            final_step = step;

            if step.0 <= self.eps && step.1 <= self.eps {
                converged = true;
                break;
            }
        }

        ScheduleReport {
            iterations,
            final_step,
            converged,
        }
    }
}

Note on iteration counts: today's code does forward sweep then backward sweep within ONE while-loop iteration. The check tuple_gt(step, 1e-6) && iter < 10 happens between iterations. The schedule above does the same, so iteration counts should match.

Game refactor

Game becomes:

pub struct Game<'a, D: Drift> {
    teams: Vec<Vec<Player<D>>>,
    result: &'a [f64],
    weights: &'a [Vec<f64>],
    p_draw: f64,
    pub(crate) likelihoods: Vec<Vec<Gaussian>>,
    pub(crate) evidence: f64,
}

Same public surface. Internally, Game::new:

  1. Calls arena.reset_for_game(...) to clear VarStore + factors + sort_buf.
  2. Sorts teams by result (using arena.sort_buf).
  3. For each sorted team, creates a team_perf var (initial: N_INF) and a TeamSumFactor writing to it.
  4. For each adjacent pair, creates a diff var (initial: N_INF), a RankDiffFactor reading both team_perfs and writing to diff, and a TruncFactor operating on diff.
  5. Runs the schedule.
  6. Computes evidence as the product over all TruncFactors of log_evidence().exp() (kept in linear space, matching the old evidence field).
  7. Computes per-team likelihoods from the team-perf marginals and per-player likelihoods via the existing exclude logic.
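
As a sanity check of steps 3–4, the layout the builder produces can be sketched as follows (var-id numbering and the FactorKind names are illustrative assumptions, not the crate's types): all TeamSums come first, followed by alternating RankDiff/Trunc pairs — the ordering EpsilonOrMax's setup-prefix scan relies on.

```rust
// Illustrative factor layout for an n-team game: team-perf vars get ids
// 0..n, diff vars get ids n..(2n-1); TeamSums form a one-shot prefix,
// then each adjacent pair contributes a RankDiff/Trunc pair.
#[derive(Debug, PartialEq)]
enum FactorKind {
    TeamSum(usize),                 // output team-perf var
    RankDiff(usize, usize, usize),  // (team_a, team_b, diff) vars
    Trunc(usize),                   // diff var
}

fn layout(n_teams: usize) -> (usize, Vec<FactorKind>) {
    let mut factors = Vec::new();
    for t in 0..n_teams {
        factors.push(FactorKind::TeamSum(t));
    }
    for i in 0..n_teams.saturating_sub(1) {
        let diff = n_teams + i;
        factors.push(FactorKind::RankDiff(i, i + 1, diff));
        factors.push(FactorKind::Trunc(diff));
    }
    let n_vars = n_teams + n_teams.saturating_sub(1);
    (n_vars, factors)
}
```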

Task list


Task 1: Pre-flight verification of T0 baseline

Files:

  • Read: benches/baseline.txt

  • Step 1: Confirm tests pass on the current branch

cargo test --features approx --lib

Expected: 38 passed; 0 failed. If anything is failing, do not proceed — investigate first.

  • Step 2: Capture a fresh T0 reference number for Batch::iteration
cargo bench --bench batch 2>&1 | grep "Batch::iteration"

Expected: in the same range as benches/baseline.txt (~21 µs on Apple M5 Pro). If your hardware differs, note the local value as your "T0 reference."

  • Step 3: No commit — this is a verification-only task.

Task 2: Introduce VarId and VarStore

Files:

  • Create: src/factor/mod.rs

  • Modify: src/lib.rs (declare module)

  • Step 1: Create src/factor/mod.rs with the new types

//! Factor graph machinery for within-game inference.

use crate::gaussian::Gaussian;

/// Identifier for a variable in a `VarStore`.
///
/// Variables hold the current Gaussian marginal and are owned by exactly one
/// `VarStore`. `VarId` is meaningful only within its owning store.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub(crate) struct VarId(pub(crate) u32);

/// Flat storage of variable marginals.
///
/// Variables are allocated by `alloc()` and accessed by `VarId`. The store is
/// reused across `Game::new` calls (it lives in the `ScratchArena`); call
/// `clear()` before reuse.
#[derive(Debug, Default)]
pub(crate) struct VarStore {
    marginals: Vec<Gaussian>,
}

impl VarStore {
    pub(crate) fn new() -> Self {
        Self::default()
    }

    pub(crate) fn clear(&mut self) {
        self.marginals.clear();
    }

    pub(crate) fn len(&self) -> usize {
        self.marginals.len()
    }

    pub(crate) fn alloc(&mut self, init: Gaussian) -> VarId {
        let id = VarId(self.marginals.len() as u32);
        self.marginals.push(init);
        id
    }

    pub(crate) fn get(&self, id: VarId) -> Gaussian {
        self.marginals[id.0 as usize]
    }

    pub(crate) fn set(&mut self, id: VarId, g: Gaussian) {
        self.marginals[id.0 as usize] = g;
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::N_INF;

    #[test]
    fn alloc_assigns_sequential_ids() {
        let mut store = VarStore::new();
        let a = store.alloc(N_INF);
        let b = store.alloc(N_INF);
        let c = store.alloc(N_INF);
        assert_eq!(a, VarId(0));
        assert_eq!(b, VarId(1));
        assert_eq!(c, VarId(2));
        assert_eq!(store.len(), 3);
    }

    #[test]
    fn get_returns_initial_value() {
        let mut store = VarStore::new();
        let g = Gaussian::from_ms(2.5, 1.0);
        let id = store.alloc(g);
        assert_eq!(store.get(id), g);
    }

    #[test]
    fn set_updates_value() {
        let mut store = VarStore::new();
        let id = store.alloc(N_INF);
        let new = Gaussian::from_ms(3.0, 0.5);
        store.set(id, new);
        assert_eq!(store.get(id), new);
    }

    #[test]
    fn clear_resets_length_keeping_capacity() {
        let mut store = VarStore::new();
        store.alloc(N_INF);
        store.alloc(N_INF);
        let cap = store.marginals.capacity();
        store.clear();
        assert_eq!(store.len(), 0);
        assert_eq!(store.marginals.capacity(), cap);
    }
}
  • Step 2: Declare the module in src/lib.rs

Add this line near the other module declarations (alphabetical, after mod error):

pub(crate) mod factor;
  • Step 3: Run the tests
cargo test --features approx --lib factor::tests

Expected: 4 passing tests.

  • Step 4: Commit
git add src/factor/mod.rs src/lib.rs
git commit -m "$(cat <<'EOF'
feat(factor): introduce VarId and VarStore

Foundation types for the T1 factor graph machinery. VarStore is a
flat Vec<Gaussian> indexed by VarId; variables are allocated by
alloc() and the store can be cleared between games to reuse capacity.

Part of T1 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
EOF
)"

Task 3: Define the Factor trait and BuiltinFactor enum

Files:

  • Modify: src/factor/mod.rs

  • Step 1: Add the trait + enum definitions

Append to src/factor/mod.rs (after the VarStore impl and before the test module):

/// A factor in the EP graph.
///
/// Factors hold their own outgoing messages and propagate them by reading
/// connected variable marginals from a `VarStore` and writing back updated
/// marginals.
pub(crate) trait Factor {
    /// Update outgoing messages and write back to the var store.
    ///
    /// Returns the max delta `(|Δmu|, |Δsigma|)` across writes this
    /// propagation. Used by the `Schedule` to detect convergence.
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64);

    /// Optional log-evidence contribution. Default 0.0 (no contribution).
    fn log_evidence(&self, _vars: &VarStore) -> f64 {
        0.0
    }
}

/// Enum dispatcher for the built-in factor types.
///
/// Using an enum instead of `Box<dyn Factor>` keeps factor data inline and
/// avoids virtual-call overhead in the hot inference loop.
#[derive(Debug)]
pub(crate) enum BuiltinFactor {
    TeamSum(team_sum::TeamSumFactor),
    RankDiff(rank_diff::RankDiffFactor),
    Trunc(trunc::TruncFactor),
}

impl Factor for BuiltinFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        match self {
            Self::TeamSum(f) => f.propagate(vars),
            Self::RankDiff(f) => f.propagate(vars),
            Self::Trunc(f) => f.propagate(vars),
        }
    }

    fn log_evidence(&self, vars: &VarStore) -> f64 {
        match self {
            Self::Trunc(f) => f.log_evidence(vars),
            _ => 0.0,
        }
    }
}

pub(crate) mod rank_diff;
pub(crate) mod team_sum;
pub(crate) mod trunc;
  • Step 2: Create stub files for the three factor modules so the crate compiles

src/factor/team_sum.rs:

use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;

#[derive(Debug)]
pub(crate) struct TeamSumFactor {
    pub(crate) inputs: Vec<(Gaussian, f64)>,
    pub(crate) out: VarId,
}

impl Factor for TeamSumFactor {
    fn propagate(&mut self, _vars: &mut VarStore) -> (f64, f64) {
        unimplemented!("TeamSumFactor stub — implemented in Task 4")
    }
}

src/factor/rank_diff.rs:

use crate::factor::{Factor, VarId, VarStore};

#[derive(Debug)]
pub(crate) struct RankDiffFactor {
    pub(crate) team_a: VarId,
    pub(crate) team_b: VarId,
    pub(crate) diff: VarId,
}

impl Factor for RankDiffFactor {
    fn propagate(&mut self, _vars: &mut VarStore) -> (f64, f64) {
        unimplemented!("RankDiffFactor stub — implemented in Task 5")
    }
}

src/factor/trunc.rs:

use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
use crate::N_INF;

#[derive(Debug)]
pub(crate) struct TruncFactor {
    pub(crate) diff: VarId,
    pub(crate) margin: f64,
    pub(crate) tie: bool,
    pub(crate) msg: Gaussian,
    pub(crate) evidence_cached: Option<f64>,
}

impl TruncFactor {
    pub(crate) fn new(diff: VarId, margin: f64, tie: bool) -> Self {
        Self {
            diff,
            margin,
            tie,
            msg: N_INF,
            evidence_cached: None,
        }
    }
}

impl Factor for TruncFactor {
    fn propagate(&mut self, _vars: &mut VarStore) -> (f64, f64) {
        unimplemented!("TruncFactor stub — implemented in Task 6")
    }
}
  • Step 3: Verify the crate still builds (the unused stubs may trigger dead-code warnings)
cargo build --features approx

Expected: builds successfully; any warnings are dead-code lints, since nothing constructs the stub factors yet.

  • Step 4: Commit
git add src/factor/
git commit -m "$(cat <<'EOF'
feat(factor): introduce Factor trait and BuiltinFactor enum

Adds the trait that all factors implement and the enum dispatcher
used by the schedule to drive heterogeneous factors without dynamic
dispatch in the hot loop.

The three built-in factors (TeamSum, RankDiff, Trunc) are stubbed
out; concrete implementations follow in tasks 4-6.
EOF
)"

Task 4: Implement TeamSumFactor

Files:

  • Modify: src/factor/team_sum.rs

  • Step 1: Replace the stub with the implementation and its tests

Replace the contents of src/factor/team_sum.rs with:

use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
use crate::{N00, N_INF};

/// Computes the weighted sum of player performances into a team-perf var.
///
/// Inputs are pre-computed player performance Gaussians (i.e., player priors
/// already with beta² noise added via `Player::performance()`). The factor
/// runs once per game and writes the weighted sum to the output var.
#[derive(Debug)]
pub(crate) struct TeamSumFactor {
    pub(crate) inputs: Vec<(Gaussian, f64)>,
    pub(crate) out: VarId,
}

impl Factor for TeamSumFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let perf = self
            .inputs
            .iter()
            .fold(N00, |acc, (g, w)| acc + (*g * *w));
        let old = vars.get(self.out);
        vars.set(self.out, perf);
        old.delta(perf)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn single_player_unit_weight() {
        let mut vars = VarStore::new();
        let out = vars.alloc(N_INF);
        let g = Gaussian::from_ms(25.0, 5.0);
        let mut f = TeamSumFactor {
            inputs: vec![(g, 1.0)],
            out,
        };

        f.propagate(&mut vars);
        let result = vars.get(out);
        // N00 + (g * 1.0) = g (in moment form)
        assert!((result.mu() - 25.0).abs() < 1e-12);
        assert!((result.sigma() - 5.0).abs() < 1e-12);
    }

    #[test]
    fn two_players_summed() {
        let mut vars = VarStore::new();
        let out = vars.alloc(N_INF);
        let g1 = Gaussian::from_ms(20.0, 3.0);
        let g2 = Gaussian::from_ms(30.0, 4.0);
        let mut f = TeamSumFactor {
            inputs: vec![(g1, 1.0), (g2, 1.0)],
            out,
        };

        f.propagate(&mut vars);
        let result = vars.get(out);
        // sum: mu = 20 + 30 = 50, var = 9 + 16 = 25, sigma = 5
        assert!((result.mu() - 50.0).abs() < 1e-12);
        assert!((result.sigma() - 5.0).abs() < 1e-12);
    }

    #[test]
    fn weighted_inputs() {
        let mut vars = VarStore::new();
        let out = vars.alloc(N_INF);
        let g = Gaussian::from_ms(10.0, 2.0);
        let mut f = TeamSumFactor {
            inputs: vec![(g, 2.0)],
            out,
        };

        f.propagate(&mut vars);
        let result = vars.get(out);
        // g * 2.0: mu = 10*2 = 20, sigma = 2*2 = 4
        assert!((result.mu() - 20.0).abs() < 1e-12);
        assert!((result.sigma() - 4.0).abs() < 1e-12);
    }

    #[test]
    fn delta_is_zero_on_repeat_propagate() {
        let mut vars = VarStore::new();
        let out = vars.alloc(N_INF);
        let g = Gaussian::from_ms(5.0, 1.0);
        let mut f = TeamSumFactor {
            inputs: vec![(g, 1.0)],
            out,
        };

        f.propagate(&mut vars);
        let (dmu, dsig) = f.propagate(&mut vars);
        assert!(dmu < 1e-12, "expected ~0 delta on repeat, got {}", dmu);
        assert!(dsig < 1e-12);
    }
}
  • Step 2: Run the tests. Against the old stub they would panic at runtime on unimplemented!; with this implementation in place they should pass.
cargo test --features approx --lib factor::team_sum

Expected: 4 passing tests.

  • Step 3: Commit
git add src/factor/team_sum.rs
git commit -m "$(cat <<'EOF'
feat(factor): implement TeamSumFactor

Computes the weighted sum of player performance Gaussians into a
team-performance variable. Runs once per game (no iteration needed).
EOF
)"

Task 5: Implement RankDiffFactor

Files:

  • Modify: src/factor/rank_diff.rs

  • Step 1: Replace the stub with the implementation + tests

use crate::factor::{Factor, VarId, VarStore};

/// Maintains the constraint `diff = team_a - team_b` between three vars.
///
/// On each propagation:
/// - Reads marginals at `team_a` and `team_b` (which already incorporate any
///   incoming messages from neighboring factors).
/// - Computes `new_diff = team_a - team_b` (variance addition; see Gaussian::Sub).
/// - Writes the new marginal to `diff`.
/// - Returns the delta against the previous diff value.
///
/// This factor does NOT store an outgoing message; the diff variable is
/// effectively replaced on each propagation. The TruncFactor on the same diff
/// var holds the EP-divide message that produces the cavity.
#[derive(Debug)]
pub(crate) struct RankDiffFactor {
    pub(crate) team_a: VarId,
    pub(crate) team_b: VarId,
    pub(crate) diff: VarId,
}

impl Factor for RankDiffFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let a = vars.get(self.team_a);
        let b = vars.get(self.team_b);
        let new_diff = a - b;
        let old = vars.get(self.diff);
        vars.set(self.diff, new_diff);
        old.delta(new_diff)
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::gaussian::Gaussian;
    use crate::N_INF;

    #[test]
    fn diff_of_two_known_gaussians() {
        let mut vars = VarStore::new();
        let team_a = vars.alloc(Gaussian::from_ms(25.0, 3.0));
        let team_b = vars.alloc(Gaussian::from_ms(20.0, 4.0));
        let diff = vars.alloc(N_INF);

        let mut f = RankDiffFactor {
            team_a,
            team_b,
            diff,
        };
        f.propagate(&mut vars);

        let result = vars.get(diff);
        // mu = 25 - 20 = 5; var = 9 + 16 = 25; sigma = 5
        assert!((result.mu() - 5.0).abs() < 1e-12);
        assert!((result.sigma() - 5.0).abs() < 1e-12);
    }

    #[test]
    fn delta_zero_on_repeat() {
        let mut vars = VarStore::new();
        let team_a = vars.alloc(Gaussian::from_ms(10.0, 2.0));
        let team_b = vars.alloc(Gaussian::from_ms(8.0, 1.0));
        let diff = vars.alloc(N_INF);

        let mut f = RankDiffFactor {
            team_a,
            team_b,
            diff,
        };
        f.propagate(&mut vars);
        let (dmu, dsig) = f.propagate(&mut vars);
        assert!(dmu < 1e-12);
        assert!(dsig < 1e-12);
    }

    #[test]
    fn delta_reflects_team_change() {
        let mut vars = VarStore::new();
        let team_a = vars.alloc(Gaussian::from_ms(10.0, 1.0));
        let team_b = vars.alloc(Gaussian::from_ms(0.0, 1.0));
        let diff = vars.alloc(N_INF);

        let mut f = RankDiffFactor {
            team_a,
            team_b,
            diff,
        };
        f.propagate(&mut vars);

        // change team_a, repropagate; delta should be positive
        vars.set(team_a, Gaussian::from_ms(15.0, 1.0));
        let (dmu, _dsig) = f.propagate(&mut vars);
        assert!(dmu > 4.0, "expected ~5 delta, got {}", dmu);
    }
}
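The doc comment's "variance addition; see Gaussian::Sub" can likewise be checked in isolation. A sketch with toy `(mu, sigma)` tuples standing in for the crate's `Gaussian`:

```rust
// Moment arithmetic behind RankDiffFactor: diff = team_a - team_b.
// Subtracting independent Gaussians subtracts means but ADDS
// variances — the Gaussian::Sub behavior the factor relies on.
fn gaussian_sub(a: (f64, f64), b: (f64, f64)) -> (f64, f64) {
    (a.0 - b.0, (a.1.powi(2) + b.1.powi(2)).sqrt())
}

fn main() {
    // Matches `diff_of_two_known_gaussians`: N(25, 3) - N(20, 4) = N(5, 5).
    let (mu, sigma) = gaussian_sub((25.0, 3.0), (20.0, 4.0));
    assert!((mu - 5.0).abs() < 1e-12);
    assert!((sigma - 5.0).abs() < 1e-12);
}
```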
  • Step 2: Run the tests
cargo test --features approx --lib factor::rank_diff

Expected: 3 passing tests.

  • Step 3: Commit
git add src/factor/rank_diff.rs
git commit -m "$(cat <<'EOF'
feat(factor): implement RankDiffFactor

Maintains diff = team_a - team_b across three variables. On each
propagation, reads the team-perf marginals (which may have been
updated by neighboring factors) and computes the new diff via
Gaussian Sub (variance addition).
EOF
)"

Task 6: Implement TruncFactor

Files:

  • Modify: src/factor/trunc.rs

  • Modify: src/lib.rs (expose cdf, approx to the factor module)

  • Step 1: Make cdf and approx pub(crate) so the factor can use them

In src/lib.rs, change:

fn cdf(x: f64, mu: f64, sigma: f64) -> f64 {

to:

pub(crate) fn cdf(x: f64, mu: f64, sigma: f64) -> f64 {

approx is already pub(crate), no change there.
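The factor only needs cdf to behave as the normal CDF P(X ≤ x) for X ~ N(mu, sigma²). To sanity-check the evidence numbers quoted in the tests below outside the crate, here is a self-contained stand-in built on the Abramowitz–Stegun erf approximation (max abs error ~1.5e-7); the function names mirror the crate's but this is not its implementation:

```rust
// erf via Abramowitz & Stegun 7.1.26 (polynomial-in-t approximation).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592
        + t * (-0.284496736
        + t * (1.421413741
        + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// P(X <= x) for X ~ N(mu, sigma^2).
fn cdf(x: f64, mu: f64, sigma: f64) -> f64 {
    0.5 * (1.0 + erf((x - mu) / (sigma * std::f64::consts::SQRT_2)))
}

fn main() {
    // Evidence values quoted in the TruncFactor tests:
    // non-tie: P(diff > 0) for diff ~ N(2, 3^2) ≈ 0.748
    assert!((1.0 - cdf(0.0, 2.0, 3.0) - 0.7475).abs() < 1e-3);
    // tie with margin 1: P(-1 < diff < 1) for diff ~ N(0, 2^2) ≈ 0.383
    assert!((cdf(1.0, 0.0, 2.0) - cdf(-1.0, 0.0, 2.0) - 0.3829).abs() < 1e-3);
}
```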

  • Step 2: Replace the stub with the full TruncFactor
use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
use crate::{N_INF, approx, cdf};

/// EP truncation factor on a diff variable.
///
/// Implements the rectified-Gaussian approximation that turns a diff
/// distribution into a "this team rank-beats that team" or "tied" likelihood.
/// Stores its outgoing message to the diff variable so the cavity computation
/// produces the correct EP message on each propagation.
#[derive(Debug)]
pub(crate) struct TruncFactor {
    pub(crate) diff: VarId,
    pub(crate) margin: f64,
    pub(crate) tie: bool,
    /// Outgoing message to the diff variable (initial: N_INF, the EP identity).
    pub(crate) msg: Gaussian,
    /// Cached evidence (linear, not log) computed from the cavity on first propagation.
    pub(crate) evidence_cached: Option<f64>,
}

impl TruncFactor {
    pub(crate) fn new(diff: VarId, margin: f64, tie: bool) -> Self {
        Self {
            diff,
            margin,
            tie,
            msg: N_INF,
            evidence_cached: None,
        }
    }
}

impl Factor for TruncFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let marginal = vars.get(self.diff);
        // Cavity: marginal divided by our outgoing message.
        let cavity = marginal / self.msg;

        // First-time-only: cache the evidence contribution from the cavity.
        if self.evidence_cached.is_none() {
            self.evidence_cached = Some(cavity_evidence(cavity, self.margin, self.tie));
        }

        // Apply the truncation approximation to the cavity.
        let trunc = approx(cavity, self.margin, self.tie);

        // New outgoing message such that cavity * new = trunc.
        let new_msg = trunc / cavity;
        let old_msg = self.msg;
        self.msg = new_msg;

        // Update the marginal.
        vars.set(self.diff, trunc);

        old_msg.delta(new_msg)
    }

    fn log_evidence(&self, _vars: &VarStore) -> f64 {
        self.evidence_cached.unwrap_or(1.0).ln()
    }
}

/// P(diff > margin) for non-tie, P(|diff| < margin) for tie.
fn cavity_evidence(diff: Gaussian, margin: f64, tie: bool) -> f64 {
    if tie {
        cdf(margin, diff.mu(), diff.sigma()) - cdf(-margin, diff.mu(), diff.sigma())
    } else {
        1.0 - cdf(margin, diff.mu(), diff.sigma())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn idempotent_after_convergence() {
        // After enough iterations, propagate should return ~0 delta.
        let mut vars = VarStore::new();
        // Set diff to a typical EP value.
        let diff = vars.alloc(Gaussian::from_ms(2.0, 3.0));

        let mut f = TruncFactor::new(diff, 0.0, false);

        // Propagate many times; delta should drop toward 0.
        let mut last = (f64::INFINITY, f64::INFINITY);
        for _ in 0..20 {
            last = f.propagate(&mut vars);
        }
        assert!(last.0 < 1e-10, "expected converged delta, got {}", last.0);
        assert!(last.1 < 1e-10);
    }

    #[test]
    fn evidence_cached_on_first_propagate() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(2.0, 3.0));

        let mut f = TruncFactor::new(diff, 0.0, false);
        assert!(f.evidence_cached.is_none());

        f.propagate(&mut vars);
        assert!(f.evidence_cached.is_some());
        let first = f.evidence_cached.unwrap();

        // Evidence should be P(diff > 0) for diff ~ N(2, 9) ≈ 0.748
        assert!(first > 0.7);
        assert!(first < 0.8);

        // Subsequent propagations don't change it.
        f.propagate(&mut vars);
        assert_eq!(f.evidence_cached.unwrap(), first);
    }

    #[test]
    fn tie_evidence_uses_two_sided() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 2.0));

        let mut f = TruncFactor::new(diff, 1.0, true);
        f.propagate(&mut vars);

        // For diff ~ N(0, 4), tie=true with margin=1: P(-1 < diff < 1) ≈ 0.383
        let ev = f.evidence_cached.unwrap();
        assert!(ev > 0.35 && ev < 0.42);
    }
}
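Why storing `msg` is enough for the cavity computation is easiest to see in natural parameters, where Gaussian multiply/divide are componentwise add/subtract of `(pi, tau)`, so `cavity * new_msg == trunc` holds exactly by construction. A toy sketch with bare `(pi, tau)` tuples standing in for the crate's `Gaussian` (the `trunc` values below are arbitrary, not a real truncation):

```rust
// Natural-parameter view: pi = 1/sigma^2, tau = mu/sigma^2.
type Nat = (f64, f64);

fn mul(a: Nat, b: Nat) -> Nat { (a.0 + b.0, a.1 + b.1) }
fn div(a: Nat, b: Nat) -> Nat { (a.0 - b.0, a.1 - b.1) }

fn main() {
    let marginal: Nat = (1.0 / 9.0, 2.0 / 9.0); // N(2, 3) in natural params
    let msg: Nat = (0.0, 0.0);                  // EP identity (the N_INF analogue)
    let trunc: Nat = (0.5, 1.3);                // placeholder truncated approximation

    let cavity = div(marginal, msg);  // dividing by the identity returns the marginal
    let new_msg = div(trunc, cavity); // chosen so cavity * new_msg = trunc

    let back = mul(cavity, new_msg);
    assert!((back.0 - trunc.0).abs() < 1e-12);
    assert!((back.1 - trunc.1).abs() < 1e-12);
}
```

This is also why `msg` starts at N_INF: the first propagation's cavity is the raw marginal.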
  • Step 3: Run the tests
cargo test --features approx --lib factor::trunc

Expected: 3 passing tests.

  • Step 4: Commit
git add src/factor/trunc.rs src/lib.rs
git commit -m "$(cat <<'EOF'
feat(factor): implement TruncFactor with cached evidence

EP truncation factor that operates on a diff variable. Stores its
outgoing message so the cavity computation produces the correct EP
message on each propagation. The first propagation caches the
evidence contribution (cdf-bounded probability) for log_evidence().

Promotes lib::cdf to pub(crate) so the factor can use it.
EOF
)"

Task 7: Define Schedule trait and EpsilonOrMax impl

Files:

  • Create: src/schedule.rs

  • Modify: src/lib.rs (declare module + export ScheduleReport)

  • Step 1: Create src/schedule.rs

//! Schedule trait and built-in implementations.
//!
//! A schedule drives factor propagation to convergence. The default
//! `EpsilonOrMax` performs one TeamSum sweep (setup) then alternating
//! forward/backward sweeps over the iterating factors until the max
//! delta drops below epsilon or `max` iterations is reached.

use crate::factor::{BuiltinFactor, Factor, VarStore};

/// Result returned by a `Schedule::run` call.
#[derive(Debug, Clone, Copy)]
pub struct ScheduleReport {
    pub iterations: usize,
    pub final_step: (f64, f64),
    pub converged: bool,
}

/// Drives factor propagation to convergence.
pub(crate) trait Schedule {
    fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport;
}

/// Default schedule: sweep forward then backward until step ≤ eps or iter == max.
///
/// Matches the existing `Game::likelihoods` loop bit-for-bit when given the
/// same factor layout (TeamSums first, then alternating RankDiff/Trunc pairs).
#[derive(Debug, Clone, Copy)]
pub(crate) struct EpsilonOrMax {
    pub eps: f64,
    pub max: usize,
}

impl Default for EpsilonOrMax {
    fn default() -> Self {
        // Matches today's hard-coded tolerance and iteration cap.
        Self { eps: 1e-6, max: 10 }
    }
}

impl Schedule for EpsilonOrMax {
    fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport {
        // Partition: leading run of TeamSum factors run exactly once (setup).
        let n_setup = factors
            .iter()
            .position(|f| !matches!(f, BuiltinFactor::TeamSum(_)))
            .unwrap_or(factors.len());

        for f in factors[..n_setup].iter_mut() {
            f.propagate(vars);
        }

        let mut iterations = 0;
        let mut final_step = (f64::INFINITY, f64::INFINITY);
        let mut converged = false;

        for _ in 0..self.max {
            let mut step = (0.0_f64, 0.0_f64);

            // Forward sweep over iterating factors.
            for f in factors[n_setup..].iter_mut() {
                let d = f.propagate(vars);
                step.0 = step.0.max(d.0);
                step.1 = step.1.max(d.1);
            }

            // Backward sweep.
            for f in factors[n_setup..].iter_mut().rev() {
                let d = f.propagate(vars);
                step.0 = step.0.max(d.0);
                step.1 = step.1.max(d.1);
            }

            iterations += 1;
            final_step = step;

            if step.0 <= self.eps && step.1 <= self.eps {
                converged = true;
                break;
            }
        }

        ScheduleReport {
            iterations,
            final_step,
            converged,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::factor::team_sum::TeamSumFactor;
    use crate::gaussian::Gaussian;
    use crate::N_INF;

    #[test]
    fn schedule_runs_setup_factors_once() {
        // Single TeamSum factor: it is a setup factor, so the schedule
        // propagates it exactly once before the sweep loop. With no iterating
        // factors the first sweep sees a zero step and the loop exits after
        // one pass.
        let mut vars = VarStore::new();
        let out = vars.alloc(N_INF);
        let mut factors = vec![BuiltinFactor::TeamSum(TeamSumFactor {
            inputs: vec![(Gaussian::from_ms(5.0, 1.0), 1.0)],
            out,
        })];
        let schedule = EpsilonOrMax::default();
        let report = schedule.run(&mut factors, &mut vars);
        assert_eq!(report.iterations, 1);
        assert!(report.converged);
        // The team-perf var should hold the sum.
        let result = vars.get(out);
        assert!((result.mu() - 5.0).abs() < 1e-12);
    }

    #[test]
    fn report_marks_converged_when_step_below_eps() {
        // Setup-only graph: no iterating factors. The sweep loop still runs
        // once, sees a zero step (<= eps), and reports converged = true after
        // a single iteration.
        let mut vars = VarStore::new();
        let out = vars.alloc(N_INF);
        let mut factors = vec![BuiltinFactor::TeamSum(TeamSumFactor {
            inputs: vec![(Gaussian::from_ms(0.0, 1.0), 1.0)],
            out,
        })];
        let report = EpsilonOrMax::default().run(&mut factors, &mut vars);
        assert_eq!(report.iterations, 1);
        assert!(report.converged);
    }
}
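The eps-or-max contract can be seen in miniature outside the crate: iterate a step function, track the delta it reports, and stop at `eps` or `max`. Here Heron's iteration for √2 stands in for a factor sweep; all names are illustrative, not the crate's:

```rust
struct Report {
    iterations: usize,
    converged: bool,
}

// Run the step until its delta drops to eps or max passes have run —
// the same stopping rule EpsilonOrMax applies to factor sweeps.
fn epsilon_or_max(mut x: f64, eps: f64, max: usize) -> (f64, Report) {
    let mut report = Report { iterations: 0, converged: false };
    for _ in 0..max {
        let next = 0.5 * (x + 2.0 / x); // Heron step toward sqrt(2)
        let delta = (next - x).abs();
        x = next;
        report.iterations += 1;
        if delta <= eps {
            report.converged = true;
            break;
        }
    }
    (x, report)
}

fn main() {
    let (x, r) = epsilon_or_max(1.0, 1e-9, 10);
    assert!(r.converged && r.iterations < 10);
    assert!((x - 2.0_f64.sqrt()).abs() < 1e-9);
}
```

As in the real schedule, `converged == false` with `iterations == max` means the cap, not the tolerance, stopped the loop.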
  • Step 2: Declare the module + re-export in src/lib.rs

Add near the other module declarations:

pub(crate) mod schedule;

And add to the public re-exports:

pub use schedule::ScheduleReport;
  • Step 3: Run the tests
cargo test --features approx --lib schedule

Expected: 2 passing tests.

  • Step 4: Commit
git add src/schedule.rs src/lib.rs
git commit -m "$(cat <<'EOF'
feat(schedule): add Schedule trait and EpsilonOrMax impl

EpsilonOrMax mirrors today's Game::likelihoods loop: sweep forward
then backward over iterating factors, capped at 10 iterations or
step <= 1e-6. Setup factors (TeamSum) run exactly once before the
loop begins.

ScheduleReport is the only public surface from this module.
EOF
)"

Task 8: Refactor Game to use the factor graph

Files:

  • Modify: src/game.rs (large rewrite of likelihoods())
  • Modify: src/arena.rs (replace TeamMessage/DiffMessage buffers with VarStore + factors + sort_buf)
  • Delete: src/message.rs — its pub(crate) TeamMessage/DiffMessage types may still be referenced elsewhere, so attempt the deletion and let the compiler tell us whether it is safe.

This is the largest task in T1. It replaces the body of Game::likelihoods with: build the factor graph from teams/result/weights/p_draw, run the schedule, then extract per-team likelihoods.

  • Step 1: Rewrite src/arena.rs — drop TeamMessage/DiffMessage buffers
use crate::factor::{BuiltinFactor, VarStore};

/// Reusable scratch buffers for `Game::likelihoods`.
///
/// A `Batch` owns one arena; all events in the slice share it across
/// the convergence iterations.
#[derive(Debug, Default)]
pub struct ScratchArena {
    pub(crate) vars: VarStore,
    pub(crate) factors: Vec<BuiltinFactor>,
    pub(crate) sort_buf: Vec<usize>,
}

impl ScratchArena {
    pub fn new() -> Self {
        Self::default()
    }

    #[inline]
    pub(crate) fn reset(&mut self) {
        self.vars.clear();
        self.factors.clear();
        self.sort_buf.clear();
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::N_INF;

    #[test]
    fn reset_keeps_capacity() {
        let mut arena = ScratchArena::new();
        arena.vars.alloc(N_INF);
        arena.sort_buf.push(42);
        let var_cap = arena.vars.marginals.capacity();
        let sort_cap = arena.sort_buf.capacity();
        arena.reset();
        assert_eq!(arena.vars.len(), 0);
        assert_eq!(arena.sort_buf.len(), 0);
        assert_eq!(arena.vars.marginals.capacity(), var_cap);
        assert_eq!(arena.sort_buf.capacity(), sort_cap);
    }
}

You'll need to make VarStore::marginals pub(crate) for the capacity check (or add a capacity() method). Do whichever you prefer; the test is non-essential if neither feels right.
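If the accessor route feels cleaner than widening field visibility, it amounts to something like this (a sketch only — the real VarStore stores Gaussian marginals, and the element type here is a stand-in):

```rust
#[derive(Default)]
struct VarStore {
    marginals: Vec<f64>, // stand-in element type for the crate's Gaussian
}

impl VarStore {
    /// Capacity of the backing marginal buffer (test-support only).
    fn capacity(&self) -> usize {
        self.marginals.capacity()
    }
}

fn main() {
    let mut vs = VarStore::default();
    vs.marginals.reserve(16);
    // reserve guarantees capacity for at least the requested elements.
    assert!(vs.capacity() >= 16);
}
```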

  • Step 2: Rewrite Game::likelihoods in src/game.rs

Replace the existing likelihoods method with:

fn likelihoods(&mut self, arena: &mut ScratchArena) {
    arena.reset();
    let result = self.result;
    let weights = self.weights;
    let teams = &self.teams;
    let p_draw = self.p_draw;

    // 1. Sort teams by result, descending. Use arena's sort_buf as scratch.
    let n_teams = teams.len();
    arena.sort_buf.clear();
    arena.sort_buf.extend(0..n_teams);
    arena.sort_buf.sort_by(|&i, &j| {
        result[j].partial_cmp(&result[i]).unwrap_or(std::cmp::Ordering::Equal)
    });
    let order = &arena.sort_buf;

    // 2. Allocate team-perf vars in sorted order; create one TeamSumFactor each.
    let mut team_vars: SmallVec<[VarId; 8]> = SmallVec::new();
    for &t_idx in order.iter() {
        let var = arena.vars.alloc(N_INF);
        team_vars.push(var);
        let inputs: Vec<(Gaussian, f64)> = teams[t_idx]
            .iter()
            .zip(weights[t_idx].iter())
            .map(|(p, &w)| (p.performance(), w))
            .collect();
        arena.factors.push(BuiltinFactor::TeamSum(TeamSumFactor {
            inputs,
            out: var,
        }));
    }

    // 3. For each adjacent pair, allocate diff var and create RankDiff + Trunc factors.
    let mut diff_vars: SmallVec<[VarId; 8]> = SmallVec::new();
    for w in 0..n_teams.saturating_sub(1) {
        let diff = arena.vars.alloc(N_INF);
        diff_vars.push(diff);
        arena.factors.push(BuiltinFactor::RankDiff(RankDiffFactor {
            team_a: team_vars[w],
            team_b: team_vars[w + 1],
            diff,
        }));
        let tie = result[order[w]] == result[order[w + 1]];
        let margin = if p_draw == 0.0 {
            0.0
        } else {
            let beta_a: f64 = teams[order[w]].iter().map(|p| p.beta.powi(2)).sum();
            let beta_b: f64 = teams[order[w + 1]].iter().map(|p| p.beta.powi(2)).sum();
            compute_margin(p_draw, (beta_a + beta_b).sqrt())
        };
        arena.factors.push(BuiltinFactor::Trunc(TruncFactor::new(diff, margin, tie)));
    }

    // 4. Run the schedule.
    let report = EpsilonOrMax::default().run(&mut arena.factors, &mut arena.vars);
    let _ = report; // (currently unused; future API will surface this)

    // 5. Compute evidence from cached TruncFactor evidences (in linear space, matching old field).
    self.evidence = arena
        .factors
        .iter()
        .map(|f| f.log_evidence(&arena.vars))
        .map(f64::exp)
        .product();

    // 6. Per-team likelihoods. The team-perf marginal is what `team_likelihood` was.
    //    Map back to original (un-sorted) team order.
    let m_t_ft: SmallVec<[Gaussian; 8]> = (0..n_teams)
        .map(|t_idx| {
            let sorted_pos = order.iter().position(|&x| x == t_idx).unwrap();
            arena.vars.get(team_vars[sorted_pos])
        })
        .collect();

    self.likelihoods = teams
        .iter()
        .zip(weights.iter())
        .zip(m_t_ft.iter())
        .map(|((p, w), &m)| {
            let performance = p
                .iter()
                .zip(w.iter())
                .fold(N00, |acc, (player, &weight)| {
                    acc + (player.performance() * weight)
                });
            p.iter()
                .zip(w.iter())
                .map(|(player, &w)| {
                    ((m - performance.exclude(player.performance() * w)) * (1.0 / w))
                        .forget(player.beta.powi(2))
                })
                .collect::<Vec<_>>()
        })
        .collect::<Vec<_>>();
}

You'll need to update the use statements at the top of game.rs:

use crate::{
    N_INF, N00,
    arena::ScratchArena,
    compute_margin,
    drift::Drift,
    factor::{BuiltinFactor, Factor, VarId},
    factor::rank_diff::RankDiffFactor,
    factor::team_sum::TeamSumFactor,
    factor::trunc::TruncFactor,
    gaussian::Gaussian,
    player::Player,
    schedule::{EpsilonOrMax, Schedule},
};
use smallvec::SmallVec;

You may also need to add smallvec to Cargo.toml:

[dependencies]
smallvec = "1"

(Check cargo tree to see if it's already pulled in transitively; if so, no change needed.)
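The index sort in step 1, isolated: indices are ordered descending by result score, with `partial_cmp` falling back to `Equal` so a NaN score cannot panic the comparator. A self-contained sketch:

```rust
// Descending sort of team indices by score, mirroring the sort_buf
// pattern in Game::likelihoods (comparing result[j] against result[i]
// flips the order to descending).
fn sorted_order(result: &[f64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..result.len()).collect();
    order.sort_by(|&i, &j| {
        result[j]
            .partial_cmp(&result[i])
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    order
}

fn main() {
    // Highest score first.
    assert_eq!(sorted_order(&[1.0, 3.0, 2.0]), vec![1, 2, 0]);
    // sort_by is stable, so tied teams keep their original relative order.
    assert_eq!(sorted_order(&[2.0, 2.0, 5.0]), vec![2, 0, 1]);
}
```

Stability matters here: tied teams keep their input order, which keeps tie handling reproducible across runs.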

  • Step 3: Try to build
cargo build --features approx

Expected: the build will likely fail with several errors. Fix them iteratively. Common issues:

  • message.rs is no longer used — delete its mod message; declaration in lib.rs (and the file itself) if nothing references TeamMessage/DiffMessage.
  • The evidence() and tuple_max/tuple_gt helpers in lib.rs may now be unused — remove them.
  • Some imports may be dead.

If message.rs deletion causes test failures, leave it and we'll clean up in T2. Mark this clearly with a comment.

  • Step 4: Run the test suite
cargo test --features approx --lib

Expected: most tests pass. Some hardcoded golden values in game.rs::tests and history.rs::tests may drift by a few ULPs because the new schedule iterates factors in a slightly different order than today's hand-rolled loop.

For each failing test:

  1. Check the actual output (the test harness prints left = ... vs right = ...).
  2. If the difference is within the existing tolerance (epsilon = 1e-6), the assertion should already pass; a failure therefore means the difference is larger — investigate.
  3. If the difference is in the last few ULPs and within 1e-5, update the golden value with a comment explaining the T1 ULP shift.
  4. If the difference is larger than 1e-5, the implementation is wrong — review the factor logic.
  • Step 5: Format and lint
cargo +nightly fmt
cargo clippy --all-targets --features approx -- -D warnings
  • Step 6: Run benchmarks
cargo bench --bench batch 2>&1 | grep "Batch::iteration"
cargo bench --bench gaussian 2>&1 | grep "Gaussian::"

Expected: Batch::iteration ≤ T0 (~21.5 µs on Apple M5 Pro). Gaussian numbers unchanged. If Batch::iteration regressed by more than 5%, profile and investigate before committing.

  • Step 7: Commit (the big one)
git add src/game.rs src/arena.rs src/lib.rs src/factor/ src/schedule.rs
git rm src/message.rs  # only if it's no longer needed
git commit -m "$(cat <<'EOF'
refactor(game): rebuild Game::likelihoods on factor-graph machinery

Game now constructs a VarStore + Vec<BuiltinFactor> from its teams/
result/weights/p_draw and runs an EpsilonOrMax schedule to drive
inference. Public API of Game (new, posteriors, likelihoods field,
evidence field) is unchanged.

The schedule runs TeamSum factors once (setup), then alternates
forward/backward sweeps over RankDiff/Trunc factors until step <=
1e-6 or 10 iterations. Iteration counts match T0 to within ULP-bounded
floating-point drift.

ScratchArena no longer holds TeamMessage/DiffMessage buffers; it now
holds a VarStore, a Vec<BuiltinFactor>, and a Vec<usize> sort_buf.

src/message.rs deleted (TeamMessage/DiffMessage are obsolete).
EOF
)"

Task 9: Final verification and benchmark report

Files:

  • Modify: benches/baseline.txt (append T1 numbers)

  • Step 1: Run the full test suite

cargo test --features approx
cargo clippy --all-targets --features approx -- -D warnings
cargo +nightly fmt --check

All green.

  • Step 2: Capture T1 benchmark numbers
cargo bench --bench batch 2>&1 | grep "Batch::iteration"
cargo bench --bench gaussian 2>&1 | grep "Gaussian::"
  • Step 3: Append T1 numbers to benches/baseline.txt

After the T0 block, add:

# After T1 (date, same hardware)

Batch::iteration           <X.XX> µs   (vs T0 21.253 µs: <delta>)
Gaussian::add             <X.XX> ps    (unchanged)
Gaussian::sub             <X.XX> ps    (unchanged)
Gaussian::mul             <X.XX> ps    (unchanged — nat-param storage)
Gaussian::div             <X.XX> ps    (unchanged)
Gaussian::pi              <X.XX> ps    (unchanged)
Gaussian::tau             <X.XX> ps    (unchanged)

# Notes:
# - Acceptance criterion was "no regression vs T0"; achieved <yes/no>.
# - Per-iteration counts unchanged (verified by re-running test_1vs1vs1 etc.).
# - Within-game inference is now driven by EpsilonOrMax schedule over
#   BuiltinFactor enum dispatch; Game::likelihoods reduced from a hand-
#   rolled EP loop to a 6-step builder + schedule.run + likelihood extraction.
  • Step 4: Commit benchmark numbers
git add benches/baseline.txt
git commit -m "$(cat <<'EOF'
bench: capture T1 final numbers

Batch::iteration: <X.XX> µs (T0 was 21.253 µs)
Gaussian::* unchanged.

Acceptance: factor-graph refactor lands without regression.
Closes T1 tier.
EOF
)"

Self-review notes

Spec coverage (against docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md Section 2):

  • Factor trait (Tasks 3, 4, 5, 6)
  • BuiltinFactor enum (Task 3)
  • VarStore (Task 2)
  • Schedule trait + EpsilonOrMax (Task 7)
  • Re-implement Game::likelihoods on top of factors (Task 8)
  • ScheduleReport (Task 7, exported via pub use)
  • Acceptance: existing tests pass, iteration counts match, no benchmark regression (Task 9)

Deferred to later tiers:

  • MarginFactor, SynergyFactor, ScoreFactor → T4
  • Damped, Residual schedules → T4 (only EpsilonOrMax lands in T1)
  • Surfacing ScheduleReport to callers (currently discarded in Game::likelihoods) → T2 along with the Observer trait
  • Factor graph for cross-history forward/backward sweep → T1 only does within-game; cross-history stays as-is
  • Eliminating remaining allocations from T0 post-mortem (within_priors, Game::likelihoods output) → likely T2 with the new API surface

Things to watch during execution:

  • The T1 algorithm uses true EP cavity-based message storage in TruncFactor (the msg field), whereas T0 used the team-side message storage (likelihood_lose etc.). Mathematically equivalent; same fixed point. Iteration counts should match within ±1.
  • RankDiffFactor does NOT store its own outgoing message; it always overwrites the diff variable from the team-perf marginals. This works because the only consumer of the diff var is the colocated TruncFactor, and the schedule alternates them. If we add a third factor on the same diff var (e.g., a MarginFactor in T4), RankDiffFactor will need to grow outgoing-message storage. Document this in the comments.
  • EpsilonOrMax::run partitions factors by BuiltinFactor::TeamSum. This implicitly assumes the Game builder lays out factors in the order [TeamSum*, RankDiff/Trunc alternating]. Document the contract.
  • The evidence field in Game is a linear product (matches old behavior). New Game derives it from TruncFactor::log_evidence().exp().product(). Floating point of exp(ln(x)) may drift by a ULP from the original direct product; verify against existing test goldens.
  • message.rs deletion may have cross-file consequences if anything outside game.rs imports TeamMessage or DiffMessage. Run grep -rn 'TeamMessage\|DiffMessage' src/ before deleting.
  • cargo +nightly fmt (per user preference); nightly enables imports_granularity and group_imports from rustfmt.toml.