Bite-sized, TDD-style task breakdown for the second tier of the engine redesign: introduce VarStore, Factor trait, BuiltinFactor enum, and EpsilonOrMax schedule, then re-implement Game::likelihoods on top of the new machinery. Internal-only refactor; public Game/History API unchanged. Acceptance: existing tests pass within ULP, iteration counts match T0, no Batch::iteration regression vs T0 (~21.5 µs).
T1 — Factor Graph Machinery Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Re-implement the within-game inference (Game::likelihoods) on top of an explicit factor-graph data structure (VarStore, Factor trait, BuiltinFactor enum, Schedule trait), without changing observable behavior. T1 is a pure internal restructure — public API of Game and Batch is untouched.
Architecture: Variables hold their current Gaussian marginals in a flat VarStore indexed by VarId. Factors hold their outgoing messages and propagate them via Factor::propagate(&mut VarStore) -> (f64, f64) returning the max delta. The three built-in factors (TeamSum, RankDiff, Trunc) — wrapped in a BuiltinFactor enum to avoid dyn dispatch in the hot path — exactly reproduce the current EP algorithm. A Schedule trait drives factor propagation; the default EpsilonOrMax schedule is bit-equivalent to today's hard-coded forward+backward sweep.
Tech Stack: Rust 2024 edition, criterion benchmarks, approx crate for floating-point comparisons. Builds on T0 (natural-parameter Gaussian, dense Vec storage, ScratchArena).
Acceptance criteria
- All existing tests pass (`cargo test --features approx`). Tolerance bounded by `1e-6` (same as T0); if a test drifts beyond that, the implementation is wrong.
- Within-game iteration counts match T0 (the `EpsilonOrMax` schedule should produce identical iteration counts on identical inputs).
- `cargo bench --bench batch` shows no regression vs T0 (`Batch::iteration` ≤ 21.5 µs on Apple M5 Pro). Aim for parity or modest improvement.
- `cargo clippy --all-targets --features approx` clean.
- `cargo +nightly fmt --check` clean.
- Public API of `History`, `HistoryBuilder`, `Game::new`, `Game::posteriors`, `Player`, `Gaussian`, `quality()` unchanged. Internal types (`Factor`, `VarStore`, `Schedule`, `BuiltinFactor`) are `pub(crate)`.
Background — the algorithm we're refactoring
The current Game::likelihoods implements EP for ranked teams with optional draws. For an n-team game (sorted by result, descending), the algorithm:
1. Build team performances (one-shot): for each team, compute the weighted sum of player performance Gaussians (`p + (player.perf() * weight)` folded). Stored as `team[i].prior`.
2. Initialise diff priors: `diff[i].prior = team[i].prior - team[i+1].prior` for adjacent ranked pairs.
3. EP loop (max 10 iterations or until step ≤ 1e-6):
   - Forward sweep: for each diff `e`:
     - Recompute `diff[e].prior = team[e].posterior_win() - team[e+1].posterior_lose()`
     - On iter 0, accumulate evidence: `self.evidence *= cdf(margin, diff.prior)` bounds
     - Compute `diff[e].likelihood = approx(diff[e].prior, margin, tie) / diff[e].prior`
     - Update `team[e+1].likelihood_lose = team[e].posterior_win() - diff[e].likelihood`
   - Backward sweep: for each diff `e` (reverse): same recompute, then update `team[e].likelihood_win = team[e+1].posterior_lose() + diff[e].likelihood`
   - Track max delta across all writes.
4. Special-case 2-team games (`diff.len() == 1`): the loop above doesn't execute (range is empty), so a single direct propagation is done.
5. Final boundary updates that close the chain.
6. Compute likelihoods: for each team, the team's "likelihood" is `likelihood_win * likelihood_lose * likelihood_draw`. From this, derive per-player likelihoods via `((m - performance.exclude(p.perf() * w)) * (1.0 / w)).forget(p.beta²)`.
In factor-graph terms:
- `team[i].prior` is the marginal at the team-perf variable, before any messages.
- `team[i].likelihood_lose` is the outgoing message FROM the RankDiff factor TO team[i] (left-to-right).
- `team[i].likelihood_win` is the outgoing message FROM the RankDiff factor TO team[i-1] (right-to-left).
- `diff[i].likelihood` is the outgoing message FROM the Trunc factor TO diff[i].
- `diff[i].prior` is the marginal at diff[i] from the RankDiff factor.
T1 makes these correspondences explicit by introducing factor objects that own these messages. The math stays identical; the organization changes.
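The message/marginal bookkeeping above is easiest to see in natural parameters. The sketch below is illustrative only: the `Nat` type is a toy stand-in, not the crate's `Gaussian`. Multiplying densities adds natural parameters and dividing subtracts them, which is exactly the divide-out-the-message step the Trunc factor relies on when it computes `cavity = marginal / msg`.

```rust
// Toy natural-parameter Gaussian (illustrative; not the crate's type).
// Multiplying densities adds (pi, tau); dividing subtracts them.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Nat {
    pi: f64,  // precision (1 / sigma^2)
    tau: f64, // precision-adjusted mean (mu / sigma^2)
}

impl Nat {
    fn mul(self, o: Nat) -> Nat {
        Nat { pi: self.pi + o.pi, tau: self.tau + o.tau }
    }
    fn div(self, o: Nat) -> Nat {
        Nat { pi: self.pi - o.pi, tau: self.tau - o.tau }
    }
}

fn main() {
    // Dyadic values so the float arithmetic is exact.
    let cavity = Nat { pi: 0.5, tau: 1.0 };  // marginal without the message
    let msg = Nat { pi: 0.25, tau: 0.25 };   // a factor's outgoing message
    let marginal = cavity.mul(msg);
    // Dividing the message back out recovers the cavity exactly.
    assert_eq!(marginal.div(msg), cavity);
}
```

This is why storing the outgoing message on each factor is enough: the factor can always reconstruct its cavity from the current marginal without consulting its neighbors.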
File map
Created:
- `src/factor/mod.rs` — module root: `VarId`, `VarStore`, `Factor` trait, `BuiltinFactor` enum
- `src/factor/team_sum.rs` — `TeamSumFactor` (one-shot weighted sum)
- `src/factor/rank_diff.rs` — `RankDiffFactor` (linear-combination factor between two team-perf vars and a diff var)
- `src/factor/trunc.rs` — `TruncFactor` (EP truncation factor on a diff var)
- `src/schedule.rs` — `Schedule` trait + `EpsilonOrMax` impl + `ScheduleReport`
Modified:
- `src/lib.rs` — add `pub(crate) mod factor;` and `pub(crate) mod schedule;`
- `src/game.rs` — `Game` becomes `{ vars: VarStore, factors: Vec<BuiltinFactor>, ... }`; `Game::likelihoods` builds the graph and runs the schedule
- `src/message.rs` — `TeamMessage` and `DiffMessage` deleted (replaced by factors and VarStore)
- `src/arena.rs` — `ScratchArena` loses the `teams`/`diffs`/`ties`/`margins` fields; gains a `VarStore`, a `Vec<BuiltinFactor>`, and a `sort_buf: Vec<usize>` for sort_perm scratch
Touched (test-only):
- `src/game.rs` (test module) — no API change, but per-iteration goldens may drift by a ULP
Design
Variables and VarStore
// src/factor/mod.rs
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub(crate) struct VarId(pub(crate) u32);
/// Flat storage for all variable marginals in one game's factor graph.
#[derive(Debug, Default)]
pub(crate) struct VarStore {
marginals: Vec<Gaussian>,
}
VarStore is reset and re-populated for each Game::new call. It lives in the ScratchArena so we don't reallocate the backing buffer.
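The reuse rests on the standard guarantee that `Vec::clear` drops the elements but keeps the allocation alive. A minimal standalone illustration of the property `VarStore::clear` inherits:

```rust
fn main() {
    let mut marginals: Vec<f64> = Vec::new();
    for i in 0..64 {
        marginals.push(i as f64); // first game: buffer grows
    }
    let cap = marginals.capacity();
    marginals.clear(); // between games: length resets, allocation stays
    assert_eq!(marginals.len(), 0);
    // Vec::clear has no effect on capacity, so the next game's allocs
    // reuse the same backing buffer without touching the allocator.
    assert_eq!(marginals.capacity(), cap);
}
```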
Factor trait + BuiltinFactor enum
pub(crate) trait Factor {
/// Update outgoing messages and write back to the var store.
/// Returns max delta `(|Δmu|, |Δsigma|)` across writes this propagation.
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64);
/// Optional log-evidence contribution. Default: 0.0 (no contribution).
/// Only TruncFactor has a non-trivial impl.
fn log_evidence(&self, _vars: &VarStore) -> f64 {
0.0
}
}
#[derive(Debug)]
pub(crate) enum BuiltinFactor {
TeamSum(TeamSumFactor),
RankDiff(RankDiffFactor),
Trunc(TruncFactor),
}
impl Factor for BuiltinFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
match self {
Self::TeamSum(f) => f.propagate(vars),
Self::RankDiff(f) => f.propagate(vars),
Self::Trunc(f) => f.propagate(vars),
}
}
fn log_evidence(&self, vars: &VarStore) -> f64 {
match self {
Self::Trunc(f) => f.log_evidence(vars),
_ => 0.0,
}
}
}
TeamSumFactor
Computes the weighted sum of player performances into a team-perf var. Inputs (player priors with beta-noise applied) are stored at construction. Runs once per game (no iteration needed).
#[derive(Debug)]
pub(crate) struct TeamSumFactor {
/// Player performance Gaussians (pre-computed with beta noise) and their weights.
inputs: Vec<(Gaussian, f64)>,
/// Output: team performance variable.
out: VarId,
}
impl Factor for TeamSumFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
let perf = self
.inputs
.iter()
.fold(N00, |acc, (g, w)| acc + (*g * *w));
let old = vars.get(self.out);
vars.set(self.out, perf);
old.delta(perf)
}
}
RankDiffFactor
Maintains the constraint diff = team_a - team_b between three variables. In EP, this means:
- The marginal at `diff` is the convolution of the marginals at `team_a` and `team_b` (in moment form: subtraction).
- In a full EP treatment, outgoing messages `to_team_a`, `to_team_b`, `to_diff` would be stored on the factor.

In our specific algorithm (which mirrors today's code), each iteration's RankDiff propagation:

- Reads the team-perf marginals (which already incorporate `to_team_a`/`to_team_b` from previous iterations and any draw factors).
- Computes `diff_prior = team_a_marginal - team_b_marginal` (in EP terms, this is the cavity for the diff direction).
- Writes the new `diff_prior` to the diff variable.
- Returns delta against the previous diff value.
#[derive(Debug)]
pub(crate) struct RankDiffFactor {
pub team_a: VarId,
pub team_b: VarId,
pub diff: VarId,
}
impl Factor for RankDiffFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
let a = vars.get(self.team_a);
let b = vars.get(self.team_b);
let new_diff = a - b;
let old = vars.get(self.diff);
vars.set(self.diff, new_diff);
old.delta(new_diff)
}
}
TruncFactor
Applies the truncation constraint to a diff variable. Stores its outgoing message so the cavity computation gives the correct EP message.
#[derive(Debug)]
pub(crate) struct TruncFactor {
pub diff: VarId,
pub margin: f64,
pub tie: bool,
/// Outgoing message to diff var (stored for cavity computation).
pub msg: Gaussian,
/// Cached log-evidence contribution computed on first propagation.
evidence_cached: Option<f64>,
}
impl Factor for TruncFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
// Compute cavity (current diff marginal divided by our outgoing).
let marginal = vars.get(self.diff);
let cavity = marginal / self.msg;
// First pass: cache the evidence contribution from the cavity.
if self.evidence_cached.is_none() {
self.evidence_cached = Some(evidence_value(cavity, self.margin, self.tie));
}
// Apply the truncation approximation to the cavity.
let trunc = approx(cavity, self.margin, self.tie);
// New outgoing = trunc / cavity (so marginal = cavity * new = trunc).
let new_msg = trunc / cavity;
let old_msg = self.msg;
self.msg = new_msg;
// Update marginal: marginal_new = cavity * new_msg = trunc.
vars.set(self.diff, trunc);
old_msg.delta(new_msg)
}
fn log_evidence(&self, _vars: &VarStore) -> f64 {
// Stored as raw evidence (probability), caller computes ln() if needed.
self.evidence_cached.unwrap_or(1.0).ln()
}
}
fn evidence_value(diff: Gaussian, margin: f64, tie: bool) -> f64 {
if tie {
cdf(margin, diff.mu(), diff.sigma()) - cdf(-margin, diff.mu(), diff.sigma())
} else {
1.0 - cdf(margin, diff.mu(), diff.sigma())
}
}
This is the important departure from the current code: today's `team[i].likelihood_lose`, `likelihood_win`, and `likelihood_draw` track three separate messages per team (one per neighbor direction, plus draw). In the cleaner factor-graph model, each RankDiff and Trunc factor holds its own outgoing messages. The algorithm becomes "for each iteration, propagate every factor in order, accumulate deltas." The fixed point is the same.
Caveat: this is only equivalent to today's algorithm if our schedule visits factors in an equivalent order. Today's code does forward-then-backward sweeps over diff factors, with team-perf marginals updated implicitly via posterior_win()/posterior_lose() accessors that re-multiply the priors and likelihoods. In the new code, we maintain the team-perf marginals explicitly in VarStore, with each RankDiff/Trunc/draw factor's outgoing message stored on the factor.
Schedule
// src/schedule.rs
#[derive(Debug, Clone, Copy)]
pub struct ScheduleReport {
pub iterations: usize,
pub final_step: (f64, f64),
pub converged: bool,
}
pub(crate) trait Schedule {
fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport;
}
#[derive(Debug, Clone, Copy)]
pub(crate) struct EpsilonOrMax {
pub eps: f64,
pub max: usize,
}
impl Default for EpsilonOrMax {
fn default() -> Self {
Self { eps: 1e-6, max: 10 }
}
}
impl Schedule for EpsilonOrMax {
fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport {
// Special case: TeamSum factors run exactly once (their inputs don't change).
// RankDiff + Trunc factors iterate to convergence.
// The Game builder lays out factors so all TeamSums come first, followed by
// alternating RankDiff/Trunc pairs.
let n_setup = factors
.iter()
.position(|f| !matches!(f, BuiltinFactor::TeamSum(_)))
.unwrap_or(factors.len());
// One-shot setup phase.
for f in &mut factors[..n_setup] {
f.propagate(vars);
}
let mut iterations = 0;
let mut final_step = (f64::INFINITY, f64::INFINITY);
let mut converged = false;
for _ in 0..self.max {
let mut step = (0.0_f64, 0.0_f64);
// Forward sweep through iterating factors.
for f in factors[n_setup..].iter_mut() {
let d = f.propagate(vars);
step.0 = step.0.max(d.0);
step.1 = step.1.max(d.1);
}
// Backward sweep (reverse order).
for f in factors[n_setup..].iter_mut().rev() {
let d = f.propagate(vars);
step.0 = step.0.max(d.0);
step.1 = step.1.max(d.1);
}
iterations += 1;
final_step = step;
if step.0 <= self.eps && step.1 <= self.eps {
converged = true;
break;
}
}
ScheduleReport {
iterations,
final_step,
converged,
}
}
}
Note on iteration counts: today's code does a forward sweep then a backward sweep within ONE while-loop iteration. The check `tuple_gt(step, 1e-6) && iter < 10` happens between iterations. The schedule above does the same, so iteration counts should match.
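To make the equivalence concrete, here is a toy stand-in for that check (the real `tuple_gt` signature is assumed, not quoted from the codebase): both the old while-loop and `EpsilonOrMax` stop as soon as neither component of the step exceeds the tolerance, so they count the same number of sweeps on the same deltas.

```rust
// Hypothetical stand-in for today's tuple_gt(step, eps) continuation check.
// EpsilonOrMax's `step.0 <= eps && step.1 <= eps` converged test is exactly
// the negation of this, so the two loops agree on when to stop.
fn keeps_iterating(step: (f64, f64), eps: f64) -> bool {
    step.0 > eps || step.1 > eps
}

fn main() {
    let eps = 1e-6;
    assert!(keeps_iterating((1e-3, 0.0), eps));   // mu still moving
    assert!(keeps_iterating((0.0, 1e-3), eps));   // sigma still moving
    assert!(!keeps_iterating((1e-7, 1e-7), eps)); // both below eps: stop
}
```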
Game refactor
Game becomes:
pub struct Game<'a, D: Drift> {
teams: Vec<Vec<Player<D>>>,
result: &'a [f64],
weights: &'a [Vec<f64>],
p_draw: f64,
pub(crate) likelihoods: Vec<Vec<Gaussian>>,
pub(crate) evidence: f64,
}
Same public surface. Internally, Game::new:
- Calls `arena.reset_for_game(...)` to clear VarStore + factors + sort_buf.
- Sorts teams by result (using `arena.sort_buf`).
- For each sorted team, creates a `team_perf` var (initial: `N_INF`) and a `TeamSumFactor` writing to it.
- For each adjacent pair, creates a `diff` var (initial: `N_INF`), a `RankDiffFactor` reading both team_perfs and writing to diff, and a `TruncFactor` operating on diff.
- Runs the schedule.
- Computes evidence as the product of `TruncFactor::log_evidence().exp()` (in linear space, matching the old `evidence` field).
- Computes per-team likelihoods from the team-perf marginals and per-player likelihoods via the existing `exclude` logic.
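The evidence step is a simple fold over the Trunc factors. A hedged sketch of the intended combination (function name illustrative, not from the codebase): summing the per-factor log evidences and exponentiating once is the same as multiplying the linear-space probabilities.

```rust
// Combining per-TruncFactor log-evidence back into the linear-space
// `evidence` field: sum of logs, then one exp, equals the product of
// the per-factor probabilities.
fn combined_evidence(log_evs: &[f64]) -> f64 {
    log_evs.iter().sum::<f64>().exp()
}

fn main() {
    // Two factors with evidence 0.5 and 0.25: product is 0.125.
    let log_evs = [0.5f64.ln(), 0.25f64.ln()];
    let ev = combined_evidence(&log_evs);
    assert!((ev - 0.125).abs() < 1e-12);
}
```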
Task list
Task 1: Pre-flight verification of T0 baseline
Files:

- Read: `benches/baseline.txt`

- [ ] Step 1: Confirm tests pass on the current branch

  cargo test --features approx --lib

  Expected: 38 passed; 0 failed. If anything is failing, do not proceed — investigate first.
- [ ] Step 2: Capture a fresh T0 reference number for `Batch::iteration`

  cargo bench --bench batch 2>&1 | grep "Batch::iteration"

  Expected: in the same range as `benches/baseline.txt` (~21 µs on Apple M5 Pro). If your hardware differs, note the local value as your "T0 reference."
- [ ] Step 3: No commit — this is a verification-only task.
Task 2: Introduce VarId and VarStore
Files:

- Create: `src/factor/mod.rs`
- Modify: `src/lib.rs` (declare module)

- [ ] Step 1: Create `src/factor/mod.rs` with the new types
//! Factor graph machinery for within-game inference.
use crate::gaussian::Gaussian;
/// Identifier for a variable in a `VarStore`.
///
/// Variables hold the current Gaussian marginal and are owned by exactly one
/// `VarStore`. `VarId` is meaningful only within its owning store.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub(crate) struct VarId(pub(crate) u32);
/// Flat storage of variable marginals.
///
/// Variables are allocated by `alloc()` and accessed by `VarId`. The store is
/// reused across `Game::new` calls (it lives in the `ScratchArena`); call
/// `clear()` before reuse.
#[derive(Debug, Default)]
pub(crate) struct VarStore {
marginals: Vec<Gaussian>,
}
impl VarStore {
pub(crate) fn new() -> Self {
Self::default()
}
pub(crate) fn clear(&mut self) {
self.marginals.clear();
}
pub(crate) fn len(&self) -> usize {
self.marginals.len()
}
pub(crate) fn alloc(&mut self, init: Gaussian) -> VarId {
let id = VarId(self.marginals.len() as u32);
self.marginals.push(init);
id
}
pub(crate) fn get(&self, id: VarId) -> Gaussian {
self.marginals[id.0 as usize]
}
pub(crate) fn set(&mut self, id: VarId, g: Gaussian) {
self.marginals[id.0 as usize] = g;
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::N_INF;
#[test]
fn alloc_assigns_sequential_ids() {
let mut store = VarStore::new();
let a = store.alloc(N_INF);
let b = store.alloc(N_INF);
let c = store.alloc(N_INF);
assert_eq!(a, VarId(0));
assert_eq!(b, VarId(1));
assert_eq!(c, VarId(2));
assert_eq!(store.len(), 3);
}
#[test]
fn get_returns_initial_value() {
let mut store = VarStore::new();
let g = Gaussian::from_ms(2.5, 1.0);
let id = store.alloc(g);
assert_eq!(store.get(id), g);
}
#[test]
fn set_updates_value() {
let mut store = VarStore::new();
let id = store.alloc(N_INF);
let new = Gaussian::from_ms(3.0, 0.5);
store.set(id, new);
assert_eq!(store.get(id), new);
}
#[test]
fn clear_resets_length_keeping_capacity() {
let mut store = VarStore::new();
store.alloc(N_INF);
store.alloc(N_INF);
let cap = store.marginals.capacity();
store.clear();
assert_eq!(store.len(), 0);
assert_eq!(store.marginals.capacity(), cap);
}
}
- [ ] Step 2: Declare the module in `src/lib.rs`

  Add this line near the other module declarations (alphabetical, after `mod error`):
pub(crate) mod factor;
- [ ] Step 3: Run the tests
cargo test --features approx --lib factor::tests
Expected: 4 passing tests.
- [ ] Step 4: Commit
git add src/factor/mod.rs src/lib.rs
git commit -m "$(cat <<'EOF'
feat(factor): introduce VarId and VarStore
Foundation types for the T1 factor graph machinery. VarStore is a
flat Vec<Gaussian> indexed by VarId; variables are allocated by
alloc() and the store can be cleared between games to reuse capacity.
Part of T1 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
EOF
)"
Task 3: Define the Factor trait and BuiltinFactor enum
Files:

- Modify: `src/factor/mod.rs`

- [ ] Step 1: Add the trait + enum definitions

  Append to `src/factor/mod.rs` (after the `VarStore` impl and before the test module):
/// A factor in the EP graph.
///
/// Factors hold their own outgoing messages and propagate them by reading
/// connected variable marginals from a `VarStore` and writing back updated
/// marginals.
pub(crate) trait Factor {
/// Update outgoing messages and write back to the var store.
///
/// Returns the max delta `(|Δmu|, |Δsigma|)` across writes this
/// propagation. Used by the `Schedule` to detect convergence.
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64);
/// Optional log-evidence contribution. Default 0.0 (no contribution).
fn log_evidence(&self, _vars: &VarStore) -> f64 {
0.0
}
}
/// Enum dispatcher for the built-in factor types.
///
/// Using an enum instead of `Box<dyn Factor>` keeps factor data inline and
/// avoids virtual-call overhead in the hot inference loop.
#[derive(Debug)]
pub(crate) enum BuiltinFactor {
TeamSum(team_sum::TeamSumFactor),
RankDiff(rank_diff::RankDiffFactor),
Trunc(trunc::TruncFactor),
}
impl Factor for BuiltinFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
match self {
Self::TeamSum(f) => f.propagate(vars),
Self::RankDiff(f) => f.propagate(vars),
Self::Trunc(f) => f.propagate(vars),
}
}
fn log_evidence(&self, vars: &VarStore) -> f64 {
match self {
Self::Trunc(f) => f.log_evidence(vars),
_ => 0.0,
}
}
}
pub(crate) mod rank_diff;
pub(crate) mod team_sum;
pub(crate) mod trunc;
- [ ] Step 2: Create stub files for the three factor modules so the crate compiles
src/factor/team_sum.rs:
use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
#[derive(Debug)]
pub(crate) struct TeamSumFactor {
pub(crate) inputs: Vec<(Gaussian, f64)>,
pub(crate) out: VarId,
}
impl Factor for TeamSumFactor {
fn propagate(&mut self, _vars: &mut VarStore) -> (f64, f64) {
unimplemented!("TeamSumFactor stub — implemented in Task 4")
}
}
src/factor/rank_diff.rs:
use crate::factor::{Factor, VarId, VarStore};
#[derive(Debug)]
pub(crate) struct RankDiffFactor {
pub(crate) team_a: VarId,
pub(crate) team_b: VarId,
pub(crate) diff: VarId,
}
impl Factor for RankDiffFactor {
fn propagate(&mut self, _vars: &mut VarStore) -> (f64, f64) {
unimplemented!("RankDiffFactor stub — implemented in Task 5")
}
}
src/factor/trunc.rs:
use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
use crate::N_INF;
#[derive(Debug)]
pub(crate) struct TruncFactor {
pub(crate) diff: VarId,
pub(crate) margin: f64,
pub(crate) tie: bool,
pub(crate) msg: Gaussian,
pub(crate) evidence_cached: Option<f64>,
}
impl TruncFactor {
pub(crate) fn new(diff: VarId, margin: f64, tie: bool) -> Self {
Self {
diff,
margin,
tie,
msg: N_INF,
evidence_cached: None,
}
}
}
impl Factor for TruncFactor {
fn propagate(&mut self, _vars: &mut VarStore) -> (f64, f64) {
unimplemented!("TruncFactor stub — implemented in Task 6")
}
}
- [ ] Step 3: Verify the crate still builds

  cargo build --features approx

  Expected: builds successfully, possibly with dead-code warnings (nothing calls the stubs yet).
- [ ] Step 4: Commit
git add src/factor/
git commit -m "$(cat <<'EOF'
feat(factor): introduce Factor trait and BuiltinFactor enum
Adds the trait that all factors implement and the enum dispatcher
used by the schedule to drive heterogeneous factors without dynamic
dispatch in the hot loop.
The three built-in factors (TeamSum, RankDiff, Trunc) are stubbed
out; concrete implementations follow in tasks 4-6.
EOF
)"
Task 4: Implement TeamSumFactor
Files:

- Modify: `src/factor/team_sum.rs`

- [ ] Step 1: Replace the stub with the implementation + tests

  Replace the contents of `src/factor/team_sum.rs` with:
use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
use crate::{N00, N_INF};
/// Computes the weighted sum of player performances into a team-perf var.
///
/// Inputs are pre-computed player performance Gaussians (i.e., player priors
/// already with beta² noise added via `Player::performance()`). The factor
/// runs once per game and writes the weighted sum to the output var.
#[derive(Debug)]
pub(crate) struct TeamSumFactor {
pub(crate) inputs: Vec<(Gaussian, f64)>,
pub(crate) out: VarId,
}
impl Factor for TeamSumFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
let perf = self
.inputs
.iter()
.fold(N00, |acc, (g, w)| acc + (*g * *w));
let old = vars.get(self.out);
vars.set(self.out, perf);
old.delta(perf)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn single_player_unit_weight() {
let mut vars = VarStore::new();
let out = vars.alloc(N_INF);
let g = Gaussian::from_ms(25.0, 5.0);
let mut f = TeamSumFactor {
inputs: vec![(g, 1.0)],
out,
};
f.propagate(&mut vars);
let result = vars.get(out);
// N00 + (g * 1.0) = g (in moment form)
assert!((result.mu() - 25.0).abs() < 1e-12);
assert!((result.sigma() - 5.0).abs() < 1e-12);
}
#[test]
fn two_players_summed() {
let mut vars = VarStore::new();
let out = vars.alloc(N_INF);
let g1 = Gaussian::from_ms(20.0, 3.0);
let g2 = Gaussian::from_ms(30.0, 4.0);
let mut f = TeamSumFactor {
inputs: vec![(g1, 1.0), (g2, 1.0)],
out,
};
f.propagate(&mut vars);
let result = vars.get(out);
// sum: mu = 20 + 30 = 50, var = 9 + 16 = 25, sigma = 5
assert!((result.mu() - 50.0).abs() < 1e-12);
assert!((result.sigma() - 5.0).abs() < 1e-12);
}
#[test]
fn weighted_inputs() {
let mut vars = VarStore::new();
let out = vars.alloc(N_INF);
let g = Gaussian::from_ms(10.0, 2.0);
let mut f = TeamSumFactor {
inputs: vec![(g, 2.0)],
out,
};
f.propagate(&mut vars);
let result = vars.get(out);
// g * 2.0: mu = 10*2 = 20, sigma = 2*2 = 4
assert!((result.mu() - 20.0).abs() < 1e-12);
assert!((result.sigma() - 4.0).abs() < 1e-12);
}
#[test]
fn delta_is_zero_on_repeat_propagate() {
let mut vars = VarStore::new();
let out = vars.alloc(N_INF);
let g = Gaussian::from_ms(5.0, 1.0);
let mut f = TeamSumFactor {
inputs: vec![(g, 1.0)],
out,
};
f.propagate(&mut vars);
let (dmu, dsig) = f.propagate(&mut vars);
assert!(dmu < 1e-12, "expected ~0 delta on repeat, got {}", dmu);
assert!(dsig < 1e-12);
}
}
- [ ] Step 2: Run the tests (the stub's `unimplemented!` is gone, so they should now pass)
cargo test --features approx --lib factor::team_sum
Expected: 4 passing tests.
- [ ] Step 3: Commit
git add src/factor/team_sum.rs
git commit -m "$(cat <<'EOF'
feat(factor): implement TeamSumFactor
Computes the weighted sum of player performance Gaussians into a
team-performance variable. Runs once per game (no iteration needed).
EOF
)"
Task 5: Implement RankDiffFactor
Files:

- Modify: `src/factor/rank_diff.rs`

- [ ] Step 1: Replace the stub with the implementation + tests
use crate::factor::{Factor, VarId, VarStore};
/// Maintains the constraint `diff = team_a - team_b` between three vars.
///
/// On each propagation:
/// - Reads marginals at `team_a` and `team_b` (which already incorporate any
/// incoming messages from neighboring factors).
/// - Computes `new_diff = team_a - team_b` (variance addition; see Gaussian::Sub).
/// - Writes the new marginal to `diff`.
/// - Returns the delta against the previous diff value.
///
/// This factor does NOT store an outgoing message; the diff variable is
/// effectively replaced on each propagation. The TruncFactor on the same diff
/// var holds the EP-divide message that produces the cavity.
#[derive(Debug)]
pub(crate) struct RankDiffFactor {
pub(crate) team_a: VarId,
pub(crate) team_b: VarId,
pub(crate) diff: VarId,
}
impl Factor for RankDiffFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
let a = vars.get(self.team_a);
let b = vars.get(self.team_b);
let new_diff = a - b;
let old = vars.get(self.diff);
vars.set(self.diff, new_diff);
old.delta(new_diff)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::gaussian::Gaussian;
use crate::N_INF;
#[test]
fn diff_of_two_known_gaussians() {
let mut vars = VarStore::new();
let team_a = vars.alloc(Gaussian::from_ms(25.0, 3.0));
let team_b = vars.alloc(Gaussian::from_ms(20.0, 4.0));
let diff = vars.alloc(N_INF);
let mut f = RankDiffFactor {
team_a,
team_b,
diff,
};
f.propagate(&mut vars);
let result = vars.get(diff);
// mu = 25 - 20 = 5; var = 9 + 16 = 25; sigma = 5
assert!((result.mu() - 5.0).abs() < 1e-12);
assert!((result.sigma() - 5.0).abs() < 1e-12);
}
#[test]
fn delta_zero_on_repeat() {
let mut vars = VarStore::new();
let team_a = vars.alloc(Gaussian::from_ms(10.0, 2.0));
let team_b = vars.alloc(Gaussian::from_ms(8.0, 1.0));
let diff = vars.alloc(N_INF);
let mut f = RankDiffFactor {
team_a,
team_b,
diff,
};
f.propagate(&mut vars);
let (dmu, dsig) = f.propagate(&mut vars);
assert!(dmu < 1e-12);
assert!(dsig < 1e-12);
}
#[test]
fn delta_reflects_team_change() {
let mut vars = VarStore::new();
let team_a = vars.alloc(Gaussian::from_ms(10.0, 1.0));
let team_b = vars.alloc(Gaussian::from_ms(0.0, 1.0));
let diff = vars.alloc(N_INF);
let mut f = RankDiffFactor {
team_a,
team_b,
diff,
};
f.propagate(&mut vars);
// change team_a, repropagate; delta should be positive
vars.set(team_a, Gaussian::from_ms(15.0, 1.0));
let (dmu, _dsig) = f.propagate(&mut vars);
assert!(dmu > 4.0, "expected ~5 delta, got {}", dmu);
}
}
- [ ] Step 2: Run the tests
cargo test --features approx --lib factor::rank_diff
Expected: 3 passing tests.
- [ ] Step 3: Commit
git add src/factor/rank_diff.rs
git commit -m "$(cat <<'EOF'
feat(factor): implement RankDiffFactor
Maintains diff = team_a - team_b across three variables. On each
propagation, reads the team-perf marginals (which may have been
updated by neighboring factors) and computes the new diff via
Gaussian Sub (variance addition).
EOF
)"
Task 6: Implement TruncFactor
Files:

- Modify: `src/factor/trunc.rs`
- Modify: `src/lib.rs` (expose `cdf`, `approx` to the factor module)

- [ ] Step 1: Make `cdf` and `approx` `pub(crate)` so the factor can use them
In src/lib.rs, change:
fn cdf(x: f64, mu: f64, sigma: f64) -> f64 {
to:
pub(crate) fn cdf(x: f64, mu: f64, sigma: f64) -> f64 {
approx is already pub(crate), no change there.
- [ ] Step 2: Replace the stub with the full TruncFactor
use crate::factor::{Factor, VarId, VarStore};
use crate::gaussian::Gaussian;
use crate::{N_INF, approx, cdf};
/// EP truncation factor on a diff variable.
///
/// Implements the rectified-Gaussian approximation that turns a diff
/// distribution into a "this team rank-beats that team" or "tied" likelihood.
/// Stores its outgoing message to the diff variable so the cavity computation
/// produces the correct EP message on each propagation.
#[derive(Debug)]
pub(crate) struct TruncFactor {
pub(crate) diff: VarId,
pub(crate) margin: f64,
pub(crate) tie: bool,
/// Outgoing message to the diff variable (initial: N_INF, the EP identity).
pub(crate) msg: Gaussian,
/// Cached evidence (linear, not log) computed from the cavity on first propagation.
pub(crate) evidence_cached: Option<f64>,
}
impl TruncFactor {
pub(crate) fn new(diff: VarId, margin: f64, tie: bool) -> Self {
Self {
diff,
margin,
tie,
msg: N_INF,
evidence_cached: None,
}
}
}
impl Factor for TruncFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
let marginal = vars.get(self.diff);
// Cavity: marginal divided by our outgoing message.
let cavity = marginal / self.msg;
// First-time-only: cache the evidence contribution from the cavity.
if self.evidence_cached.is_none() {
self.evidence_cached = Some(cavity_evidence(cavity, self.margin, self.tie));
}
// Apply the truncation approximation to the cavity.
let trunc = approx(cavity, self.margin, self.tie);
// New outgoing message such that cavity * new = trunc.
let new_msg = trunc / cavity;
let old_msg = self.msg;
self.msg = new_msg;
// Update the marginal.
vars.set(self.diff, trunc);
old_msg.delta(new_msg)
}
fn log_evidence(&self, _vars: &VarStore) -> f64 {
self.evidence_cached.unwrap_or(1.0).ln()
}
}
/// P(diff > margin) for non-tie, P(|diff| < margin) for tie.
fn cavity_evidence(diff: Gaussian, margin: f64, tie: bool) -> f64 {
if tie {
cdf(margin, diff.mu(), diff.sigma()) - cdf(-margin, diff.mu(), diff.sigma())
} else {
1.0 - cdf(margin, diff.mu(), diff.sigma())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn idempotent_after_convergence() {
// After enough iterations, propagate should return ~0 delta.
let mut vars = VarStore::new();
// Set diff to a typical EP value.
let diff = vars.alloc(Gaussian::from_ms(2.0, 3.0));
let mut f = TruncFactor::new(diff, 0.0, false);
// Propagate many times; delta should drop toward 0.
let mut last = (f64::INFINITY, f64::INFINITY);
for _ in 0..20 {
last = f.propagate(&mut vars);
}
assert!(last.0 < 1e-10, "expected converged delta, got {}", last.0);
assert!(last.1 < 1e-10);
}
#[test]
fn evidence_cached_on_first_propagate() {
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(2.0, 3.0));
let mut f = TruncFactor::new(diff, 0.0, false);
assert!(f.evidence_cached.is_none());
f.propagate(&mut vars);
assert!(f.evidence_cached.is_some());
let first = f.evidence_cached.unwrap();
// Evidence should be P(diff > 0) for diff ~ N(2, 9) ≈ 0.748
assert!(first > 0.7);
assert!(first < 0.8);
// Subsequent propagations don't change it.
f.propagate(&mut vars);
assert_eq!(f.evidence_cached.unwrap(), first);
}
#[test]
fn tie_evidence_uses_two_sided() {
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(0.0, 2.0));
let mut f = TruncFactor::new(diff, 1.0, true);
f.propagate(&mut vars);
// For diff ~ N(0, 4), tie=true with margin=1: P(-1 < diff < 1) ≈ 0.383
let ev = f.evidence_cached.unwrap();
assert!(ev > 0.35 && ev < 0.42);
}
}
- [ ] Step 3: Run the tests
cargo test --features approx --lib factor::trunc
Expected: 3 passing tests.
- [ ] Step 4: Commit
git add src/factor/trunc.rs src/lib.rs
git commit -m "$(cat <<'EOF'
feat(factor): implement TruncFactor with cached evidence
EP truncation factor that operates on a diff variable. Stores its
outgoing message so the cavity computation produces the correct EP
message on each propagation. The first propagation caches the
evidence contribution (cdf-bounded probability) for log_evidence().
Promotes lib::cdf to pub(crate) so the factor can use it.
EOF
)"
Task 7: Define Schedule trait and EpsilonOrMax impl
Files:

- Create: `src/schedule.rs`
- Modify: `src/lib.rs` (declare module + export `ScheduleReport`)

- [ ] Step 1: Create `src/schedule.rs`
//! Schedule trait and built-in implementations.
//!
//! A schedule drives factor propagation to convergence. The default
//! `EpsilonOrMax` performs one TeamSum sweep (setup) then alternating
//! forward/backward sweeps over the iterating factors until the max
//! delta drops below epsilon or `max` iterations is reached.
use crate::factor::{BuiltinFactor, Factor, VarStore};
/// Result returned by a `Schedule::run` call.
#[derive(Debug, Clone, Copy)]
pub struct ScheduleReport {
pub iterations: usize,
pub final_step: (f64, f64),
pub converged: bool,
}
/// Drives factor propagation to convergence.
pub(crate) trait Schedule {
fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport;
}
/// Default schedule: sweep forward then backward until step ≤ eps or iter == max.
///
/// Matches the existing `Game::likelihoods` loop bit-for-bit when given the
/// same factor layout (TeamSums first, then alternating RankDiff/Trunc pairs).
#[derive(Debug, Clone, Copy)]
pub(crate) struct EpsilonOrMax {
pub eps: f64,
pub max: usize,
}
impl Default for EpsilonOrMax {
fn default() -> Self {
// Matches today's hard-coded tolerance and iteration cap.
Self { eps: 1e-6, max: 10 }
}
}
impl Schedule for EpsilonOrMax {
fn run(&self, factors: &mut [BuiltinFactor], vars: &mut VarStore) -> ScheduleReport {
// Partition: leading run of TeamSum factors run exactly once (setup).
let n_setup = factors
.iter()
.position(|f| !matches!(f, BuiltinFactor::TeamSum(_)))
.unwrap_or(factors.len());
for f in factors[..n_setup].iter_mut() {
f.propagate(vars);
}
let mut iterations = 0;
let mut final_step = (f64::INFINITY, f64::INFINITY);
let mut converged = false;
for _ in 0..self.max {
let mut step = (0.0_f64, 0.0_f64);
// Forward sweep over iterating factors.
for f in factors[n_setup..].iter_mut() {
let d = f.propagate(vars);
step.0 = step.0.max(d.0);
step.1 = step.1.max(d.1);
}
// Backward sweep.
for f in factors[n_setup..].iter_mut().rev() {
let d = f.propagate(vars);
step.0 = step.0.max(d.0);
step.1 = step.1.max(d.1);
}
iterations += 1;
final_step = step;
if step.0 <= self.eps && step.1 <= self.eps {
converged = true;
break;
}
}
ScheduleReport {
iterations,
final_step,
converged,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::factor::team_sum::TeamSumFactor;
use crate::gaussian::Gaussian;
use crate::N_INF;
#[test]
fn schedule_runs_setup_factors_once() {
// Single TeamSum factor and no iterating factors. The setup sweep runs
// once; the iteration loop then sees an empty iterating slice, records
// step (0, 0) <= eps on its first pass, and stops.
let mut vars = VarStore::new();
let out = vars.alloc(N_INF);
let mut factors = vec![BuiltinFactor::TeamSum(TeamSumFactor {
    inputs: vec![(Gaussian::from_ms(5.0, 1.0), 1.0)],
    out,
})];
let schedule = EpsilonOrMax::default();
let report = schedule.run(&mut factors, &mut vars);
assert_eq!(report.iterations, 1);
// The team-perf var should hold the sum.
let result = vars.get(out);
assert!((result.mu() - 5.0).abs() < 1e-12);
}
#[test]
fn report_marks_converged_when_step_below_eps() {
    // Setup-only graph: no iterating factors. The first pass of the loop
    // sees step (0, 0) <= eps, so the schedule reports converged after a
    // single iteration. Documented behavior: `converged` is true only if
    // the inner loop ran and saw step <= eps.
    let mut vars = VarStore::new();
    let out = vars.alloc(N_INF);
    let mut factors = vec![BuiltinFactor::TeamSum(TeamSumFactor {
        inputs: vec![(Gaussian::from_ms(0.0, 1.0), 1.0)],
        out,
    })];
    let report = EpsilonOrMax::default().run(&mut factors, &mut vars);
    assert_eq!(report.iterations, 1);
    assert!(report.converged);
}
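The forward-then-backward sweep with a max-delta stopping rule is the same shape as symmetric Gauss–Seidel. This self-contained toy (a 3×3 linear solve standing in for factor propagation; it is not the EP factors themselves) shows the pattern converging under the same `eps`/`max` parameters:

```rust
/// Symmetric Gauss-Seidel on a small diagonally dominant system: a toy
/// stand-in for the forward/backward sweep with max-delta stopping.
/// (Illustrative only; the real schedule sweeps EP factors.)
fn sweep_to_convergence(eps: f64, max: usize) -> ([f64; 3], usize, bool) {
    let a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]];
    let b = [5.0, 6.0, 5.0]; // exact solution is (1, 1, 1)
    let mut x = [0.0_f64; 3];
    // One "factor" update: relax coordinate i and report its delta.
    let update = |x: &mut [f64; 3], i: usize| -> f64 {
        let s: f64 = (0..3).filter(|&j| j != i).map(|j| a[i][j] * x[j]).sum();
        let new = (b[i] - s) / a[i][i];
        let delta = (new - x[i]).abs();
        x[i] = new;
        delta
    };
    let (mut iterations, mut converged) = (0, false);
    for _ in 0..max {
        let mut step = 0.0_f64;
        for i in 0..3 {
            step = step.max(update(&mut x, i)); // forward sweep
        }
        for i in (0..3).rev() {
            step = step.max(update(&mut x, i)); // backward sweep
        }
        iterations += 1;
        if step <= eps {
            converged = true;
            break;
        }
    }
    (x, iterations, converged)
}

fn main() {
    let (x, iterations, converged) = sweep_to_convergence(1e-6, 10);
    assert!(converged && iterations <= 10);
    for xi in x {
        assert!((xi - 1.0).abs() < 1e-5);
    }
    println!("converged in {iterations} sweeps: {x:?}");
}
```

The key property the real schedule relies on is the same one this toy exhibits: per-update deltas vanish at the fixed point, so "max delta ≤ eps" is a sound convergence signal.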
- Step 2: Declare the module + re-export in
src/lib.rs
Add near the other module declarations:
pub(crate) mod schedule;
And add to the public re-exports:
pub use schedule::ScheduleReport;
- Step 3: Run the tests
cargo test --features approx --lib schedule
Expected: 2 passing tests.
- Step 4: Commit
git add src/schedule.rs src/lib.rs
git commit -m "$(cat <<'EOF'
feat(schedule): add Schedule trait and EpsilonOrMax impl
EpsilonOrMax mirrors today's Game::likelihoods loop: sweep forward
then backward over iterating factors, capped at 10 iterations or
step <= 1e-6. Setup factors (TeamSum) run exactly once before the
loop begins.
ScheduleReport is the only public surface from this module.
EOF
)"
Task 8: Refactor Game to use the factor graph
Files:
- Modify: src/game.rs (large rewrite of likelihoods())
- Modify: src/arena.rs (replace TeamMessage/DiffMessage buffers with VarStore + factors + sort_buf)
- Delete: src/message.rs (no longer needed) — actually, pub(crate) types may still be referenced; we'll let the compiler tell us if we can delete.
This is the largest task in T1. It replaces the body of Game::likelihoods with: build the factor graph from teams/result/weights/p_draw, run the schedule, then extract per-team likelihoods.
- Step 1: Rewrite src/arena.rs — drop TeamMessage/DiffMessage buffers
use crate::factor::{BuiltinFactor, VarStore};
/// Reusable scratch buffers for `Game::likelihoods`.
///
/// A `Batch` owns one arena; all events in the slice share it across
/// the convergence iterations.
#[derive(Debug, Default)]
pub struct ScratchArena {
pub(crate) vars: VarStore,
pub(crate) factors: Vec<BuiltinFactor>,
pub(crate) sort_buf: Vec<usize>,
}
impl ScratchArena {
pub fn new() -> Self {
Self::default()
}
#[inline]
pub(crate) fn reset(&mut self) {
self.vars.clear();
self.factors.clear();
self.sort_buf.clear();
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::N_INF;
#[test]
fn reset_keeps_capacity() {
let mut arena = ScratchArena::new();
arena.vars.alloc(N_INF);
arena.sort_buf.push(42);
let var_cap = arena.vars.marginals.capacity();
let sort_cap = arena.sort_buf.capacity();
arena.reset();
assert_eq!(arena.vars.len(), 0);
assert_eq!(arena.sort_buf.len(), 0);
assert_eq!(arena.vars.marginals.capacity(), var_cap);
assert_eq!(arena.sort_buf.capacity(), sort_cap);
}
}
You'll need to make VarStore::marginals pub(crate) for the capacity check (or add a capacity() method). Do whichever you prefer; the test is non-essential if neither feels right.
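The reset_keeps_capacity test leans on a standard-library guarantee: Vec::clear drops the elements but has no effect on the allocated capacity. A minimal standalone check (toy values):

```rust
fn main() {
    let mut buf: Vec<usize> = Vec::with_capacity(16);
    buf.extend(0..10);
    let cap = buf.capacity();
    buf.clear();
    // Length resets; the allocation (and thus capacity) survives.
    assert_eq!(buf.len(), 0);
    assert_eq!(buf.capacity(), cap);
}
```

This is why `ScratchArena::reset` can be called once per event without re-allocating after the first few events have grown the buffers.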
- Step 2: Rewrite Game::likelihoods in src/game.rs
Replace the existing likelihoods method with:
fn likelihoods(&mut self, arena: &mut ScratchArena) {
arena.reset();
let result = self.result;
let weights = self.weights;
let teams = &self.teams;
let p_draw = self.p_draw;
// 1. Sort teams by result, descending. Use arena's sort_buf as scratch.
let n_teams = teams.len();
arena.sort_buf.clear();
arena.sort_buf.extend(0..n_teams);
arena.sort_buf.sort_by(|&i, &j| {
result[j].partial_cmp(&result[i]).unwrap_or(std::cmp::Ordering::Equal)
});
let order = &arena.sort_buf;
// 2. Allocate team-perf vars in sorted order; create one TeamSumFactor each.
let mut team_vars: SmallVec<[VarId; 8]> = SmallVec::new();
for &t_idx in order.iter() {
let var = arena.vars.alloc(N_INF);
team_vars.push(var);
let inputs: Vec<(Gaussian, f64)> = teams[t_idx]
.iter()
.zip(weights[t_idx].iter())
.map(|(p, &w)| (p.performance(), w))
.collect();
arena.factors.push(BuiltinFactor::TeamSum(TeamSumFactor {
inputs,
out: var,
}));
}
// 3. For each adjacent pair, allocate diff var and create RankDiff + Trunc factors.
let mut diff_vars: SmallVec<[VarId; 8]> = SmallVec::new();
for w in 0..n_teams.saturating_sub(1) {
let diff = arena.vars.alloc(N_INF);
diff_vars.push(diff);
arena.factors.push(BuiltinFactor::RankDiff(RankDiffFactor {
team_a: team_vars[w],
team_b: team_vars[w + 1],
diff,
}));
let tie = result[order[w]] == result[order[w + 1]];
let margin = if p_draw == 0.0 {
0.0
} else {
let beta_a: f64 = teams[order[w]].iter().map(|p| p.beta.powi(2)).sum();
let beta_b: f64 = teams[order[w + 1]].iter().map(|p| p.beta.powi(2)).sum();
compute_margin(p_draw, (beta_a + beta_b).sqrt())
};
arena.factors.push(BuiltinFactor::Trunc(TruncFactor::new(diff, margin, tie)));
}
// 4. Run the schedule.
let report = EpsilonOrMax::default().run(&mut arena.factors, &mut arena.vars);
let _ = report; // (currently unused; future API will surface this)
// 5. Compute evidence from cached TruncFactor evidences (in linear space, matching old field).
self.evidence = arena
.factors
.iter()
.map(|f| f.log_evidence(&arena.vars))
.map(f64::exp)
.product();
// 6. Per-team likelihoods. The team-perf marginal is what `team_likelihood` was.
// Map back to original (un-sorted) team order.
let m_t_ft: SmallVec<[Gaussian; 8]> = (0..n_teams)
.map(|t_idx| {
let sorted_pos = order.iter().position(|&x| x == t_idx).unwrap();
arena.vars.get(team_vars[sorted_pos])
})
.collect();
self.likelihoods = teams
.iter()
.zip(weights.iter())
.zip(m_t_ft.iter())
.map(|((p, w), &m)| {
let performance = p
.iter()
.zip(w.iter())
.fold(N00, |acc, (player, &weight)| {
acc + (player.performance() * weight)
});
p.iter()
.zip(w.iter())
.map(|(player, &w)| {
((m - performance.exclude(player.performance() * w)) * (1.0 / w))
.forget(player.beta.powi(2))
})
.collect::<Vec<_>>()
})
.collect::<Vec<_>>();
}
You'll need to update the use statements at the top of game.rs:
use crate::{
N_INF, N00,
arena::ScratchArena,
compute_margin,
drift::Drift,
factor::{BuiltinFactor, Factor, VarId},
factor::rank_diff::RankDiffFactor,
factor::team_sum::TeamSumFactor,
factor::trunc::TruncFactor,
gaussian::Gaussian,
player::Player,
schedule::{EpsilonOrMax, Schedule},
};
use smallvec::SmallVec;
You may also need to add smallvec to Cargo.toml:
[dependencies]
smallvec = "1"
(Check cargo tree to see if it's already pulled in transitively; if so, no change needed.)
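The descending sort in step 1 and the order-inversion lookup in step 6 can be exercised in isolation (toy values, not crate data):

```rust
fn main() {
    // Toy results: higher result = better placement, so sort descending.
    let result = [1.0_f64, 3.0, 2.0];
    let mut order: Vec<usize> = (0..result.len()).collect();
    order.sort_by(|&i, &j| {
        result[j]
            .partial_cmp(&result[i])
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    assert_eq!(order, vec![1, 2, 0]);
    // Inverse lookup, as used when mapping team-perf marginals back to
    // original (un-sorted) team order.
    let sorted_pos_of_team_0 = order.iter().position(|&x| x == 0).unwrap();
    assert_eq!(sorted_pos_of_team_0, 2);
}
```

Note that `Vec::sort_by` is stable, so teams with tied results keep their input order, which keeps the adjacent-pair tie detection in step 3 deterministic.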
- Step 3: Try to build
cargo build --features approx
Expected: the build will likely fail with several errors. Fix them iteratively. Common issues:
- message.rs is no longer used — delete its mod message; declaration in lib.rs (and the file itself) if nothing references TeamMessage/DiffMessage.
- The evidence() and tuple_max/tuple_gt helpers in lib.rs may now be unused — remove them.
- Some imports may be dead.
If message.rs deletion causes test failures, leave it and we'll clean up in T2. Mark this clearly with a comment.
- Step 4: Run the test suite
cargo test --features approx --lib
Expected: most tests pass. Some hardcoded golden values in game.rs::tests and history.rs::tests may drift by a few ULPs because the new schedule iterates factors in a slightly different order than today's hand-rolled loop.
For each failing test:
- Check the actual output (the test harness prints left = ... vs right = ...).
- If the difference is within 1e-6 (the existing tolerance is epsilon = 1e-6), the test framework should pass it. If it doesn't, the difference is larger — investigate.
- If the difference is in the last few ULPs and within 1e-5, update the golden value with a comment explaining the T1 ULP shift.
- If the difference is larger than 1e-5, the implementation is wrong — review the factor logic.
- Step 5: Format and lint
cargo +nightly fmt
cargo clippy --all-targets --features approx -- -D warnings
- Step 6: Run benchmarks
cargo bench --bench batch 2>&1 | grep "Batch::iteration"
cargo bench --bench gaussian 2>&1 | grep "Gaussian::"
Expected: Batch::iteration ≤ T0 (~21.5 µs on Apple M5 Pro). Gaussian numbers unchanged. If Batch::iteration regressed by more than 5%, profile and investigate before committing.
- Step 7: Commit (the big one)
git add src/game.rs src/arena.rs src/lib.rs src/factor/ src/schedule.rs
git rm src/message.rs # only if it's no longer needed
git commit -m "$(cat <<'EOF'
refactor(game): rebuild Game::likelihoods on factor-graph machinery
Game now constructs a VarStore + Vec<BuiltinFactor> from its teams/
result/weights/p_draw and runs an EpsilonOrMax schedule to drive
inference. Public API of Game (new, posteriors, likelihoods field,
evidence field) is unchanged.
The schedule runs TeamSum factors once (setup), then alternates
forward/backward sweeps over RankDiff/Trunc factors until step <=
1e-6 or 10 iterations. Iteration counts match T0 to within ULP-bounded
floating-point drift.
ScratchArena no longer holds TeamMessage/DiffMessage buffers; it now
holds a VarStore, a Vec<BuiltinFactor>, and a Vec<usize> sort_buf.
src/message.rs deleted (TeamMessage/DiffMessage are obsolete).
EOF
)"
Task 9: Final verification and benchmark report
Files:
- Modify: benches/baseline.txt (append T1 numbers)
- Step 1: Run the full test suite
cargo test --features approx
cargo clippy --all-targets --features approx -- -D warnings
cargo +nightly fmt --check
All green.
- Step 2: Capture T1 benchmark numbers
cargo bench --bench batch 2>&1 | grep "Batch::iteration"
cargo bench --bench gaussian 2>&1 | grep "Gaussian::"
- Step 3: Append T1 numbers to
benches/baseline.txt
After the T0 block, add:
# After T1 (date, same hardware)
Batch::iteration <X.XX> µs (vs T0 21.253 µs: <delta>)
Gaussian::add <X.XX> ps (unchanged)
Gaussian::sub <X.XX> ps (unchanged)
Gaussian::mul <X.XX> ps (unchanged — nat-param storage)
Gaussian::div <X.XX> ps (unchanged)
Gaussian::pi <X.XX> ps (unchanged)
Gaussian::tau <X.XX> ps (unchanged)
# Notes:
# - Acceptance criterion was "no regression vs T0"; achieved <yes/no>.
# - Per-iteration counts unchanged (verified by re-running test_1vs1vs1 etc.).
# - Within-game inference is now driven by EpsilonOrMax schedule over
# BuiltinFactor enum dispatch; Game::likelihoods reduced from a hand-
# rolled EP loop to a 6-step builder + schedule.run + likelihood extraction.
- Step 4: Commit benchmark numbers
git add benches/baseline.txt
git commit -m "$(cat <<'EOF'
bench: capture T1 final numbers
Batch::iteration: <X.XX> µs (T0 was 21.253 µs)
Gaussian::* unchanged.
Acceptance: factor-graph refactor lands without regression.
Closes T1 tier.
EOF
)"
Self-review notes
Spec coverage (against docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md Section 2):
- ✅ Factor trait (Tasks 3, 4, 5, 6)
- ✅ BuiltinFactor enum (Task 3)
- ✅ VarStore (Task 2)
- ✅ Schedule trait + EpsilonOrMax (Task 7)
- ✅ Re-implement Game::likelihoods on top of factors (Task 8)
- ✅ ScheduleReport (Task 7, exported via pub use)
- ✅ Acceptance: existing tests pass, iteration counts match, no benchmark regression (Task 9)
Deferred to later tiers:
- MarginFactor, SynergyFactor, ScoreFactor → T4
- Damped, Residual schedules → T4 (only EpsilonOrMax lands in T1)
- Surfacing ScheduleReport to callers (currently discarded in Game::likelihoods) → T2 along with the Observer trait
- Factor graph for cross-history forward/backward sweep → T1 only does within-game; cross-history stays as-is
- Eliminating remaining allocations from T0 post-mortem (within_priors, Game::likelihoods output) → likely T2 with the new API surface
Things to watch during execution:
- The T1 algorithm uses true EP cavity-based message storage in TruncFactor (the msg field), whereas T0 used the team-side message storage (likelihood_lose etc.). Mathematically equivalent; same fixed point. Iteration counts should match within ±1.
- RankDiffFactor does NOT store its own outgoing message; it always overwrites the diff variable from the team-perf marginals. This works because the only consumer of the diff var is the colocated TruncFactor, and the schedule alternates them. If we add a third factor on the same diff var (e.g., a MarginFactor in T4), RankDiffFactor will need to grow outgoing-message storage. Document this in the comments.
- EpsilonOrMax::run partitions factors by BuiltinFactor::TeamSum. This implicitly assumes the Game builder lays out factors in the order [TeamSum*, RankDiff/Trunc alternating]. Document the contract.
- The evidence field in Game is a linear product (matches old behavior). New Game derives it from TruncFactor::log_evidence().exp().product(). Floating point of exp(ln(x)) may drift by a ULP from the original direct product; verify against existing test goldens.
- message.rs deletion may have cross-file consequences if anything outside game.rs imports TeamMessage or DiffMessage. Run grep -rn 'TeamMessage\|DiffMessage' src/ before deleting.
- cargo +nightly fmt (per user preference); nightly enables imports_granularity and group_imports from rustfmt.toml.