Anders Olsson 8b53cacd64 T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence
Adds soft Gaussian-observation evidence on the per-pair diff variable,
enabling continuous score margins as a richer alternative to ranks.

Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under
  `#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to
  `Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.

Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that
  closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new
  `likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through
  `TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.

Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing
goldens byte-identical.  Bench: `benches/scored.rs` baseline ~960µs for
60 events × 20-player pool with default convergence.

Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 08:47:36 +02:00


T4 — MarginFactor + Outcome::Scored Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a MarginFactor (Gaussian observation factor on a diff variable) and an Outcome::Scored(scores) variant, so users can supply continuous per-team scores instead of just ranks. Per-pair score margins become soft EP evidence about the latent performance diff.

Architecture:

  • Sort scored teams by score descending; for each adjacent pair compute m_obs = score_higher − score_lower ≥ 0. Per pair: RankDiffFactor writes diff = team_a − team_b, then a MarginFactor multiplies in the Gaussian observation N(m_obs, score_sigma²). This replaces the TruncFactor for scored outcomes; ranked outcomes are unchanged.
  • A new internal enum DiffFactor { Trunc(TruncFactor), Margin(MarginFactor) } lets Game::likelihoods keep its single hand-rolled forward/backward sweep loop while dispatching the per-diff factor by enum.
  • score_sigma is configurable on GameOptions and HistoryBuilder (default 1.0).
  • Outcome is already #[non_exhaustive], so adding Scored is non-breaking for downstream match arms.
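The pair construction in the first bullet can be sketched in isolation. This is an illustrative stand-alone function, not the crate's API (the real code does the same sort inline via arena.sort_buf):

```rust
// Illustrative only: map per-team scores to (winner, loser, m_obs) triples for
// each adjacent pair of the descending sort, as the MarginFactor expects.
fn adjacent_margins(scores: &[f64]) -> Vec<(usize, usize, f64)> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Best score first; the sort is stable, so ties keep input order.
    order.sort_by(|&i, &j| scores[j].partial_cmp(&scores[i]).unwrap());
    order
        .windows(2)
        .map(|w| (w[0], w[1], scores[w[0]] - scores[w[1]])) // m_obs >= 0
        .collect()
}

fn main() {
    // Teams 0, 2, 1 finish best-to-worst; both adjacent margins are 1.0.
    assert_eq!(adjacent_margins(&[3.0, 1.0, 2.0]), vec![(0, 2, 1.0), (2, 1, 1.0)]);
}
```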

Tech Stack: Rust 2024, smallvec, rayon (already in tree). No new crate dependencies.


File Structure

Path Status Responsibility
src/factor/margin.rs create MarginFactor struct + Factor impl + cavity-cached evidence + unit tests
src/factor/mod.rs modify pub mod margin;, BuiltinFactor::Margin(...) variant + dispatch arms
src/factors.rs modify re-export MarginFactor
src/outcome.rs modify Outcome::Scored(SmallVec<[f64; 4]>) variant, scores() ctor, as_scores() accessor, team_count arm
src/game.rs modify pub(crate) enum DiffFactor, scored path in likelihoods, Game::scored() ctor, GameOptions::score_sigma
src/event_builder.rs modify .scores([...]) builder method
src/history.rs modify match Outcome::Scored in add_events; HistoryBuilder::score_sigma; new internal add_events_scored_with_prior (or extra arg)
tests/scored.rs create end-to-end Scored integration tests
examples/scored.rs create worked example using Outcome::Scored
benches/scored.rs create criterion benchmark mirroring batch.rs with scored events
CLAUDE.md modify mark T4-MarginFactor complete in the architecture notes

Background — math the implementer needs

For a diff variable D with current marginal D_marg, the MarginFactor models an observation m_obs ~ N(D, σ²) where σ = score_sigma. Standard EP for a Gaussian-likelihood factor:

  1. Cavity: D_cav = D_marg / msg (where msg is this factor's stored outgoing message; init N_INF so the first cavity = the current marginal).
  2. Tilted distribution: D_cav · N(m_obs, σ²) — a product of two Gaussians; closed-form, no approximation needed (so it converges in one propagation).
  3. New marginal: the tilted distribution.
  4. New outgoing message: new_msg = new_marginal / D_cav. Because the tilted distribution is exact, new_msg = N(m_obs, σ²), a constant that depends only on m_obs and σ, not on the cavity.
  5. Cavity evidence: Z_cav = pdf(m_obs; D_cav.mu(), sqrt(D_cav.sigma()² + σ²)) (the marginal likelihood of m_obs under the cavity). Cache on first propagate, identical to TruncFactor's pattern. log_evidence = Z_cav.ln().

Practical consequence: MarginFactor::propagate returns a non-zero delta on its first call (because msg jumps from N_INF to N(m_obs, σ²)) and exactly zero afterwards, since new_msg is a constant.

A Gaussian N(m, σ) is constructed via Gaussian::from_ms(m, σ). Multiplication adds nat-params (pi += other.pi; tau += other.tau); division subtracts them. The pdf(x, mu, sigma) helper already exists in lib.rs (private, but importable as crate::pdf).
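A quick sanity check of the nat-param algebra and the Step 4 claim, using bare (pi, tau) tuples instead of the crate's Gaussian (illustrative, not library code): dividing the exact tilted marginal by the cavity recovers N(m_obs, σ²).

```rust
// (pi, tau) natural parameters: pi = 1/sigma^2, tau = mu/sigma^2.
fn mul(a: (f64, f64), b: (f64, f64)) -> (f64, f64) {
    (a.0 + b.0, a.1 + b.1) // product of Gaussians adds nat-params
}
fn div(a: (f64, f64), b: (f64, f64)) -> (f64, f64) {
    (a.0 - b.0, a.1 - b.1) // quotient subtracts them
}

fn main() {
    let cavity = (1.0 / 36.0, 0.0); // N(0, 6)
    let obs = (1.0, 5.0);           // N(5, 1): pi = 1, tau = 5
    let tilted = mul(cavity, obs);  // exact tilted marginal
    let msg = div(tilted, cavity);  // outgoing message back to diff
    // The message is the observation itself, independent of the cavity.
    assert!((msg.0 - 1.0).abs() < 1e-12);
    assert!((msg.1 - 5.0).abs() < 1e-12);
}
```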

Concrete numerical check for tests: With cavity N(0, 6) and observation m_obs=5, σ=1:

  • D_cav.pi = 1/36 ≈ 0.027778, D_cav.tau = 0.
  • New marginal: pi = 0.027778 + 1 = 1.027778, tau = 0 + 5 = 5. So mu = 5 / 1.027778 ≈ 4.864865, sigma = 1/sqrt(1.027778) ≈ 0.986394.
  • Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.046783. So log_evidence ≈ -3.0622.
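The worked numbers can be reproduced with a throwaway snippet in plain f64 arithmetic (no crate types; the pdf is written out inline):

```rust
// Reproduce the concrete check: cavity N(0, 6), observation m_obs = 5, sigma = 1.
fn tilted(pi_cav: f64, tau_cav: f64, m_obs: f64, sigma: f64) -> (f64, f64) {
    // Multiply the cavity by N(m_obs, sigma^2) in natural parameters,
    // then convert back to (mu, sigma).
    let pi = pi_cav + 1.0 / (sigma * sigma);
    let tau = tau_cav + m_obs / (sigma * sigma);
    (tau / pi, (1.0 / pi).sqrt())
}

fn cavity_evidence(mu_cav: f64, var_cav: f64, m_obs: f64, sigma: f64) -> f64 {
    // Z_cav = N(m_obs; mu_cav, var_cav + sigma^2), written out as a pdf.
    let var = var_cav + sigma * sigma;
    let d = m_obs - mu_cav;
    (-d * d / (2.0 * var)).exp() / (2.0 * std::f64::consts::PI * var).sqrt()
}

fn main() {
    let (mu, sig) = tilted(1.0 / 36.0, 0.0, 5.0, 1.0);
    assert!((mu - 4.864865).abs() < 1e-6);
    assert!((sig - 0.986394).abs() < 1e-6);

    let z = cavity_evidence(0.0, 36.0, 5.0, 1.0);
    assert!((z - 0.046783).abs() < 1e-6);
    assert!((z.ln() + 3.06223).abs() < 1e-4);
}
```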

Task 1: MarginFactor core (file + struct + Factor impl + unit tests)

Files:

  • Create: src/factor/margin.rs

  • Modify: src/factor/mod.rs:100-102 (add pub mod margin; next to the existing pub mod lines)

  • Step 1: Add the module declaration so the new file compiles

In src/factor/mod.rs, find the existing block:

pub mod rank_diff;
pub mod team_sum;
pub mod trunc;

Replace with:

pub mod margin;
pub mod rank_diff;
pub mod team_sum;
pub mod trunc;
  • Step 2: Create src/factor/margin.rs with the implementation and its unit tests
use crate::{
    N_INF, pdf,
    factor::{Factor, VarId, VarStore},
    gaussian::Gaussian,
};
#[cfg(test)]
use crate::cdf;

/// Gaussian observation factor on a diff variable.
///
/// Encodes the soft evidence `m_obs ~ N(diff, sigma²)`. The outgoing message
/// to `diff` is the constant `N(m_obs, sigma²)`, so this factor converges in a
/// single propagation: subsequent calls return a zero delta.
#[derive(Debug)]
pub struct MarginFactor {
    pub diff: VarId,
    pub m_obs: f64,
    pub sigma: f64,
    pub(crate) msg: Gaussian,
    pub(crate) evidence_cached: Option<f64>,
}

impl MarginFactor {
    pub fn new(diff: VarId, m_obs: f64, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score sigma must be positive");
        Self {
            diff,
            m_obs,
            sigma,
            msg: N_INF,
            evidence_cached: None,
        }
    }
}

impl Factor for MarginFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let marginal = vars.get(self.diff);
        let cavity = marginal / self.msg;

        if self.evidence_cached.is_none() {
            self.evidence_cached = Some(cavity_evidence(cavity, self.m_obs, self.sigma));
        }

        let new_msg = Gaussian::from_ms(self.m_obs, self.sigma);
        let new_marginal = cavity * new_msg;
        let old_msg = self.msg;
        self.msg = new_msg;
        vars.set(self.diff, new_marginal);

        old_msg.delta(new_msg)
    }

    fn log_evidence(&self, _vars: &VarStore) -> f64 {
        self.evidence_cached.unwrap_or(1.0).ln()
    }
}

fn cavity_evidence(cavity: Gaussian, m_obs: f64, sigma: f64) -> f64 {
    let combined_sigma = (cavity.sigma().powi(2) + sigma.powi(2)).sqrt();
    pdf(m_obs, cavity.mu(), combined_sigma)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn first_propagate_writes_tilted_marginal() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);

        f.propagate(&mut vars);

        let result = vars.get(diff);
        // pi = 1/36 + 1 ≈ 1.027778; tau = 0 + 5 = 5
        // mu = 5 / 1.027778 ≈ 4.864865; sigma = 1/sqrt(1.027778) ≈ 0.986394
        assert!((result.mu() - 4.864864864864865).abs() < 1e-12);
        assert!((result.sigma() - 0.986393923832144).abs() < 1e-12);
    }

    #[test]
    fn converges_in_one_step() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);

        f.propagate(&mut vars);
        let (dmu, dsig) = f.propagate(&mut vars);
        assert!(dmu < 1e-12, "expected ~0 delta on second propagate, got {dmu}");
        assert!(dsig < 1e-12);
    }

    #[test]
    fn evidence_cached_on_first_propagate() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);
        assert!(f.evidence_cached.is_none());

        f.propagate(&mut vars);
        let z = f.evidence_cached.unwrap();
        // pdf(5, 0, sqrt(37)) ≈ 0.046783
        assert!((z - 0.046783).abs() < 1e-6);

        // Subsequent propagations don't change it.
        f.propagate(&mut vars);
        assert_eq!(f.evidence_cached.unwrap(), z);
    }

    #[test]
    fn log_evidence_matches_cached_ln() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);
        f.propagate(&mut vars);
        let logz = f.log_evidence(&vars);
        assert!((logz - (-3.062235)).abs() < 1e-4);
    }

    // Silence unused-import warning for cdf until/if a tie-band variant is added.
    #[allow(dead_code)]
    fn _cdf_smoke() -> f64 {
        cdf(0.0, 0.0, 1.0)
    }
}

Note: the cdf import exists only to reserve the spot for a possible tie-band MarginFactor variant (parity with trunc.rs style). Keep it gated to test builds, or paired with _cdf_smoke, so that cargo build and clippy -D warnings in Step 4 stay clean; if you'd rather not carry it, drop cdf from the imports and delete _cdf_smoke.

  • Step 3: Run the new tests (they pass as soon as the file from Step 2 lands, since it includes the implementation; this run is the guard)

Run: cargo test --lib factor::margin

Expected: 4 passed.

  • Step 4: Verify the module compiles cleanly with no warnings

Run: cargo build and cargo clippy --lib -- -D warnings

Expected: no warnings, no errors.

  • Step 5: Format and commit
cargo +nightly fmt
git add src/factor/margin.rs src/factor/mod.rs
git commit -m "feat(factor): add MarginFactor for scored-margin EP evidence"

Task 2: Wire MarginFactor into BuiltinFactor enum dispatch

Files:

  • Modify: src/factor/mod.rs:76-98 (the BuiltinFactor enum and its Factor impl)

  • Modify: src/factors.rs:7-13 (the public re-export list)

  • Step 1: Write a failing dispatch test in src/factor/mod.rs

Open src/factor/mod.rs. Inside the existing #[cfg(test)] mod tests { ... } block (around line 105), add:

#[test]
fn builtin_factor_dispatches_to_margin() {
    use super::margin::MarginFactor;
    let mut vars = VarStore::new();
    let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
    let mut f = BuiltinFactor::Margin(MarginFactor::new(diff, 5.0, 1.0));

    f.propagate(&mut vars);

    let result = vars.get(diff);
    assert!((result.mu() - 4.864864864864865).abs() < 1e-12);

    let logz = f.log_evidence(&vars);
    assert!((logz - (-3.062235)).abs() < 1e-4);
}
  • Step 2: Run the test to verify it fails

Run: cargo test --lib factor::tests::builtin_factor_dispatches_to_margin

Expected: FAIL with no variant named Margin found for enum BuiltinFactor.

  • Step 3: Add the enum variant + Factor impl arms

Replace the current BuiltinFactor definition and its Factor impl (currently src/factor/mod.rs:76-98):

/// Enum dispatcher for the built-in factor types.
///
/// Using an enum instead of `Box<dyn Factor>` keeps factor data inline and
/// avoids virtual-call overhead in the hot inference loop.
#[derive(Debug)]
pub enum BuiltinFactor {
    TeamSum(team_sum::TeamSumFactor),
    RankDiff(rank_diff::RankDiffFactor),
    Trunc(trunc::TruncFactor),
    Margin(margin::MarginFactor),
}

impl Factor for BuiltinFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        match self {
            Self::TeamSum(f) => f.propagate(vars),
            Self::RankDiff(f) => f.propagate(vars),
            Self::Trunc(f) => f.propagate(vars),
            Self::Margin(f) => f.propagate(vars),
        }
    }

    fn log_evidence(&self, vars: &VarStore) -> f64 {
        match self {
            Self::Trunc(f) => f.log_evidence(vars),
            Self::Margin(f) => f.log_evidence(vars),
            _ => 0.0,
        }
    }
}
  • Step 4: Re-export MarginFactor from src/factors.rs

Replace the body of src/factors.rs (lines 7-13) with:

pub use crate::{
    factor::{
        BuiltinFactor, Factor, VarId, VarStore, margin::MarginFactor,
        rank_diff::RankDiffFactor, team_sum::TeamSumFactor, trunc::TruncFactor,
    },
    schedule::{EpsilonOrMax, Schedule, ScheduleReport},
};
  • Step 5: Run the test to verify it passes

Run: cargo test --lib factor::tests::builtin_factor_dispatches_to_margin

Expected: PASS.

  • Step 6: Run the full lib test suite to confirm no regressions

Run: cargo test --lib

Expected: all tests pass (current count + 5 new from Tasks 1–2).

  • Step 7: Format and commit
cargo +nightly fmt
git add src/factor/mod.rs src/factors.rs
git commit -m "feat(factor): dispatch MarginFactor through BuiltinFactor enum"

Task 3: Add Outcome::Scored variant and accessors

Files:

  • Modify: src/outcome.rs

  • Step 1: Write failing tests in src/outcome.rs

Add to the existing #[cfg(test)] mod tests { ... } block (after winner_out_of_range_panics, around line 86):

#[test]
fn scored_two_teams() {
    let o = Outcome::scores([10.0, 4.0]);
    assert_eq!(o.team_count(), 2);
    assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
    assert_eq!(o.as_ranks(), None);
}

#[test]
fn scored_team_count_matches_input() {
    let o = Outcome::scores([3.0, 1.0, 2.0, 0.0]);
    assert_eq!(o.team_count(), 4);
}

#[test]
fn ranked_as_scores_returns_none() {
    let o = Outcome::winner(0, 2);
    assert!(o.as_scores().is_none());
    assert!(o.as_ranks().is_some());
}
  • Step 2: Run the tests to verify they fail

Run: cargo test --lib outcome::tests

Expected: FAIL — no function or associated item named scores found, etc.

  • Step 3: Implement the Scored variant and helpers

Replace the body of src/outcome.rs with:

//! Outcome of a match.
//!
//! `Ranked(ranks)` for ordinal results; `Scored(scores)` for continuous
//! per-team scores (engages `MarginFactor` in the engine).

use smallvec::SmallVec;

/// Final outcome of a match.
///
/// `Ranked(ranks)`: lower rank = better. Equal ranks mean a tie between those
/// teams. `ranks.len()` must equal the number of teams in the event.
///
/// `Scored(scores)`: higher score = better. Adjacent (sorted) pairs feed
/// observed margins to `MarginFactor`. `scores.len()` must equal the number
/// of teams in the event.
#[derive(Clone, Debug, PartialEq)]
#[non_exhaustive]
pub enum Outcome {
    Ranked(SmallVec<[u32; 4]>),
    Scored(SmallVec<[f64; 4]>),
}

impl Outcome {
    /// `n`-team outcome where team `winner` won and everyone else tied for last.
    ///
    /// Panics if `winner >= n`.
    pub fn winner(winner: u32, n: u32) -> Self {
        assert!(winner < n, "winner index {winner} out of range 0..{n}");
        let ranks: SmallVec<[u32; 4]> = (0..n).map(|i| if i == winner { 0 } else { 1 }).collect();
        Self::Ranked(ranks)
    }

    /// All `n` teams tied.
    pub fn draw(n: u32) -> Self {
        Self::Ranked(SmallVec::from_vec(vec![0; n as usize]))
    }

    /// Explicit per-team ranking.
    pub fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Self {
        Self::Ranked(ranks.into_iter().collect())
    }

    /// Explicit per-team continuous scores; higher = better.
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored(scores.into_iter().collect())
    }

    pub fn team_count(&self) -> usize {
        match self {
            Self::Ranked(r) => r.len(),
            Self::Scored(s) => s.len(),
        }
    }

    pub(crate) fn as_ranks(&self) -> Option<&[u32]> {
        match self {
            Self::Ranked(r) => Some(r),
            Self::Scored(_) => None,
        }
    }

    pub(crate) fn as_scores(&self) -> Option<&[f64]> {
        match self {
            Self::Scored(s) => Some(s),
            Self::Ranked(_) => None,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn winner_two_teams() {
        let o = Outcome::winner(0, 2);
        assert_eq!(o.as_ranks(), Some(&[0u32, 1][..]));
        assert_eq!(o.team_count(), 2);
    }

    #[test]
    fn winner_three_teams_second_wins() {
        let o = Outcome::winner(1, 3);
        assert_eq!(o.as_ranks(), Some(&[1u32, 0, 1][..]));
    }

    #[test]
    fn draw_three_teams() {
        let o = Outcome::draw(3);
        assert_eq!(o.as_ranks(), Some(&[0u32, 0, 0][..]));
    }

    #[test]
    fn ranking_from_iter() {
        let o = Outcome::ranking([2, 0, 1]);
        assert_eq!(o.as_ranks(), Some(&[2u32, 0, 1][..]));
    }

    #[test]
    #[should_panic(expected = "winner index 2 out of range")]
    fn winner_out_of_range_panics() {
        let _ = Outcome::winner(2, 2);
    }

    #[test]
    fn scored_two_teams() {
        let o = Outcome::scores([10.0, 4.0]);
        assert_eq!(o.team_count(), 2);
        assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
        assert_eq!(o.as_ranks(), None);
    }

    #[test]
    fn scored_team_count_matches_input() {
        let o = Outcome::scores([3.0, 1.0, 2.0, 0.0]);
        assert_eq!(o.team_count(), 4);
    }

    #[test]
    fn ranked_as_scores_returns_none() {
        let o = Outcome::winner(0, 2);
        assert!(o.as_scores().is_none());
        assert!(o.as_ranks().is_some());
    }
}

Note: the existing as_ranks returned &[u32] and was #[allow(dead_code)]. The new signature returns Option<&[u32]> because Ranked is no longer the only variant. All in-tree call sites that used as_ranks() (we'll update them in later tasks) must now handle the Option.

  • Step 4: Run the outcome tests to verify they pass

Run: cargo test --lib outcome

Expected: 8 passed.

  • Step 5: Update existing call sites to handle the new Option<&[u32]> return

Two call sites use as_ranks() today. Update each to expect Option:

In src/history.rs:672, change:

let ranks = ev.outcome.as_ranks();
if ranks.len() != ev.teams.len() {

to:

let ranks = match ev.outcome.as_ranks() {
    Some(r) => r,
    None => {
        // Scored path will be wired in Task 7; for now it's an error.
        return Err(InferenceError::MismatchedShape {
            kind: "outcome variant",
            expected: 0,
            got: 0,
        });
    }
};
if ranks.len() != ev.teams.len() {

In src/history.rs:701, no change is needed; ranks is already bound as &[u32] by the match added above:

let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let inverted: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();

In src/game.rs:312, change:

let ranks = outcome.as_ranks();
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let result: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();

to:

let ranks = outcome.as_ranks().ok_or(crate::InferenceError::MismatchedShape {
    kind: "Game::ranked requires Outcome::Ranked",
    expected: 0,
    got: 0,
})?;
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let result: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
  • Step 6: Verify the full lib still compiles and tests pass

Run: cargo test --lib

Expected: all tests pass (call sites updated cleanly).

  • Step 7: Format and commit
cargo +nightly fmt
git add src/outcome.rs src/history.rs src/game.rs
git commit -m "feat(outcome): add Scored variant; switch as_ranks/as_scores to Option"

Task 4: Internal DiffFactor enum to dispatch Trunc vs Margin per-pair

Files:

  • Modify: src/game.rs (top of file, before Game impl)

  • Step 1: Write a failing test in src/game.rs's test module

In the #[cfg(test)] mod tests { ... } block at the bottom of src/game.rs, add (after test_2vs2_weighted):

#[test]
fn diff_factor_dispatch_trunc_and_margin() {
    use crate::factor::{margin::MarginFactor, trunc::TruncFactor, VarStore};
    use super::DiffFactor;

    let mut vars = VarStore::new();
    let dt = vars.alloc(Gaussian::from_ms(0.0, 6.0));
    let dm = vars.alloc(Gaussian::from_ms(0.0, 6.0));

    let mut t = DiffFactor::Trunc(TruncFactor::new(dt, 0.0, false));
    let mut m = DiffFactor::Margin(MarginFactor::new(dm, 5.0, 1.0));

    let _ = t.propagate(&mut vars);
    let _ = m.propagate(&mut vars);

    // Smoke: both diffs got written; their msgs are non-N_INF.
    assert!(t.msg().pi() > 0.0);
    assert!(m.msg().pi() > 0.0);
    assert_eq!(t.diff(), dt);
    assert_eq!(m.diff(), dm);
}
  • Step 2: Run the test to verify it fails

Run: cargo test --lib game::tests::diff_factor_dispatch_trunc_and_margin

Expected: FAIL — cannot find type DiffFactor in this scope.

  • Step 3: Add the DiffFactor enum at the top of src/game.rs

Insert after the existing use block (around line 14, before pub struct GameOptions):

use crate::factor::margin::MarginFactor;

/// Per-adjacent-pair link factor in the game's diff chain.
///
/// `Trunc` is used for `Outcome::Ranked` (rank-based truncation).
/// `Margin` is used for `Outcome::Scored` (Gaussian observation on the diff).
#[derive(Debug)]
pub(crate) enum DiffFactor {
    Trunc(TruncFactor),
    Margin(MarginFactor),
}

impl DiffFactor {
    pub(crate) fn diff(&self) -> crate::factor::VarId {
        match self {
            Self::Trunc(f) => f.diff,
            Self::Margin(f) => f.diff,
        }
    }

    pub(crate) fn msg(&self) -> Gaussian {
        match self {
            Self::Trunc(f) => f.msg,
            Self::Margin(f) => f.msg,
        }
    }

    pub(crate) fn evidence(&self) -> f64 {
        match self {
            Self::Trunc(f) => f.evidence_cached.unwrap_or(1.0),
            Self::Margin(f) => f.evidence_cached.unwrap_or(1.0),
        }
    }

    pub(crate) fn propagate(&mut self, vars: &mut crate::factor::VarStore) -> (f64, f64) {
        use crate::factor::Factor;
        match self {
            Self::Trunc(f) => f.propagate(vars),
            Self::Margin(f) => f.propagate(vars),
        }
    }
}
  • Step 4: Refactor Game::likelihoods to drive Vec<DiffFactor> instead of Vec<TruncFactor>

This is a mechanical rename inside Game::likelihoods (currently src/game.rs:135-273). The loop logic is unchanged; we just move the per-pair object behind the enum. Replace the body of Game::likelihoods from where let mut trunc: Vec<TruncFactor> = ... is constructed (around line 160) to its last use (around line 243):

        // One DiffFactor per adjacent sorted-team pair; each owns a diff VarId.
        let mut links: Vec<DiffFactor> = (0..n_diffs)
            .map(|i| {
                let tie = self.result[arena.sort_buf[i]] == self.result[arena.sort_buf[i + 1]];
                let margin = if self.p_draw == 0.0 {
                    0.0
                } else {
                    let a: f64 = self.teams[arena.sort_buf[i]]
                        .iter()
                        .map(|p| p.beta.powi(2))
                        .sum();
                    let b: f64 = self.teams[arena.sort_buf[i + 1]]
                        .iter()
                        .map(|p| p.beta.powi(2))
                        .sum();
                    compute_margin(self.p_draw, (a + b).sqrt())
                };
                let vid = arena.vars.alloc(N_INF);
                DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
            })
            .collect();

        // Per-team messages from neighbouring RankDiff factors (replaces TeamMessage).
        arena.lhood_lose.resize(n_teams, N_INF);
        arena.lhood_win.resize(n_teams, N_INF);

        let mut step = (f64::INFINITY, f64::INFINITY);
        let mut iter = 0;

        while tuple_gt(step, 1e-6) && iter < 10 {
            step = (0.0_f64, 0.0_f64);

            for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_ll = pw - lf.msg();
                step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
                arena.lhood_lose[e + 1] = new_ll;
            }

            for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
                let e = n_diffs - 1 - rev_i;
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_lw = pl + lf.msg();
                step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
                arena.lhood_win[e] = new_lw;
            }

            iter += 1;
        }

        if n_diffs == 1 {
            let raw = (arena.team_prior[0] * arena.lhood_lose[0])
                - (arena.team_prior[1] * arena.lhood_win[1]);
            arena.vars.set(links[0].diff(), raw * links[0].msg());
            links[0].propagate(&mut arena.vars);
        }

        if n_diffs > 0 {
            let pl1 = arena.team_prior[1] * arena.lhood_win[1];
            arena.lhood_win[0] = pl1 + links[0].msg();
            let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
            arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
        }

        self.evidence = links.iter().map(|l| l.evidence()).product();

(Everything below the evidence line is unchanged.) The use crate::factor::trunc::TruncFactor import at the top of the file stays as-is: the ranked path still constructs TruncFactor directly in the closure above.

  • Step 5: Run the full lib test suite to verify the refactor preserves all golden values

Run: cargo test --lib

Expected: all tests pass with identical assertions — this is a pure refactor.

  • Step 6: Run the integration tests

Run: cargo test

Expected: all pass.

  • Step 7: Format and commit
cargo +nightly fmt
git add src/game.rs
git commit -m "refactor(game): dispatch per-diff link factors via DiffFactor enum"

Task 5: Add score_sigma to GameOptions and the scored path in Game::likelihoods

Files:

  • Modify: src/game.rs

  • Step 1: Write a failing test for the scored path

In src/game.rs's test module, after the new dispatch test from Task 4, add:

#[test]
fn scored_path_sharper_when_margin_is_large() {
    // Same prior on both sides; large positive observed margin should pull
    // team A above team B.
    let prior = R::new(
        Gaussian::from_ms(25.0, 25.0 / 3.0),
        25.0 / 6.0,
        ConstantDrift(25.0 / 300.0),
    );
    let teams = vec![vec![prior], vec![prior]];
    let result = vec![10.0, 0.0]; // a beat b by 10
    let weights = [vec![1.0], vec![1.0]];
    let mut arena = ScratchArena::new();
    let g = Game::scored_with_arena(
        teams,
        &result,
        &weights,
        1.0, // score_sigma
        &mut arena,
    );
    let p = g.posteriors();
    let a = p[0][0];
    let b = p[1][0];
    assert!(a.mu() > b.mu(), "expected team a posterior mu > team b; got {} vs {}", a.mu(), b.mu());

    // Tighter score_sigma should produce a stronger update.
    let mut arena2 = ScratchArena::new();
    let g_tight = Game::scored_with_arena(
        vec![vec![prior], vec![prior]],
        &result,
        &weights,
        0.1, // tighter score_sigma
        &mut arena2,
    );
    let p_tight = g_tight.posteriors();
    let a_tight = p_tight[0][0];
    assert!(a_tight.mu() > a.mu(), "expected tighter sigma to push posterior further; {} vs {}", a_tight.mu(), a.mu());
}
  • Step 2: Run the test to verify it fails

Run: cargo test --lib game::tests::scored_path_sharper_when_margin_is_large

Expected: FAIL — no function or associated item named scored_with_arena.

  • Step 3: Add score_sigma to GameOptions

Replace the GameOptions definition (around src/game.rs:15-28):

#[derive(Clone, Copy, Debug)]
pub struct GameOptions {
    pub p_draw: f64,
    pub score_sigma: f64,
    pub convergence: crate::ConvergenceOptions,
}

impl Default for GameOptions {
    fn default() -> Self {
        Self {
            p_draw: crate::P_DRAW,
            score_sigma: 1.0,
            convergence: crate::ConvergenceOptions::default(),
        }
    }
}
  • Step 4: Add Game::scored_with_arena and friends

In Game<'a, T, D>'s impl block (the one with ranked_with_arena, around src/game.rs:90-133), add a new method right after ranked_with_arena:

    pub(crate) fn scored_with_arena(
        teams: Vec<Vec<Rating<T, D>>>,
        scores: &'a [f64],
        weights: &'a [Vec<f64>],
        score_sigma: f64,
        arena: &mut ScratchArena,
    ) -> Self {
        debug_assert!(
            scores.len() == teams.len(),
            "scores must have the same length as teams"
        );
        debug_assert!(
            weights
                .iter()
                .zip(teams.iter())
                .all(|(w, t)| w.len() == t.len()),
            "weights must have the same dimensions as teams"
        );
        debug_assert!(score_sigma > 0.0, "score_sigma must be positive");

        let mut this = Self {
            teams,
            result: scores,
            weights,
            p_draw: 0.0,
            likelihoods: Vec::new(),
            evidence: 0.0,
        };

        this.likelihoods_scored(arena, score_sigma);
        this
    }
  • Step 5: Add likelihoods_scored (parallel to likelihoods)

Right after fn likelihoods (around line 273), add:

    fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) {
        arena.reset();

        let n_teams = self.teams.len();

        arena.sort_buf.extend(0..n_teams);
        arena.sort_buf.sort_by(|&i, &j| {
            self.result[j]
                .partial_cmp(&self.result[i])
                .unwrap_or(Ordering::Equal)
        });

        arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
            self.teams[t]
                .iter()
                .zip(self.weights[t].iter())
                .fold(N00, |p, (player, &w)| p + (player.performance() * w))
        }));

        let n_diffs = n_teams.saturating_sub(1);

        // One MarginFactor per adjacent sorted-team pair, observed m_obs ≥ 0.
        let mut links: Vec<DiffFactor> = (0..n_diffs)
            .map(|i| {
                let m_obs = self.result[arena.sort_buf[i]] - self.result[arena.sort_buf[i + 1]];
                let vid = arena.vars.alloc(N_INF);
                DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
            })
            .collect();

        arena.lhood_lose.resize(n_teams, N_INF);
        arena.lhood_win.resize(n_teams, N_INF);

        let mut step = (f64::INFINITY, f64::INFINITY);
        let mut iter = 0;

        while tuple_gt(step, 1e-6) && iter < 10 {
            step = (0.0_f64, 0.0_f64);

            for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_ll = pw - lf.msg();
                step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
                arena.lhood_lose[e + 1] = new_ll;
            }

            for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
                let e = n_diffs - 1 - rev_i;
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_lw = pl + lf.msg();
                step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
                arena.lhood_win[e] = new_lw;
            }

            iter += 1;
        }

        if n_diffs == 1 {
            let raw = (arena.team_prior[0] * arena.lhood_lose[0])
                - (arena.team_prior[1] * arena.lhood_win[1]);
            arena.vars.set(links[0].diff(), raw * links[0].msg());
            links[0].propagate(&mut arena.vars);
        }

        if n_diffs > 0 {
            let pl1 = arena.team_prior[1] * arena.lhood_win[1];
            arena.lhood_win[0] = pl1 + links[0].msg();
            let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
            arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
        }

        self.evidence = links.iter().map(|l| l.evidence()).product();

        arena.inv_buf.resize(n_teams, 0);
        for (si, &orig_i) in arena.sort_buf.iter().enumerate() {
            arena.inv_buf[orig_i] = si;
        }

        self.likelihoods = self
            .teams
            .iter()
            .zip(self.weights.iter())
            .enumerate()
            .map(|(orig_i, (players, weights))| {
                let si = arena.inv_buf[orig_i];
                let m = arena.lhood_win[si] * arena.lhood_lose[si];
                let performance = players
                    .iter()
                    .zip(weights.iter())
                    .fold(N00, |p, (player, &w)| p + (player.performance() * w));
                players
                    .iter()
                    .zip(weights.iter())
                    .map(|(player, &w)| {
                        ((m - performance.exclude(player.performance() * w)) * (1.0 / w))
                            .forget(player.beta.powi(2))
                    })
                    .collect::<Vec<_>>()
            })
            .collect::<Vec<_>>();
    }

The body is identical to likelihoods except for the per-pair factor construction (no draw-margin computation, MarginFactor instead of TruncFactor). DRY would let us extract the loop, but the duplication is small (~50 lines) and the divergence may grow as more factor kinds are added; we accept it for clarity. Revisit in T4-Synergy if it gets unwieldy.
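For orientation, the soft evidence a MarginFactor contributes is plain conjugate-Gaussian math: multiplying the cavity N(d; mu, var) on the diff variable by the observation N(m_obs; d, score_sigma²) yields a Gaussian posterior in closed form, which is why the factor closes in one EP step with no truncation iteration. A self-contained numeric sketch of that update (plain f64, deliberately independent of the crate's Gaussian type):

```rust
// Conjugate update a Gaussian observation factor performs on a diff
// variable d ~ N(mu, var), given observed margin m_obs with noise
// variance s2 (= score_sigma^2). Returns (posterior mean, posterior
// variance, evidence), where evidence = N(m_obs; mu, var + s2).
fn margin_update(mu: f64, var: f64, m_obs: f64, s2: f64) -> (f64, f64, f64) {
    // Precisions add under Gaussian multiplication.
    let post_var = 1.0 / (1.0 / var + 1.0 / s2);
    let post_mu = post_var * (mu / var + m_obs / s2);
    // Normalizing constant of the product (the per-pair evidence term).
    let ev_var = var + s2;
    let evidence = (-(m_obs - mu).powi(2) / (2.0 * ev_var)).exp()
        / (2.0 * std::f64::consts::PI * ev_var).sqrt();
    (post_mu, post_var, evidence)
}

fn main() {
    // A large observed margin with small noise pulls the diff toward the
    // observation and tightens it in a single step.
    let (mu, var, _z) = margin_update(0.0, 4.0, 6.0, 1.0);
    assert!(mu > 4.0 && mu < 6.0); // shrinkage between prior mean and m_obs
    assert!(var < 1.0);            // tighter than both prior and noise
    println!("posterior: N({mu:.3}, {var:.3})");
}
```

Smaller s2 (more trusted margins) moves the posterior mean closer to m_obs, matching the score_sigma knob's documented semantics.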

  • Step 6: Run the test to verify it passes

Run: cargo test --lib game::tests::scored_path_sharper_when_margin_is_large

Expected: PASS.

  • Step 7: Run the full test suite

Run: cargo test

Expected: all pass.

  • Step 8: Format and commit
cargo +nightly fmt
git add src/game.rs
git commit -m "feat(game): add scored_with_arena driving MarginFactor links"

Task 6: Public Game::scored constructor and OwnedGame support

Files:

  • Modify: src/game.rs

  • Step 1: Write a failing test in src/game.rs's test module

#[test]
fn game_scored_public_ctor() {
    use crate::Outcome;
    let prior = R::new(
        Gaussian::from_ms(25.0, 25.0 / 3.0),
        25.0 / 6.0,
        ConstantDrift(25.0 / 300.0),
    );
    let opts = GameOptions {
        score_sigma: 1.0,
        ..GameOptions::default()
    };
    let g = Game::scored(&[&[prior], &[prior]], Outcome::scores([8.0, 2.0]), &opts).unwrap();
    let p = g.posteriors();
    assert!(p[0][0].mu() > p[1][0].mu());
}

#[test]
fn game_scored_rejects_ranked_outcome() {
    let prior = R::new(
        Gaussian::from_ms(25.0, 25.0 / 3.0),
        25.0 / 6.0,
        ConstantDrift(25.0 / 300.0),
    );
    let err = Game::scored(
        &[&[prior], &[prior]],
        crate::Outcome::winner(0, 2),
        &GameOptions::default(),
    )
    .unwrap_err();
    assert!(matches!(err, crate::InferenceError::MismatchedShape { .. }));
}
  • Step 2: Run the tests to verify they fail

Run: cargo test --lib game::tests::game_scored_public_ctor game::tests::game_scored_rejects_ranked_outcome

Expected: FAIL — no function or associated item named scored.

  • Step 3: Add OwnedGame::new_scored constructor

In OwnedGame<T, D>'s impl (around src/game.rs:46-78), add right after new:

    pub(crate) fn new_scored(
        teams: Vec<Vec<Rating<T, D>>>,
        scores: Vec<f64>,
        weights: Vec<Vec<f64>>,
        score_sigma: f64,
    ) -> Self {
        let mut arena = ScratchArena::new();
        let g = Game::scored_with_arena(teams.clone(), &scores, &weights, score_sigma, &mut arena);
        let likelihoods = g.likelihoods;
        let evidence = g.evidence;
        Self {
            teams,
            result: scores,
            weights,
            p_draw: 0.0,
            likelihoods,
            evidence,
        }
    }
  • Step 4: Add Game::scored public method

In the impl<T: Time, D: Drift<T>> Game<'_, T, D> block (around src/game.rs:293-349), add right after ranked:

    pub fn scored(
        teams: &[&[Rating<T, D>]],
        outcome: crate::Outcome,
        options: &GameOptions,
    ) -> Result<OwnedGame<T, D>, crate::InferenceError> {
        if options.score_sigma <= 0.0 {
            return Err(crate::InferenceError::InvalidParameter {
                name: "score_sigma",
                value: options.score_sigma,
            });
        }
        if outcome.team_count() != teams.len() {
            return Err(crate::InferenceError::MismatchedShape {
                kind: "outcome scores vs teams",
                expected: teams.len(),
                got: outcome.team_count(),
            });
        }
        let scores = outcome
            .as_scores()
            .ok_or(crate::InferenceError::MismatchedShape {
                kind: "Game::scored requires Outcome::Scored",
                expected: 0,
                got: 0,
            })?
            .to_vec();
        let teams_owned: Vec<Vec<Rating<T, D>>> = teams.iter().map(|t| t.to_vec()).collect();
        let weights: Vec<Vec<f64>> = teams.iter().map(|t| vec![1.0; t.len()]).collect();
        Ok(OwnedGame::new_scored(teams_owned, scores, weights, options.score_sigma))
    }
  • Step 5: Run the new tests to verify they pass

Run: cargo test --lib game::tests::game_scored_public_ctor game::tests::game_scored_rejects_ranked_outcome

Expected: both PASS.

  • Step 6: Run the full test suite

Run: cargo test

Expected: all pass.

  • Step 7: Format and commit
cargo +nightly fmt
git add src/game.rs
git commit -m "feat(game): add public Game::scored constructor"

Task 7: Plumb Outcome::Scored through TimeSlice and History::add_events

Files:

  • Modify: src/time_slice.rs
  • Modify: src/history.rs

The per-event Event struct in src/time_slice.rs:80-85 is { teams, evidence, weights }. We add a kind: EventKind field that selects which Game::*_with_arena to call. Score noise (score_sigma) lives inside the Scored variant so events can in principle have per-event sigma, though the public API only exposes one history-wide knob today.

  • Step 1: Add EventKind to src/time_slice.rs and a kind field on Event

In src/time_slice.rs, immediately above the struct Event definition (currently around line 80), add:

#[derive(Debug, Clone, Copy)]
pub(crate) enum EventKind {
    Ranked,
    Scored { score_sigma: f64 },
}

Modify struct Event (currently lines 81-85) to:

#[derive(Debug)]
pub(crate) struct Event {
    teams: Vec<Team>,
    evidence: f64,
    weights: Vec<Vec<f64>>,
    kind: EventKind,
}
  • Step 2: Dispatch on kind in Event::iteration_direct

Replace the body of Event::iteration_direct (currently src/time_slice.rs:123-144):

    fn iteration_direct<T: Time, D: Drift<T>>(
        &mut self,
        skills: &mut SkillStore,
        agents: &CompetitorStore<T, D>,
        p_draw: f64,
        arena: &mut ScratchArena,
    ) {
        let teams = self.within_priors(false, false, skills, agents);
        let result = self.outputs();
        let g = match self.kind {
            EventKind::Ranked => {
                Game::ranked_with_arena(teams, &result, &self.weights, p_draw, arena)
            }
            EventKind::Scored { score_sigma } => {
                Game::scored_with_arena(teams, &result, &self.weights, score_sigma, arena)
            }
        };

        for (t, team) in self.teams.iter_mut().enumerate() {
            for (i, item) in team.items.iter_mut().enumerate() {
                let old_likelihood = skills.get(item.agent).unwrap().likelihood;
                let new_likelihood = (old_likelihood / item.likelihood) * g.likelihoods[t][i];
                skills.get_mut(item.agent).unwrap().likelihood = new_likelihood;
                item.likelihood = g.likelihoods[t][i];
            }
        }

        self.evidence = g.evidence;
    }
  • Step 3: Dispatch on kind in TimeSlice::iteration (sequential branch)

Inside TimeSlice::iteration (currently src/time_slice.rs:295-325), replace the body of the if from > 0 || self.color_groups.is_empty() branch's inner for event in ... loop. The Game::ranked_with_arena(...) call (lines 302-308) becomes:

                let g = match event.kind {
                    EventKind::Ranked => Game::ranked_with_arena(
                        teams,
                        &result,
                        &event.weights,
                        self.p_draw,
                        &mut self.arena,
                    ),
                    EventKind::Scored { score_sigma } => Game::scored_with_arena(
                        teams,
                        &result,
                        &event.weights,
                        score_sigma,
                        &mut self.arena,
                    ),
                };

(The rest of that loop body — likelihood update + event.evidence = g.evidence — is unchanged.)

  • Step 4: Dispatch on kind in TimeSlice::log_evidence

TimeSlice::log_evidence (currently src/time_slice.rs:467-532) calls Game::ranked_with_arena in three places (lines 482-490, 506-514). For each, change to a match on event.kind mirroring Step 2.

Add a helper inside the impl to keep the call sites tidy:

    fn run_event<D: Drift<T>>(
        &self,
        event: &Event,
        online: bool,
        forward: bool,
        agents: &CompetitorStore<T, D>,
        arena: &mut ScratchArena,
    ) -> f64 {
        let teams = event.within_priors(online, forward, &self.skills, agents);
        let result = event.outputs();
        match event.kind {
            EventKind::Ranked => {
                Game::ranked_with_arena(teams, &result, &event.weights, self.p_draw, arena).evidence
            }
            EventKind::Scored { score_sigma } => {
                Game::scored_with_arena(teams, &result, &event.weights, score_sigma, arena)
                    .evidence
            }
        }
    }

Then replace the inline Game::ranked_with_arena(...).evidence.ln() calls with self.run_event(event, online, forward, agents, &mut arena).ln().

  • Step 5: Extend TimeSlice::add_events signature with per-event kinds

Change the add_events signature (currently src/time_slice.rs:203-209) to:

    pub fn add_events<D: Drift<T>>(
        &mut self,
        composition: Vec<Vec<Vec<Index>>>,
        results: Vec<Vec<f64>>,
        weights: Vec<Vec<Vec<f64>>>,
        kinds: Vec<EventKind>,
        agents: &CompetitorStore<T, D>,
    ) {

Inside the same method, update the event-construction map (around line 240). Each constructed Event gets its kind from kinds[e]:

            Event {
                teams,
                evidence: 0.0,
                weights,
                kind: kinds[e],
            }
  • Step 6: Update TimeSlice::add_events's tests to pass the new argument

Five call sites in src/time_slice.rs:604, :680, :759, :790, :855 (across the unit tests test_one_event_each, test_same_strength, test_add_events, time_slice_color_groups_reorders_events) call time_slice.add_events(...). Add a fourth argument vec![EventKind::Ranked; n_events] between weights and &agents at each call site. Example:

        time_slice.add_events(
            vec![
                vec![vec![a], vec![b]],
                vec![vec![c], vec![d]],
                vec![vec![e], vec![f]],
            ],
            vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 0.0]],
            vec![],
            vec![EventKind::Ranked; 3],
            &agents,
        );
  • Step 7: Update the History callers of TimeSlice::add_events

In src/history.rs:562 and :572, the calls pass composition, results, weights, &self.agents. Add the kinds vector. We'll thread the per-event EventKind through add_events_with_prior in Step 8 and pass it in here as kinds_chunk.

  • Step 8: Extend History::add_events_with_prior to accept and route per-event kinds

In src/history.rs:447-454, change the signature to:

    pub(crate) fn add_events_with_prior(
        &mut self,
        composition: Vec<Vec<Vec<Index>>>,
        results: Vec<Vec<f64>>,
        times: Vec<T>,
        weights: Vec<Vec<Vec<f64>>>,
        kinds: Vec<crate::time_slice::EventKind>,
        mut priors: HashMap<Index, Rating<T, D>>,
    ) -> Result<(), InferenceError> {

Around line 543, alongside the existing per-batch slicing of composition, results, and weights, add:

            let kinds_chunk: Vec<crate::time_slice::EventKind> =
                (i..j).map(|e| kinds[o[e]]).collect();

Update the two time_slice.add_events(composition, results, weights, &self.agents) call sites (lines 562 and 572) to:

                time_slice.add_events(composition, results, weights, kinds_chunk, &self.agents);

(For both branches — existing-slice and new-slice. Use kinds_chunk.clone() if the borrow checker complains; the vec is small.)

Validation: also add a length check at the top of the function alongside the existing ones:

        if !kinds.is_empty() && kinds.len() != composition.len() {
            return Err(InferenceError::MismatchedShape {
                kind: "kinds",
                expected: composition.len(),
                got: kinds.len(),
            });
        }
  • Step 9: Update record_winner and record_draw to pass kinds

In src/history.rs:617-647, update both calls:

        self.add_events_with_prior(
            vec![vec![vec![w], vec![l]]],
            vec![vec![1.0, 0.0]],
            vec![time],
            vec![],
            vec![crate::time_slice::EventKind::Ranked],
            HashMap::new(),
        )

Same shape for record_draw.

  • Step 10: Update History::add_events to compute kinds per event and pass through

Replace the placeholder match arm added in Task 3 Step 5 (around src/history.rs:672-680). The full updated event-loop body of History::add_events (around lines 671-705) becomes:

        let mut kinds: Vec<crate::time_slice::EventKind> = Vec::with_capacity(events.len());

        for ev in events {
            let team_count = ev.teams.len();

            let (results_for_event, kind): (Vec<f64>, crate::time_slice::EventKind) = match &ev.outcome {
                Outcome::Ranked(ranks) => {
                    if ranks.len() != team_count {
                        return Err(InferenceError::MismatchedShape {
                            kind: "outcome ranks vs teams",
                            expected: team_count,
                            got: ranks.len(),
                        });
                    }
                    let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
                    let inverted: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
                    (inverted, crate::time_slice::EventKind::Ranked)
                }
                Outcome::Scored(scores) => {
                    if scores.len() != team_count {
                        return Err(InferenceError::MismatchedShape {
                            kind: "outcome scores vs teams",
                            expected: team_count,
                            got: scores.len(),
                        });
                    }
                    (
                        scores.to_vec(),
                        crate::time_slice::EventKind::Scored {
                            score_sigma: self.score_sigma,
                        },
                    )
                }
            };

            let mut event_comp: Vec<Vec<Index>> = Vec::with_capacity(team_count);
            let mut event_weights: Vec<Vec<f64>> = Vec::with_capacity(team_count);

            for team in ev.teams {
                let mut team_indices: Vec<Index> = Vec::with_capacity(team.members.len());
                let mut team_weights: Vec<f64> = Vec::with_capacity(team.members.len());
                for member in team.members {
                    let idx = self.keys.get_or_create(&member.key);
                    team_indices.push(idx);
                    team_weights.push(member.weight);
                    if let Some(prior) = member.prior {
                        priors.insert(idx, Rating::new(prior, self.beta, self.drift));
                    }
                }
                event_comp.push(team_indices);
                event_weights.push(team_weights);
            }
            composition.push(event_comp);
            weights.push(event_weights);
            results.push(results_for_event);
            times.push(ev.time);
            kinds.push(kind);
        }

        self.add_events_with_prior(composition, results, times, weights, kinds, priors)

(Note: EventKind is declared pub(crate) in time_slice.rs, so no re-export is needed; confirm history.rs can reference it as crate::time_slice::EventKind.)

  • Step 11: Add score_sigma: f64 field to History and HistoryBuilder

In src/history.rs:21-37 (HistoryBuilder struct), add field score_sigma: f64,.

In the Default impl (around line 121), set score_sigma: 1.0.

In History::builder_with_key (around line 170), set score_sigma: 1.0.

In each builder transition method that constructs a new HistoryBuilder (drift at line 55, observer at line 85), copy the score_sigma field through.

Add a builder method (insert near p_draw, around line 70):

    pub fn score_sigma(mut self, score_sigma: f64) -> Self {
        self.score_sigma = score_sigma;
        self
    }

In HistoryBuilder::build (around line 100), set score_sigma: self.score_sigma, on the constructed History.

In the History struct (around line 135), add score_sigma: f64,.
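
Taken together, the Step 11 edits are the standard fluent-knob pattern: a defaulted builder field threaded through every builder transition and into the built struct. A minimal self-contained sketch of that shape (stand-in types with all other fields elided, not the crate's actual definitions):

```rust
// Hypothetical miniature of the score_sigma knob; the real HistoryBuilder
// and History carry many more fields (mu, sigma, beta, drift, p_draw, ...).
#[derive(Debug, Clone)]
struct HistoryBuilder {
    score_sigma: f64,
}

impl Default for HistoryBuilder {
    fn default() -> Self {
        Self { score_sigma: 1.0 } // plan default
    }
}

#[derive(Debug)]
struct History {
    score_sigma: f64,
}

impl HistoryBuilder {
    /// Fluent setter, mirroring the p_draw-style builder methods.
    fn score_sigma(mut self, score_sigma: f64) -> Self {
        self.score_sigma = score_sigma;
        self
    }

    /// build() copies the knob onto the constructed History.
    fn build(self) -> History {
        History { score_sigma: self.score_sigma }
    }
}

fn main() {
    let h = HistoryBuilder::default().score_sigma(2.0).build();
    assert_eq!(h.score_sigma, 2.0);
    assert_eq!(HistoryBuilder::default().build().score_sigma, 1.0);
    println!("ok: score_sigma = {}", h.score_sigma);
}
```

The one wrinkle the real code adds on top of this pattern is that the drift and observer builder transitions construct a new HistoryBuilder, so each must copy score_sigma through explicitly.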

  • Step 12: Write a failing integration test in tests/scored.rs (new file)

Create tests/scored.rs:

use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};

#[test]
fn scored_two_team_one_event_pulls_winner_up() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("alice")]),
            Team::with_members([Member::new("bob")]),
        ],
        outcome: Outcome::scores([10.0, 0.0]),
    }];
    h.add_events(events).unwrap();
    h.converge().unwrap();

    let alice = h.current_skill(&"alice").unwrap();
    let bob = h.current_skill(&"bob").unwrap();
    assert!(alice.mu() > 25.0, "alice mu should exceed prior; got {}", alice.mu());
    assert!(bob.mu() < 25.0, "bob mu should be below prior; got {}", bob.mu());
}

#[test]
fn scored_zero_margin_treats_as_tie() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("alice")]),
            Team::with_members([Member::new("bob")]),
        ],
        outcome: Outcome::scores([3.0, 3.0]),
    }];
    h.add_events(events).unwrap();
    h.converge().unwrap();

    let alice = h.current_skill(&"alice").unwrap();
    let bob = h.current_skill(&"bob").unwrap();
    assert!((alice.mu() - bob.mu()).abs() < 1e-6, "tied scores -> equal mu; got {} vs {}", alice.mu(), bob.mu());
    // Sigma should still tighten (we have evidence diff ≈ 0).
    assert!(alice.sigma() < 25.0 / 3.0);
}

#[test]
fn scored_three_team_partial_order() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("a")]),
            Team::with_members([Member::new("b")]),
            Team::with_members([Member::new("c")]),
        ],
        outcome: Outcome::scores([20.0, 10.0, 5.0]),
    }];
    h.add_events(events).unwrap();
    h.converge().unwrap();

    let a = h.current_skill(&"a").unwrap();
    let b = h.current_skill(&"b").unwrap();
    let c = h.current_skill(&"c").unwrap();
    assert!(a.mu() > b.mu());
    assert!(b.mu() > c.mu());
}

#[test]
fn scored_rejects_outcome_team_count_mismatch() {
    use trueskill_tt::InferenceError;
    let mut h: History = History::builder().build();
    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("a")]),
            Team::with_members([Member::new("b")]),
        ],
        outcome: Outcome::scores([1.0, 2.0, 3.0]),
    }];
    let err = h.add_events(events).unwrap_err();
    assert!(matches!(err, InferenceError::MismatchedShape { .. }));
}
  • Step 13: Run the integration tests

Run: cargo test --test scored

Expected: all four tests PASS (the wiring from Steps 1-11 is now complete).

  • Step 14: Run the full test suite + clippy

Run: cargo test && cargo clippy --all-targets -- -D warnings

Expected: all pass, no clippy warnings. Pay particular attention to the existing time_slice unit tests — they were updated in Step 6 and need to use EventKind::Ranked.

  • Step 15: Format and commit
cargo +nightly fmt
git add src/history.rs src/time_slice.rs tests/scored.rs
git commit -m "feat(history): route Outcome::Scored events through MarginFactor path"

Task 8: EventBuilder::scores convenience

Files:

  • Modify: src/event_builder.rs

  • Modify: tests/api_shape.rs (add a fluent-builder scored test)

  • Step 1: Write failing tests in tests/api_shape.rs

Append to the existing test list:

#[test]
fn fluent_event_builder_scores() {
    use trueskill_tt::ConstantDrift;
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    h.event(1)
        .team(["alice"])
        .team(["bob"])
        .scores([12.0, 4.0])
        .commit()
        .unwrap();
    h.converge().unwrap();

    let a = h.current_skill(&"alice").unwrap();
    let b = h.current_skill(&"bob").unwrap();
    assert!(a.mu() > b.mu());
}
  • Step 2: Run the test to verify it fails

Run: cargo test --test api_shape fluent_event_builder_scores

Expected: FAIL — no method named scores.

  • Step 3: Add .scores to EventBuilder

In src/event_builder.rs, alongside .ranking/.winner/.draw (around line 73), add:

    /// Set explicit per-team continuous scores; higher = better.
    pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
        self.event.outcome = crate::Outcome::scores(scores);
        self
    }
  • Step 4: Run the test to verify it passes

Run: cargo test --test api_shape fluent_event_builder_scores

Expected: PASS.

  • Step 5: Run the full test suite

Run: cargo test

Expected: all pass.

  • Step 6: Format and commit
cargo +nightly fmt
git add src/event_builder.rs tests/api_shape.rs
git commit -m "feat(event-builder): add .scores convenience for Outcome::Scored"

Task 9: Worked example — scored matches end-to-end

Files:

  • Create: examples/scored.rs

  • Step 1: Create the example

//! Worked example: continuous-score outcomes via `Outcome::Scored`.
//!
//! Three players play a small round-robin where the score margin matters,
//! not just who won. We show how `score_sigma` controls how much weight
//! the engine places on the observed margin.
//!
//! Run with: `cargo run --example scored --release`

use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};

fn main() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.03))
        .score_sigma(2.0) // tune to data; smaller = trust margins more
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![
        Event {
            time: 1,
            teams: smallvec![
                Team::with_members([Member::new("alice")]),
                Team::with_members([Member::new("bob")]),
            ],
            outcome: Outcome::scores([21.0, 9.0]),
        },
        Event {
            time: 2,
            teams: smallvec![
                Team::with_members([Member::new("bob")]),
                Team::with_members([Member::new("carol")]),
            ],
            outcome: Outcome::scores([21.0, 18.0]),
        },
        Event {
            time: 3,
            teams: smallvec![
                Team::with_members([Member::new("alice")]),
                Team::with_members([Member::new("carol")]),
            ],
            outcome: Outcome::scores([21.0, 21.0]),
        },
    ];
    h.add_events(events).unwrap();

    let report = h.converge().unwrap();
    println!(
        "converged={}, iterations={}, log_evidence={:.4}",
        report.converged, report.iterations, report.log_evidence
    );

    for who in &["alice", "bob", "carol"] {
        let s = h.current_skill(who).unwrap();
        println!("{:>6}: mu={:>7.3}  sigma={:.3}", who, s.mu(), s.sigma());
    }
}
  • Step 2: Confirm the example compiles and runs

Run: cargo run --example scored --release

Expected: prints converged=true with three player skills; alice highest, bob middle, carol lowest (or close to bob — depends on score_sigma).

  • Step 3: Commit
cargo +nightly fmt
git add examples/scored.rs
git commit -m "docs(examples): worked Outcome::Scored example"

Task 10: Benchmark — scored ingestion + convergence

Files:

  • Create: benches/scored.rs

  • Modify: Cargo.toml (add [[bench]] entry if needed)

  • Step 1: Check Cargo.toml for the existing bench wiring

Run: cat Cargo.toml | grep -A 3 'bench'

If autobenches = false is set or each bench is registered explicitly, add a new entry:

[[bench]]
name = "scored"
harness = false
  • Step 2: Create benches/scored.rs modeled on benches/batch.rs
use criterion::{Criterion, criterion_group, criterion_main};
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};

fn bench_scored_history(c: &mut Criterion) {
    c.bench_function("scored_history_60_events_30_iter", |bencher| {
        bencher.iter(|| {
            let mut h = History::builder()
                .mu(25.0)
                .sigma(25.0 / 3.0)
                .beta(25.0 / 6.0)
                .drift(ConstantDrift(0.03))
                .score_sigma(2.0)
                .build();

            let mut events: Vec<Event<i64, String>> = Vec::with_capacity(60);
            for i in 0..60 {
                let a = format!("p{}", i % 20);
                let b = format!("p{}", (i + 7) % 20);
                let s_a = (i as f64 * 0.3).sin().abs() * 21.0;
                let s_b = (i as f64 * 0.3).cos().abs() * 21.0;
                events.push(Event {
                    time: 1 + (i / 6) as i64,
                    teams: smallvec![
                        Team::with_members([Member::new(a)]),
                        Team::with_members([Member::new(b)]),
                    ],
                    outcome: Outcome::scores([s_a, s_b]),
                });
            }
            h.add_events(events).unwrap();
            h.converge().unwrap();
        });
    });
}

criterion_group!(benches, bench_scored_history);
criterion_main!(benches);

The History here uses String keys to match the typical real-world bench shape; if History<i64, _, _, String> requires builder_with_key, adapt accordingly.

  • Step 3: Verify the benchmark compiles

Run: cargo bench --no-run --bench scored

Expected: builds without error.

  • Step 4: Run the benchmark and capture a baseline number

Run: cargo bench --bench scored 2>&1 | tee benches/scored_baseline.txt

(Save the result alongside the existing benches/baseline.txt so future tiers can compare.)

  • Step 5: Commit
cargo +nightly fmt
git add benches/scored.rs benches/scored_baseline.txt Cargo.toml
git commit -m "bench(scored): add criterion bench mirroring batch bench"

Task 11: Documentation — README + CLAUDE.md status update

Files:

  • Modify: README.md

  • Modify: CLAUDE.md

  • Modify: docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md (mark MarginFactor done)

  • Step 1: Add a "Scored outcomes" subsection to README.md

Find the existing ## Usage section (or equivalent) and add:

### Scored outcomes

Use `Outcome::scores([...])` when you have continuous per-team scores rather
than just ranks. Adjacent score margins flow into a `MarginFactor` that adds
soft Gaussian evidence about the latent performance diff. Configure
`HistoryBuilder::score_sigma(σ)` to control how much you trust the margins
(smaller σ = more trust).

```rust
use trueskill_tt::{History, Outcome};

let mut h = History::builder().score_sigma(2.0).build();
h.event(1)
    .team(["alice"])
    .team(["bob"])
    .scores([21.0, 9.0])
    .commit()
    .unwrap();
h.converge().unwrap();
```


  • Step 2: Update CLAUDE.md architecture notes

In CLAUDE.md, add to the existing factor list (or near the architecture section):

- `MarginFactor` (factor/margin.rs) — Gaussian observation factor on a diff variable; engaged by `Outcome::Scored`.
  • Step 3: Mark the T4-Margin item complete in the spec

In docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md, find the T4 section (line 577 onward):

- `MarginFactor` → enables `Outcome::Scored`.

Change to:

- `MarginFactor` → enables `Outcome::Scored`. **Done** (see `docs/superpowers/plans/2026-04-27-t4-margin-factor.md`).
  • Step 4: Final full test + clippy + fmt run

Run:

cargo +nightly fmt
cargo clippy --all-targets -- -D warnings
cargo test
cargo bench --no-run

Expected: all green, no warnings, all bench targets compile.

  • Step 5: Commit
git add README.md CLAUDE.md docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md
git commit -m "docs(t4-margin): document Outcome::Scored and mark spec item done"

Acceptance criteria

  • All existing lib + integration tests still pass with their existing golden values (Trunc path is bit-for-bit unchanged after the DiffFactor refactor in Task 4).
  • cargo test --test scored passes all four tests added in Task 7.
  • cargo run --example scored --release runs and prints sensible posteriors.
  • cargo bench --bench scored produces a baseline result saved under benches/.
  • cargo clippy --all-targets -- -D warnings is clean.
  • Outcome::Scored is accepted by the public API: History::add_events, History::event(...).scores(...), and Game::scored.
  • score_sigma is configurable via HistoryBuilder::score_sigma and GameOptions::score_sigma, default 1.0.

Out of scope (deferred to later T4 plans)

  • Damped / Residual schedules
  • SynergyFactor
  • ScoreFactor (continuous outcome variable distinct from observed margin)
  • Per-event score_sigma overrides (currently history-wide)
  • Tie-band MarginFactor variant (m_obs band rather than point observation)