trueskill-tt/docs/superpowers/plans/2026-04-27-t4-margin-factor.md

# T4 — MarginFactor + Outcome::Scored Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a `MarginFactor` (Gaussian observation factor on a diff variable) and an `Outcome::Scored(scores)` variant, so users can supply continuous per-team scores instead of just ranks. Per-pair score margins become soft EP evidence about the latent performance diff.

**Architecture:**
- Sort scored teams by score descending; for each adjacent pair compute `m_obs = score_higher − score_lower ≥ 0`. Per pair: `RankDiffFactor` writes `diff = team_a − team_b`, then a `MarginFactor` multiplies in the Gaussian observation `N(m_obs, score_sigma²)`. This replaces the `TruncFactor` for scored outcomes; ranked outcomes are unchanged.
- A new internal enum `DiffFactor { Trunc(TruncFactor), Margin(MarginFactor) }` lets `Game::likelihoods` keep its single hand-rolled forward/backward sweep loop while dispatching the per-diff factor by enum.
- `score_sigma` is configurable on `GameOptions` and `HistoryBuilder` (default `1.0`).
- `Outcome` is already `#[non_exhaustive]`, so adding `Scored` is non-breaking for downstream `match` arms.

**Tech Stack:** Rust 2024, smallvec, rayon (already in tree). No new crate dependencies.

---

## File Structure

| Path | Status | Responsibility |
|---|---|---|
| `src/factor/margin.rs` | **create** | `MarginFactor` struct + `Factor` impl + cavity-cached evidence + unit tests |
| `src/factor/mod.rs` | modify | `pub mod margin;`, `BuiltinFactor::Margin(...)` variant + dispatch arms |
| `src/factors.rs` | modify | re-export `MarginFactor` |
| `src/outcome.rs` | modify | `Outcome::Scored(SmallVec<[f64; 4]>)` variant, `scores()` ctor, `as_scores()` accessor, `team_count` arm |
| `src/game.rs` | modify | `pub(crate) enum DiffFactor`, scored path in `likelihoods`, `Game::scored()` ctor, `GameOptions::score_sigma` |
| `src/event_builder.rs` | modify | `.scores([...])` builder method |
| `src/history.rs` | modify | match `Outcome::Scored` in `add_events`; `HistoryBuilder::score_sigma`; new internal `add_events_scored_with_prior` (or extra arg) |
| `tests/scored.rs` | **create** | end-to-end Scored integration tests |
| `examples/scored.rs` | **create** | worked example using `Outcome::Scored` |
| `benches/scored.rs` | **create** | criterion benchmark mirroring `batch.rs` with scored events |
| `CLAUDE.md` | modify | mark T4-MarginFactor complete in the architecture notes |

---

## Background — math the implementer needs

For a diff variable `D` with current marginal `D_marg`, the MarginFactor models an observation `m_obs ~ N(D, σ²)` where `σ = score_sigma`. Standard EP for a Gaussian-likelihood factor:

1. **Cavity:** `D_cav = D_marg / msg`  (where `msg` is this factor's stored outgoing message; init `N_INF` so the first cavity = the current marginal).
2. **Tilted distribution:** `D_cav · N(m_obs, σ²)` — a product of two Gaussians; closed-form, no approximation needed (so it converges in one propagation).
3. **New marginal:** the tilted distribution.
4. **New outgoing message:** `new_msg = new_marginal / D_cav`. Because the tilted distribution is exact, `new_msg = N(m_obs, σ²)` (a constant in `m_obs` and `σ`).
5. **Cavity evidence:** `Z_cav = pdf(m_obs; D_cav.mu(), sqrt(D_cav.sigma()² + σ²))` (the marginal likelihood of `m_obs` under the cavity). Cache on first propagate, identical to `TruncFactor`'s pattern. `log_evidence = Z_cav.ln()`.

Practical consequence: `MarginFactor::propagate` returns a non-zero delta on its first call (because `msg` jumps from `N_INF` to `N(m_obs, σ²)`) and exactly zero afterwards, since `new_msg` is a constant.

A Gaussian `N(m, σ)` constructed via `Gaussian::from_ms(m, σ)`. Multiplication adds nat-params (`pi += other.pi; tau += other.tau`). Division subtracts. The `pdf(x, mu, sigma)` helper already exists in `lib.rs` (private, but importable as `crate::pdf`).

**Concrete numerical check for tests:** With cavity `N(0, 6)` and observation `m_obs=5, σ=1`:
- `D_cav.pi = 1/36 ≈ 0.027778`, `D_cav.tau = 0`.
- New marginal: `pi = 0.027778 + 1 = 1.027778`, `tau = 0 + 5 = 5`. So `mu = 5 / 1.027778 ≈ 4.864865`, `sigma = 1/sqrt(1.027778) ≈ 0.986394`.
- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.046827`. So `log_evidence ≈ -3.0613`.

---

### Task 1: `MarginFactor` core (file + struct + Factor impl + unit tests)

**Files:**
- Create: `src/factor/margin.rs`
- Modify: `src/factor/mod.rs:100-102` (add `pub mod margin;` next to the existing `pub mod` lines)

- [ ] **Step 1: Add the module declaration so the new file compiles**

In `src/factor/mod.rs`, find the existing block:

```rust
pub mod rank_diff;
pub mod team_sum;
pub mod trunc;
```

Replace with:

```rust
pub mod margin;
pub mod rank_diff;
pub mod team_sum;
pub mod trunc;
```

- [ ] **Step 2: Create `src/factor/margin.rs` with the failing tests first**

```rust
use crate::{
    N_INF, cdf, pdf,
    factor::{Factor, VarId, VarStore},
    gaussian::Gaussian,
};

/// Gaussian observation factor on a diff variable.
///
/// Encodes the soft evidence `m_obs ~ N(diff, sigma²)`. The outgoing message
/// to `diff` is the constant `N(m_obs, sigma²)`, so this factor converges in a
/// single propagation: subsequent calls return a zero delta.
#[derive(Debug)]
pub struct MarginFactor {
    pub diff: VarId,
    pub m_obs: f64,
    pub sigma: f64,
    pub(crate) msg: Gaussian,
    pub(crate) evidence_cached: Option<f64>,
}

impl MarginFactor {
    pub fn new(diff: VarId, m_obs: f64, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score sigma must be positive");
        Self {
            diff,
            m_obs,
            sigma,
            msg: N_INF,
            evidence_cached: None,
        }
    }
}

impl Factor for MarginFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        let marginal = vars.get(self.diff);
        let cavity = marginal / self.msg;

        if self.evidence_cached.is_none() {
            self.evidence_cached = Some(cavity_evidence(cavity, self.m_obs, self.sigma));
        }

        let new_msg = Gaussian::from_ms(self.m_obs, self.sigma);
        let new_marginal = cavity * new_msg;
        let old_msg = self.msg;
        self.msg = new_msg;
        vars.set(self.diff, new_marginal);

        old_msg.delta(new_msg)
    }

    fn log_evidence(&self, _vars: &VarStore) -> f64 {
        self.evidence_cached.unwrap_or(1.0).ln()
    }
}

fn cavity_evidence(cavity: Gaussian, m_obs: f64, sigma: f64) -> f64 {
    let combined_sigma = (cavity.sigma().powi(2) + sigma.powi(2)).sqrt();
    pdf(m_obs, cavity.mu(), combined_sigma)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn first_propagate_writes_tilted_marginal() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);

        f.propagate(&mut vars);

        let result = vars.get(diff);
        // pi = 1/36 + 1 ≈ 1.027778; tau = 0 + 5 = 5
        // mu = 5 / 1.027778 ≈ 4.864865; sigma = 1/sqrt(1.027778) ≈ 0.986394
        assert!((result.mu() - 4.864864864864865).abs() < 1e-12);
        assert!((result.sigma() - 0.986393923832144).abs() < 1e-12);
    }

    #[test]
    fn converges_in_one_step() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);

        f.propagate(&mut vars);
        let (dmu, dsig) = f.propagate(&mut vars);
        assert!(dmu < 1e-12, "expected ~0 delta on second propagate, got {dmu}");
        assert!(dsig < 1e-12);
    }

    #[test]
    fn evidence_cached_on_first_propagate() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);
        assert!(f.evidence_cached.is_none());

        f.propagate(&mut vars);
        let z = f.evidence_cached.unwrap();
        // pdf(5, 0, sqrt(37)) ≈ 0.046827
        assert!((z - 0.04682752233851171).abs() < 1e-10);

        // Subsequent propagations don't change it.
        f.propagate(&mut vars);
        assert_eq!(f.evidence_cached.unwrap(), z);
    }

    #[test]
    fn log_evidence_matches_cached_ln() {
        let mut vars = VarStore::new();
        let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
        let mut f = MarginFactor::new(diff, 5.0, 1.0);
        f.propagate(&mut vars);
        let logz = f.log_evidence(&vars);
        assert!((logz - (-3.061357379815869)).abs() < 1e-10);
    }

    // Silence unused-import warning for cdf until/if a tie-band variant is added.
    #[allow(dead_code)]
    fn _cdf_smoke() -> f64 {
        cdf(0.0, 0.0, 1.0)
    }
}
```

> Note: the unused `cdf` import keeps parity with `trunc.rs` style and reserves the spot if a tie-band MarginFactor variant gets added later. If you'd rather drop it, remove the `cdf` from the import list and delete `_cdf_smoke`.

- [ ] **Step 3: Run the new tests to verify they pass once added (after Step 2 they will pass; this step is the guard)**

Run: `cargo test --lib factor::margin`

Expected: 4 passed.

- [ ] **Step 4: Verify the module compiles cleanly with no warnings**

Run: `cargo build` and `cargo clippy --lib -- -D warnings`

Expected: no warnings, no errors.

- [ ] **Step 5: Format and commit**

```bash
cargo +nightly fmt
git add src/factor/margin.rs src/factor/mod.rs
git commit -m "feat(factor): add MarginFactor for scored-margin EP evidence"
```

---

### Task 2: Wire `MarginFactor` into `BuiltinFactor` enum dispatch

**Files:**
- Modify: `src/factor/mod.rs:76-98` (the `BuiltinFactor` enum and its `Factor` impl)
- Modify: `src/factors.rs:7-13` (the public re-export list)

- [ ] **Step 1: Write a failing dispatch test in `src/factor/mod.rs`**

Open `src/factor/mod.rs`. Inside the existing `#[cfg(test)] mod tests { ... }` block (around line 105), add:

```rust
#[test]
fn builtin_factor_dispatches_to_margin() {
    use super::margin::MarginFactor;
    let mut vars = VarStore::new();
    let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
    let mut f = BuiltinFactor::Margin(MarginFactor::new(diff, 5.0, 1.0));

    f.propagate(&mut vars);

    let result = vars.get(diff);
    assert!((result.mu() - 4.864864864864865).abs() < 1e-12);

    let logz = f.log_evidence(&vars);
    assert!((logz - (-3.061357379815869)).abs() < 1e-10);
}
```

- [ ] **Step 2: Run the test to verify it fails**

Run: `cargo test --lib factor::tests::builtin_factor_dispatches_to_margin`

Expected: FAIL with `no variant named Margin found for enum BuiltinFactor`.

- [ ] **Step 3: Add the enum variant + Factor impl arms**

Replace the current `BuiltinFactor` definition and its `Factor` impl (currently `src/factor/mod.rs:76-98`):

```rust
/// Enum dispatcher for the built-in factor types.
///
/// Using an enum instead of `Box<dyn Factor>` keeps factor data inline and
/// avoids virtual-call overhead in the hot inference loop.
#[derive(Debug)]
pub enum BuiltinFactor {
    TeamSum(team_sum::TeamSumFactor),
    RankDiff(rank_diff::RankDiffFactor),
    Trunc(trunc::TruncFactor),
    Margin(margin::MarginFactor),
}

impl Factor for BuiltinFactor {
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        match self {
            Self::TeamSum(f) => f.propagate(vars),
            Self::RankDiff(f) => f.propagate(vars),
            Self::Trunc(f) => f.propagate(vars),
            Self::Margin(f) => f.propagate(vars),
        }
    }

    fn log_evidence(&self, vars: &VarStore) -> f64 {
        match self {
            Self::Trunc(f) => f.log_evidence(vars),
            Self::Margin(f) => f.log_evidence(vars),
            _ => 0.0,
        }
    }
}
```

- [ ] **Step 4: Re-export `MarginFactor` from `src/factors.rs`**

Replace the body of `src/factors.rs` (lines 7-13) with:

```rust
pub use crate::{
    factor::{
        BuiltinFactor, Factor, VarId, VarStore, margin::MarginFactor,
        rank_diff::RankDiffFactor, team_sum::TeamSumFactor, trunc::TruncFactor,
    },
    schedule::{EpsilonOrMax, Schedule, ScheduleReport},
};
```

- [ ] **Step 5: Run the test to verify it passes**

Run: `cargo test --lib factor::tests::builtin_factor_dispatches_to_margin`

Expected: PASS.

- [ ] **Step 6: Run the full lib test suite to confirm no regressions**

Run: `cargo test --lib`

Expected: all tests pass (current count + 5 new from Tasks 1–2).

- [ ] **Step 7: Format and commit**

```bash
cargo +nightly fmt
git add src/factor/mod.rs src/factors.rs
git commit -m "feat(factor): dispatch MarginFactor through BuiltinFactor enum"
```

---

### Task 3: Add `Outcome::Scored` variant and accessors

**Files:**
- Modify: `src/outcome.rs`

- [ ] **Step 1: Write failing tests in `src/outcome.rs`**

Add to the existing `#[cfg(test)] mod tests { ... }` block (after `winner_out_of_range_panics`, around line 86):

```rust
#[test]
fn scored_two_teams() {
    let o = Outcome::scores([10.0, 4.0]);
    assert_eq!(o.team_count(), 2);
    assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
    assert_eq!(o.as_ranks(), None);
}

#[test]
fn scored_team_count_matches_input() {
    let o = Outcome::scores([3.0, 1.0, 2.0, 0.0]);
    assert_eq!(o.team_count(), 4);
}

#[test]
fn ranked_as_scores_returns_none() {
    let o = Outcome::winner(0, 2);
    assert!(o.as_scores().is_none());
    assert!(o.as_ranks().is_some());
}
```

- [ ] **Step 2: Run the tests to verify they fail**

Run: `cargo test --lib outcome::tests`

Expected: FAIL — `no function or associated item named scores found`, etc.

- [ ] **Step 3: Implement the `Scored` variant and helpers**

Replace the body of `src/outcome.rs` with:

```rust
//! Outcome of a match.
//!
//! `Ranked(ranks)` for ordinal results; `Scored(scores)` for continuous
//! per-team scores (engages `MarginFactor` in the engine).

use smallvec::SmallVec;

/// Final outcome of a match.
///
/// `Ranked(ranks)`: lower rank = better. Equal ranks mean a tie between those
/// teams. `ranks.len()` must equal the number of teams in the event.
///
/// `Scored(scores)`: higher score = better. Adjacent (sorted) pairs feed
/// observed margins to `MarginFactor`. `scores.len()` must equal the number
/// of teams in the event.
#[derive(Clone, Debug, PartialEq)]
#[non_exhaustive]
pub enum Outcome {
    Ranked(SmallVec<[u32; 4]>),
    Scored(SmallVec<[f64; 4]>),
}

impl Outcome {
    /// `n`-team outcome where team `winner` won and everyone else tied for last.
    ///
    /// Panics if `winner >= n`.
    pub fn winner(winner: u32, n: u32) -> Self {
        assert!(winner < n, "winner index {winner} out of range 0..{n}");
        let ranks: SmallVec<[u32; 4]> = (0..n).map(|i| if i == winner { 0 } else { 1 }).collect();
        Self::Ranked(ranks)
    }

    /// All `n` teams tied.
    pub fn draw(n: u32) -> Self {
        Self::Ranked(SmallVec::from_vec(vec![0; n as usize]))
    }

    /// Explicit per-team ranking.
    pub fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Self {
        Self::Ranked(ranks.into_iter().collect())
    }

    /// Explicit per-team continuous scores; higher = better.
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored(scores.into_iter().collect())
    }

    pub fn team_count(&self) -> usize {
        match self {
            Self::Ranked(r) => r.len(),
            Self::Scored(s) => s.len(),
        }
    }

    pub(crate) fn as_ranks(&self) -> Option<&[u32]> {
        match self {
            Self::Ranked(r) => Some(r),
            Self::Scored(_) => None,
        }
    }

    pub(crate) fn as_scores(&self) -> Option<&[f64]> {
        match self {
            Self::Scored(s) => Some(s),
            Self::Ranked(_) => None,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn winner_two_teams() {
        let o = Outcome::winner(0, 2);
        assert_eq!(o.as_ranks(), Some(&[0u32, 1][..]));
        assert_eq!(o.team_count(), 2);
    }

    #[test]
    fn winner_three_teams_second_wins() {
        let o = Outcome::winner(1, 3);
        assert_eq!(o.as_ranks(), Some(&[1u32, 0, 1][..]));
    }

    #[test]
    fn draw_three_teams() {
        let o = Outcome::draw(3);
        assert_eq!(o.as_ranks(), Some(&[0u32, 0, 0][..]));
    }

    #[test]
    fn ranking_from_iter() {
        let o = Outcome::ranking([2, 0, 1]);
        assert_eq!(o.as_ranks(), Some(&[2u32, 0, 1][..]));
    }

    #[test]
    #[should_panic(expected = "winner index 2 out of range")]
    fn winner_out_of_range_panics() {
        let _ = Outcome::winner(2, 2);
    }

    #[test]
    fn scored_two_teams() {
        let o = Outcome::scores([10.0, 4.0]);
        assert_eq!(o.team_count(), 2);
        assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
        assert_eq!(o.as_ranks(), None);
    }

    #[test]
    fn scored_team_count_matches_input() {
        let o = Outcome::scores([3.0, 1.0, 2.0, 0.0]);
        assert_eq!(o.team_count(), 4);
    }

    #[test]
    fn ranked_as_scores_returns_none() {
        let o = Outcome::winner(0, 2);
        assert!(o.as_scores().is_none());
        assert!(o.as_ranks().is_some());
    }
}
```

> Note: the existing `as_ranks` returned `&[u32]` and was `#[allow(dead_code)]`. The new signature returns `Option<&[u32]>` because `Ranked` is no longer the only variant. All in-tree call sites that used `as_ranks()` (we'll update them in later tasks) must now handle the `Option`.

- [ ] **Step 4: Run the outcome tests to verify they pass**

Run: `cargo test --lib outcome`

Expected: 8 passed.

- [ ] **Step 5: Update existing call sites to handle the new `Option<&[u32]>` return**

Two call sites use `as_ranks()` today. Update each to expect `Option`:

In `src/history.rs:672`, change:

```rust
let ranks = ev.outcome.as_ranks();
if ranks.len() != ev.teams.len() {
```

to:

```rust
let ranks = match ev.outcome.as_ranks() {
    Some(r) => r,
    None => {
        // Scored path will be wired in Task 7; for now it's an error.
        return Err(InferenceError::MismatchedShape {
            kind: "outcome variant",
            expected: 0,
            got: 0,
        });
    }
};
if ranks.len() != ev.teams.len() {
```

In `src/history.rs:701`, change:

```rust
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let inverted: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
```

(no change needed — `ranks` is already `&[u32]` here).

In `src/game.rs:312`, change:

```rust
let ranks = outcome.as_ranks();
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let result: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
```

to:

```rust
let ranks = outcome.as_ranks().ok_or(crate::InferenceError::MismatchedShape {
    kind: "Game::ranked requires Outcome::Ranked",
    expected: 0,
    got: 0,
})?;
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let result: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
```

- [ ] **Step 6: Verify the full lib still compiles and tests pass**

Run: `cargo test --lib`

Expected: all tests pass (call sites updated cleanly).

- [ ] **Step 7: Format and commit**

```bash
cargo +nightly fmt
git add src/outcome.rs src/history.rs src/game.rs
git commit -m "feat(outcome): add Scored variant; switch as_ranks/as_scores to Option"
```

---

### Task 4: Internal `DiffFactor` enum to dispatch Trunc vs Margin per-pair

**Files:**
- Modify: `src/game.rs` (top of file, before `Game` impl)

- [ ] **Step 1: Write a failing test in `src/game.rs`'s test module**

In the `#[cfg(test)] mod tests { ... }` block at the bottom of `src/game.rs`, add (after `test_2vs2_weighted`):

```rust
#[test]
fn diff_factor_dispatch_trunc_and_margin() {
    use crate::factor::{margin::MarginFactor, trunc::TruncFactor, VarStore};
    use super::DiffFactor;

    let mut vars = VarStore::new();
    let dt = vars.alloc(Gaussian::from_ms(0.0, 6.0));
    let dm = vars.alloc(Gaussian::from_ms(0.0, 6.0));

    let mut t = DiffFactor::Trunc(TruncFactor::new(dt, 0.0, false));
    let mut m = DiffFactor::Margin(MarginFactor::new(dm, 5.0, 1.0));

    let _ = t.propagate(&mut vars);
    let _ = m.propagate(&mut vars);

    // Smoke: both diffs got written; their msgs are non-N_INF.
    assert!(t.msg().pi() > 0.0);
    assert!(m.msg().pi() > 0.0);
    assert_eq!(t.diff(), dt);
    assert_eq!(m.diff(), dm);
}
```

- [ ] **Step 2: Run the test to verify it fails**

Run: `cargo test --lib game::tests::diff_factor_dispatch_trunc_and_margin`

Expected: FAIL — `cannot find type DiffFactor in this scope`.

- [ ] **Step 3: Add the `DiffFactor` enum at the top of `src/game.rs`**

Insert after the existing `use` block (around line 14, before `pub struct GameOptions`):

```rust
use crate::factor::margin::MarginFactor;

/// Per-adjacent-pair link factor in the game's diff chain.
///
/// `Trunc` is used for `Outcome::Ranked` (rank-based truncation).
/// `Margin` is used for `Outcome::Scored` (Gaussian observation on the diff).
#[derive(Debug)]
pub(crate) enum DiffFactor {
    Trunc(TruncFactor),
    Margin(MarginFactor),
}

impl DiffFactor {
    pub(crate) fn diff(&self) -> crate::factor::VarId {
        match self {
            Self::Trunc(f) => f.diff,
            Self::Margin(f) => f.diff,
        }
    }

    pub(crate) fn msg(&self) -> Gaussian {
        match self {
            Self::Trunc(f) => f.msg,
            Self::Margin(f) => f.msg,
        }
    }

    pub(crate) fn evidence(&self) -> f64 {
        match self {
            Self::Trunc(f) => f.evidence_cached.unwrap_or(1.0),
            Self::Margin(f) => f.evidence_cached.unwrap_or(1.0),
        }
    }

    pub(crate) fn propagate(&mut self, vars: &mut crate::factor::VarStore) -> (f64, f64) {
        use crate::factor::Factor;
        match self {
            Self::Trunc(f) => f.propagate(vars),
            Self::Margin(f) => f.propagate(vars),
        }
    }
}
```

- [ ] **Step 4: Refactor `Game::likelihoods` to drive `Vec<DiffFactor>` instead of `Vec<TruncFactor>`**

This is a mechanical rename inside `Game::likelihoods` (currently `src/game.rs:135-273`). The loop logic is unchanged; we just move the per-pair object behind the enum. Replace the body of `Game::likelihoods` from where `let mut trunc: Vec<TruncFactor> = ...` is constructed (around line 160) to its last use (around line 243):

```rust
        // One DiffFactor per adjacent sorted-team pair; each owns a diff VarId.
        let mut links: Vec<DiffFactor> = (0..n_diffs)
            .map(|i| {
                let tie = self.result[arena.sort_buf[i]] == self.result[arena.sort_buf[i + 1]];
                let margin = if self.p_draw == 0.0 {
                    0.0
                } else {
                    let a: f64 = self.teams[arena.sort_buf[i]]
                        .iter()
                        .map(|p| p.beta.powi(2))
                        .sum();
                    let b: f64 = self.teams[arena.sort_buf[i + 1]]
                        .iter()
                        .map(|p| p.beta.powi(2))
                        .sum();
                    compute_margin(self.p_draw, (a + b).sqrt())
                };
                let vid = arena.vars.alloc(N_INF);
                DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
            })
            .collect();

        // Per-team messages from neighbouring RankDiff factors (replaces TeamMessage).
        arena.lhood_lose.resize(n_teams, N_INF);
        arena.lhood_win.resize(n_teams, N_INF);

        let mut step = (f64::INFINITY, f64::INFINITY);
        let mut iter = 0;

        while tuple_gt(step, 1e-6) && iter < 10 {
            step = (0.0_f64, 0.0_f64);

            for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_ll = pw - lf.msg();
                step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
                arena.lhood_lose[e + 1] = new_ll;
            }

            for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
                let e = n_diffs - 1 - rev_i;
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_lw = pl + lf.msg();
                step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
                arena.lhood_win[e] = new_lw;
            }

            iter += 1;
        }

        if n_diffs == 1 {
            let raw = (arena.team_prior[0] * arena.lhood_lose[0])
                - (arena.team_prior[1] * arena.lhood_win[1]);
            arena.vars.set(links[0].diff(), raw * links[0].msg());
            links[0].propagate(&mut arena.vars);
        }

        if n_diffs > 0 {
            let pl1 = arena.team_prior[1] * arena.lhood_win[1];
            arena.lhood_win[0] = pl1 + links[0].msg();
            let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
            arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
        }

        self.evidence = links.iter().map(|l| l.evidence()).product();
```

(Everything below the evidence line is unchanged.) Also remove the now-unused `use crate::factor::trunc::TruncFactor;` from the file's top imports if it becomes unused — but we still construct `TruncFactor` directly above, so it stays.

- [ ] **Step 5: Run the full lib test suite to verify the refactor preserves all golden values**

Run: `cargo test --lib`

Expected: all tests pass with **identical** assertions — this is a pure refactor.

- [ ] **Step 6: Run the integration tests**

Run: `cargo test`

Expected: all pass.

- [ ] **Step 7: Format and commit**

```bash
cargo +nightly fmt
git add src/game.rs
git commit -m "refactor(game): dispatch per-diff link factors via DiffFactor enum"
```

---

### Task 5: Add `score_sigma` to `GameOptions` and the scored path in `Game::likelihoods`

**Files:**
- Modify: `src/game.rs`

- [ ] **Step 1: Write a failing test for the scored path**

In `src/game.rs`'s test module, after the new dispatch test from Task 4, add:

```rust
#[test]
fn scored_path_sharper_when_margin_is_large() {
    // Same prior on both sides; large positive observed margin should pull
    // team A above team B.
    let prior = R::new(
        Gaussian::from_ms(25.0, 25.0 / 3.0),
        25.0 / 6.0,
        ConstantDrift(25.0 / 300.0),
    );
    let teams = vec![vec![prior], vec![prior]];
    let result = vec![10.0, 0.0]; // a beat b by 10
    let weights = [vec![1.0], vec![1.0]];
    let mut arena = ScratchArena::new();
    let g = Game::scored_with_arena(
        teams,
        &result,
        &weights,
        1.0, // score_sigma
        &mut arena,
    );
    let p = g.posteriors();
    let a = p[0][0];
    let b = p[1][0];
    assert!(a.mu() > b.mu(), "expected team a posterior mu > team b; got {} vs {}", a.mu(), b.mu());

    // Tighter score_sigma should produce a stronger update.
    let mut arena2 = ScratchArena::new();
    let g_tight = Game::scored_with_arena(
        vec![vec![prior], vec![prior]],
        &result,
        &weights,
        0.1, // tighter score_sigma
        &mut arena2,
    );
    let p_tight = g_tight.posteriors();
    let a_tight = p_tight[0][0];
    assert!(a_tight.mu() > a.mu(), "expected tighter sigma to push posterior further; {} vs {}", a_tight.mu(), a.mu());
}
```

- [ ] **Step 2: Run the test to verify it fails**

Run: `cargo test --lib game::tests::scored_path_sharper_when_margin_is_large`

Expected: FAIL — `no function or associated item named scored_with_arena`.

- [ ] **Step 3: Add `score_sigma` to `GameOptions`**

Replace the `GameOptions` definition (around `src/game.rs:15-28`):

```rust
#[derive(Clone, Copy, Debug)]
pub struct GameOptions {
    pub p_draw: f64,
    pub score_sigma: f64,
    pub convergence: crate::ConvergenceOptions,
}

impl Default for GameOptions {
    fn default() -> Self {
        Self {
            p_draw: crate::P_DRAW,
            score_sigma: 1.0,
            convergence: crate::ConvergenceOptions::default(),
        }
    }
}
```

- [ ] **Step 4: Add `Game::scored_with_arena` and friends**

In `Game<'a, T, D>`'s `impl` block (the one with `ranked_with_arena`, around `src/game.rs:90-133`), add a new method right after `ranked_with_arena`:

```rust
    pub(crate) fn scored_with_arena(
        teams: Vec<Vec<Rating<T, D>>>,
        scores: &'a [f64],
        weights: &'a [Vec<f64>],
        score_sigma: f64,
        arena: &mut ScratchArena,
    ) -> Self {
        debug_assert!(
            scores.len() == teams.len(),
            "scores must have the same length as teams"
        );
        debug_assert!(
            weights
                .iter()
                .zip(teams.iter())
                .all(|(w, t)| w.len() == t.len()),
            "weights must have the same dimensions as teams"
        );
        debug_assert!(score_sigma > 0.0, "score_sigma must be positive");

        let mut this = Self {
            teams,
            result: scores,
            weights,
            p_draw: 0.0,
            likelihoods: Vec::new(),
            evidence: 0.0,
        };

        this.likelihoods_scored(arena, score_sigma);
        this
    }
```

- [ ] **Step 5: Add `likelihoods_scored` (parallel to `likelihoods`)**

Right after `fn likelihoods` (around line 273), add:

```rust
    fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) {
        arena.reset();

        let n_teams = self.teams.len();

        arena.sort_buf.extend(0..n_teams);
        arena.sort_buf.sort_by(|&i, &j| {
            self.result[j]
                .partial_cmp(&self.result[i])
                .unwrap_or(Ordering::Equal)
        });

        arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
            self.teams[t]
                .iter()
                .zip(self.weights[t].iter())
                .fold(N00, |p, (player, &w)| p + (player.performance() * w))
        }));

        let n_diffs = n_teams.saturating_sub(1);

        // One MarginFactor per adjacent sorted-team pair, observed m_obs ≥ 0.
        let mut links: Vec<DiffFactor> = (0..n_diffs)
            .map(|i| {
                let m_obs = self.result[arena.sort_buf[i]] - self.result[arena.sort_buf[i + 1]];
                let vid = arena.vars.alloc(N_INF);
                DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
            })
            .collect();

        arena.lhood_lose.resize(n_teams, N_INF);
        arena.lhood_win.resize(n_teams, N_INF);

        let mut step = (f64::INFINITY, f64::INFINITY);
        let mut iter = 0;

        while tuple_gt(step, 1e-6) && iter < 10 {
            step = (0.0_f64, 0.0_f64);

            for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_ll = pw - lf.msg();
                step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
                arena.lhood_lose[e + 1] = new_ll;
            }

            for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
                let e = n_diffs - 1 - rev_i;
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
                let raw = pw - pl;
                arena.vars.set(lf.diff(), raw * lf.msg());
                let d = lf.propagate(&mut arena.vars);
                step = tuple_max(step, d);

                let new_lw = pl + lf.msg();
                step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
                arena.lhood_win[e] = new_lw;
            }

            iter += 1;
        }

        if n_diffs == 1 {
            let raw = (arena.team_prior[0] * arena.lhood_lose[0])
                - (arena.team_prior[1] * arena.lhood_win[1]);
            arena.vars.set(links[0].diff(), raw * links[0].msg());
            links[0].propagate(&mut arena.vars);
        }

        if n_diffs > 0 {
            let pl1 = arena.team_prior[1] * arena.lhood_win[1];
            arena.lhood_win[0] = pl1 + links[0].msg();
            let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
            arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
        }

        self.evidence = links.iter().map(|l| l.evidence()).product();

        arena.inv_buf.resize(n_teams, 0);
        for (si, &orig_i) in arena.sort_buf.iter().enumerate() {
            arena.inv_buf[orig_i] = si;
        }

        self.likelihoods = self
            .teams
            .iter()
            .zip(self.weights.iter())
            .enumerate()
            .map(|(orig_i, (players, weights))| {
                let si = arena.inv_buf[orig_i];
                let m = arena.lhood_win[si] * arena.lhood_lose[si];
                let performance = players
                    .iter()
                    .zip(weights.iter())
                    .fold(N00, |p, (player, &w)| p + (player.performance() * w));
                players
                    .iter()
                    .zip(weights.iter())
                    .map(|(player, &w)| {
                        ((m - performance.exclude(player.performance() * w)) * (1.0 / w))
                            .forget(player.beta.powi(2))
                    })
                    .collect::<Vec<_>>()
            })
            .collect::<Vec<_>>();
    }
```

> The body is identical to `likelihoods` except for the per-pair factor construction (no draw-margin computation, `MarginFactor` instead of `TruncFactor`). DRY would let us extract the loop, but the duplication is small (~50 lines) and the divergence may grow as more factor kinds are added; we accept it for clarity. Revisit in T4-Synergy if it gets unwieldy.

- [ ] **Step 6: Run the test to verify it passes**

Run: `cargo test --lib game::tests::scored_path_sharper_when_margin_is_large`

Expected: PASS.

- [ ] **Step 7: Run the full test suite**

Run: `cargo test`

Expected: all pass.

- [ ] **Step 8: Format and commit**

```bash
cargo +nightly fmt
git add src/game.rs
git commit -m "feat(game): add scored_with_arena driving MarginFactor links"
```

---

### Task 6: Public `Game::scored` constructor and `OwnedGame` support

**Files:**
- Modify: `src/game.rs`

- [ ] **Step 1: Write a failing test in `src/game.rs`'s test module**

```rust
#[test]
fn game_scored_public_ctor() {
    use crate::Outcome;
    let prior = R::new(
        Gaussian::from_ms(25.0, 25.0 / 3.0),
        25.0 / 6.0,
        ConstantDrift(25.0 / 300.0),
    );
    let opts = GameOptions {
        score_sigma: 1.0,
        ..GameOptions::default()
    };
    let g = Game::scored(&[&[prior], &[prior]], Outcome::scores([8.0, 2.0]), &opts).unwrap();
    let p = g.posteriors();
    assert!(p[0][0].mu() > p[1][0].mu());
}

#[test]
fn game_scored_rejects_ranked_outcome() {
    let prior = R::new(
        Gaussian::from_ms(25.0, 25.0 / 3.0),
        25.0 / 6.0,
        ConstantDrift(25.0 / 300.0),
    );
    let err = Game::scored(
        &[&[prior], &[prior]],
        crate::Outcome::winner(0, 2),
        &GameOptions::default(),
    )
    .unwrap_err();
    assert!(matches!(err, crate::InferenceError::MismatchedShape { .. }));
}
```

- [ ] **Step 2: Run the tests to verify they fail**

Run: `cargo test --lib game::tests::game_scored_public_ctor game::tests::game_scored_rejects_ranked_outcome`

Expected: FAIL — `no function or associated item named scored`.

- [ ] **Step 3: Add `OwnedGame::new_scored` constructor**

In `OwnedGame<T, D>`'s impl (around `src/game.rs:46-78`), add right after `new`:

```rust
    pub(crate) fn new_scored(
        teams: Vec<Vec<Rating<T, D>>>,
        scores: Vec<f64>,
        weights: Vec<Vec<f64>>,
        score_sigma: f64,
    ) -> Self {
        let mut arena = ScratchArena::new();
        let g = Game::scored_with_arena(teams.clone(), &scores, &weights, score_sigma, &mut arena);
        let likelihoods = g.likelihoods;
        let evidence = g.evidence;
        Self {
            teams,
            result: scores,
            weights,
            p_draw: 0.0,
            likelihoods,
            evidence,
        }
    }
```

- [ ] **Step 4: Add `Game::scored` public method**

In the `impl<T: Time, D: Drift<T>> Game<'_, T, D>` block (around `src/game.rs:293-349`), add right after `ranked`:

```rust
    pub fn scored(
        teams: &[&[Rating<T, D>]],
        outcome: crate::Outcome,
        options: &GameOptions,
    ) -> Result<OwnedGame<T, D>, crate::InferenceError> {
        if options.score_sigma <= 0.0 {
            return Err(crate::InferenceError::InvalidProbability {
                value: options.score_sigma,
            });
        }
        if outcome.team_count() != teams.len() {
            return Err(crate::InferenceError::MismatchedShape {
                kind: "outcome scores vs teams",
                expected: teams.len(),
                got: outcome.team_count(),
            });
        }
        let scores = outcome
            .as_scores()
            .ok_or(crate::InferenceError::MismatchedShape {
                kind: "Game::scored requires Outcome::Scored",
                expected: 0,
                got: 0,
            })?
            .to_vec();
        let teams_owned: Vec<Vec<Rating<T, D>>> = teams.iter().map(|t| t.to_vec()).collect();
        let weights: Vec<Vec<f64>> = teams.iter().map(|t| vec![1.0; t.len()]).collect();
        Ok(OwnedGame::new_scored(teams_owned, scores, weights, options.score_sigma))
    }
```

- [ ] **Step 5: Run the new tests to verify they pass**

Run: `cargo test --lib game::tests::game_scored_public_ctor game::tests::game_scored_rejects_ranked_outcome`

Expected: both PASS.

- [ ] **Step 6: Run the full test suite**

Run: `cargo test`

Expected: all pass.

- [ ] **Step 7: Format and commit**

```bash
cargo +nightly fmt
git add src/game.rs
git commit -m "feat(game): add public Game::scored constructor"
```

---

### Task 7: Plumb `Outcome::Scored` through `TimeSlice` and `History::add_events`

**Files:**
- Modify: `src/time_slice.rs`
- Modify: `src/history.rs`

The per-event `Event` struct in `src/time_slice.rs:80-85` is `{ teams, evidence, weights }`. We add a `kind: EventKind` field that selects which `Game::*_with_arena` to call. Score noise (`score_sigma`) lives inside the `Scored` variant so events can in principle have per-event sigma, though the public API only exposes one history-wide knob today.

- [ ] **Step 1: Add `EventKind` to `src/time_slice.rs` and a `kind` field on `Event`**

In `src/time_slice.rs`, immediately above the `struct Event` definition (currently around line 80), add:

```rust
#[derive(Debug, Clone, Copy)]
pub(crate) enum EventKind {
    Ranked,
    Scored { score_sigma: f64 },
}
```

Modify `struct Event` (currently lines 81-85) to:

```rust
#[derive(Debug)]
pub(crate) struct Event {
    teams: Vec<Team>,
    evidence: f64,
    weights: Vec<Vec<f64>>,
    kind: EventKind,
}
```

- [ ] **Step 2: Dispatch on `kind` in `Event::iteration_direct`**

Replace the body of `Event::iteration_direct` (currently `src/time_slice.rs:123-144`):

```rust
    fn iteration_direct<T: Time, D: Drift<T>>(
        &mut self,
        skills: &mut SkillStore,
        agents: &CompetitorStore<T, D>,
        p_draw: f64,
        arena: &mut ScratchArena,
    ) {
        let teams = self.within_priors(false, false, skills, agents);
        let result = self.outputs();
        let g = match self.kind {
            EventKind::Ranked => {
                Game::ranked_with_arena(teams, &result, &self.weights, p_draw, arena)
            }
            EventKind::Scored { score_sigma } => {
                Game::scored_with_arena(teams, &result, &self.weights, score_sigma, arena)
            }
        };

        for (t, team) in self.teams.iter_mut().enumerate() {
            for (i, item) in team.items.iter_mut().enumerate() {
                let old_likelihood = skills.get(item.agent).unwrap().likelihood;
                let new_likelihood = (old_likelihood / item.likelihood) * g.likelihoods[t][i];
                skills.get_mut(item.agent).unwrap().likelihood = new_likelihood;
                item.likelihood = g.likelihoods[t][i];
            }
        }

        self.evidence = g.evidence;
    }
```

- [ ] **Step 3: Dispatch on `kind` in `TimeSlice::iteration` (sequential branch)**

Inside `TimeSlice::iteration` (currently `src/time_slice.rs:295-325`), replace the body of the `if from > 0 || self.color_groups.is_empty()` branch's inner `for event in ...` loop. The `Game::ranked_with_arena(...)` call (lines 302-308) becomes:

```rust
                let g = match event.kind {
                    EventKind::Ranked => Game::ranked_with_arena(
                        teams,
                        &result,
                        &event.weights,
                        self.p_draw,
                        &mut self.arena,
                    ),
                    EventKind::Scored { score_sigma } => Game::scored_with_arena(
                        teams,
                        &result,
                        &event.weights,
                        score_sigma,
                        &mut self.arena,
                    ),
                };
```

(The rest of that loop body — likelihood update + `event.evidence = g.evidence` — is unchanged.)

- [ ] **Step 4: Dispatch on `kind` in `TimeSlice::log_evidence`**

`TimeSlice::log_evidence` (currently `src/time_slice.rs:467-532`) calls `Game::ranked_with_arena` in three places (lines 482-490, 506-514). For each, change to a match on `event.kind` mirroring Step 2.

Add a helper inside the impl to keep the call sites tidy:

```rust
    fn run_event<D: Drift<T>>(
        &self,
        event: &Event,
        online: bool,
        forward: bool,
        agents: &CompetitorStore<T, D>,
        arena: &mut ScratchArena,
    ) -> f64 {
        let teams = event.within_priors(online, forward, &self.skills, agents);
        let result = event.outputs();
        match event.kind {
            EventKind::Ranked => {
                Game::ranked_with_arena(teams, &result, &event.weights, self.p_draw, arena).evidence
            }
            EventKind::Scored { score_sigma } => {
                Game::scored_with_arena(teams, &result, &event.weights, score_sigma, arena)
                    .evidence
            }
        }
    }
```

Then replace the inline `Game::ranked_with_arena(...).evidence.ln()` calls with `self.run_event(event, online, forward, agents, &mut arena).ln()`.

- [ ] **Step 5: Extend `TimeSlice::add_events` signature with per-event `kinds`**

Change the `add_events` signature (currently `src/time_slice.rs:203-209`) to:

```rust
    pub fn add_events<D: Drift<T>>(
        &mut self,
        composition: Vec<Vec<Vec<Index>>>,
        results: Vec<Vec<f64>>,
        weights: Vec<Vec<Vec<f64>>>,
        kinds: Vec<EventKind>,
        agents: &CompetitorStore<T, D>,
    ) {
```

Inside the same method, update the event-construction map (around line 240). Each constructed `Event` gets its kind from `kinds[e]`:

```rust
            Event {
                teams,
                evidence: 0.0,
                weights,
                kind: kinds[e],
            }
```

- [ ] **Step 6: Update `TimeSlice::add_events`'s tests to pass the new argument**

Three call sites in `src/time_slice.rs:604`, `:680`, `:759`, `:790`, `:855` (the unit tests `test_one_event_each`, `test_same_strength`, `test_add_events`, `time_slice_color_groups_reorders_events`) all call `time_slice.add_events(...)`. Add a fourth argument `vec![EventKind::Ranked; n_events]` between `weights` and `&agents` for each call. Example:

```rust
        time_slice.add_events(
            vec![
                vec![vec![a], vec![b]],
                vec![vec![c], vec![d]],
                vec![vec![e], vec![f]],
            ],
            vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 0.0]],
            vec![],
            vec![EventKind::Ranked; 3],
            &agents,
        );
```

- [ ] **Step 7: Update the `History` callers of `TimeSlice::add_events`**

In `src/history.rs:562` and `:572`, the calls pass `composition, results, weights, &self.agents`. Add the kinds vector. We'll thread the per-event `EventKind` through `add_events_with_prior` in Step 8 and pass it in here as `kinds_chunk`.

- [ ] **Step 8: Extend `History::add_events_with_prior` to accept and route per-event kinds**

In `src/history.rs:447-454`, change the signature to:

```rust
    pub(crate) fn add_events_with_prior(
        &mut self,
        composition: Vec<Vec<Vec<Index>>>,
        results: Vec<Vec<f64>>,
        times: Vec<T>,
        weights: Vec<Vec<Vec<f64>>>,
        kinds: Vec<crate::time_slice::EventKind>,
        mut priors: HashMap<Index, Rating<T, D>>,
    ) -> Result<(), InferenceError> {
```

Around line 543, alongside the existing per-batch slicing of `composition`, `results`, and `weights`, add:

```rust
            let kinds_chunk: Vec<crate::time_slice::EventKind> =
                (i..j).map(|e| kinds[o[e]]).collect();
```

Update the two `time_slice.add_events(composition, results, weights, &self.agents)` call sites (lines 562 and 572) to:

```rust
                time_slice.add_events(composition, results, weights, kinds_chunk, &self.agents);
```

(For both branches — existing-slice and new-slice. Use `kinds_chunk.clone()` if the borrow checker complains; the vec is small.)

Validation: also add a length check at the top of the function alongside the existing ones:

```rust
        if !kinds.is_empty() && kinds.len() != composition.len() {
            return Err(InferenceError::MismatchedShape {
                kind: "kinds",
                expected: composition.len(),
                got: kinds.len(),
            });
        }
```

- [ ] **Step 9: Update `record_winner` and `record_draw` to pass `kinds`**

In `src/history.rs:617-647`, update both calls:

```rust
        self.add_events_with_prior(
            vec![vec![vec![w], vec![l]]],
            vec![vec![1.0, 0.0]],
            vec![time],
            vec![],
            vec![crate::time_slice::EventKind::Ranked],
            HashMap::new(),
        )
```

Same shape for `record_draw`.

- [ ] **Step 10: Update `History::add_events` to compute kinds per event and pass through**

Replace the placeholder match arm added in Task 3 Step 5 (around `src/history.rs:672-680`). The full updated event-loop body of `History::add_events` (around lines 671-705) becomes:

```rust
        let mut kinds: Vec<crate::time_slice::EventKind> = Vec::with_capacity(events.len());

        for ev in events {
            let team_count = ev.teams.len();

            let (results_for_event, kind): (Vec<f64>, crate::time_slice::EventKind) = match &ev.outcome {
                Outcome::Ranked(ranks) => {
                    if ranks.len() != team_count {
                        return Err(InferenceError::MismatchedShape {
                            kind: "outcome ranks vs teams",
                            expected: team_count,
                            got: ranks.len(),
                        });
                    }
                    let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
                    let inverted: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
                    (inverted, crate::time_slice::EventKind::Ranked)
                }
                Outcome::Scored(scores) => {
                    if scores.len() != team_count {
                        return Err(InferenceError::MismatchedShape {
                            kind: "outcome scores vs teams",
                            expected: team_count,
                            got: scores.len(),
                        });
                    }
                    (
                        scores.to_vec(),
                        crate::time_slice::EventKind::Scored {
                            score_sigma: self.score_sigma,
                        },
                    )
                }
            };

            let mut event_comp: Vec<Vec<Index>> = Vec::with_capacity(team_count);
            let mut event_weights: Vec<Vec<f64>> = Vec::with_capacity(team_count);

            for team in ev.teams {
                let mut team_indices: Vec<Index> = Vec::with_capacity(team.members.len());
                let mut team_weights: Vec<f64> = Vec::with_capacity(team.members.len());
                for member in team.members {
                    let idx = self.keys.get_or_create(&member.key);
                    team_indices.push(idx);
                    team_weights.push(member.weight);
                    if let Some(prior) = member.prior {
                        priors.insert(idx, Rating::new(prior, self.beta, self.drift));
                    }
                }
                event_comp.push(team_indices);
                event_weights.push(team_weights);
            }
            composition.push(event_comp);
            weights.push(event_weights);
            results.push(results_for_event);
            times.push(ev.time);
            kinds.push(kind);
        }

        self.add_events_with_prior(composition, results, times, weights, kinds, priors)
```

(Note `EventKind` needs to be re-exported from `time_slice`. Confirm `pub(crate) enum EventKind` in time_slice.rs is reachable from history.rs via `crate::time_slice::EventKind`.)

- [ ] **Step 11: Add `score_sigma: f64` field to `History` and `HistoryBuilder`**

In `src/history.rs:21-37` (`HistoryBuilder` struct), add field `score_sigma: f64,`.

In the `Default` impl (around line 121), set `score_sigma: 1.0`.

In `History::builder_with_key` (around line 170), set `score_sigma: 1.0`.

In each builder transition method that constructs a new `HistoryBuilder` (`drift` at line 55, `observer` at line 85), copy the `score_sigma` field through.

Add a builder method (insert near `p_draw`, around line 70):

```rust
    pub fn score_sigma(mut self, score_sigma: f64) -> Self {
        self.score_sigma = score_sigma;
        self
    }
```

In `HistoryBuilder::build` (around line 100), set `score_sigma: self.score_sigma,` on the constructed `History`.

In the `History` struct (around line 135), add `score_sigma: f64,`.

- [ ] **Step 12: Write a failing integration test in `tests/scored.rs` (new file)**

Create `tests/scored.rs`:

```rust
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};

#[test]
fn scored_two_team_one_event_pulls_winner_up() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("alice")]),
            Team::with_members([Member::new("bob")]),
        ],
        outcome: Outcome::scores([10.0, 0.0]),
    }];
    h.add_events(events).unwrap();
    h.converge().unwrap();

    let alice = h.current_skill(&"alice").unwrap();
    let bob = h.current_skill(&"bob").unwrap();
    assert!(alice.mu() > 25.0, "alice mu should exceed prior; got {}", alice.mu());
    assert!(bob.mu() < 25.0, "bob mu should be below prior; got {}", bob.mu());
}

#[test]
fn scored_zero_margin_treats_as_tie() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("alice")]),
            Team::with_members([Member::new("bob")]),
        ],
        outcome: Outcome::scores([3.0, 3.0]),
    }];
    h.add_events(events).unwrap();
    h.converge().unwrap();

    let alice = h.current_skill(&"alice").unwrap();
    let bob = h.current_skill(&"bob").unwrap();
    assert!((alice.mu() - bob.mu()).abs() < 1e-6, "tied scores -> equal mu; got {} vs {}", alice.mu(), bob.mu());
    // Sigma should still tighten (we have evidence diff ≈ 0).
    assert!(alice.sigma() < 25.0 / 3.0);
}

#[test]
fn scored_three_team_partial_order() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("a")]),
            Team::with_members([Member::new("b")]),
            Team::with_members([Member::new("c")]),
        ],
        outcome: Outcome::scores([20.0, 10.0, 5.0]),
    }];
    h.add_events(events).unwrap();
    h.converge().unwrap();

    let a = h.current_skill(&"a").unwrap();
    let b = h.current_skill(&"b").unwrap();
    let c = h.current_skill(&"c").unwrap();
    assert!(a.mu() > b.mu());
    assert!(b.mu() > c.mu());
}

#[test]
fn scored_rejects_outcome_team_count_mismatch() {
    use trueskill_tt::InferenceError;
    let mut h: History = History::builder().build();
    let events: Vec<Event<i64, &'static str>> = vec![Event {
        time: 1,
        teams: smallvec![
            Team::with_members([Member::new("a")]),
            Team::with_members([Member::new("b")]),
        ],
        outcome: Outcome::scores([1.0, 2.0, 3.0]),
    }];
    let err = h.add_events(events).unwrap_err();
    assert!(matches!(err, InferenceError::MismatchedShape { .. }));
}
```

- [ ] **Step 13: Run the integration tests**

Run: `cargo test --test scored`

Expected: all four tests PASS (the wiring from Steps 1–11 is now complete).

- [ ] **Step 14: Run the full test suite + clippy**

Run: `cargo test && cargo clippy --all-targets -- -D warnings`

Expected: all pass, no clippy warnings. Pay particular attention to the existing `time_slice` unit tests — they were updated in Step 6 and need to use `EventKind::Ranked`.

- [ ] **Step 15: Format and commit**

```bash
cargo +nightly fmt
git add src/history.rs src/time_slice.rs tests/scored.rs
git commit -m "feat(history): route Outcome::Scored events through MarginFactor path"
```

---

### Task 8: `EventBuilder::scores` convenience

**Files:**
- Modify: `src/event_builder.rs`
- Modify: `tests/api_shape.rs` (add a fluent-builder scored test)

- [ ] **Step 1: Write failing tests in `tests/api_shape.rs`**

Append to the existing test list:

```rust
#[test]
fn fluent_event_builder_scores() {
    use trueskill_tt::ConstantDrift;
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.0))
        .build();

    h.event(1)
        .team(["alice"])
        .team(["bob"])
        .scores([12.0, 4.0])
        .commit()
        .unwrap();
    h.converge().unwrap();

    let a = h.current_skill(&"alice").unwrap();
    let b = h.current_skill(&"bob").unwrap();
    assert!(a.mu() > b.mu());
}
```

- [ ] **Step 2: Run the test to verify it fails**

Run: `cargo test --test api_shape fluent_event_builder_scores`

Expected: FAIL — `no method named scores`.

- [ ] **Step 3: Add `.scores` to `EventBuilder`**

In `src/event_builder.rs`, alongside `.ranking`/`.winner`/`.draw` (around line 73), add:

```rust
    /// Set explicit per-team continuous scores; higher = better.
    pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
        self.event.outcome = crate::Outcome::scores(scores);
        self
    }
```

- [ ] **Step 4: Run the test to verify it passes**

Run: `cargo test --test api_shape fluent_event_builder_scores`

Expected: PASS.

- [ ] **Step 5: Run the full test suite**

Run: `cargo test`

Expected: all pass.

- [ ] **Step 6: Format and commit**

```bash
cargo +nightly fmt
git add src/event_builder.rs tests/api_shape.rs
git commit -m "feat(event-builder): add .scores convenience for Outcome::Scored"
```

---

### Task 9: Worked example — scored matches end-to-end

**Files:**
- Create: `examples/scored.rs`

- [ ] **Step 1: Create the example**

```rust
//! Worked example: continuous-score outcomes via `Outcome::Scored`.
//!
//! Three players play a small round-robin where the score margin matters,
//! not just who won. We show how `score_sigma` controls how much weight
//! the engine places on the observed margin.
//!
//! Run with: `cargo run --example scored --release`

use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};

fn main() {
    let mut h = History::builder()
        .mu(25.0)
        .sigma(25.0 / 3.0)
        .beta(25.0 / 6.0)
        .drift(ConstantDrift(0.03))
        .score_sigma(2.0) // tune to data; smaller = trust margins more
        .build();

    let events: Vec<Event<i64, &'static str>> = vec![
        Event {
            time: 1,
            teams: smallvec![
                Team::with_members([Member::new("alice")]),
                Team::with_members([Member::new("bob")]),
            ],
            outcome: Outcome::scores([21.0, 9.0]),
        },
        Event {
            time: 2,
            teams: smallvec![
                Team::with_members([Member::new("bob")]),
                Team::with_members([Member::new("carol")]),
            ],
            outcome: Outcome::scores([21.0, 18.0]),
        },
        Event {
            time: 3,
            teams: smallvec![
                Team::with_members([Member::new("alice")]),
                Team::with_members([Member::new("carol")]),
            ],
            outcome: Outcome::scores([21.0, 21.0]),
        },
    ];
    h.add_events(events).unwrap();

    let report = h.converge().unwrap();
    println!(
        "converged={}, iterations={}, log_evidence={:.4}",
        report.converged, report.iterations, report.log_evidence
    );

    for who in &["alice", "bob", "carol"] {
        let s = h.current_skill(who).unwrap();
        println!("{:>6}: mu={:>7.3}  sigma={:.3}", who, s.mu(), s.sigma());
    }
}
```

- [ ] **Step 2: Confirm the example compiles and runs**

Run: `cargo run --example scored --release`

Expected: prints converged=true with three player skills; alice highest, bob middle, carol lowest (or close to bob — depends on `score_sigma`).

- [ ] **Step 3: Commit**

```bash
cargo +nightly fmt
git add examples/scored.rs
git commit -m "docs(examples): worked Outcome::Scored example"
```

---

### Task 10: Benchmark — scored ingestion + convergence

**Files:**
- Create: `benches/scored.rs`
- Modify: `Cargo.toml` (add `[[bench]]` entry if needed)

- [ ] **Step 1: Check `Cargo.toml` for the existing bench wiring**

Run: `cat Cargo.toml | grep -A 3 'bench'`

If `auto-bench = false` is set or each bench is registered explicitly, add a new entry:

```toml
[[bench]]
name = "scored"
harness = false
```

- [ ] **Step 2: Create `benches/scored.rs` modeled on `benches/batch.rs`**

```rust
use criterion::{Criterion, criterion_group, criterion_main};
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};

fn bench_scored_history(c: &mut Criterion) {
    c.bench_function("scored_history_60_events_30_iter", |bencher| {
        bencher.iter(|| {
            let mut h = History::builder()
                .mu(25.0)
                .sigma(25.0 / 3.0)
                .beta(25.0 / 6.0)
                .drift(ConstantDrift(0.03))
                .score_sigma(2.0)
                .build();

            let mut events: Vec<Event<i64, String>> = Vec::with_capacity(60);
            for i in 0..60 {
                let a = format!("p{}", i % 20);
                let b = format!("p{}", (i + 7) % 20);
                let s_a = (i as f64 * 0.3).sin().abs() * 21.0;
                let s_b = (i as f64 * 0.3).cos().abs() * 21.0;
                events.push(Event {
                    time: 1 + (i / 6) as i64,
                    teams: smallvec![
                        Team::with_members([Member::new(a)]),
                        Team::with_members([Member::new(b)]),
                    ],
                    outcome: Outcome::scores([s_a, s_b]),
                });
            }
            h.add_events(events).unwrap();
            h.converge().unwrap();
        });
    });
}

criterion_group!(benches, bench_scored_history);
criterion_main!(benches);
```

> The `History` here uses `String` keys to match the typical real-world bench shape; if `History<i64, _, _, String>` requires `builder_with_key`, adapt accordingly.

- [ ] **Step 3: Verify the benchmark compiles**

Run: `cargo bench --no-run --bench scored`

Expected: builds without error.

- [ ] **Step 4: Run the benchmark and capture a baseline number**

Run: `cargo bench --bench scored 2>&1 | tee benches/scored_baseline.txt`

(Save the result alongside the existing `benches/baseline.txt` so future tiers can compare.)

- [ ] **Step 5: Commit**

```bash
cargo +nightly fmt
git add benches/scored.rs benches/scored_baseline.txt Cargo.toml
git commit -m "bench(scored): add criterion bench mirroring batch bench"
```

---

### Task 11: Documentation — README + CLAUDE.md status update

**Files:**
- Modify: `README.md`
- Modify: `CLAUDE.md`
- Modify: `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md` (mark MarginFactor done)

- [ ] **Step 1: Add a "Scored outcomes" subsection to `README.md`**

Find the existing `## Usage` section (or equivalent) and add:

```markdown
### Scored outcomes

Use `Outcome::scores([...])` when you have continuous per-team scores rather
than just ranks. Adjacent score margins flow into a `MarginFactor` that adds
soft Gaussian evidence about the latent performance diff. Configure
`HistoryBuilder::score_sigma(σ)` to control how much you trust the margins
(smaller σ = more trust).

```rust
use trueskill_tt::{History, Outcome};

let mut h = History::builder().score_sigma(2.0).build();
h.event(1)
    .team(["alice"])
    .team(["bob"])
    .scores([21.0, 9.0])
    .commit()
    .unwrap();
h.converge().unwrap();
```
```

(Replace the backticks-surrounded fence indicators above (````rust` and `````) with proper triple backticks; the zero-width chars are there to avoid breaking *this* plan file's nesting.)

- [ ] **Step 2: Update `CLAUDE.md` architecture notes**

In `CLAUDE.md`, add to the existing factor list (or near the architecture section):

```
- `MarginFactor` (factor/margin.rs) — Gaussian observation factor on a diff variable; engaged by `Outcome::Scored`.
```

- [ ] **Step 3: Mark the T4-Margin item complete in the spec**

In `docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`, find the T4 section (line 577 onward):

```markdown
- `MarginFactor` → enables `Outcome::Scored`.
```

Change to:

```markdown
- `MarginFactor` → enables `Outcome::Scored`. **Done** (see `docs/superpowers/plans/2026-04-27-t4-margin-factor.md`).
```

- [ ] **Step 4: Final full test + clippy + fmt run**

Run:

```bash
cargo +nightly fmt
cargo clippy --all-targets -- -D warnings
cargo test
cargo bench --no-run
```

Expected: all green, no warnings, all bench targets compile.

- [ ] **Step 5: Commit**

```bash
git add README.md CLAUDE.md docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md
git commit -m "docs(t4-margin): document Outcome::Scored and mark spec item done"
```

---

## Acceptance criteria

- All existing lib + integration tests still pass with their existing golden values (Trunc path is bit-for-bit unchanged after the `DiffFactor` refactor in Task 4).
- `cargo test --test scored` passes all four tests added in Task 7.
- `cargo run --example scored --release` runs and prints sensible posteriors.
- `cargo bench --bench scored` produces a baseline result saved under `benches/`.
- `cargo clippy --all-targets -- -D warnings` is clean.
- `Outcome::Scored` is accepted by the public API: `History::add_events`, `History::event(...).scores(...)`, and `Game::scored`.
- `score_sigma` is configurable via `HistoryBuilder::score_sigma` and `GameOptions::score_sigma`, default `1.0`.

## Out of scope (deferred to later T4 plans)

- Damped / Residual schedules
- SynergyFactor
- ScoreFactor (continuous outcome variable distinct from observed margin)
- Per-event `score_sigma` overrides (currently history-wide)
- Tie-band MarginFactor variant (`m_obs` band rather than point observation)