docs: fix stale numerics in t4-margin-factor plan

The plan's prose quoted Z_cav ≈ 0.046827 and log_evidence ≈ -3.0613, which diverged from the values asserted by the shipped test in src/factor/mod.rs (-3.062235327364623). Update prose and the matching code comment to 0.04678 / -3.0622.
refactor: make BuiltinFactor::log_evidence match exhaustive
2026-05-08 14:37:58 +02:00 · 2026-05-08 14:37:13 +02:00 · 2026-05-08 14:36:35 +02:00 · 2026-05-08 14:28:10 +02:00 · 2026-05-08 14:24:48 +02:00
5 changed files with 616 additions and 142 deletions
@@ -49,7 +49,7 @@ A Gaussian `N(m, σ)` constructed via `Gaussian::from_ms(m, σ)`. Multiplication
 **Concrete numerical check for tests:** With cavity `N(0, 6)` and observation `m_obs=5, σ=1`:
 - `D_cav.pi = 1/36 ≈ 0.027778`, `D_cav.tau = 0`.
 - New marginal: `pi = 0.027778 + 1 = 1.027778`, `tau = 0 + 5 = 5`. So `mu = 5 / 1.027778 ≈ 4.864865`, `sigma = 1/sqrt(1.027778) ≈ 0.986394`.
- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.046827`. So `log_evidence ≈ -3.0613`.
+- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.04678`. So `log_evidence ≈ -3.0622`.

 ---

@@ -182,7 +182,7 @@ mod tests {

        f.propagate(&mut vars);
        let z = f.evidence_cached.unwrap();
-        // pdf(5, 0, sqrt(37)) ≈ 0.046827
+        // pdf(5, 0, sqrt(37)) ≈ 0.04678
        assert!((z - 0.04682752233851171).abs() < 1e-10);

        // Subsequent propagations don't change it.
@@ -0,0 +1,444 @@
+# Tech Debt Cleanup Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Land three independent post-T4-MarginFactor cleanups: dedupe `Game::likelihoods` and `Game::likelihoods_scored` via a `run_chain` helper, make `BuiltinFactor::log_evidence` exhaustive, and fix stale numerics in the T4 plan doc.
+
+**Architecture:** Pure code-shape and doc fixes. No public-API change, no behavioral change, no new dependencies. The dedup is a pure refactor — bit-equal posteriors and evidence against existing test goldens. The exhaustive match is a future-proofing change with no runtime effect. The doc fix is two number swaps in prose plus one matching code-comment swap.
+
+**Tech Stack:** Rust 2024, `cargo +nightly fmt`, `cargo clippy`, `cargo test --lib`.
+
+---
+
+## Spec reference
+
+`docs/superpowers/specs/2026-05-08-tech-debt-cleanup-design.md`
+
+## File map
+
+| File | Why touched |
+|---|---|
+| `src/game.rs` | Add `run_chain` helper; rewrite `likelihoods` and `likelihoods_scored` to call it |
+| `src/factor/mod.rs` | Make `BuiltinFactor::log_evidence` match exhaustive |
+| `docs/superpowers/plans/2026-04-27-t4-margin-factor.md` | Fix two stale prose numbers and one matching code comment |
+
+---
+
+### Task 1: Extract `run_chain` helper, dedupe both likelihoods methods
+
+**Files:**
+- Modify: `src/game.rs:236-485` (replace both `likelihoods` and `likelihoods_scored` with one helper + two thin callers)
+
+**Context for the implementer (read this before touching anything):**
+
+`OwnedGame<T, D>` (defined at `src/game.rs:83-92`) holds `teams`, `result`, `weights`, `p_draw`, plus mutable output fields `likelihoods: Vec<Vec<Gaussian>>` and `evidence: f64`. Two private methods on `Game<'a, T, D>` (the borrowed sibling at `src/game.rs:148-156`) compute likelihoods:
+
+- `likelihoods(&mut self, arena: &mut ScratchArena)` — ranked outcomes; `src/game.rs:236-371`
+- `likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64)` — scored outcomes; `src/game.rs:373-485`
+
+The two are bit-identical except for the closure that builds the per-diff `DiffFactor` (defined at `src/game.rs:20-54`). `DiffFactor` has two variants: `Trunc(TruncFactor)` for ranked, `Margin(MarginFactor)` for scored.
+
+The shared body does, in order: `arena.reset()`, sort teams descending by `result` into `arena.sort_buf`, fill `arena.team_prior`, build `links: Vec<DiffFactor>` (the differing block), resize `arena.lhood_lose` / `arena.lhood_win` to `N_INF`, run a forward+backward sweep with a max-iter-10 fixed-point loop guarded by `tuple_gt(step, 1e-6)`, handle the `n_diffs == 1` special case, do boundary updates, multiply per-diff `evidence()` into `self.evidence`, build the inverse permutation in `arena.inv_buf`, then build `self.likelihoods` from the per-team `lhood_win * lhood_lose` and per-player `performance().exclude(...).forget(beta²)` math.
+
+**Refactor target:**
+
+```rust
+fn run_chain<F>(
+    &self,
+    arena: &mut ScratchArena,
+    mut make_link: F,
+) -> (f64, Vec<Vec<Gaussian>>)
+where
+    F: FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor,
+{ /* the entire shared body, returning (evidence, likelihoods) */ }
+```
+
+Helper takes `&self` (not `&mut self`) so the closure can capture `&self.result`, `&self.teams`, `&self.weights`, `&self.p_draw` without conflicting with the helper's own immutable borrow. The arena is borrowed `&mut` independently.
+
+The closure is invoked once per diff index `i ∈ 0..n_diffs`, after `arena.sort_buf` is filled. It receives `i`, `&arena.sort_buf[..]`, and `&mut arena.vars` so it can `alloc(N_INF)` the diff `VarId`. It returns the constructed `DiffFactor`.
+
+The two callers shrink to:
+
+```rust
+fn likelihoods(&mut self, arena: &mut ScratchArena) {
+    let p_draw = self.p_draw;
+    let result = &self.result;
+    let teams = &self.teams;
+    let (evidence, likelihoods) = Self::dummy_to_satisfy_borrowck(/* see below */);
+    // ... assigns self.evidence and self.likelihoods
+}
+```
+
+Wait — actually borrow-checker note: calling `self.run_chain(arena, |i, sort_buf, vars| { use_self_fields })` from a `&mut self` method is **fine** because `run_chain` takes `&self` and the closure captures `&self` immutably. Both share an immutable reborrow of `*self`. The arena is a separate `&mut` borrow. Verify the implementer doesn't accidentally make `run_chain` take `&mut self`.
+
+**Why a closure (not a trait, not a two-phase build).** A closure keeps caller-specific state (`p_draw`, `score_sigma`, beta sums) inline at the call site with zero ceremony. A trait would require a stateful builder per call. A two-phase build (caller produces `Vec<DiffFactor>` first, helper does the rest) would either re-do the sort or split arena ownership awkwardly between the phases.
+
+---
+
+- [ ] **Step 1: Run the existing test suite to capture the baseline**
+
+Run: `cargo test --lib`
+
+Expected: all tests pass. Note the count (should be 88+ lib tests) — the refactor must keep this number unchanged with all green.
+
+- [ ] **Step 2: Open `src/game.rs` and add the `run_chain` helper**
+
+Inside `impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> { ... }` (the block starting at `src/game.rs:158`), add `run_chain` immediately above the existing `likelihoods` method (so above line 236). Use exactly this body — it is the merge of the two existing methods with the differing block replaced by the closure call:
+
+```rust
+fn run_chain<F>(
+    &self,
+    arena: &mut ScratchArena,
+    mut make_link: F,
+) -> (f64, Vec<Vec<Gaussian>>)
+where
+    F: FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor,
+{
+    arena.reset();
+
+    let n_teams = self.teams.len();
+
+    arena.sort_buf.extend(0..n_teams);
+    arena.sort_buf.sort_by(|&i, &j| {
+        self.result[j]
+            .partial_cmp(&self.result[i])
+            .unwrap_or(Ordering::Equal)
+    });
+
+    arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
+        self.teams[t]
+            .iter()
+            .zip(self.weights[t].iter())
+            .fold(N00, |p, (player, &w)| p + (player.performance() * w))
+    }));
+
+    let n_diffs = n_teams.saturating_sub(1);
+
+    let mut links: Vec<DiffFactor> = (0..n_diffs)
+        .map(|i| make_link(i, &arena.sort_buf, &mut arena.vars))
+        .collect();
+
+    arena.lhood_lose.resize(n_teams, N_INF);
+    arena.lhood_win.resize(n_teams, N_INF);
+
+    let mut step = (f64::INFINITY, f64::INFINITY);
+    let mut iter = 0;
+
+    while tuple_gt(step, 1e-6) && iter < 10 {
+        step = (0.0_f64, 0.0_f64);
+
+        for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
+            let pw = arena.team_prior[e] * arena.lhood_lose[e];
+            let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
+            let raw = pw - pl;
+            arena.vars.set(lf.diff(), raw * lf.msg());
+            let d = lf.propagate(&mut arena.vars);
+            step = tuple_max(step, d);
+
+            let new_ll = pw - lf.msg();
+            step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
+            arena.lhood_lose[e + 1] = new_ll;
+        }
+
+        for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
+            let e = n_diffs - 1 - rev_i;
+            let pw = arena.team_prior[e] * arena.lhood_lose[e];
+            let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
+            let raw = pw - pl;
+            arena.vars.set(lf.diff(), raw * lf.msg());
+            let d = lf.propagate(&mut arena.vars);
+            step = tuple_max(step, d);
+
+            let new_lw = pl + lf.msg();
+            step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
+            arena.lhood_win[e] = new_lw;
+        }
+
+        iter += 1;
+    }
+
+    if n_diffs == 1 {
+        let raw = (arena.team_prior[0] * arena.lhood_lose[0])
+            - (arena.team_prior[1] * arena.lhood_win[1]);
+        arena.vars.set(links[0].diff(), raw * links[0].msg());
+        links[0].propagate(&mut arena.vars);
+    }
+
+    if n_diffs > 0 {
+        let pl1 = arena.team_prior[1] * arena.lhood_win[1];
+        arena.lhood_win[0] = pl1 + links[0].msg();
+        let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
+        arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
+    }
+
+    let evidence: f64 = links.iter().map(|l| l.evidence()).product();
+
+    arena.inv_buf.resize(n_teams, 0);
+    for (si, &orig_i) in arena.sort_buf.iter().enumerate() {
+        arena.inv_buf[orig_i] = si;
+    }
+
+    let likelihoods = self
+        .teams
+        .iter()
+        .zip(self.weights.iter())
+        .enumerate()
+        .map(|(orig_i, (players, weights))| {
+            let si = arena.inv_buf[orig_i];
+            let m = arena.lhood_win[si] * arena.lhood_lose[si];
+            let performance = players
+                .iter()
+                .zip(weights.iter())
+                .fold(N00, |p, (player, &w)| p + (player.performance() * w));
+            players
+                .iter()
+                .zip(weights.iter())
+                .map(|(player, &w)| {
+                    ((m - performance.exclude(player.performance() * w)) * (1.0 / w))
+                        .forget(player.beta.powi(2))
+                })
+                .collect::<Vec<_>>()
+        })
+        .collect::<Vec<_>>();
+
+    (evidence, likelihoods)
+}
+```
+
+- [ ] **Step 3: Replace `likelihoods` body with a thin caller**
+
+In `src/game.rs`, replace the entire body of `fn likelihoods(&mut self, arena: &mut ScratchArena)` (currently lines 236-371 — replace from the opening `{` to the closing `}` of that method) with:
+
+```rust
+fn likelihoods(&mut self, arena: &mut ScratchArena) {
+    let p_draw = self.p_draw;
+    // Capture pointers to fields the closure reads, to keep borrow scopes tight.
+    // Closure captures &self.result and &self.teams (both immutable) and the
+    // &mut arena passed in via run_chain — disjoint from `&self`.
+    let (evidence, likelihoods) = self.run_chain(arena, |i, sort_buf, vars| {
+        let tie = self.result[sort_buf[i]] == self.result[sort_buf[i + 1]];
+        let margin = if p_draw == 0.0 {
+            0.0
+        } else {
+            let a: f64 = self.teams[sort_buf[i]]
+                .iter()
+                .map(|p| p.beta.powi(2))
+                .sum();
+            let b: f64 = self.teams[sort_buf[i + 1]]
+                .iter()
+                .map(|p| p.beta.powi(2))
+                .sum();
+            compute_margin(p_draw, (a + b).sqrt())
+        };
+        let vid = vars.alloc(N_INF);
+        DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
+    });
+    self.evidence = evidence;
+    self.likelihoods = likelihoods;
+}
+```
+
+(Capturing `p_draw` as a local binding before the closure avoids a `self.p_draw` borrow inside; it's a `Copy` `f64` so this is free.)
+
+- [ ] **Step 4: Replace `likelihoods_scored` body with a thin caller**
+
+In `src/game.rs`, replace the entire body of `fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64)` (currently lines 373-485) with:
+
+```rust
+fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) {
+    let (evidence, likelihoods) = self.run_chain(arena, |i, sort_buf, vars| {
+        // After descending-by-score sort, m_obs >= 0 for every adjacent pair.
+        let m_obs = self.result[sort_buf[i]] - self.result[sort_buf[i + 1]];
+        let vid = vars.alloc(N_INF);
+        DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
+    });
+    self.evidence = evidence;
+    self.likelihoods = likelihoods;
+}
+```
+
+- [ ] **Step 5: Build to confirm it compiles**
+
+Run: `cargo build`
+
+Expected: compiles cleanly. If the borrow checker complains that the closure conflicts with `self.run_chain(...)`, the most likely cause is `run_chain` accidentally being `&mut self` — confirm its signature is `fn run_chain<F>(&self, arena: &mut ScratchArena, mut make_link: F) -> (f64, Vec<Vec<Gaussian>>)`. If that's correct and there's still a conflict, double-check the closure's captures: it should capture `&self.result` and `&self.teams` (immutable), `p_draw: f64` by value (Copy), and `score_sigma: f64` by value (Copy). It must NOT touch `&mut self` in any form.
+
+- [ ] **Step 6: Run the full library test suite — must be all green, same count as Step 1**
+
+Run: `cargo test --lib`
+
+Expected: same number of tests as Step 1, all pass. Bit-equal goldens — every existing assertion (`test_1vs1`, `test_1vs1_draw`, `test_2vs1vs2_mixed`, MarginFactor end-to-end tests, etc.) must pass unchanged. If ANY test fails, the refactor is wrong; revert and re-inspect.
+
+- [ ] **Step 7: Run integration tests too**
+
+Run: `cargo test`
+
+Expected: all integration tests pass (28 noted in commit `8b53cac`).
+
+- [ ] **Step 8: Format and lint**
+
+Run: `cargo +nightly fmt && cargo clippy --lib -- -D warnings`
+
+Expected: no diffs from fmt, no clippy warnings.
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add src/game.rs
+git commit -m "$(cat <<'EOF'
+refactor: dedupe Game::likelihoods and likelihoods_scored via run_chain
+
+Both methods were 95-line near-duplicates differing only in the closure
+that builds the per-diff DiffFactor. Extract the shared body as a
+private run_chain<F>(&self, arena, make_link) helper that returns
+(evidence, likelihoods); the two callers shrink to ~10 lines each.
+
+Pure code-shape change: posteriors and evidence remain bit-equal; all
+existing tests (lib + integration) pass unchanged.
+EOF
+)"
+```
+
+---
+
+### Task 2: Make `BuiltinFactor::log_evidence` match exhaustive
+
+**Files:**
+- Modify: `src/factor/mod.rs:94-100` (the `log_evidence` impl on `BuiltinFactor`)
+
+- [ ] **Step 1: Open `src/factor/mod.rs` and replace the `log_evidence` body**
+
+Replace the existing impl:
+
+```rust
+fn log_evidence(&self, vars: &VarStore) -> f64 {
+    match self {
+        Self::Trunc(f) => f.log_evidence(vars),
+        Self::Margin(f) => f.log_evidence(vars),
+        _ => 0.0,
+    }
+}
+```
+
+with:
+
+```rust
+fn log_evidence(&self, vars: &VarStore) -> f64 {
+    match self {
+        Self::Trunc(f) => f.log_evidence(vars),
+        Self::Margin(f) => f.log_evidence(vars),
+        Self::TeamSum(_) | Self::RankDiff(_) => 0.0,
+    }
+}
+```
+
+- [ ] **Step 2: Build and run tests**
+
+Run: `cargo build && cargo test --lib`
+
+Expected: compiles cleanly, all tests pass. Behavior is unchanged — `TeamSum` and `RankDiff` still return `0.0`, but a future variant will now produce a non-exhaustive-match error instead of being silently swallowed.
+
+- [ ] **Step 3: Format and lint**
+
+Run: `cargo +nightly fmt && cargo clippy --lib -- -D warnings`
+
+Expected: no diffs, no warnings.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add src/factor/mod.rs
+git commit -m "$(cat <<'EOF'
+refactor: make BuiltinFactor::log_evidence match exhaustive
+
+Replace the `_ => 0.0` wildcard with explicit
+`Self::TeamSum(_) | Self::RankDiff(_) => 0.0`. No behavioral change;
+future variants now produce a compile error instead of being silently
+absorbed by the wildcard.
+EOF
+)"
+```
+
+---
+
+### Task 3: Fix stale numerics in T4 plan doc
+
+**Files:**
+- Modify: `docs/superpowers/plans/2026-04-27-t4-margin-factor.md` (lines 52 and 185)
+
+The shipped test in `src/factor/mod.rs:163,166` asserts:
+
+```
+assert!((result.mu() - 4.864864864864865).abs() < 1e-12);
+assert!((logz - (-3.062235327364623)).abs() < 1e-10);
+```
+
+The plan's prose at line 52 quotes pre-shipped values that no longer match. This task fixes the prose and the matching code-comment. The full-precision assertion blocks elsewhere in the plan are out of scope (they belong to the plan-as-written, and the spec's fix table only listed the rounded prose values).
+
+- [ ] **Step 1: Update the prose at line 52**
+
+Open `docs/superpowers/plans/2026-04-27-t4-margin-factor.md`. Find the line:
+
+```
+- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.046827`. So `log_evidence ≈ -3.0613`.
+```
+
+Replace with:
+
+```
+- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.04678`. So `log_evidence ≈ -3.0622`.
+```
+
+- [ ] **Step 2: Update the matching code-comment at line 185**
+
+In the same file, find:
+
+```
+        // pdf(5, 0, sqrt(37)) ≈ 0.046827
+```
+
+Replace with:
+
+```
+        // pdf(5, 0, sqrt(37)) ≈ 0.04678
+```
+
+- [ ] **Step 3: Verify nothing else changed**
+
+Run: `git diff docs/superpowers/plans/2026-04-27-t4-margin-factor.md`
+
+Expected: exactly three lines changed (one prose line containing both numbers, one comment line). Nothing else should be touched.
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add docs/superpowers/plans/2026-04-27-t4-margin-factor.md
+git commit -m "$(cat <<'EOF'
+docs: fix stale numerics in t4-margin-factor plan
+
+The plan's prose quoted Z_cav ≈ 0.046827 and log_evidence ≈ -3.0613,
+which diverged from the values asserted by the shipped test in
+src/factor/mod.rs (-3.062235327364623). Update prose and the matching
+code comment to 0.04678 / -3.0622.
+EOF
+)"
+```
+
+---
+
+## Self-review (writer's note)
+
+Spec coverage:
+- Spec Item 1 (dedupe `likelihoods`/`likelihoods_scored`) → Task 1 ✓
+- Spec Item 2 (exhaustive `BuiltinFactor::log_evidence`) → Task 2 ✓
+- Spec Item 3 (stale numerics in T4 plan) → Task 3 ✓
+- Spec out-of-scope items (`DiffFactor` collapse, per-event `score_sigma`) — correctly absent ✓
+
+Verification gates per the spec ("each item commits independently and ships behind a green `cargo test --lib`"): every task ends in fmt + clippy + tests + commit. Task 1 additionally runs `cargo test` for integration coverage.
+
+Type / signature consistency:
+- `run_chain` signature appears identically in the context header and Step 2 body ✓
+- Closure type `FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor` matches across Step 2 (definition) and Steps 3/4 (call sites) ✓
+- `DiffFactor::Trunc` / `DiffFactor::Margin` constructors match `src/game.rs:20-23` definitions ✓
+
+No placeholders detected.
@@ -0,0 +1,134 @@
+# Tech Debt Cleanup — Post-T4-MarginFactor
+
+## Summary
+
+Three small, independent cleanups left behind by the T4-MarginFactor merge
+(`8b53cac`). All three are pure code-shape or doc fixes. No public-API change,
+no numerics change, no new behavior.
+
+This batch deliberately excludes the `DiffFactor` ↔ `BuiltinFactor` overlap
+collapse (architectural change kept separate) and per-event `score_sigma`
+override (a feature, not debt).
+
+## Scope
+
+### Item 1 — Deduplicate `Game::likelihoods` and `Game::likelihoods_scored`
+
+**Current state.** `src/game.rs:236-371` and `src/game.rs:373-485` are 95-line
+near-duplicates of each other. They differ in exactly one block: the closure
+that maps a diff index to a `DiffFactor`. The ranked path builds
+`DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))` with `margin`/`tie`
+derived from `p_draw` and adjacent-result equality. The scored path builds
+`DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))` with `m_obs`
+the observed score gap. Everything else — sort, `team_prior`, sweep loop,
+boundary updates, evidence product, posterior `likelihoods` — is bit-identical.
+
+**Refactor.** Extract a private helper on `OwnedGame<T, D>`:
+
+```rust
+fn run_chain<F>(
+    &self,
+    arena: &mut ScratchArena,
+    make_link: F,
+) -> (f64, Vec<Vec<Gaussian>>)
+where
+    F: FnMut(usize, &[usize], &mut VarStore) -> DiffFactor,
+```
+
+The closure receives the diff index `i`, the descending-by-result sort
+permutation `&arena.sort_buf`, and `&mut arena.vars` for `alloc(N_INF)`. It
+returns the `DiffFactor` for that diff slot.
+
+The helper takes `&self` (not `&mut self`) and returns
+`(evidence, likelihoods)`. Each caller writes the results back to its own
+`self.evidence` and `self.likelihoods` fields. The `&self` choice matters: the
+closure captures `&self.result` / `&self.teams` / `&self.weights` / `&self.p_draw`
+freely without conflicting with the helper's own immutable borrow.
+
+The two public methods shrink from ~125 lines each to ~10 lines that just
+construct the closure.
+
+**Why a closure (not a trait or two-phase build).** A closure keeps all
+caller-specific state (`p_draw`, `score_sigma`, beta sums for margin) inline at
+the call site. A trait would require a stateful object per call; a two-phase
+build (caller produces the `Vec<DiffFactor>` first, helper does the rest) would
+either re-do the sort or split state ownership awkwardly between phases.
+
+### Item 2 — Make `BuiltinFactor::log_evidence` exhaustive
+
+**Current state.** `src/factor/mod.rs:94-100` uses a `_ => 0.0` wildcard for
+`TeamSum` and `RankDiff`. When a future variant lands (e.g. `SynergyFactor`),
+the wildcard silently absorbs it instead of forcing a deliberate decision.
+
+**Refactor.**
+
+```rust
+fn log_evidence(&self, vars: &VarStore) -> f64 {
+    match self {
+        Self::Trunc(f) => f.log_evidence(vars),
+        Self::Margin(f) => f.log_evidence(vars),
+        Self::TeamSum(_) | Self::RankDiff(_) => 0.0,
+    }
+}
+```
+
+No behavioral change. Future variants now produce a non-exhaustive-match
+compile error.
+
+### Item 3 — Fix stale numerics in T4 plan doc
+
+**Current state.** `docs/superpowers/plans/2026-04-27-t4-margin-factor.md`
+contains two numbers that diverge from the values asserted by the shipped test
+in `src/factor/mod.rs:163,166`.
+
+**Fix.**
+
+| Doc value (wrong) | Implementation value (correct) |
+|---|---|
+| `0.046827` | `0.04678` |
+| `-3.0613` | `-3.0622` |
+
+Pure docs change. Verified by reading the asserted constants in the test.
+
+## Out of scope
+
+- **`DiffFactor` ↔ `BuiltinFactor` overlap.** Both enums list `Trunc` and
+  `Margin` variants. Collapsing into `BuiltinFactor::Diff(DiffFactor)` is
+  defensible but is an architectural change that wants its own design pass.
+  `DiffFactor` represents a real semantic subset (factors that operate on a
+  diff variable in a chain); the duplication is two enum variants, not a
+  large block of code.
+- **Per-event `EventKind::Scored.score_sigma` override.** Today
+  `score_sigma` is history-wide (set on `HistoryBuilder::score_sigma`). A
+  per-event override is a real feature ask, not tech debt.
+
+## Verification
+
+Each item commits independently and ships behind a green `cargo test --lib`
+run. The dedup is a pure code-shape change: posteriors and evidence must be
+**bit-equal** (not ULP-bounded) against the existing 88+28 test goldens.
+
+Per-item gate before committing:
+
+```bash
+cargo +nightly fmt
+cargo clippy
+cargo test --lib
+```
+
+## Commit shape
+
+Three commits, one per item, each independently revertable:
+
+1. `refactor: dedupe Game::likelihoods and likelihoods_scored via run_chain`
+2. `refactor: make BuiltinFactor::log_evidence match exhaustive`
+3. `docs: fix stale numerics in t4-margin-factor plan`
+
+## Risks
+
+- **Borrow-checker friction in Item 1.** The closure captures fields of
+  `&self` while the helper iterates over arena state. Mitigation: helper is
+  `&self` (not `&mut self`); arena passed as `&mut ScratchArena` separately.
+  Disjoint borrows.
+- **Compile error in Item 2 if a new variant ships before this lands.**
+  Trivial follow-on; the whole point is to surface that signal.
@@ -95,7 +95,7 @@ impl Factor for BuiltinFactor {
        match self {
            Self::Trunc(f) => f.log_evidence(vars),
            Self::Margin(f) => f.log_evidence(vars),
-            _ => 0.0,
+            Self::TeamSum(_) | Self::RankDiff(_) => 0.0,
        }
    }
 }
@@ -233,12 +233,14 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
        this
    }

-    fn likelihoods(&mut self, arena: &mut ScratchArena) {
+    fn run_chain<F>(&self, arena: &mut ScratchArena, mut make_link: F) -> (f64, Vec<Vec<Gaussian>>)
+    where
+        F: FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor,
+    {
        arena.reset();

        let n_teams = self.teams.len();

-        // Sort teams by result descending; reuse arena.sort_buf to avoid allocation.
        arena.sort_buf.extend(0..n_teams);
        arena.sort_buf.sort_by(|&i, &j| {
            self.result[j]
@@ -246,7 +248,6 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
                .unwrap_or(Ordering::Equal)
        });

-        // Team performance priors written into arena buffer (capacity reused across games).
        arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
            self.teams[t]
                .iter()
@@ -256,30 +257,10 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {

        let n_diffs = n_teams.saturating_sub(1);

-        // One DiffFactor per adjacent sorted-team pair; each owns a diff VarId.
-        // links stays local (fresh state per game; Vec capacity is typically small).
        let mut links: Vec<DiffFactor> = (0..n_diffs)
-            .map(|i| {
-                let tie = self.result[arena.sort_buf[i]] == self.result[arena.sort_buf[i + 1]];
-                let margin = if self.p_draw == 0.0 {
-                    0.0
-                } else {
-                    let a: f64 = self.teams[arena.sort_buf[i]]
-                        .iter()
-                        .map(|p| p.beta.powi(2))
-                        .sum();
-                    let b: f64 = self.teams[arena.sort_buf[i + 1]]
-                        .iter()
-                        .map(|p| p.beta.powi(2))
-                        .sum();
-                    compute_margin(self.p_draw, (a + b).sqrt())
-                };
-                let vid = arena.vars.alloc(N_INF);
-                DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
-            })
+            .map(|i| make_link(i, &arena.sort_buf, &mut arena.vars))
            .collect();

-        // Per-team messages from neighbouring RankDiff factors (replaces TeamMessage).
        arena.lhood_lose.resize(n_teams, N_INF);
        arena.lhood_win.resize(n_teams, N_INF);

@@ -289,7 +270,6 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
        while tuple_gt(step, 1e-6) && iter < 10 {
            step = (0.0_f64, 0.0_f64);

-            // Forward sweep: diffs 0 .. n_diffs-2 (all but the last).
            for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
@@ -303,7 +283,6 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
                arena.lhood_lose[e + 1] = new_ll;
            }

-            // Backward sweep: diffs n_diffs-1 .. 1 (reverse, all but the first).
            for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
                let e = n_diffs - 1 - rev_i;
                let pw = arena.team_prior[e] * arena.lhood_lose[e];
@@ -337,8 +316,7 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
            arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
        }

-        // Evidence = product of per-diff evidences (each cached on first propagation).
-        self.evidence = links.iter().map(|l| l.evidence()).product();
+        let evidence: f64 = links.iter().map(|l| l.evidence()).product();

        // Inverse permutation: inv_buf[orig_i] = sorted_i.
        arena.inv_buf.resize(n_teams, 0);
@@ -346,7 +324,7 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
            arena.inv_buf[orig_i] = si;
        }

-        self.likelihoods = self
+        let likelihoods = self
            .teams
            .iter()
            .zip(self.weights.iter())
@@ -368,120 +346,38 @@ impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> {
                    .collect::<Vec<_>>()
            })
            .collect::<Vec<_>>();
+
+        (evidence, likelihoods)
+    }
+
+    fn likelihoods(&mut self, arena: &mut ScratchArena) {
+        let (evidence, likelihoods) = self.run_chain(arena, |i, sort_buf, vars| {
+            let tie = self.result[sort_buf[i]] == self.result[sort_buf[i + 1]];
+            let margin = if self.p_draw == 0.0 {
+                0.0
+            } else {
+                let a: f64 = self.teams[sort_buf[i]].iter().map(|p| p.beta.powi(2)).sum();
+                let b: f64 = self.teams[sort_buf[i + 1]]
+                    .iter()
+                    .map(|p| p.beta.powi(2))
+                    .sum();
+                compute_margin(self.p_draw, (a + b).sqrt())
+            };
+            let vid = vars.alloc(N_INF);
+            DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
+        });
+        self.evidence = evidence;
+        self.likelihoods = likelihoods;
    }

    fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) {
-        arena.reset();
-
-        let n_teams = self.teams.len();
-
-        arena.sort_buf.extend(0..n_teams);
-        arena.sort_buf.sort_by(|&i, &j| {
-            self.result[j]
-                .partial_cmp(&self.result[i])
-                .unwrap_or(Ordering::Equal)
+        let (evidence, likelihoods) = self.run_chain(arena, |i, sort_buf, vars| {
+            let m_obs = self.result[sort_buf[i]] - self.result[sort_buf[i + 1]];
+            let vid = vars.alloc(N_INF);
+            DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
        });
-
-        arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
-            self.teams[t]
-                .iter()
-                .zip(self.weights[t].iter())
-                .fold(N00, |p, (player, &w)| p + (player.performance() * w))
-        }));
-
-        let n_diffs = n_teams.saturating_sub(1);
-
-        let mut links: Vec<DiffFactor> = (0..n_diffs)
-            .map(|i| {
-                // After descending-by-score sort, m_obs >= 0 for every adjacent pair.
-                let m_obs = self.result[arena.sort_buf[i]] - self.result[arena.sort_buf[i + 1]];
-                let vid = arena.vars.alloc(N_INF);
-                DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
-            })
-            .collect();
-
-        arena.lhood_lose.resize(n_teams, N_INF);
-        arena.lhood_win.resize(n_teams, N_INF);
-
-        let mut step = (f64::INFINITY, f64::INFINITY);
-        let mut iter = 0;
-
-        while tuple_gt(step, 1e-6) && iter < 10 {
-            step = (0.0_f64, 0.0_f64);
-
-            for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
-                let pw = arena.team_prior[e] * arena.lhood_lose[e];
-                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
-                let raw = pw - pl;
-                arena.vars.set(lf.diff(), raw * lf.msg());
-                let d = lf.propagate(&mut arena.vars);
-                step = tuple_max(step, d);
-
-                let new_ll = pw - lf.msg();
-                step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
-                arena.lhood_lose[e + 1] = new_ll;
-            }
-
-            for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
-                let e = n_diffs - 1 - rev_i;
-                let pw = arena.team_prior[e] * arena.lhood_lose[e];
-                let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
-                let raw = pw - pl;
-                arena.vars.set(lf.diff(), raw * lf.msg());
-                let d = lf.propagate(&mut arena.vars);
-                step = tuple_max(step, d);
-
-                let new_lw = pl + lf.msg();
-                step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
-                arena.lhood_win[e] = new_lw;
-            }
-
-            iter += 1;
-        }
-
-        if n_diffs == 1 {
-            let raw = (arena.team_prior[0] * arena.lhood_lose[0])
-                - (arena.team_prior[1] * arena.lhood_win[1]);
-            arena.vars.set(links[0].diff(), raw * links[0].msg());
-            links[0].propagate(&mut arena.vars);
-        }
-
-        if n_diffs > 0 {
-            let pl1 = arena.team_prior[1] * arena.lhood_win[1];
-            arena.lhood_win[0] = pl1 + links[0].msg();
-            let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
-            arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
-        }
-
-        self.evidence = links.iter().map(|l| l.evidence()).product();
-
-        arena.inv_buf.resize(n_teams, 0);
-        for (si, &orig_i) in arena.sort_buf.iter().enumerate() {
-            arena.inv_buf[orig_i] = si;
-        }
-
-        self.likelihoods = self
-            .teams
-            .iter()
-            .zip(self.weights.iter())
-            .enumerate()
-            .map(|(orig_i, (players, weights))| {
-                let si = arena.inv_buf[orig_i];
-                let m = arena.lhood_win[si] * arena.lhood_lose[si];
-                let performance = players
-                    .iter()
-                    .zip(weights.iter())
-                    .fold(N00, |p, (player, &w)| p + (player.performance() * w));
-                players
-                    .iter()
-                    .zip(weights.iter())
-                    .map(|(player, &w)| {
-                        ((m - performance.exclude(player.performance() * w)) * (1.0 / w))
-                            .forget(player.beta.powi(2))
-                    })
-                    .collect::<Vec<_>>()
-            })
-            .collect::<Vec<_>>();
+        self.evidence = evidence;
+        self.likelihoods = likelihoods;
    }

    pub fn posteriors(&self) -> Vec<Vec<Gaussian>> {
Author	SHA1	Message	Date
logaritmisk	1445c08896	docs: fix stale numerics in t4-margin-factor plan The plan's prose quoted Z_cav ≈ 0.046827 and log_evidence ≈ -3.0613, which diverged from the values asserted by the shipped test in src/factor/mod.rs (-3.062235327364623). Update prose and the matching code comment to 0.04678 / -3.0622.	2026-05-08 14:37:58 +02:00
logaritmisk	f6a83e4dc6	refactor: make BuiltinFactor::log_evidence match exhaustive Replace the `_ => 0.0` wildcard with explicit `Self::TeamSum(_) \| Self::RankDiff(_) => 0.0`. No behavioral change; future variants now produce a compile error instead of being silently absorbed by the wildcard.	2026-05-08 14:37:13 +02:00
logaritmisk	68b589b965	refactor: dedupe Game::likelihoods and likelihoods_scored via run_chain Both methods were 95-line near-duplicates differing only in the closure that builds the per-diff DiffFactor. Extract the shared body as a private run_chain<F>(&self, arena, make_link) helper that returns (evidence, likelihoods); the two callers shrink to ~10 lines each. Pure code-shape change: posteriors and evidence remain bit-equal; all existing tests (lib + integration) pass unchanged.	2026-05-08 14:36:35 +02:00
logaritmisk	7481c31ad8	docs: implementation plan for post-T4-MarginFactor tech debt cleanup Three-task plan covering the run_chain dedup, exhaustive BuiltinFactor log_evidence match, and stale-numerics fix in the T4 plan doc.	2026-05-08 14:28:10 +02:00
logaritmisk	a69a3004b2	docs: spec for post-T4-MarginFactor tech debt cleanup Three independent cleanups: dedupe Game::likelihoods and likelihoods_scored via a run_chain helper taking a make_link closure, make BuiltinFactor's log_evidence match exhaustive, and fix stale numerics in the T4 plan doc.	2026-05-08 14:24:48 +02:00