Files

T

logaritmisk 7481c31ad8 docs: implementation plan for post-T4-MarginFactor tech debt cleanup

Three-task plan covering the run_chain dedup, exhaustive BuiltinFactor
log_evidence match, and stale-numerics fix in the T4 plan doc.

2026-05-08 14:28:10 +02:00

18 KiB

Raw Permalink Blame History

Tech Debt Cleanup Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Land three independent post-T4-MarginFactor cleanups: dedupe Game::likelihoods and Game::likelihoods_scored via a run_chain helper, make BuiltinFactor::log_evidence exhaustive, and fix stale numerics in the T4 plan doc.

Architecture: Pure code-shape and doc fixes. No public-API change, no behavioral change, no new dependencies. The dedup is a pure refactor — bit-equal posteriors and evidence against existing test goldens. The exhaustive match is a future-proofing change with no runtime effect. The doc fix is two number swaps in prose plus one matching code-comment swap.

Tech Stack: Rust 2024, cargo +nightly fmt, cargo clippy, cargo test --lib.

Spec reference

docs/superpowers/specs/2026-05-08-tech-debt-cleanup-design.md

File map

File	Why touched
`src/game.rs`	Add `run_chain` helper; rewrite `likelihoods` and `likelihoods_scored` to call it
`src/factor/mod.rs`	Make `BuiltinFactor::log_evidence` match exhaustive
`docs/superpowers/plans/2026-04-27-t4-margin-factor.md`	Fix two stale prose numbers and one matching code comment

Task 1: Extract `run_chain` helper, dedupe both likelihoods methods

Files:

Modify: src/game.rs:236-485 (replace both likelihoods and likelihoods_scored with one helper + two thin callers)

Context for the implementer (read this before touching anything):

OwnedGame<T, D> (defined at src/game.rs:83-92) holds teams, result, weights, p_draw, plus mutable output fields likelihoods: Vec<Vec<Gaussian>> and evidence: f64. Two private methods on Game<'a, T, D> (the borrowed sibling at src/game.rs:148-156) compute likelihoods:

likelihoods(&mut self, arena: &mut ScratchArena) — ranked outcomes; src/game.rs:236-371
likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) — scored outcomes; src/game.rs:373-485

The two are bit-identical except for the closure that builds the per-diff DiffFactor (defined at src/game.rs:20-54). DiffFactor has two variants: Trunc(TruncFactor) for ranked, Margin(MarginFactor) for scored.

The shared body does, in order: arena.reset(), sort teams descending by result into arena.sort_buf, fill arena.team_prior, build links: Vec<DiffFactor> (the differing block), resize arena.lhood_lose / arena.lhood_win to N_INF, run a forward+backward sweep with a max-iter-10 fixed-point loop guarded by tuple_gt(step, 1e-6), handle the n_diffs == 1 special case, do boundary updates, multiply per-diff evidence() into self.evidence, build the inverse permutation in arena.inv_buf, then build self.likelihoods from the per-team lhood_win * lhood_lose and per-player performance().exclude(...).forget(beta²) math.

Refactor target:

fn run_chain<F>(
    &self,
    arena: &mut ScratchArena,
    mut make_link: F,
) -> (f64, Vec<Vec<Gaussian>>)
where
    F: FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor,
{ /* the entire shared body, returning (evidence, likelihoods) */ }

Helper takes &self (not &mut self) so the closure can capture &self.result, &self.teams, &self.weights, &self.p_draw without conflicting with the helper's own immutable borrow. The arena is borrowed &mut independently.

The closure is invoked once per diff index i ∈ 0..n_diffs, after arena.sort_buf is filled. It receives i, &arena.sort_buf[..], and &mut arena.vars so it can alloc(N_INF) the diff VarId. It returns the constructed DiffFactor.

The two callers shrink to:

fn likelihoods(&mut self, arena: &mut ScratchArena) {
    let p_draw = self.p_draw;
    let result = &self.result;
    let teams = &self.teams;
    let (evidence, likelihoods) = Self::dummy_to_satisfy_borrowck(/* see below */);
    // ... assigns self.evidence and self.likelihoods
}

Wait — actually borrow-checker note: calling self.run_chain(arena, |i, sort_buf, vars| { use_self_fields }) from a &mut self method is fine because run_chain takes &self and the closure captures &self immutably. Both share an immutable reborrow of *self. The arena is a separate &mut borrow. Verify the implementer doesn't accidentally make run_chain take &mut self.

Why a closure (not a trait, not a two-phase build). A closure keeps caller-specific state (p_draw, score_sigma, beta sums) inline at the call site with zero ceremony. A trait would require a stateful builder per call. A two-phase build (caller produces Vec<DiffFactor> first, helper does the rest) would either re-do the sort or split arena ownership awkwardly between the phases.

Step 1: Run the existing test suite to capture the baseline

Run: cargo test --lib

Expected: all tests pass. Note the count (should be 88+ lib tests) — the refactor must keep this number unchanged with all green.

Step 2: Open src/game.rs and add the run_chain helper

Inside impl<'a, T: Time, D: Drift<T>> Game<'a, T, D> { ... } (the block starting at src/game.rs:158), add run_chain immediately above the existing likelihoods method (so above line 236). Use exactly this body — it is the merge of the two existing methods with the differing block replaced by the closure call:

fn run_chain<F>(
    &self,
    arena: &mut ScratchArena,
    mut make_link: F,
) -> (f64, Vec<Vec<Gaussian>>)
where
    F: FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor,
{
    arena.reset();

    let n_teams = self.teams.len();

    arena.sort_buf.extend(0..n_teams);
    arena.sort_buf.sort_by(|&i, &j| {
        self.result[j]
            .partial_cmp(&self.result[i])
            .unwrap_or(Ordering::Equal)
    });

    arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
        self.teams[t]
            .iter()
            .zip(self.weights[t].iter())
            .fold(N00, |p, (player, &w)| p + (player.performance() * w))
    }));

    let n_diffs = n_teams.saturating_sub(1);

    let mut links: Vec<DiffFactor> = (0..n_diffs)
        .map(|i| make_link(i, &arena.sort_buf, &mut arena.vars))
        .collect();

    arena.lhood_lose.resize(n_teams, N_INF);
    arena.lhood_win.resize(n_teams, N_INF);

    let mut step = (f64::INFINITY, f64::INFINITY);
    let mut iter = 0;

    while tuple_gt(step, 1e-6) && iter < 10 {
        step = (0.0_f64, 0.0_f64);

        for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
            let pw = arena.team_prior[e] * arena.lhood_lose[e];
            let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
            let raw = pw - pl;
            arena.vars.set(lf.diff(), raw * lf.msg());
            let d = lf.propagate(&mut arena.vars);
            step = tuple_max(step, d);

            let new_ll = pw - lf.msg();
            step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
            arena.lhood_lose[e + 1] = new_ll;
        }

        for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
            let e = n_diffs - 1 - rev_i;
            let pw = arena.team_prior[e] * arena.lhood_lose[e];
            let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
            let raw = pw - pl;
            arena.vars.set(lf.diff(), raw * lf.msg());
            let d = lf.propagate(&mut arena.vars);
            step = tuple_max(step, d);

            let new_lw = pl + lf.msg();
            step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
            arena.lhood_win[e] = new_lw;
        }

        iter += 1;
    }

    if n_diffs == 1 {
        let raw = (arena.team_prior[0] * arena.lhood_lose[0])
            - (arena.team_prior[1] * arena.lhood_win[1]);
        arena.vars.set(links[0].diff(), raw * links[0].msg());
        links[0].propagate(&mut arena.vars);
    }

    if n_diffs > 0 {
        let pl1 = arena.team_prior[1] * arena.lhood_win[1];
        arena.lhood_win[0] = pl1 + links[0].msg();
        let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
        arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
    }

    let evidence: f64 = links.iter().map(|l| l.evidence()).product();

    arena.inv_buf.resize(n_teams, 0);
    for (si, &orig_i) in arena.sort_buf.iter().enumerate() {
        arena.inv_buf[orig_i] = si;
    }

    let likelihoods = self
        .teams
        .iter()
        .zip(self.weights.iter())
        .enumerate()
        .map(|(orig_i, (players, weights))| {
            let si = arena.inv_buf[orig_i];
            let m = arena.lhood_win[si] * arena.lhood_lose[si];
            let performance = players
                .iter()
                .zip(weights.iter())
                .fold(N00, |p, (player, &w)| p + (player.performance() * w));
            players
                .iter()
                .zip(weights.iter())
                .map(|(player, &w)| {
                    ((m - performance.exclude(player.performance() * w)) * (1.0 / w))
                        .forget(player.beta.powi(2))
                })
                .collect::<Vec<_>>()
        })
        .collect::<Vec<_>>();

    (evidence, likelihoods)
}

Step 3: Replace likelihoods body with a thin caller

In src/game.rs, replace the entire body of fn likelihoods(&mut self, arena: &mut ScratchArena) (currently lines 236-371 — replace from the opening { to the closing } of that method) with:

fn likelihoods(&mut self, arena: &mut ScratchArena) {
    let p_draw = self.p_draw;
    // Capture pointers to fields the closure reads, to keep borrow scopes tight.
    // Closure captures &self.result and &self.teams (both immutable) and the
    // &mut arena passed in via run_chain — disjoint from `&self`.
    let (evidence, likelihoods) = self.run_chain(arena, |i, sort_buf, vars| {
        let tie = self.result[sort_buf[i]] == self.result[sort_buf[i + 1]];
        let margin = if p_draw == 0.0 {
            0.0
        } else {
            let a: f64 = self.teams[sort_buf[i]]
                .iter()
                .map(|p| p.beta.powi(2))
                .sum();
            let b: f64 = self.teams[sort_buf[i + 1]]
                .iter()
                .map(|p| p.beta.powi(2))
                .sum();
            compute_margin(p_draw, (a + b).sqrt())
        };
        let vid = vars.alloc(N_INF);
        DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
    });
    self.evidence = evidence;
    self.likelihoods = likelihoods;
}

(Capturing p_draw as a local binding before the closure avoids a self.p_draw borrow inside; it's a Copy f64 so this is free.)

Step 4: Replace likelihoods_scored body with a thin caller

In src/game.rs, replace the entire body of fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) (currently lines 373-485) with:

fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) {
    let (evidence, likelihoods) = self.run_chain(arena, |i, sort_buf, vars| {
        // After descending-by-score sort, m_obs >= 0 for every adjacent pair.
        let m_obs = self.result[sort_buf[i]] - self.result[sort_buf[i + 1]];
        let vid = vars.alloc(N_INF);
        DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
    });
    self.evidence = evidence;
    self.likelihoods = likelihoods;
}

Step 5: Build to confirm it compiles

Run: cargo build

Expected: compiles cleanly. If the borrow checker complains that the closure conflicts with self.run_chain(...), the most likely cause is run_chain accidentally being &mut self — confirm its signature is fn run_chain<F>(&self, arena: &mut ScratchArena, mut make_link: F) -> (f64, Vec<Vec<Gaussian>>). If that's correct and there's still a conflict, double-check the closure's captures: it should capture &self.result and &self.teams (immutable), p_draw: f64 by value (Copy), and score_sigma: f64 by value (Copy). It must NOT touch &mut self in any form.

Step 6: Run the full library test suite — must be all green, same count as Step 1

Run: cargo test --lib

Expected: same number of tests as Step 1, all pass. Bit-equal goldens — every existing assertion (test_1vs1, test_1vs1_draw, test_2vs1vs2_mixed, MarginFactor end-to-end tests, etc.) must pass unchanged. If ANY test fails, the refactor is wrong; revert and re-inspect.

Step 7: Run integration tests too

Run: cargo test

Expected: all integration tests pass (28 noted in commit 8b53cac).

Step 8: Format and lint

Run: cargo +nightly fmt && cargo clippy --lib -- -D warnings

Expected: no diffs from fmt, no clippy warnings.

Step 9: Commit

git add src/game.rs
git commit -m "$(cat <<'EOF'
refactor: dedupe Game::likelihoods and likelihoods_scored via run_chain

Both methods were 95-line near-duplicates differing only in the closure
that builds the per-diff DiffFactor. Extract the shared body as a
private run_chain<F>(&self, arena, make_link) helper that returns
(evidence, likelihoods); the two callers shrink to ~10 lines each.

Pure code-shape change: posteriors and evidence remain bit-equal; all
existing tests (lib + integration) pass unchanged.
EOF
)"

Task 2: Make `BuiltinFactor::log_evidence` match exhaustive

Files:

Modify: src/factor/mod.rs:94-100 (the log_evidence impl on BuiltinFactor)
Step 1: Open src/factor/mod.rs and replace the log_evidence body

Replace the existing impl:

fn log_evidence(&self, vars: &VarStore) -> f64 {
    match self {
        Self::Trunc(f) => f.log_evidence(vars),
        Self::Margin(f) => f.log_evidence(vars),
        _ => 0.0,
    }
}

with:

fn log_evidence(&self, vars: &VarStore) -> f64 {
    match self {
        Self::Trunc(f) => f.log_evidence(vars),
        Self::Margin(f) => f.log_evidence(vars),
        Self::TeamSum(_) | Self::RankDiff(_) => 0.0,
    }
}

Step 2: Build and run tests

Run: cargo build && cargo test --lib

Expected: compiles cleanly, all tests pass. Behavior is unchanged — TeamSum and RankDiff still return 0.0, but a future variant will now produce a non-exhaustive-match error instead of being silently swallowed.

Step 3: Format and lint

Run: cargo +nightly fmt && cargo clippy --lib -- -D warnings

Expected: no diffs, no warnings.

Step 4: Commit

git add src/factor/mod.rs
git commit -m "$(cat <<'EOF'
refactor: make BuiltinFactor::log_evidence match exhaustive

Replace the `_ => 0.0` wildcard with explicit
`Self::TeamSum(_) | Self::RankDiff(_) => 0.0`. No behavioral change;
future variants now produce a compile error instead of being silently
absorbed by the wildcard.
EOF
)"

Task 3: Fix stale numerics in T4 plan doc

Files:

Modify: docs/superpowers/plans/2026-04-27-t4-margin-factor.md (lines 52 and 185)

The shipped test in src/factor/mod.rs:163,166 asserts:

assert!((result.mu() - 4.864864864864865).abs() < 1e-12);
assert!((logz - (-3.062235327364623)).abs() < 1e-10);

The plan's prose at line 52 quotes pre-shipped values that no longer match. This task fixes the prose and the matching code-comment. The full-precision assertion blocks elsewhere in the plan are out of scope (they belong to the plan-as-written, and the spec's fix table only listed the rounded prose values).

Step 1: Update the prose at line 52

Open docs/superpowers/plans/2026-04-27-t4-margin-factor.md. Find the line:

- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.046827`. So `log_evidence ≈ -3.0613`.

Replace with:

- `Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) ≈ 0.04678`. So `log_evidence ≈ -3.0622`.

Step 2: Update the matching code-comment at line 185

In the same file, find:

        // pdf(5, 0, sqrt(37)) ≈ 0.046827

Replace with:

        // pdf(5, 0, sqrt(37)) ≈ 0.04678

Step 3: Verify nothing else changed

Run: git diff docs/superpowers/plans/2026-04-27-t4-margin-factor.md

Expected: exactly three lines changed (one prose line containing both numbers, one comment line). Nothing else should be touched.

Step 4: Commit

git add docs/superpowers/plans/2026-04-27-t4-margin-factor.md
git commit -m "$(cat <<'EOF'
docs: fix stale numerics in t4-margin-factor plan

The plan's prose quoted Z_cav ≈ 0.046827 and log_evidence ≈ -3.0613,
which diverged from the values asserted by the shipped test in
src/factor/mod.rs (-3.062235327364623). Update prose and the matching
code comment to 0.04678 / -3.0622.
EOF
)"

Self-review (writer's note)

Spec coverage:

Spec Item 1 (dedupe likelihoods/likelihoods_scored) → Task 1 ✓
Spec Item 2 (exhaustive BuiltinFactor::log_evidence) → Task 2 ✓
Spec Item 3 (stale numerics in T4 plan) → Task 3 ✓
Spec out-of-scope items (DiffFactor collapse, per-event score_sigma) — correctly absent ✓

Verification gates per the spec ("each item commits independently and ships behind a green cargo test --lib"): every task ends in fmt + clippy + tests + commit. Task 1 additionally runs cargo test for integration coverage.

Type / signature consistency:

run_chain signature appears identically in the context header and Step 2 body ✓
Closure type FnMut(usize, &[usize], &mut crate::factor::VarStore) -> DiffFactor matches across Step 2 (definition) and Steps 3/4 (call sites) ✓
DiffFactor::Trunc / DiffFactor::Margin constructors match src/game.rs:20-23 definitions ✓

No placeholders detected.

18 KiB Raw Permalink Blame History

Tech Debt Cleanup Implementation Plan

Spec reference

File map

Task 1: Extract run_chain helper, dedupe both likelihoods methods

Task 2: Make BuiltinFactor::log_evidence match exhaustive

Task 3: Fix stale numerics in T4 plan doc

Self-review (writer's note)

18 KiB

Raw Permalink Blame History

Task 1: Extract `run_chain` helper, dedupe both likelihoods methods

Task 2: Make `BuiltinFactor::log_evidence` match exhaustive