docs: spec for post-T4-MarginFactor tech debt cleanup

Three independent cleanups: dedupe Game::likelihoods and likelihoods_scored
via a run_chain helper taking a make_link closure, make BuiltinFactor's
log_evidence match exhaustive, and fix stale numerics in the T4 plan doc.
This commit is contained in:
2026-05-08 14:24:48 +02:00
parent dbaad0e7d2
commit a69a3004b2
@@ -0,0 +1,134 @@
# Tech Debt Cleanup — Post-T4-MarginFactor
## Summary
Three small, independent cleanups left behind by the T4-MarginFactor merge
(`8b53cac`). All three are pure code-shape or doc fixes. No public-API change,
no numerics change, no new behavior.
This batch deliberately excludes the `DiffFactor``BuiltinFactor` overlap
collapse (architectural change kept separate) and per-event `score_sigma`
override (a feature, not debt).
## Scope
### Item 1 — Deduplicate `Game::likelihoods` and `Game::likelihoods_scored`
**Current state.** `src/game.rs:236-371` and `src/game.rs:373-485` are 95-line
near-duplicates of each other. They differ in exactly one block: the closure
that maps a diff index to a `DiffFactor`. The ranked path builds
`DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))` with `margin`/`tie`
derived from `p_draw` and adjacent-result equality. The scored path builds
`DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))` with `m_obs`
the observed score gap. Everything else — sort, `team_prior`, sweep loop,
boundary updates, evidence product, posterior `likelihoods` — is bit-identical.
**Refactor.** Extract a private helper on `OwnedGame<T, D>`:
```rust
fn run_chain<F>(
&self,
arena: &mut ScratchArena,
make_link: F,
) -> (f64, Vec<Vec<Gaussian>>)
where
F: FnMut(usize, &[usize], &mut VarStore) -> DiffFactor,
```
The closure receives the diff index `i`, the descending-by-result sort
permutation `&arena.sort_buf`, and `&mut arena.vars` for `alloc(N_INF)`. It
returns the `DiffFactor` for that diff slot.
The helper takes `&self` (not `&mut self`) and returns
`(evidence, likelihoods)`. Each caller writes the results back to its own
`self.evidence` and `self.likelihoods` fields. The `&self` choice matters: the
closure captures `&self.result` / `&self.teams` / `&self.weights` / `&self.p_draw`
freely without conflicting with the helper's own immutable borrow.
The two public methods shrink from ~125 lines each to ~10 lines that just
construct the closure.
**Why a closure (not a trait or two-phase build).** A closure keeps all
caller-specific state (`p_draw`, `score_sigma`, beta sums for margin) inline at
the call site. A trait would require a stateful object per call; a two-phase
build (caller produces the `Vec<DiffFactor>` first, helper does the rest) would
either re-do the sort or split state ownership awkwardly between phases.
### Item 2 — Make `BuiltinFactor::log_evidence` exhaustive
**Current state.** `src/factor/mod.rs:94-100` uses a `_ => 0.0` wildcard for
`TeamSum` and `RankDiff`. When a future variant lands (e.g. `SynergyFactor`),
the wildcard silently absorbs it instead of forcing a deliberate decision.
**Refactor.**
```rust
fn log_evidence(&self, vars: &VarStore) -> f64 {
match self {
Self::Trunc(f) => f.log_evidence(vars),
Self::Margin(f) => f.log_evidence(vars),
Self::TeamSum(_) | Self::RankDiff(_) => 0.0,
}
}
```
No behavioral change. Future variants now produce a non-exhaustive-match
compile error.
### Item 3 — Fix stale numerics in T4 plan doc
**Current state.** `docs/superpowers/plans/2026-04-27-t4-margin-factor.md`
contains two numbers that diverge from the values asserted by the shipped test
in `src/factor/mod.rs:163,166`.
**Fix.**
| Doc value (wrong) | Implementation value (correct) |
|---|---|
| `0.046827` | `0.04678` |
| `-3.0613` | `-3.0622` |
Pure docs change. Verified by reading the asserted constants in the test.
## Out of scope
- **`DiffFactor``BuiltinFactor` overlap.** Both enums list `Trunc` and
`Margin` variants. Collapsing into `BuiltinFactor::Diff(DiffFactor)` is
defensible but is an architectural change that wants its own design pass.
`DiffFactor` represents a real semantic subset (factors that operate on a
diff variable in a chain); the duplication is two enum variants, not a
large block of code.
- **Per-event `EventKind::Scored.score_sigma` override.** Today
`score_sigma` is history-wide (set on `HistoryBuilder::score_sigma`). A
per-event override is a real feature ask, not tech debt.
## Verification
Each item commits independently and ships behind a green `cargo test --lib`
run. The dedup is a pure code-shape change: posteriors and evidence must be
**bit-equal** (not ULP-bounded) against the existing 88+28 test goldens.
Per-item gate before committing:
```bash
cargo +nightly fmt
cargo clippy
cargo test --lib
```
## Commit shape
Three commits, one per item, each independently revertable:
1. `refactor: dedupe Game::likelihoods and likelihoods_scored via run_chain`
2. `refactor: make BuiltinFactor::log_evidence match exhaustive`
3. `docs: fix stale numerics in t4-margin-factor plan`
## Risks
- **Borrow-checker friction in Item 1.** The closure captures fields of
`&self` while the helper iterates over arena state. Mitigation: helper is
`&self` (not `&mut self`); arena passed as `&mut ScratchArena` separately.
Disjoint borrows.
- **Compile error in Item 2 if a new variant ships before this lands.**
Trivial follow-on; the whole point is to surface that signal.