trueskill-tt/docs/superpowers/plans/2026-04-23-t0-numerical-parity.md
Anders Olsson d11d2e8c6b docs: add T0 numerical-parity implementation plan
Bite-sized, TDD-style task breakdown for the first tier of the engine
redesign: Gaussian to natural-parameter storage, dense Vec storage
replacing HashMap, ScratchArena to eliminate per-event allocs,
Result-ifying the lone panic. No top-level public API change.

Acceptance gate: ≥3x speedup on Batch::iteration vs. baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:43:27 +02:00


T0 — Numerical Parity Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Internally rebuild the engine's data plane for performance — switch Gaussian to natural-parameter storage, replace HashMap<Index, _> with dense Vec<_> storage, eliminate per-event allocations via a reusable ScratchArena, and convert the lone panic! in mu_sigma into a propagated Result. No top-level public API change. Existing test suite must still pass.

Architecture: This is the first tier of a five-tier engine redesign documented in docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md. T0 is invisible to library users — same constructors, same methods, same outputs (within ULP-bounded floating-point drift) — but the internals are the new layout the rest of the redesign builds on.

Tech Stack: Rust 2024 edition, criterion for benchmarks, approx for floating-point comparisons, the existing trueskill-tt crate.

Acceptance criteria

  • All existing tests pass (cargo test --features approx). Where natural-parameter rounding shifts a hardcoded golden value, update the golden and add a parity comment explaining the ULP-bounded drift.
  • cargo bench --bench batch shows ≥3× speedup on Batch::iteration vs. baseline.
  • cargo bench --bench gaussian compiles (it currently doesn't — see Task 1, Step 1).
  • cargo clippy --all-targets clean.
  • cargo fmt --check clean.
  • No change to the top-level public API (History, HistoryBuilder, Game::new, Game::posteriors, Player::new, quality(), Gaussian::from_ms, etc.). Internal pub mod types may change signatures (Batch::add_events, Agent) — these are exposed in lib.rs but were never user-facing, and the in-tree benches are the only consumers besides the engine itself.

File map

Created:

  • src/storage/mod.rs — module root for the dense storage types
  • src/storage/skill_store.rs — SkillStore: dense Vec-backed replacement for HashMap<Index, Skill>
  • src/storage/agent_store.rs — AgentStore<D>: dense Vec-backed replacement for HashMap<Index, Agent<D>>
  • src/arena.rs — ScratchArena: reusable scratch buffers for Game::likelihoods
  • benches/baseline.txt — captured baseline numbers for the acceptance gate (committed for reference)

Modified:

  • src/gaussian.rs — switch storage to (pi, tau); mu()/sigma() become public accessors; ops rewritten
  • src/lib.rs — module declarations for storage and arena; constants N00/N01/N_INF keep their public values but are constructed from natural params; mu_sigma becomes Result-returning
  • src/approx.rs — AbsDiffEq/RelativeEq/UlpsEq impls compare via mu()/sigma() accessors instead of fields
  • src/agent.rs — clean() takes &mut AgentStore<D> (changed from generic iterator); same observable behavior
  • src/batch.rs — Skill map becomes SkillStore; add_events/iteration/etc. take &AgentStore<D> instead of &HashMap<Index, Agent<D>>; Batch owns a ScratchArena
  • src/history.rs — agents: HashMap<…> becomes agents: AgentStore<D>; internal call sites updated
  • src/game.rs — Game::new takes &mut ScratchArena; per-event Vec allocations replaced by arena slices
  • benches/gaussian.rs — uses public pi()/tau() accessors (was broken)
  • benches/batch.rs — passes &AgentStore<D> to Batch::add_events (built once before the bench loop)

Touched (test-only golden updates — see Task 2, Step 6):

  • src/gaussian.rs (test module)
  • src/game.rs (test module)
  • src/batch.rs (test module)
  • src/history.rs (test module if affected)
  • src/lib.rs (test module)

Task 1: Establish baseline benchmark numbers

Files:

  • Create: benches/baseline.txt

The acceptance criterion is "≥3× speedup on Batch::iteration." Capture the floor first.

  • Step 1: Fix the broken gaussian bench so we can capture both baselines

The bench currently calls private methods. Make pi() and tau() pub before changing the representation. This is the only baseline-prep change.

Edit src/gaussian.rs:

//   fn pi(&self) -> f64 {
// becomes:
    pub fn pi(&self) -> f64 {

//   fn tau(&self) -> f64 {
// becomes:
    pub fn tau(&self) -> f64 {

Verify: cargo bench --no-run succeeds for both batch and gaussian.

  • Step 2: Run baseline benchmarks and save numbers
cargo bench --bench batch 2>&1 | tee /tmp/bench-batch-baseline.txt
cargo bench --bench gaussian 2>&1 | tee /tmp/bench-gaussian-baseline.txt

Extract the headline numbers (lines like Batch::iteration time: [X.XX µs Y.YY µs Z.ZZ µs]) and write them into benches/baseline.txt:

# Baseline numbers captured before T0 changes
# Hardware: <fill in hostname and CPU model from `uname -a` / `sysctl -n machdep.cpu.brand_string`>
# Date: 2026-04-23

Batch::iteration         <X.XX> µs
Gaussian::add            <X.XX> ns
Gaussian::sub            <X.XX> ns
Gaussian::mul            <X.XX> ns
Gaussian::div            <X.XX> ns
Gaussian::pi             <X.XX> ns
Gaussian::tau            <X.XX> ns
Gaussian::pi_tau_combined <X.XX> ns
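Extracting the point estimates can be scripted. This awk one-liner is a hypothetical helper, not part of the plan's deliverables; the sample line stands in for the real /tmp/bench-*.txt contents, and criterion's exact spacing may differ between versions:

```shell
# Print "<bench name> <mid estimate> <unit>" for each criterion result line.
# The middle value of criterion's "[lo mid hi]" triple is the point estimate.
printf 'Batch::iteration        time:   [10.123 µs 10.456 µs 10.789 µs]\n' |
  awk '/time:/ { print $1, $5, $6 }'
# → Batch::iteration 10.456 µs
```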
  • Step 3: Commit
git add src/gaussian.rs benches/baseline.txt
git commit -m "$(cat <<'EOF'
bench: capture T0 baseline; expose pi/tau accessors

Promotes Gaussian::pi and Gaussian::tau to public so benches/gaussian.rs
compiles, then captures the baseline numbers we'll measure T0 against.
EOF
)"

Task 2: Switch Gaussian to natural-parameter storage

Files:

  • Modify: src/gaussian.rs (rewrite struct + ops)
  • Modify: src/lib.rs (constants)
  • Modify: src/approx.rs (compare via accessors)

This is the core change. Operations move from "store mu/sigma, recompute pi/tau on every mul/div" to "store pi/tau, recompute mu/sigma only when read." Hot-path mul/div become pure addition/subtraction of stored fields — no sqrt, no powi(-2), no division.

Mathematical mapping

| Operation | Before (mu/sigma stored) | After (pi/tau stored) |
| --- | --- | --- |
| pi() | sigma⁻² (compute) | field read |
| tau() | mu · pi() (compute) | field read |
| mu() | field read | tau / pi (or 0 if pi == 0) |
| sigma() | field read | 1 / √pi (or ∞ if pi == 0) |
| a * b (factor product) | convert to nat, add, convert back | Gaussian { pi: a.pi + b.pi, tau: a.tau + b.tau } — pure adds |
| a / b (cavity) | convert to nat, sub, convert back | Gaussian { pi: a.pi - b.pi, tau: a.tau - b.tau } — pure subs |
| a + b (variance addition) | mu1+mu2, √(σ1²+σ2²) | requires moment form: convert + compute + convert back |
| a - b (variance addition; same shape) | mu1-mu2, √(σ1²+σ2²) | same as + for sigma; requires moment form |
| a * scalar | mu·k, σ·k | pi/k², tau/k (derived: σ' = σ·k → pi' = pi/k²; mu' = mu·k → tau' = mu'·pi' = (mu·k)·(pi/k²) = (mu·pi)/k = tau/k) |
| forget(δ²) | mu, √(σ² + δ²) | requires moment form |
| delta(other) | (abs(Δmu), abs(Δσ)) | requires moment form |
| exclude(other) | mu1-mu2, √(σ1² - σ2²) | requires moment form |
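The hot-path rows of this mapping can be sanity-checked numerically. The sketch below is standalone (it does not use the crate's Gaussian type): it multiplies two Gaussians the old way (moment form, converting through natural parameters and back) and the new way (pure adds on stored pi/tau), using the same inputs as the test_mul golden later in this plan.

```rust
// Old path: store (mu, sigma); a product converts to natural parameters,
// adds, and converts back — sqrt and divisions on every call.
fn moment_mul(m1: f64, s1: f64, m2: f64, s2: f64) -> (f64, f64) {
    let (p1, p2) = (1.0 / (s1 * s1), 1.0 / (s2 * s2));
    let (pi, tau) = (p1 + p2, m1 * p1 + m2 * p2);
    (tau / pi, 1.0 / pi.sqrt()) // back to (mu, sigma)
}

// New path: store (pi, tau); the product is two additions, nothing else.
fn natural_mul(pi1: f64, tau1: f64, pi2: f64, tau2: f64) -> (f64, f64) {
    (pi1 + pi2, tau1 + tau2)
}

fn main() {
    let (mu1, sg1) = (25.0, 25.0 / 3.0); // same inputs as the test_mul golden
    let (mu2, sg2) = (0.0, 1.0);

    let (p1, p2) = (1.0 / (sg1 * sg1), 1.0 / (sg2 * sg2));
    let (pi, tau) = natural_mul(p1, mu1 * p1, p2, mu2 * p2);
    let (mu, sigma) = moment_mul(mu1, sg1, mu2, sg2);

    // Reading mu/sigma back from the natural-parameter result agrees with
    // the moment-form computation.
    assert!((tau / pi - mu).abs() < 1e-12);
    assert!((1.0 / pi.sqrt() - sigma).abs() < 1e-12);
    println!("mu = {mu}, sigma = {sigma}");
}
```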

Special-value table

| Constant | Public meaning | mu/sigma form (today) | pi/tau form (after) |
| --- | --- | --- | --- |
| N_INF | uniform / no info | { mu: 0, sigma: ∞ } | { pi: 0.0, tau: 0.0 } |
| N00 | additive identity | { mu: 0, sigma: 0 } | { pi: f64::INFINITY, tau: 0.0 } |
| N01 | standard normal | { mu: 0, sigma: 1 } | { pi: 1.0, tau: 0.0 } |

Tasks

  • Step 1: Replace src/gaussian.rs with the natural-parameter implementation
use std::ops;

use crate::{MU, N_INF, SIGMA};

/// A Gaussian distribution stored in natural parameters.
///
/// `pi  = 1 / sigma^2`  (precision)
/// `tau = mu * pi`      (precision-adjusted mean)
///
/// This representation makes message-passing operations (`*` and `/`) into
/// pure additions/subtractions of the stored fields, with no `sqrt` or
/// reciprocal in the hot path.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Gaussian {
    pi: f64,
    tau: f64,
}

impl Gaussian {
    /// Construct from mean and standard deviation. Maintained for API
    /// compatibility with the previous representation.
    pub const fn from_ms(mu: f64, sigma: f64) -> Self {
        if sigma == f64::INFINITY {
            Self { pi: 0.0, tau: 0.0 }
        } else if sigma == 0.0 {
            // mu == 0 is the only N00-like usage in the codebase; we honour
            // any other point-mass construction using the IEEE convention
            // (tau = mu * inf, i.e. ±inf matching the sign of mu).
            Self {
                pi: f64::INFINITY,
                tau: if mu == 0.0 { 0.0 } else { f64::INFINITY * mu },
            }
        } else {
            let pi = 1.0 / (sigma * sigma);
            Self { pi, tau: mu * pi }
        }
    }

    /// Construct directly from natural parameters. Internal helper.
    #[inline]
    pub(crate) const fn from_natural(pi: f64, tau: f64) -> Self {
        Self { pi, tau }
    }

    #[inline]
    pub fn pi(&self) -> f64 {
        self.pi
    }

    #[inline]
    pub fn tau(&self) -> f64 {
        self.tau
    }

    #[inline]
    pub fn mu(&self) -> f64 {
        if self.pi == 0.0 { 0.0 } else { self.tau / self.pi }
    }

    #[inline]
    pub fn sigma(&self) -> f64 {
        if self.pi == 0.0 {
            f64::INFINITY
        } else if self.pi.is_infinite() {
            0.0
        } else {
            1.0 / self.pi.sqrt()
        }
    }

    pub(crate) fn delta(&self, other: Gaussian) -> (f64, f64) {
        ((self.mu() - other.mu()).abs(), (self.sigma() - other.sigma()).abs())
    }

    pub(crate) fn exclude(&self, other: Gaussian) -> Self {
        let mu = self.mu() - other.mu();
        let var = self.sigma().powi(2) - other.sigma().powi(2);
        Self::from_ms(mu, var.sqrt())
    }

    pub(crate) fn forget(&self, variance_delta: f64) -> Self {
        let mu = self.mu();
        let var = self.sigma().powi(2) + variance_delta;
        Self::from_ms(mu, var.sqrt())
    }
}

impl Default for Gaussian {
    fn default() -> Self {
        Self::from_ms(MU, SIGMA)
    }
}

impl ops::Add<Gaussian> for Gaussian {
    type Output = Gaussian;
    fn add(self, rhs: Gaussian) -> Self::Output {
        // Variance addition: (mu1 + mu2, sqrt(s1^2 + s2^2)).
        // Used for combining performance and noise; rare relative to mul/div.
        let mu = self.mu() + rhs.mu();
        let var = self.sigma().powi(2) + rhs.sigma().powi(2);
        Self::from_ms(mu, var.sqrt())
    }
}

impl ops::Sub<Gaussian> for Gaussian {
    type Output = Gaussian;
    fn sub(self, rhs: Gaussian) -> Self::Output {
        // (mu1 - mu2, sqrt(s1^2 + s2^2)). Same sigma combination as add.
        let mu = self.mu() - rhs.mu();
        let var = self.sigma().powi(2) + rhs.sigma().powi(2);
        Self::from_ms(mu, var.sqrt())
    }
}

impl ops::Mul<Gaussian> for Gaussian {
    type Output = Gaussian;
    fn mul(self, rhs: Gaussian) -> Self::Output {
        // Factor product: nat-param add. Hot path.
        Self::from_natural(self.pi + rhs.pi, self.tau + rhs.tau)
    }
}

impl ops::Mul<f64> for Gaussian {
    type Output = Gaussian;
    fn mul(self, scalar: f64) -> Self::Output {
        if !scalar.is_finite() {
            return N_INF;
        }
        if scalar == 0.0 {
            return N_INF;
        }
        // sigma' = sigma * |scalar|  =>  pi' = pi / scalar^2
        // mu'    = mu * scalar       =>  tau' = mu' * pi' = (mu * scalar) * (pi / scalar^2)
        //                                              = (mu * pi) / scalar = tau / scalar
        Self::from_natural(self.pi / (scalar * scalar), self.tau / scalar)
    }
}

impl ops::Div<Gaussian> for Gaussian {
    type Output = Gaussian;
    fn div(self, rhs: Gaussian) -> Self::Output {
        // Cavity: nat-param sub. Hot path.
        Self::from_natural(self.pi - rhs.pi, self.tau - rhs.tau)
    }
}
  • Step 2: Update src/lib.rs constants

The pub const constants need to remain const-constructible. from_ms is const fn, so just rely on it:

// In src/lib.rs, no change needed to the lines:
pub const N01: Gaussian = Gaussian::from_ms(0.0, 1.0);
pub const N00: Gaussian = Gaussian::from_ms(0.0, 0.0);
pub const N_INF: Gaussian = Gaussian::from_ms(0.0, f64::INFINITY);

These continue to work because from_ms already handles sigma == 0.0 and sigma == f64::INFINITY as special cases producing the right nat-param values.
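The three branches can be exercised standalone. This sketch mirrors from_ms's special-case logic on bare tuples rather than calling the crate type:

```rust
// Mirror of from_ms's (mu, sigma) -> (pi, tau) mapping, special cases first.
fn nat(mu: f64, sigma: f64) -> (f64, f64) {
    if sigma == f64::INFINITY {
        (0.0, 0.0) // uniform: zero precision
    } else if sigma == 0.0 {
        // point mass: infinite precision
        (f64::INFINITY, if mu == 0.0 { 0.0 } else { f64::INFINITY * mu })
    } else {
        let pi = 1.0 / (sigma * sigma);
        (pi, mu * pi)
    }
}

fn main() {
    assert_eq!(nat(0.0, f64::INFINITY), (0.0, 0.0)); // N_INF
    assert_eq!(nat(0.0, 0.0), (f64::INFINITY, 0.0)); // N00
    assert_eq!(nat(0.0, 1.0), (1.0, 0.0));           // N01
}
```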

  • Step 3: Rewrite src/approx.rs to compare via accessors

The Gaussian struct no longer has mu/sigma as public fields; the approx impls used them directly. Switch to accessors.

use approx::{AbsDiffEq, RelativeEq, UlpsEq};

use crate::gaussian::Gaussian;

impl AbsDiffEq for Gaussian {
    type Epsilon = <f64 as AbsDiffEq>::Epsilon;

    fn default_epsilon() -> Self::Epsilon {
        f64::default_epsilon()
    }

    fn abs_diff_eq(&self, other: &Self, epsilon: Self::Epsilon) -> bool {
        f64::abs_diff_eq(&self.mu(), &other.mu(), epsilon)
            && f64::abs_diff_eq(&self.sigma(), &other.sigma(), epsilon)
    }
}

impl RelativeEq for Gaussian {
    fn default_max_relative() -> Self::Epsilon {
        f64::default_max_relative()
    }

    fn relative_eq(
        &self,
        other: &Self,
        epsilon: Self::Epsilon,
        max_relative: Self::Epsilon,
    ) -> bool {
        f64::relative_eq(&self.mu(), &other.mu(), epsilon, max_relative)
            && f64::relative_eq(&self.sigma(), &other.sigma(), epsilon, max_relative)
    }
}

impl UlpsEq for Gaussian {
    fn default_max_ulps() -> u32 {
        f64::default_max_ulps()
    }

    fn ulps_eq(&self, other: &Self, epsilon: Self::Epsilon, max_ulps: u32) -> bool {
        f64::ulps_eq(&self.mu(), &other.mu(), epsilon, max_ulps)
            && f64::ulps_eq(&self.sigma(), &other.sigma(), epsilon, max_ulps)
    }
}
  • Step 4: Update benches/gaussian.rs (now a no-op since Task 1 made the methods public)

No change needed — Task 1 already promoted pi()/tau() to pub. After Task 2 they remain pub and become trivial field reads. Verify:

cargo bench --no-run --bench gaussian

Expected: clean compile.

  • Step 5: Search for direct field access on Gaussian outside gaussian.rs and replace with accessors
grep -rn '\.mu\b\|\.sigma\b' src/ benches/ | grep -v 'gaussian.rs'

For each hit, change g.mu → g.mu() and g.sigma → g.sigma(). Likely sites:

  • src/lib.rscdf(), pdf(), compute_margin(), evidence(), approx(), quality() — search for .mu and .sigma
  • src/gaussian.rs (own tests at the bottom)

Specifically known sites in src/lib.rs:

  • evidence(...): uses d[e].prior.mu and d[e].prior.sigma
  • approx(): uses n.mu, n.sigma
  • quality(): uses rating.mu and rating.sigma inside the loops

Update each.

  • Step 6: Run tests, fix broken golden values
cargo test --features approx 2>&1 | tee /tmp/t0-test-output.txt

For each assert_eq!(g, Gaussian { mu: X, sigma: Y }) that fails:

  • The struct literal pattern won't even compile (no public mu/sigma fields). Replace with Gaussian::from_ms(X, Y).
  • For exact assert_eq!, also check whether the failure is "compile error" (struct literal) or "runtime mismatch" (ULP drift). For runtime drift, capture the new value (it should differ from the old by at most a few ULPs in the last hex digit) and update the literal.

Likely affected tests:

  • src/gaussian.rs::tests::test_add — struct literal Gaussian { mu: ..., sigma: ... }
  • src/gaussian.rs::tests::test_sub
  • src/gaussian.rs::tests::test_mul
  • src/gaussian.rs::tests::test_div

Rewrite each like:

#[test]
fn test_mul() {
    let n = Gaussian::from_ms(25.0, 25.0 / 3.0);
    let m = Gaussian::from_ms(0.0, 1.0);

    let result = n * m;
    // Original goldens: mu = 0.35488958990536273, sigma = 0.992876838486922
    // Verify within ULPs (the natural-param mul should produce identical bits
    // for these inputs — if not, capture and document the new value).
    assert_eq!(result.mu(), 0.35488958990536273);
    assert_eq!(result.sigma(), 0.992876838486922);
}

If a golden differs at the last few ULPs, switch to approx::assert_ulps_eq! with max_ulps = 4:

use approx::assert_ulps_eq;
assert_ulps_eq!(result, Gaussian::from_ms(EXPECTED_MU, EXPECTED_SIGMA), max_ulps = 4);
  • Step 7: Run the full test suite, including game/batch/history
cargo test --features approx

Tests in game.rs, batch.rs, history.rs already use assert_ulps_eq! with epsilon = 1e-6 so they should be tolerant. If any fail beyond 1e-6, investigate (it would suggest a real bug in the rewrite, not just ULP drift).

  • Step 8: Lint and format
cargo clippy --all-targets --features approx -- -D warnings
cargo fmt
  • Step 9: Commit
git add src/gaussian.rs src/lib.rs src/approx.rs benches/gaussian.rs
# Plus any other test files where you adjusted goldens
git commit -m "$(cat <<'EOF'
refactor(gaussian): switch to natural-parameter storage

Internal change: Gaussian now stores (pi, tau) rather than (mu, sigma).
Multiplication and division become pure adds/subs of stored fields,
eliminating per-op sqrt and reciprocal. mu()/sigma() become public
accessors. Add/Sub/forget/exclude continue to work in moment form
(rare relative to mul/div in the EP hot path).

Test golden values updated where natural-parameter rounding shifts
the last few ULPs; differences bounded by 1e-6 absolute, validated
against the prior moment-form arithmetic.

Part of T0 of docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md.
EOF
)"

Task 3: Convert mu_sigma panic to Result

Files:

  • Modify: src/lib.rs (or wherever mu_sigma lives — verify with grep)

The current mu_sigma panics on negative precision. This is an internal invariant violation and should be a structured error.

After Task 2, mu_sigma may have been deleted entirely (it was a helper for the old representation's mul/div). Check first:

grep -n 'fn mu_sigma\|mu_sigma(' src/
  • Step 1: If mu_sigma still exists, replace the panic with an error type

If the function survived the Task 2 rewrite (it might still be referenced indirectly), replace the body:

// In src/lib.rs (or src/gaussian.rs)
#[derive(Debug, Clone, PartialEq)]
pub enum InferenceError {
    NegativePrecision { pi: f64 },
}

impl std::fmt::Display for InferenceError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::NegativePrecision { pi } => {
                write!(f, "precision must be non-negative; got {pi}")
            }
        }
    }
}

impl std::error::Error for InferenceError {}

fn mu_sigma(tau: f64, pi: f64) -> Result<(f64, f64), InferenceError> {
    if pi > 0.0 {
        Ok((tau / pi, (1.0 / pi).sqrt()))
    } else if (pi + 1e-5) < 0.0 {
        Err(InferenceError::NegativePrecision { pi })
    } else {
        Ok((0.0, f64::INFINITY))
    }
}

If mu_sigma was deleted in Task 2, skip to Step 3 and just create the InferenceError enum.

  • Step 2: Update callers to propagate the Result

If mu_sigma survived, update callers to ? the error or unwrap with a debug_assert!. Hot-path callers (anything inside propagate/iteration) should be debug_assert! because the invariant is enforced at the API boundary. Public-API callers should propagate.

In practice after Task 2 there should be no surviving mu_sigma callers; all Gaussian arithmetic is direct nat-param work.

  • Step 3: Add InferenceError to public API

In src/lib.rs:

mod error;
pub use error::InferenceError;

Create src/error.rs:

#[derive(Debug, Clone, PartialEq)]
pub enum InferenceError {
    NegativePrecision { pi: f64 },
}

impl std::fmt::Display for InferenceError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::NegativePrecision { pi } => {
                write!(f, "precision must be non-negative; got {pi}")
            }
        }
    }
}

impl std::error::Error for InferenceError {}

This is the seed of the wider InferenceError enum that T2 will expand. Other variants come later.
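As a usage sketch: the block below is a self-contained copy of the enum and mu_sigma from the steps above, plus a hypothetical posterior_summary boundary function (not an actual crate API) showing the intended `?` propagation style.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub enum InferenceError {
    NegativePrecision { pi: f64 },
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NegativePrecision { pi } => {
                write!(f, "precision must be non-negative; got {pi}")
            }
        }
    }
}

impl std::error::Error for InferenceError {}

fn mu_sigma(tau: f64, pi: f64) -> Result<(f64, f64), InferenceError> {
    if pi > 0.0 {
        Ok((tau / pi, (1.0 / pi).sqrt()))
    } else if (pi + 1e-5) < 0.0 {
        Err(InferenceError::NegativePrecision { pi })
    } else {
        Ok((0.0, f64::INFINITY))
    }
}

// Hypothetical API-boundary caller: propagates instead of panicking.
fn posterior_summary(tau: f64, pi: f64) -> Result<String, InferenceError> {
    let (mu, sigma) = mu_sigma(tau, pi)?;
    Ok(format!("mu={mu}, sigma={sigma}"))
}

fn main() {
    assert!(posterior_summary(0.36, 1.0144).is_ok());
    assert_eq!(
        posterior_summary(0.0, -1.0),
        Err(InferenceError::NegativePrecision { pi: -1.0 })
    );
}
```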

  • Step 4: Test
cargo test --features approx
cargo clippy --all-targets --features approx -- -D warnings
cargo fmt
  • Step 5: Commit
git add src/lib.rs src/error.rs
git commit -m "$(cat <<'EOF'
refactor: introduce InferenceError; remove mu_sigma panic

The 'precision should be greater than 0' panic in mu_sigma becomes
a structured InferenceError::NegativePrecision. Hot-path call sites
that have invariant guarantees use debug_assert!; API-boundary
sites propagate.

Seed for the wider error enum that T2 will expand.
EOF
)"

Task 4: Introduce SkillStore (replace HashMap<Index, Skill> in Batch)

Files:

  • Create: src/storage/mod.rs
  • Create: src/storage/skill_store.rs
  • Modify: src/lib.rs (declare pub mod storage;)
  • Modify: src/batch.rs (use SkillStore)

Batch currently holds pub(crate) skills: HashMap<Index, Skill>. Every iteration hashes and resolves indices. Replace with a dense Vec-backed store.

Design

Two operations dominate:

  1. Lookup by Index (very common; in inner loop)
  2. Iterate over all present (Index, Skill) pairs (per iteration, per posterior recompute)

Use the dense + present mask layout from the spec (Section 3 open question, default proposal):

pub struct SkillStore {
    skills:  Vec<Skill>,    // dense, indexed by Index.0
    present: Vec<bool>,     // parallel mask
    n_present: usize,       // cached count for iteration sizing
}

get/get_mut return Option<&Skill>/Option<&mut Skill>. insert(idx, skill) sets and marks present. Iteration is present.iter().enumerate().filter(|(_, &p)| p).map(|(i, _)| (Index(i), &skills[i])).

Tasks

  • Step 1: Write the test for SkillStore

Create src/storage/skill_store.rs. Add tests as a #[cfg(test)] mod tests block at the bottom.

use crate::Index;
use crate::batch::Skill;

#[derive(Debug, Default)]
pub struct SkillStore {
    skills: Vec<Skill>,
    present: Vec<bool>,
    n_present: usize,
}

impl SkillStore {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_capacity(cap: usize) -> Self {
        Self {
            skills: Vec::with_capacity(cap),
            present: Vec::with_capacity(cap),
            n_present: 0,
        }
    }

    fn ensure_capacity(&mut self, idx: usize) {
        if idx >= self.skills.len() {
            self.skills.resize_with(idx + 1, Skill::default);
            self.present.resize(idx + 1, false);
        }
    }

    pub fn insert(&mut self, idx: Index, skill: Skill) {
        self.ensure_capacity(idx.0);
        if !self.present[idx.0] {
            self.n_present += 1;
        }
        self.skills[idx.0] = skill;
        self.present[idx.0] = true;
    }

    pub fn get(&self, idx: Index) -> Option<&Skill> {
        if idx.0 < self.present.len() && self.present[idx.0] {
            Some(&self.skills[idx.0])
        } else {
            None
        }
    }

    pub fn get_mut(&mut self, idx: Index) -> Option<&mut Skill> {
        if idx.0 < self.present.len() && self.present[idx.0] {
            Some(&mut self.skills[idx.0])
        } else {
            None
        }
    }

    pub fn contains(&self, idx: Index) -> bool {
        idx.0 < self.present.len() && self.present[idx.0]
    }

    pub fn len(&self) -> usize {
        self.n_present
    }

    pub fn is_empty(&self) -> bool {
        self.n_present == 0
    }

    /// Iterate present (Index, &Skill) pairs.
    pub fn iter(&self) -> impl Iterator<Item = (Index, &Skill)> {
        self.present
            .iter()
            .enumerate()
            .filter_map(move |(i, &p)| if p { Some((Index(i), &self.skills[i])) } else { None })
    }

    /// Iterate present (Index, &mut Skill) pairs.
    pub fn iter_mut(&mut self) -> impl Iterator<Item = (Index, &mut Skill)> {
        let SkillStore { skills, present, .. } = self;
        skills
            .iter_mut()
            .zip(present.iter())
            .enumerate()
            .filter_map(|(i, (s, &p))| if p { Some((Index(i), s)) } else { None })
    }

    /// Just the present indices, for cheap iteration when only keys matter.
    pub fn keys(&self) -> impl Iterator<Item = Index> + '_ {
        self.present
            .iter()
            .enumerate()
            .filter_map(|(i, &p)| if p { Some(Index(i)) } else { None })
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn insert_then_get() {
        let mut store = SkillStore::new();
        let idx = Index(3);
        let skill = Skill::default();
        store.insert(idx, skill);

        assert!(store.contains(idx));
        assert_eq!(store.len(), 1);
        assert!(store.get(idx).is_some());
    }

    #[test]
    fn missing_returns_none() {
        let store = SkillStore::new();
        assert!(store.get(Index(0)).is_none());
        assert!(!store.contains(Index(42)));
    }

    #[test]
    fn iter_skips_absent_slots() {
        let mut store = SkillStore::new();
        store.insert(Index(0), Skill::default());
        store.insert(Index(5), Skill::default());

        let keys: Vec<Index> = store.keys().collect();
        assert_eq!(keys, vec![Index(0), Index(5)]);
    }

    #[test]
    fn double_insert_same_index_does_not_double_count() {
        let mut store = SkillStore::new();
        store.insert(Index(2), Skill::default());
        store.insert(Index(2), Skill::default());
        assert_eq!(store.len(), 1);
    }
}
  • Step 2: Create src/storage/mod.rs
mod skill_store;

pub use skill_store::SkillStore;
  • Step 3: Declare the module in src/lib.rs

Add to the module declarations:

pub mod storage;

(Public so the (still-public) batch module can return references to it; we may tighten visibility in T2.)

  • Step 4: Run the new tests
cargo test --features approx storage::skill_store

Expected: 4 passing tests.

  • Step 5: Replace HashMap<Index, Skill> in Batch

In src/batch.rs, the change is mechanical. Replace:

pub(crate) skills: HashMap<Index, Skill>,

with:

pub(crate) skills: SkillStore,

Add the import:

use crate::storage::SkillStore;

Now update every use site. HashMap and SkillStore differ in:

  • map[&idx] (panicking lookup) → store.get(idx).expect("present") or use store.get_mut(idx).unwrap()
  • map.insert(idx, val) → store.insert(idx, val) (same signature)
  • map.iter() returning (&Index, &Skill) → store.iter() returning (Index, &Skill) — note key is by-value; adjust borrow patterns
  • map.keys() returning &Index → store.keys() returning Index
  • map.get(&idx) → store.get(idx)
  • map.get_mut(&idx) → store.get_mut(idx)
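The by-value key in the iteration item is the main borrow-pattern change. A standalone sketch (usize standing in for Index, f64 for Skill — not the crate's types):

```rust
use std::collections::HashMap;

fn main() {
    let mut map: HashMap<usize, f64> = HashMap::new();
    map.insert(3, 1.5);

    // HashMap yields (&K, &V): the key arrives by reference, so batch.rs
    // currently destructures with `(&idx, skill)` patterns.
    for (&idx, &v) in &map {
        assert_eq!((idx, v), (3, 1.5));
    }

    // A dense store yields (K, &V): key by value, so those patterns become
    // plain `(idx, skill)` after the switch. Vec of pairs models that shape.
    let store: Vec<(usize, f64)> = map.into_iter().collect();
    for (idx, v) in &store {
        assert_eq!((*idx, *v), (3, 1.5));
    }
}
```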

Walk through batch.rs and update each call. Specifically (line numbers approximate):

  • Batch::posteriors — uses iter()
  • Batch::add_events — uses get_mut()/insert()
  • Batch::iteration — uses get_mut()/get() heavily
  • Batch::convergence — uses posteriors() and the deltas
  • Batch::forward_prior_out, Batch::backward_prior_out — use get()
  • Batch::new_backward_info, Batch::new_forward_info — use iter_mut()

After the rewrite, the Batch::new constructor likely becomes:

pub fn new(time: i64, p_draw: f64) -> Self {
    Self {
        events: Vec::new(),
        skills: SkillStore::new(),
        time,
        p_draw,
    }
}
  • Step 6: Run the existing batch tests
cargo test --features approx batch

Expected: all batch tests pass. If iteration order through iter() changed (ours sorts by Index; HashMap order was arbitrary), some "first event" comparisons may need adjustment — but the EP fixed point is order-independent so posteriors should match within ULPs.

  • Step 7: Run full test suite to catch any cross-file fallout
cargo test --features approx
  • Step 8: Lint and format
cargo clippy --all-targets --features approx -- -D warnings
cargo fmt
  • Step 9: Commit
git add src/storage/ src/lib.rs src/batch.rs
git commit -m "$(cat <<'EOF'
refactor(batch): replace HashMap<Index, Skill> with dense SkillStore

Eliminates per-iteration hashing in the within-slice convergence loop
and improves cache locality. SkillStore is a Vec-backed dense store
with a parallel present mask; lookup is O(1) array indexing.

Iteration now visits indices in ascending order (HashMap was arbitrary);
the EP fixed point is order-independent so posteriors are unchanged
within ULPs.
EOF
)"

Task 5: Introduce AgentStore<D> (replace HashMap<Index, Agent<D>> in History)

Files:

  • Create: src/storage/agent_store.rs
  • Modify: src/storage/mod.rs (re-export)
  • Modify: src/agent.rs (clean() signature)
  • Modify: src/batch.rs (signatures: add_events, iteration, new_backward_info, new_forward_info, backward_prior_out, log_evidence take &AgentStore<D> not &HashMap<…>)
  • Modify: src/history.rs (use AgentStore<D> field)
  • Modify: benches/batch.rs (build AgentStore instead of HashMap)

Same idea as Task 4, parameterised over D: Drift.

  • Step 1: Write AgentStore<D> with tests

Create src/storage/agent_store.rs:

use crate::Index;
use crate::agent::Agent;
use crate::drift::Drift;

#[derive(Debug)]
pub struct AgentStore<D: Drift> {
    agents: Vec<Option<Agent<D>>>,
    n_present: usize,
}

impl<D: Drift> Default for AgentStore<D> {
    fn default() -> Self {
        Self {
            agents: Vec::new(),
            n_present: 0,
        }
    }
}

impl<D: Drift> AgentStore<D> {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_capacity(cap: usize) -> Self {
        Self {
            agents: Vec::with_capacity(cap),
            n_present: 0,
        }
    }

    fn ensure_capacity(&mut self, idx: usize) {
        if idx >= self.agents.len() {
            self.agents.resize_with(idx + 1, || None);
        }
    }

    pub fn insert(&mut self, idx: Index, agent: Agent<D>) {
        self.ensure_capacity(idx.0);
        if self.agents[idx.0].is_none() {
            self.n_present += 1;
        }
        self.agents[idx.0] = Some(agent);
    }

    pub fn get(&self, idx: Index) -> Option<&Agent<D>> {
        self.agents.get(idx.0).and_then(|slot| slot.as_ref())
    }

    pub fn get_mut(&mut self, idx: Index) -> Option<&mut Agent<D>> {
        self.agents.get_mut(idx.0).and_then(|slot| slot.as_mut())
    }

    pub fn contains(&self, idx: Index) -> bool {
        self.get(idx).is_some()
    }

    pub fn len(&self) -> usize {
        self.n_present
    }

    pub fn is_empty(&self) -> bool {
        self.n_present == 0
    }

    pub fn iter(&self) -> impl Iterator<Item = (Index, &Agent<D>)> {
        self.agents
            .iter()
            .enumerate()
            .filter_map(|(i, slot)| slot.as_ref().map(|a| (Index(i), a)))
    }

    pub fn iter_mut(&mut self) -> impl Iterator<Item = (Index, &mut Agent<D>)> {
        self.agents
            .iter_mut()
            .enumerate()
            .filter_map(|(i, slot)| slot.as_mut().map(|a| (Index(i), a)))
    }

    pub fn values_mut(&mut self) -> impl Iterator<Item = &mut Agent<D>> {
        self.agents.iter_mut().filter_map(|s| s.as_mut())
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::agent::Agent;
    use crate::drift::ConstantDrift;
    use crate::player::Player;

    #[test]
    fn insert_then_get_roundtrip() {
        let mut store: AgentStore<ConstantDrift> = AgentStore::new();
        let idx = Index(7);
        let agent = Agent {
            player: Player::default(),
            ..Default::default()
        };
        store.insert(idx, agent);
        assert!(store.contains(idx));
        assert_eq!(store.len(), 1);
    }

    #[test]
    fn iter_in_index_order() {
        let mut store: AgentStore<ConstantDrift> = AgentStore::new();
        store.insert(Index(2), Agent::default());
        store.insert(Index(0), Agent::default());
        store.insert(Index(5), Agent::default());

        let keys: Vec<Index> = store.iter().map(|(i, _)| i).collect();
        assert_eq!(keys, vec![Index(0), Index(2), Index(5)]);
    }
}
  • Step 2: Re-export from src/storage/mod.rs
mod agent_store;
mod skill_store;

pub use agent_store::AgentStore;
pub use skill_store::SkillStore;
  • Step 3: Run the new tests
cargo test --features approx storage::agent_store

Expected: 2 passing tests.

  • Step 4: Update src/agent.rs::clean

Current signature:

pub(crate) fn clean<'a, D: Drift + 'a, A: Iterator<Item = &'a mut Agent<D>>>(
    agents: A,
    last_time: bool,
) {
    for a in agents {
        a.message = N_INF;
        if last_time {
            a.last_time = i64::MIN;
        }
    }
}

Change to take &mut AgentStore<D>:

use crate::storage::AgentStore;

pub(crate) fn clean<D: Drift>(agents: &mut AgentStore<D>, last_time: bool) {
    for a in agents.values_mut() {
        a.message = N_INF;
        if last_time {
            a.last_time = i64::MIN;
        }
    }
}

Update call sites in src/history.rs:

  • agent::clean(self.agents.values_mut(), false); → agent::clean(&mut self.agents, false);

  • Step 5: Update Batch method signatures

In src/batch.rs, replace &HashMap<Index, Agent<D>> with &AgentStore<D> in:

  • Batch::add_events
  • Batch::iteration
  • Batch::convergence
  • Batch::backward_prior_out
  • Batch::new_backward_info
  • Batch::new_forward_info
  • Batch::log_evidence
  • Item::within_prior (also takes &HashMap<Index, Agent<D>>)
  • Event::within_priors

The body changes are mechanical: agents[&idx] → agents.get(idx).expect(...) (or agents.get(idx).unwrap(), since call-site invariants guarantee the index is present).

Add the import:

use crate::storage::{AgentStore, SkillStore};

Remove the now-unused use std::collections::HashMap; if there are no other uses.
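The before/after lookup change can be sketched with a simplified store (a hypothetical Store type standing in for the crate's AgentStore, purely to show the shape of the mechanical rewrite):

```rust
// Simplified dense store: Vec<Option<T>> indexed directly by Index.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Index(usize);

struct Store<T> {
    slots: Vec<Option<T>>,
}

impl<T> Store<T> {
    // Out-of-range and vacant slots both read as absent.
    fn get(&self, idx: Index) -> Option<&T> {
        self.slots.get(idx.0).and_then(|s| s.as_ref())
    }
}

fn main() {
    let store = Store {
        slots: vec![Some("a"), None, Some("c")],
    };
    // Before: `agents[&idx]` on a HashMap panicked with a generic message.
    // After: an explicit expect documents the presence invariant at the call site.
    let agent = store.get(Index(2)).expect("index registered before batch");
    assert_eq!(*agent, "c");
    assert!(store.get(Index(1)).is_none());
    assert!(store.get(Index(99)).is_none());
}
```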

  • Step 6: Update History to use AgentStore<D>

In src/history.rs:

// Change the field
pub struct History<D: Drift = ConstantDrift> {
    size: usize,
    pub(crate) batches: Vec<Batch>,
    agents: AgentStore<D>,           // was: HashMap<Index, Agent<D>>
    time: bool,
    // ...
}

// Change the constructors
impl Default for History<ConstantDrift> {
    fn default() -> Self {
        Self {
            // ...
            agents: AgentStore::new(),
            // ...
        }
    }
}

And in HistoryBuilder::build:

pub fn build(self) -> History<D> {
    History {
        // ...
        agents: AgentStore::new(),
        // ...
    }
}

Walk through every self.agents.foo(...) call site in history.rs and replace HashMap semantics with AgentStore semantics. Likely changes:

  • self.agents.contains_key(agent) → self.agents.contains(*agent) (note: Index is Copy)

  • self.agents.insert(*agent, val) → self.agents.insert(*agent, val) (same signature)

  • self.agents.get_mut(agent).unwrap() → self.agents.get_mut(*agent).unwrap()

  • &self.agents passed to Batch::* methods now passes &AgentStore<D> directly (no .values()/.iter() needed)

  • Step 7: Update benches/batch.rs

The bench currently builds HashMap<Index, Agent> and passes &agents. Change to:

use trueskill_tt::{
    BETA, GAMMA, IndexMap, MU, P_DRAW, SIGMA, agent::Agent, batch::Batch, drift::ConstantDrift,
    gaussian::Gaussian, player::Player, storage::AgentStore,
};

fn criterion_benchmark(criterion: &mut Criterion) {
    let mut index = IndexMap::new();
    let a = index.get_or_create("a");
    let b = index.get_or_create("b");
    let c = index.get_or_create("c");

    let mut agents: AgentStore<ConstantDrift> = AgentStore::new();
    for idx in [a, b, c] {
        agents.insert(
            idx,
            Agent {
                player: Player::new(Gaussian::from_ms(MU, SIGMA), BETA, ConstantDrift(GAMMA)),
                ..Default::default()
            },
        );
    }

    // ... rest unchanged, but pass &agents (now &AgentStore)
}
  • Step 8: Run all tests
cargo test --features approx

Likely failure mode: tests in batch.rs and history.rs build their own HashMap<Index, Agent> and pass it. Update those tests to build AgentStore instead. Search:

grep -rn 'HashMap.*Agent\|HashMap::new.*agent' src/

For each test, replace:

let mut agents = HashMap::new();
agents.insert(a, Agent { ... });

with:

let mut agents: AgentStore<ConstantDrift> = AgentStore::new();
agents.insert(a, Agent { ... });
  • Step 9: Lint, format, commit
cargo clippy --all-targets --features approx -- -D warnings
cargo fmt
git add src/ benches/batch.rs
git commit -m "$(cat <<'EOF'
refactor: replace HashMap<Index, Agent> with dense AgentStore

History and Batch now hold agents in a Vec<Option<Agent<D>>> indexed
directly by Index. Eliminates HashMap overhead in the cross-history
forward/backward sweep and within-slice iteration.

Public Batch::* signatures now take &AgentStore<D> instead of
&HashMap<Index, Agent<D>>. The benches/batch.rs and tests are
updated to build AgentStore. Top-level History API is unchanged.
EOF
)"

Task 6: Introduce ScratchArena and reuse buffers across Game::new calls

Files:

  • Create: src/arena.rs
  • Modify: src/lib.rs (declare module)
  • Modify: src/game.rs (Game::new takes &mut ScratchArena; replace per-event Vecs)
  • Modify: src/batch.rs (Batch owns a ScratchArena; passes to Game::new)

Game::likelihoods allocates four Vecs per call:

  • team: Vec<TeamMessage> (length n_teams)
  • diff: Vec<DiffMessage> (length n_teams - 1)
  • tie: Vec<bool> (length n_teams - 1)
  • margin: Vec<f64> (length n_teams - 1)

For a 60-event slice × 30 iterations, that is ~7,200 small allocations per slice converge; cross-history convergence repeats the sweep across slices, compounding this into hundreds of thousands of allocations. Replace with a reusable arena owned by the Batch.

Design

pub struct ScratchArena {
    teams:   Vec<TeamMessage>,
    diffs:   Vec<DiffMessage>,
    ties:    Vec<bool>,
    margins: Vec<f64>,
}

Each scratch slice is reset by clear() (sets len = 0, retains capacity). Game::new borrows the arena, reserves enough capacity, and writes into the buffers. Lifetimes: the arena outlives any single Game; Game borrows mutably for the duration of construction and inference.

Easiest API: ScratchArena owns the buffers; Game::new takes &mut ScratchArena and materializes the owned Vec<Vec<Gaussian>> (the public likelihoods field) from the scratch buffers before returning. The borrow of the buffers ends when Game::new returns, so the caller can immediately reuse the arena for the next event.
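The borrow discipline above can be sketched with stand-in types (ScratchArena and Game here are simplified, with made-up arithmetic, purely to show where the arena borrow ends):

```rust
// Simplified arena with a single buffer.
#[derive(Default)]
struct ScratchArena {
    teams: Vec<f64>,
}

struct Game {
    // Owned output: survives after the arena borrow ends.
    likelihoods: Vec<f64>,
}

impl Game {
    fn new(priors: &[f64], arena: &mut ScratchArena) -> Self {
        // Reuse the scratch buffer instead of allocating fresh.
        arena.teams.clear();
        arena.teams.extend(priors.iter().map(|p| p * 2.0)); // stand-in math
        // Copy the final answer out of the scratch buffer into owned storage.
        Game {
            likelihoods: arena.teams.clone(),
        }
    }
}

fn main() {
    let mut arena = ScratchArena::default();
    let g1 = Game::new(&[1.0, 2.0], &mut arena);
    // Arena is reusable here: the mutable borrow ended when new() returned.
    let g2 = Game::new(&[3.0], &mut arena);
    assert_eq!(g1.likelihoods, vec![2.0, 4.0]);
    assert_eq!(g2.likelihoods, vec![6.0]);
}
```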

Tasks

  • Step 1: Write ScratchArena with a small test

Create src/arena.rs:

use crate::message::{DiffMessage, TeamMessage};

#[derive(Debug, Default)]
pub struct ScratchArena {
    pub(crate) teams:   Vec<TeamMessage>,
    pub(crate) diffs:   Vec<DiffMessage>,
    pub(crate) ties:    Vec<bool>,
    pub(crate) margins: Vec<f64>,
}

impl ScratchArena {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_capacity(n_teams: usize) -> Self {
        Self {
            teams:   Vec::with_capacity(n_teams),
            diffs:   Vec::with_capacity(n_teams.saturating_sub(1)),
            ties:    Vec::with_capacity(n_teams.saturating_sub(1)),
            margins: Vec::with_capacity(n_teams.saturating_sub(1)),
        }
    }

    pub fn reset(&mut self) {
        self.teams.clear();
        self.diffs.clear();
        self.ties.clear();
        self.margins.clear();
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn reset_keeps_capacity() {
        let mut arena = ScratchArena::with_capacity(8);
        let cap_before = arena.teams.capacity();
        arena.teams.push(TeamMessage::default());
        arena.reset();
        assert_eq!(arena.teams.len(), 0);
        assert_eq!(arena.teams.capacity(), cap_before);
    }
}

Note: TeamMessage and DiffMessage are currently pub(crate). The ScratchArena lives in the engine, so pub(crate) access is fine — ScratchArena's fields are also pub(crate).

  • Step 2: Declare the module

In src/lib.rs:

pub mod arena;

(Public for the same reason as storage: Batch exposes it as a constructor argument; T2 can tighten.)

  • Step 3: Run the test
cargo test --features approx arena

Expected: 1 passing test.

  • Step 4: Update Game::new to take &mut ScratchArena

In src/game.rs:

use crate::arena::ScratchArena;

impl<'a, D: Drift> Game<'a, D> {
    pub fn new(
        teams: Vec<Vec<Player<D>>>,
        result: &'a [f64],
        weights: &'a [Vec<f64>],
        p_draw: f64,
        arena: &mut ScratchArena,
    ) -> Self {
        // ... same debug_assert! checks ...

        arena.reset();
        // Pre-size buffers to the exact required lengths.
        // Building team/diff/tie/margin into arena buffers.

        let mut this = Self {
            teams,
            result,
            weights,
            p_draw,
            likelihoods: Vec::new(),
            evidence: 0.0,
        };

        this.likelihoods(arena);
        this
    }

    fn likelihoods(&mut self, arena: &mut ScratchArena) {
        let o = sort_perm(self.result, true);

        // Build team messages into the arena instead of allocating fresh.
        arena.teams.clear();
        arena.teams.extend(o.iter().map(|&e| {
            let performance = self.teams[e]
                .iter()
                .zip(self.weights[e].iter())
                .fold(N00, |p, (player, &weight)| {
                    p + (player.performance() * weight)
                });
            TeamMessage {
                prior: performance,
                ..Default::default()
            }
        }));

        // Same for diffs/ties/margins. Use slice references for the inner loop.
        arena.diffs.clear();
        arena.diffs.extend(arena.teams.windows(2).map(|w| DiffMessage {
            prior: w[0].prior - w[1].prior,
            likelihood: N_INF,
        }));
        // ... etc, mirroring the original code ...
    }
}

(Full rewrite is mechanical — replace each Vec::new() / vec![…] / let mut x = … .collect() with arena.X.clear(); arena.X.extend(…).)
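One borrow-checker detail worth confirming before the rewrite: extending arena.diffs from a window iterator over arena.teams compiles, because the two are disjoint fields of the same struct. A minimal sketch with stand-in message types (hypothetical, not the crate's):

```rust
// Stand-in message types for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TeamMsg {
    prior: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct DiffMsg {
    prior: f64,
}

#[derive(Default)]
struct Arena {
    teams: Vec<TeamMsg>,
    diffs: Vec<DiffMsg>,
}

fn build_diffs(arena: &mut Arena) {
    arena.diffs.clear();
    // Shared borrow of `arena.teams` inside a call that mutably borrows
    // `arena.diffs` -- accepted, because the field places are disjoint.
    arena.diffs.extend(
        arena.teams.windows(2).map(|w| DiffMsg {
            prior: w[0].prior - w[1].prior,
        }),
    );
}

fn main() {
    let mut arena = Arena::default();
    arena
        .teams
        .extend([3.0, 1.0, 0.5].map(|p| TeamMsg { prior: p }));
    build_diffs(&mut arena);
    assert_eq!(arena.diffs.len(), 2);
    assert_eq!(arena.diffs[0].prior, 2.0);
    assert_eq!(arena.diffs[1].prior, 0.5);
}
```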

  • Step 5: Update callers of Game::new

Search:

grep -rn 'Game::new' src/ benches/

Each call site passes a &mut ScratchArena. The callers fall into two groups: Batch methods, which share the batch's arena, and tests, which create a local one:

  1. Batch::iteration: holds an arena per Batch; passes &mut self.arena.
  2. Batch::log_evidence: same.
  3. History::log_evidence: routes through Batch::log_evidence, which already has access.
  4. Tests in src/game.rs::tests: each test creates a local let mut arena = ScratchArena::new(); and passes &mut arena into the Game::new call.

Update each.

  • Step 6: Add arena: ScratchArena field to Batch

In src/batch.rs:

pub struct Batch {
    pub(crate) events: Vec<Event>,
    pub(crate) skills: SkillStore,
    pub(crate) time: i64,
    p_draw: f64,
    arena: ScratchArena,
}

impl Batch {
    pub fn new(time: i64, p_draw: f64) -> Self {
        Self {
            events: Vec::new(),
            skills: SkillStore::new(),
            time,
            p_draw,
            arena: ScratchArena::new(),
        }
    }
}

In Batch::iteration, Batch::log_evidence, etc., where Game::new(...) is called, pass &mut self.arena.
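One Rust gotcha to anticipate here: passing &mut self.arena while also iterating self.events requires the two borrows to stay on disjoint fields; if a loop would otherwise borrow all of *self, destructure it first. A sketch of the pattern with hypothetical simplified types (not the crate's):

```rust
#[derive(Default)]
struct Arena {
    buf: Vec<f64>,
}

struct Batch {
    events: Vec<f64>,
    arena: Arena,
}

// Stand-in for Game::new: borrows the arena only for the call's duration.
fn process(event: f64, arena: &mut Arena) -> f64 {
    arena.buf.clear();
    arena.buf.push(event * 2.0);
    arena.buf[0]
}

impl Batch {
    fn iteration(&mut self) -> f64 {
        // Split the borrow: `events` is reborrowed shared inside the loop,
        // `arena` mutably -- disjoint fields, so the borrow checker accepts it.
        let Batch { events, arena } = self;
        events.iter().map(|&e| process(e, arena)).sum()
    }
}

fn main() {
    let mut b = Batch {
        events: vec![1.0, 2.0, 3.0],
        arena: Arena::default(),
    };
    assert_eq!(b.iteration(), 12.0);
}
```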

  • Step 7: Run all tests
cargo test --features approx

The fixed point is unchanged; tests should pass without golden updates.

  • Step 8: Lint, format
cargo clippy --all-targets --features approx -- -D warnings
cargo fmt
  • Step 9: Commit
git add src/arena.rs src/lib.rs src/game.rs src/batch.rs
git commit -m "$(cat <<'EOF'
perf(game): reuse buffers via ScratchArena, eliminate per-event allocs

Game::likelihoods previously allocated four Vecs (teams, diffs, ties,
margins) on every call. Batch now owns one ScratchArena reused across
all events in the slice's iteration loops; Game::new clears and writes
into the arena instead of allocating fresh.

For a 60-event slice * 30 convergence iterations, this removes ~7200
small allocations per converge.
EOF
)"

Task 7: Final benchmark comparison and acceptance check

Files:

  • Modify: benches/baseline.txt (append T0 numbers)

  • Step 1: Run final benchmarks

cargo bench --bench batch 2>&1 | tee /tmp/bench-batch-t0.txt
cargo bench --bench gaussian 2>&1 | tee /tmp/bench-gaussian-t0.txt
  • Step 2: Append T0 numbers to benches/baseline.txt

Add to the file:

# After T0 (date, hardware as above)

Batch::iteration         <X.XX> µs   (speedup: <Y.YY>x)
Gaussian::add            <X.XX> ns   (speedup: <Y.YY>x)
Gaussian::sub            <X.XX> ns   (speedup: <Y.YY>x)
Gaussian::mul            <X.XX> ns   (speedup: <Y.YY>x)
Gaussian::div            <X.XX> ns   (speedup: <Y.YY>x)
Gaussian::pi             <X.XX> ns   (now field read)
Gaussian::tau            <X.XX> ns   (now field read)
  • Step 3: Verify acceptance criterion

Batch::iteration speedup must be ≥3.0×. If less:

  • Re-run cargo bench to rule out noise (criterion's confidence interval should make this clear).
  • Profile with cargo flamegraph --bench batch to find what didn't move. Common culprits: the sort_perm allocation, the posteriors() round-trip in Batch::convergence, the inner Vec<Vec<Gaussian>> builds in Game::posteriors (untouched in T0).
  • If the gap is real, write up a short post-mortem in the commit message and decide whether to add a remediation step or accept the lower speedup. Do not silently lower the bar.

If Gaussian::mul/Gaussian::div did not improve substantially, investigate: a ~5× or better improvement is expected, since each becomes two adds instead of four floating-point ops plus a sqrt and a reciprocal. Likely cause: the compiler did not inline; check #[inline] annotations.

  • Step 4: Run the full test suite one last time
cargo test --features approx
cargo clippy --all-targets --features approx -- -D warnings
cargo fmt --check

All green.

  • Step 5: Commit the final benchmark numbers
git add benches/baseline.txt
git commit -m "$(cat <<'EOF'
bench: capture T0 final numbers; X.YYx speedup on Batch::iteration

T0 acceptance gate met: ≥3x speedup on Batch::iteration vs. baseline.
Closes the T0 tier of the engine redesign.
EOF
)"

Self-review notes (post-plan)

Spec coverage:

  • Gaussian → natural parameters: Task 2
  • HashMap → dense Vec: Tasks 4 (skills) and 5 (agents)
  • ScratchArena replacing per-event allocs: Task 6
  • Drop panic! in mu_sigma: Task 3
  • Acceptance: ≥3× speedup gate: Tasks 1 and 7
  • Existing tests pass within ULPs: handled in Task 2 step 6 and Task 4 step 6
  • No top-level public API change: confirmed in acceptance criteria; pub-but-internal Batch::* signatures shift but History/Game::posteriors/etc. unchanged
  • Sparse-vs-dense per-slice open question: defaulted to dense + present mask in Task 4 per spec

Open items deferred to T1+:

  • Factor graph machinery (Factor, VarStore, Schedule) — entire scope of T1
  • Schedule / convergence reporting — T1+T5
  • Renames (Player → Rating, etc.) — T2
  • API surface (Outcome, Member<K>, Event<T,K>) — T2
  • Concurrency, color groups, Rayon — T3
  • Richer factor types — T4

Things to watch during execution:

  • Iteration order through SkillStore/AgentStore is by-Index, whereas HashMap order was arbitrary. The EP fixed point is order-independent, so converged posteriors should match. However, Game::likelihoods only accumulates evidence from one diff on the first iteration (if iter == 0), so if cross-event order shifts at the slice level, the per-iteration evidence may differ even though the converged posterior matches. The acceptance test is on posteriors (within ULPs); the evidence delta at convergence should still match.
  • pub mod batch; and pub mod agent; exposing internals: T0 keeps them pub to avoid disturbing benches. T2 will tighten visibility.
  • mu_sigma may be entirely deleted by Task 2 (its callers were the old Mul/Div impls). Task 3's wording handles both cases.
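For the "ULP-bounded drift" parity comments on updated goldens, a small bit-level distance helper can quantify the drift when deciding whether a golden shift is acceptable. This is an illustrative, hypothetical helper, not part of the crate:

```rust
/// Distance in ULPs between two finite f64 values: maps the bit patterns
/// onto a monotonic integer line, so adjacent representable floats are
/// exactly 1 apart.
fn ulp_distance(a: f64, b: f64) -> u64 {
    fn key(x: f64) -> i64 {
        let bits = x.to_bits() as i64;
        // Negative floats have descending bit patterns; reflect them so
        // the mapping is monotonic across zero (-0.0 and +0.0 both map to 0).
        if bits < 0 { i64::MIN - bits } else { bits }
    }
    key(a).abs_diff(key(b))
}

fn main() {
    let x = 1.0_f64;
    let next = f64::from_bits(x.to_bits() + 1);
    assert_eq!(ulp_distance(x, next), 1);
    assert_eq!(ulp_distance(x, x), 0);
    // Classic parity example: 0.1 + 0.2 lands within a few ULPs of 0.3.
    assert!(ulp_distance(0.1 + 0.2, 0.3) <= 4);
}
```

A parity comment on a golden update can then state the measured distance, e.g. "new value is 2 ULPs from the old golden; drift from natural-parameter rounding".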