logaritmisk d1d6b5136c docs: implementation plan for per-event score_sigma override

Three tasks: foundational Outcome variant change + ingest resolution
(atomic, every commit builds), additive EventBuilder fluent method,
and three end-to-end integration tests covering inheritance,
override-supersedes-default, and builder threading.

2026-05-08 16:12:33 +02:00


Per-Event `score_sigma` Override Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Let users specify a per-event score-sigma override on Outcome::Scored, defaulting to HistoryBuilder::score_sigma when not set.

Architecture: Outcome::Scored becomes a struct variant with an Option<f64> sigma field. History::add_events resolves sigma.unwrap_or(self.score_sigma) at ingest time, so downstream EventKind::Scored.score_sigma stays a plain f64 and TimeSlice / run_chain need zero changes. Two new constructors (Outcome::scores_with_sigma and EventBuilder::scores_with_sigma) cover the override path; existing scores(...) keeps its signature.
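The resolve-at-ingest idea can be sketched in isolation. This is a minimal illustration, not the crate's code: `Vec` stands in for `SmallVec`, `EventKind::Ranked` is a hypothetical placeholder variant, and `resolve_kind` is an invented helper name for what the ingest arm does.

```rust
// Simplified stand-ins for the crate's types (assumptions, see above).
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

#[derive(Clone, Debug, PartialEq)]
enum EventKind {
    Ranked,
    Scored { score_sigma: f64 },
}

/// Resolve the optional per-event sigma against the history-wide default.
/// Past this point nothing downstream needs to know an override existed.
fn resolve_kind(outcome: &Outcome, default_sigma: f64) -> EventKind {
    match outcome {
        Outcome::Ranked(_) => EventKind::Ranked,
        Outcome::Scored { sigma, .. } => EventKind::Scored {
            // `sigma` binds as `&Option<f64>`; `Option<f64>` is `Copy`,
            // so `unwrap_or` can be called through the reference.
            score_sigma: sigma.unwrap_or(default_sigma),
        },
    }
}
```

Because the `Option` is collapsed to a plain `f64` once, at ingest, the downstream event kind never carries the override as a separate concept.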

Tech Stack: Rust 2024, cargo +nightly fmt, cargo clippy, cargo test.

Spec reference

docs/superpowers/specs/2026-05-08-per-event-score-sigma-design.md

File map

| File | Why touched |
| --- | --- |
| `src/outcome.rs` | `Outcome::Scored` variant becomes a struct; pattern matches in `team_count`, `as_scores`, `as_ranks`; new `scores_with_sigma` constructor; existing `scores` constructor body adapts |
| `src/history.rs` | The single ingest pattern match at `:735` resolves `sigma.unwrap_or(self.score_sigma)`; three new end-to-end tests |
| `src/event_builder.rs` | New `scores_with_sigma` builder method |

Pre-flight context for the implementer

Outcome is pub. Currently a tuple-variant enum at src/outcome.rs:18-21. Changing Scored(SmallVec) → Scored { scores, sigma } is a breaking change to a public variant shape, acceptable in 0.1.x.
Pattern-match callsite inventory across the workspace (verified by grep): only ONE site destructures the variant — src/history.rs:735 (crate::Outcome::Scored(scores) => { ... }). Every other reference is either a constructor call (Outcome::scores(...)) or a string literal in a doc/error message. The constructors keep their existing signatures, so callsites don't need updating.
Outcome::scores(I) constructor at src/outcome.rs:44: keep the signature pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self. Only the body changes (it now builds Self::Scored { scores: ..., sigma: None }).
as_scores, as_ranks, team_count accessors at src/outcome.rs:48-67: their public signatures stay the same. Internal pattern matches adapt mechanically.
EventBuilder::scores(I) at src/event_builder.rs:79-82: keep unchanged. The new scores_with_sigma(I, f64) lives next to it.
History::score_sigma at src/history.rs:165: still the history-wide default. HistoryBuilder::score_sigma(s) builder method at src/history.rs:82-89 stays as-is.
EventKind::Scored { score_sigma: f64 } at src/time_slice.rs:51: already per-event-shaped. Don't touch.
Test baseline: 100 lib + 27 integration tests, all passing.
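Why constructor callsites survive the shape change can be shown with a small stand-alone sketch. It assumes a cut-down `Outcome` with only the `Scored` variant and `Vec` in place of `SmallVec`; the point is that the generic `IntoIterator<Item = f64>` signature hides the variant's layout from callers.

```rust
// Stand-in for the crate's enum; only the `Scored` shape matters here.
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    /// Same signature as before the shape change; only the body differs.
    fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: None, // no override: inherit the history-wide default
        }
    }
}
```

Arrays, `Vec`s and iterator adapters all satisfy the bound, so a call such as `Outcome::scores([3.0, 1.0])` reads the same before and after the variant becomes a struct.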

Task 1: `Outcome::Scored` becomes a struct variant + constructors

This is the foundational shape change. After Step 3 the new variant and both constructors (scores and scores_with_sigma) exist on Outcome, but History::add_events (the only consumer that destructures the variant) still matches the old tuple shape, so the crate does not build until Step 5 of this task updates that arm in the same commit.

Files:

Modify: src/outcome.rs (variant shape, three pattern-match arms, two existing tests, four new tests, two constructors)
Modify: src/history.rs (the ingest arm at :735, see Step 5)

Step 1: Write failing tests for the new constructor

In src/outcome.rs, inside the existing #[cfg(test)] mod tests block, add at the end:

#[test]
fn scores_with_sigma_round_trips() {
    let o = Outcome::scores_with_sigma([10.0, 4.0], 0.5);
    assert_eq!(o.team_count(), 2);
    assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
}

#[test]
fn scores_constructor_leaves_sigma_unset() {
    // After the variant change, the public Outcome::scores constructor
    // must build with sigma: None. We assert this indirectly via a match
    // on the variant.
    let o = Outcome::scores([3.0, 1.0]);
    match o {
        Outcome::Scored { scores: _, sigma } => assert!(sigma.is_none()),
        Outcome::Ranked(_) => panic!("expected Scored variant"),
    }
}

#[test]
fn scores_with_sigma_sets_sigma_some() {
    let o = Outcome::scores_with_sigma([3.0, 1.0], 2.0);
    match o {
        Outcome::Scored { scores: _, sigma } => assert_eq!(sigma, Some(2.0)),
        Outcome::Ranked(_) => panic!("expected Scored variant"),
    }
}

#[test]
#[should_panic(expected = "score_sigma must be > 0.0")]
fn scores_with_sigma_rejects_zero() {
    let _ = Outcome::scores_with_sigma([3.0, 1.0], 0.0);
}
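One caveat on the `should_panic` test above: `debug_assert!` is compiled out when debug assertions are disabled (for example under `cargo test --release`), so the panic only happens in debug-assertion builds. A minimal sketch of that behavior, using a hypothetical `checked_sigma` helper as a stand-in for the constructor's check:

```rust
use std::panic;

/// Stand-in for the check inside `Outcome::scores_with_sigma`.
fn checked_sigma(sigma: f64) -> f64 {
    // Compiled out entirely when debug assertions are disabled.
    // NaN also fails this check, because `NaN > 0.0` is false.
    debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {})", sigma);
    sigma
}

/// Reports whether `checked_sigma` panics for `sigma`.
fn panics_on(sigma: f64) -> bool {
    panic::catch_unwind(move || checked_sigma(sigma)).is_err()
}
```

If the suite is ever run with debug assertions off, one option is to gate the `should_panic` test with `#[cfg(debug_assertions)]`; whether that is needed depends on how this project runs its tests.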

Step 2: Run the new tests to verify they fail

Run: cargo test --lib outcome::tests

Expected: the test build fails to compile, so no tests run. The errors are the unresolved Outcome::scores_with_sigma function (called in three of the new tests) and the struct-pattern destructures on Scored { ... }, which don't match the current tuple variant.

Step 3: Change the variant shape and update the constructor + accessors

In src/outcome.rs, replace the entire Outcome enum and impl Outcome block (currently src/outcome.rs:16-68) with:

/// Final outcome of a match.
///
/// `Ranked(ranks)`: lower rank = better. Equal ranks mean a tie between those
/// teams. `ranks.len()` must equal the number of teams in the event.
///
/// `Scored { scores, sigma }`: higher score = better. Adjacent (sorted) pairs
/// feed observed margins to `MarginFactor`. `scores.len()` must equal the
/// number of teams in the event. `sigma` overrides `HistoryBuilder::score_sigma`
/// when `Some`; `None` inherits the history default.
#[derive(Clone, Debug, PartialEq)]
#[non_exhaustive]
pub enum Outcome {
    Ranked(SmallVec<[u32; 4]>),
    Scored {
        scores: SmallVec<[f64; 4]>,
        /// Per-event noise override. `None` means inherit
        /// `HistoryBuilder::score_sigma`. Must be `> 0.0` if `Some`.
        sigma: Option<f64>,
    },
}

impl Outcome {
    /// `n`-team outcome where team `winner` won and everyone else tied for last.
    ///
    /// Panics if `winner >= n`.
    pub fn winner(winner: u32, n: u32) -> Self {
        assert!(winner < n, "winner index {winner} out of range 0..{n}");
        let ranks: SmallVec<[u32; 4]> = (0..n).map(|i| if i == winner { 0 } else { 1 }).collect();
        Self::Ranked(ranks)
    }

    /// All `n` teams tied.
    pub fn draw(n: u32) -> Self {
        Self::Ranked(SmallVec::from_vec(vec![0; n as usize]))
    }

    /// Explicit per-team ranking.
    pub fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Self {
        Self::Ranked(ranks.into_iter().collect())
    }

    /// Explicit per-team continuous scores; higher = better.
    /// Inherits `HistoryBuilder::score_sigma` for the noise model.
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: None,
        }
    }

    /// Explicit per-team continuous scores with a per-event noise override.
    ///
    /// `sigma` must be `> 0.0`; debug-asserts otherwise.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(scores: I, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {sigma})");
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: Some(sigma),
        }
    }

    pub fn team_count(&self) -> usize {
        match self {
            Self::Ranked(r) => r.len(),
            Self::Scored { scores, .. } => scores.len(),
        }
    }

    pub(crate) fn as_ranks(&self) -> Option<&[u32]> {
        match self {
            Self::Ranked(r) => Some(r),
            Self::Scored { .. } => None,
        }
    }

    pub(crate) fn as_scores(&self) -> Option<&[f64]> {
        match self {
            Self::Scored { scores, .. } => Some(scores),
            Self::Ranked(_) => None,
        }
    }
}

Step 4: Run the new tests

Run: cargo test --lib outcome::tests

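Expected: cargo test --lib builds the whole crate, so at this point the only compile error should be the stale Outcome::Scored(scores) pattern at src/history.rs:735, which Step 5 fixes. After Step 5, re-run this command: all outcome tests pass (the 6 pre-existing tests + 4 new = 10 total in the outcome tests module).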

If any pre-existing test fails, the issue is in this task — not Task 2. Most likely cause: a pattern-match arm in the rewritten impl Outcome block doesn't compile. Re-check the struct-variant destructure syntax (Self::Scored { scores, .. } for read-only access; Self::Scored { scores, sigma } when both fields are needed).
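The two destructure forms mentioned above look like this in a stand-alone sketch (`Vec` replaces `SmallVec`, and `scored_parts` is a hypothetical helper added only to show the two-field pattern):

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Read-only access to one field: `..` ignores `sigma`.
fn team_count(o: &Outcome) -> usize {
    match o {
        Outcome::Ranked(r) => r.len(),
        Outcome::Scored { scores, .. } => scores.len(),
    }
}

/// Both fields needed: name each one in the pattern.
fn scored_parts(o: &Outcome) -> Option<(&[f64], Option<f64>)> {
    match o {
        // Matching through `&Outcome` binds both fields by reference.
        Outcome::Scored { scores, sigma } => Some((scores.as_slice(), *sigma)),
        Outcome::Ranked(_) => None,
    }
}
```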

Step 5: Update History::add_events ingest arm to destructure the new variant

The variant change from Step 3 breaks the existing Outcome::Scored(scores) pattern match in src/history.rs:735. Fix it now (in the same commit) — the codebase must build at every commit boundary.

In src/history.rs, find the crate::Outcome::Scored(scores) => { ... } arm (currently at src/history.rs:735-740). Replace with:

crate::Outcome::Scored { scores, sigma } => {
    let resolved = sigma.unwrap_or(self.score_sigma);
    debug_assert!(
        resolved > 0.0,
        "resolved score_sigma must be > 0.0 (got {resolved})"
    );
    kinds.push(EventKind::Scored {
        score_sigma: resolved,
    });
    scores.to_vec()
}

The surrounding match &ev.outcome { ... } and the surrounding flow (the ranks arm above, the results.push(event_result); below) stay unchanged.

Step 6: Run the full library test suite — bit-equal regression net

Run: cargo build && cargo test --lib && cargo test

Expected: clean build. All 104 lib tests (100 baseline + the 4 new outcome tests) and 27 integration tests pass. Existing goldens stay bit-equal: every existing scored-event constructor uses the no-override path (Outcome::scores(...) or EventBuilder::scores(...)), which now builds sigma: None and resolves to self.score_sigma, exactly the previous behavior.

If unexpected additional compile errors surface (any site pattern-matching Outcome::Scored(...) outside the 735 arm), STOP and report — the plan's inventory is wrong, surface that as a finding before continuing.

If any existing test fails: investigate. Most likely cause is a typo in the new pattern arms (Step 3) or the resolution rule (Step 5). The override path isn't exercised yet by any existing test, so the only thing that can break is the inheritance path.

Step 7: Format and lint

Run: cargo +nightly fmt && cargo clippy --all-targets -- -D warnings

Expected: no diff, no warnings.

Step 8: Commit

git add src/outcome.rs src/history.rs
git commit -m "$(cat <<'EOF'
feat(outcome): per-event score_sigma override on Outcome::Scored

Outcome::Scored shape changes from tuple to struct:
{ scores, sigma: Option<f64> }. New constructor scores_with_sigma
sets sigma=Some(s) and debug-asserts s > 0.0; existing scores(I)
constructor keeps its signature and builds with sigma=None internally.
team_count, as_scores, as_ranks accessor pattern matches updated.

History::add_events resolves sigma.unwrap_or(self.score_sigma) at the
ingest arm, so downstream EventKind::Scored stays a plain f64 and
TimeSlice / run_chain need zero changes.

Breaking change to the public Outcome::Scored variant shape
(acceptable in 0.1.x). Bit-equal for callers using the no-override
path because the resolution falls through to self.score_sigma exactly
as before.
EOF
)"

Task 2: `EventBuilder::scores_with_sigma` builder method

The override path is fully wired by Task 1, but it's only reachable via the Outcome::scores_with_sigma constructor (passed into History::add_events directly). The fluent-builder ergonomic — h.event(t).team(...).scores_with_sigma(scores, sigma).commit() — needs one new method on EventBuilder.

Files:

Modify: src/event_builder.rs (new builder method)

Step 1: Add the EventBuilder method

In src/event_builder.rs, find the existing scores method (currently at src/event_builder.rs:79-82). Immediately below it (still inside impl<'h, T, D, O, K> EventBuilder<...>), add:

/// Set explicit per-team continuous scores with a per-event noise override.
///
/// `sigma` overrides `HistoryBuilder::score_sigma` for this event only.
/// Must be `> 0.0`; debug-asserts otherwise via `Outcome::scores_with_sigma`.
pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(mut self, scores: I, sigma: f64) -> Self {
    self.event.outcome = crate::Outcome::scores_with_sigma(scores, sigma);
    self
}
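The consuming `mut self` style used by the method above can be sketched without a `History`. Everything here is a simplified assumption: `Outcome::Unset`, `EventBuilder::new` and `finish` are hypothetical stand-ins (the real builder ends with `commit()`), and `Vec` replaces `SmallVec`.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    Unset, // hypothetical placeholder for "no outcome chosen yet"
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    fn scores_with_sigma<I: IntoIterator<Item = f64>>(scores: I, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {})", sigma);
        Self::Scored { scores: scores.into_iter().collect(), sigma: Some(sigma) }
    }
}

/// Hypothetical, history-free builder; `finish` stands in for `commit`.
#[derive(Debug)]
struct EventBuilder {
    teams: Vec<Vec<String>>,
    outcome: Outcome,
}

impl EventBuilder {
    fn new() -> Self {
        Self { teams: Vec::new(), outcome: Outcome::Unset }
    }

    /// Each call takes the builder by value and hands it back, so calls chain.
    fn team(mut self, members: &[&str]) -> Self {
        self.teams.push(members.iter().map(|m| m.to_string()).collect());
        self
    }

    /// Thin wrapper: validation lives in the `Outcome` constructor.
    fn scores_with_sigma<I: IntoIterator<Item = f64>>(mut self, scores: I, sigma: f64) -> Self {
        self.outcome = Outcome::scores_with_sigma(scores, sigma);
        self
    }

    fn finish(self) -> (Vec<Vec<String>>, Outcome) {
        (self.teams, self.outcome)
    }
}
```

Delegating to the `Outcome` constructor keeps the `sigma > 0.0` check in one place, which is the same choice the plan's method makes.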

Step 2: Build and run the test suite

Run: cargo build && cargo test --lib && cargo test

Expected: clean build, all 104 lib + 27 integration tests pass (the lib count includes the 4 outcome tests added in Task 1). The new method is additive, so existing tests see no behavior change.

Step 3: Format and lint

Run: cargo +nightly fmt && cargo clippy --all-targets -- -D warnings

Expected: no diff, no warnings.

Step 4: Commit

git add src/event_builder.rs
git commit -m "$(cat <<'EOF'
feat(event_builder): expose scores_with_sigma fluent method

Adds EventBuilder::scores_with_sigma, the fluent-builder ergonomic
mirror of Outcome::scores_with_sigma. Lets users write
h.event(t).team(...).team(...).scores_with_sigma([..], sigma).commit()
to set a per-event score_sigma override.
EOF
)"

Task 3: End-to-end integration tests

Files:

Modify: src/history.rs (three new tests in the existing #[cfg(test)] mod tests block at the bottom)

Step 1: Locate the test module

Run: grep -n "^#\[cfg(test)\]" src/history.rs

Identify the test module (there should be one near the bottom of the file). Read its imports and look at neighboring tests to see the existing builder/event-construction pattern in current use. Mirror that pattern in the new tests below — the surface syntax (History::builder(), event(t).team(...), learning_curves(), etc.) must match what already works in this file.

Step 2: Write the failing tests

Add the following three tests at the end of the existing #[cfg(test)] mod tests block in src/history.rs (just before the module's closing }):

#[test]
fn outcome_scores_default_sigma_uses_history_default() {
    use crate::Outcome;

    // Path A: explicit sigma=0.5 via override.
    let mut h_a = crate::History::builder().score_sigma(0.5).build();
    h_a.add_events([crate::Event {
        time: 0_i64,
        teams: smallvec::smallvec![
            crate::Team::with_members([crate::Member::new("a")]),
            crate::Team::with_members([crate::Member::new("b")]),
        ],
        outcome: Outcome::scores_with_sigma([3.0, 1.0], 0.5),
    }])
    .unwrap();
    h_a.converge().unwrap();

    // Path B: history-wide default 0.5, no per-event override.
    let mut h_b = crate::History::builder().score_sigma(0.5).build();
    h_b.add_events([crate::Event {
        time: 0_i64,
        teams: smallvec::smallvec![
            crate::Team::with_members([crate::Member::new("a")]),
            crate::Team::with_members([crate::Member::new("b")]),
        ],
        outcome: Outcome::scores([3.0, 1.0]),
    }])
    .unwrap();
    h_b.converge().unwrap();

    // Inheritance: posteriors must be bit-equal.
    let curves_a = h_a.learning_curves();
    let curves_b = h_b.learning_curves();
    for (key, a_pts) in curves_a.iter() {
        let b_pts = curves_b.get(key).expect("agent missing in path B");
        for (a, b) in a_pts.iter().zip(b_pts.iter()) {
            assert_eq!(a.1.pi(), b.1.pi(), "mismatch at agent {key:?}");
            assert_eq!(a.1.tau(), b.1.tau(), "mismatch at agent {key:?}");
        }
    }
}

#[test]
fn outcome_scores_with_sigma_overrides_history_default() {
    use crate::Outcome;

    // Path A: history-wide default 0.5, per-event override 2.0.
    let mut h_a = crate::History::builder().score_sigma(0.5).build();
    h_a.add_events([crate::Event {
        time: 0_i64,
        teams: smallvec::smallvec![
            crate::Team::with_members([crate::Member::new("a")]),
            crate::Team::with_members([crate::Member::new("b")]),
        ],
        outcome: Outcome::scores_with_sigma([3.0, 1.0], 2.0),
    }])
    .unwrap();
    h_a.converge().unwrap();

    // Path B: history-wide default 2.0, no per-event override.
    let mut h_b = crate::History::builder().score_sigma(2.0).build();
    h_b.add_events([crate::Event {
        time: 0_i64,
        teams: smallvec::smallvec![
            crate::Team::with_members([crate::Member::new("a")]),
            crate::Team::with_members([crate::Member::new("b")]),
        ],
        outcome: Outcome::scores([3.0, 1.0]),
    }])
    .unwrap();
    h_b.converge().unwrap();

    // Override == default-set-to-the-override-value: bit-equal.
    let curves_a = h_a.learning_curves();
    let curves_b = h_b.learning_curves();
    for (key, a_pts) in curves_a.iter() {
        let b_pts = curves_b.get(key).expect("agent missing in path B");
        for (a, b) in a_pts.iter().zip(b_pts.iter()) {
            assert_eq!(a.1.pi(), b.1.pi(), "mismatch at agent {key:?}");
            assert_eq!(a.1.tau(), b.1.tau(), "mismatch at agent {key:?}");
        }
    }

    // Path C: history-wide default 0.5, no override. Different sigma → different posteriors.
    let mut h_c = crate::History::builder().score_sigma(0.5).build();
    h_c.add_events([crate::Event {
        time: 0_i64,
        teams: smallvec::smallvec![
            crate::Team::with_members([crate::Member::new("a")]),
            crate::Team::with_members([crate::Member::new("b")]),
        ],
        outcome: Outcome::scores([3.0, 1.0]),
    }])
    .unwrap();
    h_c.converge().unwrap();

    let curves_c = h_c.learning_curves();
    let mut max_diff: f64 = 0.0;
    for (key, a_pts) in curves_a.iter() {
        let c_pts = curves_c.get(key).expect("agent missing in path C");
        for (a, c) in a_pts.iter().zip(c_pts.iter()) {
            max_diff = max_diff.max((a.1.mu() - c.1.mu()).abs());
            max_diff = max_diff.max((a.1.sigma() - c.1.sigma()).abs());
        }
    }
    assert!(
        max_diff > 1e-6,
        "override should produce different posteriors from inherited default; max_diff={max_diff}"
    );
}

#[test]
fn event_builder_scores_with_sigma_threading() {
    use crate::Outcome;

    // Path A: builder fluent API with sigma override.
    let mut h_a = crate::History::builder().score_sigma(0.5).build();
    h_a.event(0_i64)
        .team(["a"])
        .team(["b"])
        .scores_with_sigma([3.0, 1.0], 2.0)
        .commit()
        .unwrap();
    h_a.converge().unwrap();

    // Path B: same outcome via the explicit Outcome constructor.
    let mut h_b = crate::History::builder().score_sigma(0.5).build();
    h_b.add_events([crate::Event {
        time: 0_i64,
        teams: smallvec::smallvec![
            crate::Team::with_members([crate::Member::new("a")]),
            crate::Team::with_members([crate::Member::new("b")]),
        ],
        outcome: Outcome::scores_with_sigma([3.0, 1.0], 2.0),
    }])
    .unwrap();
    h_b.converge().unwrap();

    let curves_a = h_a.learning_curves();
    let curves_b = h_b.learning_curves();
    for (key, a_pts) in curves_a.iter() {
        let b_pts = curves_b.get(key).expect("agent missing");
        for (a, b) in a_pts.iter().zip(b_pts.iter()) {
            assert_eq!(a.1.pi(), b.1.pi(), "mismatch at agent {key:?}");
            assert_eq!(a.1.tau(), b.1.tau(), "mismatch at agent {key:?}");
        }
    }
}

If the surface API (e.g. History::add_events, Event { time, teams, outcome }, Team::with_members, Member::new, event(...).team(...).commit(), learning_curves()) doesn't exactly match what's available in the test module, look at neighboring tests for the patterns currently in use and adjust. The CONTRACT is: build two Histories that should produce identical posteriors, run them, compare. The surface syntax must follow what compiles in this file.
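When adapting the comparison loops, note that `zip` stops at the shorter sequence and the loops iterate only over path A's agents, so a length mismatch or an extra agent in path B would pass silently. A stricter helper could look like the sketch below. The `Curves` alias is a hypothetical shape (agent key to `(time, (pi, tau))` points), not the crate's actual `learning_curves()` return type.

```rust
use std::collections::HashMap;

/// Hypothetical curve shape: agent key -> (time, (pi, tau)) points.
type Curves = HashMap<String, Vec<(i64, (f64, f64))>>;

/// Strict comparison: same agents, same number of points, same times,
/// and identical bit patterns for every float.
fn curves_bit_equal(a: &Curves, b: &Curves) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().all(|(key, a_pts)| match b.get(key) {
        Some(b_pts) if a_pts.len() == b_pts.len() => a_pts
            .iter()
            .zip(b_pts.iter())
            .all(|(&(ta, (pa, qa)), &(tb, (pb, qb)))| {
                // `to_bits` distinguishes 0.0 from -0.0, unlike `==`.
                ta == tb && pa.to_bits() == pb.to_bits() && qa.to_bits() == qb.to_bits()
            }),
        _ => false,
    })
}
```

Plain `assert_eq!` on `f64` compares by value, which is usually enough here; `to_bits` is the stricter reading of "bit-equal".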

Step 3: Run the new tests

Run: cargo test --lib -- outcome_scores_default_sigma_uses_history_default outcome_scores_with_sigma_overrides_history_default event_builder_scores_with_sigma_threading

Expected: 3 passed.

Fallback if Test 2's max_diff > 1e-6 fails (sigma=0.5 vs sigma=2.0 produces nearly identical posteriors — unlikely on a single 2-team scored event, but possible if the priors dominate): use a larger gap, e.g. Outcome::scores_with_sigma([3.0, 1.0], 5.0) vs Outcome::scores([3.0, 1.0]) with score_sigma(0.5). The point is to prove the resolution path actually engages — any sigma gap that produces a measurable posterior difference is fine.

Step 4: Run the full test suite

Run: cargo test --lib && cargo test

Expected: lib count = 107 (100 baseline + 4 from Task 1 + 3 here), integration count = 27 (unchanged), all passing.

Step 5: Format and lint

Run: cargo +nightly fmt && cargo clippy --all-targets -- -D warnings

Expected: no diff, no warnings.

Step 6: Commit

git add src/history.rs
git commit -m "$(cat <<'EOF'
test(history): end-to-end per-event score_sigma override tests

Three integration tests on a 2-team scored event:
- inheritance: Outcome::scores(...) with no override produces
  bit-equal posteriors to the same outcome wrapped in
  scores_with_sigma(scores, history.score_sigma)
- override-supersedes-default: scores_with_sigma(scores, X) with
  history score_sigma(Y) produces bit-equal posteriors to
  scores(...) with history score_sigma(X), AND differs measurably
  from scores(...) with history score_sigma(Y)
- builder threading: EventBuilder::scores_with_sigma reaches the
  ingest path identically to the Outcome constructor
EOF
)"

Self-review (writer's note)

Spec coverage:

Spec § "What ships" item 1 (Scored becomes struct variant) → Task 1 step 3 ✓
Spec § "What ships" item 2 (scores_with_sigma constructor) → Task 1 step 3 ✓
Spec § "What ships" item 3 (EventBuilder::scores_with_sigma) → Task 2 step 1 ✓
Spec § "What ships" item 4 (sigma resolution at ingest) → Task 1 step 5 ✓
Spec § "What ships" item 5 (pattern-match update inventory) → Task 1 step 5 (single site at history.rs:735) ✓
Spec § "Validation" (debug_assert at constructor) → Task 1 step 3 (in scores_with_sigma) ✓
Spec § "Validation" (debug_assert at ingest) → Task 1 step 5 ✓
Spec § "Testing strategy" §1 (regression net) → Task 1 step 6, Task 2 step 2, Task 3 step 4 ✓
Spec § "Testing strategy" §2 test 1 (default-uses-history-default) → Task 3 step 2 test 1 ✓
Spec § "Testing strategy" §2 test 2 (override-supersedes-default) → Task 3 step 2 test 2 ✓
Spec § "Testing strategy" §2 test 3 (builder threading) → Task 3 step 2 test 3 ✓

Out-of-scope items correctly absent: No EventKind::Scored change, no TimeSlice/run_chain changes, no Game::scored standalone API change, no deprecation of HistoryBuilder::score_sigma.

Type / signature consistency:

Outcome::Scored { scores: SmallVec<[f64; 4]>, sigma: Option<f64> } — Task 1 step 3 (def) and Task 1 step 5 (destructure) match ✓
Outcome::scores_with_sigma<I>(scores: I, sigma: f64) -> Outcome — Task 1 step 3 (def) and Task 2 step 1 (call) match ✓
EventBuilder::scores_with_sigma<I>(mut self, scores: I, sigma: f64) -> Self — Task 2 step 1 (def) and Task 3 step 2 test 3 (call) match ✓
sigma.unwrap_or(self.score_sigma) resolution rule — Task 1 step 5 ✓

Task split rationale: Task 1 lands the foundational shape change AND the ingest resolution atomically — every commit boundary builds and tests pass bit-equal. Task 2 is the small additive EventBuilder method, separated for review-focus reasons (it's the user-facing fluent API exposure). Task 3 is purely additive integration tests. Each task is independently committable; no intermediate non-building state.

No placeholders detected.
