trueskill-tt/docs/superpowers/specs/2026-05-08-per-event-score-sigma-design.md
logaritmisk 46625d247a docs: spec for per-event score_sigma override
Outcome::Scored becomes a struct variant with an Option<f64> sigma
field. None inherits HistoryBuilder::score_sigma; Some(s) overrides
per event. Resolved at ingest time so EventKind::Scored stays a plain
f64 and TimeSlice/run_chain need zero changes. New constructors
Outcome::scores_with_sigma and EventBuilder::scores_with_sigma cover
the override path; existing scores(..) keeps its signature with
sigma=None internally.

Breaking change to Outcome::Scored variant shape (tuple → struct);
acceptable in 0.1.x. Closes the last item from the T4-MarginFactor
deferred wishlist.
2026-05-08 16:05:27 +02:00


Per-Event score_sigma Override

Summary

Let users specify a per-event noise override on Outcome::Scored. Today every scored event in a History shares the single HistoryBuilder::score_sigma value (default 1.0); a user who wants to say "this match was a clean blowout, trust the margin more" or "this one was a disrupted scrappy game, trust it less" has no way to do so.

The override is resolved at ingest time and stored as a plain f64 on the existing EventKind::Scored { score_sigma } payload, so TimeSlice and run_chain need zero changes. The work is purely on the public API surface: Outcome::Scored becomes a struct variant with an Option<f64> sigma field, and a new constructor (Outcome::scores_with_sigma) plus a matching builder method (EventBuilder::scores_with_sigma) cover the explicit-override path.

Background

Outcome::Scored(SmallVec<[f64; 4]>) is the public per-team-score variant (src/outcome.rs:20). It's constructed via Outcome::scores(I) (src/outcome.rs:44) or EventBuilder::scores(I) (src/event_builder.rs:79).

When History::add_events ingests a Scored outcome, it always uses the history-wide default:

```rust
// src/history.rs:735-740
crate::Outcome::Scored(scores) => {
    kinds.push(EventKind::Scored {
        score_sigma: self.score_sigma,
    });
    scores.to_vec()
}
```

The downstream EventKind::Scored { score_sigma: f64 } (src/time_slice.rs:51) is already per-event-shaped — every Event carries its own copy. The constraint is purely at the ingest boundary.

This was flagged as deferred tech debt during the T4-MarginFactor work: "EventKind::Scored.score_sigma payload is always history-wide today; per-event override deferred."

Scope

What ships

  1. Outcome::Scored becomes a struct variant: Scored { scores: SmallVec<[f64; 4]>, sigma: Option<f64> }. None = use history default; Some(s) = override.
  2. New constructor Outcome::scores_with_sigma(scores, sigma) on Outcome. The existing Outcome::scores(I) keeps its signature unchanged and now builds with sigma: None.
  3. New builder method EventBuilder::scores_with_sigma(scores, sigma) on EventBuilder.
  4. History::add_events resolves sigma.unwrap_or(self.score_sigma) when converting an Outcome::Scored to EventKind::Scored.
  5. Mechanical updates at every site that destructures or constructs Outcome::Scored(...) in tuple form. Estimate ~5-10 sites across src/, tests/, examples/, benches/.

What does not ship

  • No change to EventKind::Scored (already per-event).
  • No change to TimeSlice or run_chain.
  • No change to Game::scored standalone API (it still takes score_sigma via GameOptions::score_sigma).
  • No deprecation of HistoryBuilder::score_sigma — the history-wide default is still useful as a common-case fallback.

Design

Outcome enum change

```rust
// src/outcome.rs
use smallvec::SmallVec;

#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(SmallVec<[u32; 4]>),
    Scored {
        scores: SmallVec<[f64; 4]>,
        /// Per-event noise override. `None` means inherit
        /// `HistoryBuilder::score_sigma`. Must be `> 0.0` if `Some`.
        sigma: Option<f64>,
    },
}
```

The variant shape changes from tuple to struct. Pattern matches that extract the scores switch from Outcome::Scored(scores) to Outcome::Scored { scores, .. } (or { scores, sigma } where the sigma is needed).
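A minimal sketch of the migration on a downstream consumer, assuming a simplified Outcome that uses Vec instead of SmallVec so it needs only std. The helper names scored_team_count and overridden_sigmas are hypothetical; the point is the two pattern forms, { scores, .. } when only the scores matter and { sigma: Some(s), .. } when the override itself is needed.

```rust
/// Simplified stand-in for the crate's `Outcome` (Vec replaces SmallVec).
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Before: `Outcome::Scored(scores) => Some(scores.len())`.
/// After: bind only `scores` and ignore the remaining fields with `..`.
pub fn scored_team_count(outcome: &Outcome) -> Option<usize> {
    match outcome {
        Outcome::Scored { scores, .. } => Some(scores.len()),
        Outcome::Ranked(_) => None,
    }
}

/// Hypothetical helper: collect explicit per-event overrides, binding
/// `sigma` only where it is actually needed.
pub fn overridden_sigmas(outcomes: &[Outcome]) -> Vec<f64> {
    outcomes
        .iter()
        .filter_map(|o| match o {
            Outcome::Scored { sigma: Some(s), .. } => Some(*s),
            _ => None,
        })
        .collect()
}
```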

Outcome constructors

```rust
impl Outcome {
    /// Per-team continuous scores; uses HistoryBuilder::score_sigma default.
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: None,
        }
    }

    /// Per-team scores with explicit per-event noise override.
    ///
    /// `sigma` must be > 0.0; debug_assert.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(
        scores: I,
        sigma: f64,
    ) -> Self {
        debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {sigma})");
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: Some(sigma),
        }
    }
}
```

Outcome::scores(I) keeps the existing function signature exactly — its only behavioural change is the internal struct construction. The existing as_scores(), team_count(), etc. accessors keep their public signatures (they return Option<&[f64]> and usize); their internal pattern matches update mechanically.
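For the accessors, the change is confined to the match arms. A sketch, assuming each accessor is a plain match over the variant (the spec lists the arms at src/outcome.rs:51, :58 and :64 but not their bodies) and again substituting Vec for SmallVec:

```rust
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    /// Number of teams, whichever variant this is.
    pub fn team_count(&self) -> usize {
        match self {
            Outcome::Ranked(ranks) => ranks.len(),
            // Only the pattern changes: `Scored(scores)` -> `Scored { scores, .. }`.
            Outcome::Scored { scores, .. } => scores.len(),
        }
    }

    /// Public signature unchanged: still `Option<&[f64]>`.
    pub fn as_scores(&self) -> Option<&[f64]> {
        match self {
            Outcome::Scored { scores, .. } => Some(scores.as_slice()),
            Outcome::Ranked(_) => None,
        }
    }

    /// The non-scored arm now uses `Scored { .. }` instead of `Scored(_)`.
    pub fn as_ranks(&self) -> Option<&[u32]> {
        match self {
            Outcome::Ranked(ranks) => Some(ranks.as_slice()),
            Outcome::Scored { .. } => None,
        }
    }
}
```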

EventBuilder method

```rust
impl<'h, T, D, O, K> EventBuilder<'h, T, D, O, K>
where
    T: Time,
    D: Drift<T>,
    O: Observer<T>,
    K: Eq + std::hash::Hash + Clone,
{
    /// Per-team scores; uses HistoryBuilder::score_sigma default.
    pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
        self.event.outcome = crate::Outcome::scores(scores);
        self
    }

    /// Per-team scores with explicit per-event noise override.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(
        mut self,
        scores: I,
        sigma: f64,
    ) -> Self {
        self.event.outcome = crate::Outcome::scores_with_sigma(scores, sigma);
        self
    }
}
```

The existing .scores(...) builder method stays. Its body needs at most a trivial change: as shown, it delegates to Outcome::scores(I), whose signature is unchanged. .scores_with_sigma(...) is the new method.
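The delegation can be sketched with a non-generic stand-in for EventBuilder that holds only the outcome under construction. The stand-in type, its Default impl and the outcome() getter are hypothetical, and Outcome derives PartialEq here only so the two paths can be compared in a test (the spec derives Clone and Debug).

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Outcome::Scored { scores: scores.into_iter().collect(), sigma: None }
    }

    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(scores: I, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {sigma})");
        Outcome::Scored { scores: scores.into_iter().collect(), sigma: Some(sigma) }
    }
}

/// Hypothetical stand-in: only the outcome field that these methods touch.
pub struct EventBuilder {
    outcome: Outcome,
}

impl Default for EventBuilder {
    fn default() -> Self {
        // Placeholder until one of the `scores*` methods is called.
        EventBuilder { outcome: Outcome::Ranked(Vec::new()) }
    }
}

impl EventBuilder {
    /// Forwards to the constructor; the builder adds no logic of its own.
    pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
        self.outcome = Outcome::scores(scores);
        self
    }

    /// Same forwarding for the override path.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(mut self, scores: I, sigma: f64) -> Self {
        self.outcome = Outcome::scores_with_sigma(scores, sigma);
        self
    }

    pub fn outcome(&self) -> &Outcome {
        &self.outcome
    }
}
```

Because the builder method only forwards to the constructor, the debug_assert! lives in one place and both entry points share it.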

Sigma resolution

In History::add_events at src/history.rs:735:

```rust
crate::Outcome::Scored { scores, sigma } => {
    let resolved = sigma.unwrap_or(self.score_sigma);
    debug_assert!(
        resolved > 0.0,
        "resolved score_sigma must be > 0.0 (got {resolved})"
    );
    kinds.push(EventKind::Scored {
        score_sigma: resolved,
    });
    scores.to_vec()
}
```

Resolution at ingest time means downstream code keeps a plain f64. No Option propagates further.
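A sketch of the resolution rule applied across a batch of mixed outcomes, assuming a reduced EventKind with a hypothetical payload-free Ranked variant and a free function ingest in place of History::add_events. The real arm also forwards scores.to_vec(); that part is omitted here.

```rust
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Reduced downstream kind: `score_sigma` is a plain `f64` per event.
#[derive(Clone, Debug, PartialEq)]
pub enum EventKind {
    Ranked,
    Scored { score_sigma: f64 },
}

/// Hypothetical reduction of the `add_events` conversion: each outcome's
/// sigma is resolved exactly once, so no `Option` leaves this function.
pub fn ingest(outcomes: &[Outcome], history_score_sigma: f64) -> Vec<EventKind> {
    outcomes
        .iter()
        .map(|outcome| match outcome {
            Outcome::Ranked(_) => EventKind::Ranked,
            Outcome::Scored { sigma, .. } => {
                // Override wins; otherwise inherit the history-wide default.
                let resolved = sigma.unwrap_or(history_score_sigma);
                debug_assert!(resolved > 0.0, "resolved score_sigma must be > 0.0 (got {resolved})");
                EventKind::Scored { score_sigma: resolved }
            }
        })
        .collect()
}
```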

Validation

  • Outcome::scores_with_sigma(_, sigma) debug-asserts sigma > 0.0 at construction.
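  • History::add_events debug-asserts that the resolved sigma is > 0.0. This covers both the inherited and the overridden path, including an override written as a struct literal (Outcome::Scored { scores, sigma: Some(s) }), which skips the constructor's check because enum variant fields are always public.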
  • HistoryBuilder::score_sigma(s) keeps its existing positive assertion.

HistoryBuilder::score_sigma(s) already asserts a positive value and the History-level default (1.0) is positive, so an event with sigma = None always passes the resolved-sigma assertion trivially.
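The check both assertions apply is sigma > 0.0. A small sketch of what it accepts, using hypothetical helpers passes_sigma_check and resolved_sigma over a Vec-based Outcome: a struct-literal override bypasses the constructor but is still caught once resolved, NaN is rejected because every comparison with NaN is false, and f64::INFINITY is accepted since the predicate has no upper bound.

```rust
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Hypothetical helper: the predicate both `debug_assert!`s use.
pub fn passes_sigma_check(sigma: f64) -> bool {
    // Comparisons with NaN are always false, so NaN fails; +inf passes.
    sigma > 0.0
}

/// Hypothetical helper: the value the ingest-time assertion would check.
pub fn resolved_sigma(outcome: &Outcome, history_default: f64) -> Option<f64> {
    match outcome {
        Outcome::Scored { sigma, .. } => Some(sigma.unwrap_or(history_default)),
        Outcome::Ranked(_) => None,
    }
}
```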

Pattern-match update inventory

Every site that destructures or constructs Outcome::Scored(_) in tuple form needs updating. Known sites:

  • src/outcome.rs: the team_count(), as_scores(), as_ranks() match arms (src/outcome.rs:51, :58, :64).
  • src/history.rs:735: the conversion arm (this is also where the resolution rule lands).
  • Any test in src/outcome.rs test mod that constructs Outcome::Scored(...) literally.
  • Any callsite in src/ (including src/game.rs), tests/, examples/ or benches/ that pattern-matches or constructs the variant.

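The compiler surfaces every site, provided all targets are built: plain cargo build compiles only the library and binaries, so use cargo build --all-targets (the cargo clippy --all-targets gate below also covers tests/, examples/ and benches/). Locating the sites is then mechanical.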

Testing strategy

Regression net

Existing 100 lib + 27 integration tests are the bit-equal regression net for the sigma = None path. Every existing test that uses Outcome::scores(...) or EventBuilder::scores(...) should continue to produce identical posteriors — the resolved sigma equals the history default (which equals what the hardcoded path produced).

New tests

Three additions in the src/history.rs test module:

  1. outcome_scores_default_sigma_uses_history_default — build a History with score_sigma(0.5), add a 2-team event via Outcome::scores([3.0, 1.0]) (no override), capture posteriors. Build a second History identical except using Outcome::scores_with_sigma([3.0, 1.0], 0.5) (override matches default). Assert posteriors are bit-equal across the two paths.

  2. outcome_scores_with_sigma_overrides_history_default — build a History with score_sigma(0.5), add an event via Outcome::scores_with_sigma([3.0, 1.0], 2.0). Build a second History with score_sigma(2.0) and add the same event via Outcome::scores([3.0, 1.0]). Assert posteriors are bit-equal. Then build a third History with score_sigma(0.5) and add via Outcome::scores([3.0, 1.0]) (no override). Assert this third one's posteriors differ measurably from the override path (max diff > 1e-6) — proves the override actually changes inference.

  3. event_builder_scores_with_sigma_threading — same shape as #2 but constructed via the fluent builder h.event(0).team(["a"]).team(["b"]).scores_with_sigma([3.0, 1.0], 2.0).commit(). Proves the builder method works end-to-end.
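The assertions in tests #1 and #2 can be sketched against a toy stand-in for inference. Everything here is hypothetical: toy_posterior_mean is a one-dimensional Gaussian conjugate update used only so the helpers have something deterministic to compare; it is not the crate's scored-factor update. The helper names run, bit_equal and max_abs_diff are illustrative, and the real tests compare full posteriors read back from History.

```rust
/// Toy stand-in for "posterior after one scored event": conjugate Gaussian
/// update of a mean given an observed margin with noise std-dev `sigma`.
pub fn toy_posterior_mean(prior_mu: f64, prior_var: f64, margin: f64, sigma: f64) -> f64 {
    let prior_precision = 1.0 / prior_var;
    let obs_precision = 1.0 / (sigma * sigma);
    (prior_mu * prior_precision + margin * obs_precision) / (prior_precision + obs_precision)
}

/// Mirrors the ingest rule: the override wins, otherwise the history default.
pub fn run(history_sigma: f64, override_sigma: Option<f64>) -> Vec<f64> {
    let sigma = override_sigma.unwrap_or(history_sigma);
    // Scores [3.0, 1.0] give a margin of 2.0; prior N(0, 1).
    vec![toy_posterior_mean(0.0, 1.0, 3.0 - 1.0, sigma)]
}

/// Bit-for-bit equality, stricter than `==` (e.g. it distinguishes 0.0 and -0.0).
pub fn bit_equal(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}

/// Largest element-wise absolute difference.
pub fn max_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f64::max)
}
```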

Pattern-match update test impact

Existing tests in src/outcome.rs that construct Outcome::Scored(...) literally need updating to the struct shape. Mechanical change; no new tests required.

Verification gates

```shell
cargo +nightly fmt
cargo clippy --all-targets -- -D warnings
cargo test --lib
cargo test
```

Test count grows by 3.

Risks

  • Public API breaking change. Outcome::Scored variant shape changes from tuple to struct. Any downstream consumer pattern-matching on the tuple form breaks. In a 0.1.x crate this is acceptable; flag it in the commit message.
  • Mechanical breadth. The pattern-match updates touch several files. They're all caught by the compiler so the risk is low, but the diff will look bigger than the actual logical change.
  • Two ways to do the same thing. Outcome::scores_with_sigma(..) and EventBuilder::scores_with_sigma(..) both produce the same outcome. This is intentional — the constructor is the underlying primitive; the builder method is the ergonomic wrapper. Same pattern as the existing Outcome::scores(..) / EventBuilder::scores(..) pair.

Out-of-scope follow-ups

  • Per-event override of other config currently history-wide (p_draw, drift, beta) — same architectural pattern would apply but each is its own design decision.
  • Validation upgrade from debug_assert! to a Result at the Outcome construction boundary.
  • Schedule trait integration with run_chain, Residual schedule, SynergyFactor (still pending from the larger spec).
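For the debug_assert!-to-Result follow-up, one possible shape is a fallible twin of the constructor. Everything named here (try_scores_with_sigma, InvalidScoreSigma) is hypothetical, and Outcome again uses Vec in place of SmallVec:

```rust
use std::fmt;

#[derive(Clone, Debug, PartialEq)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Hypothetical error carrying the rejected sigma.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InvalidScoreSigma(pub f64);

impl fmt::Display for InvalidScoreSigma {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "score_sigma must be > 0.0 (got {})", self.0)
    }
}

impl std::error::Error for InvalidScoreSigma {}

impl Outcome {
    /// Hypothetical fallible constructor: same predicate as the
    /// `debug_assert!`, but enforced in release builds too.
    pub fn try_scores_with_sigma<I: IntoIterator<Item = f64>>(
        scores: I,
        sigma: f64,
    ) -> Result<Self, InvalidScoreSigma> {
        if sigma > 0.0 {
            Ok(Outcome::Scored { scores: scores.into_iter().collect(), sigma: Some(sigma) })
        } else {
            // Also reached for NaN, since `NaN > 0.0` is false.
            Err(InvalidScoreSigma(sigma))
        }
    }
}
```

A struct literal would still bypass this check, so the ingest-time validation in History::add_events would need the same treatment for release builds to be fully covered.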