trueskill-tt/docs/superpowers/specs/2026-05-08-per-event-score-sigma-design.md
logaritmisk 46625d247a docs: spec for per-event score_sigma override
Outcome::Scored becomes a struct variant with an Option<f64> sigma
field. None inherits HistoryBuilder::score_sigma; Some(s) overrides
per event. Resolved at ingest time so EventKind::Scored stays a plain
f64 and TimeSlice/run_chain need zero changes. New constructors
Outcome::scores_with_sigma and EventBuilder::scores_with_sigma cover
the override path; existing scores(..) keeps its signature with
sigma=None internally.

Breaking change to Outcome::Scored variant shape (tuple → struct);
acceptable in 0.1.x. Closes the last item from the T4-MarginFactor
deferred wishlist.
2026-05-08 16:05:27 +02:00

# Per-Event `score_sigma` Override
## Summary
Let users specify a per-event noise override on `Outcome::Scored`.
Today every scored event in a `History` shares the single
`HistoryBuilder::score_sigma` value (default `1.0`); a user who wants
to say "this match was a clean blowout, trust the margin more" or
"this one was a disrupted scrappy game, trust it less" has no way to
do so.
The override is resolved at ingest time and stored as a plain `f64`
on the existing `EventKind::Scored { score_sigma }` payload, so
`TimeSlice` and `run_chain` need zero changes. The work is purely on
the public API surface: `Outcome::Scored` becomes a struct variant
with an `Option<f64> sigma` field; two builder methods on `Outcome`
and `EventBuilder` cover the explicit-override path.
## Background
`Outcome::Scored(SmallVec<[f64; 4]>)` is the public per-team-score
variant (`src/outcome.rs:20`). It's constructed via
`Outcome::scores(I)` (`src/outcome.rs:44`) or
`EventBuilder::scores(I)` (`src/event_builder.rs:79`).
When `History::add_events` ingests a Scored outcome, it always uses
the history-wide default:
```rust
// src/history.rs:735-740
crate::Outcome::Scored(scores) => {
    kinds.push(EventKind::Scored {
        score_sigma: self.score_sigma,
    });
    scores.to_vec()
}
```
The downstream `EventKind::Scored { score_sigma: f64 }`
(`src/time_slice.rs:51`) is already per-event-shaped — every Event
carries its own copy. The constraint is purely at the ingest boundary.
This was flagged as deferred tech debt during the T4-MarginFactor
work: "EventKind::Scored.score_sigma payload is always history-wide
today; per-event override deferred."
## Scope
### What ships
1. `Outcome::Scored` becomes a struct variant:
`Scored { scores: SmallVec<[f64; 4]>, sigma: Option<f64> }`.
`None` = use history default; `Some(s)` = override.
2. New constructor `Outcome::scores_with_sigma(scores, sigma)` on
`Outcome`. Existing `Outcome::scores(I)` keeps the same shape but
builds with `sigma: None`.
3. New builder method `EventBuilder::scores_with_sigma(scores, sigma)`
on `EventBuilder`.
4. `History::add_events` resolves `sigma.unwrap_or(self.score_sigma)`
when converting an `Outcome::Scored` to `EventKind::Scored`.
5. Mechanical pattern-match updates at every site that destructures
`Outcome::Scored(...)` as a tuple. Estimated at roughly 5 to 10 sites across
`src/`, `tests/`, `examples/`, `benches/`.
### What does not ship
- No change to `EventKind::Scored` (already per-event).
- No change to `TimeSlice` or `run_chain`.
- No change to `Game::scored` standalone API
(it still takes `score_sigma` via `GameOptions::score_sigma`).
- No deprecation of `HistoryBuilder::score_sigma` — the history-wide
default is still useful as a common-case fallback.
## Design
### `Outcome` enum change
```rust
// src/outcome.rs
use smallvec::SmallVec;

#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(SmallVec<[u32; 4]>),
    Scored {
        scores: SmallVec<[f64; 4]>,
        /// Per-event noise override. `None` means inherit
        /// `HistoryBuilder::score_sigma`. Must be `> 0.0` if `Some`.
        sigma: Option<f64>,
    },
}
```
The variant shape changes from tuple to struct. Pattern matches that
extract the scores switch from `Outcome::Scored(scores)` to
`Outcome::Scored { scores, .. }` (or `{ scores, sigma }` where the
sigma is needed).
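A minimal sketch of both migrated match forms in downstream code. `total_score` and `describe` are hypothetical consumer functions, not crate APIs. The snippet uses `Vec` in place of `SmallVec` so that it compiles without the `smallvec` crate.

```rust
/// Stand-in for the new `Outcome` shape (`Vec` replaces `SmallVec`).
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Hypothetical consumer that only needs the scores: `..` ignores `sigma`.
pub fn total_score(outcome: &Outcome) -> Option<f64> {
    match outcome {
        // Tuple-variant form was `Outcome::Scored(scores) => ...`.
        Outcome::Scored { scores, .. } => Some(scores.iter().sum()),
        Outcome::Ranked(_) => None,
    }
}

/// Hypothetical consumer that also needs the override: bind both fields.
pub fn describe(outcome: &Outcome) -> String {
    match outcome {
        Outcome::Ranked(ranks) => format!("ranked, {} teams", ranks.len()),
        Outcome::Scored { scores, sigma } => match sigma {
            Some(s) => format!("scored, {} teams, sigma {s}", scores.len()),
            None => format!("scored, {} teams, default sigma", scores.len()),
        },
    }
}
```

Most existing sites only read the scores, so they need nothing beyond the `{ scores, .. }` form.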
### `Outcome` constructors
```rust
impl Outcome {
    /// Per-team continuous scores; uses the `HistoryBuilder::score_sigma` default.
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: None,
        }
    }

    /// Per-team scores with an explicit per-event noise override.
    ///
    /// `sigma` must be `> 0.0`; this is checked with `debug_assert!`,
    /// so the check only runs in debug builds.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(
        scores: I,
        sigma: f64,
    ) -> Self {
        debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {sigma})");
        Self::Scored {
            scores: scores.into_iter().collect(),
            sigma: Some(sigma),
        }
    }
}
```
`Outcome::scores(I)` keeps the existing function signature exactly —
its only behavioural change is the internal struct construction. The
existing `as_scores()`, `team_count()`, etc. accessors keep their
public signatures (they return `Option<&[f64]>` and `usize`); their
internal pattern matches update mechanically.
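A sketch of how the accessor bodies could change while their signatures stay fixed. It assumes `team_count()` is the length of whichever vector the variant holds, and it uses `Vec` in place of `SmallVec`. `as_ranks()` is omitted because the spec does not state its signature.

```rust
/// Stand-in for the new shape, with `Vec` in place of `SmallVec`.
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    /// Public signature unchanged; only the pattern changes.
    pub fn as_scores(&self) -> Option<&[f64]> {
        match self {
            // Tuple-variant form was `Self::Scored(scores) => ...`.
            Self::Scored { scores, .. } => Some(scores.as_slice()),
            Self::Ranked(_) => None,
        }
    }

    /// Public signature unchanged; only the pattern changes.
    pub fn team_count(&self) -> usize {
        match self {
            Self::Ranked(ranks) => ranks.len(),
            Self::Scored { scores, .. } => scores.len(),
        }
    }
}
```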
### `EventBuilder` method
```rust
impl<'h, T, D, O, K> EventBuilder<'h, T, D, O, K>
where
    T: Time,
    D: Drift<T>,
    O: Observer<T>,
    K: Eq + std::hash::Hash + Clone,
{
    /// Per-team scores; uses the `HistoryBuilder::score_sigma` default.
    pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
        self.event.outcome = crate::Outcome::scores(scores);
        self
    }

    /// Per-team scores with an explicit per-event noise override.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(
        mut self,
        scores: I,
        sigma: f64,
    ) -> Self {
        self.event.outcome = crate::Outcome::scores_with_sigma(scores, sigma);
        self
    }
}
```
The existing `.scores(...)` builder method stays with the same signature.
It delegates to `Outcome::scores(I)`, which also keeps its signature, so
its body changes at most trivially. `.scores_with_sigma(...)` is the new method.
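The delegation can be exercised with a simplified, non-generic stand-in. `Event`, `EventBuilder::new`, and `build` are hypothetical and only mimic the shape of the real builder: `build` plays the role of `commit()`, and the history and generic parameters are dropped. `Vec` replaces `SmallVec`.

```rust
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

impl Outcome {
    pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
        Self::Scored { scores: scores.into_iter().collect(), sigma: None }
    }

    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(scores: I, sigma: f64) -> Self {
        debug_assert!(sigma > 0.0, "score_sigma must be > 0.0 (got {sigma})");
        Self::Scored { scores: scores.into_iter().collect(), sigma: Some(sigma) }
    }
}

/// Hypothetical, non-generic stand-in for the event being built.
#[derive(Debug)]
pub struct Event {
    pub outcome: Outcome,
}

/// Hypothetical, non-generic stand-in for `EventBuilder<'h, T, D, O, K>`.
pub struct EventBuilder {
    event: Event,
}

impl EventBuilder {
    pub fn new() -> Self {
        // Placeholder outcome until a `scores*` method is called.
        Self { event: Event { outcome: Outcome::Ranked(Vec::new()) } }
    }

    /// Delegates to `Outcome::scores`, so `sigma` stays `None`.
    pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
        self.event.outcome = Outcome::scores(scores);
        self
    }

    /// Delegates to `Outcome::scores_with_sigma`, so `sigma` becomes `Some`.
    pub fn scores_with_sigma<I: IntoIterator<Item = f64>>(mut self, scores: I, sigma: f64) -> Self {
        self.event.outcome = Outcome::scores_with_sigma(scores, sigma);
        self
    }

    /// Hypothetical terminal method standing in for `commit()`.
    pub fn build(self) -> Event {
        self.event
    }
}
```

Both methods overwrite `self.event.outcome`, so the last `scores*` call in a chain wins. For example, `.scores(..).scores_with_sigma(.., 0.5)` leaves `sigma: Some(0.5)`.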
### Sigma resolution
In `History::add_events` at `src/history.rs:735`:
```rust
crate::Outcome::Scored { scores, sigma } => {
    let resolved = sigma.unwrap_or(self.score_sigma);
    debug_assert!(
        resolved > 0.0,
        "resolved score_sigma must be > 0.0 (got {resolved})"
    );
    kinds.push(EventKind::Scored {
        score_sigma: resolved,
    });
    scores.to_vec()
}
```
Resolution at ingest time means downstream code keeps a plain `f64`.
No `Option` propagates further.
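The resolution rule can be isolated as a pure function for a quick check. `resolve_scored` is a hypothetical free-function version of the `Scored` arm, and `EventKind` is a reduced stand-in that keeps only the `Scored` payload. Ranked outcomes return `None` here because this spec leaves their handling unchanged.

```rust
#[derive(Clone, Debug)]
pub enum Outcome {
    Ranked(Vec<u32>),
    Scored { scores: Vec<f64>, sigma: Option<f64> },
}

/// Reduced stand-in for the internal per-event kind; other variants elided.
#[derive(Clone, Debug, PartialEq)]
pub enum EventKind {
    Scored { score_sigma: f64 },
}

/// Hypothetical version of the `Scored` arm in `History::add_events`.
pub fn resolve_scored(outcome: &Outcome, history_sigma: f64) -> Option<(EventKind, Vec<f64>)> {
    match outcome {
        Outcome::Scored { scores, sigma } => {
            // An override wins; otherwise inherit the history-wide default.
            let resolved = sigma.unwrap_or(history_sigma);
            debug_assert!(resolved > 0.0, "resolved score_sigma must be > 0.0 (got {resolved})");
            Some((EventKind::Scored { score_sigma: resolved }, scores.clone()))
        }
        Outcome::Ranked(_) => None,
    }
}
```

With `sigma: None`, `unwrap_or` returns the history default unchanged. The stored `f64` is therefore bit-identical to what the pre-change code pushed, which is why the existing tests can serve as a bit-equal regression net.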
### Validation
- `Outcome::scores_with_sigma(_, sigma)` debug-asserts `sigma > 0.0`
at construction.
- `History::add_events` debug-asserts the resolved sigma is `> 0.0`
(catches both inherited and overridden paths).
- `HistoryBuilder::score_sigma(s)` keeps its existing positive
assertion.
The default sigma at the History level (`1.0`) is positive, so an
event with `sigma = None` against a default-built History always
passes the resolved-sigma assertion trivially.
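All three checks rely on the same `> 0.0` predicate, isolated below in a hypothetical helper `is_valid_score_sigma` (not a crate function) to show which edge cases it rejects. `debug_assert!` is compiled out when `debug_assertions` is off, which is the default for release builds, so none of these checks run there.

```rust
/// Hypothetical helper: the predicate used by the `debug_assert!`s.
pub fn is_valid_score_sigma(sigma: f64) -> bool {
    // Comparisons involving NaN are always false, so NaN is rejected.
    // -0.0 compares equal to 0.0, so it is rejected as well.
    // Positive infinity does satisfy the predicate.
    sigma > 0.0
}
```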
### Pattern-match update inventory
Every site that destructures `Outcome::Scored(_)` as a tuple needs
updating. Known sites:
- `src/outcome.rs`: the `team_count()`, `as_scores()`, `as_ranks()`
match arms (`src/outcome.rs:51`, `:58`, `:64`).
- `src/history.rs:735`: the conversion arm (this is also where the
resolution rule lands).
- Any test in `src/outcome.rs` test mod that constructs
`Outcome::Scored(...)` literally.
- Any other callsite in `src/` (including `src/game.rs`), `tests/`,
`examples/`, or `benches/` that pattern-matches the variant.
The compiler surfaces every site at `cargo build`. Locating them is
mechanical.
## Testing strategy
### Regression net
Existing 100 lib + 27 integration tests are the bit-equal regression
net for the `sigma = None` path. Every existing test that uses
`Outcome::scores(...)` or `EventBuilder::scores(...)` should
continue to produce identical posteriors — the resolved sigma equals
the history default (which equals what the hardcoded path produced).
### New tests
Three additions in the `src/history.rs` test module:
1. **`outcome_scores_default_sigma_uses_history_default`** — build a
History with `score_sigma(0.5)`, add a 2-team event via
`Outcome::scores([3.0, 1.0])` (no override), capture posteriors.
Build a second History identical except using
`Outcome::scores_with_sigma([3.0, 1.0], 0.5)` (override matches
default). Assert posteriors are bit-equal across the two paths.
2. **`outcome_scores_with_sigma_overrides_history_default`** — build a
History with `score_sigma(0.5)`, add an event via
`Outcome::scores_with_sigma([3.0, 1.0], 2.0)`. Build a second
History with `score_sigma(2.0)` and add the same event via
`Outcome::scores([3.0, 1.0])`. Assert posteriors are bit-equal.
Then build a third History with `score_sigma(0.5)` and add via
`Outcome::scores([3.0, 1.0])` (no override). Assert this third
one's posteriors differ measurably from the override path
(max diff > 1e-6) — proves the override actually changes
inference.
3. **`event_builder_scores_with_sigma_threading`** — same shape as
#2 but constructed via the fluent builder
`h.event(0).team(["a"]).team(["b"]).scores_with_sigma([3.0, 1.0], 2.0).commit()`.
Proves the builder method works end-to-end.
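Tests 1 and 2 assert bit-equality, while the third history in test 2 needs a tolerance check. The pair of test helpers below covers both. `max_abs_diff` and `bit_equal` are hypothetical names, and the sketch assumes posteriors are collected into `f64` slices in a fixed player order (for example, each player's mean followed by its sigma).

```rust
/// Hypothetical test helper: largest absolute elementwise difference.
/// `f64::max` ignores NaN, so a NaN difference contributes 0.0 here;
/// use `bit_equal` for equality assertions.
pub fn max_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "posterior vectors must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f64::max)
}

/// Hypothetical test helper: compares raw IEEE-754 bits instead of using `==`,
/// so 0.0 vs -0.0 (or differing NaN payloads) count as different.
pub fn bit_equal(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```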
### Pattern-match update test impact
Existing tests in `src/outcome.rs` that construct
`Outcome::Scored(...)` literally need updating to the struct shape.
Mechanical change; no new tests required.
## Verification gates
```bash
cargo +nightly fmt
cargo clippy --all-targets -- -D warnings
cargo test --lib
cargo test
```
Test count grows by 3.
## Risks
- **Public API breaking change.** `Outcome::Scored` variant shape
changes from tuple to struct. Any downstream consumer
pattern-matching on the tuple form breaks. In a 0.1.x crate this
is acceptable; flag it in the commit message.
- **Mechanical breadth.** The pattern-match updates touch several
files. They're all caught by the compiler so the risk is low, but
the diff will look bigger than the actual logical change.
- **Two ways to do the same thing.** `Outcome::scores_with_sigma(..)`
and `EventBuilder::scores_with_sigma(..)` both produce the same
outcome. This is intentional — the constructor is the underlying
primitive; the builder method is the ergonomic wrapper. Same
pattern as the existing `Outcome::scores(..)` /
`EventBuilder::scores(..)` pair.
## Out-of-scope follow-ups
- Per-event override of other config currently history-wide
(`p_draw`, drift, beta) — same architectural pattern would apply
but each is its own design decision.
- Validation upgrade from `debug_assert!` to a `Result` at the
Outcome construction boundary.
- Schedule trait integration with `run_chain`, `Residual` schedule,
`SynergyFactor` (still pending from the larger spec).