Adds soft Gaussian-observation evidence on the per-pair diff variable,
enabling continuous score margins as a richer alternative to ranks.
Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under
`#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to
`Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.
Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that
closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new
`likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through
`TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.
Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing
goldens byte-identical. Bench: `benches/scored.rs` baseline ~960µs for
60 events × 20-player pool with default convergence.
Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T4 — MarginFactor + Outcome::Scored Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Add a MarginFactor (Gaussian observation factor on a diff variable) and an Outcome::Scored(scores) variant, so users can supply continuous per-team scores instead of just ranks. Per-pair score margins become soft EP evidence about the latent performance diff.
Architecture:
- Sort scored teams by score descending; for each adjacent pair compute m_obs = score_higher − score_lower ≥ 0. Per pair: RankDiffFactor writes diff = team_a − team_b, then a MarginFactor multiplies in the Gaussian observation N(m_obs, score_sigma²). This replaces the TruncFactor for scored outcomes; ranked outcomes are unchanged.
- A new internal enum DiffFactor { Trunc(TruncFactor), Margin(MarginFactor) } lets Game::likelihoods keep its single hand-rolled forward/backward sweep loop while dispatching the per-diff factor by enum.
- score_sigma is configurable on GameOptions and HistoryBuilder (default 1.0). Outcome is already #[non_exhaustive], so adding Scored is non-breaking for downstream match arms.
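The first architecture bullet can be sketched in isolation. The snippet below is only an illustration: `adjacent_margins` is a hypothetical helper, not a function in the crate, and it treats incomparable scores (NaN) as equal, the same fallback the plan's sort uses later.

```rust
use std::cmp::Ordering;

/// Returns the team indices sorted by score (highest first) and the observed
/// margin for each adjacent pair in that order.
/// Hypothetical helper for illustration; not part of the crate.
fn adjacent_margins(scores: &[f64]) -> (Vec<usize>, Vec<f64>) {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Descending sort; incomparable values (NaN) are treated as equal.
    order.sort_by(|&i, &j| scores[j].partial_cmp(&scores[i]).unwrap_or(Ordering::Equal));
    // Each margin is score_higher - score_lower, so it is never negative.
    let margins = order.windows(2).map(|w| scores[w[0]] - scores[w[1]]).collect();
    (order, margins)
}

fn main() {
    let (order, margins) = adjacent_margins(&[3.0, 1.0, 2.0, 0.0]);
    println!("order = {order:?}, margins = {margins:?}");
}
```

Each entry of `margins` would become the m_obs of one MarginFactor on the corresponding diff variable.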
Tech Stack: Rust 2024, smallvec, rayon (already in tree). No new crate dependencies.
File Structure
| Path | Status | Responsibility |
|---|---|---|
| src/factor/margin.rs | create | MarginFactor struct + Factor impl + cavity-cached evidence + unit tests |
| src/factor/mod.rs | modify | pub mod margin;, BuiltinFactor::Margin(...) variant + dispatch arms |
| src/factors.rs | modify | re-export MarginFactor |
| src/outcome.rs | modify | Outcome::Scored(SmallVec<[f64; 4]>) variant, scores() ctor, as_scores() accessor, team_count arm |
| src/game.rs | modify | pub(crate) enum DiffFactor, scored path in likelihoods, Game::scored() ctor, GameOptions::score_sigma |
| src/event_builder.rs | modify | .scores([...]) builder method |
| src/history.rs | modify | match Outcome::Scored in add_events; HistoryBuilder::score_sigma; new internal add_events_scored_with_prior (or extra arg) |
| tests/scored.rs | create | end-to-end Scored integration tests |
| examples/scored.rs | create | worked example using Outcome::Scored |
| benches/scored.rs | create | criterion benchmark mirroring batch.rs with scored events |
| CLAUDE.md | modify | mark T4-MarginFactor complete in the architecture notes |
Background — math the implementer needs
For a diff variable D with current marginal D_marg, the MarginFactor models an observation m_obs ~ N(D, σ²) where σ = score_sigma. Standard EP for a Gaussian-likelihood factor:
- Cavity: D_cav = D_marg / msg (where msg is this factor's stored outgoing message; init N_INF so the first cavity = the current marginal).
- Tilted distribution: D_cav · N(m_obs, σ²), a product of two Gaussians; closed-form, no approximation needed (so it converges in one propagation).
- New marginal: the tilted distribution.
- New outgoing message: new_msg = new_marginal / D_cav. Because the tilted distribution is exact, new_msg = N(m_obs, σ²), which depends only on m_obs and σ and not on the cavity.
- Cavity evidence: Z_cav = pdf(m_obs; D_cav.mu(), sqrt(D_cav.sigma()² + σ²)) (the marginal likelihood of m_obs under the cavity). Cache on first propagate, identical to TruncFactor's pattern. log_evidence = Z_cav.ln().
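Written out, the bullets above are the standard product-of-Gaussians identities. The cavity is denoted N(μ_c, σ_c²) here; that notation is introduced for illustration and does not appear in the plan.

```latex
\begin{aligned}
\pi_{\text{new}} &= \frac{1}{\sigma_c^{2}} + \frac{1}{\sigma^{2}}, &
\tau_{\text{new}} &= \frac{\mu_c}{\sigma_c^{2}} + \frac{m_{\text{obs}}}{\sigma^{2}}, \\
\mu_{\text{new}} &= \frac{\tau_{\text{new}}}{\pi_{\text{new}}}, &
\sigma_{\text{new}} &= \pi_{\text{new}}^{-1/2}, \\
Z_{\text{cav}} &= \int \mathcal{N}(d;\,\mu_c,\,\sigma_c^{2})\,
                  \mathcal{N}(m_{\text{obs}};\,d,\,\sigma^{2})\,\mathrm{d}d
                = \mathcal{N}\!\left(m_{\text{obs}};\,\mu_c,\,\sigma_c^{2}+\sigma^{2}\right).
\end{aligned}
```

The first row is the natural-parameter addition performed by the multiplication, and the last row is the value cached as the cavity evidence.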
Practical consequence: MarginFactor::propagate returns a non-zero delta on its first call (because msg jumps from N_INF to N(m_obs, σ²)) and exactly zero afterwards, since new_msg is a constant.
A Gaussian with mean m and standard deviation σ is constructed via Gaussian::from_ms(m, σ); the second argument is the standard deviation, not the variance. Multiplication adds nat-params (pi += other.pi; tau += other.tau); division subtracts them. The pdf(x, mu, sigma) helper already exists in lib.rs (private, but importable as crate::pdf).
Concrete numerical check for tests: With a cavity of mean 0 and standard deviation 6, and observation m_obs=5, σ=1:
- D_cav.pi = 1/36 ≈ 0.027778, D_cav.tau = 0.
- New marginal: pi = 0.027778 + 1 = 1.027778, tau = 0 + 5 = 5. So mu = 5 / 1.027778 ≈ 4.864865, sigma = 1/sqrt(1.027778) ≈ 0.986394.
- Z_cav = pdf(5, 0, sqrt(36 + 1)) = pdf(5, 0, sqrt(37)) = exp(−25/74) / sqrt(2π·37) ≈ 0.046783. So log_evidence ≈ -3.0622.
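The check can also be reproduced without the crate. The sketch below is self-contained and makes several assumptions: `Nat` is a simplified stand-in for the crate's Gaussian, `margin_step` is a hypothetical free function rather than the real Factor impl, and `pdf` is redefined locally as the ordinary normal density.

```rust
use std::f64::consts::PI;

/// Gaussian in natural parameters: pi = 1/sigma^2, tau = mu/sigma^2.
/// Simplified stand-in for the crate's `Gaussian` (assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Nat {
    pi: f64,
    tau: f64,
}

impl Nat {
    fn from_ms(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Nat { pi, tau: mu * pi }
    }
    fn mu(self) -> f64 {
        self.tau / self.pi
    }
    fn sigma(self) -> f64 {
        1.0 / self.pi.sqrt()
    }
    /// Product of two Gaussians: natural parameters add.
    fn mul(self, o: Nat) -> Nat {
        Nat { pi: self.pi + o.pi, tau: self.tau + o.tau }
    }
    /// Quotient of two Gaussians: natural parameters subtract.
    fn div(self, o: Nat) -> Nat {
        Nat { pi: self.pi - o.pi, tau: self.tau - o.tau }
    }
}

/// Density of a normal with mean `mu` and standard deviation `sigma` at `x`.
fn pdf(x: f64, mu: f64, sigma: f64) -> f64 {
    let z = (x - mu) / sigma;
    (-0.5 * z * z).exp() / (sigma * (2.0 * PI).sqrt())
}

/// One update of the margin observation (hypothetical helper).
/// Returns (new marginal, new outgoing message, cavity evidence).
fn margin_step(marginal: Nat, old_msg: Nat, m_obs: f64, sigma: f64) -> (Nat, Nat, f64) {
    let cavity = marginal.div(old_msg);
    let z_cav = pdf(m_obs, cavity.mu(), (cavity.sigma().powi(2) + sigma * sigma).sqrt());
    let new_msg = Nat::from_ms(m_obs, sigma);
    (cavity.mul(new_msg), new_msg, z_cav)
}

fn main() {
    let uninformative = Nat { pi: 0.0, tau: 0.0 }; // plays the role of N_INF
    let prior = Nat::from_ms(0.0, 6.0);
    let (m1, msg1, z) = margin_step(prior, uninformative, 5.0, 1.0);
    println!("mu = {:.6}, sigma = {:.6}, Z_cav = {:.6}", m1.mu(), m1.sigma(), z);

    // Second pass: dividing out the stored message recovers the same cavity
    // (up to rounding), so the outgoing message does not change.
    let (m2, msg2, _) = margin_step(m1, msg1, 5.0, 1.0);
    println!("message unchanged: {}", msg2 == msg1);
    println!("marginal shift in mu: {:e}", (m2.mu() - m1.mu()).abs());
}
```

The second call illustrates the one-step convergence noted earlier: the message is rebuilt from m_obs and σ alone, so only the first propagation moves it.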
Task 1: MarginFactor core (file + struct + Factor impl + unit tests)
Files:
- Create: src/factor/margin.rs
- Modify: src/factor/mod.rs:100-102 (add pub mod margin; next to the existing pub mod lines)
- Step 1: Add the module declaration so the new file compiles
In src/factor/mod.rs, find the existing block:
pub mod rank_diff;
pub mod team_sum;
pub mod trunc;
Replace with:
pub mod margin;
pub mod rank_diff;
pub mod team_sum;
pub mod trunc;
- Step 2: Create src/factor/margin.rs with the implementation and its unit tests
use crate::{
N_INF, cdf, pdf,
factor::{Factor, VarId, VarStore},
gaussian::Gaussian,
};
/// Gaussian observation factor on a diff variable.
///
/// Encodes the soft evidence `m_obs ~ N(diff, sigma²)`. The outgoing message
/// to `diff` is the constant `N(m_obs, sigma²)`, so this factor converges in a
/// single propagation: subsequent calls return a zero delta.
#[derive(Debug)]
pub struct MarginFactor {
pub diff: VarId,
pub m_obs: f64,
pub sigma: f64,
pub(crate) msg: Gaussian,
pub(crate) evidence_cached: Option<f64>,
}
impl MarginFactor {
pub fn new(diff: VarId, m_obs: f64, sigma: f64) -> Self {
debug_assert!(sigma > 0.0, "score sigma must be positive");
Self {
diff,
m_obs,
sigma,
msg: N_INF,
evidence_cached: None,
}
}
}
impl Factor for MarginFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
let marginal = vars.get(self.diff);
let cavity = marginal / self.msg;
if self.evidence_cached.is_none() {
self.evidence_cached = Some(cavity_evidence(cavity, self.m_obs, self.sigma));
}
let new_msg = Gaussian::from_ms(self.m_obs, self.sigma);
let new_marginal = cavity * new_msg;
let old_msg = self.msg;
self.msg = new_msg;
vars.set(self.diff, new_marginal);
old_msg.delta(new_msg)
}
fn log_evidence(&self, _vars: &VarStore) -> f64 {
self.evidence_cached.unwrap_or(1.0).ln()
}
}
fn cavity_evidence(cavity: Gaussian, m_obs: f64, sigma: f64) -> f64 {
let combined_sigma = (cavity.sigma().powi(2) + sigma.powi(2)).sqrt();
pdf(m_obs, cavity.mu(), combined_sigma)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn first_propagate_writes_tilted_marginal() {
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let mut f = MarginFactor::new(diff, 5.0, 1.0);
f.propagate(&mut vars);
let result = vars.get(diff);
// pi = 1/36 + 1 ≈ 1.027778; tau = 0 + 5 = 5
// mu = 5 / 1.027778 ≈ 4.864865; sigma = 1/sqrt(1.027778) ≈ 0.986394
assert!((result.mu() - 4.864864864864865).abs() < 1e-12);
assert!((result.sigma() - 0.986393923832144).abs() < 1e-12);
}
#[test]
fn converges_in_one_step() {
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let mut f = MarginFactor::new(diff, 5.0, 1.0);
f.propagate(&mut vars);
let (dmu, dsig) = f.propagate(&mut vars);
assert!(dmu < 1e-12, "expected ~0 delta on second propagate, got {dmu}");
assert!(dsig < 1e-12);
}
#[test]
fn evidence_cached_on_first_propagate() {
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let mut f = MarginFactor::new(diff, 5.0, 1.0);
assert!(f.evidence_cached.is_none());
f.propagate(&mut vars);
let z = f.evidence_cached.unwrap();
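// Closed form: pdf(5, 0, sqrt(37)) = exp(-25/74) / sqrt(2π·37) ≈ 0.046783
let expected = (-25.0_f64 / 74.0).exp() / (2.0 * std::f64::consts::PI * 37.0).sqrt();
assert!((z - expected).abs() < 1e-10);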
// Subsequent propagations don't change it.
f.propagate(&mut vars);
assert_eq!(f.evidence_cached.unwrap(), z);
}
#[test]
fn log_evidence_matches_cached_ln() {
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let mut f = MarginFactor::new(diff, 5.0, 1.0);
f.propagate(&mut vars);
let logz = f.log_evidence(&vars);
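// ln pdf(5, 0, sqrt(37)) = -25/74 - ln(2π·37)/2 ≈ -3.0622
let expected = -25.0_f64 / 74.0 - 0.5 * (2.0 * std::f64::consts::PI * 37.0).ln();
assert!((logz - expected).abs() < 1e-10);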
}
// Silence unused-import warning for cdf until/if a tie-band variant is added.
#[allow(dead_code)]
fn _cdf_smoke() -> f64 {
cdf(0.0, 0.0, 1.0)
}
}
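Note: cdf is referenced only by the test-only _cdf_smoke helper, which keeps parity with trunc.rs style and reserves the spot if a tie-band MarginFactor variant gets added later. Non-test builds therefore see cdf as an unused import, which the clippy -D warnings run in Step 4 rejects. The simplest fix is to remove cdf from the import list and delete _cdf_smoke.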
- Step 3: Run the new tests to verify they pass (Step 2 adds the implementation together with its tests, so there is no failing run to observe first; this step is the guard)
Run: cargo test --lib factor::margin
Expected: 4 passed.
- Step 4: Verify the module compiles cleanly with no warnings
Run: cargo build and cargo clippy --lib -- -D warnings
Expected: no warnings, no errors.
- Step 5: Format and commit
cargo +nightly fmt
git add src/factor/margin.rs src/factor/mod.rs
git commit -m "feat(factor): add MarginFactor for scored-margin EP evidence"
Task 2: Wire MarginFactor into BuiltinFactor enum dispatch
Files:
- Modify: src/factor/mod.rs:76-98 (the BuiltinFactor enum and its Factor impl)
- Modify: src/factors.rs:7-13 (the public re-export list)
- Step 1: Write a failing dispatch test in src/factor/mod.rs
Open src/factor/mod.rs. Inside the existing #[cfg(test)] mod tests { ... } block (around line 105), add:
#[test]
fn builtin_factor_dispatches_to_margin() {
use super::margin::MarginFactor;
let mut vars = VarStore::new();
let diff = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let mut f = BuiltinFactor::Margin(MarginFactor::new(diff, 5.0, 1.0));
f.propagate(&mut vars);
let result = vars.get(diff);
assert!((result.mu() - 4.864864864864865).abs() < 1e-12);
let logz = f.log_evidence(&vars);
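// ln pdf(5, 0, sqrt(37)) = -25/74 - ln(2π·37)/2 ≈ -3.0622
let expected = -25.0_f64 / 74.0 - 0.5 * (2.0 * std::f64::consts::PI * 37.0).ln();
assert!((logz - expected).abs() < 1e-10);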
}
- Step 2: Run the test to verify it fails
Run: cargo test --lib factor::tests::builtin_factor_dispatches_to_margin
Expected: FAIL with no variant named Margin found for enum BuiltinFactor.
- Step 3: Add the enum variant + Factor impl arms
Replace the current BuiltinFactor definition and its Factor impl (currently src/factor/mod.rs:76-98):
/// Enum dispatcher for the built-in factor types.
///
/// Using an enum instead of `Box<dyn Factor>` keeps factor data inline and
/// avoids virtual-call overhead in the hot inference loop.
#[derive(Debug)]
pub enum BuiltinFactor {
TeamSum(team_sum::TeamSumFactor),
RankDiff(rank_diff::RankDiffFactor),
Trunc(trunc::TruncFactor),
Margin(margin::MarginFactor),
}
impl Factor for BuiltinFactor {
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
match self {
Self::TeamSum(f) => f.propagate(vars),
Self::RankDiff(f) => f.propagate(vars),
Self::Trunc(f) => f.propagate(vars),
Self::Margin(f) => f.propagate(vars),
}
}
fn log_evidence(&self, vars: &VarStore) -> f64 {
match self {
Self::Trunc(f) => f.log_evidence(vars),
Self::Margin(f) => f.log_evidence(vars),
_ => 0.0,
}
}
}
- Step 4: Re-export MarginFactor from src/factors.rs
Replace the body of src/factors.rs (lines 7-13) with:
pub use crate::{
factor::{
BuiltinFactor, Factor, VarId, VarStore, margin::MarginFactor,
rank_diff::RankDiffFactor, team_sum::TeamSumFactor, trunc::TruncFactor,
},
schedule::{EpsilonOrMax, Schedule, ScheduleReport},
};
- Step 5: Run the test to verify it passes
Run: cargo test --lib factor::tests::builtin_factor_dispatches_to_margin
Expected: PASS.
- Step 6: Run the full lib test suite to confirm no regressions
Run: cargo test --lib
Expected: all tests pass (current count + 5 new from Tasks 1–2).
- Step 7: Format and commit
cargo +nightly fmt
git add src/factor/mod.rs src/factors.rs
git commit -m "feat(factor): dispatch MarginFactor through BuiltinFactor enum"
Task 3: Add Outcome::Scored variant and accessors
Files:
- Modify: src/outcome.rs
- Step 1: Write failing tests in src/outcome.rs
Add to the existing #[cfg(test)] mod tests { ... } block (after winner_out_of_range_panics, around line 86):
#[test]
fn scored_two_teams() {
let o = Outcome::scores([10.0, 4.0]);
assert_eq!(o.team_count(), 2);
assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
assert_eq!(o.as_ranks(), None);
}
#[test]
fn scored_team_count_matches_input() {
let o = Outcome::scores([3.0, 1.0, 2.0, 0.0]);
assert_eq!(o.team_count(), 4);
}
#[test]
fn ranked_as_scores_returns_none() {
let o = Outcome::winner(0, 2);
assert!(o.as_scores().is_none());
assert!(o.as_ranks().is_some());
}
- Step 2: Run the tests to verify they fail
Run: cargo test --lib outcome::tests
Expected: FAIL — no function or associated item named scores found, etc.
- Step 3: Implement the Scored variant and helpers
Replace the body of src/outcome.rs with:
//! Outcome of a match.
//!
//! `Ranked(ranks)` for ordinal results; `Scored(scores)` for continuous
//! per-team scores (engages `MarginFactor` in the engine).
use smallvec::SmallVec;
/// Final outcome of a match.
///
/// `Ranked(ranks)`: lower rank = better. Equal ranks mean a tie between those
/// teams. `ranks.len()` must equal the number of teams in the event.
///
/// `Scored(scores)`: higher score = better. Adjacent (sorted) pairs feed
/// observed margins to `MarginFactor`. `scores.len()` must equal the number
/// of teams in the event.
#[derive(Clone, Debug, PartialEq)]
#[non_exhaustive]
pub enum Outcome {
Ranked(SmallVec<[u32; 4]>),
Scored(SmallVec<[f64; 4]>),
}
impl Outcome {
/// `n`-team outcome where team `winner` won and everyone else tied for last.
///
/// Panics if `winner >= n`.
pub fn winner(winner: u32, n: u32) -> Self {
assert!(winner < n, "winner index {winner} out of range 0..{n}");
let ranks: SmallVec<[u32; 4]> = (0..n).map(|i| if i == winner { 0 } else { 1 }).collect();
Self::Ranked(ranks)
}
/// All `n` teams tied.
pub fn draw(n: u32) -> Self {
Self::Ranked(SmallVec::from_vec(vec![0; n as usize]))
}
/// Explicit per-team ranking.
pub fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Self {
Self::Ranked(ranks.into_iter().collect())
}
/// Explicit per-team continuous scores; higher = better.
pub fn scores<I: IntoIterator<Item = f64>>(scores: I) -> Self {
Self::Scored(scores.into_iter().collect())
}
pub fn team_count(&self) -> usize {
match self {
Self::Ranked(r) => r.len(),
Self::Scored(s) => s.len(),
}
}
pub(crate) fn as_ranks(&self) -> Option<&[u32]> {
match self {
Self::Ranked(r) => Some(r),
Self::Scored(_) => None,
}
}
pub(crate) fn as_scores(&self) -> Option<&[f64]> {
match self {
Self::Scored(s) => Some(s),
Self::Ranked(_) => None,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn winner_two_teams() {
let o = Outcome::winner(0, 2);
assert_eq!(o.as_ranks(), Some(&[0u32, 1][..]));
assert_eq!(o.team_count(), 2);
}
#[test]
fn winner_three_teams_second_wins() {
let o = Outcome::winner(1, 3);
assert_eq!(o.as_ranks(), Some(&[1u32, 0, 1][..]));
}
#[test]
fn draw_three_teams() {
let o = Outcome::draw(3);
assert_eq!(o.as_ranks(), Some(&[0u32, 0, 0][..]));
}
#[test]
fn ranking_from_iter() {
let o = Outcome::ranking([2, 0, 1]);
assert_eq!(o.as_ranks(), Some(&[2u32, 0, 1][..]));
}
#[test]
#[should_panic(expected = "winner index 2 out of range")]
fn winner_out_of_range_panics() {
let _ = Outcome::winner(2, 2);
}
#[test]
fn scored_two_teams() {
let o = Outcome::scores([10.0, 4.0]);
assert_eq!(o.team_count(), 2);
assert_eq!(o.as_scores(), Some(&[10.0, 4.0][..]));
assert_eq!(o.as_ranks(), None);
}
#[test]
fn scored_team_count_matches_input() {
let o = Outcome::scores([3.0, 1.0, 2.0, 0.0]);
assert_eq!(o.team_count(), 4);
}
#[test]
fn ranked_as_scores_returns_none() {
let o = Outcome::winner(0, 2);
assert!(o.as_scores().is_none());
assert!(o.as_ranks().is_some());
}
}
Note: the existing as_ranks returned &[u32] and was #[allow(dead_code)]. The new signature returns Option<&[u32]> because Ranked is no longer the only variant. All in-tree call sites that used as_ranks() (updated in Step 5 below) must now handle the Option.
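As a rough illustration of what the Option-returning accessor means for a caller, here is a self-contained sketch. It assumes a simplified `Outcome` that uses plain `Vec` instead of SmallVec, and `max_rank` is a hypothetical call site with a `String` error standing in for InferenceError.

```rust
/// Simplified stand-in for the crate's `Outcome` (plain `Vec`, not `SmallVec`).
#[derive(Clone, Debug, PartialEq)]
#[non_exhaustive]
enum Outcome {
    Ranked(Vec<u32>),
    Scored(Vec<f64>),
}

impl Outcome {
    /// `Some` only for the `Ranked` variant; the wildcard arm also covers
    /// any variant added later.
    fn as_ranks(&self) -> Option<&[u32]> {
        match self {
            Outcome::Ranked(r) => Some(r.as_slice()),
            _ => None,
        }
    }
}

/// Hypothetical call site: turns a non-ranked outcome into an error
/// instead of assuming the variant.
fn max_rank(outcome: &Outcome) -> Result<u32, String> {
    let ranks = outcome
        .as_ranks()
        .ok_or_else(|| "expected Outcome::Ranked".to_string())?;
    Ok(ranks.iter().copied().max().unwrap_or(0))
}

fn main() {
    println!("{:?}", max_rank(&Outcome::Ranked(vec![2, 0, 1])));
    println!("{:?}", max_rank(&Outcome::Scored(vec![10.0, 4.0])));
}
```

Because the enum is #[non_exhaustive], code in other crates must already carry a wildcard arm like the one in `as_ranks`, which is why a new variant does not break their matches.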
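- Step 4: Run the outcome tests to verify they pass
Run: cargo test --lib outcome
Expected: 8 passed. The lib is compiled as a whole, so this run fails to build until the as_ranks() call sites in src/history.rs and src/game.rs handle the new Option (Step 5); if you hit that compile error, complete Step 5 and re-run.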
- Step 5: Update existing call sites to handle the new Option<&[u32]> return
Two call sites use as_ranks() today. Update each to expect Option:
In src/history.rs:672, change:
let ranks = ev.outcome.as_ranks();
if ranks.len() != ev.teams.len() {
to:
let ranks = match ev.outcome.as_ranks() {
Some(r) => r,
None => {
// Scored path will be wired in Task 7; for now it's an error.
return Err(InferenceError::MismatchedShape {
kind: "outcome variant",
expected: 0,
got: 0,
});
}
};
if ranks.len() != ev.teams.len() {
src/history.rs:701 needs no change, because ranks is already a &[u32] at that point:
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let inverted: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
In src/game.rs:312, change:
let ranks = outcome.as_ranks();
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let result: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
to:
let ranks = outcome.as_ranks().ok_or(crate::InferenceError::MismatchedShape {
kind: "Game::ranked requires Outcome::Ranked",
expected: 0,
got: 0,
})?;
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let result: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
- Step 6: Verify the full lib still compiles and tests pass
Run: cargo test --lib
Expected: all tests pass (call sites updated cleanly).
- Step 7: Format and commit
cargo +nightly fmt
git add src/outcome.rs src/history.rs src/game.rs
git commit -m "feat(outcome): add Scored variant; switch as_ranks/as_scores to Option"
Task 4: Internal DiffFactor enum to dispatch Trunc vs Margin per-pair
Files:
- Modify: src/game.rs (top of file, before Game impl)
- Step 1: Write a failing test in src/game.rs's test module
In the #[cfg(test)] mod tests { ... } block at the bottom of src/game.rs, add (after test_2vs2_weighted):
#[test]
fn diff_factor_dispatch_trunc_and_margin() {
use crate::factor::{margin::MarginFactor, trunc::TruncFactor, VarStore};
use super::DiffFactor;
let mut vars = VarStore::new();
let dt = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let dm = vars.alloc(Gaussian::from_ms(0.0, 6.0));
let mut t = DiffFactor::Trunc(TruncFactor::new(dt, 0.0, false));
let mut m = DiffFactor::Margin(MarginFactor::new(dm, 5.0, 1.0));
let _ = t.propagate(&mut vars);
let _ = m.propagate(&mut vars);
// Smoke: both diffs got written; their msgs are non-N_INF.
assert!(t.msg().pi() > 0.0);
assert!(m.msg().pi() > 0.0);
assert_eq!(t.diff(), dt);
assert_eq!(m.diff(), dm);
}
- Step 2: Run the test to verify it fails
Run: cargo test --lib game::tests::diff_factor_dispatch_trunc_and_margin
Expected: FAIL — cannot find type DiffFactor in this scope.
- Step 3: Add the DiffFactor enum at the top of src/game.rs
Insert after the existing use block (around line 14, before pub struct GameOptions):
use crate::factor::margin::MarginFactor;
/// Per-adjacent-pair link factor in the game's diff chain.
///
/// `Trunc` is used for `Outcome::Ranked` (rank-based truncation).
/// `Margin` is used for `Outcome::Scored` (Gaussian observation on the diff).
#[derive(Debug)]
pub(crate) enum DiffFactor {
Trunc(TruncFactor),
Margin(MarginFactor),
}
impl DiffFactor {
pub(crate) fn diff(&self) -> crate::factor::VarId {
match self {
Self::Trunc(f) => f.diff,
Self::Margin(f) => f.diff,
}
}
pub(crate) fn msg(&self) -> Gaussian {
match self {
Self::Trunc(f) => f.msg,
Self::Margin(f) => f.msg,
}
}
pub(crate) fn evidence(&self) -> f64 {
match self {
Self::Trunc(f) => f.evidence_cached.unwrap_or(1.0),
Self::Margin(f) => f.evidence_cached.unwrap_or(1.0),
}
}
pub(crate) fn propagate(&mut self, vars: &mut crate::factor::VarStore) -> (f64, f64) {
use crate::factor::Factor;
match self {
Self::Trunc(f) => f.propagate(vars),
Self::Margin(f) => f.propagate(vars),
}
}
}
- Step 4: Refactor Game::likelihoods to drive Vec<DiffFactor> instead of Vec<TruncFactor>
This is a mechanical rename inside Game::likelihoods (currently src/game.rs:135-273). The loop logic is unchanged; we just move the per-pair object behind the enum. Replace the body of Game::likelihoods from where let mut trunc: Vec<TruncFactor> = ... is constructed (around line 160) to its last use (around line 243):
// One DiffFactor per adjacent sorted-team pair; each owns a diff VarId.
let mut links: Vec<DiffFactor> = (0..n_diffs)
.map(|i| {
let tie = self.result[arena.sort_buf[i]] == self.result[arena.sort_buf[i + 1]];
let margin = if self.p_draw == 0.0 {
0.0
} else {
let a: f64 = self.teams[arena.sort_buf[i]]
.iter()
.map(|p| p.beta.powi(2))
.sum();
let b: f64 = self.teams[arena.sort_buf[i + 1]]
.iter()
.map(|p| p.beta.powi(2))
.sum();
compute_margin(self.p_draw, (a + b).sqrt())
};
let vid = arena.vars.alloc(N_INF);
DiffFactor::Trunc(TruncFactor::new(vid, margin, tie))
})
.collect();
// Per-team messages from neighbouring RankDiff factors (replaces TeamMessage).
arena.lhood_lose.resize(n_teams, N_INF);
arena.lhood_win.resize(n_teams, N_INF);
let mut step = (f64::INFINITY, f64::INFINITY);
let mut iter = 0;
while tuple_gt(step, 1e-6) && iter < 10 {
step = (0.0_f64, 0.0_f64);
for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
let pw = arena.team_prior[e] * arena.lhood_lose[e];
let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
let raw = pw - pl;
arena.vars.set(lf.diff(), raw * lf.msg());
let d = lf.propagate(&mut arena.vars);
step = tuple_max(step, d);
let new_ll = pw - lf.msg();
step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
arena.lhood_lose[e + 1] = new_ll;
}
for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
let e = n_diffs - 1 - rev_i;
let pw = arena.team_prior[e] * arena.lhood_lose[e];
let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
let raw = pw - pl;
arena.vars.set(lf.diff(), raw * lf.msg());
let d = lf.propagate(&mut arena.vars);
step = tuple_max(step, d);
let new_lw = pl + lf.msg();
step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
arena.lhood_win[e] = new_lw;
}
iter += 1;
}
if n_diffs == 1 {
let raw = (arena.team_prior[0] * arena.lhood_lose[0])
- (arena.team_prior[1] * arena.lhood_win[1]);
arena.vars.set(links[0].diff(), raw * links[0].msg());
links[0].propagate(&mut arena.vars);
}
if n_diffs > 0 {
let pl1 = arena.team_prior[1] * arena.lhood_win[1];
arena.lhood_win[0] = pl1 + links[0].msg();
let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
}
self.evidence = links.iter().map(|l| l.evidence()).product();
Everything below the evidence line is unchanged. Keep the use crate::factor::trunc::TruncFactor; import at the top of the file: TruncFactor is still constructed directly in the closure above, so the import remains in use.
- Step 5: Run the full lib test suite to verify the refactor preserves all golden values
Run: cargo test --lib
Expected: all tests pass with identical assertions — this is a pure refactor.
- Step 6: Run the integration tests
Run: cargo test
Expected: all pass.
- Step 7: Format and commit
cargo +nightly fmt
git add src/game.rs
git commit -m "refactor(game): dispatch per-diff link factors via DiffFactor enum"
Task 5: Add score_sigma to GameOptions and the scored path in Game::likelihoods
Files:
- Modify: src/game.rs
- Step 1: Write a failing test for the scored path
In src/game.rs's test module, after the new dispatch test from Task 4, add:
#[test]
fn scored_path_sharper_when_margin_is_large() {
// Same prior on both sides; large positive observed margin should pull
// team A above team B.
let prior = R::new(
Gaussian::from_ms(25.0, 25.0 / 3.0),
25.0 / 6.0,
ConstantDrift(25.0 / 300.0),
);
let teams = vec![vec![prior], vec![prior]];
let result = vec![10.0, 0.0]; // a beat b by 10
let weights = [vec![1.0], vec![1.0]];
let mut arena = ScratchArena::new();
let g = Game::scored_with_arena(
teams,
&result,
&weights,
1.0, // score_sigma
&mut arena,
);
let p = g.posteriors();
let a = p[0][0];
let b = p[1][0];
assert!(a.mu() > b.mu(), "expected team a posterior mu > team b; got {} vs {}", a.mu(), b.mu());
// Tighter score_sigma should produce a stronger update.
let mut arena2 = ScratchArena::new();
let g_tight = Game::scored_with_arena(
vec![vec![prior], vec![prior]],
&result,
&weights,
0.1, // tighter score_sigma
&mut arena2,
);
let p_tight = g_tight.posteriors();
let a_tight = p_tight[0][0];
assert!(a_tight.mu() > a.mu(), "expected tighter sigma to push posterior further; {} vs {}", a_tight.mu(), a.mu());
}
- Step 2: Run the test to verify it fails
Run: cargo test --lib game::tests::scored_path_sharper_when_margin_is_large
Expected: FAIL — no function or associated item named scored_with_arena.
- Step 3: Add score_sigma to GameOptions
Replace the GameOptions definition (around src/game.rs:15-28):
#[derive(Clone, Copy, Debug)]
pub struct GameOptions {
pub p_draw: f64,
pub score_sigma: f64,
pub convergence: crate::ConvergenceOptions,
}
impl Default for GameOptions {
fn default() -> Self {
Self {
p_draw: crate::P_DRAW,
score_sigma: 1.0,
convergence: crate::ConvergenceOptions::default(),
}
}
}
- Step 4: Add Game::scored_with_arena and friends
In Game<'a, T, D>'s impl block (the one with ranked_with_arena, around src/game.rs:90-133), add a new method right after ranked_with_arena:
pub(crate) fn scored_with_arena(
teams: Vec<Vec<Rating<T, D>>>,
scores: &'a [f64],
weights: &'a [Vec<f64>],
score_sigma: f64,
arena: &mut ScratchArena,
) -> Self {
debug_assert!(
scores.len() == teams.len(),
"scores must have the same length as teams"
);
debug_assert!(
weights
.iter()
.zip(teams.iter())
.all(|(w, t)| w.len() == t.len()),
"weights must have the same dimensions as teams"
);
debug_assert!(score_sigma > 0.0, "score_sigma must be positive");
let mut this = Self {
teams,
result: scores,
weights,
p_draw: 0.0,
likelihoods: Vec::new(),
evidence: 0.0,
};
this.likelihoods_scored(arena, score_sigma);
this
}
- Step 5: Add likelihoods_scored (parallel to likelihoods)
Right after fn likelihoods (around line 273), add:
fn likelihoods_scored(&mut self, arena: &mut ScratchArena, score_sigma: f64) {
arena.reset();
let n_teams = self.teams.len();
arena.sort_buf.extend(0..n_teams);
arena.sort_buf.sort_by(|&i, &j| {
self.result[j]
.partial_cmp(&self.result[i])
.unwrap_or(Ordering::Equal)
});
arena.team_prior.extend(arena.sort_buf.iter().map(|&t| {
self.teams[t]
.iter()
.zip(self.weights[t].iter())
.fold(N00, |p, (player, &w)| p + (player.performance() * w))
}));
let n_diffs = n_teams.saturating_sub(1);
// One MarginFactor per adjacent sorted-team pair, observed m_obs ≥ 0.
let mut links: Vec<DiffFactor> = (0..n_diffs)
.map(|i| {
let m_obs = self.result[arena.sort_buf[i]] - self.result[arena.sort_buf[i + 1]];
let vid = arena.vars.alloc(N_INF);
DiffFactor::Margin(MarginFactor::new(vid, m_obs, score_sigma))
})
.collect();
arena.lhood_lose.resize(n_teams, N_INF);
arena.lhood_win.resize(n_teams, N_INF);
let mut step = (f64::INFINITY, f64::INFINITY);
let mut iter = 0;
while tuple_gt(step, 1e-6) && iter < 10 {
step = (0.0_f64, 0.0_f64);
for (e, lf) in links[..n_diffs.saturating_sub(1)].iter_mut().enumerate() {
let pw = arena.team_prior[e] * arena.lhood_lose[e];
let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
let raw = pw - pl;
arena.vars.set(lf.diff(), raw * lf.msg());
let d = lf.propagate(&mut arena.vars);
step = tuple_max(step, d);
let new_ll = pw - lf.msg();
step = tuple_max(step, arena.lhood_lose[e + 1].delta(new_ll));
arena.lhood_lose[e + 1] = new_ll;
}
for (rev_i, lf) in links[1..].iter_mut().rev().enumerate() {
let e = n_diffs - 1 - rev_i;
let pw = arena.team_prior[e] * arena.lhood_lose[e];
let pl = arena.team_prior[e + 1] * arena.lhood_win[e + 1];
let raw = pw - pl;
arena.vars.set(lf.diff(), raw * lf.msg());
let d = lf.propagate(&mut arena.vars);
step = tuple_max(step, d);
let new_lw = pl + lf.msg();
step = tuple_max(step, arena.lhood_win[e].delta(new_lw));
arena.lhood_win[e] = new_lw;
}
iter += 1;
}
if n_diffs == 1 {
let raw = (arena.team_prior[0] * arena.lhood_lose[0])
- (arena.team_prior[1] * arena.lhood_win[1]);
arena.vars.set(links[0].diff(), raw * links[0].msg());
links[0].propagate(&mut arena.vars);
}
if n_diffs > 0 {
let pl1 = arena.team_prior[1] * arena.lhood_win[1];
arena.lhood_win[0] = pl1 + links[0].msg();
let pw_last = arena.team_prior[n_teams - 2] * arena.lhood_lose[n_teams - 2];
arena.lhood_lose[n_teams - 1] = pw_last - links[n_diffs - 1].msg();
}
self.evidence = links.iter().map(|l| l.evidence()).product();
arena.inv_buf.resize(n_teams, 0);
for (si, &orig_i) in arena.sort_buf.iter().enumerate() {
arena.inv_buf[orig_i] = si;
}
self.likelihoods = self
.teams
.iter()
.zip(self.weights.iter())
.enumerate()
.map(|(orig_i, (players, weights))| {
let si = arena.inv_buf[orig_i];
let m = arena.lhood_win[si] * arena.lhood_lose[si];
let performance = players
.iter()
.zip(weights.iter())
.fold(N00, |p, (player, &w)| p + (player.performance() * w));
players
.iter()
.zip(weights.iter())
.map(|(player, &w)| {
((m - performance.exclude(player.performance() * w)) * (1.0 / w))
.forget(player.beta.powi(2))
})
.collect::<Vec<_>>()
})
.collect::<Vec<_>>();
}
The body is identical to likelihoods except for the per-pair factor construction (no draw-margin computation, MarginFactor instead of TruncFactor). DRY would let us extract the loop, but the duplication is small (~50 lines) and the divergence may grow as more factor kinds are added; we accept it for clarity. Revisit in T4-Synergy if it gets unwieldy.
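If the duplication is revisited, one possible shape is a shared driver that takes the link constructor as a closure. The sketch below is an assumption about how that could look, not the plan's design: `Link` and `build_links` are toy stand-ins, and the real sweep loop would sit inside the shared function.

```rust
/// Toy stand-ins for the two link kinds; the real factors carry more state.
#[derive(Debug, PartialEq)]
enum Link {
    Trunc { margin: f64, tie: bool },
    Margin { m_obs: f64, sigma: f64 },
}

/// Shared driver (hypothetical): builds one link per adjacent pair using the
/// caller-supplied constructor. The forward/backward sweep would follow here.
fn build_links(n_teams: usize, mut make: impl FnMut(usize) -> Link) -> Vec<Link> {
    (0..n_teams.saturating_sub(1)).map(|i| make(i)).collect()
}

fn main() {
    // Scored path: the margin is the difference of adjacent sorted scores.
    let sorted_scores = [10.0, 4.0, 1.0];
    let scored = build_links(sorted_scores.len(), |i| Link::Margin {
        m_obs: sorted_scores[i] - sorted_scores[i + 1],
        sigma: 1.0,
    });

    // Ranked path: equal adjacent ranks mark a tie.
    let sorted_ranks = [0u32, 0, 1];
    let ranked = build_links(sorted_ranks.len(), |i| Link::Trunc {
        margin: 0.0,
        tie: sorted_ranks[i] == sorted_ranks[i + 1],
    });

    println!("{scored:?}");
    println!("{ranked:?}");
}
```

The trade-off is the one the note describes: a closure keeps the loop in one place, but it couples both paths to a single signature that may not fit future factor kinds.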
- Step 6: Run the test to verify it passes
Run: cargo test --lib game::tests::scored_path_sharper_when_margin_is_large
Expected: PASS.
- Step 7: Run the full test suite
Run: cargo test
Expected: all pass.
- Step 8: Format and commit
cargo +nightly fmt
git add src/game.rs
git commit -m "feat(game): add scored_with_arena driving MarginFactor links"
Task 6: Public Game::scored constructor and OwnedGame support
Files:
- Modify: src/game.rs
- Step 1: Write a failing test in src/game.rs's test module
#[test]
fn game_scored_public_ctor() {
use crate::Outcome;
let prior = R::new(
Gaussian::from_ms(25.0, 25.0 / 3.0),
25.0 / 6.0,
ConstantDrift(25.0 / 300.0),
);
let opts = GameOptions {
score_sigma: 1.0,
..GameOptions::default()
};
let g = Game::scored(&[&[prior], &[prior]], Outcome::scores([8.0, 2.0]), &opts).unwrap();
let p = g.posteriors();
assert!(p[0][0].mu() > p[1][0].mu());
}
#[test]
fn game_scored_rejects_ranked_outcome() {
let prior = R::new(
Gaussian::from_ms(25.0, 25.0 / 3.0),
25.0 / 6.0,
ConstantDrift(25.0 / 300.0),
);
let err = Game::scored(
&[&[prior], &[prior]],
crate::Outcome::winner(0, 2),
&GameOptions::default(),
)
.unwrap_err();
assert!(matches!(err, crate::InferenceError::MismatchedShape { .. }));
}
- Step 2: Run the tests to verify they fail
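Run: cargo test --lib -- game::tests::game_scored_public_ctor game::tests::game_scored_rejects_ranked_outcome (cargo accepts only one test-name filter before --, so both filters are passed to the test harness after it)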
Expected: FAIL — no function or associated item named scored.
- Step 3: Add OwnedGame::new_scored constructor
In OwnedGame<T, D>'s impl (around src/game.rs:46-78), add right after new:
pub(crate) fn new_scored(
teams: Vec<Vec<Rating<T, D>>>,
scores: Vec<f64>,
weights: Vec<Vec<f64>>,
score_sigma: f64,
) -> Self {
let mut arena = ScratchArena::new();
let g = Game::scored_with_arena(teams.clone(), &scores, &weights, score_sigma, &mut arena);
let likelihoods = g.likelihoods;
let evidence = g.evidence;
Self {
teams,
result: scores,
weights,
p_draw: 0.0,
likelihoods,
evidence,
}
}
- Step 4: Add Game::scored public method
In the impl<T: Time, D: Drift<T>> Game<'_, T, D> block (around src/game.rs:293-349), add right after ranked:
pub fn scored(
teams: &[&[Rating<T, D>]],
outcome: crate::Outcome,
options: &GameOptions,
) -> Result<OwnedGame<T, D>, crate::InferenceError> {
if options.score_sigma <= 0.0 {
return Err(crate::InferenceError::InvalidProbability {
value: options.score_sigma,
});
}
if outcome.team_count() != teams.len() {
return Err(crate::InferenceError::MismatchedShape {
kind: "outcome scores vs teams",
expected: teams.len(),
got: outcome.team_count(),
});
}
let scores = outcome
.as_scores()
.ok_or(crate::InferenceError::MismatchedShape {
kind: "Game::scored requires Outcome::Scored",
expected: 0,
got: 0,
})?
.to_vec();
let teams_owned: Vec<Vec<Rating<T, D>>> = teams.iter().map(|t| t.to_vec()).collect();
let weights: Vec<Vec<f64>> = teams.iter().map(|t| vec![1.0; t.len()]).collect();
Ok(OwnedGame::new_scored(teams_owned, scores, weights, options.score_sigma))
}
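One detail of the guard above is worth checking by hand: `score_sigma <= 0.0` is false for NaN, so a NaN sigma passes the check. The sketch below shows a stricter test. The helper name `check_score_sigma` and the `String` error are hypothetical stand-ins for illustration, not the crate's API.

```rust
/// Hypothetical helper (not part of the crate): accept only finite,
/// strictly positive noise scales.
fn check_score_sigma(value: f64) -> Result<f64, String> {
    if value.is_finite() && value > 0.0 {
        Ok(value)
    } else {
        Err(format!("score_sigma must be finite and > 0, got {value}"))
    }
}

fn main() {
    // NaN and infinity are rejected alongside zero and negative values.
    for candidate in [1.0, 0.0, -2.0, f64::NAN, f64::INFINITY] {
        println!("{candidate:?} -> {:?}", check_score_sigma(candidate));
    }
}
```

Whether infinity should be rejected is a design choice; it is rejected here on the assumption that an infinitely noisy observation is a configuration mistake.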
- Step 5: Run the new tests to verify they pass
Run: cargo test --lib -- game::tests::game_scored_public_ctor game::tests::game_scored_rejects_ranked_outcome
Expected: both PASS.
- Step 6: Run the full test suite
Run: cargo test
Expected: all pass.
- Step 7: Format and commit
cargo +nightly fmt
git add src/game.rs
git commit -m "feat(game): add public Game::scored constructor"
Task 7: Plumb Outcome::Scored through TimeSlice and History::add_events
Files:
- Modify: src/time_slice.rs
- Modify: src/history.rs
The per-event Event struct in src/time_slice.rs:80-85 is { teams, evidence, weights }. We add a kind: EventKind field that selects which Game::*_with_arena to call. Score noise (score_sigma) lives inside the Scored variant so events can in principle have per-event sigma, though the public API only exposes one history-wide knob today.
- Step 1: Add EventKind to src/time_slice.rs and a kind field on Event
In src/time_slice.rs, immediately above the struct Event definition (currently around line 80), add:
#[derive(Debug, Clone, Copy)]
pub(crate) enum EventKind {
Ranked,
Scored { score_sigma: f64 },
}
Modify struct Event (currently lines 81-85) to:
#[derive(Debug)]
pub(crate) struct Event {
teams: Vec<Team>,
evidence: f64,
weights: Vec<Vec<f64>>,
kind: EventKind,
}
- Step 2: Dispatch on kind in Event::iteration_direct
Replace the body of Event::iteration_direct (currently src/time_slice.rs:123-144):
fn iteration_direct<T: Time, D: Drift<T>>(
&mut self,
skills: &mut SkillStore,
agents: &CompetitorStore<T, D>,
p_draw: f64,
arena: &mut ScratchArena,
) {
let teams = self.within_priors(false, false, skills, agents);
let result = self.outputs();
let g = match self.kind {
EventKind::Ranked => {
Game::ranked_with_arena(teams, &result, &self.weights, p_draw, arena)
}
EventKind::Scored { score_sigma } => {
Game::scored_with_arena(teams, &result, &self.weights, score_sigma, arena)
}
};
for (t, team) in self.teams.iter_mut().enumerate() {
for (i, item) in team.items.iter_mut().enumerate() {
let old_likelihood = skills.get(item.agent).unwrap().likelihood;
let new_likelihood = (old_likelihood / item.likelihood) * g.likelihoods[t][i];
skills.get_mut(item.agent).unwrap().likelihood = new_likelihood;
item.likelihood = g.likelihoods[t][i];
}
}
self.evidence = g.evidence;
}
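The likelihood update in the loop above divides out this event's previous message and multiplies in the new one. A minimal sketch of that swap, assuming a toy `Msg` type stored in natural parameters (the crate's own Gaussian type may be represented differently; `Msg` and `swap_message` are hypothetical names):

```rust
use std::ops::{Div, Mul};

/// Toy Gaussian message in natural parameters; a stand-in for the crate's
/// own Gaussian type, whose representation may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Msg {
    pi: f64,  // precision, 1 / sigma^2
    tau: f64, // precision-adjusted mean, mu / sigma^2
}

impl Mul for Msg {
    type Output = Msg;
    // Multiplying Gaussian densities adds their natural parameters.
    fn mul(self, rhs: Msg) -> Msg {
        Msg { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

impl Div for Msg {
    type Output = Msg;
    // Dividing Gaussian densities subtracts their natural parameters.
    fn div(self, rhs: Msg) -> Msg {
        Msg { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}

/// Replace one event's old message inside an agent's accumulated likelihood.
fn swap_message(total: Msg, old: Msg, new: Msg) -> Msg {
    (total / old) * new
}

fn main() {
    let other_events = Msg { pi: 0.5, tau: 1.0 };
    let old = Msg { pi: 0.25, tau: 0.5 };
    let new = Msg { pi: 0.125, tau: 0.75 };
    let total = other_events * old;
    // The result equals other_events * new: only this event's share changed.
    println!("{:?}", swap_message(total, old, new));
}
```

The same arithmetic applies whether the new message came from the ranked or the scored path, which is why only the `match` on `kind` had to change.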
- Step 3: Dispatch on kind in TimeSlice::iteration (sequential branch)
Inside TimeSlice::iteration (currently src/time_slice.rs:295-325), replace the body of the if from > 0 || self.color_groups.is_empty() branch's inner for event in ... loop. The Game::ranked_with_arena(...) call (lines 302-308) becomes:
let g = match event.kind {
EventKind::Ranked => Game::ranked_with_arena(
teams,
&result,
&event.weights,
self.p_draw,
&mut self.arena,
),
EventKind::Scored { score_sigma } => Game::scored_with_arena(
teams,
&result,
&event.weights,
score_sigma,
&mut self.arena,
),
};
(The rest of that loop body — likelihood update + event.evidence = g.evidence — is unchanged.)
- Step 4: Dispatch on kind in TimeSlice::log_evidence
TimeSlice::log_evidence (currently src/time_slice.rs:467-532) calls Game::ranked_with_arena at lines 482-490 and 506-514. At each call site, change to a match on event.kind mirroring Step 2.
Add a helper inside the impl to keep the call sites tidy:
fn run_event<D: Drift<T>>(
&self,
event: &Event,
online: bool,
forward: bool,
agents: &CompetitorStore<T, D>,
arena: &mut ScratchArena,
) -> f64 {
let teams = event.within_priors(online, forward, &self.skills, agents);
let result = event.outputs();
match event.kind {
EventKind::Ranked => {
Game::ranked_with_arena(teams, &result, &event.weights, self.p_draw, arena).evidence
}
EventKind::Scored { score_sigma } => {
Game::scored_with_arena(teams, &result, &event.weights, score_sigma, arena)
.evidence
}
}
}
Then replace the inline Game::ranked_with_arena(...).evidence.ln() calls with self.run_event(event, online, forward, agents, &mut arena).ln().
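The `.ln()` at each call site matters because the slice-level quantity is a sum of per-event log-evidences. A small sketch of why the sum is preferred over a raw product; `total_log_evidence` is a hypothetical helper and the evidence values are made up for illustration:

```rust
/// Sum per-event log-evidence. Adding logs instead of multiplying raw
/// evidences avoids underflow when many small values are combined.
fn total_log_evidence(evidences: &[f64]) -> f64 {
    evidences.iter().map(|e| e.ln()).sum()
}

fn main() {
    // Made-up evidence values, chosen small enough to underflow a product.
    let evidences = [1e-200, 1e-200];
    let product: f64 = evidences.iter().product();
    println!("raw product = {product}");
    println!("log-evidence sum = {}", total_log_evidence(&evidences));
}
```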
- Step 5: Extend TimeSlice::add_events signature with per-event kinds
Change the add_events signature (currently src/time_slice.rs:203-209) to:
pub fn add_events<D: Drift<T>>(
&mut self,
composition: Vec<Vec<Vec<Index>>>,
results: Vec<Vec<f64>>,
weights: Vec<Vec<Vec<f64>>>,
kinds: Vec<EventKind>,
agents: &CompetitorStore<T, D>,
) {
Inside the same method, update the event-construction map (around line 240). Each constructed Event gets its kind from kinds[e]:
Event {
teams,
evidence: 0.0,
weights,
kind: kinds[e],
}
- Step 6: Update TimeSlice::add_events's tests to pass the new argument
Five call sites in src/time_slice.rs (:604, :680, :759, :790, :855, in the unit tests test_one_event_each, test_same_strength, test_add_events, time_slice_color_groups_reorders_events) call time_slice.add_events(...). At each one, insert vec![EventKind::Ranked; n_events] as the fourth argument, between weights and &agents. Example:
time_slice.add_events(
vec![
vec![vec![a], vec![b]],
vec![vec![c], vec![d]],
vec![vec![e], vec![f]],
],
vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 0.0]],
vec![],
vec![EventKind::Ranked; 3],
&agents,
);
- Step 7: Update the History callers of TimeSlice::add_events
In src/history.rs:562 and :572, the calls pass composition, results, weights, &self.agents. Add the kinds vector. We'll thread the per-event EventKind through add_events_with_prior in Step 8 and pass it in here as kinds_chunk.
- Step 8: Extend History::add_events_with_prior to accept and route per-event kinds
In src/history.rs:447-454, change the signature to:
pub(crate) fn add_events_with_prior(
&mut self,
composition: Vec<Vec<Vec<Index>>>,
results: Vec<Vec<f64>>,
times: Vec<T>,
weights: Vec<Vec<Vec<f64>>>,
kinds: Vec<crate::time_slice::EventKind>,
mut priors: HashMap<Index, Rating<T, D>>,
) -> Result<(), InferenceError> {
Around line 543, alongside the existing per-batch slicing of composition, results, and weights, add:
let kinds_chunk: Vec<crate::time_slice::EventKind> =
(i..j).map(|e| kinds[o[e]]).collect();
Update the two time_slice.add_events(composition, results, weights, &self.agents) call sites (lines 562 and 572) to:
time_slice.add_events(composition, results, weights, kinds_chunk, &self.agents);
(For both branches — existing-slice and new-slice. Use kinds_chunk.clone() if the borrow checker complains; the vec is small.)
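Validation: also add a length check at the top of the function alongside the existing ones. kinds is indexed unconditionally when kinds_chunk is built, so an empty vector is not a valid shorthand for "all Ranked"; require exactly one kind per event:
if kinds.len() != composition.len() {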
return Err(InferenceError::MismatchedShape {
kind: "kinds",
expected: composition.len(),
got: kinds.len(),
});
}
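The `kinds_chunk` gather in this step follows the same sort permutation `o` that already reorders composition, results, and weights. A self-contained sketch of that lookup, using a toy `Kind` enum and hypothetical helper names (`sort_permutation`, `gather_chunk`) rather than the crate's own code:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ranked,
    Scored { score_sigma: f64 },
}

/// Indices that sort `times` ascending (stable, so ties keep input order).
fn sort_permutation(times: &[i64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..times.len()).collect();
    order.sort_by_key(|&e| times[e]);
    order
}

/// Kinds for sorted positions i..j, looked up through the permutation.
/// Panics if `kinds` is shorter than the events it describes, which is why
/// the length check has to run first.
fn gather_chunk(kinds: &[Kind], order: &[usize], i: usize, j: usize) -> Vec<Kind> {
    (i..j).map(|e| kinds[order[e]]).collect()
}

fn main() {
    let times = [3, 1, 3, 2];
    let kinds = [
        Kind::Scored { score_sigma: 2.0 },
        Kind::Ranked,
        Kind::Ranked,
        Kind::Scored { score_sigma: 1.0 },
    ];
    let order = sort_permutation(&times);
    // Sorted positions 2..4 hold the two events that share time 3.
    println!("{:?}", gather_chunk(&kinds, &order, 2, 4));
}
```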
- Step 9: Update record_winner and record_draw to pass kinds
In src/history.rs:617-647, update both calls:
self.add_events_with_prior(
vec![vec![vec![w], vec![l]]],
vec![vec![1.0, 0.0]],
vec![time],
vec![],
vec![crate::time_slice::EventKind::Ranked],
HashMap::new(),
)
Same shape for record_draw.
- Step 10: Update History::add_events to compute kinds per event and pass through
Replace the placeholder match arm added in Task 3 Step 5 (around src/history.rs:672-680). The full updated event-loop body of History::add_events (around lines 671-705) becomes:
let mut kinds: Vec<crate::time_slice::EventKind> = Vec::with_capacity(events.len());
for ev in events {
let team_count = ev.teams.len();
let (results_for_event, kind): (Vec<f64>, crate::time_slice::EventKind) = match &ev.outcome {
Outcome::Ranked(ranks) => {
if ranks.len() != team_count {
return Err(InferenceError::MismatchedShape {
kind: "outcome ranks vs teams",
expected: team_count,
got: ranks.len(),
});
}
let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
let inverted: Vec<f64> = ranks.iter().map(|&r| max_rank - r as f64).collect();
(inverted, crate::time_slice::EventKind::Ranked)
}
Outcome::Scored(scores) => {
if scores.len() != team_count {
return Err(InferenceError::MismatchedShape {
kind: "outcome scores vs teams",
expected: team_count,
got: scores.len(),
});
}
(
scores.to_vec(),
crate::time_slice::EventKind::Scored {
score_sigma: self.score_sigma,
},
)
}
};
let mut event_comp: Vec<Vec<Index>> = Vec::with_capacity(team_count);
let mut event_weights: Vec<Vec<f64>> = Vec::with_capacity(team_count);
for team in ev.teams {
let mut team_indices: Vec<Index> = Vec::with_capacity(team.members.len());
let mut team_weights: Vec<f64> = Vec::with_capacity(team.members.len());
for member in team.members {
let idx = self.keys.get_or_create(&member.key);
team_indices.push(idx);
team_weights.push(member.weight);
if let Some(prior) = member.prior {
priors.insert(idx, Rating::new(prior, self.beta, self.drift));
}
}
event_comp.push(team_indices);
event_weights.push(team_weights);
}
composition.push(event_comp);
weights.push(event_weights);
results.push(results_for_event);
times.push(ev.time);
kinds.push(kind);
}
self.add_events_with_prior(composition, results, times, weights, kinds, priors)
(Note: EventKind must be visible to history.rs. Confirm that pub(crate) enum EventKind in time_slice.rs is reachable from history.rs as crate::time_slice::EventKind.)
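The Ranked arm converts ranks into the "larger is better" results that the rest of the pipeline expects, while the Scored arm passes scores through unchanged. A standalone sketch of the inversion, assuming ranks are unsigned integers where a lower rank is a better placement (the `u32` type and the name `invert_ranks` are assumptions for illustration):

```rust
/// Turn ranks (lower = better, by assumption) into results where larger
/// means better, using the same `max_rank - r` inversion as the plan.
fn invert_ranks(ranks: &[u32]) -> Vec<f64> {
    let max_rank = ranks.iter().copied().max().unwrap_or(0) as f64;
    ranks.iter().map(|&r| max_rank - r as f64).collect()
}

fn main() {
    println!("{:?}", invert_ranks(&[0, 1, 2])); // strict three-way order
    println!("{:?}", invert_ranks(&[0, 0])); // tied ranks stay tied
}
```

Ties survive the inversion because equal ranks map to equal results.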
- Step 11: Add score_sigma: f64 field to History and HistoryBuilder
In src/history.rs:21-37 (HistoryBuilder struct), add field score_sigma: f64,.
In the Default impl (around line 121), set score_sigma: 1.0.
In History::builder_with_key (around line 170), set score_sigma: 1.0.
In each builder transition method that constructs a new HistoryBuilder (drift at line 55, observer at line 85), copy the score_sigma field through.
Add a builder method (insert near p_draw, around line 70):
pub fn score_sigma(mut self, score_sigma: f64) -> Self {
self.score_sigma = score_sigma;
self
}
In HistoryBuilder::build (around line 100), set score_sigma: self.score_sigma, on the constructed History.
In the History struct (around line 135), add score_sigma: f64,.
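The instruction to copy score_sigma through each builder transition is easy to miss, so here is a toy illustration of why it is needed. `ToyBuilder` and its fields are hypothetical and much smaller than the real HistoryBuilder; the point is only that a method which changes the builder's type parameter has to construct a new struct field by field.

```rust
/// Toy builder whose drift type can change mid-chain; hypothetical names.
#[derive(Debug)]
struct ToyBuilder<D> {
    drift: D,
    p_draw: f64,
    score_sigma: f64,
}

impl ToyBuilder<f64> {
    fn new() -> Self {
        ToyBuilder { drift: 0.0, p_draw: 0.0, score_sigma: 1.0 }
    }
}

impl<D> ToyBuilder<D> {
    fn score_sigma(mut self, score_sigma: f64) -> Self {
        self.score_sigma = score_sigma;
        self
    }

    /// The return type differs from `Self`, so on stable Rust the `..self`
    /// struct-update shorthand cannot be used and each field is copied by
    /// hand. Forgetting one is a compile error here, which is the safety net.
    fn drift<D2>(self, drift: D2) -> ToyBuilder<D2> {
        ToyBuilder {
            drift,
            p_draw: self.p_draw,
            score_sigma: self.score_sigma,
        }
    }
}

fn main() {
    // score_sigma is set before the type-changing call and must survive it.
    let b = ToyBuilder::new().score_sigma(2.5).drift("constant");
    println!("{b:?}");
}
```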
- Step 12: Write an integration test in tests/scored.rs (new file)
Create tests/scored.rs:
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};
#[test]
fn scored_two_team_one_event_pulls_winner_up() {
let mut h = History::builder()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(0.0))
.build();
let events: Vec<Event<i64, &'static str>> = vec![Event {
time: 1,
teams: smallvec![
Team::with_members([Member::new("alice")]),
Team::with_members([Member::new("bob")]),
],
outcome: Outcome::scores([10.0, 0.0]),
}];
h.add_events(events).unwrap();
h.converge().unwrap();
let alice = h.current_skill(&"alice").unwrap();
let bob = h.current_skill(&"bob").unwrap();
assert!(alice.mu() > 25.0, "alice mu should exceed prior; got {}", alice.mu());
assert!(bob.mu() < 25.0, "bob mu should be below prior; got {}", bob.mu());
}
#[test]
fn scored_zero_margin_treats_as_tie() {
let mut h = History::builder()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(0.0))
.build();
let events: Vec<Event<i64, &'static str>> = vec![Event {
time: 1,
teams: smallvec![
Team::with_members([Member::new("alice")]),
Team::with_members([Member::new("bob")]),
],
outcome: Outcome::scores([3.0, 3.0]),
}];
h.add_events(events).unwrap();
h.converge().unwrap();
let alice = h.current_skill(&"alice").unwrap();
let bob = h.current_skill(&"bob").unwrap();
assert!((alice.mu() - bob.mu()).abs() < 1e-6, "tied scores -> equal mu; got {} vs {}", alice.mu(), bob.mu());
// Sigma should still tighten (we have evidence diff ≈ 0).
assert!(alice.sigma() < 25.0 / 3.0);
}
#[test]
fn scored_three_team_partial_order() {
let mut h = History::builder()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(0.0))
.build();
let events: Vec<Event<i64, &'static str>> = vec![Event {
time: 1,
teams: smallvec![
Team::with_members([Member::new("a")]),
Team::with_members([Member::new("b")]),
Team::with_members([Member::new("c")]),
],
outcome: Outcome::scores([20.0, 10.0, 5.0]),
}];
h.add_events(events).unwrap();
h.converge().unwrap();
let a = h.current_skill(&"a").unwrap();
let b = h.current_skill(&"b").unwrap();
let c = h.current_skill(&"c").unwrap();
assert!(a.mu() > b.mu());
assert!(b.mu() > c.mu());
}
#[test]
fn scored_rejects_outcome_team_count_mismatch() {
use trueskill_tt::InferenceError;
let mut h: History = History::builder().build();
let events: Vec<Event<i64, &'static str>> = vec![Event {
time: 1,
teams: smallvec![
Team::with_members([Member::new("a")]),
Team::with_members([Member::new("b")]),
],
outcome: Outcome::scores([1.0, 2.0, 3.0]),
}];
let err = h.add_events(events).unwrap_err();
assert!(matches!(err, InferenceError::MismatchedShape { .. }));
}
- Step 13: Run the integration tests
Run: cargo test --test scored
Expected: all four tests PASS (the wiring from Steps 1–11 is now complete).
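The zero-margin test expects sigma to tighten even though neither mean moves. The textbook Gaussian update below shows why: the posterior variance of the diff depends on the noise scale, not on the observed value. This is a sketch of the standard conjugate update under the assumption that the margin is the latent diff plus N(0, score_sigma^2) noise; it is not the crate's MarginFactor, and `observe_margin` is a hypothetical name.

```rust
/// Posterior (mean, variance) of a diff d with prior N(prior_mean, prior_var)
/// after observing margin = d + noise, noise ~ N(0, score_sigma^2).
fn observe_margin(prior_mean: f64, prior_var: f64, margin: f64, score_sigma: f64) -> (f64, f64) {
    let noise_var = score_sigma * score_sigma;
    // Fraction of the surprise (margin - prior_mean) that is believed.
    let gain = prior_var / (prior_var + noise_var);
    let mean = prior_mean + gain * (margin - prior_mean);
    let var = (1.0 - gain) * prior_var;
    (mean, var)
}

fn main() {
    // Same prior and noise, different observed margins.
    println!("margin 10: {:?}", observe_margin(0.0, 4.0, 10.0, 2.0));
    println!("margin  0: {:?}", observe_margin(0.0, 4.0, 0.0, 2.0));
    // Smaller score_sigma pulls the mean further towards the observation.
    println!("sigma 1:   {:?}", observe_margin(0.0, 4.0, 10.0, 1.0));
}
```

In this toy model a tied score leaves the mean diff at its prior value while still shrinking the variance, which matches the two assertions in scored_zero_margin_treats_as_tie.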
- Step 14: Run the full test suite + clippy
Run: cargo test && cargo clippy --all-targets -- -D warnings
Expected: all pass, no clippy warnings. Pay particular attention to the existing time_slice unit tests — they were updated in Step 6 and need to use EventKind::Ranked.
- Step 15: Format and commit
cargo +nightly fmt
git add src/history.rs src/time_slice.rs tests/scored.rs
git commit -m "feat(history): route Outcome::Scored events through MarginFactor path"
Task 8: EventBuilder::scores convenience
Files:
- Modify: src/event_builder.rs
- Modify: tests/api_shape.rs (add a fluent-builder scored test)
- Step 1: Write failing tests in tests/api_shape.rs
Append to the existing test list:
#[test]
fn fluent_event_builder_scores() {
use trueskill_tt::ConstantDrift;
let mut h = History::builder()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(0.0))
.build();
h.event(1)
.team(["alice"])
.team(["bob"])
.scores([12.0, 4.0])
.commit()
.unwrap();
h.converge().unwrap();
let a = h.current_skill(&"alice").unwrap();
let b = h.current_skill(&"bob").unwrap();
assert!(a.mu() > b.mu());
}
- Step 2: Run the test to verify it fails
Run: cargo test --test api_shape fluent_event_builder_scores
Expected: FAIL — no method named scores.
- Step 3: Add .scores to EventBuilder
In src/event_builder.rs, alongside .ranking/.winner/.draw (around line 73), add:
/// Set explicit per-team continuous scores; higher = better.
pub fn scores<I: IntoIterator<Item = f64>>(mut self, scores: I) -> Self {
self.event.outcome = crate::Outcome::scores(scores);
self
}
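The `IntoIterator<Item = f64>` bound is what lets `.scores(...)` take an array literal in the test above. A toy stand-in shows which argument shapes that bound accepts; `collect_scores` is a hypothetical function, not the crate's `Outcome::scores`.

```rust
/// Toy stand-in for the outcome constructor: collects any iterable that
/// yields f64 by value.
fn collect_scores<I: IntoIterator<Item = f64>>(scores: I) -> Vec<f64> {
    scores.into_iter().collect()
}

fn main() {
    let from_array = collect_scores([12.0, 4.0]);
    let from_vec = collect_scores(vec![12.0, 4.0]);
    let from_iter = collect_scores((0..3).map(|i| i as f64 * 1.5));
    println!("{from_array:?} {from_vec:?} {from_iter:?}");
    // A borrowed slice such as &[12.0, 4.0][..] would not compile here,
    // because iterating it yields &f64 rather than f64.
}
```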
- Step 4: Run the test to verify it passes
Run: cargo test --test api_shape fluent_event_builder_scores
Expected: PASS.
- Step 5: Run the full test suite
Run: cargo test
Expected: all pass.
- Step 6: Format and commit
cargo +nightly fmt
git add src/event_builder.rs tests/api_shape.rs
git commit -m "feat(event-builder): add .scores convenience for Outcome::Scored"
Task 9: Worked example — scored matches end-to-end
Files:
- Create: examples/scored.rs
- Step 1: Create the example
//! Worked example: continuous-score outcomes via `Outcome::Scored`.
//!
//! Three players play a small round-robin where the score margin matters,
//! not just who won. We show how `score_sigma` controls how much weight
//! the engine places on the observed margin.
//!
//! Run with: `cargo run --example scored --release`
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};
fn main() {
let mut h = History::builder()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(0.03))
.score_sigma(2.0) // tune to data; smaller = trust margins more
.build();
let events: Vec<Event<i64, &'static str>> = vec![
Event {
time: 1,
teams: smallvec![
Team::with_members([Member::new("alice")]),
Team::with_members([Member::new("bob")]),
],
outcome: Outcome::scores([21.0, 9.0]),
},
Event {
time: 2,
teams: smallvec![
Team::with_members([Member::new("bob")]),
Team::with_members([Member::new("carol")]),
],
outcome: Outcome::scores([21.0, 18.0]),
},
Event {
time: 3,
teams: smallvec![
Team::with_members([Member::new("alice")]),
Team::with_members([Member::new("carol")]),
],
outcome: Outcome::scores([21.0, 21.0]),
},
];
h.add_events(events).unwrap();
let report = h.converge().unwrap();
println!(
"converged={}, iterations={}, log_evidence={:.4}",
report.converged, report.iterations, report.log_evidence
);
for who in &["alice", "bob", "carol"] {
let s = h.current_skill(who).unwrap();
println!("{:>6}: mu={:>7.3} sigma={:.3}", who, s.mu(), s.sigma());
}
}
- Step 2: Confirm the example compiles and runs
Run: cargo run --example scored --release
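Expected: prints converged=true followed by the three player skills, with alice highest. bob and carol sit below her and close together: carol's narrow loss to bob is offset by her tie with alice, so do not treat their relative order as part of the expected output.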
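The three margins in this example are not transitive (alice beats bob by 12, bob beats carol by 3, alice ties carol), so it helps to see what a plain Gaussian reading of them implies. The sketch below is a toy ridge least-squares fit, not the crate's EP engine: it ignores time slices and drift, and `fit_margins` and the `lambda` value are assumptions chosen for illustration.

```rust
/// Toy ridge least-squares fit, not the crate's EP engine. Each game
/// (i, j, m) asks for s[i] - s[j] to be close to m, and `lambda` pulls every
/// prior-centred skill towards 0. Solved by Gauss-Seidel sweeps.
fn fit_margins(n: usize, games: &[(usize, usize, f64)], lambda: f64) -> Vec<f64> {
    let mut s = vec![0.0; n];
    for _ in 0..500 {
        for p in 0..n {
            let mut sum = 0.0;
            let mut count = 0.0;
            for &(i, j, m) in games {
                if i == p {
                    sum += s[j] + m; // p should sit m above its opponent
                    count += 1.0;
                } else if j == p {
                    sum += s[i] - m; // p should sit m below its opponent
                    count += 1.0;
                }
            }
            s[p] = sum / (count + lambda);
        }
    }
    s
}

fn main() {
    // Players: 0 = alice, 1 = bob, 2 = carol; margins from the example above.
    let games = [(0, 1, 12.0), (1, 2, 3.0), (0, 2, 0.0)];
    let s = fit_margins(3, &games, 0.5);
    println!("alice={:.3} bob={:.3} carol={:.3}", s[0], s[1], s[2]);
}
```

In this simplified model carol ends up between alice and bob, which is one reason to check the printed posteriors rather than assume the order in which the players were introduced.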
- Step 3: Commit
cargo +nightly fmt
git add examples/scored.rs
git commit -m "docs(examples): worked Outcome::Scored example"
Task 10: Benchmark — scored ingestion + convergence
Files:
- Create: benches/scored.rs
- Modify: Cargo.toml (add [[bench]] entry if needed)
- Step 1: Check Cargo.toml for the existing bench wiring
Run: grep -A 3 'bench' Cargo.toml
If autobenches = false is set or each bench is registered explicitly, add a new entry (Criterion benches need harness = false so that criterion_main! can supply main):
[[bench]]
name = "scored"
harness = false
- Step 2: Create benches/scored.rs modeled on benches/batch.rs
use criterion::{Criterion, criterion_group, criterion_main};
use smallvec::smallvec;
use trueskill_tt::{ConstantDrift, Event, History, Member, Outcome, Team};
fn bench_scored_history(c: &mut Criterion) {
c.bench_function("scored_history_60_events_30_iter", |bencher| {
bencher.iter(|| {
let mut h = History::builder()
.mu(25.0)
.sigma(25.0 / 3.0)
.beta(25.0 / 6.0)
.drift(ConstantDrift(0.03))
.score_sigma(2.0)
.build();
let mut events: Vec<Event<i64, String>> = Vec::with_capacity(60);
for i in 0..60 {
let a = format!("p{}", i % 20);
let b = format!("p{}", (i + 7) % 20);
let s_a = (i as f64 * 0.3).sin().abs() * 21.0;
let s_b = (i as f64 * 0.3).cos().abs() * 21.0;
events.push(Event {
time: 1 + (i / 6) as i64,
teams: smallvec![
Team::with_members([Member::new(a)]),
Team::with_members([Member::new(b)]),
],
outcome: Outcome::scores([s_a, s_b]),
});
}
h.add_events(events).unwrap();
h.converge().unwrap();
});
});
}
criterion_group!(benches, bench_scored_history);
criterion_main!(benches);
The History here uses String keys to match the typical real-world bench shape; if History<i64, _, _, String> requires builder_with_key, adapt accordingly.
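Before trusting the baseline number, it is worth confirming what the synthetic schedule in the bench actually produces. The sketch below reproduces only the pairing and timing arithmetic from the loop above; `schedule` is a hypothetical helper, and the scores and the History itself are left out.

```rust
use std::collections::HashSet;

/// Reproduce the bench's pairing rule: event i pits player i % 20 against
/// player (i + 7) % 20 at time 1 + i / 6.
fn schedule(n_events: usize) -> Vec<(usize, usize, i64)> {
    (0..n_events)
        .map(|i| (i % 20, (i + 7) % 20, 1 + (i / 6) as i64))
        .collect()
}

fn main() {
    let events = schedule(60);
    let players: HashSet<usize> = events.iter().flat_map(|&(a, b, _)| [a, b]).collect();
    let self_matches = events.iter().filter(|&&(a, b, _)| a == b).count();
    let last_time = events.iter().map(|&(_, _, t)| t).max().unwrap_or(0);
    println!(
        "players={}, self_matches={}, last_time={}",
        players.len(),
        self_matches,
        last_time
    );
}
```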
- Step 3: Verify the benchmark compiles
Run: cargo bench --no-run --bench scored
Expected: builds without error.
- Step 4: Run the benchmark and capture a baseline number
Run: cargo bench --bench scored 2>&1 | tee benches/scored_baseline.txt
(Save the result alongside the existing benches/baseline.txt so future tiers can compare.)
- Step 5: Commit
cargo +nightly fmt
git add benches/scored.rs benches/scored_baseline.txt Cargo.toml
git commit -m "bench(scored): add criterion bench mirroring batch bench"
Task 11: Documentation — README + CLAUDE.md status update
Files:
- Modify: README.md
- Modify: CLAUDE.md
- Modify: docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md (mark MarginFactor done)
- Step 1: Add a "Scored outcomes" subsection to README.md
Find the existing ## Usage section (or equivalent) and add:
### Scored outcomes
Use `Outcome::scores([...])` when you have continuous per-team scores rather
than just ranks. Adjacent score margins flow into a `MarginFactor` that adds
soft Gaussian evidence about the latent performance diff. Configure
`HistoryBuilder::score_sigma(σ)` to control how much you trust the margins
(smaller σ = more trust).
```rust
use trueskill_tt::{History, Outcome};
let mut h = History::builder().score_sigma(2.0).build();
h.event(1)
.team(["alice"])
.team(["bob"])
.scores([21.0, 9.0])
.commit()
.unwrap();
h.converge().unwrap();
```
(Replace the backticks-surrounded fence indicators above (```rust and `````) with proper triple backticks; the zero-width chars are there to avoid breaking this plan file's nesting.)
- Step 2: Update CLAUDE.md architecture notes
In CLAUDE.md, add to the existing factor list (or near the architecture section):
- `MarginFactor` (factor/margin.rs) — Gaussian observation factor on a diff variable; engaged by `Outcome::Scored`.
- Step 3: Mark the T4-Margin item complete in the spec
In docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md, find the T4 section (line 577 onward):
- `MarginFactor` → enables `Outcome::Scored`.
Change to:
- `MarginFactor` → enables `Outcome::Scored`. **Done** (see `docs/superpowers/plans/2026-04-27-t4-margin-factor.md`).
- Step 4: Final full test + clippy + fmt run
Run:
cargo +nightly fmt
cargo clippy --all-targets -- -D warnings
cargo test
cargo bench --no-run
Expected: all green, no warnings, all bench targets compile.
- Step 5: Commit
git add README.md CLAUDE.md docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md
git commit -m "docs(t4-margin): document Outcome::Scored and mark spec item done"
Acceptance criteria
- All existing lib + integration tests still pass with their existing golden values (Trunc path is bit-for-bit unchanged after the DiffFactor refactor in Task 4).
- cargo test --test scored passes all four tests added in Task 7.
- cargo run --example scored --release runs and prints sensible posteriors.
- cargo bench --bench scored produces a baseline result saved under benches/.
- cargo clippy --all-targets -- -D warnings is clean.
- Outcome::Scored is accepted by the public API: History::add_events, History::event(...).scores(...), and Game::scored.
- score_sigma is configurable via HistoryBuilder::score_sigma and GameOptions::score_sigma, default 1.0.
Out of scope (deferred to later T4 plans)
- Damped / Residual schedules
- SynergyFactor
- ScoreFactor (continuous outcome variable distinct from observed margin)
- Per-event score_sigma overrides (currently history-wide)
- Tie-band MarginFactor variant (m_obs band rather than point observation)