
Anders Olsson 8b53cacd64 T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence

Adds soft Gaussian-observation evidence on the per-pair diff variable,
enabling continuous score margins as a richer alternative to ranks.

Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under
  `#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to
  `Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.

Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that
  closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new
  `likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through
  `TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.

Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing
goldens byte-identical.  Bench: `benches/scored.rs` baseline ~960µs for
60 events × 20-player pool with default convergence.

Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-27 08:47:36 +02:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

cargo build                          # Build the library
cargo test --lib                     # Run all library tests
cargo test --lib <test_name>         # Run a single test by name
cargo test --lib -- --nocapture      # Run tests with stdout output
cargo clippy                         # Lint
cargo bench                          # Run benchmarks (criterion)

The approx feature enables approx::AbsDiffEq for Gaussian:

cargo test --features approx

Architecture

This is a Rust port of TrueSkillThroughTime.py — a Bayesian skill rating system that tracks skill evolution over time using Gaussian message passing.

Data flow

History  →  Batch[]  →  Game[]  →  teams/players

History (history.rs) — top-level container. Organizes games by time into Batches, runs forward/backward message passing across batches, and exposes learning_curves() and log_evidence().
Batch (batch.rs) — all games at a single time step. Runs iteration() to update skill estimates via Game::posteriors(), collecting Skill distributions per player.
Game (game.rs) — a single match. Given teams (slices of Gaussian), computes posterior skill distributions using Gaussian factor graphs and message.rs helpers.
Agent (agent.rs) — wraps a Player with temporal state (last_time, message). receive() applies time-decay (gamma) when the player reappears after a gap.
Player (player.rs) — static configuration: prior Gaussian, beta (performance noise), gamma (skill drift per time unit).
Gaussian (gaussian.rs) — core probability type. Stored as natural parameters (pi = 1/sigma², tau = mu/sigma²). Arithmetic ops implement message multiplication/division in the factor graph.
message.rs — TeamMessage and DiffMessage: intermediate factor graph messages used inside Game.
MarginFactor (factor/margin.rs) — Gaussian observation factor on a diff variable; engaged by Outcome::Scored.
lib.rs — exports the public API (Game, Gaussian, History, Player) and standalone functions (quality(), pdf(), cdf(), erfc()). Also defines global defaults: MU=0.0, SIGMA=6.0, BETA=1.0, GAMMA=0.03, P_DRAW=0.0, EPSILON=1e-6, ITERATIONS=30.
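
As a rough illustration of the natural-parameter storage and the time decay described in the list above, the sketch below uses a hypothetical NatGaussian type. It is not the crate's Gaussian, and the method names (from_mu_sigma, forget) are assumptions made for this example. Multiplication adds pi and tau, division subtracts them, and the decay assumes the variance grows by gamma² per elapsed time unit.

```rust
use std::ops::{Div, Mul};

/// Hypothetical stand-in for the crate's Gaussian, stored as natural parameters.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NatGaussian {
    pi: f64,  // precision: 1 / sigma^2
    tau: f64, // precision-adjusted mean: mu / sigma^2
}

impl NatGaussian {
    fn from_mu_sigma(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Self { pi, tau: mu * pi }
    }

    fn mu(&self) -> f64 {
        self.tau / self.pi
    }

    fn sigma(&self) -> f64 {
        (1.0 / self.pi).sqrt()
    }

    /// Assumed decay rule: variance grows by gamma^2 per elapsed time unit,
    /// the mean is unchanged.
    fn forget(&self, gamma: f64, elapsed: f64) -> Self {
        let var = 1.0 / self.pi + gamma * gamma * elapsed;
        Self::from_mu_sigma(self.mu(), var.sqrt())
    }
}

/// Multiplying two Gaussian messages adds their natural parameters.
impl Mul for NatGaussian {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        Self { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

/// Dividing removes a message again by subtracting its natural parameters.
impl Div for NatGaussian {
    type Output = Self;
    fn div(self, rhs: Self) -> Self {
        Self { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}

fn main() {
    let prior = NatGaussian::from_mu_sigma(0.0, 6.0);
    let likelihood = NatGaussian::from_mu_sigma(2.0, 3.0);

    // Combine prior and likelihood; the mean lands near 1.6.
    let posterior = prior * likelihood;
    println!("posterior: mu={:.3} sigma={:.3}", posterior.mu(), posterior.sigma());

    // Dividing the likelihood back out recovers the prior (up to rounding).
    let recovered = posterior / likelihood;
    println!("recovered: mu={:.3} sigma={:.3}", recovered.mu(), recovered.sigma());

    // A player who was absent for 100 time units becomes more uncertain.
    let decayed = prior.forget(0.03, 100.0);
    println!("decayed sigma={:.4}", decayed.sigma());
}
```

Storing pi and tau directly keeps both operations to two additions or subtractions, which is presumably why the factor-graph code works in this parameterization.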

Key design points

History uses IndexMap<K> (defined in lib.rs) to map arbitrary player keys to Agent state.
Convergence is measured by the maximum delta() across all skill distributions; iteration stops when that maximum falls below EPSILON or after ITERATIONS rounds, whichever comes first.
The approx feature gates AbsDiffEq on Gaussian for use in tests — the feature is optional and only needed for approximate equality assertions.
time in History/Batch is currently an f64; the README notes it needs to become an enum to support richer temporal states.
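
The convergence rule described above can be sketched as a small loop. The helper run_until_converged and its closure-based signature are hypothetical and chosen for illustration; they are not functions from this crate. The closure stands in for one sweep over the skill distributions and returns the largest change it observed.

```rust
/// Hypothetical helper: call `step` until the largest change it reports
/// is no longer above `epsilon`, or until `max_iters` sweeps have run.
/// Returns the number of sweeps performed and the last reported change.
fn run_until_converged<F>(mut step: F, epsilon: f64, max_iters: usize) -> (usize, f64)
where
    F: FnMut() -> f64,
{
    let mut delta = f64::INFINITY;
    let mut iters = 0;
    while delta > epsilon && iters < max_iters {
        delta = step();
        iters += 1;
    }
    (iters, delta)
}

/// Largest absolute change in (mu, sigma) between two estimates.
fn delta(old: (f64, f64), new: (f64, f64)) -> f64 {
    (old.0 - new.0).abs().max((old.1 - new.1).abs())
}

fn main() {
    // Toy stand-in for a batch: each sweep moves every (mu, sigma) estimate
    // halfway toward a fixed target. Real sweeps would recompute posteriors.
    let targets = [(1.0, 2.0), (-0.5, 1.5)];
    let mut skills = [(0.0, 6.0), (0.0, 6.0)];

    let (iters, last) = run_until_converged(
        || {
            let mut max_delta = 0.0_f64;
            for (s, t) in skills.iter_mut().zip(targets.iter()) {
                let old = *s;
                s.0 += 0.5 * (t.0 - s.0);
                s.1 += 0.5 * (t.1 - s.1);
                max_delta = max_delta.max(delta(old, *s));
            }
            max_delta
        },
        1e-6, // mirrors the EPSILON default listed above
        30,   // mirrors the ITERATIONS default listed above
    );

    println!("stopped after {iters} sweeps, last max delta = {last:e}");
}
```

The iteration cap matters when a sweep never settles: the loop then exits after max_iters rounds with the last delta still above epsilon, so callers that care should check the returned delta.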