trueskill-tt/CLAUDE.md
Anders Olsson 8b53cacd64 T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence
Adds soft Gaussian-observation evidence on the per-pair diff variable,
enabling continuous score margins as a richer alternative to ranks.

Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under
  `#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to
  `Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.

Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that
  closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new
  `likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through
  `TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.

Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing
goldens byte-identical.  Bench: `benches/scored.rs` baseline ~960µs for
60 events × 20-player pool with default convergence.

Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 08:47:36 +02:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Commands
```bash
cargo build # Build the library
cargo test --lib # Run all library tests
cargo test --lib <test_name> # Run a single test by name
cargo test --lib -- --nocapture # Run tests without capturing their stdout/stderr
cargo clippy # Lint
cargo bench # Run benchmarks (criterion)
```
The `approx` feature enables `approx::AbsDiffEq` for `Gaussian`:
```bash
cargo test --features approx
```
## Architecture
This is a Rust port of [TrueSkillThroughTime.py](https://github.com/glandfried/TrueSkillThroughTime.py) — a Bayesian skill rating system that tracks skill evolution over time using Gaussian message passing.
### Data flow
```
History → Batch[] → Game[] → teams/players
```
- **`History`** (`history.rs`) — top-level container. Organizes games by time into `Batch`es, runs forward/backward message passing across batches, and exposes `learning_curves()` and `log_evidence()`.
- **`Batch`** (`batch.rs`) — all games at a single time step. Runs `iteration()` to update skill estimates via `Game::posteriors()`, collecting `Skill` distributions per player.
- **`Game`** (`game.rs`) — a single match. Given teams (slices of `Gaussian`), computes posterior skill distributions using Gaussian factor graphs and `message.rs` helpers.
- **`Agent`** (`agent.rs`) — wraps a `Player` with temporal state (`last_time`, `message`). `receive()` applies time-decay (`gamma`) when the player reappears after a gap.
- **`Player`** (`player.rs`) — static configuration: prior `Gaussian`, `beta` (performance noise), `gamma` (skill drift per time unit).
- **`Gaussian`** (`gaussian.rs`) — core probability type. Stored as natural parameters (`pi = 1/sigma²`, `tau = mu/sigma²`). Arithmetic ops implement message multiplication/division in the factor graph.
- **`message.rs`** — `TeamMessage` and `DiffMessage`: intermediate factor graph messages used inside `Game`.
- **`MarginFactor`** (`factor/margin.rs`) — Gaussian observation factor on a diff variable; engaged by `Outcome::Scored`.
- **`lib.rs`** — exports the public API (`Game`, `Gaussian`, `History`, `Player`) and standalone functions (`quality()`, `pdf()`, `cdf()`, `erfc()`). Also defines global defaults: `MU=0.0`, `SIGMA=6.0`, `BETA=1.0`, `GAMMA=0.03`, `P_DRAW=0.0`, `EPSILON=1e-6`, `ITERATIONS=30`.
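
The natural-parameter storage described for `Gaussian` and the time-decay applied by `Agent::receive()` can be illustrated with a minimal sketch. `NatGaussian` and its methods are hypothetical names used only here, not the crate's actual API, and the `forget` rule (variance grows by `gamma² × elapsed`) is an assumption modeled on the Python reference implementation.

```rust
use std::ops::{Div, Mul};

/// Hypothetical stand-in for the crate's `Gaussian`, stored as natural parameters.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NatGaussian {
    pub pi: f64,  // precision: 1 / sigma^2
    pub tau: f64, // precision-adjusted mean: mu / sigma^2
}

impl NatGaussian {
    pub fn from_mu_sigma(mu: f64, sigma: f64) -> Self {
        let pi = 1.0 / (sigma * sigma);
        Self { pi, tau: mu * pi }
    }

    /// Mean; only meaningful when `pi > 0`.
    pub fn mu(&self) -> f64 {
        self.tau / self.pi
    }

    /// Standard deviation; only meaningful when `pi > 0`.
    pub fn sigma(&self) -> f64 {
        (1.0 / self.pi).sqrt()
    }

    /// Assumed time-decay: keep the mean, widen the variance by gamma^2 * elapsed.
    pub fn forget(&self, gamma: f64, elapsed: f64) -> Self {
        let variance = 1.0 / self.pi + gamma * gamma * elapsed;
        Self::from_mu_sigma(self.mu(), variance.sqrt())
    }
}

/// Multiplying two Gaussian messages adds their natural parameters.
impl Mul for NatGaussian {
    type Output = NatGaussian;
    fn mul(self, rhs: Self) -> Self {
        Self { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

/// Dividing one message out of another subtracts the natural parameters.
impl Div for NatGaussian {
    type Output = NatGaussian;
    fn div(self, rhs: Self) -> Self {
        Self { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}
```

In this representation, combining a prior with a likelihood message is two additions, and removing a message to form a cavity distribution is two subtractions, which is why factor-graph code tends to favor it over storing `mu` and `sigma` directly.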
### Key design points
- `History` uses `IndexMap<K>` (defined in `lib.rs`) to map arbitrary player keys to `Agent` state.
- Convergence is measured by the maximum `delta()` across all skill distributions; iteration stops once that maximum falls below `EPSILON` or after `ITERATIONS` rounds.
- The `approx` feature gates `AbsDiffEq` on `Gaussian` for use in tests — the feature is optional and only needed for approximate equality assertions.
- `time` in `History`/`Batch` is currently an `f64`; the README notes it needs to become an enum to support richer temporal states.
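
The stopping rule in the convergence bullet above can be sketched as a small driver loop. The names `converge` and `sweep` are hypothetical, and `delta` is assumed here to be the larger of the absolute changes in mean and standard deviation; the crate's own `delta()` may be defined differently.

```rust
pub const EPSILON: f64 = 1e-6;
pub const ITERATIONS: usize = 30;

/// Assumed change metric between two (mu, sigma) estimates:
/// the larger of the absolute differences in mean and in standard deviation.
pub fn delta(old: (f64, f64), new: (f64, f64)) -> f64 {
    (old.0 - new.0).abs().max((old.1 - new.1).abs())
}

/// Runs `sweep` until the step it reports is at most `epsilon`
/// or `max_iterations` sweeps have run.
///
/// `sweep` stands in for one full pass of message passing and must return
/// the maximum `delta` it observed across all skill distributions.
/// Returns the number of sweeps executed and the last reported step.
pub fn converge<F>(mut sweep: F, epsilon: f64, max_iterations: usize) -> (usize, f64)
where
    F: FnMut() -> f64,
{
    let mut step = f64::INFINITY;
    let mut rounds = 0;
    while rounds < max_iterations && step > epsilon {
        step = sweep();
        rounds += 1;
    }
    (rounds, step)
}
```

Returning both the round count and the final step lets a caller distinguish a run that converged from one that merely hit the iteration cap.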