T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence

Adds soft Gaussian-observation evidence on the per-pair diff variable,
enabling continuous score margins as a richer alternative to ranks.

Public API:
- `Outcome::Scored([scores])` (non-breaking enum extension under
  `#[non_exhaustive]`).
- `Game::scored(teams, outcome, options)` constructor parallel to
  `Game::ranked`.
- `EventBuilder::scores([...])` fluent helper.
- `HistoryBuilder::score_sigma(σ)` knob (default 1.0, validated > 0).
- `GameOptions::score_sigma`.
- `EventKind` re-exported from `lib.rs` (annotated `#[non_exhaustive]`).
- New `InferenceError::InvalidParameter { name, value }` variant.

Internals:
- `MarginFactor` (`factor/margin.rs`): Gaussian observation factor that
  closes in one EP step; cavity-cached log-evidence mirrors `TruncFactor`.
- `BuiltinFactor::Margin` dispatch arm.
- `DiffFactor` enum in `game.rs` lets `Game::likelihoods` and the new
  `likelihoods_scored` share the per-pair link abstraction.
- Per-event `EventKind { Ranked, Scored { score_sigma } }` routed through
  `TimeSlice::add_events`, `iteration_direct`, and `log_evidence`.

Tests: 88 lib + 27 integration (4 new in `tests/scored.rs`); existing
goldens byte-identical.  Bench: `benches/scored.rs` baseline ~960µs for
60 events × 20-player pool with default convergence.

Plan: docs/superpowers/plans/2026-04-27-t4-margin-factor.md
Spec item marked Done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-27 08:47:36 +02:00
parent 6bf3e7e294
commit 8b53cacd64
23 changed files with 3005 additions and 83 deletions

View File

@@ -0,0 +1,14 @@
Finished `bench` profile [optimized + debuginfo] target(s) in 0.02s
Running benches/scored.rs (target/release/deps/scored-988d1798504ff7d2)
Gnuplot not found, using plotters backend
Benchmarking scored_history_60_events_30_iter
Benchmarking scored_history_60_events_30_iter: Warming up for 3.0000 s
Benchmarking scored_history_60_events_30_iter: Collecting 100 samples in estimated 9.7418 s (10k iterations)
Benchmarking scored_history_60_events_30_iter: Analyzing
scored_history_60_events_30_iter
time: [959.36 µs 962.68 µs 966.13 µs]
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe