TrueSkill-TT Engine Redesign — Design
Date: 2026-04-23
Status: Approved (pending implementation plan)
Summary
Comprehensive redesign of the TrueSkill-TT engine targeting four orthogonal goals:
- Performance — substantially faster offline convergence and incremental online updates.
- Accuracy and richer match formats — support for score margins, free-for-all with partial orders, correlated skills.
- Better convergence — replace ad-hoc capped iteration with a pluggable `Schedule` trait covering all three nested loops.
- Better API surface — typed event description, observer-based progress reporting, generic time axis, structured errors, ergonomic builders.
The design is comprehensive (Approach 1 of three considered) but delivered in five tiers so each step is independently shippable and validated by benchmarks.
Goals & non-goals
Goals
- 10–30× speedup on the offline convergence path for representative workloads (1000+ players, 1000+ events, 30 iterations)
- Order-of-magnitude speedup on incremental "add a single event" workloads
- Pluggable factor graph allowing new factor types without engine changes
- Optional Rayon-backed parallelism on top of `Send + Sync`-correct internals
- Typed, ergonomic public API; replace nested `Vec<Vec<Vec<_>>>` shapes with `Event<T, K>` / `Team<K>` / `Member<K>`
- Generic time axis: `Untimed`, `i64`, or user-supplied
- Observer-based progress instead of `verbose: bool` + `println!`
- Structured `Result<_, InferenceError>` at API boundaries
Non-goals
- WebAssembly support is not a goal; we may break it if a crate or feature requires it.
- No GPU offload.
- No `no_std` support.
- No persistent format / serde — possible future feature.
- No replacement of the Gaussian/EP approximation itself in this design (the underlying inference math stays the same; we change layout, dispatch, scheduling, and API around it).
Workload assumptions
Baseline workload that drives perf decisions:
- ~1000+ players
- ~1000+ events total
- ~50–60 events per time slice (per day)
- Both online (incremental adds) and offline (full convergence) are common
- Offline convergence runs frequently
Section 1 — Core types & traits
The foundation everything else builds on.
Gaussian — natural-parameter storage
Switch storage from (mu, sigma) to natural parameters (pi, tau) where pi = sigma⁻², tau = mu · pi. Multiplication and division dominate the hot path; in nat-params they are direct adds/subs of the components, no sqrt. Reads of mu/sigma become accessor methods (tau / pi, 1.0 / pi.sqrt()). The trade is correct because reads are vanishingly rare compared to writes in EP.
pub struct Gaussian { pi: f64, tau: f64 }
pub const UNIFORM: Gaussian = Gaussian { pi: 0.0, tau: 0.0 }; // replaces N_INF
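A minimal sketch of why natural parameters pay off, assuming `Gaussian` derives `Copy`: multiplication and division of densities become component-wise adds/subs, and the sqrt only appears in the rarely-called accessors. Accessor and operator names here are illustrative, not final.

```rust
impl Gaussian {
    // mu = tau / pi; undefined (uniform) when pi == 0
    pub fn mu(&self) -> f64 { self.tau / self.pi }
    // sigma = 1 / sqrt(pi); the only place a sqrt is paid
    pub fn sigma(&self) -> f64 { 1.0 / self.pi.sqrt() }
}

impl std::ops::Mul for Gaussian {
    type Output = Gaussian;
    // Product of two Gaussian densities: add precisions and precision-adjusted means
    fn mul(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi + rhs.pi, tau: self.tau + rhs.tau }
    }
}

impl std::ops::Div for Gaussian {
    type Output = Gaussian;
    // Division (cavity computation in EP): subtract components
    fn div(self, rhs: Gaussian) -> Gaussian {
        Gaussian { pi: self.pi - rhs.pi, tau: self.tau - rhs.tau }
    }
}
```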
Time trait
Replaces the bare i64 time field. Keeps History parametric.
pub trait Time: Copy + Ord + Send + Sync + 'static {
fn elapsed_to(&self, later: &Self) -> i64;
}
pub struct Untimed; // ZST for the no-time-axis case
impl Time for Untimed { fn elapsed_to(&self, _: &Self) -> i64 { 0 } }
impl Time for i64 { fn elapsed_to(&self, later: &Self) -> i64 { later - self } }
// Optional impls behind feature flags: time::OffsetDateTime, chrono types
Drift<T> trait
Generic over T: Time so seasonal/calendar-aware drift is possible without going through i64.
pub trait Drift<T: Time>: Copy + Send + Sync {
fn variance_delta(&self, from: &T, to: &T) -> f64;
}
`ConstantDrift(f64)` impl: `from.elapsed_to(to) as f64 * gamma * gamma`.
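A sketch of that impl under the trait definitions above; the inner `f64` plays the role of `gamma`.

```rust
#[derive(Clone, Copy)]
pub struct ConstantDrift(pub f64); // gamma: skill drift per unit of elapsed time

impl<T: Time> Drift<T> for ConstantDrift {
    fn variance_delta(&self, from: &T, to: &T) -> f64 {
        // Variance added by drift grows linearly with elapsed time
        from.elapsed_to(to) as f64 * self.0 * self.0
    }
}
```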
Index and KeyTable<K>
Index(usize) is the handle into dense per-History Vec storage. Public, but intended for use by power users on hot paths who want to skip the KeyTable lookup. Casual API takes &K. KeyTable<K> (renamed from IndexMap, to avoid colliding with the indexmap crate's type) maps user keys → Index.
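A plausible shape for `KeyTable<K>`, shown only to pin down the interning semantics; the method names (`intern`, `lookup`) are assumptions.

```rust
use std::collections::HashMap;
use std::hash::Hash;

pub struct Index(pub usize); // as introduced above: handle into dense storage

pub struct KeyTable<K> {
    keys: Vec<K>,              // Index.0 -> key (reverse lookup / public output)
    by_key: HashMap<K, usize>, // key -> Index.0
}

impl<K: Eq + Hash + Clone> KeyTable<K> {
    // Return the existing Index for `key`, or intern it and hand out a new one.
    pub fn intern(&mut self, key: &K) -> Index {
        if let Some(&i) = self.by_key.get(key) {
            return Index(i);
        }
        let i = self.keys.len();
        self.keys.push(key.clone());
        self.by_key.insert(key.clone(), i);
        Index(i)
    }

    pub fn lookup(&self, key: &K) -> Option<Index> {
        self.by_key.get(key).copied().map(Index)
    }
}
```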
Observer trait
Replaces verbose: bool + println!. Default no-op impls; user overrides what they need.
pub trait Observer<T: Time>: Send + Sync {
fn on_iteration_end(&self, _iter: usize, _max_step: (f64, f64)) {}
fn on_batch_processed(&self, _time: &T, _idx: usize, _n_events: usize) {}
fn on_converged(&self, _iters: usize, _final_step: (f64, f64)) {}
}
pub struct NullObserver;
impl<T: Time> Observer<T> for NullObserver {}
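For illustration, the `LogObserver` referenced in the builder example later on could look like this; the output format is a sketch, and it only applies to time types that are `Debug`.

```rust
#[derive(Default)]
pub struct LogObserver;

impl<T: Time + std::fmt::Debug> Observer<T> for LogObserver {
    fn on_iteration_end(&self, iter: usize, max_step: (f64, f64)) {
        eprintln!("iter {iter}: max step (dpi, dtau) = {max_step:?}");
    }
    fn on_converged(&self, iters: usize, final_step: (f64, f64)) {
        eprintln!("converged after {iters} iterations, final step {final_step:?}");
    }
}
```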
Trade-offs
- `Gaussian` natural-param representation: anyone reading `mu`/`sigma` in a hot loop pays a sqrt — but that's the right trade, hot reads are rare.
- `Time` as a trait (not an enum) keeps it open-ended at zero runtime cost; the default `History<i64, _>` keeps call sites familiar.
- `Observer` is a trait (not a closure) so different sites can have different signatures without losing type safety. `NullObserver` is a ZST.
Section 2 — Factor graph architecture
The current Game::likelihoods is a hand-rolled, hard-coded graph. To unlock richer formats and let us experiment with EP schedules, the graph itself becomes a data structure.
Variable / Factor model
Variables hold their current Gaussian marginal. Factors hold their outgoing messages to each connected variable plus do the local computation. Standard EP: factor's update is "divide marginal by old outgoing → cavity → apply local approximation → multiply marginal by new outgoing."
pub trait Factor: Send + Sync {
fn variables(&self) -> &[VarId];
fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64); // returns max delta
fn log_evidence(&self, _vars: &VarStore) -> f64 { 0.0 }
}
Built-in factor catalog
| Factor | Purpose | Status |
|---|---|---|
| `PerformanceFactor` | skill → performance (add β² noise, optional weight) | replaces inline `performance() * weight` |
| `TeamSumFactor` | weighted sum of player perfs → team perf | replaces inline fold |
| `RankDiffFactor` | (team_a perf) − (team_b perf) → diff var | currently `team[e].posterior_win() − team[e+1].posterior_lose()` |
| `TruncFactor` | EP truncation: `P(diff > margin)` for a win, `P(\|diff\| ≤ margin)` for a draw | replaces the inline truncation step |
| `MarginFactor` (future) | use observed score margin as soft evidence | enables richer match formats |
| `SynergyFactor` (future) | couples teammates' skills | enables different topology |
| `ScoreFactor` (future) | continuous outcome (e.g., points scored) | enables score-based outcomes |
The first four together exactly reproduce today's algorithm. The last three are extension slots.
Game = factor graph + schedule
pub struct Game<S: Schedule = DefaultSchedule> {
vars: VarStore, // SoA: Vec<Gaussian> marginals
factors: FactorList, // enum dispatch over BuiltinFactor (see Open Questions)
schedule: S,
}
Lean toward enum dispatch (enum BuiltinFactor { Perf(...), Sum(...), RankDiff(...), Trunc(...), ... }) over Box<dyn Factor> for the built-ins:
- avoids per-message vtable overhead in the hottest loop
- keeps factor data inline (no heap indirection)
- still allows user-defined factors via a `BuiltinFactor::Custom(Box<dyn Factor>)` variant (sketched below)
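A sketch of the enum-dispatch shape under discussion, assuming the factor types from the catalog above and the `Factor` / `VarStore` definitions earlier in this section.

```rust
pub enum BuiltinFactor {
    Perf(PerformanceFactor),
    Sum(TeamSumFactor),
    RankDiff(RankDiffFactor),
    Trunc(TruncFactor),
    // Escape hatch for user-defined factors; pays a vtable call per propagate.
    Custom(Box<dyn Factor>),
}

impl BuiltinFactor {
    // Static dispatch for built-ins, dynamic only for Custom.
    fn propagate(&mut self, vars: &mut VarStore) -> (f64, f64) {
        match self {
            BuiltinFactor::Perf(f) => f.propagate(vars),
            BuiltinFactor::Sum(f) => f.propagate(vars),
            BuiltinFactor::RankDiff(f) => f.propagate(vars),
            BuiltinFactor::Trunc(f) => f.propagate(vars),
            BuiltinFactor::Custom(f) => f.propagate(vars),
        }
    }
}
```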
Schedule trait
Controls iteration order and stopping. Default = current behavior (sweep forward, then backward, until ε or max iters). Pluggable so we can later try damped EP or junction-tree schedules.
High-level constructors
Game::ranked(teams, results, options) // dominant case
Game::free_for_all(players, ranking) // FFA with possible ties
Game::custom(builder) // power users build their own graph
GameOptions carries iteration cap, epsilon, p_draw, and approximation choice. Today these are scattered between method args and module constants.
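A plausible shape for `GameOptions`, gathering the values the paragraph says are currently scattered; field names and defaults are assumptions, and the approximation-choice field is elided.

```rust
pub struct GameOptions {
    pub p_draw: f64,                  // prior draw probability
    pub convergence: ConvergenceOptions,
    // approximation choice would live here as well
}

pub struct ConvergenceOptions {
    pub max_iter: usize,              // iteration cap for within-game EP sweeps
    pub epsilon: f64,                 // stop once (dpi, dtau) drop below this
}

impl Default for GameOptions {
    fn default() -> Self {
        GameOptions {
            p_draw: 0.0,
            convergence: ConvergenceOptions { max_iter: 30, epsilon: 1e-6 },
        }
    }
}
```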
Trade-offs
- Enum dispatch over trait objects for built-ins; richer factors drop in via new enum variants.
- Variables and factor messages stored as `Vec<Gaussian>` indexed by `VarId` / edge slot — flat, cache-friendly.
- `Schedule` is a generic parameter (zero-cost); most users get the default; experimentation is open.
Open question
Whether enum BuiltinFactor will feel too closed-world. The Custom(Box<dyn Factor>) escape hatch helps but inner-loop perf for user factors will be slower. Acceptable for now; flagged for future revisit if it becomes a problem.
Section 3 — Storage layout (SoA + arenas)
Dense Vec keyed by Index
Every HashMap<Index, T> becomes a Vec<T> (or Vec<Option<T>> for sparse) indexed directly by Index.0. The public-facing KeyTable<K> continues to map arbitrary keys → Index.
SoA at hot layers, AoS at boundaries
The Skill struct stays as a public type for the API (returned from learning_curves, etc.), but inside TimeSlice we lay it out column-wise:
struct TimeSliceSkills {
forward: Vec<Gaussian>, // [n_agents]
backward: Vec<Gaussian>,
likelihood: Vec<Gaussian>,
online: Vec<Gaussian>,
elapsed: Vec<i64>,
present: Vec<bool>,
}
Within a slice, the inner loops touch one column repeatedly across many events — keeping the column contiguous improves cache utilization and makes the eventual SIMD step (Section 6) straightforward.
Gaussian itself stays as a single 16-byte struct in the Vec<Gaussian>. Splitting into two parallel Vec<f64>s wins for pure SIMD over thousands of Gaussians but loses for the random-access patterns dominant in EP. Revisit if benchmarks demand it.
Arena allocator inside Game
Replace per-event allocations with a ScratchArena reused across calls.
pub struct ScratchArena {
var_buf: Vec<Gaussian>,
factor_buf: Vec<Gaussian>, // edge messages
bool_buf: Vec<bool>,
f64_buf: Vec<f64>,
}
impl ScratchArena {
fn reset(&mut self); // sets len=0, keeps capacity
fn alloc_vars(&mut self, n: usize) -> &mut [Gaussian];
}
TimeSlice owns one ScratchArena; each event borrows it for the duration of its Game construction and inference. For the parallel-slice story (Section 6), each Rayon task gets its own arena.
Per-event storage layout
Inside a TimeSlice, each event is stored column-wise as well, with Item inlined into team-level parallel arrays:
struct EventStorage {
teams: SmallVec<[TeamStorage; 4]>,
outcome: Outcome,
weights: SmallVec<[SmallVec<[f64; 4]>; 4]>,
evidence: f64,
}
struct TeamStorage {
competitors: SmallVec<[Index; 4]>, // who's on the team
edge_messages: SmallVec<[Gaussian; 4]>, // outgoing message per slot
output: f64,
}
Iteration over (competitor, edge_message) pairs zips two slices — no per-element struct.
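For example, the inner loop over a team could look roughly like this — a sketch assuming `Gaussian` is `Copy` with the nat-param `Mul`/`Div` shown earlier, `vars.marginal` / `vars.set_marginal` as hypothetical `VarStore` accessors, and `local_approximation` standing in for the factor's EP update.

```rust
// Walk (competitor, outgoing message) pairs by zipping the two parallel columns.
for (idx, msg) in team.competitors.iter().zip(team.edge_messages.iter_mut()) {
    let cavity = vars.marginal(*idx) / *msg;    // divide out the old outgoing message
    let new_msg = local_approximation(cavity);  // factor-specific moment matching
    vars.set_marginal(*idx, cavity * new_msg);  // fold the new message back in
    *msg = new_msg;                             // remember it for the next sweep
}
```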
SmallVec for typical shapes
Teams have ≤ ~5 players and games ≤ ~8 teams; `SmallVec` with inline capacities in that range (`SmallVec<[T; 8]>`, `SmallVec<[T; 4]>`) keeps the common case allocation-free.
Trade-offs
- Dense `Vec<T>` keyed by `Index` is faster but means agent removal needs tombstones (or just leaves slots present-but-inactive). Acceptable: TrueSkill histories rarely remove players.
- SoA at the `TimeSlice` level only, not at the `History` level. `History` keeps `Vec<TimeSlice>` because slices are heterogeneous in size.
- One `ScratchArena` per `TimeSlice` keeps the lifetime story simple.
Open question
The TimeSliceSkills sketch above uses (b) dense + present mask: one slot per agent in the history, indexed directly by Index, with a present: Vec<bool> mask for batches the agent didn't participate in. The alternative is (a) sparse columnar: a Vec<Index> of present agents and parallel Vec<Gaussian> columns of length n_present, with a separate lookup (binary search or auxiliary table) to find a given Index's slot.
(b) gives O(1) lookup and SIMD-friendly columns but wastes memory for sparsely populated slices. (a) is leaner per-slice but pays per-lookup cost in the inner loop. Bench both during T0 and pick. Default proposal: (b), since modern systems are memory-rich and the parallelism story is cleaner.
Section 4 — API surface
Typed event description
pub struct Event<T: Time, K> {
pub time: T,
pub teams: SmallVec<[Team<K>; 4]>,
pub outcome: Outcome,
}
pub struct Team<K> {
pub members: SmallVec<[Member<K>; 4]>,
}
pub struct Member<K> {
pub key: K,
pub weight: f64, // default 1.0
pub prior: Option<Rating>, // per-event override
}
pub enum Outcome {
Ranked(SmallVec<[u32; 4]>), // rank per team; equal ranks = tie
Scored(SmallVec<[f64; 4]>), // continuous score per team (engages MarginFactor)
}
Outcome::winner(0), Outcome::draw(), Outcome::ranking([0,1,2]) are convenience constructors.
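A sketch of what those constructors might expand to, assuming the `Outcome` enum above (backed by `SmallVec`); exact signatures are assumptions, and `winner` is shown for the two-team case only.

```rust
impl Outcome {
    // Explicit rank per team; equal ranks mean a tie (lower rank is better).
    pub fn ranking<I: IntoIterator<Item = u32>>(ranks: I) -> Outcome {
        Outcome::Ranked(ranks.into_iter().collect())
    }

    // Two-team convenience shown for illustration; a real impl might take the team count.
    pub fn winner(team: usize) -> Outcome {
        if team == 0 { Outcome::ranking([0, 1]) } else { Outcome::ranking([1, 0]) }
    }

    pub fn draw() -> Outcome {
        Outcome::ranking([0, 0])
    }
}
```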
Builders
let mut history = History::<i64, _>::builder()
.mu(25.0).sigma(25.0/3.0).beta(25.0/6.0)
.drift(ConstantDrift(0.03))
.p_draw(0.10)
.convergence(ConvergenceOptions { max_iter: 30, epsilon: 1e-6 })
.observer(LogObserver::default())
.build();
For the no-time case, type inference picks Untimed:
let mut history = History::<Untimed, _>::builder().build();
Three-tier event ingestion
// 1. Bulk ingestion (high-throughput path)
history.add_events(events_iter)?;
// 2. One-off match (very common in practice)
history.record_winner("alice", "bob", time)?;
history.record_draw("alice", "bob", time)?;
// 3. Builder for irregular shapes
history.event(time)
.team(["alice", "bob"]).weights([1.0, 0.7])
.team(["carol"])
.ranking([1, 0])
.commit()?;
Convergence & queries
let report: ConvergenceReport = history.converge()?;
let curve: Vec<(i64, Gaussian)> = history.learning_curve(&"alice");
let all = history.learning_curves(); // HashMap<&K, Vec<(T, Gaussian)>>
let now = history.current_skill(&"alice"); // Option<Gaussian>
let ev = history.log_evidence();
let ev_for = history.log_evidence_for(&["alice", "bob"]);
let q = history.predict_quality(&[&["alice"], &["bob"]]);
let p_win = history.predict_outcome(&[&["alice"], &["bob"]]);
Standalone Game
let g = Game::ranked(&[&[alice], &[bob]], Outcome::winner(0), &options);
let post = g.posteriors();
// Convenience
let (a, b) = Game::one_v_one(&alice, &bob, Outcome::winner(0));
Errors
Replace debug_assert!/panic! at the API boundary with Result.
pub enum InferenceError {
MismatchedShape { kind: &'static str, expected: usize, got: usize },
InvalidProbability { value: f64 },
ConvergenceFailed { last_step: (f64, f64), iterations: usize },
NegativePrecision { pi: f64 },
}
Hot inner loops still use debug_assert! for invariants the API has already enforced.
Trade-offs
- Generic over the user's `K`; the engine works in `Index`. Public outputs use `&K`.
- `SmallVec` everywhere on the event-description path.
- Three-tier API so casual users don't drown in types and bulk users still get throughput.
- `Outcome` enum replaces the "lower number wins" `&[f64]` convention.
Open question
Whether to expose Index directly to users via an intern_key(&K) -> Index method, letting hot-path callers skip the KeyTable lookup on every call. Recommendation: yes — public Index handle plus history.lookup<Q: Borrow<K>>(&Q) -> Option<Index>. The casual API still takes &K everywhere; power users can promote to Index when profiling demands.
Section 4½ — Naming pass
| Current | New | Rationale |
|---|---|---|
| `History` | `History` (kept) | Matches upstream; reads cleanly. |
| `Batch` | `TimeSlice` | Says what it is: every event sharing one timestamp. |
| `Player` | `Rating` | The struct holds prior/beta/drift — that's a rating configuration. Resolves the Player/Agent confusion. |
| `Agent` | `Competitor` | Holds dynamic state for someone competing in the history; fits the domain. |
| `Skill` | `Skill` (kept) | Per-time-slice skill estimate; clearer than `BatchSkill`. |
| `Item` | inlined into `TeamStorage` columns (engine) / `Member<K>` (public) | Eliminates the per-element struct in the hot path; gives API users a clear "team member" name. |
| `Game` | `Game` (kept) | `Match` collides with Rust's `match`. |
| `Index` | `Index` (kept) | Internal handle. |
| `IndexMap` | `KeyTable` | Avoids confusion with the `indexmap` crate. |
Section 5 — Convergence & message scheduling
Three nested loops, one mechanism
The system has three nested convergence loops:
- Within-game: EP sweeps over the factor graph
- Within-time-slice: re-running games as inputs change
- Cross-history: forward-pass then backward-pass over all slices
All three implement Workload; one Schedule impl drives all of them.
pub trait Schedule {
fn run<W: Workload>(&self, workload: &mut W) -> ScheduleReport;
}
pub trait Workload {
fn step(&mut self) -> (f64, f64);
fn snapshot_evidence(&self) -> f64 { 0.0 }
}
pub struct ScheduleReport {
pub iterations: usize,
pub final_step: (f64, f64),
pub converged: bool,
}
Built-in schedules
| Schedule | Behavior | Use |
|---|---|---|
| `EpsilonOrMax { eps, max }` | Default. Sweep until (dpi, dtau) ≤ eps or max iters. | All three loops. Replicates current behavior. |
| `Damped { eps, max, alpha }` | Same, but writes α·new + (1−α)·old. | Stuck oscillations. |
| `Residual { eps, max }` | Priority-queue: re-update the factor with the largest pending delta first. | Faster convergence on uneven graphs. |
| `OneShot` | Exactly one pass, no convergence check. | Online incremental adds. |
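A sketch of the default schedule driving any `Workload`, using the `Schedule` / `Workload` / `ScheduleReport` definitions above; field names follow the table.

```rust
pub struct EpsilonOrMax {
    pub eps: f64,
    pub max: usize,
}

impl Schedule for EpsilonOrMax {
    fn run<W: Workload>(&self, workload: &mut W) -> ScheduleReport {
        let mut last = (f64::INFINITY, f64::INFINITY);
        for iter in 1..=self.max {
            last = workload.step();
            // Converged once both the precision and precision-mean deltas are small.
            if last.0 <= self.eps && last.1 <= self.eps {
                return ScheduleReport { iterations: iter, final_step: last, converged: true };
            }
        }
        ScheduleReport { iterations: self.max, final_step: last, converged: false }
    }
}
```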
Stopping in natural-param space
Switch from (|Δmu|, |Δsigma|) ≤ epsilon to (|Δpi|, |Δtau|) ≤ (eps_pi, eps_tau):
- `mu` and `sigma` are on different scales; one tolerance is wrong for both
- We store in nat-params anyway — checking convergence in mu/sigma would cost otherwise-avoidable sqrts
- Nat-param delta is the natural geometry of the EP fixed point
Default EpsilonOrMax::default() exposes a single epsilon for simplicity; advanced ctor exposes both tolerances.
Within-game improvements
- Replace the hard cap of 10 iterations with `GameOptions::schedule`, which propagates `ScheduleReport` upward
- Fast path: graphs with no diff chain (1v1 where 1 iteration suffices) skip the loop entirely
- FFA / many-team ranks benefit from `Residual`; opt-in
Within-slice and cross-history improvements
- No more old/new HashMap snapshotting: track deltas inline as we write under SoA
- Per-slice dirty bits: a `TimeSlice` whose neighbor messages haven't changed since its last full sweep doesn't need to re-run. Track `time_slice.dirty` and skip clean ones during the cross-history sweep. Big win for online add (the locality case); a sketch follows this list.
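A rough sketch of the dirty-bit check inside one cross-history pass; `dirty`, `run_within_slice`, `report`, and `eps` are assumptions standing in for the real slice state and convergence bookkeeping.

```rust
// One forward pass over all slices; `dirty` is a Vec<bool> parallel to `slices`.
for i in 0..slices.len() {
    if !dirty[i] {
        report.batches_skipped += 1;   // neighbor messages unchanged since last sweep
        continue;
    }
    let step = slices[i].run_within_slice(&schedule, &mut arena);
    dirty[i] = false;
    // If this slice's outgoing messages moved, its neighbors must re-run.
    if step.0 > eps || step.1 > eps {
        if i > 0 { dirty[i - 1] = true; }
        if i + 1 < slices.len() { dirty[i + 1] = true; }
    }
}
```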
ConvergenceReport
pub struct ConvergenceReport {
pub iterations: usize,
pub final_step: (f64, f64),
pub log_evidence: f64,
pub converged: bool,
pub per_iteration_time: SmallVec<[Duration; 32]>,
pub batches_skipped: usize,
}
Observer continues to receive per-iteration callbacks for live UI; ConvergenceReport is the post-hoc summary.
Trade-offs
- One `Schedule` trait shared across loops — fewer concepts, more composable.
- Convergence checks in nat-param space — slightly different exact threshold than today; tests' epsilons re-tuned mechanically.
- Dirty-bit skipping changes iteration order vs. today; the fixed point is the same, iteration counts may shift downward.
- `Residual` and `Damped` are opt-in; default behavior matches today closely.
Open question
Whether Schedule::run should take an optional Observer reference. Recommendation: observation lives at a higher layer (History::converge calls observer hooks; Schedule is purely the loop driver).
Section 6 — Concurrency & parallelism
What's parallelizable
| Operation | Parallelism | Strategy |
|---|---|---|
| `History::converge()` (full forward+backward) | Sequential across slices | Within each slice: color-group events in parallel via Rayon |
| `History::add_events(...)` | Sequential append, but ingestion of typed events into `EventStorage` parallelizes trivially | n/a |
| `History::learning_curves()` | Per-key parallel | `into_par_iter()` |
| `History::log_evidence_for(targets)` | Per-batch parallel, reduce sum | `par_iter().map(...).sum()` |
| `Game` inference | Sequential | n/a (too small to amortize Rayon overhead) |
Within-slice color-group parallelism
When events are added to a slice, partition them into color groups where events in the same color touch no shared Index. Within a color, run events in parallel via Rayon. Across colors, run sequentially. Preserves asynchronous-EP semantics exactly.
Alternative: synchronous EP with snapshot. All events read from a frozen skill snapshot, write deltas to thread-local buffers, barrier merges. Trivially parallel but weaker per-iteration convergence — needs damping. Available as a Schedule impl, opt-in.
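Under the default asynchronous-EP semantics, the per-slice loop could look roughly like this; `color_groups` and `run_event` are assumptions, the `rayon` feature is enabled, and the disjoint-write guarantee is only noted in comments (the real implementation would hand each task mutable access to just the columns its event owns).

```rust
use rayon::prelude::*;

// `color_groups` was computed at ingestion: events in one group share no Index,
// so their skill writes are disjoint and may proceed concurrently.
for group in &slice.color_groups {
    group.par_iter().for_each(|&event_idx| {
        // Disjointness guarantees no two tasks touch the same competitor's skill.
        run_event(slice, event_idx);
    });
    // Colors run sequentially, preserving asynchronous-EP semantics exactly.
}
```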
Send + Sync requirements
All public traits (Time, Drift, Observer, Factor, Schedule) require Send + Sync. Observer impls must be thread-safe (called from arbitrary worker threads).
Rayon as default-on feature
rayon as default-on feature; with default-features = false, parallel paths fall back to sequential iterators behind cfg(feature = "rayon").
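The fallback is the usual cfg split; a minimal sketch with an illustrative helper name:

```rust
#[cfg(feature = "rayon")]
fn for_each_key<K: Send + Sync>(keys: &[K], f: impl Fn(&K) + Send + Sync) {
    use rayon::prelude::*;
    keys.par_iter().for_each(f); // parallel path when the feature is on
}

#[cfg(not(feature = "rayon"))]
fn for_each_key<K>(keys: &[K], f: impl Fn(&K)) {
    keys.iter().for_each(f);     // sequential fallback, same call sites
}
```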
Expected speedup ballpark
For 1000 players, 60 events/slice × 1000 slices, 30 convergence iterations:
| Source | Estimated speedup vs. today |
|---|---|
| `HashMap` → dense `Vec` | 2–4× |
| Natural-param `Gaussian`, no-sqrt mul/div | 1.5–2× |
| Pre-allocated `ScratchArena` | 1.2–1.5× |
| Color-group parallel events in slice (8 cores) | 2–4× |
| Dirty-bit slice skipping (online add case) | 5–50× |
| Combined (offline converge) | ~10–30× |
| Combined (online add) | ~50–500× depending on locality |
These are pre-implementation estimates. Each tier validates with criterion.
Trade-offs
- Color-group parallelism requires up-front graph coloring at ingestion. Cost: linear in events, run once per `add_events`. Cheap.
- Default = asynchronous EP (preserves current semantics). Synchronous is opt-in only.
- Cross-slice sweep stays sequential; no speculative parallel sweeps.
- Rayon default-on but feature-gated.
Open question
Whether to expose color-group partitioning to users. Recommendation: hidden by default, escape hatch via add_events_with_partition(...) for power users who already know their event independence.
Section 7 — Migration, testing, and delivery plan
The crate is unreleased, so version-bump ceremony doesn't apply. Tiers are sequencing of work and milestones, not releases.
Tier sequence
T0 — Numerical parity (no API change)
Internal-only. Public surface unchanged.
- Switch `Gaussian` storage to natural parameters `(pi, tau)`. `mu()` / `sigma()` become accessors.
- Replace `HashMap<Index, _>` with dense `Vec<_>` keyed by `Index.0` everywhere.
- Introduce `ScratchArena` inside `Batch` so `Game::new` stops allocating per-event.
- Drop the `panic!` in `mu_sigma`; return a `Result` propagated upward.
Acceptance: existing test suite passes (bit-equal where possible, ULP-bounded where natural-param arithmetic shifts a rounding); cargo bench shows ≥3× win on batch benchmark; no API breakage.
T1 — Factor graph machinery (internal-only)
- Introduce `Factor`, `VarStore`, `Schedule` as `pub(crate)` types.
- Re-implement `Game::likelihoods()` on top of `BuiltinFactor::{Perf, TeamSum, RankDiff, Trunc}` driven by `EpsilonOrMax`.
- Replace within-game iteration tracking with `ScheduleReport`.
Acceptance: existing test suite passes (ULP-bounded); within-game iteration counts unchanged; benchmarks ≥ T0.
T2 — New API surface (breaking)
All renames and the new public API land together. No half-renamed intermediate state.
- New types: `Rating`, `TimeSlice`, `Competitor`, `Member<K>`, `Outcome`, `Event<T, K>`, `KeyTable<K>`.
- `Time` trait introduced; `History<T: Time, D: Drift<T>>` is generic.
- Three-tier API surface: `record_winner`, `event(...).team(...).commit()`, bulk `add_events(iter)`.
- `Observer` trait + `ConvergenceReport`; `verbose: bool` deleted.
- `panic!` / `debug_assert!` at the API boundary become `Result<_, InferenceError>`.
- Promote `Factor` / `Schedule` / `VarStore` to `pub` under a `factors` module.
Acceptance: full test suite rewritten in new API; equivalence tests prove identical posteriors vs. old API on the same inputs.
T3 — Concurrency
- `Send + Sync` audit and bounds on all public traits.
- Color-group partitioning at `TimeSlice` ingestion.
- `rayon` as default-on feature with `#[cfg(feature = "rayon")]` fallback.
- Parallel paths: within-slice color groups, `learning_curves`, `log_evidence_for`.
Acceptance: deterministic posteriors across RAYON_NUM_THREADS={1,2,4,8}; benchmarks show >2× on 8-core for offline converge.
T4 — Richer factor types & schedules
Each shipped independently after T3.
- `MarginFactor` → enables `Outcome::Scored`.
- `Damped` and `Residual` schedules.
- `SynergyFactor`, `ScoreFactor` → same pattern when wanted.
Each comes with its own benchmark and a worked example in examples/.
Testing strategy
| Layer | Approach |
|---|---|
| Numerical correctness | Keep existing hardcoded golden values from test_1vs1, test_1vs1_draw, test_2vs1vs2_mixed, etc. through T0–T1 unchanged. They are a regression net against the original Python port. |
| API parity | T2 adds an equivalence test module that runs identical inputs through old vs. new construction and compares posteriors within ULPs. |
| Property tests | Add proptest for: factor graph fixed-point invariance under message order, Outcome round-trip, Gaussian mul/div associativity in nat-params, schedule convergence regardless of starting state. |
| Determinism | T3 adds tests that run identical input across multiple Rayon thread counts and assert identical posteriors. |
| Benchmark gates | Each tier has a "must not regress" gate vs. the previous tier on the existing batch and gaussian criterion suites. T0 must beat baseline by ≥3×; T1 ≥ T0; etc. |
Risk management
- T0 risk: rounding drift in tests. Mitigation: where natural-param arithmetic legitimately changes the last ULPs, update goldens and simultaneously add a parity test against a snapshot taken from baseline to prove the difference is bounded.
- T2 risk: API design mistakes. Mitigation: review the spec and a worked example before implementing; iterate on feedback.
- T3 risk: subtle race conditions in color-group partitioning. Mitigation: `loom` tests for the merge step; deterministic-output assertion across thread counts.
- Cross-tier risk: scope creep. Each tier has a closed checklist; new ideas go to the next tier's wishlist.
What we're explicitly not doing
- No GPU offload.
- No `no_std` support.
- No serde / persistence in this design.
- No incremental online API beyond `record_winner` / `add_events`.
Open questions summary
Collected here for the review pass:
- `enum BuiltinFactor` extensibility — may feel too closed-world; revisit if user-defined factors via `Custom(Box<dyn Factor>)` become common.
- Sparse vs. dense per-slice skill storage — default to dense + `present` mask; sparse columnar is the alternative. Decided by T0 benchmarks.
- `Index` exposure for hot paths — expose `intern_key` / `lookup` so power users can promote `&K` to `Index` and skip the `KeyTable` lookup; the casual API still takes `&K` everywhere.
- `Schedule::run` and observer wiring — observation stays at a higher layer (`History::converge` calls observer hooks; `Schedule` is purely the loop driver).
- Color-group partition exposure — hidden by default, escape hatch via `add_events_with_partition(...)`.