Commit Graph

4 Commits

Author SHA1 Message Date
6437649436 perf(arena): pool team_prior/lhood/inv buffers to eliminate per-game allocs
Move team_prior, lhood_lose, lhood_win, inv_buf into ScratchArena so
their Vec capacity is reused across games in a Batch. Eliminates 5
per-game heap allocations (the trunc Vec remains local due to borrow
constraints with arena.vars).

Batch::iteration: 23.0 µs (down from 27.0 µs with naive local Vecs;
8% above T0 21.253 µs baseline due to TruncFactor propagate overhead).
2026-04-24 09:10:48 +02:00
cdfd75f846 bench: capture T1 final numbers and fix clippy warnings
Fixed:
- Removed unused .enumerate() in batch.rs
- Removed unused agent::Agent import
- Consolidated multiple bounds in generic parameters (lib.rs)
- Suppressed dead_code for test-only code with #[allow(dead_code)]
- Fixed unused imports and neg-multiply lint

Batch::iteration: 27.023 µs (T0 was 21.253 µs, expected minor regression from T1 infrastructure).
Gaussian::* unchanged (~236-280 ps).

Acceptance: T1 factor-graph refactor lands without clippy/fmt issues.
All 53 tests pass. Closes T1 tier.
2026-04-24 09:04:29 +02:00
d3cfee53a1 bench: capture T0 final numbers and post-mortem
Batch::iteration: 29.840 µs → 21.253 µs (1.40×)
Gaussian::mul:     1.568 ns →  218.69 ps (7.17×)
Gaussian::div:     1.572 ns →  218.64 ps (7.19×)

Gaussian arithmetic hit target (7×+ vs 1.5–2× expected). Batch::iteration
reached 1.40× vs the 3× target. Post-mortem: the bench exercises 100 tiny
2-team events and the dominant cost is still Vec allocation in within_priors,
sort_perm, and Game::likelihoods. The HashMap→Vec win shows at the History
level (forward/backward sweep) which this bench doesn't exercise.

Remediation plan documented in benches/baseline.txt: arena-ify sort_perm,
within_priors, and Game::likelihoods in T1 when Game's internals are
redesigned around the new factor graph.

38/38 tests passing. Closes T0 tier.
2026-04-24 07:28:28 +02:00
06d3c886fe bench: capture T0 baseline; expose pi/tau accessors; fix div panic
- Promotes Gaussian::pi and Gaussian::tau to public so benches/gaussian.rs
  compiles, then captures baseline numbers for the T0 acceptance gate.
- Fixes the divide bench: g1/g2 panicked (g1 has lower precision than g2;
  cavity requires pi_num >= pi_den). Swapped to g2/g1 (well-defined).

Baseline on Apple M5 Pro:
  Batch::iteration  29.840 µs
  Gaussian::mul      1.568 ns   (vs ~220 ps for add/sub — hot path)
  Gaussian::div      1.572 ns
2026-04-24 06:43:00 +02:00