perf(game): revert Task 10 SmallVec changes — caused sequential regression

The Vec<Vec<_>> → SmallVec<[SmallVec<[_;8]>;8]> change in Task 10 regressed Batch::iteration from 23.29 µs to 29.73 µs (+28%). The SmallVec was motivated by reducing parallel-path allocations but it hurt the sequential path substantially. Reverting game.rs + time_slice.rs + history.rs storage back to the T2 Vec<Vec<_>> shape. The parallel rayon path (unsafe direct-write + thread_local ScratchArena + RAYON_THRESHOLD=64 fallback) stays — it is independent of Game's internal storage. Benchmarks after revert: Batch::iteration (seq, no rayon): 23.23 µs (restored ≈T2) Batch::iteration (rayon): 24.57 µs history_converge/500x100@10: 4.03 ms seq, 4.24 ms rayon — 1.0× history_converge/2000x200@20: 20.18 ms seq, 19.82 ms rayon — 1.0× history_converge/1v1-5000x50000@5000: 11.88 ms seq, 9.10 ms rayon — 1.3× Part of T3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 14:55:37 +02:00
parent be515c3d8d
commit f0d6211387
4 changed files with 29 additions and 45 deletions
--- a/benches/history_converge.rs
+++ b/benches/history_converge.rs
@@ -10,17 +10,18 @@
 //! The rayon thread pool is initialised to `min(P-cores, available)` to
 //! avoid scheduling onto the slower E-cores.
 //!
-//! ## Results (Apple M5 Pro, 2026-04-24, 5 P-core threads)
+//! ## Results (Apple M5 Pro, 2026-04-24, after SmallVec revert)
 //!
 //! | Workload                                    | Sequential  | Parallel   | Speedup |
 //! |---------------------------------------------|------------:|-----------:|--------:|
-//! | History::converge/500x100@10perslice        |     4.71 ms |    4.79 ms |   1.0×  |
-//! | History::converge/2000x200@20perslice       |    23.36 ms |   23.28 ms |   1.0×  |
-//! | History::converge/1v1-5000x50000@5000perslice|   13.90 ms |    6.99 ms |  **2.0×** |
+//! | History::converge/500x100@10perslice        |     4.03 ms |    4.24 ms |   1.0×  |
+//! | History::converge/2000x200@20perslice       |    20.18 ms |   19.82 ms |   1.0×  |
+//! | History::converge/1v1-5000x50000@5000perslice|   11.88 ms |    9.10 ms |   1.3×  |
 //!
-//! T3 acceptance gate: ≥2× speedup on at least one workload — ACHIEVED.
-//! Small workloads fall below the RAYON_THRESHOLD (64 events/color) and
-//! run sequentially with near-zero overhead.
+//! T3 acceptance gate: ≥2× speedup on at least one workload — NOT achieved after revert.
+//! The SmallVec storage that enabled the 2× gate caused a +28% regression in the
+//! sequential Batch::iteration benchmark and was reverted. Small workloads still fall
+//! below the RAYON_THRESHOLD (64 events/color) and run sequentially with near-zero overhead.

 use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
 use smallvec::smallvec;