diff --git a/CHANGELOG.md b/CHANGELOG.md
index e5136db..4f73828 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,149 +2,13 @@
 
 All notable changes to this project will be documented in this file.
 
-## Unreleased — T3 concurrency
+## 0.1.1 - 2026-04-27
 
-Adds rayon-backed parallel paths per Section 6 of
-`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`.
+### Other (unconventional)
 
-### Breaking
-
-- `Send + Sync` bounds added to public traits: `Time`, `Drift<T>`,
-  `Observer<T>`, `Factor`, `Schedule`. All built-in impls satisfy these
-  via auto-derive, but downstream custom impls that aren't thread-safe
-  will need the bounds.
-
-### New
-
-- Opt-in `rayon` cargo feature. When enabled:
-  - Within-slice event iteration runs color-group events in parallel
-    via `par_iter_mut` (`TimeSlice::sweep_color_groups`).
-  - `History::learning_curves` computes per-slice posteriors in
-    parallel, merges sequentially in slice order.
-  - `History::log_evidence` / `log_evidence_for` use per-slice parallel
-    computation with deterministic sequential reduction (sum in slice
-    order) — bit-identical to the sequential baseline.
-- `ColorGroups` internal infrastructure with greedy graph coloring
-  (`src/color_group.rs`). Events sharing no `Index` go into the same
-  color group; events in the same group can run concurrently without
-  touching each other's skills.
-- `tests/determinism.rs` asserts bit-identical posteriors across
-  `RAYON_NUM_THREADS={1, 2, 4, 8}`.
-- `benches/history_converge.rs` measures end-to-end convergence on
-  three workload shapes.
-
-### Performance notes
-
-- Default build (no rayon): `Batch::iteration` 23.23 µs — no regression
-  vs T2.
-- With `--features rayon`:
-  - 500 events / 100 competitors / 10 per slice: 1.0× speedup.
-  - 2000 events / 200 competitors / 20 per slice: 1.0× speedup.
-  - 5000 events in one slice / 50k competitors: **1.3× speedup.**
-- The spec targeted >2× speedup on 8-core offline converge. This is
-  only achievable on workloads with many events-per-slice AND large
-  competitor pools. **Typical TrueSkill workloads (tens of events
-  per slice) do not materially benefit from T3's within-slice
-  parallelism** because rayon's task-spawn overhead dominates.
-- Cross-slice parallelism (dirty-bit slice skipping per spec Section
-  5) is the natural next step for real workload speedup — deferred
-  to a future tier.
-
-### Internals
-
-- The parallel path uses an `unsafe` block to concurrently write to
-  `SkillStore` from color-group-disjoint events. Soundness rests on
-  the color-group invariant (events in the same color touch no shared
-  `Index`), which is guaranteed by construction in
-  `TimeSlice::recompute_color_groups`. Sequential path unchanged.
-- `RAYON_THRESHOLD = 64` — color groups smaller than this fall back to
-  sequential iteration inside the parallel `sweep_color_groups` to
-  avoid rayon's task-spawn overhead.
-- Thread-local `ScratchArena` per rayon worker thread.
-
-## Unreleased — T2 new API surface
-
-Breaking: every renamed type and the new public API land together per
-`docs/superpowers/specs/2026-04-23-trueskill-engine-redesign-design.md`
-Section 7 "T2".
-
-### Breaking renames
-
-- `Batch` → `TimeSlice`
-- `Player` → `Rating` (and the `.player` field on `Competitor` is now `.rating`)
-- `Agent` → `Competitor`
-- `IndexMap` → `KeyTable`
-- `History` field `.batches` → `.time_slices`
-
-### New types
-
-- `Time` trait with `Untimed` ZST and `i64` impls (generic time axis).
-- `Drift<T: Time>` — generified from the old `Drift` trait.
-- `Event<T, K>`, `Team<K>`, `Member<K>` — typed bulk-ingest event shape.
-- `Outcome` (`#[non_exhaustive]`) — `Ranked(SmallVec<[u32; 4]>)` with convenience
-  constructors `winner`, `draw`, `ranking`. `Scored` lands in T4.
-- `Observer<T: Time>` trait + `NullObserver` ZST — structured progress callbacks.
-- `ConvergenceOptions`, `ConvergenceReport` — configuration and post-hoc summary.
-- `GameOptions`, `OwnedGame<T, D>` — ergonomic Game constructors without lifetime
-  gymnastics.
-- `factors` module — re-exports `Factor`, `BuiltinFactor`, `VarId`, `VarStore`,
-  `Schedule`, `EpsilonOrMax`, `ScheduleReport`, and the three built-in factor types
-  (`TeamSumFactor`, `RankDiffFactor`, `TruncFactor`) as public API.
-
-### New `History` API
-
-- Three-tier ingestion:
-  - Tier 1 (bulk): `add_events<I: IntoIterator<Item = Event<T, K>>>(events) -> Result`
-  - Tier 2 (one-off): `record_winner(&K, &K, T)`, `record_draw(&K, &K, T)`
-  - Tier 3 (fluent): `event(T).team([...]).weights([...]).ranking([...]).commit()`
-- `converge() -> Result<ConvergenceReport, InferenceError>` — replaces
-  `convergence(iters, eps, verbose)`.
-- `current_skill(&K)`, `learning_curve(&K)`, `learning_curves()` (now keyed on `K`).
-- `log_evidence()` zero-arg, `log_evidence_for(&[&K])`.
-- `predict_quality(&[&[&K]])`, `predict_outcome(&[&[&K]])` (2-team only in T2;
-  N-team deferred to T4).
-- `intern(&Q)` / `lookup(&Q)` expose the internal `KeyTable<K>` for power users.
-- `History<T, D, O, K>` is now fully generic with defaults
-  `<i64, ConstantDrift, NullObserver, &'static str>`.
-
-### New `Game` API
-
-- `Game::ranked(&[&[Rating]], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
-- `Game::one_v_one(&Rating, &Rating, Outcome) -> Result<(Gaussian, Gaussian), _>`.
-- `Game::free_for_all(&[&Rating], Outcome, &GameOptions) -> Result<OwnedGame, _>`.
-- `Game::custom(...)` minimal escape hatch for user-defined factor graphs
-  (`#[doc(hidden)]` — full ergonomics in T4).
-- `Game::log_evidence()` and `OwnedGame::log_evidence()` accessors.
-
-### Errors
-
-- `InferenceError` now carries `MismatchedShape { kind, expected, got }`,
-  `InvalidProbability { value }`, `ConvergenceFailed { last_step, iterations }`,
-  and `NegativePrecision { pi }`. Shape and bounds validation at the API boundary
-  now returns `Err` rather than panicking.
-
-### Removed (breaking)
-
-- `History::convergence(iters, eps, verbose)` — use `converge()`.
-- `HistoryBuilder::gamma(f64)` — use `.drift(ConstantDrift(g))`.
-- `HistoryBuilder::time(bool)` and `History.time: bool` — use the `Time` type parameter.
-- The nested-`Vec<Vec<Vec<_>>>` public `add_events` signature —
-  use typed `add_events(iter)`.
-- `learning_curves_by_index()` — use `learning_curves()`.
-
-### Performance
-
-`Batch::iteration` bench: **21.36 µs** (T1 was 22.88 µs on the same hardware, a
-~7% improvement from the typed-path being slightly more direct). Gaussian
-operations unchanged.
-
-### Notes
-
-- `Time = Untimed` returns `elapsed_to → 0` — **behavior change** from the old
-  `time=false` mode, which implicitly generated `elapsed=1` per event via an
-  `i64::MAX` sentinel in `Agent.last_time`. Tests that relied on the old
-  `time=false` semantics now use `History::<i64, _>` with explicit
-  `1..=n` timestamps.
+- T0 + T1 + T2: engine redesign through new API surface (#1)
+- T3: rayon-backed concurrency (opt-in) (#2)
+- T4 (MarginFactor): scored outcomes via Gaussian-margin EP evidence
 
 ## 0.1.0 - 2026-04-23
 
@@ -156,6 +20,8 @@ operations unchanged.
 
 - chore: added cliff.toml, release.toml and rustfmt.toml
 - chore: clean up
+- chore: make cargo release add CHANGELOG.md before commit
+- chore: do not publish
 
 ### Other (unconventional)
 
diff --git a/release.toml b/release.toml
index 2af34d1..e32cc02 100644
--- a/release.toml
+++ b/release.toml
@@ -1,2 +1,2 @@
 publish = false
-pre-release-hook = ["sh", "-c", "git cliff -o ../CHANGELOG.md --tag {{version}} && git add CHANGELOG.md"]
+pre-release-hook = ["sh", "-c", "git cliff -o CHANGELOG.md --tag {{version}} && git add CHANGELOG.md"]