logaritmisk 8f67503f45 docs: add project vision, MVP architecture spec, and reference material
- docs/VISION.md: product vision + feature catalogue (MVP / post-MVP / later)
- docs/specs/2026-06-02-mvp-architecture.md: MVP architecture + 16-entry decision log
- reference/: Spectrum 5.0 cataloguing + Riksantikvarieämbetet source material (build-time reference)
- CLAUDE.md: project guidance for Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 00:24:53 +02:00

# MVP Architecture & Design
**Status:** approved design, pre-implementation
**Date:** 2026-06-02
**Scope:** the MVP — the smallest useful build that exercises every architectural
pillar. Companion to [`../VISION.md`](../VISION.md) (full feature catalogue) and
grounded in [`../../reference/`](../../reference/) (Spectrum 5.0 + Riksantikvarieämbetet
source material).
> Neutral naming throughout. No product/brand name appears in code or these docs
> (see §5 and decision D16 in §20).
---
## 1. Goals & non-goals
**Goals**
- A small, strongly-typed, well-tested **core** that is easy to extend.
- An organization can **catalogue** its collection (Spectrum Cataloguing), attach
**media**, **search** it, control **visibility**, and expose **public** records.
- **Airtight org isolation** and a full **audit trail** from day one.
- **Easy self-hosting**: one binary, one database, minimal dependencies.

**Non-goals (MVP)**
- Other Spectrum procedures as workflows (entry, accession, loans, location/
movement control, …) — roadmap.
- Reporting/label templates, aggregator/LIDO/OAI-PMH/IIIF, translation workflow
UI, fleet provisioning/control plane, migrations machinery (none until 1.0).
---
## 2. Guiding principles
- **Make illegal states unrepresentable** (§9). Parse, don't validate.
- **Isolation by construction** (§4): credentials + topology, not `org_id`
filtering in code.
- **Module separation; no SQL spread.** SQL lives only in repository modules (§5,
§8).
- **Minimal custom code, reversible dependency bets** (§18).
- **Self-host is first-class** (§3).
- **Well-tested, not overboard** (§19): strong types shrink the test surface; the
isolation/security and the core get thorough tests; the dynamic field layer is
validated at runtime.
---
## 3. Deployment topology & tenancy
**The application binary is always single-tenant.** One running instance serves
exactly one organization and contains no concept of "other orgs". There is **no
multi-tenant code path**. Multi-tenancy is achieved entirely at the deployment
layer:
| | Self-host | Hosted fleet |
|---|---|---|
| App instances | one (1+ pods) | one deployment **per org** (1+ pods each) |
| Postgres | its own database | **one shared server**, one **database per org** |
| Meilisearch | its own index | **one shared server**, one **index per org** |
| Files | local disk or S3 | S3 (or RWX volume per org) |
| Domain | the org's domain | each org its own domain |
| Rollout | upgrade the instance | **per-org** image bump |

Consequences (recorded):
- **Per-org rollout & schema version.** Bumping one org's image rolls out that org
only; the instance runs its own migrations against its own database. Orgs may sit
on different versions. (Pre-1.0: recreate rather than migrate.)
- **Files:** with more than one pod per org, files must be on shared storage (S3 or
RWX volume) — local disk is single-pod/self-host only. `BlobStore` (§11) abstracts
this.
- **Cross-org features** (a future aggregator searching across museums; fleet
admin) are a **separate service**, never a single org-app. Out of MVP.
## 4. Isolation model
Because each org-app holds **credentials scoped to its own database and its own
search index**, cross-org access is not "prevented" — it is **impossible, because
the access path does not exist**:
- **Postgres:** database-per-org + a role granted access to *only* that database.
An instance physically cannot connect to another org's database.
- **Meilisearch:** index-per-org + an API key scoped to that org's index only.
- **No Row-Level Security needed** — there is no shared multi-org data in any
single database to protect, and the app has no cross-org code.
- **Files:** per-org bucket/prefix (S3) or per-org volume, with scoped credentials.

Defense-in-depth / verification:
- A **single configuration chokepoint** establishes "which org am I" at startup
from config; nothing reconstructs it ad hoc.
- **Negative tests** assert the app cannot be pointed outside its configured
database/index and that scoped credentials reject foreign access.
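
The configuration chokepoint could look roughly like the sketch below: one value, built once at startup, that every other component receives instead of reading configuration itself. The type name `OrgContext`, its fields, and the variable names (`DATABASE_URL`, `SEARCH_INDEX`, `BLOB_PREFIX`) are illustrative assumptions, not decisions recorded in this spec.

```rust
use std::collections::HashMap;

/// Everything that answers "which org am I", resolved exactly once at startup.
/// Field and key names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct OrgContext {
    pub database_url: String,
    pub search_index: String,
    pub blob_prefix: String,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
    Empty(&'static str),
}

impl OrgContext {
    /// Build the context from a key/value source (for example the process
    /// environment collected into a map). Fails fast on the first problem.
    pub fn from_vars(vars: &HashMap<String, String>) -> Result<Self, ConfigError> {
        Ok(Self {
            database_url: required(vars, "DATABASE_URL")?,
            search_index: required(vars, "SEARCH_INDEX")?,
            blob_prefix: required(vars, "BLOB_PREFIX")?,
        })
    }
}

/// Look up a key and reject missing or blank values.
fn required(vars: &HashMap<String, String>, key: &'static str) -> Result<String, ConfigError> {
    match vars.get(key) {
        None => Err(ConfigError::Missing(key)),
        Some(value) if value.trim().is_empty() => Err(ConfigError::Empty(key)),
        Some(value) => Ok(value.clone()),
    }
}
```

Because the rest of the program only ever sees an `OrgContext`, a negative test can construct one with foreign values and assert that the scoped credentials reject it.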
## 5. Crate / module layout
A Cargo **workspace** with **role-named** member crates (no brand name anywhere):
```
/                 virtual workspace
  crates/
    domain/       core types, value objects, invariants (no I/O)
    db/           sqlx repositories; ALL SQL lives here
    storage/      BlobStore trait + OpenDAL adapter (S3 / local)
    search/       search abstraction + Meilisearch adapter
    auth/         password + OIDC, session/token, extractors
    api/          axum router, handlers, OpenAPI (utoipa), public + admin
    server/       binary: config, wiring, startup, migrations runner
  web/            React SPA (separate build), consumes the OpenAPI
  migrations/     SQL migrations (post-1.0; pre-1.0 = recreate)
```
Dependency direction points inward toward `domain`. `domain` has no I/O deps.
Each crate has one clear purpose, a defined interface, and is testable in
isolation. Experimental/volatile dependencies sit behind a crate-owned trait
(`BlobStore`, the search trait, …) so they are swappable (§18).
## 6. Data model — hybrid (Approach C)
Three layers:
### 6.1 Typed relational core
The accountability backbone and the most queried/integrity-critical fields, as
real columns/tables with strong types:
- object number (configurable format), object name, number of objects,
brief description, current location, current owner, recorder, recording date,
**visibility**, media links, audit linkage.
### 6.2 Flexible field layer
- A **field-definition registry**: each definition has a key, data type, optional
**vocabulary/authority binding**, validation rules, grouping, and locale
behavior.
- Field **values** stored as **JSONB** on the record, validated at write time
against the registry.
- The **Spectrum 5.0 Cataloguing field set** ships as **seed field definitions**
(see [`reference/spectrum-5.0-cataloguing-units-of-information.md`](../../reference/spectrum-5.0-cataloguing-units-of-information.md)).
Orgs enable a subset or the full set; custom fields are *data*, not migrations.
- **Trade (explicit):** this layer is **runtime-typed by design** — validated
against definitions at runtime, not by the compiler. Hard types where structure
is fixed (core, IDs, refs), runtime validation where it is dynamic.
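
To make the runtime-typed trade concrete, here is a minimal sketch of write-time validation against a registry. It is an illustration under assumptions: `FieldValue` stands in for a JSONB value, and the type names, the three data types, and the `required` rule are hypothetical simplifications of the definitions described above.

```rust
use std::collections::HashMap;

/// Data types a field definition may declare (a hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType { Text, Integer, Boolean }

/// A dynamically typed value, standing in for a JSONB value.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldValue { Text(String), Integer(i64), Boolean(bool) }

#[derive(Debug, Clone)]
pub struct FieldDefinition {
    pub data_type: FieldType,
    pub required: bool,
}

#[derive(Debug, PartialEq)]
pub enum FieldError {
    UnknownField(String),
    WrongType { key: String, expected: FieldType },
    MissingRequired(String),
}

#[derive(Default)]
pub struct FieldRegistry {
    definitions: HashMap<String, FieldDefinition>,
}

impl FieldRegistry {
    /// Adding a field is a data change, not a schema migration.
    pub fn define(&mut self, key: &str, data_type: FieldType, required: bool) {
        self.definitions
            .insert(key.to_string(), FieldDefinition { data_type, required });
    }

    /// Validate a record's flexible values at write time, collecting all errors.
    pub fn validate(&self, values: &HashMap<String, FieldValue>) -> Result<(), Vec<FieldError>> {
        let mut errors = Vec::new();
        for (key, value) in values {
            match self.definitions.get(key) {
                None => errors.push(FieldError::UnknownField(key.clone())),
                Some(def) if !matches_type(def.data_type, value) => errors.push(
                    FieldError::WrongType { key: key.clone(), expected: def.data_type },
                ),
                Some(_) => {}
            }
        }
        for (key, def) in &self.definitions {
            if def.required && !values.contains_key(key) {
                errors.push(FieldError::MissingRequired(key.clone()));
            }
        }
        if errors.is_empty() { Ok(()) } else { Err(errors) }
    }
}

fn matches_type(expected: FieldType, value: &FieldValue) -> bool {
    matches!(
        (expected, value),
        (FieldType::Text, FieldValue::Text(_))
            | (FieldType::Integer, FieldValue::Integer(_))
            | (FieldType::Boolean, FieldValue::Boolean(_))
    )
}
```

The compiler checks none of the field keys here; the registry does, which is exactly the boundary this section draws between the typed core and the dynamic tail.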
### 6.3 Controlled vocabularies & authority records
- First-class relational tables for **person / organization / place** authorities
and **term sources** (vocabularies) — *store once, link many*.
- Referenced from both the typed core and the flexible fields. A field bound to a
term source accepts only a **resolved reference** (§9), never a free string.
- **Multilingual labels** (sv/en …) on terms and authorities.
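
The "resolved reference, never a free string" rule can be sketched with module privacy: `TermRef` has a private field, so the only way to obtain one is through the vocabulary's `resolve`. This is a simplified illustration; the in-memory `Vocabulary`, the integer-backed `TermId`, and case-insensitive label matching are assumptions for the example, not the planned storage or matching rules.

```rust
pub mod vocab {
    use std::collections::HashMap;

    /// Newtype ID. Backed by an integer here only to keep the sketch std-only.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
    pub struct TermId(u32);

    /// Proof that a term exists. The field is private, so code outside this
    /// module cannot build a `TermRef` without going through `resolve`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct TermRef(TermId);

    impl TermRef {
        pub fn id(self) -> TermId {
            self.0
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    pub struct UnknownTerm(pub String);

    /// A term source: store each term once, look it up by any of its labels.
    #[derive(Default)]
    pub struct Vocabulary {
        by_label: HashMap<String, TermId>,
        next_id: u32,
    }

    impl Vocabulary {
        /// Register one term with (language, label) pairs, e.g. ("sv", "ek").
        pub fn add_term(&mut self, labels: &[(&str, &str)]) -> TermId {
            let id = TermId(self.next_id);
            self.next_id += 1;
            for (_lang, label) in labels {
                self.by_label.insert(label.to_lowercase(), id);
            }
            id
        }

        /// Parse free text into a reference, or reject it.
        pub fn resolve(&self, input: &str) -> Result<TermRef, UnknownTerm> {
            self.by_label
                .get(&input.trim().to_lowercase())
                .map(|id| TermRef(*id))
                .ok_or_else(|| UnknownTerm(input.to_string()))
        }
    }
}
```

A field bound to this term source would store a `TermRef` rather than a `String`, so an unresolved value has no representation in the record type.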
### 6.4 Content i18n (capability now, workflow later)
- Localizable text values are **language-tagged in the data model from day one**
(so no painful migration later).
- The **translation workflow/UI is post-MVP**; MVP authors enter content in one
language while the model already supports more.
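
One possible shape for a language-tagged value is sketched below. The `LocalizedText` and `Lang` names and the fallback rule are assumptions for illustration; the point is only that a single-language MVP value and a later multi-language value share one type.

```rust
use std::collections::BTreeMap;

/// Languages known to the sketch; the real set would come from configuration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Lang { Sv, En }

/// A text value that always carries its language tag(s).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct LocalizedText {
    by_lang: BTreeMap<Lang, String>,
}

impl LocalizedText {
    /// MVP authoring path: content entered in one language.
    pub fn new(lang: Lang, text: &str) -> Self {
        let mut value = Self::default();
        value.set(lang, text);
        value
    }

    /// Later translation workflows add languages without changing the type.
    pub fn set(&mut self, lang: Lang, text: &str) {
        self.by_lang.insert(lang, text.to_string());
    }

    pub fn get(&self, lang: Lang) -> Option<&str> {
        self.by_lang.get(&lang).map(String::as_str)
    }

    /// Prefer one language, fall back to another if it is missing.
    pub fn get_or_fallback(&self, preferred: Lang, fallback: Lang) -> Option<&str> {
        self.get(preferred).or_else(|| self.get(fallback))
    }
}
```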
## 7. Surfaces & API
Two cleanly separated surfaces — a **load-bearing** rule:
- **Public surface** — `/api/public/**`: unauthenticated, **read-only**, serves
only **public** records as a typed **`PublicView`** (public-safe fields only).
- **Admin/privileged surface** — everything else: authenticated, read/write.

This separation enables independent **IP/VPN lockdown** (admin behind an ingress
allowlist while public stays open), caching, and rate-limiting, all at the
ingress layer, not in app code. An optional in-app IP-allowlist middleware is a
post-MVP portable fallback.

**OpenAPI:** code-first with **utoipa**: the spec is generated from Rust
types/handlers (cannot drift) and is the contract the React client consumes.
## 8. Persistence & data access
- **PostgreSQL** via **sqlx** (async, compile-time-checked queries). **All SQL is
confined to the `db` crate**, one repository per aggregate — satisfying "no SQL
spread everywhere" without an ORM's abstraction.
- JSONB for the flexible field values (GIN-indexable for search/filter needs).
- **No migrations until 1.0**: pre-1.0 we reshape freely (drop & recreate).
Post-1.0, each instance runs its own migrations on startup (per-org schema version).
## 9. Type-driven design (cross-cutting)
- **Newtype IDs** — `ObjectId`, `OrgId`, `MediaId`, `TermId`, `AuthorityId`; never
bare UUIDs.
- **Validated value objects** — `ObjectNumber`, `Email`, and `TermRef` /
`AuthorityRef` that are **constructable only by resolving** against the
vocabulary/authority. An unvalidated term cannot exist as that type. (Direct
mapping of Spectrum's "use a standard term source / form of name".)
- **`PublicView` projection** — a distinct type carrying only public-safe fields;
leaking an internal field on the public surface is impossible because the type
lacks it. (Preferred over a literal `Record<Public>` generic, since visibility is
runtime data from the DB.)
- **Visibility** — an enum with explicit transition methods (`publish`,
`unpublish`, `archive`): a type-driven state machine, not a stringly-typed flag.
- **Auth via extractors** — public handlers take no auth extractor; privileged
handlers require an `AuthUser` / `Authorized<Cap>` extractor, so a privileged
handler cannot compile without proof of authorization.
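
The visibility state machine could be sketched as follows. The three states are the ones listed under record-level visibility; the specific transition rules (only `Internal` can be published, only `Public` can be unpublished) are assumptions for this example, and `archive` is left out because its target state is not specified here.

```rust
/// Record-level visibility as a closed set of states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

/// Returned when a transition is not allowed from the current state.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: Visibility,
    pub action: &'static str,
}

impl Visibility {
    /// Assumed rule: only internal records may be published.
    pub fn publish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Internal => Ok(Visibility::Public),
            from => Err(InvalidTransition { from, action: "publish" }),
        }
    }

    /// Assumed rule: unpublishing returns a public record to internal.
    pub fn unpublish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Public => Ok(Visibility::Internal),
            from => Err(InvalidTransition { from, action: "unpublish" }),
        }
    }
}
```

Callers never assign a state directly; they ask for a transition and handle the error, which keeps the allowed moves in one place.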
## 10. Authentication & authorization
- **Email/password** + **external OIDC** (the org-app is an OIDC relying party),
scoped to the single org the instance serves.
- **No separate IdP and no cross-org switching** in MVP (deferred; rare case).
- Sessions: stateless tokens or a sessions table in the org DB (no Redis required).
- Authorization enforced through typed extractors (§9); role/permission model kept
simple in MVP.
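
The idea of authorization as a typed proof can be shown without any web framework. In the sketch below, `Authorized<C>` plays the role an axum extractor would play in the `auth` crate; the capability names, the `Capability` trait, and the string-based grant list are all hypothetical. In real code these types would live in their own module so that the private fields actually prevent construction elsewhere.

```rust
use std::marker::PhantomData;

/// A capability is a type, so handlers can name it in their signature.
pub trait Capability {
    const NAME: &'static str;
}

pub struct EditCatalogue;
impl Capability for EditCatalogue {
    const NAME: &'static str = "catalogue:edit";
}

pub struct ManageUsers;
impl Capability for ManageUsers {
    const NAME: &'static str = "users:manage";
}

/// An authenticated user with the capabilities granted to them.
pub struct AuthUser {
    pub email: String,
    granted: Vec<&'static str>,
}

impl AuthUser {
    pub fn new(email: &str, granted: &[&'static str]) -> Self {
        Self { email: email.to_string(), granted: granted.to_vec() }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct Forbidden {
    pub missing: &'static str,
}

/// Proof that `user` holds capability `C`. Only `check` creates it.
pub struct Authorized<C: Capability> {
    user: AuthUser,
    _capability: PhantomData<C>,
}

impl<C: Capability> Authorized<C> {
    pub fn check(user: AuthUser) -> Result<Self, Forbidden> {
        if user.granted.contains(&C::NAME) {
            Ok(Self { user, _capability: PhantomData })
        } else {
            Err(Forbidden { missing: C::NAME })
        }
    }

    pub fn user(&self) -> &AuthUser {
        &self.user
    }
}

/// A privileged operation: it cannot be called without the proof value.
pub fn rename_object(proof: &Authorized<EditCatalogue>, new_name: &str) -> String {
    format!("{} renamed object to {}", proof.user().email, new_name)
}
```

A public handler simply has no such parameter, which is how the two surfaces stay distinguishable in function signatures.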
## 11. File storage
- **`BlobStore` trait** in the `storage` crate; **OpenDAL** adapter for **S3 and
local disk**. Chosen on fit (high-level, multi-backend; our bottleneck is
network/S3, not syscall I/O). `fusio` is watch-listed and swappable behind the
trait (§18).
- Media files are linked to records; derivatives/thumbnails/IIIF are post-MVP.
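
A minimal sketch of what a crate-owned `BlobStore` trait might look like, with an in-memory implementation standing in for the OpenDAL adapter. The method set, the synchronous signatures, and the `media/<object number>/<file name>` key layout are assumptions for illustration; the real trait would most likely be async and is not defined in this spec.

```rust
use std::collections::HashMap;
use std::io;

/// The interface the rest of the workspace depends on.
pub trait BlobStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Vec<u8>>;
    fn delete(&mut self, key: &str) -> io::Result<()>;
}

/// In-memory backend, useful for tests; a real adapter would wrap S3 or disk.
#[derive(Default)]
pub struct MemoryBlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore for MemoryBlobStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.blobs.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        self.blobs
            .get(key)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such blob"))
    }

    fn delete(&mut self, key: &str) -> io::Result<()> {
        self.blobs.remove(key);
        Ok(())
    }
}

/// Calling code is generic over the trait and never names a backend.
pub fn attach_media<S: BlobStore>(
    store: &mut S,
    object_number: &str,
    file_name: &str,
    bytes: &[u8],
) -> io::Result<String> {
    let key = format!("media/{object_number}/{file_name}");
    store.put(&key, bytes)?;
    Ok(key)
}
```

Swapping the backend then means adding another `impl BlobStore`, with no change to callers such as `attach_media`.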
## 12. Search
- **Meilisearch**, one index per org, scoped API key. A search abstraction in the
`search` crate; Meili adapter behind it.
- MVP: index catalogue records on write; basic full-text + facet search in admin.
- Public-facing search is post-MVP.
## 13. Audit & amendment history
- **One append-only, immutable log** in the org database: who / when / what, with
**field-level before→after diffs**, covering domain create/update/delete and
auth/security events.
- Doubles as Spectrum **amendment history** surfaced on catalogue records
(Spectrum requires a transparent record of changes — never silently erase prior
terminology).
- MVP audits **writes + auth events**; auditing reads is deferred.
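
Field-level before→after diffs can be computed by comparing two snapshots of a record's fields. The sketch below is one way to do that under simplifying assumptions: fields are flattened to string maps, and `AuditLog` is an in-memory stand-in whose API offers only append and read. All type and field names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A record snapshot flattened to field name -> value (an assumption).
pub type Fields = BTreeMap<String, String>;

/// One changed field; `None` means the field was absent on that side.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldChange {
    pub field: String,
    pub before: Option<String>,
    pub after: Option<String>,
}

/// Compare two snapshots and return only the fields that differ,
/// ordered by field name.
pub fn diff_fields(before: &Fields, after: &Fields) -> Vec<FieldChange> {
    let keys: BTreeSet<&String> = before.keys().chain(after.keys()).collect();
    keys.into_iter()
        .filter_map(|key| {
            let (old, new) = (before.get(key), after.get(key));
            if old == new {
                return None;
            }
            Some(FieldChange {
                field: key.clone(),
                before: old.cloned(),
                after: new.cloned(),
            })
        })
        .collect()
}

/// Who / when / what for a single write.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditEntry {
    pub actor: String,
    pub at_unix_secs: u64,
    pub changes: Vec<FieldChange>,
}

/// Append-only by API shape: there is no update or delete method.
#[derive(Default)]
pub struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    pub fn append(&mut self, entry: AuditEntry) {
        self.entries.push(entry);
    }

    pub fn entries(&self) -> &[AuditEntry] {
        &self.entries
    }
}
```

Since each entry keeps the prior value, the same rows can be rendered as amendment history on a catalogue record.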
## 14. Visibility & publishing
- **Record-level visibility**: `draft` / `internal` / `public`.
- A fixed **never-public** field set (location, valuation, insurance, personal
data). Per-field publishability is post-MVP.
- Public API serves only `public` records via `PublicView`.
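
The combination of record-level visibility and a never-public field set can be sketched as a projection function. The record fields shown are a hypothetical subset of the typed core; what matters is that `PublicView` has no location or valuation field at all, and that the projection yields nothing for non-public records.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

/// Internal record shape (illustrative subset of the typed core).
pub struct CatalogueRecord {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
    pub current_location: String, // never public
    pub valuation: Option<u64>,   // never public
    pub visibility: Visibility,
}

/// Public-safe projection: sensitive fields are absent from the type.
#[derive(Debug, PartialEq, Eq)]
pub struct PublicView {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
}

impl PublicView {
    /// Returns `None` unless the record is public.
    pub fn project(record: &CatalogueRecord) -> Option<PublicView> {
        match record.visibility {
            Visibility::Public => Some(PublicView {
                object_number: record.object_number.clone(),
                object_name: record.object_name.clone(),
                brief_description: record.brief_description.clone(),
            }),
            Visibility::Draft | Visibility::Internal => None,
        }
    }
}
```

The exhaustive `match` means that adding a new visibility state forces a decision about whether it is publicly served.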
## 15. Export & backup (distinct)
- **Backup** (operational): `pg_dump` / PITR of the org database. Ops concern.
- **Export** (portable handover): a single **SQLite** file (metadata incl.
flattened flexible fields + vocab/authority tables) + plain **media files** + a
**manifest** — a whole-org archive, openable anywhere, stable long-term.
## 16. Internationalization
- **UI:** Swedish + English via a React i18n library + locale files; localized API
validation/error messages.
- **Data:** multilingual labels on vocab/authority terms; language-tagged content
values in the model (workflow post-MVP, §6.4).
## 17. Frontend
- **Lean React SPA**, evergreen browsers, consuming the OpenAPI. Separate build in
`web/`.
- **"Potato hardware" = an explicit bundle-discipline budget**: small dependency
set, code-splitting, measured bundle size as a tracked target — *not* a framework
compromise.
- Suits the data-entry-heavy cataloguing UI (vocabulary autocomplete, dynamic field
groups from the registry, inline validation).
## 18. Dependencies & tech stack
| Concern | Choice | Notes |
|---|---|---|
| Language | **Rust 2024** | |
| HTTP | **axum** | |
| API spec | **utoipa** (code-first OpenAPI) | drives the React client |
| DB | **PostgreSQL** + **sqlx** | SQL confined to `db` crate |
| Storage | **OpenDAL** behind `BlobStore` | S3 + local; `fusio` watch-listed |
| Search | **Meilisearch** behind a search trait | index-per-org |
| Cache | **Redis** (*deferred*) | add only when needed; key-prefixed |
| Frontend | **React** (lean SPA) | bundle budget enforced |
| i18n (FE) | React i18n lib | sv/en |

**Dependency philosophy:** pre-1.0, choose on **capability/fit, not maturity**;
isolate volatile deps behind owned traits (reversible bets); **re-evaluate each
bet before 1.0**, when the API surface and data formats lock.
## 19. Testing strategy
- **Core & domain:** thorough unit tests; strong types remove whole categories from
the test surface.
- **Isolation/security:** dedicated **negative tests** (scoped credentials reject
foreign access; the public surface never emits internal fields/non-public
records).
- **Repositories:** integration tests against Postgres.
- **Flexible fields:** validation tested against field definitions.
- Deliberately **not overboard** elsewhere.
## 20. Decision log
| # | Decision | Why | Alternatives rejected |
|---|---|---|---|
| D1 | Per-org single-tenant binary; tenancy is deployment-only | Simplest core (no tenant plumbing); self-host = same artifact; isolation by construction | Shared multi-tenant app w/ `org_id`+RLS (bleed risk, complex core) |
| D2 | Database-per-org + scoped role; index-per-org + scoped key | Hard isolation; clean per-org export; no RLS | Schema-per-org (softer); shared DB + RLS (shared data path) |
| D3 | Hybrid data model (typed core + JSONB flexible + relational vocab/authority) | Small tested core + extensible tail; matches "link don't duplicate" | Fixed Spectrum schema (rigid); pure EAV/JSONB (weak integrity) |
| D4 | Type-driven design; `PublicView` projection; refs as validated types | Removes bug classes incl. public-data leaks; shrinks tests | Runtime checks only |
| D5 | sqlx + repository layer | Compile-time-checked SQL, no ORM, SQL in one place | SeaORM (more abstraction); Diesel (sync) |
| D6 | Clean public/admin surface split | Enables IP-lock/caching/publishing cleanly | Single mixed surface |
| D7 | Ingress-layer IP/VPN lockdown, admin-only-lockable | Not the app's job; public stays open | App-level firewall (fallback only) |
| D8 | Lean React SPA, evergreen + bundle budget | Growth path; ecosystem for data-entry UI; fits weak HW if disciplined | htmx/SSR (only needed for ancient browsers — none required) |
| D9 | Append-only audit w/ field diffs = amendment history | One mechanism satisfies ops audit + Spectrum requirement | Separate audit & history systems |
| D10 | Export = SQLite + files; backup = pg_dump | Portable, openable anywhere; distinct from ops backup | pg_dump as the only "export" (not portable) |
| D11 | OpenDAL behind `BlobStore` | Right altitude, multi-backend; bottleneck is network not syscalls | fusio now (lower-level, DB-engine focus) — watch-listed |
| D12 | utoipa code-first OpenAPI | Spec can't drift; drives client | spec-first |
| D13 | i18n: UI+vocab labels MVP; content workflow later, model ready now | Avoids painful migration; keeps MVP small | Full content translation in MVP (too big) |
| D14 | No IdP / no cross-org switching now | Rare case; keeps auth simple | Build shared IdP now |
| D15 | No migrations until 1.0 | Freedom to reshape pre-1.0 | Migrations from day one |
| D16 | No product name in code; role-named workspace; name from config | Placeholder must never leak; trivial rename later | Hardcode a working name |
## 21. Open items for the implementation plan
- First scaffolding task: **dissolve the current `biggus-dickus` package** into the
role-named workspace (the placeholder name must not survive into real code).
- Decide the role/permission model's MVP shape (kept minimal).
- Decide the object-number format configuration mechanism.
- Define the SQLite export schema mapping for the hybrid model.
- Choose specific crates for OIDC, JSONB validation, and React i18n during planning.