# MVP Architecture & Design

**Status:** approved design, pre-implementation

**Date:** 2026-06-02

**Scope:** the MVP — the smallest useful build that exercises every architectural pillar.

Companion to [`../VISION.md`](../VISION.md) (full feature catalogue) and grounded in [`../../reference/`](../../reference/) (Spectrum 5.0 + Riksantikvarieämbetet source material).

> Neutral naming throughout. No product/brand name appears in code or these docs
> (see D16 in §20).

---

## 1. Goals & non-goals

**Goals**

- A small, strongly-typed, well-tested **core** that is easy to extend.
- An organization can **catalogue** its collection (Spectrum Cataloguing), attach **media**, **search** it, control **visibility**, and expose **public** records.
- **Airtight org isolation** and a full **audit trail** from day one.
- **Easy self-hosting**: one binary, one database, minimal dependencies.

**Non-goals (MVP)**

- Other Spectrum procedures as workflows (entry, accession, loans, location/movement control, …) — roadmap.
- Reporting/label templates, aggregator/LIDO/OAI-PMH/IIIF, translation workflow UI, fleet provisioning/control plane, migrations machinery (none until 1.0).

---

## 2. Guiding principles

- **Make illegal states unrepresentable** (§9). Parse, don't validate.
- **Isolation by construction** (§4): credentials + topology, not `org_id` filtering in code.
- **Module separation; no SQL spread.** SQL lives only in repository modules (§5, §8).
- **Minimal custom code, reversible dependency bets** (§18).
- **Self-host is first-class** (§3).
- **Well-tested, not overboard** (§19): strong types shrink the test surface; the isolation/security and the core get thorough tests; the dynamic field layer is validated at runtime.

---

## 3. Deployment topology & tenancy

**The application binary is always single-tenant.** One running instance serves exactly one organization and contains no concept of "other orgs". There is **no multi-tenant code path**.
Multi-tenancy is achieved entirely at the deployment layer:

| | Self-host | Hosted fleet |
|---|---|---|
| App instances | one (1+ pods) | one deployment **per org** (1+ pods each) |
| Postgres | its own database | **one shared server**, one **database per org** |
| Meilisearch | its own index | **one shared server**, one **index per org** |
| Files | local disk or S3 | S3 (or RWX volume per org) |
| Domain | the org's domain | each org its own domain |
| Rollout | upgrade the instance | **per-org** image bump |

Consequences (recorded):

- **Per-org rollout & schema version.** Bumping one org's image rolls out that org only; the instance runs its own migrations against its own database. Orgs may sit on different versions. (Pre-1.0: recreate rather than migrate.)
- **Files:** with more than one pod per org, files must be on shared storage (S3 or RWX volume) — local disk is single-pod/self-host only. `BlobStore` (§11) abstracts this.
- **Cross-org features** (a future aggregator searching across museums; fleet admin) are a **separate service**, never a single org-app. Out of MVP.

## 4. Isolation model

Because each org-app holds **credentials scoped to its own database and its own search index**, cross-org access is not "prevented" — it is **impossible, because the access path does not exist**:

- **Postgres:** database-per-org + a role granted access to *only* that database. An instance physically cannot connect to another org's database.
- **Meilisearch:** index-per-org + an API key scoped to that org's index only.
- **No Row-Level Security needed** — there is no shared multi-org data in any single database to protect, and the app has no cross-org code.
- **Files:** per-org bucket/prefix (S3) or per-org volume, with scoped credentials.

Defense-in-depth / verification:

- A **single configuration chokepoint** establishes "which org am I" at startup from config; nothing reconstructs it ad hoc.
- **Negative tests** assert the app cannot be pointed outside its configured database/index and that scoped credentials reject foreign access.

## 5. Crate / module layout

A Cargo **workspace** with **role-named** member crates (no brand name anywhere):

```
/                 virtual workspace
  crates/
    domain/       core types, value objects, invariants (no I/O)
    db/           sqlx repositories; ALL SQL lives here
    storage/      BlobStore trait + OpenDAL adapter (S3 / local)
    search/       search abstraction + Meilisearch adapter
    auth/         password + OIDC, session/token, extractors
    api/          axum router, handlers, OpenAPI (utoipa), public + admin
    server/       binary: config, wiring, startup, migrations runner
  web/            React SPA (separate build), consumes the OpenAPI
  migrations/     SQL migrations (post-1.0; pre-1.0 = recreate)
```

Dependency direction points inward toward `domain`. `domain` has no I/O deps. Each crate has one clear purpose, a defined interface, and is testable in isolation. Experimental/volatile dependencies sit behind a crate-owned trait (`BlobStore`, the search trait, …) so they are swappable (§18).

## 6. Data model — hybrid (Approach C)

Three layers:

### 6.1 Typed relational core

The accountability backbone and the most queried/integrity-critical fields, as real columns/tables with strong types:

- object number (configurable format), object name, number of objects, brief description, current location, current owner, recorder, recording date, **visibility**, media links, audit linkage.

### 6.2 Flexible field layer

- A **field-definition registry**: each definition has a key, data type, optional **vocabulary/authority binding**, validation rules, grouping, and locale behavior.
- Field **values** stored as **JSONB** on the record, validated at write time against the registry.
- The **Spectrum 5.0 Cataloguing field set** ships as **seed field definitions** (see [`reference/spectrum-5.0-cataloguing-units-of-information.md`](../../reference/spectrum-5.0-cataloguing-units-of-information.md)).
  Orgs enable a subset or the full set; custom fields are *data*, not migrations.
- **Trade-off (explicit):** this layer is **runtime-typed by design** — validated against definitions at runtime, not by the compiler. Hard types where structure is fixed (core, IDs, refs), runtime validation where it is dynamic.

### 6.3 Controlled vocabularies & authority records

- First-class relational tables for **person / organization / place** authorities and **term sources** (vocabularies) — *store once, link many*.
- Referenced from both the typed core and the flexible fields. A field bound to a term source accepts only a **resolved reference** (§9), never a free string.
- **Multilingual labels** (sv/en …) on terms and authorities.

### 6.4 Content i18n (capability now, workflow later)

- Localizable text values are **language-tagged in the data model from day one** (so no painful migration later).
- The **translation workflow/UI is post-MVP**; MVP authors enter content in one language while the model already supports more.

## 7. Surfaces & API

Two cleanly separated surfaces — a **load-bearing** rule:

- **Public surface** — `/api/public/**`: unauthenticated, **read-only**, serves only **public** records as a typed **`PublicView`** (public-safe fields only).
- **Admin/privileged surface** — everything else: authenticated, read/write.

This separation enables independent **IP/VPN lockdown** (admin behind an ingress allowlist while public stays open), caching, and rate-limiting — all at the ingress layer, not in app code. An optional in-app IP-allowlist middleware is a post-MVP portable fallback.

**OpenAPI:** code-first with **utoipa** — the spec is generated from Rust types/handlers (cannot drift) and is the contract the React client consumes.

## 8. Persistence & data access

- **PostgreSQL** via **sqlx** (async, compile-time-checked queries). **All SQL is confined to the `db` crate**, one repository per aggregate — satisfying "no SQL spread everywhere" without an ORM's abstraction.
- JSONB for the flexible field values (GIN-indexable for search/filter needs).
- **No migrations until 1.0** — pre-1.0 we reshape freely (drop & recreate). Post-1.0, each instance runs its own migrations on startup (per-org schema version).

## 9. Type-driven design (cross-cutting)

- **Newtype IDs** — `ObjectId`, `OrgId`, `MediaId`, `TermId`, `AuthorityId`; never bare UUIDs.
- **Validated value objects** — `ObjectNumber`, `Email`, and `TermRef` / `AuthorityRef` that are **constructable only by resolving** against the vocabulary/authority. An unvalidated term cannot exist as that type. (Direct mapping of Spectrum's "use a standard term source / form of name".)
- **`PublicView` projection** — a distinct type carrying only public-safe fields; leaking an internal field on the public surface is impossible because the type lacks it. (Preferred over a literal `Record` generic, since visibility is runtime data from the DB.)
- **Visibility** — an enum with explicit transition methods (`publish`, `unpublish`, `archive`): a type-driven state machine, not a stringly-typed flag.
- **Auth via extractors** — public handlers take no auth extractor; privileged handlers require an `AuthUser` / `Authorized` extractor, so a privileged handler cannot compile without proof of authorization.

## 10. Authentication & authorization

- **Email/password** + **external OIDC** (the org-app is an OIDC relying party), scoped to the single org the instance serves.
- **No separate IdP and no cross-org switching** in MVP (deferred; rare case).
- Sessions: stateless tokens or a sessions table in the org DB (no Redis required).
- Authorization enforced through typed extractors (§9); role/permission model kept simple in MVP.

## 11. File storage

- **`BlobStore` trait** in the `storage` crate; **OpenDAL** adapter for **S3 and local disk**. Chosen on fit (high-level, multi-backend; our bottleneck is network/S3, not syscall I/O). `fusio` is watch-listed and swappable behind the trait (§18).
- Media files are linked to records; derivatives/thumbnails/IIIF are post-MVP.

## 12. Search

- **Meilisearch**, one index per org, scoped API key. A search abstraction in the `search` crate; Meili adapter behind it.
- MVP: index catalogue records on write; basic full-text + facet search in admin.
- Public-facing search is post-MVP.

## 13. Audit & amendment history

- **One append-only, immutable log** in the org database: who / when / what, with **field-level before→after diffs**, covering domain create/update/delete and auth/security events.
- Doubles as Spectrum **amendment history** surfaced on catalogue records (Spectrum requires a transparent record of changes — never silently erase prior terminology).
- MVP audits **writes + auth events**; auditing reads is deferred.

## 14. Visibility & publishing

- **Record-level visibility**: `draft` / `internal` / `public`.
- A fixed **never-public** field set (location, valuation, insurance, personal data). Per-field publishability is post-MVP.
- Public API serves only `public` records via `PublicView`.

## 15. Export & backup (distinct)

- **Backup** (operational): `pg_dump` / PITR of the org database. Ops concern.
- **Export** (portable handover): a single **SQLite** file (metadata incl. flattened flexible fields + vocab/authority tables) + plain **media files** + a **manifest** — a whole-org archive, openable anywhere, stable long-term.

## 16. Internationalization

- **UI:** Swedish + English via a React i18n library + locale files; localized API validation/error messages.
- **Data:** multilingual labels on vocab/authority terms; language-tagged content values in the model (workflow post-MVP, §6.4).

## 17. Frontend

- **Lean React SPA**, evergreen browsers, consuming the OpenAPI. Separate build in `web/`.
- **"Potato hardware" = an explicit bundle-discipline budget**: small dependency set, code-splitting, measured bundle size as a tracked target — *not* a framework compromise.
- Suits the data-entry-heavy cataloguing UI (vocabulary autocomplete, dynamic field groups from the registry, inline validation).

## 18. Dependencies & tech stack

| Concern | Choice | Notes |
|---|---|---|
| Language | **Rust 2024** | |
| HTTP | **axum** | |
| API spec | **utoipa** (code-first OpenAPI) | drives the React client |
| DB | **PostgreSQL** + **sqlx** | SQL confined to `db` crate |
| Storage | **OpenDAL** behind `BlobStore` | S3 + local; `fusio` watch-listed |
| Search | **Meilisearch** behind a search trait | index-per-org |
| Cache | **Redis** — *deferred* | add only when needed; key-prefixed |
| Frontend | **React** (lean SPA) | bundle budget enforced |
| i18n (FE) | React i18n lib | sv/en |

**Dependency philosophy:** pre-1.0, choose on **capability/fit, not maturity**; isolate volatile deps behind owned traits (reversible bets); **re-evaluate each bet before 1.0**, when the API surface and data formats lock.

## 19. Testing strategy

- **Core & domain:** thorough unit tests; strong types remove whole categories from the test surface.
- **Isolation/security:** dedicated **negative tests** (scoped credentials reject foreign access; the public surface never emits internal fields/non-public records).
- **Repositories:** integration tests against Postgres.
- **Flexible fields:** validation tested against field definitions.
- Deliberately **not overboard** elsewhere.

## 20. Decision log

| # | Decision | Why | Alternatives rejected |
|---|---|---|---|
| D1 | Per-org single-tenant binary; tenancy is deployment-only | Simplest core (no tenant plumbing); self-host = same artifact; isolation by construction | Shared multi-tenant app w/ `org_id`+RLS (bleed risk, complex core) |
| D2 | Database-per-org + scoped role; index-per-org + scoped key | Hard isolation; clean per-org export; no RLS | Schema-per-org (softer); shared DB + RLS (shared data path) |
| D3 | Hybrid data model (typed core + JSONB flexible + relational vocab/authority) | Small tested core + extensible tail; matches "link don't duplicate" | Fixed Spectrum schema (rigid); pure EAV/JSONB (weak integrity) |
| D4 | Type-driven design; `PublicView` projection; refs as validated types | Removes bug classes incl. public-data leaks; shrinks tests | Runtime checks only |
| D5 | sqlx + repository layer | Compile-time-checked SQL, no ORM, SQL in one place | SeaORM (more abstraction); Diesel (sync) |
| D6 | Clean public/admin surface split | Enables IP-lock/caching/publishing cleanly | Single mixed surface |
| D7 | Ingress-layer IP/VPN lockdown, admin-only-lockable | Not the app's job; public stays open | App-level firewall (fallback only) |
| D8 | Lean React SPA, evergreen + bundle budget | Growth path; ecosystem for data-entry UI; fits weak HW if disciplined | htmx/SSR (only needed for ancient browsers — none required) |
| D9 | Append-only audit w/ field diffs = amendment history | One mechanism satisfies ops audit + Spectrum requirement | Separate audit & history systems |
| D10 | Export = SQLite + files; backup = pg_dump | Portable, openable anywhere; distinct from ops backup | pg_dump as the only "export" (not portable) |
| D11 | OpenDAL behind `BlobStore` | Right altitude, multi-backend; bottleneck is network not syscalls | fusio now (lower-level, DB-engine focus) — watch-listed |
| D12 | utoipa code-first OpenAPI | Spec can't drift; drives client | spec-first |
| D13 | i18n: UI+vocab labels MVP; content workflow later, model ready now | Avoids painful migration; keeps MVP small | Full content translation in MVP (too big) |
| D14 | No IdP / no cross-org switching now | Rare case; keeps auth simple | Build shared IdP now |
| D15 | No migrations until 1.0 | Freedom to reshape pre-1.0 | Migrations from day one |
| D16 | No product name in code; role-named workspace; name from config | Placeholder must never leak; trivial rename later | Hardcode a working name |

## 21. Open items for the implementation plan

- First scaffolding task: **dissolve the current `biggus-dickus` package** into the role-named workspace (the placeholder name must not survive into real code).
- Decide the role/permission model's MVP shape (kept minimal).
- Decide the object-number format configuration mechanism.
- Define the SQLite export schema mapping for the hybrid model.
- Choose specific crates for OIDC, JSONB validation, and React i18n during planning.
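
As a closing illustration of the type-driven rules in §9 and the visibility model in §14, the following is a minimal, std-only sketch of a newtype ID, a visibility state machine, and a `PublicView` projection. It is an assumption-laden sketch, not the approved implementation: the `u128` stand-in for a UUID, the `CatalogueRecord` and `TransitionError` types, the field selection, and the specific transition rules (only `Internal` can be published, only `Public` can be unpublished) are all hypothetical, and `archive` is omitted because §14 names no archived state.

```rust
/// Newtype ID: an `ObjectId` cannot be passed where another ID is expected.
/// Assumption: a real build would likely wrap a UUID; `u128` keeps this std-only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjectId(pub u128);

/// Record-level visibility (§14) as a closed set of states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

/// Hypothetical error type describing a rejected transition.
#[derive(Debug, PartialEq, Eq)]
pub struct TransitionError {
    pub from: Visibility,
    pub to: Visibility,
}

impl Visibility {
    /// Assumed rule: only an `Internal` record may become `Public`.
    pub fn publish(self) -> Result<Visibility, TransitionError> {
        match self {
            Visibility::Internal => Ok(Visibility::Public),
            from => Err(TransitionError { from, to: Visibility::Public }),
        }
    }

    /// Assumed rule: only a `Public` record may go back to `Internal`.
    pub fn unpublish(self) -> Result<Visibility, TransitionError> {
        match self {
            Visibility::Public => Ok(Visibility::Internal),
            from => Err(TransitionError { from, to: Visibility::Internal }),
        }
    }
}

/// Hypothetical internal record; `current_location` is in the never-public set.
pub struct CatalogueRecord {
    pub id: ObjectId,
    pub object_name: String,
    pub brief_description: String,
    pub current_location: String,
    pub visibility: Visibility,
}

/// Public projection: it has no location field, so it cannot leak one.
#[derive(Debug, PartialEq, Eq)]
pub struct PublicView {
    pub id: ObjectId,
    pub object_name: String,
    pub brief_description: String,
}

impl PublicView {
    /// Returns `Some` only for records whose visibility is `Public`.
    pub fn project(record: &CatalogueRecord) -> Option<PublicView> {
        match record.visibility {
            Visibility::Public => Some(PublicView {
                id: record.id,
                object_name: record.object_name.clone(),
                brief_description: record.brief_description.clone(),
            }),
            Visibility::Draft | Visibility::Internal => None,
        }
    }
}
```

In this sketch a public handler would only ever receive the result of `PublicView::project`, so a draft or internal record yields `None`, and a never-public field such as the location has no place to appear in the response type.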