MVP Architecture & Design
Status: approved design, pre-implementation
Date: 2026-06-02
Scope: the MVP, i.e. the smallest useful build that exercises every architectural
pillar. Companion to ../VISION.md (full feature catalogue) and
grounded in ../../reference/ (Spectrum 5.0 + Riksantikvarieämbetet source material).
Neutral naming throughout. No product/brand name appears in code or these docs (see D16 in §20).
1. Goals & non-goals
Goals
- A small, strongly-typed, well-tested core that is easy to extend.
- An organization can catalogue its collection (Spectrum Cataloguing), attach media, search it, control visibility, and expose public records.
- Airtight org isolation and a full audit trail from day one.
- Easy self-hosting: one binary, one database, minimal dependencies.
Non-goals (MVP)
- Other Spectrum procedures as workflows (entry, accession, loans, location/movement control, …): roadmap.
- Reporting/label templates, aggregator/LIDO/OAI-PMH/IIIF, translation workflow UI, fleet provisioning/control plane, migrations machinery (none until 1.0).
2. Guiding principles
- Make illegal states unrepresentable (§9). Parse, don't validate.
- Isolation by construction (§4): credentials + topology, not `org_id` filtering in code.
- Module separation; no SQL spread. SQL lives only in repository modules (§5, §8).
- Minimal custom code, reversible dependency bets (§18).
- Self-host is first-class (§3).
- Well-tested, not overboard (§19): strong types shrink the test surface; isolation/security and the core get thorough tests; the dynamic field layer is validated at runtime.
3. Deployment topology & tenancy
The application binary is always single-tenant. One running instance serves exactly one organization and contains no concept of "other orgs". There is no multi-tenant code path. Multi-tenancy is achieved entirely at the deployment layer:
| | Self-host | Hosted fleet |
|---|---|---|
| App instances | one (1+ pods) | one deployment per org (1+ pods each) |
| Postgres | its own database | one shared server, one database per org |
| Meilisearch | its own index | one shared server, one index per org |
| Files | local disk or S3 | S3 (or RWX volume per org) |
| Domain | the org's domain | each org its own domain |
| Rollout | upgrade the instance | per-org image bump |
Consequences (recorded):
- Per-org rollout & schema version. Bumping one org's image rolls out that org only; the instance runs its own migrations against its own database. Orgs may sit on different versions. (Pre-1.0: recreate rather than migrate.)
- Files: with more than one pod per org, files must be on shared storage (S3 or RWX volume); local disk is single-pod/self-host only. `BlobStore` (§11) abstracts this.
- Cross-org features (a future aggregator searching across museums; fleet admin) are a separate service, never a single org-app. Out of MVP.
4. Isolation model
Because each org-app holds credentials scoped to its own database and its own search index, cross-org access is not "prevented" — it is impossible, because the access path does not exist:
- Postgres: database-per-org + a role granted access to only that database. An instance physically cannot connect to another org's database.
- Meilisearch: index-per-org + an API key scoped to that org's index only.
- No Row-Level Security needed — there is no shared multi-org data in any single database to protect, and the app has no cross-org code.
- Files: per-org bucket/prefix (S3) or per-org volume, with scoped credentials.
Defense-in-depth / verification:
- A single configuration chokepoint establishes "which org am I" at startup from config; nothing reconstructs it ad hoc.
- Negative tests assert the app cannot be pointed outside its configured database/index and that scoped credentials reject foreign access.
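The configuration chokepoint could look roughly like the minimal sketch below. It assumes three hypothetical settings (`ORG_SLUG`, `DATABASE_URL`, `SEARCH_INDEX`) read from a key/value source such as the process environment, and it stores the parsed identity in a `std::sync::OnceLock` so that later code can read it but never replace it. The names are illustrative, not part of the design.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// The instance's identity ("which org am I"), parsed once at startup.
/// Hypothetical field and setting names, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrgConfig {
    pub org_slug: String,
    pub database_url: String,
    pub search_index: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    /// A required setting is absent or empty.
    Missing(&'static str),
}

impl OrgConfig {
    /// Parse from any key/value source (e.g. the process environment).
    pub fn from_source(src: &HashMap<String, String>) -> Result<Self, ConfigError> {
        let get = |key: &'static str| {
            src.get(key)
                .filter(|v| !v.is_empty())
                .cloned()
                .ok_or(ConfigError::Missing(key))
        };
        Ok(OrgConfig {
            org_slug: get("ORG_SLUG")?,
            database_url: get("DATABASE_URL")?,
            search_index: get("SEARCH_INDEX")?,
        })
    }
}

/// The single process-wide slot for the org identity.
static ORG: OnceLock<OrgConfig> = OnceLock::new();

/// Set the identity; the first call wins and later calls cannot replace it.
pub fn init(cfg: OrgConfig) -> &'static OrgConfig {
    ORG.get_or_init(|| cfg)
}

/// Read the identity; panics if startup never called `init`.
pub fn current() -> &'static OrgConfig {
    ORG.get().expect("org config not initialized")
}
```

In `server`, startup would call something like `OrgConfig::from_source(&std::env::vars().collect())` once and hand the result to `init`; every other crate reads `current()` instead of re-deriving the org.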
5. Crate / module layout
A Cargo workspace with role-named member crates (no brand name anywhere):
```
/                 virtual workspace
  crates/
    domain/       core types, value objects, invariants (no I/O)
    db/           sqlx repositories; ALL SQL lives here
    storage/      BlobStore trait + OpenDAL adapter (S3 / local)
    search/       search abstraction + Meilisearch adapter
    auth/         password + OIDC, session/token, extractors
    api/          axum router, handlers, OpenAPI (utoipa), public + admin
    server/       binary: config, wiring, startup, migrations runner
  web/            React SPA (separate build), consumes the OpenAPI
  migrations/     SQL migrations (post-1.0; pre-1.0 = recreate)
```
Dependency direction points inward toward `domain`; `domain` has no I/O deps.
Each crate has one clear purpose, a defined interface, and is testable in
isolation. Experimental/volatile dependencies sit behind a crate-owned trait
(`BlobStore`, the search trait, …) so they are swappable (§18).
6. Data model — hybrid (Approach C)
Three layers:
6.1 Typed relational core
The accountability backbone and the most queried/integrity-critical fields, as real columns/tables with strong types:
- object number (configurable format), object name, number of objects, brief description, current location, current owner, recorder, recording date, visibility, media links, audit linkage.
6.2 Flexible field layer
- A field-definition registry: each definition has a key, data type, optional vocabulary/authority binding, validation rules, grouping, and locale behavior.
- Field values stored as JSONB on the record, validated at write time against the registry.
- The Spectrum 5.0 Cataloguing field set ships as seed field definitions (see `reference/spectrum-5.0-cataloguing-units-of-information.md`). Orgs enable a subset or the full set; custom fields are data, not migrations.
- Trade-off (explicit): this layer is runtime-typed by design, validated against definitions at runtime rather than by the compiler. Hard types where structure is fixed (core, IDs, refs), runtime validation where it is dynamic.
6.3 Controlled vocabularies & authority records
- First-class relational tables for person / organization / place authorities and term sources (vocabularies) — store once, link many.
- Referenced from both the typed core and the flexible fields. A field bound to a term source accepts only a resolved reference (§9), never a free string.
- Multilingual labels (sv/en …) on terms and authorities.
6.4 Content i18n (capability now, workflow later)
- Localizable text values are language-tagged in the data model from day one (so no painful migration later).
- The translation workflow/UI is post-MVP; MVP authors enter content in one language while the model already supports more.
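A language-tagged text value might look like the sketch below: a map from language tag to text with a fallback lookup. The type name, method names and fallback rule are assumptions for illustration. In the MVP only one language would be filled in, yet the shape already holds more.

```rust
use std::collections::BTreeMap;

/// A text value tagged by language (e.g. "sv", "en").
/// Hypothetical shape; in the database this could be a JSONB object of tag -> text.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LocalizedText(BTreeMap<String, String>);

impl LocalizedText {
    /// MVP path: content entered in a single language.
    pub fn single(lang: &str, text: &str) -> Self {
        let mut t = Self::default();
        t.set(lang, text);
        t
    }

    /// Add or replace the text for one language (later: the translation workflow).
    pub fn set(&mut self, lang: &str, text: &str) {
        self.0.insert(lang.to_string(), text.to_string());
    }

    /// Text in `lang`, else in the first available language from `fallbacks`.
    pub fn get(&self, lang: &str, fallbacks: &[&str]) -> Option<&str> {
        if let Some(text) = self.0.get(lang) {
            return Some(text.as_str());
        }
        fallbacks.iter().find_map(|l| self.0.get(*l).map(String::as_str))
    }
}
```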
7. Surfaces & API
Two cleanly separated surfaces — a load-bearing rule:
- Public surface: `/api/public/**`. Unauthenticated, read-only; serves only public records as a typed `PublicView` (public-safe fields only).
- Admin/privileged surface: everything else. Authenticated, read/write.
This separation enables independent IP/VPN lockdown (admin behind an ingress allowlist while public stays open), caching, and rate-limiting — all at the ingress layer, not in app code. An optional in-app IP-allowlist middleware is a post-MVP portable fallback.
OpenAPI: code-first with utoipa — the spec is generated from Rust types/handlers (cannot drift) and is the contract the React client consumes.
8. Persistence & data access
- PostgreSQL via sqlx (async, compile-time-checked queries). All SQL is confined to the `db` crate, one repository per aggregate, satisfying "no SQL spread everywhere" without an ORM's abstraction.
- JSONB for the flexible field values (GIN-indexable for search/filter needs).
- No migrations until 1.0: pre-1.0 we reshape freely (drop & recreate). Post-1.0, each instance runs its own migrations on startup (per-org schema version).
9. Type-driven design (cross-cutting)
- Newtype IDs: `ObjectId`, `OrgId`, `MediaId`, `TermId`, `AuthorityId`; never bare UUIDs.
- Validated value objects: `ObjectNumber`, `Email`, and `TermRef`/`AuthorityRef`, which are constructable only by resolving against the vocabulary/authority. An unvalidated term cannot exist as that type. (Direct mapping of Spectrum's "use a standard term source / form of name".)
- `PublicView` projection: a distinct type carrying only public-safe fields; leaking an internal field on the public surface is impossible because the type lacks it. (Preferred over a literal `Record<Public>` generic, since visibility is runtime data from the DB.)
- Visibility: an enum with explicit transition methods (`publish`, `unpublish`, `archive`), i.e. a type-driven state machine, not a stringly-typed flag.
- Auth via extractors: public handlers take no auth extractor; privileged handlers require an `AuthUser`/`Authorized<Cap>` extractor, so a privileged handler cannot compile without proof of authorization.
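Newtype IDs and resolve-only references can be sketched with plain Rust privacy rules. In the sketch below, `TermRef` has a private field, so outside the `vocab` module the only way to get one is `Vocabulary::resolve`. A `u64` stands in for the UUIDs the design calls for, and the `Vocabulary` and `Classification` shapes are hypothetical.

```rust
/// Newtype IDs: distinct types, so an ObjectId cannot be passed where a TermId is expected.
/// A u64 stands in for the UUID the design calls for, to keep the sketch dependency-free.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ObjectId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TermId(u64);

pub mod vocab {
    use super::TermId;
    use std::collections::HashMap;

    /// A term reference that is known to resolve. The field is private, so code
    /// outside this module can only obtain one from `Vocabulary::resolve`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct TermRef {
        id: TermId,
    }

    impl TermRef {
        pub fn id(&self) -> TermId {
            self.id
        }
    }

    /// One term source (hypothetical in-memory form: preferred label -> id).
    pub struct Vocabulary {
        terms: HashMap<String, TermId>,
    }

    impl Vocabulary {
        pub fn new(terms: &[(&str, u64)]) -> Self {
            let terms = terms
                .iter()
                .map(|&(label, id)| (label.to_string(), TermId(id)))
                .collect();
            Vocabulary { terms }
        }

        /// The only way to turn a free string into a TermRef.
        pub fn resolve(&self, label: &str) -> Option<TermRef> {
            self.terms.get(label).map(|&id| TermRef { id })
        }
    }
}

/// A field bound to a term source stores a TermRef, never a free string.
pub struct Classification {
    pub object: ObjectId,
    pub term: vocab::TermRef,
}
```

The same pattern would apply to `AuthorityRef`; in the real crate split, the module boundary would be the `domain` crate's public API.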
10. Authentication & authorization
- Email/password + external OIDC (the org-app is an OIDC relying party), scoped to the single org the instance serves.
- No separate IdP and no cross-org switching in MVP (deferred; rare case).
- Sessions: stateless tokens or a sessions table in the org DB (no Redis required).
- Authorization enforced through typed extractors (§9); role/permission model kept simple in MVP.
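The "proof of authorization" idea can be shown without any web framework: a privileged function takes an `Authorized<C>` value that only an authorization check can produce. The sketch below assumes a hypothetical string-named capability (`catalogue:edit`) and a user with a flat capability list. In the `api` crate the same role would be played by an axum extractor implementing `FromRequestParts`; that wiring is omitted.

```rust
pub mod auth {
    use std::marker::PhantomData;

    /// A capability marker type (hypothetical names).
    pub trait Capability {
        const NAME: &'static str;
    }

    pub struct EditCatalogue;
    impl Capability for EditCatalogue {
        const NAME: &'static str = "catalogue:edit";
    }

    /// An authenticated user and the capabilities granted to them.
    pub struct AuthUser {
        pub user_id: u64,
        pub capabilities: Vec<&'static str>,
    }

    /// Proof that a user holds capability `C`. The fields are private, so
    /// outside this module the only way to get one is `authorize`.
    pub struct Authorized<C: Capability> {
        user_id: u64,
        _cap: PhantomData<C>,
    }

    impl<C: Capability> Authorized<C> {
        pub fn user_id(&self) -> u64 {
            self.user_id
        }
    }

    /// Check the capability and, if held, hand out the proof value.
    pub fn authorize<C: Capability>(user: &AuthUser) -> Option<Authorized<C>> {
        user.capabilities
            .contains(&C::NAME)
            .then(|| Authorized { user_id: user.user_id, _cap: PhantomData })
    }
}

use auth::{Authorized, EditCatalogue};

/// A privileged operation: it cannot be called without an Authorized<EditCatalogue>.
pub fn rename_object(proof: &Authorized<EditCatalogue>, new_name: &str) -> String {
    format!("user {} renamed object to {new_name}", proof.user_id())
}
```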
11. File storage
- `BlobStore` trait in the `storage` crate; OpenDAL adapter for S3 and local disk. Chosen on fit (high-level, multi-backend; our bottleneck is network/S3, not syscall I/O). `fusio` is watch-listed and swappable behind the trait (§18).
- Media files are linked to records; derivatives/thumbnails/IIIF are post-MVP.
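A crate-owned storage trait might look like the sketch below. It is simplified to synchronous calls with hypothetical method names (a real trait over OpenDAL would most likely be async), and it uses an in-memory adapter of the kind that is handy in tests. Callers see only `&dyn BlobStore`, so replacing OpenDAL with fusio would touch one adapter, not the callers.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, PartialEq, Eq)]
pub enum BlobError {
    NotFound(String),
    Backend(String),
}

/// Crate-owned storage boundary (sync and hypothetical method names, for illustration).
pub trait BlobStore: Send + Sync {
    fn put(&self, key: &str, bytes: &[u8]) -> Result<(), BlobError>;
    fn get(&self, key: &str) -> Result<Vec<u8>, BlobError>;
    fn delete(&self, key: &str) -> Result<(), BlobError>;
}

/// In-memory adapter; an OpenDAL-backed adapter for S3 or local disk
/// would implement the same trait.
#[derive(Default)]
pub struct MemoryBlobStore {
    blobs: Mutex<HashMap<String, Vec<u8>>>,
}

impl MemoryBlobStore {
    /// Lock the map, turning a poisoned lock into a backend error.
    fn guard(&self) -> Result<MutexGuard<'_, HashMap<String, Vec<u8>>>, BlobError> {
        self.blobs.lock().map_err(|e| BlobError::Backend(e.to_string()))
    }
}

impl BlobStore for MemoryBlobStore {
    fn put(&self, key: &str, bytes: &[u8]) -> Result<(), BlobError> {
        self.guard()?.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Vec<u8>, BlobError> {
        self.guard()?
            .get(key)
            .cloned()
            .ok_or_else(|| BlobError::NotFound(key.to_string()))
    }

    fn delete(&self, key: &str) -> Result<(), BlobError> {
        self.guard()?
            .remove(key)
            .map(|_| ())
            .ok_or_else(|| BlobError::NotFound(key.to_string()))
    }
}

/// Example caller (hypothetical): depends only on the trait, returns bytes stored.
pub fn attach_media(store: &dyn BlobStore, media_key: &str, bytes: &[u8]) -> Result<usize, BlobError> {
    store.put(media_key, bytes)?;
    Ok(bytes.len())
}
```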
12. Search
- Meilisearch, one index per org, scoped API key. A search abstraction in the `search` crate; Meili adapter behind it.
- MVP: index catalogue records on write; basic full-text + facet search in admin.
- Public-facing search is post-MVP.
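The search abstraction can be sketched as a small trait with an upsert-on-write method and a query method that accepts one optional facet filter. The trait shape, `SearchDoc` fields and the naive substring matcher below are assumptions for illustration; the Meilisearch adapter would implement the same trait against the org's own index.

```rust
use std::collections::BTreeMap;

/// What gets indexed for one catalogue record (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchDoc {
    pub id: u64,
    pub text: String,
    pub facets: BTreeMap<String, String>,
}

/// Crate-owned search boundary; handlers and repositories depend only on this.
pub trait SearchIndex {
    /// Called on every catalogue write ("index on write").
    fn upsert(&mut self, doc: SearchDoc);
    /// Full-text match, optionally narrowed by one facet (name, value).
    fn search(&self, query: &str, facet: Option<(&str, &str)>) -> Vec<u64>;
}

/// Naive in-memory implementation (case-insensitive substring match), e.g. for tests.
#[derive(Default)]
pub struct NaiveIndex {
    docs: BTreeMap<u64, SearchDoc>,
}

impl SearchIndex for NaiveIndex {
    fn upsert(&mut self, doc: SearchDoc) {
        self.docs.insert(doc.id, doc);
    }

    fn search(&self, query: &str, facet: Option<(&str, &str)>) -> Vec<u64> {
        let q = query.to_lowercase();
        self.docs
            .values()
            .filter(|d| d.text.to_lowercase().contains(q.as_str()))
            .filter(|d| facet.map_or(true, |(k, v)| d.facets.get(k).map(String::as_str) == Some(v)))
            .map(|d| d.id)
            .collect()
    }
}
```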
13. Audit & amendment history
- One append-only, immutable log in the org database: who / when / what, with field-level before→after diffs, covering domain create/update/delete and auth/security events.
- Doubles as Spectrum amendment history surfaced on catalogue records (Spectrum requires a transparent record of changes — never silently erase prior terminology).
- MVP audits writes + auth events; auditing reads is deferred.
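Field-level before→after diffs and an append-only log can be sketched as below. The record is modelled as a flat map of field name to string value, and the entry shape and action names are hypothetical. The in-memory log only exposes append and read; in Postgres, one way to make the table append-only would be to grant the app's role only `INSERT` and `SELECT` on it, which is an assumption rather than something this design specifies.

```rust
use std::collections::BTreeMap;

/// One field's change; `None` means the field was absent on that side.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: String,
    pub before: Option<String>,
    pub after: Option<String>,
}

/// Who / what / which changes (a real entry would also carry a timestamp).
#[derive(Debug, Clone)]
pub struct AuditEntry {
    pub seq: u64,
    pub actor: String,
    pub action: &'static str,
    pub changes: Vec<FieldChange>,
}

/// Diff two flat records, returning changed fields in key order.
pub fn diff(before: &BTreeMap<String, String>, after: &BTreeMap<String, String>) -> Vec<FieldChange> {
    let mut keys: Vec<&String> = before.keys().chain(after.keys()).collect();
    keys.sort();
    keys.dedup();
    keys.into_iter()
        .filter(|k| before.get(*k) != after.get(*k))
        .map(|k| FieldChange {
            field: k.clone(),
            before: before.get(k).cloned(),
            after: after.get(k).cloned(),
        })
        .collect()
}

/// Append-only: entries can be appended and read, never edited or removed.
#[derive(Default)]
pub struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    pub fn append(&mut self, actor: &str, action: &'static str, changes: Vec<FieldChange>) -> u64 {
        let seq = self.entries.len() as u64 + 1;
        self.entries.push(AuditEntry { seq, actor: actor.to_string(), action, changes });
        seq
    }

    pub fn entries(&self) -> &[AuditEntry] {
        &self.entries
    }
}
```

Because prior values stay in `before`, earlier terminology remains visible as amendment history instead of being silently overwritten.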
14. Visibility & publishing
- Record-level visibility: `draft` / `internal` / `public`.
- A fixed never-public field set (location, valuation, insurance, personal data). Per-field publishability is post-MVP.
- Public API serves only `public` records via `PublicView`.
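The visibility state machine and the `PublicView` projection fit together roughly as sketched below. The transition rules (publish from any non-public state; unpublish from public back to internal) are assumptions, and `archive` is left out because its target state is not among the three listed here. `PublicView` simply has no location field, so it cannot leak one.

```rust
/// Record visibility as a closed set of states (names from this section).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: Visibility,
    pub action: &'static str,
}

impl Visibility {
    /// Assumed rule: any non-public record may be published.
    pub fn publish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Public => Err(InvalidTransition { from: self, action: "publish" }),
            Visibility::Draft | Visibility::Internal => Ok(Visibility::Public),
        }
    }

    /// Assumed rule: unpublishing returns a public record to internal.
    pub fn unpublish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Public => Ok(Visibility::Internal),
            Visibility::Draft | Visibility::Internal => {
                Err(InvalidTransition { from: self, action: "unpublish" })
            }
        }
    }
}

/// Internal record (abridged), including a never-public field.
pub struct CatalogueRecord {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
    pub current_location: String,
    pub visibility: Visibility,
}

/// Public projection: it has no location field, so it cannot leak one.
#[derive(Debug, PartialEq, Eq)]
pub struct PublicView {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
}

impl PublicView {
    /// Returns None unless the record is public; only public-safe fields are copied.
    pub fn from_record(r: &CatalogueRecord) -> Option<PublicView> {
        (r.visibility == Visibility::Public).then(|| PublicView {
            object_number: r.object_number.clone(),
            object_name: r.object_name.clone(),
            brief_description: r.brief_description.clone(),
        })
    }
}
```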
15. Export & backup (distinct)
- Backup (operational): `pg_dump` / PITR of the org database. Ops concern.
- Export (portable handover): a single SQLite file (metadata incl. flattened flexible fields + vocab/authority tables) + plain media files + a manifest. A whole-org archive, openable anywhere, stable long-term.
16. Internationalization
- UI: Swedish + English via a React i18n library + locale files; localized API validation/error messages.
- Data: multilingual labels on vocab/authority terms; language-tagged content values in the model (workflow post-MVP, §6.4).
17. Frontend
- Lean React SPA, evergreen browsers, consuming the OpenAPI. Separate build in `web/`.
- "Potato hardware" = an explicit bundle-discipline budget: small dependency set, code-splitting, measured bundle size as a tracked target; not a framework compromise.
- Suits the data-entry-heavy cataloguing UI (vocabulary autocomplete, dynamic field groups from the registry, inline validation).
18. Dependencies & tech stack
| Concern | Choice | Notes |
|---|---|---|
| Language | Rust 2024 | |
| HTTP | axum | |
| API spec | utoipa (code-first OpenAPI) | drives the React client |
| DB | PostgreSQL + sqlx | SQL confined to db crate |
| Storage | OpenDAL behind `BlobStore` | S3 + local; fusio watch-listed |
| Search | Meilisearch behind a search trait | index-per-org |
| Cache | Redis — deferred | add only when needed; key-prefixed |
| Frontend | React (lean SPA) | bundle budget enforced |
| i18n (FE) | React i18n lib | sv/en |
Dependency philosophy: pre-1.0, choose on capability/fit, not maturity; isolate volatile deps behind owned traits (reversible bets); re-evaluate each bet before 1.0, when the API surface and data formats lock.
19. Testing strategy
- Core & domain: thorough unit tests; strong types remove whole categories from the test surface.
- Isolation/security: dedicated negative tests (scoped credentials reject foreign access; the public surface never emits internal fields/non-public records).
- Repositories: integration tests against Postgres.
- Flexible fields: validation tested against field definitions.
- Deliberately not overboard elsewhere.
20. Decision log
| # | Decision | Why | Alternatives rejected |
|---|---|---|---|
| D1 | Per-org single-tenant binary; tenancy is deployment-only | Simplest core (no tenant plumbing); self-host = same artifact; isolation by construction | Shared multi-tenant app w/ org_id+RLS (bleed risk, complex core) |
| D2 | Database-per-org + scoped role; index-per-org + scoped key | Hard isolation; clean per-org export; no RLS | Schema-per-org (softer); shared DB + RLS (shared data path) |
| D3 | Hybrid data model (typed core + JSONB flexible + relational vocab/authority) | Small tested core + extensible tail; matches "link don't duplicate" | Fixed Spectrum schema (rigid); pure EAV/JSONB (weak integrity) |
| D4 | Type-driven design; `PublicView` projection; refs as validated types | Removes bug classes incl. public-data leaks; shrinks tests | Runtime checks only |
| D5 | sqlx + repository layer | Compile-time-checked SQL, no ORM, SQL in one place | SeaORM (more abstraction); Diesel (sync) |
| D6 | Clean public/admin surface split | Enables IP-lock/caching/publishing cleanly | Single mixed surface |
| D7 | Ingress-layer IP/VPN lockdown, admin-only-lockable | Not the app's job; public stays open | App-level firewall (fallback only) |
| D8 | Lean React SPA, evergreen + bundle budget | Growth path; ecosystem for data-entry UI; fits weak HW if disciplined | htmx/SSR (only needed for ancient browsers — none required) |
| D9 | Append-only audit w/ field diffs = amendment history | One mechanism satisfies ops audit + Spectrum requirement | Separate audit & history systems |
| D10 | Export = SQLite + files; backup = pg_dump | Portable, openable anywhere; distinct from ops backup | pg_dump as the only "export" (not portable) |
| D11 | OpenDAL behind `BlobStore` | Right altitude, multi-backend; bottleneck is network not syscalls | fusio now (lower-level, DB-engine focus); watch-listed |
| D12 | utoipa code-first OpenAPI | Spec can't drift; drives client | spec-first |
| D13 | i18n: UI+vocab labels MVP; content workflow later, model ready now | Avoids painful migration; keeps MVP small | Full content translation in MVP (too big) |
| D14 | No IdP / no cross-org switching now | Rare case; keeps auth simple | Build shared IdP now |
| D15 | No migrations until 1.0 | Freedom to reshape pre-1.0 | Migrations from day one |
| D16 | No product name in code; role-named workspace; name from config | Placeholder must never leak; trivial rename later | Hardcode a working name |
21. Open items for the implementation plan
- First scaffolding task: dissolve the current `biggus-dickus` package into the role-named workspace (the placeholder name must not survive into real code).
- Decide the role/permission model's MVP shape (kept minimal).
- Decide the object-number format configuration mechanism.
- Define the SQLite export schema mapping for the hybrid model.
- Choose specific crates for OIDC, JSONB validation, and React i18n during planning.