logaritmisk 8f67503f45 docs: add project vision, MVP architecture spec, and reference material
- docs/VISION.md: product vision + feature catalogue (MVP / post-MVP / later)
- docs/specs/2026-06-02-mvp-architecture.md: MVP architecture + 16-entry decision log
- reference/: Spectrum 5.0 cataloguing + Riksantikvarieämbetet source material (build-time reference)
- CLAUDE.md: project guidance for Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 00:24:53 +02:00

MVP Architecture & Design

Status: approved design, pre-implementation
Date: 2026-06-02
Scope: the MVP, the smallest useful build that exercises every architectural pillar.
Companion to ../VISION.md (full feature catalogue) and grounded in ../../reference/ (Spectrum 5.0 + Riksantikvarieämbetet source material).

Neutral naming throughout. No product/brand name appears in code or these docs (see §5 and D16 in §20).


1. Goals & non-goals

Goals

  • A small, strongly-typed, well-tested core that is easy to extend.
  • An organization can catalogue its collection (Spectrum Cataloguing), attach media, search it, control visibility, and expose public records.
  • Airtight org isolation and a full audit trail from day one.
  • Easy self-hosting: one binary, one database, minimal dependencies.

Non-goals (MVP)

  • Other Spectrum procedures as workflows (entry, accession, loans, location/movement control, …): roadmap.
  • Reporting/label templates, aggregator/LIDO/OAI-PMH/IIIF, translation workflow UI, fleet provisioning/control plane, migrations machinery (none until 1.0).

2. Guiding principles

  • Make illegal states unrepresentable (§9). Parse, don't validate.
  • Isolation by construction (§4): credentials + topology, not org_id filtering in code.
  • Module separation; no SQL spread. SQL lives only in repository modules (§5, §8).
  • Minimal custom code, reversible dependency bets (§18).
  • Self-host is first-class (§3).
  • Well-tested, not overboard (§19): strong types shrink the test surface; isolation/security and the core get thorough tests; the dynamic field layer is validated at runtime.

3. Deployment topology & tenancy

The application binary is always single-tenant. One running instance serves exactly one organization and contains no concept of "other orgs". There is no multi-tenant code path. Multi-tenancy is achieved entirely at the deployment layer:

|               | Self-host            | Hosted fleet                             |
| ------------- | -------------------- | ---------------------------------------- |
| App instances | one (1+ pods)        | one deployment per org (1+ pods each)    |
| Postgres      | its own database     | one shared server, one database per org  |
| Meilisearch   | its own index        | one shared server, one index per org     |
| Files         | local disk or S3     | S3 (or RWX volume per org)               |
| Domain        | the org's domain     | each org its own domain                  |
| Rollout       | upgrade the instance | per-org image bump                       |

Consequences (recorded):

  • Per-org rollout & schema version. Bumping one org's image rolls out that org only; the instance runs its own migrations against its own database. Orgs may sit on different versions. (Pre-1.0: recreate rather than migrate.)
  • Files: with more than one pod per org, files must be on shared storage (S3 or RWX volume) — local disk is single-pod/self-host only. BlobStore (§11) abstracts this.
  • Cross-org features (a future aggregator searching across museums; fleet admin) are a separate service, never a single org-app. Out of MVP.

4. Isolation model

Because each org-app holds credentials scoped to its own database and its own search index, cross-org access is not "prevented" — it is impossible, because the access path does not exist:

  • Postgres: database-per-org + a role granted access to only that database. An instance physically cannot connect to another org's database.
  • Meilisearch: index-per-org + an API key scoped to that org's index only.
  • No Row-Level Security needed — there is no shared multi-org data in any single database to protect, and the app has no cross-org code.
  • Files: per-org bucket/prefix (S3) or per-org volume, with scoped credentials.

Defense-in-depth / verification:

  • A single configuration chokepoint establishes "which org am I" at startup from config; nothing reconstructs it ad hoc.
  • Negative tests assert the app cannot be pointed outside its configured database/index and that scoped credentials reject foreign access.
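
The configuration chokepoint described above could look roughly like the sketch below. It is a minimal, std-only illustration, not the project's actual config code: the type name `InstanceConfig`, the error type, and the keys `DATABASE_URL` and `SEARCH_INDEX` are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Everything that answers "which org am I", resolved once at startup.
/// All names here are hypothetical; the spec only requires a single chokepoint.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceConfig {
    database_url: String,
    search_index: String,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
    Empty(&'static str),
}

impl InstanceConfig {
    /// Build the config from a key/value source, for example the process
    /// environment collected into a map. The fields are private, so code in
    /// other modules cannot assemble an `InstanceConfig` ad hoc.
    pub fn from_vars(vars: &HashMap<String, String>) -> Result<Self, ConfigError> {
        Ok(Self {
            database_url: required(vars, "DATABASE_URL")?,
            search_index: required(vars, "SEARCH_INDEX")?,
        })
    }

    pub fn database_url(&self) -> &str {
        &self.database_url
    }

    pub fn search_index(&self) -> &str {
        &self.search_index
    }
}

/// Fetch a key that must be present and non-blank.
fn required(vars: &HashMap<String, String>, key: &'static str) -> Result<String, ConfigError> {
    match vars.get(key) {
        None => Err(ConfigError::Missing(key)),
        Some(v) if v.trim().is_empty() => Err(ConfigError::Empty(key)),
        Some(v) => Ok(v.clone()),
    }
}
```

At startup the server crate might call `InstanceConfig::from_vars(&std::env::vars().collect())` once and pass the result down. A negative test can then assert that a missing or blank key fails fast instead of falling back to a default.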

5. Crate / module layout

A Cargo workspace with role-named member crates (no brand name anywhere):

/                      virtual workspace
  crates/
    domain/            core types, value objects, invariants (no I/O)
    db/                sqlx repositories; ALL SQL lives here
    storage/           BlobStore trait + OpenDAL adapter (S3 / local)
    search/            search abstraction + Meilisearch adapter
    auth/              password + OIDC, session/token, extractors
    api/               axum router, handlers, OpenAPI (utoipa), public + admin
    server/            binary: config, wiring, startup, migrations runner
  web/                 React SPA (separate build), consumes the OpenAPI
  migrations/          SQL migrations (post-1.0; pre-1.0 = recreate)

Dependency direction points inward toward domain. domain has no I/O deps. Each crate has one clear purpose, a defined interface, and is testable in isolation. Experimental/volatile dependencies sit behind a crate-owned trait (BlobStore, the search trait, …) so they are swappable (§18).

6. Data model — hybrid (Approach C)

Three layers:

6.1 Typed relational core

The accountability backbone and the most queried/integrity-critical fields, as real columns/tables with strong types:

  • object number (configurable format), object name, number of objects, brief description, current location, current owner, recorder, recording date, visibility, media links, audit linkage.

6.2 Flexible field layer

  • A field-definition registry: each definition has a key, data type, optional vocabulary/authority binding, validation rules, grouping, and locale behavior.
  • Field values stored as JSONB on the record, validated at write time against the registry.
  • The Spectrum 5.0 Cataloguing field set ships as seed field definitions (see reference/spectrum-5.0-cataloguing-units-of-information.md). Orgs enable a subset or the full set; custom fields are data, not migrations.
  • Trade-off (explicit): this layer is runtime-typed by design, validated against definitions at runtime rather than by the compiler. Hard types where structure is fixed (core, IDs, refs), runtime validation where it is dynamic.
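
To make the write-time validation of the flexible layer concrete, here is a small std-only sketch. It assumes a much simplified registry: `DataType`, `Value`, `FieldDef`, and `validate` are hypothetical names, and the `Value` enum stands in for a real JSON value type. Vocabulary bindings, grouping, and locale behavior are left out.

```rust
use std::collections::BTreeMap;

/// Data types a field definition may declare (illustrative subset).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Text,
    Integer,
    Bool,
}

/// Untyped value as it might arrive in a JSONB document (stand-in for JSON).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Integer(i64),
    Bool(bool),
}

#[derive(Debug, Clone)]
pub struct FieldDef {
    pub data_type: DataType,
    pub required: bool,
}

#[derive(Debug, PartialEq)]
pub enum FieldError {
    UnknownField(String),
    WrongType { key: String, expected: DataType },
    MissingRequired(String),
}

/// Field key -> definition.
pub type Registry = BTreeMap<String, FieldDef>;

/// Check submitted values against the registry and collect every problem,
/// so a data-entry form can show all errors at once.
pub fn validate(registry: &Registry, values: &BTreeMap<String, Value>) -> Vec<FieldError> {
    let mut errors = Vec::new();
    for (key, value) in values {
        match registry.get(key) {
            None => errors.push(FieldError::UnknownField(key.clone())),
            Some(def) => {
                let ok = matches!(
                    (&def.data_type, value),
                    (DataType::Text, Value::Text(_))
                        | (DataType::Integer, Value::Integer(_))
                        | (DataType::Bool, Value::Bool(_))
                );
                if !ok {
                    errors.push(FieldError::WrongType {
                        key: key.clone(),
                        expected: def.data_type.clone(),
                    });
                }
            }
        }
    }
    for (key, def) in registry {
        if def.required && !values.contains_key(key) {
            errors.push(FieldError::MissingRequired(key.clone()));
        }
    }
    errors
}
```

Because the definitions are data, enabling a custom field means inserting a row into the registry; the validation code itself does not change.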

6.3 Controlled vocabularies & authority records

  • First-class relational tables for person / organization / place authorities and term sources (vocabularies) — store once, link many.
  • Referenced from both the typed core and the flexible fields. A field bound to a term source accepts only a resolved reference (§9), never a free string.
  • Multilingual labels (sv/en …) on terms and authorities.

6.4 Content i18n (capability now, workflow later)

  • Localizable text values are language-tagged in the data model from day one (so no painful migration later).
  • The translation workflow/UI is post-MVP; MVP authors enter content in one language while the model already supports more.

7. Surfaces & API

Two cleanly separated surfaces — a load-bearing rule:

  • Public surface (/api/public/**): unauthenticated, read-only, serves only public records as a typed PublicView (public-safe fields only).
  • Admin/privileged surface (everything else): authenticated, read/write.

This separation enables independent IP/VPN lockdown (admin behind an ingress allowlist while public stays open), caching, and rate-limiting — all at the ingress layer, not in app code. An optional in-app IP-allowlist middleware is a post-MVP portable fallback.

OpenAPI: code-first with utoipa — the spec is generated from Rust types/handlers (cannot drift) and is the contract the React client consumes.

8. Persistence & data access

  • PostgreSQL via sqlx (async, compile-time-checked queries). All SQL is confined to the db crate, one repository per aggregate — satisfying "no SQL spread everywhere" without an ORM's abstraction.
  • JSONB for the flexible field values (GIN-indexable for search/filter needs).
  • No migrations until 1.0: pre-1.0 we reshape freely (drop & recreate). Post-1.0, each instance runs its own migrations on startup (per-org schema version).

9. Type-driven design (cross-cutting)

  • Newtype IDs: ObjectId, OrgId, MediaId, TermId, AuthorityId; never bare UUIDs.
  • Validated value objects: ObjectNumber, Email, and TermRef / AuthorityRef that are constructable only by resolving against the vocabulary/authority. An unvalidated term cannot exist as that type. (Direct mapping of Spectrum's "use a standard term source / form of name".)
  • PublicView projection — a distinct type carrying only public-safe fields; leaking an internal field on the public surface is impossible because the type lacks it. (Preferred over a literal Record<Public> generic, since visibility is runtime data from the DB.)
  • Visibility — an enum with explicit transition methods (publish, unpublish, archive): a type-driven state machine, not a stringly-typed flag.
  • Auth via extractors — public handlers take no auth extractor; privileged handlers require an AuthUser / Authorized<Cap> extractor, so a privileged handler cannot compile without proof of authorization.
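
Two of the ideas above, resolved references and the visibility state machine, can be sketched with the standard library only. This is an illustration under assumptions: `Vocabulary` is a hypothetical stand-in for a term source, a `u64` replaces the UUID, and the transition rules (only internal records can be published) are invented for the example. The `archive` transition is omitted because the spec does not say which state it leads to.

```rust
use std::collections::HashSet;

/// Newtype ID: a term identifier cannot be confused with another ID kind.
/// (A plain u64 stands in for the UUID the spec mentions.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TermId(u64);

/// Stand-in for a term source (vocabulary) lookup.
pub struct Vocabulary {
    known: HashSet<TermId>,
}

impl Vocabulary {
    pub fn new(ids: impl IntoIterator<Item = u64>) -> Self {
        Self { known: ids.into_iter().map(TermId).collect() }
    }

    /// The only way to obtain a `TermRef`: resolve a raw id against the vocabulary.
    pub fn resolve(&self, raw: u64) -> Option<TermRef> {
        let id = TermId(raw);
        self.known.contains(&id).then_some(TermRef(id))
    }
}

/// A reference known to point at an existing term.
/// The field is private, so code outside this module cannot forge one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TermRef(TermId);

impl TermRef {
    pub fn id(&self) -> TermId {
        self.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: Visibility,
}

impl Visibility {
    /// Assumed rule for illustration: only internal records may be published.
    pub fn publish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Internal => Ok(Visibility::Public),
            from => Err(InvalidTransition { from }),
        }
    }

    /// Assumed rule for illustration: unpublishing returns a record to internal.
    pub fn unpublish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Public => Ok(Visibility::Internal),
            from => Err(InvalidTransition { from }),
        }
    }
}
```

A handler that receives a raw term id has to call `resolve` before it can store a `TermRef`, so the "is this a known term?" check cannot be skipped by accident.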

10. Authentication & authorization

  • Email/password + external OIDC (the org-app is an OIDC relying party), scoped to the single org the instance serves.
  • No separate IdP and no cross-org switching in MVP (deferred; rare case).
  • Sessions: stateless tokens or a sessions table in the org DB (no Redis required).
  • Authorization enforced through typed extractors (§9); role/permission model kept simple in MVP.
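
The typed-authorization idea can be shown without any web framework. The sketch below is an assumption-laden illustration: the capability markers, the string-based grant list, and `authorize` are hypothetical, and in the real api crate the check would presumably sit inside an axum extractor, not a free function.

```rust
use std::marker::PhantomData;

/// Capability markers (hypothetical names).
pub struct CanEditRecords;
pub struct CanManageUsers;

pub trait Capability {
    const NAME: &'static str;
}

impl Capability for CanEditRecords {
    const NAME: &'static str = "records:edit";
}

impl Capability for CanManageUsers {
    const NAME: &'static str = "users:manage";
}

/// An authenticated user together with the capabilities their role grants.
pub struct AuthUser {
    pub email: String,
    pub granted: Vec<&'static str>,
}

/// Proof that `user` holds capability `C`. The fields are private, so the
/// only way to build one from outside this module is `authorize`.
pub struct Authorized<'a, C: Capability> {
    user: &'a AuthUser,
    _cap: PhantomData<C>,
}

#[derive(Debug, PartialEq)]
pub struct Forbidden(pub &'static str);

/// Perform the permission check once and return typed proof on success.
pub fn authorize<C: Capability>(user: &AuthUser) -> Result<Authorized<'_, C>, Forbidden> {
    if user.granted.contains(&C::NAME) {
        Ok(Authorized { user, _cap: PhantomData })
    } else {
        Err(Forbidden(C::NAME))
    }
}

/// A privileged operation: it cannot be called without an `Authorized` value
/// for the matching capability.
pub fn delete_record(auth: &Authorized<'_, CanEditRecords>, object_number: &str) -> String {
    format!("{} deleted {}", auth.user.email, object_number)
}
```

Passing an `Authorized<CanManageUsers>` to `delete_record` is a type error, which is the property the spec relies on: the permission check happens where the proof value is created, and every privileged function demands that proof in its signature.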

11. File storage

  • BlobStore trait in the storage crate; OpenDAL adapter for S3 and local disk. Chosen on fit (high-level, multi-backend; our bottleneck is network/S3, not syscall I/O). fusio is watch-listed and swappable behind the trait (§18).
  • Media files are linked to records; derivatives/thumbnails/IIIF are post-MVP.
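
A crate-owned storage trait might be shaped like the sketch below. The method set is an assumption, since the spec names only the trait. The example is synchronous and uses an in-memory adapter to stay self-contained; a real adapter over OpenDAL would most likely be async.

```rust
use std::collections::HashMap;
use std::io;

/// Crate-owned storage interface. Callers depend on this trait only, so the
/// backing library can be swapped without touching them.
/// The methods shown are illustrative, not the project's actual API.
pub trait BlobStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Vec<u8>>;
    fn delete(&mut self, key: &str) -> io::Result<()>;
}

/// In-memory adapter, handy for tests; a stand-in for the OpenDAL adapter.
#[derive(Default)]
pub struct MemoryStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore for MemoryStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.blobs.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        self.blobs
            .get(key)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("no blob at {key}")))
    }

    fn delete(&mut self, key: &str) -> io::Result<()> {
        self.blobs.remove(key);
        Ok(())
    }
}

/// Example caller: stores a media file and returns the key to link from the
/// record. It knows nothing about S3, local disk, or OpenDAL.
pub fn attach_media(store: &mut dyn BlobStore, media_id: u64, bytes: &[u8]) -> io::Result<String> {
    let key = format!("media/{media_id}");
    store.put(&key, bytes)?;
    Ok(key)
}
```

Swapping OpenDAL for another backend would then mean writing one new `impl BlobStore`, which is what makes the dependency bet reversible.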

12. Search

  • Meilisearch, one index per org, scoped API key. A search abstraction in the search crate; Meili adapter behind it.
  • MVP: index catalogue records on write; basic full-text + facet search in admin.
  • Public-facing search is post-MVP.

13. Audit & amendment history

  • One append-only, immutable log in the org database: who / when / what, with field-level before→after diffs, covering domain create/update/delete and auth/security events.
  • Doubles as Spectrum amendment history surfaced on catalogue records (Spectrum requires a transparent record of changes — never silently erase prior terminology).
  • MVP audits writes + auth events; auditing reads is deferred.
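
The field-level before/after diff could be computed along these lines. This is a simplified sketch: values are plain strings, and `FieldChange`, `AuditEntry`, and `AuditLog` are hypothetical names. In the real system the log would be a table in the org database, and immutability would be enforced there (for example through grants) instead of by an in-memory type.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: String,
    pub before: Option<String>,
    pub after: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub actor: String,        // who
    pub at_unix: u64,         // when (seconds since the epoch)
    pub action: &'static str, // what: "create", "update", "delete", ...
    pub changes: Vec<FieldChange>,
}

/// Compare two snapshots of a record and list every field whose value was
/// added, removed, or changed. Output is ordered by field name.
pub fn diff(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
) -> Vec<FieldChange> {
    let keys: BTreeSet<&String> = before.keys().chain(after.keys()).collect();
    keys.into_iter()
        .filter(|k| before.get(*k) != after.get(*k))
        .map(|k| FieldChange {
            field: k.clone(),
            before: before.get(k).cloned(),
            after: after.get(k).cloned(),
        })
        .collect()
}

/// Append-only by construction: the type offers no update or delete method.
#[derive(Default)]
pub struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    pub fn append(&mut self, entry: AuditEntry) {
        self.entries.push(entry);
    }

    pub fn entries(&self) -> &[AuditEntry] {
        &self.entries
    }
}
```

Since each entry keeps the previous value next to the new one, the same rows can be rendered as the amendment history on a catalogue record.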

14. Visibility & publishing

  • Record-level visibility: draft / internal / public.
  • A fixed never-public field set (location, valuation, insurance, personal data). Per-field publishability is post-MVP.
  • Public API serves only public records via PublicView.
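
As an illustration of the PublicView projection, the sketch below uses a cut-down record with two never-public fields. The struct layouts and the `project` function are assumptions made for the example; only the principle (the public type has no slot for internal data) comes from the spec.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

/// Internal record as the admin surface sees it (illustrative subset).
pub struct CatalogueRecord {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
    pub current_location: String,  // never public
    pub valuation_sek: Option<u64>, // never public
    pub visibility: Visibility,
}

/// Public projection: it has no location or valuation field, so a public
/// handler returning this type has nothing internal to leak.
#[derive(Debug, PartialEq)]
pub struct PublicView {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
}

impl PublicView {
    /// Returns `None` unless the record is public, so draft and internal
    /// records never reach the public surface.
    pub fn project(record: &CatalogueRecord) -> Option<PublicView> {
        if record.visibility != Visibility::Public {
            return None;
        }
        Some(PublicView {
            object_number: record.object_number.clone(),
            object_name: record.object_name.clone(),
            brief_description: record.brief_description.clone(),
        })
    }
}
```

Adding a new internal column to `CatalogueRecord` later does not change what the public API emits, because exposure requires an explicit edit to `PublicView`.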

15. Export & backup (distinct)

  • Backup (operational): pg_dump / PITR of the org database. Ops concern.
  • Export (portable handover): a single SQLite file (metadata incl. flattened flexible fields + vocab/authority tables) + plain media files + a manifest — a whole-org archive, openable anywhere, stable long-term.

16. Internationalization

  • UI: Swedish + English via a React i18n library + locale files; localized API validation/error messages.
  • Data: multilingual labels on vocab/authority terms; language-tagged content values in the model (workflow post-MVP, §6.4).

17. Frontend

  • Lean React SPA, evergreen browsers, consuming the OpenAPI. Separate build in web/.
  • "Potato hardware" = an explicit bundle-discipline budget: small dependency set, code-splitting, measured bundle size as a tracked target — not a framework compromise.
  • Suits the data-entry-heavy cataloguing UI (vocabulary autocomplete, dynamic field groups from the registry, inline validation).

18. Dependencies & tech stack

| Concern   | Choice                            | Notes                               |
| --------- | --------------------------------- | ----------------------------------- |
| Language  | Rust 2024                         |                                     |
| HTTP      | axum                              |                                     |
| API spec  | utoipa (code-first OpenAPI)       | drives the React client             |
| DB        | PostgreSQL + sqlx                 | SQL confined to db crate            |
| Storage   | OpenDAL behind BlobStore          | S3 + local; fusio watch-listed      |
| Search    | Meilisearch behind a search trait | index-per-org                       |
| Cache     | Redis (deferred)                  | add only when needed; key-prefixed  |
| Frontend  | React (lean SPA)                  | bundle budget enforced              |
| i18n (FE) | React i18n lib                    | sv/en                               |

Dependency philosophy: pre-1.0, choose on capability/fit, not maturity; isolate volatile deps behind owned traits (reversible bets); re-evaluate each bet before 1.0, when the API surface and data formats lock.

19. Testing strategy

  • Core & domain: thorough unit tests; strong types remove whole categories from the test surface.
  • Isolation/security: dedicated negative tests (scoped credentials reject foreign access; the public surface never emits internal fields/non-public records).
  • Repositories: integration tests against Postgres.
  • Flexible fields: validation tested against field definitions.
  • Deliberately not overboard elsewhere.

20. Decision log

| # | Decision | Why | Alternatives rejected |
| --- | --- | --- | --- |
| D1 | Per-org single-tenant binary; tenancy is deployment-only | Simplest core (no tenant plumbing); self-host = same artifact; isolation by construction | Shared multi-tenant app w/ org_id+RLS (bleed risk, complex core) |
| D2 | Database-per-org + scoped role; index-per-org + scoped key | Hard isolation; clean per-org export; no RLS | Schema-per-org (softer); shared DB + RLS (shared data path) |
| D3 | Hybrid data model (typed core + JSONB flexible + relational vocab/authority) | Small tested core + extensible tail; matches "link don't duplicate" | Fixed Spectrum schema (rigid); pure EAV/JSONB (weak integrity) |
| D4 | Type-driven design; PublicView projection; refs as validated types | Removes bug classes incl. public-data leaks; shrinks tests | Runtime checks only |
| D5 | sqlx + repository layer | Compile-time-checked SQL, no ORM, SQL in one place | SeaORM (more abstraction); Diesel (sync) |
| D6 | Clean public/admin surface split | Enables IP-lock/caching/publishing cleanly | Single mixed surface |
| D7 | Ingress-layer IP/VPN lockdown, admin-only-lockable | Not the app's job; public stays open | App-level firewall (fallback only) |
| D8 | Lean React SPA, evergreen + bundle budget | Growth path; ecosystem for data-entry UI; fits weak HW if disciplined | htmx/SSR (only needed for ancient browsers, none required) |
| D9 | Append-only audit w/ field diffs = amendment history | One mechanism satisfies ops audit + Spectrum requirement | Separate audit & history systems |
| D10 | Export = SQLite + files; backup = pg_dump | Portable, openable anywhere; distinct from ops backup | pg_dump as the only "export" (not portable) |
| D11 | OpenDAL behind BlobStore | Right altitude, multi-backend; bottleneck is network not syscalls | fusio now (lower-level, DB-engine focus); watch-listed |
| D12 | utoipa code-first OpenAPI | Spec can't drift; drives client | Spec-first |
| D13 | i18n: UI+vocab labels MVP; content workflow later, model ready now | Avoids painful migration; keeps MVP small | Full content translation in MVP (too big) |
| D14 | No IdP / no cross-org switching now | Rare case; keeps auth simple | Build shared IdP now |
| D15 | No migrations until 1.0 | Freedom to reshape pre-1.0 | Migrations from day one |
| D16 | No product name in code; role-named workspace; name from config | Placeholder must never leak; trivial rename later | Hardcode a working name |

21. Open items for the implementation plan

  • First scaffolding task: dissolve the current biggus-dickus package into the role-named workspace (the placeholder name must not survive into real code).
  • Decide the role/permission model's MVP shape (kept minimal).
  • Decide the object-number format configuration mechanism.
  • Define the SQLite export schema mapping for the hybrid model.
  • Choose specific crates for OIDC, JSONB validation, and React i18n during planning.