logaritmisk 8f67503f45 docs: add project vision, MVP architecture spec, and reference material

- docs/VISION.md: product vision + feature catalogue (MVP / post-MVP / later)
- docs/specs/2026-06-02-mvp-architecture.md: MVP architecture + 16-entry decision log
- reference/: Spectrum 5.0 cataloguing + Riksantikvarieämbetet source material (build-time reference)
- CLAUDE.md: project guidance for Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-02 00:24:53 +02:00

MVP Architecture & Design

Status: approved design, pre-implementation
Date: 2026-06-02
Scope: the MVP, i.e. the smallest useful build that exercises every architectural pillar.
Companion to ../VISION.md (full feature catalogue) and grounded in ../../reference/ (Spectrum 5.0 + Riksantikvarieämbetet source material).

Neutral naming throughout. No product/brand name appears in code or these docs (see §20, D16).

1. Goals & non-goals

Goals

A small, strongly-typed, well-tested core that is easy to extend.
An organization can catalogue its collection (Spectrum Cataloguing), attach media, search it, control visibility, and expose public records.
Airtight org isolation and a full audit trail from day one.
Easy self-hosting: one binary, one database, minimal dependencies.

Non-goals (MVP)

Other Spectrum procedures as workflows (entry, accession, loans, location/movement control, …): on the roadmap.
Reporting/label templates, aggregator/LIDO/OAI-PMH/IIIF, translation workflow UI, fleet provisioning/control plane, migrations machinery (none until 1.0).

2. Guiding principles

Make illegal states unrepresentable (§9). Parse, don't validate.
Isolation by construction (§4): credentials + topology, not org_id filtering in code.
Module separation; no SQL spread. SQL lives only in repository modules (§5, §8).
Minimal custom code, reversible dependency bets (§18).
Self-host is first-class (§3).
Well-tested, not overboard (§19): strong types shrink the test surface; the isolation/security layer and the core get thorough tests; the dynamic field layer is validated at runtime.

3. Deployment topology & tenancy

The application binary is always single-tenant. One running instance serves exactly one organization and contains no concept of "other orgs". There is no multi-tenant code path. Multi-tenancy is achieved entirely at the deployment layer:

	Self-host	Hosted fleet
App instances	one (1+ pods)	one deployment per org (1+ pods each)
Postgres	its own database	one shared server, one database per org
Meilisearch	its own index	one shared server, one index per org
Files	local disk or S3	S3 (or RWX volume per org)
Domain	the org's domain	each org its own domain
Rollout	upgrade the instance	per-org image bump

Consequences (recorded):

Per-org rollout & schema version. Bumping one org's image rolls out that org only; the instance runs its own migrations against its own database. Orgs may sit on different versions. (Pre-1.0: recreate rather than migrate.)
Files: with more than one pod per org, files must be on shared storage (S3 or RWX volume) — local disk is single-pod/self-host only. BlobStore (§11) abstracts this.
Cross-org features (a future aggregator searching across museums; fleet admin) are a separate service, never a single org-app. Out of MVP.

4. Isolation model

Because each org-app holds credentials scoped to its own database and its own search index, cross-org access is not "prevented" — it is impossible, because the access path does not exist:

Postgres: database-per-org + a role granted access to only that database. An instance physically cannot connect to another org's database.
Meilisearch: index-per-org + an API key scoped to that org's index only.
No Row-Level Security needed — there is no shared multi-org data in any single database to protect, and the app has no cross-org code.
Files: per-org bucket/prefix (S3) or per-org volume, with scoped credentials.

Defense-in-depth / verification:

A single configuration chokepoint establishes "which org am I" at startup from config; nothing reconstructs it ad hoc.
Negative tests assert the app cannot be pointed outside its configured database/index and that scoped credentials reject foreign access.
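
The configuration chokepoint described above could look roughly like the following. This is a minimal std-only sketch, not the project's actual config module: the type `OrgConfig`, the error enum, and the keys `ORG_SLUG`, `DATABASE_URL` and `SEARCH_INDEX` are all hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Identity and scoped endpoints of the single org this instance serves.
/// All field and key names here are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrgConfig {
    org_slug: String,
    database_url: String,
    search_index: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    Missing(&'static str),
    Empty(&'static str),
}

impl OrgConfig {
    /// Parse once at startup; everything else borrows the resulting value.
    pub fn from_vars(vars: &HashMap<String, String>) -> Result<Self, ConfigError> {
        Ok(Self {
            org_slug: required(vars, "ORG_SLUG")?,
            database_url: required(vars, "DATABASE_URL")?,
            search_index: required(vars, "SEARCH_INDEX")?,
        })
    }

    pub fn org_slug(&self) -> &str {
        &self.org_slug
    }
    pub fn database_url(&self) -> &str {
        &self.database_url
    }
    pub fn search_index(&self) -> &str {
        &self.search_index
    }
}

/// Fail fast on absent or blank settings instead of falling back to defaults.
fn required(vars: &HashMap<String, String>, key: &'static str) -> Result<String, ConfigError> {
    match vars.get(key) {
        None => Err(ConfigError::Missing(key)),
        Some(value) if value.trim().is_empty() => Err(ConfigError::Empty(key)),
        Some(value) => Ok(value.clone()),
    }
}
```

Because the fields are private and there is a single constructor, downstream code can only read the org identity, never rebuild it from scratch. A negative test can then feed the constructor incomplete input and assert that startup is refused.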

5. Crate / module layout

A Cargo workspace with role-named member crates (no brand name anywhere):

/                      virtual workspace
  crates/
    domain/            core types, value objects, invariants (no I/O)
    db/                sqlx repositories; ALL SQL lives here
    storage/           BlobStore trait + OpenDAL adapter (S3 / local)
    search/            search abstraction + Meilisearch adapter
    auth/              password + OIDC, session/token, extractors
    api/               axum router, handlers, OpenAPI (utoipa), public + admin
    server/            binary: config, wiring, startup, migrations runner
  web/                 React SPA (separate build), consumes the OpenAPI
  migrations/          SQL migrations (post-1.0; pre-1.0 = recreate)

Dependency direction points inward toward domain. domain has no I/O deps. Each crate has one clear purpose, a defined interface, and is testable in isolation. Experimental/volatile dependencies sit behind a crate-owned trait (BlobStore, the search trait, …) so they are swappable (§18).

6. Data model — hybrid (Approach C)

Three layers:

6.1 Typed relational core

The accountability backbone and the most queried/integrity-critical fields, as real columns/tables with strong types:

object number (configurable format), object name, number of objects, brief description, current location, current owner, recorder, recording date, visibility, media links, audit linkage.

6.2 Flexible field layer

A field-definition registry: each definition has a key, data type, optional vocabulary/authority binding, validation rules, grouping, and locale behavior.
Field values stored as JSONB on the record, validated at write time against the registry.
The Spectrum 5.0 Cataloguing field set ships as seed field definitions (see reference/spectrum-5.0-cataloguing-units-of-information.md). Orgs enable a subset or the full set; custom fields are data, not migrations.
Trade-off (explicit): this layer is runtime-typed by design. Values are validated against definitions at runtime, not by the compiler. Hard types apply where structure is fixed (core, IDs, refs); runtime validation applies where it is dynamic.
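
To make write-time validation against the registry concrete, here is a small std-only sketch. It is an illustration under assumptions: the `FieldValue` enum stands in for JSONB values, only two data types are modelled, and every type and function name is hypothetical rather than taken from the codebase.

```rust
use std::collections::BTreeMap;

/// Data types a field definition can declare (hypothetical subset).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FieldType {
    Text,
    Integer,
}

/// One registry entry; vocabulary binding, grouping and locale are omitted.
#[derive(Debug, Clone)]
pub struct FieldDefinition {
    pub data_type: FieldType,
    pub required: bool,
}

/// Stand-in for a JSONB value on the record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FieldValue {
    Text(String),
    Integer(i64),
}

#[derive(Debug, PartialEq, Eq)]
pub enum FieldError {
    UnknownField(String),
    TypeMismatch(String),
    MissingRequired(String),
}

/// Check submitted values against the registry and collect every problem.
pub fn validate(
    registry: &BTreeMap<String, FieldDefinition>,
    values: &BTreeMap<String, FieldValue>,
) -> Result<(), Vec<FieldError>> {
    let mut errors = Vec::new();

    // Every submitted key must be defined and carry the declared type.
    for (key, value) in values {
        match registry.get(key) {
            None => errors.push(FieldError::UnknownField(key.clone())),
            Some(def) => {
                let type_ok = matches!(
                    (&def.data_type, value),
                    (FieldType::Text, FieldValue::Text(_))
                        | (FieldType::Integer, FieldValue::Integer(_))
                );
                if !type_ok {
                    errors.push(FieldError::TypeMismatch(key.clone()));
                }
            }
        }
    }

    // Every required definition must have a value.
    for (key, def) in registry {
        if def.required && !values.contains_key(key) {
            errors.push(FieldError::MissingRequired(key.clone()));
        }
    }

    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

Collecting all errors rather than stopping at the first suits a data-entry form, where the user benefits from seeing every invalid field at once.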

6.3 Controlled vocabularies & authority records

First-class relational tables for person / organization / place authorities and term sources (vocabularies) — store once, link many.
Referenced from both the typed core and the flexible fields. A field bound to a term source accepts only a resolved reference (§9), never a free string.
Multilingual labels (sv/en …) on terms and authorities.

6.4 Content i18n (capability now, workflow later)

Localizable text values are language-tagged in the data model from day one (so no painful migration later).
The translation workflow/UI is post-MVP; MVP authors enter content in one language while the model already supports more.

7. Surfaces & API

Two cleanly separated surfaces — a load-bearing rule:

Public surface — /api/public/**: unauthenticated, read-only, serves only public records as a typed PublicView (public-safe fields only).
Admin/privileged surface — everything else: authenticated, read/write.

This separation enables independent IP/VPN lockdown (admin behind an ingress allowlist while public stays open), caching, and rate-limiting — all at the ingress layer, not in app code. An optional in-app IP-allowlist middleware is a post-MVP portable fallback.

OpenAPI: code-first with utoipa — the spec is generated from Rust types/handlers (cannot drift) and is the contract the React client consumes.

8. Persistence & data access

PostgreSQL via sqlx (async, compile-time-checked queries). All SQL is confined to the db crate, one repository per aggregate — satisfying "no SQL spread everywhere" without an ORM's abstraction.
JSONB for the flexible field values (GIN-indexable for search/filter needs).
No migrations until 1.0: pre-1.0 we reshape freely (drop & recreate). Post-1.0, each instance runs its own migrations on startup (per-org schema version).

9. Type-driven design (cross-cutting)

Newtype IDs — ObjectId, OrgId, MediaId, TermId, AuthorityId; never bare UUIDs.
Validated value objects — ObjectNumber, Email, and TermRef / AuthorityRef that are constructable only by resolving against the vocabulary/authority. An unvalidated term cannot exist as that type. (Direct mapping of Spectrum's "use a standard term source / form of name".)
PublicView projection — a distinct type carrying only public-safe fields; leaking an internal field on the public surface is impossible because the type lacks it. (Preferred over a literal Record<Public> generic, since visibility is runtime data from the DB.)
Visibility — an enum with explicit transition methods (publish, unpublish, archive): a type-driven state machine, not a stringly-typed flag.
Auth via extractors — public handlers take no auth extractor; privileged handlers require an AuthUser / Authorized<Cap> extractor, so a privileged handler cannot compile without proof of authorization.
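
The "constructable only by resolving" rule for TermRef can be sketched with module privacy. The following is a minimal illustration, assuming an in-memory `Vocabulary` in place of the relational term tables and a `u64` in place of a UUID-backed ID; the module name `vocab` and all method names are hypothetical.

```rust
pub mod vocab {
    use std::collections::HashMap;

    /// Newtype ID. A real ID would likely wrap a UUID; u64 keeps this sketch std-only.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
    pub struct TermId(u64);

    /// A term that is known to exist in a vocabulary.
    /// Fields are private, so code outside this module cannot build one by hand.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct TermRef {
        id: TermId,
        label: String,
    }

    impl TermRef {
        pub fn id(&self) -> TermId {
            self.id
        }
        pub fn label(&self) -> &str {
            &self.label
        }
    }

    /// In-memory stand-in for a term source table.
    pub struct Vocabulary {
        terms: HashMap<String, TermId>,
    }

    impl Vocabulary {
        pub fn new(terms: &[(&str, u64)]) -> Self {
            let terms = terms
                .iter()
                .map(|(label, id)| (label.to_string(), TermId(*id)))
                .collect();
            Self { terms }
        }

        /// The only way to obtain a TermRef: parse raw input against the vocabulary.
        pub fn resolve(&self, raw: &str) -> Option<TermRef> {
            let label = raw.trim();
            self.terms.get(label).map(|id| TermRef {
                id: *id,
                label: label.to_string(),
            })
        }
    }
}
```

Any function that accepts a `vocab::TermRef` can then skip re-validation, since holding the value is itself evidence that resolution succeeded. This is the "parse, don't validate" principle applied to term sources.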

10. Authentication & authorization

Email/password + external OIDC (the org-app is an OIDC relying party), scoped to the single org the instance serves.
No separate IdP and no cross-org switching in MVP (deferred; rare case).
Sessions: stateless tokens or a sessions table in the org DB (no Redis required).
Authorization enforced through typed extractors (§9); role/permission model kept simple in MVP.
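
The idea behind Authorized<Cap> can be shown without a web framework. In the sketch below the proof value is created by an explicit `check` call; in the real api crate this would presumably happen inside an axum extractor instead. The capability `EditCatalogue`, its name string, and the function `rename_object` are hypothetical.

```rust
pub mod authz {
    use std::marker::PhantomData;

    /// A capability is a zero-sized marker type with a stable name.
    pub trait Capability {
        const NAME: &'static str;
    }

    pub struct EditCatalogue;
    impl Capability for EditCatalogue {
        const NAME: &'static str = "catalogue:edit";
    }

    /// An authenticated user, as a session lookup might produce it.
    pub struct AuthUser {
        pub email: String,
        pub capabilities: Vec<String>,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub struct Denied {
        pub missing: &'static str,
    }

    /// Proof that a user holds capability C. Only `check` can create it.
    pub struct Authorized<C: Capability> {
        email: String,
        _cap: PhantomData<C>,
    }

    impl<C: Capability> Authorized<C> {
        pub fn check(user: &AuthUser) -> Result<Self, Denied> {
            if user.capabilities.iter().any(|c| c.as_str() == C::NAME) {
                Ok(Self { email: user.email.clone(), _cap: PhantomData })
            } else {
                Err(Denied { missing: C::NAME })
            }
        }

        pub fn email(&self) -> &str {
            &self.email
        }
    }
}

/// A privileged operation: it cannot be called without the proof value.
pub fn rename_object(
    actor: &authz::Authorized<authz::EditCatalogue>,
    new_name: &str,
) -> String {
    format!("{} renamed object to {}", actor.email(), new_name)
}
```

Since `rename_object` takes the proof as a parameter, forgetting the authorization check becomes a compile error at the call site rather than a runtime hole.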

11. File storage

BlobStore trait in the storage crate; OpenDAL adapter for S3 and local disk. Chosen on fit (high-level, multi-backend; our bottleneck is network/S3, not syscall I/O). fusio is watch-listed and swappable behind the trait (§18).
Media files are linked to records; derivatives/thumbnails/IIIF are post-MVP.
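
A crate-owned trait with a swappable adapter might be shaped like this. The sketch is deliberately synchronous and std-only, with an in-memory adapter standing in for the OpenDAL one; the method set (`put`, `get`, `delete`), the error enum, and the key rules are assumptions, and the real trait would most likely be async.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum BlobError {
    NotFound(String),
    InvalidKey(String),
}

/// The interface the rest of the app depends on; adapters live behind it.
pub trait BlobStore {
    fn put(&self, key: &str, bytes: &[u8]) -> Result<(), BlobError>;
    fn get(&self, key: &str) -> Result<Vec<u8>, BlobError>;
    fn delete(&self, key: &str) -> Result<(), BlobError>;
}

/// In-memory adapter, useful for tests and as a reference implementation.
#[derive(Default)]
pub struct MemoryBlobStore {
    blobs: Mutex<HashMap<String, Vec<u8>>>,
}

/// Reject empty, absolute, or parent-traversing keys (assumed policy).
fn check_key(key: &str) -> Result<(), BlobError> {
    let bad = key.is_empty()
        || key.starts_with('/')
        || key.split('/').any(|part| part == "..");
    if bad {
        Err(BlobError::InvalidKey(key.to_string()))
    } else {
        Ok(())
    }
}

impl BlobStore for MemoryBlobStore {
    fn put(&self, key: &str, bytes: &[u8]) -> Result<(), BlobError> {
        check_key(key)?;
        self.blobs.lock().unwrap().insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Vec<u8>, BlobError> {
        self.blobs
            .lock()
            .unwrap()
            .get(key)
            .cloned()
            .ok_or_else(|| BlobError::NotFound(key.to_string()))
    }

    fn delete(&self, key: &str) -> Result<(), BlobError> {
        self.blobs
            .lock()
            .unwrap()
            .remove(key)
            .map(|_| ())
            .ok_or_else(|| BlobError::NotFound(key.to_string()))
    }
}
```

Callers that accept `&dyn BlobStore` never learn which backend is in use, which is what keeps the OpenDAL choice reversible.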

12. Search

Meilisearch, one index per org, scoped API key. A search abstraction in the search crate; Meili adapter behind it.
MVP: index catalogue records on write; basic full-text + facet search in admin.
Public-facing search is post-MVP.

13. Audit & amendment history

One append-only, immutable log in the org database: who / when / what, with field-level before→after diffs, covering domain create/update/delete and auth/security events.
Doubles as Spectrum amendment history surfaced on catalogue records (Spectrum requires a transparent record of changes — never silently erase prior terminology).
MVP audits writes + auth events; auditing reads is deferred.
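
Field-level before→after diffs and an append-only log can be sketched as follows. This is an in-memory illustration only: the actual log lives in the org database, values are simplified to strings, and `FieldChange`, `AuditEntry`, `AuditLog` and `diff_fields` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One changed field; None means the field was absent on that side.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldChange {
    pub field: String,
    pub before: Option<String>,
    pub after: Option<String>,
}

/// Compare two snapshots and return only the fields that differ, sorted by name.
pub fn diff_fields(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
) -> Vec<FieldChange> {
    let keys: BTreeSet<&String> = before.keys().chain(after.keys()).collect();
    keys.into_iter()
        .filter(|key| before.get(*key) != after.get(*key))
        .map(|key| FieldChange {
            field: key.clone(),
            before: before.get(key).cloned(),
            after: after.get(key).cloned(),
        })
        .collect()
}

/// Who / when / what for a single write.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditEntry {
    pub actor: String,
    pub at_unix_secs: u64,
    pub action: String,
    pub changes: Vec<FieldChange>,
}

/// Append-only by interface: there is no method to edit or remove entries.
#[derive(Default)]
pub struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    pub fn append(&mut self, entry: AuditEntry) {
        self.entries.push(entry);
    }

    pub fn entries(&self) -> &[AuditEntry] {
        &self.entries
    }
}
```

Keeping the prior value in each change is what lets the same entries serve as amendment history: an old term stays visible in the log after the record itself has moved on.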

14. Visibility & publishing

Record-level visibility: draft / internal / public.
A fixed never-public field set (location, valuation, insurance, personal data). Per-field publishability is post-MVP.
Public API serves only public records via PublicView.
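
A sketch of the visibility state machine together with the PublicView projection follows. The transition rules are assumptions made for illustration (the spec names the transitions but not which source states are legal), `archive` is left out because no archived level is listed above, and the `Record` shape is a hypothetical cut-down of the typed core.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Draft,
    Internal,
    Public,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: Visibility,
    pub action: &'static str,
}

impl Visibility {
    /// Assumed rule: a draft or internal record may be published.
    pub fn publish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Draft | Visibility::Internal => Ok(Visibility::Public),
            Visibility::Public => Err(InvalidTransition { from: self, action: "publish" }),
        }
    }

    /// Assumed rule: unpublishing returns a public record to internal.
    pub fn unpublish(self) -> Result<Visibility, InvalidTransition> {
        match self {
            Visibility::Public => Ok(Visibility::Internal),
            other => Err(InvalidTransition { from: other, action: "unpublish" }),
        }
    }
}

/// Internal record, including a field from the never-public set.
pub struct Record {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
    pub current_location: String, // never public
    pub visibility: Visibility,
}

/// Public projection: the location field does not exist on this type.
#[derive(Debug, PartialEq, Eq)]
pub struct PublicView {
    pub object_number: String,
    pub object_name: String,
    pub brief_description: String,
}

impl PublicView {
    /// Returns None unless the record is public.
    pub fn project(record: &Record) -> Option<PublicView> {
        if record.visibility != Visibility::Public {
            return None;
        }
        Some(PublicView {
            object_number: record.object_number.clone(),
            object_name: record.object_name.clone(),
            brief_description: record.brief_description.clone(),
        })
    }
}
```

A public handler that can only return `PublicView` has no way to serialize `current_location`, so the never-public rule is enforced by the type rather than by a filter that someone must remember to apply.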

15. Export & backup (distinct)

Backup (operational): pg_dump / PITR of the org database. Ops concern.
Export (portable handover): a single SQLite file (metadata incl. flattened flexible fields + vocab/authority tables) + plain media files + a manifest — a whole-org archive, openable anywhere, stable long-term.

16. Internationalization

UI: Swedish + English via a React i18n library + locale files; localized API validation/error messages.
Data: multilingual labels on vocab/authority terms; language-tagged content values in the model (workflow post-MVP, §6.4).

17. Frontend

Lean React SPA, evergreen browsers, consuming the OpenAPI. Separate build in web/.
"Potato hardware" = an explicit bundle-discipline budget: small dependency set, code-splitting, measured bundle size as a tracked target — not a framework compromise.
Suits the data-entry-heavy cataloguing UI (vocabulary autocomplete, dynamic field groups from the registry, inline validation).

18. Dependencies & tech stack

Concern	Choice	Notes
Language	Rust 2024
HTTP	axum
API spec	utoipa (code-first OpenAPI)	drives the React client
DB	PostgreSQL + sqlx	SQL confined to `db` crate
Storage	OpenDAL behind `BlobStore`	S3 + local; `fusio` watch-listed
Search	Meilisearch behind a search trait	index-per-org
Cache	Redis — deferred	add only when needed; key-prefixed
Frontend	React (lean SPA)	bundle budget enforced
i18n (FE)	React i18n lib	sv/en

Dependency philosophy: pre-1.0, choose on capability/fit, not maturity; isolate volatile deps behind owned traits (reversible bets); re-evaluate each bet before 1.0, when the API surface and data formats lock.

19. Testing strategy

Core & domain: thorough unit tests; strong types remove whole categories from the test surface.
Isolation/security: dedicated negative tests (scoped credentials reject foreign access; the public surface never emits internal fields/non-public records).
Repositories: integration tests against Postgres.
Flexible fields: validation tested against field definitions.
Deliberately not overboard elsewhere.

20. Decision log

#	Decision	Why	Alternatives rejected
D1	Per-org single-tenant binary; tenancy is deployment-only	Simplest core (no tenant plumbing); self-host = same artifact; isolation by construction	Shared multi-tenant app w/ `org_id`+RLS (bleed risk, complex core)
D2	Database-per-org + scoped role; index-per-org + scoped key	Hard isolation; clean per-org export; no RLS	Schema-per-org (softer); shared DB + RLS (shared data path)
D3	Hybrid data model (typed core + JSONB flexible + relational vocab/authority)	Small tested core + extensible tail; matches "link don't duplicate"	Fixed Spectrum schema (rigid); pure EAV/JSONB (weak integrity)
D4	Type-driven design; `PublicView` projection; refs as validated types	Removes bug classes incl. public-data leaks; shrinks tests	Runtime checks only
D5	sqlx + repository layer	Compile-time-checked SQL, no ORM, SQL in one place	SeaORM (more abstraction); Diesel (sync)
D6	Clean public/admin surface split	Enables IP-lock/caching/publishing cleanly	Single mixed surface
D7	Ingress-layer IP/VPN lockdown, admin-only-lockable	Not the app's job; public stays open	App-level firewall (fallback only)
D8	Lean React SPA, evergreen + bundle budget	Growth path; ecosystem for data-entry UI; fits weak HW if disciplined	htmx/SSR (only needed for ancient browsers — none required)
D9	Append-only audit w/ field diffs = amendment history	One mechanism satisfies ops audit + Spectrum requirement	Separate audit & history systems
D10	Export = SQLite + files; backup = pg_dump	Portable, openable anywhere; distinct from ops backup	pg_dump as the only "export" (not portable)
D11	OpenDAL behind `BlobStore`	Right altitude, multi-backend; bottleneck is network not syscalls	fusio now (lower-level, DB-engine focus) — watch-listed
D12	utoipa code-first OpenAPI	Spec can't drift; drives client	spec-first
D13	i18n: UI+vocab labels MVP; content workflow later, model ready now	Avoids painful migration; keeps MVP small	Full content translation in MVP (too big)
D14	No IdP / no cross-org switching now	Rare case; keeps auth simple	Build shared IdP now
D15	No migrations until 1.0	Freedom to reshape pre-1.0	Migrations from day one
D16	No product name in code; role-named workspace; name from config	Placeholder must never leak; trivial rename later	Hardcode a working name

21. Open items for the implementation plan

First scaffolding task: dissolve the current biggus-dickus package into the role-named workspace (the placeholder name must not survive into real code).
Decide the role/permission model's MVP shape (kept minimal).
Decide the object-number format configuration mechanism.
Define the SQLite export schema mapping for the hybrid model.
Choose specific crates for OIDC, JSONB validation, and React i18n during planning.
