docs: add project vision, MVP architecture spec, and reference material

- docs/VISION.md: product vision + feature catalogue (MVP / post-MVP / later)
- docs/specs/2026-06-02-mvp-architecture.md: MVP architecture + 16-entry decision log
- reference/: Spectrum 5.0 cataloguing + Riksantikvarieämbetet source material (build-time reference)
- CLAUDE.md: project guidance for Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-02 00:24:53 +02:00
parent 283e27fa06
commit 8f67503f45
18 changed files with 1945 additions and 0 deletions
+207
View File
@@ -0,0 +1,207 @@
# Vision — Collection Management System (working name TBD)
> **Codename note:** the repository folder is a throwaway working name. The real
> product name is undecided and **must never appear in code** (see the
> architecture spec, "Naming"). This document uses neutral terms — "the
> platform", "the system".
## What this is
A modern **collection management system** (Swedish: *samlingsförvaltningssystem*)
for museums and other heritage organizations: software for documenting, managing,
and selectively publishing the objects in a collection. It is built around the
**Spectrum 5.0** standard (Collections Trust) and the guidance from
Riksantikvarieämbetet — see [`reference/`](../reference/) for the source material
this design is grounded in.
It is **not** primarily a web-publishing tool or a digital-asset manager. Its job
is to support the internal processes of collection management — cataloguing,
location/movement control, loans, condition, and so on — with selective public
access layered on top.
## Who it is for
- **Primary (now):** small and mid-sized **non-profit** heritage organizations —
limited budget, limited or volunteer IT, who need something *easy* but correct.
- **Roadmap:** larger institutions with professional staff and IT, who need the
full Spectrum process coverage, custom fields, and the option to run it in their
own environment.
The design tension we hold throughout: **easy by default, flexible when needed.**
A small org runs with a tiny subset and sensible defaults; an advanced org enables
the full standard, adds custom fields, and self-hosts.
## Guiding principles
1. **Small, well-tested, extensible core.** The accountability backbone is small
and strongly typed; extensibility lives in well-bounded modules around it.
2. **Make illegal states unrepresentable.** Lean on Rust's type system to remove
bug classes (newtype IDs, validated value objects, projection types, auth via
extractors). Strong types also shrink the test surface.
3. **Isolation by construction.** An organization's data must *never* bleed into
another's. We achieve this at the deployment/credential layer, not by app
discipline (see architecture spec).
4. **Easy to self-host.** The single-tenant binary *is* the self-host artifact:
one binary, minimal external dependencies, sensible defaults, local-disk
storage option, standalone auth.
5. **Standards-aligned.** Spectrum 5.0 for process/data; CIDOC-CRM / LIDO and
controlled vocabularies (Getty, KulturNav, Wikidata) on the roadmap for
interchange.
6. **Minimal custom code, reversible bets.** Prefer existing crates. Pre-1.0 we
choose dependencies on *fit*, not maturity, and isolate experimental ones
behind our own traits so swapping stays cheap.
7. **Clean public/private separation.** The public, unauthenticated surface is a
distinct, narrow boundary — which makes publishing, caching, rate-limiting, and
network lockdown all clean.
## Architecture in one paragraph
The application binary is **always single-tenant** — one running instance serves
exactly one organization and knows nothing of any other. "Multi-tenancy" is purely
a *deployment* concern: a hosted fleet runs many copies of the same binary, each
with its own Postgres database and Meilisearch index (scoped credentials) against
shared database/search servers, each on its own domain, independently rolled out.
**Self-hosting is the same binary with one database.** Data isolation is therefore
guaranteed by credentials and topology, not by `org_id` filtering in code. See
[`specs/2026-06-02-mvp-architecture.md`](specs/2026-06-02-mvp-architecture.md).
---
## Feature catalogue
Each feature is tagged **[MVP]**, **[Post-MVP]**, or **[Later]**. The MVP cut is
the smallest build that is genuinely useful *and* exercises every architectural
pillar, so nothing structural is discovered late.
### Catalogue core
- **[MVP]** Catalogue records for objects and groups of objects, with a typed
**inventory minimum** (object number, name, count, brief description, current
location, current owner, recorder, recording date).
- **[MVP]** **Hybrid flexible fields** — a field-definition registry + JSONB value
layer, seeded with the **Spectrum 5.0 Cataloguing** field set; orgs enable a
subset or the whole set without schema changes.
- **[MVP]** Object numbering with a configurable standard format; multiple
historical numbers per object.
- **[Post-MVP]** Org-defined **custom fields** beyond Spectrum (the registry
already supports it; this is the management UI + validation polish).
- **[Post-MVP]** Object groups / hierarchical relationships, related-object links.
- **[Later]** Subject-specialist templates / external cataloguing standards.
### Controlled vocabularies & authority records
- **[MVP]** **Authority records** for person, organization, place — *store once,
link many* — referenced from core and flexible fields.
- **[MVP]** **Controlled vocabularies** (term sources) for fields like material,
object name, technique; fields bound to a vocabulary accept only resolved terms.
- **[MVP]** **Multilingual labels** on terms and authorities (sv/en) in the data
model.
- **[Post-MVP]** Import/sync from external vocabularies — Getty AAT/TGN/ULAN,
KulturNav, Wikidata; storing external URIs on local terms.
- **[Later]** Linked-open-data publishing of authorities.
### Media & files
- **[MVP]** Upload and attach images/documents to records via **OpenDAL**
(S3 or local disk), behind a `BlobStore` trait.
- **[Post-MVP]** Thumbnails / derivative generation; per-reproduction licensing;
multiple reproductions per object.
- **[Later]** **IIIF** image serving; bulk/mass ingest pipelines; dedicated
image-management (DAM-style) workflows.
### Search
- **[MVP]** **Meilisearch** indexing of records; basic faceted/full-text search in
the admin UI.
- **[Post-MVP]** Saved searches, advanced filters, sort presets, search across
all fields incl. flexible fields.
- **[Later]** Public-facing search on the published catalogue.
### Audit & history
- **[MVP]** **Append-only, immutable audit log** — who/when/what with field-level
before→after diffs — covering domain writes and auth/security events; surfaced
as Spectrum **amendment history** on records.
- **[Post-MVP]** Auditing of sensitive *reads*; audit export/reporting; retention
policy controls.
### Publishing & public access
- **[MVP]** **Record-level visibility** (draft / internal / public) with a fixed
set of never-public fields (location, valuation, insurance, personal data).
- **[MVP]** **Public read API** (OpenAPI) serving only public records, only
public-safe fields (a typed `PublicView` projection).
- **[Post-MVP]** **Per-field publishability** flags; public collection landing
pages / embeddable widgets.
- **[Later]** Aggregator interoperability — **LIDO** export, **OAI-PMH** harvest,
feeds to **K-samsök/Kringla**, **Europeana**, Sveriges dataportal; Wikidata/
Wikimedia publishing.
### Authentication & access control
- **[MVP]** **Email/password** and **external OIDC** login, scoped to the single
org the instance serves; role/permission model enforced via typed extractors.
- **[Post-MVP]** Granular per-field / per-process permissions; API tokens for
integrations.
- **[Later]** **Shared identity provider** + **cross-org membership and fast
switching** (deferred by decision; revisit if multi-org usage grows).
### Import / export / portability
- **[MVP]** **Portable export**: a single **SQLite** file (metadata incl.
flattened flexible fields + vocab/authority tables) + plain media files + a
manifest — a whole-org archive, openable anywhere.
- **[Post-MVP]** Import from Excel/CSV (the common "we have a spreadsheet" path)
and from another instance's export.
- **[Post-MVP]** Migration tooling from legacy systems.
### Reporting & output
- **[Post-MVP]** Templated outputs: exhibition labels, loan letters, condition
reports, inventory lists; user-defined templates.
- **[Later]** Statistics/dashboards (records per year, % with images, etc.).
### Spectrum procedure coverage
The MVP implements **Cataloguing**. The other Spectrum 5.0 procedures are the
functional roadmap:
- **[Post-MVP] Primary procedures:** Object entry, Acquisition & accessioning,
**Location & movement control**, Inventory, Loans in, Loans out, Object exit,
Documentation planning.
- **[Later] Secondary procedures:** Rights management, Reproduction, Condition
checking & technical assessment, Conservation & collections care, Valuation,
Insurance & indemnity, Use of collections (incl. exhibitions), Emergency/disaster
planning, Damage & loss, Deaccession & disposal, Collections review, Audit.
### Internationalization
- **[MVP]** UI localization (Swedish + English); localized API validation/error
messages; multilingual vocab/authority labels; data model carries language-tagged
content values.
- **[Post-MVP]** Translation **workflow/UI** for per-field record content;
additional UI locales.
### Hosting, fleet & operations
- **[MVP]** Runs as a single instance (self-host or one hosted cell); local-disk or
S3 storage; per-instance migrations on startup.
- **[Post-MVP]** Per-org **provisioning control plane** (create DB + role + Meili
key + deployment + domain); batched/canary rollouts; A/B routing.
- **[Post-MVP]** Optional **Redis** (cache/sessions/rate-limit) with per-org key
prefixing, added only when a real bottleneck appears.
- **[Post-MVP]** In-app IP-allowlist middleware as a portable fallback for
self-hosters without ingress-level controls.
- **[Later]** Multi-Postgres sharding for large fleets; per-org Redis instances.
---
## Explicitly deferred decisions (recorded so they aren't relitigated)
- **Multi-org user switching / shared IdP** — rare case; deferred until it
demonstrably hurts.
- **Database migrations machinery** — not until 1.0. Pre-1.0 the data model is
reshaped freely (recreate, don't migrate).
- **Final product name** — TBD; never hardcoded.
- **Hosting/ops documentation** — later, but the design keeps self-host easy
throughout.
Binary file not shown.

After

Width:  |  Height:  |  Size: 72 KiB

+316
View File
@@ -0,0 +1,316 @@
# MVP Architecture & Design
**Status:** approved design, pre-implementation
**Date:** 2026-06-02
**Scope:** the MVP — the smallest useful build that exercises every architectural
pillar. Companion to [`../VISION.md`](../VISION.md) (full feature catalogue) and
grounded in [`../../reference/`](../../reference/) (Spectrum 5.0 + Riksantikvarie-
ämbetet source material).
> Neutral naming throughout. No product/brand name appears in code or these docs
> (see §13).
---
## 1. Goals & non-goals
**Goals**
- A small, strongly-typed, well-tested **core** that is easy to extend.
- An organization can **catalogue** its collection (Spectrum Cataloguing), attach
**media**, **search** it, control **visibility**, and expose **public** records.
- **Airtight org isolation** and a full **audit trail** from day one.
- **Easy self-hosting**: one binary, one database, minimal dependencies.
**Non-goals (MVP)**
- Other Spectrum procedures as workflows (entry, accession, loans, location/
movement control, …) — roadmap.
- Reporting/label templates, aggregator/LIDO/OAI-PMH/IIIF, translation workflow
UI, fleet provisioning/control plane, migrations machinery (none until 1.0).
---
## 2. Guiding principles
- **Make illegal states unrepresentable** (§9). Parse, don't validate.
- **Isolation by construction** (§4): credentials + topology, not `org_id`
filtering in code.
- **Module separation; no SQL spread.** SQL lives only in repository modules (§5,
§8).
- **Minimal custom code, reversible dependency bets** (§14).
- **Self-host is first-class** (§12).
- **Well-tested, not overboard** (§15): strong types shrink the test surface; the
isolation/security and the core get thorough tests; the dynamic field layer is
validated at runtime.
---
## 3. Deployment topology & tenancy
**The application binary is always single-tenant.** One running instance serves
exactly one organization and contains no concept of "other orgs". There is **no
multi-tenant code path**. Multi-tenancy is achieved entirely at the deployment
layer:
| | Self-host | Hosted fleet |
|---|---|---|
| App instances | one (1+ pods) | one deployment **per org** (1+ pods each) |
| Postgres | its own database | **one shared server**, one **database per org** |
| Meilisearch | its own index | **one shared server**, one **index per org** |
| Files | local disk or S3 | S3 (or RWX volume per org) |
| Domain | the org's domain | each org its own domain |
| Rollout | upgrade the instance | **per-org** image bump |
Consequences (recorded):
- **Per-org rollout & schema version.** Bumping one org's image rolls out that org
only; the instance runs its own migrations against its own database. Orgs may sit
on different versions. (Pre-1.0: recreate rather than migrate.)
- **Files:** with more than one pod per org, files must be on shared storage (S3 or
RWX volume) — local disk is single-pod/self-host only. `BlobStore` (§11) abstracts
this.
- **Cross-org features** (a future aggregator searching across museums; fleet
admin) are a **separate service**, never a single org-app. Out of MVP.
## 4. Isolation model
Because each org-app holds **credentials scoped to its own database and its own
search index**, cross-org access is not "prevented" — it is **impossible, because
the access path does not exist**:
- **Postgres:** database-per-org + a role granted access to *only* that database.
An instance physically cannot connect to another org's database.
- **Meilisearch:** index-per-org + an API key scoped to that org's index only.
- **No Row-Level Security needed** — there is no shared multi-org data in any
single database to protect, and the app has no cross-org code.
- **Files:** per-org bucket/prefix (S3) or per-org volume, with scoped credentials.
Defense-in-depth / verification:
- A **single configuration chokepoint** establishes "which org am I" at startup
from config; nothing reconstructs it ad hoc.
- **Negative tests** assert the app cannot be pointed outside its configured
database/index and that scoped credentials reject foreign access.
## 5. Crate / module layout
A Cargo **workspace** with **role-named** member crates (no brand name anywhere):
```
/ virtual workspace
crates/
domain/ core types, value objects, invariants (no I/O)
db/ sqlx repositories; ALL SQL lives here
storage/ BlobStore trait + OpenDAL adapter (S3 / local)
search/ search abstraction + Meilisearch adapter
auth/ password + OIDC, session/token, extractors
api/ axum router, handlers, OpenAPI (utoipa), public + admin
server/ binary: config, wiring, startup, migrations runner
web/ React SPA (separate build), consumes the OpenAPI
migrations/ SQL migrations (post-1.0; pre-1.0 = recreate)
```
Dependency direction points inward toward `domain`. `domain` has no I/O deps.
Each crate has one clear purpose, a defined interface, and is testable in
isolation. Experimental/volatile dependencies sit behind a crate-owned trait
(`BlobStore`, the search trait, …) so they are swappable (§14).
## 6. Data model — hybrid (Approach C)
Three layers:
### 6.1 Typed relational core
The accountability backbone and the most queried/integrity-critical fields, as
real columns/tables with strong types:
- object number (configurable format), object name, number of objects,
brief description, current location, current owner, recorder, recording date,
**visibility**, media links, audit linkage.
### 6.2 Flexible field layer
- A **field-definition registry**: each definition has a key, data type, optional
**vocabulary/authority binding**, validation rules, grouping, and locale
behavior.
- Field **values** stored as **JSONB** on the record, validated at write time
against the registry.
- The **Spectrum 5.0 Cataloguing field set** ships as **seed field definitions**
(see [`reference/spectrum-5.0-cataloguing-units-of-information.md`](../../reference/spectrum-5.0-cataloguing-units-of-information.md)).
Orgs enable a subset or the full set; custom fields are *data*, not migrations.
- **Trade (explicit):** this layer is **runtime-typed by design** — validated
against definitions at runtime, not by the compiler. Hard types where structure
is fixed (core, IDs, refs), runtime validation where it is dynamic.
### 6.3 Controlled vocabularies & authority records
- First-class relational tables for **person / organization / place** authorities
and **term sources** (vocabularies) — *store once, link many*.
- Referenced from both the typed core and the flexible fields. A field bound to a
term source accepts only a **resolved reference** (§9), never a free string.
- **Multilingual labels** (sv/en …) on terms and authorities.
### 6.4 Content i18n (capability now, workflow later)
- Localizable text values are **language-tagged in the data model from day one**
(so no painful migration later).
- The **translation workflow/UI is post-MVP**; MVP authors enter content in one
language while the model already supports more.
## 7. Surfaces & API
Two cleanly separated surfaces — a **load-bearing** rule:
- **Public surface** — `/api/public/**`: unauthenticated, **read-only**, serves
only **public** records as a typed **`PublicView`** (public-safe fields only).
- **Admin/privileged surface** — everything else: authenticated, read/write.
This separation enables independent **IP/VPN lockdown** (admin behind an ingress
allowlist while public stays open), caching, and rate-limiting — all at the
ingress layer, not in app code. An optional in-app IP-allowlist middleware is a
post-MVP portable fallback.
**OpenAPI:** code-first with **utoipa** — the spec is generated from Rust
types/handlers (cannot drift) and is the contract the React client consumes.
## 8. Persistence & data access
- **PostgreSQL** via **sqlx** (async, compile-time-checked queries). **All SQL is
confined to the `db` crate**, one repository per aggregate — satisfying "no SQL
spread everywhere" without an ORM's abstraction.
- JSONB for the flexible field values (GIN-indexable for search/filter needs).
- **No migrations until 1.0** — pre-1.0 we reshape freely (drop & recreate). Post-
1.0, each instance runs its own migrations on startup (per-org schema version).
## 9. Type-driven design (cross-cutting)
- **Newtype IDs** — `ObjectId`, `OrgId`, `MediaId`, `TermId`, `AuthorityId`; never
bare UUIDs.
- **Validated value objects** — `ObjectNumber`, `Email`, and `TermRef` /
`AuthorityRef` that are **constructable only by resolving** against the
vocabulary/authority. An unvalidated term cannot exist as that type. (Direct
mapping of Spectrum's "use a standard term source / form of name".)
- **`PublicView` projection** — a distinct type carrying only public-safe fields;
leaking an internal field on the public surface is impossible because the type
lacks it. (Preferred over a literal `Record<Public>` generic, since visibility is
runtime data from the DB.)
- **Visibility** — an enum with explicit transition methods (`publish`,
`unpublish`, `archive`): a type-driven state machine, not a stringly-typed flag.
- **Auth via extractors** — public handlers take no auth extractor; privileged
handlers require an `AuthUser` / `Authorized<Cap>` extractor, so a privileged
handler cannot compile without proof of authorization.
## 10. Authentication & authorization
- **Email/password** + **external OIDC** (the org-app is an OIDC relying party),
scoped to the single org the instance serves.
- **No separate IdP and no cross-org switching** in MVP (deferred; rare case).
- Sessions: stateless tokens or a sessions table in the org DB (no Redis required).
- Authorization enforced through typed extractors (§9); role/permission model kept
simple in MVP.
## 11. File storage
- **`BlobStore` trait** in the `storage` crate; **OpenDAL** adapter for **S3 and
local disk**. Chosen on fit (high-level, multi-backend; our bottleneck is
network/S3, not syscall I/O). `fusio` is watch-listed and swappable behind the
trait (§14).
- Media files are linked to records; derivatives/thumbnails/IIIF are post-MVP.
## 12. Search
- **Meilisearch**, one index per org, scoped API key. A search abstraction in the
`search` crate; Meili adapter behind it.
- MVP: index catalogue records on write; basic full-text + facet search in admin.
- Public-facing search is post-MVP.
## 13. Audit & amendment history
- **One append-only, immutable log** in the org database: who / when / what, with
**field-level before→after diffs**, covering domain create/update/delete and
auth/security events.
- Doubles as Spectrum **amendment history** surfaced on catalogue records
(Spectrum requires a transparent record of changes — never silently erase prior
terminology).
- MVP audits **writes + auth events**; auditing reads is deferred.
## 14. Visibility & publishing
- **Record-level visibility**: `draft` / `internal` / `public`.
- A fixed **never-public** field set (location, valuation, insurance, personal
data). Per-field publishability is post-MVP.
- Public API serves only `public` records via `PublicView`.
## 15. Export & backup (distinct)
- **Backup** (operational): `pg_dump` / PITR of the org database. Ops concern.
- **Export** (portable handover): a single **SQLite** file (metadata incl.
flattened flexible fields + vocab/authority tables) + plain **media files** + a
**manifest** — a whole-org archive, openable anywhere, stable long-term.
## 16. Internationalization
- **UI:** Swedish + English via a React i18n library + locale files; localized API
validation/error messages.
- **Data:** multilingual labels on vocab/authority terms; language-tagged content
values in the model (workflow post-MVP, §6.4).
## 17. Frontend
- **Lean React SPA**, evergreen browsers, consuming the OpenAPI. Separate build in
`web/`.
- **"Potato hardware" = an explicit bundle-discipline budget**: small dependency
set, code-splitting, measured bundle size as a tracked target — *not* a framework
compromise.
- Suits the data-entry-heavy cataloguing UI (vocabulary autocomplete, dynamic field
groups from the registry, inline validation).
## 18. Dependencies & tech stack
| Concern | Choice | Notes |
|---|---|---|
| Language | **Rust 2024** | |
| HTTP | **axum** | |
| API spec | **utoipa** (code-first OpenAPI) | drives the React client |
| DB | **PostgreSQL** + **sqlx** | SQL confined to `db` crate |
| Storage | **OpenDAL** behind `BlobStore` | S3 + local; `fusio` watch-listed |
| Search | **Meilisearch** behind a search trait | index-per-org |
| Cache | **Redis***deferred* | add only when needed; key-prefixed |
| Frontend | **React** (lean SPA) | bundle budget enforced |
| i18n (FE) | React i18n lib | sv/en |
**Dependency philosophy:** pre-1.0, choose on **capability/fit, not maturity**;
isolate volatile deps behind owned traits (reversible bets); **re-evaluate each
bet before 1.0**, when the API surface and data formats lock.
## 19. Testing strategy
- **Core & domain:** thorough unit tests; strong types remove whole categories from
the test surface.
- **Isolation/security:** dedicated **negative tests** (scoped credentials reject
foreign access; the public surface never emits internal fields/non-public
records).
- **Repositories:** integration tests against Postgres.
- **Flexible fields:** validation tested against field definitions.
- Deliberately **not overboard** elsewhere.
## 20. Decision log
| # | Decision | Why | Alternatives rejected |
|---|---|---|---|
| D1 | Per-org single-tenant binary; tenancy is deployment-only | Simplest core (no tenant plumbing); self-host = same artifact; isolation by construction | Shared multi-tenant app w/ `org_id`+RLS (bleed risk, complex core) |
| D2 | Database-per-org + scoped role; index-per-org + scoped key | Hard isolation; clean per-org export; no RLS | Schema-per-org (softer); shared DB + RLS (shared data path) |
| D3 | Hybrid data model (typed core + JSONB flexible + relational vocab/authority) | Small tested core + extensible tail; matches "link don't duplicate" | Fixed Spectrum schema (rigid); pure EAV/JSONB (weak integrity) |
| D4 | Type-driven design; `PublicView` projection; refs as validated types | Removes bug classes incl. public-data leaks; shrinks tests | Runtime checks only |
| D5 | sqlx + repository layer | Compile-time-checked SQL, no ORM, SQL in one place | SeaORM (more abstraction); Diesel (sync) |
| D6 | Clean public/admin surface split | Enables IP-lock/caching/publishing cleanly | Single mixed surface |
| D7 | Ingress-layer IP/VPN lockdown, admin-only-lockable | Not the app's job; public stays open | App-level firewall (fallback only) |
| D8 | Lean React SPA, evergreen + bundle budget | Growth path; ecosystem for data-entry UI; fits weak HW if disciplined | htmx/SSR (only needed for ancient browsers — none required) |
| D9 | Append-only audit w/ field diffs = amendment history | One mechanism satisfies ops audit + Spectrum requirement | Separate audit & history systems |
| D10 | Export = SQLite + files; backup = pg_dump | Portable, openable anywhere; distinct from ops backup | pg_dump as the only "export" (not portable) |
| D11 | OpenDAL behind `BlobStore` | Right altitude, multi-backend; bottleneck is network not syscalls | fusio now (lower-level, DB-engine focus) — watch-listed |
| D12 | utoipa code-first OpenAPI | Spec can't drift; drives client | spec-first |
| D13 | i18n: UI+vocab labels MVP; content workflow later, model ready now | Avoids painful migration; keeps MVP small | Full content translation in MVP (too big) |
| D14 | No IdP / no cross-org switching now | Rare case; keeps auth simple | Build shared IdP now |
| D15 | No migrations until 1.0 | Freedom to reshape pre-1.0 | Migrations from day one |
| D16 | No product name in code; role-named workspace; name from config | Placeholder must never leak; trivial rename later | Hardcode a working name |
## 21. Open items for the implementation plan
- First scaffolding task: **dissolve the current `biggus-dickus` package** into the
role-named workspace (the placeholder name must not survive into real code).
- Decide the role/permission model's MVP shape (kept minimal).
- Decide the object-number format configuration mechanism.
- Define the SQLite export schema mapping for the hybrid model.
- Choose specific crates for OIDC, JSONB validation, and React i18n during planning.