biggus-dickus/docs/plans/2026-06-02-spectrum-seed.md
logaritmisk 91a9eb2964 docs: add Spectrum cataloguing seed plan
Idempotent seed of a representative subset: 3 vocabularies + 12 descriptive field
definitions with term/authority bindings. Empty vocabularies; wiring deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:17:56 +02:00


# Spectrum Cataloguing Seed Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Seed a representative subset of the Spectrum Cataloguing field set (empty controlled vocabularies plus the descriptive field definitions that bind to them and to authorities), turning the abstract registry (Plans 2/4) into usable museum fields. The seed is idempotent and seeds no terms (orgs/imports populate vocabularies later).

**Architecture:** A new `db::seed` module with `seed_spectrum_cataloguing(&mut PgConnection)`: get-or-create the vocabularies by key, then get-or-create each field definition by key (using the vocabularies' ids for `Term`-bound fields). Built entirely on the existing `db::vocab`/`db::fields` repositories. No migration, no domain changes. Invoking the seed (CLI / server flag / per-org provisioning) is a deferred follow-on.

**Tech Stack:** Rust 2024, sqlx 0.8. Tests use `#[sqlx::test]`.

## Design decisions (approved)
- Representative subset (~12 descriptive fields + 3 vocabularies), not all ~90 Spectrum units; the inventory minimum stays in the typed core (Plan 3).
- Seed empty vocabularies + the field definitions only — not terms.
- Idempotent (get-or-create by unique key); safe to re-run.
- Wiring (how/when the seed runs) deferred.
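
The idempotency decision above can be pictured with a minimal in-memory sketch. `Registry`, `ensure`, and `seed` are hypothetical names used only for illustration; the real seed goes through the `db::vocab`/`db::fields` repositories and a Postgres UNIQUE key, neither of which is modelled here.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a table with a UNIQUE `key` column.
#[derive(Default)]
struct Registry {
    next_id: u32,
    rows: BTreeMap<String, u32>, // key -> id
}

impl Registry {
    /// Get-or-create: return the existing id for `key`, or insert a new row.
    fn ensure(&mut self, key: &str) -> u32 {
        if let Some(id) = self.rows.get(key) {
            return *id;
        }
        self.next_id += 1;
        self.rows.insert(key.to_owned(), self.next_id);
        self.next_id
    }
}

/// Hypothetical seed: ensure every key exists and return the ids in order.
fn seed(registry: &mut Registry, keys: &[&str]) -> Vec<u32> {
    keys.iter().map(|key| registry.ensure(key)).collect()
}
```

Calling `seed` twice with the same keys returns the same ids and leaves the row count unchanged, which is the property `seed_is_idempotent` checks against the real database.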
## Prerequisites
- Postgres for tests; pass `DATABASE_URL` inline. Pass transaction connections as `&mut tx` (NOT `&mut *tx`).
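
A sketch of why `&mut tx` is enough when the callee takes a concrete `&mut PgConnection`: a transaction type that implements `Deref`/`DerefMut` to its connection is coerced automatically at the call site. The `Connection` and `Transaction` types below are hypothetical stand-ins, not the sqlx types; where the explicit `&mut *tx` form is redundant, Clippy's `explicit_auto_deref` lint may flag it, which matters under `-D warnings`.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for `sqlx::PgConnection`.
#[derive(Default)]
struct Connection {
    statements: Vec<String>,
}

/// Hypothetical stand-in for a transaction that dereferences to its connection.
#[derive(Default)]
struct Transaction {
    conn: Connection,
}

impl Deref for Transaction {
    type Target = Connection;

    fn deref(&self) -> &Connection {
        &self.conn
    }
}

impl DerefMut for Transaction {
    fn deref_mut(&mut self) -> &mut Connection {
        &mut self.conn
    }
}

/// Concrete parameter type: `&mut Transaction` coerces to `&mut Connection` here.
fn run_seed(conn: &mut Connection) {
    conn.statements.push("seed".to_owned());
}
```

With these definitions, `run_seed(&mut tx)` compiles for `tx: Transaction` without an explicit `*`. The coercion only applies when the parameter type is concrete; a generic parameter would not trigger it, which is why the helpers in the implementation reborrow with `&mut *conn` when calling the repository functions.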
## File Structure
```
crates/db/
src/seed.rs seed_spectrum_cataloguing + helpers
src/lib.rs pub mod seed;
tests/seed.rs
```
---
## Task 1: `db::seed` — Spectrum cataloguing seed
**Files:** create `crates/db/src/seed.rs`, `crates/db/tests/seed.rs`; modify `crates/db/src/lib.rs`.
- [ ] **Step 1: Write the failing test** `crates/db/tests/seed.rs`:
```rust
use db::{Db, fields, seed, vocab};
use domain::{AuthorityKind, FieldType};
use sqlx::PgPool;

#[sqlx::test]
async fn seed_creates_vocabularies_and_field_definitions(pool: PgPool) {
    let db = Db::from_pool(pool);
    let mut tx = db.pool().begin().await.unwrap();
    seed::seed_spectrum_cataloguing(&mut tx).await.unwrap();
    tx.commit().await.unwrap();

    for key in ["material", "object_name", "technique"] {
        assert!(
            vocab::vocabulary_by_key(db.pool(), key).await.unwrap().is_some(),
            "vocabulary {key} should be seeded"
        );
    }

    // a Term field is bound to the right vocabulary
    let material_vocab = vocab::vocabulary_by_key(db.pool(), "material").await.unwrap().unwrap();
    let material_field = fields::field_definition_by_key(db.pool(), "material").await.unwrap().unwrap();
    assert_eq!(material_field.field_type, FieldType::Term { vocabulary_id: material_vocab.id });

    // an Authority field carries its kind
    let place = fields::field_definition_by_key(db.pool(), "production_place").await.unwrap().unwrap();
    assert_eq!(place.field_type, FieldType::Authority { kind: Some(AuthorityKind::Place) });

    // a localized-text and a date field exist
    let title = fields::field_definition_by_key(db.pool(), "title").await.unwrap().unwrap();
    assert_eq!(title.field_type, FieldType::LocalizedText);
    let date = fields::field_definition_by_key(db.pool(), "production_date").await.unwrap().unwrap();
    assert_eq!(date.field_type, FieldType::Date);

    assert_eq!(fields::list_field_definitions(db.pool()).await.unwrap().len(), 12);
}

#[sqlx::test]
async fn seed_is_idempotent(pool: PgPool) {
    let db = Db::from_pool(pool);
    for _ in 0..2 {
        let mut tx = db.pool().begin().await.unwrap();
        seed::seed_spectrum_cataloguing(&mut tx).await.unwrap();
        tx.commit().await.unwrap();
    }

    // re-running did not duplicate (would have hit the UNIQUE key constraints otherwise)
    assert_eq!(fields::list_field_definitions(db.pool()).await.unwrap().len(), 12);
    let materials = vocab::vocabulary_by_key(db.pool(), "material").await.unwrap();
    assert!(materials.is_some());
}
```
- [ ] **Step 2: Run to verify it fails.** `DATABASE_URL=<url> cargo test -p db --test seed` → FAIL (`db::seed` missing).
- [ ] **Step 3: Implement** `crates/db/src/seed.rs`:
```rust
//! Seed data: a representative subset of the Spectrum Cataloguing field set.
//!
//! Idempotent — each vocabulary and field definition is created only if a row with
//! that key does not already exist. Vocabularies are seeded empty; their terms are
//! populated by the organization or a later import. The inventory-minimum fields
//! (object number, name, location, …) live in the typed object core, not here.
use domain::{AuthorityKind, FieldType, LocalizedLabel, NewFieldDefinition, VocabularyId};
use crate::{fields, vocab};
/// Seed the Spectrum cataloguing vocabularies and field definitions on `conn`.
/// Pass a transaction connection (`&mut tx`) so the whole seed is atomic.
pub async fn seed_spectrum_cataloguing(conn: &mut sqlx::PgConnection) -> Result<(), sqlx::Error> {
    let material = ensure_vocabulary(conn, "material").await?;
    let object_name = ensure_vocabulary(conn, "object_name").await?;
    let technique = ensure_vocabulary(conn, "technique").await?;

    let definitions = [
        def("object_type", FieldType::Term { vocabulary_id: object_name }, "identification",
            &[("sv", "Sakord"), ("en", "Object type")]),
        def("title", FieldType::LocalizedText, "identification",
            &[("sv", "Titel"), ("en", "Title")]),
        def("comments", FieldType::Text, "identification",
            &[("sv", "Kommentarer"), ("en", "Comments")]),
        def("material", FieldType::Term { vocabulary_id: material }, "description",
            &[("sv", "Material"), ("en", "Material")]),
        def("technique", FieldType::Term { vocabulary_id: technique }, "description",
            &[("sv", "Teknik"), ("en", "Technique")]),
        def("physical_description", FieldType::Text, "description",
            &[("sv", "Fysisk beskrivning"), ("en", "Physical description")]),
        def("dimensions", FieldType::Text, "description",
            &[("sv", "Mått"), ("en", "Dimensions")]),
        def("inscription", FieldType::Text, "description",
            &[("sv", "Inskription"), ("en", "Inscription")]),
        def("content_description", FieldType::Text, "content",
            &[("sv", "Innehållsbeskrivning"), ("en", "Content description")]),
        def("production_date", FieldType::Date, "production",
            &[("sv", "Tillverkningsdatum"), ("en", "Production date")]),
        def("production_place", FieldType::Authority { kind: Some(AuthorityKind::Place) }, "production",
            &[("sv", "Tillverkningsplats"), ("en", "Production place")]),
        def("production_person", FieldType::Authority { kind: Some(AuthorityKind::Person) }, "production",
            &[("sv", "Tillverkare"), ("en", "Producer")]),
    ];

    for definition in &definitions {
        ensure_field_definition(conn, definition).await?;
    }
    Ok(())
}

/// Get-or-create a vocabulary by key, returning its id.
async fn ensure_vocabulary(
    conn: &mut sqlx::PgConnection,
    key: &str,
) -> Result<VocabularyId, sqlx::Error> {
    if let Some(existing) = vocab::vocabulary_by_key(&mut *conn, key).await? {
        Ok(existing.id)
    } else {
        Ok(vocab::create_vocabulary(&mut *conn, key).await?.id)
    }
}

/// Create a field definition only if its key is not already present.
async fn ensure_field_definition(
    conn: &mut sqlx::PgConnection,
    definition: &NewFieldDefinition,
) -> Result<(), sqlx::Error> {
    if fields::field_definition_by_key(&mut *conn, &definition.key).await?.is_none() {
        fields::create_field_definition(&mut *conn, definition).await?;
    }
    Ok(())
}

fn def(
    key: &str,
    field_type: FieldType,
    group: &str,
    label_pairs: &[(&str, &str)],
) -> NewFieldDefinition {
    NewFieldDefinition {
        key: key.to_owned(),
        field_type,
        required: false,
        group_key: Some(group.to_owned()),
        labels: label_pairs
            .iter()
            .map(|(lang, label)| LocalizedLabel { lang: (*lang).to_owned(), label: (*label).to_owned() })
            .collect(),
    }
}
```
Add to `crates/db/src/lib.rs` (top-level): `pub mod seed;`
- [ ] **Step 4: Run to verify it passes.** `DATABASE_URL=<url> cargo test -p db --test seed` → PASS (2 tests).
- [ ] **Step 5: Full workspace check.**
```bash
cargo +nightly fmt --check
DATABASE_URL=<url> cargo clippy --workspace --all-targets -- -D warnings
DATABASE_URL=<url> cargo test --workspace
```
Expected: all green.
- [ ] **Step 6: Commit.**
```bash
git add crates/db
git commit -m "feat(db): seed a representative Spectrum cataloguing field set (idempotent)"
```
---
## Self-Review (completed)
**Spec coverage:**
- Representative Spectrum descriptive field set as vocabularies + field definitions → the `definitions` array + `ensure_*`. ✓
- Empty vocabularies, no terms; inventory minimum stays in the core. ✓
- Idempotent (get-or-create by key) → `ensure_vocabulary`/`ensure_field_definition`; tested by `seed_is_idempotent`. ✓
- Built on existing repos; no migration/domain change; SQL stays in `db`. ✓
- Wiring deferred. ✓ (intentional)

**Placeholder scan:** none. `<url>` is the documented `DATABASE_URL`.

**Type consistency:** `seed_spectrum_cataloguing(&mut PgConnection) -> Result<(), sqlx::Error>`; uses `vocab::vocabulary_by_key`/`create_vocabulary`, `fields::field_definition_by_key`/`create_field_definition`, and `domain::{FieldType, NewFieldDefinition, LocalizedLabel, AuthorityKind, VocabularyId}` exactly as defined. The test's expected count (12) matches the `definitions` array length.

## Notes for follow-on plans
- **Wiring the seed:** options are a server `--seed`/config flag at startup, a small CLI subcommand, or running it as part of per-org provisioning (the control plane). Decide alongside the provisioning work.
- **Populating vocabulary terms:** Getty AAT / KulturNav / Wikidata import (VISION post-MVP) fills the empty `material`/`object_name`/`technique` vocabularies.
- The seeded set is a starting point — extend toward the full Spectrum unit list (`reference/spectrum-5.0-cataloguing-units-of-information.md`) as needed.
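
For the wiring note above, one of the listed options (a server `--seed` flag) might start from something like the sketch below. This is only an illustration under assumptions: `StartupAction` and `startup_action` are hypothetical names, the flag spelling is taken from the note, and the actual call to `seed_spectrum_cataloguing` inside a transaction is left out because the wiring is still undecided.

```rust
/// Hypothetical startup action chosen from the command line.
#[derive(Debug, PartialEq, Eq)]
enum StartupAction {
    /// Start the server without touching seed data.
    Serve,
    /// Run the idempotent seed first, then start the server.
    SeedThenServe,
}

/// Decide the startup action from raw arguments (assumed flag: `--seed`).
fn startup_action<I, S>(args: I) -> StartupAction
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    if args.into_iter().any(|arg| arg.as_ref() == "--seed") {
        StartupAction::SeedThenServe
    } else {
        StartupAction::Serve
    }
}
```

In a binary this could be fed `std::env::args()`. Because the seed is idempotent, passing the flag on every start would be harmless, which keeps this option cheap to adopt if it is chosen.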