From 851181d91d3fc4b284ae886e0985ebf950841b47 Mon Sep 17 00:00:00 2001
From: Anders Olsson
Date: Tue, 2 Jun 2026 11:39:55 +0200
Subject: [PATCH] docs: add Plan 6 (Meilisearch search) implementation plan

search crate (SearchClient adapter) indexing core + flexible fields with term/authority resolved to labels; reindex_all; on-write sync deferred to API.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/plans/2026-06-02-search.md | 464 ++++++++++++++++++++++++++++++++
 1 file changed, 464 insertions(+)
 create mode 100644 docs/plans/2026-06-02-search.md

diff --git a/docs/plans/2026-06-02-search.md b/docs/plans/2026-06-02-search.md
new file mode 100644
index 0000000..9083599
--- /dev/null
+++ b/docs/plans/2026-06-02-search.md
@@ -0,0 +1,464 @@
+# Search (Meilisearch) Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** A `search` crate that indexes catalogue objects (core + flexible fields, with term/authority values resolved to their labels) into Meilisearch and runs full-text search, plus a `reindex_all` rebuild. On-write sync orchestration is deferred to the API/service layer (Plan 7+); this plan builds the capability and `reindex_all`.
+
+**Architecture:** A new role-named crate `search` depending on `db` + `domain` (cycle-free: `search → db → domain`). It exposes a `SearchClient` (Meilisearch adapter behind our own type, so the engine stays swappable), a `SearchDocument` (the indexed shape), `build_document` (reads `db` to resolve a `CatalogueObject`'s flexible fields to searchable text), and `reindex_all`. Search returns object ids; callers load full objects from `db`. `visibility` is a filterable attribute (for the future public API).
+
+**Tech Stack:** Rust 2024, `meilisearch-sdk` (async client), `serde` (document), `thiserror` (SearchError), tokio. Tests run against a real Meilisearch (Docker) + Postgres.
+
+## Design decisions (approved)
+- `search` crate: `SearchClient` wrapping `meilisearch-sdk`, swappable behind our type.
+- Index doc = core text + flexible values flattened to searchable text; **term/authority resolved to labels**; `localized_text` → all language strings; `visibility` filterable. Search returns object ids.
+- Build the capability + `reindex_all` now; **on-write sync is wired at the API/service layer (Plan 7+)**. Eventual consistency (Meili not transactional with Postgres).
+- Integration tests use a real Meilisearch in Docker, each test on a **unique index** for isolation.
+
+## ⚠️ Implementer note on the Meilisearch SDK
+The `meilisearch-sdk` API (method names, async task handling) varies by version. The **code blocks below are the intended shape**; adapt the exact SDK calls to the installed version while preserving behavior. **The tests are the contract**: make them pass. Key behaviors: indexing operations must `wait_for_completion` (Meilisearch indexes asynchronously) so a subsequent search sees the document. Verify the current `meilisearch-sdk` version via the cratesio tooling and pin it.
+
+## Prerequisites
+- Postgres (as before) AND a Meilisearch instance. The controller will start Meilisearch in Docker (e.g. `getmeili/meilisearch`) with a master key. Tests read `MEILI_URL` (e.g. `http://localhost:7700`) and `MEILI_MASTER_KEY`; pass them inline alongside `DATABASE_URL`. Pass transaction connections as `&mut tx`.
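The test helpers in this plan read `MEILI_URL` and `MEILI_MASTER_KEY` with a bare `expect`. As a minimal sketch of an alternative that names every missing variable at once, the helper below takes the lookup as a closure so it can be exercised without touching the real environment. `MeiliConfig`, `meili_config_from`, and `meili_config_from_env` are hypothetical names for illustration, not part of the plan.

```rust
/// Hypothetical holder for the two settings named in Prerequisites.
#[derive(Debug, PartialEq)]
pub struct MeiliConfig {
    pub url: String,
    pub master_key: String,
}

/// Reads `MEILI_URL` and `MEILI_MASTER_KEY` through `lookup` and reports
/// every missing or blank variable in a single error message.
pub fn meili_config_from(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<MeiliConfig, String> {
    // Treat a blank value the same as an unset one.
    let get = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());
    match (get("MEILI_URL"), get("MEILI_MASTER_KEY")) {
        (Some(url), Some(master_key)) => Ok(MeiliConfig { url, master_key }),
        (url, key) => {
            let mut missing = Vec::new();
            if url.is_none() {
                missing.push("MEILI_URL");
            }
            if key.is_none() {
                missing.push("MEILI_MASTER_KEY");
            }
            Err(format!("missing or empty: {}", missing.join(", ")))
        }
    }
}

/// Convenience wrapper over the real process environment.
pub fn meili_config_from_env() -> Result<MeiliConfig, String> {
    meili_config_from(|name| std::env::var(name).ok())
}
```

A test helper could call `meili_config_from_env().expect(...)` and keep the current panic-on-missing behavior while getting a more specific message.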
+
+## File Structure
+```
+Cargo.toml                       + search member; meilisearch-sdk in workspace deps
+crates/search/
+  Cargo.toml
+  src/lib.rs                     SearchError, SearchDocument, SearchClient, build_document, reindex_all
+  tests/search.rs                (Meili only) index/search/remove
+  tests/reindex.rs               (Meili + Postgres) build_document + reindex_all
+```
+
+---
+
+## Task 1: `search` crate (client, document, index/search/remove)
+
+**Files:** modify root `Cargo.toml`; create `crates/search/Cargo.toml`, `crates/search/src/lib.rs`, `crates/search/tests/search.rs`.
+
+- [ ] **Step 1: Workspace + crate setup.**
+  - In root `Cargo.toml`, add `"crates/search"` to `members`, and add to `[workspace.dependencies]` (verify the latest version via cratesio):
+    ```toml
+    meilisearch-sdk = "0.28"
+    ```
+  - Create `crates/search/Cargo.toml`:
+    ```toml
+    [package]
+    name = "search"
+    version = "0.0.0"
+    edition.workspace = true
+    rust-version.workspace = true
+
+    [dependencies]
+    meilisearch-sdk.workspace = true
+    serde = { workspace = true }
+    thiserror.workspace = true
+    sqlx.workspace = true  # lib.rs names sqlx::Error in SearchError::Db; tests use it too
+    domain = { path = "../domain" }
+    db = { path = "../db" }
+
+    [dev-dependencies]
+    tokio.workspace = true
+    uuid.workspace = true
+    serde_json.workspace = true
+    ```
+
+- [ ] **Step 2: Write the failing test** `crates/search/tests/search.rs` (Meilisearch only: hand-built documents, no Postgres):
+```rust
+use search::{SearchClient, SearchDocument};
+
+fn meili() -> (String, String) {
+    (
+        std::env::var("MEILI_URL").expect("MEILI_URL must be set"),
+        std::env::var("MEILI_MASTER_KEY").expect("MEILI_MASTER_KEY must be set"),
+    )
+}
+
+fn unique_index() -> String {
+    format!("objects_test_{}", uuid::Uuid::new_v4().simple())
+}
+
+fn doc(id: &str, object_name: &str, fields_text: &[&str]) -> SearchDocument {
+    SearchDocument {
+        id: id.to_string(),
+        object_number: format!("N-{id}"),
+        object_name: object_name.to_string(),
+        brief_description: None,
+        current_owner: None,
+        recorder: None,
+        visibility: "draft".to_string(),
+        fields_text: fields_text.iter().map(|s| s.to_string()).collect(),
+    }
+}
+
+#[tokio::test]
+async fn index_search_and_remove() {
+    let (url, key) = meili();
+    let client = SearchClient::connect(&url, &key, &unique_index()).unwrap();
+    client.ensure_index().await.unwrap();
+
+    let vase = domain::ObjectId::new();
+    let chair = domain::ObjectId::new();
+    client.index_object(&doc(&vase.to_string(), "vase", &["wood", "trä"])).await.unwrap();
+    client.index_object(&doc(&chair.to_string(), "chair", &["oak"])).await.unwrap();
+
+    // full-text on a flexible value
+    let hits = client.search("wood").await.unwrap();
+    assert_eq!(hits, vec![vase]);
+
+    // full-text on the object name
+    let hits = client.search("chair").await.unwrap();
+    assert_eq!(hits, vec![chair]);
+
+    // remove
+    client.remove_object(vase).await.unwrap();
+    assert!(client.search("wood").await.unwrap().is_empty());
+}
+```
+
+- [ ] **Step 3: Run to verify it fails.** `MEILI_URL= MEILI_MASTER_KEY= cargo test -p search --test search` → FAIL (crate/types missing).
+
+- [ ] **Step 4: Implement** `crates/search/src/lib.rs` (adapt the SDK calls to the installed version; keep behavior + signatures):
+```rust
+//! Full-text search over catalogue objects, backed by Meilisearch.
+
+use db::Db;
+use domain::{CatalogueObject, ObjectId};
+use serde::{Deserialize, Serialize};
+
+/// Errors from the search subsystem.
+#[derive(Debug, thiserror::Error)]
+pub enum SearchError {
+    #[error(transparent)]
+    Meili(#[from] meilisearch_sdk::errors::Error),
+    #[error(transparent)]
+    Db(#[from] sqlx::Error),
+    #[error("invalid object id in index: {0}")]
+    BadId(String),
+}
+
+/// The indexed shape of a catalogue object.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SearchDocument {
+    pub id: String,
+    pub object_number: String,
+    pub object_name: String,
+    pub brief_description: Option<String>,
+    pub current_owner: Option<String>,
+    pub recorder: Option<String>,
+    /// Filterable: "draft" | "internal" | "public".
+    pub visibility: String,
+    /// Flexible field values flattened to searchable text (term/authority labels,
+    /// localized strings, and scalar values).
+    pub fields_text: Vec<String>,
+}
+
+/// A Meilisearch-backed search client scoped to one index.
+pub struct SearchClient {
+    client: meilisearch_sdk::client::Client,
+    index_uid: String,
+}
+
+impl SearchClient {
+    /// Connect to Meilisearch at `url` with `api_key`, scoped to `index_uid`.
+    pub fn connect(url: &str, api_key: &str, index_uid: &str) -> Result<Self, SearchError> {
+        let client = meilisearch_sdk::client::Client::new(url, Some(api_key))?;
+        Ok(Self { client, index_uid: index_uid.to_owned() })
+    }
+
+    /// Create the index (primary key "id") if absent and set filterable attributes.
+    pub async fn ensure_index(&self) -> Result<(), SearchError> {
+        // Create the index if it doesn't exist (ignore "index already exists").
+        let task = self.client.create_index(&self.index_uid, Some("id")).await?;
+        task.wait_for_completion(&self.client, None, None).await?;
+        let index = self.client.index(&self.index_uid);
+        index
+            .set_filterable_attributes(["visibility"])
+            .await?
+            .wait_for_completion(&self.client, None, None)
+            .await?;
+        Ok(())
+    }
+
+    /// Upsert one object document (waits for indexing to complete).
+    pub async fn index_object(&self, doc: &SearchDocument) -> Result<(), SearchError> {
+        self.client
+            .index(&self.index_uid)
+            .add_or_replace_documents(std::slice::from_ref(doc), Some("id"))
+            .await?
+            .wait_for_completion(&self.client, None, None)
+            .await?;
+        Ok(())
+    }
+
+    /// Remove one object from the index by id (waits for completion).
+    pub async fn remove_object(&self, id: ObjectId) -> Result<(), SearchError> {
+        self.client
+            .index(&self.index_uid)
+            .delete_document(id.to_string())
+            .await?
+            .wait_for_completion(&self.client, None, None)
+            .await?;
+        Ok(())
+    }
+
+    /// Full-text search; returns matching object ids (in Meilisearch ranking order).
+    pub async fn search(&self, query: &str) -> Result<Vec<ObjectId>, SearchError> {
+        let results = self
+            .client
+            .index(&self.index_uid)
+            .search()
+            .with_query(query)
+            .execute::<SearchDocument>()
+            .await?;
+        results
+            .hits
+            .into_iter()
+            .map(|hit| hit.result.id.parse::<ObjectId>().map_err(|_| SearchError::BadId(hit.result.id)))
+            .collect()
+    }
+
+    /// Rebuild the whole index from the database (clears then re-adds all objects).
+    pub async fn reindex_all(&self, db: &Db) -> Result<(), SearchError> {
+        let index = self.client.index(&self.index_uid);
+        index.delete_all_documents().await?.wait_for_completion(&self.client, None, None).await?;
+
+        let objects = db::catalog::list_objects(db.pool()).await?;
+        let mut docs = Vec::with_capacity(objects.len());
+        for object in &objects {
+            docs.push(build_document(db, object).await?);
+        }
+        if !docs.is_empty() {
+            index
+                .add_or_replace_documents(&docs, Some("id"))
+                .await?
+                .wait_for_completion(&self.client, None, None)
+                .await?;
+        }
+        Ok(())
+    }
+}
+
+/// Build a [`SearchDocument`] from an object, resolving its flexible fields to
+/// searchable text (term/authority → labels, localized text → all values).
+/// Implemented in Task 2; declared here so the crate compiles.
+pub async fn build_document(
+    _db: &Db,
+    _object: &CatalogueObject,
+) -> Result<SearchDocument, SearchError> {
+    unimplemented!("implemented in Task 2")
+}
+```
+NOTE: `ObjectId: FromStr` (Err = `uuid::Error`) exists from the id macro. `reindex_all`/`build_document` are needed for compilation now (the Task 1 test doesn't call them); `build_document` is an `unimplemented!()` stub filled in by Task 2. If clippy flags the stub's unused params, the leading underscores suppress that; if it flags `unimplemented!` in a non-test fn, add `#[allow(clippy::unimplemented)]` to `build_document` with a `// Task 2` note, OR move `reindex_all`+`build_document` entirely into Task 2 (preferred if it keeps Task 1 clippy-clean; in that case omit them here and add `pub mod`-level items in Task 2).
+
+- [ ] **Step 5: Run to verify it passes.** `MEILI_URL= MEILI_MASTER_KEY= cargo test -p search --test search` → PASS. (You may need to adapt SDK calls; iterate until the test passes.)
+
+- [ ] **Step 6: Lint.** `cargo +nightly fmt`; `cargo clippy -p search --all-targets -- -D warnings` → clean.
+
+- [ ] **Step 7: Commit.**
+```bash
+git add Cargo.toml crates/search
+git commit -m "feat(search): add Meilisearch-backed SearchClient (index, search, remove)"
+```
+
+---
+
+## Task 2: `build_document` + `reindex_all` (db integration)
+
+**Files:** modify `crates/search/src/lib.rs`; create `crates/search/tests/reindex.rs`.
+
+- [ ] **Step 1: Write the failing test** `crates/search/tests/reindex.rs` (Meilisearch + Postgres):
+```rust
+use db::{Db, catalog, fields, vocab};
+use domain::{
+    AuditActor, FieldType, LocalizedLabel, NewFieldDefinition, NewTerm, ObjectInput, Visibility,
+};
+use search::SearchClient;
+use sqlx::PgPool;
+
+fn meili() -> (String, String) {
+    (
+        std::env::var("MEILI_URL").expect("MEILI_URL must be set"),
+        std::env::var("MEILI_MASTER_KEY").expect("MEILI_MASTER_KEY must be set"),
+    )
+}
+
+fn unique_index() -> String {
+    format!("reindex_test_{}", uuid::Uuid::new_v4().simple())
+}
+
+#[sqlx::test]
+async fn reindex_resolves_term_labels_and_finds_by_label(pool: PgPool) {
+    let db = Db::from_pool(pool);
+
+    // a material vocabulary with a "wood" term
+    let material = vocab::create_vocabulary(db.pool(), "material").await.unwrap();
+    let mut tx = db.pool().begin().await.unwrap();
+    let wood = vocab::add_term(
+        &mut tx,
+        &NewTerm {
+            vocabulary_id: material.id,
+            external_uri: None,
+            labels: vec![LocalizedLabel { lang: "en".into(), label: "wood".into() }],
+        },
+    )
+    .await
+    .unwrap();
+    fields::create_field_definition(
+        &mut tx,
+        &NewFieldDefinition {
+            key: "material".into(),
+            field_type: FieldType::Term { vocabulary_id: material.id },
+            required: false,
+            group_key: None,
+            labels: vec![LocalizedLabel { lang: "en".into(), label: "material".into() }],
+        },
+    )
+    .await
+    .unwrap();
+    let object_id = catalog::create_object(
+        &mut tx,
+        AuditActor::System,
+        &ObjectInput {
+            object_number: "LM-1".into(),
+            object_name: "vase".into(),
+            number_of_objects: 1,
+            brief_description: None,
+            current_location: None,
+            current_owner: None,
+            recorder: None,
+            recording_date: None,
+            visibility: Visibility::Public,
+        },
+    )
+    .await
+    .unwrap();
+    tx.commit().await.unwrap();
+
+    // set the material field to the wood term
+    let mut tx = db.pool().begin().await.unwrap();
+    catalog::set_object_fields(
+        &mut tx,
+        AuditActor::System,
+        object_id,
+        serde_json::json!({ "material": wood.to_string() }).as_object().unwrap(),
+    )
+    .await
+    .unwrap();
+    tx.commit().await.unwrap();
+
+    let (url, key) = meili();
+    let client = SearchClient::connect(&url, &key, &unique_index()).unwrap();
+    client.ensure_index().await.unwrap();
+    client.reindex_all(&db).await.unwrap();
+
+    // found by the object name
+    assert_eq!(client.search("vase").await.unwrap(), vec![object_id]);
+    // found by the resolved TERM LABEL (not the uuid)
+    assert_eq!(client.search("wood").await.unwrap(), vec![object_id]);
+}
+```
+
+- [ ] **Step 2: Run to verify it fails.** With both env vars + `DATABASE_URL`: `... cargo test -p search --test reindex` → FAIL (`build_document` is `unimplemented!`).
+
+- [ ] **Step 3: Implement `build_document`** in `crates/search/src/lib.rs`: replace the stub body with a real implementation that flattens the object's flexible fields to searchable text, resolving term/authority values to labels:
+```rust
+pub async fn build_document(
+    db: &Db,
+    object: &CatalogueObject,
+) -> Result<SearchDocument, SearchError> {
+    let mut fields_text = Vec::new();
+
+    if let Some(map) = object.fields.as_object() {
+        for (key, value) in map {
+            let Some(def) = db::fields::field_definition_by_key(db.pool(), key).await? else {
+                continue; // a field with no definition (stale): skip
+            };
+            match def.field_type {
+                domain::FieldType::Text | domain::FieldType::Date => {
+                    if let Some(s) = value.as_str() {
+                        fields_text.push(s.to_owned());
+                    }
+                }
+                domain::FieldType::Integer | domain::FieldType::Boolean => {
+                    fields_text.push(value.to_string());
+                }
+                domain::FieldType::LocalizedText => {
+                    if let Some(obj) = value.as_object() {
+                        for v in obj.values() {
+                            if let Some(s) = v.as_str() {
+                                fields_text.push(s.to_owned());
+                            }
+                        }
+                    }
+                }
+                domain::FieldType::Term { .. } => {
+                    if let Some(term_id) = value.as_str().and_then(|s| s.parse().ok()) {
+                        if let Some(term) = db::vocab::term_by_id(db.pool(), term_id).await? {
+                            fields_text.extend(term.labels.into_iter().map(|l| l.label));
+                        }
+                    }
+                }
+                domain::FieldType::Authority { .. } => {
+                    if let Some(authority_id) = value.as_str().and_then(|s| s.parse().ok()) {
+                        if let Some(authority) =
+                            db::authority::authority_by_id(db.pool(), authority_id).await?
+                        {
+                            fields_text.extend(authority.labels.into_iter().map(|l| l.label));
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    Ok(SearchDocument {
+        id: object.id.to_string(),
+        object_number: object.object_number.clone(),
+        object_name: object.object_name.clone(),
+        brief_description: object.brief_description.clone(),
+        current_owner: object.current_owner.clone(),
+        recorder: object.recorder.clone(),
+        visibility: object.visibility.as_str().to_owned(),
+        fields_text,
+    })
+}
+```
+(`db::vocab::term_by_id` takes a `TermId`; `db::authority::authority_by_id` takes an `AuthorityId`. `value.as_str().and_then(|s| s.parse().ok())` parses into the inferred id type. If type inference needs help, annotate: `let term_id: domain::TermId = ...`.)
+
+- [ ] **Step 4: Run to verify it passes.** `MEILI_URL= MEILI_MASTER_KEY= DATABASE_URL= cargo test -p search --test reindex` → PASS.
+
+- [ ] **Step 5: Full workspace check.**
+```bash
+cargo +nightly fmt --check
+DATABASE_URL= MEILI_URL= MEILI_MASTER_KEY= cargo clippy --workspace --all-targets -- -D warnings
+DATABASE_URL= MEILI_URL= MEILI_MASTER_KEY= cargo test --workspace
+```
+Expected: all green. (The `search` tests need the MEILI env vars; the rest need `DATABASE_URL`.)
+
+- [ ] **Step 6: Commit.**
+```bash
+git add crates/search
+git commit -m "feat(search): build documents resolving term/authority labels; reindex_all"
+```
+
+---
+
+## Self-Review (completed)
+
+**Spec coverage (Plan 6 / VISION search MVP):**
+- `search` crate, Meilisearch adapter behind `SearchClient`, swappable → Task 1. ✓
+- Index core + flexible text; term/authority resolved to labels; localized → all values; visibility filterable; search returns object ids → Tasks 1–2. ✓
+- Build capability + `reindex_all` now; on-write sync deferred to API/service → this plan + notes. ✓
+- `search → db → domain` (no cycle); SQL stays in `db` (search calls db repos) → Cargo deps. ✓
+- Real-Meili integration tests, unique index per test → Tasks 1–2. ✓
+
+**Placeholder scan:** the only `unimplemented!` is the Task 1 `build_document` stub, explicitly filled in Task 2 (with a fallback instruction). The values left blank after `MEILI_URL=`/`MEILI_MASTER_KEY=` (and `DATABASE_URL=`) in the commands are documented env values. No other placeholders.
+
+**Type consistency:** `SearchDocument` fields used identically in tests + `build_document`; `SearchClient::{connect, ensure_index, index_object, remove_object, search, reindex_all}` signatures consistent across tasks/tests; `search` returns `Vec<ObjectId>` parsed via `ObjectId: FromStr`; `build_document` matches on `domain::FieldType` (Plan 4) and calls `db::vocab::term_by_id`/`db::authority::authority_by_id`/`db::fields::field_definition_by_key`/`db::catalog::list_objects` as defined.
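The contract that the Task 1 test pins down for `SearchClient` (upsert by id, search returns ids, remove by id) can also be sketched without any engine, which is one way to see what a swapped-in backend would have to honor. This is only an illustration: `InMemoryIndex` and `Doc` are hypothetical names, `u32` stands in for `ObjectId`, and the matching is naive case-insensitive whole-word comparison rather than Meilisearch ranking.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down stand-in for `SearchDocument`.
#[derive(Debug, Clone)]
pub struct Doc {
    pub id: u32,
    pub object_name: String,
    pub fields_text: Vec<String>,
}

/// Hypothetical in-memory index with the same upsert/remove/search contract.
#[derive(Default)]
pub struct InMemoryIndex {
    docs: BTreeMap<u32, Doc>,
}

impl InMemoryIndex {
    /// Upsert: a second call with the same id replaces the earlier document.
    pub fn index_object(&mut self, doc: Doc) {
        self.docs.insert(doc.id, doc);
    }

    /// Remove by id; removing an unknown id is a no-op.
    pub fn remove_object(&mut self, id: u32) {
        self.docs.remove(&id);
    }

    /// Returns the ids (ascending) of documents in which any word of the
    /// object name or of a flattened field value equals `query`, ignoring case.
    pub fn search(&self, query: &str) -> Vec<u32> {
        let needle = query.to_lowercase();
        self.docs
            .values()
            .filter(|doc| {
                std::iter::once(&doc.object_name)
                    .chain(doc.fields_text.iter())
                    .flat_map(|text| text.split_whitespace())
                    .any(|word| word.to_lowercase() == needle)
            })
            .map(|doc| doc.id)
            .collect()
    }
}
```

Because callers only ever receive ids and load full objects from `db`, a stand-in like this needs no knowledge of the catalogue schema beyond the flattened text.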
+
+## Notes for follow-on plans
+- **On-write sync (API/service, Plan 7+):** after a catalogue create/update/delete/set_fields commits, call `index_object`/`remove_object` best-effort (log failures; `reindex_all` is the recovery path). Meili is not transactional with Postgres, so consistency is eventual.
+- **Public API (Plan 7):** `search` already stores `visibility` as filterable; add a `with_filter("visibility = public")` search variant for the public surface.
+- **Per-deployment index/credentials:** production uses a fixed index uid (e.g. `objects`) with a scoped Meili key per the single-tenant deployment; only tests use unique index names.
+- **Reindex cost:** `reindex_all` is N+1 over objects×fields (resolves labels per field); fine for now, batch when collections grow (relates to #12).
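On the reindex-cost note: one possible first step before true batching is to memoize label lookups during a single reindex pass, so each distinct term or authority id is resolved at most once. The sketch below is engine- and database-free and rests on assumptions: `flatten_with_cache` is a hypothetical name, `u32` stands in for `TermId`/`AuthorityId`, and the `resolve` closure stands in for a lookup such as `db::vocab::term_by_id`.

```rust
use std::collections::HashMap;

/// For each object (given as the list of ids its fields reference), returns
/// the flattened labels, calling `resolve` at most once per distinct id.
pub fn flatten_with_cache(
    objects: &[Vec<u32>],
    mut resolve: impl FnMut(u32) -> Vec<String>,
) -> Vec<Vec<String>> {
    // Labels already resolved during this pass, keyed by id.
    let mut cache: HashMap<u32, Vec<String>> = HashMap::new();
    objects
        .iter()
        .map(|ids| {
            let mut text = Vec::new();
            for &id in ids {
                // Only the first sighting of an id triggers a lookup.
                let labels = cache.entry(id).or_insert_with(|| resolve(id));
                text.extend(labels.iter().cloned());
            }
            text
        })
        .collect()
}
```

With many objects sharing a small vocabulary, this turns the number of lookups from objects×fields into the number of distinct ids; a single batched query would go further but needs a new `db` function.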