docs: add Plan 6 (Meilisearch search) implementation plan

search crate (SearchClient adapter) indexing core + flexible fields with
term/authority resolved to labels; reindex_all; on-write sync deferred to API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-02 11:39:55 +02:00
parent 5ee9fd88f1
commit 851181d91d
+464
View File
@@ -0,0 +1,464 @@
# Search (Meilisearch) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** A `search` crate that indexes catalogue objects (core + flexible fields, with term/authority values resolved to their labels) into Meilisearch and runs full-text search, plus a `reindex_all` rebuild. On-write sync orchestration is deferred to the API/service layer (Plan 7+); this plan builds the capability and `reindex_all`.
**Architecture:** A new role-named crate `search` depending on `db` + `domain` (cycle-free: `search → db → domain`). It exposes a `SearchClient` (Meilisearch adapter behind our own type, so the engine stays swappable), a `SearchDocument` (the indexed shape), `build_document` (reads `db` to resolve a `CatalogueObject`'s flexible fields to searchable text), and `reindex_all`. Search returns object ids; callers load full objects from `db`. `visibility` is a filterable attribute (for the future public API).
**Tech Stack:** Rust 2024, `meilisearch-sdk` (async client), `serde` (document), `thiserror` (SearchError), tokio. Tests run against a real Meilisearch (Docker) + Postgres.
## Design decisions (approved)
- `search` crate: `SearchClient` wrapping `meilisearch-sdk`, swappable behind our type.
- Index doc = core text + flexible values flattened to searchable text; **term/authority resolved to labels**; `localized_text` → all language strings; `visibility` filterable. Search returns object ids.
- Build the capability + `reindex_all` now; **on-write sync is wired at the API/service layer (Plan 7+)**. Eventual consistency (Meili not transactional with Postgres).
- Integration tests use a real Meilisearch in Docker, each test on a **unique index** for isolation.
## ⚠️ Implementer note on the Meilisearch SDK
The `meilisearch-sdk` API (method names, async task handling) varies by version. The **code blocks below are the intended shape**; adapt the exact SDK calls to the installed version while preserving behavior. **The tests are the contract** — make them pass. Key behaviors: indexing operations must `wait_for_completion` (Meilisearch indexes asynchronously) so a subsequent search sees the document. Verify the current `meilisearch-sdk` version via the cratesio tooling and pin it.
## Prerequisites
- Postgres (as before) AND a Meilisearch instance. The controller will start Meilisearch in Docker (e.g. `getmeili/meilisearch`) with a master key. Tests read `MEILI_URL` (e.g. `http://localhost:7700`) and `MEILI_MASTER_KEY`; pass them inline alongside `DATABASE_URL`. Pass transaction connections as `&mut tx`.
## File Structure
```
Cargo.toml + search member; meilisearch-sdk in workspace deps
crates/search/
Cargo.toml
src/lib.rs SearchError, SearchDocument, SearchClient, build_document, reindex_all
tests/search.rs (Meili only) index/search/remove
tests/reindex.rs (Meili + Postgres) build_document + reindex_all
```
---
## Task 1: `search` crate — client, document, index/search/remove
**Files:** modify root `Cargo.toml`; create `crates/search/Cargo.toml`, `crates/search/src/lib.rs`, `crates/search/tests/search.rs`.
- [ ] **Step 1: Workspace + crate setup.**
- In root `Cargo.toml`, add `"crates/search"` to `members`, and add to `[workspace.dependencies]` (verify the latest version via cratesio):
```toml
meilisearch-sdk = "0.28"
```
- Create `crates/search/Cargo.toml`:
```toml
[package]
name = "search"
version = "0.0.0"
edition.workspace = true
rust-version.workspace = true
[dependencies]
meilisearch-sdk.workspace = true
serde = { workspace = true }
thiserror.workspace = true
domain = { path = "../domain" }
db = { path = "../db" }
[dev-dependencies]
tokio.workspace = true
uuid.workspace = true
serde_json.workspace = true
sqlx.workspace = true
```
- [ ] **Step 2: Write the failing test** `crates/search/tests/search.rs` (Meilisearch only — hand-built documents, no Postgres):
```rust
use search::{SearchClient, SearchDocument};
fn meili() -> (String, String) {
(
std::env::var("MEILI_URL").expect("MEILI_URL must be set"),
std::env::var("MEILI_MASTER_KEY").expect("MEILI_MASTER_KEY must be set"),
)
}
fn unique_index() -> String {
format!("objects_test_{}", uuid::Uuid::new_v4().simple())
}
fn doc(id: &str, object_name: &str, fields_text: &[&str]) -> SearchDocument {
SearchDocument {
id: id.to_string(),
object_number: format!("N-{id}"),
object_name: object_name.to_string(),
brief_description: None,
current_owner: None,
recorder: None,
visibility: "draft".to_string(),
fields_text: fields_text.iter().map(|s| s.to_string()).collect(),
}
}
#[tokio::test]
async fn index_search_and_remove() {
let (url, key) = meili();
let client = SearchClient::connect(&url, &key, &unique_index()).unwrap();
client.ensure_index().await.unwrap();
let vase = domain::ObjectId::new();
let chair = domain::ObjectId::new();
client.index_object(&doc(&vase.to_string(), "vase", &["wood", "trä"])).await.unwrap();
client.index_object(&doc(&chair.to_string(), "chair", &["oak"])).await.unwrap();
// full-text on a flexible value
let hits = client.search("wood").await.unwrap();
assert_eq!(hits, vec![vase]);
// full-text on the object name
let hits = client.search("chair").await.unwrap();
assert_eq!(hits, vec![chair]);
// remove
client.remove_object(vase).await.unwrap();
assert!(client.search("wood").await.unwrap().is_empty());
}
```
- [ ] **Step 3: Run to verify it fails.** `MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo test -p search --test search` → FAIL (crate/types missing).
- [ ] **Step 4: Implement** `crates/search/src/lib.rs` (adapt the SDK calls to the installed version; keep behavior + signatures):
```rust
//! Full-text search over catalogue objects, backed by Meilisearch.
use db::Db;
use domain::{CatalogueObject, ObjectId};
use serde::{Deserialize, Serialize};
/// Errors from the search subsystem.
#[derive(Debug, thiserror::Error)]
pub enum SearchError {
#[error(transparent)]
Meili(#[from] meilisearch_sdk::errors::Error),
#[error(transparent)]
Db(#[from] sqlx::Error),
#[error("invalid object id in index: {0}")]
BadId(String),
}
/// The indexed shape of a catalogue object.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SearchDocument {
pub id: String,
pub object_number: String,
pub object_name: String,
pub brief_description: Option<String>,
pub current_owner: Option<String>,
pub recorder: Option<String>,
/// Filterable: "draft" | "internal" | "public".
pub visibility: String,
/// Flexible field values flattened to searchable text (term/authority labels,
/// localized strings, and scalar values).
pub fields_text: Vec<String>,
}
/// A Meilisearch-backed search client scoped to one index.
pub struct SearchClient {
client: meilisearch_sdk::client::Client,
index_uid: String,
}
impl SearchClient {
/// Connect to Meilisearch at `url` with `api_key`, scoped to `index_uid`.
pub fn connect(url: &str, api_key: &str, index_uid: &str) -> Result<Self, SearchError> {
let client = meilisearch_sdk::client::Client::new(url, Some(api_key))?;
Ok(Self { client, index_uid: index_uid.to_owned() })
}
/// Create the index (primary key "id") if absent and set filterable attributes.
pub async fn ensure_index(&self) -> Result<(), SearchError> {
// Create the index if it doesn't exist (ignore "index already exists").
let task = self.client.create_index(&self.index_uid, Some("id")).await?;
task.wait_for_completion(&self.client, None, None).await?;
let index = self.client.index(&self.index_uid);
index
.set_filterable_attributes(["visibility"])
.await?
.wait_for_completion(&self.client, None, None)
.await?;
Ok(())
}
/// Upsert one object document (waits for indexing to complete).
pub async fn index_object(&self, doc: &SearchDocument) -> Result<(), SearchError> {
self.client
.index(&self.index_uid)
.add_or_replace_documents(std::slice::from_ref(doc), Some("id"))
.await?
.wait_for_completion(&self.client, None, None)
.await?;
Ok(())
}
/// Remove one object from the index by id (waits for completion).
pub async fn remove_object(&self, id: ObjectId) -> Result<(), SearchError> {
self.client
.index(&self.index_uid)
.delete_document(id.to_string())
.await?
.wait_for_completion(&self.client, None, None)
.await?;
Ok(())
}
/// Full-text search; returns matching object ids (in Meilisearch ranking order).
pub async fn search(&self, query: &str) -> Result<Vec<ObjectId>, SearchError> {
let results = self
.client
.index(&self.index_uid)
.search()
.with_query(query)
.execute::<SearchDocument>()
.await?;
results
.hits
.into_iter()
.map(|hit| hit.result.id.parse::<ObjectId>().map_err(|_| SearchError::BadId(hit.result.id)))
.collect()
}
/// Rebuild the whole index from the database (clears then re-adds all objects).
pub async fn reindex_all(&self, db: &Db) -> Result<(), SearchError> {
let index = self.client.index(&self.index_uid);
index.delete_all_documents().await?.wait_for_completion(&self.client, None, None).await?;
let objects = db::catalog::list_objects(db.pool()).await?;
let mut docs = Vec::with_capacity(objects.len());
for object in &objects {
docs.push(build_document(db, object).await?);
}
if !docs.is_empty() {
index
.add_or_replace_documents(&docs, Some("id"))
.await?
.wait_for_completion(&self.client, None, None)
.await?;
}
Ok(())
}
}
/// Build a [`SearchDocument`] from an object, resolving its flexible fields to
/// searchable text (term/authority → labels, localized text → all values).
/// Implemented in Task 2; declared here so the crate compiles.
pub async fn build_document(
_db: &Db,
_object: &CatalogueObject,
) -> Result<SearchDocument, SearchError> {
unimplemented!("implemented in Task 2")
}
```
NOTE: `ObjectId: FromStr` (Err = `uuid::Error`) exists from the id macro. `reindex_all`/`build_document` are needed for compilation now (Task 1 test doesn't call them) — `build_document` is a stub `unimplemented!()` filled in Task 2. If clippy flags the stub's unused params, the leading underscores suppress that; if it flags `unimplemented!` in a non-test fn, add `#[allow(clippy::unimplemented)]` to `build_document` with a `// Task 2` note, OR move `reindex_all`+`build_document` entirely into Task 2 (preferred if it keeps Task 1 clippy-clean — in that case omit them here and add `pub mod`-level items in Task 2).
- [ ] **Step 5: Run to verify it passes.** `MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo test -p search --test search` → PASS. (You may need to adapt SDK calls; iterate until the test passes.)
- [ ] **Step 6: Lint.** `cargo +nightly fmt`; `cargo clippy -p search --all-targets -- -D warnings` → clean.
- [ ] **Step 7: Commit.**
```bash
git add Cargo.toml crates/search
git commit -m "feat(search): add Meilisearch-backed SearchClient (index, search, remove)"
```
---
## Task 2: `build_document` + `reindex_all` (db integration)
**Files:** modify `crates/search/src/lib.rs`; create `crates/search/tests/reindex.rs`.
- [ ] **Step 1: Write the failing test** `crates/search/tests/reindex.rs` (Meilisearch + Postgres):
```rust
use db::{Db, catalog, fields, vocab};
use domain::{
AuditActor, FieldType, LocalizedLabel, NewFieldDefinition, NewTerm, ObjectInput, Visibility,
};
use search::SearchClient;
use sqlx::PgPool;
fn meili() -> (String, String) {
(
std::env::var("MEILI_URL").expect("MEILI_URL must be set"),
std::env::var("MEILI_MASTER_KEY").expect("MEILI_MASTER_KEY must be set"),
)
}
fn unique_index() -> String {
format!("reindex_test_{}", uuid::Uuid::new_v4().simple())
}
#[sqlx::test]
async fn reindex_resolves_term_labels_and_finds_by_label(pool: PgPool) {
let db = Db::from_pool(pool);
// a material vocabulary with a "wood" term
let material = vocab::create_vocabulary(db.pool(), "material").await.unwrap();
let mut tx = db.pool().begin().await.unwrap();
let wood = vocab::add_term(
&mut tx,
&NewTerm {
vocabulary_id: material.id,
external_uri: None,
labels: vec![LocalizedLabel { lang: "en".into(), label: "wood".into() }],
},
)
.await
.unwrap();
fields::create_field_definition(
&mut tx,
&NewFieldDefinition {
key: "material".into(),
field_type: FieldType::Term { vocabulary_id: material.id },
required: false,
group_key: None,
labels: vec![LocalizedLabel { lang: "en".into(), label: "material".into() }],
},
)
.await
.unwrap();
let object_id = catalog::create_object(
&mut tx,
AuditActor::System,
&ObjectInput {
object_number: "LM-1".into(),
object_name: "vase".into(),
number_of_objects: 1,
brief_description: None,
current_location: None,
current_owner: None,
recorder: None,
recording_date: None,
visibility: Visibility::Public,
},
)
.await
.unwrap();
tx.commit().await.unwrap();
// set the material field to the wood term
let mut tx = db.pool().begin().await.unwrap();
catalog::set_object_fields(
&mut tx,
AuditActor::System,
object_id,
serde_json::json!({ "material": wood.to_string() }).as_object().unwrap(),
)
.await
.unwrap();
tx.commit().await.unwrap();
let (url, key) = meili();
let client = SearchClient::connect(&url, &key, &unique_index()).unwrap();
client.ensure_index().await.unwrap();
client.reindex_all(&db).await.unwrap();
// found by the object name
assert_eq!(client.search("vase").await.unwrap(), vec![object_id]);
// found by the resolved TERM LABEL (not the uuid)
assert_eq!(client.search("wood").await.unwrap(), vec![object_id]);
}
```
- [ ] **Step 2: Run to verify it fails.** With both env vars + `DATABASE_URL`: `... cargo test -p search --test reindex` → FAIL (`build_document` is `unimplemented!`).
- [ ] **Step 3: Implement `build_document`** in `crates/search/src/lib.rs` — replace the stub body with a real implementation that flattens the object's flexible fields to searchable text, resolving term/authority values to labels:
```rust
pub async fn build_document(
db: &Db,
object: &CatalogueObject,
) -> Result<SearchDocument, SearchError> {
let mut fields_text = Vec::new();
if let Some(map) = object.fields.as_object() {
for (key, value) in map {
let Some(def) = db::fields::field_definition_by_key(db.pool(), key).await? else {
continue; // a field with no definition (stale) — skip
};
match def.field_type {
domain::FieldType::Text | domain::FieldType::Date => {
if let Some(s) = value.as_str() {
fields_text.push(s.to_owned());
}
}
domain::FieldType::Integer | domain::FieldType::Boolean => {
fields_text.push(value.to_string());
}
domain::FieldType::LocalizedText => {
if let Some(obj) = value.as_object() {
for v in obj.values() {
if let Some(s) = v.as_str() {
fields_text.push(s.to_owned());
}
}
}
}
domain::FieldType::Term { .. } => {
if let Some(term_id) = value.as_str().and_then(|s| s.parse().ok()) {
if let Some(term) = db::vocab::term_by_id(db.pool(), term_id).await? {
fields_text.extend(term.labels.into_iter().map(|l| l.label));
}
}
}
domain::FieldType::Authority { .. } => {
if let Some(authority_id) = value.as_str().and_then(|s| s.parse().ok()) {
if let Some(authority) =
db::authority::authority_by_id(db.pool(), authority_id).await?
{
fields_text.extend(authority.labels.into_iter().map(|l| l.label));
}
}
}
}
}
}
Ok(SearchDocument {
id: object.id.to_string(),
object_number: object.object_number.clone(),
object_name: object.object_name.clone(),
brief_description: object.brief_description.clone(),
current_owner: object.current_owner.clone(),
recorder: object.recorder.clone(),
visibility: object.visibility.as_str().to_owned(),
fields_text,
})
}
```
(`db::vocab::term_by_id` takes a `TermId`; `db::authority::authority_by_id` takes an `AuthorityId` — `value.as_str().and_then(|s| s.parse().ok())` parses into the inferred id type. If type inference needs help, annotate: `let term_id: domain::TermId = ...`.)
- [ ] **Step 4: Run to verify it passes.** `MEILI_URL=<url> MEILI_MASTER_KEY=<key> DATABASE_URL=<url> cargo test -p search --test reindex` → PASS.
- [ ] **Step 5: Full workspace check.**
```bash
cargo +nightly fmt --check
DATABASE_URL=<url> MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo clippy --workspace --all-targets -- -D warnings
DATABASE_URL=<url> MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo test --workspace
```
Expected: all green. (The `search` tests need the MEILI env vars; the rest need `DATABASE_URL`.)
- [ ] **Step 6: Commit.**
```bash
git add crates/search
git commit -m "feat(search): build documents resolving term/authority labels; reindex_all"
```
---
## Self-Review (completed)
**Spec coverage (Plan 6 / VISION search MVP):**
- `search` crate, Meilisearch adapter behind `SearchClient`, swappable → Task 1. ✓
- Index core + flexible text; term/authority resolved to labels; localized → all values; visibility filterable; search returns object ids → Tasks 12. ✓
- Build capability + `reindex_all` now; on-write sync deferred to API/service → this plan + notes. ✓
- `search → db → domain` (no cycle); SQL stays in `db` (search calls db repos) → Cargo deps. ✓
- Real-Meili integration tests, unique index per test → Tasks 12. ✓
**Placeholder scan:** the only `unimplemented!` is the Task 1 `build_document` stub, explicitly filled in Task 2 (with a fallback instruction). `<url>`/`<key>` are documented env values. No other placeholders.
**Type consistency:** `SearchDocument` fields used identically in tests + `build_document`; `SearchClient::{connect, ensure_index, index_object, remove_object, search, reindex_all}` signatures consistent across tasks/tests; `search` returns `Vec<ObjectId>` parsed via `ObjectId: FromStr`; `build_document` matches on `domain::FieldType` (Plan 4) and calls `db::vocab::term_by_id`/`db::authority::authority_by_id`/`db::fields::field_definition_by_key`/`db::catalog::list_objects` as defined.
## Notes for follow-on plans
- **On-write sync (API/service, Plan 7+):** after a catalogue create/update/delete/set_fields commits, call `index_object`/`remove_object` best-effort (log failures; `reindex_all` is the recovery path). Meili is not transactional with Postgres — eventual consistency.
- **Public API (Plan 7):** `search` already stores `visibility` as filterable; add a `with_filter("visibility = public")` search variant for the public surface.
- **Per-deployment index/credentials:** production uses a fixed index uid (e.g. `objects`) with a scoped Meili key per the single-tenant deployment; only tests use unique index names.
- **Reindex cost:** `reindex_all` is N+1 over objects×fields (resolves labels per field) — fine for now; batch when collections grow (relates to #12).