biggus-dickus/docs/plans/2026-06-02-search.md
logaritmisk 851181d91d docs: add Plan 6 (Meilisearch search) implementation plan
search crate (SearchClient adapter) indexing core + flexible fields with
term/authority resolved to labels; reindex_all; on-write sync deferred to API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:39:55 +02:00

# Search (Meilisearch) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** A `search` crate that indexes catalogue objects (core + flexible fields, with term/authority values resolved to their labels) into Meilisearch and runs full-text search, plus a `reindex_all` rebuild. On-write sync orchestration is deferred to the API/service layer (Plan 7+); this plan builds the capability and `reindex_all`.
**Architecture:** A new role-named crate `search` depending on `db` + `domain` (cycle-free: `search → db → domain`). It exposes a `SearchClient` (Meilisearch adapter behind our own type, so the engine stays swappable), a `SearchDocument` (the indexed shape), `build_document` (reads `db` to resolve a `CatalogueObject`'s flexible fields to searchable text), and `reindex_all`. Search returns object ids; callers load full objects from `db`. `visibility` is a filterable attribute (for the future public API).
**Tech Stack:** Rust 2024, `meilisearch-sdk` (async client), `serde` (document), `thiserror` (SearchError), tokio. Tests run against a real Meilisearch (Docker) + Postgres.
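
As an illustration of the "engine stays swappable" boundary described under Architecture, here is a minimal std-only sketch. The names `SearchEngine`, `InMemoryEngine`, and `Doc` are hypothetical and exist only for this example; the plan itself uses a concrete `SearchClient` struct, and real Meilisearch matching is far richer than the substring match used here.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down stand-in for the plan's `SearchDocument`.
#[derive(Debug, Clone)]
pub struct Doc {
    pub id: String,
    pub object_name: String,
    pub fields_text: Vec<String>,
}

/// Hypothetical engine boundary: callers see only ids, never engine types.
pub trait SearchEngine {
    fn index(&mut self, doc: Doc);
    fn remove(&mut self, id: &str);
    /// Returns matching ids; callers load full objects from the database.
    fn search(&self, query: &str) -> Vec<String>;
}

/// Toy in-memory engine (case-insensitive substring match, ordered by id).
#[derive(Default)]
pub struct InMemoryEngine {
    docs: BTreeMap<String, Doc>,
}

impl SearchEngine for InMemoryEngine {
    fn index(&mut self, doc: Doc) {
        // Upsert by id, mirroring add-or-replace semantics.
        self.docs.insert(doc.id.clone(), doc);
    }

    fn remove(&mut self, id: &str) {
        self.docs.remove(id);
    }

    fn search(&self, query: &str) -> Vec<String> {
        let needle = query.to_lowercase();
        self.docs
            .values()
            .filter(|d| {
                d.object_name.to_lowercase().contains(&needle)
                    || d.fields_text.iter().any(|t| t.to_lowercase().contains(&needle))
            })
            .map(|d| d.id.clone())
            .collect()
    }
}
```

Because callers only ever receive ids, replacing the engine would not change how full objects are loaded from `db`.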
## Design decisions (approved)
- `search` crate: `SearchClient` wrapping `meilisearch-sdk`, swappable behind our type.
- Index doc = core text + flexible values flattened to searchable text; **term/authority resolved to labels**; `localized_text` → all language strings; `visibility` filterable. Search returns object ids.
- Build the capability + `reindex_all` now; **on-write sync is wired at the API/service layer (Plan 7+)**. Eventual consistency (Meili not transactional with Postgres).
- Integration tests use a real Meilisearch in Docker, each test on a **unique index** for isolation.
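
The deferred on-write sync and its eventual-consistency trade-off could look roughly like the sketch below once it is wired at the service layer. This is an assumption about a later plan, not part of this one: `Indexer`, `FlakyIndexer`, and `sync_best_effort` are hypothetical names, ids are plain integers, and the code is synchronous for brevity, whereas the plan's real methods are async.

```rust
/// Hypothetical indexing boundary; the plan's real methods are async and live
/// on `SearchClient`.
pub trait Indexer {
    fn index_object(&mut self, id: u32) -> Result<(), String>;
}

/// Call after the database transaction has committed. A search failure is
/// recorded but never propagated, because the write itself already succeeded.
pub fn sync_best_effort(indexer: &mut impl Indexer, id: u32, failed: &mut Vec<u32>) {
    if let Err(err) = indexer.index_object(id) {
        // Stand-in for real logging.
        eprintln!("search sync failed for object {id}: {err}");
        failed.push(id);
    }
}

/// Toy indexer that rejects odd ids, to exercise the failure path.
#[derive(Default)]
pub struct FlakyIndexer {
    pub indexed: Vec<u32>,
}

impl Indexer for FlakyIndexer {
    fn index_object(&mut self, id: u32) -> Result<(), String> {
        if id % 2 == 1 {
            return Err("engine unavailable".to_owned());
        }
        self.indexed.push(id);
        Ok(())
    }
}
```

Objects that failed to sync stay stale in the index until a later write or a full `reindex_all` repairs them.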
## ⚠️ Implementer note on the Meilisearch SDK
The `meilisearch-sdk` API (method names, async task handling) varies by version. The **code blocks below are the intended shape**; adapt the exact SDK calls to the installed version while preserving behavior. **The tests are the contract** — make them pass. Key behaviors: indexing operations must `wait_for_completion` (Meilisearch indexes asynchronously) so a subsequent search sees the document. Verify the current `meilisearch-sdk` version via the cratesio tooling and pin it.
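
To make the `wait_for_completion` requirement concrete, the toy model below mimics an engine that accepts writes immediately but applies them later. `QueuedIndex` is a hypothetical std-only stand-in and does not use the SDK; it only shows why a search issued right after an un-awaited write can miss the document.

```rust
use std::collections::VecDeque;

/// Toy model of an engine that queues writes and applies them later.
#[derive(Default)]
pub struct QueuedIndex {
    pending: VecDeque<String>,
    visible: Vec<String>,
}

impl QueuedIndex {
    /// Accepts a document and returns immediately; nothing is searchable yet.
    pub fn enqueue(&mut self, text: &str) {
        self.pending.push_back(text.to_owned());
    }

    /// Stand-in for waiting on the task: drains the queue into the index.
    pub fn wait_for_completion(&mut self) {
        self.visible.extend(self.pending.drain(..));
    }

    /// Searches only documents whose write has been applied.
    pub fn search(&self, query: &str) -> Vec<&str> {
        self.visible
            .iter()
            .filter(|t| t.contains(query))
            .map(String::as_str)
            .collect()
    }
}
```

A test that skips the wait step would be racing the engine, which is why every write helper in this plan waits before returning.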
## Prerequisites
- Postgres (as before) AND a Meilisearch instance. The controller will start Meilisearch in Docker (e.g. `getmeili/meilisearch`) with a master key. Tests read `MEILI_URL` (e.g. `http://localhost:7700`) and `MEILI_MASTER_KEY`; pass them inline alongside `DATABASE_URL`. Pass transaction connections as `&mut tx`.
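
The tests below read these variables with `expect`, which panics when one is missing. If a clearer failure message or a skip path is preferred, a helper along these lines is one option. `MeiliConfig`, `meili_config`, and `meili_config_from_env` are hypothetical names, not part of the plan; the lookup is passed in as a closure so the logic can be checked without touching the real environment.

```rust
/// Connection settings for the test Meilisearch instance.
#[derive(Debug, PartialEq)]
pub struct MeiliConfig {
    pub url: String,
    pub master_key: String,
}

/// Reads `MEILI_URL` and `MEILI_MASTER_KEY` through `lookup`, returning a
/// message naming the first missing variable instead of panicking.
pub fn meili_config(lookup: impl Fn(&str) -> Option<String>) -> Result<MeiliConfig, String> {
    let get = |name: &str| lookup(name).ok_or_else(|| format!("{name} must be set"));
    Ok(MeiliConfig {
        url: get("MEILI_URL")?,
        master_key: get("MEILI_MASTER_KEY")?,
    })
}

/// Convenience wrapper over the real process environment.
pub fn meili_config_from_env() -> Result<MeiliConfig, String> {
    meili_config(|name| std::env::var(name).ok())
}
```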
## File Structure
```
Cargo.toml                  + search member; meilisearch-sdk in workspace deps
crates/search/
  Cargo.toml
  src/lib.rs                SearchError, SearchDocument, SearchClient, build_document, reindex_all
  tests/search.rs           (Meili only) index/search/remove
  tests/reindex.rs          (Meili + Postgres) build_document + reindex_all
```
---
## Task 1: `search` crate — client, document, index/search/remove
**Files:** modify root `Cargo.toml`; create `crates/search/Cargo.toml`, `crates/search/src/lib.rs`, `crates/search/tests/search.rs`.
- [ ] **Step 1: Workspace + crate setup.**
- In root `Cargo.toml`, add `"crates/search"` to `members`, and add to `[workspace.dependencies]` (verify the latest version via cratesio):
```toml
meilisearch-sdk = "0.28"
```
- Create `crates/search/Cargo.toml`:
```toml
[package]
name = "search"
version = "0.0.0"
edition.workspace = true
rust-version.workspace = true

[dependencies]
meilisearch-sdk.workspace = true
serde = { workspace = true }
thiserror.workspace = true
# `SearchError::Db` names `sqlx::Error` in lib.rs, so sqlx must be a regular
# dependency (it is then also available to the tests).
sqlx.workspace = true
domain = { path = "../domain" }
db = { path = "../db" }

[dev-dependencies]
tokio.workspace = true
uuid.workspace = true
serde_json.workspace = true
```
- [ ] **Step 2: Write the failing test** `crates/search/tests/search.rs` (Meilisearch only — hand-built documents, no Postgres):
```rust
use search::{SearchClient, SearchDocument};

fn meili() -> (String, String) {
    (
        std::env::var("MEILI_URL").expect("MEILI_URL must be set"),
        std::env::var("MEILI_MASTER_KEY").expect("MEILI_MASTER_KEY must be set"),
    )
}

fn unique_index() -> String {
    format!("objects_test_{}", uuid::Uuid::new_v4().simple())
}

fn doc(id: &str, object_name: &str, fields_text: &[&str]) -> SearchDocument {
    SearchDocument {
        id: id.to_string(),
        object_number: format!("N-{id}"),
        object_name: object_name.to_string(),
        brief_description: None,
        current_owner: None,
        recorder: None,
        visibility: "draft".to_string(),
        fields_text: fields_text.iter().map(|s| s.to_string()).collect(),
    }
}

#[tokio::test]
async fn index_search_and_remove() {
    let (url, key) = meili();
    let client = SearchClient::connect(&url, &key, &unique_index()).unwrap();
    client.ensure_index().await.unwrap();

    let vase = domain::ObjectId::new();
    let chair = domain::ObjectId::new();
    client.index_object(&doc(&vase.to_string(), "vase", &["wood", "trä"])).await.unwrap();
    client.index_object(&doc(&chair.to_string(), "chair", &["oak"])).await.unwrap();

    // full-text on a flexible value
    let hits = client.search("wood").await.unwrap();
    assert_eq!(hits, vec![vase]);

    // full-text on the object name
    let hits = client.search("chair").await.unwrap();
    assert_eq!(hits, vec![chair]);

    // remove
    client.remove_object(vase).await.unwrap();
    assert!(client.search("wood").await.unwrap().is_empty());
}
```
- [ ] **Step 3: Run to verify it fails.** `MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo test -p search --test search` → FAIL (crate/types missing).
- [ ] **Step 4: Implement** `crates/search/src/lib.rs` (adapt the SDK calls to the installed version; keep behavior + signatures):
```rust
//! Full-text search over catalogue objects, backed by Meilisearch.

use db::Db;
use domain::{CatalogueObject, ObjectId};
use serde::{Deserialize, Serialize};

/// Errors from the search subsystem.
#[derive(Debug, thiserror::Error)]
pub enum SearchError {
    #[error(transparent)]
    Meili(#[from] meilisearch_sdk::errors::Error),
    #[error(transparent)]
    Db(#[from] sqlx::Error),
    #[error("invalid object id in index: {0}")]
    BadId(String),
}

/// The indexed shape of a catalogue object.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SearchDocument {
    pub id: String,
    pub object_number: String,
    pub object_name: String,
    pub brief_description: Option<String>,
    pub current_owner: Option<String>,
    pub recorder: Option<String>,
    /// Filterable: "draft" | "internal" | "public".
    pub visibility: String,
    /// Flexible field values flattened to searchable text (term/authority labels,
    /// localized strings, and scalar values).
    pub fields_text: Vec<String>,
}

/// A Meilisearch-backed search client scoped to one index.
pub struct SearchClient {
    client: meilisearch_sdk::client::Client,
    index_uid: String,
}

impl SearchClient {
    /// Connect to Meilisearch at `url` with `api_key`, scoped to `index_uid`.
    pub fn connect(url: &str, api_key: &str, index_uid: &str) -> Result<Self, SearchError> {
        let client = meilisearch_sdk::client::Client::new(url, Some(api_key))?;
        Ok(Self { client, index_uid: index_uid.to_owned() })
    }

    /// Create the index (primary key "id") if absent and set filterable attributes.
    pub async fn ensure_index(&self) -> Result<(), SearchError> {
        // Create the index if it doesn't exist. When it already exists, the
        // creation task is expected to end as failed ("index already exists")
        // rather than surface as an `Err` here, so that case is ignored; verify
        // this against the installed SDK version.
        let task = self.client.create_index(&self.index_uid, Some("id")).await?;
        task.wait_for_completion(&self.client, None, None).await?;

        let index = self.client.index(&self.index_uid);
        index
            .set_filterable_attributes(["visibility"])
            .await?
            .wait_for_completion(&self.client, None, None)
            .await?;
        Ok(())
    }

    /// Upsert one object document (waits for indexing to complete).
    pub async fn index_object(&self, doc: &SearchDocument) -> Result<(), SearchError> {
        self.client
            .index(&self.index_uid)
            .add_or_replace_documents(std::slice::from_ref(doc), Some("id"))
            .await?
            .wait_for_completion(&self.client, None, None)
            .await?;
        Ok(())
    }

    /// Remove one object from the index by id (waits for completion).
    pub async fn remove_object(&self, id: ObjectId) -> Result<(), SearchError> {
        self.client
            .index(&self.index_uid)
            .delete_document(id.to_string())
            .await?
            .wait_for_completion(&self.client, None, None)
            .await?;
        Ok(())
    }

    /// Full-text search; returns matching object ids (in Meilisearch ranking order).
    pub async fn search(&self, query: &str) -> Result<Vec<ObjectId>, SearchError> {
        let results = self
            .client
            .index(&self.index_uid)
            .search()
            .with_query(query)
            .execute::<SearchDocument>()
            .await?;
        // Collecting into `Result` stops at the first id that fails to parse.
        results
            .hits
            .into_iter()
            .map(|hit| {
                hit.result
                    .id
                    .parse::<ObjectId>()
                    .map_err(|_| SearchError::BadId(hit.result.id))
            })
            .collect()
    }

    /// Rebuild the whole index from the database (clears then re-adds all objects).
    pub async fn reindex_all(&self, db: &Db) -> Result<(), SearchError> {
        let index = self.client.index(&self.index_uid);
        index.delete_all_documents().await?.wait_for_completion(&self.client, None, None).await?;

        let objects = db::catalog::list_objects(db.pool()).await?;
        let mut docs = Vec::with_capacity(objects.len());
        for object in &objects {
            docs.push(build_document(db, object).await?);
        }

        if !docs.is_empty() {
            index
                .add_or_replace_documents(&docs, Some("id"))
                .await?
                .wait_for_completion(&self.client, None, None)
                .await?;
        }
        Ok(())
    }
}

/// Build a [`SearchDocument`] from an object, resolving its flexible fields to
/// searchable text (term/authority → labels, localized text → all values).
/// Implemented in Task 2; declared here so the crate compiles.
pub async fn build_document(
    _db: &Db,
    _object: &CatalogueObject,
) -> Result<SearchDocument, SearchError> {
    unimplemented!("implemented in Task 2")
}
```
NOTE: `ObjectId: FromStr` (Err = `uuid::Error`) exists from the id macro. `reindex_all`/`build_document` are needed for compilation now (Task 1 test doesn't call them) — `build_document` is a stub `unimplemented!()` filled in Task 2. If clippy flags the stub's unused params, the leading underscores suppress that; if it flags `unimplemented!` in a non-test fn, add `#[allow(clippy::unimplemented)]` to `build_document` with a `// Task 2` note, OR move `reindex_all`+`build_document` entirely into Task 2 (preferred if it keeps Task 1 clippy-clean — in that case omit them here and add `pub mod`-level items in Task 2).
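
For illustration only (not code to add to the crate): the id-parsing step in `search` relies on collecting an iterator of `Result`s into `Result<Vec<_>, _>`, which stops at the first failure. The sketch below reproduces that pattern with std-only stand-ins. This `ObjectId` wraps a `u64` and is hypothetical, since the real one wraps a UUID, and `parse_hits` is a made-up helper name.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `domain::ObjectId` (the real one wraps a UUID).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjectId(pub u64);

impl FromStr for ObjectId {
    type Err = std::num::ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse::<u64>().map(ObjectId)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum SearchError {
    BadId(String),
}

/// Parses every hit id; the first unparsable id aborts the whole result,
/// because collecting into `Result<Vec<_>, _>` short-circuits on `Err`.
pub fn parse_hits(raw_ids: Vec<String>) -> Result<Vec<ObjectId>, SearchError> {
    raw_ids
        .into_iter()
        .map(|raw| raw.parse::<ObjectId>().map_err(|_| SearchError::BadId(raw)))
        .collect()
}
```

One consequence of this design is that a single corrupt id in the index fails the whole search call instead of silently dropping that hit.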
- [ ] **Step 5: Run to verify it passes.** `MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo test -p search --test search` → PASS. (You may need to adapt SDK calls; iterate until the test passes.)
- [ ] **Step 6: Lint.** `cargo +nightly fmt`; `cargo clippy -p search --all-targets -- -D warnings` → clean.
- [ ] **Step 7: Commit.**
```bash
git add Cargo.toml crates/search
git commit -m "feat(search): add Meilisearch-backed SearchClient (index, search, remove)"
```
---
## Task 2: `build_document` + `reindex_all` (db integration)
**Files:** modify `crates/search/src/lib.rs`; create `crates/search/tests/reindex.rs`.
- [ ] **Step 1: Write the failing test** `crates/search/tests/reindex.rs` (Meilisearch + Postgres):
```rust
use db::{Db, catalog, fields, vocab};
use domain::{
    AuditActor, FieldType, LocalizedLabel, NewFieldDefinition, NewTerm, ObjectInput, Visibility,
};
use search::SearchClient;
use sqlx::PgPool;

fn meili() -> (String, String) {
    (
        std::env::var("MEILI_URL").expect("MEILI_URL must be set"),
        std::env::var("MEILI_MASTER_KEY").expect("MEILI_MASTER_KEY must be set"),
    )
}

fn unique_index() -> String {
    format!("reindex_test_{}", uuid::Uuid::new_v4().simple())
}

#[sqlx::test]
async fn reindex_resolves_term_labels_and_finds_by_label(pool: PgPool) {
    let db = Db::from_pool(pool);

    // a material vocabulary with a "wood" term
    let material = vocab::create_vocabulary(db.pool(), "material").await.unwrap();
    let mut tx = db.pool().begin().await.unwrap();
    let wood = vocab::add_term(
        &mut tx,
        &NewTerm {
            vocabulary_id: material.id,
            external_uri: None,
            labels: vec![LocalizedLabel { lang: "en".into(), label: "wood".into() }],
        },
    )
    .await
    .unwrap();
    fields::create_field_definition(
        &mut tx,
        &NewFieldDefinition {
            key: "material".into(),
            field_type: FieldType::Term { vocabulary_id: material.id },
            required: false,
            group_key: None,
            labels: vec![LocalizedLabel { lang: "en".into(), label: "material".into() }],
        },
    )
    .await
    .unwrap();
    let object_id = catalog::create_object(
        &mut tx,
        AuditActor::System,
        &ObjectInput {
            object_number: "LM-1".into(),
            object_name: "vase".into(),
            number_of_objects: 1,
            brief_description: None,
            current_location: None,
            current_owner: None,
            recorder: None,
            recording_date: None,
            visibility: Visibility::Public,
        },
    )
    .await
    .unwrap();
    tx.commit().await.unwrap();

    // set the material field to the wood term
    let mut tx = db.pool().begin().await.unwrap();
    catalog::set_object_fields(
        &mut tx,
        AuditActor::System,
        object_id,
        serde_json::json!({ "material": wood.to_string() }).as_object().unwrap(),
    )
    .await
    .unwrap();
    tx.commit().await.unwrap();

    let (url, key) = meili();
    let client = SearchClient::connect(&url, &key, &unique_index()).unwrap();
    client.ensure_index().await.unwrap();
    client.reindex_all(&db).await.unwrap();

    // found by the object name
    assert_eq!(client.search("vase").await.unwrap(), vec![object_id]);
    // found by the resolved TERM LABEL (not the uuid)
    assert_eq!(client.search("wood").await.unwrap(), vec![object_id]);
}
```
- [ ] **Step 2: Run to verify it fails.** With both env vars + `DATABASE_URL`: `... cargo test -p search --test reindex` → FAIL (`build_document` is `unimplemented!`).
- [ ] **Step 3: Implement `build_document`** in `crates/search/src/lib.rs` — replace the stub body with a real implementation that flattens the object's flexible fields to searchable text, resolving term/authority values to labels:
```rust
pub async fn build_document(
    db: &Db,
    object: &CatalogueObject,
) -> Result<SearchDocument, SearchError> {
    let mut fields_text = Vec::new();

    if let Some(map) = object.fields.as_object() {
        for (key, value) in map {
            let Some(def) = db::fields::field_definition_by_key(db.pool(), key).await? else {
                continue; // a field with no definition (stale): skip
            };
            match def.field_type {
                domain::FieldType::Text | domain::FieldType::Date => {
                    if let Some(s) = value.as_str() {
                        fields_text.push(s.to_owned());
                    }
                }
                domain::FieldType::Integer | domain::FieldType::Boolean => {
                    fields_text.push(value.to_string());
                }
                domain::FieldType::LocalizedText => {
                    if let Some(obj) = value.as_object() {
                        for v in obj.values() {
                            if let Some(s) = v.as_str() {
                                fields_text.push(s.to_owned());
                            }
                        }
                    }
                }
                domain::FieldType::Term { .. } => {
                    if let Some(term_id) = value.as_str().and_then(|s| s.parse().ok()) {
                        if let Some(term) = db::vocab::term_by_id(db.pool(), term_id).await? {
                            fields_text.extend(term.labels.into_iter().map(|l| l.label));
                        }
                    }
                }
                domain::FieldType::Authority { .. } => {
                    if let Some(authority_id) = value.as_str().and_then(|s| s.parse().ok()) {
                        if let Some(authority) =
                            db::authority::authority_by_id(db.pool(), authority_id).await?
                        {
                            fields_text.extend(authority.labels.into_iter().map(|l| l.label));
                        }
                    }
                }
            }
        }
    }

    Ok(SearchDocument {
        id: object.id.to_string(),
        object_number: object.object_number.clone(),
        object_name: object.object_name.clone(),
        brief_description: object.brief_description.clone(),
        current_owner: object.current_owner.clone(),
        recorder: object.recorder.clone(),
        visibility: object.visibility.as_str().to_owned(),
        fields_text,
    })
}
```
(`db::vocab::term_by_id` takes a `TermId`; `db::authority::authority_by_id` takes an `AuthorityId` — `value.as_str().and_then(|s| s.parse().ok())` parses into the inferred id type. If type inference needs help, annotate: `let term_id: domain::TermId = ...`.)
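
For illustration only (not code to add to the crate): the flattening rule can be isolated from the database and JSON handling as a pure function. The sketch below uses simplified, hypothetical `FieldType` and `Value` enums and a `HashMap` in place of the `db` lookups, so it shows the resolution logic but not the real types or the async calls.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical field types (the real `domain::FieldType` has more).
pub enum FieldType {
    Text,
    Integer,
    LocalizedText,
    Term,
}

/// Simplified, hypothetical stored value (the plan stores JSON).
pub enum Value {
    Str(String),
    Int(i64),
    /// Pairs of language code and text.
    Localized(Vec<(String, String)>),
}

/// Flattens one field value to searchable strings. `term_labels` maps a term
/// id to its labels and stands in for the database lookup.
pub fn flatten(
    field_type: &FieldType,
    value: &Value,
    term_labels: &HashMap<String, Vec<String>>,
) -> Vec<String> {
    match (field_type, value) {
        (FieldType::Text, Value::Str(s)) => vec![s.clone()],
        (FieldType::Integer, Value::Int(n)) => vec![n.to_string()],
        (FieldType::LocalizedText, Value::Localized(pairs)) => {
            // Every language variant becomes searchable.
            pairs.iter().map(|(_, text)| text.clone()).collect()
        }
        (FieldType::Term, Value::Str(term_id)) => {
            // Index the labels, not the id; an unknown id contributes nothing.
            term_labels.get(term_id).cloned().unwrap_or_default()
        }
        // Type/value mismatch: skip rather than fail the whole document.
        _ => Vec::new(),
    }
}
```

As in `build_document`, a value that does not match its declared type or points at a missing term is skipped, so one bad field does not block indexing of the object.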
- [ ] **Step 4: Run to verify it passes.** `MEILI_URL=<url> MEILI_MASTER_KEY=<key> DATABASE_URL=<url> cargo test -p search --test reindex` → PASS.
- [ ] **Step 5: Full workspace check.**
```bash
cargo +nightly fmt --check
DATABASE_URL=<url> MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo clippy --workspace --all-targets -- -D warnings
DATABASE_URL=<url> MEILI_URL=<url> MEILI_MASTER_KEY=<key> cargo test --workspace
```
Expected: all green. (The `search` tests need the MEILI env vars; the rest need `DATABASE_URL`.)
- [ ] **Step 6: Commit.**
```bash
git add crates/search
git commit -m "feat(search): build documents resolving term/authority labels; reindex_all"
```
---
## Self-Review (completed)
**Spec coverage (Plan 6 / VISION search MVP):**
- `search` crate, Meilisearch adapter behind `SearchClient`, swappable → Task 1. ✓
- Index core + flexible text; term/authority resolved to labels; localized → all values; visibility filterable; search returns object ids → Tasks 1–2. ✓
- Build capability + `reindex_all` now; on-write sync deferred to API/service → this plan + notes. ✓
- `search → db → domain` (no cycle); SQL stays in `db` (search calls db repos) → Cargo deps. ✓
- Real-Meili integration tests, unique index per test → Tasks 1–2. ✓
**Placeholder scan:** the only `unimplemented!` is the Task 1 `build_document` stub, explicitly filled in Task 2 (with a fallback instruction). `<url>`/`<key>` are documented env values. No other placeholders.
**Type consistency:** `SearchDocument` fields used identically in tests + `build_document`; `SearchClient::{connect, ensure_index, index_object, remove_object, search, reindex_all}` signatures consistent across tasks/tests; `search` returns `Vec<ObjectId>` parsed via `ObjectId: FromStr`; `build_document` matches on `domain::FieldType` (Plan 4) and calls `db::vocab::term_by_id`/`db::authority::authority_by_id`/`db::fields::field_definition_by_key`/`db::catalog::list_objects` as defined.
## Notes for follow-on plans
- **On-write sync (API/service, Plan 7+):** after a catalogue create/update/delete/set_fields commits, call `index_object`/`remove_object` best-effort (log failures; `reindex_all` is the recovery path). Meili is not transactional with Postgres — eventual consistency.
- **Public API (Plan 7):** `search` already stores `visibility` as filterable; add a `with_filter("visibility = public")` search variant for the public surface.
- **Per-deployment index/credentials:** production uses a fixed index uid (e.g. `objects`) with a scoped Meili key per the single-tenant deployment; only tests use unique index names.
- **Reindex cost:** `reindex_all` is N+1 over objects×fields (resolves labels per field) — fine for now; batch when collections grow (relates to #12).
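
One possible first step toward reducing the reindex cost noted above is to memoize label lookups for the duration of a single reindex run, so a term used by many objects is fetched once. The sketch below is an assumption about a future optimization, not part of this plan: `LabelCache` is a hypothetical name, ids are plain integers, and the fetch closure is synchronous and infallible, unlike the real async `db` calls.

```rust
use std::collections::HashMap;

/// Memoizes term-label lookups for the duration of one reindex run.
pub struct LabelCache<F: FnMut(u32) -> Vec<String>> {
    fetch: F,
    cache: HashMap<u32, Vec<String>>,
    /// Number of times `fetch` was actually called.
    pub fetches: usize,
}

impl<F: FnMut(u32) -> Vec<String>> LabelCache<F> {
    pub fn new(fetch: F) -> Self {
        Self { fetch, cache: HashMap::new(), fetches: 0 }
    }

    /// Returns the labels for `term_id`, calling `fetch` only on a cache miss.
    pub fn labels(&mut self, term_id: u32) -> &[String] {
        if !self.cache.contains_key(&term_id) {
            self.fetches += 1;
            let labels = (self.fetch)(term_id);
            self.cache.insert(term_id, labels);
        }
        &self.cache[&term_id]
    }
}
```

The cache should live no longer than one reindex run, otherwise label edits made in the database would not reach the index.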