# WASM Provider Service Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Rebuild whoareyou as an async HTTP service that looks up Swedish phone numbers via WASM-component providers (hitta.se in v1), retiring the CLI.
**Architecture:** Cargo workspace with an axum server hosting wasmtime; providers are pure WASM components (WIT contract: `metadata`/`requests`/`parse`) — the host fetches all URLs and caches parsed results in moka. Provider parse logic is plain Rust, unit-tested natively against HTML fixtures; WIT glue is a thin `cfg(wasm32)` layer.
**Tech Stack:** Rust edition 2024 · tokio · axum 0.8 · reqwest 0.13 · moka 0.12 · wasmtime + wasmtime-wasi 45 · wit-bindgen 0.57 · thiserror 2 · tracing · insta 1.47
**Spec:** `docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`
---
## File structure
```
whoareyou/
├── Cargo.toml # workspace (NEW)
├── justfile # build orchestration (NEW)
├── wit/provider.wit # provider contract (NEW)
├── crates/
│ ├── server/ # package whoareyou-server (lib + bin)
│ │ ├── Cargo.toml
│ │ ├── src/lib.rs # module exports
│ │ ├── src/main.rs # wiring only
│ │ ├── src/config.rs # env config
│ │ ├── src/error.rs # HostError, FetchError, ConfigError
│ │ ├── src/model.rs # Entry, Comment, ProviderResult, API types
│ │ ├── src/service.rs # ProviderHandle + Fetch traits, LookupService
│ │ ├── src/fetch.rs # ReqwestFetcher
│ │ ├── src/http.rs # axum router, normalize()
│ │ ├── src/wasm.rs # wasmtime host, WasmProvider
│ │ └── tests/component.rs # loads the real .wasm
│ └── providers/hitta/ # package whoareyou-provider-hitta (cdylib+rlib)
│ ├── Cargo.toml
│ ├── src/lib.rs
│ ├── src/parser.rs # pure parse logic + native tests
│ └── src/component.rs # wit-bindgen glue (wasm32 only)
├── fixtures/hitta/*.html # KEPT (+ one fresh fixture)
├── fetch-fixture # KEPT, trimmed to hitta
└── DELETED: src/, definitions/, _build.rs, NOTEPAD.md, old Cargo.toml contents
```
`whoareyou-server` is a lib + thin bin so `tests/component.rs` can use its modules.
---
### Task 1: Workspace scaffold & demolition
**Files:**
- Delete: `src/`, `definitions/`, `_build.rs`, `NOTEPAD.md`
- Create: `Cargo.toml` (workspace), `wit/provider.wit`, `crates/server/{Cargo.toml,src/lib.rs,src/main.rs}`, `crates/providers/hitta/{Cargo.toml,src/lib.rs}`
- Modify: `.gitignore`
- [ ] **Step 1: Install the wasm target**
Run: `rustup target add wasm32-wasip2`
Expected: installs or "is up to date".
- [ ] **Step 2: Delete the old code**
```bash
git rm -r src definitions _build.rs NOTEPAD.md
```
(The old hitta parser is reproduced in Task 3 — nothing needed from the deleted tree.)
- [ ] **Step 3: Write the workspace `Cargo.toml`** (replaces the old package manifest)
```toml
[workspace]
resolver = "3"
members = ["crates/server", "crates/providers/hitta"]
[workspace.package]
version = "0.1.0"
edition = "2024"
authors = ["Anders Olsson <anders.e.olsson@gmail.com>"]
```
- [ ] **Step 4: Write `wit/provider.wit`**
```wit
package whoareyou:provider@0.1.0;

interface lookup {
    record provider-info {
        name: string,
        version: string,
    }

    record request {
        url: string,
    }

    record response {
        status: u16,
        body: string,
    }

    record comment {
        timestamp: option<s64>,
        title: option<string>,
        message: string,
    }

    record entry {
        messages: list<string>,
        history: list<string>,
        comments: list<comment>,
    }

    variant lookup-error {
        no-data,
        parse-failed(string),
    }

    metadata: func() -> provider-info;
    requests: func(number: string) -> list<request>;
    parse: func(number: string, responses: list<response>) -> result<entry, lookup-error>;
}

world provider {
    export lookup;
}
```
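For orientation, the contract above maps onto Rust types roughly as follows. This is a hand-written sketch of the shape, not wit-bindgen's actual output: the trait name `Lookup` and the `Echo` provider are hypothetical and exist only for illustration (the generated trait used in Task 4 is `Guest`).

```rust
// Hand-written mirror of wit/provider.wit. wit-bindgen generates the real
// types; everything named here is an illustration only.

#[derive(Debug, Clone, PartialEq)]
pub struct ProviderInfo {
    pub name: String,
    pub version: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub url: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Comment {
    pub timestamp: Option<i64>, // WIT: option<s64>
    pub title: Option<String>,  // WIT: option<string>
    pub message: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub messages: Vec<String>, // WIT: list<string>
    pub history: Vec<String>,
    pub comments: Vec<Comment>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum LookupError {
    NoData,              // WIT: no-data
    ParseFailed(String), // WIT: parse-failed(string)
}

/// Hypothetical stand-in for the generated `Guest` trait.
pub trait Lookup {
    fn metadata() -> ProviderInfo;
    fn requests(number: String) -> Vec<Request>;
    fn parse(number: String, responses: Vec<Response>) -> Result<Entry, LookupError>;
}

/// Hypothetical provider that echoes the first response body as a message.
pub struct Echo;

impl Lookup for Echo {
    fn metadata() -> ProviderInfo {
        ProviderInfo { name: "echo".to_string(), version: "0.0.0".to_string() }
    }

    fn requests(number: String) -> Vec<Request> {
        vec![Request { url: format!("https://example.test/{number}") }]
    }

    fn parse(_number: String, responses: Vec<Response>) -> Result<Entry, LookupError> {
        let first = responses.first().ok_or(LookupError::NoData)?;
        if first.body.is_empty() {
            return Err(LookupError::NoData);
        }
        Ok(Entry { messages: vec![first.body.clone()], history: vec![], comments: vec![] })
    }
}
```

The provider never performs I/O: it only names URLs in `requests` and interprets bodies in `parse`, which is what lets the host own fetching and caching.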
- [ ] **Step 5: Create the server crate stub**
`crates/server/Cargo.toml`:
```toml
[package]
name = "whoareyou-server"
version.workspace = true
edition.workspace = true
authors.workspace = true
[dependencies]
[dev-dependencies]
```
`crates/server/src/lib.rs`:
```rust
// modules added as they are implemented
```
`crates/server/src/main.rs`:
```rust
fn main() {}
```
- [ ] **Step 6: Create the hitta provider crate stub**
`crates/providers/hitta/Cargo.toml`:
```toml
[package]
name = "whoareyou-provider-hitta"
version.workspace = true
edition.workspace = true
authors.workspace = true
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
[dev-dependencies]
```
`crates/providers/hitta/src/lib.rs`:
```rust
// modules added as they are implemented
```
- [ ] **Step 7: Ignore the components dir**
Append to `.gitignore` (create if missing):
```
components/
```
- [ ] **Step 8: Verify the workspace builds**
Run: `cargo check --workspace`
Expected: success (two empty crates). `Cargo.lock` regenerates — that's fine.
- [ ] **Step 9: Commit**
```bash
git add -A
git commit -m "refactor!: replace CLI with workspace scaffold for WASM provider service"
```
---
### Task 2: Refresh hitta fixture & audit page structure
The 2019 fixtures predate any hitta.se redesign. Before porting the parser, capture what the site serves **today** so Task 3 is written against reality.
**Files:**
- Create: `fixtures/hitta/fresh-0104754350.html`
- [ ] **Step 1: Fetch a fresh copy of a known number's page**
Run:
```bash
curl -sL -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
"https://www.hitta.se/vem-ringde/0104754350" \
-o fixtures/hitta/fresh-0104754350.html
wc -c fixtures/hitta/fresh-0104754350.html
```
Expected: a non-trivial file (> 10 KB). If the response is a bot-block page (check with `head -c 2000`), retry with the `http --follow` (httpie) variant from `fetch-fixture`, or fetch the page in a real browser (View Source → save). The fixture MUST contain real page markup before continuing.
- [ ] **Step 2: Audit the page structure**
Run:
```bash
grep -c "__NEXT_DATA__" fixtures/hitta/fresh-0104754350.html
grep -o '__NEXT_DATA__[^>]\{0,80\}' fixtures/hitta/fresh-0104754350.html | head -3
```
Two outcomes — record which one applies, it determines Step 3 of Task 3:
- **(a) `__NEXT_DATA__` still present.** Check whether it's still `<script>__NEXT_DATA__ = {...};__NEXT_LOADED_PAGES__` (2019 inline style) or the modern `<script id="__NEXT_DATA__" type="application/json">{...}</script>` form. Note which.
- **(b) Gone entirely.** Inspect the page (`python3 -m json.tool` on any embedded JSON, or read the HTML) and locate where phone data + comments live now. Write down the JSON path to: comments list, comment text, comment timestamp, and the statistics/"X others searched" text — Task 3's serde structs must be adapted to those paths (the *shape* of the parser — regex/JSON extraction → typed structs → `ParsedEntry` — stays identical).
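The audit outcomes can also be expressed as a small classifier, which may be handy as a regression check on future fixtures. This is a sketch under assumptions: `NextDataForm` and `classify_next_data` are hypothetical names, and the modern form is matched only when `id` is the first attribute of the `<script>` tag.

```rust
/// Which form of `__NEXT_DATA__` a page uses, mirroring outcomes (a) and (b).
#[derive(Debug, PartialEq)]
pub enum NextDataForm {
    /// `<script>__NEXT_DATA__ = {...};__NEXT_LOADED_PAGES__` (2019 inline style).
    Inline2019,
    /// `<script id="__NEXT_DATA__" type="application/json">{...}</script>`.
    ScriptTag,
    /// The marker is present, but in neither known form: inspect by hand.
    Unknown,
    /// Outcome (b): no `__NEXT_DATA__` marker at all.
    Absent,
}

/// Classify a page by plain substring search; no HTML parsing is attempted.
pub fn classify_next_data(html: &str) -> NextDataForm {
    if !html.contains("__NEXT_DATA__") {
        NextDataForm::Absent
    } else if html.contains("<script>__NEXT_DATA__ = ") {
        NextDataForm::Inline2019
    } else if html.contains(r#"<script id="__NEXT_DATA__""#) {
        // Assumption: `id` is the first attribute on the tag.
        NextDataForm::ScriptTag
    } else {
        NextDataForm::Unknown
    }
}
```

An `Unknown` result means the marker exists but the page still needs a manual look before Task 3.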
- [ ] **Step 3: Commit the fixture**
```bash
git add fixtures/hitta/fresh-0104754350.html
git commit -m "test: add fresh hitta.se fixture for parser port"
```
---
### Task 3: hitta parser (pure logic, native TDD)
Port the old `src/probe/hitta.rs` parse logic (reproduced below) into the provider crate as plain functions. All tests run natively — no WASM involved.
**Files:**
- Create: `crates/providers/hitta/src/parser.rs`
- Modify: `crates/providers/hitta/src/lib.rs`, `crates/providers/hitta/Cargo.toml`
- [ ] **Step 1: Add dependencies**
In `crates/providers/hitta/Cargo.toml` set:
```toml
[dependencies]
regex = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
[dev-dependencies]
insta = { version = "1.47", features = ["yaml"] }
```
- [ ] **Step 2: Declare the module**
`crates/providers/hitta/src/lib.rs`:
```rust
pub mod parser;
```
- [ ] **Step 3: Write the failing tests**
Create `crates/providers/hitta/src/parser.rs` containing ONLY this test module. The types and functions it references don't exist yet; that's the point:
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn requests_single_hitta_url() {
assert_eq!(
request_urls("0700000000"),
vec!["https://www.hitta.se/vem-ringde/0700000000".to_string()]
);
}
#[test]
fn parses_number_with_comments() {
let body = include_str!("../../../../fixtures/hitta/0104754350.html");
let entry = parse(body).unwrap();
assert_eq!(entry.messages, Vec::<String>::new());
assert_eq!(entry.history, vec!["42 andra har rapporterat detta nummer"]);
assert_eq!(entry.comments.len(), 29);
// newest first
let first = &entry.comments[0];
assert_eq!(first.timestamp, Some(1547746162)); // 2019-01-17T17:29:22Z
assert_eq!(first.title, None);
assert_eq!(first.message, "Varmsälj från Folksam");
}
#[test]
fn parses_number_with_history_only() {
let body = include_str!("../../../../fixtures/hitta/0702269893.html");
let entry = parse(body).unwrap();
assert_eq!(entry.history, vec!["Tre andra har också sökt på detta nummer"]);
assert!(entry.comments.is_empty());
}
#[test]
fn no_phone_data_is_no_data() {
let body = include_str!("../../../../fixtures/hitta/0313908905.html");
assert_eq!(parse(body), Err(ParseError::NoData));
}
#[test]
fn unparseable_page_is_failed() {
let body = include_str!("../../../../fixtures/hitta/0701807618.html");
assert!(matches!(parse(body), Err(ParseError::Failed(_))));
}
#[test]
fn garbage_is_failed() {
assert!(matches!(parse("<html></html>"), Err(ParseError::Failed(_))));
}
#[test]
fn parses_fresh_fixture() {
let body = include_str!("../../../../fixtures/hitta/fresh-0104754350.html");
insta::assert_yaml_snapshot!(parse(body));
}
}
```
Semantics note (differs from the old CLI): the old code returned `Ok` with an
all-empty entry when JSON parsed but `phoneData` was absent. That is now
`Err(ParseError::NoData)`. Old fixtures `0313908905`, `0751793426/83/99` fall
in that bucket; `0701807618`, `0546780862` fail the regex → `Failed`.
- [ ] **Step 4: Run tests to verify they fail**
Run: `cargo test -p whoareyou-provider-hitta`
Expected: COMPILE ERROR — `request_urls`, `parse`, `ParseError` not found.
- [ ] **Step 5: Implement the parser**
Prepend to `crates/providers/hitta/src/parser.rs` (above the test module). This is the 2019 logic ported; **if Task 2 found outcome (b) or the modern `<script id="__NEXT_DATA__">` form, adapt `NEXT_DATA_RE` / the serde structs to the JSON paths recorded in Task 2** — keep the public surface (`request_urls`, `parse`, the three types) exactly as below:
```rust
use std::sync::LazyLock;
use regex::Regex;
use serde::{Deserialize, Serialize};
#[derive(Debug, PartialEq, Serialize)]
pub struct ParsedEntry {
pub messages: Vec<String>,
pub history: Vec<String>,
pub comments: Vec<ParsedComment>,
}
#[derive(Debug, PartialEq, Serialize)]
pub struct ParsedComment {
/// Unix epoch seconds, UTC.
pub timestamp: Option<i64>,
pub title: Option<String>,
pub message: String,
}
#[derive(Debug, PartialEq, Serialize)]
pub enum ParseError {
/// Page fetched and understood, but it contains no data for the number.
NoData,
/// Page structure did not match expectations — scraper rot signal.
Failed(String),
}
static NEXT_DATA_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r"<script>__NEXT_DATA__ = (.*?);__NEXT_LOADED_PAGES__").unwrap()
});
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct Data {
props: Props,
}
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct Props {
page_props: PageProps,
}
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PageProps {
phone_data: Option<PhoneData>,
}
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PhoneData {
#[serde(default)]
comments: Vec<RawComment>,
statistics_text: String,
}
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct RawComment {
comment: String,
/// Milliseconds since epoch.
timestamp: u64,
}
pub fn request_urls(number: &str) -> Vec<String> {
vec![format!("https://www.hitta.se/vem-ringde/{number}")]
}
pub fn parse(body: &str) -> Result<ParsedEntry, ParseError> {
let captures = NEXT_DATA_RE
.captures(body)
.ok_or_else(|| ParseError::Failed("__NEXT_DATA__ not found".to_string()))?;
let json = captures.get(1).unwrap().as_str();
let data: Data = serde_json::from_str(json)
.map_err(|e| ParseError::Failed(format!("deserialize __NEXT_DATA__: {e}")))?;
let Some(phone_data) = data.props.page_props.phone_data else {
return Err(ParseError::NoData);
};
let mut comments: Vec<ParsedComment> = phone_data
.comments
.into_iter()
.map(|c| ParsedComment {
timestamp: Some((c.timestamp / 1000) as i64),
title: None,
message: c.comment,
})
.collect();
comments.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
Ok(ParsedEntry {
messages: Vec::new(),
history: vec![phone_data.statistics_text],
comments,
})
}
```
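Two details of the parser are easy to get wrong when adapting it: the millisecond-to-second conversion and the newest-first ordering. The sketch below isolates both; `millis_to_secs` and `sort_newest_first` are hypothetical helper names, not part of the provider's public surface.

```rust
/// Convert a millisecond epoch timestamp to whole seconds, truncating the
/// fractional part, as `parse` does for each comment.
pub fn millis_to_secs(millis: u64) -> i64 {
    (millis / 1000) as i64
}

/// Sort newest first. `None` compares lower than any `Some`, so undated
/// entries end up last under this descending comparison.
pub fn sort_newest_first(timestamps: &mut [Option<i64>]) {
    timestamps.sort_by(|a, b| b.cmp(a));
}
```

The hitta parser always produces `Some(..)`, so the position of `None` only matters if a later provider emits undated comments.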
- [ ] **Step 6: Run tests to verify they pass**
Run: `cargo test -p whoareyou-provider-hitta`
Expected: all pass except possibly `parses_fresh_fixture` (pending snapshot).
If the fresh-fixture test FAILS to parse (`Failed`/`NoData` against a real
page that visibly has data), the site changed — adapt the regex/structs per
Task 2's notes until the fresh fixture parses, while keeping the 2019-fixture
tests passing (if the old format is truly gone from the new code path, update
those tests' expectations to `Failed` and note it in the commit message).
- [ ] **Step 7: Accept the fresh-fixture snapshot after eyeballing it**
Run: `cargo insta review` (or `cargo insta accept` after inspecting the `.snap.new` file manually)
Expected: snapshot under `crates/providers/hitta/src/snapshots/` showing a plausible entry (or an honest `NoData`/`Failed` for a dead number — verify it matches what the fixture actually contains).
- [ ] **Step 8: Run the full test suite**
Run: `cargo test --workspace`
Expected: PASS.
- [ ] **Step 9: Commit**
```bash
git add crates/providers/hitta .gitignore Cargo.lock
git commit -m "feat: port hitta.se parser as pure native-testable functions"
```
---
### Task 4: hitta component glue (WIT export)
**Files:**
- Create: `crates/providers/hitta/src/component.rs`
- Modify: `crates/providers/hitta/src/lib.rs`, `crates/providers/hitta/Cargo.toml`
- [ ] **Step 1: Add wit-bindgen for wasm32 only**
Append to `crates/providers/hitta/Cargo.toml`:
```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
wit-bindgen = "0.57"
```
- [ ] **Step 2: Write the glue**
`crates/providers/hitta/src/component.rs`:
```rust
use crate::parser;
wit_bindgen::generate!({
world: "provider",
path: "../../../wit",
});
use exports::whoareyou::provider::lookup::{
Comment, Entry, Guest, LookupError, ProviderInfo, Request, Response,
};
struct Component;
impl Guest for Component {
fn metadata() -> ProviderInfo {
ProviderInfo {
name: "hitta.se".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
}
}
fn requests(number: String) -> Vec<Request> {
parser::request_urls(&number)
.into_iter()
.map(|url| Request { url })
.collect()
}
fn parse(_number: String, responses: Vec<Response>) -> Result<Entry, LookupError> {
let Some(first) = responses.first() else {
return Err(LookupError::ParseFailed("no responses provided".to_string()));
};
match parser::parse(&first.body) {
Ok(entry) => Ok(Entry {
messages: entry.messages,
history: entry.history,
comments: entry
.comments
.into_iter()
.map(|c| Comment {
timestamp: c.timestamp,
title: c.title,
message: c.message,
})
.collect(),
}),
Err(parser::ParseError::NoData) => Err(LookupError::NoData),
Err(parser::ParseError::Failed(msg)) => Err(LookupError::ParseFailed(msg)),
}
}
}
export!(Component);
```
- [ ] **Step 3: Gate it into the crate**
`crates/providers/hitta/src/lib.rs`:
```rust
pub mod parser;
#[cfg(target_arch = "wasm32")]
mod component;
```
- [ ] **Step 4: Build the component**
Run: `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta`
Expected: success; `target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm` exists.
- [ ] **Step 5: Verify native tests still pass**
Run: `cargo test -p whoareyou-provider-hitta`
Expected: PASS (glue is cfg'd out natively).
- [ ] **Step 6: Commit**
```bash
git add crates/providers/hitta
git commit -m "feat: export hitta parser as a WASM component via wit-bindgen"
```
---
### Task 5: Server model types
**Files:**
- Create: `crates/server/src/model.rs`
- Modify: `crates/server/src/lib.rs`, `crates/server/Cargo.toml`
- [ ] **Step 1: Add first server dependencies**
In `crates/server/Cargo.toml`:
```toml
[dependencies]
serde = { version = "1", features = ["derive"] }
[dev-dependencies]
serde_json = "1"
```
- [ ] **Step 2: Write the failing test**
`crates/server/src/model.rs` (test module only, types come next step):
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn provider_result_serializes_to_api_shape() {
let ok = ProviderResult::Ok {
entry: Entry {
messages: vec![],
history: vec!["42 andra".to_string()],
comments: vec![Comment {
timestamp: Some(1547746162),
title: None,
message: "Varmsälj".to_string(),
}],
},
};
let json = serde_json::to_value(&ok).unwrap();
assert_eq!(json["status"], "ok");
assert_eq!(json["entry"]["history"][0], "42 andra");
assert_eq!(json["entry"]["comments"][0]["timestamp"], 1547746162);
assert_eq!(
serde_json::to_value(&ProviderResult::NoData).unwrap()["status"],
"no_data"
);
assert_eq!(
serde_json::to_value(&ProviderResult::FetchFailed).unwrap()["status"],
"fetch_failed"
);
assert_eq!(
serde_json::to_value(&ProviderResult::ParseFailed).unwrap()["status"],
"parse_failed"
);
}
}
```
`crates/server/src/lib.rs`:
```rust
pub mod model;
```
- [ ] **Step 3: Run test to verify it fails**
Run: `cargo test -p whoareyou-server`
Expected: COMPILE ERROR — types not defined.
- [ ] **Step 4: Implement the types**
Prepend to `crates/server/src/model.rs`:
```rust
use std::collections::BTreeMap;
use serde::Serialize;
#[derive(Debug, Clone, PartialEq, Serialize)]
pub struct Entry {
pub messages: Vec<String>,
pub history: Vec<String>,
pub comments: Vec<Comment>,
}
#[derive(Debug, Clone, PartialEq, Serialize)]
pub struct Comment {
/// Unix epoch seconds, UTC.
pub timestamp: Option<i64>,
pub title: Option<String>,
pub message: String,
}
/// Per-provider outcome as exposed in the API (and cached).
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(tag = "status", rename_all = "snake_case")]
pub enum ProviderResult {
Ok { entry: Entry },
NoData,
FetchFailed,
ParseFailed,
}
/// A fetched HTTP response handed to a provider's `parse`.
#[derive(Debug, Clone)]
pub struct FetchedResponse {
pub status: u16,
pub body: String,
}
/// Outcome of a provider's `parse` call, before API mapping.
#[derive(Debug)]
pub enum ParseOutcome {
Ok(Entry),
NoData,
Failed(String),
}
#[derive(Debug, Serialize)]
pub struct LookupResponse {
pub number: String,
pub results: BTreeMap<String, ProviderResult>,
}
```
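The `#[serde(tag = "status", rename_all = "snake_case")]` attribute is what produces the `status` strings the test asserts on. As a std-only illustration, the mapping can be written out by hand; `status_tag` is a hypothetical helper, and the `Ok` variant is simplified to carry only a history list instead of a full `Entry`.

```rust
/// Simplified mirror of the server's `ProviderResult` (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderResult {
    Ok { history: Vec<String> },
    NoData,
    FetchFailed,
    ParseFailed,
}

/// The value of the `status` field for each variant: the variant name in
/// snake_case. With internal tagging, unit variants serialize to an object
/// holding only this tag, e.g. {"status":"no_data"}, while `Ok` carries its
/// fields next to it, e.g. {"status":"ok","entry":{...}} in the real type.
pub fn status_tag(result: &ProviderResult) -> &'static str {
    match result {
        ProviderResult::Ok { .. } => "ok",
        ProviderResult::NoData => "no_data",
        ProviderResult::FetchFailed => "fetch_failed",
        ProviderResult::ParseFailed => "parse_failed",
    }
}
```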
- [ ] **Step 5: Run test to verify it passes**
Run: `cargo test -p whoareyou-server`
Expected: PASS.
- [ ] **Step 6: Commit**
```bash
git add crates/server Cargo.lock
git commit -m "feat: add server model types and API serialization shape"
```
---
### Task 6: Errors and fetcher
**Files:**
- Create: `crates/server/src/error.rs`, `crates/server/src/fetch.rs`
- Modify: `crates/server/src/lib.rs`, `crates/server/Cargo.toml`
- [ ] **Step 1: Add dependencies**
Extend `crates/server/Cargo.toml` `[dependencies]`:
```toml
async-trait = "0.1"
reqwest = "0.13"
thiserror = "2"
tokio = { version = "1", features = ["full"] }
wasmtime = { version = "45", features = ["component-model"] }
```
(wasmtime is needed now because `HostError` wraps `wasmtime::Error`.)
- [ ] **Step 2: Write `crates/server/src/error.rs`**
```rust
use thiserror::Error;

/// Errors from hosting/calling a WASM component.
#[derive(Debug, Error)]
pub enum HostError {
    #[error("wasm error: {0}")]
    Wasm(#[from] wasmtime::Error),
    #[error("io error: {0}")]
    Io(#[from] std::io::Error),
}

#[derive(Debug, Error)]
pub enum FetchError {
    #[error("request failed: {0}")]
    Request(#[from] reqwest::Error),
}

#[derive(Debug, Error)]
pub enum ConfigError {
    #[error("invalid value for {key}: {message}")]
    Invalid { key: String, message: String },
}
```
- [ ] **Step 3: Write `crates/server/src/fetch.rs`**
`ReqwestFetcher` implements the `Fetch` trait, which lives in `service.rs` and
does not exist until Task 7. To keep this task compiling on its own, write
`fetch.rs` now but leave it out of `lib.rs`; Task 7 adds the module once the
trait is defined.
`crates/server/src/fetch.rs`:
```rust
use std::time::Duration;

use async_trait::async_trait;

use crate::error::FetchError;
use crate::model::FetchedResponse;
use crate::service::Fetch;

pub struct ReqwestFetcher {
    client: reqwest::Client,
}

impl ReqwestFetcher {
    pub fn new(timeout: Duration) -> Result<Self, FetchError> {
        let client = reqwest::Client::builder()
            .timeout(timeout)
            .user_agent(concat!("whoareyou/", env!("CARGO_PKG_VERSION")))
            .build()?;
        Ok(Self { client })
    }
}

#[async_trait]
impl Fetch for ReqwestFetcher {
    async fn fetch(&self, url: &str) -> Result<FetchedResponse, FetchError> {
        let response = self.client.get(url).send().await?;
        let status = response.status().as_u16();
        let body = response.text().await?;
        Ok(FetchedResponse { status, body })
    }
}
```
- [ ] **Step 4: Wire only `error` into `lib.rs`**
`crates/server/src/lib.rs`:
```rust
pub mod error;
pub mod model;
```
- [ ] **Step 5: Verify it compiles**
Run: `cargo check -p whoareyou-server`
Expected: success (`fetch.rs` is not yet a module, so its `crate::service` import is not compiled).
- [ ] **Step 6: Commit**
```bash
git add crates/server Cargo.lock
git commit -m "feat: add server error types and reqwest fetcher"
```
---
### Task 7: LookupService (orchestration + cache, TDD)
**Files:**
- Create: `crates/server/src/service.rs`
- Modify: `crates/server/src/lib.rs`, `crates/server/Cargo.toml`
- [ ] **Step 1: Add dependencies**
Extend `crates/server/Cargo.toml` `[dependencies]`:
```toml
futures = "0.3"
moka = { version = "0.12", features = ["future"] }
tracing = "0.1"
```
- [ ] **Step 2: Write the failing tests**
`crates/server/src/service.rs`, test module first:
```rust
#[cfg(test)]
mod tests {
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;
use async_trait::async_trait;
use super::*;
use crate::error::{FetchError, HostError};
use crate::model::{Comment, Entry, FetchedResponse, ParseOutcome, ProviderResult};
fn entry() -> Entry {
Entry {
messages: vec![],
history: vec!["history".to_string()],
comments: vec![Comment {
timestamp: Some(1547746162),
title: None,
message: "spam".to_string(),
}],
}
}
/// Provider whose parse outcome is scripted per call.
struct FakeProvider {
name: &'static str,
outcome: fn() -> ParseOutcome,
}
impl ProviderHandle for FakeProvider {
fn name(&self) -> &str {
self.name
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
Ok(vec![format!("https://example.test/{number}")])
}
fn parse(
&self,
_number: &str,
_responses: &[FetchedResponse],
) -> ParseOutcome {
(self.outcome)()
}
}
/// Fetcher that counts calls and can be told to fail.
struct FakeFetcher {
calls: AtomicUsize,
fail: bool,
}
impl FakeFetcher {
fn new(fail: bool) -> Self {
Self { calls: AtomicUsize::new(0), fail }
}
}
#[async_trait]
impl Fetch for FakeFetcher {
async fn fetch(&self, _url: &str) -> Result<FetchedResponse, FetchError> {
self.calls.fetch_add(1, Ordering::SeqCst);
if self.fail {
// Placeholder: `reqwest::Error` cannot be constructed directly, so
// Step 4 replaces this with a real failing request (see the note
// after this block).
unreachable!("replaced in Step 4");
}
Ok(FetchedResponse { status: 200, body: "body".to_string() })
}
}
fn service(
providers: Vec<Arc<dyn ProviderHandle>>,
fetcher: Arc<dyn Fetch>,
) -> LookupService {
LookupService::new(providers, fetcher, Duration::from_secs(60))
}
#[tokio::test]
async fn ok_result_is_returned_and_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::Ok(entry()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![provider], fetcher.clone());
let results = svc.lookup("0700000000").await;
assert_eq!(results["fake.se"], ProviderResult::Ok { entry: entry() });
// second lookup served from cache — fetcher not called again
let results = svc.lookup("0700000000").await;
assert_eq!(results["fake.se"], ProviderResult::Ok { entry: entry() });
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
}
#[tokio::test]
async fn no_data_is_cached() {
let provider = Arc::new(FakeProvider { name: "fake.se", outcome: || ParseOutcome::NoData });
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![provider], fetcher.clone());
assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::NoData);
assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::NoData);
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
}
#[tokio::test]
async fn parse_failure_maps_and_is_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::Failed("rot".to_string()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![provider], fetcher.clone());
assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::ParseFailed);
assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::ParseFailed);
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
}
#[tokio::test]
async fn fetch_failure_is_not_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::NoData,
});
let fetcher = Arc::new(FakeFetcher::new(true));
let svc = service(vec![provider], fetcher.clone());
assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::FetchFailed);
assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::FetchFailed);
// NOT cached: fetcher tried twice
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 2);
}
#[tokio::test]
async fn multiple_providers_keyed_by_name() {
let a = Arc::new(FakeProvider { name: "a.se", outcome: || ParseOutcome::NoData });
let b = Arc::new(FakeProvider {
name: "b.se",
outcome: || ParseOutcome::Ok(entry()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![a, b], fetcher);
let results = svc.lookup("0700000000").await;
assert_eq!(results.len(), 2);
assert_eq!(results["a.se"], ProviderResult::NoData);
assert!(matches!(results["b.se"], ProviderResult::Ok { .. }));
}
}
```
**Fabricating a `FetchError` in tests:** `reqwest::Error` cannot be constructed
directly. Make the fail path real instead of fabricated — in Step 4's
implementation of `FakeFetcher::fetch`, replace the `unreachable!` with an
actual failing request against a closed local port:
```rust
if self.fail {
let err = reqwest::Client::new()
.get("http://127.0.0.1:1/unreachable")
.send()
.await
.unwrap_err();
return Err(FetchError::Request(err));
}
```
(Port 1 is normally not listening, so the connection is refused immediately
and no external network is involved.)
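The fail path relies on nothing accepting connections on local port 1. That assumption can be checked with the standard library alone; this is a minimal sketch, and `local_port_is_closed` is a hypothetical helper that is not part of the plan's code.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true when a TCP connection to 127.0.0.1:<port> cannot be
/// established within one second (refused, unreachable, or timed out).
pub fn local_port_is_closed(port: u16) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, Duration::from_secs(1)).is_err()
}
```

If this ever returns `false` for port 1 on a development machine, the `fetch_failure_is_not_cached` test would need a different closed port.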
- [ ] **Step 3: Run tests to verify they fail**
Run: `cargo test -p whoareyou-server`
Expected: COMPILE ERROR — `ProviderHandle`, `Fetch`, `LookupService` not defined.
- [ ] **Step 4: Implement the service**
Prepend to `crates/server/src/service.rs` (and fix the `FakeFetcher` fail path as noted above):
```rust
use std::collections::BTreeMap;
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use moka::future::Cache;
use tracing::warn;
use crate::error::{FetchError, HostError};
use crate::model::{FetchedResponse, ParseOutcome, ProviderResult};
/// A loaded provider. Implemented by `wasm::WasmProvider`; faked in tests.
/// Methods are sync — WASM calls are CPU-bound; the service wraps them in
/// `spawn_blocking`.
pub trait ProviderHandle: Send + Sync {
fn name(&self) -> &str;
fn requests(&self, number: &str) -> Result<Vec<String>, HostError>;
fn parse(&self, number: &str, responses: &[FetchedResponse]) -> ParseOutcome;
}
#[async_trait]
pub trait Fetch: Send + Sync {
async fn fetch(&self, url: &str) -> Result<FetchedResponse, FetchError>;
}
pub struct LookupService {
providers: Vec<Arc<dyn ProviderHandle>>,
fetcher: Arc<dyn Fetch>,
cache: Cache<String, ProviderResult>,
}
impl LookupService {
pub fn new(
providers: Vec<Arc<dyn ProviderHandle>>,
fetcher: Arc<dyn Fetch>,
cache_ttl: Duration,
) -> Self {
Self {
providers,
fetcher,
cache: Cache::builder().time_to_live(cache_ttl).build(),
}
}
pub fn provider_names(&self) -> Vec<&str> {
self.providers.iter().map(|p| p.name()).collect()
}
/// Run all providers concurrently; one result per provider name.
pub async fn lookup(&self, number: &str) -> BTreeMap<String, ProviderResult> {
let tasks = self.providers.iter().map(|provider| {
let provider = provider.clone();
let fetcher = self.fetcher.clone();
let cache = self.cache.clone();
let number = number.to_string();
async move {
let name = provider.name().to_string();
let key = format!("{name}:{number}");
if let Some(hit) = cache.get(&key).await {
return (name, hit);
}
let result = run_provider(provider, &number, fetcher).await;
// Transient failures must not poison the cache.
if result != ProviderResult::FetchFailed {
cache.insert(key, result.clone()).await;
}
(name, result)
}
});
futures::future::join_all(tasks).await.into_iter().collect()
}
}
async fn run_provider(
provider: Arc<dyn ProviderHandle>,
number: &str,
fetcher: Arc<dyn Fetch>,
) -> ProviderResult {
let name = provider.name().to_string();
let urls = {
let provider = provider.clone();
let number = number.to_string();
match tokio::task::spawn_blocking(move || provider.requests(&number)).await {
Ok(Ok(urls)) => urls,
Ok(Err(error)) => {
warn!(provider = %name, %error, "requests() failed");
return ProviderResult::ParseFailed;
}
Err(error) => {
warn!(provider = %name, %error, "requests() panicked");
return ProviderResult::ParseFailed;
}
}
};
let fetched = futures::future::join_all(urls.iter().map(|url| fetcher.fetch(url))).await;
let mut responses = Vec::with_capacity(fetched.len());
for result in fetched {
match result {
Ok(response) => responses.push(response),
Err(error) => {
warn!(provider = %name, %error, "fetch failed");
return ProviderResult::FetchFailed;
}
}
}
let outcome = {
let provider = provider.clone();
let number = number.to_string();
tokio::task::spawn_blocking(move || provider.parse(&number, &responses)).await
};
match outcome {
Ok(ParseOutcome::Ok(entry)) => ProviderResult::Ok { entry },
Ok(ParseOutcome::NoData) => ProviderResult::NoData,
Ok(ParseOutcome::Failed(message)) => {
warn!(provider = %name, %message, "parse failed — scraper rot?");
ProviderResult::ParseFailed
}
Err(error) => {
warn!(provider = %name, %error, "parse() panicked");
ProviderResult::ParseFailed
}
}
}
```
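The caching rule inside `lookup` (key by provider and number, cache every outcome except a fetch failure) can be isolated from async and WASM concerns. The sketch below uses a plain `HashMap` in place of moka, so it has no TTL and is not concurrent; `Outcome`, `ResultCache`, and `get_or_run` are hypothetical names for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `ProviderResult`.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Ok(String),
    NoData,
    FetchFailed,
    ParseFailed,
}

/// In-memory cache keyed by "{provider}:{number}", like the service's cache.
pub struct ResultCache {
    entries: HashMap<String, Outcome>,
}

impl ResultCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return the cached outcome if present; otherwise call `run` and cache
    /// its result unless it is `FetchFailed`, which is treated as transient.
    pub fn get_or_run(
        &mut self,
        provider: &str,
        number: &str,
        run: impl FnOnce() -> Outcome,
    ) -> Outcome {
        let key = format!("{provider}:{number}");
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        let result = run();
        if result != Outcome::FetchFailed {
            self.entries.insert(key, result.clone());
        }
        result
    }
}
```

Note that `ParseFailed` is cached on purpose: a page that fails to parse will keep failing until the provider is fixed, so retrying on every request would only add load.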
- [ ] **Step 5: Wire modules into `lib.rs`**
`crates/server/src/lib.rs`:
```rust
pub mod error;
pub mod fetch;
pub mod model;
pub mod service;
```
- [ ] **Step 6: Run tests to verify they pass**
Run: `cargo test -p whoareyou-server`
Expected: PASS (all five service tests + model test).
- [ ] **Step 7: Commit**
```bash
git add crates/server Cargo.lock
git commit -m "feat: add LookupService with moka cache and provider orchestration"
```
---
### Task 8: HTTP layer (axum, TDD)
**Files:**
- Create: `crates/server/src/http.rs`
- Modify: `crates/server/src/lib.rs`, `crates/server/Cargo.toml`
- [ ] **Step 1: Add dependencies**
Extend `crates/server/Cargo.toml`:
```toml
[dependencies]
# add:
axum = "0.8"
serde_json = "1"
[dev-dependencies]
# add:
http-body-util = "0.1"
tower = { version = "0.5", features = ["util"] }
```
(`serde_json` moves from dev-dependencies to dependencies — remove the dev entry.)
- [ ] **Step 2: Write the failing tests**
`crates/server/src/http.rs`, test module first:
```rust
#[cfg(test)]
mod tests {
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use axum::body::Body;
use axum::http::{Request, StatusCode};
use http_body_util::BodyExt;
use tower::ServiceExt;
use super::*;
use crate::error::{FetchError, HostError};
use crate::model::{FetchedResponse, ParseOutcome};
use crate::service::{Fetch, LookupService, ProviderHandle};
struct NoDataProvider;
impl ProviderHandle for NoDataProvider {
fn name(&self) -> &str {
"fake.se"
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
Ok(vec![format!("https://example.test/{number}")])
}
fn parse(&self, _: &str, _: &[FetchedResponse]) -> ParseOutcome {
ParseOutcome::NoData
}
}
struct StaticFetcher;
#[async_trait]
impl Fetch for StaticFetcher {
async fn fetch(&self, _: &str) -> Result<FetchedResponse, FetchError> {
Ok(FetchedResponse { status: 200, body: String::new() })
}
}
fn app() -> axum::Router {
let service = LookupService::new(
vec![Arc::new(NoDataProvider)],
Arc::new(StaticFetcher),
Duration::from_secs(60),
);
router(Arc::new(service))
}
#[test]
fn normalize_strips_separators() {
assert_eq!(normalize("0700 00-00.00"), Some("0700000000".to_string()));
assert_eq!(normalize("+46701234567"), Some("+46701234567".to_string()));
}
#[test]
fn normalize_rejects_garbage() {
assert_eq!(normalize("not-a-number"), None);
assert_eq!(normalize(""), None);
assert_eq!(normalize("0"), None);
assert_eq!(normalize("07001231231231231231"), None); // > 15 digits
assert_eq!(normalize("070+123"), None); // '+' not at start
}
#[tokio::test]
async fn lookup_returns_results_keyed_by_provider() {
let response = app()
.oneshot(
Request::builder()
.uri("/api/v1/number/0700%2000-00%2000") // spaces must be percent-encoded or the URI fails to parse
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let bytes = response.into_body().collect().await.unwrap().to_bytes();
let json: serde_json::Value = serde_json::from_slice(&bytes).unwrap();
assert_eq!(json["number"], "0700000000");
assert_eq!(json["results"]["fake.se"]["status"], "no_data");
}
#[tokio::test]
async fn invalid_number_is_400() {
let response = app()
.oneshot(
Request::builder()
.uri("/api/v1/number/banana")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
}
#[tokio::test]
async fn healthz_is_ok() {
let response = app()
.oneshot(Request::builder().uri("/healthz").body(Body::empty()).unwrap())
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
}
}
```
- [ ] **Step 3: Run tests to verify they fail**
Run: `cargo test -p whoareyou-server`
Expected: COMPILE ERROR — `router`, `normalize` not defined.
- [ ] **Step 4: Implement the HTTP layer**
Prepend to `crates/server/src/http.rs`:
```rust
use std::sync::Arc;
use axum::Json;
use axum::Router;
use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum::routing::get;
use serde_json::json;
use crate::model::LookupResponse;
use crate::service::LookupService;
pub fn router(service: Arc<LookupService>) -> Router {
Router::new()
.route("/api/v1/number/{number}", get(lookup_number))
.route("/healthz", get(|| async { "ok" }))
.with_state(service)
}
async fn lookup_number(
State(service): State<Arc<LookupService>>,
Path(raw): Path<String>,
) -> Response {
let Some(number) = normalize(&raw) else {
return (
StatusCode::BAD_REQUEST,
Json(json!({ "error": "invalid phone number" })),
)
.into_response();
};
let results = service.lookup(&number).await;
Json(LookupResponse { number, results }).into_response()
}
/// Strip separators and validate: optional leading '+', then 2 to 15 digits.
pub fn normalize(raw: &str) -> Option<String> {
let cleaned: String = raw
.chars()
.filter(|c| !matches!(c, ' ' | '-' | '.'))
.collect();
let digits = cleaned.strip_prefix('+').unwrap_or(&cleaned);
let valid = (2..=15).contains(&digits.len())
&& digits.chars().all(|c| c.is_ascii_digit());
valid.then_some(cleaned)
}
```
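A few edge cases of the validation rule are easy to miss when reading `normalize`. The sketch below is a standalone copy of the function from Step 4 (same logic, std only) so the cases listed in its doc comment can be checked in isolation; it is an illustration, not an additional module for the plan.

```rust
/// Standalone copy of the plan's `normalize`, for experimenting with inputs.
///
/// Edge cases worth knowing:
/// - a lone "+" has zero digits and is rejected;
/// - parentheses are not in the separator set, so "(070) 1234567" is rejected;
/// - non-ASCII digits (e.g. Arabic-Indic) fail `is_ascii_digit` and are rejected;
/// - the leading '+' is kept in the output, so "+46..." and "0..." forms of
///   the same number normalize to different strings.
pub fn normalize(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| !matches!(c, ' ' | '-' | '.'))
        .collect();
    let digits = cleaned.strip_prefix('+').unwrap_or(&cleaned);
    let valid = (2..=15).contains(&digits.len())
        && digits.chars().all(|c| c.is_ascii_digit());
    valid.then_some(cleaned)
}
```

Because the `+` prefix survives normalization, the international and national spellings of one number would presumably be treated as distinct lookups downstream; whether that matters is a design question for the spec, not something this sketch decides.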
- [ ] **Step 5: Wire the module**
`crates/server/src/lib.rs`:
```rust
pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
```
- [ ] **Step 6: Run tests to verify they pass**
Run: `cargo test -p whoareyou-server`
Expected: PASS.
- [ ] **Step 7: Commit**
```bash
git add crates/server Cargo.lock
git commit -m "feat: add axum HTTP layer with lookup endpoint and healthz"
```
---
### Task 9: wasmtime host (WasmProvider)
**Files:**
- Create: `crates/server/src/wasm.rs`
- Modify: `crates/server/src/lib.rs`, `crates/server/Cargo.toml`
- [ ] **Step 1: Add wasmtime-wasi**
Extend `crates/server/Cargo.toml` `[dependencies]`:
```toml
wasmtime-wasi = "45"
```
- [ ] **Step 2: Write `crates/server/src/wasm.rs`**
> **API-drift note:** the `WasiView`/`WasiCtxView` shape below matches recent
> wasmtime-wasi releases as of this plan's writing. If `cargo check` disagrees,
> consult https://docs.rs/wasmtime-wasi/45 — the intent is fixed: a store data
> struct holding `WasiCtx` + `ResourceTable`, WASI added to the linker through the synchronous API,
> no preopens / no env / no inherited stdio. Adapt mechanically; do not change
> the public surface of this module.
```rust
use std::path::Path;
use wasmtime::component::{Component, Linker};
use wasmtime::{Config, Engine, Store};
use wasmtime_wasi::ResourceTable;
use wasmtime_wasi::p2::{WasiCtx, WasiCtxBuilder, WasiCtxView, WasiView};
use crate::error::HostError;
use crate::model::{Comment, Entry, FetchedResponse, ParseOutcome};
use crate::service::ProviderHandle;
wasmtime::component::bindgen!({
world: "provider",
path: "../../wit",
});
use exports::whoareyou::provider::lookup::{LookupError as WitLookupError, Response as WitResponse};
/// How many epoch ticks a guest call may run. The epoch thread ticks every
/// 100 ms → 50 ticks ≈ 5 s budget per call.
const EPOCH_DEADLINE_TICKS: u64 = 50;
pub const EPOCH_TICK: std::time::Duration = std::time::Duration::from_millis(100);
pub struct HostState {
ctx: WasiCtx,
table: ResourceTable,
}
impl WasiView for HostState {
fn ctx(&mut self) -> WasiCtxView<'_> {
WasiCtxView { ctx: &mut self.ctx, table: &mut self.table }
}
}
pub fn engine() -> Result<Engine, HostError> {
let mut config = Config::new();
config.epoch_interruption(true);
Ok(Engine::new(&config)?)
}
pub fn linker(engine: &Engine) -> Result<Linker<HostState>, HostError> {
let mut linker = Linker::new(engine);
wasmtime_wasi::p2::add_to_linker_sync(&mut linker)?;
Ok(linker)
}
/// Spawn the thread that advances the engine epoch so runaway guest calls
/// trap instead of hanging the service. Call once at startup.
pub fn spawn_epoch_thread(engine: &Engine) {
let engine = engine.clone();
std::thread::spawn(move || {
loop {
std::thread::sleep(EPOCH_TICK);
engine.increment_epoch();
}
});
}
pub struct WasmProvider {
name: String,
version: String,
engine: Engine,
pre: ProviderPre<HostState>,
}
impl WasmProvider {
/// Compile a component from disk and read its metadata once.
/// Fails fast if the component does not satisfy the provider world.
pub fn load(
engine: &Engine,
linker: &Linker<HostState>,
path: &Path,
) -> Result<Self, HostError> {
let component = Component::from_file(engine, path)?;
let pre = ProviderPre::new(linker.instantiate_pre(&component)?)?;
let mut provider = Self {
name: String::new(),
version: String::new(),
engine: engine.clone(),
pre,
};
let mut store = provider.new_store();
let instance = provider.pre.instantiate(&mut store)?;
let info = instance.whoareyou_provider_lookup().call_metadata(&mut store)?;
provider.name = info.name;
provider.version = info.version;
Ok(provider)
}
pub fn version(&self) -> &str {
&self.version
}
fn new_store(&self) -> Store<HostState> {
// No preopens, no env, no inherited stdio — fully sandboxed guest.
let ctx = WasiCtxBuilder::new().build();
let mut store = Store::new(
&self.engine,
HostState { ctx, table: ResourceTable::new() },
);
store.set_epoch_deadline(EPOCH_DEADLINE_TICKS);
store
}
}
impl ProviderHandle for WasmProvider {
fn name(&self) -> &str {
&self.name
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
let mut store = self.new_store();
let instance = self.pre.instantiate(&mut store)?;
let requests = instance
.whoareyou_provider_lookup()
.call_requests(&mut store, number)?;
Ok(requests.into_iter().map(|r| r.url).collect())
}
fn parse(&self, number: &str, responses: &[FetchedResponse]) -> ParseOutcome {
let wit_responses: Vec<WitResponse> = responses
.iter()
.map(|r| WitResponse { status: r.status, body: r.body.clone() })
.collect();
let mut store = self.new_store();
let result = (|| {
let instance = self.pre.instantiate(&mut store)?;
instance
.whoareyou_provider_lookup()
.call_parse(&mut store, number, &wit_responses)
})();
match result {
Ok(Ok(entry)) => ParseOutcome::Ok(Entry {
messages: entry.messages,
history: entry.history,
comments: entry
.comments
.into_iter()
.map(|c| Comment {
timestamp: c.timestamp,
title: c.title,
message: c.message,
})
.collect(),
}),
Ok(Err(WitLookupError::NoData)) => ParseOutcome::NoData,
Ok(Err(WitLookupError::ParseFailed(message))) => ParseOutcome::Failed(message),
// Trap (incl. epoch deadline exceeded) or instantiation failure.
Err(error) => ParseOutcome::Failed(format!("component error: {error}")),
}
}
}
```
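The per-call budget in `wasm.rs` is `EPOCH_DEADLINE_TICKS × EPOCH_TICK` = 50 × 100 ms ≈ 5 s, measured from the moment the store is created. The following toy model shows the same idea without wasmtime: a ticker thread bumps a shared counter, and a worker gives up once the counter passes its deadline. All names here (`Epoch`, `run_bounded`) are hypothetical, and the cooperative check stands in for the trap that wasmtime raises inside guest code; it is a sketch of the mechanism, not of wasmtime's implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Toy stand-in for the engine epoch: a counter advanced by a ticker thread.
pub struct Epoch(Arc<AtomicU64>);

impl Epoch {
    /// Start a ticker that increments the counter once per `tick`.
    /// The thread exits once this `Epoch` handle has been dropped.
    pub fn start(tick: Duration) -> Self {
        let counter = Arc::new(AtomicU64::new(0));
        let ticker = Arc::clone(&counter);
        thread::spawn(move || {
            while Arc::strong_count(&ticker) > 1 {
                thread::sleep(tick);
                ticker.fetch_add(1, Ordering::Relaxed);
            }
        });
        Epoch(counter)
    }

    pub fn now(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Call `step` until it yields a value or `budget_ticks` epochs have passed.
pub fn run_bounded<T, F>(epoch: &Epoch, budget_ticks: u64, mut step: F) -> Result<T, &'static str>
where
    F: FnMut() -> Option<T>,
{
    // Like `set_epoch_deadline`, the deadline is relative to the current epoch.
    let deadline = epoch.now() + budget_ticks;
    loop {
        if let Some(value) = step() {
            return Ok(value);
        }
        if epoch.now() >= deadline {
            return Err("deadline exceeded");
        }
    }
}
```

Since the deadline is relative to the epoch at store creation, the "fresh Store per call" pattern in `new_store` is what gives every `requests`/`parse` call its own full budget.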
- [ ] **Step 3: Wire the module**
`crates/server/src/lib.rs`:
```rust
pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
pub mod wasm;
```
- [ ] **Step 4: Verify it compiles (adapt API drift here if needed)**
Run: `cargo check -p whoareyou-server`
Expected: success. If `WasiView`/`WasiCtxView`/`add_to_linker_sync` signatures
drifted in wasmtime-wasi 45, fix per the docs.rs note above and re-check.
- [ ] **Step 5: Run all tests**
Run: `cargo test -p whoareyou-server`
Expected: PASS (no new tests — real coverage lands in Task 10's integration test).
- [ ] **Step 6: Commit**
```bash
git add crates/server Cargo.lock
git commit -m "feat: add wasmtime host with epoch-bounded WasmProvider"
```
---
### Task 10: Component integration test
Proves the WIT boundary end-to-end: the real `.wasm` built from Task 4, loaded by the real host from Task 9.
**Files:**
- Create: `crates/server/tests/component.rs`
- [ ] **Step 1: Build the component**
Run: `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta`
Expected: `target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm` exists.
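The artifact name differs from the package name because Cargo replaces hyphens with underscores when naming library outputs. A tiny sketch of that mapping, assuming the crate does not override its library name in `[lib]` (`component_file_name` is a hypothetical helper, not part of the plan):

```rust
/// Hypothetical helper: the `.wasm` file name Cargo emits for a package,
/// assuming the default library name (hyphens become underscores).
pub fn component_file_name(package: &str) -> String {
    format!("{}.wasm", package.replace('-', "_"))
}
```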
- [ ] **Step 2: Write the integration test**
`crates/server/tests/component.rs`:
```rust
use std::path::Path;
use whoareyou_server::model::{FetchedResponse, ParseOutcome};
use whoareyou_server::service::ProviderHandle;
use whoareyou_server::wasm;
const COMPONENT_PATH: &str = concat!(
env!("CARGO_MANIFEST_DIR"),
"/../../target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm"
);
fn load_provider() -> wasm::WasmProvider {
let path = Path::new(COMPONENT_PATH);
assert!(
path.exists(),
"hitta component not built — run `just build-components` first"
);
let engine = wasm::engine().unwrap();
let linker = wasm::linker(&engine).unwrap();
wasm::spawn_epoch_thread(&engine);
wasm::WasmProvider::load(&engine, &linker, path).unwrap()
}
#[test]
fn metadata_identifies_hitta() {
let provider = load_provider();
assert_eq!(provider.name(), "hitta.se");
assert!(!provider.version().is_empty());
}
#[test]
fn requests_contain_the_number() {
let provider = load_provider();
let urls = provider.requests("0104754350").unwrap();
assert_eq!(urls, vec!["https://www.hitta.se/vem-ringde/0104754350"]);
}
#[test]
fn parse_roundtrips_a_fixture_through_wasm() {
let provider = load_provider();
let body = include_str!("../../../fixtures/hitta/0104754350.html").to_string();
let outcome = provider.parse(
"0104754350",
&[FetchedResponse { status: 200, body }],
);
let ParseOutcome::Ok(entry) = outcome else {
panic!("expected Ok entry, got {outcome:?}");
};
assert_eq!(entry.history, vec!["42 andra har rapporterat detta nummer"]);
assert_eq!(entry.comments.len(), 29);
assert_eq!(entry.comments[0].timestamp, Some(1547746162));
}
#[test]
fn parse_maps_no_data() {
let provider = load_provider();
let body = include_str!("../../../fixtures/hitta/0313908905.html").to_string();
let outcome = provider.parse(
"0313908905",
&[FetchedResponse { status: 200, body }],
);
assert!(matches!(outcome, ParseOutcome::NoData), "got {outcome:?}");
}
```
- [ ] **Step 3: Run the integration test**
Run: `cargo test -p whoareyou-server --test component`
Expected: 4 tests PASS. (If `0104754350.html` parse expectations changed in
Task 3 Step 6's contingency branch, mirror the same expectations here.)
- [ ] **Step 4: Commit**
```bash
git add crates/server/tests
git commit -m "test: prove WIT boundary with real component integration test"
```
---
### Task 11: Config + main wiring
**Files:**
- Create: `crates/server/src/config.rs`
- Modify: `crates/server/src/main.rs`, `crates/server/src/lib.rs`, `crates/server/Cargo.toml`
- [ ] **Step 1: Add binary dependencies**
Extend `crates/server/Cargo.toml` `[dependencies]`:
```toml
anyhow = "1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
```
- [ ] **Step 2: Write the failing config tests**
`crates/server/src/config.rs`, test module first:
```rust
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use super::*;
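// The closure owns `map`, so no `'_` bound is needed (with three elided input
// lifetimes, `'_` in the return type would not resolve anyway).
fn env(pairs: &[(&str, &str)]) -> impl Fn(&str) -> Option<String> {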
let map: HashMap<String, String> = pairs
.iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect();
move |key: &str| map.get(key).cloned()
}
#[test]
fn defaults_apply_when_unset() {
let config = AppConfig::from_lookup(env(&[])).unwrap();
assert_eq!(config.listen.to_string(), "127.0.0.1:8080");
assert_eq!(config.components_dir, std::path::PathBuf::from("components"));
assert_eq!(config.cache_ttl, std::time::Duration::from_secs(24 * 3600));
assert_eq!(config.fetch_timeout, std::time::Duration::from_secs(10));
}
#[test]
fn env_overrides_apply() {
let config = AppConfig::from_lookup(env(&[
("WHOAREYOU_LISTEN", "0.0.0.0:9000"),
("WHOAREYOU_COMPONENTS_DIR", "/opt/providers"),
("WHOAREYOU_CACHE_TTL_HOURS", "1"),
("WHOAREYOU_FETCH_TIMEOUT_SECS", "30"),
]))
.unwrap();
assert_eq!(config.listen.to_string(), "0.0.0.0:9000");
assert_eq!(config.components_dir, std::path::PathBuf::from("/opt/providers"));
assert_eq!(config.cache_ttl, std::time::Duration::from_secs(3600));
assert_eq!(config.fetch_timeout, std::time::Duration::from_secs(30));
}
#[test]
fn invalid_values_error() {
assert!(AppConfig::from_lookup(env(&[("WHOAREYOU_LISTEN", "not-an-addr")])).is_err());
assert!(AppConfig::from_lookup(env(&[("WHOAREYOU_CACHE_TTL_HOURS", "soon")])).is_err());
}
}
```
- [ ] **Step 3: Run tests to verify they fail**
First add `pub mod config;` to `crates/server/src/lib.rs`, otherwise the new file is never compiled.
Run: `cargo test -p whoareyou-server config`
Expected: COMPILE ERROR — `AppConfig` not defined.
- [ ] **Step 4: Implement config**
Prepend to `crates/server/src/config.rs`:
```rust
use std::net::SocketAddr;
use std::path::PathBuf;
use std::time::Duration;
use crate::error::ConfigError;
#[derive(Debug)]
pub struct AppConfig {
pub listen: SocketAddr,
pub components_dir: PathBuf,
pub cache_ttl: Duration,
pub fetch_timeout: Duration,
}
impl AppConfig {
pub fn from_env() -> Result<Self, ConfigError> {
Self::from_lookup(|key| std::env::var(key).ok())
}
pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, ConfigError> {
let listen = match get("WHOAREYOU_LISTEN") {
Some(value) => value.parse().map_err(|e| ConfigError::Invalid {
key: "WHOAREYOU_LISTEN".to_string(),
message: format!("{e}"),
})?,
None => SocketAddr::from(([127, 0, 0, 1], 8080)),
};
let components_dir = get("WHOAREYOU_COMPONENTS_DIR")
.map(PathBuf::from)
.unwrap_or_else(|| PathBuf::from("components"));
let cache_ttl_hours: u64 = parse_or("WHOAREYOU_CACHE_TTL_HOURS", &get, 24)?;
let fetch_timeout_secs: u64 = parse_or("WHOAREYOU_FETCH_TIMEOUT_SECS", &get, 10)?;
Ok(Self {
listen,
components_dir,
// saturating_mul: an absurdly large TTL clamps instead of overflowing.
cache_ttl: Duration::from_secs(cache_ttl_hours.saturating_mul(3600)),
fetch_timeout: Duration::from_secs(fetch_timeout_secs),
})
}
}
fn parse_or(
key: &str,
get: &impl Fn(&str) -> Option<String>,
default: u64,
) -> Result<u64, ConfigError> {
match get(key) {
Some(value) => value.parse().map_err(|e| ConfigError::Invalid {
key: key.to_string(),
message: format!("{e}"),
}),
None => Ok(default),
}
}
```
`crates/server/src/lib.rs` final state:
```rust
pub mod config;
pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
pub mod wasm;
```
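`parse_or` in the config implementation is fixed to `u64`, which is why `WHOAREYOU_LISTEN` gets its own hand-written `match`. One possible variation, sketched here under the assumption that a generic helper is acceptable, is to make it generic over `FromStr` so both kinds of value share one code path. `Invalid` below is a hypothetical stand-in for `ConfigError::Invalid`:

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Hypothetical stand-in for the plan's `ConfigError::Invalid` variant.
#[derive(Debug, PartialEq)]
pub struct Invalid {
    pub key: String,
    pub message: String,
}

/// Look up `key`; parse it as `T` if present, fall back to `default` if not.
/// A present but unparsable value is an error, never a silent default.
pub fn parse_or<T>(
    key: &str,
    get: &impl Fn(&str) -> Option<String>,
    default: T,
) -> Result<T, Invalid>
where
    T: FromStr,
    T::Err: Display,
{
    match get(key) {
        Some(value) => value.parse().map_err(|e: T::Err| Invalid {
            key: key.to_string(),
            message: e.to_string(),
        }),
        None => Ok(default),
    }
}
```

With this shape, the listen address could be read as `parse_or("WHOAREYOU_LISTEN", &get, SocketAddr::from(([127, 0, 0, 1], 8080)))`. The plan's non-generic version is equally valid; this is only an option if the duplication becomes annoying.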
- [ ] **Step 5: Run tests to verify they pass**
Run: `cargo test -p whoareyou-server config`
Expected: PASS.
- [ ] **Step 6: Write `main.rs`**
`crates/server/src/main.rs`:
```rust
use std::sync::Arc;
use anyhow::Context;
use tracing::info;
use tracing_subscriber::EnvFilter;
use whoareyou_server::config::AppConfig;
use whoareyou_server::fetch::ReqwestFetcher;
use whoareyou_server::service::{LookupService, ProviderHandle};
use whoareyou_server::{http, wasm};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
tracing_subscriber::fmt()
.with_env_filter(
EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
)
.init();
let config = AppConfig::from_env()?;
let engine = wasm::engine()?;
let linker = wasm::linker(&engine)?;
wasm::spawn_epoch_thread(&engine);
let mut providers: Vec<Arc<dyn ProviderHandle>> = Vec::new();
let dir = std::fs::read_dir(&config.components_dir).with_context(|| {
format!("reading components dir {:?}", config.components_dir)
})?;
for entry in dir {
let path = entry?.path();
if path.extension().is_some_and(|ext| ext == "wasm") {
let provider = wasm::WasmProvider::load(&engine, &linker, &path)
.with_context(|| format!("loading component {path:?}"))?;
info!(
name = provider.name(),
version = provider.version(),
?path,
"loaded provider"
);
providers.push(Arc::new(provider));
}
}
anyhow::ensure!(
!providers.is_empty(),
"no .wasm components found in {:?}",
config.components_dir
);
let fetcher = Arc::new(ReqwestFetcher::new(config.fetch_timeout)?);
let service = Arc::new(LookupService::new(providers, fetcher, config.cache_ttl));
let app = http::router(service);
let listener = tokio::net::TcpListener::bind(config.listen).await?;
info!("listening on http://{}", config.listen);
axum::serve(listener, app).await?;
Ok(())
}
```
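`std::fs::read_dir` yields entries in a platform- and filesystem-dependent order, so the provider load order in `main.rs` can differ between machines. If a stable order is wanted (for example, for reproducible startup logs), the discovery loop could be factored into a helper that sorts its result. A minimal sketch; `wasm_paths` is a hypothetical name, and skipping non-file entries is an added assumption the plan does not make:

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Collect the `.wasm` files directly inside `dir`, sorted by path.
pub fn wasm_paths(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        // Assumption: only regular files count; a directory named `x.wasm` is skipped.
        if path.is_file() && path.extension().is_some_and(|ext| ext == "wasm") {
            paths.push(path);
        }
    }
    paths.sort();
    Ok(paths)
}
```

`main` would then iterate over `wasm_paths(&config.components_dir)?` and call `WasmProvider::load` for each path, keeping the existing `ensure!` on an empty result.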
- [ ] **Step 7: Full workspace check + tests**
Run: `cargo test --workspace`
Expected: PASS.
- [ ] **Step 8: Smoke-test the real service (network)**
```bash
mkdir -p components
cp target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm components/hitta.wasm
cargo run -p whoareyou-server &
sleep 3
curl -s http://127.0.0.1:8080/healthz
curl -s "http://127.0.0.1:8080/api/v1/number/0104754350" | python3 -m json.tool
kill %1
```
Expected: `ok` from healthz; the lookup returns JSON with a `results["hitta.se"]`
object whose `status` is one of `ok`/`no_data`/`parse_failed`. This step hits the
live site: `parse_failed` here while the fixture tests pass means hitta.se serves
different markup to the server's User-Agent. If that happens, record it as a
follow-up issue; it does not block this task.
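The fixed `sleep 3` in the smoke test can be too short when `cargo run` still has to compile. If this check is ever automated, polling the port is more robust than sleeping. A small std-only sketch of such a poller (`wait_for_port` is a hypothetical helper; the 200 ms connect timeout and 100 ms retry interval are arbitrary choices):

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Return true as soon as a TCP connection to `addr` succeeds,
/// or false once `timeout` has elapsed without a successful connect.
pub fn wait_for_port(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(100));
    }
}
```

A successful connect only shows that the listener is bound; the `curl` against `/healthz` is still what confirms the router is serving.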
- [ ] **Step 9: Commit**
```bash
git add crates/server Cargo.lock
git commit -m "feat: wire config, component loading, and axum serve in main"
```
---
### Task 12: justfile, docs, cleanup
**Files:**
- Create: `justfile`
- Modify: `fetch-fixture`, `README.md`, `CLAUDE.md`
- [ ] **Step 1: Write the `justfile`**
```just
# Build provider components and copy them where the server looks.
build-components:
    cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta
    mkdir -p components
    cp target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm components/hitta.wasm

# Full build: components first, then the server.
build: build-components
    cargo build --release

# All tests (the integration test needs the built component).
test: build-components
    cargo test --workspace

# Run the service locally.
run: build-components
    cargo run -p whoareyou-server

fmt:
    cargo +nightly fmt

lint:
    cargo clippy --workspace
```
- [ ] **Step 2: Verify `just test` works end to end**
Run: `just test`
Expected: builds the component, all tests PASS.
- [ ] **Step 3: Trim `fetch-fixture` to live providers**
Replace `fetch-fixture` contents:
```bash
#!/bin/bash
# Refresh HTML fixtures for provider parser tests.
# Usage: ./fetch-fixture <number>
set -euo pipefail
curl -sL -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
"https://www.hitta.se/vem-ringde/$1" \
-o "fixtures/hitta/$1.html"
echo "fixtures/hitta/$1.html: $(wc -c < "fixtures/hitta/$1.html") bytes"
```
- [ ] **Step 4: Rewrite `README.md`**
````markdown
# whoareyou
Who is calling me? A self-hosted HTTP service that looks up Swedish phone
numbers across reverse-lookup sites. Providers are sandboxed WASM components.
## Usage
```shell
$ just run
$ curl "http://127.0.0.1:8080/api/v1/number/0700000000"
```
## Configuration (env)
| Variable | Default |
|---|---|
| `WHOAREYOU_LISTEN` | `127.0.0.1:8080` |
| `WHOAREYOU_COMPONENTS_DIR` | `components` |
| `WHOAREYOU_CACHE_TTL_HOURS` | `24` |
| `WHOAREYOU_FETCH_TIMEOUT_SECS` | `10` |
## Development
```shell
$ rustup target add wasm32-wasip2
$ just test
```
Provider contract lives in `wit/provider.wit`. See
`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
````
- [ ] **Step 5: Rewrite `CLAUDE.md`**
````markdown
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What this is
A self-hosted HTTP service that looks up Swedish phone numbers ("who is
calling me?") by scraping reverse-lookup sites. Providers are WASM components
(Component Model / WASI p2) loaded from a directory at startup; the host does
all fetching and caching. Design spec:
`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
## Commands
```bash
just test # build components + run all tests (preferred)
just run # build components + run the service
just build # release build of everything
cargo test -p whoareyou-provider-hitta # provider parser tests (native, no WASM)
cargo test -p whoareyou-server --test component # WIT-boundary integration test
cargo +nightly fmt # always nightly, not stable
cargo clippy --workspace
./fetch-fixture <number> # refresh an HTML fixture from hitta.se
```
The integration test needs the component built first — run via `just test`,
or `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta`
before bare `cargo test`.
## Architecture
- `wit/provider.wit` — the provider contract (`metadata`/`requests`/`parse`).
Components are pure: no network, no filesystem. The HOST fetches URLs.
- `crates/providers/hitta` — parse logic in `parser.rs` is plain Rust,
unit-tested natively against `fixtures/hitta/*.html`; `component.rs` is
thin WIT glue, compiled only for `wasm32` (`cargo test` never touches WASM
here).
- `crates/server` — lib + thin bin. `service.rs` holds the `ProviderHandle` +
`Fetch` traits and `LookupService` (moka cache, TTL 24h, key
`provider:number`; fetch failures are NOT cached). `wasm.rs` implements
`ProviderHandle` over wasmtime (fresh Store per call, epoch deadline ≈5s).
`http.rs` is axum: `GET /api/v1/number/{number}`, `GET /healthz`.
## Gotchas
- Components build with plain `cargo build --target wasm32-wasip2` — no
cargo-component. Output name uses underscores:
`whoareyou_provider_hitta.wasm`; the justfile copies it to
`components/hitta.wasm` (gitignored).
- One provider failing maps to a per-provider `status` in the JSON response —
never a non-200 for the whole lookup. `parse_failed` in logs (WARN) means a
site changed its markup: refresh a fixture with `./fetch-fixture` and fix
the parser.
- `ParseError::NoData` vs `Failed`: a fetched page with no phone data is
NoData (normal); a page that doesn't match the expected structure is Failed
(scraper rot). Don't conflate them.
````
- [ ] **Step 6: Final verification**
Run: `just test && cargo clippy --workspace && cargo +nightly fmt -- --check`
Expected: tests pass, no clippy errors (warnings OK to fix or note), fmt clean.
- [ ] **Step 7: Commit**
```bash
git add justfile fetch-fixture README.md CLAUDE.md
git commit -m "docs: add justfile and rewrite README/CLAUDE.md for service architecture"
```
---
## Out of scope (per spec)
Container image · k8s/Pithos/CI · provider upload/enable-disable · more providers · host-fetch import for multi-step providers · lookup history / persistent cache · metrics.