whoareyou/docs/superpowers/plans/2026-06-05-wasm-provider-service.md
2026-06-05 14:34:29 +02:00


WASM Provider Service Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Rebuild whoareyou as an async HTTP service that looks up Swedish phone numbers via WASM-component providers (hitta.se in v1), retiring the CLI.

Architecture: Cargo workspace with an axum server hosting wasmtime; providers are pure WASM components (WIT contract: metadata/requests/parse) — the host fetches all URLs and caches parsed results in moka. Provider parse logic is plain Rust, unit-tested natively against HTML fixtures; WIT glue is a thin cfg(wasm32) layer.
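The division of labour above (the provider only declares URLs and parses bodies, while the host owns all I/O) can be sketched without WASM. This is a minimal synchronous sketch under stated assumptions: the `Provider` trait, `EchoProvider`, and `run_lookup` are hypothetical stand-ins for the WIT `lookup` interface, and a `HashMap` replaces the network. The real host is async and calls into wasmtime.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for the WIT `lookup` interface (hypothetical).
trait Provider {
    fn name(&self) -> &str;
    /// Phase 1: which URLs the host should fetch for `number`.
    fn requests(&self, number: &str) -> Vec<String>;
    /// Phase 2: turn the fetched bodies into a result; no I/O happens here.
    fn parse(&self, number: &str, bodies: &[String]) -> Result<Vec<String>, String>;
}

/// Hypothetical provider that returns the fetched bodies unchanged.
struct EchoProvider;

impl Provider for EchoProvider {
    fn name(&self) -> &str {
        "echo.example"
    }

    fn requests(&self, number: &str) -> Vec<String> {
        vec![format!("https://echo.example/{number}")]
    }

    fn parse(&self, _number: &str, bodies: &[String]) -> Result<Vec<String>, String> {
        if bodies.is_empty() {
            return Err("no responses provided".to_string());
        }
        Ok(bodies.to_vec())
    }
}

/// Host side: ask for URLs, fetch them (from an in-memory map standing in
/// for the network), then hand the bodies back to the provider.
fn run_lookup(
    provider: &dyn Provider,
    number: &str,
    web: &HashMap<String, String>,
) -> Result<Vec<String>, String> {
    let mut bodies = Vec::new();
    for url in provider.requests(number) {
        let body = web
            .get(&url)
            .ok_or_else(|| format!("fetch failed: {url}"))?;
        bodies.push(body.clone());
    }
    provider.parse(number, &bodies)
}

fn main() {
    let mut web = HashMap::new();
    web.insert(
        "https://echo.example/0700000000".to_string(),
        "<html>...</html>".to_string(),
    );
    let provider = EchoProvider;
    println!("{}: {:?}", provider.name(), run_lookup(&provider, "0700000000", &web));
}
```

Because the provider never performs I/O, its logic stays deterministic and testable against fixtures, which is what makes the native parser tests in Task 3 possible.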

Tech Stack: Rust edition 2024 · tokio · axum 0.8 · reqwest 0.13 · moka 0.12 · wasmtime + wasmtime-wasi 45 · wit-bindgen 0.57 · thiserror 2 · tracing · insta 1.47

Spec: docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md


File structure

whoareyou/
├── Cargo.toml                          # workspace (NEW)
├── justfile                            # build orchestration (NEW)
├── wit/provider.wit                    # provider contract (NEW)
├── crates/
│   ├── server/                         # package whoareyou-server (lib + bin)
│   │   ├── Cargo.toml
│   │   ├── src/lib.rs                  # module exports
│   │   ├── src/main.rs                 # wiring only
│   │   ├── src/config.rs               # env config
│   │   ├── src/error.rs                # HostError, FetchError, ConfigError
│   │   ├── src/model.rs                # Entry, Comment, ProviderResult, API types
│   │   ├── src/service.rs              # ProviderHandle + Fetch traits, LookupService
│   │   ├── src/fetch.rs                # ReqwestFetcher
│   │   ├── src/http.rs                 # axum router, normalize()
│   │   ├── src/wasm.rs                 # wasmtime host, WasmProvider
│   │   └── tests/component.rs          # loads the real .wasm
│   └── providers/hitta/                # package whoareyou-provider-hitta (cdylib+rlib)
│       ├── Cargo.toml
│       ├── src/lib.rs
│       ├── src/parser.rs               # pure parse logic + native tests
│       └── src/component.rs            # wit-bindgen glue (wasm32 only)
├── fixtures/hitta/*.html               # KEPT (+ one fresh fixture)
├── fetch-fixture                       # KEPT, trimmed to hitta
└── DELETED: src/, definitions/, _build.rs, NOTEPAD.md, old Cargo.toml contents

whoareyou-server is a lib + thin bin so tests/component.rs can use its modules.


Task 1: Workspace scaffold & demolition

Files:

  • Delete: src/, definitions/, _build.rs, NOTEPAD.md

  • Create: Cargo.toml (workspace), wit/provider.wit, crates/server/{Cargo.toml,src/lib.rs,src/main.rs}, crates/providers/hitta/{Cargo.toml,src/lib.rs}

  • Modify: .gitignore

  • Step 1: Install the wasm target

Run: rustup target add wasm32-wasip2 Expected: installs or "is up to date".

  • Step 2: Delete the old code
git rm -r src definitions _build.rs NOTEPAD.md

(The old hitta parser is reproduced in Task 3, so nothing from the deleted tree is needed.)

  • Step 3: Write the workspace Cargo.toml (replaces the old package manifest)
[workspace]
resolver = "3"
members = ["crates/server", "crates/providers/hitta"]

[workspace.package]
version = "0.1.0"
edition = "2024"
authors = ["Anders Olsson <anders.e.olsson@gmail.com>"]
  • Step 4: Write wit/provider.wit
package whoareyou:provider@0.1.0;

interface lookup {
    record provider-info {
        name: string,
        version: string,
    }

    record request {
        url: string,
    }

    record response {
        status: u16,
        body: string,
    }

    record comment {
        timestamp: option<s64>,
        title: option<string>,
        message: string,
    }

    record entry {
        messages: list<string>,
        history: list<string>,
        comments: list<comment>,
    }

    variant lookup-error {
        no-data,
        parse-failed(string),
    }

    metadata: func() -> provider-info;
    requests: func(number: string) -> list<request>;
    parse: func(number: string, responses: list<response>) -> result<entry, lookup-error>;
}

world provider {
    export lookup;
}
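For orientation, the WIT types above correspond roughly to the hand-written Rust below. This is an approximation of wit-bindgen's naming conventions (kebab-case names become UpperCamelCase types and snake_case fields; `option<T>` becomes `Option<T>`, `list<T>` becomes `Vec<T>`, `s64` becomes `i64`), not the generated code itself; the glue in Task 4 relies on these names. The `summarize` helper is hypothetical.

```rust
// Hand-written mirror of the WIT `lookup` types (approximation, not wit-bindgen output).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderInfo {
    pub name: String,
    pub version: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub url: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Comment {
    pub timestamp: Option<i64>, // option<s64>
    pub title: Option<String>,  // option<string>
    pub message: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub messages: Vec<String>,  // list<string>
    pub history: Vec<String>,   // list<string>
    pub comments: Vec<Comment>, // list<comment>
}

#[derive(Debug, Clone, PartialEq)]
pub enum LookupError {
    NoData,              // no-data
    ParseFailed(String), // parse-failed(string)
}

/// Hypothetical helper: one-line description of a `parse` result,
/// i.e. of `result<entry, lookup-error>`.
pub fn summarize(result: &Result<Entry, LookupError>) -> String {
    match result {
        Ok(entry) => format!("{} comment(s)", entry.comments.len()),
        Err(LookupError::NoData) => "no data".to_string(),
        Err(LookupError::ParseFailed(why)) => format!("parse failed: {why}"),
    }
}

fn main() {
    let info = ProviderInfo { name: "hitta.se".to_string(), version: "0.1.0".to_string() };
    let req = Request { url: "https://www.hitta.se/vem-ringde/0700000000".to_string() };
    let resp = Response { status: 200, body: String::new() };
    println!("{info:?} {req:?} status={} len={}", resp.status, resp.body.len());
    println!("{}", summarize(&Err(LookupError::NoData)));
}
```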
  • Step 5: Create the server crate stub

crates/server/Cargo.toml:

[package]
name = "whoareyou-server"
version.workspace = true
edition.workspace = true
authors.workspace = true

[dependencies]

[dev-dependencies]

crates/server/src/lib.rs:

// modules added as they are implemented

crates/server/src/main.rs:

fn main() {}
  • Step 6: Create the hitta provider crate stub

crates/providers/hitta/Cargo.toml:

[package]
name = "whoareyou-provider-hitta"
version.workspace = true
edition.workspace = true
authors.workspace = true

[lib]
crate-type = ["cdylib", "rlib"]

[dependencies]

[dev-dependencies]

crates/providers/hitta/src/lib.rs:

// modules added as they are implemented
  • Step 7: Ignore the components dir

Append to .gitignore (create if missing):

components/
  • Step 8: Verify the workspace builds

Run: cargo check --workspace Expected: success (two empty crates). Cargo.lock regenerates — that's fine.

  • Step 9: Commit
git add -A
git commit -m "refactor!: replace CLI with workspace scaffold for WASM provider service"

Task 2: Refresh hitta fixture & audit page structure

The 2019 fixtures predate any hitta.se redesign. Before porting the parser, capture what the site serves today so Task 3 is written against reality.

Files:

  • Create: fixtures/hitta/fresh-0104754350.html

  • Step 1: Fetch a fresh copy of a known number's page

Run:

curl -sL -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
  "https://www.hitta.se/vem-ringde/0104754350" \
  -o fixtures/hitta/fresh-0104754350.html
wc -c fixtures/hitta/fresh-0104754350.html

Expected: a non-trivial file (> 10 KB). If the response is a bot-block page (check with head -c 2000), retry with the http --follow (httpie) variant from fetch-fixture, or fetch the page in a real browser (View Source → save). The fixture MUST contain real page markup before continuing.

  • Step 2: Audit the page structure

Run:

grep -c "__NEXT_DATA__" fixtures/hitta/fresh-0104754350.html
grep -o '__NEXT_DATA__[^>]\{0,80\}' fixtures/hitta/fresh-0104754350.html | head -3

Two outcomes are possible. Record which one applies; it determines how Steps 5 and 6 of Task 3 adapt the parser:

  • (a) __NEXT_DATA__ still present. Check whether it's still <script>__NEXT_DATA__ = {...};__NEXT_LOADED_PAGES__ (2019 inline style) or the modern <script id="__NEXT_DATA__" type="application/json">{...}</script> form. Note which.

  • (b) Gone entirely. Inspect the page (python3 -m json.tool on any embedded JSON, or read the HTML) and locate where phone data + comments live now. Write down the JSON path to: comments list, comment text, comment timestamp, and the statistics/"X others searched" text — Task 3's serde structs must be adapted to those paths (the shape of the parser — regex/JSON extraction → typed structs → ParsedEntry — stays identical).
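If the audit finds outcome (a) in its modern form, the JSON can be sliced out with plain string search instead of the 2019 regex. A minimal sketch, assuming the exact attribute order shown in (a); `extract_next_data` is a hypothetical helper, and a real page may order or quote attributes differently, so check against the fresh fixture.

```rust
/// Hypothetical helper: return the JSON text inside
/// `<script id="__NEXT_DATA__" type="application/json">...</script>`.
/// Assumes this exact attribute order; returns None if either tag is missing.
fn extract_next_data(html: &str) -> Option<&str> {
    const OPEN: &str = r#"<script id="__NEXT_DATA__" type="application/json">"#;
    const CLOSE: &str = "</script>";

    // Byte offset just past the opening tag.
    let start = html.find(OPEN)? + OPEN.len();
    // Length of the JSON payload up to the first closing tag after it.
    let len = html[start..].find(CLOSE)?;
    Some(&html[start..start + len])
}

fn main() {
    let page = r#"<script id="__NEXT_DATA__" type="application/json">{"props":{}}</script>"#;
    println!("{:?}", extract_next_data(page));
}
```

The returned slice would then feed the same `serde_json::from_str` call as in Task 3, leaving the typed-struct half of the parser unchanged.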

  • Step 3: Commit the fixture

git add fixtures/hitta/fresh-0104754350.html
git commit -m "test: add fresh hitta.se fixture for parser port"

Task 3: hitta parser (pure logic, native TDD)

Port the old src/probe/hitta.rs parse logic (reproduced below) into the provider crate as plain functions. All tests run natively — no WASM involved.

Files:

  • Create: crates/providers/hitta/src/parser.rs

  • Modify: crates/providers/hitta/src/lib.rs, crates/providers/hitta/Cargo.toml

  • Step 1: Add dependencies

In crates/providers/hitta/Cargo.toml set:

[dependencies]
regex = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"

[dev-dependencies]
insta = { version = "1.47", features = ["yaml"] }
  • Step 2: Declare the module

crates/providers/hitta/src/lib.rs:

pub mod parser;
  • Step 3: Write the failing tests

Create crates/providers/hitta/src/parser.rs containing ONLY this test module for now (the implementation is prepended above it in Step 5). The types and functions it references don't exist yet; that's the point:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn requests_single_hitta_url() {
        assert_eq!(
            request_urls("0700000000"),
            vec!["https://www.hitta.se/vem-ringde/0700000000".to_string()]
        );
    }

    #[test]
    fn parses_number_with_comments() {
        let body = include_str!("../../../../fixtures/hitta/0104754350.html");
        let entry = parse(body).unwrap();

        assert_eq!(entry.messages, Vec::<String>::new());
        assert_eq!(entry.history, vec!["42 andra har rapporterat detta nummer"]);
        assert_eq!(entry.comments.len(), 29);

        // newest first
        let first = &entry.comments[0];
        assert_eq!(first.timestamp, Some(1547746162)); // 2019-01-17T17:29:22Z
        assert_eq!(first.title, None);
        assert_eq!(first.message, "Varmsälj från Folksam");
    }

    #[test]
    fn parses_number_with_history_only() {
        let body = include_str!("../../../../fixtures/hitta/0702269893.html");
        let entry = parse(body).unwrap();

        assert_eq!(entry.history, vec!["Tre andra har också sökt på detta nummer"]);
        assert!(entry.comments.is_empty());
    }

    #[test]
    fn no_phone_data_is_no_data() {
        let body = include_str!("../../../../fixtures/hitta/0313908905.html");
        assert_eq!(parse(body), Err(ParseError::NoData));
    }

    #[test]
    fn unparseable_page_is_failed() {
        let body = include_str!("../../../../fixtures/hitta/0701807618.html");
        assert!(matches!(parse(body), Err(ParseError::Failed(_))));
    }

    #[test]
    fn garbage_is_failed() {
        assert!(matches!(parse("<html></html>"), Err(ParseError::Failed(_))));
    }

    #[test]
    fn parses_fresh_fixture() {
        let body = include_str!("../../../../fixtures/hitta/fresh-0104754350.html");
        insta::assert_yaml_snapshot!(parse(body));
    }
}

Semantics note (differs from the old CLI): the old code returned Ok with an all-empty entry when JSON parsed but phoneData was absent. That is now Err(ParseError::NoData). Old fixtures 0313908905, 0751793426/83/99 fall in that bucket; 0701807618, 0546780862 fail the regex → Failed.

  • Step 4: Run tests to verify they fail

Run: cargo test -p whoareyou-provider-hitta Expected: COMPILE ERROR — request_urls, parse, ParseError not found.

  • Step 5: Implement the parser

Prepend to crates/providers/hitta/src/parser.rs (above the test module). This is the 2019 logic ported; if Task 2 found outcome (b) or the modern <script id="__NEXT_DATA__"> form, adapt NEXT_DATA_RE / the serde structs to the JSON paths recorded in Task 2 — keep the public surface (request_urls, parse, the three types) exactly as below:

use std::sync::LazyLock;

use regex::Regex;
use serde::{Deserialize, Serialize};

#[derive(Debug, PartialEq, Serialize)]
pub struct ParsedEntry {
    pub messages: Vec<String>,
    pub history: Vec<String>,
    pub comments: Vec<ParsedComment>,
}

#[derive(Debug, PartialEq, Serialize)]
pub struct ParsedComment {
    /// Unix epoch seconds, UTC.
    pub timestamp: Option<i64>,
    pub title: Option<String>,
    pub message: String,
}

#[derive(Debug, PartialEq, Serialize)]
pub enum ParseError {
    /// Page fetched and understood, but it contains no data for the number.
    NoData,
    /// Page structure did not match expectations — scraper rot signal.
    Failed(String),
}

static NEXT_DATA_RE: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(r"<script>__NEXT_DATA__ = (.*?);__NEXT_LOADED_PAGES__").unwrap()
});

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct Data {
    props: Props,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct Props {
    page_props: PageProps,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PageProps {
    phone_data: Option<PhoneData>,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PhoneData {
    #[serde(default)]
    comments: Vec<RawComment>,
    statistics_text: String,
}

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct RawComment {
    comment: String,
    /// Milliseconds since epoch.
    timestamp: u64,
}

pub fn request_urls(number: &str) -> Vec<String> {
    vec![format!("https://www.hitta.se/vem-ringde/{number}")]
}

pub fn parse(body: &str) -> Result<ParsedEntry, ParseError> {
    let captures = NEXT_DATA_RE
        .captures(body)
        .ok_or_else(|| ParseError::Failed("__NEXT_DATA__ not found".to_string()))?;

    let json = captures.get(1).unwrap().as_str();

    let data: Data = serde_json::from_str(json)
        .map_err(|e| ParseError::Failed(format!("deserialize __NEXT_DATA__: {e}")))?;

    let Some(phone_data) = data.props.page_props.phone_data else {
        return Err(ParseError::NoData);
    };

    let mut comments: Vec<ParsedComment> = phone_data
        .comments
        .into_iter()
        .map(|c| ParsedComment {
            timestamp: Some((c.timestamp / 1000) as i64),
            title: None,
            message: c.comment,
        })
        .collect();

    comments.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));

    Ok(ParsedEntry {
        messages: Vec::new(),
        history: vec![phone_data.statistics_text],
        comments,
    })
}
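The newest-first sort compares `Option<i64>` values directly. Rust orders `None` below every `Some`, so with the reversed comparison any undated comment ends up last. Every hitta comment carries a timestamp, but the WIT contract makes `timestamp` optional, so the behavior matters for future providers. A standalone check; `newest_first` is a hypothetical helper using the same comparator as `parse`:

```rust
/// Sorts timestamps newest first with the same comparator as `parse`
/// (`b.cmp(&a)` on `Option<i64>`). Since `None < Some(_)`, `None` sorts last.
fn newest_first(mut timestamps: Vec<Option<i64>>) -> Vec<Option<i64>> {
    timestamps.sort_by(|a, b| b.cmp(a));
    timestamps
}

fn main() {
    let sorted = newest_first(vec![Some(1_500_000_000), None, Some(1_547_746_162)]);
    println!("{sorted:?}");
}
```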
  • Step 6: Run tests to verify they pass

Run: cargo test -p whoareyou-provider-hitta Expected: all tests pass except parses_fresh_fixture, which is expected to fail on its first run because no snapshot has been accepted yet. If the fresh fixture fails to parse (Failed or NoData against a real page that visibly has data), the site has changed: adapt the regex/structs per Task 2's notes until the fresh fixture parses, while keeping the 2019-fixture tests passing. If the old format is truly unreachable through the new code path, update those tests' expectations to Failed and note it in the commit message.

  • Step 7: Accept the fresh-fixture snapshot after eyeballing it

Run: cargo insta review (requires cargo-insta: cargo install cargo-insta), or cargo insta accept after inspecting the .snap.new file manually. Expected: a snapshot under crates/providers/hitta/src/snapshots/ showing a plausible entry, or an honest NoData/Failed for a dead number; verify it matches what the fixture actually contains.

  • Step 8: Run the full test suite

Run: cargo test --workspace Expected: PASS.

  • Step 9: Commit
git add crates/providers/hitta Cargo.lock
git commit -m "feat: port hitta.se parser as pure native-testable functions"

Task 4: hitta component glue (WIT export)

Files:

  • Create: crates/providers/hitta/src/component.rs

  • Modify: crates/providers/hitta/src/lib.rs, crates/providers/hitta/Cargo.toml

  • Step 1: Add wit-bindgen for wasm32 only

Append to crates/providers/hitta/Cargo.toml:

[target.'cfg(target_arch = "wasm32")'.dependencies]
wit-bindgen = "0.57"
  • Step 2: Write the glue

crates/providers/hitta/src/component.rs:

use crate::parser;

wit_bindgen::generate!({
    world: "provider",
    path: "../../../wit",
});

use exports::whoareyou::provider::lookup::{
    Comment, Entry, Guest, LookupError, ProviderInfo, Request, Response,
};

struct Component;

impl Guest for Component {
    fn metadata() -> ProviderInfo {
        ProviderInfo {
            name: "hitta.se".to_string(),
            version: env!("CARGO_PKG_VERSION").to_string(),
        }
    }

    fn requests(number: String) -> Vec<Request> {
        parser::request_urls(&number)
            .into_iter()
            .map(|url| Request { url })
            .collect()
    }

    fn parse(_number: String, responses: Vec<Response>) -> Result<Entry, LookupError> {
        let Some(first) = responses.first() else {
            return Err(LookupError::ParseFailed("no responses provided".to_string()));
        };

        match parser::parse(&first.body) {
            Ok(entry) => Ok(Entry {
                messages: entry.messages,
                history: entry.history,
                comments: entry
                    .comments
                    .into_iter()
                    .map(|c| Comment {
                        timestamp: c.timestamp,
                        title: c.title,
                        message: c.message,
                    })
                    .collect(),
            }),
            Err(parser::ParseError::NoData) => Err(LookupError::NoData),
            Err(parser::ParseError::Failed(msg)) => Err(LookupError::ParseFailed(msg)),
        }
    }
}

export!(Component);
  • Step 3: Gate it into the crate

crates/providers/hitta/src/lib.rs:

pub mod parser;

#[cfg(target_arch = "wasm32")]
mod component;
  • Step 4: Build the component

Run: cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta Expected: success; target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm exists.

  • Step 5: Verify native tests still pass

Run: cargo test -p whoareyou-provider-hitta Expected: PASS (glue is cfg'd out natively).

  • Step 6: Commit
git add crates/providers/hitta
git commit -m "feat: export hitta parser as a WASM component via wit-bindgen"

Task 5: Server model types

Files:

  • Create: crates/server/src/model.rs

  • Modify: crates/server/src/lib.rs, crates/server/Cargo.toml

  • Step 1: Add first server dependencies

In crates/server/Cargo.toml:

[dependencies]
serde = { version = "1", features = ["derive"] }

[dev-dependencies]
serde_json = "1"
  • Step 2: Write the failing test

crates/server/src/model.rs (test module only, types come next step):

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn provider_result_serializes_to_api_shape() {
        let ok = ProviderResult::Ok {
            entry: Entry {
                messages: vec![],
                history: vec!["42 andra".to_string()],
                comments: vec![Comment {
                    timestamp: Some(1547746162),
                    title: None,
                    message: "Varmsälj".to_string(),
                }],
            },
        };

        let json = serde_json::to_value(&ok).unwrap();
        assert_eq!(json["status"], "ok");
        assert_eq!(json["entry"]["history"][0], "42 andra");
        assert_eq!(json["entry"]["comments"][0]["timestamp"], 1547746162);

        assert_eq!(
            serde_json::to_value(&ProviderResult::NoData).unwrap()["status"],
            "no_data"
        );
        assert_eq!(
            serde_json::to_value(&ProviderResult::FetchFailed).unwrap()["status"],
            "fetch_failed"
        );
        assert_eq!(
            serde_json::to_value(&ProviderResult::ParseFailed).unwrap()["status"],
            "parse_failed"
        );
    }
}

crates/server/src/lib.rs:

pub mod model;
  • Step 3: Run test to verify it fails

Run: cargo test -p whoareyou-server Expected: COMPILE ERROR — types not defined.

  • Step 4: Implement the types

Prepend to crates/server/src/model.rs:

use std::collections::BTreeMap;

use serde::Serialize;

#[derive(Debug, Clone, PartialEq, Serialize)]
pub struct Entry {
    pub messages: Vec<String>,
    pub history: Vec<String>,
    pub comments: Vec<Comment>,
}

#[derive(Debug, Clone, PartialEq, Serialize)]
pub struct Comment {
    /// Unix epoch seconds, UTC.
    pub timestamp: Option<i64>,
    pub title: Option<String>,
    pub message: String,
}

/// Per-provider outcome as exposed in the API (and cached).
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(tag = "status", rename_all = "snake_case")]
pub enum ProviderResult {
    Ok { entry: Entry },
    NoData,
    FetchFailed,
    ParseFailed,
}

/// A fetched HTTP response handed to a provider's `parse`.
#[derive(Debug, Clone)]
pub struct FetchedResponse {
    pub status: u16,
    pub body: String,
}

/// Outcome of a provider's `parse` call, before API mapping.
#[derive(Debug)]
pub enum ParseOutcome {
    Ok(Entry),
    NoData,
    Failed(String),
}

#[derive(Debug, Serialize)]
pub struct LookupResponse {
    pub number: String,
    pub results: BTreeMap<String, ProviderResult>,
}
  • Step 5: Run test to verify it passes

Run: cargo test -p whoareyou-server Expected: PASS.

  • Step 6: Commit
git add crates/server Cargo.lock
git commit -m "feat: add server model types and API serialization shape"

Task 6: Errors and fetcher

Files:

  • Create: crates/server/src/error.rs, crates/server/src/fetch.rs

  • Modify: crates/server/src/lib.rs, crates/server/Cargo.toml

  • Step 1: Add dependencies

Extend crates/server/Cargo.toml [dependencies]:

async-trait = "0.1"
reqwest = "0.13"
thiserror = "2"
tokio = { version = "1", features = ["full"] }
wasmtime = { version = "45", features = ["component-model"] }

(wasmtime is needed now because HostError wraps wasmtime::Error.)

  • Step 2: Write crates/server/src/error.rs
use thiserror::Error;

/// Errors from hosting/calling a WASM component.
#[derive(Debug, Error)]
pub enum HostError {
    #[error("wasm error: {0}")]
    Wasm(#[from] wasmtime::Error),
    #[error("io error: {0}")]
    Io(#[from] std::io::Error),
}

#[derive(Debug, Error)]
pub enum FetchError {
    #[error("request failed: {0}")]
    Request(#[from] reqwest::Error),
}

#[derive(Debug, Error)]
pub enum ConfigError {
    #[error("invalid value for {key}: {message}")]
    Invalid { key: String, message: String },
}
  • Step 3: Write crates/server/src/fetch.rs

The Fetch trait lives in service.rs, which Task 7 creates. To keep this task compiling on its own, write fetch.rs now but leave it out of lib.rs; Task 7 declares the module once the trait exists.

crates/server/src/fetch.rs:

use std::time::Duration;

use async_trait::async_trait;

use crate::error::FetchError;
use crate::model::FetchedResponse;
use crate::service::Fetch;

pub struct ReqwestFetcher {
    client: reqwest::Client,
}

impl ReqwestFetcher {
    pub fn new(timeout: Duration) -> Result<Self, FetchError> {
        let client = reqwest::Client::builder()
            .timeout(timeout)
            .user_agent(concat!("whoareyou/", env!("CARGO_PKG_VERSION")))
            .build()?;

        Ok(Self { client })
    }
}

#[async_trait]
impl Fetch for ReqwestFetcher {
    async fn fetch(&self, url: &str) -> Result<FetchedResponse, FetchError> {
        let response = self.client.get(url).send().await?;
        let status = response.status().as_u16();
        let body = response.text().await?;

        Ok(FetchedResponse { status, body })
    }
}
  • Step 4: Wire only error into lib.rs

crates/server/src/lib.rs:

pub mod error;
pub mod model;
  • Step 5: Verify it compiles

Run: cargo check -p whoareyou-server Expected: success (fetch.rs is not yet a module, so its crate::service import is not compiled).

  • Step 6: Commit
git add crates/server Cargo.lock
git commit -m "feat: add server error types and reqwest fetcher"

Task 7: LookupService (orchestration + cache, TDD)

Files:

  • Create: crates/server/src/service.rs

  • Modify: crates/server/src/lib.rs, crates/server/Cargo.toml

  • Step 1: Add dependencies

Extend crates/server/Cargo.toml [dependencies]:

futures = "0.3"
moka = { version = "0.12", features = ["future"] }
tracing = "0.1"
  • Step 2: Write the failing tests

crates/server/src/service.rs, test module first:

#[cfg(test)]
mod tests {
    use std::sync::Arc;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::time::Duration;

    use async_trait::async_trait;

    use super::*;
    use crate::error::{FetchError, HostError};
    use crate::model::{Comment, Entry, FetchedResponse, ParseOutcome, ProviderResult};

    fn entry() -> Entry {
        Entry {
            messages: vec![],
            history: vec!["history".to_string()],
            comments: vec![Comment {
                timestamp: Some(1547746162),
                title: None,
                message: "spam".to_string(),
            }],
        }
    }

    /// Provider whose parse outcome is scripted per call.
    struct FakeProvider {
        name: &'static str,
        outcome: fn() -> ParseOutcome,
    }

    impl ProviderHandle for FakeProvider {
        fn name(&self) -> &str {
            self.name
        }

        fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
            Ok(vec![format!("https://example.test/{number}")])
        }

        fn parse(
            &self,
            _number: &str,
            _responses: &[FetchedResponse],
        ) -> ParseOutcome {
            (self.outcome)()
        }
    }

    /// Fetcher that counts calls and can be told to fail.
    struct FakeFetcher {
        calls: AtomicUsize,
        fail: bool,
    }

    impl FakeFetcher {
        fn new(fail: bool) -> Self {
            Self { calls: AtomicUsize::new(0), fail }
        }
    }

    #[async_trait]
    impl Fetch for FakeFetcher {
        async fn fetch(&self, _url: &str) -> Result<FetchedResponse, FetchError> {
            self.calls.fetch_add(1, Ordering::SeqCst);

            if self.fail {
                // Placeholder: reqwest::Error has no public constructor, so this
                // branch is replaced in Step 4 with a real failing request
                // (see the note after this test module).
                unreachable!("replaced in Step 4");
            }

            Ok(FetchedResponse { status: 200, body: "body".to_string() })
        }
    }

    fn service(
        providers: Vec<Arc<dyn ProviderHandle>>,
        fetcher: Arc<dyn Fetch>,
    ) -> LookupService {
        LookupService::new(providers, fetcher, Duration::from_secs(60))
    }

    #[tokio::test]
    async fn ok_result_is_returned_and_cached() {
        let provider = Arc::new(FakeProvider {
            name: "fake.se",
            outcome: || ParseOutcome::Ok(entry()),
        });
        let fetcher = Arc::new(FakeFetcher::new(false));
        let svc = service(vec![provider], fetcher.clone());

        let results = svc.lookup("0700000000").await;
        assert_eq!(results["fake.se"], ProviderResult::Ok { entry: entry() });

        // second lookup served from cache — fetcher not called again
        let results = svc.lookup("0700000000").await;
        assert_eq!(results["fake.se"], ProviderResult::Ok { entry: entry() });
        assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
    }

    #[tokio::test]
    async fn no_data_is_cached() {
        let provider = Arc::new(FakeProvider { name: "fake.se", outcome: || ParseOutcome::NoData });
        let fetcher = Arc::new(FakeFetcher::new(false));
        let svc = service(vec![provider], fetcher.clone());

        assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::NoData);
        assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::NoData);
        assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
    }

    #[tokio::test]
    async fn parse_failure_maps_and_is_cached() {
        let provider = Arc::new(FakeProvider {
            name: "fake.se",
            outcome: || ParseOutcome::Failed("rot".to_string()),
        });
        let fetcher = Arc::new(FakeFetcher::new(false));
        let svc = service(vec![provider], fetcher.clone());

        assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::ParseFailed);
        assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::ParseFailed);
        assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
    }

    #[tokio::test]
    async fn fetch_failure_is_not_cached() {
        let provider = Arc::new(FakeProvider {
            name: "fake.se",
            outcome: || ParseOutcome::NoData,
        });
        let fetcher = Arc::new(FakeFetcher::new(true));
        let svc = service(vec![provider], fetcher.clone());

        assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::FetchFailed);
        assert_eq!(svc.lookup("0700000000").await["fake.se"], ProviderResult::FetchFailed);
        // NOT cached: fetcher tried twice
        assert_eq!(fetcher.calls.load(Ordering::SeqCst), 2);
    }

    #[tokio::test]
    async fn multiple_providers_keyed_by_name() {
        let a = Arc::new(FakeProvider { name: "a.se", outcome: || ParseOutcome::NoData });
        let b = Arc::new(FakeProvider {
            name: "b.se",
            outcome: || ParseOutcome::Ok(entry()),
        });
        let fetcher = Arc::new(FakeFetcher::new(false));
        let svc = service(vec![a, b], fetcher);

        let results = svc.lookup("0700000000").await;
        assert_eq!(results.len(), 2);
        assert_eq!(results["a.se"], ProviderResult::NoData);
        assert!(matches!(results["b.se"], ProviderResult::Ok { .. }));
    }
}

Fabricating a FetchError in tests: reqwest::Error cannot be constructed directly. Make the fail path real instead of fabricated — in Step 4's implementation of FakeFetcher::fetch, replace the unreachable! with an actual failing request against a closed local port:

if self.fail {
    let err = reqwest::Client::new()
        .get("http://127.0.0.1:1/unreachable")
        .send()
        .await
        .unwrap_err();
    return Err(FetchError::Request(err));
}

(Nothing listens on port 1 on a typical development machine, so the connection is refused immediately and no external network is involved.)

  • Step 3: Run tests to verify they fail

Run: cargo test -p whoareyou-server Expected: COMPILE ERROR — ProviderHandle, Fetch, LookupService not defined.

  • Step 4: Implement the service

Prepend to crates/server/src/service.rs (and fix the FakeFetcher fail path as noted above):

use std::collections::BTreeMap;
use std::sync::Arc;
use std::time::Duration;

use async_trait::async_trait;
use moka::future::Cache;
use tracing::warn;

use crate::error::{FetchError, HostError};
use crate::model::{FetchedResponse, ParseOutcome, ProviderResult};

/// A loaded provider. Implemented by `wasm::WasmProvider`; faked in tests.
/// Methods are sync — WASM calls are CPU-bound; the service wraps them in
/// `spawn_blocking`.
pub trait ProviderHandle: Send + Sync {
    fn name(&self) -> &str;
    fn requests(&self, number: &str) -> Result<Vec<String>, HostError>;
    fn parse(&self, number: &str, responses: &[FetchedResponse]) -> ParseOutcome;
}

#[async_trait]
pub trait Fetch: Send + Sync {
    async fn fetch(&self, url: &str) -> Result<FetchedResponse, FetchError>;
}

pub struct LookupService {
    providers: Vec<Arc<dyn ProviderHandle>>,
    fetcher: Arc<dyn Fetch>,
    cache: Cache<String, ProviderResult>,
}

impl LookupService {
    pub fn new(
        providers: Vec<Arc<dyn ProviderHandle>>,
        fetcher: Arc<dyn Fetch>,
        cache_ttl: Duration,
    ) -> Self {
        Self {
            providers,
            fetcher,
            cache: Cache::builder().time_to_live(cache_ttl).build(),
        }
    }

    pub fn provider_names(&self) -> Vec<&str> {
        self.providers.iter().map(|p| p.name()).collect()
    }

    /// Run all providers concurrently; one result per provider name.
    pub async fn lookup(&self, number: &str) -> BTreeMap<String, ProviderResult> {
        let tasks = self.providers.iter().map(|provider| {
            let provider = provider.clone();
            let fetcher = self.fetcher.clone();
            let cache = self.cache.clone();
            let number = number.to_string();

            async move {
                let name = provider.name().to_string();
                let key = format!("{name}:{number}");

                if let Some(hit) = cache.get(&key).await {
                    return (name, hit);
                }

                let result = run_provider(provider, &number, fetcher).await;

                // Transient failures must not poison the cache.
                if result != ProviderResult::FetchFailed {
                    cache.insert(key, result.clone()).await;
                }

                (name, result)
            }
        });

        futures::future::join_all(tasks).await.into_iter().collect()
    }
}

async fn run_provider(
    provider: Arc<dyn ProviderHandle>,
    number: &str,
    fetcher: Arc<dyn Fetch>,
) -> ProviderResult {
    let name = provider.name().to_string();

    let urls = {
        let provider = provider.clone();
        let number = number.to_string();

        match tokio::task::spawn_blocking(move || provider.requests(&number)).await {
            Ok(Ok(urls)) => urls,
            Ok(Err(error)) => {
                warn!(provider = %name, %error, "requests() failed");
                return ProviderResult::ParseFailed;
            }
            Err(error) => {
                warn!(provider = %name, %error, "requests() panicked");
                return ProviderResult::ParseFailed;
            }
        }
    };

    let fetched = futures::future::join_all(urls.iter().map(|url| fetcher.fetch(url))).await;

    let mut responses = Vec::with_capacity(fetched.len());

    for result in fetched {
        match result {
            Ok(response) => responses.push(response),
            Err(error) => {
                warn!(provider = %name, %error, "fetch failed");
                return ProviderResult::FetchFailed;
            }
        }
    }

    let outcome = {
        let provider = provider.clone();
        let number = number.to_string();

        tokio::task::spawn_blocking(move || provider.parse(&number, &responses)).await
    };

    match outcome {
        Ok(ParseOutcome::Ok(entry)) => ProviderResult::Ok { entry },
        Ok(ParseOutcome::NoData) => ProviderResult::NoData,
        Ok(ParseOutcome::Failed(message)) => {
            warn!(provider = %name, %message, "parse failed — scraper rot?");
            ProviderResult::ParseFailed
        }
        Err(error) => {
            warn!(provider = %name, %error, "parse() panicked");
            ProviderResult::ParseFailed
        }
    }
}
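The caching rule in lookup() can be checked in isolation. This is a minimal synchronous sketch under several assumptions:

- ProviderResult is stripped down to payload-free variants.
- A plain HashMap stands in for moka, with no TTL and no concurrency.
- CachedLookup and is_cacheable are hypothetical names.

Only FetchFailed bypasses the cache. A network blip is therefore retried on the next request, while NoData and ParseFailed are served from cache until the TTL expires.

```rust
/// Payload-free stand-in for the plan's ProviderResult (assumption: the real
/// Ok variant also carries an Entry).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ProviderResult {
    Ok,
    NoData,
    ParseFailed,
    FetchFailed,
}

/// A fetch failure describes the network, not the number, so it is never cached.
fn is_cacheable(result: &ProviderResult) -> bool {
    !matches!(result, ProviderResult::FetchFailed)
}

/// Synchronous model of the lookup path: check `provider:number`, otherwise
/// run the provider and cache the result only if it is cacheable.
struct CachedLookup {
    cache: std::collections::HashMap<String, ProviderResult>,
    runs: usize, // how many times a provider actually ran
}

impl CachedLookup {
    fn new() -> Self {
        Self { cache: std::collections::HashMap::new(), runs: 0 }
    }

    fn lookup(
        &mut self,
        provider: &str,
        number: &str,
        run: impl FnOnce() -> ProviderResult,
    ) -> ProviderResult {
        let key = format!("{provider}:{number}");

        if let Some(hit) = self.cache.get(&key) {
            return hit.clone();
        }

        self.runs += 1;
        let result = run();

        if is_cacheable(&result) {
            self.cache.insert(key, result.clone());
        }

        result
    }
}
```

Because get and insert are separate calls, concurrent lookups of the same uncached number each run the provider. If that ever matters, moka's get_with / optionally_get_with can coalesce concurrent initializations.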
  • Step 5: Wire modules into lib.rs

crates/server/src/lib.rs:

pub mod error;
pub mod fetch;
pub mod model;
pub mod service;
  • Step 6: Run tests to verify they pass

Run: cargo test -p whoareyou-server
Expected: PASS (all five service tests + model test).

  • Step 7: Commit
git add crates/server Cargo.lock
git commit -m "feat: add LookupService with moka cache and provider orchestration"

Task 8: HTTP layer (axum, TDD)

Files:

  • Create: crates/server/src/http.rs

  • Modify: crates/server/src/lib.rs, crates/server/Cargo.toml

  • Step 1: Add dependencies

Extend crates/server/Cargo.toml:

[dependencies]
# add:
axum = "0.8"
serde_json = "1"

[dev-dependencies]
# add:
http-body-util = "0.1"
tower = { version = "0.5", features = ["util"] }

(serde_json moves from dev-dependencies to dependencies — remove the dev entry.)

  • Step 2: Write the failing tests

crates/server/src/http.rs, test module first:

#[cfg(test)]
mod tests {
    use std::sync::Arc;
    use std::time::Duration;

    use async_trait::async_trait;
    use axum::body::Body;
    use axum::http::{Request, StatusCode};
    use http_body_util::BodyExt;
    use tower::ServiceExt;

    use super::*;
    use crate::error::{FetchError, HostError};
    use crate::model::{FetchedResponse, ParseOutcome};
    use crate::service::{Fetch, LookupService, ProviderHandle};

    struct NoDataProvider;

    impl ProviderHandle for NoDataProvider {
        fn name(&self) -> &str {
            "fake.se"
        }

        fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
            Ok(vec![format!("https://example.test/{number}")])
        }

        fn parse(&self, _: &str, _: &[FetchedResponse]) -> ParseOutcome {
            ParseOutcome::NoData
        }
    }

    struct StaticFetcher;

    #[async_trait]
    impl Fetch for StaticFetcher {
        async fn fetch(&self, _: &str) -> Result<FetchedResponse, FetchError> {
            Ok(FetchedResponse { status: 200, body: String::new() })
        }
    }

    fn app() -> axum::Router {
        let service = LookupService::new(
            vec![Arc::new(NoDataProvider)],
            Arc::new(StaticFetcher),
            Duration::from_secs(60),
        );

        router(Arc::new(service))
    }

    #[test]
    fn normalize_strips_separators() {
        assert_eq!(normalize("0700 00-00.00"), Some("0700000000".to_string()));
        assert_eq!(normalize("+46701234567"), Some("+46701234567".to_string()));
    }

    #[test]
    fn normalize_rejects_garbage() {
        assert_eq!(normalize("not-a-number"), None);
        assert_eq!(normalize(""), None);
        assert_eq!(normalize("0"), None);
        assert_eq!(normalize("07001231231231231231"), None); // > 15 digits
        assert_eq!(normalize("070+123"), None); // '+' not at start
    }

    #[tokio::test]
    async fn lookup_returns_results_keyed_by_provider() {
        let response = app()
            .oneshot(
                Request::builder()
                    // A raw space is not a valid URI character; axum's Path extractor decodes %20.
                    .uri("/api/v1/number/0700%2000-00%2000")
                    .body(Body::empty())
                    .unwrap(),
            )
            .await
            .unwrap();

        assert_eq!(response.status(), StatusCode::OK);

        let bytes = response.into_body().collect().await.unwrap().to_bytes();
        let json: serde_json::Value = serde_json::from_slice(&bytes).unwrap();

        assert_eq!(json["number"], "0700000000");
        assert_eq!(json["results"]["fake.se"]["status"], "no_data");
    }

    #[tokio::test]
    async fn invalid_number_is_400() {
        let response = app()
            .oneshot(
                Request::builder()
                    .uri("/api/v1/number/banana")
                    .body(Body::empty())
                    .unwrap(),
            )
            .await
            .unwrap();

        assert_eq!(response.status(), StatusCode::BAD_REQUEST);
    }

    #[tokio::test]
    async fn healthz_is_ok() {
        let response = app()
            .oneshot(Request::builder().uri("/healthz").body(Body::empty()).unwrap())
            .await
            .unwrap();

        assert_eq!(response.status(), StatusCode::OK);
    }
}
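Human-formatted numbers contain spaces, and a raw space is not a valid URI character. Test request paths therefore need percent-encoding; axum's Path extractor decodes them again before normalize() runs.

If more formatted inputs get added to the tests, a small helper saves hand-encoding. Below is a sketch of a hypothetical test-only helper, encode_path_segment. It keeps RFC 3986 unreserved characters and encodes every other byte. The percent-encoding crate is an alternative.

```rust
/// Hypothetical test helper: percent-encode one URL path segment.
/// Unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~") pass through;
/// every other byte becomes %XX (uppercase hex).
fn encode_path_segment(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());

    for byte in raw.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char);
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }

    out
}
```

In a test: .uri(format!("/api/v1/number/{}", encode_path_segment("0700 00-00 00"))).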
  • Step 3: Run tests to verify they fail

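First add pub mod http; to lib.rs (as in Step 5). Otherwise http.rs is never compiled and the new tests cannot fail.
Run: cargo test -p whoareyou-server
Expected: COMPILE ERROR (router and normalize are not defined yet).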

  • Step 4: Implement the HTTP layer

Prepend to crates/server/src/http.rs:

use std::sync::Arc;

use axum::Json;
use axum::Router;
use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum::routing::get;
use serde_json::json;

use crate::model::LookupResponse;
use crate::service::LookupService;

pub fn router(service: Arc<LookupService>) -> Router {
    Router::new()
        .route("/api/v1/number/{number}", get(lookup_number))
        .route("/healthz", get(|| async { "ok" }))
        .with_state(service)
}

async fn lookup_number(
    State(service): State<Arc<LookupService>>,
    Path(raw): Path<String>,
) -> Response {
    let Some(number) = normalize(&raw) else {
        return (
            StatusCode::BAD_REQUEST,
            Json(json!({ "error": "invalid phone number" })),
        )
            .into_response();
    };

    let results = service.lookup(&number).await;

    Json(LookupResponse { number, results }).into_response()
}

/// Strip separators and validate: optional leading '+', then 2 to 15 digits.
pub fn normalize(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| !matches!(c, ' ' | '-' | '.'))
        .collect();

    let digits = cleaned.strip_prefix('+').unwrap_or(&cleaned);

    let valid = (2..=15).contains(&digits.len())
        && digits.chars().all(|c| c.is_ascii_digit());

    valid.then_some(cleaned)
}
  • Step 5: Wire the module

crates/server/src/lib.rs:

pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
  • Step 6: Run tests to verify they pass

Run: cargo test -p whoareyou-server
Expected: PASS.

  • Step 7: Commit
git add crates/server Cargo.lock
git commit -m "feat: add axum HTTP layer with lookup endpoint and healthz"

Task 9: wasmtime host (WasmProvider)

Files:

  • Create: crates/server/src/wasm.rs

  • Modify: crates/server/src/lib.rs, crates/server/Cargo.toml

  • Step 1: Add wasmtime-wasi

Extend crates/server/Cargo.toml [dependencies]:

wasmtime-wasi = "45"
  • Step 2: Write crates/server/src/wasm.rs

API-drift note: the WasiView/WasiCtxView shape below matches recent wasmtime-wasi releases as of this plan's writing. If cargo check disagrees, consult https://docs.rs/wasmtime-wasi/45 — the intent is fixed: a store data struct holding WasiCtx + ResourceTable, WASI added to the linker sync, no preopens / no env / no inherited stdio. Adapt mechanically; do not change the public surface of this module.

use std::path::Path;

use wasmtime::component::{Component, Linker};
use wasmtime::{Config, Engine, Store};
use wasmtime_wasi::ResourceTable;
use wasmtime_wasi::p2::{WasiCtx, WasiCtxBuilder, WasiCtxView, WasiView};

use crate::error::HostError;
use crate::model::{Comment, Entry, FetchedResponse, ParseOutcome};
use crate::service::ProviderHandle;

wasmtime::component::bindgen!({
    world: "provider",
    path: "../../wit",
});

use exports::whoareyou::provider::lookup::{LookupError as WitLookupError, Response as WitResponse};

/// How many epoch ticks a guest call may run. The epoch thread ticks every
/// 100 ms → 50 ticks ≈ 5 s budget per call.
const EPOCH_DEADLINE_TICKS: u64 = 50;
pub const EPOCH_TICK: std::time::Duration = std::time::Duration::from_millis(100);

pub struct HostState {
    ctx: WasiCtx,
    table: ResourceTable,
}

impl WasiView for HostState {
    fn ctx(&mut self) -> WasiCtxView<'_> {
        WasiCtxView { ctx: &mut self.ctx, table: &mut self.table }
    }
}

pub fn engine() -> Result<Engine, HostError> {
    let mut config = Config::new();
    config.epoch_interruption(true);

    Ok(Engine::new(&config)?)
}

pub fn linker(engine: &Engine) -> Result<Linker<HostState>, HostError> {
    let mut linker = Linker::new(engine);
    wasmtime_wasi::p2::add_to_linker_sync(&mut linker)?;

    Ok(linker)
}

/// Spawn the thread that advances the engine epoch so runaway guest calls
/// trap instead of hanging the service. Call once at startup.
pub fn spawn_epoch_thread(engine: &Engine) {
    let engine = engine.clone();

    std::thread::spawn(move || {
        loop {
            std::thread::sleep(EPOCH_TICK);
            engine.increment_epoch();
        }
    });
}

pub struct WasmProvider {
    name: String,
    version: String,
    engine: Engine,
    pre: ProviderPre<HostState>,
}

impl WasmProvider {
    /// Compile a component from disk and read its metadata once.
    /// Fails fast if the component does not satisfy the provider world.
    pub fn load(
        engine: &Engine,
        linker: &Linker<HostState>,
        path: &Path,
    ) -> Result<Self, HostError> {
        let component = Component::from_file(engine, path)?;
        let pre = ProviderPre::new(linker.instantiate_pre(&component)?)?;

        let mut provider = Self {
            name: String::new(),
            version: String::new(),
            engine: engine.clone(),
            pre,
        };

        let mut store = provider.new_store();
        let instance = provider.pre.instantiate(&mut store)?;
        let info = instance.whoareyou_provider_lookup().call_metadata(&mut store)?;

        provider.name = info.name;
        provider.version = info.version;

        Ok(provider)
    }

    pub fn version(&self) -> &str {
        &self.version
    }

    fn new_store(&self) -> Store<HostState> {
        // No preopens, no env, no inherited stdio — fully sandboxed guest.
        let ctx = WasiCtxBuilder::new().build();
        let mut store = Store::new(
            &self.engine,
            HostState { ctx, table: ResourceTable::new() },
        );

        store.set_epoch_deadline(EPOCH_DEADLINE_TICKS);

        store
    }
}

impl ProviderHandle for WasmProvider {
    fn name(&self) -> &str {
        &self.name
    }

    fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
        let mut store = self.new_store();
        let instance = self.pre.instantiate(&mut store)?;

        let requests = instance
            .whoareyou_provider_lookup()
            .call_requests(&mut store, number)?;

        Ok(requests.into_iter().map(|r| r.url).collect())
    }

    fn parse(&self, number: &str, responses: &[FetchedResponse]) -> ParseOutcome {
        let wit_responses: Vec<WitResponse> = responses
            .iter()
            .map(|r| WitResponse { status: r.status, body: r.body.clone() })
            .collect();

        let mut store = self.new_store();

        let result = (|| {
            let instance = self.pre.instantiate(&mut store)?;

            instance
                .whoareyou_provider_lookup()
                .call_parse(&mut store, number, &wit_responses)
        })();

        match result {
            Ok(Ok(entry)) => ParseOutcome::Ok(Entry {
                messages: entry.messages,
                history: entry.history,
                comments: entry
                    .comments
                    .into_iter()
                    .map(|c| Comment {
                        timestamp: c.timestamp,
                        title: c.title,
                        message: c.message,
                    })
                    .collect(),
            }),
            Ok(Err(WitLookupError::NoData)) => ParseOutcome::NoData,
            Ok(Err(WitLookupError::ParseFailed(message))) => ParseOutcome::Failed(message),
            // Trap (incl. epoch deadline exceeded) or instantiation failure.
            Err(error) => ParseOutcome::Failed(format!("component error: {error}")),
        }
    }
}
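The epoch budget can be modelled in plain std Rust. This is a toy model, not wasmtime's API. In wasmtime, the engine inserts epoch checks at function entries and loop back-edges, and the resulting trap surfaces as the Err arm in parse().

The sketch uses three hypothetical helpers:

- deadline_ticks derives a tick count from a wall-clock budget, so EPOCH_DEADLINE_TICKS and EPOCH_TICK cannot drift apart.
- spawn_ticker plays the role of spawn_epoch_thread.
- run_guarded stands in for a guest call.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::Duration;

/// Ticks needed to cover `budget`, rounded up so the effective budget is never
/// shorter than requested. Panics if `tick` is zero.
fn deadline_ticks(budget: Duration, tick: Duration) -> u64 {
    budget.as_nanos().div_ceil(tick.as_nanos()) as u64
}

/// Toy "epoch thread": bumps a shared counter once per tick, forever.
fn spawn_ticker(tick: Duration) -> Arc<AtomicU64> {
    let epoch = Arc::new(AtomicU64::new(0));
    let ticker = Arc::clone(&epoch);

    thread::spawn(move || loop {
        thread::sleep(tick);
        ticker.fetch_add(1, Ordering::Relaxed);
    });

    epoch
}

/// Toy "guest call": runs `step` until it returns true, checking the epoch on
/// every iteration. Returns the iteration count, or an error once the deadline
/// (current epoch + `ticks`) is reached.
fn run_guarded(
    epoch: &AtomicU64,
    ticks: u64,
    mut step: impl FnMut() -> bool,
) -> Result<u64, &'static str> {
    let deadline = epoch.load(Ordering::Relaxed) + ticks;
    let mut iterations = 0;

    loop {
        if epoch.load(Ordering::Relaxed) >= deadline {
            return Err("epoch deadline exceeded");
        }

        iterations += 1;

        if step() {
            return Ok(iterations);
        }
    }
}
```

With the plan's values, deadline_ticks(5 s, 100 ms) is 50, matching EPOCH_DEADLINE_TICKS. The real granularity is one tick, so a call may overrun its nominal budget by up to about 100 ms.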
  • Step 3: Wire the module

crates/server/src/lib.rs:

pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
pub mod wasm;
  • Step 4: Verify it compiles (adapt API drift here if needed)

Run: cargo check -p whoareyou-server
Expected: success. If the WasiView, WasiCtxView, or add_to_linker_sync signatures drifted in wasmtime-wasi 45, fix them per the docs.rs note above and re-check.

  • Step 5: Run all tests

Run: cargo test -p whoareyou-server
Expected: PASS (no new tests; real coverage lands in Task 10's integration test).

  • Step 6: Commit
git add crates/server Cargo.lock
git commit -m "feat: add wasmtime host with epoch-bounded WasmProvider"

Task 10: Component integration test

Proves the WIT boundary end-to-end: the real .wasm built from Task 4, loaded by the real host from Task 9.

Files:

  • Create: crates/server/tests/component.rs

  • Step 1: Build the component

Run: cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta
Expected: target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm exists.

  • Step 2: Write the integration test

crates/server/tests/component.rs:

use std::path::Path;

use whoareyou_server::model::{FetchedResponse, ParseOutcome};
use whoareyou_server::service::ProviderHandle;
use whoareyou_server::wasm;

const COMPONENT_PATH: &str = concat!(
    env!("CARGO_MANIFEST_DIR"),
    "/../../target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm"
);

fn load_provider() -> wasm::WasmProvider {
    let path = Path::new(COMPONENT_PATH);

    assert!(
        path.exists(),
        "hitta component not built — run `just build-components` first"
    );

    let engine = wasm::engine().unwrap();
    let linker = wasm::linker(&engine).unwrap();
    wasm::spawn_epoch_thread(&engine);

    wasm::WasmProvider::load(&engine, &linker, path).unwrap()
}

#[test]
fn metadata_identifies_hitta() {
    let provider = load_provider();

    assert_eq!(provider.name(), "hitta.se");
    assert!(!provider.version().is_empty());
}

#[test]
fn requests_contain_the_number() {
    let provider = load_provider();
    let urls = provider.requests("0104754350").unwrap();

    assert_eq!(urls, vec!["https://www.hitta.se/vem-ringde/0104754350"]);
}

#[test]
fn parse_roundtrips_a_fixture_through_wasm() {
    let provider = load_provider();
    let body = include_str!("../../../fixtures/hitta/0104754350.html").to_string();

    let outcome = provider.parse(
        "0104754350",
        &[FetchedResponse { status: 200, body }],
    );

    let ParseOutcome::Ok(entry) = outcome else {
        panic!("expected Ok entry, got {outcome:?}");
    };

    assert_eq!(entry.history, vec!["42 andra har rapporterat detta nummer"]);
    assert_eq!(entry.comments.len(), 29);
    assert_eq!(entry.comments[0].timestamp, Some(1547746162));
}

#[test]
fn parse_maps_no_data() {
    let provider = load_provider();
    let body = include_str!("../../../fixtures/hitta/0313908905.html").to_string();

    let outcome = provider.parse(
        "0313908905",
        &[FetchedResponse { status: 200, body }],
    );

    assert!(matches!(outcome, ParseOutcome::NoData), "got {outcome:?}");
}
  • Step 3: Run the integration test

Run: cargo test -p whoareyou-server --test component
Expected: 4 tests PASS. (If the 0104754350.html parse expectations changed in Task 3 Step 6's contingency branch, mirror the same expectations here.)

  • Step 4: Commit
git add crates/server/tests
git commit -m "test: prove WIT boundary with real component integration test"

Task 11: Config + main wiring

Files:

  • Create: crates/server/src/config.rs

  • Modify: crates/server/src/main.rs, crates/server/src/lib.rs, crates/server/Cargo.toml

  • Step 1: Add binary dependencies

Extend crates/server/Cargo.toml [dependencies]:

anyhow = "1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
  • Step 2: Write the failing config tests

crates/server/src/config.rs, test module first:

#[cfg(test)]
mod tests {
    use std::collections::HashMap;

    use super::*;

    fn env(pairs: &[(&str, &str)]) -> impl Fn(&str) -> Option<String> + '_ {
        let map: HashMap<String, String> = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();

        move |key: &str| map.get(key).cloned()
    }

    #[test]
    fn defaults_apply_when_unset() {
        let config = AppConfig::from_lookup(env(&[])).unwrap();

        assert_eq!(config.listen.to_string(), "127.0.0.1:8080");
        assert_eq!(config.components_dir, std::path::PathBuf::from("components"));
        assert_eq!(config.cache_ttl, std::time::Duration::from_secs(24 * 3600));
        assert_eq!(config.fetch_timeout, std::time::Duration::from_secs(10));
    }

    #[test]
    fn env_overrides_apply() {
        let config = AppConfig::from_lookup(env(&[
            ("WHOAREYOU_LISTEN", "0.0.0.0:9000"),
            ("WHOAREYOU_COMPONENTS_DIR", "/opt/providers"),
            ("WHOAREYOU_CACHE_TTL_HOURS", "1"),
            ("WHOAREYOU_FETCH_TIMEOUT_SECS", "30"),
        ]))
        .unwrap();

        assert_eq!(config.listen.to_string(), "0.0.0.0:9000");
        assert_eq!(config.components_dir, std::path::PathBuf::from("/opt/providers"));
        assert_eq!(config.cache_ttl, std::time::Duration::from_secs(3600));
        assert_eq!(config.fetch_timeout, std::time::Duration::from_secs(30));
    }

    #[test]
    fn invalid_values_error() {
        assert!(AppConfig::from_lookup(env(&[("WHOAREYOU_LISTEN", "not-an-addr")])).is_err());
        assert!(AppConfig::from_lookup(env(&[("WHOAREYOU_CACHE_TTL_HOURS", "soon")])).is_err());
    }
}
  • Step 3: Run tests to verify they fail

First add pub mod config; to lib.rs. Otherwise config.rs is never compiled and the new tests cannot fail.
Run: cargo test -p whoareyou-server config
Expected: COMPILE ERROR (AppConfig is not defined yet).

  • Step 4: Implement config

Prepend to crates/server/src/config.rs:

use std::net::SocketAddr;
use std::path::PathBuf;
use std::time::Duration;

use crate::error::ConfigError;

#[derive(Debug)]
pub struct AppConfig {
    pub listen: SocketAddr,
    pub components_dir: PathBuf,
    pub cache_ttl: Duration,
    pub fetch_timeout: Duration,
}

impl AppConfig {
    pub fn from_env() -> Result<Self, ConfigError> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, ConfigError> {
        let listen = match get("WHOAREYOU_LISTEN") {
            Some(value) => value.parse().map_err(|e| ConfigError::Invalid {
                key: "WHOAREYOU_LISTEN".to_string(),
                message: format!("{e}"),
            })?,
            None => SocketAddr::from(([127, 0, 0, 1], 8080)),
        };

        let components_dir = get("WHOAREYOU_COMPONENTS_DIR")
            .map(PathBuf::from)
            .unwrap_or_else(|| PathBuf::from("components"));

        let cache_ttl_hours: u64 = parse_or("WHOAREYOU_CACHE_TTL_HOURS", &get, 24)?;
        let fetch_timeout_secs: u64 = parse_or("WHOAREYOU_FETCH_TIMEOUT_SECS", &get, 10)?;

        Ok(Self {
            listen,
            components_dir,
            // Saturate so an absurd WHOAREYOU_CACHE_TTL_HOURS cannot overflow the multiply.
            cache_ttl: Duration::from_secs(cache_ttl_hours.saturating_mul(3600)),
            fetch_timeout: Duration::from_secs(fetch_timeout_secs),
        })
    }
}

fn parse_or(
    key: &str,
    get: &impl Fn(&str) -> Option<String>,
    default: u64,
) -> Result<u64, ConfigError> {
    match get(key) {
        Some(value) => value.parse().map_err(|e| ConfigError::Invalid {
            key: key.to_string(),
            message: format!("{e}"),
        }),
        None => Ok(default),
    }
}
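Taking a lookup closure instead of reading std::env directly keeps the tests off the real process environment. This matters in edition 2024, where std::env::set_var is unsafe.

The parse_or above only handles u64, and WHOAREYOU_LISTEN is parsed by hand. A generic variant could cover both. The sketch below is hypothetical: Invalid stands in for ConfigError::Invalid, assuming the same key/message fields.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Hypothetical stand-in for ConfigError::Invalid.
#[derive(Debug, PartialEq)]
struct Invalid {
    key: String,
    message: String,
}

/// Generic parse_or: any FromStr type (u16, u64, SocketAddr, ...) is parsed
/// the same way, and values still come from the injected lookup.
fn parse_or<T>(
    key: &str,
    get: &impl Fn(&str) -> Option<String>,
    default: T,
) -> Result<T, Invalid>
where
    T: FromStr,
    T::Err: Display,
{
    match get(key) {
        // Present: parse it, reporting which key was malformed.
        Some(value) => value.parse::<T>().map_err(|e| Invalid {
            key: key.to_string(),
            message: e.to_string(),
        }),
        // Absent: fall back to the default.
        None => Ok(default),
    }
}
```

With Invalid mapped to ConfigError::Invalid, the listen address would become parse_or("WHOAREYOU_LISTEN", &get, SocketAddr::from(([127, 0, 0, 1], 8080)))?.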

crates/server/src/lib.rs final state:

pub mod config;
pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
pub mod wasm;
  • Step 5: Run tests to verify they pass

Run: cargo test -p whoareyou-server config
Expected: PASS.

  • Step 6: Write main.rs

crates/server/src/main.rs:

use std::sync::Arc;

use anyhow::Context;
use tracing::info;
use tracing_subscriber::EnvFilter;

use whoareyou_server::config::AppConfig;
use whoareyou_server::fetch::ReqwestFetcher;
use whoareyou_server::service::{LookupService, ProviderHandle};
use whoareyou_server::{http, wasm};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();

    let config = AppConfig::from_env()?;

    let engine = wasm::engine()?;
    let linker = wasm::linker(&engine)?;
    wasm::spawn_epoch_thread(&engine);

    let mut providers: Vec<Arc<dyn ProviderHandle>> = Vec::new();

    let dir = std::fs::read_dir(&config.components_dir).with_context(|| {
        format!("reading components dir {:?}", config.components_dir)
    })?;

    for entry in dir {
        let path = entry?.path();

        if path.extension().is_some_and(|ext| ext == "wasm") {
            let provider = wasm::WasmProvider::load(&engine, &linker, &path)
                .with_context(|| format!("loading component {path:?}"))?;

            info!(
                name = provider.name(),
                version = provider.version(),
                ?path,
                "loaded provider"
            );

            providers.push(Arc::new(provider));
        }
    }

    anyhow::ensure!(
        !providers.is_empty(),
        "no .wasm components found in {:?}",
        config.components_dir
    );

    let fetcher = Arc::new(ReqwestFetcher::new(config.fetch_timeout)?);
    let service = Arc::new(LookupService::new(providers, fetcher, config.cache_ttl));
    let app = http::router(service);

    let listener = tokio::net::TcpListener::bind(config.listen).await?;
    info!("listening on http://{}", config.listen);

    axum::serve(listener, app).await?;

    Ok(())
}
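main() loads components in whatever order read_dir yields them, which is platform-dependent. LookupService also keys results, and cache entries, by provider name, so two components reporting the same name would silently overwrite each other.

A possible hardening, not part of the plan, is to sort the discovered paths and reject duplicate names at startup. Below is a std-only sketch with the hypothetical helpers discover_components and first_duplicate.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Collect regular `*.wasm` files in `dir`, sorted by path so load order and
/// startup logs are reproducible across machines.
fn discover_components(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();

    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();

        if path.is_file() && path.extension().is_some_and(|ext| ext == "wasm") {
            paths.push(path);
        }
    }

    paths.sort();
    Ok(paths)
}

/// Return the first name that appears more than once, if any.
fn first_duplicate<'a>(names: impl IntoIterator<Item = &'a str>) -> Option<&'a str> {
    let mut seen = std::collections::HashSet::new();
    // insert() returns false when the name was already present.
    names.into_iter().find(|name| !seen.insert(*name))
}
```

In main() this would replace the read_dir loop (for path in discover_components(&config.components_dir)? { ... }). After loading, a check such as if let Some(name) = first_duplicate(providers.iter().map(|p| p.name())) { anyhow::bail!("duplicate provider name {name}") } would fail startup.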
  • Step 7: Full workspace check + tests

Run: cargo test --workspace
Expected: PASS.

  • Step 8: Smoke-test the real service (network)
mkdir -p components
cp target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm components/hitta.wasm
cargo build -p whoareyou-server    # compile first so the sleep below only covers startup
cargo run -p whoareyou-server &
sleep 3
curl -s http://127.0.0.1:8080/healthz
curl -s "http://127.0.0.1:8080/api/v1/number/0104754350" | python3 -m json.tool
kill %1

Expected: ok from healthz, and the lookup returns JSON with a results["hitta.se"] object whose status is one of ok/no_data/parse_failed. This hits the live site. If it returns parse_failed while the fixture tests pass, the likely cause is that hitta.se serves different markup to the server's User-Agent. In that case, record it as a follow-up issue; it does not block this task.

  • Step 9: Commit
git add crates/server Cargo.lock
git commit -m "feat: wire config, component loading, and axum serve in main"

Task 12: justfile, docs, cleanup

Files:

  • Create: justfile

  • Modify: fetch-fixture, README.md, CLAUDE.md

  • Step 1: Write the justfile

# Build provider components and copy them where the server looks.
build-components:
    cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta
    mkdir -p components
    cp target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm components/hitta.wasm

# Full build: components first, then the server.
build: build-components
    cargo build --release

# All tests (the integration test needs the built component).
test: build-components
    cargo test --workspace

# Run the service locally.
run: build-components
    cargo run -p whoareyou-server

fmt:
    cargo +nightly fmt

lint:
    cargo clippy --workspace
  • Step 2: Verify just test works end to end

Run: just test
Expected: builds the component, all tests PASS.

  • Step 3: Trim fetch-fixture to live providers

Replace fetch-fixture contents:

#!/bin/bash
# Refresh HTML fixtures for provider parser tests.
# Usage: ./fetch-fixture <number>

set -euo pipefail

curl -sL -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
  "https://www.hitta.se/vem-ringde/$1" \
  -o "fixtures/hitta/$1.html"

echo "fixtures/hitta/$1.html: $(wc -c < "fixtures/hitta/$1.html") bytes"
  • Step 4: Rewrite README.md

````markdown
# whoareyou

Who is calling me? A self-hosted HTTP service that looks up Swedish phone
numbers across reverse-lookup sites. Providers are sandboxed WASM components.

## Usage

```shell
$ just run
$ curl "http://127.0.0.1:8080/api/v1/number/0700000000"
```

## Configuration (env)

| Variable                       | Default          |
|--------------------------------|------------------|
| `WHOAREYOU_LISTEN`             | `127.0.0.1:8080` |
| `WHOAREYOU_COMPONENTS_DIR`     | `components`     |
| `WHOAREYOU_CACHE_TTL_HOURS`    | `24`             |
| `WHOAREYOU_FETCH_TIMEOUT_SECS` | `10`             |

## Development

```shell
$ rustup target add wasm32-wasip2
$ just test
```

Provider contract lives in `wit/provider.wit`. See
`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
````

  • Step 5: Rewrite CLAUDE.md

````markdown
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this is

A self-hosted HTTP service that looks up Swedish phone numbers ("who is
calling me?") by scraping reverse-lookup sites. Providers are WASM components
(Component Model / WASI p2) loaded from a directory at startup; the host does
all fetching and caching. Design spec:
`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.

## Commands

```bash
just test                  # build components + run all tests (preferred)
just run                   # build components + run the service
just build                 # release build of everything
cargo test -p whoareyou-provider-hitta      # provider parser tests (native, no WASM)
cargo test -p whoareyou-server --test component   # WIT-boundary integration test
cargo +nightly fmt         # always nightly, not stable
cargo clippy --workspace
./fetch-fixture <number>   # refresh an HTML fixture from hitta.se
```

The integration test needs the component built first. Run it via `just test`, or run `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta` before a bare `cargo test`.

## Architecture

- `wit/provider.wit`: the provider contract (`metadata`/`requests`/`parse`). Components are pure: no network, no filesystem. The HOST fetches URLs.
- `crates/providers/hitta`: parse logic in `parser.rs` is plain Rust, unit-tested natively against `fixtures/hitta/*.html`. `component.rs` is thin WIT glue, compiled only for `wasm32` (`cargo test` never touches WASM here).
- `crates/server`: lib + thin bin. `service.rs` holds the `ProviderHandle` + `Fetch` traits and `LookupService`: moka cache, default TTL 24h, key `provider:number`; fetch failures are NOT cached. `wasm.rs` implements `ProviderHandle` over wasmtime (fresh `Store` per call, epoch deadline ≈5s). `http.rs` is axum: `GET /api/v1/number/{number}`, `GET /healthz`.

## Gotchas

- Components build with plain `cargo build --target wasm32-wasip2`; no `cargo-component`. The output name uses underscores: `whoareyou_provider_hitta.wasm`. The justfile copies it to `components/hitta.wasm` (gitignored).
- One provider failing maps to a per-provider status in the JSON response, never a non-200 for the whole lookup. `parse_failed` in the logs (WARN) means a site changed its markup: refresh a fixture with `./fetch-fixture` and fix the parser.
- `ParseError::NoData` vs `Failed`: a fetched page with no phone data is `NoData` (normal); a page that doesn't match the expected structure is `Failed` (scraper rot). Don't conflate them.
````

  • Step 6: Final verification

Run: just test && cargo clippy --workspace && cargo +nightly fmt -- --check
Expected: tests pass, no clippy errors (fix any warnings or note them), fmt clean.

  • Step 7: Commit
git add justfile fetch-fixture README.md CLAUDE.md
git commit -m "docs: add justfile and rewrite README/CLAUDE.md for service architecture"

Out of scope (per spec)

Container image · k8s/Pithos/CI · provider upload/enable-disable · more providers · host-fetch import for multi-step providers · lookup history / persistent cache · metrics.