diff --git a/CLAUDE.md b/CLAUDE.md
index b11f23b..1cde3c2 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,64 +4,57 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## What this is
 
-A CLI that looks up Swedish phone numbers ("who is calling me?") by scraping
-reverse-lookup sites. Old codebase: Rust edition 2018, reqwest 0.9 (synchronous
-API), insta 0.11.
+A self-hosted HTTP service that looks up Swedish phone numbers ("who is
+calling me?") by scraping reverse-lookup sites. Providers are WASM components
+(Component Model / WASI p2) loaded from a directory at startup; the host does
+all fetching and caching. Design spec:
+`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
 
 ## Commands
 
 ```bash
-cargo build
-cargo run -- 0700000000                                  # query a number (hitta.se built in)
-cargo run -- -d definitions/vem_ringde.toml 0700000000   # add TOML-defined probes
-cargo run -- -o 0700000000                               # open probe URLs in browser (macOS `open`)
-
-cargo test                    # all tests (insta snapshot tests)
-cargo test probe::hitta       # one module
-cargo test test_0104754350    # one test
-
-cargo +nightly fmt            # always nightly, not stable
-cargo clippy
+just test                                         # build components + run all tests (preferred)
+just run                                          # build components + run the service
+just build                                        # release build of everything
+cargo test -p whoareyou-provider-hitta            # provider parser tests (native, no WASM)
+cargo test -p whoareyou-server --test component   # WIT-boundary integration test
+cargo +nightly fmt                                # always nightly, not stable
+cargo clippy --workspace
+./fetch-fixture                                   # refresh an HTML fixture from hitta.se
 ```
 
-Tests are inline-snapshot tests (`assert_yaml_snapshot!(..., @r###"..."###)`)
-against checked-in HTML fixtures in `fixtures//.html` — no
-network needed. Refresh/add fixtures with `./fetch-fixture ` (requires
-`http`/httpie); it fetches the number from all five sites into `fixtures/`.
+The integration test needs the component built first — run via `just test`,
+or `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta`
+before bare `cargo test`.
 
 ## Architecture
 
-Everything revolves around the `Probe` trait (`src/probe.rs`): `provider()`,
-`uri(number)`, `fetch(number)`, `parse(html) -> Result`.
-
-Two kinds of probes:
-
-1. **Hard-coded**: `Hitta` (`src/probe/hitta.rs`) — extracts the
-   `__NEXT_DATA__` JSON blob via regex and deserializes it with serde. Always
-   registered in `main.rs`.
-2. **Declarative**: `Definition` (`src/definition.rs`) — generic scraper
-   configured by a TOML file (`definitions/*.toml`) with CSS selectors for
-   `messages`, `history`, and `comments` (each comment has optional
-   `date_time`/`title`/`message` sub-selectors). The URL `path` is a
-   tinytemplate string with `{ number }`. Loaded at runtime via `-d`.
-
-Flow in `main.rs`: build probe list → for each probe, check the cache
-(`Context` in `src/context.rs`, bincode files under the platform cache dir
-with a 1-day TTL) → otherwise `fetch()` and cache → `parse()` into an `Entry`
-(`src/entry.rs`) → `Display` it.
+- `wit/provider.wit` — the provider contract (`metadata`/`requests`/`parse`).
+  Components are pure: no network, no filesystem. The HOST fetches URLs.
+- `crates/providers/hitta` — parse logic in `parser.rs` is plain Rust,
+  unit-tested natively against `fixtures/hitta/*.html`; `component.rs` is
+  thin WIT glue, compiled only for `wasm32` (`cargo test` never touches WASM
+  here). hitta.se serves Next.js App Router pages — data lives in RSC flight
+  payloads (`self.__next_f.push`), NOT `__NEXT_DATA__` (that's the dead 2019
+  format kept in old fixtures as a Failed-path regression case).
+- `crates/server` — lib + thin bin. `service.rs` holds the `ProviderHandle` +
+  `Fetch` traits and `LookupService` (moka cache, TTL 24h, key
+  `provider:number`; fetch failures are NOT cached). `wasm.rs` implements
+  `ProviderHandle` over wasmtime (fresh Store per call, epoch deadline ≈5s —
+  `spawn_epoch_thread` must run once at startup or runaway guests hang
+  instead of trapping). `http.rs` is axum: `GET /api/v1/number/{number}`,
+  `GET /healthz`.
 
 ## Gotchas
 
-- `src/probe/{eniro,konsument_info,telefonforsaljare,vem_ringde}.rs` are
-  **orphaned** — `probe.rs` only declares `mod hitta;`. Those providers were
-  superseded by the TOML definitions in `definitions/`. Don't "fix" them or
-  expect them to compile; they're kept as reference.
-- `_build.rs` is intentionally disabled (underscore prefix, not referenced in
-  Cargo.toml) — an abandoned attempt at generating fixture tests.
-- `definitions/vem_ringde.yml` is an experimental YAML variant of the TOML
-  definition, but `main.rs` only parses TOML (`toml::from_slice`).
-- The `Filter` enum in `src/definition.rs` has no variants yet — `filters` is
-  parsed from definitions but unimplemented (commented-out loops in `parse`).
-- insta 0.11 is old: the macro is `assert_yaml_snapshot!` and inline-snapshot
-  updates need a matching old `cargo-insta`; it's usually easier to update the
-  inline `@r###"..."###` literals by hand.
+- Components build with plain `cargo build --target wasm32-wasip2` — no
+  cargo-component. Output name uses underscores:
+  `whoareyou_provider_hitta.wasm`; the justfile copies it to
+  `components/hitta.wasm` (gitignored).
+- One provider failing maps to a per-provider `status` in the JSON response —
+  never a non-200 for the whole lookup. `parse_failed` in logs (WARN) means a
+  site changed its markup: refresh a fixture with `./fetch-fixture` and fix
+  the parser.
+- `ParseError::NoData` vs `Failed`: a fetched page with no phone data is
+  NoData (normal); a page that doesn't match the expected structure is Failed
+  (scraper rot). Don't conflate them.
diff --git a/README.md b/README.md
index 8c754da..b70296d 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,30 @@
 # whoareyou
 
-Who is calling me?
+Who is calling me? A self-hosted HTTP service that looks up Swedish phone
+numbers across reverse-lookup sites. Providers are sandboxed WASM components.
 
 ## Usage
 
 ```shell
-$ whoareyou 0700000000
+$ just run
+$ curl "http://127.0.0.1:8080/api/v1/number/0700000000"
 ```
 
-## Todo
+## Configuration (env)
 
-Almost everything. I will add stuff when I need stuff. But hey, if you found this project and want to use it. Fork it, change it, create a PR, and I will add it :)
+| Variable | Default |
+|---|---|
+| `WHOAREYOU_LISTEN` | `127.0.0.1:8080` |
+| `WHOAREYOU_COMPONENTS_DIR` | `components` |
+| `WHOAREYOU_CACHE_TTL_HOURS` | `24` |
+| `WHOAREYOU_FETCH_TIMEOUT_SECS` | `10` |
 
-- [x] Add flag to open url for probes in browser (easier for debugging)
-- [x] Probe should return and Result, so we don't print a new line for empty result
-- [x] Add logging
-- [ ] List cache entries
-- [ ] Clear cache entries
-- [ ] Add some nice colors, so it's easier to read the output.
-- [x] Add tests for probes.
+## Development
+
+```shell
+$ rustup target add wasm32-wasip2
+$ just test
+```
+
+Provider contract lives in `wit/provider.wit`. See
+`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
diff --git a/fetch-fixture b/fetch-fixture
index bfc4efc..fbaae90 100755
--- a/fetch-fixture
+++ b/fetch-fixture
@@ -1,8 +1,11 @@
 #!/bin/bash
+# Refresh HTML fixtures for provider parser tests.
+# Usage: ./fetch-fixture
+set -euo pipefail
 
 
-http --follow GET "https://gulasidorna.eniro.se/hitta:$1" > "fixtures/eniro/$1.html"
-http --follow GET "https://www.hitta.se/vem-ringde/$1" > "fixtures/hitta/$1.html"
-http --follow GET "http://konsumentinfo.se/telefonnummer/sverige/$1" > "fixtures/konsumentinfo/$1.html"
-http --follow GET "http://telefonforsaljare.nu/telefonnummer/$1/" > "fixtures/telefonforsaljare/$1.html"
-http --follow GET "http://vemringde.se/?q=$1" > "fixtures/vemringde/$1.html"
+curl -sL -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
+  "https://www.hitta.se/vem-ringde/$1" \
+  -o "fixtures/hitta/$1.html"
+
+echo "fixtures/hitta/$1.html: $(wc -c < "fixtures/hitta/$1.html") bytes"
diff --git a/justfile b/justfile
new file mode 100644
index 0000000..43bd5b1
--- /dev/null
+++ b/justfile
@@ -0,0 +1,23 @@
+# Build provider components and copy them where the server looks.
+build-components:
+    cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta
+    mkdir -p components
+    cp target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm components/hitta.wasm
+
+# Full build: components first, then the server.
+build: build-components
+    cargo build --release
+
+# All tests (the integration test needs the built component).
+test: build-components
+    cargo test --workspace
+
+# Run the service locally.
+run: build-components
+    cargo run -p whoareyou-server
+
+fmt:
+    cargo +nightly fmt
+
+lint:
+    cargo clippy --workspace
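
Reviewer note: the cache rule stated in the CLAUDE.md hunk (TTL 24h, key `provider:number`, fetch failures NOT cached) can be sketched roughly as below. This is a std-only illustration under stated assumptions, not the code in `service.rs`, which the diff says uses moka. `TtlCache`, `get_or_fetch`, and the closure-based fetcher are hypothetical names introduced only for this sketch.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the moka-backed cache described in the diff.
struct TtlCache {
    ttl: Duration,
    // Key is "provider:number"; value is (time stored, fetched body).
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached body for `provider:number` if it is younger than
    /// the TTL; otherwise calls `fetch`. Only successful fetches are stored,
    /// so a failed fetch is retried on the next lookup.
    fn get_or_fetch<F>(&mut self, provider: &str, number: &str, fetch: F) -> Result<String, String>
    where
        F: FnOnce() -> Result<String, String>,
    {
        let key = format!("{provider}:{number}");
        if let Some((stored_at, body)) = self.entries.get(&key) {
            if stored_at.elapsed() < self.ttl {
                return Ok(body.clone());
            }
        }
        // An error returns early here and never reaches the insert below.
        let body = fetch()?;
        self.entries.insert(key, (Instant::now(), body.clone()));
        Ok(body)
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(24 * 60 * 60));
    let mut calls = 0;

    // 1) The fetch fails: nothing is cached.
    let first = cache.get_or_fetch("hitta", "0700000000", || {
        calls += 1;
        Err("timeout".to_string())
    });
    assert!(first.is_err());

    // 2) The next lookup therefore fetches again, and this time it is stored.
    let second = cache.get_or_fetch("hitta", "0700000000", || {
        calls += 1;
        Ok("<html>ok</html>".to_string())
    });
    assert_eq!(second.as_deref(), Ok("<html>ok</html>"));

    // 3) A third lookup within the TTL is served from the cache.
    let third = cache.get_or_fetch("hitta", "0700000000", || {
        calls += 1;
        Err("should not be called".to_string())
    });
    assert_eq!(third.as_deref(), Ok("<html>ok</html>"));
    assert_eq!(calls, 2);
    println!("fetcher was called {calls} times");
}
```

Keeping errors out of the cache means a transient upstream failure costs one extra fetch on the next request instead of a stale failure that persists for the full TTL.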