Compare commits

19 Commits

Author SHA1 Message Date
logaritmisk 4a22c795ea polish: drop dead provider_names, clarify status mapping, README prereq
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:37:59 +02:00
logaritmisk 6babaa0166 docs: add justfile and rewrite README/CLAUDE.md for service architecture
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:34:21 +02:00
logaritmisk 86c4440576 feat: wire config, component loading, and axum serve in main
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 15:32:10 +02:00
logaritmisk 7747ffbc20 test: prove WIT boundary with real component integration test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 15:25:36 +02:00
logaritmisk eeec821af2 feat: add wasmtime host with epoch-bounded WasmProvider
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:23:11 +02:00
logaritmisk 58f4bd4fdf feat: add axum HTTP layer with lookup endpoint and healthz
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:19:18 +02:00
logaritmisk 0880198b3c test: prove provider panic containment and isolation 2026-06-05 15:16:37 +02:00
logaritmisk 1a33317b6d feat: add LookupService with moka cache and provider orchestration
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:13:28 +02:00
logaritmisk 9f3ff2633c feat: add server error types and reqwest fetcher 2026-06-05 15:10:11 +02:00
logaritmisk 86b196c2d8 feat: add server model types and API serialization shape 2026-06-05 15:08:30 +02:00
logaritmisk 09f05b8c23 feat: reject non-200 responses in hitta component with precise error 2026-06-05 15:07:11 +02:00
logaritmisk 9c4493c1a4 feat: export hitta parser as a WASM component via wit-bindgen
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:02:57 +02:00
logaritmisk a5896e046c refactor: address review — sort_by_key and optional comment timestamps 2026-06-05 15:01:14 +02:00
logaritmisk 4980beec0a feat: add hitta.se flight-data parser as pure native-testable functions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 14:54:04 +02:00
logaritmisk 896333254a test: add second fresh hitta.se fixture (low-activity number)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 14:48:14 +02:00
logaritmisk 3804901237 test: add fresh hitta.se fixture for parser port
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 14:45:52 +02:00
logaritmisk de0b0d9280 refactor!: replace CLI with workspace scaffold for WASM provider service 2026-06-05 14:37:03 +02:00
logaritmisk f8555722af docs: add implementation plan for WASM provider service
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 14:34:29 +02:00
logaritmisk 4093c344be docs: add v1 design spec — WASM provider service
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 14:24:01 +02:00
46 changed files with 6914 additions and 3497 deletions
+1
@@ -1,3 +1,4 @@
/target
*.pending-snap
components/
+42 -49
@@ -4,64 +4,57 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## What this is
A CLI that looks up Swedish phone numbers ("who is calling me?") by scraping
reverse-lookup sites. Old codebase: Rust edition 2018, reqwest 0.9 (synchronous
API), insta 0.11.
A self-hosted HTTP service that looks up Swedish phone numbers ("who is
calling me?") by scraping reverse-lookup sites. Providers are WASM components
(Component Model / WASI p2) loaded from a directory at startup; the host does
all fetching and caching. Design spec:
`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
## Commands
```bash
cargo build
cargo run -- 0700000000 # query a number (hitta.se built in)
cargo run -- -d definitions/vem_ringde.toml 0700000000 # add TOML-defined probes
cargo run -- -o 0700000000 # open probe URLs in browser (macOS `open`)
cargo test # all tests (insta snapshot tests)
cargo test probe::hitta # one module
cargo test test_0104754350 # one test
cargo +nightly fmt # always nightly, not stable
cargo clippy
just test # build components + run all tests (preferred)
just run # build components + run the service
just build # release build of everything
cargo test -p whoareyou-provider-hitta # provider parser tests (native, no WASM)
cargo test -p whoareyou-server --test component # WIT-boundary integration test
cargo +nightly fmt # always nightly, not stable
cargo clippy --workspace
./fetch-fixture <number> # refresh an HTML fixture from hitta.se
```
Tests are inline-snapshot tests (`assert_yaml_snapshot!(..., @r###"..."###)`)
against checked-in HTML fixtures in `fixtures/<provider>/<number>.html` — no
network needed. Refresh/add fixtures with `./fetch-fixture <number>` (requires
`http`/httpie); it fetches the number from all five sites into `fixtures/`.
The integration test needs the component built first — run via `just test`,
or `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta`
before bare `cargo test`.
## Architecture
Everything revolves around the `Probe` trait (`src/probe.rs`): `provider()`,
`uri(number)`, `fetch(number)`, `parse(html) -> Result<Entry, ()>`.
Two kinds of probes:
1. **Hard-coded**: `Hitta` (`src/probe/hitta.rs`) — extracts the
`__NEXT_DATA__` JSON blob via regex and deserializes it with serde. Always
registered in `main.rs`.
2. **Declarative**: `Definition` (`src/definition.rs`) — generic scraper
configured by a TOML file (`definitions/*.toml`) with CSS selectors for
`messages`, `history`, and `comments` (each comment has optional
`date_time`/`title`/`message` sub-selectors). The URL `path` is a
tinytemplate string with `{ number }`. Loaded at runtime via `-d`.
Flow in `main.rs`: build probe list → for each probe, check the cache
(`Context` in `src/context.rs`, bincode files under the platform cache dir
with a 1-day TTL) → otherwise `fetch()` and cache → `parse()` into an `Entry`
(`src/entry.rs`) → `Display` it.
- `wit/provider.wit` — the provider contract (`metadata`/`requests`/`parse`).
Components are pure: no network, no filesystem. The HOST fetches URLs.
- `crates/providers/hitta` — parse logic in `parser.rs` is plain Rust,
unit-tested natively against `fixtures/hitta/*.html`; `component.rs` is
thin WIT glue, compiled only for `wasm32` (`cargo test` never touches WASM
here). hitta.se serves Next.js App Router pages — data lives in RSC flight
payloads (`self.__next_f.push`), NOT `__NEXT_DATA__` (that's the dead 2019
format kept in old fixtures as a Failed-path regression case).
- `crates/server` — lib + thin bin. `service.rs` holds the `ProviderHandle` +
`Fetch` traits and `LookupService` (moka cache, TTL 24h, key
`provider:number`; fetch failures are NOT cached). `wasm.rs` implements
`ProviderHandle` over wasmtime (fresh Store per call, epoch deadline ≈5s;
`spawn_epoch_thread` must run once at startup or runaway guests hang
instead of trapping). `http.rs` is axum: `GET /api/v1/number/{number}`,
`GET /healthz`.
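As a rough illustration of the caching rule in the `crates/server` notes (key `provider:number`, fixed TTL, fetch failures never stored), here is a minimal std-only sketch. `TtlCache` and `Outcome` are hypothetical stand-ins, not the moka-backed types in `service.rs`, and the explicit `now` parameter is an assumption made only to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-provider outcome, loosely mirroring the four API statuses.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Ok(String), // stand-in for a parsed entry
    NoData,
    FetchFailed,
    ParseFailed,
}

/// Hypothetical stand-in for the real cache: a map with a fixed time-to-live.
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Outcome)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Build a cache key in the `provider:number` form.
    pub fn key(provider: &str, number: &str) -> String {
        format!("{provider}:{number}")
    }

    /// Store an outcome unless it is a transient fetch failure.
    pub fn insert(&mut self, key: String, outcome: Outcome, now: Instant) {
        if outcome != Outcome::FetchFailed {
            self.entries.insert(key, (now, outcome));
        }
    }

    /// Return the cached outcome only while it is younger than the TTL.
    pub fn get(&self, key: &str, now: Instant) -> Option<&Outcome> {
        self.entries
            .get(key)
            .filter(|(stored, _)| now.duration_since(*stored) < self.ttl)
            .map(|(_, outcome)| outcome)
    }
}
```

Skipping `FetchFailed` on insert means a temporary network error is retried on the next lookup, while `NoData` and `ParseFailed` results are still served from the cache until they expire.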
## Gotchas
- `src/probe/{eniro,konsument_info,telefonforsaljare,vem_ringde}.rs` are
**orphaned**: `probe.rs` only declares `mod hitta;`. Those providers were
superseded by the TOML definitions in `definitions/`. Don't "fix" them or
expect them to compile; they're kept as reference.
- `_build.rs` is intentionally disabled (underscore prefix, not referenced in
Cargo.toml) — an abandoned attempt at generating fixture tests.
- `definitions/vem_ringde.yml` is an experimental YAML variant of the TOML
definition, but `main.rs` only parses TOML (`toml::from_slice`).
- The `Filter` enum in `src/definition.rs` has no variants yet — `filters` is
parsed from definitions but unimplemented (commented-out loops in `parse`).
- insta 0.11 is old: the macro is `assert_yaml_snapshot!` and inline-snapshot
updates need a matching old `cargo-insta`; it's usually easier to update the
inline `@r###"..."###` literals by hand.
- Components build with plain `cargo build --target wasm32-wasip2` — no
cargo-component. Output name uses underscores:
`whoareyou_provider_hitta.wasm`; the justfile copies it to
`components/hitta.wasm` (gitignored).
- One provider failing maps to a per-provider `status` in the JSON response —
never a non-200 for the whole lookup. `parse_failed` in logs (WARN) means a
site changed its markup: refresh a fixture with `./fetch-fixture` and fix
the parser.
- `ParseError::NoData` vs `Failed`: a fetched page with no phone data is
NoData (normal); a page that doesn't match the expected structure is Failed
(scraper rot). Don't conflate them.
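The `NoData` versus `Failed` distinction, and how it surfaces as a per-provider `status`, can be sketched roughly as follows. This is a simplified illustration, not the parser in this change: `classify` and `status_label` are hypothetical helpers, the payloads are assumed to be already unescaped, and the `fetch_failed` status is left out because it is decided before parsing.

```rust
/// Parser-level error with the same two cases as the provider's `ParseError`.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Page understood, but it holds no data for the number.
    NoData,
    /// Page structure did not match expectations (scraper rot).
    Failed(String),
}

/// Hypothetical classifier over already-unescaped flight payloads.
pub fn classify(payloads: &[&str]) -> Result<(), ParseError> {
    // No flight payloads at all: the page format itself has changed.
    if payloads.is_empty() {
        return Err(ParseError::Failed("flight data not found".to_string()));
    }
    // Payloads exist but none carries statistics: a normal "nothing known" page.
    if payloads.iter().any(|p| p.contains("\"statistics\":")) {
        Ok(())
    } else {
        Err(ParseError::NoData)
    }
}

/// Map a parse result to the snake_case status string used in the JSON response.
pub fn status_label(result: &Result<(), ParseError>) -> &'static str {
    match result {
        Ok(()) => "ok",
        Err(ParseError::NoData) => "no_data",
        Err(ParseError::Failed(_)) => "parse_failed",
    }
}
```

Keeping the two error cases separate lets a `parse_failed` in the logs act as a markup-change alarm without being triggered by numbers that simply have no reports.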
Generated
+2789 -1473
File diff suppressed because it is too large
+6 -24
@@ -1,26 +1,8 @@
[package]
name = "whoareyou"
[workspace]
resolver = "3"
members = ["crates/server", "crates/providers/hitta"]
[workspace.package]
version = "0.1.0"
edition = "2024"
authors = ["Anders Olsson <anders.e.olsson@gmail.com>"]
edition = "2018"
[dependencies]
bincode = "1.1"
chrono = { version = "0.4", features = ["serde"] }
chrono-tz = "0.5"
directories = "2.0"
fern = { version = "0.5", features = ["colored"] }
htmlescape = "0.3"
lazy_static = "1.4"
log = "0.4"
regex = "1.3"
reqwest = "0.9"
scraper = "0.10"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
structopt = "0.3"
tinytemplate = "1.0"
toml = "0.5"
[dev-dependencies]
insta = "0.11"
+21 -11
@@ -1,21 +1,31 @@
# whoareyou
Who is calling me?
Who is calling me? A self-hosted HTTP service that looks up Swedish phone
numbers across reverse-lookup sites. Providers are sandboxed WASM components.
## Usage
```shell
$ whoareyou 0700000000
$ rustup target add wasm32-wasip2 # once
$ just run
$ curl "http://127.0.0.1:8080/api/v1/number/0700000000"
```
## Todo
## Configuration (env)
Almost everything. I will add stuff when I need stuff. But hey, if you found this project and want to use it. Fork it, change it, create a PR, and I will add it :)
| Variable | Default |
|---|---|
| `WHOAREYOU_LISTEN` | `127.0.0.1:8080` |
| `WHOAREYOU_COMPONENTS_DIR` | `components` |
| `WHOAREYOU_CACHE_TTL_HOURS` | `24` |
| `WHOAREYOU_FETCH_TIMEOUT_SECS` | `10` |
- [x] Add flag to open url for probes in browser (easier for debugging)
- [x] Probe should return a Result, so we don't print a new line for empty result
- [x] Add logging
- [ ] List cache entries
- [ ] Clear cache entries
- [ ] Add some nice colors, so it's easier to read the output.
- [x] Add tests for probes.
## Development
```shell
$ rustup target add wasm32-wasip2
$ just test
```
Provider contract lives in `wit/provider.wit`. See
`docs/superpowers/specs/2026-06-05-wasm-provider-service-design.md`.
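To give a feel for what a provider has to do, here is a hedged Rust sketch of the idea behind the contract: the provider names the URLs it wants, the host fetches them, and the provider turns the responses into data without doing any I/O itself. The real contract is defined in WIT and is not reproduced here; the `Provider` trait, `ExampleProvider`, and the `example.test` URL are all hypothetical.

```rust
/// One fetched HTTP response, handed to the provider by the host.
pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum LookupError {
    NoData,
    ParseFailed(String),
}

/// Hypothetical Rust rendering of the contract: data in, data out, no I/O.
pub trait Provider {
    fn name(&self) -> &'static str;
    /// URLs the host should fetch for this number.
    fn requests(&self, number: &str) -> Vec<String>;
    /// Turn the fetched responses into result lines.
    fn parse(&self, responses: &[Response]) -> Result<Vec<String>, LookupError>;
}

pub struct ExampleProvider;

impl Provider for ExampleProvider {
    fn name(&self) -> &'static str {
        "example.test"
    }

    fn requests(&self, number: &str) -> Vec<String> {
        vec![format!("https://example.test/lookup/{number}")]
    }

    fn parse(&self, responses: &[Response]) -> Result<Vec<String>, LookupError> {
        let first = responses
            .first()
            .ok_or_else(|| LookupError::ParseFailed("no responses provided".to_string()))?;
        // Anything other than 200 is treated as a structural failure.
        if first.status != 200 {
            return Err(LookupError::ParseFailed(format!(
                "unexpected HTTP status {}",
                first.status
            )));
        }
        // Toy parsing rule: every non-empty line of the body is one result line.
        let lines: Vec<String> = first
            .body
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect();
        if lines.is_empty() {
            Err(LookupError::NoData)
        } else {
            Ok(lines)
        }
    }
}
```

Because the provider never touches the network, it can be tested with plain strings, and the host keeps sole control over fetching and caching.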
-47
@@ -1,47 +0,0 @@
use std::env;
use std::fs::read_dir;
use std::fs::DirEntry;
use std::fs::File;
use std::io::Write;
use std::path::Path;
fn main() {
let out_dir = env::var("OUT_DIR").unwrap();
let destination = Path::new(&out_dir).join("tests.rs");
let mut test_file = File::create(&destination).unwrap();
// write_header(&mut test_file);
// let test_data_directories = read_dir("./tests/data/").unwrap();
/*
for directory in test_data_directories {
write_test(&mut test_file, &directory.unwrap());
}
*/
}
fn write_header(test_file: &mut File) {
write!(
test_file,
r#"
use insta::assert_yaml_snapshot_matches;
use whoareyou::*;
"#
)
.unwrap();
}
fn write_test(test_file: &mut File, directory: &DirEntry) {
let directory = directory.path().canonicalize().unwrap();
let path = directory.display();
let test_name = format!("prefix_if_needed_{}", directory.file_name().unwrap().to_string_lossy());
write!(
test_file,
include_str!("./tests/test_template"),
name = test_name,
path = path
)
.unwrap();
}
+19
@@ -0,0 +1,19 @@
[package]
name = "whoareyou-provider-hitta"
version.workspace = true
edition.workspace = true
authors.workspace = true
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
regex = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
[target.'cfg(target_arch = "wasm32")'.dependencies]
wit-bindgen = "0.57"
[dev-dependencies]
insta = { version = "1.47", features = ["yaml"] }
+63
@@ -0,0 +1,63 @@
use crate::parser;
wit_bindgen::generate!({
world: "provider",
path: "../../../wit",
});
use exports::whoareyou::provider::lookup::{
Comment, Entry, Guest, LookupError, ProviderInfo, Request, Response,
};
struct Component;
impl Guest for Component {
fn metadata() -> ProviderInfo {
ProviderInfo {
name: "hitta.se".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
}
}
fn requests(number: String) -> Vec<Request> {
parser::request_urls(&number)
.into_iter()
.map(|url| Request { url })
.collect()
}
fn parse(_number: String, responses: Vec<Response>) -> Result<Entry, LookupError> {
let Some(first) = responses.first() else {
return Err(LookupError::ParseFailed(
"no responses provided".to_string(),
));
};
if first.status != 200 {
return Err(LookupError::ParseFailed(format!(
"unexpected HTTP status {}",
first.status
)));
}
match parser::parse(&first.body) {
Ok(entry) => Ok(Entry {
messages: entry.messages,
history: entry.history,
comments: entry
.comments
.into_iter()
.map(|c| Comment {
timestamp: c.timestamp,
title: c.title,
message: c.message,
})
.collect(),
}),
Err(parser::ParseError::NoData) => Err(LookupError::NoData),
Err(parser::ParseError::Failed(msg)) => Err(LookupError::ParseFailed(msg)),
}
}
}
export!(Component);
+4
@@ -0,0 +1,4 @@
pub mod parser;
#[cfg(target_arch = "wasm32")]
mod component;
+212
@@ -0,0 +1,212 @@
use std::sync::LazyLock;
use regex::Regex;
use serde::{Deserialize, Serialize};
static FLIGHT_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r#"(?s)self\.__next_f\.push\(\[1,"(.*?)"\]\)\s*</script>"#)
.expect("FLIGHT_RE is valid")
});
#[derive(Debug, PartialEq, Serialize)]
pub struct ParsedEntry {
pub messages: Vec<String>,
pub history: Vec<String>,
pub comments: Vec<ParsedComment>,
}
#[derive(Debug, PartialEq, Serialize)]
pub struct ParsedComment {
/// Unix epoch seconds, UTC.
pub timestamp: Option<i64>,
pub title: Option<String>,
pub message: String,
}
#[derive(Debug, PartialEq, Serialize)]
pub enum ParseError {
/// Page fetched and understood, but contains no data for the number.
NoData,
/// Page structure did not match expectations — scraper rot signal.
Failed(String),
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Statistics {
#[serde(default)]
comments: Vec<RawComment>,
statistics_text: String,
}
#[derive(Debug, Deserialize)]
struct RawComment {
comment: String,
timestamp: Option<u64>,
}
pub fn request_urls(number: &str) -> Vec<String> {
vec![format!("https://www.hitta.se/vem-ringde/{number}")]
}
pub fn parse(body: &str) -> Result<ParsedEntry, ParseError> {
let captures: Vec<&str> = FLIGHT_RE
.captures_iter(body)
.filter_map(|cap| cap.get(1).map(|m| m.as_str()))
.collect();
if captures.is_empty() {
return Err(ParseError::Failed(
"__next_f flight data not found".to_string(),
));
}
for raw_payload in &captures {
// Unescape the string literal captured from the inline <script> element.
// We wrap it as a JSON string value so serde_json handles all escape
// sequences correctly. Literal newlines (which appear in synthetic test
// payloads) are escaped first so the JSON remains valid.
let sanitized = raw_payload.replace('\n', "\\n").replace('\r', "\\r");
let json_str = format!(r#""{sanitized}""#);
let payload: String = match serde_json::from_str(&json_str) {
Ok(s) => s,
Err(_) => continue,
};
let marker = "\"statistics\":";
let idx = match payload.find(marker) {
Some(idx) => idx,
None => continue,
};
// Found the statistics marker — deserialize or report rot.
let after_marker = &payload[idx + marker.len()..];
let mut de = serde_json::Deserializer::from_str(after_marker);
let stats = match Statistics::deserialize(&mut de) {
Ok(s) => s,
Err(err) => return Err(ParseError::Failed(err.to_string())),
};
let mut comments: Vec<ParsedComment> = stats
.comments
.into_iter()
.filter(|raw| !raw.comment.trim().is_empty())
.map(|raw| ParsedComment {
timestamp: raw.timestamp.map(|millis| (millis / 1000) as i64),
title: None,
message: raw.comment,
})
.collect();
comments.sort_by_key(|comment| std::cmp::Reverse(comment.timestamp));
return Ok(ParsedEntry {
messages: Vec::new(),
history: vec![stats.statistics_text],
comments,
});
}
Err(ParseError::NoData)
}
#[cfg(test)]
mod tests {
use super::*;
/// Build a minimal page in the App Router flight-data format.
fn flight_page(payload_json: &str) -> String {
let escaped = payload_json.replace('\\', "\\\\").replace('"', "\\\"");
format!(r#"<html><body><script>self.__next_f.push([1,"{escaped}"])</script></body></html>"#)
}
#[test]
fn requests_single_hitta_url() {
assert_eq!(
request_urls("0700000000"),
vec!["https://www.hitta.se/vem-ringde/0700000000".to_string()]
);
}
#[test]
fn parses_reported_number_fixture() {
let body = include_str!("../../../../fixtures/hitta/fresh-0104754350.html");
let entry = parse(body).unwrap();
assert_eq!(entry.messages, Vec::<String>::new());
assert_eq!(
entry.history,
vec!["Elva andra har rapporterat detta nummer"]
);
// every comment on this number has empty text -> all filtered out
assert!(entry.comments.is_empty());
}
#[test]
fn parses_low_activity_number_fixture() {
let body = include_str!("../../../../fixtures/hitta/fresh-0313908905.html");
let entry = parse(body).unwrap();
assert_eq!(
entry.history,
vec!["1000 andra har också sökt på detta nummer"]
);
assert!(entry.comments.is_empty());
}
#[test]
fn extracts_and_converts_comments() {
let page = flight_page(
r#"{"foo":{"statistics":{"searches":5,"comments":[
{"id":"a","comment":"Telefonförsäljare","time":"03 okt","timestamp":1538574919000,"upVotes":1},
{"id":"b","comment":"","time":"04 okt","timestamp":1538661319000},
{"id":"c","comment":"Bluff","time":"05 okt","timestamp":1538747719000}
],"statisticsText":"Tre rapporter"}}}"#,
);
let entry = parse(&page).unwrap();
assert_eq!(entry.history, vec!["Tre rapporter"]);
// empty-text comment filtered; newest first; millis -> seconds
assert_eq!(entry.comments.len(), 2);
assert_eq!(entry.comments[0].timestamp, Some(1538747719));
assert_eq!(entry.comments[0].message, "Bluff");
assert_eq!(entry.comments[1].timestamp, Some(1538574919));
assert_eq!(entry.comments[1].message, "Telefonförsäljare");
assert_eq!(entry.comments[0].title, None);
}
#[test]
fn flight_data_without_statistics_is_no_data() {
let page = flight_page(r#"{"someOtherComponent":{"props":{}}}"#);
assert_eq!(parse(&page), Err(ParseError::NoData));
}
#[test]
fn legacy_next_data_page_is_failed() {
// 2019 Pages Router fixture: no __next_f flight data at all
let body = include_str!("../../../../fixtures/hitta/0104754350.html");
assert!(matches!(parse(body), Err(ParseError::Failed(_))));
}
#[test]
fn garbage_is_failed() {
assert!(matches!(parse("<html></html>"), Err(ParseError::Failed(_))));
}
#[test]
fn snapshot_reported_number() {
let body = include_str!("../../../../fixtures/hitta/fresh-0104754350.html");
insta::assert_yaml_snapshot!(parse(body), @r###"
Ok:
messages: []
history:
- Elva andra har rapporterat detta nummer
comments: []
"###);
}
}
+25
@@ -0,0 +1,25 @@
[package]
name = "whoareyou-server"
version.workspace = true
edition.workspace = true
authors.workspace = true
[dependencies]
anyhow = "1"
async-trait = "0.1"
axum = "0.8"
futures = "0.3"
moka = { version = "0.12", features = ["future"] }
reqwest = "0.13"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
wasmtime = { version = "45", features = ["component-model"] }
wasmtime-wasi = "45"
[dev-dependencies]
http-body-util = "0.1"
tower = { version = "0.5", features = ["util"] }
+111
@@ -0,0 +1,111 @@
use std::net::SocketAddr;
use std::path::PathBuf;
use std::time::Duration;
use crate::error::ConfigError;
#[derive(Debug)]
pub struct AppConfig {
pub listen: SocketAddr,
pub components_dir: PathBuf,
pub cache_ttl: Duration,
pub fetch_timeout: Duration,
}
impl AppConfig {
pub fn from_env() -> Result<Self, ConfigError> {
Self::from_lookup(|key| std::env::var(key).ok())
}
pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, ConfigError> {
let listen = match get("WHOAREYOU_LISTEN") {
Some(value) => value.parse().map_err(|err| ConfigError::Invalid {
key: "WHOAREYOU_LISTEN".to_string(),
message: format!("{err}"),
})?,
None => SocketAddr::from(([127, 0, 0, 1], 8080)),
};
let components_dir = get("WHOAREYOU_COMPONENTS_DIR")
.map(PathBuf::from)
.unwrap_or_else(|| PathBuf::from("components"));
let cache_ttl_hours: u64 = parse_or("WHOAREYOU_CACHE_TTL_HOURS", &get, 24)?;
let fetch_timeout_secs: u64 = parse_or("WHOAREYOU_FETCH_TIMEOUT_SECS", &get, 10)?;
Ok(Self {
listen,
components_dir,
cache_ttl: Duration::from_secs(cache_ttl_hours * 3600),
fetch_timeout: Duration::from_secs(fetch_timeout_secs),
})
}
}
fn parse_or(
key: &str,
get: &impl Fn(&str) -> Option<String>,
default: u64,
) -> Result<u64, ConfigError> {
match get(key) {
Some(value) => value.parse().map_err(|err| ConfigError::Invalid {
key: key.to_string(),
message: format!("{err}"),
}),
None => Ok(default),
}
}
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use super::*;
fn env<'a>(pairs: &'a [(&'a str, &'a str)]) -> impl Fn(&str) -> Option<String> + 'a {
let map: HashMap<String, String> = pairs
.iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect();
move |key: &str| map.get(key).cloned()
}
#[test]
fn defaults_apply_when_unset() {
let config = AppConfig::from_lookup(env(&[])).unwrap();
assert_eq!(config.listen.to_string(), "127.0.0.1:8080");
assert_eq!(
config.components_dir,
std::path::PathBuf::from("components")
);
assert_eq!(config.cache_ttl, std::time::Duration::from_secs(24 * 3600));
assert_eq!(config.fetch_timeout, std::time::Duration::from_secs(10));
}
#[test]
fn env_overrides_apply() {
let config = AppConfig::from_lookup(env(&[
("WHOAREYOU_LISTEN", "0.0.0.0:9000"),
("WHOAREYOU_COMPONENTS_DIR", "/opt/providers"),
("WHOAREYOU_CACHE_TTL_HOURS", "1"),
("WHOAREYOU_FETCH_TIMEOUT_SECS", "30"),
]))
.unwrap();
assert_eq!(config.listen.to_string(), "0.0.0.0:9000");
assert_eq!(
config.components_dir,
std::path::PathBuf::from("/opt/providers")
);
assert_eq!(config.cache_ttl, std::time::Duration::from_secs(3600));
assert_eq!(config.fetch_timeout, std::time::Duration::from_secs(30));
}
#[test]
fn invalid_values_error() {
assert!(AppConfig::from_lookup(env(&[("WHOAREYOU_LISTEN", "not-an-addr")])).is_err());
assert!(AppConfig::from_lookup(env(&[("WHOAREYOU_CACHE_TTL_HOURS", "soon")])).is_err());
}
}
+22
@@ -0,0 +1,22 @@
use thiserror::Error;
/// Errors from hosting/calling a WASM component.
#[derive(Debug, Error)]
pub enum HostError {
#[error("wasm error: {0}")]
Wasm(#[from] wasmtime::Error),
#[error("io error: {0}")]
Io(#[from] std::io::Error),
}
#[derive(Debug, Error)]
pub enum FetchError {
#[error("request failed: {0}")]
Request(#[from] reqwest::Error),
}
#[derive(Debug, Error)]
pub enum ConfigError {
#[error("invalid value for {key}: {message}")]
Invalid { key: String, message: String },
}
+33
@@ -0,0 +1,33 @@
use std::time::Duration;
use async_trait::async_trait;
use crate::error::FetchError;
use crate::model::FetchedResponse;
use crate::service::Fetch;
pub struct ReqwestFetcher {
client: reqwest::Client,
}
impl ReqwestFetcher {
pub fn new(timeout: Duration) -> Result<Self, FetchError> {
let client = reqwest::Client::builder()
.timeout(timeout)
.user_agent(concat!("whoareyou/", env!("CARGO_PKG_VERSION")))
.build()?;
Ok(Self { client })
}
}
#[async_trait]
impl Fetch for ReqwestFetcher {
async fn fetch(&self, url: &str) -> Result<FetchedResponse, FetchError> {
let response = self.client.get(url).send().await?;
let status = response.status().as_u16();
let body = response.text().await?;
Ok(FetchedResponse { status, body })
}
}
+171
@@ -0,0 +1,171 @@
use std::sync::Arc;
use axum::Json;
use axum::Router;
use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum::routing::get;
use serde_json::json;
use crate::model::LookupResponse;
use crate::service::LookupService;
pub fn router(service: Arc<LookupService>) -> Router {
Router::new()
.route("/api/v1/number/{number}", get(lookup_number))
.route("/healthz", get(|| async { "ok" }))
.with_state(service)
}
async fn lookup_number(
State(service): State<Arc<LookupService>>,
Path(raw): Path<String>,
) -> Response {
let Some(number) = normalize(&raw) else {
return (
StatusCode::BAD_REQUEST,
Json(json!({ "error": "invalid phone number" })),
)
.into_response();
};
let results = service.lookup(&number).await;
Json(LookupResponse { number, results }).into_response()
}
/// Strip separators and validate: optional leading '+', then 2 to 15 digits.
pub fn normalize(raw: &str) -> Option<String> {
let cleaned: String = raw
.chars()
.filter(|c| !matches!(c, ' ' | '-' | '.'))
.collect();
let digits = cleaned.strip_prefix('+').unwrap_or(&cleaned);
let valid = (2..=15).contains(&digits.len()) && digits.chars().all(|c| c.is_ascii_digit());
valid.then_some(cleaned)
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use axum::body::Body;
use axum::http::{Request, StatusCode};
use http_body_util::BodyExt;
use tower::ServiceExt;
use super::*;
use crate::error::{FetchError, HostError};
use crate::model::{FetchedResponse, ParseOutcome};
use crate::service::{Fetch, LookupService, ProviderHandle};
struct NoDataProvider;
impl ProviderHandle for NoDataProvider {
fn name(&self) -> &str {
"fake.se"
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
Ok(vec![format!("https://example.test/{number}")])
}
fn parse(&self, _: &str, _: &[FetchedResponse]) -> ParseOutcome {
ParseOutcome::NoData
}
}
struct StaticFetcher;
#[async_trait]
impl Fetch for StaticFetcher {
async fn fetch(&self, _: &str) -> Result<FetchedResponse, FetchError> {
Ok(FetchedResponse {
status: 200,
body: String::new(),
})
}
}
fn app() -> axum::Router {
let service = LookupService::new(
vec![Arc::new(NoDataProvider)],
Arc::new(StaticFetcher),
Duration::from_secs(60),
);
router(Arc::new(service))
}
#[test]
fn normalize_strips_separators() {
assert_eq!(normalize("0700 00-00.00"), Some("0700000000".to_string()));
assert_eq!(normalize("+46701234567"), Some("+46701234567".to_string()));
}
#[test]
fn normalize_rejects_garbage() {
assert_eq!(normalize("not-a-number"), None);
assert_eq!(normalize(""), None);
assert_eq!(normalize("0"), None);
assert_eq!(normalize("07001231231231231231"), None); // > 15 digits
assert_eq!(normalize("070+123"), None); // '+' not at start
}
#[tokio::test]
async fn lookup_returns_results_keyed_by_provider() {
let response = app()
.oneshot(
Request::builder()
.uri("/api/v1/number/0700%2000-00%2000")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let bytes = response.into_body().collect().await.unwrap().to_bytes();
let json: serde_json::Value = serde_json::from_slice(&bytes).unwrap();
assert_eq!(json["number"], "0700000000");
assert_eq!(json["results"]["fake.se"]["status"], "no_data");
}
#[tokio::test]
async fn invalid_number_is_400() {
let response = app()
.oneshot(
Request::builder()
.uri("/api/v1/number/banana")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
}
#[tokio::test]
async fn healthz_is_ok() {
let response = app()
.oneshot(
Request::builder()
.uri("/healthz")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
}
}
+7
@@ -0,0 +1,7 @@
pub mod config;
pub mod error;
pub mod fetch;
pub mod http;
pub mod model;
pub mod service;
pub mod wasm;
+67
@@ -0,0 +1,67 @@
use std::sync::Arc;
use anyhow::Context;
use tracing::info;
use tracing_subscriber::EnvFilter;
use whoareyou_server::config::AppConfig;
use whoareyou_server::fetch::ReqwestFetcher;
use whoareyou_server::service::{LookupService, ProviderHandle};
use whoareyou_server::{http, wasm};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
tracing_subscriber::fmt()
.with_env_filter(
EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
)
.init();
let config = AppConfig::from_env()?;
let engine = wasm::engine()?;
let linker = wasm::linker(&engine)?;
wasm::spawn_epoch_thread(&engine);
let mut providers: Vec<Arc<dyn ProviderHandle>> = Vec::new();
let dir = std::fs::read_dir(&config.components_dir)
.with_context(|| format!("reading components dir {:?}", config.components_dir))?;
for entry in dir {
let path = entry?.path();
if path.extension().is_some_and(|ext| ext == "wasm") {
let provider = wasm::WasmProvider::load(&engine, &linker, &path)
.with_context(|| format!("loading component {path:?}"))?;
info!(
name = provider.name(),
version = provider.version(),
?path,
"loaded provider"
);
providers.push(Arc::new(provider));
}
}
anyhow::ensure!(
!providers.is_empty(),
"no .wasm components found in {:?}",
config.components_dir
);
let fetcher = Arc::new(ReqwestFetcher::new(config.fetch_timeout)?);
let service = Arc::new(LookupService::new(providers, fetcher, config.cache_ttl));
let app = http::router(service);
let listener = tokio::net::TcpListener::bind(config.listen).await?;
info!("listening on http://{}", config.listen);
axum::serve(listener, app).await?;
Ok(())
}
+87
@@ -0,0 +1,87 @@
use std::collections::BTreeMap;
use serde::Serialize;
#[derive(Debug, Clone, PartialEq, Serialize)]
pub struct Entry {
pub messages: Vec<String>,
pub history: Vec<String>,
pub comments: Vec<Comment>,
}
#[derive(Debug, Clone, PartialEq, Serialize)]
pub struct Comment {
/// Unix epoch seconds, UTC.
pub timestamp: Option<i64>,
pub title: Option<String>,
pub message: String,
}
/// Per-provider outcome as exposed in the API (and cached).
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(tag = "status", rename_all = "snake_case")]
pub enum ProviderResult {
Ok { entry: Entry },
NoData,
FetchFailed,
ParseFailed,
}
/// A fetched HTTP response handed to a provider's `parse`.
#[derive(Debug, Clone)]
pub struct FetchedResponse {
pub status: u16,
pub body: String,
}
/// Outcome of a provider's `parse` call, before API mapping.
#[derive(Debug)]
pub enum ParseOutcome {
Ok(Entry),
NoData,
Failed(String),
}
#[derive(Debug, Serialize)]
pub struct LookupResponse {
pub number: String,
pub results: BTreeMap<String, ProviderResult>,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn provider_result_serializes_to_api_shape() {
let ok = ProviderResult::Ok {
entry: Entry {
messages: vec![],
history: vec!["42 andra".to_string()],
comments: vec![Comment {
timestamp: Some(1547746162),
title: None,
message: "Varmsälj".to_string(),
}],
},
};
let json = serde_json::to_value(&ok).unwrap();
assert_eq!(json["status"], "ok");
assert_eq!(json["entry"]["history"][0], "42 andra");
assert_eq!(json["entry"]["comments"][0]["timestamp"], 1547746162);
assert_eq!(
serde_json::to_value(&ProviderResult::NoData).unwrap()["status"],
"no_data"
);
assert_eq!(
serde_json::to_value(&ProviderResult::FetchFailed).unwrap()["status"],
"fetch_failed"
);
assert_eq!(
serde_json::to_value(&ProviderResult::ParseFailed).unwrap()["status"],
"parse_failed"
);
}
}
+352
@@ -0,0 +1,352 @@
use std::collections::BTreeMap;
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use moka::future::Cache;
use tracing::warn;
use crate::error::{FetchError, HostError};
use crate::model::{FetchedResponse, ParseOutcome, ProviderResult};
/// A loaded provider. Implemented by `wasm::WasmProvider`; faked in tests.
/// Methods are sync — WASM calls are CPU-bound; the service wraps them in
/// `spawn_blocking`.
pub trait ProviderHandle: Send + Sync {
fn name(&self) -> &str;
fn requests(&self, number: &str) -> Result<Vec<String>, HostError>;
fn parse(&self, number: &str, responses: &[FetchedResponse]) -> ParseOutcome;
}
#[async_trait]
pub trait Fetch: Send + Sync {
async fn fetch(&self, url: &str) -> Result<FetchedResponse, FetchError>;
}
pub struct LookupService {
providers: Vec<Arc<dyn ProviderHandle>>,
fetcher: Arc<dyn Fetch>,
cache: Cache<String, ProviderResult>,
}
impl LookupService {
pub fn new(
providers: Vec<Arc<dyn ProviderHandle>>,
fetcher: Arc<dyn Fetch>,
cache_ttl: Duration,
) -> Self {
Self {
providers,
fetcher,
cache: Cache::builder().time_to_live(cache_ttl).build(),
}
}
/// Run all providers concurrently; one result per provider name.
pub async fn lookup(&self, number: &str) -> BTreeMap<String, ProviderResult> {
let tasks = self.providers.iter().map(|provider| {
let provider = provider.clone();
let fetcher = self.fetcher.clone();
let cache = self.cache.clone();
let number = number.to_string();
async move {
let name = provider.name().to_string();
let key = format!("{name}:{number}");
if let Some(hit) = cache.get(&key).await {
return (name, hit);
}
let result = run_provider(provider, &number, fetcher).await;
// Transient failures must not poison the cache.
if result != ProviderResult::FetchFailed {
cache.insert(key, result.clone()).await;
}
(name, result)
}
});
futures::future::join_all(tasks).await.into_iter().collect()
}
}
async fn run_provider(
provider: Arc<dyn ProviderHandle>,
number: &str,
fetcher: Arc<dyn Fetch>,
) -> ProviderResult {
let name = provider.name().to_string();
let urls = {
let provider = provider.clone();
let number = number.to_string();
match tokio::task::spawn_blocking(move || provider.requests(&number)).await {
Ok(Ok(urls)) => urls,
Ok(Err(error)) => {
// Host-side trap; the 4-status API has no better bucket than parse_failed.
warn!(provider = %name, %error, "requests() failed");
return ProviderResult::ParseFailed;
}
Err(error) => {
warn!(provider = %name, %error, "requests() panicked");
return ProviderResult::ParseFailed;
}
}
};
let fetched = futures::future::join_all(urls.iter().map(|url| fetcher.fetch(url))).await;
let mut responses = Vec::with_capacity(fetched.len());
for result in fetched {
match result {
Ok(response) => responses.push(response),
Err(error) => {
warn!(provider = %name, %error, "fetch failed");
return ProviderResult::FetchFailed;
}
}
}
let outcome = {
let provider = provider.clone();
let number = number.to_string();
tokio::task::spawn_blocking(move || provider.parse(&number, &responses)).await
};
match outcome {
Ok(ParseOutcome::Ok(entry)) => ProviderResult::Ok { entry },
Ok(ParseOutcome::NoData) => ProviderResult::NoData,
Ok(ParseOutcome::Failed(message)) => {
warn!(provider = %name, %message, "parse failed — scraper rot?");
ProviderResult::ParseFailed
}
Err(error) => {
warn!(provider = %name, %error, "parse() panicked");
ProviderResult::ParseFailed
}
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;
use async_trait::async_trait;
use super::*;
use crate::error::{FetchError, HostError};
use crate::model::{Comment, Entry, FetchedResponse, ParseOutcome, ProviderResult};
fn entry() -> Entry {
Entry {
messages: vec![],
history: vec!["history".to_string()],
comments: vec![Comment {
timestamp: Some(1547746162),
title: None,
message: "spam".to_string(),
}],
}
}
/// Provider whose parse outcome is scripted per call.
struct FakeProvider {
name: &'static str,
outcome: fn() -> ParseOutcome,
}
impl ProviderHandle for FakeProvider {
fn name(&self) -> &str {
self.name
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
Ok(vec![format!("https://example.test/{number}")])
}
fn parse(&self, _number: &str, _responses: &[FetchedResponse]) -> ParseOutcome {
(self.outcome)()
}
}
/// Fetcher that counts calls and can be told to fail.
struct FakeFetcher {
calls: AtomicUsize,
fail: bool,
}
impl FakeFetcher {
fn new(fail: bool) -> Self {
Self {
calls: AtomicUsize::new(0),
fail,
}
}
}
#[async_trait]
impl Fetch for FakeFetcher {
async fn fetch(&self, _url: &str) -> Result<FetchedResponse, FetchError> {
self.calls.fetch_add(1, Ordering::SeqCst);
if self.fail {
// reqwest::Error cannot be constructed directly; produce a real
// one via an immediately-refused local connection (port 1).
let err = reqwest::Client::new()
.get("http://127.0.0.1:1/unreachable")
.send()
.await
.unwrap_err();
return Err(FetchError::Request(err));
}
Ok(FetchedResponse {
status: 200,
body: "body".to_string(),
})
}
}
fn service(providers: Vec<Arc<dyn ProviderHandle>>, fetcher: Arc<dyn Fetch>) -> LookupService {
LookupService::new(providers, fetcher, Duration::from_secs(60))
}
#[tokio::test]
async fn ok_result_is_returned_and_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::Ok(entry()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![provider], fetcher.clone());
let results = svc.lookup("0700000000").await;
assert_eq!(results["fake.se"], ProviderResult::Ok { entry: entry() });
// second lookup served from cache — fetcher not called again
let results = svc.lookup("0700000000").await;
assert_eq!(results["fake.se"], ProviderResult::Ok { entry: entry() });
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
}
#[tokio::test]
async fn no_data_is_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::NoData,
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![provider], fetcher.clone());
assert_eq!(
svc.lookup("0700000000").await["fake.se"],
ProviderResult::NoData
);
assert_eq!(
svc.lookup("0700000000").await["fake.se"],
ProviderResult::NoData
);
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
}
#[tokio::test]
async fn parse_failure_maps_and_is_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::Failed("rot".to_string()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![provider], fetcher.clone());
assert_eq!(
svc.lookup("0700000000").await["fake.se"],
ProviderResult::ParseFailed
);
assert_eq!(
svc.lookup("0700000000").await["fake.se"],
ProviderResult::ParseFailed
);
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 1);
}
#[tokio::test]
async fn fetch_failure_is_not_cached() {
let provider = Arc::new(FakeProvider {
name: "fake.se",
outcome: || ParseOutcome::NoData,
});
let fetcher = Arc::new(FakeFetcher::new(true));
let svc = service(vec![provider], fetcher.clone());
assert_eq!(
svc.lookup("0700000000").await["fake.se"],
ProviderResult::FetchFailed
);
assert_eq!(
svc.lookup("0700000000").await["fake.se"],
ProviderResult::FetchFailed
);
// NOT cached: fetcher tried twice
assert_eq!(fetcher.calls.load(Ordering::SeqCst), 2);
}
#[tokio::test]
async fn multiple_providers_keyed_by_name() {
let a = Arc::new(FakeProvider {
name: "a.se",
outcome: || ParseOutcome::NoData,
});
let b = Arc::new(FakeProvider {
name: "b.se",
outcome: || ParseOutcome::Ok(entry()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![a, b], fetcher);
let results = svc.lookup("0700000000").await;
assert_eq!(results.len(), 2);
assert_eq!(results["a.se"], ProviderResult::NoData);
assert!(matches!(results["b.se"], ProviderResult::Ok { .. }));
}
/// Provider whose parse() panics — must be contained by spawn_blocking.
struct PanickingProvider;
impl ProviderHandle for PanickingProvider {
fn name(&self) -> &str {
"panic.se"
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
Ok(vec![format!("https://example.test/{number}")])
}
fn parse(&self, _number: &str, _responses: &[FetchedResponse]) -> ParseOutcome {
panic!("provider blew up");
}
}
#[tokio::test]
async fn provider_panic_is_contained_and_isolated() {
let panicking = Arc::new(PanickingProvider);
let healthy = Arc::new(FakeProvider {
name: "ok.se",
outcome: || ParseOutcome::Ok(entry()),
});
let fetcher = Arc::new(FakeFetcher::new(false));
let svc = service(vec![panicking, healthy], fetcher);
let results = svc.lookup("0700000000").await;
assert_eq!(results["panic.se"], ProviderResult::ParseFailed);
assert!(matches!(results["ok.se"], ProviderResult::Ok { .. }));
}
}
+182
@@ -0,0 +1,182 @@
use std::path::Path;
use wasmtime::component::{Component, Linker};
use wasmtime::{Config, Engine, Store};
use wasmtime_wasi::{ResourceTable, WasiCtx, WasiCtxBuilder, WasiCtxView, WasiView};
use crate::error::HostError;
use crate::model::{Comment, Entry, FetchedResponse, ParseOutcome};
use crate::service::ProviderHandle;
wasmtime::component::bindgen!({
world: "provider",
path: "../../wit",
});
use exports::whoareyou::provider::lookup::{
LookupError as WitLookupError, Response as WitResponse,
};
/// How many epoch ticks a guest call may run. The epoch thread ticks every
/// 100 ms → 50 ticks ≈ 5 s budget per call.
const EPOCH_DEADLINE_TICKS: u64 = 50;
pub const EPOCH_TICK: std::time::Duration = std::time::Duration::from_millis(100);
pub struct HostState {
ctx: WasiCtx,
table: ResourceTable,
}
impl WasiView for HostState {
fn ctx(&mut self) -> WasiCtxView<'_> {
WasiCtxView {
ctx: &mut self.ctx,
table: &mut self.table,
}
}
}
pub fn engine() -> Result<Engine, HostError> {
let mut config = Config::new();
config.epoch_interruption(true);
Ok(Engine::new(&config)?)
}
pub fn linker(engine: &Engine) -> Result<Linker<HostState>, HostError> {
let mut linker = Linker::new(engine);
wasmtime_wasi::p2::add_to_linker_sync(&mut linker)?;
Ok(linker)
}
/// Spawn the thread that advances the engine epoch so runaway guest calls
/// trap instead of hanging the service. Call once at startup.
pub fn spawn_epoch_thread(engine: &Engine) {
let engine = engine.clone();
std::thread::spawn(move || {
loop {
std::thread::sleep(EPOCH_TICK);
engine.increment_epoch();
}
});
}
pub struct WasmProvider {
name: String,
version: String,
engine: Engine,
pre: ProviderPre<HostState>,
}
impl WasmProvider {
/// Compile a component from disk and read its metadata once.
/// Fails fast if the component does not satisfy the provider world.
pub fn load(
engine: &Engine,
linker: &Linker<HostState>,
path: &Path,
) -> Result<Self, HostError> {
let component = Component::from_file(engine, path)?;
let pre = ProviderPre::new(linker.instantiate_pre(&component)?)?;
let mut provider = Self {
name: String::new(),
version: String::new(),
engine: engine.clone(),
pre,
};
let mut store = provider.new_store();
let instance = provider.pre.instantiate(&mut store)?;
let info = instance
.whoareyou_provider_lookup()
.call_metadata(&mut store)?;
provider.name = info.name;
provider.version = info.version;
Ok(provider)
}
pub fn version(&self) -> &str {
&self.version
}
fn new_store(&self) -> Store<HostState> {
// No preopens, no env, no inherited stdio — fully sandboxed guest.
let ctx = WasiCtxBuilder::new().build();
let mut store = Store::new(
&self.engine,
HostState {
ctx,
table: ResourceTable::new(),
},
);
store.set_epoch_deadline(EPOCH_DEADLINE_TICKS);
store
}
}
impl ProviderHandle for WasmProvider {
fn name(&self) -> &str {
&self.name
}
fn requests(&self, number: &str) -> Result<Vec<String>, HostError> {
let mut store = self.new_store();
let instance = self.pre.instantiate(&mut store)?;
let requests = instance
.whoareyou_provider_lookup()
.call_requests(&mut store, number)?;
Ok(requests.into_iter().map(|r| r.url).collect())
}
fn parse(&self, number: &str, responses: &[FetchedResponse]) -> ParseOutcome {
let wit_responses: Vec<WitResponse> = responses
.iter()
.map(|r| WitResponse {
status: r.status,
body: r.body.clone(),
})
.collect();
let mut store = self.new_store();
let result: Result<Result<_, WitLookupError>, wasmtime::Error> = (|| {
let instance = self.pre.instantiate(&mut store)?;
instance
.whoareyou_provider_lookup()
.call_parse(&mut store, number, &wit_responses)
})();
match result {
Ok(Ok(entry)) => ParseOutcome::Ok(Entry {
messages: entry.messages,
history: entry.history,
comments: entry
.comments
.into_iter()
.map(|c| Comment {
timestamp: c.timestamp,
title: c.title,
message: c.message,
})
.collect(),
}),
Ok(Err(WitLookupError::NoData)) => ParseOutcome::NoData,
Ok(Err(WitLookupError::ParseFailed(message))) => ParseOutcome::Failed(message),
// Trap (incl. epoch deadline exceeded) or instantiation failure.
Err(error) => ParseOutcome::Failed(format!("component error: {error}")),
}
}
}
+93
@@ -0,0 +1,93 @@
use std::path::Path;
use whoareyou_server::model::{FetchedResponse, ParseOutcome};
use whoareyou_server::service::ProviderHandle;
use whoareyou_server::wasm;
const COMPONENT_PATH: &str = concat!(
env!("CARGO_MANIFEST_DIR"),
"/../../target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm"
);
fn load_provider() -> wasm::WasmProvider {
let path = Path::new(COMPONENT_PATH);
assert!(
path.exists(),
"hitta component not built — run `cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta` first"
);
let engine = wasm::engine().unwrap();
let linker = wasm::linker(&engine).unwrap();
wasm::spawn_epoch_thread(&engine);
wasm::WasmProvider::load(&engine, &linker, path).unwrap()
}
#[test]
fn metadata_identifies_hitta() {
let provider = load_provider();
assert_eq!(provider.name(), "hitta.se");
assert!(!provider.version().is_empty());
}
#[test]
fn requests_contain_the_number() {
let provider = load_provider();
let urls = provider.requests("0104754350").unwrap();
assert_eq!(urls, vec!["https://www.hitta.se/vem-ringde/0104754350"]);
}
#[test]
fn parse_roundtrips_a_fixture_through_wasm() {
let provider = load_provider();
let body = include_str!("../../../fixtures/hitta/fresh-0104754350.html").to_string();
let outcome = provider.parse("0104754350", &[FetchedResponse { status: 200, body }]);
let ParseOutcome::Ok(entry) = outcome else {
panic!("expected Ok entry, got {outcome:?}");
};
assert_eq!(
entry.history,
vec!["Elva andra har rapporterat detta nummer"]
);
// every comment on this number has empty text -> filtered out in the parser
assert!(entry.comments.is_empty());
}
#[test]
fn parse_fails_for_legacy_page() {
// 2019-format page has no flight data: the parser reports parse-failed,
// so the host sees ParseOutcome::Failed, not NoData.
let provider = load_provider();
let body = include_str!("../../../fixtures/hitta/0104754350.html").to_string();
let outcome = provider.parse("0104754350", &[FetchedResponse { status: 200, body }]);
assert!(
matches!(outcome, ParseOutcome::Failed(_)),
"got {outcome:?}"
);
}
#[test]
fn parse_rejects_non_200_status() {
let provider = load_provider();
let outcome = provider.parse(
"0104754350",
&[FetchedResponse {
status: 429,
body: "rate limited".to_string(),
}],
);
let ParseOutcome::Failed(message) = outcome else {
panic!("expected Failed, got {outcome:?}");
};
assert!(message.contains("429"), "message was: {message}");
}
-11
@@ -1,11 +0,0 @@
name = "eniro.se"
path = "https://gulasidorna.eniro.se/hitta:{ number }"
[[messages]]
selector = ".CompanyResultListItem h3.name > a"
[[history]]
selector = "div.PhoneNoHit div.search-info-container p"
[[history]]
selector = "div.feedback-types div.feedback-type-item"
-5
@@ -1,5 +0,0 @@
name = "konsumentinfo.se"
path = "http://konsumentinfo.se/telefonnummer/sverige/{ number }"
[[messages]]
selector = ".panel-heading > h1:nth-child(3)"
-29
@@ -1,29 +0,0 @@
name = "telefonforsaljare.nu"
path = "http://www.telefonforsaljare.nu/telefonnummer/{ number }/"
[[messages]]
selector = "#content p:nth-child(2) i"
[[history]]
selector = "#content p:nth-child(4)"
[[history]]
selector = "#content p:nth-child(5)"
[[comments]]
selector = "#kommentarer > [itemtype='http://data-vocabulary.org/Review']"
[comments.date_time]
selector = "small"
data = "attr:datetime"
kind = "date_time"
format = "%Y-%m-%d %H:%M:%S"
tz = "Europe/Stockholm"
[comments.title]
selector = "h3"
data = "inner_html"
[comments.message]
selector = "[itemprop='description']"
data = "inner_html"
-18
@@ -1,18 +0,0 @@
name = "vemringde.se"
path = "http://vemringde.se/?q={ number }"
[[messages]]
selector = "#toporganisations li"
[[comments]]
selector = "#calls ol li"
[comments.date_time]
selector = "div:nth-child(4)"
data = "inner_html"
kind = "date"
format = "%Y-%m-%d"
tz = "Europe/Stockholm"
[comments.message]
selector = "div:nth-child(3)"
-17
@@ -1,17 +0,0 @@
name: "vemringde.se"
path: "http://vemringde.se/?q={ number }"

messages:
  - selector: "#toporganisations li"

comments:
  - selector: "#calls ol li"
    fields:
      date_time:
        selector: "div:nth-child(4)"
        data: "inner_html"
        kind: "date"
        format: "%Y-%m-%d"
        tz: "Europe/Stockholm"
      message:
        selector: "div:nth-child(3)"
File diff suppressed because it is too large
@@ -0,0 +1,232 @@
# whoareyou v1 — WASM provider service
**Date:** 2026-06-05
**Status:** Approved design
## Summary
whoareyou becomes a long-running, async HTTP service that looks up Swedish
phone numbers by aggregating scraping providers. Providers are **WASM
components** (WASM Component Model, WASI p2) loaded from disk at startup. The
host does all network fetching; components are pure functions from number →
requests and responses → parsed entries. The existing CLI is retired.
V1 scope: code only (runs via `cargo run`, components on disk). One provider:
hitta.se. No container image, no k8s manifests, no upload/enable-disable UI —
those come later.
## Decisions log
| Decision | Choice |
|---|---|
| Form factor | Long-running service + HTTP API; CLI retired |
| Deployment target | Self-hosted on own infra (later); v1 is code-only |
| API surface | Single lookup endpoint (+ `/healthz` as daemon necessity) |
| Provider mechanism | WASM components, fixed set loaded at startup; upload/enable/disable is future work |
| Host/guest boundary | Host fetches; components are pure (no network/fs access) |
| WASM plumbing | Component Model: wasmtime + WIT + wit-bindgen (chosen over Extism and raw-module ABI) |
| Cache | In-memory TTL (moka), parsed results, 24h, key `provider:number` |
| V1 providers | hitta.se only |
| Old code | CLI, TOML/YAML definitions, bincode cache, orphaned probes: deleted. Hitta parse logic + fixtures: ported |
## Workspace layout & toolchain
Cargo workspace, edition 2024:
```
whoareyou/
├── Cargo.toml            # workspace
├── wit/
│   └── provider.wit      # the provider contract (single source of truth)
├── crates/
│   ├── server/           # bin: axum HTTP service + wasmtime host
│   └── providers/
│       └── hitta/        # cdylib → wasm32-wasip2 component
├── fixtures/             # existing HTML fixtures, reused for component tests
└── justfile              # build components → build server (two-step build)
```
- **Host stack:** tokio, axum 0.8, reqwest (current, async), moka 0.12,
wasmtime 45 (`component-model`), thiserror, tracing.
- **Provider stack:** `wit-bindgen` in a `cdylib` crate, built with plain
`cargo build --target wasm32-wasip2 --release`. No `cargo-component`
(stale; the tier-2 wasip2 target makes it unnecessary).
- **Deleted:** `src/main.rs` (CLI), `src/definition.rs`, `src/context.rs`
(bincode cache), `definitions/`, the four orphaned probe modules, `_build.rs`.
## The WIT contract
`wit/provider.wit`:
```wit
package whoareyou:provider@0.1.0;

interface lookup {
    record provider-info {
        name: string,    // e.g. "hitta.se" — key in API response + cache
        version: string,
    }

    record request {
        url: string,
    }

    record response {
        status: u16,
        body: string,
    }

    record comment {
        timestamp: option<s64>,  // unix epoch seconds, UTC
        title: option<string>,
        message: string,
    }

    record entry {
        messages: list<string>,
        history: list<string>,
        comments: list<comment>,
    }

    variant lookup-error {
        no-data,                 // fetched fine, nothing on the page
        parse-failed(string),    // page structure changed — scraper rot signal
    }

    metadata: func() -> provider-info;
    requests: func(number: string) -> list<request>;
    parse: func(number: string, responses: list<response>) -> result<entry, lookup-error>;
}

world provider {
    export lookup;
}
```
Design points:
- **Pure exports, no imports.** Components cannot reach network or
filesystem. Sandboxed, trivially testable, host owns all I/O policy.
- **`timestamp: option<s64>`** replaces the old `Date` enum. Components
normalize site-local dates to UTC epoch seconds (date-only → 00:00:00);
`option` because some sites omit dates. The host never parses dates.
- **`lookup-error` separates `no-data` from `parse-failed`** — the old
`Result<Entry, ()>` conflated "nothing there" with "scraper broke".
- **`requests` returns a list** for single-round fan-out (e.g. two URL
formats). No sequential multi-step flows in v1; if a future provider needs
fetch→token→fetch, add an optional host-fetch import to the world then.
- Package is versioned (`@0.1.0`); future provider-upload feature hangs
version negotiation off this.
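
As an illustration of the date-only rule above (date-only → 00:00:00, emitted as epoch seconds), here is a minimal sketch in plain Rust. The function names are hypothetical and not taken from the hitta component. It also assumes the date is already a UTC calendar date; a real component would still have to apply the site's local offset (for example Europe/Stockholm) before emitting the timestamp.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian calendar date.
/// Hypothetical helper; uses the usual "shift the year to start in March" arithmetic.
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (month as i64 + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day as i64 - 1; // day of (shifted) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parse `YYYY-MM-DD` and return epoch seconds at 00:00:00.
/// Assumption: the input is treated as a UTC date (no timezone shift).
/// The range check is deliberately loose: it rejects month 13 but does not
/// know month lengths, so "2019-02-31" is not rejected.
fn date_only_to_epoch(date: &str) -> Option<i64> {
    let mut parts = date.trim().splitn(3, '-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(days_from_civil(year, month, day) * 86_400)
}
```

Returning `Option<i64>` mirrors `timestamp: option<s64>` in the contract: an unparseable date becomes `none` rather than failing the whole `parse` call.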
## The host service
### Startup
1. Load config from env (`WHOAREYOU_` prefix): listen addr, components dir,
cache TTL, fetch timeout.
2. Scan `components/*.wasm`, compile each with wasmtime, call `metadata()`
once; **fail fast** on components that don't satisfy the WIT world. Log
the loaded provider set.
3. Serve HTTP.
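
Step 1 could be sketched roughly as follows. Only the `WHOAREYOU_` prefix and the four settings come from this document; the variable names (`WHOAREYOU_LISTEN_ADDR`, `WHOAREYOU_COMPONENTS_DIR`, `WHOAREYOU_CACHE_TTL_SECS`, `WHOAREYOU_FETCH_TIMEOUT_SECS`) and every default except the 24h TTL are assumptions made for illustration. The lookup function is injected so the parsing can be tested without touching the process environment.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub struct Config {
    pub listen_addr: SocketAddr,
    pub components_dir: PathBuf,
    pub cache_ttl: Duration,
    pub fetch_timeout: Duration,
}

impl Config {
    /// Build a config from any key → value lookup (hypothetical variable names).
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // Read `WHOAREYOU_<name>` or fall back to the given default.
        let var = |name: &str, default: &str| {
            get(&format!("WHOAREYOU_{name}")).unwrap_or_else(|| default.to_string())
        };
        // Same, but parsed as whole seconds.
        let secs = |name: &str, default: &str| -> Result<Duration, String> {
            var(name, default)
                .parse::<u64>()
                .map(Duration::from_secs)
                .map_err(|e| format!("WHOAREYOU_{name}: {e}"))
        };
        Ok(Self {
            listen_addr: var("LISTEN_ADDR", "127.0.0.1:8080")
                .parse::<SocketAddr>()
                .map_err(|e| format!("WHOAREYOU_LISTEN_ADDR: {e}"))?,
            components_dir: PathBuf::from(var("COMPONENTS_DIR", "components")),
            cache_ttl: secs("CACHE_TTL_SECS", "86400")?, // 24h, per the decisions log
            fetch_timeout: secs("FETCH_TIMEOUT_SECS", "10")?, // assumed default
        })
    }

    /// Convenience wrapper over the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

A malformed value is reported with the name of the offending variable, which fits the fail-fast startup described in step 2.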
### API
```
GET /api/v1/number/{number}
GET /healthz
```
Response shape:
```json
{
"number": "0700000000",
"results": {
"hitta.se": {
"status": "ok",
"entry": {
"messages": [],
"history": ["42 andra har rapporterat detta nummer"],
"comments": [
{ "timestamp": 1547746162, "title": null, "message": "Varmsälj från Folksam" }
]
}
}
}
}
```
- Per-provider `status`: `"ok"` | `"no_data"` | `"fetch_failed"` |
`"parse_failed"`. One provider failing never fails the request.
- HTTP 200 whenever the lookup ran; 400 only for invalid number format.
- `/healthz` is the one deliberate addition to "just lookup" — a supervised
daemon needs it.
### Lookup flow
1. Normalize the number (strip spaces/dashes; minimal — no full E.164 in
v1). Normalized form is the cache key.
2. Check moka cache (key `provider:number`, value: per-provider result,
TTL 24h).
3. On miss, per provider **concurrently**: `requests(number)` → host fetches
each URL via reqwest (shared client, timeout, descriptive User-Agent) →
`parse(number, responses)` → cache the result.
4. Assemble the JSON response.
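
Steps 1 and 2 can be sketched in a few lines. The function names are hypothetical, and the validity rule (digits only after stripping spaces and dashes) is an assumption; this document only states that normalization is minimal and that an invalid format yields HTTP 400.

```rust
/// Strip spaces and dashes; accept the result only if it is non-empty and
/// consists solely of ASCII digits. `None` would map to HTTP 400.
/// Assumption: no `+` prefix handling, since full E.164 is out of v1 scope.
pub fn normalize_number(raw: &str) -> Option<String> {
    let digits: String = raw.chars().filter(|&c| c != ' ' && c != '-').collect();
    let valid = !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit());
    valid.then_some(digits)
}

/// Cache key in the `provider:number` form used by the lookup flow.
pub fn cache_key(provider: &str, number: &str) -> String {
    format!("{provider}:{number}")
}
```

Normalizing before building the key means `070-000 00 00` and `0700000000` share one cache entry per provider.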
### Wasmtime mechanics
- One `Engine` + one compiled `Component` per provider at startup.
- Fresh `Store` + instance per call — cheap, and no state bleeds between
lookups.
- Guest calls run in `spawn_blocking` (CPU-bound parsing, no async in guest).
- Epoch-based deadline per call (`Engine` epoch interruption, deadline a few
seconds out) so a runaway component can't hang the service — matters once
uploaded third-party components exist.
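
To make the epoch deadline concrete, here is a toy model in plain Rust. It is not wasmtime's API: `ToyEpoch` and `budget` are hypothetical names that only mirror the idea of a shared counter advanced by a ticker thread and a per-call deadline expressed in ticks. The tick count is a `u32` here for simple `Duration` arithmetic, which is an assumption of the sketch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Wall-clock budget implied by a tick interval and a tick count.
pub fn budget(tick: Duration, ticks: u32) -> Duration {
    tick * ticks
}

/// Toy stand-in for an engine-wide epoch counter.
pub struct ToyEpoch(AtomicU64);

impl ToyEpoch {
    pub fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    /// Called by the ticker thread once per tick interval.
    pub fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Deadline for a call that starts now and may run for `ticks` ticks.
    pub fn deadline_after(&self, ticks: u64) -> u64 {
        self.0.load(Ordering::Relaxed) + ticks
    }

    /// True once the counter has reached the deadline; in this model the
    /// running call would be interrupted at that point.
    pub fn expired(&self, deadline: u64) -> bool {
        self.0.load(Ordering::Relaxed) >= deadline
    }
}
```

With a 100 ms tick and a 50-tick deadline, `budget` gives 5 s per guest call. Because each call gets a fresh store, each call also gets a fresh deadline relative to the counter at the time it starts.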
### Errors and logging
- Host-side `thiserror` enum (`FetchFailed`, `ParseFailed`, `ComponentTrap`,
…) mapped to the per-provider `status` strings.
- `tracing` structured logs; `parse_failed` logs at WARN — it means a
scraper rotted and needs attention.
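
The mapping from host-side failures to the four API statuses might look like the sketch below. The enum and method names are hypothetical and `thiserror` is left out to keep the example dependency-free. Two details follow the service code in this change: a component trap is reported as `parse_failed` because the four-status API has no better bucket, and `fetch_failed` is the only status that is not cached.

```rust
/// Hypothetical host-side failure kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failure {
    Fetch,
    Parse,
    ComponentTrap,
}

/// The four per-provider statuses exposed by the API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    NoData,
    FetchFailed,
    ParseFailed,
}

impl Status {
    /// Wire representation used in the JSON response.
    pub fn as_str(self) -> &'static str {
        match self {
            Status::Ok => "ok",
            Status::NoData => "no_data",
            Status::FetchFailed => "fetch_failed",
            Status::ParseFailed => "parse_failed",
        }
    }

    /// Transient fetch failures must not poison the cache.
    pub fn is_cacheable(self) -> bool {
        self != Status::FetchFailed
    }
}

impl From<Failure> for Status {
    fn from(failure: Failure) -> Self {
        match failure {
            Failure::Fetch => Status::FetchFailed,
            // A trap has no status of its own, so it shares parse_failed.
            Failure::Parse | Failure::ComponentTrap => Status::ParseFailed,
        }
    }
}
```

Caching `parse_failed` is a trade-off: it avoids hammering a site whose markup changed, at the cost of serving the failure until the TTL expires.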
## The hitta component
`crates/providers/hitta`:
- `wit-bindgen` generates the trait from `wit/provider.wit`.
- `metadata()` → `{ name: "hitta.se", version: <crate version> }`.
- `requests(number)` → `["https://www.hitta.se/vem-ringde/{number}"]`.
- `parse()` ports `src/probe/hitta.rs`: regex out `__NEXT_DATA__`,
serde-deserialize, map comments to epoch timestamps. The old `Err(())`
paths split: regex miss → `parse-failed`; JSON ok but no `phone_data` →
`no-data`.
- **First implementation task: verify the 2019 fixtures against today's
hitta.se.** The site has likely moved off `__NEXT_DATA__`-in-a-script-tag.
Fetch fresh fixtures and update the parser to match reality before
anything is declared working.
## Testing
Three layers, no live network anywhere:
1. **Component logic, native:** parse logic in plain functions; unit tests
run natively against `fixtures/hitta/*.html` with current insta
(migrating existing snapshots). No WASM in the loop — fast iteration.
2. **Component contract, in wasmtime:** one host-side integration test loads
the built `.wasm`, feeds it a fixture body as a `response`, asserts on the
returned `entry`. Proves the WIT boundary + build pipeline.
3. **HTTP layer:** axum handler tests with the provider/fetch layer behind a
small trait — API-shape tests need neither network nor WASM.
`fetch-fixture` (updated for current site URLs) remains the manual tool for
refreshing fixtures.
## Future work (explicitly out of v1)
- Container image, k8s/Pithos config, CI to registry
- Upload + enable/disable custom providers (API/UI)
- More providers (audit the four 2019 sites for survivors)
- Host-fetch import in the WIT world for multi-step providers
- Lookup history / persistent cache
- Metrics (Prometheus/OTel)
+8 -5
@@ -1,8 +1,11 @@
#!/bin/bash
# Refresh HTML fixtures for provider parser tests.
# Usage: ./fetch-fixture <number>
set -euo pipefail
http --follow GET "https://gulasidorna.eniro.se/hitta:$1" > "fixtures/eniro/$1.html"
http --follow GET "https://www.hitta.se/vem-ringde/$1" > "fixtures/hitta/$1.html"
http --follow GET "http://konsumentinfo.se/telefonnummer/sverige/$1" > "fixtures/konsumentinfo/$1.html"
http --follow GET "http://telefonforsaljare.nu/telefonnummer/$1/" > "fixtures/telefonforsaljare/$1.html"
http --follow GET "http://vemringde.se/?q=$1" > "fixtures/vemringde/$1.html"
curl -sL -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
"https://www.hitta.se/vem-ringde/$1" \
-o "fixtures/hitta/$1.html"
echo "fixtures/hitta/$1.html: $(wc -c < "fixtures/hitta/$1.html") bytes"
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
+23
@@ -0,0 +1,23 @@
# Build provider components and copy them where the server looks.
build-components:
    cargo build --release --target wasm32-wasip2 -p whoareyou-provider-hitta
    mkdir -p components
    cp target/wasm32-wasip2/release/whoareyou_provider_hitta.wasm components/hitta.wasm

# Full build: components first, then the server.
build: build-components
    cargo build --release

# All tests (the integration test needs the built component).
test: build-components
    cargo test --workspace

# Run the service locally.
run: build-components
    cargo run -p whoareyou-server

fmt:
    cargo +nightly fmt

lint:
    cargo clippy --workspace
-99
@@ -1,99 +0,0 @@
use std::fs;
use std::io;
use chrono::prelude::*;
use chrono::Duration;
use directories::ProjectDirs;
use log::debug;
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize)]
pub struct Cache {
timestamp: DateTime<Utc>,
pub data: Vec<u8>,
}
pub struct Context {
dirs: ProjectDirs,
}
impl Context {
pub fn new() -> Context {
Context {
dirs: ProjectDirs::from("com", "logaritmisk", "whoareyou").unwrap(),
}
}
pub fn cache_get(&mut self, bin: &str, key: &str) -> Option<Cache> {
let cache = self.dirs.cache_dir().join(format!("{}-{}.bin", bin, key));
if cache.exists() {
debug!("cache: bin={} key={} path={:?} exists", bin, key, cache);
fs::File::open(cache)
.and_then(|file| {
bincode::deserialize_from(&file).map_err(|_| {
debug!("cache: bin={} key={} failed to deserialize", bin, key);
io::Error::new(io::ErrorKind::Other, "failed to deserialize cache entry")
})
})
.and_then(|cache: Cache| {
if cache.timestamp > Utc::now() {
debug!("cache: bin={} key={} ok", bin, key);
Ok(cache)
} else {
debug!(
"cache: bin={} key={} outdated ({})",
bin, key, cache.timestamp
);
Err(io::Error::new(
io::ErrorKind::Other,
"failed to deserialize cache entry",
))
}
})
.ok()
} else {
debug!("cache: bin={} key={} does not exist", bin, key);
None
}
}
pub fn cache_set<D>(&mut self, bin: &str, key: &str, data: D) -> Result<(), io::Error>
where
D: AsRef<[u8]>,
{
let entry = Cache {
timestamp: Utc::now() + Duration::days(1),
data: data.as_ref().to_vec(),
};
let cache = self.dirs.cache_dir();
if !cache.exists() {
fs::create_dir_all(&cache)?;
}
let cache = cache.join(format!("{}-{}.bin", bin, key));
debug!(
"cache: save: bin={} key={} path={:?} timestamp={}",
bin, key, cache, entry.timestamp
);
fs::OpenOptions::new()
.create(true)
.write(true)
.truncate(true)
.open(cache)
.and_then(|mut file| {
bincode::serialize_into(&mut file, &entry).map_err(|_| {
io::Error::new(io::ErrorKind::Other, "failed to serialize cache entry")
})
})
}
}
-283
@@ -1,283 +0,0 @@
use std::str;
use chrono_tz::Tz;
use scraper::{ElementRef, Html, Selector};
use serde::{de, Deserialize, Deserializer, Serialize};
use tinytemplate::TinyTemplate;
use crate::entry::{self, Date, Entry};
use crate::probe::Probe;
#[derive(Serialize)]
struct Context {
number: String,
}
#[derive(Debug, Deserialize)]
pub struct Definition {
name: String,
path: String,
messages: Vec<Field>,
#[serde(default)]
history: Vec<Field>,
#[serde(default)]
comments: Vec<Comment>,
}
#[derive(Debug, Deserialize)]
struct Comment {
#[serde(deserialize_with = "deserialize_selector")]
selector: Selector,
#[serde(rename = "date_time")]
datetime: Option<DateTime>,
title: Option<Field>,
message: Option<Field>,
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "snake_case")]
struct DateTime {
#[serde(flatten)]
field: Field,
kind: DateTimeKind,
format: String,
#[serde(deserialize_with = "deserialize_tz")]
tz: Tz,
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "snake_case")]
enum DateTimeKind {
Date,
DateTime,
}
#[derive(Debug, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
enum Filter {}
#[derive(Debug, Deserialize)]
struct Field {
#[serde(deserialize_with = "deserialize_selector")]
selector: Selector,
#[serde(default)]
data: Data,
#[serde(default)]
filters: Vec<Filter>,
}
#[derive(Debug)]
enum Data {
Text,
InnerHtml,
Attr { attr: String },
}
impl Data {
fn extract(&self, element: &ElementRef) -> Option<String> {
match self {
Data::Text => Some(
element
.text()
.map(str::trim)
.filter(|s| !s.is_empty())
.collect::<Vec<_>>()
.join(" "),
),
Data::InnerHtml => Some(element.inner_html()),
Data::Attr { attr } => element.value().attr(attr).map(|data| data.to_string()),
}
}
}
impl Default for Data {
fn default() -> Self {
Data::Text
}
}
impl<'de> Deserialize<'de> for Data {
fn deserialize<D>(deserializer: D) -> Result<Data, D::Error>
where
D: Deserializer<'de>,
{
use std::fmt;
use serde::de::{self, Visitor};
struct StrVisitor;
impl<'de> Visitor<'de> for StrVisitor {
type Value = Data;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("a string")
}
fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
where
E: de::Error,
{
match value {
"text" => Ok(Data::Text),
"inner_html" => Ok(Data::InnerHtml),
s if s.starts_with("attr:") => {
let attr = s.splitn(2, ":").nth(1).unwrap();
Ok(Data::Attr {
attr: attr.to_string(),
})
}
_ => Err(E::custom(format!("unknown data type: {}", value))),
}
}
}
deserializer.deserialize_str(StrVisitor)
}
}
impl Probe for Definition {
fn provider(&self) -> &str {
&self.name
}
fn uri(&self, number: &str) -> String {
let mut tt = TinyTemplate::new();
tt.add_template("path", &self.path)
.expect("failed to add path template");
let context = Context {
number: number.to_string(),
};
tt.render("path", &context)
.expect("failed to render path template")
}
fn fetch(&self, number: &str) -> Result<String, ()> {
reqwest::get(&self.uri(number))
.map_err(|_| ())?
.text()
.map_err(|_| ())
}
fn parse(&self, data: &str) -> Result<Entry, ()> {
let html = Html::parse_document(data);
let mut messages = Vec::new();
let mut history = Vec::new();
let mut comments = Vec::new();
for field in &self.messages {
for element in html.select(&field.selector) {
if let Some(data) = field.data.extract(&element) {
messages.push(data);
}
}
}
for field in &self.history {
for element in html.select(&field.selector) {
if let Some(data) = field.data.extract(&element) {
history.push(data);
}
}
}
for comment in &self.comments {
for comments_element in html.select(&comment.selector) {
let mut datetime: Option<Date> = None;
let mut title: Option<String> = None;
let mut message: Option<String> = None;
if let Some(ref datetime_field) = comment.datetime {
for comment_element in comments_element.select(&datetime_field.field.selector) {
if let Some(data) = datetime_field.field.data.extract(&comment_element) {
// for filter in &datetime_field.field.filters {}
let data = match datetime_field.kind {
DateTimeKind::Date => Date::date_from(
datetime_field.tz,
&data,
&datetime_field.format,
)
.expect("failed to parse date"),
DateTimeKind::DateTime => Date::datetime_from(
datetime_field.tz,
&data,
&datetime_field.format,
)
.expect("failed to parse date time"),
};
datetime = Some(data);
}
}
}
if let Some(ref title_field) = comment.title {
for comment_element in comments_element.select(&title_field.selector) {
if let Some(data) = title_field
.data
.extract(&comment_element)
.filter(|data| !data.is_empty())
{
// for filter in &title_field.filters {}
title = Some(data);
}
}
}
if let Some(ref message_field) = comment.message {
for comment_element in comments_element.select(&message_field.selector) {
if let Some(data) = message_field.data.extract(&comment_element) {
// for filter in &message_field.filters {}
message = Some(data);
}
}
}
if let (Some(datetime), Some(message)) = (datetime, message) {
comments.push(entry::Comment {
datetime,
title,
message,
});
}
}
}
if !messages.is_empty() || !history.is_empty() || !comments.is_empty() {
Ok(Entry {
messages,
history,
comments,
})
} else {
Err(())
}
}
}
fn deserialize_selector<'de, D>(deserializer: D) -> Result<Selector, D::Error>
where
D: Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
Selector::parse(&s).map_err(|_| de::Error::custom("failed to parse selector"))
}
fn deserialize_tz<'de, D>(deserializer: D) -> Result<Tz, D::Error>
where
D: Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
s.parse::<Tz>()
.map_err(|_| de::Error::custom("failed to parse tz"))
}
-131
@@ -1,131 +0,0 @@
use std::fmt;
use chrono::offset::LocalResult;
use chrono::{Local, NaiveDate, NaiveDateTime, TimeZone, Utc};
use serde::{de, Deserialize, Deserializer, Serialize, Serializer};
#[derive(Debug, PartialEq, Serialize)]
pub struct Entry {
pub messages: Vec<String>,
pub history: Vec<String>,
pub comments: Vec<Comment>,
}
impl fmt::Display for Entry {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
if !self.messages.is_empty() {
for message in &self.messages {
writeln!(f, " {}", message)?;
}
}
if !self.history.is_empty() {
for history in &self.history {
writeln!(f, " {}", history)?;
}
}
if !self.comments.is_empty() {
for comment in &self.comments {
writeln!(f, " * {}", comment)?;
}
}
Ok(())
}
}
#[derive(Debug, PartialEq, Serialize)]
pub struct Comment {
pub datetime: Date,
pub title: Option<String>,
pub message: String,
}
impl fmt::Display for Comment {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
if let Some(ref title) = self.title {
write!(f, "{}: {} - {}", self.datetime, title, self.message)
} else {
write!(f, "{}: {}", self.datetime, self.message)
}
}
}
#[derive(Debug, PartialEq, Eq, Serialize, PartialOrd, Ord)]
pub enum Date {
DateTime(chrono::DateTime<Utc>),
#[serde(serialize_with = "serialize_date")]
Date(chrono::Date<Utc>),
}
impl Date {
pub fn datetime_from<T>(tz: T, s: &str, fmt: &str) -> Result<Date, ()>
where
T: TimeZone,
{
let datetime = NaiveDateTime::parse_from_str(s, fmt).map_err(|_| ())?;
let datetime = match tz.from_local_datetime(&datetime) {
LocalResult::Single(datetime) => datetime,
_ => return Err(()),
};
Ok(Date::DateTime(datetime.with_timezone(&Utc)))
}
pub fn date_from<T>(tz: T, s: &str, fmt: &str) -> Result<Date, ()>
where
T: TimeZone,
{
let date = NaiveDate::parse_from_str(s, fmt).map_err(|_| ())?;
let date = match tz.from_local_date(&date) {
LocalResult::Single(date) => date,
_ => return Err(()),
};
Ok(Date::Date(date.with_timezone(&Utc)))
}
}
impl fmt::Display for Date {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Date::DateTime(datetime) => {
let datetime = datetime.with_timezone(&Local);
write!(f, "{}", datetime.format("%Y-%m-%d %H:%M:%S"))
}
Date::Date(date) => {
let date = date.with_timezone(&Local);
write!(f, "{}", date.format("%Y-%m-%d"))
}
}
}
}
fn serialize_date<S>(date: &chrono::Date<Utc>, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let date = date.with_timezone(&Local);
let s = format!("{}", date.format("%Y-%m-%d"));
Serialize::serialize(&s, serializer)
}
#[allow(dead_code)]
fn deserialize_date<'de, D>(deserializer: D) -> Result<chrono::Date<Utc>, D::Error>
where
D: Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
let date = NaiveDate::parse_from_str(&s, "%Y-%m-%d").map_err(de::Error::custom)?;
let date = match Utc.from_local_date(&date) {
LocalResult::Single(date) => date,
_ => return Err(de::Error::custom("ambiguous or nonexistent date")),
};
Ok(date.with_timezone(&Utc))
}
-37
@@ -1,37 +0,0 @@
use std::str;
use scraper::{ElementRef, Html};
pub trait SelectExt {
fn element(&self) -> ElementRef;
fn easy_text(&self) -> String {
let data = self
.element()
.text()
.map(str::trim)
.filter(|s| !s.is_empty())
.collect::<Vec<_>>()
.join(" ");
htmlescape::decode_html(&data).unwrap_or(data)
}
fn easy_inner_html(&self) -> String {
let data = self.element().inner_html();
htmlescape::decode_html(&data).unwrap_or(data)
}
}
impl SelectExt for Html {
fn element(&self) -> ElementRef {
self.root_element()
}
}
impl<'a> SelectExt for ElementRef<'a> {
fn element(&self) -> ElementRef {
*self
}
}
-9
@@ -1,9 +0,0 @@
mod context;
pub mod definition;
pub mod entry;
mod html;
mod probe;
pub use crate::context::Context;
pub use crate::definition::Definition;
pub use crate::probe::*;
-120
@@ -1,120 +0,0 @@
use std::fs;
use std::io::Read;
use std::path::PathBuf;
use std::process::Command;
use fern::colors::{Color, ColoredLevelConfig};
use structopt::StructOpt;
use whoareyou::*;
#[derive(Debug, StructOpt)]
#[structopt(name = "whoareyou", about = "Search for Swedish phone numbers.")]
struct Opt {
#[structopt(short = "v", parse(from_occurrences))]
verbose: u8,
#[structopt(short = "o", long = "open")]
open: bool,
#[structopt(short = "d", long = "definitions", parse(from_os_str))]
definitions: Vec<PathBuf>,
number: String,
}
fn main() {
let opt = Opt::from_args();
let colors = ColoredLevelConfig::new()
.error(Color::Red)
.warn(Color::Yellow)
.info(Color::White)
.debug(Color::White)
.trace(Color::BrightBlack);
let mut config = fern::Dispatch::new()
.format(move |out, message, record| {
out.finish(format_args!(
"{}[{}][{}] {}",
chrono::Local::now().format("[%Y-%m-%d %H:%M:%S]"),
record.target(),
colors.color(record.level()),
message
))
})
.level_for("reqwest", log::LevelFilter::Off)
.level_for("hyper", log::LevelFilter::Off)
.level_for("tokio_reactor", log::LevelFilter::Off)
.level_for("html5ever", log::LevelFilter::Off)
.level_for("selectors", log::LevelFilter::Off)
.chain(std::io::stdout());
config = match opt.verbose {
0 => config.level(log::LevelFilter::Info),
1 | 2 => config.level(log::LevelFilter::Debug),
_ => config.level(log::LevelFilter::Trace),
};
config.apply().expect("failed to init fern");
let mut probes: Vec<Box<dyn Probe>> = vec![Box::new(Hitta)];
let mut buffer = Vec::new();
for definition in &opt.definitions {
let definition = fs::File::open(&definition)
.and_then(|mut file| {
file.read_to_end(&mut buffer)
.expect("failed to read definition file");
let definition: Definition =
toml::from_slice(&buffer).expect("failed to parse definition file");
buffer.clear();
Ok(definition)
})
.expect("failed to open definition file");
probes.push(Box::new(definition));
}
if opt.open {
for probe in &mut probes {
let uri = probe.uri(&opt.number);
Command::new("open")
.arg(uri)
.output()
.expect("failed to execute process");
}
} else {
let mut ctx = Context::new();
let mut first = true;
for probe in &mut probes {
let data = if let Some(cache) = ctx.cache_get(probe.provider(), &opt.number) {
String::from_utf8(cache.data).unwrap()
} else if let Ok(data) = probe.fetch(&opt.number) {
ctx.cache_set(probe.provider(), &opt.number, data.as_bytes())
.expect("failed to write cache entry");
data
} else {
continue;
};
if let Ok(entry) = probe.parse(&data) {
if first {
print!("{}\n{}", probe.provider(), entry);
first = false;
} else {
print!("\n{}\n{}", probe.provider(), entry);
}
}
}
}
}
-13
@@ -1,13 +0,0 @@
use crate::entry::Entry;
mod hitta;
pub use self::hitta::Hitta;
pub trait Probe {
fn provider(&self) -> &str;
fn uri(&self, _: &str) -> String;
fn fetch(&self, _: &str) -> Result<String, ()>;
fn parse(&self, _: &str) -> Result<Entry, ()>;
}
-186
@@ -1,186 +0,0 @@
use lazy_static::lazy_static;
use scraper::{Html, Selector};
use crate::entry::Entry;
use crate::html::SelectExt;
use crate::probe::Probe;
lazy_static! {
static ref MESSAGE: Selector = Selector::parse(".CompanyResultListItem h3.name > a").unwrap();
static ref HISTORY_1: Selector =
Selector::parse("div.PhoneNoHit div.search-info-container p").unwrap();
static ref HISTORY_2: Selector =
Selector::parse("div.feedback-types div.feedback-type-item").unwrap();
}
fn from_html(document: &str) -> Result<Entry, ()> {
let html = Html::parse_document(document);
let mut messages = Vec::new();
let mut history = Vec::new();
let comments = Vec::new();
if let Some(message) = html
.select(&MESSAGE)
.next()
.map(|element| element.easy_text())
{
messages.push(message);
}
if let Some(message) = html
.select(&HISTORY_1)
.next()
.map(|element| element.easy_text())
{
history.push(message);
}
for message in html.select(&HISTORY_2).map(|element| element.easy_text()) {
history.push(message);
}
Ok(Entry {
messages,
history,
comments,
})
}
pub struct Eniro;
impl Probe for Eniro {
fn provider(&self) -> &'static str {
"eniro.se"
}
fn uri(&self, number: &str) -> String {
format!("https://gulasidorna.eniro.se/hitta:{}", number)
}
fn fetch(&self, number: &str) -> Result<String, ()> {
reqwest::get(&self.uri(number))
.map_err(|_| ())?
.text()
.map_err(|_| ())
}
fn parse(&self, data: &str) -> Result<Entry, ()> {
from_html(&data)
}
}
#[cfg(test)]
mod tests {
use insta::assert_yaml_snapshot_matches;
use super::*;
#[test]
fn test_0104754350() {
let document = include_str!("../../fixtures/eniro/0104754350.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Företaget bedriver telefonförsäljning eller marknadsundersökningar
history: []
comments: []"###);
}
#[test]
fn test_0313908905() {
let document = include_str!("../../fixtures/eniro/0313908905.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- 3464 denna vecka och 6637 totalt.
- 76 Försäljning
- 47 Oseriös verksamhet
- 37 Annat
comments: []"###);
}
#[test]
fn test_0702269893() {
let document = include_str!("../../fixtures/eniro/0702269893.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Anonym Kund För Refill
history: []
comments: []"###);
}
#[test]
fn test_0726443387() {
let document = include_str!("../../fixtures/eniro/0726443387.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- 16 denna vecka och 98 totalt.
comments: []"###);
}
#[test]
fn test_0751793426() {
let document = include_str!("../../fixtures/eniro/0751793426.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- 20 denna vecka och 602 totalt.
- 11 Försäljning
- 9 Annat
- 7 Oseriös verksamhet
comments: []"###);
}
#[test]
fn test_0751793483() {
let document = include_str!("../../fixtures/eniro/0751793483.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- 29 denna vecka och 900 totalt.
- 5 Annat
- 4 Oseriös verksamhet
- 3 Marknadsföring
comments: []"###);
}
#[test]
fn test_0751793499() {
let document = include_str!("../../fixtures/eniro/0751793499.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- 303 denna vecka och 304 totalt.
comments: []"###);
}
#[test]
fn test_0701807618() {
let document = include_str!("../../fixtures/eniro/0701807618.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- 0 denna vecka och 1 totalt.
comments: []"###);
}
#[test]
fn test_0546780862() {
let document = include_str!("../../fixtures/eniro/0546780862.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Nya Wermlands-Tidningens AB
history: []
comments: []"###);
}
}
-335
@@ -1,335 +0,0 @@
use chrono::{TimeZone, Utc};
use log::{debug, trace};
use regex::Regex;
use serde::Deserialize;
use crate::entry::{self, Date, Entry};
use crate::probe::Probe;
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Data {
props: Props,
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Props {
page_props: PageProps,
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct PageProps {
status_code: Option<u16>,
phone_data: Option<PhoneData>,
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct PhoneData {
alternative_formats: Vec<String>,
clean_number: String,
#[serde(default)]
comments: Vec<Comment>,
statistics_text: String,
}
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Comment {
comment: String,
timestamp: u64,
}
fn from_html(document: &str) -> Result<Entry, ()> {
let re = Regex::new(r#"<script>__NEXT_DATA__ = (.*?);__NEXT_LOADED_PAGES__"#).unwrap();
let result = re.captures(&document).ok_or_else(|| {
debug!("Hitta: failed to find __NEXT_DATA__");
})?;
let json = result.get(1).unwrap().as_str();
trace!(
"Hitta: {:#?}",
serde_json::from_str::<serde_json::Value>(&json)
);
if let Ok(data) = serde_json::from_str::<Data>(&json) {
let messages = Vec::new();
let mut history = Vec::new();
let mut comments = Vec::new();
if let Some(phone_data) = data.props.page_props.phone_data {
history.push(phone_data.statistics_text);
for comment in phone_data.comments {
comments.push(entry::Comment {
// `timestamp` is in milliseconds; the second argument is nanoseconds.
datetime: Date::DateTime(Utc.timestamp(
(comment.timestamp / 1000) as i64,
(comment.timestamp % 1000) as u32 * 1_000_000,
)),
title: None,
message: comment.comment,
});
}
comments.sort_by(|a, b| b.datetime.cmp(&a.datetime));
}
Ok(Entry {
messages,
history,
comments,
})
} else {
if let Err(error) = serde_json::from_str::<Data>(&json) {
debug!("Hitta: failed to deserialize data: {:#?}", error);
}
Err(())
}
}
pub struct Hitta;
impl Probe for Hitta {
fn provider(&self) -> &'static str {
"hitta.se"
}
fn uri(&self, number: &str) -> String {
format!("https://www.hitta.se/vem-ringde/{}", number)
}
fn fetch(&self, number: &str) -> Result<String, ()> {
reqwest::get(&self.uri(number))
.map_err(|_| ())?
.text()
.map_err(|_| ())
}
fn parse(&self, data: &str) -> Result<Entry, ()> {
from_html(&data)
}
}
#[cfg(test)]
mod tests {
use insta::assert_yaml_snapshot;
use super::*;
#[test]
fn test_0104754350() {
let document = include_str!("../../fixtures/hitta/0104754350.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history:
- 42 andra har rapporterat detta nummer
comments:
- datetime:
DateTime: "2019-01-17T17:29:22Z"
title: ~
message: Varmsälj från Folksam
- datetime:
DateTime: "2018-12-14T13:45:28Z"
title: ~
message: Folksam
- datetime:
DateTime: "2018-11-28T07:30:18Z"
title: ~
message: Höglandschskt
- datetime:
DateTime: "2018-11-20T19:18:09Z"
title: ~
message: "Försäljare "
- datetime:
DateTime: "2018-11-19T17:38:34Z"
title: ~
message: mögg från Folksam
- datetime:
DateTime: "2018-11-12T16:00:41Z"
title: ~
message: Folksam försäkringsförsäljare
- datetime:
DateTime: "2018-10-25T10:28:36Z"
title: ~
message: folksam
- datetime:
DateTime: "2018-10-10T07:30:40Z"
title: ~
message: Telefonförsäljare
- datetime:
DateTime: "2018-10-04T10:04:55Z"
title: ~
message: Folksam säljare
- datetime:
DateTime: "2018-10-03T13:55:19Z"
title: ~
message: Sa inget.
- datetime:
DateTime: "2018-08-24T16:56:46Z"
title: ~
message: Folksam
- datetime:
DateTime: "2018-08-24T09:42:43Z"
title: ~
message: Achmati azmut från folksam
- datetime:
DateTime: "2018-08-21T18:29:29Z"
title: ~
message: Folksam
- datetime:
DateTime: "2018-08-16T18:56:56Z"
title: ~
message: Säljare från Folksam.
- datetime:
DateTime: "2018-08-16T14:48:59Z"
title: ~
message: "Folksam "
- datetime:
DateTime: "2018-08-09T16:30:28Z"
title: ~
message: Folksam
- datetime:
DateTime: "2018-08-02T16:29:32Z"
title: ~
message: "Folksam "
- datetime:
DateTime: "2018-08-02T15:33:38Z"
title: ~
message: "Folksam "
- datetime:
DateTime: "2018-07-25T08:28:27Z"
title: ~
message: Säljare Folksam
- datetime:
DateTime: "2018-07-17T21:20:51Z"
title: ~
message: "Inga Hansson "
- datetime:
DateTime: "2018-07-16T18:11:46Z"
title: ~
message: Folksam
- datetime:
DateTime: "2018-07-06T15:45:46Z"
title: ~
message: "Folksam "
- datetime:
DateTime: "2018-07-05T17:24:07Z"
title: ~
message: folksam
- datetime:
DateTime: "2018-07-05T11:15:02Z"
title: ~
message: Vesran
- datetime:
DateTime: "2018-07-04T13:30:49Z"
title: ~
message: Folksam
- datetime:
DateTime: "2018-06-29T10:52:51Z"
title: ~
message: folksam
- datetime:
DateTime: "2018-06-28T13:33:01Z"
title: ~
message: Säljare folksam
- datetime:
DateTime: "2018-06-28T07:42:42Z"
title: ~
message: Folksam försäkringar
- datetime:
DateTime: "2018-06-26T12:59:33Z"
title: ~
message: Säljare Folksam"###);
}
#[test]
fn test_0313908905() {
let document = include_str!("../../fixtures/hitta/0313908905.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history: []
comments: []"###);
}
#[test]
fn test_0702269893() {
let document = include_str!("../../fixtures/hitta/0702269893.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history:
- Tre andra har också sökt på detta nummer
comments: []"###);
}
#[test]
fn test_0726443387() {
let document = include_str!("../../fixtures/hitta/0726443387.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history:
- 1299 andra har också sökt på detta nummer
comments: []"###);
}
#[test]
fn test_0751793426() {
let document = include_str!("../../fixtures/hitta/0751793426.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history: []
comments: []"###);
}
#[test]
fn test_0751793483() {
let document = include_str!("../../fixtures/hitta/0751793483.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history: []
comments: []"###);
}
#[test]
fn test_0751793499() {
let document = include_str!("../../fixtures/hitta/0751793499.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Ok:
messages: []
history: []
comments: []"###);
}
#[test]
fn test_0701807618() {
let document = include_str!("../../fixtures/hitta/0701807618.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Err: ~"###);
}
#[test]
fn test_0546780862() {
let document = include_str!("../../fixtures/hitta/0546780862.html");
assert_yaml_snapshot!(from_html(&document), @r###"---
Err: ~"###);
}
}
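The hitta.se parser in this file turns a millisecond Unix timestamp into a seconds component and a sub-second component. A small std-only sketch of that arithmetic follows. `split_millis` is a hypothetical helper, and it assumes the sub-second component is expected in nanoseconds, which is the unit chrono's timestamp constructors take.

```rust
/// Split a Unix timestamp given in milliseconds into
/// (whole seconds, nanoseconds within that second).
///
/// Assumption: the caller passes the pair on to an API that wants
/// nanoseconds for the second component.
pub fn split_millis(ms: u64) -> (i64, u32) {
    // Whole seconds since the epoch; u64::MAX / 1000 still fits in an i64.
    let secs = (ms / 1_000) as i64;
    // The remainder is 0..=999 milliseconds; scale it to nanoseconds.
    let nanos = (ms % 1_000) as u32 * 1_000_000;
    (secs, nanos)
}
```

Passing the raw millisecond remainder where nanoseconds are expected would shift the fractional part by six orders of magnitude, so the scaling step is the part worth checking.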
-132
@@ -1,132 +0,0 @@
use lazy_static::lazy_static;
use scraper::{Html, Selector};
use crate::html::SelectExt;
use crate::probe::{Entry, Probe};
lazy_static! {
static ref MESSAGE: Selector = Selector::parse(".panel-heading > h1:nth-child(3)").unwrap();
}
fn from_html(document: &str) -> Result<Entry, ()> {
let html = Html::parse_document(document);
let mut messages = Vec::new();
let history = Vec::new();
let comments = Vec::new();
if let Some(message) = html
.select(&MESSAGE)
.next()
.map(|element| element.easy_text())
{
messages.push(message);
}
if !messages.is_empty() {
Ok(Entry {
messages,
history,
comments,
})
} else {
Err(())
}
}
pub struct KonsumentInfo;
impl Probe for KonsumentInfo {
fn provider(&self) -> &'static str {
"konsumentinfo.se"
}
fn uri(&self, number: &str) -> String {
format!("http://konsumentinfo.se/telefonnummer/sverige/{}", number)
}
fn fetch(&self, number: &str) -> Result<String, ()> {
reqwest::get(&self.uri(number))
.map_err(|_| ())?
.text()
.map_err(|_| ())
}
fn parse(&self, data: &str) -> Result<Entry, ()> {
from_html(&data)
}
}
#[cfg(test)]
mod tests {
use insta::assert_yaml_snapshot_matches;
use super::*;
#[test]
fn test_0104754350() {
let document = include_str!("../../fixtures/konsumentinfo/0104754350.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0313908905() {
let document = include_str!("../../fixtures/konsumentinfo/0313908905.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0702269893() {
let document = include_str!("../../fixtures/konsumentinfo/0702269893.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Hydroscand AB
history: []
comments: []"###);
}
#[test]
fn test_0726443387() {
let document = include_str!("../../fixtures/konsumentinfo/0726443387.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0751793426() {
let document = include_str!("../../fixtures/konsumentinfo/0751793426.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0751793483() {
let document = include_str!("../../fixtures/konsumentinfo/0751793483.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0751793499() {
let document = include_str!("../../fixtures/konsumentinfo/0751793499.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0701807618() {
let document = include_str!("../../fixtures/konsumentinfo/0701807618.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0546780862() {
let document = include_str!("../../fixtures/konsumentinfo/0546780862.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
}
-245
@@ -1,245 +0,0 @@
use chrono_tz::Europe::Stockholm;
use lazy_static::lazy_static;
use scraper::{Html, Selector};
use crate::entry::{Comment, Date, Entry};
use crate::html::SelectExt;
use crate::probe::Probe;
lazy_static! {
static ref MESSAGE: Selector = Selector::parse("#content p:nth-child(2) i").unwrap();
static ref HISTORY_1: Selector = Selector::parse("#content p:nth-child(4)").unwrap();
static ref HISTORY_2: Selector = Selector::parse("#content p:nth-child(5)").unwrap();
static ref COMMENTS: Selector =
Selector::parse("#kommentarer > [itemtype='http://data-vocabulary.org/Review']").unwrap();
static ref COMMENT_DATETIME: Selector = Selector::parse("small").unwrap();
static ref COMMENT_TITLE: Selector = Selector::parse("h3").unwrap();
static ref COMMENT_MESSAGE: Selector = Selector::parse("[itemprop='description']").unwrap();
}
fn from_html(document: &str) -> Result<Entry, ()> {
let html = Html::parse_document(document);
let mut messages = Vec::new();
let mut history = Vec::new();
let mut comments = Vec::new();
if let Some(element) = html.select(&MESSAGE).next() {
let message = element.easy_inner_html();
messages.push(message);
}
if let Some(message) = html
.select(if messages.is_empty() {
&HISTORY_1
} else {
&HISTORY_2
})
.next()
.map(|element| element.easy_text())
{
history.push(message);
}
for comment in html.select(&COMMENTS) {
let datetime = comment
.select(&COMMENT_DATETIME)
.next()
.unwrap()
.value()
.attr("datetime")
.unwrap()
.to_string();
let title = comment
.select(&COMMENT_TITLE)
.next()
.map(|element| element.easy_inner_html())
.filter(|title| !title.is_empty());
let message = comment
.select(&COMMENT_MESSAGE)
.next()
.map(|element| element.easy_inner_html())
.unwrap_or_else(String::new);
comments.push(Comment {
datetime: Date::datetime_from(Stockholm, &datetime, "%Y-%m-%d %H:%M:%S")
.expect("failed to parse datetime"),
title,
message,
});
}
Ok(Entry {
messages,
history,
comments,
})
}
pub struct Telefonforsaljare;
impl Probe for Telefonforsaljare {
fn provider(&self) -> &'static str {
"telefonforsaljare.nu"
}
fn uri(&self, number: &str) -> String {
format!("http://www.telefonforsaljare.nu/telefonnummer/{}/", number)
}
fn fetch(&self, number: &str) -> Result<String, ()> {
reqwest::get(&self.uri(number))
.map_err(|_| ())?
.text()
.map_err(|_| ())
}
fn parse(&self, data: &str) -> Result<Entry, ()> {
from_html(&data)
}
}
#[cfg(test)]
mod tests {
use insta::assert_yaml_snapshot_matches;
use super::*;
#[test]
fn test_0104754350() {
let document = include_str!("../../fixtures/telefonforsaljare/0104754350.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Folksam
history:
- De senaste 24 timmarna har 9 personer sökt efter numret 0104754350. Det kan tyda på att numret används av telefonförsäljare. Totalt har minst 4786 personer sökt efter numret.
comments:
- datetime:
DateTime: "2018-05-09T12:31:39Z"
title: Folksam
message: Svara inte på okända nummer. Blockerat!
- datetime:
DateTime: "2017-12-05T16:33:10Z"
title: Folksam
message: Svarade aldrig men när jag ringde upp var det Folksam
- datetime:
DateTime: "2017-11-28T10:30:10Z"
title: ~
message: Ringde och la på
- datetime:
DateTime: "2017-11-20T14:53:16Z"
title: Folksam
message: färsäljare
- datetime:
DateTime: "2017-11-16T12:38:07Z"
title: Folksam
message: "missat samtal, ringde tillbaka och automatsvar sa att det var folksam som sökt mig för att presentera ett erbjudande."
- datetime:
DateTime: "2017-10-25T05:59:26Z"
title: Folksam
message: Försäljare"###);
}
#[test]
fn test_0313908905() {
let document = include_str!("../../fixtures/telefonforsaljare/0313908905.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- Du är den första de senaste 24 timmarna som söker efter detta nummer. Det tyder på att numret inte används av telefonförsäljare. Totalt har minst 301 personer sökt efter numret.
comments: []"###);
}
#[test]
fn test_0702269893() {
let document = include_str!("../../fixtures/telefonforsaljare/0702269893.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Alnö Design & Produktion AB
history:
- De senaste 24 timmarna har 3 personer sökt efter numret 0702269893. Det kan tyda på att numret används av telefonförsäljare. Totalt har minst 4 personer sökt efter numret.
comments:
- datetime:
DateTime: "2019-01-18T13:30:55Z"
title: Alnö Design & Produktion AB
message: "Renhållning, service, kemprodukter""###);
}
#[test]
fn test_0726443387() {
let document = include_str!("../../fixtures/telefonforsaljare/0726443387.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Tele2
history:
- De senaste 24 timmarna har 1 personer sökt efter numret 0726443387. Det kan tyda på att numret används av telefonförsäljare. Totalt har minst 231 personer sökt efter numret.
comments:
- datetime:
DateTime: "2018-10-31T17:48:27Z"
title: Tele2
message: Bättre priser som inte finns online"###);
}
#[test]
fn test_0751793426() {
let document = include_str!("../../fixtures/telefonforsaljare/0751793426.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- Du är den första de senaste 24 timmarna som söker efter detta nummer. Det tyder på att numret inte används av telefonförsäljare. Totalt har minst 38 personer sökt efter numret.
comments: []"###);
}
#[test]
fn test_0751793483() {
let document = include_str!("../../fixtures/telefonforsaljare/0751793483.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- Du är den första de senaste 24 timmarna som söker efter detta nummer. Det tyder på att numret inte används av telefonförsäljare. Totalt har minst 25 personer sökt efter numret.
comments: []"###);
}
#[test]
fn test_0751793499() {
let document = include_str!("../../fixtures/telefonforsaljare/0751793499.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- Du är den första de senaste 24 timmarna som söker efter detta nummer. Det tyder på att numret inte används av telefonförsäljare. Totalt har minst 22 personer sökt efter numret.
comments: []"###);
}
#[test]
fn test_0701807618() {
let document = include_str!("../../fixtures/telefonforsaljare/0701807618.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- De senaste 24 timmarna har 1 personer sökt efter numret 0701807618. Det kan tyda på att numret används av telefonförsäljare. Totalt har minst 2 personer sökt efter numret.
comments: []"###);
}
#[test]
fn test_0546780862() {
let document = include_str!("../../fixtures/telefonforsaljare/0546780862.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history:
- De senaste 24 timmarna har 1 personer sökt efter numret 0546780862. Det kan tyda på att numret används av telefonförsäljare. Totalt har minst 12 personer sökt efter numret.
comments: []"###);
}
}
-218
@@ -1,218 +0,0 @@
use std::str;
use chrono_tz::Europe::Stockholm;
use lazy_static::lazy_static;
use scraper::{Html, Selector};
use crate::entry::{Comment, Date, Entry};
use crate::html::SelectExt;
use crate::probe::Probe;
lazy_static! {
static ref MESSAGE: Selector = Selector::parse("#toporganisations li").unwrap();
static ref COMMENTS: Selector = Selector::parse("#calls ol li").unwrap();
static ref COMMENT_DATETIME: Selector = Selector::parse("div:nth-child(4)").unwrap();
static ref COMMENT_MESSAGE: Selector = Selector::parse("div:nth-child(3)").unwrap();
}
fn from_html(document: &str) -> Result<Entry, ()> {
let html = Html::parse_document(document);
let mut messages = Vec::new();
let history = Vec::new();
let mut comments = Vec::new();
for message in html.select(&MESSAGE).map(|element| element.easy_text()) {
messages.push(message);
}
for element in html.select(&COMMENTS) {
let date = element
.select(&COMMENT_DATETIME)
.next()
.map(|element| element.easy_inner_html())
.expect("failed to find datetime");
let message = element
.select(&COMMENT_MESSAGE)
.next()
.map(|element| element.easy_text())
.unwrap_or_else(String::new);
comments.push(Comment {
datetime: Date::date_from(Stockholm, &date, "%Y-%m-%d").expect("failed to parse date"),
title: None,
message,
});
}
if !messages.is_empty() || !comments.is_empty() {
Ok(Entry {
messages,
history,
comments,
})
} else {
Err(())
}
}
pub struct VemRingde;
impl Probe for VemRingde {
fn provider(&self) -> &'static str {
"vemringde.se"
}
fn uri(&self, number: &str) -> String {
format!("http://vemringde.se/?q={}", number)
}
fn fetch(&self, number: &str) -> Result<String, ()> {
reqwest::get(&self.uri(number))
.map_err(|_| ())?
.text()
.map_err(|_| ())
}
fn parse(&self, data: &str) -> Result<Entry, ()> {
from_html(&data)
}
}
#[cfg(test)]
mod tests {
use insta::assert_yaml_snapshot_matches;
use super::*;
#[test]
fn test_0104754350() {
let document = include_str!("../../fixtures/vemringde/0104754350.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages:
- Folksam (5 samtal)
history: []
comments:
- datetime:
Date: 2018-11-07
title: ~
message: Folksam
- datetime:
Date: 2018-06-05
title: ~
message: Folksam
- datetime:
Date: 2018-04-18
title: ~
message: Folksam
- datetime:
Date: 2018-03-19
title: ~
message: okänd
- datetime:
Date: 2018-03-07
title: ~
message: okänd
- datetime:
Date: 2018-02-06
title: ~
message: Folksam spam
- datetime:
Date: 2017-12-20
title: ~
message: svarade ej
- datetime:
Date: 2017-12-07
title: ~
message: okänd
- datetime:
Date: 2017-12-05
title: ~
message: okänd
- datetime:
Date: 2017-11-21
title: ~
message: Försäljare folksam
- datetime:
Date: 2017-11-14
title: ~
message: Folksam
- datetime:
Date: 2017-11-06
title: ~
message: Folksam
- datetime:
Date: 2017-10-24
title: ~
message: telemarketing
- datetime:
Date: 2017-10-23
title: ~
message: okänd"###);
}
#[test]
fn test_0313908905() {
let document = include_str!("../../fixtures/vemringde/0313908905.html");
assert_yaml_snapshot_matches!(from_html(&document), @r###"Ok:
messages: []
history: []
comments:
- datetime:
Date: 2018-11-26
title: ~
message: callcenter"###);
}
#[test]
fn test_0702269893() {
let document = include_str!("../../fixtures/vemringde/0702269893.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0726443387() {
let document = include_str!("../../fixtures/vemringde/0726443387.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0751793426() {
let document = include_str!("../../fixtures/vemringde/0751793426.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0751793483() {
let document = include_str!("../../fixtures/vemringde/0751793483.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0751793499() {
let document = include_str!("../../fixtures/vemringde/0751793499.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0701807618() {
let document = include_str!("../../fixtures/vemringde/0701807618.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
#[test]
fn test_0546780862() {
let document = include_str!("../../fixtures/vemringde/0546780862.html");
assert_yaml_snapshot_matches!(from_html(&document), @"Err: ~");
}
}
+42
@@ -0,0 +1,42 @@
package whoareyou:provider@0.1.0;
interface lookup {
record provider-info {
name: string,
version: string,
}
record request {
url: string,
}
record response {
status: u16,
body: string,
}
record comment {
timestamp: option<s64>,
title: option<string>,
message: string,
}
record entry {
messages: list<string>,
history: list<string>,
comments: list<comment>,
}
variant lookup-error {
no-data,
parse-failed(string),
}
metadata: func() -> provider-info;
requests: func(number: string) -> list<request>;
parse: func(number: string, responses: list<response>) -> result<entry, lookup-error>;
}
world provider {
export lookup;
}
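For readers who know Rust better than WIT, the `lookup` interface above can be pictured roughly as the plain Rust types below. This is a hand-written sketch, not the output of wit-bindgen or wasmtime's bindgen. The `Lookup` trait, the `Echo` provider, its URL, and the `run` helper are hypothetical names introduced only for illustration. It shows the apparent split of work: the component lists the URLs it wants, the host performs the HTTP requests, and the component parses the responses.

```rust
// Hand-written mirror of the WIT records; generated bindings may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderInfo { pub name: String, pub version: String }

#[derive(Debug, Clone, PartialEq)]
pub struct Request { pub url: String }

#[derive(Debug, Clone, PartialEq)]
pub struct Response { pub status: u16, pub body: String }

#[derive(Debug, Clone, PartialEq)]
pub struct Comment {
    pub timestamp: Option<i64>, // WIT `option<s64>`
    pub title: Option<String>,
    pub message: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Entry {
    pub messages: Vec<String>,
    pub history: Vec<String>,
    pub comments: Vec<Comment>,
}

// WIT `variant lookup-error`.
#[derive(Debug, Clone, PartialEq)]
pub enum LookupError { NoData, ParseFailed(String) }

// The three exported functions, expressed as a trait (an assumption made
// for illustration; a real component exports free functions).
pub trait Lookup {
    fn metadata(&self) -> ProviderInfo;
    fn requests(&self, number: &str) -> Vec<Request>;
    fn parse(&self, number: &str, responses: &[Response]) -> Result<Entry, LookupError>;
}

/// Hypothetical provider used only to exercise the trait.
pub struct Echo;

impl Lookup for Echo {
    fn metadata(&self) -> ProviderInfo {
        ProviderInfo { name: "echo".into(), version: "0.0.0".into() }
    }

    fn requests(&self, number: &str) -> Vec<Request> {
        vec![Request { url: format!("https://example.invalid/{}", number) }]
    }

    fn parse(&self, _number: &str, responses: &[Response]) -> Result<Entry, LookupError> {
        let first = responses.first().ok_or(LookupError::NoData)?;
        if first.status != 200 {
            return Err(LookupError::ParseFailed(format!("unexpected status {}", first.status)));
        }
        let body = first.body.trim();
        if body.is_empty() {
            return Err(LookupError::NoData);
        }
        Ok(Entry { messages: vec![body.to_string()], ..Entry::default() })
    }
}

/// Host-side flow: ask for requests, fetch each one (stubbed by `fetch`),
/// then hand every response back to the provider for parsing.
pub fn run<P: Lookup>(
    provider: &P,
    number: &str,
    fetch: impl Fn(&Request) -> Response,
) -> Result<Entry, LookupError> {
    let responses: Vec<Response> = provider.requests(number).iter().map(|r| fetch(r)).collect();
    provider.parse(number, &responses)
}
```

Because the provider never performs I/O itself in this shape, a test can drive it with canned `Response` values, much like the fixture-based snapshot tests in the removed probes.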