# xy — HTTP MCP Server Supervisor **Date:** 2026-05-25 **Status:** Approved — ready for implementation planning ## Problem HTTP-based MCP servers (currently two, more likely) need a long-running parent process so they survive terminal closures and can be inspected, restarted, and upgraded without ad-hoc terminal tabs. Today they're launched manually and their lifetime is coupled to a terminal window. ## Goals (MVP) - Run as a background daemon on macOS. - Auto-launch every configured MCP server when the daemon starts. - Provide a CLI to start, stop, restart, reload, list, and tail logs. - Per-server restart policy with backoff. - Capture stdout/stderr to rotating log files and an in-memory ring buffer. ## Non-goals (deferred) - Container isolation (planned for a later phase). - TUI dashboard. - macOS status bar app. - HTTP/MCP-level health probes. - Auto-start at login via launchd (manual daemon launch only for MVP). - Remote management (everything is local-socket only). ## Architecture ``` ┌──────────────────────────────┐ │ xy daemon (process) │ │ │ xy CLI ──────►│ JSON-RPC server │ (Unix socket) │ │ │ │ ▼ │ │ Command handlers │ │ │ │ │ ▼ │ │ Supervisor (one task per │ │ managed server): │ │ spawn → wait → restart │ │ per per-server policy │ │ │ │ │ ▼ │ │ Log capture: stdout/stderr ──►│──► $XDG_STATE_HOME/xy/logs/.log │ Ring buffer (in RAM) │ └──────────────────────────────┘ │ ▼ Child MCP server processes (HTTP, fixed port from KDL) ``` ### Filesystem layout XDG semantics on both Linux and macOS (no `~/Library/Application Support`). Use the `etcetera` crate's `Xdg` strategy, or hand-rolled env-var resolution. | Purpose | Path | |---------------|---------------------------------------------------------| | Configs | `${XDG_CONFIG_HOME:-~/.config}/xy/servers/*.kdl` | | Logs | `${XDG_STATE_HOME:-~/.local/state}/xy/logs/.log` | | Socket | `${XDG_RUNTIME_DIR}/xy.sock` if set, else `${XDG_STATE_HOME:-~/.local/state}/xy/xy.sock` | | Pidfile | `${XDG_STATE_HOME:-~/.local/state}/xy/xy.pid` | Socket permissions: `0600`. ### Concurrency model Tokio multi-thread runtime. Each managed server owns one **supervisor task** that holds the canonical state for that server. RPC handlers communicate with supervisor tasks via channels: - `mpsc::Sender` — `Start`, `Stop`, `Restart`, `Shutdown`, `Reconfigure(ServerConfig)`. - `watch::Receiver` — outsiders observe current state without locks. - `broadcast::Sender` — live `logs --follow` subscribers. No shared mutexes for server state; the supervisor task is the owner. ## Crate layout (Cargo workspace) ``` xy/ ├── Cargo.toml # workspace manifest ├── crates/ │ ├── xy-protocol/ # JSON-RPC types + KDL config schema (lib) │ ├── xy-supervisor/ # process lifecycle, restart policy, log capture (lib) │ ├── xy-ipc/ # socket framing + JSON-RPC client/server (lib) │ └── xy/ # binary: clap CLI + daemon command, wires it all together └── docs/superpowers/specs/ ``` Single `xy` binary; `xy daemon` runs the supervisor in-process, all other subcommands act as JSON-RPC clients. ### Dependencies - `tokio` (features: rt-multi-thread, net, process, signal, sync, fs, io-util, macros) - `clap` with `derive` feature - `serde`, `serde_json` - `kdl` (KDL parser) with a small typed schema wrapper - `tracing`, `tracing-subscriber` (env-filter) - `thiserror` (libraries), `anyhow` (binary) - `etcetera` (XDG paths, works correctly on macOS) - `nix` (SIGTERM/SIGKILL, process groups) Format with `cargo +nightly fmt`. Lint with `cargo clippy --all-targets -- -D warnings`. ## KDL config schema One file per server: `${XDG_CONFIG_HOME}/xy/servers/.kdl`. Filename stem is the canonical server name; the file itself does not repeat it. Example `~/.config/xy/servers/insikt.kdl`: ```kdl command "/Users/olsson/.cargo/bin/insikt-mcp" args "--http" "--port" "8421" port 8421 env { RUST_LOG "info" INSIKT_DATA_DIR "/Users/olsson/.local/share/insikt" } working-dir "/Users/olsson/Laboratory/insikt" restart { policy "on-failure" // "always" | "on-failure" | "never" backoff-initial "1s" backoff-max "30s" max-retries-per-minute 5 } stop { grace "10s" // SIGTERM, then SIGKILL after this } ``` ### Field semantics | Field | Required | Default | Notes | |---|---|---|---| | `command` | yes | — | Absolute path to executable. | | `args` | no | `[]` | String list. | | `port` | yes | — | Informational; xy doesn't bind it. Used for `list` display and load-time conflict detection across configs. | | `env` | no | `{}` | Merged onto inherited parent env; KDL wins on conflict. | | `working-dir` | no | daemon's cwd | Process working directory. | | `restart.policy` | no | `on-failure` | `always` \| `on-failure` \| `never`. | | `restart.backoff-initial` | no | `1s` | Humantime duration. | | `restart.backoff-max` | no | `30s` | Cap for exponential backoff. | | `restart.max-retries-per-minute` | no | `5` | Sliding-60s window. Exceeded → `failed`. | | `stop.grace` | no | `10s` | SIGTERM → wait → SIGKILL window. | ### Validation at load - Every file must parse and produce a complete `ServerConfig`. - No two configs may declare the same `port`. - `command` must exist and be executable (warn but allow if not — child spawn will fail and supervisor will mark `failed`). Validation failures at daemon startup are **fatal** (exit non-zero). Failures during `reload` are returned to the CLI client as JSON-RPC errors; the daemon keeps running. ## JSON-RPC protocol Transport: Unix socket, newline-delimited JSON (one JSON-RPC 2.0 message per line). ### Methods | Method | Params | Result | |----------|-------------------------------|--------| | `list` | — | `[{name, state, pid?, port, uptime_secs?, restart_count, last_exit?}]` | | `status` | `{name}` | single entry as above + recent state transitions | | `start` | `{name}` or `{all: true}` | `{started: [...], already_running: [...]}` | | `stop` | `{name}` or `{all: true}` | `{stopped: [...], not_running: [...]}` | | `restart`| `{name}` or `{all: true}` | `{restarted: [...]}` | | `reload` | — | `{added: [...], removed: [...], changed: [...], unchanged: [...]}` | | `logs` | `{name, tail?: u32, follow?: bool}` | Initial response `{subscription_id}`; the daemon then sends JSON-RPC notifications `log` `{subscription_id, name, stream, line, ts}` for each line. A final `log_end` notification `{subscription_id}` closes the stream. For non-`follow`, `log_end` fires after the buffered tail. For `follow`, the stream stays open until the client closes the connection or calls `logs_cancel {subscription_id}`. | ### Server states `stopped` | `starting` | `running` | `restarting` | `failed` | `stopping` ### `reload` semantics Diff current in-memory configs against on-disk config dir: - **Added** (new file): start. - **Removed** (file gone): stop running process. - **Changed** (content hash differs): stop, then start with new config. - **Unchanged**: leave alone. ### Error codes Standard JSON-RPC error objects with our codes: | Code | Name | |----------|-------------------| | `-32001` | `ServerNotFound` | | `-32002` | `PortConflict` | | `-32003` | `ConfigInvalid` | | `-32004` | `AlreadyRunning` | | `-32005` | `NotRunning` | | `-32006` | `SpawnFailed` | ## Supervisor state machine ``` stopped ─── start ──► starting ▲ │ │ (spawn) (stop_cmd) │ │ ▼ stopping ◄── stop ─── running ─── child_exit ──► (eval policy) │ ▲ │ (SIGTERM, │ │ grace timer, (spawn ok) │ SIGKILL) │ │ │ │ ┌─ restart ─► restarting ──┐ ▼ │ │ │ stopped └──────────────┤ │ │ │ └─ no-restart / cap hit ──► failed │ start ────┘ reload ``` ### Spawn flow 1. Open / rotate log file (append mode; size threshold 10 MB, keep last 5 generations). 2. Build `tokio::process::Command`: - `command`, `args`, merged env, `working-dir`. - `kill_on_drop(true)`. - `process_group(0)` — own process group so signals don't leak. 3. Spawn. Pipe stdout and stderr. 4. Spin up two log pumps per child: - `stdout_pump`: line-buffered → log file + ring buffer + broadcast channel. - `stderr_pump`: same, tagged `stderr`. 5. `await child.wait()`. On exit, evaluate restart policy. ### Stop flow 1. Send `SIGTERM` to the process group. 2. Start grace timer (`stop.grace`). 3. On timer fire, `SIGKILL` the process group. 4. `await child.wait()`. 5. Close log pumps, transition to `stopped`. ### Shutdown (daemon receives SIGTERM/SIGINT) Broadcast `Shutdown` to all supervisor tasks → each runs its stop flow in parallel → daemon awaits all with an outer deadline of `2 × max(stop.grace)` across configs → exit `0`. ### Daemon boot 1. Resolve XDG paths, create state directories if missing. 2. Acquire pidfile (fail if another daemon is alive). 3. Load and validate all configs. Fatal on any failure. 4. Bind Unix socket (0600 perms). 5. Spawn one supervisor task per config; send each an immediate `Start` (auto-launch behavior). 6. Serve JSON-RPC until shutdown signal. ## Log handling Per server: - **Disk file** at `${XDG_STATE_HOME}/xy/logs/.log`. Combined stdout+stderr with a leading tag per line: `[out]` / `[err]`. Size-based rotation: when current file ≥ 10 MB, rename to `.log.1` (shifting older generations), open fresh. Keep at most 5 generations. - **Ring buffer** in RAM, ~1 MB per server, holds the most recent log lines. Source for `logs --tail` without re-reading disk. - **Broadcast channel** (`tokio::sync::broadcast`) for live `logs --follow` subscribers. Lagged subscribers are dropped with a warning. ## CLI surface `clap` with `derive`. Subcommand structure: ``` xy daemon # foreground daemon (logs to stderr) xy list # all configured servers + state xy status # single server detail xy start xy stop xy restart xy reload xy logs [--tail N] [--follow] ``` CLI exit codes: | Code | Meaning | |------|---------| | 0 | success | | 1 | operational error (server not found, port conflict on reload) | | 2 | daemon unreachable (socket missing or refused) | | 3 | config invalid | ## Error handling Two layers, kept separate: - **Libraries** (`xy-protocol`, `xy-supervisor`, `xy-ipc`): `thiserror` enums per crate. Callers can match on variants (e.g., `SupervisorError::AlreadyRunning`, `ConfigError::DuplicatePort { name_a, name_b, port }`). - **Binary** (`xy`): `anyhow` for top-level startup and CLI reporting. The IPC layer has one match site translating typed errors into JSON-RPC error objects. ### Fatal vs non-fatal | Class | Examples | Behavior | |---|---|---| | Fatal at daemon startup | socket bind fails; state dir uncreatable; any config invalid; duplicate port | exit non-zero, log to stderr | | Non-fatal at runtime | child spawn fails; restart cap hit; log file write fails | log, mark server `failed` (or degrade log subsystem), daemon keeps running | ## Testing strategy ### Unit tests - **`xy-supervisor`**: state-machine transitions using a mock `ChildHandle` trait so tests don't actually spawn processes. Cases: - Restart policy decisions (`always` / `on-failure` / `never` × clean/dirty exit). - Backoff math (initial, exponential, cap). - Retry window (sliding 60s) → `failed` transition. - Stop flow: grace timer expires → SIGKILL escalation. - **`xy-protocol`**: KDL parse cases (minimal, full, invalid). JSON-RPC envelope round-trips. Error-code mapping. ### Integration tests In `crates/xy/tests/`: - Spin up the real daemon on a temp socket with temp state and config dirs (per-test `XDG_*` env via `tempfile`). - Use tiny long-running test-only binaries built in the workspace: - `xy-test-sleep-server`: sleeps until SIGTERM, prints periodic lines. - `xy-test-exit-immediately`: exits non-zero immediately, used for failure-mode tests. - Drive the real CLI subcommands; assert on `list` output and observable state transitions. ### CI ``` cargo +nightly fmt --check cargo clippy --all-targets -- -D warnings cargo test --all ``` ## Future work (out of scope for MVP) - Container isolation (rootless podman / Docker backend per server). - HTTP/MCP-aware health probes. - launchd LaunchAgent install command. - TUI dashboard (would reuse `xy-protocol` over the same socket). - macOS status bar app (same). - Optional auth on the socket if it ever leaves the user's machine.