Approved design for the MVP: single xy binary with a Cargo workspace (xy-protocol, xy-supervisor, xy-ipc, xy), Unix socket + newline-delimited JSON-RPC, per-server KDL configs at XDG paths (XDG on macOS too via etcetera), supervisor-per-server task model with per-server restart policy, log capture to disk + ring buffer + broadcast for follow. MVP commands: daemon, list, status, start/stop/restart (name|--all), reload, logs. Process-alive supervision only; HTTP/MCP-aware probes, container isolation, launchd integration, and TUI deferred. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xy — HTTP MCP Server Supervisor
Date: 2026-05-25
Status: Approved, ready for implementation planning
Problem
HTTP-based MCP servers (currently two, more likely) need a long-running parent process so they survive terminal closures and can be inspected, restarted, and upgraded without ad-hoc terminal tabs. Today they're launched manually and their lifetime is coupled to a terminal window.
Goals (MVP)
- Run as a background daemon on macOS.
- Auto-launch every configured MCP server when the daemon starts.
- Provide a CLI to start, stop, restart, reload, list, and tail logs.
- Per-server restart policy with backoff.
- Capture stdout/stderr to rotating log files and an in-memory ring buffer.
Non-goals (deferred)
- Container isolation (planned for a later phase).
- TUI dashboard.
- macOS status bar app.
- HTTP/MCP-level health probes.
- Auto-start at login via launchd (manual daemon launch only for MVP).
- Remote management (everything is local-socket only).
Architecture
              ┌────────────────────────────────┐
              │ xy daemon (process)            │
              │                                │
xy CLI ──────►│ JSON-RPC server                │
(Unix socket) │        │                       │
              │        ▼                       │
              │ Command handlers               │
              │        │                       │
              │        ▼                       │
              │ Supervisor (one task per       │
              │ managed server):               │
              │   spawn → wait → restart       │
              │   per per-server policy        │
              │        │                       │
              │        ▼                       │
              │ Log capture: stdout/stderr ──► │──► $XDG_STATE_HOME/xy/logs/<name>.log
              │ Ring buffer (in RAM)           │
              └────────────────────────────────┘
                       │
                       ▼
              Child MCP server processes
              (HTTP, fixed port from KDL)
Filesystem layout
XDG semantics on both Linux and macOS (no ~/Library/Application Support).
Use the etcetera crate's Xdg strategy, or hand-rolled env-var resolution.
| Purpose | Path |
|---|---|
| Configs | ${XDG_CONFIG_HOME:-~/.config}/xy/servers/*.kdl |
| Logs | ${XDG_STATE_HOME:-~/.local/state}/xy/logs/<name>.log |
| Socket | ${XDG_RUNTIME_DIR}/xy.sock if set, else ${XDG_STATE_HOME:-~/.local/state}/xy/xy.sock |
| Pidfile | ${XDG_STATE_HOME:-~/.local/state}/xy/xy.pid |
Socket permissions: 0600.
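The table above can be read as a small pure function. The following is a minimal sketch of the hand-rolled env-var resolution option, not the etcetera-based one; `XyPaths`, `resolve_paths`, and `non_empty` are hypothetical names, and treating an empty variable as unset is an assumption borrowed from the XDG Base Directory convention.

```rust
use std::path::PathBuf;

/// Resolved locations for one xy installation (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct XyPaths {
    pub config_dir: PathBuf,
    pub log_dir: PathBuf,
    pub socket: PathBuf,
    pub pidfile: PathBuf,
}

/// Treat unset and empty variables the same way (assumption).
fn non_empty(value: Option<&str>) -> Option<PathBuf> {
    value.filter(|v| !v.is_empty()).map(PathBuf::from)
}

/// Apply the fallbacks from the filesystem layout table.
/// The caller passes the environment values in, which keeps this testable.
pub fn resolve_paths(
    home: &str,
    xdg_config_home: Option<&str>,
    xdg_state_home: Option<&str>,
    xdg_runtime_dir: Option<&str>,
) -> XyPaths {
    let home = PathBuf::from(home);
    let config_home = non_empty(xdg_config_home).unwrap_or_else(|| home.join(".config"));
    let state_home = non_empty(xdg_state_home).unwrap_or_else(|| home.join(".local/state"));
    let state_dir = state_home.join("xy");

    // The socket prefers XDG_RUNTIME_DIR and falls back to the state dir.
    let socket = match non_empty(xdg_runtime_dir) {
        Some(runtime) => runtime.join("xy.sock"),
        None => state_dir.join("xy.sock"),
    };

    XyPaths {
        config_dir: config_home.join("xy/servers"),
        log_dir: state_dir.join("logs"),
        socket,
        pidfile: state_dir.join("xy.pid"),
    }
}
```

Passing the variables as arguments rather than reading `std::env` inside the function also suits the per-test `XDG_*` overrides described in the testing strategy.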
Concurrency model
Tokio multi-thread runtime. Each managed server owns one supervisor task that holds the canonical state for that server. RPC handlers communicate with supervisor tasks via channels:
- mpsc::Sender<SupervisorCmd>: Start, Stop, Restart, Shutdown, Reconfigure(ServerConfig).
- watch::Receiver<ServerState>: outsiders observe current state without locks.
- broadcast::Sender<LogLine>: live logs --follow subscribers.
No shared mutexes for server state; the supervisor task is the owner.
Crate layout (Cargo workspace)
xy/
├── Cargo.toml # workspace manifest
├── crates/
│ ├── xy-protocol/ # JSON-RPC types + KDL config schema (lib)
│ ├── xy-supervisor/ # process lifecycle, restart policy, log capture (lib)
│ ├── xy-ipc/ # socket framing + JSON-RPC client/server (lib)
│ └── xy/ # binary: clap CLI + daemon command, wires it all together
└── docs/superpowers/specs/
Single xy binary; xy daemon runs the supervisor in-process, all other
subcommands act as JSON-RPC clients.
Dependencies
- tokio (features: rt-multi-thread, net, process, signal, sync, fs, io-util, macros)
- clap with derive feature
- serde, serde_json
- kdl (KDL parser) with a small typed schema wrapper
- tracing, tracing-subscriber (env-filter)
- thiserror (libraries), anyhow (binary)
- etcetera (XDG paths, works correctly on macOS)
- nix (SIGTERM/SIGKILL, process groups)
Format with cargo +nightly fmt. Lint with cargo clippy --all-targets -- -D warnings.
KDL config schema
One file per server: ${XDG_CONFIG_HOME}/xy/servers/<name>.kdl. Filename stem
is the canonical server name; the file itself does not repeat it.
Example ~/.config/xy/servers/insikt.kdl:
command "/Users/olsson/.cargo/bin/insikt-mcp"
args "--http" "--port" "8421"
port 8421
env {
    RUST_LOG "info"
    INSIKT_DATA_DIR "/Users/olsson/.local/share/insikt"
}
working-dir "/Users/olsson/Laboratory/insikt"
restart {
    policy "on-failure" // "always" | "on-failure" | "never"
    backoff-initial "1s"
    backoff-max "30s"
    max-retries-per-minute 5
}
stop {
    grace "10s" // SIGTERM, then SIGKILL after this
}
Field semantics
| Field | Required | Default | Notes |
|---|---|---|---|
| command | yes | (none) | Absolute path to executable. |
| args | no | [] | String list. |
| port | yes | (none) | Informational; xy doesn't bind it. Used for list display and load-time conflict detection across configs. |
| env | no | {} | Merged onto inherited parent env; KDL wins on conflict. |
| working-dir | no | daemon's cwd | Process working directory. |
| restart.policy | no | on-failure | One of always, on-failure, never. |
| restart.backoff-initial | no | 1s | Humantime duration. |
| restart.backoff-max | no | 30s | Cap for exponential backoff. |
| restart.max-retries-per-minute | no | 5 | Sliding-60s window. Exceeded → failed. |
| stop.grace | no | 10s | SIGTERM → wait → SIGKILL window. |
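To make the restart fields concrete, here is a minimal sketch of the backoff math and the sliding 60-second retry budget. `backoff_delay` and `RetryWindow` are hypothetical names, and the doubling factor is an assumption: the spec says the backoff is exponential with a cap but does not fix the multiplier.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Delay before restart attempt `attempt` (0-based).
/// Assumption: the delay doubles per attempt, capped at `max`.
pub fn backoff_delay(initial: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    initial.checked_mul(factor).unwrap_or(max).min(max)
}

/// Sliding-window budget for restart.max-retries-per-minute.
pub struct RetryWindow {
    limit: usize,
    window: Duration,
    attempts: VecDeque<Instant>,
}

impl RetryWindow {
    pub fn new(limit: usize) -> Self {
        Self {
            limit,
            window: Duration::from_secs(60),
            attempts: VecDeque::new(),
        }
    }

    /// Record a restart attempt at `now`.
    /// Returns false when the budget is exhausted; the supervisor would
    /// then mark the server failed instead of restarting it.
    pub fn try_record(&mut self, now: Instant) -> bool {
        // Forget attempts that have slid out of the 60s window.
        while let Some(&oldest) = self.attempts.front() {
            if now.duration_since(oldest) >= self.window {
                self.attempts.pop_front();
            } else {
                break;
            }
        }
        if self.attempts.len() >= self.limit {
            return false;
        }
        self.attempts.push_back(now);
        true
    }
}
```

With the defaults (1s initial, 30s cap), the delays under this assumption would run 1s, 2s, 4s, 8s, 16s, then stay at 30s.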
Validation at load
- Every file must parse and produce a complete ServerConfig.
- No two configs may declare the same port.
- command should exist and be executable. If it does not, warn but allow the config: the child spawn will fail and the supervisor will mark the server failed.
Validation failures at daemon startup are fatal (exit non-zero). Failures
during reload are returned to the CLI client as JSON-RPC errors; the daemon
keeps running.
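The port rule lends itself to a single pass over the loaded configs. This is a minimal sketch; `find_duplicate_port` is a hypothetical helper, and the `DuplicatePort` struct only mirrors the fields of the `ConfigError::DuplicatePort` variant mentioned under error handling.

```rust
use std::collections::HashMap;

/// Mirrors the fields of ConfigError::DuplicatePort (hypothetical standalone type).
#[derive(Debug, PartialEq)]
pub struct DuplicatePort {
    pub name_a: String,
    pub name_b: String,
    pub port: u16,
}

/// Scan (server name, port) pairs and report the first port declared twice.
pub fn find_duplicate_port(configs: &[(&str, u16)]) -> Option<DuplicatePort> {
    let mut seen: HashMap<u16, &str> = HashMap::new();
    for &(name, port) in configs {
        if let Some(&first) = seen.get(&port) {
            return Some(DuplicatePort {
                name_a: first.to_string(),
                name_b: name.to_string(),
                port,
            });
        }
        seen.insert(port, name);
    }
    None
}
```

At daemon startup a `Some` result would be fatal; during reload it would be returned to the client as a JSON-RPC error while the daemon keeps running.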
JSON-RPC protocol
Transport: Unix socket, newline-delimited JSON (one JSON-RPC 2.0 message per line).
Methods
| Method | Params | Result |
|---|---|---|
| list | (none) | [{name, state, pid?, port, uptime_secs?, restart_count, last_exit?}] |
| status | {name} | single entry as above + recent state transitions |
| start | {name} or {all: true} | {started: [...], already_running: [...]} |
| stop | {name} or {all: true} | {stopped: [...], not_running: [...]} |
| restart | {name} or {all: true} | {restarted: [...]} |
| reload | (none) | {added: [...], removed: [...], changed: [...], unchanged: [...]} |
| logs | {name, tail?: u32, follow?: bool} | Initial response {subscription_id}; the daemon then sends JSON-RPC notifications log {subscription_id, name, stream, line, ts} for each line. A final log_end notification {subscription_id} closes the stream. For non-follow, log_end fires after the buffered tail. For follow, the stream stays open until the client closes the connection or calls logs_cancel {subscription_id}. |
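The newline-delimited transport can be sketched with the standard library alone. `write_frame` and `read_frame` are hypothetical helpers that handle already-serialized JSON strings; a real implementation would presumably use tokio's async I/O and serde_json, which this sketch deliberately leaves out.

```rust
use std::io::{self, BufRead, Write};

// Example frames, one JSON-RPC 2.0 message per line:
//   {"jsonrpc":"2.0","id":1,"method":"list"}
//   {"jsonrpc":"2.0","id":2,"method":"status","params":{"name":"insikt"}}

/// Write one serialized JSON-RPC message followed by a newline.
/// A raw newline inside the message would break the framing, so reject it.
pub fn write_frame<W: Write>(out: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "frame contains a raw newline",
        ));
    }
    out.write_all(json.as_bytes())?;
    out.write_all(b"\n")
}

/// Read the next frame. Returns Ok(None) once the peer has closed the stream.
pub fn read_frame<R: BufRead>(input: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    if input.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    let trimmed = line.trim_end_matches(|c| c == '\n' || c == '\r');
    Ok(Some(trimmed.to_string()))
}
```

Compact JSON serialization never emits a raw newline (newlines inside strings are escaped as `\n`), so the check in `write_frame` mainly guards against pretty-printed output.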
Server states
stopped | starting | running | restarting | failed | stopping
reload semantics
Diff current in-memory configs against on-disk config dir:
- Added (new file): start.
- Removed (file gone): stop running process.
- Changed (content hash differs): stop, then start with new config.
- Unchanged: leave alone.
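The diff above reduces to comparing two name-to-hash maps. The sketch below assumes a `u64` content hash per config file; `ReloadPlan` and `plan_reload` are hypothetical names, and the field names follow the `reload` result in the methods table.

```rust
use std::collections::BTreeMap;

/// Outcome of comparing in-memory configs with the config directory.
#[derive(Debug, Default, PartialEq)]
pub struct ReloadPlan {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub changed: Vec<String>,
    pub unchanged: Vec<String>,
}

/// Both maps go from server name to a content hash of its KDL file.
pub fn plan_reload(
    current: &BTreeMap<String, u64>,
    on_disk: &BTreeMap<String, u64>,
) -> ReloadPlan {
    let mut plan = ReloadPlan::default();
    for (name, new_hash) in on_disk {
        match current.get(name) {
            None => plan.added.push(name.clone()),
            Some(old_hash) if old_hash != new_hash => plan.changed.push(name.clone()),
            Some(_) => plan.unchanged.push(name.clone()),
        }
    }
    // Anything known in memory but missing on disk was removed.
    for name in current.keys() {
        if !on_disk.contains_key(name) {
            plan.removed.push(name.clone());
        }
    }
    plan
}
```

Using `BTreeMap` keeps each list sorted by server name, which makes the CLI output stable between runs.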
Error codes
Standard JSON-RPC error objects with our codes:
| Code | Name |
|---|---|
| -32001 | ServerNotFound |
| -32002 | PortConflict |
| -32003 | ConfigInvalid |
| -32004 | AlreadyRunning |
| -32005 | NotRunning |
| -32006 | SpawnFailed |
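One way to keep these codes in a single place is an enum with a two-way mapping. This is a sketch only; `RpcErrorCode` is a hypothetical type, though the numeric values are the ones in the table. All of them sit in the -32000 to -32099 range that JSON-RPC 2.0 reserves for implementation-defined server errors.

```rust
/// Application error codes from the table above (hypothetical enum name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcErrorCode {
    ServerNotFound,
    PortConflict,
    ConfigInvalid,
    AlreadyRunning,
    NotRunning,
    SpawnFailed,
}

impl RpcErrorCode {
    /// Numeric code placed in the JSON-RPC error object.
    pub fn code(self) -> i32 {
        match self {
            RpcErrorCode::ServerNotFound => -32001,
            RpcErrorCode::PortConflict => -32002,
            RpcErrorCode::ConfigInvalid => -32003,
            RpcErrorCode::AlreadyRunning => -32004,
            RpcErrorCode::NotRunning => -32005,
            RpcErrorCode::SpawnFailed => -32006,
        }
    }

    /// Inverse mapping, for the CLI side decoding a response.
    pub fn from_code(code: i32) -> Option<Self> {
        match code {
            -32001 => Some(RpcErrorCode::ServerNotFound),
            -32002 => Some(RpcErrorCode::PortConflict),
            -32003 => Some(RpcErrorCode::ConfigInvalid),
            -32004 => Some(RpcErrorCode::AlreadyRunning),
            -32005 => Some(RpcErrorCode::NotRunning),
            -32006 => Some(RpcErrorCode::SpawnFailed),
            _ => None,
        }
    }
}
```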
Supervisor state machine
stopped ─── start ──► starting
▲ │
│ (spawn)
(stop_cmd) │
│ ▼
stopping ◄── stop ─── running ─── child_exit ──► (eval policy)
│ ▲ │
(SIGTERM, │ │
grace timer, (spawn ok) │
SIGKILL) │ │
│ │ ┌─ restart ─► restarting ──┐
▼ │ │ │
stopped └──────────────┤ │
│ │
└─ no-restart / cap hit ──► failed
│
start ────┘
reload
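The diagram can also be read as a pure transition function, which is convenient for the unit tests planned below. The following is one possible reading, not the approved design: the `Event` variants and function names are hypothetical, a spawn failure is assumed to lead to failed, and, following the diagram's "no-restart / cap hit" edge, every exit that is not restarted lands in failed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerState {
    Stopped,
    Starting,
    Running,
    Restarting,
    Failed,
    Stopping,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    Always,
    OnFailure,
    Never,
}

/// Inputs to the state machine (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start,
    SpawnOk,
    SpawnFailed,
    Stop,
    /// The child exited on its own; `cap_hit` means the retry window is exhausted.
    ChildExit { success: bool, cap_hit: bool },
    /// The stop flow (SIGTERM, grace timer, SIGKILL) has completed.
    StopFinished,
}

/// Restart policy decision for a child that exited on its own.
pub fn wants_restart(policy: RestartPolicy, success: bool) -> bool {
    match policy {
        RestartPolicy::Always => true,
        RestartPolicy::OnFailure => !success,
        RestartPolicy::Never => false,
    }
}

/// Returns the next state, or None when the event does not apply in `state`.
pub fn next_state(state: ServerState, event: Event, policy: RestartPolicy) -> Option<ServerState> {
    let next = match (state, event) {
        (ServerState::Stopped | ServerState::Failed, Event::Start) => ServerState::Starting,
        (ServerState::Starting | ServerState::Restarting, Event::SpawnOk) => ServerState::Running,
        (ServerState::Starting | ServerState::Restarting, Event::SpawnFailed) => ServerState::Failed,
        (ServerState::Running, Event::Stop) => ServerState::Stopping,
        (ServerState::Stopping, Event::StopFinished) => ServerState::Stopped,
        (ServerState::Running, Event::ChildExit { success, cap_hit }) => {
            if wants_restart(policy, success) && !cap_hit {
                ServerState::Restarting
            } else {
                ServerState::Failed
            }
        }
        _ => return None,
    };
    Some(next)
}
```

Transitions the diagram does not show, such as a stop request arriving while a server is starting, return `None` here and would need a decision during implementation.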
Spawn flow
- Open / rotate log file (append mode; size threshold 10 MB, keep last 5 generations).
- Build tokio::process::Command:
  - command, args, merged env, working-dir.
  - kill_on_drop(true).
  - process_group(0): own process group so signals don't leak.
- Spawn. Pipe stdout and stderr.
- Spin up two log pumps per child:
  - stdout_pump: line-buffered → log file + ring buffer + broadcast channel.
  - stderr_pump: same, tagged stderr.
- await child.wait(). On exit, evaluate restart policy.
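The rotation in the first step comes down to an ordered list of renames. This sketch only plans them and touches no files. `needs_rotation` and `rotation_renames` are hypothetical names, the threshold is a parameter because the spec does not say whether 10 MB is decimal or binary, and counting the 5 generations as rotated files (excluding the live log) is an assumption.

```rust
/// True when the live log has reached the rotation threshold.
pub fn needs_rotation(current_len: u64, threshold_bytes: u64) -> bool {
    current_len >= threshold_bytes
}

/// Renames to apply, in order, when rotating `base` (e.g. "insikt.log").
/// Assumes `keep >= 1`. The oldest generation is shifted first so nothing is
/// overwritten early; on Unix, renaming onto `base.keep` replaces the
/// previous oldest file. Sources that do not exist yet can be skipped.
pub fn rotation_renames(base: &str, keep: u32) -> Vec<(String, String)> {
    let mut renames = Vec::new();
    for generation in (1..keep).rev() {
        renames.push((
            format!("{base}.{generation}"),
            format!("{base}.{}", generation + 1),
        ));
    }
    // Finally the live file becomes generation 1 and a fresh file is opened.
    renames.push((base.to_string(), format!("{base}.1")));
    renames
}
```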
Stop flow
- Send SIGTERM to the process group.
- Start grace timer (stop.grace).
- On timer fire, SIGKILL the process group.
- await child.wait().
- Close log pumps, transition to stopped.
Shutdown (daemon receives SIGTERM/SIGINT)
Broadcast Shutdown to all supervisor tasks → each runs its stop flow in
parallel → daemon awaits all with an outer deadline of 2 × max(stop.grace)
across configs → exit 0.
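The outer deadline is simple arithmetic over the configured grace periods. A minimal sketch, with `shutdown_deadline` as a hypothetical helper name; returning zero when no servers are configured is an assumption.

```rust
use std::time::Duration;

/// Outer shutdown deadline: 2 x the largest stop.grace across all configs.
/// With no configs there is nothing to wait for, so the deadline is zero.
pub fn shutdown_deadline(graces: &[Duration]) -> Duration {
    graces.iter().copied().max().unwrap_or(Duration::ZERO) * 2
}
```

With graces of 10s and 25s, for example, the daemon would wait at most 50s before exiting.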
Daemon boot
- Resolve XDG paths, create state directories if missing.
- Acquire pidfile (fail if another daemon is alive).
- Load and validate all configs. Fatal on any failure.
- Bind Unix socket (0600 perms).
- Spawn one supervisor task per config; send each an immediate Start (auto-launch behavior).
- Serve JSON-RPC until shutdown signal.
Log handling
Per server:
- Disk file at ${XDG_STATE_HOME}/xy/logs/<name>.log. Combined stdout+stderr with a leading tag per line: [out] / [err]. Size-based rotation: when current file ≥ 10 MB, rename to <name>.log.1 (shifting older generations), open fresh. Keep at most 5 generations.
- Ring buffer in RAM, ~1 MB per server, holds the most recent log lines. Source for logs --tail without re-reading disk.
- Broadcast channel (tokio::sync::broadcast) for live logs --follow subscribers. Lagged subscribers are dropped with a warning.
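A byte-bounded ring buffer for the in-memory tail could look like the sketch below. `LogRing` is a hypothetical type; the budget counts line bytes only (an approximation of the ~1 MB figure), and keeping one oversized line rather than dropping it is an assumption.

```rust
use std::collections::VecDeque;

/// Keeps the most recent log lines within a byte budget.
pub struct LogRing {
    capacity_bytes: usize,
    used_bytes: usize,
    lines: VecDeque<String>,
}

impl LogRing {
    pub fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            lines: VecDeque::new(),
        }
    }

    /// Append a line, evicting the oldest lines while over budget.
    /// A single line larger than the budget is kept rather than dropped.
    pub fn push(&mut self, line: String) {
        self.used_bytes += line.len();
        self.lines.push_back(line);
        while self.used_bytes > self.capacity_bytes && self.lines.len() > 1 {
            if let Some(old) = self.lines.pop_front() {
                self.used_bytes -= old.len();
            }
        }
    }

    /// The last `n` lines, oldest first, as needed for `logs --tail N`.
    pub fn tail(&self, n: usize) -> Vec<&str> {
        let skip = self.lines.len().saturating_sub(n);
        self.lines.iter().skip(skip).map(String::as_str).collect()
    }
}
```

Because the supervisor task owns the server's state, a buffer like this could live inside that task without a mutex, in line with the concurrency model.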
CLI surface
clap with derive. Subcommand structure:
xy daemon # foreground daemon (logs to stderr)
xy list # all configured servers + state
xy status <name> # single server detail
xy start <name|--all>
xy stop <name|--all>
xy restart <name|--all>
xy reload
xy logs <name> [--tail N] [--follow]
CLI exit codes:
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | operational error (server not found, port conflict on reload) |
| 2 | daemon unreachable (socket missing or refused) |
| 3 | config invalid |
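The table suggests a single mapping from a command's outcome to the process exit code. In this sketch `CliOutcome` and `exit_code` are hypothetical; mapping ConfigInvalid (-32003) to 3 follows the table, while sending every other JSON-RPC error to 1 is an assumption.

```rust
/// How a CLI subcommand ended (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CliOutcome {
    Success,
    /// The daemon answered with a JSON-RPC error carrying this code.
    RpcError(i32),
    /// The socket was missing or the connection was refused.
    DaemonUnreachable,
}

/// JSON-RPC code for ConfigInvalid, from the error code table.
pub const CONFIG_INVALID: i32 = -32003;

pub fn exit_code(outcome: CliOutcome) -> i32 {
    match outcome {
        CliOutcome::Success => 0,
        CliOutcome::DaemonUnreachable => 2,
        CliOutcome::RpcError(CONFIG_INVALID) => 3,
        // Server not found, port conflict on reload, and the remaining codes.
        CliOutcome::RpcError(_) => 1,
    }
}
```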
Error handling
Two layers, kept separate:
- Libraries (xy-protocol, xy-supervisor, xy-ipc): thiserror enums per crate. Callers can match on variants (e.g., SupervisorError::AlreadyRunning, ConfigError::DuplicatePort { name_a, name_b, port }).
- Binary (xy): anyhow for top-level startup and CLI reporting. The IPC layer has one match site translating typed errors into JSON-RPC error objects.
Fatal vs non-fatal
| Class | Examples | Behavior |
|---|---|---|
| Fatal at daemon startup | socket bind fails; state dir uncreatable; any config invalid; duplicate port | exit non-zero, log to stderr |
| Non-fatal at runtime | child spawn fails; restart cap hit; log file write fails | log, mark server failed (or degrade log subsystem), daemon keeps running |
Testing strategy
Unit tests
- xy-supervisor: state-machine transitions using a mock ChildHandle trait so tests don't actually spawn processes. Cases:
  - Restart policy decisions (always / on-failure / never × clean/dirty exit).
  - Backoff math (initial, exponential, cap).
  - Retry window (sliding 60s) → failed transition.
  - Stop flow: grace timer expires → SIGKILL escalation.
- xy-protocol: KDL parse cases (minimal, full, invalid). JSON-RPC envelope round-trips. Error-code mapping.
Integration tests
In crates/xy/tests/:
- Spin up the real daemon on a temp socket with temp state and config dirs (per-test XDG_* env via tempfile).
- Use tiny test-only binaries built in the workspace:
  - xy-test-sleep-server: sleeps until SIGTERM, prints periodic lines.
  - xy-test-exit-immediately: exits non-zero immediately, used for failure-mode tests.
- Drive the real CLI subcommands; assert on list output and observable state transitions.
CI
cargo +nightly fmt --check
cargo clippy --all-targets -- -D warnings
cargo test --all
Future work (out of scope for MVP)
- Container isolation (rootless podman / Docker backend per server).
- HTTP/MCP-aware health probes.
- launchd LaunchAgent install command.
- TUI dashboard (would reuse xy-protocol over the same socket).
- macOS status bar app (same).
- Optional auth on the socket if it ever leaves the user's machine.