PidFile::acquire used create_new(true) with Drop-based cleanup, so a
pidfile surviving power loss or SIGKILL made the daemon refuse to start
until the file was deleted by hand.
On AlreadyExists, read the recorded PID and probe it with kill(pid, 0):
ESRCH (or unparseable content) means stale, so remove the file and
retry the atomic create. A live PID keeps the refusal and now names the
holding process. The retry loop is bounded to stay race-safe against a
concurrent starter.
Closes#1
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ensure_dirs() now creates config_dir alongside state_dir and log_dir,
so first daemon run materializes $XDG_CONFIG_HOME/xy/servers/ — making
it obvious where to drop server .kdl files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
37 planned tasks plus 3 follow-up fixes from final code review.
Architecture:
- Cargo workspace: xy-protocol, xy-supervisor, xy-ipc, xy (single binary)
- Unix socket + newline-delimited JSON-RPC 2.0
- Per-server KDL configs at XDG paths (XDG on macOS via etcetera)
- One supervisor task per managed server, owning all state
- Per-server log capture: rotating disk + ring buffer + broadcast stream
Features:
- Daemon auto-launches all configured servers on boot
- start/stop/restart (single or --all), reload (diff added/removed/changed),
list/status, logs (--tail / --follow)
- Per-server restart policy (always/on-failure/never) with exponential
backoff, sliding 60s retry window, and Failed state on cap
- Graceful shutdown via SIGTERM/SIGINT, SIGKILL escalation after grace
- 51 tests: unit (state machine via MockChild, KDL parser, framing) +
integration (real daemon + helper bins exercising lifecycle/reload/
restart-cap/logs)
Bugs found and fixed during execution:
- Connection deadlock from single shared read/write mutex (split into
separate reader/writer halves)
- LOGS response vs notification ordering race (oneshot gate)
- StartAck::Started returned even on spawn failure (added SpawnFailed)
- Backoff sleep blocked the supervisor's command channel (interruptible
select)
- list/status returned zeroed fields (now publish full Status via watch)
Add StartAck::SpawnFailed(String) so callers can distinguish a successful
start from a failed spawn. The Start command arm now sends SpawnFailed on
io::Error rather than the misleading Started. handlers.rs maps the new
variant to an RpcErrorCode::SpawnFailed JSON-RPC error response.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the bare sleep(delay).await in the Restart backoff arm with a
tokio::select! over the timer and cmd_rx. Stop/Shutdown are now handled
immediately during backoff (Stop → Stopped, Shutdown → clean exit);
Start/Restart/Reconfigure skip the remaining delay and retry at once.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace watch::Receiver<ServerState> on SupervisorHandle with watch::Receiver<Status>,
a richer snapshot type that carries pid, port, uptime_secs, restart_count and last_exit.
SupervisorTask maintains current_pid and publishes a fresh Status on every state
transition; handlers.rs reads the full Status so list/status no longer return
zeroed/None fields.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix a deadlock in the log-stream handler that caused all logs
requests to hang: Connection used a single Mutex<JsonFramed> for
both reads and writes, so the serve loop holding the read lock
blocked the spawned notification task from writing. Split
Connection into separate reader and writer mutexes.
Also fix a response/notification ordering race: the log task now
waits for an explicit ready signal sent by serve after writing the
LOGS response, ensuring notifications never arrive at the client
before their initiating response.
Replace bail!("not implemented") stubs with real RPC calls over the Unix
socket; add format::list_table for fixed-width list output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement per-connection ConnState tracking active subscriptions, and the
logs/logs_cancel RPC handlers. Snapshot-only streams terminate with a
log_end notification; follow streams forward broadcast lines until
cancelled or connection close.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the `reload` JSON-RPC method: diffs the on-disk config dir
against the in-memory registry and reconciles — stops removed servers,
restarts changed servers (shutdown-then-respawn), and starts new ones.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per-connection JSON-RPC dispatch in daemon/handlers.rs — list, status,
start, stop, and restart are fully implemented; reload, logs, and
logs_cancel are stubbed with -32601 for later tasks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
One async task per managed server owns all state transitions via a
tokio::select! loop over cmd_rx and wait_child. Includes RealSpawner
and a smoke test covering the Start → Running → exit → Stopped →
Shutdown happy path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Append RealChild (real tokio::process::Child wrapper) and spawn_with_logs
to child.rs. Uses nix::unistd::setpgid via tokio's re-exported pre_exec
to create an own process group, and fires per-stream log pump tasks that
drain stdout/stderr into the provided LogSink. terminate/kill signal the
whole process group via kill(-pgid, SIG*).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds kdl_parse module with parse_server_config() that deserialises a
KDL document into ServerConfig, with full validation of name, types,
durations, and restart/stop blocks. Also derives Default on
RestartPolicy to satisfy clippy.
37-task TDD-style plan across 7 phases: workspace skeleton,
xy-protocol (config/state/rpc types), xy-supervisor (state machine
with mock-driven unit tests), xy-ipc (JSON-RPC over Unix socket),
xy binary (daemon + CLI), integration tests with test-helper bins,
and polish (fmt/clippy/README).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Approved design for the MVP: single xy binary with a Cargo workspace
(xy-protocol, xy-supervisor, xy-ipc, xy), Unix socket + newline-delimited
JSON-RPC, per-server KDL configs at XDG paths (XDG on macOS too via
etcetera), supervisor-per-server task model with per-server restart policy,
log capture to disk + ring buffer + broadcast for follow.
MVP commands: daemon, list, status, start/stop/restart (name|--all),
reload, logs. Process-alive supervision only; HTTP/MCP-aware probes,
container isolation, launchd integration, and TUI deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>