Daemon refuses to start after unclean shutdown: stale pidfile is treated as a running daemon #1
Observed
After a machine restart (unclean shutdown), the daemon's pidfile and socket file were left on disk. Starting the daemon again failed with:
…but no daemon process existed. Manual cleanup (deleting the pid and sock files) was required before the daemon would start.
Root cause
PidFile::acquire (crates/xy/src/pidfile.rs) opens the pidfile with create_new(true), so any pre-existing file, live or stale, fails with AlreadyExists. Cleanup relies entirely on Drop, which never runs on power loss / SIGKILL / hard reboot.
The socket file is not actually the blocker: bind() in crates/xy-ipc/src/server.rs already removes a pre-existing sock file before binding. But startup aborts at the pidfile check (crates/xy/src/daemon/mod.rs:72) before reaching that point.
Proposed fix
On AlreadyExists:
1. Read the PID recorded in the existing pidfile.
2. Check whether that process is still alive (kill(pid, 0): ESRCH means stale).
3. If it is stale, remove the pidfile and retry the create_new acquire (loop, to stay race-safe against a concurrent starter).
Alternative worth considering: hold an flock() on the pidfile instead of relying on create_new. The kernel releases the lock when the process dies, regardless of how, which makes stale-file detection unnecessary (the file's existence stops being the liveness signal). PID contents stay useful for diagnostics.
Either way: add a test that simulates the crash case (write a pidfile containing a dead PID, assert the daemon starts and replaces it).
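For discussion, here is a rough sketch of the stale-PID check described above. It is not the actual PidFile::acquire code: acquire_pidfile and pid_is_alive are hypothetical names, kill() is declared by hand only to keep the sketch dependency-free (the real crate would more likely use libc or nix), and the retry bound of 3 is arbitrary. It also assumes that unparsable pidfile contents should be treated as "do not touch", which may or may not be the behaviour we want.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind, Write};
use std::path::Path;

// Hand-written declaration (assumption: Unix target, Rust 1.82+ syntax).
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

const ESRCH: i32 = 3; // "no such process" on Linux, macOS and the BSDs

/// Returns false only when the kernel reports that no such process exists.
fn pid_is_alive(pid: i32) -> bool {
    if pid <= 0 {
        return false; // 0 and negative values address process groups in kill()
    }
    // Signal 0 runs the existence/permission checks without delivering a signal.
    if unsafe { kill(pid, 0) } == 0 {
        return true;
    }
    // EPERM means the process exists but belongs to another user.
    io::Error::last_os_error().raw_os_error() != Some(ESRCH)
}

/// Hypothetical stand-in for PidFile::acquire with stale-file recovery.
fn acquire_pidfile(path: &Path) -> io::Result<()> {
    // Each pass either wins create_new or clears one stale file, then retries.
    for _ in 0..3 {
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(mut file) => return write!(file, "{}", std::process::id()),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => {}
            Err(e) => return Err(e),
        }
        let recorded = match fs::read_to_string(path) {
            Ok(text) => text.trim().parse::<i32>().ok(),
            // A concurrent starter removed the file in the meantime: retry.
            Err(e) if e.kind() == ErrorKind::NotFound => continue,
            Err(e) => return Err(e),
        };
        match recorded {
            Some(pid) if !pid_is_alive(pid) => match fs::remove_file(path) {
                Ok(()) => {}
                Err(e) if e.kind() == ErrorKind::NotFound => {}
                Err(e) => return Err(e),
            },
            // Live PID, or contents we cannot interpret: refuse to start.
            _ => {
                return Err(io::Error::new(
                    ErrorKind::AlreadyExists,
                    "pidfile is held by a running daemon",
                ))
            }
        }
    }
    Err(io::Error::new(ErrorKind::AlreadyExists, "could not acquire pidfile"))
}
```

Note that a small window remains between the liveness check and remove_file in which a concurrent starter could have replaced the file, so the loop narrows the race rather than eliminating it. PID reuse can also make a stale file look live.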
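To make the flock() alternative concrete, here is a minimal sketch under the same caveats. LockedPidFile and acquire_locked are hypothetical names, not existing code in the crate. flock() and its LOCK_EX / LOCK_NB constants are declared by hand to avoid a dependency (values as on Linux, macOS and the BSDs), and a Unix target is assumed.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Hand-written declaration (assumption: Unix target, Rust 1.82+ syntax).
unsafe extern "C" {
    fn flock(fd: i32, operation: i32) -> i32;
}

const LOCK_EX: i32 = 2; // exclusive lock
const LOCK_NB: i32 = 4; // fail immediately instead of blocking

/// Hypothetical guard: the lock is held for as long as this value is alive.
struct LockedPidFile {
    _file: File,
}

fn acquire_locked(path: &Path) -> io::Result<LockedPidFile> {
    // No create_new: a leftover file is harmless, only the lock signals liveness.
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false) // do not clobber the contents before we own the lock
        .open(path)?;
    if unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) } != 0 {
        // EWOULDBLOCK here means another open handle holds the lock.
        return Err(io::Error::last_os_error());
    }
    // We own the lock, so overwriting a PID left by a dead process is safe.
    file.set_len(0)?;
    write!(file, "{}", std::process::id())?;
    Ok(LockedPidFile { _file: file })
}
```

With this shape the daemon would keep the returned guard for its whole lifetime. The kernel drops the lock when the descriptor is closed, including on SIGKILL or power loss, so no Drop-based cleanup is needed for correctness.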