When developing foo_daemon, we encountered an issue where the process would occasionally ignore SIGTERM
signals during service stop/restart operations. This typically happens when:
- Custom signal handlers override default termination behavior
- Process gets stuck in uninterruptible sleep (D state)
- Race conditions prevent proper cleanup
By default, systemd uses the following sequence for service termination:
1. Send SIGTERM
2. Wait up to TimeoutStopSec (default: 90s) for the process to exit
3. Send SIGKILL if it is still running
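To make the sequence concrete, here is a minimal shell sketch that emulates it outside systemd. The function name stop_with_escalation and the 0.1s polling interval are illustrative assumptions, not anything systemd provides; systemd tracks services through control groups, so this is only an approximation of the idea.

```shell
#!/bin/bash
# Illustrative helper (hypothetical name, not a systemd tool):
# send SIGTERM, wait up to timeout_s seconds, then fall back to SIGKILL.
stop_with_escalation() {
    local pid=$1 timeout_s=$2
    local ticks=0                        # elapsed time in 0.1s steps
    local max_ticks=$((timeout_s * 10))

    # Step 1: polite request. If the PID is already gone, there is nothing to do.
    kill -TERM "$pid" 2>/dev/null || { echo "not-running"; return 0; }

    # Step 2: poll until the process exits or the grace period runs out.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$ticks" -ge "$max_ticks" ]; then
            # Step 3: grace period expired, force termination.
            kill -KILL "$pid" 2>/dev/null
            echo "escalated"
            return 0
        fi
        sleep 0.1
        ticks=$((ticks + 1))
    done
    echo "graceful"
}
```

Calling stop_with_escalation "$pid" 2 prints "graceful" when the process exits on SIGTERM within two seconds and "escalated" when SIGKILL was needed.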
This isn't ideal for our use case because:
- 90s is too long for our fast-paced environment
- We need deterministic behavior when PID recycling is a concern
Here's the complete service file solution with annotations:
[Unit]
Description=Foo Daemon Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/foo_daemon
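# Critical configuration parameters.
# Comments must sit on their own lines; systemd does not support trailing comments.
# Maximum wait after SIGTERM before escalating
TimeoutStopSec=2s
# First signal sent on stop
KillSignal=SIGTERM
# Signal sent once the timeout expires (systemd v240+)
FinalKillSignal=SIGKILL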
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
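After installing the unit, it can be worth confirming which values systemd actually loaded. The sketch below assumes the KEY=VALUE output format of systemctl show and the property name TimeoutStopUSec; check_stop_timeout is a hypothetical helper name, and the exact value formatting may differ between systemd versions.

```shell
#!/bin/bash
# Hypothetical helper: read `systemctl show` output (KEY=VALUE lines) on stdin
# and confirm that the stop timeout matches the expected value.
check_stop_timeout() {
    local expected=$1 key value
    while IFS='=' read -r key value; do
        if [ "$key" = "TimeoutStopUSec" ]; then
            if [ "$value" = "$expected" ]; then
                echo "ok: TimeoutStopUSec=$value"
                return 0
            fi
            echo "mismatch: TimeoutStopUSec=$value (expected $expected)"
            return 1
        fi
    done
    echo "missing: TimeoutStopUSec not found in input"
    return 1
}

# Assumed usage on a host running systemd, with the unit loaded:
#   systemctl show foo_daemon -p TimeoutStopUSec -p KillSignal -p KillMode | check_stop_timeout 2s
```

Reading from stdin keeps the helper usable against saved output as well as a live system.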
For environments with extreme PID churn, we can add cgroup-based protection:
[Service]
...
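# Option A: signal only the main process (note: the systemd default is control-group, not process)
KillMode=process
# Option B, for absolute safety (use instead of A): SIGTERM to the main process,
# then SIGKILL to everything left in the control group
KillMode=mixed
# Explicitly enable the final SIGKILL (already the default)
SendSIGKILL=yes
# Do not send SIGHUP after the stop signal (already the default)
SendSIGHUP=no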
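A fragment like this does not have to be merged into the main unit file; systemd also reads drop-in files from a directory named after the unit with a .d suffix. The sketch below writes such a drop-in. The helper name write_killmode_dropin, the file name override.conf, and the chosen values are illustrative assumptions; the directory is passed as an argument so the helper can be tried in a scratch location first.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that overrides only the kill settings.
# $1 is the drop-in directory, e.g. /etc/systemd/system/foo_daemon.service.d
write_killmode_dropin() {
    local dir=$1
    mkdir -p "$dir" || return 1
    # Comments sit on their own lines; systemd does not parse trailing comments.
    cat > "$dir/override.conf" <<'EOF'
[Service]
# SIGTERM to the main process first, SIGKILL to whatever remains in the cgroup
KillMode=mixed
SendSIGKILL=yes
TimeoutStopSec=2s
EOF
}

# On a real host this would be run as root, followed by a reload:
#   write_killmode_dropin /etc/systemd/system/foo_daemon.service.d
#   systemctl daemon-reload
```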
To test the configuration:
# Start service
sudo systemctl start foo_daemon
# Verify running
sudo systemctl status foo_daemon
# Test graceful stop (with 2s timeout)
sudo systemctl stop foo_daemon
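# Force bad state test
# Note: systemd sends SIGCONT right after SIGTERM, so a process suspended with SIGSTOP
# is resumed; the SIGKILL path only triggers if the daemon also ignores SIGTERM
sudo kill -STOP "$(systemctl show -p MainPID --value foo_daemon)" # Suspend the main process
sudo systemctl stop foo_daemon # Sends SIGKILL after 2s if the process has not exited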
Use journalctl -u foo_daemon -f to monitor the complete lifecycle events.
- Validate the unit file with systemd-analyze verify foo_daemon.service (it checks the unit for errors; it does not report failed kills, which show up in the journal)
- Combine with OOMScoreAdjust for memory pressure scenarios
- Consider RuntimeMaxSec for periodic recycling of long-running processes
When dealing with misbehaving daemons that ignore SIGTERM, systemd's default stop behavior becomes problematic. The standard shutdown sequence sends SIGTERM, waits TimeoutStopSec (default 90s), then sends SIGKILL. In environments with rapid PID recycling, any stop logic that signals a saved PID after such a long window risks hitting an unrelated process that has reused it.
Here's the complete solution for /etc/systemd/system/foo_daemon.service:
[Unit]
Description=Foo Daemon with Forced Kill
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/foo_daemon
Restart=on-failure
KillMode=process
TimeoutStopSec=2s
RestartSec=1s
# Critical configurations:
ExecStop=/bin/kill -TERM $MAINPID
ExecStopPost=/bin/sh -c "if ps -p $MAINPID > /dev/null; then /bin/kill -KILL $MAINPID; fi"
SuccessExitStatus=143 TERM
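- KillMode=process: Only the main process is signaled; other processes in the unit's control group are left alone
- TimeoutStopSec=2s: Sets a 2-second grace period before systemd escalates to SIGKILL
- ExecStopPost conditional: Runs after the service has stopped and checks that the process still exists before sending SIGKILL
- SuccessExitStatus=143 TERM: Treats exit code 143 (128 + 15) and termination by SIGTERM as a clean exit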
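Create a test service to validate the behavior. Save this script as /usr/local/bin/sigterm_resistant_daemon, make it executable, and point ExecStart at it while testing:
#!/bin/bash
# /usr/local/bin/sigterm_resistant_daemon
# Log and ignore SIGTERM so that only SIGKILL can stop the loop
trap "echo 'Ignoring TERM'" TERM
while true; do sleep 1; done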
Apply the configuration and test:
sudo systemctl daemon-reload
sudo systemctl start foo_daemon
sudo systemctl stop foo_daemon
journalctl -u foo_daemon -n 30
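Timing the stop command is a quick way to confirm that escalation happens after roughly TimeoutStopSec rather than the 90s default. The sketch below uses a hypothetical helper, elapsed_seconds, that wraps any command and prints how many whole seconds it took; the usage against foo_daemon is an assumption and needs root on a systemd host.

```shell
#!/bin/bash
# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1      # discard the command's own output
    end=$(date +%s)
    echo $((end - start))
}

# Assumed usage against the unit from this section:
#   took=$(elapsed_seconds systemctl stop foo_daemon)
#   echo "stop took ${took}s"
```

With the SIGTERM-resistant test script in place, the reported time should be close to the configured 2 seconds; a value near 90 would suggest the override was not loaded.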
For extreme cases where even PID tracking might fail:
ExecStopPost=/usr/bin/pkill -9 -f /usr/bin/foo_daemon
Note: This is more aggressive. pkill -f matches against the full command line, so it also kills any other process whose command line contains /usr/bin/foo_daemon.
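- Use systemd-analyze critical-chain foo_daemon.service to inspect startup ordering delays; for stop timing, rely on the journal
- Consider KillSignal=SIGTERM and FinalKillSignal=SIGKILL for systemd v240+
- Add WatchdogSec= for hung process detection (the daemon must send watchdog keep-alives via sd_notify)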