When developing foo_daemon, we encountered an issue where the process would occasionally ignore SIGTERM signals during service stop/restart operations. This typically happens when:
- Custom signal handlers override default termination behavior
- Process gets stuck in uninterruptible sleep (D state); no signal, not even SIGKILL, takes effect until it leaves that state
- Race conditions prevent proper cleanup
By default, systemd uses the following sequence for service termination:
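1. Send SIGTERM (KillSignal=), immediately followed by SIGCONT so that stopped processes can act on it
2. Wait TimeoutStopSec (default: 90s)
3. Send SIGKILL (FinalKillSignal=) to anything still running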
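The same escalation can be sketched in plain shell for a process the script started itself. This is a minimal sketch under one assumption: the target is a child of the current shell, so `wait` can reap it. `stop_with_timeout` is a hypothetical helper and not part of systemd.

```shell
#!/bin/sh
# stop_with_timeout PID TIMEOUT (hypothetical helper, not part of systemd):
# mimic the default stop sequence for a child of this shell by sending
# SIGTERM, waiting up to TIMEOUT seconds, then sending SIGKILL.
# Returns the child's wait status: 143 = ended by SIGTERM, 137 = by SIGKILL.
stop_with_timeout() {
  local pid=$1 timeout=$2 status escalator

  # Step 1: polite termination request.
  kill -TERM "$pid" 2>/dev/null

  # Step 3, armed in the background: SIGKILL once the timeout expires.
  # Output goes to /dev/null so the helper never holds a caller's pipe open.
  ( sleep "$timeout"; kill -KILL "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  escalator=$!

  # Step 2: block until the child exits, whichever signal ends it.
  status=0
  wait "$pid" || status=$?

  # The child is gone: cancel the pending SIGKILL and reap the helper.
  kill -TERM "$escalator" 2>/dev/null
  wait "$escalator" 2>/dev/null || true
  return "$status"
}
```

systemd does not need such a helper. It is the parent of the service and tracks it through its cgroup. The sketch, by contrast, has a narrow window between `wait` returning and the escalator being cancelled. In that window a late SIGKILL could reach a recycled PID, which shows why PID-based kill logic is fragile.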
This isn't ideal for our use case because:
- 90s is too long for our fast-paced environment
- We need deterministic behavior when PID recycling is a concern
Here's the complete service file solution with annotations:
[Unit]
Description=Foo Daemon Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/foo_daemon
# Critical configuration parameters
# Maximum wait after SIGTERM before escalating:
TimeoutStopSec=2s
# First attempt signal:
KillSignal=SIGTERM
# Final attempt signal (systemd v240+):
FinalKillSignal=SIGKILL
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
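For environments with extreme PID churn, rely on systemd's cgroup-based tracking. systemd finds a unit's processes through its control group rather than by PID, and KillMode= decides which of them receive the stop signals:
[Service]
...
# KillMode=control-group (the default) signals every process in the cgroup.
# KillMode=process signals only the main process; children keep running.
# KillMode=mixed sends SIGTERM to the main process only, then SIGKILL to
# every process still left in the cgroup. Keep exactly one KillMode= line:
KillMode=mixed
# Send the final SIGKILL after TimeoutStopSec (already the default):
SendSIGKILL=yes
# Do not send an extra SIGHUP after SIGTERM (also the default):
SendSIGHUP=no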
To test the configuration:
# Start service
sudo systemctl start foo_daemon
# Verify running
sudo systemctl status foo_daemon
# Test graceful stop (with 2s timeout)
sudo systemctl stop foo_daemon
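# Force bad state test. systemd sends SIGCONT right after KillSignal=,
# so a stopped process resumes and handles the pending SIGTERM.
sudo kill -STOP "$(systemctl show --property MainPID --value foo_daemon)" # Simulate unresponsive state
sudo systemctl stop foo_daemon # Escalates to SIGKILL after 2s only if foo_daemon ignores SIGTERM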
Use journalctl -u foo_daemon -f to monitor the complete lifecycle events.
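systemd sends SIGCONT right after KillSignal=. As a result, the SIGSTOP test exercises ordinary SIGTERM delivery rather than the SIGKILL escalation whenever foo_daemon keeps the default SIGTERM action. The sketch below replays that signal order by hand. A throwaway `sleep` stands in for foo_daemon, which is an assumption for illustration only.

```shell
#!/bin/sh
# Replay the order in which systemd signals a frozen service:
# KillSignal= (SIGTERM) followed immediately by SIGCONT.
sleep 300 >/dev/null 2>&1 &
pid=$!

kill -STOP "$pid"   # freeze it, like the kill -STOP test
kill -TERM "$pid"   # stays pending while the process is stopped
kill -CONT "$pid"   # systemd sends this right after SIGTERM

status=0
wait "$pid" || status=$?
# 143 = 128 + 15: the process died from SIGTERM, no SIGKILL was needed.
echo "exit status: $status"
```

To reach the 2-second SIGKILL path for real, the process under test has to ignore SIGTERM rather than merely be stopped.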
- Validate the unit file with systemd-analyze verify foo_daemon.service; failed or escalated stops show up in journalctl -u foo_daemon
- Combine with OOMScoreAdjust= for memory pressure scenarios
- Consider RuntimeMaxSec= for periodic recycling of long-running processes
When dealing with misbehaving daemons that ignore SIGTERM, systemd's default stop behavior becomes problematic. The standard shutdown sequence sends SIGTERM, waits TimeoutStopSec (default 90s), then sends SIGKILL. For rapid PID-recycling environments, this long window risks killing wrong processes.
Here's the complete solution for /etc/systemd/system/foo_daemon.service:
[Unit]
Description=Foo Daemon with Forced Kill
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/foo_daemon
Restart=on-failure
KillMode=process
TimeoutStopSec=2s
RestartSec=1s
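# Critical configurations:
ExecStop=/bin/kill -TERM $MAINPID
SuccessExitStatus=143 TERM
- KillMode=process: Only the main process receives the stop signals; other processes in the unit's cgroup are left running
- TimeoutStopSec=2s: Sets a 2-second grace period, after which systemd itself escalates to SIGKILL (SendSIGKILL=yes is the default)
- ExecStop: Sends SIGTERM to the main PID explicitly; this duplicates the default KillSignal=SIGTERM but makes the intent visible
- Why no ExecStopPost= SIGKILL: ExecStopPost= runs only after the stop, including the SIGKILL escalation, has finished, so a ps -p $MAINPID check there is redundant and could hit a recycled PID
- SuccessExitStatus=143 TERM: Treats exit code 143 (128 + 15, as reported by shell wrappers) and termination by SIGTERM (already clean by default for this service type) as a clean exit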
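The exit code 143 matters mainly when foo_daemon is launched through a shell wrapper. A shell that waits on a child killed by signal N reports 128 + N, and SIGTERM is signal 15. In this sketch, the inline sh -c script is a hypothetical stand-in for such a wrapper and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical wrapper: start a child, let it die from SIGTERM, then exit
# with the child's wait status as an ordinary exit code.
wrapper_status=0
sh -c 'sleep 300 & child=$!; kill -TERM "$child"; wait "$child"; exit $?' \
  || wrapper_status=$?

# The wrapper was not killed by a signal; it exited normally with code
# 128 + 15. systemd counts a non-zero exit code as a failure unless it is
# listed in SuccessExitStatus=.
echo "wrapper exit status: $wrapper_status"
```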
Create a SIGTERM-ignoring test daemon to validate the behavior:
#!/bin/bash
# /usr/local/bin/sigterm_resistant_daemon
# Log and ignore SIGTERM so that stopping it must escalate to SIGKILL.
trap "echo 'Ignoring TERM'" TERM
while true; do sleep 1; done
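Make the script executable and point ExecStart= at it (for example ExecStart=/usr/local/bin/sigterm_resistant_daemon). Then apply the configuration and test:
sudo chmod +x /usr/local/bin/sigterm_resistant_daemon
sudo systemctl daemon-reload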
sudo systemctl start foo_daemon
sudo systemctl stop foo_daemon
journalctl -u foo_daemon -n 30
For extreme cases where even PID tracking might fail:
ExecStopPost=/usr/bin/pkill -9 -f /usr/bin/foo_daemon
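Note: This is more aggressive and might affect similar processes. pkill -f matches the pattern against the full command line of every process on the system, including other instances and unrelated processes whose arguments contain that path.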
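- Use systemd-analyze critical-chain foo_daemon.service to spot delays in the startup dependency chain; slow stops are visible in journalctl -u foo_daemon
- KillSignal=SIGTERM and FinalKillSignal=SIGKILL are already the defaults; set them explicitly for clarity on systemd v240+, where FinalKillSignal= is available
- Add WatchdogSec= for hung process detection; foo_daemon must then send periodic WATCHDOG=1 keep-alives via sd_notify()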
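For a shell-based daemon, the watchdog keep-alive could look roughly like the sketch below. It assumes the unit sets WatchdogSec= and also NotifyAccess=all, because systemd-notify runs as its own process, so its messages do not come from the main PID. watchdog_interval and watchdog_ping are hypothetical helper names. The half-interval cadence follows the recommendation in sd_watchdog_enabled(3).

```shell
#!/bin/sh
# Print half of the watchdog timeout in whole seconds (at least 1).
# systemd exports WATCHDOG_USEC in microseconds; print 0 if it is unset.
watchdog_interval() {
  local usec secs
  usec=${WATCHDOG_USEC:-0}
  if [ "$usec" -le 0 ]; then
    echo 0
    return 0
  fi
  secs=$((usec / 2000000))
  [ "$secs" -ge 1 ] || secs=1
  echo "$secs"
}

# Send one keep-alive; do nothing when not running under systemd.
watchdog_ping() {
  [ -n "${NOTIFY_SOCKET:-}" ] || return 0
  systemd-notify WATCHDOG=1
}

# Usage inside the daemon's main loop (not executed here):
#   interval=$(watchdog_interval)
#   while :; do
#     do_work    # hypothetical; must finish well within half of WatchdogSec=
#     watchdog_ping
#     [ "$interval" -gt 0 ] && sleep "$interval"
#   done
```

The ping is placed in the main loop rather than in a separate background loop on purpose. A background heartbeat would keep pinging even while the real work is hung, which defeats the watchdog.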