How to Configure systemd Service for Forced SIGKILL After Failed Graceful Shutdown



When developing foo_daemon, we encountered an issue where the process would occasionally ignore SIGTERM during service stop/restart operations. This typically happens when:

  • Custom signal handlers override default termination behavior
  • The process is stuck in uninterruptible sleep (D state), where it cannot act on any signal, not even SIGKILL, until the blocking operation completes
  • Race conditions prevent proper cleanup

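By default, systemd uses the following sequence for service termination:

1. Send SIGTERM (KillSignal=)
2. Wait up to TimeoutStopSec= (default: 90s, inherited from DefaultTimeoutStopSec=)
3. Send SIGKILL (FinalKillSignal=) to whatever is still running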
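To make that sequence concrete, the bash sketch below emulates it for a single PID outside systemd. It is an illustration under stated assumptions, not how systemd is implemented: it needs Linux /proc, the helper names is_alive and stop_with_timeout are hypothetical, and the 90-second default simply mirrors TimeoutStopSec=. Like systemd, it sends SIGCONT right after SIGTERM so a stopped process can act on the signal. Unlike systemd, which tracks a service's processes through its cgroup, it identifies the target by PID alone, so a small window remains in which that PID could be reused before SIGKILL is sent.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of systemd) that emulate the default stop
# sequence for one PID. Linux-only: is_alive reads /proc/PID/stat.

# Succeeds if PID exists and is not a zombie (exited but not yet reaped).
is_alive() {
  local stat state
  { read -r stat < "/proc/$1/stat"; } 2>/dev/null || return 1
  state=${stat##*") "}   # drop "PID (comm) " so the state letter comes first
  state=${state%% *}
  [[ $state != Z ]]
}

# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
# Prints "stopped" if PID exits within the timeout, "killed" if SIGKILL was needed.
stop_with_timeout() {
  local pid=$1 timeout=${2:-90} i
  # KillSignal= step; if the PID is already gone there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || { echo "stopped"; return 0; }
  # Resume a SIGSTOPped process so it can act on SIGTERM.
  kill -CONT "$pid" 2>/dev/null
  # TimeoutStopSec= step: poll every 0.1s.
  for (( i = 0; i < timeout * 10; i++ )); do
    is_alive "$pid" || { echo "stopped"; return 0; }
    sleep 0.1
  done
  is_alive "$pid" || { echo "stopped"; return 0; }
  # FinalKillSignal= step.
  kill -KILL "$pid" 2>/dev/null
  echo "killed"
}
```

For example, stop_with_timeout 1234 2 gives PID 1234 two seconds to exit on SIGTERM before it is killed.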

This isn't ideal for our use case because:

  • Waiting 90s on every stuck stop or restart is too long for our fast-paced environment
  • We need a short, bounded stop time, especially where PID recycling is a concern

Here's the complete service file solution with annotations:

[Unit]
Description=Foo Daemon Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/foo_daemon
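# Critical configuration parameters (systemd does not allow trailing comments):
# Grace period between SIGTERM and SIGKILL
TimeoutStopSec=2s
# First signal sent on stop (the default)
KillSignal=SIGTERM
# Signal sent once TimeoutStopSec= expires (the default; systemd 240+)
FinalKillSignal=SIGKILL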
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
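If the unit file ships with a package, the same stop settings can be layered on with a drop-in rather than by editing the unit. A minimal sketch, assuming the standard /etc/systemd/system/<unit>.d/ drop-in directory; the write_stop_dropin function, the stop-timeout.conf file name and the UNIT_DIR override (there only so the script can be tried against a scratch directory) are hypothetical. FinalKillSignal= needs systemd 240 or newer; SIGKILL is the default either way.

```bash
#!/usr/bin/env bash
# Write a drop-in that shortens the SIGTERM grace period for a unit.
# UNIT_DIR is a hypothetical override for testing; systemd reads /etc/systemd/system.
write_stop_dropin() {
  local unit=${1:-foo_daemon.service}
  local dir="${UNIT_DIR:-/etc/systemd/system}/${unit}.d"
  mkdir -p "$dir" || return 1
  # Comments stay on their own lines: systemd keeps trailing "# ..." as part of the value.
  cat > "$dir/stop-timeout.conf" <<'EOF'
[Service]
# Grace period between KillSignal= and FinalKillSignal=
TimeoutStopSec=2s
KillSignal=SIGTERM
FinalKillSignal=SIGKILL
EOF
  echo "$dir/stop-timeout.conf"
}

# Real use (as root), then reload so systemd picks up the drop-in:
#   write_stop_dropin foo_daemon.service && systemctl daemon-reload
```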

For environments with extreme PID churn, we can add cgroup-based protection:

[Service]
...
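# Kill only the main process; other processes in the cgroup keep running
#KillMode=process
# Stricter: SIGTERM to the main process, then SIGKILL to everything left in the
# cgroup (the default, KillMode=control-group, sends both signals to the whole cgroup)
KillMode=mixed
# Send FinalKillSignal= after TimeoutStopSec= (already the default)
SendSIGKILL=yes
# Do not send SIGHUP alongside SIGTERM (already the default)
SendSIGHUP=no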
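What separates the KillMode= values is which processes receive each signal. The dry-run sketch below (a hypothetical kill_plan helper that sends nothing) prints that plan for a main PID and the PIDs in the unit's cgroup, following the behaviour described in systemd.kill(5); KillMode=none is left out.

```bash
#!/usr/bin/env bash
# Dry run of KillMode= semantics: prints which PIDs get KillSignal= (TERM) and
# which get FinalKillSignal= (KILL). KILL only reaches PIDs still alive after
# TimeoutStopSec=.
# Usage: kill_plan MODE MAIN_PID CGROUP_PID...
kill_plan() {
  local mode=$1 main=$2
  shift 2
  local all="$*"   # every PID currently in the service's cgroup
  case $mode in
    control-group) echo "TERM: $all";  echo "KILL: $all" ;;
    mixed)         echo "TERM: $main"; echo "KILL: $all" ;;
    process)       echo "TERM: $main"; echo "KILL: $main" ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
}

# kill_plan mixed 100 100 101 102
# prints:
#   TERM: 100
#   KILL: 100 101 102
```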

To test the configuration:

# Start service
sudo systemctl start foo_daemon

# Verify running
sudo systemctl status foo_daemon

# Test graceful stop (with 2s timeout)
sudo systemctl stop foo_daemon

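# Unresponsive-process test. systemd sends SIGCONT right after SIGTERM, so a
# SIGSTOPped process is resumed and still gets SIGTERM; the SIGKILL path is
# only reached if foo_daemon itself ignores or handles SIGTERM.
sudo systemctl start foo_daemon
sudo kill -STOP "$(systemctl show -p MainPID --value foo_daemon)"
time sudo systemctl stop foo_daemon  # about 2s if SIGKILL was needed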

Use journalctl -u foo_daemon -f to follow the unit's lifecycle events as they happen.
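To confirm from the journal that escalation actually happened, a small filter can look for systemd's timeout and SIGKILL messages. This is a sketch: the matched phrases ("timed out. Killing" and "status=9/KILL") are assumptions based on the wording recent systemd versions log, so compare them with your own journalctl output before relying on it. check_escalation is a hypothetical name.

```bash
#!/usr/bin/env bash
# Reads journal text on stdin and reports whether systemd escalated to SIGKILL.
# Assumed message fragments: "timed out. Killing" (stop timeout hit) and
# "status=9/KILL" (main process died from SIGKILL).
check_escalation() {
  local line escalated=no
  while IFS= read -r line; do
    case $line in
      *"timed out. Killing"*|*"status=9/KILL"*) escalated=yes ;;
    esac
  done
  echo "sigkill_escalation=$escalated"
}

# Real use:
#   journalctl -u foo_daemon -n 50 --no-pager | check_escalation
```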

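  • Check the unit file with systemd-analyze verify foo_daemon.service, which reports settings systemd cannot parse (such as values followed by a trailing # comment); it validates configuration, not kills at runtime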
  • Combine with OOMScoreAdjust for memory pressure scenarios
  • Consider RuntimeMaxSec for periodic recycling of long-running processes

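When dealing with misbehaving daemons that ignore SIGTERM, systemd's default stop behavior becomes problematic. The standard shutdown sequence sends SIGTERM, waits TimeoutStopSec (default 90s), then sends SIGKILL. In environments with rapid PID recycling, this long window mainly endangers scripts and monitors that act on a saved PID, which may belong to a different process by the time they use it; systemd itself tracks the service's processes through its cgroup.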

Here's the complete solution for /etc/systemd/system/foo_daemon.service:

[Unit]
Description=Foo Daemon with Forced Kill
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/foo_daemon
Restart=on-failure
KillMode=process
TimeoutStopSec=2s
RestartSec=1s

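# Critical configurations:
# Note: this ExecStop= repeats what the default KillSignal=SIGTERM already does;
# systemd still escalates to SIGKILL once TimeoutStopSec= expires.
ExecStop=/bin/kill -TERM $MAINPID
ExecStopPost=/bin/sh -c "if ps -p $MAINPID > /dev/null; then /bin/kill -KILL $MAINPID; fi"
SuccessExitStatus=143 TERM

  • KillMode=process: Signals only the main process; other processes in the service's cgroup are left running
  • TimeoutStopSec=2s: Sets a 2-second grace period after SIGTERM before systemd escalates to SIGKILL
  • ExecStopPost conditional: Verifies the process still exists before sending SIGKILL; because ExecStopPost= runs only after systemd's own stop sequence (including its SIGKILL) has finished, this is a last-resort fallback
  • SuccessExitStatus: Treats exit code 143 (128 + 15, used by runtimes such as the JVM when exiting on SIGTERM) as a clean exit; termination by the TERM signal itself is already counted as clean by default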
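Any check that kills by a saved PID, like the ExecStopPost= line above, is exposed to PID reuse: by the time it runs, the number may belong to an unrelated process. One way to narrow that risk in a custom kill script is to confirm what the PID is running before signalling it. A Linux-only sketch; kill_if_still_ours is a hypothetical helper, reading /proc/PID/exe needs root or the same user, and a short race between the check and the kill remains (only pidfd-based signalling closes it completely).

```bash
#!/usr/bin/env bash
# Send SIGKILL to PID only if it is still running the expected executable.
# Linux-only: /proc/PID/exe is a symlink to the running binary.
# Usage: kill_if_still_ours PID EXPECTED_EXE   (e.g. /usr/bin/foo_daemon)
# Returns 0 if killed or already gone, 1 if the PID now runs something else.
kill_if_still_ours() {
  local pid=$1 expected=$2 actual
  [[ $pid =~ ^[0-9]+$ ]] || { echo "invalid pid"; return 2; }
  # readlink fails for exited (or zombie) processes: nothing left to kill.
  actual=$(readlink "/proc/$pid/exe" 2>/dev/null) || { echo "gone"; return 0; }
  if [[ $actual != "$expected" ]]; then
    echo "pid $pid now runs $actual; not killing"
    return 1
  fi
  kill -KILL "$pid" && echo "killed"
}
```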

To validate the behavior, create a daemon that ignores SIGTERM and point a test unit's ExecStart= at it:

#!/bin/bash
# /usr/local/bin/sigterm_resistant_daemon
trap "echo 'Ignoring TERM'" TERM
while true; do sleep 1; done
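Before pointing a unit at the script, it is worth checking outside systemd that it really survives SIGTERM. The sketch below runs the same trap loop inline (so the file does not have to be installed yet), sends SIGTERM, checks that the process is still alive a little later, and always cleans up with SIGKILL. term_is_ignored is a hypothetical name, and the liveness check assumes Linux /proc so that an exited but unreaped process is not counted as alive.

```bash
#!/usr/bin/env bash
# Start a copy of the SIGTERM-ignoring loop, send SIGTERM, and report whether
# it is still running afterwards. Always cleans up with SIGKILL.
term_is_ignored() {
  local pid stat survived=no
  bash -c 'trap "echo Ignoring TERM" TERM; while true; do sleep 1; done' >/dev/null &
  pid=$!
  sleep 0.5                      # give bash time to install the trap
  kill -TERM "$pid"
  sleep 1.5                      # longer than one loop iteration
  # Alive means /proc/PID/stat exists and the state is not Z (zombie).
  if { read -r stat < "/proc/$pid/stat"; } 2>/dev/null; then
    stat=${stat##*") "}
    [[ ${stat%% *} != Z ]] && survived=yes
  fi
  kill -KILL "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null        # reap the child so no zombie is left
  echo "survived_sigterm=$survived"
}
```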

Apply the configuration and test:

sudo chmod +x /usr/local/bin/sigterm_resistant_daemon
sudo systemctl daemon-reload
sudo systemctl start foo_daemon
time sudo systemctl stop foo_daemon   # about 2s when SIGKILL was needed
journalctl -u foo_daemon -n 30

For extreme cases where even PID tracking might fail:

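ExecStopPost=-/usr/bin/pkill -9 -f /usr/bin/foo_daemon

Note: This is more aggressive. pkill -f matches the pattern anywhere in a process's full command line, so it can also kill unrelated processes whose arguments mention /usr/bin/foo_daemon (an editor or a shell one-liner, for example). The leading - makes systemd ignore pkill's exit status 1 when nothing matched, which would otherwise mark the unit as failed.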
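A narrower alternative to pkill -f is to select only processes whose argv[0] is exactly the daemon's path. This is a Linux-only sketch with a hypothetical pids_by_argv0 helper; a process can set its own argv[0], so it is still a heuristic, and the cgroup-based KillMode= settings remain the more reliable option.

```bash
#!/usr/bin/env bash
# Print PIDs whose argv[0] is exactly WANT (unlike pkill -f, which matches a
# pattern anywhere in the command line). Linux-only: reads /proc/PID/cmdline.
# Usage: pids_by_argv0 /usr/bin/foo_daemon
pids_by_argv0() {
  local want=$1 dir argv0
  for dir in /proc/[0-9]*; do
    argv0=
    # cmdline is NUL-separated; read stops at the first NUL, giving argv[0].
    # Kernel threads have an empty cmdline and are skipped.
    { IFS= read -r -d '' argv0 < "$dir/cmdline"; } 2>/dev/null
    [[ -n $argv0 && $argv0 == "$want" ]] && echo "${dir#/proc/}"
  done
  return 0
}

# Example, as root:
#   pids_by_argv0 /usr/bin/foo_daemon | xargs -r kill -KILL
```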

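  • systemd-analyze critical-chain covers start-up ordering only; to spot slow stops, compare the timestamps of the stop and exit messages in journalctl -u foo_daemon
  • KillSignal=SIGTERM and FinalKillSignal=SIGKILL (the latter available since systemd v240) are already the defaults; set them explicitly to document intent or to choose different signals
  • Add WatchdogSec= for hung-process detection; this requires foo_daemon to send periodic WATCHDOG=1 notifications via sd_notify()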
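WatchdogSec= only detects hangs if the daemon sends periodic keep-alives. For a shell-based daemon, a sketch could read the WATCHDOG_USEC variable that systemd sets when the watchdog is enabled and ping at half that interval, as sd_watchdog_enabled(3) suggests. Assumptions: systemd-notify is installed, and the unit sets NotifyAccess=all because the pings come from a child process rather than the main PID; watchdog_interval, watchdog_ping and do_one_unit_of_work are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a shell daemon running under WatchdogSec=.
# systemd sets WATCHDOG_USEC (microseconds) when the watchdog is enabled.

# Print the ping interval in whole seconds (half the watchdog timeout, at
# least 1), or fail if no watchdog is configured.
watchdog_interval() {
  local usec=${WATCHDOG_USEC:-}
  [[ $usec =~ ^[0-9]+$ ]] || return 1
  local secs=$(( usec / 2000000 ))
  (( secs >= 1 )) || secs=1
  echo "$secs"
}

# Send one keep-alive; a no-op when not started by systemd.
watchdog_ping() {
  [[ -n ${NOTIFY_SOCKET:-} ]] || return 0
  systemd-notify WATCHDOG=1
}

# Sketch of the daemon's main loop:
#   interval=$(watchdog_interval) || interval=1
#   while true; do
#     do_one_unit_of_work
#     watchdog_ping
#     sleep "$interval"
#   done
```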