Best Practices for Keeping a Daemon Process Alive from init.d Scripts in Linux


A traditional init.d script starts and stops a service, but the script itself does nothing to restart a daemon that dies later. For a critical monitoring daemon such as dtnd, you need both a well-formed init script and a separate mechanism that brings the process back after a crash.

For systems using SysVinit, the standard method involves creating a proper init script that handles start, stop, and status operations. Here's a basic template for your dtnd service:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          dtnd
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start DTN daemon
# Description:       Daemon for monitoring and restarting processes
### END INIT INFO

PATH=/sbin:/usr/sbin:/bin:/usr/bin
NAME=dtnd
DAEMON=/usr/sbin/$NAME
DAEMON_ARGS=""
PIDFILE=/var/run/$NAME.pid
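USER=dtnduser

# Load the LSB helper functions; status_of_proc (used below) is defined here
. /lib/lsb/init-functions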

case "$1" in
  start)
    echo "Starting $NAME"
    start-stop-daemon --start --quiet --pidfile $PIDFILE \
        --chuid $USER --exec $DAEMON -- $DAEMON_ARGS
    ;;
  stop)
    echo "Stopping $NAME"
    start-stop-daemon --stop --quiet --pidfile $PIDFILE
    ;;
  restart)
    $0 stop
    sleep 1
    $0 start
    ;;
  status)
    status_of_proc -p $PIDFILE $DAEMON $NAME && exit 0 || exit $?
    ;;
  *)
    echo "Usage: /etc/init.d/$NAME {start|stop|restart|status}"
    exit 1
    ;;
esac

exit 0

For ensuring your daemon stays alive, you have several technical approaches:

1. Using a Wrapper Script

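Create a simple shell wrapper that restarts the daemon whenever it exits. This only works if dtnd stays in the foreground; a daemon that forks into the background returns immediately, and the loop would then launch a new copy every 10 seconds: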

#!/bin/sh
while true; do
    /usr/sbin/dtnd
    sleep 10
done

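Then point your init script at this wrapper instead of the binary, for example by passing the wrapper to start-stop-daemon with --background and --make-pidfile so that the pidfile tracks the wrapper. Note that stop must then terminate both the wrapper and its dtnd child.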

2. Leveraging inittab (if available)

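For systems with inittab, add an entry like the one below and reload it with telinit q. init then restarts dtnd whenever it exits, so dtnd must again run in the foreground: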

dtnd:2345:respawn:/usr/sbin/dtnd

3. Cron-based Monitoring

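Add an /etc/crontab entry (note the user field) that checks for the process and restarts the service if it is missing. The -x flag makes pgrep match the exact process name instead of any name containing "dtnd":

*/5 * * * * root pgrep -x dtnd >/dev/null || /etc/init.d/dtnd restart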

While the above solutions work, consider these more robust approaches if system modifications are possible:

  • systemd: Create a proper service unit with Restart=always
  • supervisord: Dedicated process control system
  • monit: Advanced monitoring and restart capability

When implementing process resurrection:

  • Ensure proper logging to detect restart loops
  • Implement maximum restart attempts to prevent thrashing
  • Set resource limits (for example with ulimit) so that a leaking daemon is killed and restarted instead of exhausting system memory
  • Add proper exit codes to distinguish between normal and error exits
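
The points above can be folded into the wrapper script. The following is a minimal sketch, not a drop-in implementation: the function name supervise, its argument order, and the log format are illustrative assumptions, and it assumes the supervised command runs in the foreground and exits with status 0 only on an intentional shutdown.

```shell
#!/bin/sh
# supervise MAX_FAILURES DELAY LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the foreground and restarts it after
# each non-zero exit, giving up after MAX_FAILURES failures.
supervise() {
    max_failures=$1
    delay=$2
    logfile=$3
    shift 3

    failures=0
    while :; do
        "$@"
        status=$?

        # Exit status 0 is treated as a deliberate shutdown: do not restart
        if [ "$status" -eq 0 ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $1 exited cleanly" >> "$logfile"
            return 0
        fi

        failures=$((failures + 1))
        echo "$(date '+%Y-%m-%d %H:%M:%S') $1 exited with status $status (failure $failures of $max_failures)" >> "$logfile"

        # Cap the number of failures so a broken daemon cannot thrash forever
        if [ "$failures" -ge "$max_failures" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') giving up on $1" >> "$logfile"
            return 1
        fi
        sleep "$delay"
    done
}
```

A wrapper built on this sketch would end with a call such as supervise 5 10 /var/log/dtnd.log /usr/sbin/dtnd. The counter here never resets, so it caps total failures over the wrapper's lifetime; resetting it after a long healthy run is a reasonable variation.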

For your specific constraints (init.d only), the wrapper script approach combined with proper init.d integration represents the most robust solution. The cron-based monitoring provides additional redundancy. Remember to:

  1. Test failure scenarios thoroughly
  2. Implement proper logging
  3. Document the resurrection mechanism

When managing persistent daemons through init.d scripts, the fundamental requirement is ensuring continuous operation even after unexpected crashes. The traditional approach using start-stop-daemon alone doesn't provide resurrection capabilities.

For systems limited to System-V init scripts, we can implement watchdog functionality directly in the init script. Here's a robust template:


#!/bin/sh
### BEGIN INIT INFO
# Provides:          dtnd-watchdog
# Required-Start:    $local_fs $network
# Required-Stop:     $local_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: DTN daemon with auto-respawn
### END INIT INFO

DAEMON=/usr/sbin/dtnd
NAME=dtnd
USER=dtnuser
PIDFILE=/var/run/$NAME.pid
LOGFILE=/var/log/$NAME.log
MAX_RETRIES=5
RETRY_DELAY=5

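WATCHDOG_PIDFILE=/var/run/$NAME-watchdog.pid

# True only if the PID recorded in $PIDFILE belongs to a live process
is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Launch the daemon in the background and record its PID
launch() {
    start-stop-daemon --start --background --make-pidfile --pidfile "$PIDFILE" \
        --chuid "$USER" --exec "$DAEMON" >> "$LOGFILE" 2>&1
}

start() {
    if is_running; then
        echo "$NAME is already running"
        return 1
    fi

    echo "Starting $NAME watchdog"
    launch

    # Monitor loop: gives up after MAX_RETRIES consecutive failed checks
    (
        retries=0
        while [ "$retries" -lt "$MAX_RETRIES" ]; do
            if ! is_running; then
                retries=$((retries + 1))   # POSIX arithmetic; ((retries++)) is a bashism
                echo "$(date) Process crashed, restart attempt $retries" >> "$LOGFILE"
                launch
                sleep "$RETRY_DELAY"
            else
                retries=0
                sleep 10
            fi
        done
        echo "$(date) Max retries reached for $NAME" >> "$LOGFILE"
        rm -f "$WATCHDOG_PIDFILE"
    ) >/dev/null 2>&1 &
    echo $! > "$WATCHDOG_PIDFILE"
}

stop() {
    echo "Stopping $NAME and watchdog"
    # Kill the watchdog first; otherwise it would respawn the daemon
    if [ -f "$WATCHDOG_PIDFILE" ]; then
        kill "$(cat "$WATCHDOG_PIDFILE")" 2>/dev/null
        rm -f "$WATCHDOG_PIDFILE"
    fi
    start-stop-daemon --stop --retry 5 --pidfile "$PIDFILE"
    rm -f "$PIDFILE"
}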

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

For systems with minimal process supervision requirements, we can leverage cron as a fallback mechanism:


# In /etc/crontab
# Check for a live process rather than the pidfile, which survives a crash
* * * * * root pgrep -x dtnd >/dev/null 2>&1 || /etc/init.d/dtnd start >/dev/null 2>&1
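
A pidfile can outlive the process that wrote it, so a check that only tests whether the file exists can miss a crash. As a rough sketch, a cron-driven check script could verify that the recorded PID is still alive. The function name pidfile_alive is an illustrative assumption, and the check assumes it runs as root, because kill -0 also fails for live processes the caller may not signal.

```shell
#!/bin/sh
# pidfile_alive PIDFILE
# Hypothetical helper: succeeds only if PIDFILE is readable, contains a
# numeric PID, and that PID belongs to a running process.
pidfile_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1

    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac

    # Signal 0 performs the existence check without delivering a signal
    kill -0 "$pid" 2>/dev/null
}
```

The cron entry would then call a small script that runs pidfile_alive /var/run/dtnd.pid || /etc/init.d/dtnd start. This still cannot detect a recycled PID that now belongs to an unrelated process.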

When implementing resurrection logic:

  • Implement maximum restart attempts to prevent thrashing
  • Include exponential backoff between restarts
  • Log all restart events with timestamps
  • Consider resource limits (memory, CPU) that might cause crashes
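
The backoff item can be sketched as a small helper that doubles the delay on each consecutive failure up to a ceiling. The function name backoff_delay and the cap parameter are assumptions made for illustration; they do not appear in the script above.

```shell
#!/bin/sh
# backoff_delay ATTEMPT BASE CAP
# Hypothetical helper: prints BASE * 2^(ATTEMPT - 1) seconds, limited to CAP.
backoff_delay() {
    attempt=$1
    base=$2
    cap=$3

    delay=$base
    i=1
    # Double once per attempt after the first, stopping early at the cap
    while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done

    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
}
```

In a monitor loop, the fixed delay could be replaced with sleep "$(backoff_delay "$retries" "$RETRY_DELAY" 300)", where the 300-second cap is an arbitrary example value.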

If systemd is available, you can keep the init.d script for compatibility and define a native unit that handles the restarts:


# /etc/systemd/system/dtnd.service
[Unit]
Description=DTN Daemon with Auto-Restart

[Service]
User=dtnuser
ExecStart=/usr/sbin/dtnd
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

The init.d script can then delegate to systemd when present:


if type systemctl >/dev/null 2>&1; then
    exec systemctl "$1" dtnd
else
    # Original init.d logic goes here; ":" keeps the otherwise empty branch valid
    :
fi