How to Auto-Restart Linux Services on Crash with Socket Cleanup (CentOS/RHEL Focus)


When developing network services on CentOS/RHEL systems, two critical pain points emerge during crash recovery:

  • Service restart automation
  • TCP socket cleanup (TIME_WAIT, FIN_WAIT states)

For CentOS 7+ and RHEL, systemd provides native crash recovery mechanisms. Create/edit your service unit file:

[Unit]
Description=My Network Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/my_service
Restart=on-failure
RestartSec=5s
LimitNOFILE=4096

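# Critical for socket cleanup: kill every process in the service's control group,
# so no leftover child keeps the listening socket open (KillMode=process would
# signal only the main process)
KillMode=control-group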

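To close lingering sockets more aggressively, use ss -K. It needs a kernel built with CONFIG_INET_DIAG_DESTROY and silently skips sockets it cannot close, so it may do nothing on older kernels. Filter on the service's local port (sport):

# Add to your service unit
[Service]
...
ExecStopPost=/bin/sh -c "ss -K sport = :PORT_NUMBER"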

For non-systemd systems or additional control, Supervisor (supervisord) offers equivalent settings:

[program:my_service]
command=/usr/bin/my_service
autostart=true
autorestart=true
startsecs=5
startretries=3
killasgroup=true
stopasgroup=true

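Add these to /etc/sysctl.conf to recycle sockets faster. Linux keeps TIME_WAIT fixed at 60 seconds, so these settings work around that period rather than shorten it:

# Reuse TIME_WAIT sockets for new outgoing connections (no effect on a listening port)
net.ipv4.tcp_tw_reuse = 1
# Seconds an orphaned socket may stay in FIN_WAIT_2
net.ipv4.tcp_fin_timeout = 15
# Upper limit on the number of TIME_WAIT sockets kept at once
net.ipv4.tcp_max_tw_buckets = 20000

Apply the changes as root with sysctl -p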
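
To confirm that the values actually took effect, the live settings can be read back from /proc. The following is a minimal Python sketch, assuming the standard /proc/sys/net/ipv4 layout; the function name read_tcp_sysctls and its base parameter are illustrative, not part of any existing tool.

```python
from pathlib import Path

# Keys taken from the sysctl.conf lines above.
TCP_KEYS = ("tcp_tw_reuse", "tcp_fin_timeout", "tcp_max_tw_buckets")


def read_tcp_sysctls(base="/proc/sys/net/ipv4", keys=TCP_KEYS):
    """Return {key: int value} for every key that exists under base."""
    values = {}
    for key in keys:
        path = Path(base) / key
        if path.is_file():
            # Each file holds a single integer followed by a newline.
            values[key] = int(path.read_text().split()[0])
    return values
```

Calling read_tcp_sysctls() on the target host and printing the result should show the same numbers as sysctl.conf once sysctl -p has run. Keys that are missing on a given kernel are simply left out of the result.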

For a Python web service with socket cleanup:

#!/bin/bash
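# Pre-start cleanup: close sockets still bound to local port 8000
# (needs kernel support for ss -K; failures are ignored)
ss -K sport = :8000 || true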
# Start service
exec /usr/bin/python3 /opt/app/server.py
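
On the Python side, the server itself can avoid "Address already in use" after a restart by setting SO_REUSEADDR before it binds. The sketch below shows one way server.py might create its listener; the function name make_listener and the default port 8000 are assumptions for illustration, since the original script does not show the server's code.

```python
import socket


def make_listener(host="0.0.0.0", port=8000, backlog=128):
    """Create a listening TCP socket that can rebind right after a crash."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(): allows binding while old connections
    # on this port are still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

With this in place the wrapper script's ss -K step becomes a fallback rather than a requirement, because TIME_WAIT leftovers no longer stop the new process from binding.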

Verify your auto-restart works:

# Force-kill your service
kill -9 $(pgrep my_service)

# Check status and logs
journalctl -u my_service --since "1 minute ago"

Common issues to check:

  • Port conflicts: Use ss -tulnp to identify the process still holding the port
  • Resource limits: Check /var/log/messages for "too many open files" errors and raise LimitNOFILE if needed
  • Startup timing: Increase RestartSec if the service depends on the network being ready
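
For the port-conflict case, a quick probe can tell whether a listener could bind the port right now. This is a minimal Python sketch, assuming Linux bind semantics; port_is_free is a hypothetical helper name. It sets SO_REUSEADDR on the probe so that TIME_WAIT leftovers are ignored and only a live listener counts as a conflict.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP listener could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # Ignore TIME_WAIT leftovers; a socket in LISTEN still blocks bind().
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

If this returns False after the service has crashed, some process is still listening on the port, and ss -tulnp will show which one.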

When developing network services on Linux (especially CentOS/RHEL systems), two persistent issues emerge:

  1. Service crashes don't automatically recover
  2. Sockets left in TIME_WAIT/FIN_WAIT states can make bind() fail with "Address already in use" when the service restarts

For CentOS 7+, systemd provides native crash recovery. Create or modify your service unit file:

[Unit]
Description=My Network Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/my_service
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

Key parameters:

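  • Restart=always: Restarts the service after any exit, clean or not (an explicit systemctl stop is still honored)
  • RestartSec=5: Waits 5 seconds before restarting (the systemd default is 100ms)
  • StartLimitInterval=0: Disables restart rate limiting (on newer systemd releases the setting is named StartLimitIntervalSec= and belongs in [Unit])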
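
A unit file can be sanity-checked for these keys before it is installed. The following Python sketch uses the standard configparser module to read the [Service] section; check_restart_policy and its warning texts are illustrative assumptions, and the check covers only the Restart and RestartSec keys discussed above, not the full systemd syntax.

```python
import configparser

# Values accepted by systemd's Restart= setting.
VALID_RESTART = {"no", "on-success", "on-failure", "on-abnormal",
                 "on-watchdog", "on-abort", "always"}


def check_restart_policy(unit_text):
    """Return a list of human-readable warnings for a unit file's text."""
    parser = configparser.ConfigParser(
        interpolation=None,   # unit files may contain % specifiers
        strict=False,         # keys such as ExecStart= may repeat
        delimiters=("=",),
    )
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)

    if not parser.has_section("Service"):
        return ["no [Service] section"]
    service = parser["Service"]
    warnings = []
    restart = service.get("Restart", "no")
    if restart not in VALID_RESTART:
        warnings.append(f"unknown Restart value: {restart}")
    elif restart == "no":
        warnings.append("Restart is not set, so a crash will not be recovered")
    if restart != "no" and "RestartSec" not in service:
        warnings.append("RestartSec is not set (systemd defaults to 100ms)")
    return warnings
```

Passing the text of the unit file shown earlier should yield an empty list, while a unit without Restart= produces a warning that crashes will not be recovered.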

To handle lingering sockets, implement these strategies:

1. Kernel Parameter Tuning

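# Shorten how long orphaned sockets stay in FIN_WAIT_2
# (this does not change the fixed 60-second TIME_WAIT period)
echo 5 > /proc/sys/net/ipv4/tcp_fin_timeout

# Allow TIME_WAIT sockets to be reused for new outgoing connections
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse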

2. Application-Level Solutions

In your service code, set these socket options:

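// C example: set the options after socket() and before bind()
#include <sys/socket.h>

int enable = 1;
// Lets bind() succeed while old connections on this port are in TIME_WAIT
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable));
// Optional (Linux 3.9+): lets several sockets bind the same port
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &enable, sizeof(enable));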

3. Graceful Shutdown Handling

Implement signal handlers to properly close connections:

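#include <signal.h>

static volatile sig_atomic_t stop_requested = 0;

void handle_sigterm(int sig) {
    (void)sig;
    // Only set a flag here: exit() is not async-signal-safe
    stop_requested = 1;
}

int main(void) {
    signal(SIGTERM, handle_sigterm);
    while (!stop_requested) {
        // ... rest of service code (use timeouts so blocking calls re-check the flag)
    }
    // Close all active connections from normal context, then exit cleanly
    close_all_sockets();
    return 0;
}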
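
The same pattern carries over to a Python service: the signal handler only records the request, and the main loop closes the listener itself. This is a minimal sketch under that assumption; stop_event, request_stop, serve and install_handlers are illustrative names, and the "ok" reply stands in for real request handling.

```python
import signal
import socket
import threading

stop_event = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only record that shutdown was requested."""
    stop_event.set()


def install_handlers():
    """Route SIGTERM (sent by systemctl stop) and SIGINT to request_stop."""
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def serve(listener, poll_seconds=0.2):
    """Accept connections until stop_event is set, then close the listener."""
    listener.settimeout(poll_seconds)
    handled = 0
    try:
        while not stop_event.is_set():
            try:
                conn, _addr = listener.accept()
            except socket.timeout:
                continue  # no client yet; re-check the stop flag
            with conn:
                conn.sendall(b"ok\n")
                handled += 1
    finally:
        # Runs on normal shutdown and on unexpected errors alike.
        listener.close()
    return handled
```

A service would call install_handlers() once at startup and then serve(listener). The accept timeout is what lets the loop notice the stop request promptly instead of blocking until the next client connects.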

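For non-systemd systems or additional monitoring, Supervisor (supervisord) can do the same job:

[program:my_service]
command=/usr/local/bin/my_service
autostart=true
autorestart=true
startsecs=5
; startretries is the number of failed start attempts allowed before
; Supervisor gives up and marks the program FATAL, so 0 would allow none
startretries=3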

To test your configuration:

# Force kill the service
kill -9 $(pidof my_service)

# Check status
systemctl status my_service

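# Confirm the listener is back (listening sockets only)
ss -tulnp | grep my_service

# Look for leftover TIME_WAIT sockets (these have no owning process)
ss -tan state time-wait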
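
Where ss is not available, the same socket states can be read from /proc/net/tcp, which lists each socket's local address and a hexadecimal state code. The Python sketch below counts states for one local port; count_states is a hypothetical helper, and it assumes the IPv4 table (the IPv6 table in /proc/net/tcp6 uses the same column layout).

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, local_port):
    """Count sockets per TCP state for one local port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with an index such as "0:"; the header does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return dict(counts)
```

For example, count_states(open("/proc/net/tcp").read(), 8000) run shortly after a crash shows how many connections on port 8000 are still in TIME_WAIT or FIN_WAIT2.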

Remember that automatic restarts shouldn't replace proper error handling - implement both for robust services.