When developing network services on CentOS/RHEL systems, two critical pain points emerge during crash recovery:
- Service restart automation
- TCP socket cleanup (TIME_WAIT, FIN_WAIT states)
For CentOS 7+ and RHEL, systemd provides native crash recovery mechanisms. Create/edit your service unit file:
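[Unit]
Description=My Network Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/my_service
Restart=on-failure
RestartSec=5s
LimitNOFILE=4096
# Critical for socket cleanup: kill every process in the service's cgroup,
# so no leftover child keeps the listening socket open.
# (KillMode=process would stop only the main process.)
KillMode=control-group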
To handle lingering sockets more aggressively:
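# Add to your service unit
[Service]
...
# ss -K needs a kernel built with CONFIG_INET_DIAG_DESTROY.
# src matches the service's own (local) port; dst would match the remote side.
ExecStopPost=/bin/sh -c "ss -K src :PORT_NUMBER"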
For non-systemd systems or additional control, a Supervisor program section does the same job:
[program:my_service]
command=/usr/bin/my_service
autostart=true
autorestart=true
startsecs=5
startretries=3
killasgroup=true
stopasgroup=true
Add these to /etc/sysctl.conf for faster socket recycling:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_max_tw_buckets = 20000
Apply with sysctl -p
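To confirm that the values took effect, you can read them back from /proc/sys, where the kernel exposes each setting as a file. The following is a minimal Python sketch under that assumption. The names EXPECTED, read_sysctl and find_mismatches are illustrative and not part of any library.

```python
from pathlib import Path

# Values from the sysctl.conf lines above; adjust to match your own file.
EXPECTED = {
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_fin_timeout": "15",
    "net.ipv4.tcp_max_tw_buckets": "20000",
}


def read_sysctl(name, root="/proc/sys"):
    """Return the live value of a sysctl key such as 'net.ipv4.tcp_fin_timeout'."""
    # Dots in the key map to directory separators under /proc/sys.
    return Path(root, *name.split(".")).read_text().strip()


def find_mismatches(expected, root="/proc/sys"):
    """Return {key: (wanted, actual)} for every key whose live value differs."""
    mismatches = {}
    for key, wanted in expected.items():
        actual = read_sysctl(key, root)
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches
```

Calling find_mismatches(EXPECTED) on the host should return an empty dict once sysctl -p has been applied. The root parameter exists only so the helper can be pointed at a test directory.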
For a Python web service with socket cleanup:
#!/bin/bash
# Pre-start cleanup (src matches the local port the server listens on)
ss -K src :8000 || true
# Start service
exec /usr/bin/python3 /opt/app/server.py
Verify your auto-restart works:
# Force-kill your service
kill -9 $(pgrep my_service)
# Check status and logs
journalctl -u my_service --since "1 minute ago"
- Port conflicts: Use ss -tulnp to identify lingering processes
- Resource limits: Check /var/log/messages for "too many open files" errors
- Startup timing: Adjust RestartSec for network dependencies
When developing network services on Linux (especially CentOS/RHEL systems), two persistent issues emerge:
- Crashed services are not restarted automatically
- Abandoned sockets in TIME_WAIT/FIN_WAIT states block port reuse
For CentOS 7+, systemd provides native crash recovery. Create or modify your service unit file:
[Unit]
Description=My Network Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/my_service
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=multi-user.target
Key parameters:
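- Restart=always: Restarts the service after any exit, clean or not, unless it was stopped with systemctl stop
- RestartSec=5: Waits 5 seconds before restarting
- StartLimitInterval=0: Disables restart rate limiting, so a crash loop is retried indefinitely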
To handle lingering sockets, implement these strategies:
1. Kernel Parameter Tuning
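# Shorten how long orphaned connections stay in FIN_WAIT_2
# (the TIME_WAIT duration itself is fixed at 60 seconds in the Linux kernel)
echo 5 > /proc/sys/net/ipv4/tcp_fin_timeout
# Allow TIME_WAIT sockets to be reused for new outgoing connections
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse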
2. Application-Level Solutions
In your service code, set these socket options:
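// C example: set both options after socket() and before bind()
#include <sys/socket.h>

int enable = 1;
// Allow bind() to succeed while old connections on the port are in TIME_WAIT
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable));
// Allow several sockets to bind the same port (Linux 3.9+)
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &enable, sizeof(enable));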
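For a Python service the same idea looks roughly like this. It is a minimal sketch, not a full server: make_listener is a hypothetical helper name, and the default host and backlog are assumptions. SO_REUSEPORT is left out here because it also lets a second process bind the same port, which is not always wanted.

```python
import socket


def make_listener(port, host="127.0.0.1", backlog=128):
    """Create a listening TCP socket that can rebind while old connections sit in TIME_WAIT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set before bind(); it has no effect on an already-bound socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        # Do not leak the descriptor if bind() or listen() fails.
        sock.close()
        raise
    return sock
```

Passing port 0 asks the kernel for any free port, which is handy when trying the helper out locally.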
3. Graceful Shutdown Handling
Implement signal handlers to properly close connections:
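#include <signal.h>
#include <unistd.h>

void handle_sigterm(int sig) {
    (void)sig;
    // Close all active connections. This helper must restrict itself to
    // async-signal-safe calls such as close().
    close_all_sockets();
    // _exit() is async-signal-safe; exit() is not.
    _exit(0);
}

int main(void) {
    signal(SIGTERM, handle_sigterm);
    // ... rest of service code
}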
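In Python the handler can simply record the request and let the main loop do the cleanup. The sketch below assumes a single-threaded accept loop; stop_event, handle_sigterm, install_handler and serve are illustrative names, and the one-second timeout is an arbitrary choice.

```python
import signal
import socket
import threading

# Set by the signal handler, polled by the accept loop.
stop_event = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the accept loop performs the cleanup."""
    stop_event.set()


def install_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(listener, connections):
    """Accept until a shutdown is requested, then close every socket we own."""
    listener.settimeout(1.0)  # wake up regularly to check the flag
    try:
        while not stop_event.is_set():
            try:
                conn, _addr = listener.accept()
            except socket.timeout:
                continue
            connections.append(conn)
            # ... hand conn over to the rest of the service
    finally:
        for conn in connections:
            conn.close()
        listener.close()
```

systemd sends SIGTERM on systemctl stop by default, so a service built this way closes its sockets before the process exits. A kill -9 cannot be caught, which is why the restart and kernel settings are still needed.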
For non-systemd systems or additional monitoring, use a Supervisor program section:
[program:my_service]
command=/usr/local/bin/my_service
autostart=true
autorestart=true
startsecs=5
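; startretries=0 gives up after the first failed start (an exit within startsecs); raise it to retry
startretries=0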
To test your configuration:
# Force kill the service
kill -9 $(pidof my_service)
# Check status
systemctl status my_service
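# Confirm the service is listening again
# (-l shows listening sockets only; use ss -tan to see TIME-WAIT and FIN-WAIT entries)
ss -tulnp | grep my_service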
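To watch how many sockets linger on the service port after a forced kill, the output of ss -tan can be summarised per state. This is a rough sketch that assumes the usual column order (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port); count_states and live_states are hypothetical helper names.

```python
import subprocess
from collections import Counter


def count_states(ss_output, port):
    """Count TCP sockets per state whose local port equals `port`.

    Expects the text printed by `ss -tan`.
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        # rpartition copes with IPv6 addresses such as [::1]:8000.
        # The header line has no colon in this column and is skipped.
        _addr, sep, local_port = fields[3].rpartition(":")
        if sep and local_port == str(port):
            counts[fields[0]] += 1
    return counts


def live_states(port):
    """Run `ss -tan` on this host and summarise the sockets on `port`."""
    result = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    )
    return count_states(result.stdout, port)
```

Running live_states with the service port before and after the kill shows whether TIME-WAIT or FIN-WAIT-2 entries pile up on that port.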
Remember that automatic restarts are no substitute for proper error handling; robust services need both.