Automated Nginx Process Monitoring & Auto-Restart Implementation on AWS EC2



When running Nginx in production environments, service availability is critical. Unlike development setups where you might manually restart services, production deployments require automated solutions to maintain uptime. On AWS EC2 instances, unexpected failures or instance reboots can take your web server offline unless proper monitoring is in place.

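On Linux distributions that use systemd (Ubuntu 16.04+, CentOS 7+), the service manager can handle recovery itself. One option is a small one-shot watchdog unit that restarts Nginx only when it is not active. The unit does nothing until something starts it, such as a systemd timer or a manual systemctl start:

[Unit]
Description=Nginx Service Watchdog
After=nginx.service

[Service]
Type=oneshot
# Restart Nginx only if it is not currently active
ExecStart=/bin/sh -c '/bin/systemctl is-active --quiet nginx || /bin/systemctl restart nginx'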

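For additional monitoring, create a bash script that checks Nginx status. This version relies on systemctl; on older init systems, substitute the equivalent service nginx status and service nginx restart commands: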

#!/bin/bash
if ! systemctl is-active --quiet nginx; then
    systemctl restart nginx
    echo "Nginx restarted at $(date)" | mail -s "Nginx Restart Alert" admin@example.com
fi
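
The script above sends its alert whether or not the restart actually worked. A minimal sketch of a variant that re-checks the service after restarting it follows. The function name check_and_restart, the return codes, and the settle delay are assumptions made for illustration, not part of the original script:

```shell
#!/bin/bash
# Sketch: restart a service only if it is down, then verify the result.
# Return codes (hypothetical convention):
#   0  = service was already running
#   10 = service was down and is running again after the restart
#   20 = restart was attempted but the service is still down
check_and_restart() {
    local service="$1"
    local settle="${2:-2}"   # seconds to wait before re-checking (assumed default)

    if systemctl is-active --quiet "$service"; then
        return 0
    fi

    # A failed restart is detected by the re-check below, so ignore the exit code here
    systemctl restart "$service" || true
    sleep "$settle"

    if systemctl is-active --quiet "$service"; then
        return 10
    fi
    return 20
}
```

A caller could branch on the result, for example sending the usual "restarted" mail on 10 and a more urgent message on 20.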

Set it to run every 5 minutes in crontab:

*/5 * * * * /usr/local/bin/nginx_monitor.sh

For EC2 instances, consider these additional measures:

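  • Install the monitoring script from User Data so every new instance has it from first boot
  • Configure CloudWatch alarms on a process metric (for example, one published by the CloudWatch agent's procstat plugin)
  • Set up Auto Scaling health checks so an unhealthy instance is replaced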
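
The User Data item can be sketched as a small generator that prints the boot script. This is only an illustration: the function name render_user_data and the cron.d file name are hypothetical, and it assumes the monitoring script is already present on the image at /usr/local/bin/nginx_monitor.sh:

```shell
#!/bin/bash
# Sketch: print a User Data script that enables Nginx at boot and schedules
# the monitor. The cron.d file name is an assumption.
render_user_data() {
    cat <<'EOF'
#!/bin/bash
# Executed as root by cloud-init on first boot
systemctl enable nginx
systemctl start nginx

# Schedule the monitoring script (assumed to be baked into the image already)
echo '*/5 * * * * root /usr/local/bin/nginx_monitor.sh' > /etc/cron.d/nginx-monitor
chmod 644 /etc/cron.d/nginx-monitor
EOF
}
```

The output can be saved with render_user_data > user-data.sh and passed at launch, for instance through the --user-data file://user-data.sh option of aws ec2 run-instances.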

For enterprise-grade monitoring, install Monit and add a check like the following to its configuration:

check process nginx with pidfile /run/nginx.pid
    start program = "/bin/systemctl start nginx"
    stop program = "/bin/systemctl stop nginx"
    if failed host 127.0.0.1 port 80 protocol http then restart
    if 3 restarts within 5 cycles then unmonitor
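
The last Monit rule stops the restart loop when Nginx keeps failing. For the cron-based script, a rough equivalent can be sketched by recording each run's outcome in a small state file. The file layout and both function names are hypothetical:

```shell
#!/bin/bash
# Sketch: approximate "3 restarts within 5 cycles" for a cron-driven monitor.
# The state file holds one line per cycle: 1 if a restart happened, 0 if not.

# record_cycle STATE_FILE 0|1
# Append this cycle's outcome and keep only the five most recent cycles.
record_cycle() {
    local state_file="$1" outcome="$2"
    printf '%s\n' "$outcome" >> "$state_file"
    tail -n 5 "$state_file" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
}

# restart_budget_exhausted STATE_FILE
# Succeeds when three or more of the recorded cycles ended in a restart.
restart_budget_exhausted() {
    local state_file="$1" count
    [ -f "$state_file" ] || return 1
    count=$(grep -c '^1$' "$state_file" || true)
    [ "$count" -ge 3 ]
}
```

The monitor would call restart_budget_exhausted before restarting and, when it succeeds, skip the restart and raise an alert instead.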

Enhance your solution with proper logging and alerts:

# Add to your monitoring script
LOGFILE="/var/log/nginx/autorestart.log"
echo "[$(date)] Nginx was down - Restart attempted" >> "$LOGFILE"

# Publish an alert through AWS SNS (account ID and topic name are placeholders)
aws sns publish --topic-arn arn:aws:sns:us-east-1:123456789012:nginx-alerts \
    --message "Nginx restarted on $(hostname)" \
    --subject "Nginx Service Intervention"

When running production web services on AWS EC2, ensuring continuous uptime for Nginx is crucial. Manual monitoring and intervention aren't scalable or reliable enough for mission-critical applications. We need a robust solution that can:

  • Detect Nginx failures immediately
  • Automatically restart the service
  • Optionally notify administrators
  • Handle edge cases like resource constraints

On systemd-based systems (most modern distributions), the simplest option is to let systemd restart Nginx itself by adjusting the service unit:

# Edit the Nginx service unit file
sudo systemctl edit --full nginx.service

Add these critical configurations to the [Service] section:

[Service]
Restart=always
RestartSec=5s
StartLimitInterval=0

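This configuration tells systemd to:

  • Restart Nginx whenever it exits, cleanly or with an error, unless it was stopped through systemctl (Restart=always)
  • Wait 5 seconds between restart attempts (RestartSec)
  • Disable rate limiting for restarts (StartLimitInterval=0); newer systemd releases name this setting StartLimitIntervalSec and expect it in the [Unit] section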
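
Editing with --full copies the whole unit into /etc/systemd/system, where it shadows the packaged file. A drop-in override that carries only the restart settings is a common alternative. The sketch below prints such a drop-in with the same values as above; the function name render_nginx_override is hypothetical:

```shell
#!/bin/bash
# Sketch: print a drop-in override containing only the restart policy.
# The optional first argument sets RestartSec (default: 5s, as in the text).
render_nginx_override() {
    local restart_sec="${1:-5s}"
    cat <<EOF
[Service]
Restart=always
RestartSec=${restart_sec}
StartLimitInterval=0
EOF
}
```

To install it, create /etc/systemd/system/nginx.service.d with sudo mkdir -p, pipe the output through sudo tee into override.conf in that directory, then run sudo systemctl daemon-reload followed by sudo systemctl restart nginx.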

For additional monitoring and email alerts, create a bash script at /usr/local/bin/nginx_monitor.sh:

#!/bin/bash

SERVICE="nginx"
EMAIL="admin@example.com"
# is-active prints the unit state, e.g. active, inactive, or failed
STATUS=$(systemctl is-active "$SERVICE")

if [ "$STATUS" != "active" ]; then
    systemctl restart "$SERVICE"
    echo "Nginx was restarted on $(hostname) at $(date)" | mail -s "Nginx Restart Alert" "$EMAIL"
fi

Make it executable and add it to root's crontab, since restarting the service requires root privileges:

sudo chmod +x /usr/local/bin/nginx_monitor.sh
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/nginx_monitor.sh") | sudo crontab -
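
Appending to the crontab with echo adds a duplicate line every time the command is rerun. A minimal sketch of an idempotent version follows; the helper name ensure_cron_entry is an assumption for illustration:

```shell
#!/bin/bash
# Sketch: add a cron line only if it is not already present.
# Reads the current crontab on stdin and prints the updated one on stdout.
ensure_cron_entry() {
    local entry="$1" current
    current=$(cat)

    # Echo back the existing entries unchanged
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi

    # Append the entry only when no existing line matches it exactly
    if ! printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}
```

It would sit between the two crontab calls, reading from crontab -l 2>/dev/null and writing to crontab -, so running the setup twice leaves a single entry.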

For EC2 instances, consider adding these AWS-focused improvements:

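#!/bin/bash

# Request an IMDSv2 session token; instances that enforce IMDSv2 reject plain GET requests
IMDS="http://169.254.169.254/latest"
TOKEN=$(curl -s -X PUT "$IMDS/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "$IMDS/meta-data/instance-id")
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "$IMDS/meta-data/placement/region")

# Check the Nginx port; restart and publish an alert if nothing is listening
# (the account ID and topic name in the ARN are placeholders)
nc -z localhost 80 || {
    systemctl restart nginx
    aws sns publish \
        --topic-arn "arn:aws:sns:$REGION:123456789012:nginx-alerts" \
        --message "Nginx restarted on $INSTANCE_ID" \
        --region "$REGION"
}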
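
If the SNS call fails (missing IAM permissions, no network), the monitor should still finish its work. One way to isolate the alert step is a small wrapper like the sketch below. The function name publish_restart_alert and the timestamp format are assumptions; the subject line is the one used earlier in this article:

```shell
#!/bin/bash
# Sketch: publish a restart alert and report, rather than propagate, failures.
# Arguments: topic ARN, region, instance ID.
publish_restart_alert() {
    local topic_arn="$1" region="$2" instance_id="$3"
    local stamp
    stamp=$(date -u '+%Y-%m-%dT%H:%M:%SZ')

    if ! aws sns publish \
            --topic-arn "$topic_arn" \
            --region "$region" \
            --subject "Nginx Service Intervention" \
            --message "Nginx restarted on $instance_id at $stamp" \
            > /dev/null; then
        echo "SNS publish failed for $topic_arn" >&2
        return 1
    fi
}
```

Inside the monitor it could be called as publish_restart_alert "$TOPIC_ARN" "$REGION" "$INSTANCE_ID" || true, where TOPIC_ARN is a variable you define, so a failed notification never blocks the restart logic.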

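For more sophisticated control, install Supervisor. Only one supervisor should manage the process, so if you use it instead of the systemd approach, stop and disable the systemd-managed Nginx service first. The captured stdout and stderr go to their own files so they do not mix with Nginx's own logs:

sudo systemctl stop nginx
sudo systemctl disable nginx
sudo apt-get install -y supervisor
echo "[program:nginx]
command=/usr/sbin/nginx -g 'daemon off;'
autostart=true
autorestart=true
startretries=5
stderr_logfile=/var/log/nginx/supervisor-stderr.log
stdout_logfile=/var/log/nginx/supervisor-stdout.log" | sudo tee /etc/supervisor/conf.d/nginx.conf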

Then start and enable Supervisor:

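sudo systemctl enable supervisor
sudo systemctl start supervisor
# Pick up the new program definition without restarting supervisord itself
sudo supervisorctl reread
sudo supervisorctl update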

Implement proper logging for all restart attempts:

# Add to your monitoring script
LOGFILE="/var/log/nginx/restart.log"
echo "$(date) - Nginx restarted (Status was: $STATUS)" >> "$LOGFILE"
# www-data:adm matches Debian/Ubuntu defaults; adjust the owner on other distributions
chown www-data:adm "$LOGFILE"
chmod 640 "$LOGFILE"
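
If several code paths in the monitor need to log, the append step can be pulled into a helper. This is a minimal sketch: the function name log_restart and the timestamp format are assumptions, and the default path is the one used above:

```shell
#!/bin/bash
# Sketch: append one timestamped line per restart attempt.
# LOGFILE may be overridden from the environment; the default matches the
# path used in the snippet above.
log_restart() {
    local status="$1"
    local logfile="${LOGFILE:-/var/log/nginx/restart.log}"
    printf '%s - Nginx restarted (Status was: %s)\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$logfile"
}
```

The monitor would call log_restart "$STATUS" right after the restart. Ownership and mode can then be set once when the log file is created rather than on every run.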