Optimal URL Monitoring with Monit: Solving Nginx/PHP-FPM Crash Recovery


When running nginx with PHP-FPM, service degradation often manifests as 502 Bad Gateway errors while the processes themselves linger on in a zombie-like state. Traditional process monitoring falls short because:

  • Processes may appear running while being non-functional
  • Multiple interdependent services (nginx + PHP-FPM) require coordinated recovery
  • Different failure thresholds demand graduated responses

Initial attempts used process monitoring with layered URL checks:

check process webserver with pidfile /var/run/nginx.pid
   if failed url https://example.com/healthcheck then alert
   if failed url https://example.com/healthcheck for 2 cycles then restart
   if failed url https://example.com/healthcheck for 4 cycles then exec "/sbin/reboot"

This had three key weaknesses:

  1. Multiple HTTP requests per monitoring cycle
  2. Process-oriented when we really care about service availability
  3. Reboot conditions could trigger infinite loops

The refined solution shifts to host monitoring with state tracking:

check host webserver with address 127.0.0.1
  if failed
    port 443 protocol https
    request "/healthcheck"
    content = "healthy"
    with timeout 20 seconds
    for 2 cycles
  then restart
  if 2 restarts within 5 cycles then exec "/usr/local/bin/escalate-recovery"
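
For the restart above to actually do anything on a check host entry, Monit needs start and stop programs declared for it (the fuller configuration later in this post does exactly that). A minimal sketch, assuming systemd-managed php-fpm and nginx units:

  # paths and unit names are assumptions -- adjust for your distro
  start program = "/usr/bin/systemctl start php-fpm nginx"
  stop program  = "/usr/bin/systemctl stop nginx php-fpm"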

Key lessons from production deployments:

  • Healthcheck Endpoint: Create a dedicated URL that tests both nginx routing and PHP execution:
    <?php
    // healthcheck.php
    header('Cache-Control: no-cache');
    die('healthy');
  • Recovery Scripts: Coordinate service restarts properly:
    #!/bin/bash
    # /usr/local/bin/webserver-recover
    # Stop both services, force-kill any processes that ignored the stop,
    # then start them back up together.
    systemctl stop php-fpm nginx
    pkill -9 php-fpm
    pkill -9 nginx
    systemctl start php-fpm nginx
  • State Management: Avoid Monit replaying stale state and queued events after a recovery by pointing its id and state files at a location that is cleared on reboot:
    # In monitrc
    set idfile /tmp/monit.id
    set statefile /tmp/monit.state
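
How long "2 cycles" or "5 cycles" actually takes depends on Monit's poll interval, which is also set in monitrc; a typical setting (the 30-second interval here is only an example) would be:

# In monitrc
set daemon 30   # poll every 30 seconds, so "for 2 cycles" means roughly a minute of sustained failure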

For critical systems, implement a multi-tier approach:

# Basic process monitoring
check process nginx with pidfile /var/run/nginx.pid

# Service-level monitoring
check host webserver with address 127.0.0.1
  if failed port 443 then alert
  if failed port 443 protocol https request "/healthcheck" then restart

# Synthetic transaction monitoring
check program api-test with path /usr/local/bin/api-smoketest
  if status != 0 for 2 cycles then alert

Common pitfalls to avoid:

  • Over-aggressive rebooting: Use escalating responses (alert → restart → failover → reboot)
  • Single monitoring point: Monitor both localhost and external DNS endpoints
  • No post-mortem hooks: Always log state before recovery actions (see the sketch below)
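
As a post-mortem hook, a state snapshot can be wired in via exec before any restart. The script below is only a sketch; the path, log directory and unit names are assumptions:

#!/bin/bash
# /usr/local/bin/pre-recovery-snapshot (hypothetical path)
# Capture service state before Monit tears anything down.
LOG_DIR=/var/log/monit-snapshots
mkdir -p "$LOG_DIR"
SNAP="$LOG_DIR/$(date +%Y%m%d-%H%M%S).log"
{
    date
    systemctl status nginx php-fpm --no-pager
    ss -tlnp                                    # which sockets are still listening?
    journalctl -u nginx -u php-fpm -n 100 --no-pager
} > "$SNAP" 2>&1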

When running nginx with PHP-FPM, service failures often manifest as 502 Bad Gateway errors while the processes themselves remain running. Traditional process monitoring alone isn't sufficient because:

  • PHP-FPM might hang while still showing as running
  • Nginx might fail to proxy requests properly
  • Ports might be open while service is unresponsive

Here's an optimized configuration that combines host checking with URL monitoring:

CHECK HOST webserver WITH ADDRESS 127.0.0.1
  START PROGRAM = "/etc/monit/webserver.start.sh"
  STOP PROGRAM = "/etc/monit/webserver.stop.sh"
  
  IF FAILED PING THEN ALERT
  IF FAILED PORT 443 PROTOCOL HTTPS THEN ALERT
  
  IF FAILED
    PORT 443 PROTOCOL HTTPS
    REQUEST "/healthcheck"
    HTTP HEADERS [Host: www.mydomain.com, Connection: close]
    CONTENT = "OK"
    WITH TIMEOUT 15 SECONDS
  FOR 2 CYCLES THEN RESTART
  
  IF 3 RESTARTS WITHIN 10 CYCLES THEN EXEC "/usr/local/bin/escalate-alert.sh"
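
The two wrapper scripts referenced by START PROGRAM and STOP PROGRAM are not shown here; a minimal sketch, assuming systemd-managed php-fpm and nginx units, could be as simple as:

#!/bin/bash
# /etc/monit/webserver.start.sh
# Start PHP-FPM first so its socket exists before nginx comes up.
systemctl start php-fpm
systemctl start nginx

#!/bin/bash
# /etc/monit/webserver.stop.sh
# Stop nginx first so no new requests reach PHP-FPM while it shuts down.
systemctl stop nginx
systemctl stop php-fpm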

The healthcheck endpoint should be a simple PHP script that:

  1. Returns quickly (no database queries)
  2. Verifies PHP execution
  3. Includes basic system checks

Example healthcheck.php:

<?php
header('Content-Type: text/plain');
try {
    // Verify PHP can execute
    if (!function_exists('version_compare')) {
        throw new Exception('PHP core functions missing');
    }
    
    // Simple file system check
    if (!is_writable('/tmp')) {
        throw new Exception('Temp directory not writable');
    }
    
    echo "OK";
} catch (Exception $e) {
    header('HTTP/1.1 503 Service Unavailable');
    echo "ERROR: " . $e->getMessage();
}
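
The Monit tests above request "/healthcheck" while the script is healthcheck.php, so nginx must map that path to the script. A minimal sketch of the location block (socket path and document root are assumptions for illustration):

# Inside the server {} block for www.mydomain.com
location = /healthcheck {
    access_log off;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /var/www/html/healthcheck.php;
    fastcgi_pass unix:/run/php/php-fpm.sock;
}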

To prevent the reboot-loop problem described earlier, implement these safeguards:

#!/bin/bash
# /usr/local/bin/escalate-alert.sh

# Only reboot if previous reboot was >30 minutes ago
if [ -f /var/run/last_reboot ] && \
   [ $(($(date +%s) - $(date -r /var/run/last_reboot +%s))) -lt 1800 ]; then
    echo "Recent reboot detected - not rebooting again" | mail -s "Server Alert" admin@example.com
    exit 0
fi

touch /var/run/last_reboot
/sbin/reboot

For more complex scenarios, consider:

  • Adding secondary monitoring with check program scripts
  • Implementing socket connection tests
  • Using Monit's depends directive for service relationships
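
For instance, a unixsocket test on PHP-FPM combined with a depends on relationship (pid file and socket paths are assumptions) might look like:

check process php-fpm with pidfile /run/php/php-fpm.pid
  start program = "/usr/bin/systemctl start php-fpm"
  stop program  = "/usr/bin/systemctl stop php-fpm"
  if failed unixsocket /run/php/php-fpm.sock then restart

check process nginx with pidfile /var/run/nginx.pid
  start program = "/usr/bin/systemctl start nginx"
  stop program  = "/usr/bin/systemctl stop nginx"
  depends on php-fpm
  if failed port 443 protocol https then restart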

Example program check:

CHECK PROGRAM php-fpm-health WITH PATH "/usr/local/bin/check_php_fpm.sh"
  IF STATUS != 0 FOR 2 CYCLES THEN ALERT

Where check_php_fpm.sh might query PHP-FPM's FastCGI ping endpoint with the cgi-fcgi tool, for example:

#!/bin/bash
# Query PHP-FPM's FastCGI ping endpoint via cgi-fcgi (from the libfcgi
# package). Adjust the socket path to match your pool's "listen" setting.
FPM_SOCKET="/run/php/php-fpm.sock"
if ! SCRIPT_NAME=/ping SCRIPT_FILENAME=/ping REQUEST_METHOD=GET \
     timeout 2 cgi-fcgi -bind -connect "$FPM_SOCKET" 2>/dev/null | grep -qi pong; then
    exit 1
fi
exit 0
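
This assumes the pool exposes PHP-FPM's built-in FastCGI ping endpoint, enabled in the pool configuration (the file path below is just an example):

; e.g. /etc/php/8.2/fpm/pool.d/www.conf
ping.path = /ping
ping.response = pong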