When running nginx with PHP-FPM, service degradation often manifests as 502 Bad Gateway errors while the processes themselves keep running. Traditional process monitoring falls short because:
- Processes may appear running while being non-functional
- Multiple interdependent services (nginx + PHP-FPM) require coordinated recovery
- Different failure thresholds demand graduated responses
Initial attempts used process monitoring with layered URL checks:
check process webserver with pidfile /var/run/nginx.pid
if failed url https://example.com/healthcheck then alert
if failed url https://example.com/healthcheck for 2 cycles then restart
if failed url https://example.com/healthcheck for 4 cycles then exec "/sbin/reboot"
This had three key weaknesses:
- Multiple HTTP requests per monitoring cycle
- Process-oriented when we really care about service availability
- A reboot rule could trigger a reboot loop when the failure persists after the machine comes back up
The refined solution shifts to host monitoring with state tracking:
check host webserver with address 127.0.0.1
if failed port 443 protocol https
with timeout 20 seconds
and request "/healthcheck"
content = "healthy"
for 2 cycles then restart
if 2 restarts within 5 cycles then exec "/usr/local/bin/escalate-recovery"
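The last rule hands control to /usr/local/bin/escalate-recovery, whose contents are not shown. One possible shape for it, sketched here as an assumption rather than the original script, is an escalation ladder: each call appends a timestamp to a state file on persistent storage (the path /var/tmp/escalation.log and the action names are hypothetical), and the number of escalations within the last hour picks the next step from alert, restart, failover, reboot.

```bash
#!/bin/bash
# Hypothetical escalation ladder for /usr/local/bin/escalate-recovery.
# Every call records "now" in STATE_FILE; the number of earlier calls that
# fall inside WINDOW seconds selects how drastic the next action is.

STATE_FILE="${STATE_FILE:-/var/tmp/escalation.log}"  # hypothetical path; must survive reboots
WINDOW="${WINDOW:-3600}"                             # seconds of history that count

# Least to most drastic; these names are illustrative labels, not Monit actions.
ACTIONS=(alert restart failover reboot)

# Print the next action and record this escalation.
# $1 (optional): current time in epoch seconds, defaults to now.
next_action() {
    local now="${1:-$(date +%s)}" ts
    local kept=()
    if [[ -f "$STATE_FILE" ]]; then
        # Keep only escalations inside the window; older ones are dropped.
        while read -r ts; do
            [[ -n "$ts" ]] && (( now - ts < WINDOW )) && kept+=("$ts")
        done < "$STATE_FILE"
    fi
    local recent=${#kept[@]}
    printf '%s\n' "${kept[@]}" "$now" > "$STATE_FILE"
    # Clamp at the last entry so repeated failures stay at "reboot".
    local last=$(( ${#ACTIONS[@]} - 1 ))
    local idx=$(( recent < last ? recent : last ))
    echo "${ACTIONS[$idx]}"
}
```

A caller would run action=$(next_action) and dispatch on it with a case statement, for example invoking the recovery script for restart and a cooldown-protected reboot (such as the escalate-alert.sh guard) for reboot. Pruning old entries on every call keeps the state file small.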
Key lessons from production deployments:
- Healthcheck Endpoint: Create a dedicated URL that tests both nginx routing and PHP execution:
<?php
// healthcheck.php
header('Cache-Control: no-cache');
die('healthy');
- Recovery Scripts: Coordinate service restarts properly:
#!/bin/bash
# /usr/local/bin/webserver-recover
# Stop both services, kill anything that survived, then start them again.
systemctl stop php-fpm nginx
pkill -9 php-fpm
pkill -9 nginx
systemctl start php-fpm nginx
- State Management: Avoid Monit's queued events issue by:
# In monitrc
set idfile /tmp/monit.id
set statefile /tmp/monit.state
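None of the lessons above confirm that a restart actually brought the site back. A minimal sketch of a confirmation step a recovery script could run afterwards retries a probe until it succeeds or a deadline passes. It assumes curl is installed and that the healthcheck is reachable at http://127.0.0.1/healthcheck; HEALTH_URL and that default are assumptions, not part of the original setup.

```bash
#!/bin/bash
# Retry a probe command until it succeeds or TIMEOUT seconds have elapsed.
# Usage: wait_until_healthy TIMEOUT INTERVAL command [args...]
wait_until_healthy() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        # Give up once the deadline has passed; the caller can escalate.
        if (( $(date +%s) >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Probe for the healthcheck.php endpoint: succeeds only when the body is
# exactly "healthy". HEALTH_URL and its default value are assumptions.
probe_healthcheck() {
    local body
    body=$(curl -fsS --max-time 5 "${HEALTH_URL:-http://127.0.0.1/healthcheck}") || return 1
    [[ "$body" == "healthy" ]]
}
```

In webserver-recover this would follow the final systemctl start, e.g. wait_until_healthy 30 2 probe_healthcheck, so that a recovery that did not work returns a non-zero status instead of silently succeeding.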
For critical systems, implement a multi-tier approach:
# Basic process monitoring
check process nginx with pidfile /var/run/nginx.pid
# Service-level monitoring
check host webserver with address 127.0.0.1
if failed port 443 then alert
if failed port 443 protocol https request "/healthcheck" then restart
# Synthetic transaction monitoring
check program api-test with path /usr/local/bin/api-smoketest
if status != 0 for 2 cycles then alert
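The multi-tier config references /usr/local/bin/api-smoketest without showing it. Because check program only looks at the exit status, a smoke test can be any script that exercises a real request path and exits non-zero on failure. The sketch below assumes curl and an endpoint that returns a known marker string; the function names are hypothetical, and the URL and marker in the usage note are placeholders.

```bash
#!/bin/bash
# Sketch of a hypothetical /usr/local/bin/api-smoketest for "check program".
# Monit treats exit status 0 as success; anything else counts as a failure.

# Decide whether a response passes: HTTP 200 and the body contains MARKER.
check_response() {
    local status="$1" body="$2" marker="$3"
    [[ "$status" == "200" ]] || return 1
    [[ "$body" == *"$marker"* ]] || return 1
    return 0
}

# Fetch URL once and validate it. curl's -w option appends the status code
# on its own final line so body and status can be split apart.
smoke_test() {
    local url="$1" marker="$2" out status body
    out=$(curl -sS --max-time 10 -w '\n%{http_code}' "$url") || return 2
    status="${out##*$'\n'}"   # text after the last newline: the status code
    body="${out%$'\n'*}"      # everything before it: the response body
    check_response "$status" "$body" "$marker"
}
```

A script at that path could simply end with smoke_test "https://example.com/api/status" '"status":"ok"', so its exit status is the function's result and the rule above alerts after two consecutive failures. Exit code 2 distinguishes transport errors from bad responses in Monit's logs.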
Common pitfalls:
- Over-aggressive rebooting: Use escalating responses (alert → restart → failover → reboot)
- Single monitoring point: Monitor both through localhost and through the public DNS name
- No post-mortem hooks: Always log state before recovery actions
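For the last point, a post-mortem hook can be as small as a function the recovery script calls before it kills anything. This sketch assumes Linux (it reads /proc) and uses an illustrative snapshot directory; it records load, memory and the nginx and PHP-FPM process list, then prints the path of the file it wrote.

```bash
#!/bin/bash
# Write a timestamped snapshot of system state before a recovery action.
# Linux-specific (/proc); the default directory is an illustrative choice.
snapshot_state() {
    local dir="${1:-/var/log/recovery-snapshots}" file
    mkdir -p "$dir" || return 1
    file="$dir/snapshot-$(date +%Y%m%d-%H%M%S)-$$.txt"
    {
        echo "== time (UTC): $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "== loadavg"
        cat /proc/loadavg 2>/dev/null
        echo "== memory"
        grep -E '^(MemTotal|MemAvailable|SwapFree):' /proc/meminfo 2>/dev/null
        echo "== nginx / php-fpm processes"
        # || true: finding no matching processes is not an error for a snapshot.
        ps -eo pid,stat,etime,rss,args 2>/dev/null | grep -E 'nginx|php-fpm' | grep -v grep || true
    } > "$file"
    echo "$file"
}
```

Calling snapshot_state as the first step of webserver-recover preserves this information even though the pkill -9 that follows destroys the hung processes.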
When running nginx with PHP-FPM, service failures often manifest as 502 Bad Gateway errors while the processes themselves remain running. Traditional process monitoring alone isn't sufficient because:
- PHP-FPM might hang while still showing as running
- Nginx might fail to proxy requests properly
- Ports might be open while the service is unresponsive
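These failure modes look different from the outside, which is why a bare port check is not enough. A sketch of how a single probe can tell them apart, assuming curl (exit status 7 means it could not connect, 28 means the operation timed out) and nginx's usual responses when PHP-FPM fails: 502 when the upstream is unreachable or drops the connection, 504 when it times out. The default URL is an assumption.

```bash
#!/bin/bash
# Map one HTTP probe result to the failure modes listed above.
# Arguments: curl's exit status and the HTTP status code it reported.
classify_probe() {
    local curl_rc="$1" http_code="$2"
    case "$curl_rc" in
        7)  echo "port-closed" ;;       # nothing accepted the connection
        28) echo "unresponsive" ;;      # timed out, e.g. a hung listener
        0)
            case "$http_code" in
                200)     echo "healthy" ;;
                502|504) echo "upstream-failed" ;;  # nginx answered, PHP-FPM did not
                *)       echo "http-$http_code" ;;
            esac
            ;;
        *)  echo "curl-error-$curl_rc" ;;
    esac
}

# Run one probe and classify it. The default URL is an assumption.
# No -f flag: curl must exit 0 on 502/504 so the status code can be inspected.
probe_and_classify() {
    local url="${1:-http://127.0.0.1/healthcheck}" code rc
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    rc=$?
    classify_probe "$rc" "$code"
}
```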
Here's an optimized configuration that combines host checking with URL monitoring:
CHECK HOST webserver WITH ADDRESS 127.0.0.1
START PROGRAM = "/etc/monit/webserver.start.sh"
STOP PROGRAM = "/etc/monit/webserver.stop.sh"
IF NOT EXIST THEN ALERT
IF FAILED PORT 443 PROTOCOL HTTPS THEN ALERT
IF FAILED
    PORT 443 PROTOCOL HTTPS
    REQUEST "/healthcheck"
    CONTENT = "OK"
    HTTP HEADERS [Host: www.mydomain.com, Connection: close]
    TIMEOUT 15 SECONDS
    FOR 2 CYCLES
THEN RESTART
IF 3 RESTARTS WITHIN 10 CYCLES THEN EXEC "/usr/local/bin/escalate-alert.sh"
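The configuration refers to /etc/monit/webserver.start.sh and /etc/monit/webserver.stop.sh without showing them. A sketch of the logic both could share, assuming systemd unit names php-fpm and nginx (distributions differ; Debian and Ubuntu use versioned PHP-FPM units), starts the backend before the front end and stops them in reverse order. The webserver_ctl and run names are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```bash
#!/bin/bash
# Shared logic for hypothetical webserver.start.sh / webserver.stop.sh.
# Unit names are assumptions; override PHP_UNIT / WEB_UNIT as needed.
PHP_UNIT="${PHP_UNIT:-php-fpm}"
WEB_UNIT="${WEB_UNIT:-nginx}"

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

webserver_ctl() {
    case "$1" in
        # Backend first, so nginx never proxies to a missing upstream.
        start) run systemctl start "$PHP_UNIT" && run systemctl start "$WEB_UNIT" ;;
        # Front end first, so new requests stop before PHP-FPM goes away.
        stop)  run systemctl stop "$WEB_UNIT"; run systemctl stop "$PHP_UNIT" ;;
        *)     echo "usage: webserver_ctl start|stop" >&2; return 2 ;;
    esac
}
```

Each wrapper file would then end with a single call, webserver_ctl start or webserver_ctl stop, keeping the ordering rules in one place.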
The healthcheck endpoint should be a simple PHP script that:
- Returns quickly (no database queries)
- Verifies PHP execution
- Includes basic system checks
Example healthcheck.php:
<?php
header('Content-Type: text/plain');
try {
    // Verify PHP can execute
    if (!function_exists('version_compare')) {
        throw new Exception('PHP core functions missing');
    }
    // Simple file system check
    if (!is_writable('/tmp')) {
        throw new Exception('Temp directory not writable');
    }
    echo "OK";
} catch (Exception $e) {
    http_response_code(503);
    echo "ERROR: " . $e->getMessage();
}
To prevent the reboot loop mentioned in the logs, implement these safeguards:
#!/bin/bash
# /usr/local/bin/escalate-alert.sh
# Only reboot if the previous reboot was more than 30 minutes ago.
# The stamp must live on persistent storage: /var/run is a tmpfs on most
# modern distributions and is emptied at boot, which would defeat the check.
STAMP=/var/tmp/last_reboot
if [ -f "$STAMP" ] && \
   [ $(( $(date +%s) - $(date -r "$STAMP" +%s) )) -lt 1800 ]; then
    echo "Recent reboot detected - not rebooting again" | mail -s "Server Alert" admin@example.com
    exit 0
fi
touch "$STAMP"
/sbin/reboot
For more complex scenarios, consider:
- Adding secondary monitoring with "check program" scripts
- Implementing socket connection tests
- Using Monit's "depends" directive for service relationships
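For socket connection tests, bash can open TCP connections itself through its /dev/tcp redirection, so no extra tools are needed when PHP-FPM listens on a TCP port (127.0.0.1:9000 is a common choice, but an assumption here). A minimal sketch; the port_open name is hypothetical, and Unix-domain sockets are not covered by /dev/tcp.

```bash
#!/bin/bash
# TCP connect test using bash's /dev/tcp redirection. /dev/tcp is handled by
# bash itself (it is not a real device) and needs bash built with network
# support. Returns 0 if HOST:PORT accepts a connection within SECS seconds.
port_open() {
    local host="$1" port="$2" secs="${3:-2}"
    # A child bash performs the connect so that timeout can cap how long it hangs.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A check program wrapper could run port_open 127.0.0.1 9000 and let its exit status drive an IF STATUS != 0 rule.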
Example program check:
CHECK PROGRAM php-fpm-health WITH PATH "/usr/local/bin/check_php_fpm.sh"
IF STATUS != 0 FOR 2 CYCLES THEN ALERT
Where check_php_fpm.sh might contain:
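#!/bin/bash
# Check PHP-FPM responsiveness through its FastCGI ping endpoint.
# Assumes ping.path = /ping is enabled in the pool config (the default
# ping.response is "pong"), that cgi-fcgi from the libfcgi tools is installed,
# and that the pool listens on the socket below; adjust both for your system.
SOCKET=/run/php/php-fpm.sock
if ! SCRIPT_NAME=/ping SCRIPT_FILENAME=/ping REQUEST_METHOD=GET \
     timeout 2 cgi-fcgi -bind -connect "$SOCKET" 2>/dev/null | grep -q pong; then
    exit 1
fi
exit 0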