Optimizing PHP-FPM and Nginx Configuration to Handle High Traffic and Prevent Connection Exhaustion

When running a high-traffic website with Nginx and PHP-FPM, you might encounter warnings like:

[02-Jun-2012 01:52:04] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers)
[02-Jun-2012 01:39:19] WARNING: [pool www] server reached pm.max_children setting (150), consider raising it

These warnings mean every PHP-FPM child is busy. New requests then wait in the pool's listen queue, and if they wait longer than Nginx's fastcgi_read_timeout, legitimate users get 504 Gateway Timeout errors.

From your setup, I see:

pm.max_children = 150
pm.start_servers = 75
pm.min_spare_servers = 20
pm.max_spare_servers = 150

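Raising pm.max_children only helps until memory runs out, because every PHP-FPM child is a separate process with its own memory footprint. Your server has the following (older versions of free count page cache as "used", so real headroom may be somewhat larger):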

Mem:  6114284 kB total (5.8GB), 5726984 kB used (5.5GB)
Swap: 524284 kB total (512MB), 5804 kB used (5.7MB)

1. PHP-FPM Process Management

Instead of arbitrarily increasing children, calculate based on available memory:

; PHP-FPM cannot evaluate a formula, so calculate pm.max_children yourself:
; pm.max_children = (Total RAM - (OS + other services)) / Average PHP process size
pm = dynamic

; Example for a 6GB server running only PHP-FPM:
pm.max_children = 100  ; Assuming ~50MB per process
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 30
pm.process_idle_timeout = 10s  ; only used when pm = ondemand; ignored with dynamic
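The formula is easy to apply by hand, but a tiny helper makes its inputs explicit. This is a minimal sketch: the function name `max_children` and the ~970MB reserved for the OS and Nginx are illustrative assumptions. The per-process size should come from your own measurements, not from the example figure.

```python
def max_children(total_ram_mb: int, reserved_mb: int, avg_process_mb: int) -> int:
    """Largest pm.max_children that fits in the RAM left over for PHP-FPM.

    reserved_mb is memory set aside for the OS, Nginx, databases and other
    services; avg_process_mb is the measured size of one PHP-FPM child.
    """
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        return 0  # nothing left for PHP-FPM at all
    # Round down: a partial child would still need a whole process.
    return available_mb // avg_process_mb


# ~5971 MB total (6114284 kB / 1024); the 971 MB reserve is an assumption.
print(max_children(5971, 971, 50))  # 100
```

Rounding down matters here. Overshooting by even one child can push the server into swap once every child is busy.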

2. Nginx Connection Optimization

Your current Nginx config has:

worker_connections 19000;
worker_rlimit_nofile 20000;

Consider these adjustments:

worker_processes auto;  # Match CPU cores
events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}
worker_rlimit_nofile 8192;

http {
    keepalive_timeout 30;
    keepalive_requests 100;
    ...
}
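Nginx's worker_connections counts every connection a worker holds, including the upstream connection to PHP-FPM, so each proxied client uses two. The sketch below turns that into rough capacity figures. The helper name is hypothetical, and the "rlimit at least twice worker_connections" figure is a common guideline, not a hard rule.

```python
def nginx_capacity(worker_processes: int, worker_connections: int) -> dict:
    """Rule-of-thumb connection figures for Nginx proxying to PHP-FPM."""
    total = worker_processes * worker_connections
    return {
        # Upper bound on simultaneous connections across all workers.
        "max_connections": total,
        # Each proxied client also holds an upstream connection to PHP-FPM.
        "max_proxied_clients": total // 2,
        # Guideline: headroom for upstream sockets, log and temp files.
        "suggested_rlimit_nofile": 2 * worker_connections,
    }


# Assumption: `worker_processes auto` resolves to 4 on a 4-core machine.
print(nginx_capacity(4, 4096))
```

Even these reduced limits allow thousands of clients, far more than the 100 requests a 100-child pool can execute at once. That suggests the exhaustion has to be fixed on the PHP-FPM side.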

3. PHP Execution Timeouts

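Align PHP and Nginx timeouts so that PHP gives up before Nginx does. Otherwise Nginx returns a 504 while the PHP worker keeps running and stays unavailable: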

; php.ini
max_execution_time = 30
max_input_time = 60

# Nginx location block
fastcgi_read_timeout 60s;
fastcgi_send_timeout 60s;
fastcgi_connect_timeout 30s;
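On Linux, max_execution_time does not count time spent outside the script itself, such as database queries or stream operations. PHP-FPM's request_terminate_timeout is therefore the wall-clock backstop. The sketch below checks one reasonable ordering: PHP first, then PHP-FPM, then Nginx. The function name is hypothetical and the ordering is a guideline, not a requirement.

```python
from typing import List


def check_timeouts(max_execution_time: int, request_terminate_timeout: int,
                   fastcgi_read_timeout: int) -> List[str]:
    """Warn when the PHP -> PHP-FPM -> Nginx timeout chain is out of order (seconds).

    Assumed ordering (a guideline): max_execution_time <= request_terminate_timeout
    < fastcgi_read_timeout, so Nginx never abandons a request PHP is still running.
    """
    warnings = []
    if request_terminate_timeout == 0:
        warnings.append("request_terminate_timeout is 0, so hung workers are never killed")
    elif request_terminate_timeout < max_execution_time:
        warnings.append("request_terminate_timeout fires before max_execution_time")
    php_limit = max(max_execution_time, request_terminate_timeout)
    if fastcgi_read_timeout <= php_limit:
        warnings.append("fastcgi_read_timeout should be longer than the PHP-side limits")
    return warnings


print(check_timeouts(30, 30, 60))  # []
```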

4. Implementing Process Recycling

Add to your PHP-FPM pool config:

pm.max_requests = 500            ; restart each child after it has served 500 requests
request_terminate_timeout = 30s  ; hard-kill a worker whose request runs longer than 30 seconds
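Recycling costs one fork per restart. Across the pool, one restart happens on average every pm.max_requests requests, so the overhead is easy to estimate. This is a minimal sketch: the helper name and the 200 requests per second are hypothetical figures, not measurements from this server.

```python
def recycle_interval(requests_per_second: float, max_requests: int) -> float:
    """Average seconds between pool-wide worker restarts caused by pm.max_requests."""
    if max_requests <= 0 or requests_per_second <= 0:
        # pm.max_requests = 0 disables recycling (and no traffic means no restarts).
        return float("inf")
    # On average the pool restarts one child every max_requests requests.
    return max_requests / requests_per_second


# Hypothetical load of 200 requests per second with pm.max_requests = 500.
print(recycle_interval(200, 500))  # 2.5
```

One fork every few seconds is negligible next to the benefit of clearing slow memory leaks in long-lived workers.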

5. OPcache Configuration

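OPcache keeps compiled script bytecode in shared memory, so PHP does not recompile every file on each request. This shortens execution time and frees workers sooner: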

[opcache]
opcache.enable=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=4000
opcache.revalidate_freq=60   ; check for changed files at most every 60 seconds
opcache.fast_shutdown=1      ; PHP 5.5 to 7.1 only; removed in PHP 7.2
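opcache.max_accelerated_files should be at least the number of scripts your application loads, or some files will not be cached. A quick way to get a lower bound is to count the .php files in the code base. The helper below is a hypothetical sketch. Included files with other extensions (such as .inc) also count, so treat the result as a minimum.

```python
import os


def count_php_files(root: str) -> int:
    """Count *.php files under root (a lower bound for opcache.max_accelerated_files)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.endswith(".php"))
    return total
```

For example, `count_php_files("/var/www/example")` (a hypothetical path) returning more than 4000 would mean the setting above is too small for that site.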

6. Monitoring with Status Pages

Add these to monitor performance:

; PHP-FPM status
pm.status_path = /status

# Nginx config
location = /status {
    access_log off;
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_pass unix:/tmp/php5-fpm.sock;  # must match the pool's listen directive
}
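The plain-text status page (for example from `curl -s http://127.0.0.1/status`) is one `key: value` pair per line. The two counters that indicate exhaustion are "listen queue" (requests waiting right now) and "max children reached" (cumulative since the pool started). This is a minimal parsing sketch. The function names are hypothetical and the sample output below is invented for illustration.

```python
def parse_fpm_status(text: str) -> dict:
    """Turn PHP-FPM's plain-text status page into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only: "start time" values contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def saturation_warnings(status: dict) -> list:
    """Flag the counters that show the pool running out of workers."""
    warnings = []
    if int(status.get("listen queue", "0")) > 0:
        warnings.append("requests are waiting for a free worker (listen queue > 0)")
    if int(status.get("max children reached", "0")) > 0:
        warnings.append("pm.max_children has been hit since the pool started")
    return warnings


# Invented sample output from an exhausted pool.
SAMPLE = """pool:                 www
process manager:      dynamic
start time:           02/Jun/2012:01:00:00 +0000
accepted conn:        123456
listen queue:         12
idle processes:       0
active processes:     150
total processes:      150
max children reached: 3
"""
print(saturation_warnings(parse_fpm_status(SAMPLE)))
```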

Before deploying changes, load-test them (preferably against a staging copy of the site) with:

ab -n 1000 -c 100 http://yoursite.com/
siege -b -c150 -t1M http://yoursite.com/

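Monitor memory usage during tests. The second figure is the combined RSS of all PHP-FPM processes in MB; shared pages are counted once per process, so it overstates real usage:

watch -n 1 "free -m; ps aux | grep '[p]hp-fpm' | awk '{sum+=\$6} END {print sum/1024}'"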
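To feed the pm.max_children formula, you need the average size of one child rather than the total. The sketch below computes both from `ps aux` output, counting only `php-fpm: pool` lines so the master process and the grep itself are excluded. The helper name is hypothetical and the sample output is invented.

```python
def php_fpm_memory(ps_aux_output: str):
    """Return (workers, total_mb, average_mb) for PHP-FPM pool workers in `ps aux` output.

    RSS counts shared pages once per process, so total_mb overstates real usage.
    """
    rss_kb = []
    for line in ps_aux_output.splitlines():
        if "php-fpm: pool" not in line:
            continue  # skip the header, the master process and unrelated processes
        fields = line.split(None, 10)  # the COMMAND column may contain spaces
        rss_kb.append(int(fields[5]))  # 6th column of `ps aux` is RSS in kB
    workers = len(rss_kb)
    total_mb = sum(rss_kb) / 1024
    average_mb = total_mb / workers if workers else 0.0
    return workers, total_mb, average_mb


# Invented sample output.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
root      1000  0.0  0.1 300000 10240 ?   Ss   01:00 0:01 php-fpm: master process (/etc/php-fpm.conf)
www-data  2001  1.2  0.5 312000 30720 ?   S    01:00 0:05 php-fpm: pool www
www-data  2002  0.9  0.7 320000 40960 ?   S    01:00 0:04 php-fpm: pool www
"""
print(php_fpm_memory(SAMPLE))  # (2, 70.0, 35.0)
```

On a live server, pass it `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout` (Python 3.7 or later), ideally while a load test is running.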

When analyzing PHP-FPM connection issues under heavy load, check three things: whether the service is healthy, how many connections it holds, and how many processes are running:

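# Check current PHP-FPM status (the unit name varies, e.g. php-fpm, php5-fpm or php7.4-fpm)
sudo systemctl status php-fpm
# Monitor active connections (sockets held by php-fpm processes)
sudo netstat -anp | grep php-fpm | wc -l
# View real-time process count (includes the master; [p] stops grep from matching itself)
ps -ef | grep '[p]hp-fpm' | wc -l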

Your current memory usage shows:

Mem: 6114284 total (5726984 used)
Swap: 524284 total (5804 used)

A common rule of thumb is ~30MB per PHP-FPM child process, but real usage depends on your code and extensions, so measure it under load. With 150 max_children:

150 children × 30MB = 4500MB (4.5GB)

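This leaves only about 1.4GB (5971MB - 4500MB) for the OS, Nginx and everything else. At ~50MB per child, the same pool would need 7500MB, more than the server has. Consider one of the following:

Upgrading server RAM
Optimizing PHP memory usage
Choosing a process manager mode that fits your traffic (for example pm = ondemand, which starts children only when requests arrive)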

Your pool already uses pm = dynamic (pm.start_servers and the spare-server settings only apply in that mode). Keep it, but with tighter limits:

pm = dynamic
pm.max_children = 100
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 30
pm.process_idle_timeout = 10s  ; only used when pm = ondemand; ignored with dynamic
pm.max_requests = 500
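To see how these limits interact, here is a simplified model of the dynamic manager's once-per-second maintenance pass. It kills one idle child when more than pm.max_spare_servers are idle. When fewer than pm.min_spare_servers are idle, it forks new children, doubling the number per pass up to a cap. This is my reading of FPM's behaviour, not its code: the cap of 32 and the "seems busy" threshold of 8 are assumptions that may differ between versions. The burst of 150 requests is hypothetical.

```python
def dynamic_tick(total, busy, spawn_rate, min_spare, max_spare, max_children,
                 max_spawn_rate=32):
    """One simplified maintenance pass of pm = dynamic (a model, not FPM's code).

    Returns (new_total, new_spawn_rate, warning_or_None).
    """
    idle = total - busy
    if idle > max_spare:
        # Too many idle children: kill one per pass and reset the spawn rate.
        return total - 1, 1, None
    if idle < min_spare:
        if total >= max_children:
            return total, spawn_rate, "server reached pm.max_children"
        to_fork = min(spawn_rate, min_spare - idle, max_children - total)
        warning = "seems busy" if spawn_rate >= 8 else None  # assumed threshold
        return total + to_fork, min(spawn_rate * 2, max_spawn_rate), warning
    # Idle count within bounds: nothing to do.
    return total, 1, None


# Hypothetical burst of 150 concurrent requests against the suggested pool.
total, rate = 20, 1
for second in range(1, 16):
    busy = min(150, total)  # each child serves one request at a time
    total, rate, warning = dynamic_tick(total, busy, rate, 10, 30, 100)
    print(second, total, warning or "")
```

In this model, a sustained burst first triggers "seems busy" while the pool grows. Once the pool hits the cap, it triggers "server reached pm.max_children", the same pair of warnings quoted at the top of the article.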

Key Nginx adjustments for proxying to PHP-FPM (valid in the http, server or location block):

fastcgi_connect_timeout 30s;
fastcgi_send_timeout 60s;
fastcgi_read_timeout 60s;
fastcgi_buffers 16 16k;
fastcgi_buffer_size 32k;
fastcgi_busy_buffers_size 64k;
fastcgi_temp_file_write_size 128k;
keepalive_timeout 15;
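Nginx refuses to start if the FastCGI buffer sizes contradict each other. The rules encoded below follow the messages `nginx -t` prints for these directives, as I understand them. The helper name is hypothetical, and all sizes are in KB.

```python
def check_fastcgi_buffers(buffers_num, buffers_size, buffer_size,
                          busy_buffers_size, temp_file_write_size):
    """List problems with a fastcgi_* buffer configuration (sizes in KB)."""
    problems = []
    # The larger of fastcgi_buffer_size and one fastcgi_buffers buffer.
    floor = max(buffer_size, buffers_size)
    if busy_buffers_size < floor:
        problems.append("fastcgi_busy_buffers_size is below max(fastcgi_buffer_size, one buffer)")
    if busy_buffers_size > (buffers_num - 1) * buffers_size:
        problems.append("fastcgi_busy_buffers_size exceeds all fastcgi_buffers minus one buffer")
    if temp_file_write_size < floor:
        problems.append("fastcgi_temp_file_write_size is below max(fastcgi_buffer_size, one buffer)")
    return problems


# fastcgi_buffers 16 16k; fastcgi_buffer_size 32k;
# fastcgi_busy_buffers_size 64k; fastcgi_temp_file_write_size 128k
print(check_fastcgi_buffers(16, 16, 32, 64, 128))  # []
```

With these values, a single response can use roughly 32k + 16 × 16k = 288k of memory before Nginx spills the rest to a temporary file.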

For high-traffic sites:

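; PHP-FPM pool config: implement process recycling
pm.max_requests = 1000

; PHP-FPM pool config: enable the status page
pm.status_path = /status

# Nginx config: expose the status page to localhost only
location = /status {
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_pass unix:/var/run/php-fpm.sock;
}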

Essential commands for ongoing maintenance:

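# Real-time monitoring through the Nginx status location (assumes it is served on localhost)
watch -n 1 "curl -s http://127.0.0.1/status"
# Or query the socket directly; it speaks FastCGI, so use cgi-fcgi (libfcgi tools), not socat
SCRIPT_NAME=/status SCRIPT_FILENAME=/status REQUEST_METHOD=GET cgi-fcgi -bind -connect /var/run/php-fpm.sock

# Log analysis (the path is set by error_log in php-fpm.conf)
tail -f /var/log/php-fpm.log | grep -E 'WARNING|ERROR'

# Performance metrics (php -i shows the CLI's php.ini; php-fpm -i shows FPM's, binary name varies)
php-fpm -i | grep memory_limit
# fpm_get_status() (PHP 7.3+) exists only under the FPM SAPI, so call it from a script served by PHP-FPM, not via php -r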
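Tailing the log shows warnings as they happen. Counting them per hour shows when the pool runs out of workers. This is a minimal sketch that assumes the log lines look like the two warnings quoted at the start of this article. The regex and function name are illustrative, and other PHP versions may format timestamps differently.

```python
import re
from collections import Counter

# Assumed format: "[02-Jun-2012 01:52:04] WARNING: [pool www] message"
WARNING_LINE = re.compile(
    r"^\[(\d{2}-\w{3}-\d{4}) (\d{2}):\d{2}:\d{2}\] WARNING: \[pool ([^\]]+)\] (.*)$"
)


def warnings_per_hour(lines):
    """Count pool warnings by (day, hour, pool, kind)."""
    counts = Counter()
    for line in lines:
        match = WARNING_LINE.match(line.strip())
        if not match:
            continue  # not a pool WARNING line
        day, hour, pool, message = match.groups()
        if "pm.max_children" in message:
            kind = "max_children"
        elif "seems busy" in message:
            kind = "busy"
        else:
            kind = "other"
        counts[(day, hour, pool, kind)] += 1
    return counts


LOG = [
    "[02-Jun-2012 01:52:04] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers)",
    "[02-Jun-2012 01:39:19] WARNING: [pool www] server reached pm.max_children setting (150), consider raising it",
]
for key, count in sorted(warnings_per_hour(LOG).items()):
    print(*key, count)
```

A file object such as `open("/var/log/php-fpm.log")` can be passed in place of `LOG`, since iterating over it yields lines.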

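When vertical scaling reaches its limits, run PHP-FPM on several backend servers and let Nginx balance requests across them. Each backend must listen on TCP, allow the Nginx host in listen.allowed_clients, and have the application code at the same path: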

# Sample load balancer config
upstream php_servers {
    server 192.168.1.10:9000;
    server 192.168.1.11:9000;
    server 192.168.1.12:9000;
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass php_servers;
}
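Without a balancing directive such as least_conn, Nginx picks upstream servers by weighted round-robin (all weights default to 1). The sketch below models the "smooth" weighted round-robin selection Nginx is generally described as using. It ignores failures, max_fails and keepalive, so treat it as an illustration of how requests are distributed, not as Nginx's implementation.

```python
def smooth_weighted_round_robin(weights, n):
    """Return the first n picks for a {server: weight} mapping (a model, not Nginx's code)."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server earns its weight in credit each round...
        for server, weight in weights.items():
            current[server] += weight
        # ...the server with the most credit wins and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


servers = {"192.168.1.10:9000": 1, "192.168.1.11:9000": 1, "192.168.1.12:9000": 1}
print(smooth_weighted_round_robin(servers, 6))
```

With equal weights, this reduces to plain rotation. With unequal weights (for example `weight=5` on a bigger backend), the heavier server's turns are spread out instead of bunched together.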

ServerDevWorker