Debugging Nginx + PHP-FPM 504 Gateway Timeout Errors: TCP Socket States and Persistent Connection Issues

When dealing with intermittent 504 errors in an Nginx+PHP-FPM environment, the key diagnostic artifacts appear in two places:

# Nginx error logs show:
[error] upstream timed out (110: Connection timed out) while reading response header from upstream
[error] recv() failed (104: Connection reset by peer) while reading response header from upstream
[error] connect() failed (111: Connection refused) while connecting to upstream
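
To see which of these three failures dominates, the log can be tallied by errno. The following is a minimal sketch rather than a definitive tool: the function name tally_upstream_errors is made up for this example, the errno numbers are the Linux values shown above, and the short explanations in the labels are common causes, not the only possible ones.

```shell
#!/usr/bin/env bash
# Tally upstream errors from an Nginx error log read on stdin.
# Prints one line per errno seen: "<count> <errno> <likely meaning>".
# tally_upstream_errors is a hypothetical helper name, not part of Nginx.
tally_upstream_errors() {
    awk '
        # Only look at upstream failures, and pull out the "(NNN: ...)" errno.
        /upstream/ && match($0, /\([0-9]+: /) {
            errno = substr($0, RSTART + 1, RLENGTH - 3)
            count[errno]++
        }
        END {
            label["110"] = "timed out (worker too slow or stuck)"
            label["104"] = "reset by peer (worker died or was killed mid-request)"
            label["111"] = "refused (nothing accepting on the upstream address)"
            for (e in count) {
                printf "%d %s %s\n", count[e], e, ((e in label) ? label[e] : "other")
            }
        }
    ' | sort -k2,2n
}
```

Run it as: tally_upstream_errors < /var/log/nginx/error.log (the path is an assumption; use whatever your error_log directive points to).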

The critical insight comes from examining TCP socket states:

netstat -tnp | grep 9000
# Reveals numerous CLOSE_WAIT and FIN_WAIT2 pairs
tcp        9      0 localhost:9000  localhost:36094  CLOSE_WAIT  14269/php5-fpm  
tcp        0      0 localhost:46664 localhost:9000   FIN_WAIT2   -

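Each FIN_WAIT2/CLOSE_WAIT pair is a single connection seen from both ends. Nginx has already closed its side (its FIN was acknowledged, so its socket waits in FIN_WAIT2), while the PHP-FPM worker has received that FIN but has not yet called close(), so its socket stays in CLOSE_WAIT. This usually means the worker is still busy or blocked on a request that Nginx has already given up on; the non-zero Recv-Q (9 bytes above) is data the worker never read. A worker in this state cannot accept new requests, so as the pairs accumulate the pool runs out of free workers and new requests queue until they time out.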
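
A quick way to see how lopsided the two sides are is to count states separately for the PHP-FPM side (local port 9000) and the Nginx side (remote port 9000). This is a small sketch that assumes netstat -tn style columns on stdin; summarise_port_states is a name invented for this example.

```shell
#!/usr/bin/env bash
# Summarise TCP states for one port from `netstat -tn` style output on stdin.
# "server" lines are sockets whose local port matches (the PHP-FPM side);
# "client" lines are sockets whose remote port matches (the Nginx side).
# summarise_port_states is a hypothetical helper name.
summarise_port_states() {
    local port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ {
            # Anchor on ":<port>" at the end of the address so that 39000
            # or a PID containing 9000 is not counted by accident.
            if ($4 ~ (":" port "$"))      side = "server"
            else if ($5 ~ (":" port "$")) side = "client"
            else next
            n[side " " $6]++
        }
        END { for (k in n) print k, n[k] }
    ' | sort
}
```

Usage: netstat -tn | summarise_port_states 9000. A large "server CLOSE_WAIT" count next to a similar "client FIN_WAIT2" count matches the pattern described above.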

Three primary factors contribute to this:

PHP-FPM's process manager (static/dynamic/ondemand) not recycling workers properly
Timeouts mismatched between Nginx (fastcgi_read_timeout) and PHP-FPM (request_terminate_timeout)
PHP scripts hanging during execution (database queries, external API calls)

1. PHP-FPM Configuration Tuning

; /etc/php-fpm.d/www.conf
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 2
pm.max_spare_servers = 8
pm.max_requests = 500  ; Recycle each worker after 500 requests to contain memory leaks
request_terminate_timeout = 30s  ; Kill any request that runs longer than 30 seconds
catch_workers_output = yes  ; Forward worker stdout/stderr to the error log (debugging aid)
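
The value of pm.max_children above is given without a derivation. A common rule of thumb, shown here as a rough sketch and not as the method used for the numbers in this article, is to divide the memory you can spare for PHP-FPM by the average size of one worker. Both function names and the 2048 MB / 40 MB figures in the usage note are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Estimate pm.max_children from a memory budget.
# $1: MB of RAM you are willing to give PHP-FPM
# $2: average resident size of one worker in MB
# estimate_max_children and average_worker_mb are hypothetical helper names.
estimate_max_children() {
    local budget_mb="$1" worker_mb="$2"
    if [ "$worker_mb" -le 0 ]; then
        echo "worker size must be positive" >&2
        return 1
    fi
    # Integer division rounds down, which errs on the safe side.
    echo $(( budget_mb / worker_mb ))
}

# Average RSS (MB, rounded down) of running workers, or 0 if none are found.
# Assumes the process name is php-fpm; pass another name such as php5-fpm if needed.
average_worker_mb() {
    ps -C "${1:-php-fpm}" -o rss= |
        awk '{ sum += $1; n++ } END { if (n) print int(sum / n / 1024); else print 0 }'
}
```

For example, estimate_max_children 2048 40 prints 51, which is close to the 50 used above. Measure average_worker_mb under realistic load, since idle workers are much smaller than busy ones.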

2. Nginx FastCGI Timeout Adjustments

location ~ \.php$ {
    fastcgi_read_timeout 300;
    fastcgi_send_timeout 300;
    fastcgi_connect_timeout 60;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 256k;
    fastcgi_busy_buffers_size 256k;
    keepalive_timeout 15;  # Client-side HTTP keep-alive only; has no effect on the FastCGI connection to PHP-FPM
}

3. Kernel-Level TCP Tweaks

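# /etc/sysctl.conf
# Allow TIME_WAIT sockets to be reused for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Seconds an orphaned socket may stay in FIN_WAIT2 (does not affect CLOSE_WAIT)
net.ipv4.tcp_fin_timeout = 30
# Upper limit on the backlog of any listening socket
net.core.somaxconn = 65535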

Implement this status check script to catch issues early:

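#!/bin/bash
# monitor_fpm_sockets.sh
# Nagios-style check: counts PHP-FPM sockets stuck in CLOSE_WAIT on port 9000.

WARN_THRESHOLD=50
CRIT_THRESHOLD=100

# Match on the address columns so that PIDs or other ports containing "9000" are not counted.
SOCKETS=$(netstat -tn)
CLOSE_WAIT=$(echo "$SOCKETS" | awk '$4 ~ /:9000$/ && $6 == "CLOSE_WAIT" { n++ } END { print n + 0 }')
FIN_WAIT=$(echo "$SOCKETS" | awk '$5 ~ /:9000$/ && $6 ~ /^FIN_WAIT/ { n++ } END { print n + 0 }')

if [ "$CLOSE_WAIT" -ge "$CRIT_THRESHOLD" ]; then
    echo "CRITICAL: $CLOSE_WAIT CLOSE_WAIT sockets | sockets=$CLOSE_WAIT fin_wait=$FIN_WAIT"
    service php-fpm restart  # Stopgap only: clears the sockets, not the cause
    exit 2
elif [ "$CLOSE_WAIT" -ge "$WARN_THRESHOLD" ]; then
    echo "WARNING: $CLOSE_WAIT CLOSE_WAIT sockets | sockets=$CLOSE_WAIT fin_wait=$FIN_WAIT"
    exit 1
fi
echo "OK: $CLOSE_WAIT CLOSE_WAIT sockets | sockets=$CLOSE_WAIT fin_wait=$FIN_WAIT"
exit 0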

When the issue persists, use these forensic tools:

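# Show which system calls the PHP-FPM workers are blocked in
strace -p $(pgrep -d, php-fpm) -s 1024 -f

# Monitor TCP connections in real-time (loopback, because Nginx and PHP-FPM talk over localhost)
tcptrack -i lo port 9000

# Detailed PHP-FPM status (requires pm.status_path = /status and a matching Nginx location)
curl -s 'http://localhost/status?json' | jq .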

When examining these timeout errors in your Nginx + PHP-FPM setup, several key patterns emerge from the logs:

# Typical error sequence observed
[error] upstream timed out (110: Connection timed out)
[error] recv() failed (104: Connection reset by peer)
[error] connect() failed (111: Connection refused)

The netstat output reveals a critical issue with TCP connection states:

tcp   0      0 localhost:46680 localhost:9000  FIN_WAIT2
tcp   1337   0 localhost:9000  localhost:46680 CLOSE_WAIT

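Both lines describe the same connection (client port 46680). The 1337 bytes in Recv-Q on the CLOSE_WAIT side are data PHP-FPM never read, which shows the worker has neither finished with the connection nor closed it. As such sockets pile up, fewer workers remain free and the pool is eventually exhausted.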

After extensive testing, these are the most effective configuration changes:

Nginx Configuration

location ~ \.php$ {
    fastcgi_read_timeout 300;
    fastcgi_send_timeout 300;
    fastcgi_connect_timeout 75s;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 256k;
    fastcgi_busy_buffers_size 256k;
    fastcgi_temp_file_write_size 256k;
    
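    # Asks PHP-FPM to keep the FastCGI connection open after a response. Connections are
    # only reused when fastcgi_pass points at an upstream block with a keepalive directive.
    fastcgi_keep_conn on;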
}

PHP-FPM Pool Adjustments

[www]
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 10
pm.max_requests = 500

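; Settings for TCP mode
listen = 127.0.0.1:9000
listen.backlog = 65535  ; Capped by net.core.somaxconn
listen.allowed_clients = 127.0.0.1
request_terminate_timeout = 300s
request_slowlog_timeout = 60s
slowlog = /var/log/php-fpm/www-slow.log  ; Example path; a slowlog file must be set when request_slowlog_timeout is used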

Add these to /etc/sysctl.conf:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

Apply with sysctl -p
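
Note that listen.backlog in the pool file and net.core.somaxconn interact: the kernel caps the backlog of a listening socket at somaxconn, so raising only one of them has no effect. The sketch below makes that cap explicit; effective_backlog and current_somaxconn are names invented for this example, and the /proc path is the standard Linux location for this setting.

```shell
#!/usr/bin/env bash
# The backlog a listener actually gets is min(requested, net.core.somaxconn).
# effective_backlog and current_somaxconn are hypothetical helper names.
effective_backlog() {
    local requested="$1" somaxconn="$2"
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

# Read the running kernel's limit (Linux only).
current_somaxconn() {
    cat /proc/sys/net/core/somaxconn
}
```

Usage: effective_backlog 65535 "$(current_somaxconn)" shows what the listen.backlog = 65535 setting above will really give you on the current host. PHP-FPM must be restarted after the sysctl change, because the backlog is fixed when the socket starts listening.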

Create this bash script to monitor connection states:

#!/bin/bash
watch -n 2 "netstat -tnpa | grep -E '9000|php-fpm' | awk '{print \$6}' | sort | uniq -c"

And for real-time PHP-FPM status:

watch -n 2 "curl -s 127.0.0.1/status | grep -E 'active|listen'"
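
The same status output can drive an automated check instead of being watched by eye. This is a minimal sketch that reads the plain-text status page on stdin and relies on its "listen queue", "active processes" and "max children reached" fields; check_fpm_status is a made-up name, and the rule "any queueing is unhealthy" is a deliberately strict assumption you may want to relax.

```shell
#!/usr/bin/env bash
# Flag pool saturation from the plain-text PHP-FPM status page (stdin).
# Exit status: 0 = healthy, 1 = requests are queueing or max_children was hit.
# Note: "max children reached" is a counter since the pool started, so it
# stays non-zero until PHP-FPM is restarted.
# check_fpm_status is a hypothetical helper name.
check_fpm_status() {
    awk -F': *' '
        $1 == "listen queue"         { queue  = $2 + 0 }
        $1 == "max children reached" { maxed  = $2 + 0 }
        $1 == "active processes"     { active = $2 + 0 }
        END {
            printf "active=%d listen_queue=%d max_children_reached=%d\n", active, queue, maxed
            exit ((queue > 0 || maxed > 0) ? 1 : 0)
        }
    '
}
```

Usage: curl -s 127.0.0.1/status | check_fpm_status || echo "pool saturated". A non-zero listen queue means requests are waiting for a free worker, which is the stage just before Nginx starts returning 504s.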

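Final checklist:

Verify that Nginx's fastcgi_read_timeout and PHP-FPM's request_terminate_timeout are consistent with each other
If you enable fastcgi_keep_conn, pair it with an upstream keepalive pool
Monitor TCP connection states after applying the fix
Consider switching to a Unix socket when Nginx and PHP-FPM run on the same host
Recycle workers with pm.max_requests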
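
For the timeout item, one arrangement (a design choice assumed here, not something the configuration above enforces) is to let PHP-FPM give up on a request before Nginx stops waiting, so that the worker is freed instead of lingering behind a connection Nginx has already abandoned. The sketch below compares two values by hand; to_seconds and timeouts_ordered are names invented for this example, and the values must be copied from your own config files.

```shell
#!/usr/bin/env bash
# Convert "300", "300s", "500ms", "5m", "1h" or "1d" to whole seconds.
# to_seconds and timeouts_ordered are hypothetical helper names.
to_seconds() {
    local v="$1"
    case "$v" in
        *ms) echo $(( ${v%ms} / 1000 )) ;;
        *d)  echo $(( ${v%d} * 86400 )) ;;
        *h)  echo $(( ${v%h} * 3600 )) ;;
        *m)  echo $(( ${v%m} * 60 )) ;;
        *s)  echo "${v%s}" ;;
        *)   echo "$v" ;;
    esac
}

# Succeeds when request_terminate_timeout ($1) is enabled (non-zero) and
# strictly shorter than fastcgi_read_timeout ($2).
timeouts_ordered() {
    local fpm nginx
    fpm=$(to_seconds "$1")
    nginx=$(to_seconds "$2")
    [ "$fpm" -gt 0 ] && [ "$fpm" -lt "$nginx" ]
}
```

Usage: timeouts_ordered 30s 300 && echo "ok" passes for the first configuration in this article, while timeouts_ordered 300s 300 fails for the second, where both limits expire at the same moment.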
