Optimizing TCP TIME_WAIT Socket Reduction for High-Traffic FCGI Services Behind Nginx



When a high-traffic web service (230+ requests/sec) has Nginx passing requests to FCGI backends over TCP, sockets in the TIME_WAIT state can accumulate quickly. The side that closes a connection first keeps its socket in TIME_WAIT for 2*MSL, which on Linux is a fixed 60 seconds. Until it expires, each such socket holds a local port and a small amount of kernel memory.
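
As a rough sanity check on scale, the steady-state number of TIME_WAIT sockets is approximately the rate of closed connections multiplied by the 60-second hold time. The sketch below assumes one new TCP connection per request (no keepalive) and that the same host is always the side that closes first; the function name is only illustrative.

```python
TIME_WAIT_SECONDS = 60  # TIME_WAIT duration on Linux


def steady_state_time_wait(closes_per_second: float,
                           hold_seconds: int = TIME_WAIT_SECONDS) -> int:
    """Estimate how many sockets sit in TIME_WAIT at any moment.

    Assumes every request opens and closes its own connection and that
    this host is always the active closer.
    """
    return round(closes_per_second * hold_seconds)


if __name__ == "__main__":
    # 230 requests/sec is the traffic level quoted above
    print(steady_state_time_wait(230))  # 13800
```

Under these assumptions, a five-digit TIME_WAIT count at this traffic level is expected behaviour rather than a sign of a leak.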

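Your existing TCP tweaks are a starting point, although only tcp_tw_reuse safely targets TIME_WAIT:

# Current sysctl settings
tcp_fin_timeout = 1      # FIN_WAIT_2 timeout for orphaned sockets; does not shorten TIME_WAIT
tcp_tw_recycle = 1       # Fast recycling; breaks clients behind NAT and was removed in Linux 4.12
tcp_tw_reuse = 1         # Reuse TIME_WAIT sockets for new outgoing connections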

1. Kernel-Level TCP Stack Tuning

Add these to /etc/sysctl.conf:

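# Upper limit on TIME_WAIT sockets; beyond it they are destroyed immediately
net.ipv4.tcp_max_tw_buckets = 2000000
# tcp_tw_recycle is left out: it breaks clients behind NAT and was removed in Linux 4.12
# tcp_tw_reuse affects outgoing connections only and needs tcp_timestamps = 1
net.ipv4.tcp_tw_reuse = 1
# FIN_WAIT_2 timeout; TIME_WAIT itself stays at 60 seconds
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 1024 65535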
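
After editing /etc/sysctl.conf and running sysctl -p, it can help to confirm what the running kernel actually holds. The following is a minimal sketch that reads values straight from /proc/sys; the WANTED table copies three entries from the list above, and the helper names are hypothetical. A key that does not exist on the running kernel is reported as missing rather than raising an error.

```python
from pathlib import Path
from typing import Dict, Optional

# Entries copied from the sysctl list above
WANTED = {
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_fin_timeout": "3",
    "net.ipv4.ip_local_port_range": "1024 65535",
}


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the value with whitespace normalised, or None if unreadable."""
    try:
        return " ".join(sysctl_path(name, root).read_text().split())
    except OSError:
        return None


def report(wanted: Dict[str, str], root: str = "/proc/sys") -> Dict[str, str]:
    """Compare wanted values with the running ones."""
    result = {}
    for name, expected in wanted.items():
        actual = read_sysctl(name, root)
        if actual is None:
            result[name] = "missing"
        elif actual == expected:
            result[name] = "ok"
        else:
            result[name] = f"differs (running value: {actual})"
    return result


if __name__ == "__main__":
    for name, status in report(WANTED).items():
        print(f"{name}: {status}")
```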

2. Nginx Connection Management

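Keep upstream connections open so that requests reuse them instead of opening and closing a TCP connection each time. For a FastCGI backend, upstream keepalive only works with fastcgi_keep_conn enabled:

upstream backend {
    server 127.0.0.1:9000;
    keepalive 512;  # Idle upstream connections cached per worker process
}

server {
    location / {
        include fastcgi_params;
        fastcgi_pass backend;
        fastcgi_keep_conn on;  # Required for keepalive to FastCGI servers
    }
}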

3. Socket Queue Optimization

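Raise the accept-queue limit (somaxconn) and the SYN-queue limit (tcp_max_syn_backlog) so that connection bursts are not dropped. This does not reduce TIME_WAIT, but it matters at the same traffic levels: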

echo 8192 > /proc/sys/net/core/somaxconn
echo 8192 > /proc/sys/net/ipv4/tcp_max_syn_backlog

UNIX Domain Sockets for Internal Communication

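If Nginx and the FCGI backend share a host, move their connection to a UNIX domain socket. UNIX sockets have no TIME_WAIT state and consume no ephemeral ports: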

# nginx.conf
upstream backend {
    server unix:/var/run/php-fpm.sock;
}

# php-fpm.conf
listen = /var/run/php-fpm.sock
listen.backlog = 65536

When Cross-Machine Connections Are Necessary

For distributed workers where domain sockets aren't possible:

# tcp_tw_reuse only works when TCP timestamps are enabled (the default)
echo 1 > /proc/sys/net/ipv4/tcp_timestamps

# Set TCP keepalive probes
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8
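
To see what these keepalive values mean in practice, the time needed to detect a dead peer on an idle connection is the idle period before the first probe plus one interval for each unanswered probe. The short sketch below does that arithmetic with the values used in this article (tcp_keepalive_time = 1200, tcp_keepalive_intvl = 30, tcp_keepalive_probes = 8); the function name is illustrative, and the result only applies to sockets that have SO_KEEPALIVE set.

```python
def dead_peer_detection_seconds(keepalive_time: int,
                                keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Seconds until an idle connection to a vanished peer is dropped."""
    return keepalive_time + keepalive_intvl * keepalive_probes


if __name__ == "__main__":
    seconds = dead_peer_detection_seconds(1200, 30, 8)
    print(seconds, "seconds =", seconds // 60, "minutes")  # 1440 seconds = 24 minutes
```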

Essential commands to verify improvements:

# Check TIME_WAIT counts
ss -tan | awk '{print $1}' | sort | uniq -c

# Monitor socket states in real-time
watch -n 1 "netstat -ant | awk '{print \$6}' | sort | uniq -c"

# Dump raw TCP counters (the TW, TWRecycled and TCPTimeWaitOverflow columns relate to TIME_WAIT)
grep -E 'TcpExt|IpExt' /proc/net/netstat
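
Where ss and netstat are unavailable, or the counts need to feed a monitoring script, the same per-state breakdown can be read from /proc/net/tcp, whose fourth column holds the socket state as a hexadecimal code (06 is TIME_WAIT). The following is a minimal sketch, assuming a Linux host and IPv4 only; count_states is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# State codes used in the "st" column of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(lines: Iterable[str]) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with a slot number such as "0:"; skip the header
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    proc_tcp = Path("/proc/net/tcp")  # IPv6 sockets are in /proc/net/tcp6
    if proc_tcp.exists():
        with proc_tcp.open() as handle:
            for state, total in count_states(handle).most_common():
                print(f"{total:8d} {state}")
```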

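Remember that TIME_WAIT exists so that delayed segments from a closed connection cannot be mistaken for data on a new connection that reuses the same address and port pair. Reducing it aggressively trades that protection for capacity, so test changes gradually in a staging environment before deploying them to production.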


When running high-traffic web services (230+ req/sec) with Nginx reverse proxies and FCGI backends, sockets pile up in the TIME_WAIT state whenever each request opens and closes its own TCP connection. The count is easy to check:

# Typical TIME_WAIT symptoms
$ ss -tan | grep TIME-WAIT | wc -l
4821  # Excessive count in production

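Of your existing sysctl settings, only tcp_tw_reuse is a safe TIME_WAIT optimization:

net.ipv4.tcp_fin_timeout = 1       # FIN_WAIT_2 timeout; does not shorten TIME_WAIT
net.ipv4.tcp_tw_recycle = 1        # Breaks clients behind NAT; removed in Linux 4.12
net.ipv4.tcp_tw_reuse = 1          # Reuse TIME_WAIT sockets for outgoing connections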

For your specific stack (Nginx → Nginx workers → FCGI → DB), consider these structural changes:

# 1. Unix domain sockets between Nginx and FCGI
upstream backend {
    server unix:/var/run/php-fpm.sock;
}

# 2. Client-facing HTTP keepalives (these do not affect upstream connections)
keepalive_timeout 75;
keepalive_requests 10000;

Beyond basic settings, these parameters help in high-connection scenarios:

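# /etc/sysctl.conf additions
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
# tcp_tw_recycle is left out: it breaks clients behind NAT and was removed in Linux 4.12
# tcp_fin_timeout sets the FIN_WAIT_2 timeout, not the TIME_WAIT duration
net.ipv4.tcp_fin_timeout = 3
net.ipv4.ip_local_port_range = 1024 65000
net.core.somaxconn = 32768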
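
The port range matters because every outgoing connection to the same destination address and port needs its own local port, and a port stays unavailable for 60 seconds after an active close. The sketch below estimates the resulting ceiling for the range 1024 to 65000. It is a rough upper bound under stated assumptions: a single source address, a single destination, no tcp_tw_reuse, and no reserved ports. The function names are illustrative.

```python
TIME_WAIT_SECONDS = 60  # TIME_WAIT duration on Linux


def ephemeral_ports(low: int, high: int) -> int:
    """Number of local ports in an inclusive ip_local_port_range."""
    return high - low + 1


def max_connect_rate(low: int, high: int,
                     hold_seconds: int = TIME_WAIT_SECONDS) -> float:
    """Sustainable new connections per second to one destination."""
    return ephemeral_ports(low, high) / hold_seconds


if __name__ == "__main__":
    print(ephemeral_ports(1024, 65000))        # 63977
    print(int(max_connect_rate(1024, 65000)))  # 1066
```

At 230 connections per second this leaves considerable headroom, which suggests that port exhaustion is not the immediate risk at this traffic level.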

Implement connection reuse at multiple levels:

# Nginx upstream configuration
upstream backend {
    server 10.0.0.1:9000;
    keepalive 100;  # Cache up to 100 idle connections per worker; FastCGI also needs fastcgi_keep_conn on
}

# FCGI process manager settings (php-fpm.conf)
pm = dynamic
pm.max_children = 200
pm.start_servers = 30
pm.min_spare_servers = 20
pm.max_spare_servers = 50
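
With pm = dynamic the four values constrain one another, and PHP-FPM refuses to start when they are inconsistent. The following sketch mirrors those startup checks as I understand them (spare counts positive and no larger than pm.max_children, pm.start_servers between the two spare limits); the function name and messages are hypothetical and are not PHP-FPM's own output.

```python
from typing import List


def check_dynamic_pm(max_children: int, start_servers: int,
                     min_spare: int, max_spare: int) -> List[str]:
    """Return a list of consistency problems; empty means the values fit."""
    problems = []
    if min_spare < 1 or max_spare < 1:
        problems.append("spare server counts must be positive")
    if min_spare > max_children or max_spare > max_children:
        problems.append("spare server counts cannot exceed pm.max_children")
    if max_spare < min_spare:
        problems.append("pm.max_spare_servers is below pm.min_spare_servers")
    if not min_spare <= start_servers <= max_spare:
        problems.append("pm.start_servers must lie between the spare limits")
    return problems


if __name__ == "__main__":
    # Values from the php-fpm.conf fragment above
    print(check_dynamic_pm(max_children=200, start_servers=30,
                           min_spare=20, max_spare=50))  # []
```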

Verify improvements with these diagnostic commands:

# Real-time socket monitoring
watch -n 1 'ss -s | grep -i timewait'

# Connection state breakdown
netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

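For extreme cases, these runtime settings can be applied immediately. They are lost on reboot:

# Emergency settings (use with caution)
# tcp_tw_recycle is deliberately left out: it breaks clients behind NAT and was removed in Linux 4.12
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
# SYN flood protection; has no effect on TIME_WAIT
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range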

Always test changes in staging environments before production deployment.