Debugging Apache Webserver Freezes: TCP Connection Backlog and KeepAlive Timeout Analysis



When examining the server-status output during freeze incidents, the scoreboard shows an abnormal pattern dominated by "_" (waiting for connection) and "K" (Keepalive read) states. The netstat output reveals:

# During freeze:
109 CLOSE_WAIT
2652 ESTABLISHED
91 SYN_RECV

# Normal operation:
108 ESTABLISHED
50 SYN_RECV
11276 TIME_WAIT

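During incidents SYN_RECV nearly doubles (50 to 91) and ESTABLISHED jumps from 108 to 2652, which suggests the kernel's connection queues are filling faster than Apache accepts from them. Try these sysctl adjustments: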

# Increase the SYN backlog and the accept queue limit, and keep SYN cookies on
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_syncookies=1

# Faster connection recycling (reuse TIME_WAIT sockets for outgoing connections, shorter FIN_WAIT2)
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_fin_timeout=30
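
To check whether the accept queue really overflowed, the kernel's ListenOverflows and ListenDrops counters in /proc/net/netstat can be read before and after an incident. The helper below is a minimal sketch; the function name tcpext_counter is made up for illustration, and it assumes the usual two-line layout (a "TcpExt:" header line followed by a "TcpExt:" value line).

```shell
#!/bin/bash
# Print one TcpExt counter from /proc/net/netstat-style input on stdin.
# Usage: tcpext_counter ListenOverflows < /proc/net/netstat
tcpext_counter() {
    awk -v name="$1" '
        # First TcpExt line holds the counter names: remember the column.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) if ($i == name) col = i
            seen = 1
            next
        }
        # Second TcpExt line holds the values in the same column order.
        $1 == "TcpExt:" && seen {
            if (col) print $col
            exit
        }
    '
}
```

If the number grows between two readings taken during a freeze, connections are being dropped at the listen queue. If it stays flat, the bottleneck is more likely elsewhere.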

The current prefork configuration appears problematic for modern workloads:

# Problematic settings:
KeepAlive On
# Too aggressive for high traffic
KeepAliveTimeout 1
# Likely exceeding available memory
MaxClients 920

Recommended adjustments:

# Or keep it On and increase KeepAliveTimeout to 3-5 seconds
KeepAlive Off

    StartServers         20
    MinSpareServers      20
    MaxSpareServers      40
    # Based on 8GB RAM
    MaxClients           400
    MaxRequestsPerChild  1000
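
The MaxClients figure comes from simple arithmetic: the share of RAM you are willing to give Apache, divided by the memory of one child. A rough sketch follows; the function names, the 80% default and the process name apache2 are assumptions, and RSS overstates real usage because children share pages.

```shell
#!/bin/bash
# Average resident set size in MB from RSS values in KB, one per line on stdin.
# Usage: ps -C apache2 -o rss= | avg_rss_mb
avg_rss_mb() {
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024 }'
}

# Largest MaxClients that keeps all children within PERCENT of RAM.
# Usage: max_clients RAM_MB PER_CHILD_MB [PERCENT]   (PERCENT defaults to 80)
max_clients() {
    local ram_mb=$1 child_mb=$2 pct=${3:-80}
    echo $(( ram_mb * pct / 100 / child_mb ))
}
```

With 8 GB of RAM and an assumed 16 MB per child, `max_clients 8192 16` prints 409, which is in line with the 400 suggested above. Measure the per-child size on your own server before trusting the result.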

Create a real-time monitoring script to catch connection buildup:

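#!/bin/bash
# Count TCP connections per state every 5 seconds.
# \$6 is escaped so the outer shell does not expand it before awk runs.
watch -n 5 "echo 'TCP States Monitoring'; echo '====================';
netstat -ant | awk 'NR>2 { s[\$6]++ } END { for (i in s) print i, s[i] }' | sort -n -k2"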

With mod_php, each Apache child carries full PHP memory overhead. Consider switching to:

  • PHP-FPM with mod_proxy_fcgi
  • Event MPM instead of prefork
  • OPcache with proper memory settings

; Example PHP-FPM pool config
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20

When incidents occur, capture tcpdump data:

tcpdump -ni eth0 'tcp port 80 and tcp[tcpflags] & (tcp-syn|tcp-ack) != 0' -w /tmp/http_debug.pcap

Analyze with Wireshark for:

  • SYN flood patterns
  • Retransmission rates
  • Keepalive negotiation
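
The same checks can be scripted with tshark, Wireshark's command-line tool, if it is installed. This is a sketch only: the helper names are hypothetical and the display filters are the standard Wireshark ones for retransmissions and bare SYNs.

```shell
#!/bin/bash
# Count packets in a capture file that match a Wireshark display filter.
# Usage: count_matching FILE FILTER
count_matching() {
    tshark -r "$1" -Y "$2" 2>/dev/null | wc -l
}

# Percentage of PART in TOTAL with one decimal place; prints 0.0 when TOTAL is 0.
# Usage: percent PART TOTAL
percent() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t + 0 > 0 ? 100 * p / t : 0) }'
}

# Example (not run here):
#   pcap=/tmp/http_debug.pcap
#   total=$(count_matching "$pcap" 'tcp')
#   retrans=$(count_matching "$pcap" 'tcp.analysis.retransmission')
#   syns=$(count_matching "$pcap" 'tcp.flags.syn == 1 && tcp.flags.ack == 0')
#   echo "retransmissions: $(percent "$retrans" "$total")%  bare SYNs: $syns"
```

A high count of bare SYNs from few source addresses points toward a SYN flood, while a high retransmission percentage points toward loss or an overloaded listener.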

Suppose your Apache cluster suddenly becomes unresponsive with all worker processes stuck in the "_" (Waiting for Connection) state, while netstat shows:

netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
  109 CLOSE_WAIT
 2652 ESTABLISHED
    2 FIN_WAIT1
   11 LAST_ACK
   91 SYN_RECV

This pattern typically indicates a TCP connection handling issue rather than a pure Apache misconfiguration. Let me share my troubleshooting journey and solution.
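
To put numbers on the worker states instead of eyeballing the scoreboard, the machine-readable mod_status output can be tallied. A minimal sketch, assuming mod_status is enabled and reachable at http://localhost/server-status (adjust the URL to your setup); the function name is made up for illustration.

```shell
#!/bin/bash
# Tally worker states from "server-status?auto" output on stdin.
# Prints one "<state char> <count>" line per state found in the scoreboard.
# Usage: curl -s 'http://localhost/server-status?auto' | scoreboard_counts
scoreboard_counts() {
    awk -F': ' '$1 == "Scoreboard" {
        # Walk the scoreboard string one character at a time.
        for (i = 1; i <= length($2); i++) count[substr($2, i, 1)]++
        for (c in count) print c, count[c]
    }' | sort
}
```

Here "_" is an idle worker, "K" a keepalive read, "W" a worker sending a reply and "." an open slot with no process.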

First, we need to check kernel-level TCP parameters that might be causing connection buildup:

# Check current TCP settings
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_max_syn_backlog
sysctl net.ipv4.tcp_tw_reuse

# Temporary solution during crisis
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
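
Values written to /proc are lost on reboot. Once the settings have proven themselves, they can be persisted in a sysctl.d file. The helper below is a small sketch that only formats the lines; its name and the file name 90-apache-tcp.conf in the usage comment are hypothetical.

```shell
#!/bin/bash
# Turn KEY=VALUE arguments into "key = value" lines for a sysctl.d file.
# Usage:
#   render_sysctl_conf net.ipv4.tcp_fin_timeout=30 net.ipv4.tcp_tw_reuse=1 \
#       | sudo tee /etc/sysctl.d/90-apache-tcp.conf
#   sudo sysctl --system    # reload all sysctl configuration files
render_sysctl_conf() {
    local kv
    for kv in "$@"; do
        printf '%s = %s\n' "${kv%%=*}" "${kv#*=}"
    done
}
```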

The current configuration has:

KeepAlive On
MaxKeepAliveRequests 20
KeepAliveTimeout 1

For high-traffic servers, try this optimized version:

# Or keep it On with a short timeout if keep-alive is required
KeepAlive Off
MaxKeepAliveRequests 100
KeepAliveTimeout 2
Timeout 30


    # Reduce from 920 to prevent overcommit
    ServerLimit           600
    StartServers          50
    MinSpareServers       50
    MaxSpareServers      100
    MaxClients          600
    MaxRequestsPerChild   1000

Create this monitoring script (/usr/local/bin/conn_monitor.sh):

#!/bin/bash
# Live view of Apache worker usage, TCP states, socket totals and memory.
watch -n 5 "date; \
echo '------ Apache Status ------'; \
apache2ctl status | grep 'idle workers'; \
echo '------ TCP Connections ------'; \
netstat -tn | awk 'NR>2 {print \$6}' | sort | uniq -c; \
echo '------ Socket Summary ------'; \
ss -s | grep 'estab'; \
echo '------ Memory Usage ------'; \
free -m"
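
Since watch only shows the current moment, it can help to also write the state counts to a log so the minutes before a freeze can be reviewed afterwards. A minimal sketch; the function name and the log path in the usage comment are assumptions.

```shell
#!/bin/bash
# Collapse "netstat -tn" output on stdin into one line of STATE=count pairs.
# The first two lines are netstat headers and are skipped.
state_summary() {
    awk 'NR > 2 { s[$6]++ } END { for (k in s) printf "%s=%d\n", k, s[k] }' \
        | sort | tr '\n' ' ' | sed 's/ $//'
}

# Example loop (not run here), appending one timestamped line every 5 seconds:
#   while sleep 5; do
#       printf '%s %s\n' "$(date '+%F %T')" "$(netstat -tn | state_summary)" \
#           >> /var/log/conn_states.log
#   done
```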

For mod_php setups, add these php.ini tweaks:

max_execution_time = 30
memory_limit = 128M
realpath_cache_size = 256k
opcache.enable=1
opcache.memory_consumption=128

To summarize:

  1. Reduce ServerLimit/MaxClients so that all children together fit in about 80% of RAM
  2. Keep KeepAliveTimeout at 1-3 seconds at most
  3. Enable TCP reuse and a shorter FIN timeout
  4. Monitor with the connection tracking script
  5. Consider switching to the event MPM if possible