Diagnosing and Fixing Nginx Client Connection Timeout & Premature Closure Issues on AWS



Users report intermittent connectivity issues where a website needs several reload attempts, while the server logs show entries like:

client timed out (110: Connection timed out) while waiting for request
client closed connection while waiting for request
client closed keepalive connection

Entries like these usually point to either network-level problems or Nginx timeout settings that don't match how clients behave. Switching to Google's DNS (8.8.8.8) resolved the problem for some users, which suggests a DNS or network-path issue, but let's examine all possibilities.

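Start by setting the relevant directives explicitly in /etc/nginx/nginx.conf. The timeout values shown match Nginx's built-in defaults, so they mainly document the baseline; raise them if clients on slow links need more time: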

http {
    # Timeout adjustments
    client_header_timeout 60s;
    client_body_timeout 60s;
    keepalive_timeout 75s;
    send_timeout 60s;
    
    # Buffer size adjustments
    client_header_buffer_size 4k;
    large_client_header_buffers 8 16k;
    client_max_body_size 8m;
    
    # TCP optimizations
    tcp_nodelay on;
    tcp_nopush on;   # only takes effect when sendfile is on
}

When DNS changes affect connectivity, we should:

  1. Check the DNS delegation path and per-server response times: dig +trace somedomain.com
  2. Test alternative DNS providers (Cloudflare 1.1.1.1, OpenDNS)
  3. Verify MTU settings aren't causing fragmentation: ping -M do -s 1472 8.8.8.8
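
To compare resolvers side by side, the checks above can be sketched as a pair of small shell functions. This is a minimal sketch, not a definitive tool: the function names dig_query_ms and compare_resolvers are hypothetical, and it assumes dig is installed and prints its usual ";; Query time: N msec" line.

```shell
# Extract the "Query time" value (in ms) from dig output read on stdin.
dig_query_ms() {
    awk '/Query time:/ { print $4 }'
}

# Print "<resolver> <ms>" for each resolver; the first argument is the domain.
# Needs dig and network access. Prints "timeout" when no answer arrives.
compare_resolvers() {
    domain=$1
    shift
    for r in "$@"; do
        ms=$(dig "@$r" "$domain" +tries=1 +time=3 2>/dev/null | dig_query_ms)
        printf '%s %s\n' "$r" "${ms:-timeout}"
    done
}
```

For example, compare_resolvers somedomain.com 8.8.8.8 1.1.1.1 208.67.222.222 queries Google, Cloudflare and OpenDNS in turn. A resolver that is consistently slow or times out from an affected user's network is a strong hint that the problem sits on the client side rather than in Nginx.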

For EC2 instances, check that:

  1. Security groups allow all required IP ranges
  2. The Elastic Load Balancer (if used) has a suitable idle timeout
  3. VPC network ACLs aren't blocking connections
  4. The instance has enough network bandwidth
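
For the load balancer item, a common rule of thumb is that the backend's keepalive timeout should be longer than the load balancer's idle timeout (60 seconds by default on ELB), so that the load balancer, not Nginx, is the side that closes idle connections. The sketch below checks that rule against a config file. The function names are hypothetical, and the parser is deliberately simple: it only understands values written as plain seconds, such as 75 or 75s.

```shell
# Print the first keepalive_timeout value, in seconds, from an nginx config on stdin.
# Values with other units (e.g. 1m) are not recognized and yield no output.
nginx_keepalive_seconds() {
    sed -n 's/^[[:space:]]*keepalive_timeout[[:space:]][[:space:]]*\([0-9][0-9]*\)s\{0,1\}[[:space:];].*/\1/p' | head -n 1
}

# Compare keepalive_timeout (config on stdin) with the load balancer idle timeout.
# $1 is the idle timeout in seconds (default 60). Prints ok, too-short or unknown.
check_keepalive_vs_lb() {
    lb_idle=${1:-60}
    ka=$(nginx_keepalive_seconds)
    if [ -z "$ka" ]; then
        echo "unknown"
    elif [ "$ka" -gt "$lb_idle" ]; then
        echo "ok"
    else
        echo "too-short"
    fi
}
```

Usage: check_keepalive_vs_lb 60 < /etc/nginx/nginx.conf. With the keepalive_timeout 75s used in this article and the default 60 second idle timeout, the result is ok.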

Enable detailed logging by adding to your server block:

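server {
    # The debug level only produces output when nginx is built with --with-debug
    error_log /var/log/nginx/error.log debug;
    # "upstream_time" must be defined with log_format in the http block,
    # otherwise nginx -t rejects this line
    access_log /var/log/nginx/access.log upstream_time;
    
    location / {
        # $upstream_addr is empty unless the request is proxied
        add_header X-Upstream-Addr $upstream_addr;
        add_header X-Request-Start $msec;
    }
}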

Then monitor with: tail -f /var/log/nginx/error.log | grep -E "timed out|closed (keepalive )?connection"
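
Watching the log live shows that the messages occur, but not how often each one does. As a rough sketch, the function below (the name summarize_client_errors is made up for this example) counts the three message types quoted at the top of this article from log lines on stdin.

```shell
# Count client-side error-log entries by message type; reads log lines on stdin.
summarize_client_errors() {
    awk '
        /client timed out/                                   { timed_out++ }
        /client closed connection while waiting for request/ { closed_waiting++ }
        /client closed keepalive connection/                 { closed_keepalive++ }
        END {
            printf "timed_out=%d closed_waiting=%d closed_keepalive=%d\n",
                   timed_out, closed_waiting, closed_keepalive
        }'
}
```

Usage: summarize_client_errors < /var/log/nginx/error.log. Running it before and after a configuration change gives a simple way to tell whether the change made any difference.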

For production environments handling unreliable networks:

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    # Only affects names Nginx itself resolves (e.g. upstream hostnames), not client DNS
    resolver 8.8.8.8 1.1.1.1 valid=300s;
    resolver_timeout 10s;
    
    # Existing timeouts from earlier
    # Plus:
    proxy_connect_timeout 60s;
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
}

Remember to test changes with nginx -t before reloading.


When users report intermittent connectivity issues (requiring 10+ attempts) while administrators can't reproduce locally, we're typically dealing with network-level problems. The NGINX error logs show two distinct patterns:

# Pattern 1: Timeout during request waiting phase
[info] 6940#0: *150649 client timed out (110: Connection timed out) while waiting for request

# Pattern 2: Premature connection closure
[info] 6940#0: *150670 client closed connection while waiting for request

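The reported improvement after switching to Google DNS (8.8.8.8) suggests a DNS resolution problem on the client side. Note that Nginx's resolver directive only controls how Nginx itself resolves names (for example, upstream hostnames); it does not change how clients find the site. If Nginx does resolve names at runtime on AWS, it can use the Amazon-provided DNS server inside the VPC: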

# Recommended AWS DNS resolver configuration
resolver 169.254.169.253 valid=300s;
resolver_timeout 10s;

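The "client closed connection" messages are logged at info level and are often harmless, since browsers routinely open and drop idle connections. If they line up with user reports, though, the connection-level parameters in the NGINX configuration are worth reviewing: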

http {
    # Timeout adjustments
    client_header_timeout 60s;
    client_body_timeout 60s;
    keepalive_timeout 75s;
    keepalive_requests 100;
    
    # Buffer sizes
    client_header_buffer_size 4k;
    large_client_header_buffers 4 16k;
    
    # TCP optimizations
    tcp_nodelay on;
    tcp_nopush on;
}

The issue might originate from intermediate systems. These NGINX directives help diagnose and mitigate such problems:

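server {
    # so_keepalive takes keepidle:keepintvl:keepcnt; the count of 3 is an example value
    listen 80 deferred reuseport so_keepalive=5s:5s:3;
    proxy_buffer_size 16k;
    proxy_buffers 8 16k;
    
    # Log connection-level detail (needs a build with --with-debug).
    # To limit debug output to one client, put "debug_connection xx.xxx.xxx.xx;"
    # in the events block; it is not a parameter of error_log.
    error_log /var/log/nginx/tcp_errors.log debug;
}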

Add these logging directives to capture detailed timing information (log_format must be declared in the http block):

log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" '
                          'rt=$request_time uct="$upstream_connect_time"';
                          
access_log /var/log/nginx/access.log timed_combined;
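
Once the rt= field is in the access log, slow requests can be pulled out with a short filter. This is a sketch that assumes the exact timed_combined layout above, where rt= is directly followed by uct=; the function name slow_requests is hypothetical.

```shell
# Print access-log lines whose rt= (request time, seconds) exceeds a threshold.
# $1 is the threshold in seconds (default 1). Reads log lines on stdin.
slow_requests() {
    threshold=${1:-1}
    awk -v t="$threshold" '
        match($0, / rt=[0-9.]+ uct=/) {
            # Skip the 4 characters of " rt=" and drop the 5 of " uct="
            rt = substr($0, RSTART + 4, RLENGTH - 9)
            if (rt + 0 > t + 0) print
        }'
}
```

Usage: slow_requests 2 < /var/log/nginx/access.log. Requests that appear here around the time users report failures point to a slow backend, while an absence of slow requests points back toward the network path or DNS.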

Check these kernel parameters on the AWS instance:

# Verify TCP stack configuration
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_keepalive_intvl
sysctl net.ipv4.tcp_keepalive_probes

# Temporary adjustment example
sysctl -w net.ipv4.tcp_keepalive_time=60
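
To see what these values mean in practice, it helps to work out how long the kernel takes to give up on an idle peer that has stopped responding: tcp_keepalive_time plus tcp_keepalive_intvl multiplied by tcp_keepalive_probes. The sketch below does that arithmetic. The function names are hypothetical, and the second helper assumes a Linux /proc filesystem. Keep in mind that these settings only apply to sockets with SO_KEEPALIVE enabled, which Nginx turns on through the so_keepalive parameter of listen.

```shell
# Seconds until an idle, unresponsive peer is declared dead:
# keepalive_time + keepalive_intvl * keepalive_probes
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# Same calculation using the live kernel values (Linux only).
current_keepalive_detect_seconds() {
    keepalive_detect_seconds \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_time)" \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)" \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)"
}
```

With the usual Linux defaults, keepalive_detect_seconds 7200 75 9 gives 7875 seconds, a little over two hours. After the temporary adjustment shown above, keepalive_detect_seconds 60 75 9 gives 735 seconds.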