High-Performance Nginx Tuning: Scaling to 12,000+ Requests/Second for Analytics Systems


When building analytics systems handling 1B+ daily hits (12K RPS sustained), every layer of your stack needs surgical optimization. Here's how we squeezed maximum performance from Nginx, Gunicorn, and Redis on commodity hardware.

When benchmarking on a 2GB Ubuntu 10.04 server (non-production), we observed:

Static files: 1,200 requests/sec
Tracking endpoint: 600 requests/sec
Long-running test degradation: ~250 requests/sec
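
These figures come from synthetic load tests; a run of roughly this shape reproduces them (ApacheBench and the endpoint URL here are assumptions, the original tool isn't named):

# 100k requests, 100 concurrent, keep-alive enabled
ab -k -n 100000 -c 100 http://127.0.0.1/track/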

Essential settings for high-throughput scenarios:

worker_processes auto;
worker_connections 65535;
multi_accept on;
use epoll;
keepalive_timeout 30;
keepalive_requests 1000;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
reset_timedout_connection on;
client_body_timeout 10;
client_header_timeout 10;
send_timeout 5;
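
Note that these directives do not all live at the same level of nginx.conf: worker_processes sits at the top level, worker_connections belongs in the events block, and the rest go under http. A count of 65535 connections per worker also requires raising the worker file-descriptor limit. A minimal skeleton showing the placement:

worker_processes auto;
worker_rlimit_nofile 100000;  # enough file descriptors for 65535 connections per worker

events {
    worker_connections 65535;
    multi_accept on;
    use epoll;
}

http {
    keepalive_timeout 30;
    keepalive_requests 1000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    reset_timedout_connection on;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 5;
}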

Critical sysctl parameters to adjust:

# Increase available ports
net.ipv4.ip_local_port_range = 1024 65535

# Socket buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Connection handling
net.core.somaxconn = 65535
# SYN backlog sized on the same order as somaxconn
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 5000

# TCP tweaks
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# Caution: tcp_tw_recycle breaks clients behind NAT (and was removed entirely in Linux 4.12)
net.ipv4.tcp_tw_recycle = 1
# Syncookies off is only safe off the public internet; the revised config below turns them back on
net.ipv4.tcp_syncookies = 0
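
These belong in /etc/sysctl.conf and can be applied live, no reboot required:

# Load the settings and spot-check one of them
sudo sysctl -p /etc/sysctl.conf
sysctl net.core.somaxconn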

For Python/Django applications, the Gunicorn config (a plain Python file):

import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1  # the classic (2 x num_cores) + 1
worker_class = "gevent"
worker_connections = 1000
keepalive = 5
timeout = 30
graceful_timeout = 30
limit_request_line = 4094
limit_request_fields = 100
limit_request_field_size = 8190
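
Saving the above as gunicorn_conf.py (filename assumed) makes the launch a one-liner; the Django WSGI path below is also a placeholder:

gunicorn -c gunicorn_conf.py myproject.wsgi:application

The gevent worker class additionally requires gevent itself to be installed alongside gunicorn.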

The 250 RPS degradation during long tests typically stems from:

  1. TCP connection buildup: implement proper connection recycling
  2. Redis blocking operations: use pipelining and async writes (see the sketch after this list)
  3. Socket exhaustion: monitor with netstat -ant | awk '{print $6}' | sort | uniq -c
  4. Ephemeral port exhaustion: check with ss -s
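
On point 2: with redis-py, batching tracking writes through a pipeline collapses N round trips into one, which is exactly what keeps a hot tracking endpoint from blocking. A minimal sketch (key names and the hits structure are made up for illustration):

import redis

r = redis.Redis(host='127.0.0.1', port=6379)

hits = [{'page': '/home'}, {'page': '/pricing'}, {'page': '/home'}]

# transaction=False gives plain pipelining without MULTI/EXEC overhead
pipe = r.pipeline(transaction=False)
for hit in hits:
    pipe.hincrby('stats:%s' % hit['page'], 'views', 1)
pipe.execute()  # a single network round trip for all queued commands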

Essential monitoring commands:

# Real-time connection tracking
watch -n 1 "netstat -ant | awk '{print \$6}' | sort | uniq -c"

# Nginx status
curl http://localhost/nginx_status

# Redis memory
redis-cli info memory

# System performance
dstat -tcp --top-cpu --top-mem --top-io
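
The /nginx_status URL above only exists if you expose it yourself; a minimal location block using the stub_status module (the path and localhost-only ACL are conventions, not requirements):

location /nginx_status {
    stub_status on;   # ngx_http_stub_status_module
    access_log off;
    allow 127.0.0.1;
    deny all;
}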

When single-node optimization isn't enough:

  1. Implement DNS round-robin
  2. Use consistent hashing for Redis sharding (see the sketch after this list)
  3. Consider LVS (Linux Virtual Server) for load balancing
  4. Implement proper health checks
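
On point 2: consistent hashing keeps most keys pinned to the same shard when a Redis node joins or leaves the pool. A toy Python hash ring to illustrate the idea (node names are placeholders, not production code):

import bisect
import hashlib

class HashRing(object):
    """Minimal consistent-hash ring for client-side Redis sharding."""

    def __init__(self, nodes, replicas=100):
        # Give each node several points on the ring for smoother balance
        self.ring = sorted(
            (self._hash('%s:%d' % (node, i)), node)
            for node in nodes for i in range(replicas)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def get_node(self, key):
        # First ring point clockwise of the key's hash, wrapping around
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.hashes)
        return self.ring[idx][1]

ring = HashRing(['redis-1:6379', 'redis-2:6379', 'redis-3:6379'])
print(ring.get_node('user:12345'))  # the same key always maps to the same shard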

For reference, here is the complete Nginx configuration in context:

# Core tuning parameters
worker_processes auto;
worker_rlimit_nofile 100000;
events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    access_log off;
    error_log /dev/null crit;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 30;
    keepalive_requests 1000;
    reset_timedout_connection on;
    client_body_timeout 10;
    send_timeout 2;
    
    # Redis upstream
    upstream redis_tracker {
        server 127.0.0.1:6379;
        keepalive 100;
    }
}
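
One caveat on that upstream: stock nginx does not speak the Redis protocol, so redis_tracker is assumed to be paired with a third-party module such as redis2-nginx-module. A hypothetical tracking endpoint built on it would look roughly like:

location /track {
    redis2_query incr pageviews;  # issue INCR per hit
    redis2_pass redis_tracker;    # reuses the keepalive upstream above
}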

Critical sysctl adjustments (the revised values):

# Increase file descriptors
fs.file-max = 100000

# TCP stack optimization
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 1024 65535

The revised Gunicorn settings:

# High-throughput async worker
workers = 8
worker_class = "gevent"
worker_connections = 1000
keepalive = 30
timeout = 30

Key settings in redis.conf:

maxmemory-policy volatile-lru
# Disable RDB persistence (redis.conf does not allow inline comments)
save ""
maxclients 10000
tcp-keepalive 300
hz 10
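
Most of these can be changed on a running instance via CONFIG SET, no restart needed (maxclients is the notable exception on older Redis versions):

redis-cli config set maxmemory-policy volatile-lru
redis-cli config set save ""
redis-cli config get maxmemory-policy  # verify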

The 250 RPS drop during long tests was traced to:

  • TCP port exhaustion (solved with ip_local_port_range)
  • Connection buildup in TIME_WAIT (tcp_tw_reuse)
  • Redis memory fragmentation (volatile-lru policy)

After optimizations:

Test                 Before      After
Static files         1,200 RPS   8,500 RPS
Tracking endpoint    600 RPS     4,200 RPS
1M request test      250 RPS     3,800 RPS