High-Performance Nginx Tuning: Scaling to 12,000+ Requests/Second for Analytics Systems


When building analytics systems handling 1B+ daily hits (12K RPS sustained), every layer of your stack needs surgical optimization. Here's how we squeezed maximum performance from Nginx, Gunicorn, and Redis on commodity hardware.

When benchmarking on a 2GB Ubuntu 10.04 server (non-production), we observed:

Static files: 1,200 requests/sec
Tracking endpoint: 600 requests/sec
Long-running test degradation: ~250 requests/sec
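
These figures come from synthetic load tests; a run of roughly this shape reproduces them (ApacheBench and the endpoint URL here are assumptions, the original tool isn't named):

# 100k requests, 100 concurrent, keep-alive enabled
ab -k -n 100000 -c 100 http://127.0.0.1/track/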

Essential settings for high-throughput scenarios:

worker_processes auto;
worker_connections 65535;
multi_accept on;
use epoll;
keepalive_timeout 30;
keepalive_requests 1000;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
reset_timedout_connection on;
client_body_timeout 10;
client_header_timeout 10;
send_timeout 5;
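
Note that these directives do not all live at the same level of nginx.conf: worker_processes sits at the top level, worker_connections belongs in the events block, and the rest go under http. A count of 65535 connections per worker also requires raising the worker file-descriptor limit. A minimal skeleton showing the placement:

worker_processes auto;
worker_rlimit_nofile 100000;  # enough file descriptors for 65535 connections per worker

events {
    worker_connections 65535;
    multi_accept on;
    use epoll;
}

http {
    keepalive_timeout 30;
    keepalive_requests 1000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    reset_timedout_connection on;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 5;
}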

Critical sysctl parameters to adjust:

# Increase available ports
net.ipv4.ip_local_port_range = 1024 65535

# Socket buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Connection handling
net.core.somaxconn = 65535
# SYN backlog sized on the same order as somaxconn
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 5000

# TCP tweaks
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# Caution: tcp_tw_recycle breaks clients behind NAT (and was removed entirely in Linux 4.12)
net.ipv4.tcp_tw_recycle = 1
# Syncookies off is only safe off the public internet; the revised config below turns them back on
net.ipv4.tcp_syncookies = 0
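
These belong in /etc/sysctl.conf and can be applied live, no reboot required:

# Load the settings and spot-check one of them
sudo sysctl -p /etc/sysctl.conf
sysctl net.core.somaxconn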

For Python/Django applications, the Gunicorn config (a plain Python file):

import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1  # the classic (2 x num_cores) + 1
worker_class = "gevent"
worker_connections = 1000
keepalive = 5
timeout = 30
graceful_timeout = 30
limit_request_line = 4094
limit_request_fields = 100
limit_request_field_size = 8190
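
Saving the above as gunicorn_conf.py (filename assumed) makes the launch a one-liner; the Django WSGI path below is also a placeholder:

gunicorn -c gunicorn_conf.py myproject.wsgi:application

The gevent worker class additionally requires gevent itself to be installed alongside gunicorn.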

The 250 RPS degradation during long tests typically stems from:

  1. TCP connection buildup: implement proper connection recycling
  2. Redis blocking operations: use pipelining and async writes (see the sketch after this list)
  3. Socket exhaustion: monitor with netstat -ant | awk '{print $6}' | sort | uniq -c
  4. Ephemeral port exhaustion: check with ss -s
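
On point 2: with redis-py, batching tracking writes through a pipeline collapses N round trips into one, which is exactly what keeps a hot tracking endpoint from blocking. A minimal sketch (key names and the hits structure are made up for illustration):

import redis

r = redis.Redis(host='127.0.0.1', port=6379)

hits = [{'page': '/home'}, {'page': '/pricing'}, {'page': '/home'}]

# transaction=False gives plain pipelining without MULTI/EXEC overhead
pipe = r.pipeline(transaction=False)
for hit in hits:
    pipe.hincrby('stats:%s' % hit['page'], 'views', 1)
pipe.execute()  # a single network round trip for all queued commands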

Essential monitoring commands:

# Real-time connection tracking
watch -n 1 "netstat -ant | awk '{print \$6}' | sort | uniq -c"

# Nginx status
curl http://localhost/nginx_status

# Redis memory
redis-cli info memory

# System performance
dstat -tcp --top-cpu --top-mem --top-io
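
The /nginx_status URL above only exists if you expose it yourself; a minimal location block using the stub_status module (the path and localhost-only ACL are conventions, not requirements):

location /nginx_status {
    stub_status on;   # ngx_http_stub_status_module
    access_log off;
    allow 127.0.0.1;
    deny all;
}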

When single-node optimization isn't enough:

  1. Implement DNS round-robin
  2. Use consistent hashing for Redis sharding (see the sketch after this list)
  3. Consider LVS (Linux Virtual Server) for load balancing
  4. Implement proper health checks
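
On point 2: consistent hashing keeps most keys pinned to the same shard when a Redis node joins or leaves the pool. A toy Python hash ring to illustrate the idea (node names are placeholders, not production code):

import bisect
import hashlib

class HashRing(object):
    """Minimal consistent-hash ring for client-side Redis sharding."""

    def __init__(self, nodes, replicas=100):
        # Give each node several points on the ring for smoother balance
        self.ring = sorted(
            (self._hash('%s:%d' % (node, i)), node)
            for node in nodes for i in range(replicas)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def get_node(self, key):
        # First ring point clockwise of the key's hash, wrapping around
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.hashes)
        return self.ring[idx][1]

ring = HashRing(['redis-1:6379', 'redis-2:6379', 'redis-3:6379'])
print(ring.get_node('user:12345'))  # the same key always maps to the same shard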

For reference, here is the complete Nginx configuration in context:

# Core tuning parameters
worker_processes auto;
worker_rlimit_nofile 100000;
events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    access_log off;
    error_log /dev/null crit;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 30;
    keepalive_requests 1000;
    reset_timedout_connection on;
    client_body_timeout 10;
    send_timeout 2;
    
    # Redis upstream
    upstream redis_tracker {
        server 127.0.0.1:6379;
        keepalive 100;
    }
}
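
One caveat on that upstream: stock nginx does not speak the Redis protocol, so redis_tracker is assumed to be paired with a third-party module such as redis2-nginx-module. A hypothetical tracking endpoint built on it would look roughly like:

location /track {
    redis2_query incr pageviews;  # issue INCR per hit
    redis2_pass redis_tracker;    # reuses the keepalive upstream above
}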

Critical sysctl adjustments (the revised values):

# Increase file descriptors
fs.file-max = 100000

# TCP stack optimization
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 1024 65535

The revised Gunicorn settings:

# High-throughput async worker
workers = 8
worker_class = "gevent"
worker_connections = 1000
keepalive = 30
timeout = 30

Key settings in redis.conf:

maxmemory-policy volatile-lru
# Disable RDB persistence (redis.conf does not allow inline comments)
save ""
maxclients 10000
tcp-keepalive 300
hz 10
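
Most of these can be changed on a running instance via CONFIG SET, no restart needed (maxclients is the notable exception on older Redis versions):

redis-cli config set maxmemory-policy volatile-lru
redis-cli config set save ""
redis-cli config get maxmemory-policy  # verify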

The 250 RPS drop during long tests was traced to:

  • TCP port exhaustion (solved with ip_local_port_range)
  • Connection buildup in TIME_WAIT (tcp_tw_reuse)
  • Redis memory fragmentation (volatile-lru policy)

After optimizations:

Test                 Before      After
Static files         1,200 RPS   8,500 RPS
Tracking endpoint    600 RPS     4,200 RPS
1M request test      250 RPS     3,800 RPS