When benchmarking on a 2 GB Ubuntu 10.04 server (non-production), we observe:
- Static files: 1,200 requests/sec
- Tracking endpoint: 600 requests/sec
- Long-running test degradation: ~250 requests/sec
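These numbers come from plain HTTP load tests; a minimal run with ApacheBench looks roughly like this (the URLs and concurrency are placeholders, and wrk or siege work just as well):
# Static file benchmark: 100k requests, 200 concurrent, keep-alive on
ab -n 100000 -c 200 -k http://127.0.0.1/static/pixel.gif
# Tracking endpoint benchmark
ab -n 100000 -c 200 -k "http://127.0.0.1/track?id=12345"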
Essential settings for high-throughput scenarios:
worker_processes auto;
worker_connections 65535;
multi_accept on;
use epoll;
keepalive_timeout 30;
keepalive_requests 1000;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
reset_timedout_connection on;
client_body_timeout 10;
client_header_timeout 10;
send_timeout 5;
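One caveat: worker_connections 65535 only takes effect if each worker process is allowed to open that many file descriptors, so pair it with a matching per-worker limit (the value below is illustrative):
# Each connection costs at least one descriptor (two when proxying),
# so the per-worker limit must cover worker_connections.
worker_rlimit_nofile 100000;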
Critical sysctl parameters to adjust:
# Increase available ports
net.ipv4.ip_local_port_range = 1024 65535
# Socket buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Connection handling
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 3240000
net.core.netdev_max_backlog = 5000
# TCP tweaks
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 0
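To apply these without a reboot and keep them across reboots, the usual pattern is:
# Apply a single value immediately
sudo sysctl -w net.core.somaxconn=65535
# Persist: add the lines above to /etc/sysctl.conf, then reload
sudo sysctl -p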
For Python/Django applications:
workers = (2 x $num_cores) + 1
worker_class = "gevent"
worker_connections = 1000
keepalive = 5
timeout = 30
graceful_timeout = 30
limit_request_line = 4094
limit_request_fields = 100
limit_request_field_size = 8190
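A minimal gunicorn.conf.py pulling these together (the file name and WSGI module below are placeholders, not taken from the original setup):
# gunicorn.conf.py -- sketch of the settings above; adjust values to your hardware.
import multiprocessing

workers = (2 * multiprocessing.cpu_count()) + 1   # (2 x num_cores) + 1
worker_class = "gevent"
worker_connections = 1000
keepalive = 5
timeout = 30
graceful_timeout = 30
limit_request_line = 4094
limit_request_fields = 100
limit_request_field_size = 8190
Started with something along the lines of: gunicorn -c gunicorn.conf.py myproject.wsgi:application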
The 250 RPS degradation during long tests typically stems from:
- TCP connection buildup: implement proper connection recycling
- Redis blocking operations: use pipelining and asynchronous writes (see the sketch after this list)
- Socket exhaustion: monitor with
netstat -ant | awk '{print $6}' | sort | uniq -c
- Ephemeral port exhaustion: check with
ss -s
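For the Redis item above, a minimal sketch of batching writes with a redis-py pipeline instead of issuing one round-trip per hit (the client setup and key names are assumptions):
import redis

# One shared pool per worker process avoids a TCP handshake per request.
pool = redis.ConnectionPool(host="127.0.0.1", port=6379, max_connections=100)
r = redis.Redis(connection_pool=pool)

def record_hits(hit_ids):
    # Send many INCRs in a single round-trip; transaction=False skips MULTI/EXEC.
    pipe = r.pipeline(transaction=False)
    for hit_id in hit_ids:
        pipe.incr("hits:%s" % hit_id)
    pipe.execute()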
Essential monitoring commands:
# Real-time connection tracking
watch -n 1 "netstat -ant | awk '{print \$6}' | sort | uniq -c"
# Nginx status
curl http://localhost/nginx_status
# Redis memory
redis-cli info memory
# System performance
dstat -tcp --top-cpu --top-mem --top-io
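The nginx_status URL only answers if stub_status is configured (the module ships with most packages but is not enabled by default); a typical location block, restricted to localhost:
# Expose basic connection counters on localhost only
location /nginx_status {
    stub_status on;
    allow 127.0.0.1;
    deny all;
}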
When single-node optimization isn't enough:
- Implement DNS round-robin
- Use consistent hashing for Redis sharding (a rough sketch follows this list)
- Consider LVS (Linux Virtual Server) for load balancing
- Implement proper health checks
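For the sharding point, a rough consistent-hash ring in Python; the node addresses and replica count are illustrative, and production setups usually rely on a library or the Redis client's own sharding support instead:
import hashlib
from bisect import bisect

NODES = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]
REPLICAS = 100  # virtual nodes per server smooth out the key distribution

# Precompute the ring: (hash, node) pairs sorted by hash.
_ring = sorted(
    (int(hashlib.md5(("%s#%d" % (node, i)).encode()).hexdigest(), 16), node)
    for node in NODES
    for i in range(REPLICAS)
)
_points = [point for point, _ in _ring]

def node_for(key):
    # Map a key to the first ring point at or after its hash (wrapping around).
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return _ring[bisect(_points, h) % len(_ring)][1]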
When building analytics systems handling 1B+ daily hits (12K RPS sustained), every layer of your stack needs surgical optimization. Here's how we squeezed maximum performance from Nginx, Gunicorn, and Redis on commodity hardware.
Initial benchmarks showed:
- 1,200 RPS for static files (Nginx alone)
- 600 RPS for the Redis tracking endpoint
- Performance degradation to 250 RPS during prolonged tests
# Core tuning parameters
worker_processes auto;
worker_rlimit_nofile 100000;
events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    access_log off;
    error_log /dev/null crit;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    keepalive_timeout 30;
    keepalive_requests 1000;
    reset_timedout_connection on;
    client_body_timeout 10;
    send_timeout 2;

    # Redis upstream
    upstream redis_tracker {
        server 127.0.0.1:6379;
        keepalive 100;
    }
}
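The redis_tracker upstream and its keepalive pool are only usable straight from nginx if a Redis-speaking module is compiled in. A hypothetical hot-path counter using the third-party redis2-nginx-module (this is an assumption about the build, not part of the original config; without the module, route /track to the Gunicorn app instead):
# Requires nginx built with redis2-nginx-module; returns the raw Redis reply.
location /track {
    set $hit_key "hits:$arg_id";
    redis2_query incr $hit_key;
    redis2_pass redis_tracker;
}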
Critical sysctl adjustments:
# Increase file descriptors
fs.file-max = 100000
# TCP stack optimization
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 1024 65535
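fs.file-max raises the system-wide ceiling, but each process is still capped by its own nofile limit; raise it for the service user as well (the user name and values here are illustrative):
# /etc/security/limits.conf
www-data  soft  nofile  100000
www-data  hard  nofile  100000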
# High-throughput async worker
workers = 8
worker_class = "gevent"
worker_connections = 1000
keepalive = 30
timeout = 30
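Assuming these live in a config file, the service is launched along these lines (the config path and WSGI module are placeholders); note that Gunicorn's gevent worker monkey-patches the standard library on startup, so blocking clients such as redis-py cooperate with the event loop:
# Illustrative launch command
gunicorn -c /etc/gunicorn.d/tracker.py myproject.wsgi:application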
Key settings in redis.conf:
maxmemory-policy volatile-lru
save "" # Disable persistence
maxclients 10000
tcp-keepalive 300
hz 10
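Most of these can be verified (and the eviction policy even changed) on a running instance without a restart:
# Check the live values
redis-cli config get maxmemory-policy
redis-cli config get maxclients
# Apply the eviction policy without restarting
redis-cli config set maxmemory-policy volatile-lru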
The 250 RPS drop during long tests was traced to:
- TCP port exhaustion (solved with ip_local_port_range)
- Connection buildup in TIME_WAIT (tcp_tw_reuse)
- Redis memory fragmentation (volatile-lru policy)
After optimizations:
| Test | Before | After |
|---|---|---|
| Static Files | 1,200 RPS | 8,500 RPS |
| Tracking Endpoint | 600 RPS | 4,200 RPS |
| 1M Request Test | 250 RPS | 3,800 RPS |