Ultimate Guide to Achieving 500K RPS with Nginx: Server Optimization and Benchmarking Techniques


When aiming for 500K requests per second (RPS), several factors come into play:

# Current hardware limitations:
- 100Mbps network interface (max ~12.5MB/s theoretical throughput)
- 4-core CPU with hyperthreading
- Software RAID 1 configuration

Here's an optimized nginx.conf for high throughput:

worker_processes auto;
worker_rlimit_nofile 1000000;

events {
    worker_connections 65536;
    multi_accept on;
    use epoll;
}

http {
    access_log off;
    error_log /var/log/nginx/error.log crit;
    
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 10;
    keepalive_requests 100000;
    
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    
    # Disable all features that aren't needed for static content
    server_tokens off;
    gzip off;
    
    server {
        listen 80 reuseport;
        location / {
            root /var/www/html;
            try_files $uri =404;
        }
    }
}
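A quick sanity check on what these numbers buy you: assuming `worker_processes auto` resolves to 8 on this 4-core/8-thread CPU, the theoretical ceiling on simultaneous connections is:

```shell
# Theoretical ceiling on simultaneous connections (not RPS) for the
# settings above; assumes worker_processes auto resolves to 8 here
workers=8
conns=65536      # worker_connections
echo $(( workers * conns ))   # 524288
```

Each connection can carry many keepalive requests, so this ceiling is about concurrency, not throughput.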

Add these to /etc/sysctl.conf:

net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
fs.file-max = 2097152
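The `ip_local_port_range` line matters most on the load-generating side: every open connection from a single client IP to the same server IP:port consumes one ephemeral source port. With the range above:

```shell
# Usable ephemeral ports per (client IP, server IP:port) pair
low=1024
high=65535
echo $(( high - low + 1 ))   # 64512
```

That is well short of 500K concurrent connections from a single address, which is one reason benchmarking from multiple source IPs helps.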

Different tools yield different results:

# Apache Benchmark (ab)
ab -n 1000000 -c 500 http://localhost/test.txt

# wrk (more modern tool)
wrk -t8 -c1000 -d30s http://localhost/test.txt

# h2load (for HTTP/2 testing)
h2load -n 1000000 -c 1000 -m 100 http://localhost/test.txt

When testing locally:

  • Disable all logging
  • Use RAM disk for test files
  • Monitor CPU affinity with taskset
  • Consider using multiple IP addresses
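A minimal sketch of the RAM-disk tip above. The mount needs root and the paths are illustrative, so it is shown commented out; the test-payload generation itself is portable:

```shell
# sudo mount -t tmpfs -o size=64M tmpfs /var/www/html   # RAM-backed docroot
docroot=$(mktemp -d)                               # stand-in for /var/www/html
head -c 1024 /dev/urandom > "$docroot/test.txt"    # 1 KB test payload
wc -c < "$docroot/test.txt"                        # prints 1024
```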

For production environments nearing 500K RPS:

# Multiple worker processes with CPU affinity
worker_processes 8;
worker_cpu_affinity auto;

# TCP optimizations
listen 80 reuseport so_keepalive=on backlog=65535;

# Zero-copy transfer for small files; directio bypasses the page
# cache for files larger than 4MB (sendfile is not used for those)
sendfile on;
directio 4m;

Essential monitoring commands:

# CPU usage
mpstat -P ALL 1

# Network throughput
iftop -i eth0 -n

# Process limits
cat /proc/$(pgrep -o nginx)/limits   # -o selects the master process PID

# Socket statistics
ss -s

When pushing a web server to its absolute limits, every configuration parameter matters. Let's analyze how we can optimize an Xeon E3-1270 server with Nginx to handle 500,000 requests per second for static content.

Your current setup has decent specs:

- Intel® Xeon® E3-1270, 4 cores / 8 threads @ 3.4 GHz
- 24GB DDR3 ECC RAM
- 100Mbps network (bottleneck warning)

The 100Mbps NIC will theoretically max out at ~12,500 requests/second (assuming 1KB responses and ignoring HTTP and TCP/IP header overhead), so for local benchmarking we'll ignore this limitation.
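The arithmetic behind that figure, taking 1KB as 1000 bytes:

```shell
# 100 Mbit/s link, 1000-byte responses, header overhead ignored
mbps=100
resp_bytes=1000
echo $(( mbps * 1000 * 1000 / 8 / resp_bytes ))   # 12500 requests/second
```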

Your current config shows good practices, but let's enhance it further:

worker_processes auto; # Match CPU cores
worker_rlimit_nofile 1000000; # Increase open file limit

events {
    worker_connections 65536;
    multi_accept on;
    use epoll;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_requests 100000;
    keepalive_timeout 65;
    
    access_log off; # Disable during benchmarks
    error_log /dev/null crit; # Minimal error logging
    
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors off;
}

Add these to /etc/sysctl.conf:

net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12; do not enable it
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 40000
net.ipv4.tcp_max_tw_buckets = 2000000
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65536
net.ipv4.tcp_no_metrics_save = 1
fs.file-max = 1000000
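After editing /etc/sysctl.conf, load the values with `sudo sysctl -p`, then read them back from procfs to confirm they took effect (read-only, no root required):

```shell
# Current kernel values straight from procfs (Linux only)
for f in net/core/somaxconn net/ipv4/tcp_max_syn_backlog fs/file-max; do
    printf '%s = %s\n' "$f" "$(cat "/proc/sys/$f")"
done
```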

Different load-testing tools yield markedly different results:

  • ab: Simple but limited (~35K RPS)
  • wrk: More efficient (~165K RPS)
  • vegeta: Distributed testing capability
  • k6: Modern load testing tool

For extreme performance:

# Use memory-backed filesystem for temporary files
mount -t tmpfs -o size=512M tmpfs /var/lib/nginx/tmp
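To make the tmpfs mount persist across reboots, an equivalent /etc/fstab entry (same path and size as above; mode and ownership left to defaults):

```
tmpfs  /var/lib/nginx/tmp  tmpfs  size=512M  0  0
```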

# CPU affinity binding
worker_cpu_affinity auto;

# Zero-copy optimizations
sendfile_max_chunk 512k;

Use these commands simultaneously:

dstat -cmdn --top-cpu --top-mem --top-io
perf top -p $(pgrep -d, nginx)
iftop -nNP

Through these optimizations, you should see significant improvements in request handling capacity. The exact numbers will depend on your specific workload and environment.