When testing NGINX with 200+ concurrent connections using blitz.io, we're observing significant timeout issues despite adequate server resources. The symptoms suggest a configuration bottleneck rather than hardware limitations.
Let's examine the key parameters that need adjustment:
# System-level TCP optimizations
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65536
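Values applied with sysctl -w are lost at reboot, so it is worth persisting them as well; a minimal sketch, assuming a distro that reads drop-ins from /etc/sysctl.d/ (the file name is arbitrary):
# Apply a single value immediately
sysctl -w net.core.somaxconn=65535
# Persist the whole set, e.g. in /etc/sysctl.d/99-nginx-tuning.conf, then reload
sysctl --system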
Here's an optimized NGINX configuration template for high concurrency:
worker_processes auto;
worker_rlimit_nofile 100000;

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    keepalive_timeout 30;
    keepalive_requests 100;

    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    client_body_timeout 15;
    client_header_timeout 15;
    send_timeout 15;
    reset_timedout_connection on;

    # Buffer sizes
    client_body_buffer_size 128k;
    client_header_buffer_size 8k;
    large_client_header_buffers 8 16k;
    output_buffers 4 32k;
    postpone_output 1460;

    # Gzip settings
    gzip on;
    gzip_min_length 10240;
    gzip_proxied expired no-cache no-store private auth;
    gzip_types text/plain text/css text/xml text/javascript application/json;
    gzip_disable "msie6";
    gzip_vary on;
}
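One thing this template does not set is the listen backlog: raising net.core.somaxconn has no effect by itself, because the listen directive asks for a backlog of only 511 by default on Linux. Below is a minimal sketch of a matching server block plus the usual test-and-reload cycle; the port, document root, and backlog value are assumptions:
server {
    listen 80 backlog=65535;      # request an accept queue up to the somaxconn limit
    server_name _;
    root /usr/share/nginx/html;   # assumed document root
}
# Check syntax, then apply gracefully
nginx -t && nginx -s reload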
These sysctl settings raise the kernel's packet and SYN backlogs, widen the ephemeral port range, and shorten TCP keepalive probing:
# Increase the number of incoming connections backlog
net.core.netdev_max_backlog = 65536
# Increase maximum amount of option memory buffers
net.core.optmem_max = 25165824
# Increase the maximum number of remembered connection requests
net.ipv4.tcp_max_syn_backlog = 65536
# Increase the local port range
net.ipv4.ip_local_port_range = 1024 65535
# Reduce TCP keepalive time
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
When benchmarking with blitz.io, use these recommended parameters:
# Test command example
blitz -k -n 10000 -c 500 -t 60 http://yourserver.com/test.txt
# Where:
# -k : keep-alive connections
# -n : total requests
# -c : concurrent connections
# -t : timeout in seconds
Essential commands to monitor performance:
# Watch active connections
watch -n 1 "netstat -n | awk '/^tcp/ {++S[\$NF]} END {for(a in S) print a, S[a]}'"
# Tail the NGINX access and error logs
tail -f /var/log/nginx/{access,error}.log
# Check system limits
cat /proc/$(cat /var/run/nginx.pid)/limits
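Tailing logs shows errors but no live counters; the stub_status module reports active, reading, writing, and waiting connections during the run. A minimal sketch, assuming nginx was built with --with-http_stub_status_module (most distro packages include it); drop the location into any server block:
location = /nginx_status {
    stub_status;
    allow 127.0.0.1;   # expose only to local monitoring
    deny all;
}
# Poll it while the benchmark runs
watch -n 1 curl -s http://127.0.0.1/nginx_status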
Common mistakes that cause timeouts under load:
- Setting worker_connections higher than worker_rlimit_nofile
- Forgetting to raise the nginx user's ulimit (see the limits.conf sketch after this list)
- Not enabling keepalive connections
- Using default TCP stack settings
- Overlooking file descriptor limits at system level
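For the descriptor-limit pitfalls above, the per-user limits usually live in /etc/security/limits.conf; the values below simply mirror worker_rlimit_nofile and assume the workers run as the nginx user:
# /etc/security/limits.conf
nginx  soft  nofile  100000
nginx  hard  nofile  100000
# Confirm from a shell running as that user
su -s /bin/sh -c 'ulimit -n' nginx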
When dealing with timeout issues in NGINX under high concurrency (>200 connections), we need to examine multiple layers of the stack. From your configuration and test results, several potential culprits emerge:
# Key metrics during test:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20225 nginx 20 0 48140 6248 1672 S 16.0 0.0 0:21.68 nginx
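To capture the same view repeatedly during a run, top can be pointed at just the nginx processes; a hedged one-liner, using pgrep -d',' to build the comma-separated PID list that top -p expects:
top -b -n 1 -p "$(pgrep -d',' nginx)"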
Your current sysctl configuration is good but needs refinement for extreme concurrency:
# Critical TCP stack optimizations
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_window_scaling = 1
net.core.netdev_max_backlog = 65536
The worker configuration should be sized to your server's CPU core count and file descriptor limits:
worker_processes auto;          # Better than a static count
worker_rlimit_nofile 100000;    # Match ulimit settings

events {
    worker_connections 65536;
    use epoll;          # Critical for Linux
    multi_accept on;
    accept_mutex off;   # For high contention scenarios
}
For static file serving under load, these directives are crucial:
http {
    sendfile on;                  # sendfile_max_chunk has no effect unless sendfile is enabled
    sendfile_max_chunk 512k;
    tcp_nopush on;
    tcp_nodelay on;
    reset_timedout_connection on;

    # Keepalive tuning
    keepalive_requests 100000;
    keepalive_timeout 30s;
}
When testing with blitz.io, consider these parameters:
# Recommended test command:
--region ireland --rampup 1-1000:30 --hold-for 60s \
-T 5000 --timeout 45 http://dev.anuary.com/test.txt
# Key metrics to monitor:
- TCP retransmits (netstat -s | grep retransmit)
- Listen queue depth (ss -lntp | grep nginx; the Recv-Q column is the current accept backlog)
- File descriptor usage (ls /proc/$(pgrep -o nginx)/fd | wc -l; -o selects the master so the substitution yields a single PID)
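These three checks can be collapsed into one sampling loop; a rough shell sketch, where the 5-second interval and port 80 are assumptions and reading /proc/<pid>/fd may require root:
while true; do
    date
    netstat -s | grep -i retransmit         # cumulative TCP retransmit counters
    ss -lnt 'sport = :80'                   # Recv-Q column = current accept-queue depth
    ls /proc/$(pgrep -o nginx)/fd | wc -l   # open descriptors of the nginx master
    sleep 5
done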
When timeouts persist, enable these diagnostic tools:
# In nginx.conf:
error_log /var/log/nginx/error.log debug;
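Debug-level logging at several hundred concurrent connections produces an enormous log; if the binary was built with --with-debug (check nginx -V), the debug_connection directive can restrict it to a single client such as the load generator (the address below is a placeholder):
events {
    debug_connection 203.0.113.10;   # placeholder: IP of the benchmarking client
}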
# Monitor kernel drops:
watch -n 1 'netstat -s | grep -iE "drop|overflow"'
# Real-time connection states:
ss -antop | awk '{print $1}' | sort | uniq -c
After implementing these changes, verify:
- ulimit -n shows at least 65535 for nginx user
- sysctl values are applied (sysctl -p)
- NOFILE limits in /etc/security/limits.conf (see the systemd caveat after this list)
- NGINX worker processes have sufficient memory (check with pmap)
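One caveat on the NOFILE item: when nginx is started by systemd, /etc/security/limits.conf is not consulted for the service, so the limit must be set on the unit itself; a minimal drop-in sketch (path and value are assumptions):
# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=100000
# Reload unit files and restart the service
systemctl daemon-reload
systemctl restart nginx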