When using Nginx as a reverse proxy for Java-based services (particularly those prone to GC pauses), I noticed an interesting pattern with timeout configurations. While proxy_connect_timeout and proxy_read_timeout each behave as expected on their own, their combined behavior is not what you might anticipate.
# Current Nginx configuration snippet
location /service {
    proxy_pass http://backend;
    proxy_connect_timeout 2s;
    proxy_read_timeout 5s;
    proxy_next_upstream timeout;
}
In benchmarks with millions of requests:
- Some requests fail exactly at 2s (connect timeout)
- Others fail exactly at 5s (read timeout)
- But none fail between 2s and 5s, and none near 7s (the sum of both timeouts)
After examining Nginx's source code (version 1.18.0) and confirming the timings with packet-level captures, the behavior makes sense:
// Simplified from ngx_http_upstream.c (the proxy module delegates the
// timeout handling to the generic upstream code)

// In ngx_http_upstream_connect(): the connect() is still in progress,
// so only the connect timer is armed, on the write event.
if (rc == NGX_AGAIN) {
    ngx_add_timer(c->write, u->conf->connect_timeout);
    return;
}

// In ngx_http_upstream_send_request(): the request has been sent, the
// connect timer is gone, and the read timer is armed in its place.
ngx_add_timer(c->read, u->conf->read_timeout);
For Java services with GC pauses:
- If the pause hits while the connection is being established, the request fails at proxy_connect_timeout
- If the pause hits while the backend is processing the request, it fails at proxy_read_timeout
- The two timers cover mutually exclusive phases of the request lifecycle: the connection phase must complete before the read phase begins, and once the connection succeeds Nginx drops the connect timer and arms a separate read timer, so the values never accumulate into a combined 7s window
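As a phase-by-phase reference (a sketch reusing the values above; proxy_send_timeout is shown at its 60s default), keep in mind that the send and read timeouts bound the gap between two successive write or read operations, not the total time for the whole request or response - which is exactly why a long GC pause with no bytes flowing trips them:

# One upstream attempt, phase by phase:
proxy_connect_timeout 2s;   # establishing the connection to the upstream
proxy_send_timeout    60s;  # max gap between two successive writes of the request (default)
proxy_read_timeout    5s;   # max gap between two successive reads of the response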
For better failover handling:

upstream backend {
    server backend1.example.com max_fails=3 fail_timeout=30s;
    server backend2.example.com max_fails=3 fail_timeout=30s;
}

server {
    location /service {
        proxy_pass http://backend;
        proxy_next_upstream timeout non_idempotent;
        proxy_connect_timeout 2s;
        proxy_read_timeout 5s;
        proxy_send_timeout 3s;
    }
}
Add these upstream timing variables to your Nginx log format to track timeout patterns:

log_format timed_proxy '$remote_addr - $upstream_addr [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '$upstream_connect_time $upstream_response_time';
For JVM-based services with particularly long GC pauses, consider this more aggressive alternative that fails over quickly:
upstream backend {
    zone backend 64k;  # shared memory zone required for active health checks (NGINX Plus)
    server 10.0.0.1:8080 max_fails=1 fail_timeout=5s;
    server 10.0.0.2:8080 backup;
}

location /service {
    proxy_pass http://backend;
    proxy_connect_timeout 500ms;
    proxy_read_timeout 1s;
    # health_check is an NGINX Plus directive; open-source nginx does not ship it
    health_check interval=2s fails=1 passes=1 uri=/health;
}
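On open-source nginx, which lacks health_check, a rough approximation (a sketch; the addresses and timeout values are the same illustrative ones as above) is to lean on passive checks plus in-request retries, so a timed-out attempt is retried on the backup:

upstream backend {
    server 10.0.0.1:8080 max_fails=1 fail_timeout=5s;  # passive check: one failure marks it down for 5s
    server 10.0.0.2:8080 backup;                       # receives traffic while the primary is marked down
}

location /service {
    proxy_pass http://backend;
    proxy_connect_timeout 500ms;
    proxy_read_timeout 1s;
    # "error timeout" is the default for proxy_next_upstream, spelled out for clarity;
    # add non_idempotent only if replaying POSTs after a timeout is safe for your service
    proxy_next_upstream error timeout;
    proxy_next_upstream_tries 2;
}

The trade-off is that a client request has to hit the pause before the primary is taken out of rotation, whereas an active health check can catch it between requests.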
Based on my experience with high-traffic systems:
- Set proxy_connect_timeout aggressively low (200-500ms)
- Use proxy_next_upstream_timeout to control the total retry window (see the sketch after this list)
- Combine with active health checks for better failover
- Monitor $upstream_connect_time in logs for tuning
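A sketch of that retry window (the values are illustrative, not recommendations): proxy_next_upstream_timeout and proxy_next_upstream_tries cap how long, and how many times, Nginx keeps passing a single client request to further upstreams.

# Bound the total retry window for one client request:
proxy_next_upstream error timeout;
proxy_next_upstream_timeout 3s;  # give up on further retries after 3s in total
proxy_next_upstream_tries 2;     # and after at most 2 upstream attempts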
If you prefer to extend the standard combined log format instead, here's an example that adds request and upstream connect timing:

log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" '
                          'rt=$request_time uct=$upstream_connect_time';
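Once that format is in place, a further option (a sketch; the log path and the 1s threshold are assumptions, and access_log's if= parameter needs nginx 1.7.0 or newer) is to write a separate log only for requests whose upstream response time reaches one second, so timeout patterns stand out:

# In the http{} block. $upstream_response_time can hold several
# comma-separated values when retries occur; this simple regex only
# looks at the first attempt.
map $upstream_response_time $slow_upstream {
    default     0;
    "~^[1-9]"   1;   # first value is 1 second or more
}

access_log /var/log/nginx/slow_upstream.log timed_combined if=$slow_upstream;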