When your backend service restarts (e.g., during deployments or crashes), Nginx's default behavior is to immediately return a 502 Bad Gateway error if it can't establish a connection. This creates poor user experience and unnecessary failures for transient issues.
Nginx's proxy_next_upstream directive, combined with retry parameters, solves this elegantly:
location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_next_upstream_timeout 10s;
    proxy_connect_timeout 2s;
    proxy_read_timeout 10s;
}
- proxy_next_upstream: Specifies which conditions warrant a retry (error, timeout, or specific HTTP status codes)
- proxy_next_upstream_tries: Limits the total number of attempts, including the initial one, so for N retries (the N in your question) set this to N+1
- proxy_next_upstream_timeout: Total time limit for all retry attempts
- proxy_connect_timeout: Individual connection attempt timeout
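All of the above assumes an upstream group named backend. One caveat I'm fairly sure about: proxy_next_upstream only passes a failed request to another peer in that group, so with a single server entry there is nothing to retry against. A common workaround is to list the same server twice; a minimal sketch (the address is a placeholder for your real backend):

upstream backend {
    # Duplicate entries are treated as separate peers, which gives
    # proxy_next_upstream somewhere to "fail over" to even though
    # there is only one real backend process.
    server 127.0.0.1:5000;
    server 127.0.0.1:5000;
}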
For the M-second delay requirement, we need to combine Nginx with a bit of Lua scripting (requires ngx_http_lua_module, most easily obtained via OpenResty):
location / {
    access_by_lua_block {
        -- Count how many times this client has retried, based on a cookie we set
        local retry_count = tonumber(ngx.var.cookie_retry_count) or 0
        if retry_count > 0 then
            ngx.sleep(2)  -- M seconds delay before re-proxying a retried request
        end
        -- Track the attempt number so the next retry is delayed as well
        ngx.header["Set-Cookie"] = "retry_count=" .. (retry_count + 1)
    }

    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_intercept_errors on;
    error_page 502 = @retry;
}

location @retry {
    internal;
    content_by_lua_block {
        -- The intercepted 502 still goes back to the client; the delay only
        -- happens when the client re-sends the request with the cookie set
        ngx.exit(502)
    }
}
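Keep in mind this is only an approximation of a retry delay: Nginx itself still gives up quickly, and the ngx.sleep only kicks in when the client re-sends the request with the retry_count cookie attached.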
If you're using Nginx Plus, there is (as far as I know) still no per-retry delay, but active health checks get close to the same effect: a restarting backend is detected within a couple of seconds and taken out of rotation until it passes a check again:
location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_next_upstream_timeout 10s;
    health_check interval=2s fails=1 passes=1;
}
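On open-source Nginx you can approximate this with passive health checks. A minimal sketch (addresses are placeholders): after enough failed requests a peer is marked unavailable for fail_timeout, which also acts as a rudimentary circuit breaker:

upstream backend {
    # After 3 failures within 10s, skip this peer for the next 10s.
    # Note: these parameters only take effect when the group has more
    # than one server; a single-server group is never marked down.
    server 127.0.0.1:5000 max_fails=3 fail_timeout=10s;
    server 127.0.0.1:5001 max_fails=3 fail_timeout=10s;
}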
Use this simple Python script to simulate a restarting backend:
from flask import Flask
import time

app = Flask(__name__)
start_time = time.time()

@app.route('/')
def hello():
    # Return 503 for the first 5 seconds to simulate a restart
    if time.time() - start_time < 5:
        return "Backend restarting", 503
    return "Backend available"

if __name__ == '__main__':
    app.run(port=5000)
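Save it as, say, app.py, run it with python app.py, point the backend upstream at 127.0.0.1:5000, and restart the script while sending requests. You can confirm the retry behavior by logging $upstream_addr and $upstream_status in the access log, which record every peer and status code attempted for a request.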
When your backend service restarts (whether for deployments or crashes), Nginx's default behavior is to immediately return a 502 Bad Gateway error if it can't establish a connection. This creates poor user experience during maintenance windows.
Nginx's proxy_next_upstream directive, combined with timeout settings, provides the retry mechanism we need:
location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
    proxy_next_upstream_timeout 60s;   # Total retry window
    proxy_next_upstream_tries 3;       # Max retry attempts
    proxy_connect_timeout 5s;          # Initial connection timeout
}
For true resiliency, we should add delays between retries. While Nginx doesn't have built-in retry delay, we can approximate it:
server {
    # Custom error page that triggers a client-side retry
    # (locations must sit inside a server block, not directly under http)
    proxy_intercept_errors on;
    error_page 502 = @retry_backend;

    location @retry_backend {
        # Return 503 with a Retry-After header ("always" is needed,
        # otherwise add_header is skipped for 4xx/5xx responses)
        add_header Retry-After 5 always;
        return 503;
    }

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_502;
        proxy_read_timeout 10s;
    }
}
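This only helps if the caller (browser code, a mobile client, or another proxy layer) actually honors the Retry-After header; Nginx returns the 503 immediately and does not wait itself.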
For exact retry timing control, use OpenResty with Lua:
location / {
    content_by_lua_block {
        local max_retries = 3
        local retry_delay = 2  -- seconds between attempts

        for i = 1, max_retries do
            -- Issue a subrequest to the internal proxy location
            local res = ngx.location.capture("/proxy-pass")
            if res.status < 500 then
                -- Success: relay the upstream response to the client
                ngx.status = res.status
                ngx.print(res.body)
                return
            end
            -- Only sleep if another attempt is coming
            if i < max_retries then
                ngx.sleep(retry_delay)
            end
        end
        ngx.exit(502)
    }
}

location /proxy-pass {
    internal;
    proxy_pass http://backend;
}
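Note that ngx.location.capture issues a GET subrequest and buffers the entire response by default, so this sketch suits simple idempotent requests; pass the method and body options if you need to retry POSTs, and be aware that upstream response headers are not relayed here.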
- Always set reasonable timeout values (proxy_connect_timeout, proxy_read_timeout)
- Monitor retries via the access log: $upstream_addr and $upstream_status list every peer and status code tried for a request
- Combine with health checks (active health_check on Nginx Plus, max_fails/fail_timeout otherwise) so traffic avoids a backend that is still restarting
- Consider circuit breakers for prolonged outages