When your backend service restarts (e.g., during deployments or crashes), Nginx's default behavior is to immediately return a 502 Bad Gateway error if it can't establish a connection. This creates poor user experience and unnecessary failures for transient issues.
Nginx's proxy_next_upstream directive combined with retry parameters solves this elegantly:
location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_next_upstream_timeout 10s;
    proxy_connect_timeout 2s;
    proxy_read_timeout 10s;
}
- proxy_next_upstream: Specifies which conditions warrant passing the request to the next upstream server (error, timeout, or specific HTTP status codes); see the upstream sketch below
- proxy_next_upstream_tries: Limits the total number of attempts, including the first one, so a value of 3 allows two retries (this is the N in your question)
- proxy_next_upstream_timeout: Total time limit for all retry attempts
- proxy_connect_timeout: Individual connection attempt timeout
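These directives only help if the backend upstream group has another server to try; with a single entry there is no "next" server to pass the request to. A minimal upstream sketch, assuming two hypothetical application instances on ports 5000 and 5001:

upstream backend {
    # Two instances so a failed attempt can be retried against the other
    # (hypothetical addresses; adjust to your deployment)
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

With this layout, a rolling deployment that restarts one instance at a time stays largely invisible to clients, because retries land on the instance that is still up.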
For the M seconds delay requirement, Nginx has no native retry-delay setting, so we need a bit of Lua scripting (requires ngx_http_lua_module or OpenResty). Note that this variant delays repeated attempts from the same client, tracked with a cookie, rather than Nginx's internal upstream retries:
location / {
    access_by_lua_block {
        -- Read how many times this client has already retried
        local retry_count = tonumber(ngx.var.cookie_retry_count) or 0
        if retry_count > 0 then
            ngx.sleep(2)  -- M seconds delay before proxying again
        end
        -- Remember the attempt count; in production you would clear this
        -- cookie once the backend responds successfully
        ngx.header["Set-Cookie"] = "retry_count=" .. (retry_count + 1)
    }
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_intercept_errors on;
    error_page 502 = @retry;
}
location @retry {
    internal;
    content_by_lua_block {
        -- Surface the failure to the client; the delay is applied by the
        -- access phase above when the client tries again
        ngx.exit(502)
    }
}
If you're using Nginx Plus, active health checks can pull a restarting backend out of rotation almost immediately, so requests and retries go to healthy peers instead:
location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
    proxy_next_upstream_timeout 10s;
    # Probe every 2s; one failed probe marks the server down, one success restores it
    health_check interval=2s fails=1 passes=1;
}
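Note that health_check expects the upstream group to be kept in shared memory, which means adding a zone directive to the upstream block. A minimal sketch (zone name and size are illustrative):

upstream backend {
    zone backend 64k;        # shared memory zone needed for active health checks
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}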
Use this simple Python script to simulate a restarting backend:
from flask import Flask
import time

app = Flask(__name__)
start_time = time.time()

@app.route('/')
def hello():
    # Return 503 for the first 5 seconds to simulate a restart
    if time.time() - start_time < 5:
        return "Backend restarting", 503
    return "Backend available"

if __name__ == '__main__':
    app.run(port=5000)
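To try it end to end, a minimal test configuration can point the upstream at the script above (the second server entry is hypothetical, but without it Nginx has no "next" upstream to retry against):

upstream backend {
    server 127.0.0.1:5000;    # the Flask script above
    server 127.0.0.1:5001;    # hypothetical second instance for retries to fail over to
}

server {
    listen 8080;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 3;
        proxy_connect_timeout 2s;
    }
}

During the script's first five seconds, requests should exhaust their tries and come back as errors; after that they should return "Backend available".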
When your backend service restarts (whether for deployments or crashes), Nginx's default behavior is to immediately return a 502 Bad Gateway error if it can't establish a connection. This creates poor user experience during maintenance windows.
Nginx's proxy_next_upstream directive combined with timeout settings provides the retry mechanism we need:
location / {
    proxy_pass http://backend;
    proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
    proxy_next_upstream_timeout 60s;   # Total retry window
    proxy_next_upstream_tries 3;       # Max attempts (initial try + retries)
    proxy_connect_timeout 5s;          # Per-attempt connection timeout
}
For true resiliency, we should add delays between retries. Nginx has no built-in retry delay, but we can approximate one by asking the client to back off and try again:
server {
    proxy_intercept_errors on;
    # Custom error page that triggers a client-side retry
    error_page 502 = @retry_backend;

    location @retry_backend {
        # Return 503 with a Retry-After header; "always" is needed so
        # add_header applies to non-2xx/3xx responses
        add_header Retry-After 5 always;
        return 503;
    }

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_502;
        proxy_read_timeout 10s;
    }
}
For exact retry timing control, use OpenResty with Lua:
location / {
    access_by_lua_block {
        local max_retries = 3
        local retry_delay = 2  -- seconds between attempts

        for i = 1, max_retries do
            -- Sleep only before retries, not before the first attempt
            if i > 1 then
                ngx.sleep(retry_delay)
            end

            local res = ngx.location.capture("/proxy-pass")
            if res.status < 500 then
                -- Relay the subrequest's status and body to the client
                ngx.status = res.status
                ngx.print(res.body)
                return ngx.exit(ngx.HTTP_OK)
            end
        end

        -- All attempts failed
        ngx.exit(ngx.HTTP_BAD_GATEWAY)
    }
}
location /proxy-pass {
    internal;
    # Note: the subrequest URI (/proxy-pass) is what gets proxied upstream;
    # adjust the capture and this location if the original URI must be preserved
    proxy_pass http://backend;
}
- Always set reasonable timeout values (proxy_connect_timeout, proxy_read_timeout)
- Monitor retry metrics using Nginx status modules
- Combine with active health checks (health_check, Nginx Plus) or passive max_fails/fail_timeout checks so retries avoid known-bad servers (see the sketch below)
- Consider circuit breakers for prolonged outages
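On open-source Nginx, passive health checking with max_fails and fail_timeout acts as a rough circuit breaker: after repeated failures a server is skipped for a cooldown period, so retries and new requests stop hammering a backend that is still restarting. A sketch with illustrative thresholds:

upstream backend {
    # After 3 failed attempts within 30s, skip this server for the next 30s
    server 127.0.0.1:5000 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:5001 max_fails=3 fail_timeout=30s;
}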