When running Nginx with multiple site configurations, a common frustration occurs during server reboots: if any upstream server in your config becomes temporarily unreachable (due to DNS resolution failure or network issues), Nginx refuses to start altogether. This creates a single point of failure where healthy sites can't serve traffic because of one problematic upstream.
By default, Nginx performs hostname resolution during configuration parsing. If an upstream hostname can't be resolved (like when your internal DNS shows example2.service.example.com as down), Nginx treats this as a fatal configuration error.
# This is what happens in the background during nginx -t
upstream example2 {
    # Fails with "nginx: [emerg] host not found in upstream" if DNS returns NXDOMAIN
    server example2.service.example.com;
}
1. Using DNS Resolver with Timeout
Configure Nginx to use a resolver with timeout settings for dynamic DNS resolution:
http {
    resolver 8.8.8.8 valid=30s;

    upstream example2 {
        # 'resolve' requires Nginx Plus (or open-source Nginx 1.27.3+)
        server example2.service.example.com resolve;
    }
}
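On open-source builds without the resolve parameter, a widely used workaround achieves a similar effect: when proxy_pass is given a variable, Nginx defers the DNS lookup to request time using the configured resolver, so startup no longer depends on the hostname resolving. A sketch (the hostname and resolver address are placeholders):

```nginx
http {
    resolver 8.8.8.8 valid=30s;

    server {
        listen 80;
        server_name example2.com;

        location / {
            # A variable in proxy_pass forces runtime DNS resolution,
            # so nginx starts even if the host is currently unresolvable.
            set $upstream_host example2.service.example.com;
            proxy_pass http://$upstream_host;
        }
    }
}
```

Note the trade-off: this bypasses the upstream block entirely, so features like backup servers and load balancing are not available for that location.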
2. IP Fallback with Passive Health Checks
Combine IP addresses with passive health checks (max_fails / fail_timeout):
upstream backend {
    server 192.168.1.1:80 max_fails=3 fail_timeout=30s;
    server backup.example.com:80 backup;
    # Note: 'down' removes a server from rotation entirely; for a local
    # fallback, mark it 'backup' instead:
    server 127.0.0.1:8080 backup;
}
3. Configuration Splitting
Separate your configs into critical and non-critical includes. Note that Nginx still treats a parse error in any included file as fatal, so this only helps when combined with a pre-start step that disables broken optional configs:
http {
    include /etc/nginx/conf.d/core/*.conf;       # Always-available services
    include /etc/nginx/conf.d/optional/*.conf;   # Disabled by a pre-start check when broken
}
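The splitting only pays off with a pre-start step that disables optional configs whose upstream hosts do not resolve. A minimal sketch in Python (the directory path and the .disabled renaming convention are assumptions, not Nginx features; the regex is deliberately simplified):

```python
import re
import socket
from pathlib import Path

# Matches 'server some.host.name ...;' lines (simplified: skips IP literals,
# which start with a digit, and stops before ':port' or parameters).
SERVER_RE = re.compile(r'^\s*server\s+([a-zA-Z][\w.-]*)', re.MULTILINE)

def extract_upstream_hosts(conf_text: str) -> list[str]:
    """Return hostnames referenced by 'server' directives in a config snippet."""
    return SERVER_RE.findall(conf_text)

def resolvable(host: str) -> bool:
    """Check whether a hostname currently resolves."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False

def disable_broken_configs(optional_dir: str) -> None:
    """Rename optional configs with unresolvable hosts so nginx skips them."""
    for conf in Path(optional_dir).glob('*.conf'):
        hosts = extract_upstream_hosts(conf.read_text())
        if any(not resolvable(h) for h in hosts):
            conf.rename(conf.with_suffix('.conf.disabled'))

if __name__ == '__main__':
    disable_broken_configs('/etc/nginx/conf.d/optional')
```

Run this from a systemd ExecStartPre hook (or an init script) so the rename happens before nginx parses its includes.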
4. Using the 'resolve' Parameter (Nginx Plus)
For commercial Nginx Plus users:
upstream dynamic {
    zone upstream_dynamic 64k;   # shared memory zone, required for runtime DNS updates
    # 'resolve' also needs a 'resolver' directive in the http block
    server example2.service.example.com resolve;
}
Here's a complete solution combining multiple techniques:
http {
    resolver 8.8.8.8 1.1.1.1 valid=10s;

    # Main configuration
    include /etc/nginx/sites-enabled/_stable/*.conf;

    # Optional configurations (a parse error here still blocks startup,
    # so disable broken files via a pre-start check)
    include /etc/nginx/sites-enabled/_optional/*.conf;
}
# In your optional config file:
upstream example2 {
    server example2.service.example.com resolve max_fails=2;
    server fallback.example.com:80 backup;
    server 127.0.0.1:8080 backup;   # local maintenance fallback ('down' would disable it)
}
server {
    listen 80;
    server_name example2.com;

    location / {
        proxy_pass http://example2;
        proxy_next_upstream error timeout invalid_header;
        proxy_next_upstream_timeout 0;   # no time limit on retries
        proxy_next_upstream_tries 3;
    }
}
- DNS caching behavior varies between Nginx versions
- The 'resolve' parameter requires Nginx Plus (or open-source Nginx 1.27.3+)
- Always test with nginx -t before applying changes
- Health checks add minimal overhead but improve reliability
Implement proper monitoring to detect configuration issues:
# In your server block
location /nginx_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}
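The stub_status response is plain text, so a small script can turn it into metrics for alerting. A sketch (the inline sample stands in for fetching http://127.0.0.1/nginx_status with an HTTP client):

```python
import re

# Example stub_status payload, in the format documented for
# ngx_http_stub_status_module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

def parse_stub_status(text: str) -> dict:
    """Parse the plain-text stub_status response into a metrics dict."""
    lines = text.strip().splitlines()
    active = int(re.search(r'Active connections:\s+(\d+)', lines[0]).group(1))
    accepts, handled, requests = (int(n) for n in lines[2].split())
    rww = dict(re.findall(r'(Reading|Writing|Waiting):\s+(\d+)', lines[3]))
    return {
        'active': active,
        'accepts': accepts,
        'handled': handled,
        'requests': requests,
        'reading': int(rww['Reading']),
        'writing': int(rww['Writing']),
        'waiting': int(rww['Waiting']),
    }

if __name__ == '__main__':
    metrics = parse_stub_status(SAMPLE)
    # Dropped connections show up as accepts minus handled.
    print(metrics['accepts'] - metrics['handled'])
```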
When running multiple websites through Nginx with separate upstream configurations, a common pain point occurs during server reboots. If any upstream server is unreachable at startup, Nginx fails to start entirely - even for perfectly healthy sites. This creates unnecessary downtime for working services.
Nginx performs DNS resolution during configuration parsing, and by default treats unresolvable upstream hosts as fatal errors. When using dynamic DNS where hosts automatically register/deregister based on availability (common in microservices architectures), this becomes particularly problematic.
upstream problematic {
    server down.service.example.com;   # causes the entire Nginx startup to fail if DNS can't resolve
}
We can make Nginx more resilient using these techniques:
1. DNS Resolution Directive
Add a resolver directive with the valid parameter to cache DNS lookups:
http {
    resolver 8.8.8.8 valid=30s;   # use your own DNS server here

    upstream example2 {
        server example2.service.example.com resolve;   # Nginx Plus or open-source 1.27.3+
    }
}
2. Backup Server Fallback
Configure a backup that always responds (like a local maintenance page):
upstream resilient {
    # The primary hostname must still resolve at startup unless 'resolve' is used
    server primary.service.example.com;
    server 127.0.0.1:8080 backup;   # local maintenance server
}

server {
    listen 8080;
    return 503 'Service Temporarily Unavailable';
}
3. Dynamic Upstream with Health Checks
Use Nginx Plus (or OpenResty's health-check modules) for active health checks. Note that health_check is a location-level directive; it goes next to proxy_pass, not inside the upstream block:
upstream dynamic {
    zone upstream_dynamic 64k;
    server example1.service.example.com resolve;
    server example2.service.example.com resolve;
}

server {
    location / {
        proxy_pass http://dynamic;
        health_check interval=5 fails=1 passes=1;
    }
}
Here's a complete solution combining these approaches:
http {
    resolver 10.0.0.2 valid=10s;   # internal DNS server

    # Default catch-all maintenance server
    server {
        listen 8080;
        location / {
            default_type application/json;
            return 503 '{"status":"maintenance"}';
        }
    }

    # Actual site configuration. Note: one upstream cannot reference another
    # upstream by name, so the backups point directly at the maintenance server.
    upstream example1 {
        server example1.service.example.com resolve;
        server 127.0.0.1:8080 backup;
    }

    upstream example2 {
        server example2.service.example.com resolve;
        server 127.0.0.1:8080 backup;
    }
}
- The resolve parameter requires Nginx Plus (or open-source Nginx 1.27.3+)
- Always test configurations with nginx -t before applying
- Consider implementing proper circuit breakers in your application code
- For complex environments, explore service discovery integration
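For service discovery, Nginx Plus can also resolve DNS SRV records so that both the address and the port come from the registry. A sketch based on the documented service parameter (the hostname, resolver address, and service name are placeholders):

```nginx
http {
    resolver 10.0.0.2 valid=10s;

    upstream discovered {
        zone discovered 64k;
        # SRV lookup: address AND port are taken from the DNS record
        server backend.service.example.com service=http resolve;
    }
}
```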
If you can't modify Nginx configurations:
- Use static IPs in /etc/hosts as fallback
- Implement startup scripts that verify upstream availability before starting Nginx
- Consider container orchestration solutions that handle this at the infrastructure level
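The /etc/hosts fallback can be scripted so static entries stay in sync with your upstream list. A minimal sketch (the hostname-to-IP map and the hosts-file path are placeholders; writing the real /etc/hosts requires root):

```python
from pathlib import Path

# Hypothetical static fallbacks for upstream hostnames.
FALLBACKS = {
    'example2.service.example.com': '192.168.1.50',
}

def ensure_hosts_entries(hosts_path: str, fallbacks: dict[str, str]) -> str:
    """Append any missing 'IP hostname' lines to a hosts file; return its content."""
    path = Path(hosts_path)
    content = path.read_text() if path.exists() else ''
    # Hostnames already mapped (second whitespace-separated field of each line).
    present = {line.split()[1] for line in content.splitlines()
               if len(line.split()) >= 2 and not line.lstrip().startswith('#')}
    missing = [f'{ip}\t{host}' for host, ip in fallbacks.items()
               if host not in present]
    if missing:
        content = content.rstrip('\n') + '\n' + '\n'.join(missing) + '\n'
        path.write_text(content)
    return content

if __name__ == '__main__':
    # Point at /etc/hosts on a real system.
    print(ensure_hosts_entries('/tmp/hosts.test', FALLBACKS))
```

The function is idempotent, so it is safe to run on every boot before nginx starts.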