When HAProxy returns a 503 error with NOSRV in the logs while backends are operational, it typically means HAProxy could not select any server from the designated backend pool. Let's examine a specific case where the ELB backend was confirmed reachable and healthy, yet HAProxy still failed to route requests to it.
# Sample error from syslog
Mar 26 19:47:01 localhost haproxy[23910]: 10.0.0.30:34261 [26/Mar/2013:19:46:48.579] fe v2/<NOSRV> 12801/-1/-1/-1/12801 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /path/v2/ HTTP/1.1"
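Reading the fields of that line already narrows the problem down; the breakdown below follows HAProxy's standard httplog format:
# fe v2/<NOSRV>          the request entered frontend "fe" and was routed to backend "v2",
#                        but no server could be selected (hence <NOSRV>)
# 12801/-1/-1/-1/12801   Tq/Tw/Tc/Tr/Tt timers in milliseconds; -1 means that phase was
#                        never reached (no queueing, no connect, no server response)
# 503 212                HAProxy generated the 503 itself and returned 212 bytes
# SC--                   termination state: the failure is on the server side ("S") and
#                        happened during the connect phase ("C")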
The backend definitions in the affected configuration were straightforward:
backend v2
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 weight 1 maxconn 512 check

backend v2e
    server v2eElb 10.0.1.28:80 weight 1 maxconn 512 check
The NOSRV error typically occurs when:
- All servers in the backend are marked as DOWN
- DNS resolution fails for ELB endpoints
- Health checks time out before servers can be marked UP
- Connection limits (maxconn) are exhausted
Consider these improvements for better reliability:
backend v2
    balance roundrobin
    option redispatch      # Allow retries on different servers
    option allbackups      # Use backup servers when primaries fail
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 resolvers mydns check inter 2000 rise 2 fall 3 weight 1 maxconn 1000
    server v2Elb-fallback internal-xxx-fallback.elb.amazonaws.com:80 backup
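The resolvers mydns keyword only works if a matching resolvers section is defined, and runtime DNS resolution requires HAProxy 1.6 or newer; older versions resolve the ELB hostname once at startup via libc. A minimal sketch of such a section, with a placeholder nameserver address you would replace with your VPC or internal resolver:
resolvers mydns
    nameserver dns1 10.0.0.2:53      # placeholder; use your VPC / internal DNS address
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s              # consider a resolution stale after 10 seconds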
When facing NOSRV issues, these commands help diagnose:
# Check server states
echo "show stat" | socat /var/run/haproxy.stat stdio | column -s, -t
# Verify DNS resolution
dig +short internal-xxx.us-west-1.elb.amazonaws.com
# Monitor health checks in real-time
watch -n 1 'echo "show stat" | socat /var/run/haproxy.stat stdio | grep v2Elb'
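All three commands assume the stats socket is enabled; the global section shown further down does not declare one, so a line like the following (path and permissions are only an example) has to be added first:
global
    stats socket /var/run/haproxy.stat mode 600 level admin
In the "show stat" output, the status column reports UP/DOWN/MAINT per server and check_status shows the last health-check result (for example L4TOUT for a connect timeout, L7OK for a successful HTTP check), which usually tells you which of the causes above you are dealing with.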
Implement these best practices:
- Use DNS resolvers with TTL tracking
- Configure proper health check timeouts
- Implement backup servers and retry logic
- Monitor socket allocation and connection limits
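For the last point, the runtime counters available over the same stats socket give a quick read on whether the global connection ceiling is being approached (socket path as assumed above):
# Compare the configured ceiling with the current connection count
echo "show info" | socat /var/run/haproxy.stat stdio | grep -E 'Maxconn|CurrConns'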
Now to the concrete case: HAProxy was returning 503 errors with NOSRV status despite confirmed backend availability, producing exactly the log entry shown at the top.
Here's the relevant HAProxy configuration snippet:
global
    maxconn 1000
    daemon
    nbproc 1
    log 127.0.0.1 local1

defaults
    mode http
    clitimeout 60000
    timeout server 300000
    contimeout 4000
    option httpclose

backend v2
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 weight 1 maxconn 512 check

backend v2e
    server v2eElb 10.0.1.28:80 weight 1 maxconn 512 check

frontend fe
    bind :80
    option httpchk
    option forwardfor
    option httplog
    log global

    acl v2e path_beg /path/v2e
    acl v2 path_beg /path/v2

    redirect location https://my.domain.com/path/v2/ if !v2e !v2
    use_backend v2e if v2e
    use_backend v2 if v2
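One detail worth flagging in this config: option httpchk is placed in the frontend, where it has no effect on server health checks (it is a backend/defaults option), so the check keyword on both server lines falls back to plain TCP connect checks. If an HTTP-level check was intended, it belongs in the backend; a sketch, assuming /path/v2/ is a URL the application actually answers on (substitute a real health endpoint):
backend v2
    option httpchk GET /path/v2/ HTTP/1.1\r\nHost:\ my.domain.com
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 weight 1 maxconn 512 check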
After thorough investigation, several potential causes emerged:
- DNS Resolution Issues: ELB hostnames resolve to IP addresses that Amazon rotates regularly; without runtime DNS resolution, HAProxy resolves the hostname only once at startup and can end up checking a stale address
- Health Check Timeouts: overly strict timeout values marking the backend DOWN prematurely
- Connection Pool Exhaustion: the maxconn limit being reached during traffic spikes
- Process Isolation: a single process (nbproc 1) becoming a bottleneck and queueing requests under load
Here are effective configurations to prevent NOSRV errors:
# Enhanced Timeout Settings
timeout connect 10s
timeout client 30s
timeout server 30s
# Improved Health Checking
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
http-check expect status 200
# Connection Management
defaults
    option redispatch
    retries 3
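Assembled into one place, those fragments might look like the sketch below; the /health path and Host header are assumptions to be replaced with whatever the application actually serves:
defaults
    mode http
    timeout connect 10s
    timeout client  30s
    timeout server  30s
    option redispatch
    retries 3

backend v2
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 check maxconn 512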
Implement these monitoring checks:
# Sample Nagios check
define command {
    command_name  check_haproxy
    command_line  $USER1$/check_haproxy -H $HOSTADDRESS$ -u /haproxy?stats -s '$ARG1$'
}
# Example Prometheus metrics
haproxy_backend_http_requests_total{backend="v2"}
haproxy_backend_current_sessions{backend="v2"}
For enterprise deployments, consider this robust setup:
backend v2
    balance leastconn
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 resolvers mydns check inter 5s rise 2 fall 3 maxconn 2048 weight 100 on-error mark-down
    server v2-fallback 10.0.1.29:80 check backup
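Whichever variant is used, validate the file and reload rather than restart so established connections are not dropped (paths assume a standard package install):
# Check the configuration for syntax and consistency errors
haproxy -c -f /etc/haproxy/haproxy.cfg

# Apply it with a graceful reload on systemd-based systems
sudo systemctl reload haproxy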
When NOSRV occurs:
- Check real-time stats:
  echo "show stat" | sudo socat /var/run/haproxy.sock stdio
- Verify DNS:
  dig +short internal-xxx.us-west-1.elb.amazonaws.com
- Inspect health:
  curl -I http://internal-xxx.us-west-1.elb.amazonaws.com/health
- Review logs:
  journalctl -u haproxy -f
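Once a change is in place, a quick count of NOSRV occurrences in the log confirms whether the errors have actually stopped (path assumes HAProxy logs to syslog as in the sample above):
grep -c 'NOSRV' /var/log/syslog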