HAProxy NOSRV 503 Error: Debugging Backend Server Availability Issues



When HAProxy returns a 503 error with NOSRV in the logs while backends are operational, it typically indicates a routing failure where HAProxy cannot select an appropriate server from the designated backend pool. Let's examine this specific case where traffic was confirmed reaching the ELB backend, yet HAProxy failed to route requests.

# Sample error from syslog
Mar 26 19:47:01 localhost haproxy[23910]: 10.0.0.30:34261 
[26/Mar/2013:19:46:48.579] fe v2/<NOSRV> 12801/-1/-1/-1/12801 503 
212 - - SC-- 0/0/0/0/0 0/0 "GET /path/v2/ HTTP/1.1"

In the log, "fe v2/<NOSRV>" shows that HAProxy matched the fe frontend and chose the v2 backend, but could not select any server from it; the SC-- termination state confirms the session was aborted before a server connection could be established. Here is the backend configuration in question (note that all of a server directive's options must sit on a single line):

backend v2
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 weight 1 maxconn 512 check

backend v2e
    server v2eElb 10.0.1.28:80 weight 1 maxconn 512 check

The NOSRV error typically occurs when:

  • All servers in the backend are marked DOWN
  • DNS resolution fails for the ELB endpoint
  • Health checks have not yet succeeded, so no server is marked UP
  • Connection limits (maxconn) are exhausted and requests time out in the queue

Consider these improvements for better reliability:

backend v2
    balance roundrobin
    option redispatch  # Allow retries on different servers
    option allbackups  # Use backup servers when primaries fail
    # "resolvers mydns" requires HAProxy 1.6+ and a matching resolvers section (see below)
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 resolvers mydns check inter 2000 rise 2 fall 3 weight 1 maxconn 1000
    server v2Elb-fallback internal-xxx-fallback.elb.amazonaws.com:80 check backup

Note that backup is a trailing option on the server line, not a prefix.
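
For the resolvers reference to work, HAProxy 1.6+ needs a resolvers section defined. A minimal sketch, assuming the VPC's DNS server answers at 10.0.0.2 (substitute your own nameserver):

resolvers mydns
    nameserver dns1 10.0.0.2:53  # assumed resolver address; replace with yours
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s          # keep a valid answer for 10s before re-resolving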

When facing NOSRV issues, these commands help diagnose:

# Check server states
echo "show stat" | socat /var/run/haproxy.stat stdio | column -s, -t

# Verify DNS resolution
dig +short internal-xxx.us-west-1.elb.amazonaws.com

# Monitor health checks in real-time
watch -n 1 'echo "show stat" | socat /var/run/haproxy.stat stdio | grep v2Elb'
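
All of these stats-socket commands assume the socket is enabled in the global section; without it, socat has nothing to connect to. A minimal sketch:

global
    stats socket /var/run/haproxy.stat mode 600 level admin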

Implement these best practices:

  1. Use DNS resolvers with TTL tracking
  2. Configure proper health check timeouts
  3. Implement backup servers and retry logic
  4. Monitor socket allocation and connection limits (see the check below)
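
For point 4, the show info output on the stats socket exposes the process-wide connection counters. A quick check, assuming the socket configured as above:

# Compare current connections against the configured global limit
echo "show info" | socat /var/run/haproxy.stat stdio | grep -E '^(Maxconn|CurrConns)'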

I recently encountered a puzzling scenario where HAProxy returned 503 errors with NOSRV status despite confirmed backend availability. The logs showed entries like:

Mar 26 19:47:01 localhost haproxy[23910]: 10.0.0.30:34261 
[26/Mar/2013:19:46:48.579] fe v2/<NOSRV> 12801/-1/-1/-1/12801 503 
212 - - SC-- 0/0/0/0/0 0/0 "GET /path/v2/ HTTP/1.1"

Here's the relevant HAProxy configuration snippet:

global
    maxconn     1000
    daemon
    nbproc      1
    log         127.0.0.1 local1
defaults
        mode        http
        clitimeout  60000
        timeout server 300000
        contimeout  4000
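        # note: clitimeout and contimeout are deprecated spellings of
        # "timeout client" and "timeout connect"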
        option      httpclose

backend v2
        server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 weight 1 maxconn 512 check
backend v2e
        server v2eElb 10.0.1.28:80 weight 1 maxconn 512 check
frontend fe
        bind :80
        option httpchk
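        # note: "option httpchk" only takes effect in a backend (or defaults),
        # so it is ignored here in the frontend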
        option forwardfor
        option httplog
        log global
        acl v2e path_beg /path/v2e
        acl v2 path_beg /path/v2
        redirect location https://my.domain.com/path/v2/ if !v2e !v2
        use_backend v2e if v2e
        use_backend v2 if v2

After thorough investigation, several potential causes emerged:

  • DNS Resolution Issues: before HAProxy 1.6's runtime resolvers, a hostname is resolved only once at startup, so when the ELB's IPs rotate the server keeps pointing at stale addresses (verified below)
  • Health Check Timeouts: overly strict timeout values can mark a slow-to-respond backend DOWN prematurely
  • Connection Pool Exhaustion: maxconn limits being hit during traffic spikes, queueing or rejecting requests
  • Process Saturation: a single process (nbproc 1) with a global maxconn of 1000 queues all new connections once that limit is reached
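
To test the stale-DNS theory, compare the addresses HAProxy actually holds connections to with what DNS returns right now (a quick sketch; ss ships with modern Linux):

# Addresses HAProxy is currently connected to
sudo ss -tnp | grep haproxy
# What the ELB hostname resolves to at this moment; ELB IPs rotate over time
dig +short internal-xxx.us-west-1.elb.amazonaws.com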

Here are effective configurations to prevent NOSRV errors:

# Enhanced Timeout Settings (place in defaults)
timeout connect 10s
timeout client 30s
timeout server 30s

# Improved Health Checking (place in the backend being checked)
option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
http-check expect status 200

# Connection Management
defaults
    option redispatch
    retries 3

Implement these monitoring checks:

# Sample Nagios check
define command {
    command_name    check_haproxy
    command_line    $USER1$/check_haproxy -H $HOSTADDRESS$ -u /haproxy?stats -s '$ARG1$'
}

# Example Prometheus metrics
haproxy_backend_http_requests_total{backend="v2"}
haproxy_backend_current_sessions{backend="v2"}
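
Building on those metrics, a sketch of a Prometheus alerting rule, assuming the standard haproxy_exporter gauge haproxy_backend_up (1 = UP, 0 = DOWN):

# Fire when the v2 backend has had no available server for a minute
groups:
  - name: haproxy
    rules:
      - alert: HAProxyBackendDown
        expr: haproxy_backend_up{backend="v2"} == 0
        for: 1m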

For enterprise deployments, consider this robust setup:

backend v2
    balance leastconn
    # resolvers requires HAProxy 1.6+ and the resolvers section sketched earlier
    server v2Elb internal-xxx.us-west-1.elb.amazonaws.com:80 resolvers mydns check inter 5s rise 2 fall 3 maxconn 2048 weight 100 on-error mark-down
    server v2-fallback 10.0.1.29:80 check backup

When NOSRV occurs:

  1. Check real-time stats: echo "show stat" | sudo socat /var/run/haproxy.stat stdio (filtered version below)
  2. Verify DNS: dig +short internal-xxx.us-west-1.elb.amazonaws.com
  3. Inspect health: curl -I http://internal-xxx.us-west-1.elb.amazonaws.com/health
  4. Review logs: journalctl -u haproxy -f
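
For step 1, narrowing the stats CSV to the interesting columns makes DOWN servers and queue buildup easy to spot (pxname, svname, qcur and status are fields 1, 2, 3 and 18 of HAProxy's CSV output):

echo "show stat" | socat /var/run/haproxy.stat stdio | cut -d, -f1,2,3,18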