Round-Robin DNS for High Availability: Client Failover Behavior Analysis

Round-robin DNS works by returning multiple IP addresses for a domain and rotating their order between responses. With two A records configured for example.com, every response carries both addresses, and the one listed first alternates from query to query:


example.com.    IN  A  192.0.2.1
example.com.    IN  A  203.0.113.2
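As a rough illustration of the rotation, the sketch below models what successive responses might look like for the two records above. The function name rotated_answers is hypothetical, and real DNS servers and resolvers may order or shuffle records differently.

```python
from collections import deque


def rotated_answers(records, num_queries):
    """Return the record order a simple round-robin server would give per query."""
    ring = deque(records)
    answers = []
    for _ in range(num_queries):
        # Every answer contains all records; only the order changes.
        answers.append(list(ring))
        # Move the first record to the end for the next query.
        ring.rotate(-1)
    return answers


if __name__ == "__main__":
    for answer in rotated_answers(["192.0.2.1", "203.0.113.2"], 3):
        print(answer)
```

A client that always picks the first entry would therefore alternate between the two servers, which is where the load-spreading effect comes from.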

The fundamental limitation emerges when one IP becomes unavailable. Standard DNS behavior doesn't include health checks - the failed IP remains in rotation. Client behavior varies:

Modern browsers implement "Happy Eyeballs" (RFC 8305), which staggers connection attempts across the returned addresses instead of waiting for one to time out
Many applications simply try the first returned IP and fail on timeout
DNS cache TTLs delay failover to alternative IPs
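For applications that would otherwise stop at the first address, a minimal sequential-failover sketch might look like the following. The helper name connect_with_failover and the injectable connector parameter are assumptions made for illustration; by default it uses the standard library's socket.create_connection.

```python
import socket


def connect_with_failover(addresses, port, timeout=3.0,
                          connector=socket.create_connection):
    """Try each address in order and return (ip, connection) for the first success."""
    last_error = None
    for ip in addresses:
        try:
            # A short per-address timeout keeps one dead IP from stalling the client.
            return ip, connector((ip, port), timeout=timeout)
        except OSError as exc:
            # Covers refused connections, timeouts, and unreachable hosts.
            last_error = exc
    raise ConnectionError(
        f"no address reachable out of {len(addresses)}"
    ) from last_error
```

The worst-case delay is roughly the timeout multiplied by the number of dead addresses tried first, so the per-address timeout is worth keeping small.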

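The following illustrative session shows the worst case: a client that gives up after the first address and only succeeds once a later lookup returns the records in a different order. (curl itself normally moves on to the next address within a single invocation, so real output will usually show both attempts in one run.)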


# First attempt (gets unresponsive IP)
$ curl -v http://example.com
* Trying 192.0.2.1:80...
* connect to 192.0.2.1 port 80 failed: Connection timed out

# Second attempt after DNS cache expires
$ curl -v http://example.com
* Trying 203.0.113.2:80...
* Connected to example.com (203.0.113.2) port 80

For true high availability, consider these enhanced approaches:


# DNS-based solution with health checks (AWS Route53 example)
resource "aws_route53_health_check" "backend" {
  ip_address        = "192.0.2.1"
  port              = 80
  type              = "HTTP"
  resource_path     = "/health"
  failure_threshold = 2
  request_interval  = 30
}

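# Application-layer retry pattern (Python example)
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,                           # overall cap on retries
    connect=3,                         # connection errors, e.g. a dead IP
    read=3,
    status=3,
    status_forcelist=[502, 503, 504],  # status codes that trigger a status retry
    backoff_factor=0.5,
    # POST is not idempotent; include it only if the server tolerates duplicates
    allowed_methods=frozenset(['GET', 'POST'])
)
adapter = HTTPAdapter(max_retries=retries)
session.mount('http://', adapter)
session.mount('https://', adapter)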

Despite limitations, round-robin works well for:

Load distribution across healthy endpoints
Blue-green deployments with controlled cutovers
Geographic distribution when combined with EDNS Client Subnet

Round-robin DNS distributes requests across multiple IP addresses in a rotating fashion, but it's crucial to understand that DNS itself provides no health checking or automatic failover mechanism. What happens when an address is down depends entirely on the client. A response carrying two A records with a 300-second TTL looks like this:


; Example DNS response with two A records
example.com.    300 IN  A  192.0.2.1
example.com.    300 IN  A  203.0.113.2

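Browsers and some HTTP clients implement "Happy Eyeballs" style algorithms (RFC 8305) that stagger connection attempts rather than waiting for a full timeout:

The client gets both IP addresses from DNS
It attempts to connect to the first IP
If the connection is not established within a short delay (RFC 8305 recommends 250 ms; implementations vary), it starts an attempt to the next IP while the first is still pending
The first connection to succeed is used. This all happens during TCP connection establishment, before any TLS or HTTP exchange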
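One way to picture the connection race described above is the asyncio sketch below. It is a simplified model, not any browser's implementation: staggered_connect and its connect callback are hypothetical names, and unlike RFC 8305 it does not start the next attempt early when an earlier one fails outright.

```python
import asyncio


async def staggered_connect(addresses, connect, delay=0.25):
    """Start connect(addr) for each address `delay` seconds apart.

    Returns (addr, result) for the first attempt that succeeds and
    cancels the attempts that are still pending.
    """
    async def attempt(index, addr):
        # Attempt 0 starts immediately, attempt 1 after `delay`, and so on.
        await asyncio.sleep(index * delay)
        return addr, await connect(addr)

    tasks = [asyncio.create_task(attempt(i, a)) for i, a in enumerate(addresses)]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except OSError:
                # This address failed; keep waiting for the others.
                continue
        raise OSError(f"all {len(tasks)} connection attempts failed")
    finally:
        for task in tasks:
            task.cancel()
```

For real connections, asyncio's own loop.create_connection (and therefore asyncio.open_connection) accepts a happy_eyeballs_delay argument that enables comparable behavior without custom code.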

Different technologies handle this differently:

Web Browsers

Modern browsers (Chrome, Firefox) implement sophisticated connection strategies:


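// Simplified pseudocode for a staggered connection race (not Chrome's actual implementation)
// delay() and tryConnect() are placeholder helpers
async function connect(hostname) {
  const ips = await dns.resolve(hostname);
  // Start attempts a short delay apart rather than all at once
  const connections = ips.map((ip, i) => delay(i * 250).then(() => tryConnect(ip)));
  return Promise.any(connections); // first successful connection wins
}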

Programming Languages

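Many HTTP libraries try each resolved address in turn before reporting a failure. Python's requests, for example, relies on urllib3, which walks the full address list:


# Python requests example
import requests

def handle_failure(exc):
    # Placeholder: log, alert, or fall back as appropriate
    print(f"request failed: {exc}")

try:
    # The connect timeout applies to each address in turn
    response = requests.get('http://example.com', timeout=5)
except requests.exceptions.ConnectionError as exc:
    # Every resolved address was tried before this was raised
    handle_failure(exc)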

Round-robin DNS alone isn't a complete HA solution because:

DNS caching means clients may continue trying dead IPs until TTL expires
No awareness of server health or load
Uneven distribution if some clients cache DNS longer than others
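The caching problem can be made concrete with a small TTL-respecting cache. This is only a sketch: TtlDnsCache, its resolve callback (assumed to return a list of IPs plus a TTL in seconds), and the injectable clock are illustrative names, not part of any real resolver library.

```python
import time


class TtlDnsCache:
    """Cache resolver answers until their TTL expires."""

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve  # callable: hostname -> (list of IPs, ttl seconds)
        self._clock = clock
        self._entries = {}       # hostname -> (ips, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[1]:
            # Still within the TTL: a dead IP in this answer keeps being served.
            return entry[0]
        ips, ttl = self._resolve(hostname)
        self._entries[hostname] = (ips, now + ttl)
        return ips
```

With a 300-second TTL, a client that cached the answer just before a server was removed keeps seeing the removed address for up to five minutes, regardless of what the authoritative server now returns.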

For production systems, consider combining with:

Health-Checking DNS

Services like Amazon Route 53 or NS1 provide DNS with health checks:


# Route 53 health check configuration
resource "aws_route53_health_check" "example" {
  ip_address        = "192.0.2.1"
  port              = 80
  type              = "HTTP"
  resource_path     = "/health"
  failure_threshold = 3
}
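To show what a setting like failure_threshold means in practice, here is a simplified model of how a health-checking DNS service could decide which records to return. HealthTracker is a hypothetical class for illustration and does not reflect Route 53's internal logic; the "fail open" fallback when nothing is healthy is a design choice assumed here.

```python
class HealthTracker:
    """Track consecutive failed checks per IP and filter DNS answers."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = {}

    def record(self, ip, check_passed):
        """Store the outcome of one health check for an IP."""
        if check_passed:
            # Any successful check resets the failure counter.
            self.consecutive_failures[ip] = 0
        else:
            self.consecutive_failures[ip] = self.consecutive_failures.get(ip, 0) + 1

    def healthy(self, ip):
        return self.consecutive_failures.get(ip, 0) < self.failure_threshold

    def answer(self, records):
        """Return only healthy records, or all of them if none are healthy."""
        alive = [ip for ip in records if self.healthy(ip)]
        # Assumed fail-open behavior: an empty answer would guarantee an outage.
        return alive or list(records)
```

With a threshold of 3 and a 30-second check interval, a dead server would stay in the answers for roughly 90 seconds before being dropped, on top of whatever TTL clients have cached.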

Client-Side Retry Logic

Implement explicit retries in your application code:


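// Node.js with retry logic
// Note: axios-retry v4+ exposes the function as require('axios-retry').default
const axiosRetry = require('axios-retry');
const axios = require('axios');

axiosRetry(axios, {
  retries: 3,
  // Retry on network errors (e.g. an unreachable IP) and on 5xx responses
  retryCondition: (error) => {
    return axiosRetry.isNetworkError(error) ||
      (error.response && error.response.status >= 500);
  }
});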

When using round-robin DNS, implement:

DNS resolution monitoring to ensure all IPs are returned
Endpoint availability checks for each IP
TTL expiration tracking to detect caching issues
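The first two checks can be sketched with the standard library alone. The function names below (resolve_ipv4, tcp_reachable, check_round_robin) are hypothetical, and the resolver and probe are injectable so the logic can be exercised without network access. TTL tracking is left out because socket.getaddrinfo does not expose TTLs; that would need a dedicated DNS library.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def tcp_reachable(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_round_robin(hostname, expected_ips, port=80,
                      resolve=resolve_ipv4, probe=tcp_reachable):
    """Compare resolved addresses with the expected set and probe each one."""
    returned = set(resolve(hostname))
    expected = set(expected_ips)
    return {
        "missing": sorted(expected - returned),      # configured but not returned
        "unexpected": sorted(returned - expected),   # returned but not configured
        "unreachable": sorted(ip for ip in returned if not probe(ip, port)),
    }
```

Running check_round_robin("example.com", ["192.0.2.1", "203.0.113.2"]) on a schedule and alerting on any non-empty list would cover the resolution and availability checks listed above.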
