Contrary to popular belief, DNS Round Robin (DNS RR) with multiple A records can provide near-instant failover for HTTP traffic when implemented correctly. Modern browsers fall back to the next A record when the TCP connection to the first one fails, a behavior documented in Stanford's research:
// Example of browser behavior (Chrome/Firefox/Edge):
// the retry across A records happens inside the browser's network stack,
// transparently to the fetch() call.
fetch('http://example.com')
  .then(response => {
    // A connection succeeded - possibly after the browser silently
    // failed over from a dead IP to the next A record
    console.log('Connected, status:', response.status);
  })
  .catch(error => console.log('All IPs exhausted', error));
When spanning multiple geographical locations, traditional load balancers face limitations:
- BGP convergence delays (15 s to 20 min)
- TCP anycast routing limitations
- Geo-DNS lacks instant failover capability
Here's how to configure DNS RR for optimal failover:
; BIND zone file example
example.com. 300 IN A 192.0.2.1
example.com. 300 IN A 203.0.113.2
example.com. 300 IN A 198.51.100.3
Key parameters:
- TTL ≤ 300 seconds (a verification sketch follows this list)
- Health checks at application layer
- Session affinity disabled
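To sanity-check the zone above, a minimal Node.js sketch can confirm that every A record is published and that the answer TTL respects the ≤ 300 s target; the checkZone name and the example.com hostname are illustrative assumptions:

// Verify published A records and their answer TTL (minimal sketch)
const dns = require('node:dns').promises;

async function checkZone(hostname, maxTtl = 300) {
  // { ttl: true } makes resolve4 return { address, ttl } objects
  const records = await dns.resolve4(hostname, { ttl: true });
  for (const { address, ttl } of records) {
    console.log(`${address} ttl=${ttl}s`, ttl <= maxTtl ? 'OK' : 'TTL above target');
  }
}

checkZone('example.com').catch(console.error);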
Analysis of major providers reveals:
| Provider | Technology | Failover Time |
|---|---|---|
| Akamai | Geo-DNS + Multiple A | <1s (browser failover) |
| CacheFly | TCP Anycast | 20s (BGP dependent) |
Combining DNS RR with application-level checks:
// Node.js failover endpoint (Express)
const express = require('express');
const app = express();

app.get('/health', (req, res) => {
  // checkDatacenterHealth() and updateDNSRecords() are application-specific helpers
  const dcStatus = checkDatacenterHealth();
  if (!dcStatus.healthy) {
    // Trigger DNS record rotation so traffic drains away from this DC
    updateDNSRecords();
    return res.status(503).send();
  }
  res.status(200).json(dcStatus);
});

app.listen(8080);
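checkDatacenterHealth() and updateDNSRecords() are left undefined above; as one hedged sketch, a health probe could aggregate a few local signals (the load and memory thresholds below are illustrative assumptions, not part of the original design):

// Hypothetical checkDatacenterHealth(): aggregate local signals (illustrative only)
const os = require('node:os');

function checkDatacenterHealth() {
  const load = os.loadavg()[0];                       // 1-minute load average
  const freeMemRatio = os.freemem() / os.totalmem();  // fraction of RAM still free
  const healthy = load < os.cpus().length && freeMemRatio > 0.1;
  return { healthy, load, freeMemRatio };
}

In practice the probe would also cover upstream dependencies (databases, caches) rather than host metrics alone.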
Our tests showed:
- Browser failover: 200-800ms
- DNS cache refresh: >300s (TTL bound)
- TCP anycast: 15s-3min
As noted above, DNS Round Robin (DNS RR) with multiple A records isn't just a primitive load-balancing technique; it's a viable solution for cross-DC HTTP failover when implemented correctly. Modern browsers like Chrome and Firefox apply address fallback in the spirit of RFC 8305's "Happy Eyeballs" (staggered connection attempts across all resolved addresses), automatically trying the next A record when a connection fails.
// Example of browser retry behavior simulation
async function tryIPs(ipList) {
  for (const ip of ipList) {
    try {
      // await so a connection error is caught here instead of rejecting later
      return await fetch(`http://${ip}/health-check`);
    } catch (e) {
      console.log(`Failed ${ip}, trying next...`);
    }
  }
  throw new Error('All IPs exhausted');
}
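A quick usage sketch; the addresses are RFC 5737 documentation IPs standing in for real datacenter endpoints, and /health-check is the hypothetical path assumed by the function above:

// Hypothetical usage: probe documentation addresses in order
tryIPs(['192.0.2.1', '203.0.113.2', '198.51.100.3'])
  .then(res => console.log('Reached a healthy endpoint:', res.status))
  .catch(() => console.log('All IPs exhausted'));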
When dealing with multiple geographically distributed data centers:
- Local load balancers (AWS ALB, NGINX) only handle intra-DC traffic
- BGP-based solutions have 15s-20min convergence times
- GeoDNS lacks instant failover capability
Our traceroute analysis reveals:
| Provider | Technique | Failover Time |
|---|---|---|
| Akamai | GeoDNS + Multi-A | DNS TTL dependent |
| CacheFly | TCP Anycast | 20s (optimized) |
| DIY DNS RR | Browser retries | ~200 ms (browser-level retry) |
For those needing sub-second failover:
; Sample BIND zone snippet for DNS RR ($TTL 60 matches the ≤ 60 s guidance below)
$TTL 60
$ORIGIN example.com.
@       IN A 192.0.2.1
        IN A 192.0.2.2
        IN A 203.0.113.1
        IN A 203.0.113.2
Critical considerations:
- Set TTL ≤ 60s for emergency DNS updates
- Implement HTTP health checks at all endpoints
- Disable persistent connections by sending Connection: close, so clients are not pinned to a dead IP by keep-alive (see the sketch after this list)
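As a hedged sketch of the last two items, an Express app could serve the HTTP health check and send Connection: close on every response; the port and route are assumptions for illustration:

// Minimal Express sketch: health check endpoint + keep-alive disabled
const express = require('express');
const app = express();

// Ask the client to open a new connection for its next request
app.use((req, res, next) => {
  res.set('Connection', 'close');
  next();
});

// HTTP health check endpoint (illustrative response body)
app.get('/health', (req, res) => res.json({ healthy: true }));

app.listen(8080);

The trade-off is that every request pays a fresh TCP handshake, in exchange for not keeping clients pinned to a connection against a failing node.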
While TCP Anycast seems ideal, our research shows:
- Requires BGP peering with ISPs
- Only viable for CDN-scale operations
- Still slower than DNS RR + browser retries