DNS Failover Techniques: Implementing Backup A Records and Alternatives for High Availability

DNS has specialized record types with built-in redundancy: MX (mail exchange) records support priority-based failover through preference values (e.g., MX 10 and MX 20), and NS (name server) records give resolvers several servers to try. Standard A records have no equivalent mechanism. This creates challenges when trying to implement primary-backup server architectures at the DNS level.

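For specific services, DNS does provide redundancy mechanisms. MX records carry an explicit preference (lower values are tried first), while NS records have no priority field; a resolver simply tries another listed nameserver when one does not respond: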


; Mail server example with priorities
example.com.  IN  MX  10 mail1.example.com.
example.com.  IN  MX  20 mail2.example.com.

; Nameserver example (no priority field; resolvers choose among the listed servers)
example.com.  IN  NS  ns1.example.com.
example.com.  IN  NS  ns2.example.com.
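
To make the MX preference semantics concrete, here is a minimal sketch that orders mail exchangers the way a sending server would try them: lowest preference value first. The `MxRecord` type and the `delivery_order` helper are illustrative names, not part of any DNS library, and the records are hard-coded rather than resolved from DNS.

```python
from typing import NamedTuple


class MxRecord(NamedTuple):
    preference: int  # lower value = tried first
    exchange: str    # mail server host name


def delivery_order(records):
    """Return mail exchangers sorted by ascending MX preference."""
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]


# Hard-coded copy of the MX records from the zone snippet above
records = [
    MxRecord(20, "mail2.example.com."),
    MxRecord(10, "mail1.example.com."),
]
print(delivery_order(records))  # ['mail1.example.com.', 'mail2.example.com.']
```

A records carry no comparable field, which is why the approaches below are needed.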

When you need backup A records, consider these approaches:

1. DNS TTL Optimization

Reduce TTL (Time To Live) to allow rapid DNS changes during outages:


example.com.  300  IN  A  192.0.2.1  ; 5 minute TTL
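
As a rough way to reason about the TTL choice, the sketch below estimates how long a client might keep using a dead address. It is a simplified model that assumes resolvers honor the TTL exactly and that the outage is detected by periodic checks; the function name and parameters are illustrative only.

```python
def worst_case_failover_seconds(ttl, check_interval, failures_needed=1, update_delay=0):
    """Estimate the longest time clients may still be sent to a failed server.

    detection time (interval x failed checks required)
    + time to push the DNS change
    + TTL still remaining in resolver caches
    """
    detection = check_interval * failures_needed
    return detection + update_delay + ttl


# 5-minute TTL, checks every 30 s, 3 failed checks before switching
print(worst_case_failover_seconds(ttl=300, check_interval=30, failures_needed=3))  # 390
```

With these numbers the TTL dominates the total, which is the main argument for lowering it on records that may need to change quickly.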

2. Round-Robin DNS

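Publish multiple A records for the same name. Resolvers return all of them and clients choose which address to use; many clients try the next address when a connection fails, but this behavior is not guaranteed: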


example.com.  IN  A  192.0.2.1
example.com.  IN  A  192.0.2.2
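
Because round-robin DNS leaves the choice to the client, the client is also where fallback has to happen. The following is a minimal sketch of that behavior at the TCP level, assuming the list of addresses has already been resolved. `connect_first_available` is a hypothetical helper, and the `connect` parameter exists only so the logic can be exercised without a network.

```python
import socket


def connect_first_available(addresses, port, timeout=2.0,
                            connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next address
    raise ConnectionError(f"all addresses failed: {addresses}") from last_error
```

In practice the address list could come from `socket.getaddrinfo("example.com", 80)`, and the caller is responsible for closing the returned connection.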

3. Health-Check Based DNS

Implement dynamic DNS updates with health checks using tools like:

AWS Route 53 failover
Azure Traffic Manager
PowerDNS with health check scripts

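Here's a Python script that checks server health and updates the A record with a dynamic DNS update (via dnspython):


import socket

import dns.query
import dns.update
import requests

def check_server_health(ip):
    """Return True if the server answers its /health endpoint with HTTP 200."""
    try:
        response = requests.get(f"http://{ip}/health", timeout=2)
        return response.status_code == 200
    except requests.RequestException:
        return False

def update_dns_record(primary_ip, backup_ip):
    """Point the zone apex at the primary if it is healthy, otherwise at the backup."""
    target_ip = primary_ip if check_server_health(primary_ip) else backup_ip
    update = dns.update.Update('example.com')
    update.replace('@', 300, 'A', target_ip)
    # dns.query.tcp() expects an IP address, so resolve the nameserver name first.
    # Servers normally accept only authenticated updates (e.g. TSIG via Update(keyring=...)).
    nameserver_ip = socket.gethostbyname('ns1.example.com')
    dns.query.tcp(update, nameserver_ip, timeout=5)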

Several managed DNS providers offer failover as a built-in feature:

Provider	Feature	Implementation
Cloudflare	Load balancing	Health checks + failover
AWS Route 53	Failover routing	Active-passive configuration
DNS Made Easy	Failover A records	HTTP/S monitoring

Whichever approach you choose, keep these limitations in mind:

DNS propagation delays (even with low TTL)
Client-side DNS caching behavior
Health check frequency and monitoring costs
False positive scenarios
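
One common way to reduce false positives is to require several consecutive failed checks before failing over, and several consecutive successes before failing back. The sketch below illustrates that idea only; the `FailoverDecider` class and its thresholds are assumptions for demonstration, not the behavior of any particular provider.

```python
class FailoverDecider:
    """Switch to the backup only after several consecutive failed checks."""

    def __init__(self, failure_threshold=3, recovery_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.failures = 0
        self.successes = 0
        self.using_backup = False

    def record(self, healthy):
        """Feed one health-check result; return True while the backup should serve."""
        if healthy:
            self.successes += 1
            self.failures = 0
            if self.using_backup and self.successes >= self.recovery_threshold:
                self.using_backup = False
        else:
            self.failures += 1
            self.successes = 0
            if not self.using_backup and self.failures >= self.failure_threshold:
                self.using_backup = True
        return self.using_backup


decider = FailoverDecider()
checks = [False, True, False, False, False, True, True]
results = [decider.record(healthy) for healthy in checks]
print(results)  # [False, False, False, False, True, True, False]
```

The single failed check at the start does not trigger a switch; only the run of three failures does, and two successful checks switch back to the primary. Higher thresholds reduce flapping at the cost of slower detection.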

While DNS supports redundant NS (nameserver) records and MX (mail server) records with priority values, there's no native mechanism for A record failover in the DNS protocol itself. When you query for A records, DNS servers typically return all available records, often in rotated or shuffled order, with no inherent priority system.

Here are several proven approaches to implement high availability for your services:

; Example DNS zone file showing multiple A records
example.com.    300 IN  A   192.0.2.1
example.com.    300 IN  A   192.0.2.2
example.com.    300 IN  A   192.0.2.3

Modern applications should implement their own failover logic when multiple IPs are returned:

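// JavaScript implementation of client-side failover.
// Intended for server-side runtimes: browsers do not let scripts override the Host header.
async function fetchWithRetry(url, ips, options = {}) {
  const hostname = new URL(url).hostname;
  for (const ip of ips) {
    try {
      // Replace only the hostname with the IP so the path and query string survive.
      // Plain HTTP is used because a certificate usually will not match a bare IP address.
      const target = new URL(url);
      target.protocol = 'http:';
      target.hostname = ip;
      const response = await fetch(target, {
        ...options,
        headers: { ...options.headers, Host: hostname }
      });
      return response;
    } catch (error) {
      console.log(`Failed to connect to ${ip}, trying next...`);
    }
  }
  throw new Error('All servers unavailable');
}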

Some DNS providers offer custom solutions:

DNS Made Easy: Failover system that monitors servers
Amazon Route 53: Health checks and DNS failover
Cloudflare: Load balancing with health checks

For critical services, consider implementing a monitoring system that updates DNS records dynamically:

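# Python example using Route 53 API
import boto3
# check_server(ip) -> bool is assumed to come from your own health-check module
from healthcheck import check_server

def update_dns_based_on_health():
    route53 = boto3.client('route53')
    healthy_ips = [ip for ip in ['192.0.2.1', '192.0.2.2'] if check_server(ip)]

    # Only update when at least one server is healthy; otherwise leave the records untouched
    if healthy_ips: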
        route53.change_resource_record_sets(
            HostedZoneId='Z1PA6795UKMFR9',
            ChangeBatch={
                'Changes': [{
                    'Action': 'UPSERT',
                    'ResourceRecordSet': {
                        'Name': 'example.com',
                        'Type': 'A',
                        'TTL': 60,
                        'ResourceRecords': [{'Value': ip} for ip in healthy_ips]
                    }
                }]
            }
        )

Best practices:

Use low TTL values (30-60 seconds) for dynamic records
Implement monitoring for both primary and backup servers
Consider geographic distribution of backup servers
Test failover procedures regularly
Document your failover strategy clearly
