DNS has some record types with built-in redundancy: MX (mail exchange) records support priority-based failover through preference values (e.g., MX 10 and MX 20), and a zone can list several NS (name server) records that resolvers try in turn. Standard A records have no such mechanism, which makes it hard to implement a primary-backup server architecture at the DNS level.
For specific services, DNS does provide redundancy mechanisms. MX records carry an explicit preference, while NS records are simply listed side by side with no priority field:
; Mail server example with priorities
example.com. IN MX 10 mail1.example.com.
example.com. IN MX 20 mail2.example.com.
; Nameserver example (no preference value; resolvers may query any listed server)
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
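To make the MX preference rule concrete, here is a minimal sketch of how a sending mail server might order the hosts above before attempting delivery. The function name `order_mx_hosts` and the `(preference, hostname)` tuple format are assumptions made for illustration, not part of any particular mail server's API.

```python
import random

def order_mx_hosts(mx_records, rng=random):
    """Return mail hosts in the order a sender would try them.

    mx_records is a list of (preference, hostname) tuples.
    Lower preference values are tried first; hosts that share a
    preference are shuffled so load spreads across them.
    """
    by_preference = {}
    for preference, host in mx_records:
        by_preference.setdefault(preference, []).append(host)

    ordered = []
    for preference in sorted(by_preference):
        group = by_preference[preference]
        rng.shuffle(group)  # equal-preference hosts have no fixed order
        ordered.extend(group)
    return ordered

records = [(20, "mail2.example.com."), (10, "mail1.example.com.")]
print(order_mx_hosts(records))  # ['mail1.example.com.', 'mail2.example.com.']
```

A records carry no equivalent preference field, so a client has nothing comparable to sort on.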
When you need backup A records, consider these approaches:
1. DNS TTL Optimization
Reduce TTL (Time To Live) to allow rapid DNS changes during outages:
example.com. 300 IN A 192.0.2.1 ; 5 minute TTL
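The effect of the TTL on failover time can be estimated with simple arithmetic. The sketch below is a rough upper bound under stated assumptions: a monitor that needs a fixed number of failed checks before acting, and resolvers that honor the TTL exactly. The function and parameter names are hypothetical, and the time needed to push the update to secondary nameservers is ignored.

```python
def worst_case_failover_seconds(ttl, check_interval, failures_required):
    """Upper bound on how long clients may keep hitting a dead server.

    detection: the monitor needs `failures_required` failed checks,
               spaced `check_interval` seconds apart
    caching:   a resolver that fetched the old record just before the
               update keeps it for a full `ttl` seconds
    """
    detection = check_interval * failures_required
    return detection + ttl

# 5 minute TTL, checks every 30 seconds, 3 failures before switching
print(worst_case_failover_seconds(ttl=300, check_interval=30, failures_required=3))  # 390
```

Lowering the TTL shrinks only the caching term. The detection term depends entirely on how the monitoring is configured.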
2. Round-Robin DNS
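Publish several A records for the same name and let clients choose among them. This spreads load, but a dead server stays in the answer set until you remove it: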
example.com. IN A 192.0.2.1
example.com. IN A 192.0.2.2
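Many servers rotate the order of the records between responses, and many clients simply use the first address they receive. The following sketch simulates that rotation to show why it balances load without providing failover. The class `RoundRobinAnswers` is a hypothetical model; real server behavior varies by implementation and configuration.

```python
from collections import deque

class RoundRobinAnswers:
    """Rotate the record order by one position on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        current = list(self._addresses)
        self._addresses.rotate(-1)  # next query starts with the next address
        return current

zone = RoundRobinAnswers(["192.0.2.1", "192.0.2.2"])
first_choices = [zone.answer()[0] for _ in range(4)]
print(first_choices)
# ['192.0.2.1', '192.0.2.2', '192.0.2.1', '192.0.2.2']
```

If 192.0.2.1 goes down, roughly half of the clients in this model are still directed to it first.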
3. Health-Check Based DNS
Implement dynamic DNS updates with health checks using tools like:
- AWS Route 53 failover
- Azure Traffic Manager
- PowerDNS with health check scripts
Here's a Python script that checks server health and uses dnspython to send a dynamic DNS update (RFC 2136) to the nameserver:
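import socket
import requests
import dns.query
import dns.update

def check_server_health(ip):
    """Return True if the server answers its /health endpoint with HTTP 200."""
    try:
        response = requests.get(f"http://{ip}/health", timeout=2)
        return response.status_code == 200
    except requests.RequestException:
        return False

def update_dns_record(primary_ip, backup_ip):
    """Point the zone apex at the primary if it is healthy, otherwise at the backup."""
    target_ip = primary_ip if check_server_health(primary_ip) else backup_ip
    update = dns.update.Update('example.com')
    update.replace('@', 300, 'A', target_ip)
    # dns.query.tcp() expects an IP address, so resolve the nameserver's hostname first
    nameserver_ip = socket.gethostbyname('ns1.example.com')
    # The nameserver must be configured to accept dynamic updates (usually secured with TSIG)
    dns.query.tcp(update, nameserver_ip, timeout=5)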
Several managed DNS providers offer failover as a built-in feature:
Provider | Feature | Implementation |
---|---|---|
Cloudflare | Load balancing | Health checks + failover |
AWS Route 53 | Failover routing | Active-passive configuration |
DNS Made Easy | Failover A records | HTTP/S monitoring |
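Whichever approach you choose, keep these limitations in mind:
- DNS propagation delays (even with a low TTL)
- Client-side DNS caching, which may not honor the TTL
- Health check frequency and monitoring costs
- False positives, where a failed check triggers failover even though the primary is actually healthy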
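One common way to reduce false positives is to require several consecutive failed checks before switching, and several consecutive successes before switching back. The sketch below illustrates that idea. The class name `FailoverDecider` and its default thresholds are assumptions chosen for illustration, not values taken from any provider.

```python
class FailoverDecider:
    """Switch to the backup only after several consecutive failed checks."""

    def __init__(self, failure_threshold=3, recovery_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.consecutive_failures = 0
        self.consecutive_successes = 0
        self.using_backup = False

    def record_check(self, healthy):
        """Feed one health-check result; return True while the backup should serve."""
        if healthy:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if self.using_backup and self.consecutive_successes >= self.recovery_threshold:
                self.using_backup = False
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.consecutive_failures >= self.failure_threshold:
                self.using_backup = True
        return self.using_backup

decider = FailoverDecider()
results = [True, False, True, False, False, False, True, True]
print([decider.record_check(r) for r in results])
# [False, False, False, False, False, True, True, False]
```

The single failed check near the start does not trigger failover. Only the run of three failures does, and the primary is restored after two healthy checks in a row. Higher thresholds mean fewer false positives but slower detection.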
While DNS supports backup NS (nameserver) records and MX (mail server) records with preference values, the DNS protocol itself has no native mechanism for A record failover. When you query for A records, DNS servers typically return all available records, often in rotated or shuffled order, with no inherent priority.
Here are several proven approaches to implement high availability for your services:
; Example DNS zone file showing multiple A records
example.com. 300 IN A 192.0.2.1
example.com. 300 IN A 192.0.2.2
example.com. 300 IN A 192.0.2.3
When multiple IPs are returned, applications can implement their own failover logic by trying each address in turn:
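// JavaScript implementation of client-side failover
async function fetchWithRetry(url, ips, options = {}) {
  const original = new URL(url);
  for (const ip of ips) {
    try {
      // Swap the hostname for the candidate IP, keeping the path and query intact
      const target = new URL(url);
      target.protocol = 'http:'; // plain HTTP, since a TLS certificate would not match a bare IP
      target.hostname = ip;
      const response = await fetch(target.href, {
        ...options,
        // Browsers do not let scripts set Host, so this requires a runtime that allows it
        headers: { ...options.headers, Host: original.hostname }
      });
      return response;
    } catch (error) {
      console.log(`Failed to connect to ${ip}, trying next...`);
    }
  }
  throw new Error('All servers unavailable');
}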
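The same try-each-address idea can be sketched in Python at the TCP level, which sidesteps the Host header question. This is a minimal illustration rather than a production client: the function name `connect_first_available` and its `connect` parameter (injectable so the logic can be tested without a network) are assumptions.

```python
import socket

def connect_first_available(ips, port, timeout=2.0, connect=socket.create_connection):
    """Try each address in order; return (ip, connection) for the first that accepts."""
    errors = []
    for ip in ips:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # Covers refused connections, timeouts, and unreachable hosts
            errors.append(f"{ip}: {exc}")
    raise ConnectionError("All servers unavailable: " + "; ".join(errors))

# Example usage (opens a real TCP connection, so it is left commented out):
# ip, conn = connect_first_available(["192.0.2.1", "192.0.2.2"], 80)
```

Each unreachable server costs up to `timeout` seconds, so short timeouts matter when the list of addresses is long.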
Some DNS providers offer custom solutions:
- DNS Made Easy: Failover system that monitors servers
- Amazon Route 53: Health checks and DNS failover
- Cloudflare: Load balancing with health checks
For critical services, consider implementing a monitoring system that updates DNS records dynamically:
# Python example using Route 53 API
import boto3
# Assumption: check_server(ip) is your own helper that returns True for a healthy server
from healthcheck import check_server

def update_dns_based_on_health():
    route53 = boto3.client('route53')
    healthy_ips = [ip for ip in ['192.0.2.1', '192.0.2.2'] if check_server(ip)]
    # If no server is healthy, leave the existing records in place
    if healthy_ips:
        route53.change_resource_record_sets(
            HostedZoneId='Z1PA6795UKMFR9',  # replace with your hosted zone ID
            ChangeBatch={
                'Changes': [{
                    'Action': 'UPSERT',
                    'ResourceRecordSet': {
                        'Name': 'example.com',
                        'Type': 'A',
                        'TTL': 60,
                        # Publish only the addresses that passed the health check
                        'ResourceRecords': [{'Value': ip} for ip in healthy_ips]
                    }
                }]
            }
        )
- Use low TTL values (30-60 seconds) for dynamic records
- Implement monitoring for both primary and backup servers
- Consider geographic distribution of backup servers
- Test failover procedures regularly
- Document your failover strategy clearly