DNS Migration Best Practices: When to Update Records After TTL Reduction from 24h to 5min



When migrating services between servers, the DNS TTL (Time-To-Live) becomes critical. With a 24-hour TTL, recursive resolvers may cache your records for up to 24 hours. Reducing the TTL to 300 seconds (5 minutes) tells resolvers to refresh more often, but a resolver that already cached the record under the old TTL keeps serving it until that cached entry expires, and only then picks up the new 5-minute TTL.

For clean DNS handoff:

Original state:   TTL=86400 (24h)
Update 1:         TTL=300 (5min) at 00:00 UTC
Wait period:      24 hours (full original TTL)
Update 2:         Change IP records at 00:00 UTC the next day (24h later)
Propagation:      Well-behaved resolvers pick up the new IP within about 5 minutes
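
The arithmetic behind this timeline can be sketched in a few lines of Python. This is only an illustration: the function names and the date are made up for the example, and it assumes resolvers honor TTLs as published.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Earliest time by which TTL-honoring resolvers have dropped
    any record cached under the old TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_propagation(cutover_at: datetime, new_ttl_seconds: int) -> datetime:
    """Latest time a TTL-honoring resolver may still serve the old IP
    after the record change."""
    return cutover_at + timedelta(seconds=new_ttl_seconds)


# Hypothetical date; the 86400 s and 300 s values come from the timeline above.
lowered_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
cutover = earliest_safe_cutover(lowered_at, 86400)
done = worst_case_propagation(cutover, 300)

print(cutover.isoformat())  # 2024-01-02T00:00:00+00:00
print(done.isoformat())     # 2024-01-02T00:05:00+00:00
```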

If immediate migration is necessary:

  • Implement a blue-green deployment with load balancing
  • Use DNS failover services like Amazon Route 53 health checks
  • Maintain session stickiness during transition with cookies

# Example AWS CLI command to apply a record change (e.g. for immediate failover)
aws route53 change-resource-record-sets \
--hosted-zone-id Z1PA6795UKMFR9 \
--change-batch file://new_records.json
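
The command above reads its changes from new_records.json, which the original does not show. As a rough sketch, the file could be generated like this. The record name, the new IP 192.0.2.2, and the helper name are assumptions for illustration; the JSON layout follows the Route 53 change-batch format (Comment, Changes, Action, ResourceRecordSet).

```python
import json


def build_change_batch(name: str, new_ip: str, ttl: int = 300) -> dict:
    """Build a Route 53 change batch that points an A record at new_ip.

    UPSERT creates the record if it is missing and overwrites it otherwise.
    """
    return {
        "Comment": "Point A record at the new server",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_ip}],
                },
            }
        ],
    }


# Hypothetical values: example.com. and 192.0.2.2 are placeholders.
batch = build_change_batch("example.com.", "192.0.2.2")
change_batch_json = json.dumps(batch, indent=2)
print(change_batch_json)
```

Redirecting the printed output into new_records.json gives a file the CLI command can consume. A rollback file is the same structure with the old IP.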

Check global DNS propagation using:

dig +short example.com @8.8.8.8       # Google DNS
dig +short example.com @1.1.1.1       # Cloudflare
nslookup example.com 208.67.222.222   # OpenDNS

Pre-migration checklist:

  1. Lower the TTL at least 24h (one full original TTL) before the migration
  2. Configure monitoring for both old and new endpoints
  3. Prepare rollback procedure (reverting IP changes)
  4. Update SPF/DKIM records if handling email

When migrating services between servers, DNS Time-To-Live (TTL) becomes critical. The scenario you described is a classic one: reducing the TTL from 24 hours to 5 minutes (300 seconds) before a migration window. Here's what happens at each stage:

# Example dig command showing TTL value
$ dig example.com +noall +answer
example.com.      86400   IN      A       192.0.2.1
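
If you want to track the TTL in a script rather than by eye, an answer line like the one above can be split into its fields. This is a minimal sketch: the `DigAnswer` type and `parse_answer_line` helper are names invented here, and it assumes numeric TTLs (that is, dig run without `+ttlunits`).

```python
from typing import NamedTuple


class DigAnswer(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    rdata: str


def parse_answer_line(line: str) -> DigAnswer:
    """Parse one line of `dig +noall +answer` output.

    Splits on whitespace at most four times so that record data
    containing spaces (e.g. TXT records) stays in one piece.
    """
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return DigAnswer(name, int(ttl), rclass, rtype, rdata.strip())


answer = parse_answer_line("example.com.      86400   IN      A       192.0.2.1")
print(answer.ttl)    # 86400
print(answer.rdata)  # 192.0.2.1
```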

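Technically, nothing forces you to wait the full 24 hours after lowering the TTL. If you change the record sooner, though, anything that cached it under the old 24h TTL can keep serving the old IP until its cached entry expires. You should consider: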

  • Clients and DNS resolvers that cached the old record with 24h TTL
  • Intermediate DNS servers that may ignore your new TTL
  • ISPs that implement longer caching than specified

Here's how I handle such migrations:

# Migration timeline example
1. T-48h: Reduce TTL to 5 minutes
2. T-0h: 
   - Take application offline
   - Perform data sync
   - Update DNS records
3. T+5m: Most traffic should reach the new server
4. T+24h: Full propagation expected, including resolvers that cache longer than the TTL

Use these methods to verify propagation:

# Using dig to check authoritative servers
$ dig @ns1.yourprovider.com example.com +short

# Checking your default resolver (a cached answer shows the remaining TTL)
$ dig example.com +nocmd +noall +answer +ttlunits

# Global DNS check (using Google's DNS)
$ dig @8.8.8.8 example.com +short
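
To compare several resolvers in one pass, the `dig +short` calls can be wrapped in a small script. This is a sketch under assumptions: it requires `dig` on the PATH, the helper names are invented here, and the sample answers below are made up so the comparison logic can run without a network.

```python
import subprocess


def query_resolver(resolver: str, name: str) -> set[str]:
    """Run `dig +short` against one resolver and return the set of answers."""
    out = subprocess.run(
        ["dig", "+short", f"@{resolver}", name],
        capture_output=True, text=True, timeout=10, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def propagation_report(answers: dict[str, set[str]], expected_ip: str) -> dict[str, bool]:
    """Map each resolver to True if it returns exactly the expected IP."""
    return {resolver: ips == {expected_ip} for resolver, ips in answers.items()}


# Made-up sample: one resolver has the new IP, the other still has the old one.
sample = {"8.8.8.8": {"192.0.2.2"}, "1.1.1.1": {"192.0.2.1"}}
report = propagation_report(sample, "192.0.2.2")
print(report)  # {'8.8.8.8': True, '1.1.1.1': False}
```

In real use you would build the `answers` dict with `query_resolver` for each resolver, including your authoritative server, and repeat until every value is True.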

For critical migrations, consider these additional measures:

  • Implement a health check endpoint on both old and new servers
  • Use weighted DNS records to shift traffic gradually during the transition
  • Prepare rollback procedures in case of issues

Here's how I handled a recent migration for a Python web app:

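# Health check endpoint example (Flask)
from datetime import datetime, timezone

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health_check():
    # Report which server answered so old and new can be told apart during cutover
    return jsonify({
        'status': 'healthy',
        'server': 'new-prod-01',
        'timestamp': datetime.now(timezone.utc).isoformat()
    }), 200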
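
On the monitoring side, something along these lines could poll the /health endpoint on both servers. It is only a sketch: the URLs and helper names are hypothetical, and the demo uses a stand-in fetcher so it runs without a network.

```python
import json
import urllib.request
from typing import Callable


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET a health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_endpoints(
    urls: dict[str, str],
    fetch: Callable[[str], dict] = fetch_health,
) -> dict[str, bool]:
    """Return label -> True if the endpoint reports status 'healthy'.

    Any error (timeout, refused connection, non-2xx, bad JSON) counts as unhealthy.
    """
    results = {}
    for label, url in urls.items():
        try:
            results[label] = fetch(url).get("status") == "healthy"
        except Exception:
            results[label] = False
    return results


def fake_fetch(url: str) -> dict:
    """Stand-in fetcher for the demo: the new server answers, the old one is down."""
    if "new" in url:
        return {"status": "healthy", "server": "new-prod-01"}
    raise OSError("connection refused")


# Hypothetical hostnames for illustration only.
endpoints = {
    "old": "http://old.example.com/health",
    "new": "http://new.example.com/health",
}
status = check_endpoints(endpoints, fetch=fake_fetch)
print(status)  # {'old': False, 'new': True}
```

Dropping the `fetch=fake_fetch` argument makes the same call hit the real endpoints.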