DNS CNAME Resolution: Should Resolver Retry or Server Follow Chain? A Deep Dive into A/CNAME Lookup Behavior



When troubleshooting CNAME resolution in our internal network (domain1), I observed an interesting behavior pattern. The resolver initially sends an A record request:

// Sample DNS query capture
16:15:45.837525 IP (tos 0x0, ttl 64, id 36911, offset 0, flags [none], proto UDP (17), length 62)
myhost.domain1.40684 > dnsserver.domain1.domain: 15355+ A? cfengine.domain1. (34)

The server answers with SERVFAIL, even though the queried name exists as a CNAME record. This raises the question of who is responsible for following the alias: the resolver or the server.

In mixed-mode DNS servers (authoritative + recursive), the behavior depends on:

  • Whether the RD (Recursion Desired) flag is set
  • If the query is for an authoritative zone
  • Server implementation (BIND, PowerDNS, etc.)

A proper functioning server should:

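1. Check whether the query falls in a zone it is authoritative for
2. For authoritative queries with RD=0:
   - Return the CNAME record if one exists (plus the target's records when the target lies in a zone the server also holds)
   - Return NXDOMAIN if the name does not exist, or NOERROR with an empty answer if the name exists but has no matching record
3. For recursive queries (RD=1):
   - Follow the CNAME chain on the client's behalf
   - Return the final A record, or an error if the chain cannot be resolved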
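
The three steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not how BIND or PowerDNS is implemented: the tables `LOCAL` and `REMOTE`, the names `www.domain1.` and `web01.domain1.`, the address 192.0.2.10, and the `remote_ok` switch are all hypothetical. Only the cfengine alias and the helm02 address come from the outputs quoted in this write-up.

```python
from typing import Dict, List, Tuple

ZONE = "domain1."  # zone the modelled server is authoritative for

# Hypothetical zone contents; only the cfengine alias comes from the text.
LOCAL: Dict[str, List[Tuple[str, str]]] = {
    "cfengine.domain1.": [("CNAME", "helm02.domain2.")],
    "www.domain1.": [("CNAME", "web01.domain1.")],
    "web01.domain1.": [("A", "192.0.2.10")],
}
# Stand-in for data that recursion would fetch from other zones.
REMOTE: Dict[str, List[Tuple[str, str]]] = {
    "helm02.domain2.": [("A", "192.168.1.10")],
}


def in_zone(name: str) -> bool:
    """True if the name is at or below the authoritative zone apex."""
    return name == ZONE or name.endswith("." + ZONE)


def answer_a_query(
    qname: str, rd: bool, remote_ok: bool = True, max_hops: int = 8
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Return (rcode, answer_records) for an A query against the toy server."""
    answers: List[Tuple[str, str, str]] = []
    name = qname
    for _ in range(max_hops):
        if in_zone(name):
            rrset = LOCAL.get(name)
        elif not rd:
            # Out-of-zone target without recursion: hand back what we have.
            return ("NOERROR" if answers else "REFUSED"), answers
        elif not remote_ok:
            # Recursion was requested but the target zone cannot be reached.
            return "SERVFAIL", []
        else:
            rrset = REMOTE.get(name)
        if rrset is None:
            return "NXDOMAIN", answers
        addresses = [(name, "A", data) for rtype, data in rrset if rtype == "A"]
        if addresses:
            return "NOERROR", answers + addresses
        targets = [data for rtype, data in rrset if rtype == "CNAME"]
        if not targets:
            return "NOERROR", answers  # name exists but has no A record
        answers.append((name, "CNAME", targets[0]))
        name = targets[0]  # restart the lookup at the canonical name
    return "SERVFAIL", []  # chain too long or looping
```

With these toy tables, answer_a_query("cfengine.domain1.", rd=True, remote_ok=False) yields SERVFAIL with an empty answer, which is the shape of the failure in the capture, while the same query with rd=False returns only the CNAME.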

To diagnose this properly, use these dig commands:

# Basic A record lookup (may fail)
dig @dnsserver cfengine.domain1 A +norecurse

# Explicit CNAME query
dig @dnsserver cfengine.domain1 CNAME +norecurse

# Full recursive resolution
dig @dnsserver cfengine.domain1 A +recurse

The key difference is the +norecurse/+recurse flag, which controls whether dig sets the RD bit in the query header.
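
To make the RD bit concrete, here is a minimal sketch that builds the same kind of query by hand using only the Python standard library. The helper names `build_query` and `rd_is_set` are made up for this example. In the tcpdump line earlier, the `+` after the query ID 15355 is tcpdump's way of showing that RD was set.

```python
import struct


def build_query(name: str, qtype: int = 1, rd: bool = True, query_id: int = 0) -> bytes:
    """Build a minimal DNS query packet (header + one question, RFC 1035 section 4.1)."""
    flags = 0x0100 if rd else 0x0000  # RD is the 0x0100 bit of the 16-bit flags word
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS (1 = IN)
    return header + question


def rd_is_set(packet: bytes) -> bool:
    """Report whether the RD bit is set in a DNS packet's header."""
    (flags,) = struct.unpack("!H", packet[2:4])
    return bool(flags & 0x0100)
```

Calling build_query("cfengine.domain1.", rd=True, query_id=15355) produces a 34-byte packet, the same DNS payload size tcpdump reports as (34) for the captured query; passing rd=False corresponds to dig +norecurse.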

When dealing with combined authoritative/recursive servers:

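# BIND named.conf example: authoritative for domain1, recursive for other names
options {
    recursion yes;
    # allow-recursion belongs in options or a view, not inside a zone block.
    # "any" is kept from the original example; limit it to trusted networks in practice.
    allow-recursion { any; };
    # No extra option is needed for aliases: named follows CNAME chains
    # by default when recursion is enabled.
};

zone "domain1" {
    type master;
    file "db.domain1";
};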

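On the client side, resolver options only shorten the retry cycle; they do not change how the server handles the CNAME:

# Resolver configuration (CentOS/RHEL), /etc/resolv.conf
# nameserver takes an IP address, not a hostname.
# 192.0.2.53 is a placeholder for the address of dnsserver.domain1.
options timeout:1 attempts:2
search domain1
nameserver 192.0.2.53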

A properly configured DNS server should:

  • Detect CNAME records during A record queries
  • Automatically follow the chain for recursive queries
  • Return additional section data when possible (per RFC 1034)

The SERVFAIL response points to one of the following:

  1. Server misconfiguration for authoritative zone handling
  2. CNAME chain pointing to unreachable domains
  3. DNSSEC validation failures (if enabled)
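
To tell SERVFAIL apart from NXDOMAIN or REFUSED in raw captures, the response code can be read from the low four bits of the header's flags word. The sketch below decodes a header with the standard library; `parse_header` is a name invented here, and `sample` is a hand-built header that mirrors a SERVFAIL reply with 0/0/0 record counts, not bytes taken from the actual capture.

```python
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def parse_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035 section 4.1.1)."""
    query_id, flags, _qd, an, ns, ar = struct.unpack("!HHHHHH", packet[:12])
    rcode = flags & 0x000F
    return {
        "id": query_id,
        "is_response": bool(flags & 0x8000),  # QR
        "aa": bool(flags & 0x0400),           # authoritative answer
        "rd": bool(flags & 0x0100),           # recursion desired (echoed from the query)
        "ra": bool(flags & 0x0080),           # recursion available
        "rcode": RCODES.get(rcode, str(rcode)),
        "counts": (an, ns, ar),               # answer / authority / additional
    }


# Hand-built sample: QR=1, RD=1, RA=1, RCODE=2, one question, no records.
sample = struct.pack("!HHHHHH", 15355, 0x8182, 1, 0, 0, 0)
```

A reply with AA clear and RA set, as in this sample, came from the recursive side of the server; an answer with AA set came from its authoritative zone data.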

When troubleshooting CNAME resolution, we need to understand the exact workflow:

Resolver → Query for A record → Authoritative Server
     ↑                                |
     |                                ↓
     +---- CNAME response ←----- Check zone data

The packet capture shows the failure pattern:

# Failed A record query
myhost.domain1.40684 > dnsserver.domain1.domain: 15355+ A? cfengine.domain1. (34)
dnsserver.domain1.domain > myhost.domain1.40684: 15355 ServFail 0/0/0 (34)

But a manual CNAME query succeeds:

dig CNAME cfengine.domain1
;; ANSWER SECTION:
cfengine.domain1.   3600    IN  CNAME   helm02.domain2.

The server's dual role creates interesting edge cases. When authoritative for domain1:

  1. For A record queries: Should return CNAME if exists (RFC 1034 Section 3.6.2)
  2. For CNAME queries: Should return just the CNAME record

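Per RFC 1034 (Section 4.3.2), a server that finds a CNAME adds it to the answer and restarts the lookup at the target. It can only complete that lookup from data it holds itself or, when recursion is available, from data it fetches:

if query.type != CNAME and zone.has_cname(query.name):
    answer.add(CNAME)
    restart lookup at CNAME target   # resolved only if local, cached, or recursion is available
elif query.type == CNAME:
    return CNAME only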

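As a manual workaround in the mixed environment:

# Look up the CNAME, then query the target for its A record
# (-r is GNU xargs: skip the second dig if no CNAME was returned)
dig +short CNAME cfengine.domain1 | xargs -r dig A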

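resolv.conf has no option that makes the stub resolver follow a CNAME on its own; it relies on the recursive server for that. The options below only affect retries and server selection:

# /etc/resolv.conf options
options rotate
options attempts:2
options no-tld-query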

Verify your DNS server can properly handle authoritative CNAMEs:

# BIND check
named-checkzone domain1 /var/named/domain1.zone
# PowerDNS debug
pdns_server --daemon=no --loglevel=10

Here's what successful resolution should look like:

;; QUESTION SECTION:
;cfengine.domain1.       IN  A

;; ANSWER SECTION:
cfengine.domain1. 3600   IN  CNAME   helm02.domain2.
helm02.domain2.    300   IN  A       192.168.1.10

Systematic troubleshooting steps:

  1. Verify zone file syntax
  2. Check server logs for SERVFAIL reasons
  3. Test with +norecurse to isolate authoritative behavior
  4. Validate DNSSEC status if enabled