How to Diagnose DNS Delegation Path Failures: A Traceroute Approach for DNS Lookups


2 views

When troubleshooting DNS resolution failures, the first step is to verify basic connectivity to your configured nameservers. As shown in the example:

$ host www.foo.com
;; connection timed out; no servers could be reached

$ for ns in grep -o '$[0-9]\+[.]$\{3\}[0-9]\+$' /etc/resolv.conf; do ping -qc 1 $ns; done
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss

The ping tests confirm nameserver reachability, suggesting the issue lies somewhere in the DNS delegation path.

DNS resolution follows a hierarchical delegation model:
1. Root servers (.)
2. TLD servers (.com)
3. Authoritative servers (foo.com)
4. Potential subdomain delegation

The dig command with +trace flag provides complete visibility into the resolution chain:

$ dig +trace www.foo.com

; <<>> DiG 9.16.1 <<>> +trace www.foo.com
;; global options: +cmd
.			518400	IN	NS	a.root-servers.net.
.			518400	IN	NS	b.root-servers.net.
;; Received 525 bytes from 192.168.1.1#53(192.168.1.1) in 10 ms

com.			172800	IN	NS	a.gtld-servers.net.
com.			172800	IN	NS	b.gtld-servers.net.
;; Received 1172 bytes from 198.41.0.4#53(a.root-servers.net) in 24 ms

foo.com.		172800	IN	NS	ns1.foo.com.
foo.com.		172800	IN	NS	ns2.foo.com.
;; Received 840 bytes from 192.5.6.30#53(a.gtld-servers.net) in 52 ms

www.foo.com.		3600	IN	A	203.0.113.45
;; Received 64 bytes from 203.0.113.1#53(ns1.foo.com) in 105 ms

For programmatic inspection, this Python script captures the delegation path:

import dns.resolver

def trace_dns(domain):
    resolver = dns.resolver.Resolver()
    resolver.use_edns(0, dns.flags.DO, 1232)
    
    try:
        answers = resolver.resolve(domain, 'A', raise_on_no_answer=False)
        if answers.rrset:
            print(f"Direct resolution: {answers[0].address}")
            return
            
        # Follow delegation
        current = domain.split('.')
        while len(current) > 0:
            ns_query = '.'.join(current)
            try:
                ns_records = resolver.resolve(ns_query, 'NS')
                for ns in ns_records:
                    try:
                        ip = resolver.resolve(str(ns.target), 'A')
                        print(f"Checking {ns_query} via {ns.target} ({ip[0]})")
                        answers = resolver.resolve(domain, 'A', 
                                                  nameserver=[str(ip[0])],
                                                  raise_on_no_answer=False)
                        if answers.rrset:
                            print(f"Resolved via {ns.target}: {answers[0].address}")
                            return
                    except Exception as e:
                        print(f"Failed at {ns.target}: {str(e)}")
            except:
                pass
            current.pop(0)
            
    except Exception as e:
        print(f"Resolution failed: {str(e)}")

trace_dns("www.foo.com")

When dig +trace isn't available, consider these alternatives:

  • drill -T (from ldns-tools)
  • host -v -t ns foo.com followed by manual queries
  • Online tools like DNSViz or IntoDNS
  1. Lame delegation: Parent zone points to incorrect nameservers
  2. Missing glue records: No A records for nameservers in parent zone
  3. Timeout chains: Intermediate nameservers not responding
  4. DNSSEC validation failures: Broken signature chains

When DNS debugging suggests a network issue:

$ mtr -rwzc 10 --udp -P 53 ns1.foo.com
Start: 2023-01-01T00:00:00+0000
HOST: example                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1.|-- router.local            0.0%    10    1.2   1.5   1.0   3.0   0.6
 2.|-- 203.0.113.1            40.0%    10   12.1  15.2  10.1  22.3   4.2

When troubleshooting DNS resolution failures where local nameservers are reachable but ultimate resolution fails, we need to examine the delegation path. The process resembles IP routing, but operates at the DNS hierarchy level.

# Using dig with +trace to follow the delegation path
dig +trace www.foo.com

# Sample output for educational purposes
; <<>> DiG 9.16.1-Ubuntu <<>> +trace www.foo.com
;; global options: +cmd
.                       3600    IN      NS      a.root-servers.net.
.                       3600    IN      NS      b.root-servers.net.
;; Received 525 bytes from 192.168.1.1#53(192.168.1.1) in 10 ms

com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
;; Received 1175 bytes from 198.41.0.4#53(a.root-servers.net) in 54 ms

foo.com.                172800  IN      NS      ns1.foo.com.
foo.com.                172800  IN      NS      ns2.foo.com.
;; Received 775 bytes from 192.5.6.30#53(a.gtld-servers.net) in 120 ms

When +trace doesn't work due to firewall restrictions, try these methods:

# Manual walk through DNS hierarchy
dig @a.root-servers.net www.foo.com
dig @a.gtld-servers.net www.foo.com
dig @ns1.foo.com www.foo.com

# Checking specific DNS server response
delv @8.8.8.8 www.foo.com
host -v -t any www.foo.com 1.1.1.1

Typical issues in delegation chains include:

  • Expired or incorrect glue records at TLD servers
  • Authoritative nameservers not responding to queries
  • Missing reverse DNS for nameserver IPs
  • Network filtering of DNS traffic (port 53)
# Checking DNSSEC validation chain
dig +dnssec www.foo.com
delv +vtrace www.foo.com

# Testing TCP fallback (when UDP fails)
dig +tcp www.foo.com
host -T www.foo.com

Remember that some DNS servers may intentionally block trace queries to prevent DNS amplification attacks. In such cases, manual stepping through the hierarchy as shown above becomes necessary.