Ubuntu DNS Resolution Failure: Why resolv.conf Doesn’t Fall Back to Secondary Nameservers


When working with multiple DNS nameservers in Ubuntu (particularly older versions like 10.04), you might encounter a frustrating behavior: the system never queries the subsequent nameservers when the primary one answers but does not know the name. The symptom looks like this:

# Primary NS responds with NXDOMAIN for private zone
$ host host.private.example.org 10.0.0.20
Host host.private.example.org not found: 3(NXDOMAIN)

# Secondary NS has the record but never gets queried
$ host host.private.example.org 10.0.0.30
host.private.example.org has address 10.0.0.60

The issue stems from how the GNU C Library (glibc) resolver handles NXDOMAIN responses. The resolver assumes that every nameserver listed in resolv.conf serves the same data, so when the first nameserver returns NXDOMAIN (non-existent domain), glibc treats this as a definitive answer and stops querying other nameservers. It moves on to the next server only when a server times out or returns an error such as SERVFAIL.
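The failover rule described above can be sketched as a small simulation. This is an illustrative model, not glibc's actual code: the `Reply` enum, `resolve_in_order`, and the canned replies are hypothetical names invented for the example, and the replies are modelled on the `host` output shown earlier.

```python
from enum import Enum


class Reply(Enum):
    """Simplified outcomes a single nameserver query can produce."""
    ANSWER = "answer"
    NXDOMAIN = "nxdomain"
    SERVFAIL = "servfail"
    TIMEOUT = "timeout"


def resolve_in_order(servers, query):
    """Try servers in listed order, applying the failover rule described above.

    `query` is a hypothetical callable: query(server) -> (Reply, address_or_None).
    Returns (server_that_decided, Reply, address_or_None).
    """
    last = (None, Reply.TIMEOUT, None)
    for server in servers:
        reply, address = query(server)
        if reply in (Reply.ANSWER, Reply.NXDOMAIN):
            # Both are treated as definitive results: stop here.
            return (server, reply, address)
        # SERVFAIL or timeout: remember it and move on to the next server.
        last = (server, reply, None)
    return last


# Canned replies modelled on the example output (assumed, not measured).
CANNED = {
    "10.0.0.20": (Reply.NXDOMAIN, None),
    "10.0.0.30": (Reply.ANSWER, "10.0.0.60"),
}

result = resolve_in_order(["10.0.0.20", "10.0.0.30"], CANNED.get)
print(result)
```

In this model the lookup is decided by 10.0.0.20 with NXDOMAIN, and 10.0.0.30 is never consulted. Swapping the first reply for `Reply.SERVFAIL` lets the loop reach the second server.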

Key points about this behavior:

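  • Affects every application that resolves names through glibc (ping, Thunderbird, etc.); host and dig use BIND's own resolver library rather than glibc's, although they read the same /etc/resolv.conf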
  • Network Manager merely generates the resolv.conf file
  • More noticeable with split DNS configurations

Option 1: Use a Local DNS Caching Resolver

Configure a local resolver like dnsmasq or unbound that can properly handle multiple upstream servers:

# Install dnsmasq
sudo apt-get install dnsmasq

# Configure /etc/dnsmasq.conf
server=/public.example.org/10.0.0.20
server=/private.example.org/10.0.0.30
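To make the per-domain routing idea concrete, here is a minimal sketch of longest-suffix matching, which is roughly what a `server=/domain/address` rule expresses. It is not dnsmasq's implementation; `pick_upstream` and `ROUTES` are hypothetical names used only for illustration.

```python
def pick_upstream(name, routes, default=None):
    """Return the upstream for the longest domain suffix matching `name`."""
    labels = name.rstrip(".").lower().split(".")
    # Check the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default


# Mirrors the two server= lines from the configuration above.
ROUTES = {
    "public.example.org": "10.0.0.20",
    "private.example.org": "10.0.0.30",
}

print(pick_upstream("host.private.example.org", ROUTES))  # 10.0.0.30
print(pick_upstream("host.public.example.org", ROUTES))   # 10.0.0.20
print(pick_upstream("www.example.com", ROUTES))           # None
```

Because each query goes to exactly one upstream chosen by name, the NXDOMAIN fallback problem does not arise: the private server is the only one ever asked about private names.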

Option 2: Modify Resolver Timeout Settings

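Adjust timeout and retry values in /etc/resolv.conf (requires disabling NetworkManager overwrites). This shortens the delay when a server is unreachable, but it does not make the resolver continue past an NXDOMAIN. With rotate, queries alternate between the servers, so a private name may resolve on one lookup and fail on the next: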

options timeout:1 attempts:3 rotate
nameserver 10.0.0.20
nameserver 10.0.0.30
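When checking which settings are actually in effect, it can help to read the file programmatically. The following is a simplified sketch of a resolv.conf parser; `parse_resolv_conf` is a hypothetical helper, and it strips trailing comments for convenience, which is more lenient than the real resolver.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains and options from resolv.conf text."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        # Drop comments; this sketch treats '#' and ';' anywhere as a comment start.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values
        elif keyword == "options":
            for opt in values:
                # "timeout:1" -> ("timeout", 1); bare flags like "rotate" -> True
                key, _, val = opt.partition(":")
                config["options"][key] = int(val) if val.isdigit() else True
    return config


SAMPLE = """\
options timeout:1 attempts:3 rotate
nameserver 10.0.0.20
nameserver 10.0.0.30
"""

parsed = parse_resolv_conf(SAMPLE)
print(parsed["nameservers"])  # ['10.0.0.20', '10.0.0.30']
print(parsed["options"])      # {'timeout': 1, 'attempts': 3, 'rotate': True}
```

To inspect the live file, pass the contents of /etc/resolv.conf to the same function, for example `parse_resolv_conf(open("/etc/resolv.conf").read())`.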

Option 3: Conditional Forwarding with BIND

For advanced setups, configure a local BIND instance:

zone "public.example.org" {
    type forward;
    forwarders { 10.0.0.20; };
};

zone "private.example.org" {
    type forward;
    forwarders { 10.0.0.30; };
};

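For newer Ubuntu versions (17.10+), systemd-resolved supports per-link DNS servers and routing domains. Note that resolvectl takes a network interface, not a domain, as its first argument, and that routing domains select a link, so sending the two zones to different servers this way generally requires the servers to be configured on separate links:

# Configure with resolvectl (eth0 is a placeholder interface name)
resolvectl dns eth0 10.0.0.20 10.0.0.30
resolvectl domain eth0 ~public.example.org ~private.example.org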

Remember that these solutions may require adjusting your firewall rules to allow DNS traffic between your local resolver and the upstream servers.


In Ubuntu 10.04 with NetworkManager, I've encountered a peculiar DNS resolution behavior where the system refuses to query the secondary nameserver when the primary fails to resolve a domain. Here's the exact symptom:

# Current resolv.conf configuration
search example.org
nameserver 10.0.0.20  # public nameserver (public.example.org)
nameserver 10.0.0.30  # private nameserver (private.example.org)

The resolution works asymmetrically:

# Works when querying public domain (primary NS)
$ ping host.public.example.org
PING host.public.example.org (10.0.0.50) 56(84) bytes of data.

# Fails when querying private domain (secondary NS)
$ ping host.private.example.org
ping: unknown host host.private.example.org

# But dig confirms the record exists
$ dig @10.0.0.30 host.private.example.org
;; ANSWER SECTION:
host.private.example.org. 3600 IN A 10.0.0.60

NetworkManager (v0.8) only writes the nameserver list into resolv.conf; the behavior comes from how the resolver consumes that list:

  • Nameservers are tried strictly in the order listed
  • A later server is consulted only after a timeout or a server error (default 5s timeout per attempt)
  • An NXDOMAIN from the first server is accepted as final, so the second server is never asked

Option 1: Modify NetworkManager Configuration

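# Edit /etc/NetworkManager/NetworkManager.conf
# Note: the dns= and rc-manager= keys are understood only by newer
# NetworkManager releases, not v0.8. They control how resolv.conf is
# written, not how the resolver fails over.
[main]
dns=default
rc-manager=resolvconf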

Option 2: Use resolvconf with Custom Settings

# Install resolvconf if not present
sudo apt-get install resolvconf

# Configure custom options (tee -a appends, preserving the existing header)
echo "options timeout:1 attempts:2 rotate" | sudo tee -a /etc/resolvconf/resolv.conf.d/head
# Regenerate /etc/resolv.conf
sudo resolvconf -u

Option 3: Manual resolv.conf Management

# Sample optimized resolv.conf (write this content first)
nameserver 10.0.0.20
nameserver 10.0.0.30
options timeout:1 attempts:2 rotate

# Then make resolv.conf immutable to prevent NetworkManager overwrites
sudo chattr +i /etc/resolv.conf

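Use this Python script (requires dnspython 2.x) to query the nameservers directly. It uses dnspython's own resolver rather than glibc, so it shows what the servers return, not how the system resolver behaves:

import dns.resolver

# configure=False: ignore /etc/resolv.conf and use only the servers below
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ['10.0.0.20', '10.0.0.30']
resolver.lifetime = 2  # total time budget for the lookup, in seconds

try:
    answers = resolver.resolve('host.private.example.org', 'A')
    for rdata in answers:
        print(rdata.address)
except dns.resolver.NXDOMAIN:
    print("NXDOMAIN received")
except dns.resolver.NoAnswer:
    print("No answer received")
except dns.resolver.NoNameservers:
    print("All nameservers returned errors")
except dns.resolver.Timeout:
    print("All nameservers timed out")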
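The script above talks to the nameservers with dnspython, so it bypasses the system resolver. To observe what applications actually see, a complementary check can go through the operating system's lookup path with `socket.getaddrinfo`, which on Linux calls into the C library (and therefore also consults /etc/hosts). This is a minimal sketch; `system_lookup` is a hypothetical helper name.

```python
import socket


def system_lookup(name):
    """Resolve `name` through the OS resolver; return sorted IPv4 addresses or None."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Raised for "name not known" as well as temporary resolver failures.
        print(f"{name}: lookup failed ({exc})")
        return None
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


# A numeric address is returned as-is without any DNS query.
print(system_lookup("127.0.0.1"))  # ['127.0.0.1']
```

Calling `system_lookup("host.private.example.org")` on the affected machine should fail in the same way ping does, while the dnspython script pointed only at 10.0.0.30 succeeds.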

For production environments, consider:

  • Setting up a local caching resolver (dnsmasq/unbound)
  • Implementing DNS views in BIND to merge zones
  • Upgrading to newer Ubuntu versions with improved NetworkManager