Optimizing DNS Failover Behavior in Linux: Reducing Query Latency When Primary Nameserver Fails



Consider a standard /etc/resolv.conf listing two nameservers:

nameserver 192.168.1.1
nameserver 8.8.8.8

The glibc resolver handles these entries strictly in order:

  • Sends each query to the first nameserver
  • Waits for the timeout (5 seconds by default) before trying the next one
  • Starts again from the first nameserver on every new query, even if it failed last time

The cost shows up as soon as the first server stops answering:

# Example showing problematic behavior (a single try, using dig's default 5-second timeout)
$ time dig +tries=1 example.com @192.168.1.1
;; connection timed out; no servers could be reached

real    0m5.010s

Every lookup that reaches the dead primary pays a wait like this before the resolver moves on to the next nameserver.
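dig sends queries with its own resolver code, so it shows the cost of one timeout but not what applications see through glibc. A minimal sketch that times a lookup through the system resolver instead, using getent (which follows /etc/nsswitch.conf and /etc/resolv.conf like most programs). The function name time_lookup is hypothetical, and the sketch assumes GNU date for nanosecond timestamps:

```bash
#!/bin/bash
# Hypothetical helper: time one lookup through the system (glibc/NSS) resolver.
# Prints the elapsed milliseconds and returns getent's exit status.
time_lookup() {
    local name=$1 start end rc
    start=$(date +%s%N)                # GNU date: nanoseconds since the epoch
    getent hosts "$name" > /dev/null   # same lookup path as ordinary applications
    rc=$?
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
    return "$rc"
}
```

With the primary down and default options, expect values of 5000 ms or more from time_lookup example.com.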

1. Using resolvconf with Dynamic Configuration

Install and configure resolvconf:

sudo apt install resolvconf  # Debian/Ubuntu
sudo systemctl enable resolvconf

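Then add the following to /etc/resolvconf/resolv.conf.d/head, whose contents are placed at the top of the generated /etc/resolv.conf, and regenerate the file with sudo resolvconf -u. The rotate option spreads queries round-robin across the listed servers instead of always starting with the first: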

options timeout:1 attempts:1 rotate
nameserver 192.168.1.1
nameserver 8.8.8.8
nameserver 1.1.1.1
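To confirm the options actually reached the generated /etc/resolv.conf, a small checker can print the effective values. A minimal sketch: resolv_opt is a hypothetical helper, it falls back to glibc's documented defaults (timeout 5, attempts 2) when you pass them as the default, it lets a later options line override an earlier one, and it ignores any RES_OPTIONS environment variable:

```bash
#!/bin/bash
# Hypothetical helper: print the effective value of a numeric resolv.conf option.
# Usage: resolv_opt FILE NAME DEFAULT   (e.g. resolv_opt /etc/resolv.conf timeout 5)
resolv_opt() {
    local file=$1 name=$2 default=$3
    awk -v name="$name" -v val="$default" '
        $1 == "options" {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")          # "timeout:1" -> kv[1]="timeout", kv[2]="1"
                if (kv[1] == name) val = kv[2]
            }
        }
        END { print val }                   # last matching options line wins
    ' "$file"
}
```

For example: echo "timeout=$(resolv_opt /etc/resolv.conf timeout 5) attempts=$(resolv_opt /etc/resolv.conf attempts 2)".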

2. Implementing dnsmasq as Local Caching Resolver

Install and configure dnsmasq:

sudo apt install dnsmasq
sudo nano /etc/dnsmasq.conf

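Add these configuration options. no-resolv stops dnsmasq from reading upstream servers from /etc/resolv.conf, and all-servers sends every query to all listed servers at once and returns the first reply, so a dead primary adds no delay: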

no-resolv
server=192.168.1.1
server=8.8.8.8
server=1.1.1.1
all-servers
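dnsmasq only helps if the system resolver actually sends queries to it. After restarting it (sudo systemctl restart dnsmasq), the first nameserver in /etc/resolv.conf should be the loopback address it listens on. A sketch of a check, assuming dnsmasq is on 127.0.0.1; first_nameserver and check_dnsmasq_in_use are hypothetical names:

```bash
#!/bin/bash
# Hypothetical helper: print the first "nameserver" entry of a resolv.conf-style file.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Warn when lookups would bypass the local dnsmasq instance (assumed on 127.0.0.1)
check_dnsmasq_in_use() {
    local ns
    ns=$(first_nameserver "$1")
    if [ "$ns" = "127.0.0.1" ]; then
        echo "ok: queries go to dnsmasq"
    else
        echo "warning: first nameserver is ${ns:-none}, not dnsmasq" >&2
        return 1
    fi
}
```

Running check_dnsmasq_in_use with no argument checks /etc/resolv.conf.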

3. Using systemd-resolved

For systems with systemd:

sudo systemctl enable systemd-resolved
sudo systemctl start systemd-resolved

Configure DNS servers:

sudo resolvectl dns eth0 192.168.1.1 8.8.8.8
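sudo resolvectl status eth0   # "Current DNS Server" shows which one is in use

resolvectl has no subcommand for timeout or attempts, and the resolv.conf options of those names only affect how glibc queries the local stub, not how systemd-resolved talks to upstream servers. systemd-resolved handles retries itself: when the current server stops responding, it switches to the next configured server and keeps using it for later queries, so the dead primary is not retried on every lookup.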
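Settings made with resolvectl dns are not persistent; they are lost when the link is reconfigured. A persistent setup can list the servers in a resolved.conf drop-in instead. A sketch, assuming your systemd version reads /etc/systemd/resolved.conf.d/*.conf as described in resolved.conf(5); the function name and the file name failover.conf are hypothetical. The servers go in DNS= rather than FallbackDNS=, because fallback servers are only used when no other DNS server information is known:

```bash
#!/bin/bash
# Hypothetical helper: write a systemd-resolved drop-in listing DNS servers in order.
# Usage (as root):
#   write_resolved_dropin /etc/systemd/resolved.conf.d 192.168.1.1 8.8.8.8
#   systemctl restart systemd-resolved
write_resolved_dropin() {
    local dir=$1; shift
    mkdir -p "$dir"
    # DNS= takes a space-separated list; "$*" joins the remaining arguments with spaces
    printf '[Resolve]\nDNS=%s\n' "$*" > "$dir/failover.conf"
}
```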

Verify your DNS resolution behavior:

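dig +stats example.com
time getent hosts example.com   # resolves through glibc, like most applications
dnstraceroute example.com       # from the dnsdiag package

dig uses its own resolver code rather than glibc, so the getent timing is the one that reflects what applications see. With the primary down, that lookup should now finish well under the 5-second default.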
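To test failover without actually taking the primary down, you can temporarily drop DNS traffic to it with iptables and time a lookup through the system resolver. A sketch, assuming root privileges and iptables; simulate_primary_outage is a hypothetical name. The rules are deleted after the lookup whether or not it succeeds, but not if the function is interrupted:

```bash
#!/bin/bash
# Hypothetical helper: time a lookup while port 53 traffic to one server is dropped.
# Must run as root. Usage: simulate_primary_outage 192.168.1.1 example.com
simulate_primary_outage() {
    local server=$1 name=$2 proto start end
    for proto in udp tcp; do
        iptables -I OUTPUT -d "$server" -p "$proto" --dport 53 -j DROP
    done
    start=$(date +%s%N)
    getent hosts "$name" > /dev/null
    end=$(date +%s%N)
    # Remove the temporary rules so normal resolution resumes
    for proto in udp tcp; do
        iptables -D OUTPUT -d "$server" -p "$proto" --dport 53 -j DROP
    done
    echo "lookup of $name took $(( (end - start) / 1000000 )) ms with $server blocked"
}
```

Behind a caching resolver, query a name that is not cached yet, or flush the cache first (resolvectl flush-caches for systemd-resolved), otherwise the answer comes straight from the cache.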

For maximum control, create a script to monitor DNS health:

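#!/bin/bash
PRIMARY_DNS="192.168.1.1"
SECONDARY_DNS="8.8.8.8"

# resolvconf -a replaces eth0's entries; servers listed in resolv.conf.d/head still come first
if ! dig +short +time=1 +tries=1 example.com @"$PRIMARY_DNS" > /dev/null; then
    sudo resolvconf -a eth0 <<EOF
nameserver $SECONDARY_DNS
EOF
fi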

In most Linux distributions, the default handling of multiple nameservers in /etc/resolv.conf makes every lookup slow while the primary DNS server is down. Take the standard configuration:

nameserver 123.123.123.123
nameserver 8.8.8.8

follows a simple sequential approach where the system:

  • Tries the first nameserver
  • Waits for timeout (typically 5 seconds)
  • Only then attempts the next server in the list

This behavior means that when your primary DNS server (123.123.123.123) becomes unavailable, every DNS query will experience:

  • A 5-second delay (default timeout)
  • Potential cascading timeouts across applications
  • Degraded user experience for as long as the primary stays down, because each new query tries it first again

Here are several approaches to improve this behavior:

1. Using the 'options' Directive

Add timeout and attempts parameters to /etc/resolv.conf (if another tool generates this file, your edits may be overwritten):

options timeout:1 attempts:2
nameserver 123.123.123.123
nameserver 8.8.8.8

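This configuration:

  • Reduces the per-server timeout to 1 second
  • Limits the resolver to 2 passes through the nameserver list before it gives up
  • Cuts the wait before the backup server is tried from about 5 seconds to about 1 second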
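Before editing the file, you can try option values on a single command: glibc lets the RES_OPTIONS environment variable amend the resolv.conf options for one process. A sketch; try_resolver_options is a hypothetical name, it assumes GNU date, and lookups answered by nscd will not reflect the trial options:

```bash
#!/bin/bash
# Hypothetical helper: time a glibc lookup with trial resolver options.
# Usage: try_resolver_options "timeout:1 attempts:2" example.com
try_resolver_options() {
    local opts=$1 name=$2 start end rc
    start=$(date +%s%N)
    # RES_OPTIONS uses the same syntax as the "options" line in resolv.conf
    RES_OPTIONS="$opts" getent hosts "$name" > /dev/null
    rc=$?
    end=$(date +%s%N)
    echo "$opts: $(( (end - start) / 1000000 )) ms (exit $rc)"
    return "$rc"
}
```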

2. Implementing a Local DNS Cache

Install and configure systemd-resolved or dnsmasq:

# For systemd-resolved
sudo systemctl enable systemd-resolved
sudo systemctl start systemd-resolved
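# Use the stub file (nameserver 127.0.0.53) so lookups go through systemd-resolved;
# /run/systemd/resolve/resolv.conf lists the upstream servers and bypasses it
sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf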
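systemd-resolved provides two files: /run/systemd/resolve/stub-resolv.conf, which points glibc at the local stub on 127.0.0.53, and /run/systemd/resolve/resolv.conf, which lists the upstream servers directly and so bypasses resolved's cache and server switching. A sketch that reports which one /etc/resolv.conf resolves to; resolv_conf_mode is a hypothetical name, and readlink -m (GNU coreutils) is used so the check works even if the target does not exist:

```bash
#!/bin/bash
# Hypothetical helper: report how /etc/resolv.conf (or the given path) is wired up.
#   stub   -> queries go to systemd-resolved on 127.0.0.53
#   direct -> glibc talks to the upstream servers itself
#   other  -> a static file or another tool's file
resolv_conf_mode() {
    local target
    target=$(readlink -m "${1:-/etc/resolv.conf}")   # follow symlinks, including relative ones
    case "$target" in
        /run/systemd/resolve/stub-resolv.conf) echo stub ;;
        /run/systemd/resolve/resolv.conf)      echo direct ;;
        *)                                     echo other ;;
    esac
}
```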

3. Using NetworkManager with systemd-resolved

Configure NetworkManager to pass the DNS servers it learns to systemd-resolved, then restart it with sudo systemctl restart NetworkManager:

# Edit /etc/NetworkManager/NetworkManager.conf
[main]
dns=systemd-resolved
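Drop-in files under /etc/NetworkManager/conf.d can override this setting, so it can help to check the merged configuration printed by NetworkManager --print-config. A sketch of a parser for that output; nm_dns_backend is a hypothetical name, and it only looks at the dns= key in the [main] section:

```bash
#!/bin/bash
# Hypothetical helper: read NetworkManager config on stdin and print the [main] dns= value.
# Usage: NetworkManager --print-config | nm_dns_backend
nm_dns_backend() {
    awk '
        /^\[/ { section = $0; next }                    # track the current [section]
        section == "[main]" && /^[[:space:]]*dns[[:space:]]*=/ {
            sub(/^[^=]*=[[:space:]]*/, "")              # keep only the value
            sub(/[[:space:]]+$/, "")                    # trim trailing whitespace
            value = $0
        }
        END { print value }
    '
}
```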

For more sophisticated setups, a health-check script can switch to the backup automatically:

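#!/bin/bash
# Simple DNS health check script (run as root: it rewrites /etc/resolv.conf)
PRIMARY_DNS="123.123.123.123"
BACKUP_DNS="8.8.8.8"

if ! dig +time=1 +tries=1 @"$PRIMARY_DNS" example.com > /dev/null; then
    echo "Primary DNS down, switching to backup"
    tmp=$(mktemp)
    # Drop only the primary's line (-x: whole line, -F: literal), keeping options/search lines
    grep -vxF "nameserver $PRIMARY_DNS" /etc/resolv.conf > "$tmp"
    # Make sure the backup is listed even if it was not there before
    grep -qxF "nameserver $BACKUP_DNS" "$tmp" || echo "nameserver $BACKUP_DNS" >> "$tmp"
    # Write through the file (and any symlink) instead of replacing it as sed -i would
    cat "$tmp" > /etc/resolv.conf
    rm -f "$tmp"
fi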

General recommendations:

  • Configure up to three nameservers; glibc ignores any beyond the third (MAXNS)
  • Use different DNS providers for redundancy
  • Consider monitoring DNS resolution times
  • For critical systems, use a local caching resolver such as dnsmasq or systemd-resolved
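For monitoring, dig's summary already reports how long each server took in its ";; Query time: N msec" line. A minimal sketch that extracts that figure so it can be logged per server, for example from cron; dig_query_ms, log_dns_latency, and the log path in the example are hypothetical, and a server that does not answer within one second is reported as "timeout":

```bash
#!/bin/bash
# Hypothetical helper: read dig output on stdin and print the query time in ms,
# or "timeout" if dig printed no Query time line.
dig_query_ms() {
    awk '/^;; Query time:/ { print $4; found = 1 } END { if (!found) print "timeout" }'
}

# Hypothetical helper: log one timestamped line per server (requires dig)
log_dns_latency() {
    local server
    for server in "$@"; do
        echo "$(date +%FT%T) $server $(dig +time=1 +tries=1 example.com @"$server" | dig_query_ms)"
    done
}

# Example: log_dns_latency 192.168.1.1 8.8.8.8 1.1.1.1 >> /var/log/dns-latency.log
```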