Debugging Local DNS Resolution: Why dig & nslookup Fail Without Explicit Server Specification


While setting up a private network with a CentOS 6 machine as the gateway, I ran into a peculiar DNS behavior: local hostnames would not resolve unless I explicitly told the lookup tool which DNS server to query. Here's what I found:

# Non-working query
nslookup sun.beowulf.iecs
Server:     142.3.102.202
Address:    142.3.102.202#53
** server can't find sun.beowulf.iecs: NXDOMAIN

# Working query
nslookup sun.beowulf.iecs 192.168.42.1
Server:     192.168.42.1
Address:    192.168.42.1#53
Name:   sun.beowulf.iecs
Address: 192.168.42.1
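
The Server: line is the useful clue in both transcripts, because it shows which resolver was actually asked. As a minimal sketch, the helper below pulls that address out of nslookup output so it can be compared in a script. The function name nslookup_server is made up for illustration, and it assumes the output layout shown above.

```shell
# nslookup_server
# Read nslookup output on stdin and print the address from the first
# "Server:" line, i.e. the resolver that was actually queried.
# (Hypothetical helper name; assumes the output layout shown above.)
nslookup_server() {
    awk '$1 == "Server:" { print $2; exit }'
}

# Example (needs network access, so it is left commented out):
#   nslookup sun.beowulf.iecs | nslookup_server
```

If this prints a university address instead of the gateway's, the query never reached the local DNS server.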

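Name lookups made through the system resolver follow this order:

  1. Checks /etc/hosts
  2. Queries the DNS servers listed in /etc/resolv.conf, in order
  3. Falls back to any other sources configured in /etc/nsswitch.conf

That order comes from the hosts line in /etc/nsswitch.conf, which had the usual default here and so was not the culprit: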

hosts:      files dns
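
Note that dig and nslookup skip this file entirely: they read the nameserver lines from /etc/resolv.conf and talk to DNS directly, so only lookups that go through the C library (getent hosts, ping, ssh) honor the hosts: order. The sketch below prints that order for a given file. The name hosts_lookup_order is my own invention, and it assumes the hosts: keyword is followed by whitespace.

```shell
# hosts_lookup_order [FILE]
# Print the sources named on the "hosts:" line of an nsswitch.conf-style
# file, one per line, skipping [STATUS=action] items such as
# [NOTFOUND=return]. Defaults to /etc/nsswitch.conf.
# (Hypothetical helper name, for illustration only.)
hosts_lookup_order() {
    local file="${1:-/etc/nsswitch.conf}"
    awk '
        /^[[:space:]]*hosts:/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at a trailing comment
                if ($i !~ /^\[/) print $i     # skip bracketed action items
            }
            exit
        }
    ' "$file"
}
```

Comparing getent hosts NAME against dig NAME is then a quick way to tell an /etc/hosts answer from a DNS answer.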

Here's the working configuration that solved my issue:

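# /etc/dnsmasq.conf
# Local domain: append it to plain names from the hosts files and
# answer it locally instead of forwarding it upstream
domain=beowulf.iecs
expand-hosts
local=/beowulf.iecs/
# Upstream servers come only from the server= lines below. With no-resolv
# set, resolv-file= and no-poll have no effect, so they are left out.
no-resolv
server=8.8.8.8
server=8.8.4.4
strict-order
# Extra hosts file and a static DHCP lease
addn-hosts=/etc/hosts.dnsmasq
dhcp-host=00:1A:4B:XX:XX:XX,mercury,192.168.42.2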

To prevent DHCP from overwriting resolv.conf:

# /etc/dhcp/dhclient.conf
supersede domain-name "beowulf.iecs";
supersede domain-search "beowulf.iecs", "biol.uregina.ca";
supersede domain-name-servers 127.0.0.1;
request subnet-mask, broadcast-address, routers;

Use these to test your configuration:

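# Resolve through the system resolver (follows the nsswitch.conf order)
getent hosts mercury

# Test dnsmasq directly
dig @127.0.0.1 mercury.beowulf.iecs

# Request a lease verbosely to confirm dhclient applies the settings
# (this reconfigures eth0, so avoid running it over a remote session)
dhclient -v eth0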

For better caching performance:

# /etc/nscd.conf
enable-cache            hosts           yes
positive-time-to-live   hosts           3600
negative-time-to-live   hosts           20
suggested-size          hosts           211
check-files             hosts           yes
persistent              hosts           yes
shared                  hosts           yes
max-db-size             hosts           33554432

When working with a private network setup where dnsmasq serves as both DHCP and DNS server, you might encounter situations where standard DNS lookup tools like dig and nslookup fail to resolve local hostnames unless explicitly told which DNS server to query.

# This works (DNS server given explicitly):
nslookup sun.beowulf.iecs 192.168.42.1

# This fails:
nslookup sun.beowulf.iecs

The issue stems from how the system's DNS resolution is configured. In this setup:

# /etc/resolv.conf gets overwritten by DHCP:
nameserver 142.3.102.202
nameserver 142.3.100.15

# While dnsmasq is configured to use:
resolv-file=/etc/dnsmasq-resolv.conf
strict-order

Because /etc/resolv.conf lists only the university's DNS servers, dig and nslookup never ask dnsmasq. They query those servers directly, and since those servers hold no records for the private beowulf.iecs hosts, the answer is NXDOMAIN.
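
A quick way to confirm this on a given machine is to look at which nameserver comes first in /etc/resolv.conf. The following is a rough sketch, not a definitive check: the function names are hypothetical, and it assumes dnsmasq listens on a loopback address on the gateway itself.

```shell
# list_nameservers [FILE]
# Print the address from every "nameserver" line of a resolv.conf-style
# file, in file order. Defaults to /etc/resolv.conf.
list_nameservers() {
    local file="${1:-/etc/resolv.conf}"
    awk '$1 == "nameserver" && NF >= 2 { print $2 }' "$file"
}

# uses_local_dnsmasq [FILE]
# Succeed only if the first nameserver is a loopback address, which is
# where dnsmasq is assumed to listen on the gateway.
uses_local_dnsmasq() {
    local first
    first="$(list_nameservers "$1" | head -n 1)"
    case "$first" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}
```

On the gateway, uses_local_dnsmasq && echo "queries go to dnsmasq" should print nothing while the DHCP-supplied servers are still listed first.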

To fix this, we need to ensure all DNS queries first go through dnsmasq. Here's how:

# First, prevent DHCP from overwriting /etc/resolv.conf
# Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and add:
PEERDNS=no

# Then modify /etc/resolv.conf to point to localhost:
nameserver 127.0.0.1
options edns0
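
Since a missing or mistyped PEERDNS line silently lets DHCP rewrite /etc/resolv.conf again, it can be worth checking the interface file from a script. This is a minimal sketch with a made-up function name; it assumes a simple KEY=value line, optionally quoted, as ifcfg files normally use.

```shell
# peerdns_disabled FILE
# Succeed only if the last PEERDNS assignment in an ifcfg-style file is
# "no" (case-insensitive, quotes allowed). A missing line counts as
# failure, since DHCP may then rewrite /etc/resolv.conf.
# (Hypothetical helper name, for illustration only.)
peerdns_disabled() {
    local file="$1" value
    value="$(sed -n "s/^[[:space:]]*PEERDNS=[\"']*\([A-Za-z]*\).*/\1/p" "$file" \
        | tail -n 1 | tr '[:upper:]' '[:lower:]')"
    [ "$value" = "no" ]
}

# Example:
#   peerdns_disabled /etc/sysconfig/network-scripts/ifcfg-eth0 || echo "DHCP may overwrite resolv.conf"
```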

Update your dnsmasq configuration to handle both local and external queries:

# /etc/dnsmasq.conf
domain=beowulf.iecs
expand-hosts
local=/beowulf.iecs/

# DHCP configuration
dhcp-range=192.168.42.10,192.168.42.254,12h
dhcp-host=mercury,192.168.42.2

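# DNS configuration
# local=/beowulf.iecs/ above already keeps the private domain away from the
# upstream servers; do not add server=/beowulf.iecs/127.0.0.1, which would
# make dnsmasq forward those queries to itself
server=8.8.8.8
server=8.8.4.4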
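
Two mistakes are easy to make in a dnsmasq.conf like this one: combining no-resolv with resolv-file (as far as I can tell no-resolv wins, so the resolv-file is never read), and forwarding the local domain to 127.0.0.1, where dnsmasq itself is listening. The lint sketch below flags both. The function name and the warning texts are my own, and the patterns are deliberately simple, so treat it as a starting point.

```shell
# dnsmasq_conf_warnings [FILE]
# Print one warning per suspicious combination found in a dnsmasq.conf-style
# file; print nothing if none is found. Defaults to /etc/dnsmasq.conf.
# (Hypothetical helper; only these two patterns are checked.)
dnsmasq_conf_warnings() {
    local file="${1:-/etc/dnsmasq.conf}"
    if grep -Eq '^[[:space:]]*no-resolv([[:space:]]|$)' "$file" &&
       grep -Eq '^[[:space:]]*resolv-file=' "$file"; then
        echo "resolv-file is set, but no-resolv is also present, so it will not be read"
    fi
    if grep -Eq '^[[:space:]]*server=/.*/127\.0\.0\.1([[:space:]]|$)' "$file"; then
        echo "a server=/domain/ line forwards to 127.0.0.1, which loops if dnsmasq listens there"
    fi
}
```

For plain syntax errors, dnsmasq --test checks the configuration file without starting the daemon.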

After restarting dnsmasq (service dnsmasq restart), test with:

# Should now work without specifying server
dig sun.beowulf.iecs
nslookup mercury

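# Confirm which server answered (expect 127.0.0.1#53).
# dig +trace is not useful here: it walks down from the root servers,
# which know nothing about the private beowulf.iecs zone.
dig sun.beowulf.iecs | grep 'SERVER:'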

If issues persist, consider these checks:

# Verify name resolution order
cat /etc/nsswitch.conf
# Should have:
hosts:      files dns

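# Check whether nscd is running and caching lookups
# (CentOS 6 uses SysV init, so there is no systemctl)
service nscd status
# Flush its hosts cache if stale answers persist
nscd -i hosts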

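Changes to the network configuration normally take effect once the affected services are restarted (service network restart, then service dnsmasq restart); a full reboot should only be needed as a last resort.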