During our infrastructure migration from Solaris to Linux, we encountered a peculiar networking issue where:
- ping server.idmz.example.com resolves successfully
- dig server.idmz.example.com returns correct records
- But ssh server.idmz.example.com fails with "Could not resolve hostname"
First, let's verify the DNS resolution chain:
# Check DNS resolution order
$ grep '^hosts' /etc/nsswitch.conf
hosts: files dns
# Verify DNS servers
$ cat /etc/resolv.conf
search example.org
nameserver 192.168.1.1
nameserver 192.168.1.2
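Because ssh resolves names through the system resolver while dig queries the nameservers directly, it can help to test the NSS path on its own. The following is a minimal sketch, assuming a glibc system where getent is available. The function names nss_hosts_sources and nss_lookup are illustrative and not part of any tool.

```shell
# Print the lookup sources configured for "hosts" in an nsswitch.conf-style file.
nss_hosts_sources() {
    conf="${1:-/etc/nsswitch.conf}"
    # Take the first hosts: line, drop the key, and squeeze whitespace.
    sed -n 's/^hosts:[[:space:]]*//p' "$conf" | head -n 1 \
        | tr '\t' ' ' | tr -s ' ' | sed 's/ *$//'
}

# Resolve a name through getaddrinfo(), the path that ssh also takes.
nss_lookup() {
    getent ahosts "$1"
}
```

Running nss_lookup server.idmz.example.com next to dig +short server.idmz.example.com shows whether the two paths disagree.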
One common culprit in such cases is IPv6 resolution. SSH might be attempting IPv6 lookups even when IPv4 works:
# Try forcing IPv4
$ ssh -4 server.idmz.example.com
# Or force IPv4 in the SSH client config (an appended line lands in the last Host block of the file, if any)
$ echo "AddressFamily inet" >> ~/.ssh/config
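To check whether only one address family is failing, each family can be looked up separately through getaddrinfo(). This is a rough sketch that assumes glibc's getent. The function name check_families is hypothetical.

```shell
# Report whether a name resolves over IPv4 and over IPv6 via getaddrinfo().
check_families() {
    host="$1"
    if getent ahostsv4 "$host" >/dev/null 2>&1; then
        echo "ipv4: ok"
    else
        echo "ipv4: none"
    fi
    # glibc's ahostsv6 may return IPv4-mapped addresses (::ffff:a.b.c.d)
    # when no AAAA record exists, so those lines are ignored here.
    if getent ahostsv6 "$host" 2>/dev/null | grep -v '^::ffff:' | grep -q .; then
        echo "ipv6: ok"
    else
        echo "ipv6: none"
    fi
}
```

If check_families server.idmz.example.com reports an IPv4 address but no IPv6 address, forcing IPv4 with ssh -4 is a reasonable next test.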
GSSAPI authentication can also be involved: it may trigger additional (reverse) DNS lookups for the target host, and these can stall or fail:
# Check current SSH configuration
$ ssh -G server.idmz.example.com
# Disable GSSAPI if needed
$ ssh -o GSSAPIAuthentication=no server.idmz.example.com
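The full ssh -G output is long, so it can be narrowed to the options that matter here. This is a small sketch. The function name ssh_resolution_opts and the choice of options are assumptions made for illustration.

```shell
# Show only the effective client options that influence name resolution
# and connection setup for a host. ssh -G prints option names in lowercase.
ssh_resolution_opts() {
    ssh -G "$1" | grep -E '^(hostname|addressfamily|gssapiauthentication|checkhostip|connecttimeout|connectionattempts) '
}
```

Calling ssh_resolution_opts server.idmz.example.com before and after a config change confirms which Host block actually applies.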
SSH's timeout behavior differs from ping's. Try adjusting it:
# Raise the connection timeout and retry count (these apply to the connection attempt, not to the DNS lookup itself)
$ ssh -o ConnectTimeout=30 -o ConnectionAttempts=5 server.idmz.example.com
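To see whether the delay comes from name resolution or from the TCP connection, the lookup can be timed on its own. This is a minimal sketch with one-second granularity, assuming getent is available. The function name lookup_seconds is hypothetical.

```shell
# Time the getaddrinfo() lookup on its own, in whole seconds.
lookup_seconds() {
    start=$(date +%s)
    # The result is discarded; only the elapsed time matters here.
    getent ahosts "$1" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}
```

A result near 5 or 10 seconds would be consistent with resolver timeouts (glibc's default is 5 seconds per query), while a result near 0 points at the connection stage instead.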
After extensive testing, we found the most reliable solution was to modify the SSH client configuration:
# /etc/ssh/ssh_config or ~/.ssh/config
Host *.idmz.example.com
    AddressFamily inet
    GSSAPIAuthentication no
    CheckHostIP no
    ConnectTimeout 20
    # Note: this weakens host key verification and is unrelated to name resolution
    StrictHostKeyChecking no
To confirm everything works as expected:
$ ssh -v server.idmz.example.com
[...]
debug1: Connecting to server.idmz.example.com [192.168.1.3] port 22.
debug1: Connection established.
When migrating from Solaris to Linux jump hosts, we encountered a peculiar case where standard networking tools (ping, dig) could resolve hostnames in idmz.example.com while SSH clients failed with "Could not resolve hostname". This manifested specifically:
$ dig +short server.idmz.example.com
192.168.1.3
$ ssh -v server.idmz.example.com
OpenSSH_8.9p1, OpenSSL 3.0.7 1 Nov 2022
debug1: Connecting to server.idmz.example.com [192.168.1.3] port 22.
Connection timed out during DNS resolution
Key observations from packet captures (tcpdump -n -i any port 53):
- SSH performs sequential DNS queries (A/AAAA records) without respecting TTLs
- The resolver falls back to non-authoritative NS records when authoritative lookups fail
- DMZ subdomains with working SSH show proper SOA record propagation
# Compare working vs broken domains:
$ dig +nocmd +noall +answer SOA jdmz.example.com
jdmz.example.com. 3600 IN SOA ns1.jdmz.example.com. admin.example.com. 2023121401 ...
$ dig +nocmd +noall +answer SOA idmz.example.com
;; No SOA records returned
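The same comparison can be run across several zones at once. This is a sketch assuming dig is installed. The function names has_soa and compare_soa are illustrative.

```shell
# Return success if the zone answers with at least one SOA record.
# Lines starting with ';' are dig diagnostics, not answers, so they are dropped.
has_soa() {
    dig +short SOA "$1" 2>/dev/null | grep -v '^;' | grep -q .
}

# Print the SOA status for each zone given as an argument.
compare_soa() {
    for zone in "$@"; do
        if has_soa "$zone"; then
            echo "$zone: SOA present"
        else
            echo "$zone: SOA missing"
        fi
    done
}
```

For example, compare_soa jdmz.example.com idmz.example.com prints one status line per zone.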
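The OpenSSH client resolves names through the system resolver (getaddrinfo()), so its lookups can behave differently from dig, which queries the nameservers directly:
- With the default AddressFamily any, it requests both A and AAAA records, so a broken AAAA path can affect the lookup even when IPv4 works
- It is subject to the resolver's timeouts (glibc defaults to 5s per query)
- It follows glibc resolver settings such as options edns0 in /etc/resolv.conf, whereas dig uses its own query defaults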
Workaround configuration for /etc/ssh/ssh_config:
Host *.idmz.example.com
    AddressFamily inet
    ConnectTimeout 15
    CheckHostIP no
    GSSAPIAuthentication no
In environments with strict DNSSEC validation (common in DMZs), missing DS records cause resolution failures. Diagnostic steps:
$ delv server.idmz.example.com
;; resolution failed: broken trust chain
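# Temporary bypass (not recommended for production). Linux has no sysctl for resolver options;
# this assumes systemd-resolved is the validating resolver and eth0 (a placeholder) is the relevant link:
$ sudo resolvectl dnssec eth0 no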
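One way to tell a validation failure from a plain missing record is to repeat the query with the CD (checking disabled) bit set. This is a hedged sketch that assumes the configured nameserver is a validating resolver. The function name dnssec_check is hypothetical.

```shell
# Classify a name by comparing a normal query with one that sets the
# CD (checking disabled) bit, which asks the resolver to skip validation.
dnssec_check() {
    name="$1"
    # Lines starting with ';' are dig diagnostics, not answers.
    normal=$(dig +short "$name" 2>/dev/null | grep -v '^;' || true)
    unchecked=$(dig +cd +short "$name" 2>/dev/null | grep -v '^;' || true)
    if [ -n "$normal" ]; then
        echo "resolves"
    elif [ -n "$unchecked" ]; then
        echo "resolves only with validation disabled"
    else
        echo "does not resolve"
    fi
}
```

If dnssec_check server.idmz.example.com reports that the name resolves only with validation disabled, the trust chain for the zone is the likely problem, not the record itself.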
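For enterprise environments, use one of the following:
# Option 1: Local host override
$ echo "192.168.1.3 server.idmz.example.com" | sudo tee -a /etc/hosts
# Option 2: Custom resolver configuration with search domains
# Note: glibc reads only /etc/resolv.conf, so this drop-in path assumes a tool that
# assembles resolv.conf from fragments (resolvconf, for example, uses /etc/resolvconf/resolv.conf.d/)
$ cat << EOF | sudo tee /etc/resolv.conf.d/dmz.conf
search idmz.example.com example.com
nameserver 192.168.1.1
options timeout:2 attempts:1
EOF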
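The local host override in Option 1 appends a duplicate line each time it is run. A small sketch of an idempotent variant follows. The function name add_host_entry and its third argument are assumptions made for illustration.

```shell
# Append "IP FQDN" to a hosts file unless the name is already listed.
add_host_entry() {
    ip="$1"; fqdn="$2"; file="${3:-/etc/hosts}"
    # Look for the FQDN as a whole field on any non-comment line.
    if awk -v n="$fqdn" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$fqdn" >> "$file"
}
```

Writing to /etc/hosts requires root, so in practice this would run from a root shell or against a staging copy that is then installed with sudo.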
For Ansible-managed environments:
- name: Configure DNS overrides
  blockinfile:
    path: /etc/hosts
    block: |
      {% for server in dmz_servers %}
      {{ server.ip }} {{ server.fqdn }}
      {% endfor %}