Debugging SSH Connection Stuck at “expecting SSH2_MSG_KEX_DH_GEX_GROUP”: Causes and Solutions for Key Exchange Failures


When attempting SSH connections between specific servers, the handshake process gets stuck at the Diffie-Hellman Group Exchange (DH-GEX) phase. The debug output shows:

debug1: SSH2_MSG_KEX_DH_GEX_REQUEST sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP

Notably, this occurs in a specific asymmetric pattern where:

  • Servers A and B can mutually SSH to each other
  • Servers A/B cannot SSH to Server C
  • Server C can SSH to A/B
  • Network connectivity (ping/traceroute) works bidirectionally

From netstat -a output, we see TCP connections reaching ESTABLISHED state but the SSH protocol negotiation fails at the cryptographic handshake phase. This suggests:

  1. The TCP layer is functioning correctly
  2. Firewalls aren't blocking port 22 outright, although a middlebox could still drop or mangle larger packets
  3. The failure occurs during the SSH protocol negotiation

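The SSH2_MSG_KEX_DH_GEX_GROUP message is part of the Diffie-Hellman Group Exchange key agreement protocol (RFC 4419): after the client sends the modulus sizes it will accept, the server replies with the prime and generator to use. A hang at this point means that reply never reaches the client, which points to one of several potential issues: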

# Common causes:
1. Mismatched cryptographic policies between client/server
2. Incompatible OpenSSH versions
3. Network devices interfering with large DH packets
4. System entropy shortage during key generation (mainly a concern on older kernels or freshly booted VMs)
5. SELinux/AppArmor policies blocking cryptographic operations

1. Verify SSH versions and configurations

# On all servers:
ssh -V
grep -i kex /etc/ssh/sshd_config
sudo sshd -T | grep -i '^kexalgorithms'   # effective value, including compiled-in defaults

# Expected output should show compatible versions and algorithms
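
To see which key exchange methods both ends can actually agree on, the client's list can be intersected with the server's. The sketch below is illustrative only: common_kex is a hypothetical helper, and it assumes both lists are available as comma-separated strings (for example from ssh -Q kex on the client and sshd -T on Server C).

```shell
# Hypothetical helper: print the algorithms present in both comma-separated lists,
# in the order of the first (client) list.
common_kex() {
    client_list=$1
    server_list=$2
    old_ifs=$IFS
    IFS=,
    for alg in $client_list; do
        # Wrap the server list in commas so only whole names match.
        case ",$server_list," in
            *",$alg,"*) printf '%s\n' "$alg" ;;
        esac
    done
    IFS=$old_ifs
}

# Usage sketch (host name and sudo access are assumptions):
#   client=$(ssh -Q kex | paste -sd, -)
#   server=$(ssh serverC "sudo sshd -T" | awk '$1 == "kexalgorithms" { print $2 }')
#   common_kex "$client" "$server"
```

The output follows the client's preference order because SSH picks the first algorithm on the client's list that the server also supports. An empty result means the two sides have no method in common.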

2. Test with minimal cryptographic configuration

# On client (A/B), attempt connection with:
ssh -oKexAlgorithms=diffie-hellman-group14-sha256 \
    -oCiphers=aes256-ctr \
    -oMACs=hmac-sha2-256 \
    user@serverC
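
To find out which key exchange methods hang and which complete, the same test can be repeated for several algorithms with a hard time limit. This is a rough sketch under assumptions: probe_kex and PROBE_CMD are hypothetical names, the coreutils timeout command is assumed to be installed, and key-based login to Server C is assumed to work, since BatchMode disables password prompts.

```shell
# Hypothetical probe: try one KEX algorithm at a time and report the outcome.
# PROBE_CMD is an assumption of this sketch; by default ssh is wrapped in
# coreutils timeout so a stuck handshake is killed after 15 seconds.
PROBE_CMD=${PROBE_CMD:-timeout 15 ssh}

probe_kex() {
    host=$1
    shift
    for alg in "$@"; do
        if $PROBE_CMD -o BatchMode=yes -o KexAlgorithms="$alg" "$host" true >/dev/null 2>&1; then
            printf '%s ok\n' "$alg"
        else
            printf '%s failed (exit %s)\n' "$alg" "$?"
        fi
    done
}

# Usage sketch:
#   probe_kex user@serverC curve25519-sha256 diffie-hellman-group14-sha256 \
#       diffie-hellman-group-exchange-sha256
```

With the default command, exit status 124 means timeout killed a hung connection, while 255 is ssh's own error status and can also indicate an authentication or negotiation failure, so the debug output still needs to be checked for those cases.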

3. Packet capture analysis

# On server C:
tcpdump -i eth0 'port 22' -w ssh_debug.pcap

# Filter for DH-GEX packets in Wireshark (34 = GEX_REQUEST from the client, 31 = GEX_GROUP from the server):
ssh.message_code == 34 || ssh.message_code == 31
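
When reading the capture, it helps to have the DH-GEX message numbers from RFC 4419 at hand. The lookup below is a small convenience sketch; the function name kex_gex_msg_name is made up for illustration.

```shell
# Hypothetical lookup: map an SSH message number to its DH-GEX name (RFC 4419).
kex_gex_msg_name() {
    case "$1" in
        30) echo SSH_MSG_KEX_DH_GEX_REQUEST_OLD ;;
        31) echo SSH_MSG_KEX_DH_GEX_GROUP ;;
        32) echo SSH_MSG_KEX_DH_GEX_INIT ;;
        33) echo SSH_MSG_KEX_DH_GEX_REPLY ;;
        34) echo SSH_MSG_KEX_DH_GEX_REQUEST ;;
        *)  echo "not a DH-GEX message: $1" ;;
    esac
}
```

Numbers 30 to 49 are specific to the negotiated key exchange method, so the same values mean something else under other methods. In the failure described here, the capture on Server C should show whether message 34 arrives from the client and whether message 31 ever leaves the server.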

Solution 1: Update cryptographic policies

# On server C's /etc/ssh/sshd_config:
KexAlgorithms curve25519-sha256,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

# Validate the config, then restart sshd (the unit is named ssh on Debian/Ubuntu):
sshd -t && systemctl restart sshd

Solution 2: Bypass network middleboxes

# ECDH key exchange messages are much smaller than DH-GEX ones, so they are less likely to hit MTU or fragmentation problems:
ssh -oKexAlgorithms=ecdh-sha2-nistp256 \
    -oCompression=no \
    user@serverC
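
If a smaller key exchange works where DH-GEX hangs, a path MTU problem between A/B and Server C becomes a plausible suspect. One way to check is to send pings with the Don't Fragment bit set. The helper below is a minimal sketch (icmp_payload_for_mtu is a hypothetical name) and assumes IPv4 without IP options and the Linux iputils version of ping.

```shell
# Hypothetical helper: largest ICMP echo payload that fits in a single
# unfragmented IPv4 packet for a given MTU.
# Overhead: 20 bytes IPv4 header (no options) + 8 bytes ICMP header = 28 bytes.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage sketch (-M do sets the Don't Fragment bit on Linux iputils ping):
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1500)" serverC
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1400)" serverC
```

If the 1500-byte probe gets no reply while the 1400-byte one succeeds, something on the path is likely dropping full-size packets.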

Solution 3: System-level checks

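# Verify entropy pool (kernels 5.18 and later always report 256 once the pool is initialized):
cat /proc/sys/kernel/random/entropy_avail

# Check security policies:
audit2allow -a # SELinux: prints rules that would permit the denials found in the audit log
aa-status # AppArmor: lists loaded profiles and their modes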

The most robust solution is to standardize SSH configurations across all servers:

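# Example standardized client configuration (/etc/ssh/ssh_config.d/99-common.conf):
# Assumes every server offers an Ed25519 host key and runs a reasonably recent OpenSSH.
Host *
    KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org
    HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,ssh-ed25519
    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
    MACs hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com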

In a multi-server environment, Servers A and B can SSH into each other and other machines, but fail when attempting to connect to Server C. Interestingly, Server C can SSH into both A and B. The connection hangs at:

debug1: SSH2_MSG_KEX_DH_GEX_REQUEST sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP

Network diagnostics (ping, traceroute) work bidirectionally, and netstat -a shows an established TCP connection that never completes the SSH handshake.

  • The failure occurs during the key exchange phase of SSH protocol
  • TCP connection establishes successfully (visible in netstat)
  • Asymmetric connectivity pattern (C → A/B works, but not vice versa)
  • No firewall blocks port 22 outright (the TCP handshake completes), though larger packets may still be dropped in transit

Based on similar cases reported in OpenSSH bug trackers and Stack Overflow threads, these are the most likely culprits:

  1. Mismatched DH Group parameters: Server C might be using non-standard or very large DH groups
  2. SSH version incompatibility: Different OpenSSH versions negotiating incompatible key exchange methods
  3. Resource exhaustion: Server C failing to generate DH parameters due to system constraints
  4. Custom SSH configurations: Non-standard sshd_config settings on Server C
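
To check the first point, it can help to see which group sizes Server C is able to offer. sshd selects DH-GEX groups from its moduli file, and the sketch below counts the entries per size. The function name moduli_sizes is hypothetical, and /etc/ssh/moduli is assumed as the default path, which may differ on some systems.

```shell
# Hypothetical helper: count DH-GEX candidate groups per prime size.
# Field 5 of each entry is the size in bits, as described in moduli(5).
moduli_sizes() {
    awk '!/^#/ && NF >= 7 { count[$5]++ }
         END { for (size in count) print size, count[size] }' "${1:-/etc/ssh/moduli}" | sort -n
}

# Usage sketch on Server C:
#   moduli_sizes
#   moduli_sizes /path/to/other/moduli
```

A file that contains only very large sizes, or that has been trimmed heavily, would be worth comparing against the copy on Servers A and B.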

First, gather more verbose debugging output from both client and server:

ssh -vvv user@serverC

On Server C, check the SSH daemon logs (location varies by OS):

sudo tail -f /var/log/auth.log
# or
sudo journalctl -u sshd -f

1. Force Specific Key Exchange Methods

Try explicitly specifying key exchange algorithms in ~/.ssh/config:

Host serverC
    KexAlgorithms diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256
    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com

2. Modify Server Configuration

On Server C, edit /etc/ssh/sshd_config:

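# Offer a modern ECDH method first and keep DH-GEX as a fallback
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
# Existing sftp subsystem line, shown for context only (do not add a second one)
Subsystem sftp /usr/lib/openssh/sftp-server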

# Then restart sshd
sudo systemctl restart sshd

3. Check System Resources

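sshd picks DH-GEX groups from /etc/ssh/moduli rather than generating them per connection, but the exchange still needs random numbers and CPU time. Check:

cat /proc/sys/kernel/random/entropy_avail  # Older kernels: should be > 1000; kernels 5.18+ always report 256
sudo apt install haveged  # Debian/Ubuntu: entropy daemon, only relevant if the value is low on an older kernel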

If the issue persists, capture network traffic for analysis:

sudo tcpdump -i eth0 -w ssh_debug.pcap port 22

Then analyze with Wireshark, looking specifically at the SSH protocol negotiation phase.

  • Standardize OpenSSH versions across servers
  • Maintain consistent sshd_config settings
  • Monitor system entropy pools in production environments
  • Consider using more modern key exchange methods like curve25519
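
Configuration drift between servers can be spotted by comparing the effective settings that sshd -T prints. The following is a sketch only: sshd_setting is a hypothetical helper, and it assumes the output of sshd -T from each server has already been saved to a file.

```shell
# Hypothetical helper: print the value of one keyword from saved `sshd -T` output.
# $1 = keyword (matched case-insensitively), $2 = file containing the dump.
sshd_setting() {
    awk -v key="$1" 'tolower($1) == tolower(key) {
        sub(/^[^ \t]+[ \t]+/, "")   # strip the keyword and the whitespace after it
        print
    }' "$2"
}

# Usage sketch (file names are assumptions):
#   sudo sshd -T > serverC.conf        # run on each server, then collect the files
#   a=$(sshd_setting kexalgorithms serverA.conf)
#   c=$(sshd_setting kexalgorithms serverC.conf)
#   [ "$a" = "$c" ] || echo "KexAlgorithms differ between A and C"
```

Comparing the effective values rather than the raw sshd_config files also catches differences that come from compiled-in defaults of different OpenSSH versions.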