Debugging SSH “Write failed: Broken pipe” After Key Authentication on Ubuntu Servers



Many sysadmins have encountered this frustrating scenario: your SSH key authentication succeeds, the server logs show a successful login, but then “Write failed: Broken pipe” appears and the connection terminates. Let's dissect this issue from several technical angles.

The TCP dump reveals critical insights about the connection flow:

19:00:41.211348 IP [server].ssh > [client]: Flags [S.], seq 4135716624, ack 3430788633
19:01:34.714519 IP [client] > [server].ssh: Flags [P.], seq 2702:3162, ack 2790 (retransmission)

Notice the gap of roughly 53 seconds between these two packets, by which point the client is already retransmitting. This suggests one of:

  • Network path MTU issues
  • Stateful firewall interference
  • TCP window sizing problems
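
To measure the gap between two capture lines rather than estimating it by eye, a small helper along these lines may be useful. This is only a sketch: the function name ts_gap is hypothetical, and it assumes both timestamps are in tcpdump's default HH:MM:SS.ffffff format and fall on the same day.

```shell
# Print the gap in seconds between two tcpdump timestamps (HH:MM:SS.ffffff).
# Assumption: both timestamps belong to the same day (no midnight rollover).
ts_gap() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    ta = x[1] * 3600 + x[2] * 60 + x[3]   # first timestamp in seconds
    tb = y[1] * 3600 + y[2] * 60 + y[3]   # second timestamp in seconds
    printf "%.3f\n", tb - ta
  }'
}
```

For the two lines above, ts_gap 19:00:41.211348 19:01:34.714519 prints 53.503.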

First, verify these critical SSH server settings in /etc/ssh/sshd_config:

# Example of crucial parameters
TCPKeepAlive yes
ClientAliveInterval 30
ClientAliveCountMax 5
LoginGraceTime 2m
AllowTcpForwarding yes
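
With these settings the server tolerates an unresponsive client for ClientAliveInterval × ClientAliveCountMax seconds before dropping the session. The sketch below computes that budget from sshd_config-style text on stdin. The function name client_alive_budget is hypothetical, and the parser is deliberately simplified: it takes the first occurrence of each keyword (as sshd does), uses the documented defaults of 0 and 3, and ignores Match blocks and the keyword=value form.

```shell
# Read sshd_config-style text on stdin and print how many seconds an
# unresponsive client is tolerated before sshd disconnects it.
# Simplification: Match blocks and "Keyword=value" syntax are not handled.
client_alive_budget() {
  awk '
    /^[[:space:]]*#/ { next }                       # skip comment lines
    tolower($1) == "clientaliveinterval" && !si { interval = $2; si = 1 }
    tolower($1) == "clientalivecountmax" && !sc { count = $2; sc = 1 }
    END {
      if (!si) interval = 0                         # sshd default: disabled
      if (!sc) count = 3                            # sshd default
      print interval * count
    }'
}
```

Feeding it the example above yields 150. On a live server, sudo sshd -T | grep -i clientalive shows the effective values, which can then be piped into the same function.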

The auth.log shows an interesting warning:

error: Could not load host key: /etc/ssh/ssh_host_ed25519_key

Generate missing host keys with:

# Host keys must not have a passphrase, hence -N ''
sudo ssh-keygen -t ed25519 -N '' -f /etc/ssh/ssh_host_ed25519_key
# On Ubuntu the service unit is named ssh
sudo systemctl restart ssh

Different networks may require MTU adjustments. Try this diagnostic sequence:

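# Find optimal MTU (reduce the payload size by 10 until ping works)
# Linux syntax; on macOS use: ping -D -s 1472 -c 3 your.server.ip
ping -M do -s 1472 -c 3 your.server.ip

# Set temporarily for testing (en0 is a macOS interface name; adjust to yours):
sudo ifconfig en0 mtu 1400

# For permanent change (macOS):
sudo networksetup -setMTU en0 1400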
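
The "reduce by 10 until ping works" step can be automated. The following is a rough sketch, not a definitive tool: the function names payload_to_mtu and probe_mtu are hypothetical, the ping flags are the Linux iputils ones, and it assumes IPv4, where the MTU equals the ICMP payload plus 28 bytes of headers.

```shell
# Convert an ICMP payload size to the IPv4 MTU it implies:
# payload + 8-byte ICMP header + 20-byte IP header.
payload_to_mtu() {
  echo $(( $1 + 28 ))
}

# Step the payload down by 10 until a don't-fragment ping succeeds,
# then print the corresponding MTU. Linux iputils ping syntax.
probe_mtu() {
  host=$1
  size=1472
  while [ "$size" -gt 0 ]; do
    if ping -M do -s "$size" -c 1 -W 2 "$host" >/dev/null 2>&1; then
      payload_to_mtu "$size"
      return 0
    fi
    size=$(( size - 10 ))
  done
  return 1   # nothing got through; likely not an MTU problem
}
```

Calling probe_mtu your.server.ip prints the largest MTU found in steps of 10, which is a reasonable starting value for the ifconfig test above.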

Create a detailed connection profile using:

ssh -vvv -o ConnectTimeout=30 -o ConnectionAttempts=3 \
    -o ServerAliveInterval=15 -o ServerAliveCountMax=3 \
    user@server.example.com

Key timeout parameters to experiment with:

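  • IPQoS (for example throughput; with two values, the first applies to interactive sessions and the second to non-interactive ones)
  • ConnectTimeout (no SSH default; the system TCP timeout applies unless set)
  • ServerAliveInterval (default 0, i.e. disabled; 15-30 recommended)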

When standard SSH fails, try these fallbacks:

# Try different cipher suites
ssh -c aes128-ctr user@host

# Use alternative authentication:
ssh -o PreferredAuthentications=keyboard-interactive user@host

# Test through jump host:
ssh -J gateway.example.com target.example.com

Check for silent packet drops using these Ubuntu commands:

# Monitor conntrack entries
sudo conntrack -E -p tcp --dport 22

# Check firewall logs
sudo journalctl -k --grep="DROP" --since "1 hour ago"

# Inspect TCP window scaling (sport matches on the server, dport on the client)
ss -itmp '( sport = :ssh or dport = :ssh )'

Recently while traveling between countries, I encountered a peculiar SSH issue where connections would fail after successful public key authentication. The debug logs showed:

debug1: Authentication succeeded (publickey).
debug2: channel 0: open confirm rwindow 0 rmax 32768
Write failed: Broken pipe

This was particularly frustrating because:

  • Authentication succeeds (visible in auth.log)
  • Works fine from other networks
  • Local datacenter SSH works between servers
  • Console login remains functional

Packet captures revealed the TCP handshake completes normally, with the break occurring during the encrypted session establishment phase. The tcpdump output shows normal SYN/SYN-ACK/ACK exchange, followed by encrypted payloads before the stall.

Key observations from the network traces:

19:00:41.760341 IP [redacted_ip].ssh > 192.168.1.2.50409: Flags [P.], seq 1490:1674, ack 22, win 114
19:01:34.714519 IP 192.168.1.2.50409 > [redacted_ip].ssh: Flags [P.], seq 2702:3162, ack 2790, win 4096

The auth.log contained one notable warning:

error: Could not load host key: /etc/ssh/ssh_host_ed25519_key

This suggests the server may be falling back to less secure key types. To regenerate all host keys:

sudo rm /etc/ssh/ssh_host_*
sudo dpkg-reconfigure openssh-server

Several client-side factors can contribute to this behavior:

  1. TCP Keepalives: Add these to ~/.ssh/config:
    Host *
        ServerAliveInterval 60
        ServerAliveCountMax 5
        TCPKeepAlive yes
  2. Cipher Selection: Force modern ciphers:
    ssh -oCiphers=chacha20-poly1305@openssh.com,aes256-gcm@openssh.com user@host
  3. MTU Issues: Try lowering MTU:
    sudo ifconfig en0 mtu 1400

Network security devices often interfere with SSH:

Device Type       | Common Issues                  | Test Command
Stateful Firewall | Aggressive connection timeouts | ssh -o ConnectTimeout=30
IDS/IPS           | SSH version filtering          | ssh -o "Protocol 2"
NAT Gateway       | TCP RST injection              | sudo tcpdump 'tcp[tcpflags] & (tcp-rst) != 0'

For persistent cases, we need deeper inspection:

# Client-side debug
strace -f -e trace=network -s 10000 -o ssh.strace ssh -vvv user@host

# Server-side monitoring
sudo journalctl -u ssh --follow --output=cat

Particularly watch for:

  • TCP retransmissions in tcpdump
  • SELinux/AppArmor denials in system logs
  • Resource exhaustion (sshd memory usage)
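
Spotting retransmissions in a long tcpdump text log can be partly automated by flagging any segment whose source, destination and sequence range have already been seen. This is a heuristic sketch only: the function name find_retransmits is hypothetical, and it assumes tcpdump's default one-line IPv4 format as shown in the traces above (field positions differ with other tcpdump options).

```shell
# Read tcpdump text output on stdin and print segments whose
# (source, destination, seq range) has already appeared once.
# Assumes the default format: TIME IP SRC > DST: Flags [..], seq A:B, ...
find_retransmits() {
  awk '
    $2 == "IP" && $8 == "seq" {
      s = $9; sub(/,$/, "", s)          # strip trailing comma from seq range
      key = $3 " " $5 " " s
      if (seen[key]++) print $1, $3, "seq", s
    }'
}
```

A typical use is sudo tcpdump -nn -r capture.pcap 'tcp port 22' | find_retransmits, where capture.pcap is a placeholder for your own capture file.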

When standard SSH fails, consider these workarounds:

# Use HTTP CONNECT proxy
ssh -o ProxyCommand="nc -X connect -x proxy:3128 %h %p" user@host

# Try mosh for unstable connections
mosh --ssh="ssh -p 22" user@host

# Web-based fallback
sudo apt install shellinabox