When SSH connections fail for a single user account while working everywhere else, the problem calls for systematic debugging. Here's what we know about this particular case:
- Successful from other users/systems using identical credentials
- Gateway authentication works, but node connection times out
- Only affects one macOS user account with previously working setup
The gateway logs show successful authentication but then fail when attempting to connect to the target node:
error: connect_to port 22: failed.
Connection closed by
The verbose SSH output reveals where things go wrong:
channel 0: open failed: connect failed: Connection timed out
stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host
The gateway's sshd_config shows TCP forwarding is specifically allowed for the gatekeeper user:
Match User gatekeeper
AllowTcpForwarding yes
AllowAgentForwarding no
X11Forwarding no
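Reading the Match block does not prove what the gateway applies at runtime, because other directives in the file can still change the result. One way to confirm is sshd's extended test mode, `sshd -T -C user=gatekeeper`, run as root on the gateway (older OpenSSH releases may also require `host=` and `addr=` in the `-C` spec). The helper below is a minimal sketch that reads that output on stdin; the function name `tcp_forwarding_allowed` is invented for illustration.

```shell
# Hypothetical helper: reads `sshd -T` output on stdin and succeeds when
# local TCP forwarding (which -J and -W rely on) is permitted.
tcp_forwarding_allowed() {
    awk '$1 == "allowtcpforwarding" { v = $2 }
         END { exit !(v == "yes" || v == "all" || v == "local") }'
}

# On the gateway, as root (not run here):
#   sshd -T -C user=gatekeeper | tcp_forwarding_allowed && echo "forwarding allowed"
```

If this reports that forwarding is allowed, the gateway configuration can be ruled out and attention can move back to the client side.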
Since this is user-specific on one machine, we should check:
- SSH Control Path Issues:
rm -rf ~/.ssh/controlmasters/*
- Local SSH Config Conflicts:
ssh -F /dev/null -v -J gatekeeper@gateway ubuntu@node
- Key Agent Problems:
ssh-add -l
ssh-add -D
ssh-add ~/.ssh/id_rsa
- User-Specific Environment Variables:
env | grep SSH
When basic checks don't reveal the issue, try these advanced methods:
# Check for firewall rules affecting the specific user
sudo pfctl -sr | grep $USER
# Compare effective SSH options between the affected user (run this as that user) and a working user
ssh -G node > affected_user_config
su otheruser -c "ssh -G node" > other_user_config
diff affected_user_config other_user_config
# Test with a minimal known-working configuration
ssh -o "ProxyCommand=ssh -W %h:%p gatekeeper@gateway" ubuntu@node
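When comparing two `ssh -G` dumps, the `user` line differs between accounts by design and can hide the differences that matter. The sketch below drops that line and sorts both dumps before diffing. The function name `diff_ssh_options` is hypothetical, and it expects two files that already hold `ssh -G` output.

```shell
# Hypothetical helper: diff two saved `ssh -G` dumps, ignoring the
# "user" line, which differs between accounts by design.
diff_ssh_options() {
    _a=$(mktemp)
    _b=$(mktemp)
    grep -v '^user ' "$1" | sort > "$_a"
    grep -v '^user ' "$2" | sort > "$_b"
    _status=0
    diff "$_a" "$_b" || _status=$?
    rm -f "$_a" "$_b"
    return "$_status"
}

# Usage (not run here), with one dump per account:
#   diff_ssh_options first_account_dump second_account_dump
```

An empty result with exit status 0 means the two accounts resolve to the same effective client options, which points away from SSH configuration as the cause.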
If the issue persists after all these checks, consider recreating the user's SSH environment:
mkdir -p ~/.ssh/backup
# mv reports an error for any listed file that doesn't exist; the others are still moved
mv ~/.ssh/{config,known_hosts,authorized_keys,id_rsa,id_rsa.pub} ~/.ssh/backup/
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""
Remember to redistribute your new public key to all systems where you need access. This should eliminate any corrupted state in your SSH configuration while preserving your existing keys and files in the backup directory.
I recently encountered a particularly puzzling SSH issue where jump host connections worked universally except for one specific user account on my macOS machine. Here's my deep dive into troubleshooting this edge case.
The setup was standard:
ssh -v -J gatekeeper@gateway ubuntu@node -i ~/.ssh/id_rsa
Key observations:
- Authentication succeeds to gateway host
- Connection times out during node forwarding
- Issue persists only for my primary user account
- Works from other accounts on same machine with identical keys
- Works from other machines entirely
The gateway logs revealed:
sshd[7739]: error: connect_to <node-ip> port 22: failed.
sshd[7739]: Connection closed by <laptop-out-ip>
Notably absent were any connection attempts logged on the target node.
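On a busy gateway these lines are easy to miss, so it can help to pull out just the failed forwarding targets. The following is a minimal sketch that assumes the log format shown above; the function name `failed_forward_targets` is made up, and the location of the sshd log on the gateway is not given in this write-up, so the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: reads sshd log lines on stdin and prints each distinct
# "<address> <port>" pair that the gateway failed to connect to.
failed_forward_targets() {
    sed -n 's/.*error: connect_to \([^ ]*\) port \([0-9]*\): failed.*/\1 \2/p' | sort -u
}

# On the gateway (not run here; the log path is a placeholder):
#   failed_forward_targets < /path/to/sshd.log
```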
1. SSH Config File Conflicts
First suspect was conflicting configurations. Checked for:
grep -r "Host gateway" ~/.ssh/config*
cat /etc/ssh/ssh_config
Found no relevant differences between working/non-working user accounts.
2. Permission and Ownership Issues
Validated file permissions with:
ls -la ~/.ssh/
stat ~/.ssh/id_rsa
All showed correct 600 permissions for the private key.
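The same check can be scripted so it is easy to repeat across accounts. This is a rough sketch with a hypothetical function name, `private_key_perms_ok`. It accepts any mode with no group or other access (600 or 400, for example), on the assumption that this is what the ssh client requires of a private key.

```shell
# Hypothetical helper: succeeds when the file grants no permissions to
# group or others (characters 5-10 of the ls mode string are all "-").
private_key_perms_ok() {
    _group_other=$(ls -ld "$1" 2>/dev/null | cut -c5-10)
    [ "$_group_other" = "------" ]
}

# Usage (not run here):
#   private_key_perms_ok ~/.ssh/id_rsa || echo "key is readable by group or others"
```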
3. Network Stack Differences
Compared network configurations:
netstat -rn | grep utun
ifconfig | grep inet
Discovered the problematic user had residual VPN routes that weren't properly cleared.
Running a packet capture revealed the root cause:
sudo tcpdump -i any -n host <node-ip> and port 22
Output showed the connection attempts were being routed through a defunct VPN interface rather than the main network interface.
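A lighter-weight way to confirm the same thing, without a packet capture, is to ask the routing table which interface it would pick for the node. The sketch below parses the output of macOS `route -n get <node-ip>`; both function names are hypothetical, and treating every `utun*` device as a VPN tunnel is an assumption that fits this case.

```shell
# Hypothetical helper: reads `route -n get <ip>` output on stdin and prints
# the interface the kernel would use for that destination.
route_interface() {
    awk '$1 == "interface:" { print $2 }'
}

# Succeeds when that interface is a utun* tunnel device.
routes_via_tunnel() {
    case "$(route_interface)" in
        utun*) return 0 ;;
        *)     return 1 ;;
    esac
}

# On the affected account (not run here):
#   route -n get <node-ip> | routes_via_tunnel && echo "node traffic goes through a tunnel"
```

Running it under both the working and the affected account shows quickly whether the two see different routes to the node.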
For my specific case, these commands resolved the issue:
sudo route -n delete <node-subnet>
sudo ifconfig utun0 down
ssh -o ProxyJump=gatekeeper@gateway ubuntu@node
Added these checks to my troubleshooting toolkit:
- Periodically flush old VPN routes
- Verify routing tables with netstat -rn
- Compare environment variables between user accounts
When debugging user-specific SSH issues:
- Never assume identical environments - check everything
- Packet captures don't lie when logs are ambiguous
- Residual network configurations often cause the weirdest issues