SSH ProxyJump Failure: Debugging User-Specific Connection Timeouts on macOS



When an SSH connection fails for one user account but works everywhere else, the debugging has to focus on that account's environment. Here's what we know about this particular case:

  • Successful from other users/systems using identical credentials
  • Gateway authentication works, but node connection times out
  • Only affects one macOS user account with previously working setup

The gateway logs show successful authentication, followed by a failure when the gateway tries to connect to the target node:

error: connect_to <node-ip> port 22: failed.
Connection closed by <laptop-out-ip>

The verbose SSH output reveals where things go wrong:

channel 0: open failed: connect failed: Connection timed out
stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host
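
Long `ssh -v` transcripts are easier to triage when the telltale messages are matched mechanically. The following is a minimal sketch, assuming the verbose output is piped in on stdin; the function name `classify_jump_failure` and the labels it prints are illustrative and not part of OpenSSH:

```shell
# Hypothetical helper: reads `ssh -v` output on stdin and prints a label
# for the stage at which the jump appears to have failed.
classify_jump_failure() {
    log=$(cat)
    case "$log" in
        *"open failed: connect failed"*)
            # The gateway accepted the session but could not open the
            # forwarded connection to the node.
            echo "gateway-to-node" ;;
        *"Permission denied"*)
            echo "authentication" ;;
        *)
            echo "unknown" ;;
    esac
}
```

Usage would look like `ssh -v -J gatekeeper@gateway ubuntu@node 2>&1 | classify_jump_failure`. For the output shown above it prints `gateway-to-node`, which matches the gateway-side log.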

The gateway's sshd_config shows that TCP forwarding is explicitly allowed for the gatekeeper user:

Match User gatekeeper
AllowTcpForwarding yes
AllowAgentForwarding no
X11Forwarding no
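
A `Match` block only helps if it is actually selected for the connecting user, so it can be worth asking sshd for the effective settings instead of reading the file. The sketch below assumes root access on the gateway. `sshd -T -C user=gatekeeper` prints the resolved configuration with lowercase keywords; depending on the sshd version and the Match criteria in use, `-C` may also need `host=` and `addr=` values. The helper name `tcp_forwarding_allowed` is made up for illustration:

```shell
# Hypothetical helper: reads `sshd -T` output on stdin and succeeds if
# the setting permits the local-style forwarding that ProxyJump relies on.
tcp_forwarding_allowed() {
    grep -Eqi '^allowtcpforwarding (yes|local)$'
}

# On the gateway (root is needed to read the host keys):
#   sudo sshd -T -C user=gatekeeper | tcp_forwarding_allowed && echo "forwarding allowed"
```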

Since this is user-specific on one machine, we should check:

  1. SSH Control Path Issues:
    rm -rf ~/.ssh/controlmasters/*
    
    
  2. Local SSH Config Conflicts:
    ssh -F /dev/null -v -J gatekeeper@gateway ubuntu@node
    
    
  3. Key Agent Problems:
    ssh-add -l
    ssh-add -D
    ssh-add ~/.ssh/id_rsa
    
    
  4. User-Specific Environment Variables:
    env | grep SSH
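
The output of `env | grep SSH` is mostly useful in comparison with a working account, so it can help to capture it in a stable, diffable form. This is a small sketch; the function name `ssh_env_snapshot` and the file names in the usage comment are placeholders:

```shell
# Hypothetical helper: print SSH-related environment variables, sorted,
# so that the output from two accounts can be compared with diff.
ssh_env_snapshot() {
    env | grep '^SSH_' | sort
}

# Usage: run once in each account, then compare the two files.
#   ssh_env_snapshot > /tmp/ssh_env.$USER
#   diff /tmp/ssh_env.failinguser /tmp/ssh_env.workinguser
```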
    
    

When basic checks don't reveal the issue, try these advanced methods:

# Check for firewall rules affecting the specific user
sudo pfctl -sr | grep $USER

# Compare effective SSH options between the failing user and a working user
ssh -G node > failing_user_config
su otheruser -c "ssh -G node" > working_user_config
diff failing_user_config working_user_config

# Test with a minimal known-working configuration
ssh -o "ProxyCommand=ssh -W %h:%p gatekeeper@gateway" ubuntu@node

If the issue persists after all these checks, consider recreating the user's SSH environment:

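# Move the existing configuration and keys aside so nothing is overwritten
mkdir -p ~/.ssh/backup
mv ~/.ssh/{config,known_hosts,authorized_keys,id_rsa,id_rsa.pub} ~/.ssh/backup/
# Generate a fresh key pair (-N "" sets an empty passphrase)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

Remember to distribute the new public key to every system you need to reach. This should eliminate any corrupted state in your SSH configuration while keeping the old configuration and keys in ~/.ssh/backup. Note that moving authorized_keys also disables key-based logins into this account until the file is restored.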


I recently encountered a particularly puzzling SSH issue: jump host connections worked everywhere except from one specific user account on my macOS machine. Here's my deep dive into troubleshooting this edge case.

The setup was standard:

ssh -v -J gatekeeper@gateway ubuntu@node -i ~/.ssh/id_rsa

Key observations:

  • Authentication succeeds to gateway host
  • Connection times out during node forwarding
  • Issue persists only for my primary user account
  • Works from other accounts on same machine with identical keys
  • Works from other machines entirely

The gateway logs revealed:

sshd[7739]: error: connect_to <node-ip> port 22: failed.
sshd[7739]: Connection closed by <laptop-out-ip>

Notably absent were any connection attempts logged on the target node.
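
The `connect_to ... failed` lines state which address the gateway tried to reach, which is worth confirming before suspecting the node itself. The sketch below extracts those targets, assuming the log format shown above; the log path in the usage comment varies by distribution and the function name is illustrative:

```shell
# Hypothetical helper: read sshd log lines on stdin and print each
# distinct "host port" pair that the gateway failed to connect to.
failed_forward_targets() {
    sed -n 's/.*error: connect_to \([^ ]*\) port \([0-9]*\): failed.*/\1 \2/p' | sort -u
}

# Usage on the gateway (log location is an assumption,
# e.g. /var/log/auth.log on Debian/Ubuntu):
#   sudo grep sshd /var/log/auth.log | failed_forward_targets
```

If the address printed is the expected one, running `nc -vz <node-ip> 22` from the gateway itself would show whether the node is reachable from there at all.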

1. SSH Config File Conflicts

The first suspect was conflicting configuration, so I checked:

grep -r "Host gateway" ~/.ssh/config*
cat /etc/ssh/ssh_config

I found no relevant differences between the working and non-working user accounts.

2. Permission and Ownership Issues

Validated file permissions with:

ls -la ~/.ssh/
stat ~/.ssh/id_rsa

The private key had the correct 600 permissions.

3. Network Stack Differences

Compared network configurations:

netstat -rn | grep utun
ifconfig | grep inet

I discovered that the problematic account had residual VPN routes that had not been cleared properly.
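
On macOS, `route -n get <address>` reports which interface the kernel would choose for a destination, which is quicker than scanning the whole table. The sketch below pulls out just the interface name, assuming the usual `interface:` line in that output; `route_interface` is a made-up helper name:

```shell
# Hypothetical helper: read `route -n get <address>` output on stdin and
# print the interface the route resolves to.
route_interface() {
    awk '$1 == "interface:" { print $2 }'
}

# Usage (substitute the real address for <node-ip>):
#   route -n get <node-ip> | route_interface
```

A result such as `utun0` while no VPN is connected would point at a leftover route.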

Running a packet capture revealed the root cause:

sudo tcpdump -i any -n host <node-ip> and port 22

Output showed the connection attempts were being routed through a defunct VPN interface rather than the main network interface.

For my specific case, these commands resolved the issue:

sudo route -n delete <node-subnet>
sudo ifconfig utun0 down
ssh -o ProxyJump=gatekeeper@gateway ubuntu@node

I have since added these checks to my troubleshooting toolkit:

  • Periodically flush old VPN routes
  • Verify routing tables with netstat -rn
  • Compare environment variables between user accounts
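
The first two checks can be combined into a quick listing of routes still bound to tunnel interfaces. This is a sketch, assuming macOS `netstat -rn` output in which the interface name appears as its own column; `tunnel_routes` is an illustrative name, and deciding whether a listed route is actually stale still takes human judgment:

```shell
# Hypothetical helper: read `netstat -rn` output on stdin and print
# "destination interface" for every route bound to a utun interface.
tunnel_routes() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^utun[0-9]+$/) { print $1, $i; break }
    }'
}

# Usage:
#   netstat -rn | tunnel_routes
```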

When debugging user-specific SSH issues:

  • Never assume identical environments - check everything
  • Packet captures don't lie when logs are ambiguous
  • Residual network configurations often cause the weirdest issues