Debugging Persistent SSH Remote Port Forwarding Failures: Address Already in Use and Connection Stability


After months of stable operation, your SSH reverse tunnel suddenly starts failing with "Address already in use" errors. On the server, /var/log/secure shows:

bind: Address already in use
error: bind: Address already in use
error: channel_setup_fwd_listener: cannot listen to port: X

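This means sshd on the server is not releasing the forwarded port when a client connection drops, so the next bind for that port fails. On CentOS 6.5 with OpenSSH 5.3, the problem manifests through:

  • Forwarded ports still in LISTEN state after the client is gone (sockets in TIME_WAIT are closed connections and are usually not what blocks the bind)
  • sshd failing to garbage-collect dead connections
  • Requiring server reboots to clear stuck ports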
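To check which process, if any, still owns a forwarded port on the server, you can ask lsof for listeners only. This is a minimal sketch rather than part of the original setup: the function names listener_pids and report_port are hypothetical, and it assumes lsof is installed and that you run it as root so other users' processes are visible.

```shell
#!/bin/bash
# Hypothetical helper: print the PID(s) listening on a TCP port, one per line.
# Prints nothing when no process is listening.
listener_pids() {
  local port="$1"
  # -nP: skip DNS and service-name lookups; -t: print bare PIDs only;
  # -sTCP:LISTEN: ignore established and closing connections.
  lsof -nP -t -iTCP:"${port}" -sTCP:LISTEN 2>/dev/null
}

# Report whether a port is free or still held, without killing anything.
report_port() {
  local port="$1" pids
  pids=$(listener_pids "${port}" || true)
  if [ -z "${pids}" ]; then
    echo "port ${port}: free"
  else
    # Join multiple PIDs onto one line.
    echo "port ${port}: held by PID ${pids//$'\n'/ }"
  fi
}
```

Calling report_port with one of your forwarded ports after a failed reconnect tells you whether a leftover process is the thing to deal with, before reaching for a reboot.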

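OpenSSH 5.3 is a legacy release, and when a client disappears abruptly (network drop, NAT timeout) its server-side session can linger until the dead peer is detected. Kernel settings often suggested for faster socket turnover are:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30

Neither is the default on CentOS 6.5 (tcp_tw_reuse defaults to 0 and tcp_fin_timeout to 60), and both act on closed or closing connections, not on a listener that a live process still holds. The rapid disconnection sequence exposes three underlying problems: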

  1. No socket reuse policy
  2. Missing dead connection cleanup
  3. No forced port release mechanism

Here's the complete arsenal to combat this issue:

1. The Nuclear Option (Kill Script)

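Create /usr/local/bin/clean_ssh_ports on the server (B), where the stuck listeners live:

#!/bin/bash
# Replace X and Y with your forwarded ports.
for port in X Y; do
  # Kill whichever process holds this TCP port (normally the stale per-session sshd).
  fuser -k -n tcp "$port"
done
# Do not kill everything matching "sshd": that also takes down the main
# daemon and your own login session.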

2. SSH Client Configuration

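Modify your tunnel command with these options. Quote the forwarding spec so the shell never tries to expand the *, and note that binding to * only works if the server's sshd_config sets GatewayPorts to yes or clientspecified:

ssh -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=60 \
    -o ServerAliveCountMax=3 \
    -R '*:X:localhost:X' user@B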

3. System-Level Tweaks

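Add to /etc/sysctl.conf (these shorten the life of closed connections and speed up dead-peer detection; they do not free a port that a stale sshd is still listening on):

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300

Avoid net.ipv4.tcp_tw_recycle = 1, which is often listed alongside these: it breaks connections from clients behind NAT and was removed in Linux 4.12.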

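Combine everything in a robust tunnel script on machine A (/usr/local/bin/secure_tunnel):

#!/bin/bash
while true; do
  # Stale listeners live on the server, so run the cleanup there.
  ssh user@B /usr/local/bin/clean_ssh_ports
  ssh -N \
      -o "ExitOnForwardFailure=yes" \
      -o "ServerAliveInterval=30" \
      -o "ServerAliveCountMax=3" \
      -R '*:X:localhost:X' \
      -R '*:Y:localhost:Y' \
      user@B
  # ssh returns whenever the tunnel drops or fails to start; pause before retrying.
  echo "Tunnel exited with status $?, retrying in 10 seconds..."
  sleep 10
done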
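A fixed 10-second retry can hammer the server when it is down for a long time. One possible refinement is a capped exponential backoff around the same ssh invocation. The sketch below is illustrative only: next_delay and supervise are hypothetical names, and the 5-second start, 300-second cap, and 60-second "healthy run" threshold are assumptions you would tune.

```shell
#!/bin/bash
# Hypothetical helper: double the delay after each failure, capped at a maximum.
next_delay() {
  local current="$1" max="${2:-300}"
  local doubled=$(( current * 2 ))
  if [ "${doubled}" -gt "${max}" ]; then
    echo "${max}"
  else
    echo "${doubled}"
  fi
}

# Run the given command forever. If a run lasted at least 60 seconds it is
# treated as a healthy tunnel and the delay resets; otherwise the delay grows.
supervise() {
  local delay=5 started
  while true; do
    started=$SECONDS
    "$@"
    if [ $(( SECONDS - started )) -ge 60 ]; then
      delay=5
    else
      delay=$(next_delay "${delay}")
    fi
    sleep "${delay}"
  done
}
```

You would call supervise with the full ssh -N command and its options as arguments, for example from an init script or a screen session on machine A.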

Verify the listening sockets on the server (run as root so the owning process is shown):

watch -n 1 'netstat -tulnp | grep ssh'

Check established connections:

ss -o state established '( sport = :X or sport = :Y )'

After months of stable operation, my SSH reverse tunnel setup began failing with "bind: Address already in use" errors. The setup involves:

# From machine A behind firewall
ssh -R '*:2222:localhost:22' user@vps.example.com
ssh -R '*:3389:localhost:3389' user@vps.example.com

Several symptoms emerged:

  • Sudden failure after 3 months of stability
  • Multiple rapid failures within 24 hours
  • Ports remain bound after disconnect
  • Only VPS reboot resolves the issue

Examining /var/log/secure revealed critical clues:

bind: Address already in use
error: bind: Address already in use
error: channel_setup_fwd_listener: cannot listen to port: 2222
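To see which forwarded ports are affected and how often, you can pull the port numbers out of these log lines. The following is a small sketch that assumes the message format shown above and GNU grep; the function name failed_forward_ports is hypothetical.

```shell
#!/bin/bash
# Print each port that sshd reported it could not listen on, followed by the
# number of times it was reported. Reads the file given as $1, or stdin if
# no file is given.
failed_forward_ports() {
  grep -o 'cannot listen to port: [0-9]*' "${1:--}" \
    | awk '{ count[$NF]++ } END { for (p in count) print p, count[p] }' \
    | sort -n
}
```

On the VPS this would typically be run as failed_forward_ports /var/log/secure, which needs root to read the log.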

Key observations about the environment:

  • OpenSSH_5.3p1 on CentOS 6.5
  • Single-purpose VPS with minimal configuration
  • No recent configuration changes
  • Tunnels now last only about 4 hours before failing

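The likely root cause is that sshd on the VPS does not notice the dead client, so the old session process keeps the forwarded port in LISTEN state and the new bind fails. TIME_WAIT sockets are often blamed, but they are closed connections, not listeners. Here are effective countermeasures: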

Solution 1: Kernel-Level Socket Recycling

# Add to /etc/sysctl.conf
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
# net.ipv4.tcp_tw_recycle = 1   # not recommended: breaks clients behind NAT, removed in Linux 4.12

# Apply changes
sysctl -p
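Before and after editing /etc/sysctl.conf it helps to see what the running kernel actually has. This sketch reads the values straight from /proc/sys; the function names are hypothetical, and a key that does not exist on your kernel is reported rather than treated as an error.

```shell
#!/bin/bash
# Map a sysctl key such as net.ipv4.tcp_fin_timeout to its /proc/sys path.
sysctl_path() {
  echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Print "key = value" for each key, or note keys this kernel does not have.
show_sysctls() {
  local key path
  for key in "$@"; do
    path=$(sysctl_path "${key}")
    if [ -r "${path}" ]; then
      echo "${key} = $(cat "${path}")"
    else
      echo "${key}: not available on this kernel"
    fi
  done
}

show_sysctls net.ipv4.tcp_tw_reuse net.ipv4.tcp_fin_timeout net.ipv4.tcp_keepalive_time
```

The same values can be read with sysctl -n; reading /proc/sys directly just avoids depending on the sysctl binary being in PATH.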

Solution 2: SSH Client Hardening

Modify your SSH command with these critical parameters:

ssh -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=60 \
    -o ServerAliveCountMax=3 \
    -R '*:2222:localhost:22' \
    user@vps.example.com
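With several forwarded ports the command line gets long and easy to mistype. One way to keep it manageable is to assemble it in a bash array. This is a sketch under assumptions: build_tunnel_cmd and the TUNNEL_CMD array are hypothetical names, and each forward is given as remote_port:local_port.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array TUNNEL_CMD with an ssh command
# that forwards each "remote_port:local_port" pair passed after the host.
build_tunnel_cmd() {
  local host="$1"; shift
  TUNNEL_CMD=(ssh -N
    -o ExitOnForwardFailure=yes
    -o ServerAliveInterval=60
    -o ServerAliveCountMax=3)
  local pair
  for pair in "$@"; do
    # ${pair%%:*} is the remote port, ${pair##*:} the local port.
    TUNNEL_CMD+=(-R "*:${pair%%:*}:localhost:${pair##*:}")
  done
  TUNNEL_CMD+=("${host}")
}

build_tunnel_cmd user@vps.example.com 2222:22 3389:3389
# Print the command, shell-quoted, instead of running it.
printf '%q ' "${TUNNEL_CMD[@]}"; echo
```

To actually open the tunnel, run "${TUNNEL_CMD[@]}" in place of the printf line. Keeping the arguments in an array means the * never needs manual quoting. With these settings the client gives up on an unresponsive server after roughly ServerAliveInterval times ServerAliveCountMax, that is 180 seconds.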

Solution 3: The Nuclear Option - Automatic Cleanup

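Create a kill script to run before reconnection attempts. The port loop matters on the VPS, where the stale listeners live; the final pkill clears leftover tunnel clients on machine A:

#!/bin/bash
PORTS=(2222 3389) # Add all your forwarded ports

for port in "${PORTS[@]}"; do
  # Kill whatever still holds this TCP port
  fuser -k -n tcp "${port}" >/dev/null 2>&1
  # Listeners only; -r skips kill entirely when lsof finds nothing
  lsof -t -iTCP:"${port}" -sTCP:LISTEN | xargs -r kill -9 >/dev/null 2>&1
done

# Kill leftover tunnel clients; pkill copes with zero or many matches
pkill -9 -f "ssh.*-R"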

On your VPS (machine B), modify /etc/ssh/sshd_config:

ClientAliveInterval 30
ClientAliveCountMax 3
TCPKeepAlive yes

Restart sshd after changes:

service sshd restart
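With these settings sshd should drop an unresponsive client after about ClientAliveInterval times ClientAliveCountMax seconds. The helper below computes that figure from a config file so you can sanity-check your values. It is a sketch: client_alive_timeout is a hypothetical name, it ignores Match blocks and Include files, and it falls back to the OpenSSH defaults (interval 0, count 3) when a keyword is absent.

```shell
#!/bin/bash
# Hypothetical helper: print how many seconds an unresponsive client survives
# before sshd disconnects it (ClientAliveInterval * ClientAliveCountMax).
# A result of 0 means client-alive probing is disabled.
client_alive_timeout() {
  awk '
    # sshd uses the first value it sees for a keyword, so do the same here.
    tolower($1) == "clientaliveinterval" && interval == "" { interval = $2 }
    tolower($1) == "clientalivecountmax" && count == ""    { count = $2 }
    END {
      if (interval == "") interval = 0   # OpenSSH default: probes disabled
      if (count == "")    count = 3      # OpenSSH default
      print interval * count
    }
  ' "${1:-/etc/ssh/sshd_config}"
}
```

Running sshd -t before the restart checks the file for syntax errors, which is worth doing on a remote box where a broken config could lock you out.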

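Implement a watchdog script on machine A. Keep in mind that a stale listener on the VPS still accepts connections, so an open port alone does not prove the tunnel is healthy:

#!/bin/bash
TARGET_HOST="vps.example.com"
CHECK_PORT=2222

while true; do
  # -w 5: give up after 5 seconds instead of hanging on a dead route
  if ! nc -z -w 5 "${TARGET_HOST}" "${CHECK_PORT}"; then
    ./kill_ports.sh # Your cleanup script
    ssh -Nf -o ExitOnForwardFailure=yes \
        -R '*:2222:localhost:22' "user@${TARGET_HOST}"
  fi
  sleep 300
done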
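Because port 2222 forwards to an SSH server on machine A, a stricter health check is to connect and wait for the SSH identification string, which only arrives if traffic really makes it through the tunnel. The sketch below uses bash's /dev/tcp redirection and the coreutils timeout command; tunnel_alive and is_ssh_banner are hypothetical names, and the 5-second limit is an assumption.

```shell
#!/bin/bash
# True when the given line looks like an SSH identification string.
is_ssh_banner() {
  [[ "$1" == SSH-* ]]
}

# Hypothetical health check: connect to host:port and require an SSH banner
# within 5 seconds. A stale listener accepts the connection but sends nothing,
# so it fails this check even though a plain port probe would succeed.
tunnel_alive() {
  local host="$1" port="$2" banner
  # Inside the child shell, $0 is the host and $1 is the port.
  banner=$(timeout 5 bash -c \
    'exec 3<>"/dev/tcp/$0/$1" && IFS= read -r line <&3 && printf "%s" "$line"' \
    "${host}" "${port}" 2>/dev/null || true)
  is_ssh_banner "${banner}"
}
```

In the watchdog, tunnel_alive "${TARGET_HOST}" "${CHECK_PORT}" could replace the nc -z test. This only applies to forwards that end at an SSH server; the 3389 forward would need a different probe.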

For production environments, consider these additional measures:

  • Upgrade to OpenSSH 8.0+ for better resource handling
  • Implement systemd socket activation for SSH (only on systemd-based distributions, which excludes CentOS 6)
  • Use autossh for automatic reconnection