How to Implement SSH Server Availability Check in Bash Scripts for Automated VM Deployment



When automating virtual machine provisioning, we often need to wait for the SSH service to become available before executing remote commands. The naive approach of immediately attempting SSH connections leads to script failures during the VM's boot sequence.

Here are several robust methods to check SSH availability in bash scripts:

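# Method 1: Using nc (netcat) with a 5-second connect timeout
until nc -z -w5 "$VM_IP" 22; do
    echo "Waiting for SSH..."
    sleep 5
done

# Method 2: Using ssh-keyscan with an explicit 5-second timeout (-T 5)
# -t rsa assumes the server offers an RSA host key; drop it to accept any key type
until ssh-keyscan -T 5 -t rsa "$VM_IP" &>/dev/null; do
    echo "SSH not ready yet..."
    sleep 3
done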

Here's a production-ready bash function with error handling:

wait_for_ssh() {
    local ip=$1
    local max_retries=30
    local retry_interval=10
    
    for ((i=1; i<=max_retries; i++)); do
        if ssh-keyscan -T 5 -t rsa "$ip" &>/dev/null; then
            echo "SSH connection established to $ip"
            return 0
        fi
        echo "Attempt $i/$max_retries: SSH not ready. Retrying in $retry_interval seconds..."
        sleep "$retry_interval"
    done
    
    # Each attempt can itself take up to 5 seconds, so report attempts rather than an exact duration
    echo "Error: SSH on $ip not reachable after $max_retries attempts" >&2
    return 1
}

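To verify that a full SSH login succeeds, rather than only that the port answers, attempt a real connection. The options below disable host key verification, which is convenient for freshly provisioned VMs but should be replaced with pinned host keys in environments with strict security requirements:

# Attempt a real SSH login with a connect timeout (host key checking disabled)
wait_for_secure_ssh() {
    local ip=$1
    local ssh_opts=(
        # BatchMode makes ssh fail instead of prompting for a password
        -o "BatchMode=yes"
        -o "ConnectTimeout=5"
        -o "StrictHostKeyChecking=no"
        -o "UserKnownHostsFile=/dev/null"
        -o "LogLevel=ERROR"
    )

    until ssh "${ssh_opts[@]}" "$ip" exit &>/dev/null; do
        sleep 5
    done
}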

Example using AWS EC2 with the AWS CLI:

# AWS EC2 provisioning workflow
instance_id=$(aws ec2 run-instances ...)
aws ec2 wait instance-running --instance-ids "$instance_id"
vm_ip=$(aws ec2 describe-instances --instance-ids "$instance_id" --query ...)

wait_for_ssh "$vm_ip" || exit 1
ssh "$vm_ip" "sudo yum update -y"

Best practices:

  • Always implement timeout mechanisms to prevent infinite loops
  • Consider network security groups/firewall rules that might block connections
  • For Windows-based targets, use different approaches like Test-NetConnection in PowerShell
  • Log connection attempts for debugging provisioning issues
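
The first and last points can be combined in a small wrapper. The following is a minimal sketch, not a definitive implementation: retry_with_log is a hypothetical helper name, and the log format is an assumption. It runs any check command a bounded number of times and writes a timestamped line to stderr for every attempt.

```shell
#!/usr/bin/env bash
# retry_with_log is a hypothetical helper, not part of any standard tool.
# Usage: retry_with_log MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
retry_with_log() {
    local max_attempts=$1 interval=$2
    shift 2
    local attempt status

    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        # Run the check command exactly as given by the caller
        if "$@"; then
            status="succeeded"
        else
            status="failed"
        fi

        # One timestamped log line per attempt, sent to stderr
        printf '%s attempt %d/%d %s: %s\n' \
            "$(date '+%Y-%m-%dT%H:%M:%S')" "$attempt" "$max_attempts" "$status" "$*" >&2

        if [ "$status" = "succeeded" ]; then
            return 0
        fi

        # Do not sleep after the final attempt
        if ((attempt < max_attempts)); then
            sleep "$interval"
        fi
    done

    return 1
}
```

For example, retry_with_log 30 10 nc -z -w5 "$vm_ip" 22 gives up after 30 attempts and leaves one log line per attempt, which can be redirected to a file with 2>> provision.log.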

The underlying timing problem is that the VM's SSH service isn't available immediately after the machine boots, so simply running ssh right away can fail:

# This will fail if the VM isn't ready yet
ssh user@192.168.2.38

1. Using ssh-keyscan in a Loop

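A reliable method checks SSH service readiness by scanning the host's keys. The scan only returns keys once sshd is answering the SSH protocol, not merely when the port is open. Note that appending the scanned keys to known_hosts trusts whichever host answers first: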

until ssh-keyscan 192.168.2.38 >> ~/.ssh/known_hosts 2>/dev/null; do
    sleep 5
    echo "Waiting for SSH at 192.168.2.38..."
done
ssh user@192.168.2.38
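
One caveat with the loop above: as far as I know, older OpenSSH releases could exit with status 0 from ssh-keyscan even when no keys were retrieved, which would end the loop too early. A more defensive sketch checks that the scan actually produced output. The helper name ssh_keys_available is hypothetical.

```shell
#!/usr/bin/env bash
# ssh_keys_available is a hypothetical helper name.
# Returns 0 only if ssh-keyscan exits cleanly AND prints at least one key.
ssh_keys_available() {
    local host=$1 port=${2:-22}
    local keys

    # Keep stdout (the keys) and discard stderr (banner and error lines)
    keys=$(ssh-keyscan -T 5 -p "$port" "$host" 2>/dev/null) || return 1

    # Empty output means no host key was retrieved
    [ -n "$keys" ]
}
```

It can replace the bare ssh-keyscan call in the loop condition, for example until ssh_keys_available 192.168.2.38; do sleep 5; done.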

2. TCP Port Check with nc (netcat)

A lightweight alternative that verifies port 22 availability:

while ! nc -z -w5 192.168.2.38 22; do
    sleep 5
    echo "Waiting for port 22..."
done
ssh user@192.168.2.38
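
When nc isn't installed on the provisioning host, the same TCP check can be sketched in bash alone. This assumes a bash build with /dev/tcp redirection enabled and the coreutils timeout command on the PATH. The helper name port_open is hypothetical.

```shell
#!/usr/bin/env bash
# port_open is a hypothetical helper name.
# Usage: port_open HOST PORT [WAIT_SECONDS]
port_open() {
    local host=$1 port=$2 wait_secs=${3:-5}

    # bash treats /dev/tcp/HOST/PORT as a request to open a TCP connection.
    # The child shell exits non-zero if the connection is refused, and
    # timeout kills it if the connection attempt hangs.
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage mirrors the nc loop: until port_open 192.168.2.38 22; do sleep 5; done. Like nc -z, this only shows that something accepts connections on the port, not that sshd is ready to authenticate.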

3. Using nmap for Advanced Checks

For more complex scenarios where you need detailed port information:

until nmap -p22 192.168.2.38 | grep -q "22/tcp open"; do
    sleep 10
    echo "Checking SSH port status..."
done
ssh user@192.168.2.38

For frequent use in scripts, create this bash function:

waitforssh() {
    local host=$1
    local user=${2:-$USER}
    local timeout=${3:-300}   # total seconds to wait
    local interval=${4:-5}    # seconds between attempts
    local start_time current_time elapsed

    start_time=$(date +%s)

    echo "Waiting for SSH on $host..."
    # Once the scan succeeds, the host keys are appended to known_hosts
    while ! ssh-keyscan "$host" >> ~/.ssh/known_hosts 2>/dev/null; do
        sleep "$interval"
        current_time=$(date +%s)
        elapsed=$((current_time - start_time))

        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timeout waiting for SSH after $timeout seconds" >&2
            return 1
        fi
        echo "Elapsed: ${elapsed}s - Retrying..."
    done

    echo "SSH is available on $host"
    # Open an interactive session once the service is reachable
    ssh "$user@$host"
}

# Usage:
waitforssh 192.168.2.38 myuser

For production scripts, consider these additional scenarios:

  • Specifying a custom SSH port
  • Handling host key verification failures
  • Custom timeout values
  • Parallel execution with multiple VMs

Here's an enhanced version supporting custom ports:

waitforssh_port() {
    local host=$1
    local port=${2:-22}
    local user=${3:-$USER}
    local timeout=${4:-300}
    local start_time current_time elapsed

    start_time=$(date +%s)

    echo "Waiting for SSH on $host:$port..."
    # BatchMode makes ssh fail instead of prompting for a password,
    # so the loop cannot hang on an interactive prompt
    until ssh -p "$port" -o BatchMode=yes -o ConnectTimeout=5 \
              -o StrictHostKeyChecking=no "$user@$host" exit >/dev/null 2>&1; do
        sleep 5
        current_time=$(date +%s)
        elapsed=$((current_time - start_time))

        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timeout waiting for SSH after $timeout seconds" >&2
            return 1
        fi
    done

    echo "SSH is available on $host:$port"
    ssh -p "$port" "$user@$host"
}
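
Parallel execution with multiple VMs, listed among the scenarios above, can be sketched with background jobs. wait_for_all is a hypothetical helper name. Its first argument is the name of a per-host check function that returns 0 on success, and the remaining arguments are hosts.

```shell
#!/usr/bin/env bash
# wait_for_all is a hypothetical helper name.
# Usage: wait_for_all CHECK_FUNCTION HOST [HOST...]
wait_for_all() {
    local check=$1
    shift
    local host pid failed=0
    local -a pids=()

    # Start one background check per host and remember its PID
    for host in "$@"; do
        "$check" "$host" &
        pids+=("$!")
    done

    # Collect every result; a single failure marks the whole batch as failed
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done

    return "$failed"
}
```

A non-interactive check such as wait_for_ssh fits here, for example wait_for_all wait_for_ssh 192.168.2.38 192.168.2.39 || exit 1, where the second address is only an illustration. Functions that end by opening an interactive session, such as waitforssh, are not suitable as the check.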