When automating virtual machine provisioning, we often need to wait for the SSH service to become available before executing remote commands. The naive approach of immediately attempting SSH connections leads to script failures during the VM's boot sequence.
Here are several robust methods to check SSH availability in bash scripts:
# Method 1: Using nc (netcat) with a 5-second timeout
until nc -z -w5 "$VM_IP" 22; do
    echo "Waiting for SSH..."
    sleep 5
done
# Method 2: Using ssh-keyscan with a 5-second timeout (-T)
until ssh-keyscan -T 5 -t rsa "$VM_IP" &>/dev/null; do
    echo "SSH not ready yet..."
    sleep 3
done
Here's a production-ready bash function with error handling:
wait_for_ssh() {
    local ip=$1
    local max_retries=30
    local retry_interval=10
    local i
    for ((i=1; i<=max_retries; i++)); do
        # -T 5 caps each scan at 5 seconds
        if ssh-keyscan -T 5 -t rsa "$ip" &>/dev/null; then
            echo "SSH is reachable on $ip"
            return 0
        fi
        echo "Attempt $i/$max_retries: SSH not ready. Retrying in $retry_interval seconds..."
        sleep "$retry_interval"
    done
    echo "Error: SSH not reachable on $ip after $max_retries attempts" >&2
    return 1
}
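For throwaway VMs whose host keys are not known in advance, you can test a full SSH login while skipping host key verification. This trades security for convenience, so avoid it where host identity matters:
# Full SSH login check with host key verification disabled and a bounded number of attempts
wait_for_secure_ssh() {
    local ip=$1
    local max_retries=${2:-30}
    local i
    local ssh_opts=(
        -o "ConnectTimeout=5"
        -o "BatchMode=yes"                  # fail instead of prompting for a password
        -o "StrictHostKeyChecking=no"       # accept any host key (insecure)
        -o "UserKnownHostsFile=/dev/null"   # do not record the key
        -o "LogLevel=ERROR"
    )
    for ((i=1; i<=max_retries; i++)); do
        ssh "${ssh_opts[@]}" "$ip" exit &>/dev/null && return 0
        sleep 5
    done
    return 1
}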
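The function above turns off host key verification. Where host identity matters, a stricter variant could pin the expected key instead. The following is only a sketch under stated assumptions: `wait_for_pinned_ssh` and its parameters are hypothetical names, the VM's host key is assumed to be already stored in a dedicated known_hosts file (obtained out of band at provisioning time), and key-based login is assumed to work.

```shell
# Hypothetical helper: retry a non-interactive SSH login that only succeeds
# if the server presents a host key listed in the given known_hosts file.
# Usage: wait_for_pinned_ssh USER@HOST KNOWN_HOSTS_FILE [MAX_TRIES] [DELAY_SECS]
wait_for_pinned_ssh() {
    local target=$1 known_hosts=$2 max_tries=${3:-30} delay=${4:-5} i
    for ((i = 1; i <= max_tries; i++)); do
        if ssh -o BatchMode=yes \
               -o ConnectTimeout=5 \
               -o StrictHostKeyChecking=yes \
               -o UserKnownHostsFile="$known_hosts" \
               "$target" true >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}
```

With `StrictHostKeyChecking=yes`, an unknown or changed host key makes every attempt fail, so the function returns 1 rather than connecting to an unverified machine.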
Example using AWS EC2, looking up the instance address with describe-instances:
# AWS EC2 provisioning workflow
instance_id=$(aws ec2 run-instances ...)
aws ec2 wait instance-running --instance-ids "$instance_id"
vm_ip=$(aws ec2 describe-instances --instance-ids "$instance_id" --query ...)
wait_for_ssh "$vm_ip" || exit 1
ssh "$vm_ip" "sudo yum update -y"
- Always implement timeout mechanisms to prevent infinite loops
- Consider network security groups/firewall rules that might block connections
- For Windows-based targets, use different approaches like Test-NetConnection in PowerShell
- Log connection attempts for debugging provisioning issues
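As a rough illustration of the timeout and logging points above, a generic retry wrapper could record every attempt with a timestamp. This is a sketch, not a definitive implementation: `retry_logged`, its argument order, and the log line format are all made up for this example.

```shell
# Hypothetical helper: run a probe command up to MAX_TRIES times and append
# one timestamped line per attempt to LOGFILE.
# Usage: retry_logged MAX_TRIES DELAY_SECS LOGFILE COMMAND [ARGS...]
retry_logged() {
    local max_tries=$1 delay=$2 logfile=$3 attempt rc
    shift 3
    for ((attempt = 1; attempt <= max_tries; attempt++)); do
        "$@" >/dev/null 2>&1
        rc=$?
        printf '%s attempt=%d/%d rc=%d cmd=%s\n' \
            "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            "$attempt" "$max_tries" "$rc" "$*" >>"$logfile"
        ((rc == 0)) && return 0
        # Do not sleep after the final failed attempt
        ((attempt < max_tries)) && sleep "$delay"
    done
    return 1
}
```

For example, `retry_logged 30 10 /tmp/provision.log nc -z -w5 "$vm_ip" 22` bounds the wait at 30 attempts and leaves a trail that shows when the port first answered.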
When automating virtual machine provisioning and connection in shell scripts, we often face a timing problem: the VM's SSH service isn't immediately available after the machine boots. The naive approach of simply running ssh can fail because the VM isn't ready yet:
# This will fail if the VM isn't ready yet
ssh user@192.168.2.38
1. Using ssh-keyscan in a Loop
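This method checks that the SSH daemon itself is answering, not just that the port is open, by scanning the host's keys. Note that ssh-keyscan in some older OpenSSH releases exits with status 0 even when no key was retrieved, so verify the behavior on your system before relying on the exit code: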
until ssh-keyscan 192.168.2.38 >> ~/.ssh/known_hosts 2>/dev/null; do
    sleep 5
    echo "Waiting for SSH at 192.168.2.38..."
done
ssh user@192.168.2.38
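One side effect of appending scan results to known_hosts is that reprovisioned VMs accumulate stale or duplicate entries. A possible way to handle that is sketched below; `refresh_known_host` is a hypothetical name, and the sketch assumes the known_hosts directory already exists. Keep in mind that trusting a freshly scanned key verifies nothing about the host's identity.

```shell
# Hypothetical helper: scan a host's keys and replace any existing
# known_hosts entries for it. Returns 1 if no key could be retrieved.
# Usage: refresh_known_host HOST [KNOWN_HOSTS_FILE]
refresh_known_host() {
    local host=$1 known_hosts=${2:-$HOME/.ssh/known_hosts} keys
    keys=$(ssh-keyscan -T 5 "$host" 2>/dev/null)
    # Check the output rather than the exit status
    [[ -n $keys ]] || return 1
    # Drop old entries for this host (fails harmlessly if the file is missing)
    ssh-keygen -R "$host" -f "$known_hosts" >/dev/null 2>&1
    printf '%s\n' "$keys" >>"$known_hosts"
}
```

Because it tests for non-empty output, the function can also serve as the loop condition: `until refresh_known_host 192.168.2.38; do sleep 5; done`.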
2. TCP Port Check with nc (netcat)
A lightweight alternative that verifies port 22 availability:
while ! nc -z -w5 192.168.2.38 22; do
    sleep 5
    echo "Waiting for port 22..."
done
ssh user@192.168.2.38
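An open port only shows that something accepted the TCP connection. If nc is not installed, or you want to confirm that an SSH server is actually answering, one option is to read the server's identification line through bash's /dev/tcp pseudo-device. This is a sketch with hypothetical function names; it assumes bash, the coreutils `timeout` command, and a server that sends its `SSH-` identification string as the first line.

```shell
# Succeeds if the given line looks like an SSH identification string,
# e.g. "SSH-2.0-OpenSSH_9.6".
is_ssh_banner() {
    [[ $1 == SSH-* ]]
}

# Hypothetical helper: connect to HOST:PORT and check the first line sent
# by the server, giving up after WAIT_SECS seconds.
# Usage: ssh_banner_ready HOST [PORT] [WAIT_SECS]
ssh_banner_ready() {
    local host=$1 port=${2:-22} wait_secs=${3:-5} banner
    # /dev/tcp/HOST/PORT is handled by bash itself; it is not a real file
    banner=$(timeout "$wait_secs" bash -c \
        'exec 3<>"/dev/tcp/$0/$1" && IFS= read -r -u 3 line && printf "%s" "$line"' \
        "$host" "$port" 2>/dev/null) || return 1
    is_ssh_banner "$banner"
}
```

It can replace the nc call in the loop above: `until ssh_banner_ready 192.168.2.38 22; do sleep 5; done`.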
3. Using nmap for Advanced Checks
For more complex scenarios where you need detailed port information:
until nmap -p22 192.168.2.38 | grep -q "22/tcp open"; do
    sleep 10
    echo "Checking SSH port status..."
done
ssh user@192.168.2.38
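Matching nmap's human-readable table depends on its column spacing. A slightly more script-friendly variant could parse the grepable output produced by `-oG -` instead. The helper name below is hypothetical, and the sketch assumes the grepable format lists ports as `22/open/tcp//ssh///`.

```shell
# Hypothetical helper: read nmap grepable output on stdin and succeed if
# the given TCP port is reported as open.
# Usage: nmap -p22 -oG - HOST | port_open_in_grepable 22
port_open_in_grepable() {
    local port=$1
    # Require whitespace (or line start) before the port so that 22 does
    # not match 2222
    grep -Eq "(^|[[:space:]])${port}/open/tcp/"
}
```

The loop condition would then read `until nmap -p22 -oG - 192.168.2.38 | port_open_in_grepable 22; do ...`.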
For frequent use in scripts, create this bash function:
waitforssh() {
    local host=$1
    local user=${2:-$USER}
    local timeout=${3:-300}
    local interval=${4:-5}
    local start_time current_time elapsed
    start_time=$(date +%s)
    echo "Waiting for SSH on $host..."
    # -T 5 caps each scan at 5 seconds
    while ! ssh-keyscan -T 5 "$host" >> ~/.ssh/known_hosts 2>/dev/null; do
        sleep "$interval"
        current_time=$(date +%s)
        elapsed=$((current_time - start_time))
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timeout waiting for SSH after $timeout seconds" >&2
            return 1
        fi
        echo "Elapsed: ${elapsed}s - Retrying in ${interval}s..."
    done
    echo "SSH is available on $host"
    ssh "$user@$host"
}
# Usage:
waitforssh 192.168.2.38 myuser
For production scripts, consider these additional scenarios:
- Specifying a custom SSH port
- Handling host key verification failures
- Custom timeout values
- Parallel execution with multiple VMs
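For the parallel case, one simple approach is to start one background job per VM and collect the exit statuses with `wait`. The sketch below is illustrative only: `wait_for_all` is a hypothetical name, and it expects as its first argument any probe function that takes a host and returns 0 once that host is ready (for instance the `wait_for_ssh` function shown earlier).

```shell
# Hypothetical helper: run PROBE once per host in parallel and return
# non-zero if any of them failed.
# Usage: wait_for_all PROBE HOST [HOST...]
wait_for_all() {
    local probe=$1 host pid failed=0
    shift
    local pids=()
    for host in "$@"; do
        "$probe" "$host" &      # each probe runs in its own background job
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1 # wait returns that job's exit status
    done
    return "$failed"
}
```

Usage would look like `wait_for_all wait_for_ssh 192.168.2.38 192.168.2.39 || exit 1`. The total wait is bounded by the slowest host rather than the sum of all hosts.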
Here's an enhanced version supporting custom ports:
waitforssh_port() {
    local host=$1
    local port=${2:-22}
    local user=${3:-$USER}
    local timeout=${4:-300}
    local start_time current_time elapsed
    start_time=$(date +%s)
    echo "Waiting for SSH on $host:$port..."
    # BatchMode=yes makes the probe fail instead of prompting for a password,
    # so this check requires key-based authentication
    until ssh -p "$port" -o ConnectTimeout=5 -o BatchMode=yes -o StrictHostKeyChecking=no "$user@$host" exit >/dev/null 2>&1; do
        sleep 5
        current_time=$(date +%s)
        elapsed=$((current_time - start_time))
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timeout waiting for SSH after $timeout seconds" >&2
            return 1
        fi
    done
    echo "SSH is available on $host:$port"
    ssh -p "$port" "$user@$host"
}