Understanding %st (Steal Time) in Linux top Command: Hypervisor Impact on VM Performance



In virtualized environments, the %st (steal time) metric in top output is the percentage of time a virtual CPU was ready to run but had to wait because the hypervisor was running something else on the physical CPU, usually another virtual machine. This typically happens when the physical host is oversubscribed, meaning the VMs on it have more vCPUs allocated in total than the host has physical CPU cores.
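The same figure can be derived without top. The kernel exposes cumulative CPU counters in /proc/stat, where steal is the eighth counter after the "cpu" label. The sketch below computes the steal percentage between two snapshots of that line; the helper name steal_pct is an assumption made for illustration, not a standard tool.

```shell
#!/bin/bash
# Sketch: steal % between two snapshots of the aggregate "cpu" line
# from /proc/stat. Counter order after the "cpu" label:
#   user nice system idle iowait irq softirq steal guest guest_nice
# guest and guest_nice are already counted in user and nice, so only the
# first eight counters are summed for the total.

# steal_pct is a hypothetical helper name.
# Usage: steal_pct "<first cpu line>" "<second cpu line>"
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            if (NR == 1) { t0 = total; s0 = $9 }
            else         { t1 = total; s1 = $9 }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }      # no elapsed ticks
            printf "%.1f\n", 100 * (s1 - s0) / dt
        }'
}

# Example on a live system (5-second window):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   steal_pct "$a" "$b"
```

Working from counter deltas avoids depending on top's output layout, which differs between versions.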

When you see consistently high %st values:

  • Your VM is competing for CPU resources with other VMs on the same host
  • Performance degradation can occur even if your VM isn't fully utilizing its vCPUs
  • EBS I/O operations might be affected indirectly, because the guest still needs CPU time to issue and complete them

Example of problematic output:

Cpu(s): 15.0%us, 5.0%sy, 0.0%ni, 60.0%id, 0.0%wa, 0.0%hi, 0.0%si, 20.0%st

Here's a Bash script to monitor steal time:

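#!/bin/bash
# Print steal time every 5 seconds and warn when it exceeds 10%
while true; do
    # Use the second top sample (the first can reflect averages since boot)
    # and extract the number in front of "st", so that both the
    # "20.0%st" and the newer "20.0 st" layouts are parsed
    steal=$(LC_ALL=C top -b -n 2 -d 1 | grep "Cpu(s)" | tail -n 1 |
        sed -E 's/.*[ ,]([0-9.]+)[% ]*st.*/\1/')
    echo "$(date) - Steal Time: ${steal}%"
    if [[ $(echo "$steal > 10" | bc) -eq 1 ]]; then
        echo "Warning: High steal time detected!"
    fi
    sleep 5
done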

Different cloud providers handle CPU scheduling differently:

  • AWS: Steal time often relates to instance type and to neighboring instances on the same host
  • Google Cloud: Uses its own hypervisor scheduler, so steal behavior may differ from other providers
  • Azure: Burstable (B-series) VMs can show different steal patterns from fixed-performance sizes

When facing high steal time:

# Check if your VM is CPU-bound
vmstat 1 5

# Consider vertical scaling (larger instance type)
# Or horizontal scaling (more smaller instances)

# For EBS-heavy workloads:
# Monitor both steal time and IO wait (%wa)
iostat -x 1 5

For deeper investigation, use Linux perf tools:

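# Install perf if needed (Debian/Ubuntu package names)
sudo apt install linux-tools-common linux-tools-generic

# Count KVM tracepoint events, including VM exits (indicator of hypervisor overhead)
# Note: kvm:* tracepoints exist on a KVM host; inside a guest they are normally unavailable
sudo perf stat -e 'kvm:*' -a sleep 10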

Steal time becomes critical when:

  • Consistently above 10% during peak hours
  • Correlated with application performance degradation
  • Combined with a long run queue (vmstat 'r' column: tasks running or waiting for CPU)
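The first criterion can be checked mechanically once steal samples have been collected. The sketch below treats "consistently" as "every sample in the window exceeds the threshold", which is an assumption rather than a definition from any tool; steal_sustained is a hypothetical helper name.

```shell
#!/bin/bash
# steal_sustained is a hypothetical helper: succeed (exit 0) only if every
# steal sample in the window exceeds the threshold.
# Usage: steal_sustained <threshold_percent> <sample> [<sample> ...]
steal_sustained() {
    local threshold=$1
    shift
    [ "$#" -gt 0 ] || return 1            # no samples: nothing to conclude
    printf '%s\n' "$@" | LC_ALL=C awk -v t="$threshold" '
        ($1 + 0) <= (t + 0) { below = 1 }   # one calm sample breaks the streak
        END { exit (below ? 1 : 0) }'
}

# Example:
#   steal_sustained 10 12.0 15.5 11.2 && echo "Sustained high steal"
```

A stricter or looser rule (for example, a percentile over the window) may suit workloads with short bursts better.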

Consider these for better visibility:

# Using mpstat for per-CPU stats
mpstat -P ALL 1

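# Using sar for historical data (requires sysstat data collection to be enabled)
# Without an interval, sar reads today's collected data file
sar -u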

To recap, %st (steal time) is the share of time a virtual CPU was ready to run but had to wait because the hypervisor gave that CPU time to other virtual machines on the same physical host. top reports it as the last field of the Cpu(s) line.

# Sample top output with steal time
Cpu(s):  6.0%us,  3.0%sy,  0.0%ni, 78.7%id,  0.0%wa,  0.0%hi,  0.3%si, 12.0%st

High steal time (typically >10%) indicates:

  • Your physical host is oversubscribed (too many VMs competing for CPU)
  • Neighbor VMs are consuming disproportionate CPU resources
  • Potential performance degradation for CPU-bound workloads

Use these commands for deeper analysis:

# Continuous monitoring with 1-second intervals
vmstat 1

# Checking historical CPU steal (today's collected data; needs sysstat collection enabled)
sar -u

# Isolating steal time with mpstat
mpstat -P ALL 1
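If mpstat is not installed, a rough per-CPU view can be read from the per-CPU lines of /proc/stat. The sketch below assumes the standard counter order (steal is the eighth counter) and reports the share since boot, not per interval as mpstat does; per_cpu_steal is a hypothetical helper name.

```shell
#!/bin/bash
# per_cpu_steal is a hypothetical helper: read /proc/stat-format text on
# stdin and print the cumulative (since boot) steal share of each CPU.
per_cpu_steal() {
    LC_ALL=C awk '
        /^cpu[0-9]+ / {                       # per-CPU lines only, not the aggregate
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            if (total > 0) printf "%s %.1f\n", $1, 100 * $9 / total
        }'
}

# Example on a live system:
#   per_cpu_steal < /proc/stat
```

Uneven values across CPUs can hint that only some vCPUs are contended, though since-boot averages hide short spikes.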

In AWS EC2 environments:

  • EBS I/O operations don't directly contribute to steal time
  • Burstable instances (t-series) can show higher steal when their CPU credits are exhausted
  • Steal time spikes may correlate with noisy neighbor issues

When experiencing high steal time:

  1. Right-size instances: Upgrade to instances with dedicated cores
  2. Monitor patterns: Check if spikes correlate with specific times/neighbors
  3. Workload shifting: Move CPU-intensive tasks to low-steal periods
  4. Cloud provider tools: Use AWS Trusted Advisor or equivalent

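You can estimate the CPU capacity actually available after accounting for steal:

#!/bin/bash
# Estimate effective vCPU capacity after subtracting steal time
total_cpu=$(grep -c ^processor /proc/cpuinfo)
# Use the second top sample and extract the number in front of "st"
steal_percent=$(LC_ALL=C top -b -n 2 -d 1 | grep "Cpu(s)" | tail -n 1 |
    sed -E 's/.*[ ,]([0-9.]+)[% ]*st.*/\1/')
effective_cpu=$(echo "$total_cpu * (1 - $steal_percent/100)" | bc -l)
echo "Effective vCPUs: $effective_cpu"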