When your VMware infrastructure reports Co-Stop values exceeding 127,835 ms (yes, that's over 2 minutes of waiting time), you're facing severe CPU scheduling contention. The KB article's definition holds true: Co-Stop is time a vCPU is ready to execute but is held back because the hypervisor cannot co-schedule all of the VM's vCPUs at once. Our case with 1x8 vCPU and 14x4 vCPU guests on dual 4-core physical CPUs is a textbook example of overcommitment.
Let's break down the resource allocation:
Total physical cores = 2 sockets × 4 cores = 8 cores
Total vCPUs allocated = (1×8) + (14×4) = 64 vCPUs
Overcommitment ratio = 64:8 or 8:1
This extreme overcommitment explains why your Veeam reports seem unbelievable - because they reveal an unsustainable configuration.
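The arithmetic above can be sketched quickly in Python (VM counts and core counts taken from the scenario described; Python is used here purely for illustration):

```python
# vCPU inventory from the case: one 8-vCPU VM plus fourteen 4-vCPU VMs
vms = [8] + [4] * 14
physical_cores = 2 * 4  # two sockets x four cores each

total_vcpus = sum(vms)                # 64
ratio = total_vcpus / physical_cores  # 8.0
print(f"{total_vcpus} vCPUs on {physical_cores} cores -> {ratio:.0f}:1 overcommitment")
```

Anything near 8:1 on small hosts guarantees that the scheduler will regularly be unable to co-schedule wide VMs, which is exactly what the Co-Stop counter records.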
Here's how to verify Co-Stop metrics directly:
Connect-VIServer -Server your_vcenter
Get-VMHost | Get-Stat -Stat cpu.costop.summation -Realtime -MaxSamples 10 |
    Sort-Object -Property Timestamp |
    Format-Table -Property Entity, Timestamp, Value -AutoSize
This outputs raw Co-Stop measurements in milliseconds for each host.
Use esxtop to monitor real-time contention. In interactive mode, press c for the CPU view and watch the %CSTP column. For a batch capture:
esxtop -b -d 2 -n 10 | grep -i "costop"
(batch-mode CSV headers label the counter "% CoStop" rather than CSTP)
Key columns to watch:
%CSTP - Percentage of time in Co-Stop state
PCPU USED(%) - Physical CPU utilization
Immediate actions for our case study:
1. Right-size the 8-vCPU VM (most business apps don't need this many cores)
2. Implement VM CPU limits (limits are set via the resource configuration, not Set-VM):
   Get-VM "CriticalVM" | Get-VMResourceConfiguration | Set-VMResourceConfiguration -CpuLimitMhz 8000
3. Enable CPU Hot Add so VMs can start with a smaller allocation and grow only when needed
4. Distribute VMs across hosts using DRS anti-affinity rules:
   New-DrsRule -Cluster "YourCluster" -Name "Anti-Affinity-Group" -KeepTogether:$false -VM (Get-VM "VM1","VM2")
For sustainable performance:
1. Maintain a maximum 3:1 vCPU:pCPU ratio for CPU-intensive workloads
2. Implement proper reservations for critical VMs:
   Get-VM "DB_Server" | Set-VM -NumCpu 4 -Confirm:$false
   Get-VM "DB_Server" | Get-VMResourceConfiguration | Set-VMResourceConfiguration -CpuReservationMhz 8000
3. Upgrade to newer CPUs with higher core counts
4. Review the vSphere Distributed Resource Scheduler (DRS) automation level
Create this vRealize Operations Manager super metric:
"CPU Co-Stop Time" = (sum(${adaptertype=VMWARE, objecttype=VirtualMachine, metric=cpu|coStop}) / ${adaptertype=VMWARE, objecttype=VirtualMachine, metric=cpu|corecount} * 1000) > 5000 ? "Critical" : "Normal"
Threshold guidelines:
< 1000 ms - Normal
1000-5000 ms - Warning
> 5000 ms - Critical
When diagnosing performance issues in VMware environments, the Co-Stop metric is one of the most telling indicators of CPU scheduling contention. As defined in VMware's official documentation:
Co-Stop = Time when a vCPU is ready to run but delayed due to co-vCPU scheduling contention
This occurs when multiple vCPUs from the same VM need to be scheduled simultaneously, but the hypervisor cannot allocate the required physical CPU resources immediately. The result is what users perceive as "slowness" or unresponsiveness.
In the case described, we observe a Co-Stop average of 127,835.94 ms (over 2 minutes) on a host with:
- Host configuration: 2 physical CPUs @ 4 cores each (8 logical CPUs total)
- VM configuration: 1×8 vCPU VM + 14×4 vCPU VMs
The math reveals the fundamental issue:
Total vCPUs allocated = (1×8) + (14×4) = 64 vCPUs
Physical CPUs available = 8 logical cores
Overcommitment ratio = 64:8 = 8:1
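Co-Stop is reported as a summation in milliseconds per sample interval, so turning it into a percentage requires knowing the interval length. The sketch below assumes a 5-minute (300 s) reporting window, which is a guess at what the Veeam report uses; vCenter real-time samples use 20 s intervals instead:

```python
def costop_ms_to_percent(costop_ms, interval_seconds=300, num_vcpus=1):
    """Express a Co-Stop summation (ms) as a percentage of the sample interval.

    interval_seconds=300 is an assumed 5-minute reporting window;
    divide by num_vcpus when the value is summed across all of a VM's vCPUs.
    """
    return costop_ms / (interval_seconds * 1000 * num_vcpus) * 100

# The observed 127,835.94 ms over an assumed 5-minute window:
print(round(costop_ms_to_percent(127835.94), 1))  # 42.6 (percent of the interval)
```

Even under that generous assumption, the VM spends well over a third of every interval co-stopped, which users experience as sustained sluggishness.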
To properly measure Co-Stop values, we can use VMware's PowerCLI:
Connect-VIServer -Server vcenter.example.com
Get-Stat -Entity (Get-VMHost) -Stat "cpu.costop.summation" -Realtime -MaxSamples 10 |
Measure-Object -Property Value -Average |
Select-Object -Property Average
For deeper analysis of individual VMs:
Get-VM | Where-Object {$_.PowerState -eq "PoweredOn"} | ForEach-Object {
    $stats = Get-Stat -Entity $_ -Stat "cpu.ready.summation","cpu.costop.summation" -IntervalMins 5
    [PSCustomObject]@{
        VMName   = $_.Name
        CPUReady = ($stats | Where-Object {$_.MetricId -eq "cpu.ready.summation"} |
                    Measure-Object -Property Value -Average).Average
        CoStop   = ($stats | Where-Object {$_.MetricId -eq "cpu.costop.summation"} |
                    Measure-Object -Property Value -Average).Average
    }
} | Sort-Object -Property CoStop -Descending
Immediate remediation steps:
- Right-size over-allocated VMs (especially the 8-vCPU VM)
- Enable CPU Hot Add to allow dynamic scaling
- Configure VM CPU affinity where appropriate
Advanced configuration tweaks:
# Set CPU reservation for critical VMs
Get-VM "CriticalVM1" | Get-VMResourceConfiguration | Set-VMResourceConfiguration -CpuReservationMhz 4000
# Configure NUMA affinity
New-AdvancedSetting -Entity (Get-VM "LargeVM") -Name "numa.nodeAffinity" -Value "0,1" -Confirm:$false
# Adjust CPU shares for priority
Get-VM "LowPriorityVM" | Get-VMResourceConfiguration | Set-VMResourceConfiguration -CpuSharesLevel Low
Implement these thresholds for proactive monitoring:
| Metric | Warning | Critical |
|---|---|---|
| CPU Co-Stop | > 1000 ms | > 5000 ms |
| CPU Ready | > 5% | > 10% |
| vCPU:pCPU ratio | > 4:1 | > 6:1 |
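These thresholds can be encoded as a small evaluator; the values and units below mirror the table (Co-Stop in milliseconds, Ready in percent, ratio as vCPUs per pCPU), and the function name is chosen for illustration:

```python
# Warning/critical thresholds taken from the monitoring table
THRESHOLDS = {
    "costop_ms":  (1000, 5000),
    "ready_pct":  (5, 10),
    "vcpu_ratio": (4, 6),
}

def classify(metric, value):
    """Return "Normal", "Warning", or "Critical" for a metric reading."""
    warn, crit = THRESHOLDS[metric]
    if value > crit:
        return "Critical"
    if value > warn:
        return "Warning"
    return "Normal"

print(classify("costop_ms", 127835.94))  # the observed Co-Stop -> Critical
print(classify("vcpu_ratio", 64 / 8))    # the 8:1 overcommitment -> Critical
```

Both of the case-study readings land deep in the critical band, confirming that this is a capacity problem rather than a transient spike.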
For continuous monitoring, consider this Python script using pyVmomi:
from pyVmomi import vim  # pip install pyvmomi

def check_co_stop(content, host):
    perf = content.perfManager
    # Resolve the co-stop counter ID by name; hard-coding an ID (e.g. 126)
    # is fragile because counter IDs can differ between vCenter versions
    counter_ids = {f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}": c.key
                   for c in perf.perfCounter}
    costop_id = counter_ids["cpu.costop.summation"]
    spec = vim.PerformanceManager.QuerySpec(
        entity=host,
        metricId=[vim.PerformanceManager.MetricId(counterId=costop_id, instance="")],
        intervalId=20,  # 20-second real-time samples
        maxSample=10)
    # Each returned value is co-stop time in milliseconds per sample interval;
    # feed these into your alerting logic against the thresholds above
    return perf.QueryPerf(querySpec=[spec])