How ESXi Internally Handles SSD Marking: Performance Impacts and Low-Level Mechanisms


When you mark a disk as SSD in ESXi, you're fundamentally altering how the hypervisor interacts with the storage device at multiple layers:

// Simplified storage stack interaction
StorageDevice → PSA (Pluggable Storage Architecture) → VMFS/NFS → VMM
                    ↑
               SSD detection layer

The SSD flag triggers these specific optimizations:

  • Queue Depth Adjustment: ESXi increases the device queue depth from 32 (HDD default) to 64 or higher
  • I/O Scheduler Bypass: Disables seek optimization algorithms meant for rotating media
  • Write Coalescing: Reduces write amplification by batching small writes
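
A quick way to confirm what actually changed on a given device is to read its properties through the esxcli V2 interface in PowerCLI. The sketch below is illustrative: the host name and device identifier are placeholders, and the property names on the returned objects (e.g. IsSSD, DeviceMaxQueueDepth) may vary slightly between PowerCLI releases.

# Inspect the SSD flag and max queue depth of a single device (placeholder host/device)
$esxcli = Get-EsxCli -VMHost (Get-VMHost esx01.prod) -V2
$dev = $esxcli.storage.core.device.list.Invoke(@{ device = "naa.60050768018301abcdef" })
# Property names mirror the esxcli field names and may differ by PowerCLI release
$dev | Select-Object Device, IsSSD, DeviceMaxQueueDepth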

Example of misconfigured SSD marking causing latency spikes:

esxcli storage core device list -d naa.60050768018301abcdef
   Is SSD: true (manually set)
   Device Max Queue Depth: 64
   Observed latency (esxtop DAVG/cmd): 150ms, where a real SSD should be <5ms
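
If a device was tagged by hand like this, the rollback is to remove the rule that set the flag and reclaim the device so ESXi re-runs its own detection. A minimal sketch, assuming the flag came from an enable_ssd claim rule against VMW_SATP_LOCAL (adjust the SATP and device ID to your environment):

# Remove the manual enable_ssd rule, then reclaim the device
$esxcli = Get-EsxCli -VMHost (Get-VMHost esx01.prod) -V2
$ruleArgs = @{ satp = 'VMW_SATP_LOCAL'; device = 'naa.60050768018301abcdef'; option = 'enable_ssd' }
$esxcli.storage.nmp.satp.rule.remove.Invoke($ruleArgs)
$esxcli.storage.core.claiming.reclaim.Invoke(@{ device = $ruleArgs.device })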

ESXi 6.7+ includes these automatic detection mechanisms that manual marking can interfere with:

  1. ATA IDENTIFY DEVICE: a nominal media rotation rate of 1 indicates non-rotating (flash) media
  2. SCSI VPD page 0xB1 (Block Device Characteristics), which carries the same rotation rate field
  3. NVMe Identify data: NVMe namespaces are treated as flash by definition
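
Before overriding any of these, it is worth listing what the automatic detection already concluded for every device on the host. A small PowerCLI sketch (host name is a placeholder); implausible rows, such as a multi-terabyte nearline LUN flagged as SSD, are the ones to investigate:

# List each disk with the SSD flag ESXi derived from the checks above
Get-ScsiLun -VmHost esx01.prod -LunType disk |
    Select-Object CanonicalName, Vendor, Model, CapacityGB, IsSsd, IsLocal |
    Sort-Object IsSsd -Descending |
    Format-Table -AutoSize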

When dealing with hybrid storage arrays, consider this PowerCLI snippet to properly tag devices:

# Flash-backed LUNs that ESXi has not auto-detected as SSD
# (narrow the filter to your array's flash tier, e.g. by Vendor/Model or CanonicalName)
$ssdDevices = Get-ScsiLun -VmHost esx01.prod -LunType disk | Where-Object {
    $_.IsSsd -ne $true -and
    $_.CapacityGB -lt 2000 # Filter out large LUNs likely HDD
}
$ssdDevices | Set-ScsiLun -IsSsd $true -Confirm:$false

Key metrics to monitor after SSD marking:

esxtop -d 2
   DEVICE: DQLEN/%USD/%UTIL/DAVG (Queue Depth/Used/Utilization/Avg Latency)
   vsan:  LLCB/aLLCB (Cache hit rates)

When you mark a disk/LUN as SSD in ESXi, whether through an SATP claim rule on the command line or through the GUI, you're fundamentally altering how the hypervisor interacts with the storage device at several layers of the storage stack:

// Example CLI commands to mark a device as SSD: add an enable_ssd claim rule, then reclaim the device
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device=naa.6005076801821adf3a1f5b45414e4a42 --option="enable_ssd"
esxcli storage core claiming reclaim --device=naa.6005076801821adf3a1f5b45414e4a42

ESXi implements several SSD-specific optimizations when the flag is set:

  • Queue Depth Adjustment: ESXi increases the device queue depth from ~32 (HDD default) to ~64-256 for SSDs
  • I/O Scheduler Behavior: Skips the rotational-latency compensation logic intended for spinning media
  • VMFS-6 Allocation: Enables sub-blocks (8KB vs 64KB) and changes the file block allocation strategy
  • VAAI Primitives: Enables UNMAP/ATS operations that assume flash characteristics
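
To see which VAAI primitives are actually active on a device after the flag change, the VAAI status can be queried through the same esxcli interface. A sketch, with host and device ID as placeholders:

# Report ATS/Clone/Zero/Delete (UNMAP) support status for the device
$esxcli = Get-EsxCli -VMHost (Get-VMHost esx01.prod) -V2
$esxcli.storage.core.device.vaai.status.get.Invoke(@{ device = 'naa.6005076801821adf3a1f5b45414e4a42' })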

When a spinning disk is incorrectly marked as an SSD, the same optimizations backfire:

Optimization                    HDD Impact
Increased queue depth           Causes a seek storm as I/O gets reordered
Disabled latency compensation   Breaks HDD head positioning algorithms
Small sub-blocks                Excessive metadata operations

Consider this PowerCLI snippet that demonstrates proper SSD designation for a SQL workload:

# Verify the actual SSD detection status of the host's devices first
Get-ScsiLun -VmHost esx01.prod -LunType disk | Select-Object CanonicalName, CapacityGB, IsSsd

# Correctly mark the devices backing the SQL datastore as SSD
$datastore = Get-Datastore "SQL_Tier1"
$vmhost    = Get-VMHost -Datastore $datastore | Select-Object -First 1
$esxcli    = Get-EsxCli -VMHost $vmhost -V2
$devices   = $datastore.ExtensionData.Info.Vmfs.Extent | ForEach-Object { $_.DiskName }

foreach ($device in $devices) {
    # Add an enable_ssd claim rule, then reclaim so the flag takes effect;
    # the satp value must match the SATP that currently claims the device
    $esxcli.storage.nmp.satp.rule.add.Invoke(@{ satp = 'VMW_SATP_LOCAL'; device = $device; option = 'enable_ssd' })
    $esxcli.storage.core.claiming.reclaim.Invoke(@{ device = $device })
}
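
Note that esxcli is not a PowerShell command, so the snippet goes through Get-EsxCli -V2 to issue the claim-rule calls from PowerCLI; the same two esxcli commands can equally be run in an SSH session on the host.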

Legitimate cases for manual SSD marking include:

  • All-flash SAN arrays presenting as rotational (common with older EMC/NetApp)
  • NVMe drives behind SAS expanders
  • Certain RAID controller configurations that mask SSD characteristics

Always verify with esxcli storage core device list -d naa.id before overriding.

These esxtop metrics indicate problems when HDDs are marked as SSDs:

esxtop    # press 'u' for the disk device view, then watch DQLEN, DAVG/cmd, KAVG/cmd and GAVG/cmd

Look for:

  • DQLEN >32 on spinning disks
  • KAVG >2ms with high DAVG
  • Consistent GAVG >20ms
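
To turn those thresholds into a quick check from PowerCLI instead of a live esxtop session, something like the following sketch can flag recent latency spikes. The host name is a placeholder, and disk.maxTotalLatency.latest is used here as a rough stand-in for the GAVG guidance above (it reports worst-case device latency in milliseconds):

# Flag 20-second samples from the last hour where worst-case disk latency exceeded 20 ms
$vmhost = Get-VMHost esx01.prod
Get-Stat -Entity $vmhost -Stat "disk.maxTotalLatency.latest" -Realtime -MaxSamples 180 |
    Where-Object { $_.Value -gt 20 } |
    Select-Object Timestamp, Instance, Value |
    Format-Table -AutoSize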