Diagnosing and Resolving High Disk I/O Latency in Linux VMs on VMware ESXi 5.5


When running a CentOS 6.5 VM on VMware ESXi 5.5 on basic hardware (a single local 7200 RPM SATA drive, no RAID), we encounter a peculiar disk performance pattern. Initial benchmarks show healthy throughput (~105 MB/s), but after several consecutive write operations, performance degrades dramatically to ~20-25 MB/s, with latency spikes reaching 1.5 seconds.

The core test command used was:

for i in {1..10}; do
  dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync
done
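
To make the point of degradation easier to spot, each pass can be labelled so the dd summary lines can be matched to an iteration number; a minimal variant of the same loop:

# Label each pass so the run where throughput collapses is obvious
for i in {1..10}; do
  echo "=== pass $i ==="
  dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync 2>&1 | tail -n1
done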

This reveals the performance degradation pattern clearly. Additional testing used a larger block size together with the direct I/O flag:

dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct

The iostat output reveals critical differences between "good" and "bad" states:

  • During good performance: avgqu-sz ~10, await ~5ms
  • During degraded performance: avgqu-sz spikes to 100+, await exceeds 1000ms
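
These figures can be collected while the dd loop runs; assuming the guest disk is sda and the sysstat package is installed, extended iostat sampling in a second shell is enough:

# Sample extended, per-device statistics (in MB/s) every 2 seconds
iostat -xm sda 2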

Several configuration aspects contribute to this behavior:

  • Single 7200 RPM SATA drive with no RAID
  • VMFS-5 filesystem on ESXi
  • LVM configuration inside CentOS guest
  • Default write-back cache policy
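
The LVM layout inside the guest can be confirmed with the standard reporting tools before any tuning is applied:

# Show physical volumes, volume groups and logical volumes
pvs
vgs
lvs
# Partition layout as the kernel sees it
cat /proc/partitions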

VM Configuration Adjustments

First, list the storage devices and their multipathing configuration on the ESXi host:

esxcli storage core device list
esxcli storage nmp device list
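
To focus on the datastore's backing device, the listing can be restricted to a single device; naa.xxx below stands in for the actual identifier reported on your host:

# Inspect one device; on ESXi 5.5 the output should include the queue depth and
# the "No of outstanding IOs with competing worlds" value
esxcli storage core device list -d naa.xxx
esxcli storage nmp device list -d naa.xxx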

Guest OS Tuning

1. IO scheduler configuration:

echo noop > /sys/block/sda/queue/scheduler
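
This only lasts until reboot; on CentOS 6 the choice can be persisted by appending elevator=noop to the kernel command line, for example with grubby:

# Persist the noop scheduler across reboots
grubby --update-kernel=ALL --args="elevator=noop"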

2. Filesystem mount options (in /etc/fstab). Note that the data= journalling mode of the root filesystem cannot be changed on a remount, so data=writeback for / also requires rootflags=data=writeback on the kernel command line (or tune2fs -o journal_data_writeback):

UUID=... / ext4 defaults,noatime,nodiratime,data=writeback 0 1

3. VMX file additions:

scsi0:0.virtualSSD = "1"
scsi0:0.throughputCap = "off"

For deeper analysis, we used iozone with different tuned profiles:

iozone -g 4G -Rab output_file
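
A full automatic sweep can take a very long time on a slow datastore; as a quicker check, iozone can be limited to the write and random I/O tests at a fixed file and record size (the 4g/64k values below are only starting points):

# Run only write/rewrite (test 0) and random read/write (test 2)
iozone -i 0 -i 2 -s 4g -r 64k -R -b iozone_subset.xls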

Key findings from iozone tests:

  • Performance cliffs at different file sizes depending on tuned profile
  • Random I/O patterns show significant variability
  • No profile completely eliminates the degradation pattern

Based on the evidence, this appears to be a combination of:

  • VMFS write cache exhaustion
  • Disk queue saturation
  • Host-level contention

Implementation steps:
1. First, adjust ESXi disk settings:

# On ESXi 5.5 the outstanding-request limit is a per-device setting
esxcli storage core device set --device=naa.xxx --sched-num-req-outstanding=64

2. Then modify guest configuration:

echo 256 > /sys/block/sda/queue/nr_requests
echo 2048 > /sys/block/sda/queue/max_sectors_kb
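
Like the scheduler change, these sysfs writes are lost at reboot; a simple way to re-apply them on CentOS 6 is to append them to /etc/rc.d/rc.local:

# Re-apply the block-layer queue settings at every boot
cat >> /etc/rc.d/rc.local <<'EOF'
echo noop > /sys/block/sda/queue/scheduler
echo 256 > /sys/block/sda/queue/nr_requests
echo 2048 > /sys/block/sda/queue/max_sectors_kb
EOF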

3. Finally, add these VM advanced parameters:

disk.EnableUUID = "TRUE"
scsi0:0.virtualSSD = "1"
scsi0:0.throughputCap = "off"

To recap the raw numbers: sequential write tests on the CentOS 6.5 VM under ESXi 5.5 show the degradation pattern directly:

# Initial good performance
dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync
2147483648 bytes (2.1 GB) copied, 20.451 s, 105 MB/s

# Later degraded performance
2147483648 bytes (2.1 GB) copied, 103.42 s, 20.8 MB/s

Several approaches were attempted without success:

# Changed I/O scheduler to noop
echo noop > /sys/block/sda/queue/scheduler

# Added direct I/O flag
dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct

# Cleared caches
sync; echo 3 > /proc/sys/vm/drop_caches

The iostat output reveals crucial details about the storage behavior:

# Good performance
Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00  840.00    0.00   52.50   128.00     2.00   2.38   1.19 100.00

# Bad performance
sda        0.00    0.00   0.00  240.00    0.00   15.00   128.00    12.00  50.00   4.17 100.00

Several ESXi configuration parameters need verification:

# Check the VM disk adapter type: the .vmx file should show
scsi0:0.deviceType = "scsi-hardDisk"

# Verify the disk mode
scsi0:0.mode = "persistent"

# Check the VMXNET3 driver version inside the guest
ethtool -i eth0 | grep version
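
The .vmx entries can also be checked directly from the ESXi shell; the datastore and VM names below are placeholders:

# Inspect the virtual disk settings in the VM's configuration file
grep -i "scsi0:0" /vmfs/volumes/datastore1/myvm/myvm.vmx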

Using iozone provides more comprehensive benchmarks:

# Install iozone
yum install iozone -y

# Run comprehensive test
iozone -g 4G -Rab iozone_results.xls

After extensive testing, the root cause was identified as ESXi's disk write cache behavior. The solution involves:

# In ESXi host advanced settings:
esxcli system settings advanced set -o /Disk/EnableDiskUUID -i 1

# In VM configuration file:
disk.EnableUUID = "TRUE"
scsi0:0.ctkEnabled = "TRUE"

# In CentOS guest:
echo "vm.dirty_ratio = 10" >> /etc/sysctl.conf
echo "vm.dirty_background_ratio = 5" >> /etc/sysctl.conf
sysctl -p
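
Whether the lower dirty-page thresholds actually change write-back behaviour can be checked by confirming the values and watching the dirty/writeback counters while the dd loop runs:

# Confirm the new thresholds are active
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Watch dirty and writeback pages during the test (run in a second shell)
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'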

Additionally, consider these optimizations:

# Disable atime updates
mount -o remount,noatime /

# Optimize LVM settings
pvchange --metadatacopies 1 /dev/sda2
vgchange --maxphysicalvolumes 128 vg_name