When running a CentOS 6.5 VM on VMware ESXi 5.5 with a basic hardware configuration (a single local SATA disk, no RAID), we encounter a peculiar disk performance pattern. Initial benchmarks show healthy throughput (~105 MB/s), but after several consecutive write operations, performance degrades dramatically to ~20-25 MB/s, with latency spikes reaching 1.5 seconds.
The core test command used was:
for i in {1..10}; do
dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync
done
Running this loop reveals the degradation pattern clearly. Additional testing was done with direct I/O to take the guest page cache out of the picture:
dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct
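To pinpoint exactly which pass the collapse starts on, the loop can be wrapped so each pass is timestamped and dd's summary line is captured (a minimal sketch; the log path is arbitrary):
# dd reports its throughput summary on stderr, hence 2>>
for i in {1..10}; do
    echo "=== pass $i: $(date) ===" >> /root/dd_runs.log
    dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync 2>> /root/dd_runs.log
done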
The iostat output reveals critical differences between "good" and "bad" states:
- During good performance: avgqu-sz ~10, await ~5 ms
- During degraded performance: avgqu-sz spikes to 100+, await exceeds 1000 ms
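These figures come from iostat's extended device statistics (sysstat package); a typical capture while the test loop runs is:
# Extended per-device stats in MB for sda, sampled every 2 seconds
iostat -xm sda 2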
Several configuration aspects contribute to this behavior:
- Single 7200 RPM SATA drive with no RAID
- VMFS-5 filesystem on ESXi
- LVM configuration inside CentOS guest
- Default write-back cache policy
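Each of these layers can be confirmed from inside the guest with stock tooling before changing anything (a quick sanity check):
# Active I/O scheduler for the virtual disk (the bracketed entry is in use)
cat /sys/block/sda/queue/scheduler
# LVM layout inside the guest
pvs; vgs; lvs
# Filesystem and mount options on the root volume
mount | grep ' on / '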
VM Configuration Adjustments
First, verify these ESXi settings:
esxcli storage core device list
esxcli storage nmp device list
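Both commands print every device on the host; narrowing them to the datastore's backing device keeps the queue-depth and path-policy fields readable (the naa ID below is a placeholder):
# Per-device detail, including queue depth and outstanding-I/O limits
esxcli storage core device list -d naa.xxx
# Path selection policy for the same device
esxcli storage nmp device list -d naa.xxx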
Guest OS Tuning
1. I/O scheduler configuration (see the note after this list for making it persistent):
echo noop > /sys/block/sda/queue/scheduler
2. Filesystem mount options (in /etc/fstab). Note that data=writeback cannot be applied to an already-mounted root filesystem from fstab alone; it also needs rootflags=data=writeback on the kernel line or tune2fs -o journal_data_writeback on the device:
UUID=... / ext4 defaults,noatime,nodiratime,data=writeback 0 1
3. VMX file additions:
scsi0:0.virtualSSD = 1
scsi0:0.throughputCap = "off"
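The scheduler echo in step 1 does not survive a reboot; on CentOS 6 it can be made permanent on the kernel command line instead (a sketch assuming the stock /boot/grub/grub.conf layout):
# Append elevator=noop to every kernel line in grub.conf
# (sed writes a backup to grub.conf.bak before editing in place)
sed -i.bak '/^[[:space:]]*kernel /s/$/ elevator=noop/' /boot/grub/grub.conf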
For deeper analysis, we used iozone with different tuned profiles:
iozone -g 4G -Rab output_file
Key findings from iozone tests:
- Performance cliffs at different file sizes depending on tuned profile
- Random I/O patterns show significant variability
- No profile completely eliminates the degradation pattern
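To reproduce the cliff without re-running the whole automatic matrix, iozone can be restricted to the write and random-write tests at the same 8k record size dd used (sizes here are illustrative):
# -i 0 = sequential write/rewrite, -i 2 = random read/write,
# -s 2g file size, -r 8k record size, -I uses O_DIRECT to bypass the page cache
iozone -i 0 -i 2 -s 2g -r 8k -I -f /iozone.tmp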
Based on the evidence, this appears to be a combination of:
- VMFS write cache exhaustion
- Disk queue saturation
- Host-level contention
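Queue saturation and host-level contention are easiest to confirm from the ESXi side with esxtop, which splits latency between the physical device and the VMkernel:
# On the ESXi host (SSH or local shell): press 'u' for the per-device disk view
# DAVG/cmd = latency at the device, KAVG/cmd = time spent queued in the VMkernel,
# GAVG/cmd = total latency as seen by the guest
esxtop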
Implementation steps:
1. First, raise the per-device outstanding request limit on the ESXi host (in 5.5 the old global Disk.SchedNumReqOutstanding value is configured per device):
esxcli storage core device set --device=naa.xxx --sched-num-req-outstanding=64
2. Then modify the guest block-device queue settings (see the persistence sketch after step 3):
echo 256 > /sys/block/sda/queue/nr_requests
echo 2048 > /sys/block/sda/queue/max_sectors_kb
3. Finally, add these VM advanced parameters:
disk.EnableUUID = "TRUE"
scsi0:0.virtualSSD = 1
scsi0:0.throughputCap = "off"
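The sysfs values in step 2 also reset at reboot; one simple way to reapply them on CentOS 6 is rc.local (a sketch, not the only option):
# Reapply the guest queue settings at every boot
cat >> /etc/rc.local <<'EOF'
echo 256 > /sys/block/sda/queue/nr_requests
echo 2048 > /sys/block/sda/queue/max_sectors_kb
EOF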
Returning to the raw numbers, the sequential write tests on the CentOS 6.5 VM under ESXi 5.5 show the degradation clearly:
# Initial good performance
dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync
2147483648 bytes (2.1 GB) copied, 20.451 s, 105 MB/s
# Later degraded performance
2147483648 bytes (2.1 GB) copied, 103.42 s, 20.8 MB/s
Several approaches were attempted without success:
# Changed I/O scheduler to noop
echo noop > /sys/block/sda/queue/scheduler
# Added direct I/O flag
dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct
# Cleared caches
sync; echo 3 > /proc/sys/vm/drop_caches
The iostat output reveals crucial details about the storage behavior:
# Good performance
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 840.00 0.00 52.50 128.00 2.00 2.38 1.19 100.00
# Bad performance
sda 0.00 0.00 0.00 240.00 0.00 15.00 128.00 12.00 50.00 4.17 100.00
Several ESXi configuration parameters need verification:
# Check the VM disk adapter type; the .vmx file should show:
scsi0:0.deviceType = "scsi-hardDisk"
# Verify disk mode
scsi0:0.mode = "persistent"
# Check VMXNET3 driver version
ethtool -i eth0 | grep version
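From inside the guest, the storage controller the VM actually presents can be cross-checked as well (module names below assume the stock LSI Logic or PVSCSI controllers):
# Virtual SCSI controller as seen by the guest
lspci | grep -i scsi
# Loaded storage driver (mptspi = LSI Logic Parallel, mptsas = LSI Logic SAS, vmw_pvscsi = Paravirtual)
lsmod | grep -E 'mptspi|mptsas|vmw_pvscsi'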
Using iozone provides more comprehensive benchmarks:
# Install iozone (it is not in the base CentOS repositories, so a third-party repo is needed)
yum install iozone -y
# Run comprehensive test
iozone -g 4G -Rab iozone_results.xls
After extensive testing, the root cause was identified as ESXi's disk write cache behavior. The solution involves:
# In ESXi host advanced settings:
esxcli system settings advanced set -o /Disk/EnableDiskUUID -i 1
# In VM configuration file:
disk.EnableUUID = "TRUE"
scsi0:0.ctkEnabled = "TRUE"
# In CentOS guest:
echo "vm.dirty_ratio = 10" >> /etc/sysctl.conf
echo "vm.dirty_background_ratio = 5" >> /etc/sysctl.conf
sysctl -p
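Whether the lower dirty ratios actually smooth out the write-back bursts can be watched directly while a dd run is in flight:
# Dirty and Writeback counters should now stay smaller and drain more steadily
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'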
Additionally, consider these optimizations:
# Disable atime updates
mount -o remount,noatime /
# Optimize LVM settings (the number of PV metadata copies is fixed at pvcreate
# time with --metadatacopies; pvchange cannot change it on an existing PV)
vgchange --maxphysicalvolumes 128 vg_name