Troubleshooting Extremely Poor Disk I/O Performance on HP ProLiant DL385 G7 with P410i RAID Controller in ESXi 4.1 Virtualization Environment


After setting up my HP DL385 G7 with ESXi 4.1, I immediately ran into unacceptable disk latency across all VMs. The symptoms were worst during write operations:

  • Windows 7 VM conversions timing out in VMware Converter
  • 15k RPM SAS drives performing worse than consumer-grade SATA
  • Minimal disk activity shown in vCenter despite obvious lag (confirmed at the device level with esxtop, see below)
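
To confirm the latency was at the device layer rather than inside the guests, esxtop's disk views are the quickest check. A minimal sketch; the latency threshold is a rule of thumb, not a figure from my measurements:

# Interactive: 'd' = adapter view, 'u' = device view, 'v' = per-VM disk view.
# DAVG/cmd is device latency, KAVG/cmd is time spent in the VMkernel;
# sustained DAVG well above ~20 ms on 15k SAS points at the controller/disks.
esxtop

# Batch capture for later analysis (2-second samples, 60 iterations)
esxtop -b -d 2 -n 60 > /tmp/esxtop-disk.csv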

The P410i controller requires specific tuning for virtualization workloads. Here's the CLI configuration that worked best for me:

# Check current controller settings
hpacucli ctrl all show config detail

# Settings I applied for virtualization (option names vary between hpacucli
# versions and controller firmware, so check 'hpacucli help' first):
hpacucli ctrl slot=0 modify drivespeed=255
hpacucli ctrl slot=0 modify cacheratio=80/20      # read/write split of the cache accelerator
hpacucli ctrl slot=0 modify stripe=256            # stripe size in KB
hpacucli ctrl slot=0 modify forcedwriteback=on    # keep write-back caching active
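
To confirm the controller actually took the changes, dump its status again; the egrep is only there to trim the output to the cache-related lines:

# Re-check cache configuration after the changes
hpacucli ctrl slot=0 show | egrep -i 'cache|ratio|battery'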

After fixing the hardware layer, these ESXi parameters brought a further noticeable improvement:

# ESXi host advanced settings (set via Configuration > Advanced Settings in the
# vSphere Client, or with esxcfg-advcfg; they are not /etc/vmware/config entries):
Disk.DiskMaxIOSize = "4096"              # maximum I/O size the VMkernel issues, in KB
Disk.SchedNumReqOutstanding = "64"       # outstanding requests per device (default 32)
Mem.AllocGuestLargePage = "1"            # back guest memory with large pages
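
On ESXi 4.1 these are VMkernel advanced options, so they can also be applied from the console with esxcfg-advcfg; a sketch using the same values:

# Set the advanced options from the ESXi console
esxcfg-advcfg -s 4096 /Disk/DiskMaxIOSize
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding
esxcfg-advcfg -s 1 /Mem/AllocGuestLargePage

# Read one back to confirm it stuck
esxcfg-advcfg -g /Disk/SchedNumReqOutstanding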

# VMX file additions for Windows VMs:
scsi0:0.virtualSSD = "1"          # note: not recognized by ESXi 4.1; introduced in later releases
scsi0:0.throughputCap = "off"

Using fio inside a test VM for performance validation:
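
For reference, a 4k random-write job along these lines produces output in the format shown below; the size, runtime and iodepth are illustrative, not the exact job used for these numbers:

# Hypothetical 4k random-write job; point --filename at a disk on the affected datastore
fio --name=randwrite-test --rw=randwrite --bs=4k --iodepth=32 \
    --direct=1 --size=1G --runtime=60 --time_based --filename=/tmp/fio.testfile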

# Before fixes (4k random writes):
Jobs: 1 (f=1): [w(1)][100.0%][w=1024KiB/s][w=256 IOPS][eta 00m:00s]

# After optimization:
Jobs: 1 (f=1): [w(1)][100.0%][w=11.3MiB/s][w=2894 IOPS][eta 00m:00s]

Key takeaways:

  • The P410i requires manual cache tuning for virtualization
  • ESXi 4.1 has known issues with Opteron 6100 series NUMA
  • Always verify drive firmware matches HP's HCL
  • Consider adding a BBWC module if write performance is critical (see the battery check below)
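
Whether a battery/BBWC module is present, and what the cache is doing as a result, is quick to check from the HP CLI; the egrep just trims the output:

# Battery and cache module status on the controller
hpacucli ctrl slot=0 show | egrep -i 'battery|cache'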


When running VMware ESXi 4.1 on this hardware configuration, even single-VM deployments exhibit pathological latency. Benchmark results show:

# Sample fio test results (4k random reads):
SAS RAID1: ~120 IOPS (expected: 1000+ for 15k RPM)
SATA RAID1: ~25 IOPS (expected: 150-200)
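
ESXi 4.1 also ships vscsiStats, which produces per-virtual-disk latency histograms independent of the guest OS; useful for confirming the latency sits below the VM (the world ID below is a placeholder):

# List running VMs and their world IDs
vscsiStats -l
# Start collecting for one VM, print the latency histogram, then stop
vscsiStats -s -w 123456
vscsiStats -p latency -w 123456
vscsiStats -x -w 123456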

The P410i has several architectural quirks that impact virtualization:

  • Default 256MB cache without battery backup forces write-through mode
  • Queue depth limitations on the controller conflict with VMware's defaults for Disk.SchedNumReqOutstanding and Disk.DiskMaxIOSize (see the esxtop check below)
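
Whether queuing is actually the bottleneck shows up in esxtop's device view; this is a read-only check, not a fix:

# esxtop, then press 'u' for the device view:
#   DQLEN - queue depth the VMkernel uses for the device
#   ACTV  - commands currently active on the device
#   QUED  - commands waiting in the VMkernel queue (persistent non-zero
#           values mean the device queue saturates before the controller)
esxtop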

Add these parameters to VMX files for affected VMs:

scsi0:0.throughputCap = "off"
scsi0:0.queues = "8"
disk.EnableUUID = "TRUE"
sched.mem.pshare.enable = "FALSE"
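
VMX edits are only picked up after the configuration is reloaded; with the VM powered off this can be done from the host shell (the VM ID is a placeholder):

# Find the VM ID, then reload its configuration after editing the .vmx
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/reload 42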

SSH into the ESXi host, confirm the controller's logical volumes are visible, then read the cache policy from the HP CLI (the device list itself does not report it):

# List storage devices seen by the VMkernel (the 'esxcli storage core'
# namespace is 5.x syntax; on 4.1 use esxcfg-scsidevs)
esxcfg-scsidevs -l
# Cache policy and battery state come from hpacucli (HP offline bundle):
hpacucli ctrl slot=0 show
# With a battery/BBWC present the controller can run in write-back mode;
# without one it falls back to write-through and writes crawl.

ESXi does not ship smartctl, so SMART tests against the individual physical disks need a maintenance window and a Linux live environment; from the ESXi shell you can only list the logical volumes the controller presents:

# From the ESXi shell: list the devices the controller exposes
ls -l /vmfs/devices/disks/

# From a Linux live environment: query each physical drive behind the
# Smart Array (the device node depends on the driver in use, cciss vs. hpsa)
smartctl -a -d cciss,0 /dev/cciss/c0d0
smartctl -a -d cciss,1 /dev/cciss/c0d0

For critical VMs, consider these architectural changes:

# Example: create a physical-compatibility RDM for high-performance needs
# (point vmkfstools at the full naa identifier of the target device)
vmkfstools -z /vmfs/devices/disks/naa.600508b1001c* /vmfs/volumes/datastore1/rdm.vmdk
# Then add to the VMX:
scsi0:1.present = "TRUE"
scsi0:1.deviceType = "scsi-hardDisk"
scsi0:1.fileName = "/vmfs/volumes/datastore1/rdm.vmdk"
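
Before booting the VM it is worth confirming the mapping file points at the intended device:

# Query the RDM mapping file
vmkfstools -q /vmfs/volumes/datastore1/rdm.vmdk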