Optimal RAID 5 Configuration for ESX/vSphere: When to Split Volumes on HP 2012i SAN


RAID 5 configurations with large disk counts (especially beyond 6-7 disks) become problematic for ESX/vSphere workloads due to:

  • Extended rebuild times (your 7-disk array took days just for expansion)
  • Degraded performance during rebuilds
  • Increased risk of second disk failure during rebuild

Here's a simple PowerShell snippet to monitor disk latency from inside a Windows guest - a key metric for VM performance:

# Flag samples where average read latency exceeds 20 ms (0.02 s)
Get-Counter -Counter "\PhysicalDisk(*)\Avg. Disk sec/Read" -SampleInterval 5 -Continuous |
Where-Object { $_.CounterSamples | Where-Object { $_.CookedValue -gt 0.02 } }

Values consistently above 20ms indicate storage performance issues.
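
If you want a baseline rather than a live watch, here is a sketch that averages both read and write latency over roughly five minutes of samples (same counters as above, same 20 ms rule of thumb):

# Average guest read/write latency over 60 samples taken 5 s apart
$counters = "\PhysicalDisk(*)\Avg. Disk sec/Read", "\PhysicalDisk(*)\Avg. Disk sec/Write"
(Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 60).CounterSamples |
    Group-Object Path |
    ForEach-Object {
        [pscustomobject]@{
            Counter    = $_.Name
            AvgLatency = ($_.Group | Measure-Object CookedValue -Average).Average
        }
    }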

For your HP 2012i SAN with potential for 3 additional drives, consider splitting when:

  • Current volume exceeds 6 disks (you're at 7)
  • Rebuild times exceed 24 hours
  • You observe write performance degradation

Instead of growing the existing group into a single large RAID 5, consider:

# Proposed configuration for 10 disks (7 existing + 3 new), keeping the global hot spare
2 x RAID 5 (4+1)
OR
1 x RAID 5 (4+1) + 1 x RAID 6 (3+2)
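
For a quick usable-capacity comparison of the two layouts (a sketch, assuming the 300 GB drives already in the enclosure and excluding the hot spare):

# Usable capacity per option with 300 GB drives, hot spare excluded
$optionA = 2 * (4 * 300)          # two RAID 5 (4+1) groups      -> 2400 GB
$optionB = (4 * 300) + (3 * 300)  # RAID 5 (4+1) + RAID 6 (3+2)  -> 2100 GB
"Option A: {0} GB   Option B: {1} GB usable" -f $optionA, $optionB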

For live migration of VM storage without downtime, use Storage vMotion. Example PowerCLI command:

Move-VM -VM "VM_Name" -Datastore (Get-Datastore "New_Volume")
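
To evacuate an entire datastore rather than a single VM, the same cmdlet can be fed from the pipeline (a sketch, assuming hypothetical datastore names "Old_Volume" and "New_Volume"):

# Storage vMotion every VM registered on the old volume to the new one
Get-VM -Datastore (Get-Datastore "Old_Volume") |
    Move-VM -Datastore (Get-Datastore "New_Volume")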

Schedule a task to monitor disk health. Note that the WMI class below reports SMART predictive-failure status for local disks only; the 2012i's own RAID status is monitored through its management interface (or SNMP, if configured):

$raidStatus = Get-WmiObject -Class "MSStorageDriver_FailurePredictStatus" -Namespace "root\wmi"
if ($raidStatus | Where-Object { $_.PredictFailure }) {
    Send-MailMessage -From "monitor@example.com" -To "admin@example.com" -SmtpServer "smtp.example.com" `
        -Subject "Disk Failure Predicted" -Body "Immediate action required"
}
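
To actually run that check on a schedule, one option is a PowerShell scheduled job (a minimal sketch; the script path C:\Scripts\Check-DiskHealth.ps1 is a placeholder for wherever you save the check above):

# Run the disk health check daily at 06:00 (script path is a placeholder)
$trigger = New-JobTrigger -Daily -At "6:00 AM"
Register-ScheduledJob -Name "DiskHealthCheck" -FilePath "C:\Scripts\Check-DiskHealth.ps1" -Trigger $trigger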

Additional recommendations:

  • Keep VMDK files under 1 TB for easier management (see the check below)
  • Distribute high-I/O VMs across multiple volumes
  • Consider RAID 10 for write-intensive workloads
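
A quick way to spot oversized virtual disks from PowerCLI (a sketch; the 1 TB threshold mirrors the recommendation above):

# List virtual disks larger than 1 TB, with their VM and datastore placement
Get-VM | Get-HardDisk |
    Where-Object { $_.CapacityGB -gt 1024 } |
    Select-Object @{N="VM";E={$_.Parent.Name}}, Name, CapacityGB, Filename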

When working with HP 2012i SANs in VMware environments, I've observed that RAID 5 configurations with 7 disks (plus 1 hot spare) present unique challenges. The multi-day expansion process from 5 to 7 300GB SAS drives suggests we're pushing the boundaries of what's practical for production ESXi workloads.

The rebuild duration isn't just an inconvenience - it's a direct measure of your exposure to a second disk failure. A rough way to estimate it:


# Rough rebuild-time model: the controller must read every surviving member
# to reconstruct the replacement disk, so the data moved scales with disk count.
def estimate_rebuild_time(disk_count, disk_size_gb, controller_speed_mbps):
    total_mb = disk_count * disk_size_gb * 1024      # total data to read, in MB
    effective_mbps = controller_speed_mbps * 0.8     # allow ~20% for overhead
    effective_mb_per_s = effective_mbps / 8          # Mbit/s -> MB/s
    return total_mb / effective_mb_per_s / 3600      # hours

# Example: 7 x 300 GB members at ~600 Mbit/s effective throughput
print(estimate_rebuild_time(7, 300, 600))  # ~10 hours theoretical minimum;
                                           # real rebuilds under production I/O run far longer

The following indicators suggest it's time to move to multiple smaller RAID 5 groups:

  • Write performance degradation: When latency exceeds 20ms during normal operations
  • Rebuild windows: If projected rebuild time exceeds your acceptable downtime
  • Capacity utilization: When expansion requires adding more than 2 disks at once

For ESXi environments, I recommend these monitoring snippets (PowerCLI, plus one ESXi shell command):


# Monitor disk latency (ms) as reported by each host - sustained values above ~20 ms are a red flag
Get-Stat -Entity (Get-VMHost) -Stat "disk.totalLatency.average" -Realtime -MaxSamples 10 |
Sort-Object -Property Value -Descending |
Format-Table Entity, Instance, Value -AutoSize

# Check device queue depths (run from the ESXi shell over SSH, not PowerCLI)
esxcli storage core device list | grep -i "queue"
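
The same information is reachable without leaving PowerCLI via Get-EsxCli; a sketch assuming an existing connection to a host named esxi01.example.com (a placeholder), with the caveat that property names can vary slightly between ESXi/PowerCLI versions:

# Query device queue depths through the esxcli V2 interface
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01.example.com") -V2
$esxcli.storage.core.device.list.Invoke() |
    Select-Object Device, DeviceMaxQueueDepth   # property names may differ by version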

Based on field experience, the optimal configuration balances risk and performance:

Disks (RAID 5 group + hot spare)   Usable capacity (300 GB drives)   Rebuild risk
4+1                                900 GB                            Low
5+1                                1.2 TB                            Moderate
6+1                                1.5 TB                            High
7+1                                1.8 TB                            Very high

When you add those 3 additional drives, consider creating two 5-disk RAID 5 groups (4+1) rather than one 10-disk monstrosity. The performance characteristics will be far more predictable.
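
Once the new RAID groups are presented to the hosts, carving them into VMFS datastores is straightforward from PowerCLI (a sketch; the host name and device canonical names below are placeholders you'd read from the host's storage adapters):

# Create a VMFS datastore on each newly presented LUN (device names are placeholders)
$vmhost = Get-VMHost "esxi01.example.com"
New-Datastore -VMHost $vmhost -Name "SAN-R5-Group1" -Path "naa.600c0ff000d5example1" -Vmfs
New-Datastore -VMHost $vmhost -Name "SAN-R5-Group2" -Path "naa.600c0ff000d5example2" -Vmfs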