RAID 5 configurations with large disk counts (especially beyond 6-7 disks) become problematic for ESX/vSphere workloads due to:
- Extended rebuild times (your 7-disk array took days just for expansion)
- Degraded performance during rebuilds
- Increased risk of a second disk failure during the rebuild window (a rough estimate is sketched below)
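To put that last point in perspective, here's a back-of-envelope sketch of the chance that a second drive fails while a rebuild is in flight. The MTBF and rebuild-window figures are placeholder assumptions, not measurements from your array:

```powershell
# Naive estimate: assumes independent failures and a constant failure rate,
# and ignores unrecoverable read errors (often the bigger risk on wide RAID 5 sets)
$survivingDisks = 6        # 7-disk RAID 5 with one member already failed
$rebuildHours   = 48       # placeholder - plug in your observed rebuild time
$mtbfHours      = 1200000  # placeholder MTBF for enterprise SAS drives

$pSecondFailure = $survivingDisks * ($rebuildHours / $mtbfHours)
"{0:P3} chance of a second disk failure during the rebuild window" -f $pSecondFailure
```

The absolute number looks small, but it scales linearly with both the disk count and the rebuild time, which is exactly what grows as the RAID group gets wider.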
Here's a simple PowerShell snippet to monitor disk latency - a key metric for VM performance. It reads the local perfmon counters, so run it inside a Windows guest or on a physical Windows host:

```powershell
# Emit only samples where average read latency exceeds 20 ms (0.02 s)
Get-Counter -Counter "\PhysicalDisk(*)\Avg. Disk sec/Read" -SampleInterval 5 -Continuous |
    Where-Object { $_.CounterSamples.CookedValue -gt 0.02 }
```
Values consistently above 20ms indicate storage performance issues.
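If you'd rather build a baseline than watch the counter live, the same data can be logged to a file for later review; the path and sampling window below are arbitrary placeholders:

```powershell
# Log 1-second samples for 10 minutes to a perfmon binary log for trend analysis
Get-Counter -Counter "\PhysicalDisk(*)\Avg. Disk sec/Read" -SampleInterval 1 -MaxSamples 600 |
    Export-Counter -Path "C:\PerfLogs\disk_latency.blg" -FileFormat blg
```

The resulting .blg log opens directly in Performance Monitor for review.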
For your HP 2012i SAN with potential for 3 additional drives, consider splitting when:
- Current volume exceeds 6 disks (you're at 7)
- Rebuild times exceed 24 hours
- You observe write performance degradation
Instead of one 7+1 RAID 5, consider:
```
# Proposed configuration for 10 disks (7 existing + 3 new)
2 x RAID 5 (4+1) with 1 global hot spare
  - OR -
1 x RAID 5 (4+1) + 1 x RAID 6 (4+2)
```
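As a sanity check on those two options, the arithmetic below compares usable capacity using the 300GB drive size; these are raw numbers, not vendor-validated sizing:

```powershell
$driveGB = 300

# Option 1: two RAID 5 groups of 4 data + 1 parity, plus the existing global hot spare
$option1GB = 2 * (4 * $driveGB)               # 2400 GB usable, one failure tolerated per group

# Option 2: one RAID 5 (4+1) and one RAID 6 (4 data + 2 parity)
$option2GB = (4 * $driveGB) + (4 * $driveGB)  # 2400 GB usable, RAID 6 group survives two failures

"Option 1: {0} GB usable; Option 2: {1} GB usable" -f $option1GB, $option2GB
```

Usable capacity comes out the same either way; the real decision is whether the extra parity of the RAID 6 group is worth more to you than two identical, simpler RAID 5 groups.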
For live migration of VM storage without downtime, use Storage vMotion. Example PowerCLI command:

```powershell
# Relocates the VM's files to the new datastore while it stays powered on
Move-VM -VM "VM_Name" -Datastore (Get-Datastore "New_Volume")
```
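If the goal is to empty an entire volume before restructuring it, the same cmdlet can be driven from a datastore query; the datastore names here are placeholders:

```powershell
# Evacuate every VM registered on the old volume, one Storage vMotion at a time
Get-VM -Datastore (Get-Datastore "Old_Volume") |
    ForEach-Object { Move-VM -VM $_ -Datastore (Get-Datastore "New_Volume") }
```

Running the moves sequentially keeps the Storage vMotion load on the SAN predictable while it is still serving production I/O.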
Create a scheduled task on a Windows host to watch for predicted drive failures. Note that this reads SMART status from disks the Windows instance can see directly; the SAN's own RAID health still needs to be checked through its management interface:

```powershell
# Check SMART predictive-failure status on the physical disks visible to Windows
$raidStatus = Get-WmiObject -Class "MSStorageDriver_FailurePredictStatus" -Namespace "root\wmi"

if ($raidStatus.PredictFailure -contains $true) {
    # Placeholder addresses and SMTP server - adjust for your environment
    Send-MailMessage -To "admin@example.com" -From "monitor@example.com" `
        -SmtpServer "smtp.example.com" `
        -Subject "Drive Failure Predicted" -Body "Immediate action required"
}
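To schedule the check, save it as a script and register it with Task Scheduler. The cmdlets below ship with Windows Server 2012 and later (older hosts can use schtasks.exe instead), and the script path and time are placeholders:

```powershell
# Run the drive-health check once a day at 08:00
$action  = New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -File C:\Scripts\Check-DiskHealth.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At "08:00"
Register-ScheduledTask -TaskName "DiskHealthCheck" -Action $action -Trigger $trigger
```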
A few general layout guidelines (a PowerCLI check for the first two follows the list):
- Keep VMDK files under 1TB for easier management
- Distribute high-I/O VMs across multiple volumes
- Consider RAID 10 for write-intensive workloads
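The PowerCLI sketch below covers the first two points: it flags virtual disks approaching the 1TB guideline and shows how many VMs sit on each datastore. The 900GB threshold is just an arbitrary early-warning value:

```powershell
# Flag virtual disks larger than ~900 GB (approaching the 1 TB guideline)
Get-VM | Get-HardDisk |
    Where-Object { $_.CapacityGB -gt 900 } |
    Select-Object Parent, Name, CapacityGB

# Show how many VMs are registered on each datastore
Get-Datastore |
    Select-Object Name, @{Name = "VMCount"; Expression = { (Get-VM -Datastore $_).Count }}
```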
When working with HP 2012i SANs in VMware environments, I've observed that RAID 5 configurations with 7 disks (plus 1 hot spare) present unique challenges. The multi-day expansion from five to seven 300GB SAS drives suggests we're pushing the boundaries of what's practical for production ESXi workloads.
The rebuild duration isn't just an inconvenience - it's a direct measure of how long you're exposed to a second failure. The rough estimator below illustrates the scale:

```python
# Rough rebuild-time estimate: assumes the controller has to process the data
# on every member disk and loses ~20% of throughput to overhead.
def estimate_rebuild_time(disk_count, disk_size_gb, controller_speed_mbps):
    total_mb = disk_count * disk_size_gb * 1024      # total data to process, in MB
    speed_mb_s = controller_speed_mbps / 8           # megabits/s -> megabytes/s
    effective_speed = speed_mb_s * 0.8               # account for overhead
    return total_mb / effective_speed / 3600         # seconds -> hours

# Example calculation for a 7-disk array of 300GB drives on a 600 Mbps controller
print(estimate_rebuild_time(7, 300, 600))  # ~10 hours theoretical minimum;
                                           # rebuilds under production load often take days
```
The following indicators suggest it's time to move to multiple smaller RAID 5 groups:
- Write performance degradation: When latency exceeds 20ms during normal operations
- Rebuild windows: If the projected rebuild time exceeds the period of degraded performance you can accept
- Capacity utilization: When expansion requires adding more than 2 disks at once
For ESXi environments, I recommend these snippets to monitor storage performance (the first runs in PowerCLI, the second in the ESXi shell over SSH):

```powershell
# Monitor host storage device latency (worst of the last 10 realtime samples first)
Get-Stat -Entity (Get-VMHost) -Stat "disk.totalLatency.average" -Realtime -MaxSamples 10 |
    Sort-Object -Property Value -Descending |
    Format-Table Entity, Instance, Value -AutoSize
```

```sh
# Check for storage queue depth issues
esxcli storage core device list | grep -i "queue"
```
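If you'd rather not enable SSH on the host, recent PowerCLI releases expose the same esxcli namespace through Get-EsxCli. This is a sketch using the V2 interface, with a placeholder host name:

```powershell
# Pull the device list through PowerCLI and pick out the queue-related fields
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esx01.example.com") -V2
$esxcli.storage.core.device.list.Invoke() | Select-Object Device, *Queue*
```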
Based on field experience, the optimal configuration balances risk and performance:
| RAID 5 group (+ 1 hot spare) | Usable capacity (300GB drives) | Risk Factor |
|---|---|---|
| 4 disks | 900GB | Low |
| 5 disks | 1.2TB | Moderate |
| 6 disks | 1.5TB | High |
| 7 disks | 1.8TB | Very High |
When you add those 3 additional drives, consider creating two 5-disk RAID 5 groups (4+1) rather than one 10-disk monstrosity. The performance characteristics will be far more predictable.