When an SD card or USB drive hosting VMware ESXi fails, you'll typically encounter symptoms like:
- Lost connectivity to the device backing the boot filesystem
- Embedded Flash/SD-CARD: Error writing media [X], physical block [Y]: Stack Exception
- Host configuration changes not persisting
This isn't just theoretical: I've personally experienced three such failures in production environments. The HP ProLiant DL380p Gen8's iLO logs are particularly good at flagging these issues early.
For a vSphere cluster with SAN storage (the ideal scenario):
- Verify VM availability (they should continue running)
- Check vCenter for host connection status
- Review iLO/iDRAC logs for storage errors
- Document current host configuration
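If you still have shell access to the affected host, you can often confirm the failure directly from the ESXi shell before doing anything else. A minimal sketch; a dead boot device typically throws I/O errors on these reads, though the exact error text varies by platform:

```shell
# List the boot banks; I/O errors here point at failed boot media
ls /bootbank /altbootbank
# Query the filesystem backing the bootbank
vmkfstools -P /bootbank
```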
Here's the step-by-step recovery process I've standardized:
# First, collect diagnostic data before rebooting
esxcli system syslog config get
esxcli hardware memory get
esxcli storage core device list
Then proceed with:
- Place host in maintenance mode
- Power down gracefully
- Replace failed SD card/USB
- Reinstall ESXi using the same version and build
After the fresh ESXi install, use PowerCLI to restore settings:
Connect-VIServer -Server vcenter.example.com
# $host is a reserved automatic variable in PowerShell, so use another name
$vmhost = Get-VMHost -Name "esxi-host-01"
# Restore network config
Get-VMHostNetwork -VMHost $vmhost | Set-VMHostNetwork -HostName "esxi-host-01" -DomainName "corp.local"
# Rescan all HBAs so FC paths and VMFS datastores reappear
Get-VMHostStorage -VMHost $vmhost -RescanAllHba -RescanVmfs
Implement these best practices:
- Use enterprise-grade SD cards (avoid consumer-grade)
- Enable persistent logging to shared storage
- Regularly back up host configurations
- Monitor SD card health via SNMP
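For the persistent-logging item above, a minimal sketch (the datastore path is an assumption; adjust it to your own shared storage):

```shell
# Redirect syslog to shared storage so logs survive a boot-media failure
esxcli system syslog config set --logdir=/vmfs/volumes/shared-datastore/logs/esxi-host-01
# Apply the new configuration
esxcli system syslog reload
```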
Example SNMP monitoring configuration:
# Configure ESXi SNMP for hardware monitoring
esxcli system snmp set --communities "monitoring" --enable true
esxcli system snmp set --targets "snmp.example.com@162/monitoring"
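Once set, it's worth verifying that the agent answers and can reach its trap targets; a quick check using the same esxcli SNMP namespace:

```shell
# Display the effective SNMP agent configuration
esxcli system snmp get
# Send a warmStart test trap to all configured targets
esxcli system snmp test
```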
For critical environments, consider:
- Booting from SAN (FC/iSCSI)
- Using SATADOM devices
- Implementing Auto Deploy with stateless caching
Here's a sample Auto Deploy rule (note that the target cluster is passed as a second -Item, and the rule must be activated with Add-DeployRule):
New-DeployRule -Name "ESXi-7.0-Cluster" -Item (Get-DeployImageProfile "ESXi-7.0.0-xxxxxx-standard"), (Get-Cluster "Production-Cluster") -Pattern "vendor=HP", "model=ProLiant DL380p Gen8"
# Activate the rule by adding it to the working rule set
Add-DeployRule -DeployRule "ESXi-7.0-Cluster"
For a vSphere cluster with SAN storage, follow this recovery workflow:
# Step 1: Put host in maintenance mode
esxcli system maintenanceMode set --enable true
# Step 2: Verify storage connectivity
esxcli storage core adapter list
esxcli storage core path list
# Step 3: Preserve the running configuration
# (sync_config flushes pending changes to the bootbank; backup_config
# prints a download URL for the configBundle.tgz archive)
vim-cmd hostsvc/firmware/sync_config
vim-cmd hostsvc/firmware/backup_config
Option A: Replace the failed media
# Reinstall the same ESXi version on the new SD card/USB from the installer
# ISO, then restore the saved configuration bundle (the host must be in
# maintenance mode and reboots automatically after the restore)
vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz
Option B: Migrate to more reliable boot options
- Dual SD cards in RAID 1 (for supported servers)
- Booting from SAN LUN
- Internal M.2 SSD with a suitable write-endurance rating
Schedule recurring configuration backups. In ESXi's stripped-down environment, the simplest approach is a background loop started from /etc/rc.local.d/local.sh, which persists across reboots:
# Add to /etc/rc.local.d/local.sh
/bin/auto-backup.sh &
# Script content (/bin/auto-backup.sh)
#!/bin/sh
# Regenerate the config bundle once a day; the bundle still needs to be
# copied off-host to be useful after a boot-media failure
while true; do
    vim-cmd hostsvc/firmware/backup_config
    sleep 86400
done
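The loop above regenerates the bundle but never prunes copies if you archive them to a datastore. A hypothetical rotation helper, written in plain POSIX sh so it runs under ESXi's BusyBox (the directory layout and the configBundle-*.tgz naming are assumptions):

```shell
#!/bin/sh
# rotate_backups DIR KEEP: delete all but the KEEP newest config bundles
# in DIR, relying on ls -1t to sort newest-first by modification time
rotate_backups() {
  dir="$1"
  keep="$2"
  ls -1t "$dir"/configBundle-*.tgz 2>/dev/null | tail -n +$((keep + 1)) |
    while read -r old; do
      rm -f "$old"
    done
}
```

Calling `rotate_backups /vmfs/volumes/datastore1/config-backups 7` after each backup keeps a rolling week of bundles.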
Use these ESXCLI commands to monitor boot device health. Note that SMART data is only available from devices that expose it, such as SSDs and SATADOMs; most SD cards and USB sticks sit behind a reader and report nothing useful:
# Check device wear level and reallocated sectors
esxcli storage core device smart get -d t10.ATA_____Samsung_SSD_860_PRO_1TB_______________
# Monitor IO errors
esxcli system syslog config get
esxcli system syslog mark --message="Boot device health check"
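Finally, the symptom strings from the top of this article can be grepped out of the logs on a schedule. A small sketch in plain POSIX sh; the pattern list and the usual /var/log/vmkernel.log path are assumptions to extend for your own hardware:

```shell
#!/bin/sh
# scan_bootlog FILE: print log lines that look like boot-media write failures
scan_bootlog() {
  grep -E 'Lost connectivity to the device backing the boot filesystem|Error writing media' "$1"
}
```

Run it against /var/log/vmkernel.log from the daily backup loop, or feed its output into the syslog mark command shown above.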