VMware ESXi implements a failsafe boot mechanism through its dual-partition structure. The /bootbank partition contains the active hypervisor image, while /altbootbank holds a fallback image (after an upgrade, typically the previous one). The redundancy comes from keeping two complete images on separate partitions; vmkfstools, which manages VMFS volumes and virtual disks, is not involved.
During system initialization, the bootloader follows this sequence:
1. Attempt to load from /bootbank
2. If checksum validation fails, switch to /altbootbank
3. If both partitions fail, enter recovery mode
The selection happens before the kernel loads, making the process transparent to users.
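To see what the bootloader has to work with, you can read boot.cfg in each bank. The sketch below assumes each bank contains a boot.cfg with `bootstate=` and `updated=` lines, as on typical installs; `cfg_field` and `show_banks` are hypothetical helper names, not ESXi tools.

```shell
#!/bin/sh
# cfg_field FILE KEY: print the value of the first "KEY=value" line in FILE.
# Hypothetical helper; assumes one key=value pair per line.
cfg_field() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# show_banks DIR...: print bootstate and updated for each bank directory.
show_banks() {
    for bank in "$@"; do
        cfg="$bank/boot.cfg"
        if [ -r "$cfg" ]; then
            echo "$bank bootstate=$(cfg_field "$cfg" bootstate) updated=$(cfg_field "$cfg" updated)"
        else
            echo "$bank boot.cfg not readable"
        fi
    done
}

# On a host: show_banks /bootbank /altbootbank
```

As commonly described, a bootstate of 0 marks a bank as good and the bank with the higher updated value is tried first, but treat that reading as an assumption and check it against your ESXi version.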
The system updates /altbootbank in these scenarios:
- During ESXi patches/upgrades (auto-sync)
- After successful boot from /altbootbank (prompts for repair)
- Manual intervention via CLI commands
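Example CLI commands to force synchronization:
# Put the host in maintenance mode before rebooting
vim-cmd hostsvc/maintenance_mode_enter
# Confirm the option exists on your build before setting it
esxcli system settings advanced list -o /Misc/AlternateBootBank
esxcli system settings advanced set -o /Misc/AlternateBootBank -i 1
reboot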
When booting from the alternate partition, ESXi logs this event in three locations:
/var/log/vmkernel.log
/var/log/boot.gz
dmesg output
You'll see entries like:
WARNING: LINUXBOOT: Booting from alternate bank
NOTICE: BOOTBANK: Primary bank checksum failure (0xbadc0de)
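Checking several logs by hand gets tedious, so here is a minimal sketch of a scanner for the file-based logs. It only assumes the messages contain "bootbank" or "alternate bank" as in the sample entries; `scan_bank_logs` is a hypothetical name.

```shell
#!/bin/sh
# scan_bank_logs FILE...: print matching lines prefixed with the file name.
# Unreadable files are skipped; *.gz files (such as boot.gz) are decompressed first.
scan_bank_logs() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        case "$f" in
            *.gz) zcat "$f" ;;
            *)    cat "$f" ;;
        esac | grep -i -E 'bootbank|alternate bank' | awk -v f="$f" '{ print f ": " $0 }'
    done
}

# On a host: scan_bank_logs /var/log/vmkernel.log /var/log/boot.gz
```

The match is case-insensitive, so it catches both the BOOTBANK tag and lowercase path mentions such as /bootbank.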
To simulate a boot failure scenario (only on a disposable test host, since this destroys the primary boot image):
# Corrupt primary bank (simulating filesystem error)
dd if=/dev/zero of=/bootbank/vmkernel.gz bs=1k count=100
sync
reboot
The system should automatically fail over to /altbootbank and generate a purple diagnostic screen with the error code APD BOOTBANK_CORRUPT.
1. Regularly check which device the host booted from:
esxcli system boot device get
2. Verify synchronization status:
vsish -e get /system/bootMode
3. Monitor for auto-repair attempts in:
grep -i bootbank /var/log/syslog.log
VMware ESXi employs a dual-bank boot system where /bootbank serves as the primary boot partition while /altbootbank acts as a failover. This redundancy mechanism ensures system availability even when the primary boot partition becomes corrupted.
The system automatically switches to /altbootbank under these conditions:
- CRC checksum validation failure in /bootbank
- Boot loader cannot locate or read the primary partition
- Kernel panic during boot from primary partition
When failover occurs:
# Check current boot partition
esxcli system boot partition get
# Example output:
# Current: altbootbank
# Next Boot: altbootbank
# Active: True
The /altbootbank isn't static. It gets updated during:
- ESXi patches and upgrades
- Successful boots (after 3 consecutive successful boots from primary)
- Manual sync operations
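To force the running configuration to be written to the boot bank:
# Persist the current configuration to /bootbank (this does not copy the image to /altbootbank)
/sbin/auto-backup.sh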
Consider this common troubleshooting sequence when primary boot fails:
# 1. Verify boot attempt history
vim-cmd hostsvc/hosthardware | grep boot
# 2. Compare partition contents (if system boots)
diff -r /bootbank/ /altbootbank/
# 3. Manual recovery if automatic fails
esxcli system boot partition set -p altbootbank
# The reboot command requires maintenance mode
esxcli system maintenanceMode set -e true
esxcli system shutdown reboot -r "Boot partition recovery"
Implement these checks in your automation scripts:
#!/bin/sh
# Check which boot bank the host is running from.
# Assumes the "Current: <bank>" output format shown in the example above.
CURRENT_BANK=$(esxcli system boot partition get | awk '$1 == "Current:" {print $2}')
if [ "$CURRENT_BANK" = "altbootbank" ]; then
    logger -p user.warn "ESXi booting from alternate partition"
    # Trigger alerting system here
fi
Regular maintenance should include:
- Monthly partition checksum verification
- Pre-upgrade partition backups
- Post-upgrade partition synchronization
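For the periodic checksum verification, a minimal sketch is to record a hash manifest of a bank while it is known good and compare against it later. This assumes sha256sum is available in the ESXi shell; `make_manifest` and `verify_manifest` are hypothetical helper names, and the datastore path in the usage comment is only an example.

```shell
#!/bin/sh
# make_manifest DIR: print one "hash  ./relative/path" line per file in DIR, sorted by path.
make_manifest() {
    ( cd "$1" && find . -type f | sort | while read -r f; do sha256sum "$f"; done )
}

# verify_manifest DIR MANIFEST: succeed only if DIR still matches the saved manifest.
# Added, removed, or modified files all cause a mismatch.
verify_manifest() {
    [ "$(make_manifest "$1")" = "$(cat "$2")" ]
}

# On a host (example paths, store the manifest outside the bank itself):
#   make_manifest /bootbank > /vmfs/volumes/datastore1/bootbank.sha256
#   verify_manifest /bootbank /vmfs/volumes/datastore1/bootbank.sha256 || \
#       logger -p user.warn "bootbank contents changed"
```

Regenerate the manifest after every patch or upgrade, since the bank contents legitimately change then. Routine configuration saves also rewrite files in the bank, so expect some mismatches that are not corruption.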