How to Configure a Hot Spare Disk in Linux Software RAID 1 (mdadm) for Automatic Failover



In a software RAID 1 configuration with mdadm, a hot spare is an inactive disk that automatically replaces a failed member of the array. The key characteristics:

  • Remains idle until a failure occurs
  • Requires no manual intervention for failover
  • Must be equal to or larger than existing array members
  • Automatically synchronizes with working disks when activated

Before proceeding, check the current state of the array and confirm the new disk is large enough:

# Verify current RAID status
cat /proc/mdstat
mdadm --detail /dev/md0

# Check disk space (new disk should be ≥ existing members)
lsblk -o NAME,SIZE,ROTA
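
The size comparison can also be scripted. The following is a minimal sketch, not part of mdadm: spare_big_enough is a hypothetical helper that takes the candidate's size in bytes, followed by the size of each array member, and succeeds only if the candidate is at least as large as all of them.

```shell
# Hypothetical helper: succeed only if the candidate disk (first argument,
# size in bytes) is at least as large as every array member that follows.
spare_big_enough() {
    candidate=$1
    shift
    for member in "$@"; do
        [ "$candidate" -ge "$member" ] || return 1
    done
    return 0
}
```

On a live system the byte counts could come from blockdev --getsize64, for example: spare_big_enough "$(blockdev --getsize64 /dev/sdd)" "$(blockdev --getsize64 /dev/sda1)". The device names in that call are assumptions.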

Assuming your existing array is /dev/md0 with three disks and you're adding /dev/sdd:

# 1. Prepare the new disk (if needed)
parted /dev/sdd mklabel gpt
parted /dev/sdd mkpart primary 1MiB 100%

# 2. Clear any stale RAID superblock left over from earlier use (not needed on a brand-new disk)
mdadm --zero-superblock /dev/sdd1

# 3. Add as spare to existing array
mdadm --add /dev/md0 /dev/sdd1

# 4. Verify spare status
mdadm --detail /dev/md0 | grep -A5 'Spare Devices'

After adding the spare, monitor its status:

watch cat /proc/mdstat

# Check detailed array status
mdadm --detail /dev/md0 | grep -E 'State|Spare'
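
For unattended checks, the spare count can be read from /proc/mdstat, where spare members carry an (S) suffix. This is a rough sketch under that assumption; count_spares is a hypothetical name, and it reads mdstat-formatted text on standard input so it can be tried on saved output first.

```shell
# Hypothetical helper: print how many members of the named array are
# marked as spares, i.e. carry the "(S)" suffix in mdstat-formatted input.
# Exits non-zero if the array does not appear in the input.
count_spares() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            n = 0
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(S\)$/) n++
            print n
            found = 1
        }
        END { if (!found) exit 1 }
    '
}
```

Typical use would be: count_spares md0 < /proc/mdstat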

To simulate a disk failure (for testing only):

# Mark disk as faulty
mdadm --manage /dev/md0 --fail /dev/sda1

# Verify automatic replacement
watch -n 1 'mdadm --detail /dev/md0 | grep -A10 "Number"'

For a durable setup:

  • Record the array in /etc/mdadm.conf so the configuration persists after reboot, for example:
    ARRAY /dev/md0 metadata=1.2 spares=1 name=myserver:0 UUID=xxxxxxx
  • Enable email alerts by setting MAILADDR in mdadm.conf
  • Consider adding multiple spares for critical systems
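
The spares= value in the ARRAY line states how many spares the array is expected to have. As an illustration only, the sketch below pulls that number out of a config file so it can be compared with the "Spare Devices" count from mdadm --detail. expected_spares is a hypothetical helper, and it assumes each ARRAY entry sits on a single line with the keyword written in upper case.

```shell
# Hypothetical helper: print the spares= value declared for the given
# array device in mdadm.conf-style input on stdin (0 if none is declared).
expected_spares() {
    awk -v dev="$1" '
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^spares=/) {
                    print substr($i, 8)   # text after "spares="
                    found = 1
                    exit
                }
        }
        END { if (!found) print 0 }
    '
}
```

Typical use would be: expected_spares /dev/md0 < /etc/mdadm.conf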

Performance considerations:

  • An idle spare has no performance impact on the active array
  • During a rebuild onto the spare, expect reduced I/O performance
  • Monitor disk activity with: iostat -x 1

If the spare doesn't activate:

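  1. Verify the spare is properly added to the array
  2. Check the kernel logs: dmesg | grep md
  3. Ensure the mdadm monitor is running (the mdmonitor service on CentOS 7); it sends alerts and moves spares between arrays, while the kernel itself activates a spare within the same array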

When maintaining a software RAID 1 array with three active disks on CentOS 7, adding a hot spare provides automatic failover protection. The hot spare remains inactive until a disk failure occurs, at which point md automatically rebuilds the array onto it using data from the remaining healthy disks.

Before proceeding, ensure:

  • The new disk is properly connected and recognized by the system
  • The disk is at least as large as the smallest disk in the array
  • You have root privileges on the CentOS 7 system
  • A backup of important data exists (recommended)

First, identify the new disk:

lsblk
fdisk -l

Assume the new disk is /dev/sdd and the existing RAID is /dev/md0. Add the disk to the array; because the array is not degraded, the new device automatically becomes a hot spare:

mdadm --manage /dev/md0 --add /dev/sdd

Check the RAID status to confirm the hot spare is properly configured:

mdadm --detail /dev/md0

You should see output similar to:

Number   Major   Minor   RaidDevice State
   0       8        0        0      active sync   /dev/sda
   1       8       16        1      active sync   /dev/sdb
   2       8       32        2      active sync   /dev/sdc
   3       8       48        -      spare   /dev/sdd

To simulate a disk failure (for testing purposes only):

mdadm --manage /dev/md0 --fail /dev/sda

Monitor the rebuild process:

watch -n 1 cat /proc/mdstat
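
If the rebuild needs to be tracked from a script rather than by eye, the percentage can be extracted from the progress line in /proc/mdstat. This is only a sketch: rebuild_percent is a hypothetical helper, and it assumes the progress line has the usual "recovery = NN.N%" or "resync = NN.N%" form.

```shell
# Hypothetical helper: print the completion percentage of the first
# rebuild or resync found in mdstat-formatted input on stdin.
# Prints nothing when no rebuild is in progress.
rebuild_percent() {
    awk 'match($0, /(recovery|resync) *= *[0-9.]+%/) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^[^=]*= */, "", s)   # drop "recovery = "
        sub(/%$/, "", s)          # drop the trailing percent sign
        print s
        exit
    }'
}
```

Typical use would be: rebuild_percent < /proc/mdstat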

Configure email alerts for RAID events by editing /etc/mdadm.conf:

MAILADDR admin@example.com
PROGRAM /usr/local/bin/raid-alert
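
The PROGRAM entry points at a script you supply yourself. mdadm's monitor mode calls it with the event name, the array device, and, where relevant, the component device. What follows is one possible shape for such a script, not a standard one: handle_event, the severity labels, and the message format are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a handler for the PROGRAM hook.
# Arguments follow mdadm's monitor convention: event, array, [component].
handle_event() {
    event=$1
    array=$2
    device=${3:-}
    case "$event" in
        Fail|FailSpare|DegradedArray|SparesMissing) level=crit ;;
        SpareActive|RebuildStarted|RebuildFinished) level=notice ;;
        *) level=info ;;
    esac
    # Append the component device only when mdadm supplied one.
    echo "$level: mdadm $event on $array${device:+ ($device)}"
}

# In the real script the arguments would be passed straight through, e.g.:
# handle_event "$@" | logger -t raid-alert
```

Running mdadm --monitor --scan --oneshot --test should generate a TestMessage event for each array, which is a convenient way to confirm that both the mail address and the script are wired up.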

Then rebuild the initramfs so the updated mdadm.conf is included at boot:

dracut -f

Remember that after a failover:

  • The failed disk should be replaced with a new spare
  • Rebuild operations are resource-intensive
  • Monitor disk health with SMART tools
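
Replacing the failed disk with a fresh spare might look like the sketch below. It is a dry run by default: with RUN unset, the commands are printed rather than executed. replace_failed and the RUN variable are hypothetical, as is the replacement device /dev/sde used in the example call.

```shell
# Dry run by default: commands are echoed. Set RUN to an empty string
# (RUN= ) to actually execute them as root.
RUN=${RUN-echo}

# Hypothetical helper: remove a failed member and add a replacement,
# which becomes the new hot spare once the array is healthy again.
replace_failed() {
    array=$1
    failed=$2
    replacement=$3
    $RUN mdadm --manage "$array" --remove "$failed" || return 1
    # Clear old RAID metadata; mdadm may print a message if none exists.
    $RUN mdadm --zero-superblock "$replacement"
    $RUN mdadm --manage "$array" --add "$replacement" || return 1
}
```

Example call: replace_failed /dev/md0 /dev/sda /dev/sde. Review the printed commands before re-running with RUN set to an empty string.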

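Regularly verify your RAID status with mdadm --detail /dev/md0 or cat /proc/mdstat. To persist the array definition, append it to /etc/mdadm.conf once (repeating the command adds duplicate ARRAY lines):

mdadm --detail --scan >> /etc/mdadm.conf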