RAID 1 Hot-Swap Safety: What Happens When You Remove a Disk from a Live Array on an HP ProLiant?


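During server maintenance or testing, many sysadmins wonder how a RAID 1 array really behaves when a disk is removed. On HP ProLiant DL585 G7 servers with a hardware Smart Array controller, here's what actually happens. The commands below use ssacli; older installations ship the same tool as hpssacli or hpacucli, with the same syntax. Start by checking the current state: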

# Check current RAID status (HP Smart Array CLI)
ssacli ctrl all show config detail

# Expected output for healthy RAID 1:
# Logical Drive: 1 (72.0 GB, RAID 1, OK)
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, OK)
#   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, OK)
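
For scripted checks, the listing above can be reduced to a status per drive. The following is a minimal Python sketch, assuming the output keeps the "name (field, field, status)" shape shown in the sample; `parse_config` and `SAMPLE` are illustrative names, and real output may differ between firmware and tool versions.

```python
import re

# Matches lines such as "physicaldrive 1I:1:1 (port 1I:box 1:bay 1, OK)".
PD_RE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)")
# Matches "logicaldrive 1 (...)" as well as "Logical Drive: 1 (...)".
LD_RE = re.compile(r"logical\s*drive:?\s+(\d+)\s+\((.*)\)", re.IGNORECASE)


def parse_config(text):
    """Return ({logical_id: status}, {physical_id: status})."""
    logical, physical = {}, {}
    for line in text.splitlines():
        line = line.lstrip("# ").strip()  # tolerate commented sample lines
        ld = LD_RE.search(line)
        pd = PD_RE.search(line)
        if ld:
            fields = [f.strip() for f in ld.group(2).split(",")]
            # Assumed layout: size, RAID level, then the status text.
            logical[ld.group(1)] = ", ".join(fields[2:])
        elif pd:
            fields = [f.strip() for f in pd.group(2).split(",")]
            physical[pd.group(1)] = fields[-1]  # status is the last field
    return logical, physical


SAMPLE = """\
Logical Drive: 1 (72.0 GB, RAID 1, OK)
  physicaldrive 1I:1:1 (port 1I:box 1:bay 1, OK)
  physicaldrive 1I:1:2 (port 1I:box 1:bay 2, OK)
"""

logical, physical = parse_config(SAMPLE)
print(logical)   # {'1': 'OK'}
print(physical)  # {'1I:1:1': 'OK', '1I:1:2': 'OK'}
```

In practice the text would come from running `ssacli ctrl all show config` with `subprocess.run`, which needs root privileges.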

When you pull one disk from an active RAID 1 mirror:

  • The controller logs a "physical drive removed" event
  • The logical drive status changes to degraded (ssacli reports this as "Interim Recovery Mode") but the drive stays operational
  • No data is lost, because the remaining disk holds a complete copy
  • The rebuild starts automatically when a replacement disk is inserted
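
Why the mirror survives the removal is easiest to see in a toy model. The Python sketch below is only an illustration of the mirroring idea, not how the Smart Array firmware is implemented; the `Mirror` class and its method names are hypothetical.

```python
class Mirror:
    """Toy two-disk RAID 1 set: every write goes to all present members."""

    def __init__(self):
        self.disks = {"1I:1:1": {}, "1I:1:2": {}}  # bay -> {block: data}

    @property
    def status(self):
        return {2: "OK", 1: "degraded", 0: "failed"}[len(self.disks)]

    def write(self, block, data):
        for contents in self.disks.values():
            contents[block] = data

    def read(self, block):
        if not self.disks:
            raise IOError("logical drive failed: no members left")
        # Any surviving member can serve the read.
        return next(iter(self.disks.values()))[block]

    def remove(self, bay):
        del self.disks[bay]

    def insert(self, bay):
        if not self.disks:
            raise IOError("nothing to rebuild from")
        # Rebuild: copy every block from a surviving member to the new disk.
        source = next(iter(self.disks.values()))
        self.disks[bay] = dict(source)


m = Mirror()
m.write(0, b"boot")
m.remove("1I:1:2")        # pull one disk from the live mirror
print(m.status)           # degraded
print(m.read(0))          # b'boot'
m.write(1, b"data")       # writes continue while degraded
m.insert("1I:1:2")        # replacement is rebuilt from the survivor
print(m.status)           # OK
```

The model also shows the weak point: while the status is degraded there is only one copy, so a second removal leaves nothing to read or rebuild from.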

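To watch the state change as it happens, poll the controller. ssacli reports the current state rather than a stream of events:

# Re-check logical and physical drive status every 10 seconds:
watch -n 10 'ssacli ctrl slot=0 ld all show status; ssacli ctrl slot=0 pd all show status'

# The kind of events recorded after a removal (illustrative; the exact
# wording depends on firmware and on where the events are logged):
# 2023-11-15 14:23:17 [Critical] Physical Drive 1I:1:2 removed
# 2023-11-15 14:23:18 [Warning] Logical Drive 1 now degraded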

When reinserting a disk or adding a replacement:

  1. Allow the controller to recognize the new hardware (30-60 sec)
  2. Verify that the controller lists the disk (a rebuilding member shows as Rebuilding; a disk not yet claimed by the array shows as unassigned)
  3. If the rebuild does not start and the logical drive is marked Failed, re-enable it manually

# Re-enable a failed logical drive (use with care: this does not verify the data):
ssacli ctrl slot=0 ld 1 modify reenable forced

# Monitor rebuild progress:
ssacli ctrl slot=0 ld 1 show detail | grep -i status
# Typical line during a rebuild (wording varies by firmware):
#   Status: Recovering, 37% complete
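
To wait for a rebuild from a script, the percentage can be pulled out of the status line and polled. This is a rough Python sketch that assumes the line contains "N% complete"; `rebuild_percent` and `wait_for_rebuild` are illustrative names, and `fetch` stands for whatever function returns the current ssacli output.

```python
import re
import time

PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*complete", re.IGNORECASE)


def rebuild_percent(text):
    """Return the rebuild percentage found in text, or None if there is none."""
    match = PERCENT_RE.search(text)
    return float(match.group(1)) if match else None


def wait_for_rebuild(fetch, interval=60, sleep=time.sleep):
    """Poll fetch() until it stops reporting a percentage.

    Returns the percentages seen. A missing percentage only means the
    rebuild is no longer running; check the final status separately,
    because the drive may have failed rather than finished.
    """
    seen = []
    while True:
        pct = rebuild_percent(fetch())
        if pct is None:
            return seen
        seen.append(pct)
        sleep(interval)


# Demonstration with canned output instead of a live controller.
outputs = iter([
    "Status: Recovering, 37% complete",
    "Status: Recovering, 82% complete",
    "Status: OK",
])
progress = wait_for_rebuild(lambda: next(outputs), interval=0, sleep=lambda s: None)
print(progress)  # [37.0, 82.0]
```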

If the remaining disk fails while the array is degraded:

  • The logical drive fails and its data becomes unavailable immediately
  • Recovery means restoring from backup or, failing that, a professional data recovery service
  • Always maintain backups, even with RAID protection

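For critical systems, alert on RAID status changes. ssacli has no email settings of its own, so alerts usually come from iLO, the HP management agents, or a scheduled check:

# Example crontab entry: every 5 minutes, mail the config if any drive is
# failed, degraded, or rebuilding. Assumes a working local `mail` command;
# use the full path to ssacli if cron's PATH does not include it.
*/5 * * * * ssacli ctrl all show config | grep -Eqi 'failed|interim recovery|recovering|rebuilding' && ssacli ctrl all show config | mail -s "RAID alert" admin@example.com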
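
If the alert is sent from a script instead, Python's standard library is enough. The sketch below assumes an SMTP relay listening on localhost; `build_alert`, `send_alert`, and the `raid-watch@` sender address are hypothetical names chosen for illustration.

```python
import smtplib
import socket
from email.message import EmailMessage


def build_alert(status_text, recipient, host=None):
    """Build an alert message carrying the controller output as its body."""
    host = host or socket.gethostname()
    msg = EmailMessage()
    msg["Subject"] = f"RAID status change on {host}"
    msg["From"] = f"raid-watch@{host}"  # hypothetical sender address
    msg["To"] = recipient
    msg.set_content(status_text)
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (assumed to accept local mail)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


msg = build_alert(
    "logicaldrive 1 (72.0 GB, RAID 1, Interim Recovery Mode)",
    "admin@example.com",
    host="dl585",
)
print(msg["Subject"])  # RAID status change on dl585
# send_alert(msg)  # uncomment once a relay is available
```

Keeping message construction separate from sending makes it possible to check the alert text without a mail server.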

On HP ProLiant DL585 G7 servers (or any enterprise hardware with hot-plug bays), RAID 1 is the simplest case to test. In theory you can hot-swap either drive of a healthy mirror; in practice the outcome depends on the state of the other drive, the controller firmware, and whether you pull the drive you meant to pull.

Here's what the kernel logs when a directly attached disk (on a plain HBA or in a software RAID set) is yanked:

# dmesg output example after hot-removal:
[ 9823.461212] sd 2:0:1:0: [sdb] Synchronizing SCSI cache
[ 9823.461245] sd 2:0:1:0: [sdb] Stopping disk
[ 9823.482911] sd 2:0:1:0: [sdb] READ CAPACITY failed
[ 9823.482915] sd 2:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
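
Log lines like these can be scanned for vanished devices. The Python sketch below looks for the `DID_NO_CONNECT` host byte shown in the sample; `disconnected_devices` is an illustrative name, and other kernels or drivers may word the message differently.

```python
import re

# Captures the device name in "[sdb]" on lines reporting DID_NO_CONNECT.
NO_CONNECT_RE = re.compile(r"\[(sd[a-z]+)\].*hostbyte=DID_NO_CONNECT")


def disconnected_devices(dmesg_text):
    """Return the sorted names of block devices the kernel lost contact with."""
    found = set()
    for line in dmesg_text.splitlines():
        match = NO_CONNECT_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


SAMPLE = """\
[ 9823.461212] sd 2:0:1:0: [sdb] Synchronizing SCSI cache
[ 9823.461245] sd 2:0:1:0: [sdb] Stopping disk
[ 9823.482911] sd 2:0:1:0: [sdb] READ CAPACITY failed
[ 9823.482915] sd 2:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
"""

print(disconnected_devices(SAMPLE))  # ['sdb']
```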

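The Smart Array controllers in ProLiants handle this differently from software RAID. The operating system typically sees only the logical drive, which stays online, so a pulled member does not show up as a vanished sd device and has to be checked through the controller. The critical CLI commands to monitor status: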

# Check current RAID status:
ssacli ctrl all show config detail

# Check per-drive status (a rebuilding member is reported as Rebuilding):
ssacli ctrl slot=0 pd all show status

# Expected healthy drives, as listed by show config:
# physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
# physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)

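For production systems, a simple Python watchdog can poll the array (requires ssacli in PATH and root privileges):

import subprocess
import time

# Status strings that mean the mirror is not fully healthy. "Interim Recovery
# Mode" (degraded) and "Recovering" (rebuilding) are typical Smart Array
# wording; adjust the list for your firmware.
BAD_STATES = ('Failed', 'Interim Recovery Mode', 'Recovering')

def check_raid_status():
    """Return True only if ssacli ran cleanly and reported no bad state."""
    result = subprocess.run(['ssacli', 'ctrl', 'all', 'show', 'config'],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = result.stdout.decode(errors='replace')
    return result.returncode == 0 and not any(s in output for s in BAD_STATES)

if __name__ == '__main__':
    while True:
        if not check_raid_status():
            # Alert logic here
            print("RAID NOT HEALTHY - check array status and replace any failed disk")
        time.sleep(300)  # Check every 5 minutes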

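Instead of pulling a drive blind:

  1. Confirm the other mirror member is OK: ssacli ctrl slot=0 pd all show status
  2. Light the identify LED on the drive you intend to pull (this only marks the bay; it does not take the drive offline): ssacli ctrl slot=0 pd 1I:1:1 modify led=on
  3. Check the bay against the lit LED and, if possible, wait for the activity LED to go quiet
  4. Physically remove the drive
  5. Verify array status: ssacli ctrl all show config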
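
The check that belongs before any removal, that every other member is healthy, can be automated. The following Python sketch assumes the status lines end in either "): OK" or ", OK)" and that the listing covers only the members of one mirror; `parse_pd_status` and `safe_to_remove` are illustrative names.

```python
import re

# Accepts "physicaldrive X (..., 300 GB): OK" and "physicaldrive X (..., 300 GB, OK)".
PD_RE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)(?::\s*(\S.*))?$")


def parse_pd_status(text):
    """Return {bay: status} for every physicaldrive line in text."""
    statuses = {}
    for line in text.splitlines():
        match = PD_RE.search(line.strip())
        if not match:
            continue
        bay, inside, after = match.groups()
        # Status follows the parenthesis, or is the last field inside it.
        statuses[bay] = after.strip() if after else inside.split(",")[-1].strip()
    return statuses


def safe_to_remove(statuses, target):
    """Return (ok, reason): ok is True only if all other members report OK."""
    if target not in statuses:
        return False, f"{target} is not listed by the controller"
    others = {bay: s for bay, s in statuses.items() if bay != target}
    if not others:
        return False, "no other mirror member would remain"
    bad = {bay: s for bay, s in others.items() if s != "OK"}
    if bad:
        return False, f"other members not healthy: {bad}"
    return True, "remaining members report OK"


HEALTHY = """\
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
"""

print(safe_to_remove(parse_pd_status(HEALTHY), "1I:1:1"))
# (True, 'remaining members report OK')
```

A script wrapping the removal procedure could refuse to light the identify LED unless this check passes.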

If the array becomes degraded:

# The rebuild starts on its own once a replacement drive of at least the
# same capacity is seated in the bay. Optionally raise its priority:
ssacli ctrl slot=0 modify rebuildpriority=high

# Monitor progress:
watch -n 60 'ssacli ctrl slot=0 pd all show status'

Remember that RAID 1 protects only against the failure of a single physical disk, not against controller failure, accidental deletion, or corruption.