How to Safely Replace a Failing SAS Drive in HP Smart Array P400 RAID 1+0 Configuration Using hpacucli Commands



From the provided hpacucli output, we can see this is an HP Smart Array P400 controller managing a RAID 1+0 array with:

Logical Drive: 1
Size: 273.3 GB
Fault Tolerance: RAID 1+0
Status: OK

The array consists of two mirror groups with 4 physical drives each (total 8 drives). The failing drive is located at 1I:1:8 (port 1I:box 1:bay 8) showing "Predictive Failure".

Before proceeding with drive replacement:

  • Verify array status is "OK" (as shown in output)
  • Confirm you have a compatible replacement drive
  • Ensure you have recent backups
  • Check controller battery/cache status (if applicable)
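
The "no other problem drives" part of this checklist can be scripted. The sketch below filters status text for physical drives that are not "OK". The helper name non_ok_drives and the sample lines are illustrative assumptions; the exact layout of `pd all show status` output can differ between hpacucli versions, so check it against your own controller before relying on it.

```shell
#!/bin/bash
# Print every physical drive line whose status is not "OK".
# Reads status text on stdin; the line format is an assumption (see above).
non_ok_drives() {
    grep -i 'physicaldrive' | grep -v ': OK[[:space:]]*$' || true
}

# Sample text standing in for real controller output (illustrative only).
sample='   physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 72 GB): OK
   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 72 GB): Predictive Failure'

# Only the bay 8 line is printed for this sample.
printf '%s\n' "$sample" | non_ok_drives
```

On a live system the same filter would be fed from the controller, for example: hpacucli controller slot=1 pd all show status | non_ok_drives. If more than the one expected drive is listed, stop and investigate before pulling anything.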

Your proposed command sequence is correct. Here's the detailed process:

1. Remove the failing drive from array

# Remove drive from array configuration
hpacucli controller slot=1 array A remove drives=1I:1:8

# Verify the drive is now marked as 'Failed' in array status
hpacucli controller slot=1 ld 1 show detail

2. Physically identify the drive

# Turn on locate LED for the failing drive
hpacucli controller slot=1 pd 1I:1:8 modify led=on

# Have datacenter staff remove the drive
# Wait for confirmation the drive has been removed

3. Insert replacement drive

# Verify new drive is detected
hpacucli controller slot=1 pd all show status

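# Smart Array controllers normally start rebuilding onto a hot-swapped
# replacement automatically, so check the logical drive status first.
# Only if no rebuild has started, add the new drive to the array
hpacucli controller slot=1 array A add drives=1I:1:8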

4. Monitor rebuild progress

# Check rebuild status (will show "Rebuilding" during process)
hpacucli controller slot=1 ld 1 show detail

# Monitor progress (percentage complete will increment)
hpacucli controller slot=1 show config detail
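
Rather than re-running these commands by hand, the percentage can be polled in a loop. This is a minimal sketch: rebuild_percent and wait_for_rebuild are hypothetical helper names, and the sample line (including the word "Recovering") is an assumed output format. The parser only relies on finding "NN% complete" somewhere in the text.

```shell
#!/bin/bash
# Print the first "NN% complete" figure found on stdin, or nothing if absent.
rebuild_percent() {
    grep -o '[0-9]\{1,3\}% complete' | head -n 1 | cut -d'%' -f1
}

# Poll a status command until it stops reporting a percentage.
# $1 = seconds between polls, remaining arguments = the command to run.
wait_for_rebuild() {
    interval=$1
    shift
    while pct=$("$@" | rebuild_percent) && [ -n "$pct" ]; do
        echo "Rebuild ${pct}% complete"
        sleep "$interval"
    done
    echo "No rebuild percentage reported; check the logical drive status"
}

# Illustrative input only; real controller output may be worded differently.
echo 'logicaldrive 1 (273.3 GB, RAID 1+0, Recovering, 45% complete)' | rebuild_percent
```

A possible invocation is: wait_for_rebuild 60 hpacucli controller slot=1 show config detail. The loop ending only means no percentage was found, which is why the final message asks for a manual status check instead of declaring success.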

For this RAID 1+0 configuration:

  • Rebuild will only affect the specific mirror group (Group 0 in this case)
  • The array remains operational during rebuild
  • Rebuild time depends on drive size and system load (typically several hours)
  • Consider temporarily raising the rebuild priority if needed:
    hpacucli controller slot=1 modify rebuildpriority=high

After rebuild completes:

# Verify all drives show "OK" status
hpacucli controller slot=1 pd all show status

# Check logical drive status
hpacucli controller slot=1 ld all show status

# Confirm no remaining predictive failures
hpacucli controller slot=1 pd all show detail | grep -i predictive

If issues occur during replacement:

  • New drive not detected? Check physical connection and compatibility
  • Rebuild not starting? Verify array has enough drives for redundancy
  • Slow rebuild? Consider increasing rebuild priority temporarily
  • Persistent errors? Check controller logs:
    hpacucli controller slot=1 show logs
    

When dealing with predictive failure in HP Smart Array configurations, first verify the exact physical drive location using:

hpacucli controller slot=1 ld 1 show detail

In our case, the output shows:

Mirror Group 0:
   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)

Before replacing the drive:

  • Confirm the array status is "OK" (the logical drive should still be fully redundant)
  • Verify no other drives show predictive failure or offline status
  • Check controller battery/cache status (if applicable)
  • Ensure you have recent backups

1. Remove the failing drive from array

hpacucli controller slot=1 array A remove drives=1I:1:8

2. Activate locate LED for physical identification

hpacucli controller slot=1 pd 1I:1:8 modify led=on

Note: Use the full drive ID exactly as it appears in the output (1I:1:8, meaning port 1I, box 1, bay 8); the shortened box:bay form 1:8 may not be accepted by this controller.
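
Because the drive ID is what staff use to find the right bay, it can help to validate it and split it into its parts before running anything. The sketch below assumes the port:box:bay layout shown in the output above (1I:1:8); parse_drive_id is a hypothetical helper, not part of hpacucli.

```shell
#!/bin/bash
# Split a drive ID of the form port:box:bay into its three fields.
# Prints "port=... box=... bay=..." or returns 1 for any other shape.
parse_drive_id() {
    case $1 in
        *:*:*:*) echo "invalid drive ID: $1" >&2; return 1 ;;
        *:*:*)   ;;
        *)       echo "invalid drive ID: $1" >&2; return 1 ;;
    esac
    port=${1%%:*}      # text before the first colon
    rest=${1#*:}       # text after the first colon
    box=${rest%%:*}    # middle field
    bay=${rest#*:}     # last field
    if [ -z "$port" ] || [ -z "$box" ] || [ -z "$bay" ]; then
        echo "invalid drive ID: $1" >&2
        return 1
    fi
    echo "port=$port box=$box bay=$bay"
}

parse_drive_id "1I:1:8"   # prints: port=1I box=1 bay=8
```

A two-field ID such as 1:8 is rejected, so a script can stop early instead of passing an ambiguous ID to the controller.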

3. Physical replacement by datacenter staff

Once the drive has been removed from the array configuration, have technicians:

  1. Physically pull the blinking drive
  2. Insert replacement SAS drive (same or larger capacity)
  3. Confirm drive is seated properly

4. Rebuild process initiation

hpacucli controller slot=1 array A add drives=1I:1:8

Monitor rebuild progress with:

hpacucli controller slot=1 ld 1 show rebuild

For detailed status:

hpacucli controller slot=1 ld 1 show detail

Keep in mind during the rebuild:

  • Rebuild time varies by array size (estimate 1-4 hours per TB)
  • Avoid other array operations (such as expansion or further drive removals) until the rebuild finishes
  • The system remains operational during rebuild (in redundant configurations)
  • For RAID 1+0, only one drive per mirror group should be replaced at a time
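
As a rough worked example of the 1-4 hours per TB rule of thumb, the sketch below applies it to the 273.3 GB logical drive from the output. It assumes 1 TB = 1000 GB and that the rule scales linearly; estimate_rebuild_hours is a hypothetical helper, and actual times depend on load and rebuild priority.

```shell
#!/bin/bash
# Estimate a rebuild window from a size in GB, using 1 and 4 hours per TB.
# Assumes 1 TB = 1000 GB; LC_ALL=C keeps the decimal point a dot.
estimate_rebuild_hours() {
    LC_ALL=C awk -v gb="$1" 'BEGIN {
        tb = gb / 1000
        printf "%.2f to %.2f hours\n", tb * 1, tb * 4
    }'
}

estimate_rebuild_hours 273.3   # prints: 0.27 to 1.09 hours
```

Treat the result as a planning figure only; a busy system or a low rebuild priority can stretch it considerably.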

For frequent replacements, consider this bash script template:

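#!/bin/bash
# Template: replace one drive in a Smart Array array via hpacucli.
set -u

CONTROLLER=1
ARRAY="A"
FAILING_DRIVE="1I:1:8"

# Safety check: abort if no logical drive is reported or any of them is not OK.
# (A plain `grep -q "OK"` would pass as long as at least one line contains OK.)
ld_lines=$(hpacucli controller slot="$CONTROLLER" ld all show status | grep -i 'logicaldrive')
if [ -z "$ld_lines" ] || echo "$ld_lines" | grep -qv 'OK'; then
    echo "CRITICAL: Array not in optimal state!" >&2
    exit 1
fi

# Begin replacement; stop if the drive cannot be removed from the array
hpacucli controller slot="$CONTROLLER" array "$ARRAY" remove drives="$FAILING_DRIVE" || exit 1
hpacucli controller slot="$CONTROLLER" pd "$FAILING_DRIVE" modify led=on

# ${FAILING_DRIVE##*:} strips everything up to the last colon, leaving the bay number
echo "Physically replace drive in bay ${FAILING_DRIVE##*:} then press enter"
read -r

hpacucli controller slot="$CONTROLLER" array "$ARRAY" add drives="$FAILING_DRIVE"
echo "Rebuild initiated. Monitor with: hpacucli controller slot=$CONTROLLER ld all show rebuild"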