How to Diagnose and Fix a Degraded RAID1 Array (/dev/md1) on Linux



When you receive a "DegradedArray" notification from mdadm regarding /dev/md1, it means one member of your RAID1 mirror has dropped out of the array. In your case, /proc/mdstat shows:

md1 : active raid1 sdb3[2](F) sda3[1]
      1860516800 blocks [2/1] [_U]

The [_U] member map shows that the first slot is down and only /dev/sda3 is active, while the (F) flag marks /dev/sdb3 as failed.
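The same check can be scripted. The following is a minimal sketch, assuming the two-line /proc/mdstat layout shown above (a header line, then a line ending in the member map); the function name degraded_arrays is made up for this example.

```shell
#!/usr/bin/env bash
# Print the name of every md array whose member map (e.g. [_U]) contains "_".
# Reads /proc/mdstat by default; pass a file to check saved output instead.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # header line: remember the array name
        /\[[U_]+\]/ && name != "" {          # status line: ends in the member map
            if ($NF ~ /_/) print name        # "_" means a slot has no active member
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument reads the live /proc/mdstat and prints nothing when every array is healthy, which makes it easy to call from cron.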

Before attempting repairs, verify the physical drive status:

sudo smartctl -a /dev/sdb | grep -i error
sudo dmesg | grep -i sdb

Your kernel logs reveal UNC (Uncorrectable) errors, which typically indicate physical media issues:

Feb 23 14:55:19 triton1017 kernel: [24036613.378608] ata1.00: error: { UNC }

You followed the correct procedure to re-add the drive:

sudo mdadm --remove /dev/md1 /dev/sdb3
sudo mdadm --add /dev/md1 /dev/sdb3

However, the array remains degraded because the drive keeps failing during resync. The (S) flag below means /dev/sdb3 is attached only as a spare and never became an active mirror:

md1 : active raid1 sdb3[2](S) sda3[1]
      1860516800 blocks [2/1] [_U]

For production systems, I recommend:

  1. Immediate backup of critical data from /dev/md1
  2. Drive replacement procedure:
# Mark the drive as failed if not already
sudo mdadm --fail /dev/md1 /dev/sdb3

# Remove from array
sudo mdadm --remove /dev/md1 /dev/sdb3

# After physical replacement (and partitioning the new disk to match /dev/sda), add the new drive
sudo mdadm --add /dev/md1 /dev/sdb3

Track rebuild progress with:

watch -n 10 cat /proc/mdstat

Or get detailed status:

sudo mdadm --detail /dev/md1 | grep -i 'rebuild status'
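For scripts or alerting, the percentage can be pulled straight out of /proc/mdstat. This is a sketch only: it assumes the kernel's usual "recovery = N%" (or "resync = N%") progress line under the array entry, and rebuild_progress is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the rebuild percentage for one array, or nothing if no rebuild is running.
# $1: array name (e.g. md1); $2: optional mdstat-format file (default /proc/mdstat)
rebuild_progress() {
    awk -v target="$1" '
        /^md[0-9]+ :/ { current = $1 }                    # track which array we are in
        current == target && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) {                          # first field ending in "%"
                    sub(/^=/, "", $i)                     # "=100.0%" has no space after "="
                    print $i
                    exit
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

An empty result means either the rebuild has finished or it never started, so pair it with a check of the member map before treating the array as healthy.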

Add these to your monitoring:

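# Run mdadm in monitor mode as a background daemon
sudo mdadm --monitor --scan --daemonize

# SMART monitoring: list the drives smartmontools can see; the smartd
# service then watches them according to its config (commonly /etc/smartd.conf)
sudo smartctl --scan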

Configure email alerts in /etc/mdadm/mdadm.conf:

MAILADDR your@email.com

When your Linux system emails you about a "DegradedArray event on md device /dev/md1", it indicates one of your RAID1 mirrors has failed. The key indicators are:

md1 : active raid1 sdb3[2](F) sda3[1]
      1860516800 blocks [2/1] [_U]

The [2/1] shows that only 1 of 2 member devices is active, [_U] shows that the first slot is down, and the (F) flag marks /dev/sdb3 as failed.

Kernel logs reveal media errors on the drive:

Feb 23 14:55:21 triton1017 kernel: [24036616.262531] ata1.00: failed command: READ FPDMA QUEUED
Feb 23 14:55:21 triton1017 kernel: [24036616.262540] res 41/40:80:38:5a:b4/00:00:75:00:00/00 Emask 0x409 (media error) <F>

This UNC (Uncorrectable Error) suggests physical sector failure. Check SMART status:

sudo smartctl -a /dev/sdb | less
# Check for non-zero raw values in:
# - Reallocated_Sector_Ct
# - Current_Pending_Sector
# - Offline_Uncorrectable
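To avoid scanning the full report by eye, the three attributes can be filtered out automatically. This is a rough sketch that assumes the standard ATA attribute table printed by smartctl -A, with RAW_VALUE in the tenth column; attribute names differ between vendors, and smart_media_warnings is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read a `smartctl -A` attribute table on stdin and print NAME=RAW for each
# media-health attribute whose raw value is greater than zero.
smart_media_warnings() {
    awk '
        ($2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/) &&
        ($10 + 0 > 0) { print $2 "=" $10 }
    '
}

# Usage (needs root to query the drive):
#   sudo smartctl -A /dev/sdb | smart_media_warnings
```

No output means none of the three counters is above zero. That alone does not prove the drive is healthy, so read it alongside the kernel log.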

Before replacing hardware, attempt re-adding the drive:

# Remove failed device
sudo mdadm --remove /dev/md1 /dev/sdb3

# Re-add after checking connections
sudo mdadm --add /dev/md1 /dev/sdb3

# Monitor rebuild progress
watch -n 5 cat /proc/mdstat

If the rebuild fails repeatedly, /dev/sdb3 stays listed as a spare (S) and the array remains degraded:

md1 : active raid1 sdb3[2](S) sda3[1]
      1860516800 blocks [2/1] [_U]

For persistent media errors:

# 1. Mark drive as failed if not auto-detected
sudo mdadm --fail /dev/md1 /dev/sdb3

# 2. Remove from array
sudo mdadm --remove /dev/md1 /dev/sdb3

# 3. Schedule the physical replacement with your data center (DC)
# 4. After replacement, partition the new disk identically to /dev/sda:

sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
sudo mdadm --add /dev/md1 /dev/sdb3
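Before running the --add, it can be worth confirming that the copied partition table really matches the source disk. Here is a minimal sketch, assuming `sfdisk -d` dump files and sdX-style device names (not NVMe); normalize_sfdisk_dump and same_layout are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Strip the parts of an `sfdisk -d` dump that legitimately differ between disks:
# the label-id and device header lines, per-partition UUIDs, and the disk name.
normalize_sfdisk_dump() {
    sed -E \
        -e '/^(label-id|device):/d' \
        -e 's/, *uuid=[^,]*//' \
        -e 's#^/dev/[a-z]+([0-9]+) :#\1 :#'
}

# Succeed (exit 0) when two dump files describe the same partition layout.
same_layout() {
    [ "$(normalize_sfdisk_dump < "$1")" = "$(normalize_sfdisk_dump < "$2")" ]
}

# Usage:
#   sudo sfdisk -d /dev/sda > sda.dump
#   sudo sfdisk -d /dev/sdb > sdb.dump
#   same_layout sda.dump sdb.dump && echo "layouts match"
```

Comparing saved dumps rather than live devices keeps the check read-only and leaves a record of the layout that was cloned.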

Add this to /etc/mdadm/mdadm.conf:

MAILADDR admin@yourdomain.com
ARRAY /dev/md1 metadata=0.90 UUID=ec02d5ce:8554d4ad:7792c71e:7dc17aa4

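If the remaining drive also drops out of the array, stop the array and then force-assemble it read-only from the surviving member so you can copy data off. The --run flag starts the array even though a member is missing, and --force accepts metadata that looks out of date:

sudo mdadm --assemble --readonly --run --force /dev/md1 /dev/sda3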