Troubleshooting RAID 5 I/O Errors and Question Marks in Linux Directory Listings



When running ls -alh on a directory on my mounted RAID 5 array, I encountered:

ls: cannot access e6eacc985fea729b2d5bc74078632738: Input/output error
ls: cannot access 257ad35ee0b12a714530c30dccf9210f: Input/output error
total 0
drwxr-xr-x 5 root root 123 2009-08-19 16:33 .
drwxr-xr-x 3 root root  16 2009-08-14 17:15 ..
?????????? ? ?    ?      ?                ? 257ad35ee0b12a714530c30dccf9210f
drwxr-xr-x 3 root root  57 2009-08-19 16:58 9c89a78e93ae6738e01136db9153361b
?????????? ? ?    ?      ?                ? e6eacc985fea729b2d5bc74078632738

The key characteristics of this issue:

  • Files/directories appear with question marks (??????????) instead of proper permissions
  • I/O errors when trying to access these corrupted entries
  • Unmounting fails with a "device busy" error (see the sketch after this list)
  • Rebooting temporarily resolves the problem, but RAID errors appear during shutdown
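
If the unmount fails with "device busy", the processes pinning the mount point can be listed before forcing anything; a minimal sketch, assuming the mount point is /mnt/raid1:

# Show which processes are holding the mount point open
fuser -vm /mnt/raid1
lsof +f -- /mnt/raid1

# As a last resort, detach the mount now and let it finish once the last user exits
# (use with care on an already-damaged filesystem)
umount -l /mnt/raid1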

The affected arrays were configured with:

mkfs.xfs -l size=128m -d agcount=32
mount -t xfs -o noatime,logbufs=8

After extensive testing, several potential factors emerged:

  1. Missing Partition Table: The disks were used raw, without a partition table (see the sketch after this list)
  2. XFS Filesystem Issues: The combination of XFS parameters might cause problems
  3. RAID Degradation: Possible disk failure or sync issues in the RAID 5 array
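
For factor 1, a minimal sketch of partitioning a disk before putting it into an md array (the device names /dev/sdb, /dev/sdc, /dev/sdd and /dev/md0 are placeholders):

# Label the disk and create one full-size partition flagged for RAID
parted -s /dev/sdb mklabel msdos
parted -s /dev/sdb mkpart primary 0% 100%
parted -s /dev/sdb set 1 raid on

# Build the array from partitions rather than raw disks
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1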

Here's what worked for me:

# Check RAID status
cat /proc/mdstat
mdadm --detail /dev/md0

# Force a filesystem check (unmount first)
umount /mnt/raid1
xfs_repair /dev/md0

# Alternative check if normal repair fails
xfs_repair -L /dev/md0  # WARNING: This destroys the log

To avoid recurrence:

# Better mount options for XFS
mount -t xfs -o noatime,nobarrier,logbufs=8,logbsize=256k /dev/md0 /mnt/raid1

# Regular RAID checks
echo check > /sys/block/md0/md/sync_action
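
To run that consistency check on a schedule rather than by hand, a cron entry along these lines works (a sketch; the file name, timing, and md0 device are assumptions):

# /etc/cron.d/raid-check -- kick off an md consistency check every Sunday at 01:00
0 1 * * 0 root echo check > /sys/block/md0/md/sync_action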

# Monitoring script example
#!/bin/bash
# A "_" inside the [UU...] status string in /proc/mdstat means a member is missing or failed
if grep -q '\[[U_]*_[U_]*\]' /proc/mdstat; then
    echo "RAID degradation detected!" | mail -s "RAID Alert" admin@example.com
fi
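
mdadm can also do this monitoring itself; a sketch of running it in daemon mode (the mail address is a placeholder):

# Watch all arrays and send mail on failure or degradation events
mdadm --monitor --scan --daemonise --mail admin@example.com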

After multiple occurrences, I eventually:

  1. Backed up all data
  2. Recreated the arrays with proper partitioning
  3. Switched to RAID 6 for better fault tolerance
  4. Implemented regular filesystem checks via cron

Key takeaways:

  • Always partition disks before creating RAID arrays
  • Monitor RAID health status regularly
  • Consider using RAID 6 instead of RAID 5 for better reliability (see the sketch below)
  • XFS requires proper mount options for optimal performance
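
A minimal sketch of creating a RAID 6 array from partitioned disks (device names and the four-member layout are assumptions):

# RAID 6 tolerates two simultaneous member failures; it needs at least four disks
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mkfs.xfs -l size=128m /dev/md0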

Looking more closely at the output above, the symptoms tell a consistent story.

The question marks mean the filesystem cannot read the metadata for those entries (every stat() on them fails), and the I/O errors confirm the reads are failing at the device level. On an XFS filesystem this typically points to:

  • Corrupted inodes or directory entries
  • Failing disks in the RAID array
  • Memory corruption during writes
  • RAID controller issues

Before attempting repairs, gather critical information:

# Check RAID status
cat /proc/mdstat
mdadm --detail /dev/mdX

# Check XFS health
xfs_repair -n /dev/mdX

# Check disk SMART status
smartctl -a /dev/sdX
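
Beyond the attribute dump, a long self-test on each member disk gives a stronger signal about failing media (a sketch; /dev/sdX is a placeholder):

# Start a long (surface) self-test; smartctl prints the expected duration
smartctl -t long /dev/sdX

# Read back the result once the test has finished
smartctl -l selftest /dev/sdX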

The filesystem here was created with mkfs.xfs -l size=128m -d agcount=32; xfs_repair reads that geometry from the superblock itself, so no special flags are needed to match it:

# Unmount the filesystem first
umount /mnt/raid1

# Run a normal repair first
xfs_repair /dev/mdX

# Only if xfs_repair refuses to run because the log cannot be replayed:
xfs_repair -L /dev/mdX  # WARNING: -L zeroes the log and can lose recent metadata changes
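
If there is any doubt about what the repair will do, the filesystem metadata can be snapshotted first so the state can be analysed or replayed offline; a sketch using xfs_metadump (the output paths are placeholders):

# Dump only the metadata (file contents are not included) to a file
xfs_metadump /dev/mdX /root/mdX.metadump

# Later, turn the dump back into a sparse image for offline experiments
xfs_mdrestore /root/mdX.metadump /root/mdX.img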

Since rebooting only "fixed" the issue temporarily, a failing or dropped disk is the likely culprit:

# Check for failed disks
mdadm --detail /dev/mdX | grep -i failed

# Re-add any failed disks
mdadm /dev/mdX --re-add /dev/sdX1

# Force a resync if needed
echo repair > /sys/block/mdX/md/sync_action
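
If SMART shows a member disk is genuinely dying, re-adding it only postpones the next failure; a sketch of swapping in a replacement (device names are placeholders):

# Mark the bad member as failed and remove it from the array
mdadm /dev/mdX --fail /dev/sdX1 --remove /dev/sdX1

# After installing and partitioning the new disk, add it and let the array rebuild
mdadm /dev/mdX --add /dev/sdY1
watch cat /proc/mdstat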

For production systems, implement these safeguards:

# Add to /etc/smartd.conf
DEVICESCAN -a -o on -S on -n standby,8 -W 4,35,40 -m root@localhost

# Weekly read-only XFS check (e.g. /etc/cron.d/xfs-check); the filesystem should be
# unmounted or otherwise quiesced while it runs
0 2 * * 0 root /usr/sbin/xfs_repair -n /dev/mdX

Consider these more resilient mount options:

# In /etc/fstab
/dev/mdX  /mnt/raid1  xfs  noatime,nobarrier,logbufs=8,logbsize=256k  0  0

The nobarrier option disables write barriers, which can mask certain RAID controller issues, but it is only safe when the write cache is non-volatile (battery- or flash-backed); test thoroughly before production use.
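
Whether nobarrier is safe depends on whether the drives' volatile write caches can lose data on power failure; a quick way to inspect and disable a member disk's cache (a sketch; /dev/sdX is a placeholder):

# Show the current write-cache setting
hdparm -W /dev/sdX

# Turn the volatile write cache off if there is no battery- or flash-backed cache
hdparm -W0 /dev/sdX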

For unrecoverable corruption, you may need to:

# Back up what you can (to a separate filesystem)
xfsdump -l 0 - /mnt/raid1 | xfsrestore - /mnt/temp_backup

# Recreate the filesystem with explicit 4 KiB sector and block sizes
mkfs.xfs -f -l size=128m -d agcount=32 -s size=4096 -b size=4096 /dev/mdX
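
mkfs.xfs normally picks up the stripe geometry from the md device automatically, but it can also be set explicitly; a sketch assuming a 3-disk RAID 5 with a 64 KiB chunk (both values are assumptions to adapt):

# su = RAID chunk size, sw = number of data-bearing disks (3-disk RAID 5 -> 2)
mkfs.xfs -f -l size=128m -d agcount=32,su=64k,sw=2 /dev/mdX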