When running ls -alh on my RAID 5 mounted directory, I encountered:
ls: cannot access e6eacc985fea729b2d5bc74078632738: Input/output error
ls: cannot access 257ad35ee0b12a714530c30dccf9210f: Input/output error
total 0
drwxr-xr-x 5 root root 123 2009-08-19 16:33 .
drwxr-xr-x 3 root root  16 2009-08-14 17:15 ..
?????????? ? ?    ?      ?                ? 257ad35ee0b12a714530c30dccf9210f
drwxr-xr-x 3 root root  57 2009-08-19 16:58 9c89a78e93ae6738e01136db9153361b
?????????? ? ?    ?      ?                ? e6eacc985fea729b2d5bc74078632738
The key characteristics of this issue:
- Files/directories appear with question marks (??????????) instead of proper permissions
- I/O errors when trying to access these corrupted entries
- Unmounting fails with a "device busy" error (see the check after this list)
- Rebooting temporarily resolves but RAID errors appear during shutdown
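When the unmount is blocked, it helps to see what still has the mount open. A quick check, assuming the /mnt/raid1 mount point used later in this post:

# Find what keeps the filesystem busy before unmounting
fuser -vm /mnt/raid1      # processes using the mount
lsof +f -- /mnt/raid1     # same information with open file details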
The affected arrays were configured with:
mkfs.xfs -l size=128m -d agcount=32
mount -t xfs -o noatime,logbufs=8
After extensive testing, several potential factors emerged:
- Missing Partition Table: The disks were used raw, without partitioning (see the check after this list)
- XFS Filesystem Issues: The combination of XFS parameters might cause problems
- RAID Degradation: Possible disk failure or sync issues in the RAID 5 array
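To verify the partition-table point, a read-only inspection like the following can be used; the device names are placeholders, not the actual array members:

# Check whether the member disks carry a partition table
fdisk -l /dev/sdb /dev/sdc /dev/sdd
# Inspect the RAID superblocks on the members
mdadm --examine /dev/sd[bcd]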
Here's what worked for me:
# Check RAID status
cat /proc/mdstat
mdadm --detail /dev/md0

# Force a filesystem check (unmount first)
umount /mnt/raid1
xfs_repair /dev/md0

# Alternative check if normal repair fails
xfs_repair -L /dev/md0   # WARNING: This destroys the log
To avoid recurrence:
# Better mount options for XFS
mount -t xfs -o noatime,nobarrier,logbufs=8,logbsize=256k /dev/md0 /mnt/raid1

# Kick off a regular RAID consistency check
echo check > /sys/block/md0/md/sync_action

# Monitoring script example
#!/bin/bash
# A degraded array shows an underscore in the [UUU] status field of /proc/mdstat
if grep -q '\[.*_.*\]' /proc/mdstat; then
    echo "RAID degradation detected!" | mail -s "RAID Alert" admin@example.com
fi
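To run a script like the one above regularly, a cron entry along these lines works; the script path is an assumption:

# /etc/cron.d/raid-check - run the monitoring script every 15 minutes
*/15 * * * * root /usr/local/bin/raid-check.sh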
After multiple occurrences, I eventually:
- Backed up all data
- Recreated the arrays with proper partitioning
- Switched to RAID 6 for better fault tolerance
- Implemented regular filesystem checks via cron
Lessons learned:
- Always partition disks before creating RAID arrays (see the sketch after this list)
- Monitor RAID health status regularly
- Consider using RAID 6 instead of RAID 5 for better reliability
- XFS requires proper mount options for optimal performance
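For reference, here is a sketch of how the rebuilt array could be created with partitions and RAID 6; the device names and chunk size are assumptions and must be adapted to your hardware:

# Partition each disk and flag the partition for RAID use
for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    parted -s "$d" mklabel gpt mkpart primary 1MiB 100% set 1 raid on
done
# Build a 4-disk RAID 6 array from the partitions
mdadm --create /dev/md0 --level=6 --raid-devices=4 --chunk=64 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# Recreate the filesystem with the original options
mkfs.xfs -l size=128m -d agcount=32 /dev/md0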
When running ls -alh on my RAID 5 mounted directory, the output shows disturbing question marks and I/O errors:
jason@box2:/mnt/raid1/cra$ ls -alh
ls: cannot access e6eacc985fea729b2d5bc74078632738: Input/output error
ls: cannot access 257ad35ee0b12a714530c30dccf9210f: Input/output error
total 0
drwxr-xr-x 5 root root 123 2009-08-19 16:33 .
drwxr-xr-x 3 root root  16 2009-08-14 17:15 ..
?????????? ? ?    ?      ?                ? 257ad35ee0b12a714530c30dccf9210f
drwxr-xr-x 3 root root  57 2009-08-19 16:58 9c89a78e93ae6738e01136db9153361b
?????????? ? ?    ?      ?                ? e6eacc985fea729b2d5bc74078632738
These question marks indicate the filesystem cannot read the inode metadata for those entries. The I/O errors point to read failures at or below the block layer; the kernel log usually names the underlying cause (see the check after the list below). In XFS filesystems, this typically means:
- Corrupted inodes or directory entries
- Failing disks in the RAID array
- Memory corruption during writes
- RAID controller issues
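To narrow down which of these applies, check whether the kernel reports errors at the same moment the listing fails. A rough check, using the paths from the listing above:

# Trigger the error and see what the kernel logs about it
stat /mnt/raid1/cra/e6eacc985fea729b2d5bc74078632738   # expected: Input/output error
dmesg | egrep -i 'i/o error|ata|md0|xfs' | tail -n 50   # disk-, md-, or XFS-level messages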
Before attempting repairs, gather critical information:
# Check RAID status
cat /proc/mdstat
mdadm --detail /dev/mdX

# Check XFS health (read-only; the filesystem must be unmounted)
xfs_repair -n /dev/mdX

# Check disk SMART status
smartctl -a /dev/sdX
For XFS filesystems created with mkfs.xfs -l size=128m -d agcount=32, the repair itself is standard; those settings only matter if you later recreate the filesystem:

# Unmount the filesystem first
umount /mnt/raid1

# Run a verbose repair
xfs_repair -v /dev/mdX

# Only if the repair refuses to run because of a dirty log and the
# filesystem cannot be mounted to replay it:
xfs_repair -L /dev/mdX   # WARNING: zeroes the log and may lose recent changes
Since rebooting temporarily "fixed" the issue, we likely have a failing disk:
# Check for failed disks
mdadm --detail /dev/mdX | grep -i failed

# Re-add any failed disks
mdadm /dev/mdX --re-add /dev/sdX1

# Force a resync if needed
echo repair > /sys/block/mdX/md/sync_action
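If a member keeps dropping out, re-adding it is only a stopgap. A sketch of the usual replacement sequence, with placeholder device names:

# Mark the flaky member as failed and remove it from the array
mdadm /dev/mdX --fail /dev/sdX1 --remove /dev/sdX1
# Physically replace the disk, partition it like the other members, then add it back
mdadm /dev/mdX --add /dev/sdY1
# Follow the rebuild
watch cat /proc/mdstat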
For production systems, implement these safeguards:
# Add to /etc/smartd.conf
DEVICESCAN -a -o on -S on -n standby,8 -W 4,35,40 -m root@localhost

# Weekly read-only XFS check - schedule it via cron (e.g. /etc/cron.weekly/),
# since `at` only runs a one-off job
/usr/sbin/xfs_db -r -c check /dev/mdX
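mdadm also ships its own monitor mode, which can mail on degradation without a custom script; one way to start it, reusing the mail address from the example above:

# Watch all arrays and mail on failure/degradation events, checking every 5 minutes
mdadm --monitor --scan --daemonise --delay=300 --mail=admin@example.com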
Consider these more resilient mount options:
# In /etc/fstab
/dev/mdX  /mnt/raid1  xfs  noatime,logbufs=8,nobarrier  0 0
The nobarrier option disables write barriers; it can improve performance on controllers with battery-backed caches, but it risks corruption on power loss, so test thoroughly before production use. Note that the ext3/ext4-style errors=remount-ro option is not supported by XFS, so it is omitted above.
For unrecoverable corruption, you may need to:
# Backup what you can
xfsdump -l 0 - /mnt/raid1 | xfsrestore - /mnt/temp_backup

# Recreate the filesystem with better alignment
mkfs.xfs -f -l size=128m -d agcount=32 -s size=4096 -b size=4096 /dev/mdX
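Note that the mkfs line above does not actually pass any RAID stripe geometry; recent mkfs.xfs versions usually detect it from md automatically, but it can also be set explicitly. The values below assume a 4-disk RAID 6 with a 64k chunk and are only illustrative:

# su = RAID chunk size, sw = number of data-bearing disks
mkfs.xfs -f -l size=128m -d agcount=32,su=64k,sw=2 /dev/mdX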