Creating filesystems on large RAID5 arrays shouldn't normally take 30+ minutes. Here's what we've observed with a 4-disk (2TB each) array using a 64k chunk size:
# Initial array creation command
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[a-d] --chunk=64
Key symptoms that point to an underlying issue:
- Inode table writes show irregular patterns (fast then slow)
- Process termination hangs for ~30 seconds
- Individual disk performance is excellent (95-110MB/s via bonnie++)
First, let's check the current RAID parameters:
cat /proc/mdstat
mdadm --detail /dev/md0
For our specific case (2.6.35 kernel), several factors could contribute:
- Stripe cache size: the default may be too small for initial mkfs operations
- Memory pressure: check with free -m during mkfs
- Write-intent bitmap: missing or misconfigured (a quick check follows this list)
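If the bitmap is the suspect, its presence can be verified and an internal bitmap added after creation. A minimal sketch (an internal bitmap costs a little write performance but greatly speeds post-crash resync):
# Check whether the array carries a write-intent bitmap
mdadm --detail /dev/md0 | grep -i bitmap
# Add an internal write-intent bitmap to the existing array
mdadm --grow /dev/md0 --bitmap=internal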
Try these adjustments before the next mkfs attempt:
# Increase stripe cache (units are pages, typically 4KB each)
echo 8192 > /sys/block/md0/md/stripe_cache_size
# Adjust readahead (units are 512-byte sectors; 1024 sectors = 512KB)
blockdev --setra 1024 /dev/md0
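These settings live in sysfs and do not survive a reboot. One way to keep them, assuming your distribution still runs /etc/rc.local at boot, is to reapply them from a startup script:
# Example /etc/rc.local entries to reapply md tuning at boot
echo 8192 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 1024 /dev/md0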
For XFS specifically, these mkfs parameters can help:
# sw counts data disks only, so a 4-disk RAID5 gets sw=3
mkfs.xfs -f -d su=64k,sw=3 /dev/md0
For ext4 (modern alternative to ext3):
mkfs.ext4 -E stride=16,stripe-width=48 /dev/md0
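The numbers follow mechanically from the array geometry. A small sketch of the arithmetic, with illustrative variable names, for any chunk size and disk count:
# stride = chunk size / filesystem block size; stripe-width = stride * data disks
CHUNK_KB=64; BLOCK_KB=4; DATA_DISKS=3    # a 4-disk RAID5 leaves 3 data disks
STRIDE=$((CHUNK_KB / BLOCK_KB))          # 64/4 = 16
STRIPE_WIDTH=$((STRIDE * DATA_DISKS))    # 16*3 = 48
echo "mkfs.ext4 -E stride=$STRIDE,stripe-width=$STRIPE_WIDTH /dev/md0"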
When standard approaches fail, deeper investigation is needed:
# Monitor kernel messages during operation
dmesg -w &
# Check IO wait statistics
iostat -x 1
Particularly watch for:
- High await values in iostat
- Kernel messages about MD layer timeouts
- High system CPU usage during the operation
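To capture these numbers for the whole run rather than eyeballing a live terminal, the monitoring can be logged alongside the mkfs (a sketch; the log file name is illustrative):
# Record extended I/O stats for the duration of the mkfs run
iostat -x 2 > iostat.log &
IOSTAT_PID=$!
mkfs.ext3 /dev/md0
kill $IOSTAT_PID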
When initializing filesystems on our newly created 4-disk RAID5 array with 64k chunk size, we encounter unusually slow mkfs operations:
- XFS creation takes ~30 minutes (versus expected 2-3 minutes)
- ext3 shows erratic inode table writes: fast bursts followed by 2-second pauses
- Process termination (Ctrl+C) exhibits 30-second latency
Before blaming the RAID stack, let's confirm disk health with Bonnie++:
# Individual disk test (bonnie++ needs a mounted filesystem path, not a raw device; mount points are examples)
bonnie++ -d /mnt/test -s 8G -n 0 -m HOSTNAME
# Parallel test (one mounted filesystem per disk)
for i in {a..d}; do bonnie++ -d /mnt/test_$i -s 4G -n 0 -m HOSTNAME_$i & done
Results showed consistent 95MB/s write and 110MB/s read speeds across all disks, even under parallel load.
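Since bonnie++ needs a mounted filesystem, a raw-device cross-check with dd is a useful sanity test as well (reads only, so non-destructive; device names assumed from this array):
# Parallel raw sequential reads, bypassing the page cache
for d in /dev/sd{a,b,c,d}; do
  dd if=$d of=/dev/null bs=1M count=1024 iflag=direct 2>&1 | tail -n1 &
done
wait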
The array was created with standard parameters:
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd{a,b,c,d} --chunk=64
Current /proc/mdstat shows:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]
      5860531200 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
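One thing worth ruling out before timing anything: an initial resync still running in the background will slow every write to the array. A quick check:
# "idle" means no resync/recovery is competing with mkfs I/O
cat /sys/block/md0/md/sync_action
mdadm --detail /dev/md0 | grep -i state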
We experimented with these adjustments without improvement:
echo 4096 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 65536 /dev/md0
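Before drawing conclusions, it is worth confirming those values actually took effect:
# Read back the current values
cat /sys/block/md0/md/stripe_cache_size
blockdev --getra /dev/md0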
Let's gather concrete performance metrics during mkfs operations:
# Monitor disk I/O during operation
iostat -xmd 2 /dev/sd{a,b,c,d} /dev/md0
# Check MD layer events
dmesg -TwH
# Trace system calls
strace -o mkfs.trace -Tttt mkfs.ext3 /dev/md0
The strace output reveals frequent fdatasync() calls taking 1-2 seconds each, correlating with the observed pauses. This suggests metadata writeback synchronization overhead.
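With -T in effect, strace appends each call's elapsed time in angle brackets, so the slow calls can be pulled out mechanically (a sketch; the one-second threshold is arbitrary):
# List fdatasync calls that took longer than one second
grep fdatasync mkfs.trace | awk -F'<' '{split($NF, t, ">"); if (t[1] + 0 > 1.0) print}'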
Workaround: skip or defer the expensive synchronous initialization steps during filesystem creation:
# For XFS: -K skips discarding blocks at mkfs time
mkfs.xfs -f -K /dev/md0
# For ext4: defer inode table and journal initialization
mkfs.ext4 -E lazy_itable_init=1,lazy_journal_init=1 /dev/md0
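Timing the run on a scratch array makes the effect easy to confirm:
# Compare wall-clock time with and without deferred initialization
time mkfs.ext4 /dev/md0
time mkfs.ext4 -E lazy_itable_init=1,lazy_journal_init=1 /dev/md0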
Add these kernel parameters to /etc/sysctl.conf:
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
vm.dirty_expire_centisecs = 3000
Then reload with sysctl -p. This reduces aggressive writeback behavior during large metadata operations.
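Whether the tunables actually change behavior shows up directly in /proc/meminfo during the run:
# Watch dirty and writeback page totals while mkfs is running
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'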
For production systems needing maximum reliability during creation:
# Pre-fill the start of the array (~100GB of zeros) before creating the filesystem
dd if=/dev/zero of=/dev/md0 bs=1M count=100000 status=progress
mkfs.xfs /dev/md0   # now runs in normal time