Optimizing Linux RAID Performance: Understanding and Tuning stripe_cache_size in mdadm


When working with Linux software RAID (mdadm), the stripe_cache_size parameter plays a crucial role in performance optimization for RAID5/6 arrays. This kernel parameter controls the size of the stripe cache - a memory buffer used to optimize write operations by reducing the read-modify-write overhead inherent in parity-based RAID levels.

The stripe cache acts as a write-back staging area for partial stripe writes. It:

  • Accumulates small writes in memory until a full stripe can be written
  • Reduces the number of expensive read-modify-write cycles
  • Significantly improves sequential write performance

You can check the current value through sysfs:

# View current stripe_cache_size value
cat /sys/block/md0/md/stripe_cache_size

As observed in the example, increasing stripe_cache_size from the default (typically 256 or 512) to 16384 roughly doubled the sync rate, from 71 MB/s to 143 MB/s. However, this comes with increased RAM usage: each cache entry holds one page (usually 4 KB) per member disk, so total memory is roughly stripe_cache_size × page size × number of disks.
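
As a rough sanity check, this sketch estimates the cache's memory footprint (it assumes the array is md0 and that member devices appear as dev-* entries under /sys/block/md0/md/):

# Estimate stripe-cache memory: stripe_cache_size * page_size * number_of_disks
PAGE_SIZE=$(getconf PAGESIZE)
NR_DISKS=$(ls -d /sys/block/md0/md/dev-* | wc -l)
CACHE=$(cat /sys/block/md0/md/stripe_cache_size)
echo "$(( CACHE * PAGE_SIZE * NR_DISKS / 1024 / 1024 )) MiB"

With 16384 entries, 4 KB pages and four member disks, that works out to 256 MiB.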

The optimal value depends on:

  • Available system memory
  • Workload characteristics (random vs sequential writes)
  • Number of disks in the array

To set the value temporarily (until reboot):

echo 16384 > /sys/block/md0/md/stripe_cache_size
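
If the system has more than one parity array, the same value can be applied to each of them with a short loop (a sketch; it simply skips md devices that do not expose this attribute, such as RAID1 arrays):

# Apply the same cache size to every md array that has a stripe cache
for f in /sys/block/md*/md/stripe_cache_size; do
    [ -e "$f" ] && echo 16384 > "$f"
done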

For permanent configuration, add to /etc/rc.local or create a udev rule:

# Example udev rule
ACTION=="add|change", KERNEL=="md0", ATTR{md/stripe_cache_size}="16384"
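
On systemd-based systems, a tmpfiles.d entry is another option (a sketch; it assumes the array is already assembled as md0 by the time systemd-tmpfiles runs, and the file name is arbitrary):

# /etc/tmpfiles.d/md0-stripe-cache.conf
# 'w' writes the given argument into the target file at boot
w /sys/block/md0/md/stripe_cache_size - - - - 16384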

After modification, verify the change took effect:

cat /sys/block/md0/md/stripe_cache_size

Monitor performance impact through:

cat /proc/mdstat
iostat -x 1
dstat -td --disk-util

It is also worth watching stripe_cache_active, a read-only counter in the same sysfs directory that reports how many stripes are currently in use. If it regularly sits near stripe_cache_size during heavy writes, the cache may be undersized.
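
A quick way to watch cache utilization during a write-heavy test:

# Poll active vs. configured stripe cache entries once per second
watch -n1 'grep . /sys/block/md0/md/stripe_cache_active /sys/block/md0/md/stripe_cache_size'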

Remember that extremely large values can:

  • Cause memory pressure
  • Leave more unwritten data in the volatile cache at crash time
  • Increase latency for some workloads

Stepping back to the mechanism: stripe_cache_size is a tunable parameter in Linux's software RAID (md) subsystem that controls the size of the stripe cache - a memory buffer used to optimize write operations in RAID5/6 arrays. This cache holds stripes in RAM while data and parity are read, computed, and written back, reducing the disk I/O each write requires.

When performing a partial-stripe write to a RAID5/6 array, the system needs to:

  1. Read existing data and parity
  2. Compute new parity
  3. Write new data and parity

The stripe cache holds these stripes and their intermediate parity computations in RAM, reducing the number of disk I/O operations required. A larger cache can hold more stripes in memory, potentially improving performance for sequential writes. For example, on a four-disk RAID5 a lone 4 KB update costs two reads plus two writes, whereas a batched full-stripe write needs no reads at all.

As you've observed, increasing stripe_cache_size from the default (usually 256 or 512) to 16384 can significantly improve sync speeds. The improvement comes from:

  • Reduced disk seeks (more operations can be batched)
  • Better sequential write patterns
  • Less time waiting for disk I/O

To view current value:

cat /sys/block/md0/md/stripe_cache_size

To set a new value (requires root):

echo 16384 > /sys/block/md0/md/stripe_cache_size

To make it persistent across reboots, add to /etc/rc.local:

#!/bin/sh
echo 16384 > /sys/block/md0/md/stripe_cache_size
exit 0
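
Note that on many modern distributions /etc/rc.local must be executable (and, under systemd, the rc-local compatibility unit must be present) for the script to run at boot:

chmod +x /etc/rc.local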

The optimal value depends on:

  • Available RAM (each cache entry uses one page, typically 4 KB, per member disk)
  • Workload characteristics (sequential vs random writes)
  • Number of disks in the array

A good starting point is 4096 for arrays with 4-6 disks, scaling up toward the kernel's upper limit of 32768 for larger arrays with sufficient RAM.
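
If you prefer to find a value empirically, a small sweep can help. The sketch below assumes the array is md0, is mounted at a hypothetical /mnt/raid, and that writing a multi-gigabyte test file there is acceptable:

# Try several cache sizes and report sequential write throughput for each
for size in 256 1024 4096 16384; do
    echo "$size" > /sys/block/md0/md/stripe_cache_size
    sync
    echo "stripe_cache_size=$size"
    # dd prints its throughput summary on stderr
    dd if=/dev/zero of=/mnt/raid/testfile bs=1M count=4096 oflag=direct 2>&1 | tail -n 1
    rm -f /mnt/raid/testfile
done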

Check performance before/after changes:

cat /proc/mdstat
iostat -x 1

Monitor memory usage:

free -m
cat /proc/meminfo
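
On many kernels the stripe heads are allocated from a per-array slab cache (typically named something like raid5-md0, though the name varies by RAID level and array, and it may not appear separately if slab merging is enabled), so slab statistics give a more direct view of the cache's footprint:

# Requires root; the slab cache name is an assumption and may vary or be merged
grep raid /proc/slabinfo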

While not extensively documented, some references exist in:

  • Linux kernel source (drivers/md/raid5.c)
  • mdadm man pages
  • Kernel documentation (Documentation/md.txt, now Documentation/admin-guide/md.rst)