Optimizing ZFS Pool Performance: Diagnosing Slow Sequential Read Speeds in Mirrored Stripe Configuration



When testing a 4-disk ZFS pool (two mirrored vdevs striped, i.e. RAID10) built from WD Red drives, sequential read speed plateaus at ~260MB/s, well short of the ~550MB/s expected from four disks reading in parallel (4 × ~138MB/s). Bonnie++ benchmarks reveal:

# Single disk performance
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
igor            63G   101  99 115288  30 49781  14   326  97 138250  13 111.6   8

# Pool performance (4 disks)
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
igor            63G   103  99 207518  43 108810  24   342  98 302350  26 256.4  18
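
Results of this shape typically come from an invocation along the following lines; the test directory and flags here are assumptions, with -s at roughly twice RAM so the ARC cannot hide the disks:

# Approximate invocation (test directory is a placeholder)
bonnie++ -d /pool2/test -s 63g -n 0 -u root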

The test environment features:

  • ZFS on Linux (zfs-0.6.5.7)
  • WD20EFRX (2TB Red) drives with 147MB/s raw throughput
  • ashift=12 (confirmed via zdb)
  • ARC cache limited to 16GB (zfs_arc_max=17179869184)
  • Default recordsize=128K
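
These values can be re-checked directly; the commands below assume the pool name pool2 from the configuration above:

# Confirm ashift from the cached pool configuration
zdb -C pool2 | grep ashift

# Confirm the ARC limit the module is actually enforcing
cat /sys/module/zfs/parameters/zfs_arc_max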

During read operations, zpool iostat shows:

# zpool iostat -v 1
                                            capacity     operations    bandwidth
pool                                         alloc   free   read  write   read  write
pool2                                         1.27T  2.35T  2.68K     32   339M   141K
  mirror                                       651G  1.18T  1.34K     20   169M  90.0K
    ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469      -      -    748      9  92.5M  96.8K
    ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX      -      -    623     10  76.8M  96.8K

Concurrently, disk utilization never reaches 100% (79-89%), and per-disk throughput sits around 82-91MB/s, well below the ~147MB/s the drives can deliver raw:

# iostat -x 1
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.60     0.00  661.30    6.00 83652.80    49.20   250.87     2.32    3.47    3.46    4.87   1.20  79.76
sdd               0.80     0.00  735.40    5.30 93273.20    49.20   251.98     2.60    3.51    3.51    4.15   1.20  89.04

Adjusting recordsize significantly improved throughput:

# Create dataset with a larger recordsize (values above 128K require the
# large_blocks pool feature, available since zfs-0.6.5)
zfs create -o recordsize=1M pool2/largeblocks

# Benchmark results:
bonnie++: 392MB/s sequential read (vs 260MB/s with the default 128K)
dd if=/dev/zero of=testfile bs=1M: 392MB/s sustained write
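
One caveat on dd figures: with lz4 compression enabled on the pool, streams of zeroes compress to almost nothing and mostly measure the CPU rather than the disks. A read test that largely defeats the 16GB ARC looks roughly like this (paths and sizes are illustrative):

# Write ~40G of incompressible data once (this step is CPU-bound and slow),
# then read it back; only the second command is the read measurement
dd if=/dev/urandom of=/pool2/largeblocks/readtest bs=1M count=40960
dd if=/pool2/largeblocks/readtest of=/dev/null bs=1M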

However, ZVOL performance remained suboptimal:

# ZVOL configuration
zfs create -V 10G -o volblocksize=8K pool2/zvoltest
mkfs.ext4 /dev/zvol/pool2/zvoltest

# Performance (sequential dd, bs=1M):
write: 128MB/s, read: 107MB/s
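
Zvols are exposed as zd* block devices, and their default read-ahead is small, which plausibly hurts sequential reads. The zd0 name below is an assumption; the symlink under /dev/zvol shows which device the zvol actually maps to:

# Find the zd device behind the zvol, then inspect and raise its read-ahead
ls -l /dev/zvol/pool2/zvoltest
cat /sys/block/zd0/queue/read_ahead_kb
echo 4096 > /sys/block/zd0/queue/read_ahead_kb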

The following module and dataset tuning helped sequential throughput:

# /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=0
# zfs_vdev_max_pending was removed in current ZoL; zfs_vdev_max_active replaces it
options zfs zfs_vdev_max_active=32
options zfs zfs_vdev_async_write_max_active=10
options zfs zfs_vdev_async_read_max_active=10

# Dataset settings
zfs set primarycache=metadata pool2/media
zfs set compression=lz4 pool2
zfs set atime=off pool2
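
These module options only take effect when the zfs module loads; most of them can also be changed at runtime through /sys/module/zfs/parameters, for example:

# Apply at runtime without reloading the module (reverts on reboot)
echo 0  > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 10 > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active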

The performance bottleneck stemmed primarily from the default 128K recordsize being a poor match for large sequential I/O. For workloads dominated by large files (video, backups, VM images), a 1M recordsize delivers near-optimal performance. The remaining ZVOL shortfall appears related to volblocksize alignment and the additional block-device abstraction layer.


When benchmarking my ZFS pool (RAID 10 with 4x WD RED 2TB drives), I observed sequential read speeds of ~260MB/s instead of the expected ~550MB/s. The pool configuration:

# zpool status
  pool: pool2
 state: ONLINE
  scan: none requested
config:
    NAME                                     STATE     READ WRITE CKSUM
    pool2                                    ONLINE       0     0     0
      mirror-0                               ONLINE       0     0     0
        ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469  ONLINE       0     0     0
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX  ONLINE       0     0     0
      mirror-1                               ONLINE       0     0     0
        ata-WDC_WD20EFRX-68AX9N0_WD-WCC1T0429536  ONLINE       0     0     0
        ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0VYKFCE  ONLINE       0     0     0

Benchmark comparisons revealed:

  • Single disk: 138MB/s read
  • RAID 10 pool: 260MB/s read (expected 4x = ~550MB/s)
  • Disk utilization never reached 100% during reads (79-89%)

The breakthrough came when adjusting recordsize for large files:

# Create optimized dataset
zfs create -o recordsize=1M -o compression=off pool2/largefiles

# Benchmark results after tuning:
bonnie++: 392MB/s read (vs original 260MB/s)
dd if=/dev/zero of=testfile bs=1M count=10000: 392MB/s write
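
It is worth double-checking what the benchmark dataset actually uses, since inherited lz4 compression would make a zero-filled dd write look far faster than the disks really are:

# Verify the effective properties on the test dataset
zfs get recordsize,compression pool2/largefiles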

The poor zvol performance (128MB/s write, 107MB/s read) appears related to block alignment:

# Create zvol with a block size matching the 128K recordsize
zfs create -V 100G -o volblocksize=128K pool2/zvol1

# Format with ext4 stride/stripe-width matching the volblocksize
# (stride=32 x 4K blocks = 128K; stripe-width=64 x 4K = 256K across the two mirrors)
mkfs.ext4 -E stride=32,stripe-width=64 /dev/zvol/pool2/zvol1
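
To separate zvol overhead from filesystem overhead, it also helps to read the block device directly before layering ext4 on top; the count below is only an example:

# Raw sequential read straight from the zvol, bypassing ext4
# (only meaningful once real data has been written; a fresh zvol is sparse
# and reads back zeroes at unrealistic speeds)
dd if=/dev/zvol/pool2/zvol1 of=/dev/null bs=1M count=10000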

To identify bottlenecks, I used these tools:

# Monitor ARC efficiency
arcstat.py 1

# Check queue depths
zpool iostat -v 1

# Detailed disk stats
iostat -xmtz 1
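
If arcstat.py is not available, the same counters can be read from the kstat interface directly; the grep pattern is just an example:

# Raw ARC counters (hits, misses, current and maximum size)
grep -E '^(hits|misses|size|c_max) ' /proc/spl/kstat/zfs/arcstats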

The optimal settings for my workload:

# /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=0
options zfs zfs_vdev_max_active=32
options zfs zfs_arc_max=17179869184

# Dataset properties
zfs set recordsize=1M pool2
zfs set primarycache=all pool2
zfs set atime=off pool2
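
Finally, a quick sanity check that the properties landed where intended and are inherited by child datasets; note that a recordsize change only affects files written afterwards, so existing data keeps its old block size until rewritten:

# Confirm effective properties across the pool and all datasets
zfs get -r recordsize,primarycache,atime,compression pool2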