When testing a 4-disk ZFS pool (two mirror vdevs striped together, i.e. RAID 10) built from WD Red drives, sequential read speed plateaus at ~260MB/s despite a theoretical expectation of ~550MB/s. Bonnie++ benchmarks show:
```
# Single disk performance
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
igor            63G   101  99 115288  30  49781  14   326  97 138250  13 111.6   8

# Pool performance (4 disks)
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
igor            63G   103  99 207518  43 108810  24   342  98 302350  26 256.4  18
```
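For reference, this is roughly the kind of bonnie++ invocation that produces output in this format; a minimal sketch, where the target directory, file size, and user are assumptions rather than the exact command used:

```
# Size should be at least twice RAM so the ARC cannot hide disk speed;
# /pool2/bench is an illustrative path, not necessarily the actual test directory.
bonnie++ -d /pool2/bench -s 63g -n 0 -u root
```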
The test environment features:
- ZFS on Linux (zfs-0.6.5.7)
- WD20EFRX (2TB Red) drives with 147MB/s raw throughput
- ashift=12 (confirmed via zdb; verification commands are sketched after this list)
- ARC cache limited to 16GB (zfs_arc_max=17179869184)
- Default recordsize=128K
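The ashift and ARC limit listed above can be double-checked from the shell; a short verification sketch (pool name as above, paths standard for ZFS on Linux):

```
# Confirm ashift=12 on the pool's vdevs
zdb -C pool2 | grep ashift

# Confirm the runtime ARC cap in bytes (17179869184 = 16GB)
cat /sys/module/zfs/parameters/zfs_arc_max

# Confirm the dataset recordsize
zfs get recordsize pool2
```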
During read operations, zpool iostat shows:
```
# zpool iostat -v 1
                                               capacity     operations    bandwidth
pool                                         alloc   free   read  write   read  write
pool2                                        1.27T  2.35T  2.68K     32   339M   141K
  mirror                                      651G  1.18T  1.34K     20   169M  90.0K
    ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469     -      -    748      9  92.5M  96.8K
    ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX     -      -    623     10  76.8M  96.8K
```
Concurrently, disk utilization never reaches 100%:
```
# iostat -x 1
Device: rrqm/s wrqm/s    r/s   w/s     rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb       0.60   0.00 661.30  6.00  83652.80  49.20   250.87     2.32  3.47    3.46    4.87  1.20 79.76
sdd       0.80   0.00 735.40  5.30  93273.20  49.20   251.98     2.60  3.51    3.51    4.15  1.20 89.04
```
Adjusting recordsize significantly improved throughput:
```
# Create dataset with optimized recordsize
zfs create -o recordsize=1M pool2/largeblocks

# Benchmark results:
#   bonnie++: 392MB/s sequential read (vs 260MB/s with default 128K)
#   dd if=/dev/zero bs=1M (sequential write): 392MB/s sustained
```
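To confirm the read side independently of bonnie++, a file larger than the 16GB ARC can be written and then read back with dd; a minimal sketch with an illustrative file name and size:

```
# Write ~32GB, then time a sequential read of it back.
dd if=/dev/zero of=/pool2/largeblocks/testfile bs=1M count=32768
dd if=/pool2/largeblocks/testfile of=/dev/null bs=1M
```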
However, ZVOL performance remained suboptimal:
```
# ZVOL configuration
zfs create -V 10G -o volblocksize=8K pool2/zvoltest
mkfs.ext4 /dev/zvol/pool2/zvoltest

# Performance:
#   dd bs=1M sequential read: 107MB/s
```
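To separate ext4 overhead from the zvol itself, the block device can also be read directly; a hedged sketch against the test zvol above (iflag=direct is GNU dd and bypasses the page cache):

```
# Sequential read from the raw zvol, no filesystem in the path.
dd if=/dev/zvol/pool2/zvoltest of=/dev/null bs=1M iflag=direct
```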
For optimal sequential throughput:
```
# /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=0
# Note: zfs_vdev_max_pending is only honored by ZoL releases before 0.6.4;
# with the newer I/O scheduler, use the *_max_active tunables below instead.
options zfs zfs_vdev_max_pending=32
options zfs zfs_vdev_async_write_max_active=10
options zfs zfs_vdev_async_read_max_active=10

# Dataset settings
zfs set primarycache=metadata pool2/media
zfs set compression=lz4 pool2
zfs set atime=off pool2
```
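Options in /etc/modprobe.d/zfs.conf only take effect when the zfs module is loaded; the same tunables can be changed on a running system through sysfs, as in this sketch (values revert at reboot):

```
# Apply the tunables at runtime; keep zfs.conf for persistence.
echo 0  > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 10 > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active
echo 10 > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active
```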
The bottleneck stemmed primarily from the default 128K recordsize being a poor match for large sequential I/O: with relatively small records, the disks are never handed enough work at once to saturate, which is consistent with the sub-100% utilization seen in iostat. For workloads dominated by large files (video, backups, VM images), a 1M recordsize delivers near-optimal performance. The remaining ZVOL shortfall appears related to volblocksize alignment and the additional block-device abstraction layer.
When benchmarking my ZFS pool (RAID 10 built from four 2TB WD Red drives), I observed sequential read speeds of ~260MB/s instead of the expected ~550MB/s. The pool configuration:
```
# zpool status
  pool: pool2
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool2                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX    ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ata-WDC_WD20EFRX-68AX9N0_WD-WCC1T0429536    ONLINE       0     0     0
            ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0VYKFCE    ONLINE       0     0     0
```
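For context, a pool with this layout is created as two mirror vdevs in a single command; an illustrative sketch only (the pool already exists), reusing the by-id device names and ashift from above:

```
# Two mirrored pairs striped together (RAID 10), forced 4K alignment.
zpool create -o ashift=12 pool2 \
  mirror /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WMC300004469 \
         /dev/disk/by-id/ata-WDC_WD20EFRX-68EUZN0_WD-WCC4MLK57MVX \
  mirror /dev/disk/by-id/ata-WDC_WD20EFRX-68AX9N0_WD-WCC1T0429536 \
         /dev/disk/by-id/ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0VYKFCE
```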
Benchmark comparisons revealed:
- Single disk: 138MB/s read
- RAID 10 pool: 260MB/s read (expected 4x = ~550MB/s)
- Disk utilization never reached 100% during reads (79-89%)
The breakthrough came when adjusting recordsize for large files:
```
# Create optimized dataset
zfs create -o recordsize=1M -o compression=off pool2/largefiles

# Benchmark results after tuning:
#   bonnie++: 392MB/s sequential read (vs original 260MB/s)
#   dd if=/dev/zero of=testfile bs=1M count=10000 (sequential write): 392MB/s
```
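Since the test file is freshly written, an immediate read-back can be served largely from the ARC; one way to get a cold-cache read figure is to export and re-import the pool (which drops its cached data) before reading, as in this sketch:

```
# Flush cached data for the pool, then measure a cold sequential read.
zpool export pool2
zpool import pool2
dd if=/pool2/largefiles/testfile of=/dev/null bs=1M
```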
The poor zvol performance (128MB/s write, 107MB/s read) appears related to block alignment:
```
# Create zvol with a larger, matching block size
zfs create -V 100G -o volblocksize=128K pool2/zvol1

# Format with matching layout:
#   stride=32       -> 32 x 4K ext4 blocks = 128K, matching volblocksize
#   stripe-width=64 -> 2 top-level mirrors x stride
mkfs.ext4 -E stride=32,stripe-width=64 /dev/zvol/pool2/zvol1
```
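The resulting geometry can be verified before use; a quick sketch checking the zvol block size and the stride/stripe-width that ext4 recorded:

```
# Confirm the zvol's volblocksize and the ext4 RAID hints.
zfs get volblocksize pool2/zvol1
tune2fs -l /dev/zvol/pool2/zvol1 | grep -i 'stride\|stripe'
```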
To identify bottlenecks, I used these tools:
```
# Monitor ARC efficiency
arcstat.py 1

# Check per-vdev throughput and I/O activity
zpool iostat -v 1

# Detailed disk stats
iostat -xmtz 1
```
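Where arcstat.py is not installed, the same counters are exposed directly by the kernel module; a hedged sketch for checking ARC size and prefetch effectiveness during a sequential read:

```
# Overall ARC hit/miss counters and current size.
grep -E '^(hits|misses|size) ' /proc/spl/kstat/zfs/arcstats

# Prefetch counters; many misses during sequential reads suggest
# prefetch is not keeping the disks busy.
grep prefetch /proc/spl/kstat/zfs/arcstats
```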
The optimal settings for my workload:
```
# /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=0
options zfs zfs_vdev_max_active=32
options zfs zfs_arc_max=17179869184

# Dataset properties
zfs set recordsize=1M pool2
zfs set primarycache=all pool2
zfs set atime=off pool2
```
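A quick way to confirm that the properties and module parameters are actually in effect:

```
# Properties on the pool's root dataset.
zfs get recordsize,primarycache,atime,compression pool2

# Module parameters seen by the running kernel.
grep . /sys/module/zfs/parameters/zfs_prefetch_disable \
       /sys/module/zfs/parameters/zfs_arc_max
```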