ZFS on Hardware RAID: Best Practices, Performance Tradeoffs, and Error Handling Considerations



ZFS was designed with the explicit assumption of direct disk access. When you layer ZFS atop hardware RAID, you're creating a dangerous abstraction sandwich:

# Bad practice example - nested redundancy
ZFS mirror → Hardware RAID1 → Physical disks

This creates two problems: 1) ZFS can't detect actual disk failures, and 2) the RAID controller may silently corrupt data through write-back caching without battery backup.

When evaluating hardware RAID controllers for ZFS use:

  • LSI MegaRAID: Requires flashing IT-mode firmware to behave as an HBA
  • HP Smart Array: Some models allow "HBA mode" in BIOS (see the ssacli sketch after this list)
  • Dell PERC: Generally problematic; requires careful firmware configuration
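
Where HBA mode exists, it can usually be checked and toggled from the OS. A minimal sketch using HPE's ssacli, assuming the controller sits in slot 0 (verify support on your specific model first; switching modes destroys any existing RAID configuration):

# Check and enable HBA mode on an HP Smart Array (slot number is an assumption)
ssacli ctrl slot=0 show | grep -i "hba mode"
ssacli ctrl slot=0 modify hbamode=on   # takes effect after a reboot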

Example of checking disk pass-through status:

# On Linux with MegaRAID
lsscsi -g
[2:0:0:0]    disk    LSI      MR9361-8i      4.68  /dev/sda   /dev/sg0
[2:0:1:0]    disk    LSI      MR9361-8i      4.68  /dev/sdb   /dev/sg1
# The vendor/model column shows the controller (MR9361-8i), not the drives:
# these are RAID virtual drives, so the disks are NOT passed through.

Testing shows significant latency differences:

Configuration        4k Random Read IOPS    99th %ile Latency
ZFS mirror on HBA    85,000                 1.2 ms
ZFS on HW RAID1      63,000                 2.8 ms
HW RAID1 alone       72,000                 1.9 ms

For pre-built servers where hardware RAID is unavoidable:

# Expose each disk as its own single-drive RAID0 virtual drive,
# then let ZFS provide the redundancy
zpool create tank mirror \
  /dev/disk/by-id/wwn-0x5000cca123456789 \
  /dev/disk/by-id/wwn-0x5000cca987654321
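
Those by-id devices are assumed to already be single-drive virtual drives. On MegaRAID-family controllers, a hedged sketch for creating them with storcli (the enclosure/slot IDs 252:0 and 252:1 are placeholders; check storcli /c0 show for your topology):

# One RAID0 virtual drive per physical disk, with write-back cache disabled
storcli /c0 add vd type=raid0 drives=252:0
storcli /c0 add vd type=raid0 drives=252:1
storcli /c0/vall set wrcache=wt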

Key mitigation steps:

  1. Disable controller cache or ensure the battery backup unit (BBU) is functional (see the storcli sketch after this list)
  2. Set disks to "non-RAID" mode when possible
  3. Monitor controller logs for predictive failures
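
On MegaRAID-family hardware, steps 1 and 3 map to commands like these (a sketch assuming storcli and a controller at /c0):

# Step 1: force write-through and verify battery health
storcli /c0/vall set wrcache=wt
storcli /c0/bbu show status
# Step 3: per-drive predictive failure counters
storcli /c0/eall/sall show all | grep -i predictive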

Hardware RAID controllers typically implement one of three error handling modes:

  • Aggressive retry: Masks errors from ZFS (worst case)
  • Fast fail: Better but still obscures true disk state
  • Pass-through: Ideal but rarely available

To detect error masking:

# Compare these outputs: the first queries the logical device the controller
# exposes; the second queries the physical drive behind it. If only the
# second returns real SMART attributes, the controller is masking drive state.
smartctl -a /dev/sda
smartctl -a /dev/sda -d megaraid,0
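
Behind a MegaRAID controller each physical disk gets its own device number, so a small loop (drive count assumed to be four here) quickly surfaces drives whose health the controller is hiding:

# Query SMART health for every physical slot behind the controller
for i in 0 1 2 3; do
  echo "=== megaraid,$i ==="
  smartctl -H -d megaraid,$i /dev/sda
done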

The root of the conflict bears repeating: ZFS handles redundancy and error correction at the filesystem level, so it expects to own the disks. Layering it on top of hardware RAID creates two abstractions that compete for the same job:


# Example of ZFS mirror creation (preferred approach)
# In production, prefer stable /dev/disk/by-id paths over sdX names,
# which can change between boots
zpool create tank mirror /dev/sda /dev/sdb

Hardware RAID controllers often obscure the physical disks from the operating system, presenting them as single logical units. This prevents ZFS from:

  • Performing direct disk health monitoring
  • Implementing its advanced error correction (contrast with the scrub sketch after this list)
  • Optimizing writes based on actual disk geometry
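
For contrast, when ZFS does see the raw disks, its error correction is visible and auditable. A scrub reads every allocated block, verifies checksums, and repairs bad copies from the surviving side of the mirror:

# Walk all data and repair silent corruption from redundant copies
zpool scrub tank
zpool status -v tank   # the CKSUM column counts detected checksum errors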

There are limited scenarios where using hardware RAID under ZFS could be justified:

1. Boot Drives: Some systems require hardware RAID for boot drives while using ZFS for data storage.
2. Legacy Systems: When working with pre-configured servers where RAID can't be disabled.
3. Specific Controller Features: Some high-end controllers offer benefits like battery-backed cache.

If you are stuck in one of these setups, at least disable the drive-level write cache. Note that hdparm speaks only ATA and controls the disk's own cache; the controller's cache must be disabled through the controller's tools, as shown earlier:

# If forced to use hardware RAID, at least disable the drive's write cache
hdparm -W 0 /dev/sdX
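
hdparm is ATA-only, so for SAS drives (or simply to verify the setting took) the write cache enable (WCE) bit can be read with sdparm, assuming it is installed:

# Confirm the drive-level write cache state; WCE 0 means disabled
sdparm --get=WCE /dev/sdX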

Even when configured in "JBOD" or "passthrough" mode, hardware RAID controllers can still cause problems:

Issue            Impact on ZFS
Write cache      Can cause data corruption during power loss
Disk remapping   Hides bad sectors from ZFS
Firmware bugs    May corrupt data silently

For optimal ZFS performance on enterprise hardware:


# Preferred SAS controller configuration (LSI SAS2008-family example)
sas2flash -listall          # identify the adapter and current firmware
sas2flash -o -e 6           # erase the flash - do NOT reboot before flashing new firmware
sas2flash -f 2118it.bin     # flash IT (initiator-target) mode firmware

Key considerations when selecting hardware:

  • Choose controllers that support true JBOD/IT mode
  • Verify disk SMART data is fully accessible (see the smartctl check below)
  • Ensure controller firmware is up-to-date
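
A quick accessibility check: on a true HBA, smartctl should report the disk's own identity with no controller-specific flags (the grep pattern assumes an ATA drive; SAS drives report Vendor/Product instead):

# A passed-through disk answers with its own model and serial
smartctl -i /dev/sda | grep -E "Model|Serial"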

These differences are straightforward to reproduce with a short fio random-write benchmark:


# Sample benchmark command
fio --name=randwrite --ioengine=libaio --iodepth=32 --rw=randwrite \
--bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting
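
When benchmarking ZFS itself, ARC caching can flatter the results; one way to keep the comparison honest is a throwaway dataset with data caching limited to metadata (the dataset name here is illustrative):

# Run fio from a dataset that won't cache file data in RAM
zfs create -o primarycache=metadata tank/bench
cd /tank/bench   # fio creates its test files in the working directory
# Destroy the dataset (and test files) afterwards: zfs destroy tank/bench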

Results typically show:

  • 15-20% higher random write performance with native ZFS
  • Better latency consistency without RAID controller overhead
  • More accurate error reporting and recovery