When a ZFS scrub crawls along at 141KB/s with well over a thousand hours projected to completion on a SAS-backed striped-mirror pool, the problem has to be examined at multiple layers of the stack. The iostat metrics show the disks busy 84-85% of the time (%b column) at only modest throughput, which points to I/O contention rather than a raw bandwidth limit.
# iostat -nx -M 5
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
129.6 20.2 0.9 0.4 0.0 2.9 0.0 19.5 0 85 c7t5000C500415DC933d0
130.4 19.8 0.9 0.4 0.0 3.0 0.0 20.2 0 84 c7t5000C500415DC797d0
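Before reaching for tunables, it helps to separate scrub-generated I/O from client I/O by comparing pool-level and device-level rates over the same interval. A minimal check using commands already native to the appliance:
# Pool-level view: scrub reads show up here even when clients are idle
zpool iostat -v 5
# Device-level view over the same interval; compare Mr/s against the pool-level read rate
iostat -xnzM 5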
The environment in question:
- NexentaStor 3.1.3 (ZFS storage appliance)
- 7.2K RPM SAS drives in mirror configuration
- Existing tuning attempts (zfs_top_maxinflight=127, zfs_scrub_delay=0)
- High disk busy percentage despite moderate throughput
These OpenSolaris/Nexenta-specific parameters significantly impact scrub performance:
# Current settings verification
echo "zfs_scrub_delay/D" | mdb -k
zfs_scrub_delay: 0
echo "zfs_scan_idle/D" | mdb -k
zfs_scan_idle: 0
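To verify all of the scan-related tunables in one pass rather than one at a time, a simple loop over the same mdb pattern works. This is a sketch; the symbol names match the settings discussed below and may differ between kernel builds:
# Dump the current value of each scrub/resilver tunable
for v in zfs_scrub_delay zfs_scan_idle zfs_resilver_delay \
         zfs_top_maxinflight zfs_vdev_max_pending zfs_scan_min_time_ms; do
    echo "${v}/D" | mdb -k
done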
For SAS-based systems with moderate workloads, try these settings in /etc/system, or apply them live via mdb as shown after the block:
# Increase scrub priority and concurrency
set zfs_scrub_delay = 0
set zfs_scan_idle = 0
set zfs_resilver_delay = 0
set zfs_top_maxinflight = 256
set zfs_vdev_max_pending = 64
# Optimize scan parameters
set zfs_scan_min_time_ms = 1000
set zfs_scan_vdev_limit = 1048576
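These can also be written into the running kernel without a reboot using mdb in write mode; the 0t prefix makes the value decimal. This is a sketch that assumes the same symbol names exist on your kernel build, and anything set this way still needs the /etc/system entries above to survive a reboot:
# Apply the new queue and concurrency limits live (mdb -kw writes to the running kernel)
echo "zfs_top_maxinflight/W0t256" | mdb -kw
echo "zfs_vdev_max_pending/W0t64" | mdb -kw
echo "zfs_scan_min_time_ms/W0t1000" | mdb -kw
# Confirm the writes took effect
echo "zfs_top_maxinflight/D" | mdb -k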
Check for potential hardware issues that could cause excessive I/O wait:
# Verify disk health; show only devices with a non-zero error total
iostat -en | awk 'NR <= 2 || $4 != 0'
# Check enclosure and HBA status
luxadm -e port
sas2ircu list
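Beyond the bare listing, sas2ircu can report the controller, enclosure, and per-drive inventory (slot, state, firmware, serial number), which helps confirm the expander and every drive behind it are visible and healthy. A minimal example, assuming controller index 0 (check sas2ircu list first to confirm the index):
# Show HBA, enclosure, and drive details for controller 0
sas2ircu 0 display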
The MD1200 enclosure may need specific attention regarding:
- SAS link speed negotiation
- Enclosure management processor firmware
- Cable integrity between enclosure and HBA
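On an illumos-based appliance like NexentaStor, the fault management framework usually records transport and retryable errors from a flaky expander or cable long before any disk is faulted outright, so it is worth checking alongside the hardware items above:
# Faults already diagnosed by FMA
fmadm faulty
# Raw error reports (ereports); transport/retryable classes point at cabling or the expander
fmdump -e | tail -40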
When your ZFS scrub is crawling at 141K/s and reporting nearly 1,200 hours left on a 7.2K RPM SAS array, something is fundamentally wrong with either configuration or hardware. Let's examine a real-world case where a Dell R510 with an MD1200 enclosure showed exactly these symptoms:
scan: scrub in progress since Mon Apr 1 19:00:05 2013
171G scanned out of 747G at 141K/s, 1187h40m to go
0 repaired, 22.84% done
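Scrub throughput normally ramps up after the early metadata-heavy phase, so it is worth confirming that the rate really is flat before blaming hardware. A simple way to watch it (the 60-second interval is arbitrary):
# Print the scan progress line for every pool once a minute
while true; do
    date
    zpool status | grep -E 'in progress|scanned'
    sleep 60
done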
The initial iostat output is revealing in its own way: the disks are nearly idle (0-5% busy), which suggests the scrub was being throttled or starved rather than limited by the spindles:
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd3 5.1 43.9 20.6 643.8 0.0 0.1 2.9 0 5
sd4 9.4 1.8 141.1 169.6 0.0 0.0 0.5 0 0
Particularly telling is what happens after the tuning attempts: %b spikes to 85% while per-disk throughput stays under 1MB/s. At roughly 150 combined IOPS and ~20ms average service time per drive, the spindles are at the random-I/O ceiling of a 7.2K RPM disk, so the scrub is now seek-bound rather than bandwidth-bound:
c7t5000C500415DC933d0 129.6 20.2 0.9 0.4 0.0 2.9 0.0 19.5 0 85
The admin tried standard ZFS tuning knobs with little improvement:
# echo "zfs_top_maxinflight/D" | mdb -k
zfs_top_maxinflight: 127
# echo "zfs_scrub_delay/D" | mdb -k
zfs_scrub_delay: 0
The error counters show potential disk problems:
---- errors ---
s/w h/w trn tot device
0 8887 0 8887 c2t0d0
While these errors are not on the data disks themselves, nearly nine thousand hardware errors on any device in the system point to trouble in the enclosure, cabling, or controller path and are worth chasing down.
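To dig into a suspect device, iostat can print a per-device summary including vendor, serial number, and a breakdown of error classes (media errors, device-not-ready, no-device, and so on); the device name here is the one flagged above:
# Detailed error and identity information for the suspect device
iostat -En c2t0d0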
For this specific case, the resolution involved:
- Replacing the SAS expander in the MD1200
- Updating the HBA firmware
- Adjusting the per-vdev queue depth (on NexentaStor this means zfs_vdev_max_pending in /etc/system or via mdb -kw, as shown earlier; the Linux /etc/modprobe.d/mpt2sas.conf max_queue_depth option sometimes quoted for this does not apply to an illumos kernel)
Post-fix scrub speeds improved to 150MB/s+ on the same hardware.
Essential troubleshooting commands:
# zpool status -v
# zpool iostat -v 1
# echo "::walk spa | ::print spa_t spa_name spa_last_io" | mdb -k
# iostat -xnzM 5
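When escalating to a vendor or comparing before/after behavior, it helps to capture all of these in one timestamped snapshot. A minimal sketch, assuming a POSIX shell; the output path and one-minute window are arbitrary:
# Capture a one-minute diagnostic snapshot for later comparison
OUT=/var/tmp/scrub-diag.$(date +%Y%m%d-%H%M)
{
    zpool status -v
    zpool iostat -v 5 12
    iostat -xnzM 5 12
    iostat -en
} > "$OUT" 2>&1
echo "Diagnostics written to $OUT"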