When working with long-lived ZFS pools, we often encounter this scenario: older files remain compressed with an outdated algorithm (like LZJB) even after upgrading to a superior one (like LZ4). Because ZFS compresses data only at write time, existing blocks keep whatever algorithm they were written with, which leaves storage efficiency on the table given LZ4's better ratios and speed.
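A quick way to confirm the gap on a given dataset (pool/dataset is a placeholder name):
# The property shows the current setting; compressratio reflects all blocks ever written
zfs get compression,compressratio pool/dataset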
Two common approaches exist, both with significant drawbacks:
# Option 1: Full pool backup and restore
zfs send pool/old@snap | zfs receive pool/new
# Requires substantial temporary storage and downtime
# Option 2: Selective file recopying
find /pool/old -type f -mtime +3650 -exec cp {} /pool/new \;
# Risky due to potential filename/special character issues
While ZFS doesn't directly offer "recompress" functionality, we can leverage these technical approaches:
# Method 1: ZFS send/receive into a new dataset
zfs snapshot pool/dataset@snap
zfs send pool/dataset@snap | zfs receive -o compression=lz4 pool/dataset_new
# Every block is rewritten on receive, so it lands compressed with lz4
# Method 2: In-place rewrite under the new setting
sudo zfs set compression=lz4 pool/dataset
# Rewriting a file reallocates its blocks with the new algorithm
sudo cp -p /pool/dataset/file /pool/dataset/file.tmp
sudo mv /pool/dataset/file.tmp /pool/dataset/file
# Note: blocks still referenced by old snapshots keep their original compression
To analyze compression distribution across your dataset (zdb output varies between releases, so treat these greps as heuristics):
zdb -b pool | grep -iE 'lzjb|lz4'
# For detailed block-level analysis:
zdb -vvvv pool | grep -A 5 'COMPRESS'
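Since those zdb greps are version-dependent, a rougher but stable estimate is the logical-to-physical ratio; a small sketch (pool/dataset is a placeholder):
# -p prints exact byte counts; a ratio above 1 means compression is saving space
zfs list -Hp -o used,logicalused pool/dataset | awk '{printf "%.2fx\n", $2/$1}'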
When implementing this:
- Always create snapshots before major operations (a minimal sketch follows this list)
- Monitor system load during recompression
- Consider doing this during maintenance windows
- Test with non-critical datasets first
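A minimal pre-flight sketch for the first point, assuming the target is pool/dataset:
# Recursive snapshot gives every child dataset a rollback point
zfs snapshot -r pool/dataset@pre-recompress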
For minimal downtime, do a full send first, then catch up with a small incremental before the cutover. Use a plain send rather than --compressed, since a compressed send would carry the old lzjb blocks over as-is:
zfs snapshot pool/dataset@base
zfs send pool/dataset@base | zfs receive -u -o compression=lz4 pool/dataset_new
zfs bookmark pool/dataset@base pool/dataset#mark
# After briefly quiescing writes, send only the delta, then rename the new dataset into place
zfs snapshot pool/dataset@final
zfs send -i pool/dataset#mark pool/dataset@final | zfs receive -u pool/dataset_new
Many ZFS administrators face this scenario: your pool was created years ago using lzjb compression and later upgraded to lz4. Now you've got a mixed bag of compression formats with varying efficiency.
# Current compression setting (likely shows lz4 now)
zfs get compression poolname
The difference between lzjb and lz4 isn't just academic. In the benchmarks published when lz4 support landed in ZFS, lz4 typically delivered (a quick way to check this on your own data follows this list):
- modestly better compression ratios (often around 10%)
- roughly 50% faster compression
- roughly 80% faster decompression
- about 3x faster handling of incompressible data
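If you'd rather verify those numbers than take them on faith, a simple A/B sketch (the bench_* dataset names and sample file are hypothetical):
# Two scratch datasets, one per algorithm
zfs create -o compression=lzjb pool/bench_lzjb
zfs create -o compression=lz4 pool/bench_lz4
# Write the same representative data to each, then compare
cp /path/to/sample /pool/bench_lzjb/ && cp /path/to/sample /pool/bench_lz4/
zfs get compressratio pool/bench_lzjb pool/bench_lz4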
Option 1: ZFS Send/Receive (Recommended)
This creates new blocks with current compression settings:
# Create temporary snapshot
zfs snapshot poolname/dataset@compressfix
# Use a plain send; the -w (raw) flag would carry the old compression over as-is
zfs send poolname/dataset@compressfix | zfs recv -o compression=lz4 poolname/dataset_new
# Verify and replace
zfs rename poolname/dataset poolname/dataset_old
zfs rename poolname/dataset_new poolname/dataset
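Once the renamed dataset checks out, reclaim the space; verify first, because destroy is irreversible:
# Confirm the new copy reports lz4 before dropping the old one
zfs get compression,compressratio poolname/dataset
zfs destroy -r poolname/dataset_old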
Option 2: In-place Rewrite
For cases where send/receive isn't feasible:
#!/bin/bash
# Rewrite files older than a specific age (here roughly 10 years)
find /poolname/dataset -type f -mtime +3650 -print0 |
while IFS= read -r -d '' file
do
    mv "$file" "$file.temp" && \
    cp -p "$file.temp" "$file" && \
    rm "$file.temp"
done
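To spot-check that a rewritten file actually shrank, compare its physical and logical sizes (the filename is a placeholder):
du -h /poolname/dataset/somefile   # physical (allocated, post-compression) size
ls -lh /poolname/dataset/somefile  # logical size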
Checking Compression Distribution
While ZFS doesn't directly report per-algorithm stats, we can estimate:
# Compare compressed size vs logical size for old files
zfs list -o name,compressratio,used,logicalused | grep poolname
Automated Dataset Processing
For large environments, consider this Python approach:
import subprocess

def get_datasets(pool):
    # zfs list -r with -H omits headers; snapshots are not listed by default
    out = subprocess.check_output(["zfs", "list", "-H", "-o", "name", "-r", pool])
    return out.decode().splitlines()

for ds in get_datasets("poolname"):
    if "/" not in ds:
        continue  # skip the pool root dataset
    subprocess.run(["zfs", "set", "compression=lz4", ds], check=True)
    # Recompress existing blocks by rewriting them through send/receive
    snap = f"{ds}@compressfix"
    subprocess.run(["zfs", "snapshot", snap], check=True)
    send = subprocess.Popen(["zfs", "send", snap], stdout=subprocess.PIPE)
    subprocess.run(["zfs", "recv", "-o", "compression=lz4", f"{ds}_new"],
                   stdin=send.stdout, check=True)
    send.wait()
When running these operations:
- Monitor ARC hit rate (arcstat.py; example below)
- Consider doing this during low-usage periods
- For large pools, process datasets sequentially
- Keep an eye on ZIL and transaction group commits
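For the first point, arcstat ships with OpenZFS and can run alongside the rewrite:
# Print ARC size and hit-rate columns every 5 seconds
arcstat 5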