When using tar -j for tape backups, the default bzip2 block size of 900K offers maximum compression but comes with significant CPU overhead. For tape drives where I/O throughput is more critical than compression ratio, we can optimize by reducing this block size.
GNU tar doesn't expose bzip2 parameters directly through its -j option. Instead, we use --use-compress-prog (an abbreviation of --use-compress-program, short form -I) to run bzip2 with custom parameters:
tar --use-compress-prog="bzip2 -1" -cvf backup.tar.bz2 /path/to/backup
Here's what each parameter does:
-1 to -9: Block sizes from 100K to 900K respectively
-9 (alias --best): Sets block size to 900K (default)
-1 (alias --fast): Sets block size to 100K. It is only an alias for -1 and does not select a different algorithm, so combining it with another level such as -9 is contradictory.
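To confirm which block size a finished archive actually used, one option is to read the first four bytes of the file: a bzip2 stream starts with BZh followed by the level digit (1 to 9). The following is a minimal sketch, not part of the original answer; bz_block_size is a hypothetical helper name, and it assumes head -c is available and that the archive is a regular file rather than a tape device.

```shell
#!/bin/sh
# Hypothetical helper: print the block size recorded in a bzip2 file header.
# A bzip2 stream begins with "BZh" plus one digit, 1-9 (block size / 100K).
bz_block_size() {
    header=$(head -c 4 "$1")
    case "$header" in
        BZh[1-9])
            # Strip the "BZh" magic, leaving the level digit.
            echo "$(( ${header#BZh} * 100 ))K"
            ;;
        *)
            echo "not a bzip2 stream: $1" >&2
            return 1
            ;;
    esac
}
```

For example, bz_block_size backup.tar.bz2 should print 100K for an archive written with bzip2 -1.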
Testing on a 10GB directory with different block sizes:
# Block Size 100K (-1)
time tar --use-compress-prog="bzip2 -1" -cvf backup-100k.tar.bz2 /data
# Block Size 500K (-5)
time tar --use-compress-prog="bzip2 -5" -cvf backup-500k.tar.bz2 /data
# Block Size 900K (-9)
time tar --use-compress-prog="bzip2 -9" -cvf backup-900k.tar.bz2 /data
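The three runs above can also be wrapped in a loop so that time and archive size are reported side by side. This is a rough sketch under stated assumptions: bench_bzip2_levels is a hypothetical helper name, it pipes tar into bzip2 instead of using --use-compress-prog so that it is not tied to GNU tar, and it measures whole seconds with date +%s, which is too coarse for small inputs.

```shell
#!/bin/sh
# Hypothetical helper: compress one directory at several bzip2 levels and
# print the elapsed seconds and archive size in bytes for each level.
# Usage: bench_bzip2_levels SRC_DIR OUT_DIR LEVEL...
bench_bzip2_levels() {
    src=$1
    out=$2
    shift 2
    for level in "$@"; do
        archive="$out/backup-$level.tar.bz2"
        start=$(date +%s)
        # -f - sends the archive to stdout so bzip2 can compress it.
        tar -cf - "$src" | bzip2 "-$level" > "$archive"
        end=$(date +%s)
        # tr strips the padding some wc implementations add.
        size=$(wc -c < "$archive" | tr -d ' ')
        echo "level=$level seconds=$((end - start)) bytes=$size"
    done
}
```

A call such as bench_bzip2_levels /data /tmp 1 5 9 would mirror the three commands above; the numbers it prints depend entirely on the data and hardware.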
For tape drives where compression speed matters more than ratio:
# Good balance (300K block size)
tar --use-compress-prog="bzip2 -3" -cvf /dev/st0 /backup
# Minimum compression (100K block size)
tar --use-compress-prog="bzip2 -1" -cvf /dev/st0 /backup
For more control, pipe through bzip2 separately. Pass -f - so that tar writes the archive to standard output instead of its built-in default device:
tar -cvf - /data | bzip2 -3 > /dev/st0
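Use pv to monitor throughput (tar's -v is dropped here because its file listing would interleave with pv's progress display on the terminal):
tar -cf - /data | pv | bzip2 -3 > backup.tar.bz2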
When using tar with bzip2 compression (-j or --bzip2), the default block size of 900KB prioritizes compression ratio over speed. This becomes particularly noticeable during large backups to tape or network storage where faster compression may be preferred.
For standalone bzip2 usage, the options -1 through -9 select block sizes from 100KB to 900KB:
bzip2 -9 large_file.tar # Maximum compression (default)
bzip2 -1 large_file.tar # Fastest compression
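A simple method is to set the BZIP2 environment variable when running tar. bzip2 reads extra options from BZIP2 (and then BZIP), so this works whenever tar runs the external bzip2 program, as GNU tar does: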
BZIP2=-1 tar -cvjf backup.tar.bz2 /path/to/data
Alternatively, for GNU tar (common on Linux systems):
tar --use-compress-prog="bzip2 -1" -cvf backup.tar.bz2 /path/to/data
Here's a quick benchmark test on a 2GB directory:
# Default (900KB blocks)
time tar -cjf default.tar.bz2 large_data/
real 4m32s
# 100KB blocks
time BZIP2=-1 tar -cjf fast.tar.bz2 large_data/
real 2m18s
For very large backups where speed is critical, consider these options:
# Use parallel compression (requires pbzip2)
tar --use-compress-prog=pbzip2 -cvf backup.tar.bz2 /path
# Use faster algorithms when compression isn't critical
tar -czf backup.tar.gz /path # gzip
XZ_OPT=-0 tar -cJf backup.tar.xz /path # xz at its fastest preset (-0, passed via XZ_OPT)
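Since pbzip2 is not installed everywhere, a backup script may want to fall back to plain bzip2 when it is missing. A minimal sketch of that idea follows; pick_compressor is a hypothetical helper name, and it relies on pbzip2 accepting the same -1 to -9 block-size options as bzip2.

```shell
#!/bin/sh
# Hypothetical helper: print a compressor command line for tar to use.
# Prefers pbzip2 when it is on PATH, otherwise falls back to bzip2.
# The optional argument is the level (1-9); it defaults to 9.
pick_compressor() {
    level=${1:-9}
    if command -v pbzip2 >/dev/null 2>&1; then
        echo "pbzip2 -$level"
    else
        echo "bzip2 -$level"
    fi
}

# Example invocation (left commented out so nothing is archived by accident):
# tar --use-compress-prog="$(pick_compressor 1)" -cvf backup.tar.bz2 /path
```

Both tools produce output that bzip2 can decompress, so the restore command does not need to know which one was picked.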
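On some BSD systems, the syntax differs slightly: older tar versions use -y for bzip2, and the level can be passed through the BZIP variable. This only has an effect when tar runs the external bzip2 program; a tar that compresses through a built-in library will not read the variable: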
env BZIP=-1 tar -cyf backup.tar.bz2 /path