How to Configure bzip2 Block Size in GNU Tar for Optimal Tape Backup Performance


When using tar -j for tape backups, bzip2 defaults to its largest block size, 900K, which gives the best compression ratio at the cost of more memory and CPU time. For tape drives where I/O throughput is more critical than compression ratio, we can trade some ratio for speed by reducing the block size.

GNU tar doesn't expose bzip2 parameters directly through its -j option. Instead, --use-compress-program (abbreviated here as --use-compress-prog; the short form is -I) lets us supply a custom bzip2 command line:

tar --use-compress-prog="bzip2 -1" -cvf backup.tar.bz2 /path/to/backup

Here's what the relevant bzip2 parameters do:

  • -1 to -9: Block sizes from 100K to 900K respectively
  • -9 (or --best): Sets block size to 900K (default)
  • --fast: Alias for -1. It exists mainly for gzip compatibility and does not select a different algorithm, so do not combine it with another level such as -9
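
One way to confirm which block size a given flag actually selected is to inspect the stream header: a bzip2 stream starts with the magic bytes BZh followed by the level digit (1 to 9). The sketch below assumes bzip2 and head are on the PATH; the helper name bz_header is purely illustrative.

```shell
#!/bin/sh
# Print the first four bytes of a bzip2 stream produced with the given flag.
# bz_header is an illustrative helper name, not part of bzip2 or tar.
bz_header() {
    printf 'sample data\n' | bzip2 "$1" | head -c 4
}

bz_header -3; echo        # prints BZh3 (300K blocks)
bz_header --fast; echo    # prints BZh1 (alias for -1)
bz_header --best; echo    # prints BZh9 (alias for -9)
```

The same check works on an existing archive: head -c 4 backup.tar.bz2 shows the level it was written with.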

Testing on a 10GB directory with different block sizes:

# Block Size 100K (-1)
time tar --use-compress-prog="bzip2 -1" -cvf backup-100k.tar.bz2 /data

# Block Size 500K (-5) 
time tar --use-compress-prog="bzip2 -5" -cvf backup-500k.tar.bz2 /data

# Block Size 900K (-9)
time tar --use-compress-prog="bzip2 -9" -cvf backup-900k.tar.bz2 /data
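
The three runs above can also be wrapped in a loop that records elapsed time and archive size side by side. This is a rough sketch rather than a rigorous benchmark: the function name bench_levels and the output file naming are assumptions made for illustration, timing has one-second resolution, and it pipes tar into bzip2 instead of using --use-compress-prog so that it does not depend on a particular tar version.

```shell
#!/bin/sh
# Compare elapsed time and archive size across bzip2 levels for one directory.
# bench_levels and the backup-N00k naming scheme are illustrative choices.
bench_levels() {
    src=$1      # directory to archive
    out=$2      # directory that receives the test archives
    for level in 1 5 9; do
        archive="$out/backup-${level}00k.tar.bz2"
        start=$(date +%s)
        # Archive the contents of $src and compress at the chosen level.
        tar -cf - -C "$src" . | bzip2 "-$level" > "$archive"
        end=$(date +%s)
        size=$(wc -c < "$archive" | tr -d ' ')
        echo "level=$level seconds=$((end - start)) bytes=$size"
    done
}

# Example: bench_levels /data /tmp
```

Run it against a representative sample of the real backup set, since both the speed and the ratio depend heavily on the data being compressed.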

For tape drives where compression speed matters more than ratio:

# Good balance (300K block size)
tar --use-compress-prog="bzip2 -3" -cvf /dev/st0 /backup

# Minimum compression (100K block size) 
tar --use-compress-prog="bzip2 -1" -cvf /dev/st0 /backup

For more control, pipe through bzip2 separately:

tar -cvf - /data | bzip2 -3 > /dev/st0

Use pv to monitor throughput:

tar -cf - /data | pv | bzip2 -3 > backup.tar.bz2

When using tar with bzip2 compression (-j or --bzip2), the default block size of 900KB prioritizes compression ratio over speed. This becomes particularly noticeable during large backups to tape or network storage where faster compression may be preferred.

For standalone bzip2 usage, you can specify levels -1 through -9, which select block sizes from 100KB to 900KB:

bzip2 -9 large_file.tar   # Maximum compression (default)
bzip2 -1 large_file.tar   # Fastest compression

A simple method that keeps the familiar -j option is to set the BZIP2 environment variable, which bzip2 reads for default arguments, before executing tar:

BZIP2=-1 tar -cvjf backup.tar.bz2 /path/to/data
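
Because the variable is passed silently through tar to bzip2, it is worth checking that it actually took effect. The sketch below builds a tiny archive in a temporary directory and prints the stream header, which ends in the level digit. It assumes a tar that runs the external bzip2 program for -j (as GNU tar does); the helper name env_level is illustrative.

```shell
#!/bin/sh
# Check whether the BZIP2 environment variable reaches the bzip2 that tar -j runs.
# env_level is an illustrative helper name.
env_level() {
    work=$(mktemp -d)
    mkdir "$work/data"
    printf 'sample data\n' > "$work/data/file.txt"
    # $1 is the level flag to pass via the environment, e.g. -1
    BZIP2=$1 tar -cjf "$work/test.tar.bz2" -C "$work" data
    head -c 4 "$work/test.tar.bz2"   # "BZh" followed by the level digit
    rm -rf "$work"
}

env_level -1; echo
```

If the last character printed is 9 rather than 1, the variable was ignored, which can happen with tar builds that compress through a linked library instead of the bzip2 executable.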

Alternatively, for GNU tar (common on Linux systems):

tar --use-compress-prog="bzip2 -1" -cvf backup.tar.bz2 /path/to/data

Here's a quick benchmark test on a 2GB directory:

# Default (900KB blocks)
time tar -cjf default.tar.bz2 large_data/
real    4m32s

# 100KB blocks
time BZIP2=-1 tar -cjf fast.tar.bz2 large_data/
real    2m18s

For very large backups where speed is critical, consider these options:

# Use parallel compression (requires pbzip2)
tar --use-compress-prog=pbzip2 -cvf backup.tar.bz2 /path

# Use a different compressor when compression ratio isn't critical
tar -czf backup.tar.gz /path              # gzip
XZ_OPT=-0 tar -cJf backup.tar.xz /path    # xz at its fastest preset (-0); the xz default is much slower

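On some BSD systems, the syntax differs slightly: -y selects bzip2, and bzip2 also reads the BZIP variable. This only takes effect when tar runs the external bzip2 program rather than compressing through a built-in library: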

env BZIP=-1 tar -cyf backup.tar.bz2 /path