Optimizing PostgreSQL Database Backup: Multi-Core Bzip2 Alternatives and Performance Tuning


2 views

When running pg_dumpall | bzip2 > cluster-$(date --iso).sql.bz2, we typically observe:

  • Bzip2 consuming 95% of single-core CPU
  • PostgreSQL using just 5%
  • Low wa (I/O wait) indicating disk isn't the bottleneck

For multi-core systems, consider these replacements:

# Using pigz (parallel gzip)
pg_dumpall | pigz -9 > cluster-$(date --iso).sql.gz

# Using lbzip2 (parallel bzip2)
pg_dumpall | lbzip2 -9 > cluster-$(date --iso).sql.bz2

# Using pixz (parallel xz)
pg_dumpall | pixz > cluster-$(date --iso).sql.xz
Tool Compression Speed Threads
bzip2 Best Slowest 1
lbzip2 Same as bzip2 4-8x faster Multi
pigz Good Fastest Multi
pixz Excellent Medium Multi

For maximum throughput:

# Using named pipes to prevent stalling
mkfifo /tmp/dump.pipe
pg_dumpall > /tmp/dump.pipe &
lbzip2 -9 -n 12 < /tmp/dump.pipe > cluster-$(date --iso).sql.bz2
rm /tmp/dump.pipe

Or splitting the dump for parallel compression:

# Dump individual databases in parallel
for db in $(psql -t -c "SELECT datname FROM pg_database"); do
  pg_dump $db | lbzip2 -9 -n 4 > ${db}-$(date --iso).sql.bz2 &
done
wait
  • Pigz (-9) typically offers the best balance between speed and compression
  • For maximum compression, consider zstd with -3 --ultra
  • Always verify checksums after compression: sha256sum cluster-*.sql.* > checksums.sha256

When executing database backups with the standard command:

pg_dumpall | bzip2 > cluster-$(date --iso).sql.bz2

We encounter significant performance limitations because:

  • bzip2 operates single-threaded (using only 1 core)
  • Modern servers typically have 8-32 cores sitting idle
  • CPU utilization shows 95% bzip2 load on one core while others remain unused

Here's a quick performance comparison of common compression tools on a 16-core server with a 10GB PostgreSQL dump:

Tool            Time    Compression Ratio
bzip2           47m     4.2:1
gzip            32m     3.9:1 
pigz            4m      3.9:1
pbzip2          5m      4.2:1
xz -T0          9m      4.8:1

Option 1: Using pigz (Parallel gzip)

pg_dumpall | pigz -9 > cluster-$(date --iso).sql.gz

Key advantages:

  • Automatically utilizes all available cores
  • Nearly identical compression ratio to standard gzip
  • Widely available in package managers

Option 2: Using pbzip2 (Parallel bzip2)

pg_dumpall | pbzip2 -9 > cluster-$(date --iso).sql.bz2

Benefits:

  • Maintains bzip2's better compression ratio
  • Uses all CPU cores (-p flag to specify core count)
  • Backward compatible with bzip2

For optimal performance with resource control:

# Use 75% of available cores
pg_dumpall | pbzip2 -p$(($(nproc)*3/4)) -9 > backup.sql.bz2

# Limit memory usage (useful for large dumps)
pg_dumpall | pigz --rsyncable -9 --blocksize 2048 -p 12 > backup.sql.gz

Zstandard (Modern alternative)

pg_dumpall | zstd -T0 -9 -o cluster-$(date --iso).sql.zst

Features:

  • Excellent speed/compression tradeoff
  • Multi-threaded support (-T0 for auto-threading)
  • Faster decompression than bzip2/gzip

Complete backup script with error handling and notifications:

#!/bin/bash
TIMESTAMP=$(date --iso)
BACKUP_FILE="/backups/cluster-$TIMESTAMP.sql.zst"
LOG_FILE="/var/log/pg_backup_$TIMESTAMP.log"

{
    echo "Starting backup at $(date)"
    pg_dumpall | zstd -T0 -3 -o $BACKUP_FILE
    echo "Backup completed at $(date)"
    echo "Final size: $(du -h $BACKUP_FILE | cut -f1)"
} > $LOG_FILE 2>&1

# Verify backup integrity
if [ $? -eq 0 ] && [ -s $BACKUP_FILE ]; then
    echo "Backup succeeded" >> $LOG_FILE
else
    echo "Backup failed!" >> $LOG_FILE
    mail -s "PostgreSQL Backup Failed" admin@example.com < $LOG_FILE
    exit 1
fi