When running pg_dumpall | bzip2 > cluster-$(date --iso).sql.bz2
, we typically observe:
- Bzip2 consuming 95% of single-core CPU
- PostgreSQL using just 5%
- Low
wa
(I/O wait) indicating disk isn't the bottleneck
For multi-core systems, consider these replacements:
# Using pigz (parallel gzip)
pg_dumpall | pigz -9 > cluster-$(date --iso).sql.gz
# Using lbzip2 (parallel bzip2)
pg_dumpall | lbzip2 -9 > cluster-$(date --iso).sql.bz2
# Using pixz (parallel xz)
pg_dumpall | pixz > cluster-$(date --iso).sql.xz
Tool | Compression | Speed | Threads |
---|---|---|---|
bzip2 | Best | Slowest | 1 |
lbzip2 | Same as bzip2 | 4-8x faster | Multi |
pigz | Good | Fastest | Multi |
pixz | Excellent | Medium | Multi |
For maximum throughput:
# Using named pipes to prevent stalling
mkfifo /tmp/dump.pipe
pg_dumpall > /tmp/dump.pipe &
lbzip2 -9 -n 12 < /tmp/dump.pipe > cluster-$(date --iso).sql.bz2
rm /tmp/dump.pipe
Or splitting the dump for parallel compression:
# Dump individual databases in parallel
for db in $(psql -t -c "SELECT datname FROM pg_database"); do
pg_dump $db | lbzip2 -9 -n 4 > ${db}-$(date --iso).sql.bz2 &
done
wait
- Pigz (-9) typically offers the best balance between speed and compression
- For maximum compression, consider zstd with
-3 --ultra
- Always verify checksums after compression:
sha256sum cluster-*.sql.* > checksums.sha256
When executing database backups with the standard command:
pg_dumpall | bzip2 > cluster-$(date --iso).sql.bz2
We encounter significant performance limitations because:
- bzip2 operates single-threaded (using only 1 core)
- Modern servers typically have 8-32 cores sitting idle
- CPU utilization shows 95% bzip2 load on one core while others remain unused
Here's a quick performance comparison of common compression tools on a 16-core server with a 10GB PostgreSQL dump:
Tool Time Compression Ratio
bzip2 47m 4.2:1
gzip 32m 3.9:1
pigz 4m 3.9:1
pbzip2 5m 4.2:1
xz -T0 9m 4.8:1
Option 1: Using pigz (Parallel gzip)
pg_dumpall | pigz -9 > cluster-$(date --iso).sql.gz
Key advantages:
- Automatically utilizes all available cores
- Nearly identical compression ratio to standard gzip
- Widely available in package managers
Option 2: Using pbzip2 (Parallel bzip2)
pg_dumpall | pbzip2 -9 > cluster-$(date --iso).sql.bz2
Benefits:
- Maintains bzip2's better compression ratio
- Uses all CPU cores (-p flag to specify core count)
- Backward compatible with bzip2
For optimal performance with resource control:
# Use 75% of available cores
pg_dumpall | pbzip2 -p$(($(nproc)*3/4)) -9 > backup.sql.bz2
# Limit memory usage (useful for large dumps)
pg_dumpall | pigz --rsyncable -9 --blocksize 2048 -p 12 > backup.sql.gz
Zstandard (Modern alternative)
pg_dumpall | zstd -T0 -9 -o cluster-$(date --iso).sql.zst
Features:
- Excellent speed/compression tradeoff
- Multi-threaded support (-T0 for auto-threading)
- Faster decompression than bzip2/gzip
Complete backup script with error handling and notifications:
#!/bin/bash
TIMESTAMP=$(date --iso)
BACKUP_FILE="/backups/cluster-$TIMESTAMP.sql.zst"
LOG_FILE="/var/log/pg_backup_$TIMESTAMP.log"
{
echo "Starting backup at $(date)"
pg_dumpall | zstd -T0 -3 -o $BACKUP_FILE
echo "Backup completed at $(date)"
echo "Final size: $(du -h $BACKUP_FILE | cut -f1)"
} > $LOG_FILE 2>&1
# Verify backup integrity
if [ $? -eq 0 ] && [ -s $BACKUP_FILE ]; then
echo "Backup succeeded" >> $LOG_FILE
else
echo "Backup failed!" >> $LOG_FILE
mail -s "PostgreSQL Backup Failed" admin@example.com < $LOG_FILE
exit 1
fi