Efficient Rsync Strategy: Maintaining Compressed Backups While Syncing Uncompressed Live Files


Many sysadmins face a common backup challenge: you need compressed archives for storage efficiency, but rsync's delta transfer works best with uncompressed files. The standard approaches all have drawbacks:

  • Pre-compressing files defeats rsync's delta transfer, since a small source change rewrites most of the compressed output (rsync -z is no substitute: it only compresses wire traffic, not the stored files; see the demonstration after this list)
  • Storing uncompressed files wastes precious disk space
  • Manual compress-sync-decompress workflows are impractical at scale
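
To see the first drawback concretely, compare rsync's --stats output after a small edit to a large file versus its gzipped copy (hypothetical paths; the figure to watch is "Literal data"):

# A small edit to the plain file transfers little literal data
rsync --inplace --no-whole-file --stats /live/bigfile.db /backup/bigfile.db
# The same edit to a pre-gzipped copy resends almost everything,
# because gzip output diverges entirely after the first changed byte
gzip -kf /live/bigfile.db
rsync --inplace --no-whole-file --stats /live/bigfile.db.gz /backup/bigfile.db.gz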

Here's a method that makes the compression transparent to the sync: decompress the previous archive into a temporary basis file, let rsync delta-transfer against it, then recompress. One caveat up front: rsync needs a seekable basis file to compute and apply deltas, so the basis must be a regular temporary file, not a named pipe (FIFO):

#!/bin/bash
# Decompress the previous archive to a regular temp file; rsync needs
# a seekable basis to compute and apply deltas, so a FIFO won't work
basis=$(mktemp /tmp/rsync_basis.XXXXXX)

# Restore the previous backup as the delta-transfer basis, if present
if [[ -f /backup/foo.gz ]]; then
    gunzip -c /backup/foo.gz >"$basis"
fi

# Delta-transfer the live file onto the decompressed basis
rsync --progress --inplace --no-whole-file /live/foo "$basis"

# Recompress the updated copy back into the archive
gzip -c <"$basis" >/backup/foo.gz

# Clean up
rm "$basis"
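
A quick sanity check after a run, assuming the same /live/foo and /backup/foo.gz paths as above: the archive should decompress to exactly the current live contents.

# Verify the round trip: the archive must decompress to the live file
gunzip -c /backup/foo.gz | cmp - /live/foo && echo "backup matches live"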

For production use, consider these enhancements:

# Parallel processing with xargs: one worker (and one basis file) per CPU
find /live -type f -print0 | xargs -0 -P"$(nproc)" -I{} bash -c '
    file=$1   # passed as a positional argument to avoid quoting pitfalls
    # NB: basename flattens the directory tree, so files with the same
    # name in different directories will collide under /backup
    backup="/backup/$(basename "$file").gz"
    basis=$(mktemp /tmp/rsync_basis.XXXXXX)

    # Restore the previous archive as the delta basis, if present
    [[ -f "$backup" ]] && gunzip -c "$backup" >"$basis"

    rsync --inplace --no-whole-file "$file" "$basis"
    gzip -c <"$basis" >"$backup"
    rm "$basis"
' _ {}

If you're using ZFS, consider this more elegant solution:

# On source (store the snapshot name once, so two date calls can't
# straddle midnight and disagree):
snap="pool/live@$(date +%Y%m%d)"
zfs snapshot "$snap"
zfs send "$snap" | gzip | ssh backup "gunzip | zfs receive pool/backup"

# With incremental:
zfs send -i pool/live@yesterday pool/live@today | gzip | ssh backup "gunzip | zfs receive pool/backup"
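
If the dataset is compressed at rest, you can drop the gzip leg entirely. A minimal sketch, assuming an OpenZFS build with compressed-send support (zfs send -c) and compression enabled on the source dataset:

# One-time setup: compress the source dataset at rest
zfs set compression=lz4 pool/live
# -c sends blocks as stored on disk (compressed), so no gzip/gunzip leg
zfs send -c -i pool/live@yesterday pool/live@today | ssh backup "zfs receive pool/backup"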

Remember these implementation details:

  • Temporary basis files need free disk space; point mktemp somewhere larger than /tmp if your files are big
  • Compression level trades CPU for ratio (gzip -1 is fastest, -9 smallest)
  • For very large files, consider splitting into chunks with split so each chunk can be synced and recompressed independently
  • Monitor open file descriptors and temp-space headroom in high-volume environments (quick checks after this list)
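
A few one-liners for that last bullet; adjust paths to wherever your basis files live:

# Quick checks for descriptor and temp-space headroom
ulimit -n                  # per-process file-descriptor limit
lsof -u "$USER" | wc -l    # rough count of currently open descriptors
df -h /tmp                 # free space for temporary basis files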

To recap the underlying dilemma: when maintaining compressed backups (.gz files) while syncing from uncompressed source files, rsync works best on uncompressed data, but storage constraints demand compression. rsync -z is no help here (it only compresses the network stream), and even pre-compressed gzip --rsyncable archives delta-transfer worse than plain files.
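
That said, --rsyncable is worth knowing: where your gzip build supports it (upstream since gzip 1.7, and long patched into Debian-family builds), it resynchronizes the compressed stream periodically so a source change only ripples a bounded distance:

# Limit how far a source change ripples through the compressed output
gzip --rsyncable -c /live/foo >/backup/foo.gz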

Here's a script that handles the compression/decompression transparently during transfer:

#!/bin/bash
SOURCE_DIR="/path/to/source"
DEST_DIR="/path/to/backup"

# -print0 / read -d '' handles filenames with spaces or newlines
find "$SOURCE_DIR" -type f -print0 | while IFS= read -r -d '' src_file; do
    rel_path="${src_file#"$SOURCE_DIR"}"
    dest_file="$DEST_DIR${rel_path}.gz"

    # Create destination directory structure
    mkdir -p "$(dirname "$dest_file")"

    if [ -f "$dest_file" ]; then
        # Decompress the old backup, delta-sync onto it, recompress
        tmp=$(mktemp)
        gunzip -c "$dest_file" > "$tmp"
        rsync --inplace --no-whole-file "$src_file" "$tmp"
        gzip -c "$tmp" > "$dest_file"
        rm "$tmp"
    else
        # First run for this file: compress the source directly
        gzip -c "$src_file" > "$dest_file"
    fi
done

This approach:

  • Maintains rsync's delta-transfer efficiency
  • Only requires temporary space for one file at a time
  • Preserves compression ratios in the final archive

For large-scale deployments, consider this Python solution using tempfile.NamedTemporaryFile:

import gzip
import os
import shutil
import subprocess
import tempfile

def sync_with_compression(src, dst):
    with tempfile.NamedTemporaryFile() as tmp:
        if os.path.exists(dst):
            # Decompress the existing backup to use as the delta basis
            with gzip.open(dst, 'rb') as f_in:
                shutil.copyfileobj(f_in, tmp)
            tmp.flush()
            # Delta-sync the source onto the decompressed basis
            subprocess.run(
                ['rsync', '--inplace', '--no-whole-file', src, tmp.name],
                check=True)
            compress_from = tmp.name
        else:
            # No previous backup: compress the source directly
            compress_from = src

        # Compress the updated data to the final destination
        with open(compress_from, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
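
Example invocation, with hypothetical paths:

sync_with_compression('/live/data.db', '/backup/data.db.gz')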

While this method works well for individual files, consider these alternatives for specific scenarios:

  • ZFS/BTRFS: Use filesystem-level compression
  • Large directories: Implement parallel processing
  • Network transfer: Combine with ssh pipes (sketched below)
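
A minimal sketch of that last item, assuming a backup host alias and the /live/foo path from earlier: delta-sync to an uncompressed basis copy on the remote side, then recompress there, so only deltas cross the network.

# Sync against the remote basis copy, then recompress remotely
rsync --inplace --no-whole-file /live/foo backup:/tmp/foo.basis
# Keep the basis file so the next run can delta against it
ssh backup 'gzip -c /tmp/foo.basis >/backup/foo.gz'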