Many sysadmins face this common backup challenge: You need compressed archives for storage efficiency, but rsync works best with uncompressed files. The standard approaches all have drawbacks:
- Compressing files before rsync runs hurts delta-transfer efficiency, and rsync -z only compresses data in transit, not on disk
- Storing uncompressed files wastes precious disk space
- Manual compress-sync-decompress workflows are impractical at scale
Here's a robust method using named pipes (FIFOs) to create a compression-transparent rsync tunnel:
#!/bin/bash
# Create FIFOs for our pipeline
mkfifo /tmp/rsync_in /tmp/rsync_out
# Compression worker: write the new archive under a temporary name so the
# existing backup stays readable while it is being streamed out
gzip -c </tmp/rsync_in >/backup/foo.gz.new &
# Decompression worker: feed the previous backup (if any) into the pipeline
if [[ -f /backup/foo.gz ]]; then
    gunzip -c /backup/foo.gz >/tmp/rsync_out &
else
    cat /dev/null >/tmp/rsync_out &   # no previous backup: deliver EOF only
fi
# Perform the rsync transfer
rsync --progress --inplace --no-whole-file \
--temp-dir=/tmp \
/live/foo \
/tmp/rsync_out >/tmp/rsync_in
# Wait for the workers, swap in the new archive, then clean up
wait
mv /backup/foo.gz.new /backup/foo.gz
rm /tmp/rsync_in /tmp/rsync_out
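Restoring is simply the reverse of the compression step; for the /backup/foo.gz example above:
# Stream the compressed backup back over the live file
gunzip -c /backup/foo.gz > /live/foo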
For production use, consider these enhancements:
# Parallel processing with xargs (the filename is passed as $1 to avoid quoting problems)
find /live -type f -print0 | xargs -0 -P$(nproc) -I{} bash -c '
    file="$1"
    backup="/backup/$(basename "$file").gz"
    tmp="/tmp/$(basename "$file").pipe"
    mkfifo "$tmp"
    # Write the new archive under a temporary name so the old one stays readable
    gzip -c <"$tmp" >"$backup.new" &
    if [[ -f "$backup" ]]; then
        gunzip -c "$backup"
    else
        cat /dev/null
    fi | rsync --inplace "$file" - >"$tmp"
    wait
    mv "$backup.new" "$backup"
    rm "$tmp"
' _ {}
If you're using ZFS, consider this more elegant solution:
# On source:
zfs snapshot pool/live@$(date +%Y%m%d)
zfs send pool/live@$(date +%Y%m%d) | gzip | ssh backup "gunzip | zfs receive pool/backup"
# With incremental:
zfs send -i pool/live@yesterday pool/live@today | gzip | ssh backup "gunzip | zfs receive pool/backup"
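Tying the two together, a minimal sketch of a daily incremental job, assuming date-stamped snapshots as above, that yesterday's snapshot has already been sent to the backup pool, and GNU date for the relative-date syntax:
# Daily incremental: snapshot today, send only the delta since yesterday
today=$(date +%Y%m%d)
yesterday=$(date -d yesterday +%Y%m%d)   # GNU date; adjust on BSD systems
zfs snapshot pool/live@"$today"
zfs send -i pool/live@"$yesterday" pool/live@"$today" | gzip | ssh backup "gunzip | zfs receive pool/backup"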
Remember these implementation details:
- rsync's --temp-dir (here /tmp) needs enough free space for the file being synced; the FIFOs themselves hold no file data on disk
- Compression level impacts CPU usage (adjust with gzip -1 to -9)
- For large files, consider splitting into chunks with split (see the sketch after this list)
- Monitor open file descriptors in high-volume environments
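As a rough sketch of the chunked variant, assuming a hypothetical large file /live/bigfile, 1 GiB chunks, and GNU split:
# Split into fixed-size chunks and compress each one separately, so a
# localized change only forces recompression of the chunks it touches
mkdir -p /backup/bigfile.d
split -b 1G --numeric-suffixes /live/bigfile /backup/bigfile.d/chunk.
for chunk in /backup/bigfile.d/chunk.[0-9][0-9]; do
    gzip -6 -f "$chunk"   # -1 = fastest, -9 = smallest
done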
When maintaining compressed backups (.gz files) while syncing from uncompressed source files, we face the same dilemma: rsync works best with uncompressed data, but storage constraints demand compression, and traditional approaches like rsync -z or files pre-compressed with gzip --rsyncable reduce sync efficiency.
Here's a script that handles the compression/decompression transparently during transfer:
#!/bin/bash
SOURCE_DIR="/path/to/source"
DEST_DIR="/path/to/backup"
find "$SOURCE_DIR" -type f | while read -r src_file; do
rel_path="${src_file#$SOURCE_DIR}"
dest_file="$DEST_DIR${rel_path}.gz"
# Create destination directory structure
mkdir -p "$(dirname "$dest_file")"
# Decompress existing backup if exists
if [ -f "$dest_file" ]; then
gunzip -c "$dest_file" > "/tmp/tempfile"
rsync -a "$src_file" "/tmp/tempfile"
gzip -c "/tmp/tempfile" > "$dest_file"
rm "/tmp/tempfile"
else
gzip -c "$src_file" > "$dest_file"
fi
done
This approach:
- Maintains rsync's delta-transfer efficiency
- Only requires temporary space for one file at a time
- Preserves compression ratios in the final archive
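A quick spot-check that a synced archive still matches its source, using the same SOURCE_DIR/DEST_DIR layout (the file name is illustrative):
# cmp is silent and exits 0 when the decompressed backup matches the source
src_file="/path/to/source/etc/app.conf"
dest_file="/path/to/backup/etc/app.conf.gz"
gunzip -c "$dest_file" | cmp - "$src_file" && echo "backup matches source"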
For large-scale deployments, consider this Python solution using tempfile.NamedTemporaryFile:
import gzip
import os
import tempfile
import shutil
import subprocess

def sync_with_compression(src, dst):
    with tempfile.NamedTemporaryFile() as tmp:
        # Handle existing compressed backup: decompress it so rsync can
        # delta against the previous contents
        had_backup = os.path.exists(dst)
        if had_backup:
            with gzip.open(dst, 'rb') as f_in:
                shutil.copyfileobj(f_in, tmp)
            tmp.flush()
            subprocess.run(['rsync', '-a', src, tmp.name], check=True)
        # Compress the up-to-date copy to the final destination
        with gzip.open(dst, 'wb') as f_out:
            with open(tmp.name if had_backup else src, 'rb') as f_in:
                shutil.copyfileobj(f_in, f_out)
While this method works well for individual files, consider these alternatives for specific scenarios:
- ZFS/BTRFS: Use filesystem-level compression
- Large directories: Implement parallel processing
- Network transfer: Combine with ssh pipes, as sketched below
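For the ssh case, a minimal sketch that keeps the archive compressed on a remote host (called backup here, reusing the /backup/foo.gz layout from earlier) while still getting rsync's delta transfer over the network:
# Decompress the remote archive so rsync has an uncompressed basis file,
# run the delta transfer over ssh, then recompress on the remote side
ssh backup 'test -f /backup/foo.gz && gunzip -c /backup/foo.gz > /backup/foo || true'
rsync -az --inplace /live/foo backup:/backup/foo
ssh backup 'gzip -f /backup/foo'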