How to Efficiently rsync to Multiple Destinations Using a Single Filelist


When dealing with large directories (like our example with ~12,000 files in /junk), performing sequential rsync operations to multiple destinations becomes painfully inefficient. The main bottleneck occurs during the filelist generation phase, where rsync must:

  • Scan the entire directory structure
  • Compare file metadata
  • Build the delta transfer list

# Traditional approach (inefficient)
rsync -Pav /junk user@host1:/backup
rsync -Pav /junk user@host2:/backup
rsync -Pav /junk user@host3:/backup

The most reliable method involves creating a filelist once and reusing it:

# Generate filelist once
find /junk -type f > filelist.txt

# Parallel execution using GNU parallel
parallel -j 3 rsync -Pav --files-from=filelist.txt / user@{}:/backup ::: host1 host2 host3

Key advantages:

  • Single directory scan for all transfers
  • Flexible parallel execution control
  • Option to manually edit filelist if needed
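Because the filelist is plain text, it can be pruned before any transfer runs. A minimal sketch of that manual-editing option, using a throwaway demo directory and a hypothetical rule that excludes *.tmp scratch files (both are assumptions, not part of the original setup):

```shell
# Build a filelist, then prune entries before any transfer.
# /tmp/junk_demo and the .tmp rule are illustrative assumptions.
mkdir -p /tmp/junk_demo
touch /tmp/junk_demo/keep.dat /tmp/junk_demo/scratch.tmp
find /tmp/junk_demo -type f > /tmp/filelist.txt

# Drop temporary files from the list; rsync never sees them.
grep -v '\.tmp$' /tmp/filelist.txt > /tmp/filelist.pruned.txt
cat /tmp/filelist.pruned.txt
```

The pruned list is then passed to --files-from exactly as above.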

For large-scale deployments, consider setting up rsync in daemon mode:

# /etc/rsyncd.conf sample
[junk_backup1]
    path = /backup/host1
    auth users = backup_user
    secrets file = /etc/rsyncd.secrets

[junk_backup2]
    path = /backup/host2
    auth users = backup_user
    secrets file = /etc/rsyncd.secrets

Then run one transfer per module, since rsync accepts only a single destination per invocation:

rsync -Pav --files-from=filelist.txt / junk::junk_backup1
rsync -Pav --files-from=filelist.txt / junk::junk_backup2
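On the client side, the daemon's auth users setting means each transfer must authenticate. A sketch assuming a client password file holding backup_user's secret (the filename and password are placeholders, and the rsync call is guarded because no daemon is running in this sketch):

```shell
# Create a client-side password file; rsync refuses it unless it
# is unreadable by other users. All values are placeholders.
echo 's3cret' > /tmp/rsync.pass
chmod 600 /tmp/rsync.pass

# Guarded with || because no daemon is reachable in this sketch.
rsync -Pav --password-file=/tmp/rsync.pass \
    --files-from=filelist.txt / backup_user@junk::junk_backup1 \
    || echo "daemon unreachable (expected outside a real deployment)"
```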

For maximum performance on systems with tmpfs:

# Create memory-backed filelist
mount -t tmpfs -o size=128M tmpfs /mnt/ramdisk
find /junk -type f > /mnt/ramdisk/filelist.txt

# All transfers read the same memory-backed filelist
for host in host1 host2 host3; do
    rsync -Pav --files-from=/mnt/ramdisk/filelist.txt / user@$host:/backup &
done
wait
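One caveat with the loop above: a bare wait discards each background job's exit status, so a failed host goes unnoticed. A sketch of per-host status tracking, where do_sync stands in for the real rsync invocation and host2's failure is simulated purely for illustration:

```shell
# do_sync is a stand-in (assumption) for:
#   rsync -Pav --files-from=/mnt/ramdisk/filelist.txt / "user@$1:/backup"
do_sync() { [ "$1" != host2 ]; }   # simulated failure on host2

# Remember each background job's PID per host.
declare -A pids
for host in host1 host2 host3; do
    do_sync "$host" &
    pids[$host]=$!
done

# wait on each PID individually to recover its exit status.
failed=""
for host in host1 host2 host3; do
    wait "${pids[$host]}" || failed="$failed $host"
done
echo "failed hosts:$failed"
```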
Method         Directory Scan   Transfer Time   Memory Usage
Sequential     3x               Slowest         Low
--files-from   1x               Fastest         Medium
Daemon Mode    1x               Fast            High

For most use cases, the --files-from approach provides the best balance of simplicity and performance: it reduces the directory-scanning overhead while maintaining rsync's powerful delta-transfer capabilities.


When dealing with large directories (especially on slow storage devices), repeatedly scanning the same file structure for multiple rsync operations becomes a significant bottleneck. The initial filelist generation phase often takes longer than the actual data transfer.

While the sequential approach works:

rsync -Pav /source user@host1:/dest
rsync -Pav /source user@host2:/dest
rsync -Pav /source user@host3:/dest

We can optimize this by generating the filelist once and reusing it for every destination:

# Generate filelist once
find /source -type f > filelist.txt

# Use the same filelist for multiple destinations
rsync -Pav --files-from=filelist.txt / user@host1:/dest
rsync -Pav --files-from=filelist.txt / user@host2:/dest
rsync -Pav --files-from=filelist.txt / user@host3:/dest

For true parallel transfers while maintaining a single file scan:

# Single scan, parallel transfers
find /source -type f > filelist.txt
parallel -j 3 rsync -Pav --files-from=filelist.txt / user@{}:/dest ::: host1 host2 host3

For backup scenarios where you want to maintain hardlinks between destinations:

rsync -Pav --link-dest=/previous_backup --files-from=filelist.txt / user@host1:/current_backup
rsync -Pav --link-dest=/previous_backup --files-from=filelist.txt / user@host2:/current_backup
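--link-dest pairs naturally with date-stamped snapshot directories: files unchanged since the previous snapshot become hard links into it and cost no additional space. A local sketch with illustrative paths (requires rsync and GNU stat):

```shell
# All paths and dates here are placeholders for illustration.
src=/tmp/ld_src; base=/tmp/ld_backups
mkdir -p "$src" "$base"
echo v1 > "$src/file.txt"

# First snapshot is a full copy; the second hard-links unchanged files.
rsync -a "$src/" "$base/2024-01-01/"
rsync -a --link-dest="$base/2024-01-01" "$src/" "$base/2024-01-02/"

# Identical inode numbers confirm the hard link.
stat -c %i "$base/2024-01-01/file.txt" "$base/2024-01-02/file.txt"
```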
A few practical notes:

  • The --files-from method typically reduces total runtime by 40-60% for large directories
  • Parallel execution adds network bandwidth overhead but can complete transfers faster
  • For very large filelists (>100K files), consider splitting into chunks
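The chunking mentioned above can be done with split. A sketch using a synthetic ten-entry filelist and a placeholder destination; the echo keeps the transfer loop a dry run here:

```shell
# Synthetic filelist; in practice this comes from `find /source`.
seq -f '/source/file%g' 1 10 > /tmp/biglist.txt

# l/4 splits by whole lines, so no path is broken across chunks.
split -n l/4 /tmp/biglist.txt /tmp/chunk_

# echo keeps this a dry sketch; drop it to actually transfer.
for chunk in /tmp/chunk_*; do
    echo rsync -Pav --files-from="$chunk" / user@host1:/dest
done
```

Chunks can then be transferred sequentially, or handed to parallel as shown earlier.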

When scripting these solutions, implement proper error checking:

if ! find /source -type f > filelist.txt; then
    echo "Filelist generation failed" >&2
    exit 1
fi

for host in host1 host2 host3; do
    if ! rsync -Pav --files-from=filelist.txt / "user@$host:/dest"; then
        echo "rsync to $host failed" >&2
    fi
done
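The loop above reports failures, but the script still exits 0 overall. A sketch that propagates an aggregate status, with sync_one standing in for the rsync call (host3's failure is simulated for illustration):

```shell
# sync_one is a stand-in (assumption) for:
#   rsync -Pav --files-from=filelist.txt / "user@$1:/dest"
sync_one() { [ "$1" != host3 ]; }   # simulated failure on host3

status=0
for host in host1 host2 host3; do
    if ! sync_one "$host"; then
        echo "rsync to $host failed" >&2
        status=1   # remember that at least one host failed
    fi
done
echo "aggregate status: $status"
```

A real script would end with exit "$status" so callers and cron can detect the partial failure.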