When dealing with large directories (like our example with ~12,000 files in /junk), performing sequential rsync operations to multiple destinations becomes painfully inefficient. The main bottleneck occurs during the filelist generation phase, where rsync must:
- Scan the entire directory structure
- Compare file metadata
- Build the delta transfer list
# Traditional approach (inefficient)
rsync -Pav /junk user@host1:/backup
rsync -Pav /junk user@host2:/backup
rsync -Pav /junk user@host3:/backup
The most reliable method involves creating a filelist once and reusing it:
# Generate filelist once
find /junk -type f > filelist.txt
# Parallel execution using GNU parallel
parallel -j 3 rsync -Pav --files-from=filelist.txt / user@{}:/backup ::: host1 host2 host3
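A note on paths: rsync interprets each --files-from entry relative to the source directory given on the command line, stripping any leading slashes, which is why the commands above pass / as the source (files then land under /backup/junk/ on each destination). An equivalent sketch using paths relative to /junk, so files land directly under /backup:
# Generate the list relative to /junk and use /junk as the source root
( cd /junk && find . -type f ) > filelist.txt
rsync -Pav --files-from=filelist.txt /junk user@host1:/backup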
Key advantages:
- Single directory scan for all transfers
- Flexible parallel execution control
- Option to manually edit the filelist if needed (example below)
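For example, entries can be pruned with ordinary text tools before the transfer; the cache pattern below is hypothetical:
# Drop a subtree from the filelist before transferring
grep -v '^/junk/cache/' filelist.txt > filelist.filtered.txt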
For large-scale deployments, consider setting up rsync in daemon mode:
# /etc/rsyncd.conf sample
[junk_backup1]
path = /backup/host1
auth users = backup_user
secrets file = /etc/rsyncd.secrets
[junk_backup2]
path = /backup/host2
auth users = backup_user
secrets file = /etc/rsyncd.secrets
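The referenced secrets file holds one user:password pair per line and must not be world-readable; the password below is a placeholder:
# /etc/rsyncd.secrets (chmod 600)
backup_user:changeme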
Then run one transfer per module; rsync accepts only a single destination per invocation (backup_host below is a placeholder for the machine running the daemon):
# One rsync per daemon module
rsync -Pav --files-from=filelist.txt / backup_host::junk_backup1
rsync -Pav --files-from=filelist.txt / backup_host::junk_backup2
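To avoid an interactive password prompt, rsync's client-side --password-file option can supply the secret; the path here is an assumption:
# File contains only backup_user's password, mode 600
rsync -Pav --password-file=/etc/rsync.pass --files-from=filelist.txt / backup_host::junk_backup1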
For maximum performance on systems with tmpfs:
# Create a memory-backed mount for the filelist (requires root)
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=128M tmpfs /mnt/ramdisk
find /junk -type f > /mnt/ramdisk/filelist.txt
# Every transfer reads the same tmpfs-backed filelist
for host in host1 host2 host3; do
    rsync -Pav --files-from=/mnt/ramdisk/filelist.txt / "user@$host:/backup" &
done
wait
umount /mnt/ramdisk
| Method | Directory Scan | Transfer Time | Memory Usage |
|---|---|---|---|
| Sequential | 3x | Slowest | Low |
| --files-from | 1x | Fastest | Medium |
| Daemon Mode | 1x | Fast | High |
For most use cases, the --files-from approach (the first solution above) offers the best balance of simplicity and performance: it eliminates repeated directory scans while preserving rsync's delta-transfer capabilities.
When dealing with large directories (especially on slow storage devices), repeatedly scanning the same file structure for multiple rsync operations becomes a significant bottleneck. The initial filelist generation phase often takes longer than the actual data transfer.
While the sequential approach works:
rsync -Pav /source user@host1:/dest
rsync -Pav /source user@host2:/dest
rsync -Pav /source user@host3:/dest
We can optimize this step by step. First, generate the filelist once and reuse it for every destination:
# Generate filelist once
find /source -type f > filelist.txt
# Use the same filelist for multiple destinations
rsync -Pav --files-from=filelist.txt / user@host1:/dest
rsync -Pav --files-from=filelist.txt / user@host2:/dest
rsync -Pav --files-from=filelist.txt / user@host3:/dest
For true parallel transfers while maintaining a single file scan:
# Single scan, parallel transfers
find /source -type f > filelist.txt
parallel -j 3 rsync -Pav --files-from=filelist.txt / user@{}:/dest ::: host1 host2 host3
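GNU parallel can also record per-host results, which makes failed transfers easy to spot and retry; the log path is arbitrary:
# --joblog writes exit status and timing for each host's transfer
parallel -j 3 --joblog /tmp/rsync_jobs.log \
    rsync -Pav --files-from=filelist.txt / user@{}:/dest ::: host1 host2 host3
Running parallel --retry-failed --joblog /tmp/rsync_jobs.log afterwards reruns only the jobs that failed.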
For backup scenarios where you want unchanged files hardlinked against a previous backup on each destination:
rsync -Pav --link-dest=/previous_backup --files-from=filelist.txt / user@host1:/current_backup
rsync -Pav --link-dest=/previous_backup --files-from=filelist.txt / user@host2:/current_backup
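Building on the same idea, a dated-snapshot sketch (the directory layout, hostnames, and GNU date usage are assumptions): unchanged files are hardlinked against the previous day's snapshot, so each snapshot stores only new or modified data.
# Hardlink unchanged files against yesterday's snapshot on each destination
prev=$(date -d yesterday +%F)
today=$(date +%F)
for host in host1 host2; do
    rsync -Pav --link-dest="/backups/$prev" --files-from=filelist.txt / "user@$host:/backups/$today/"
done
Note that an absolute --link-dest path is resolved on the receiving side, against the destination machine's filesystem.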
- The --files-from method typically reduces total runtime by 40-60% for large directories
- Parallel execution adds network bandwidth overhead but can complete transfers faster
- For very large filelists (>100K files), consider splitting into chunks (see the sketch below)
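One way to chunk a large filelist with standard tools (the chunk size is an arbitrary choice):
# Split into 10,000-line chunks and transfer each chunk in turn
split -l 10000 filelist.txt chunk_
for f in chunk_*; do
    rsync -Pav --files-from="$f" / user@host1:/dest
done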
When scripting these solutions, implement proper error checking:
# Abort if the filelist cannot be generated
if ! find /source -type f > filelist.txt; then
    echo "Filelist generation failed" >&2
    exit 1
fi
# Report, but do not abort on, per-host failures
for host in host1 host2 host3; do
    if ! rsync -Pav --files-from=filelist.txt / "user@$host:/dest"; then
        echo "rsync to $host failed" >&2
    fi
done
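If the script should report overall success or failure to a caller such as cron, a small variation tracks a flag and exits non-zero when any host failed:
# Variation: remember per-host failures and propagate a non-zero exit status
failed=0
for host in host1 host2 host3; do
    if ! rsync -Pav --files-from=filelist.txt / "user@$host:/dest"; then
        echo "rsync to $host failed" >&2
        failed=1
    fi
done
exit "$failed"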