When dealing with massive file transfers (2.2 million files totaling 250GB), traditional rsync becomes painfully slow. At a rate of 700K files in 6 hours, this simply doesn't scale for modern infrastructure needs. The single-threaded nature of rsync creates a bottleneck that becomes apparent with large datasets.
Here are some powerful alternatives that leverage parallel processing:
1. fpart + rsync
A clever combination that splits files into partitions for parallel rsync operations:
# Install fpart
sudo apt-get install fpart
# Create partition lists of ~1000 files each; run from the source directory so the
# lists contain relative paths (fpart writes them as /tmp/part.N)
cd /source/path && fpart -f 1000 -o /tmp/part .
# Parallel rsync execution: one process per partition list, 8 at a time
ls /tmp/part.* | xargs -P 8 -I % rsync -a --files-from=% /source/path/ /destination/path/
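The fpart package also ships fpsync, which wraps this partition-and-parallel-rsync workflow in a single command; a minimal sketch, assuming the -n (concurrent jobs) and -f (files per job) flags of current fpsync versions:
# 8 concurrent rsync jobs, at most 1000 files per job
fpsync -n 8 -f 1000 /source/path/ /destination/path/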
2. lsyncd
A live syncing daemon (inotify + rsync) that works well for continuous synchronization:
# Sample lsyncd configuration
settings {
    logfile    = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    maxProcesses = 8
}
sync {
    default.rsync,
    source = "/source/path",
    target = "user@remote:/destination/path",
    rsync = {
        archive    = true,
        compress   = true,
        whole_file = false
    }
}
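Assuming the configuration is saved where the distribution packages expect it (on Debian/Ubuntu this is typically /etc/lsyncd/lsyncd.conf.lua; the path may differ elsewhere), the daemon can be started directly or via its service unit:
# Start lsyncd with an explicit config file (initial full rsync, then event-driven syncs)
sudo lsyncd /etc/lsyncd/lsyncd.conf.lua
# Or run it as a service
sudo systemctl enable --now lsyncd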
3. Python multiprocessing wrapper
A small script that runs several rsync processes in parallel, one per pre-built file list:
#!/usr/bin/env python3
import subprocess
from multiprocessing import Pool

def sync_partition(partition):
    # One rsync per partition list; entries are treated as relative to the source (/),
    # so the full directory structure is recreated under /destination
    subprocess.run(
        ["rsync", "-avz", f"--files-from={partition}", "/", "user@remote:/destination"],
        check=True,
    )

if __name__ == "__main__":
    # Partition lists built beforehand, e.g. /tmp/part0 .. /tmp/part7
    partitions = [f"/tmp/part{i}" for i in range(8)]
    with Pool(8) as p:
        p.map(sync_partition, partitions)
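The script assumes the eight lists /tmp/part0 .. /tmp/part7 already exist; one way to build them (placeholder paths, GNU split) is:
# Build one master file list, then split it into 8 roughly equal line-based chunks: /tmp/part0 .. /tmp/part7
find /source/path -type f > /tmp/allfiles
split -d -a 1 -n l/8 /tmp/allfiles /tmp/part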
When implementing multi-threaded sync solutions, consider the following (quick ways to check each are shown after the list):
- Network bandwidth limitations
- Disk I/O throughput
- CPU overhead of compression
- Filesystem limitations (inode handling)
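A few standard commands give a quick read on each of these before picking a job count (iostat is part of the sysstat package; eth0 is a placeholder interface name):
nproc                     # CPU cores available for parallel rsync and compression
df -i /destination/path   # inode usage on the target filesystem
iostat -xz 5              # disk utilization and latency, refreshed every 5 seconds
iftop -i eth0             # live network throughput on the transfer interface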
For Windows-to-Linux sync, DeltaCopy (a Windows front-end for rsync) can push into an rsync daemon on the Linux side:
# Server setup (Linux): run rsync in daemon mode; it needs an /etc/rsyncd.conf module (see below)
sudo apt-get install rsync
sudo systemctl enable rsync
sudo systemctl start rsync
# Client setup (Windows): install DeltaCopy, create a profile that points at the server's
# rsync module, and schedule it. Each profile runs a single rsync, so parallelism comes
# from defining several profiles over different subtrees.
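A minimal /etc/rsyncd.conf module for the server side might look like this (module name, path, and account are placeholders; adjust to your environment):
# /etc/rsyncd.conf
[data]
    path = /destination/path
    read only = no
    uid = backup
    gid = backup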
For any solution, add some form of progress tracking:
# Destination size plus top-level entry count every 60s; du walks the whole tree,
# so widen the interval if this becomes expensive at 2.2M files
watch -n 60 "du -sh /destination/path; ls -1 /destination/path | wc -l"
To recap the arithmetic: 2.2 million files (250GB) at 700K files per 6 hours works out to roughly 18-19 hours of single-threaded rsync, which is unacceptable for production environments. Here are three battle-tested alternatives that support parallel transfers, with a rough comparison table below:
# GNU Parallel + rsync (basic implementation)
# --pipe -N 1000 feeds 1000 paths per rsync call (one rsync/ssh per file would be far too slow);
# the absolute paths from find are recreated under /destination/
find /source -type f | parallel -j 8 --pipe -N 1000 rsync -a --files-from=- / user@remote:/destination/
# lsyncd configuration example
settings {
    insist = true,
    maxProcesses = 8
}
sync {
    default.rsync,
    source = "/data/",
    target = "remote:/backup/",
    rsync = {
        compress   = true,
        archive    = true,
        whole_file = false
    }
}
| Tool | Threads | 2.2M Files ETA | Pros |
|---|---|---|---|
| rsync (default) | 1 | ~18 hours | Reliable, widely available |
| GNU Parallel + rsync | Custom (e.g. 8) | ~4 hours | Flexible, uses existing rsync |
| lsyncd | 8 | ~3 hours | Real-time, daemon mode |
| fpart + rsync | Custom | ~3.5 hours | Handles huge directories well |
For extreme cases, combine fpart with parallel rsync:
# Step 1: Create partition lists (run from the source so the lists hold relative paths;
# fpart writes them as /tmp/parts.N)
cd /source/path && fpart -f 10000 -o /tmp/parts .
# Step 2: Parallel transfer, one rsync per list
ls /tmp/parts.* | parallel -j 8 \
  rsync -a --files-from={} /source/path/ user@remote:/dest/
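Once the parallel jobs finish, a single conventional rsync pass is a cheap way to catch anything a partition run missed (add --delete only if the destination should mirror the source exactly):
# Final consistency pass; with the data already in place this is mostly metadata comparison
rsync -a /source/path/ user@remote:/dest/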
- Monitor network saturation (iftop/nload)
- Match thread counts to CPU cores (nproc)
- Use --bwlimit during business hours
- Consider cheaper comparison modes: --size-only skips any file whose size already matches, ignoring timestamps and checksums (combined example below)
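Putting the last two points together, a throttled size-only pass during business hours could look like this (the 50000 KiB/s cap is an arbitrary example value):
# Limit bandwidth to ~50 MB/s and skip files whose size already matches
rsync -a --bwlimit=50000 --size-only /source/path/ user@remote:/dest/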