When dealing with large file transfers between Linux servers, traditional methods like tar + scp
can become inefficient due to:
- Single-threaded compression overhead
- Network transfer without compression optimization
- Disk I/O bottlenecks during archive creation
Here are the most effective methods I've benchmarked for transferring large directories between CentOS systems:
# Method 1: Parallelized rsync (no intermediate storage)
rsync -avz --progress --compress-level=3 -e ssh /path/to/images/ user@newserver:/target/path/
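A useful property of rsync worth calling out: if the connection drops mid-transfer, rerunning the same command copies only what is still missing. Adding --partial also keeps partially transferred files so large images resume instead of restarting (a sketch reusing the placeholder paths above):
# Resume-friendly variant: rerun after an interruption and only the delta moves
rsync -avz --partial --progress --compress-level=3 -e ssh /path/to/images/ user@newserver:/target/path/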
# Method 2: Network-optimized tar pipeline
tar cf - images | pv | ssh user@newserver "tar xf - -C /target/directory"
# Method 3: Multi-threaded compression with pigz
tar cf - images | pigz -c -p 8 | ssh user@newserver "unpigz | tar xf - -C /target"
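Rather than hardcoding 8 threads, a reasonable default is one thread per core on the sending machine; nproc reports that on CentOS (a sketch, same placeholder paths):
# Match pigz threads to the available cores instead of hardcoding a count
tar cf - images | pigz -c -p "$(nproc)" | ssh user@newserver "unpigz | tar xf - -C /target"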
| Method | 55 GB Transfer Time | CPU Usage | Network Efficiency |
|---|---|---|---|
| Traditional tar + scp | ~120 minutes | Single-core | Uncompressed |
| rsync with compression | ~45 minutes | Moderate | zlib level 3 |
| pigz pipeline (8 threads) | ~32 minutes | High | Parallel gzip |
For mission-critical transfers:
# Using mbuffer for network smoothing
tar cf - images | mbuffer -m 2G | ssh user@newserver "mbuffer -m 2G | tar xf - -C /target"
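Note that pv, pigz, and mbuffer are not in the base CentOS repositories; they normally come from EPEL. A minimal install sketch for a yum-based CentOS system:
# Enable EPEL, then install the pipeline tools used in these examples
yum install -y epel-release
yum install -y pv pigz mbuffer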
# Bandwidth throttling when needed (--bwlimit is in KiB/s, so 50000 is roughly 50 MB/s)
rsync -avz --bwlimit=50000 --partial /images/ user@newserver:/backup/
Always validate your transfers:
# Generate checksums on source (paths are recorded relative to the current directory)
find images/ -type f -exec md5sum {} + > source_checksums.md5
# Copy the checksum file over, then verify on the destination
scp source_checksums.md5 user@newserver:/target/
ssh user@newserver "cd /target && md5sum -c source_checksums.md5"
To understand why these methods win, look at what the conventional tar + scp approach actually does with a large directory (55GB in this case):
# This creates unnecessary I/O overhead
tar cvf imagesbackup.tar images
scp imagesbackup.tar user@newserver:/path/
The main drawbacks are:
- Double storage requirement (original + archive; see the space check after this list)
- Single-threaded transfer
- No compression during transfer
- No progress monitoring
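If you must use the intermediate-archive approach anyway, it's worth confirming first that the source filesystem can hold a second copy of the data (a sketch; paths are placeholders):
# Size of the payload vs. free space on the filesystem holding it
du -sh images
df -h .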
rsync is specifically designed for efficient file transfers with these advantages:
rsync -avz --progress -e ssh /path/to/images/ user@newserver:/path/to/destination/
Key parameters explained:
- -a : Archive mode (preserves permissions, ownership, timestamps)
- -v : Verbose output
- -z : Compression during transfer
- --progress : Shows transfer progress
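Before kicking off a multi-hour transfer, a dry run is cheap insurance: -n (--dry-run) lists what rsync would copy without moving any data:
# Preview the transfer; nothing is copied with -n
rsync -avzn --progress -e ssh /path/to/images/ user@newserver:/path/to/destination/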
For maximum speed with large image collections:
tar cf - images | pv | ssh user@newserver "cat > images.tar"
Or with compression (worthwhile for uncompressed image formats like TIFF; already-compressed JPEGs gain little):
tar czf - images | pv | ssh user@newserver "cat > images.tar.gz"
This approach:
- Eliminates intermediate file creation
- Provides progress via pv (pipe viewer)
- Compresses during transfer (the z option, in the second variant)
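Since these pipelines land an archive on the new server rather than an extracted tree, don't forget the unpack step (a sketch; /target is a placeholder):
# Unpack the transferred archive on the destination
ssh user@newserver "tar xzf images.tar.gz -C /target"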
For optimal transfer speed:
- Test network bandwidth (iperf3 must be listening in server mode on newserver; see the sketch after this list):
iperf3 -c newserver
- Consider a faster SSH cipher to cut encryption CPU overhead (AES-GCM is usually hardware-accelerated; the goal is lower CPU cost, not stronger encryption):
-c aes128-gcm@openssh.com
- Adjust TCP window size if transferring over a WAN
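Here is how the first two checks might look end to end (a sketch; assumes iperf3 is installed on both hosts and your OpenSSH build supports the AES-GCM cipher):
# Start a temporary iperf3 server on the destination, run the test, then stop it
ssh user@newserver "iperf3 -s -D"
iperf3 -c newserver
ssh user@newserver "pkill iperf3"
# Apply the faster cipher to the actual transfer
rsync -avz --progress -e "ssh -c aes128-gcm@openssh.com" /path/to/images/ user@newserver:/path/to/destination/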
Always verify transferred data integrity:
# On source:
find images -type f -exec md5sum {} + | sort > source_checksums.md5
# Checksum the destination copy over ssh, capturing the result locally:
ssh user@newserver "cd /target && find images -type f -exec md5sum {} + | sort" > dest_checksums.md5
# Both files now sit on the source host, so compare them locally:
diff -u source_checksums.md5 dest_checksums.md5