Optimizing Large-Scale File Sync Between Linux Servers: Multi-Threaded Rsync Alternatives


When dealing with massive file transfers (2.2 million files totaling 250GB), traditional rsync becomes painfully slow. At a rate of 700K files in 6 hours, this simply doesn't scale for modern infrastructure needs. The single-threaded nature of rsync creates a bottleneck that becomes apparent with large datasets.

Here are some powerful alternatives that leverage parallel processing:

1. fpart + rsync

A clever combination that splits files into partitions for parallel rsync operations:


# Install fpart
sudo apt-get install fpart

# Create partitions of 1000 files each (writes /tmp/part.0, /tmp/part.1, ...)
fpart -f 1000 -o /tmp/part /source/path

# Run up to 8 rsync processes, one per partition list
find /tmp/part* -type f | xargs -P 8 -I % sh -c 'rsync -a --files-from=% / /destination/path'
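
Each list that fpart writes (/tmp/part.0, /tmp/part.1, ...) contains absolute paths; rsync strips the leading slash from every --files-from entry and resolves it against the source argument (/ here), so the lists work unchanged.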

2. lsyncd

A live syncing daemon that works well for continuous synchronization:


-- Sample lsyncd configuration (Lua syntax)
settings {
    logfile    = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    maxProcesses = 8  -- cap on concurrent rsync processes
}

sync {
    default.rsync,
    source = "/source/path",
    target = "user@remote:/destination/path",
    rsync = {
        archive = true,
        compress = true,
        whole_file = false  -- keep rsync's delta-transfer algorithm
    }
}
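
To try the daemon with that configuration (assuming it is saved at /etc/lsyncd/lsyncd.conf.lua; the path is illustrative), run it in the foreground first so errors are visible:


# Foreground run for testing; drop -nodaemon once it works
lsyncd -nodaemon /etc/lsyncd/lsyncd.conf.lua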

3. parallel-rsync

A Python wrapper that parallelizes rsync operations:


#!/usr/bin/env python3
import subprocess
from multiprocessing import Pool

def sync_partition(partition):
    # One rsync per partition list; listed paths are resolved against "/"
    subprocess.run(
        ["rsync", "-az", f"--files-from={partition}", "/", "user@remote:/destination"],
        check=True,
    )

if __name__ == "__main__":
    # Partition lists as written by fpart: /tmp/part.0 ... /tmp/part.7
    partitions = [f"/tmp/part.{i}" for i in range(8)]
    with Pool(8) as p:
        p.map(sync_partition, partitions)
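
The script assumes the partition lists /tmp/part.0 through /tmp/part.7 already exist; one way to produce exactly eight of them is fpart's -n mode (paths here mirror the script and are illustrative):


# Split the source tree into 8 partition lists: /tmp/part.0 ... /tmp/part.7
fpart -n 8 -o /tmp/part /source/path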

When implementing multi-threaded sync solutions, consider the following (a sizing sketch follows the list):

  • Network bandwidth limitations
  • Disk I/O throughput
  • CPU overhead of compression
  • Filesystem limitations (inode handling)
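
As a starting point for sizing, here is a hedged sketch that derives the job count from nproc and splits an overall bandwidth cap across the streams; the ~100 MB/s budget and the /tmp/part* lists are assumptions, so adjust both to your environment:


# Assumption: one rsync per core is a reasonable starting point
JOBS=$(nproc)
# Split an assumed ~100 MB/s budget across streams (--bwlimit is in KB/s)
LIMIT=$(( 100000 / JOBS ))

find /tmp/part* -type f | xargs -P "$JOBS" -I % \
    sh -c "rsync -a --bwlimit=$LIMIT --files-from=% / /destination/path"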

For Windows-to-Linux sync in enterprise environments, DeltaCopy (a Windows front end for rsync) can push to an rsync daemon on the Linux side:


# Server setup (Linux): run rsync in daemon mode
# (requires a module definition in /etc/rsyncd.conf)
sudo apt-get install rsync
sudo systemctl enable rsync
sudo systemctl start rsync

# Client configuration (Windows) -- illustrative; DeltaCopy is normally
# configured through its GUI profiles, so check your version's CLI options
deltacopy.exe /server:linux-server /source:C:\data /dest:/data /parallel:8

For any solution, implement progress tracking:


# du and find both walk the tree; on millions of files poll sparingly
watch -n 60 "du -sh /destination/path; find /destination/path -type f | wc -l"
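
If your rsync is 3.1 or newer, you can also get cumulative progress from the transfer itself rather than polling the destination:


# Whole-transfer progress summary (requires rsync >= 3.1)
rsync -a --info=progress2 /source/path/ user@remote:/destination/path/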

4. GNU Parallel + rsync

GNU Parallel can batch file names from find into parallel rsync invocations:


# GNU Parallel + rsync: -X packs many file names into each rsync call
# (one connection per batch, not per file); --relative keeps the tree layout
cd /source && find . -type f | parallel -j 8 -X rsync -a --relative {} user@remote:/destination/
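
Even with batching, each rsync invocation opens its own ssh connection; connection multiplexing in ~/.ssh/config (the host alias "remote" is illustrative) amortizes the handshakes:


# ~/.ssh/config -- reuse one TCP connection across the parallel rsyncs
Host remote
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m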

Rough comparison for the 2.2M-file workload:

Tool              Threads           ETA (2.2M files)   Pros
rsync (default)   1                 ~18 hours          Reliable, widely available
GNU Parallel      custom (e.g. 8)   ~4 hours           Flexible, uses existing rsync
lsyncd            8                 ~3 hours           Real-time, daemon mode
fpart + rsync     custom            ~3.5 hours         Handles huge directories well

For extreme cases, combine fpart with parallel rsync:


# Step 1: Create file partitions (writes /tmp/parts.0, /tmp/parts.1, ...)
fpart -f 10000 -o /tmp/parts /source/path

# Step 2: Parallel transfer (fpart emits absolute paths; rsync strips the
# leading slash and resolves them against the "/" source argument)
find /tmp -maxdepth 1 -name 'parts.*' | parallel -j 8 \
    rsync -a --files-from={} / user@remote:/dest/

When tuning any of these pipelines:

  • Monitor network saturation (iftop/nload)
  • Adjust thread counts based on CPU cores (nproc)
  • Use --bwlimit during business hours
  • Consider checksum alternatives (--size-only for similar files; see the example below)
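
For that last point, a cheap verification sweep after the parallel transfer finishes might look like this (it skips files whose sizes already match, avoiding full checksums):


# Catch stragglers by comparing names and sizes only
rsync -a --size-only /source/path/ user@remote:/dest/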