Parallel Rsync Safety: Can Multiple Instances Transfer Same Directory Without Conflicts?


When dealing with large-scale data transfers, administrators often consider parallelizing rsync operations to accelerate the process. The fundamental question revolves around whether multiple rsync instances can safely operate on the same source and destination paths simultaneously when using rsyncd.

Let's examine what actually happens when several identical rsync instances run at once:


# Example of parallel rsync commands (potentially unsafe)
rsync -av /home/data/ backups@remote::Storage &
rsync -av /home/data/ backups@remote::Storage &
rsync -av /home/data/ backups@remote::Storage &

Running these concurrently raises several problems:

1. No Cross-Instance Locking: rsync takes no file-level locks, so every instance independently decides which files to transfer and the instances race on overlapping files
2. Redundant Work: each instance walks the full file list and computes its own checksums and deltas, so three identical runs roughly triple the scanning and I/O cost without dividing the work
3. Network Contention: multiple TCP connections compete for the same bandwidth

Instead of running identical rsync commands, consider these approaches:


# Split by directory structure
rsync -av /home/data/project1/ backups@remote::Storage/project1/ &
rsync -av /home/data/project2/ backups@remote::Storage/project2/ &

# Or use --include/--exclude patterns (the '*/' rule is required so rsync
# keeps descending into subdirectories before '--exclude=*' applies;
# -m prunes directories that end up empty)
rsync -avm --include='*/' --include='*.jpg' --exclude='*' /home/data/ backups@remote::Storage &
rsync -avm --include='*/' --include='*.mp4' --exclude='*' /home/data/ backups@remote::Storage &
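A more general way to partition the work, beyond hand-picking directories or extensions, is to split a file list into chunks and hand each chunk to its own rsync via --files-from. A minimal sketch, with throwaway paths under /tmp standing in for real data, and the rsync commands echoed rather than executed:

```shell
#!/bin/sh
# Sketch: deal a file list round-robin into N chunks, one rsync per chunk.
# /tmp/demo-src and the module name are stand-ins for real paths.
set -e
SRC=/tmp/demo-src
DEST='backups@remote::Storage'
N=3

mkdir -p "$SRC"
touch "$SRC/a" "$SRC/b" "$SRC/c" "$SRC/d"
rm -f /tmp/chunk.*

# Build a source-relative file list and split it into N chunk files.
( cd "$SRC" && find . -type f | awk -v n="$N" '{ print > ("/tmp/chunk." NR % n) }' )

# One rsync per chunk; --files-from reads paths relative to $SRC.
# (echo previews the commands; drop it to run the transfers for real)
for f in /tmp/chunk.*; do
    echo rsync -a --files-from="$f" "$SRC/" "$DEST" &
done
wait
```

Each chunk becomes an independent transfer with a disjoint file set, so the instances never touch the same destination file.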

Identical parallel runs are tolerable only in limited scenarios:
- Read-only source files that won't change during the transfer
- A destination filesystem with atomic renames (rsync stages each file under a temporary name and renames it into place, so a racing writer loses cleanly rather than corrupting the file)
- Small trees where each transfer completes before the runs actually overlap

To empirically test the impact:


time rsync -av /home/data/ backups@remote::Storage
# vs three identical concurrent runs (-N0 repeats the command
# without appending the ::: arguments to it)
time parallel -j 3 -N0 rsync -av /home/data/ backups@remote::Storage ::: 1 2 3

When managing large-scale backups, many sysadmins reach for parallel rsync execution as a performance booster. Concretely, the question is whether multiple instances of:

rsync -av /home/directory/ backups@1.1.1.1::Home

can safely run simultaneously against the same source and destination.

While rsync itself won't crash when running multiple instances, several critical factors affect the actual outcome:

  • File Locking: rsync takes no cross-instance locks; each file is staged under a temporary name and renamed into place, but concurrent instances can still race on metadata updates (timestamps, permissions) and duplicate each other's transfers
  • Network Saturation: multiple streams may compete for bandwidth unless throttled, e.g. with --bwlimit
  • rsyncd Considerations: the daemon's "max connections" setting and its lock-file behavior directly limit how many parallel clients are admitted
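One way to respect the daemon's connection limit from the client side, without GNU parallel, is xargs -P, which caps the number of concurrent jobs. In this sketch the /tmp directories are throwaway examples and echo stands in for the real rsync invocation:

```shell
# Cap concurrency at 4 jobs, safely below a daemon limit of 10 connections.
mkdir -p /tmp/mc-src/d1 /tmp/mc-src/d2 /tmp/mc-src/d3
ls -d /tmp/mc-src/*/ | xargs -P 4 -I{} echo rsync -a {} backups@1.1.1.1::Home/ > /tmp/mc-log
cat /tmp/mc-log
```

Dropping the echo turns the previewed commands into real transfers; xargs then never runs more than 4 rsync processes at once, so the daemon's limit is never hit.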

Instead of blindly running identical commands, consider partitioning the workload:

# Split by subdirectories
rsync -av /home/directory/folder1/ backups@1.1.1.1::Home/folder1/ &
rsync -av /home/directory/folder2/ backups@1.1.1.1::Home/folder2/ &

# Or use --include/--exclude patterns; the '**' rules are required so the
# contents of the matched directories are not caught by '--exclude=*'
rsync -av --include='2023*/' --include='2023*/**' --exclude='*' /home/directory/ backups@1.1.1.1::Home &
rsync -av --include='2024*/' --include='2024*/**' --exclude='*' /home/directory/ backups@1.1.1.1::Home &
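Before launching parallel jobs with filter rules, it is worth confirming that each rule set selects the intended, disjoint subset: -n (--dry-run) combined with -v lists what would transfer without copying anything. A local sketch (assumes rsync is installed; the /tmp paths are examples):

```shell
# Build a toy tree, then dry-run the 2023 filter against it.
mkdir -p /tmp/fr-src/2023-01 /tmp/fr-src/2024-01 /tmp/fr-dst
touch /tmp/fr-src/2023-01/a.log /tmp/fr-src/2024-01/b.log
rsync -avn --include='2023*/' --include='2023*/**' --exclude='*' \
      /tmp/fr-src/ /tmp/fr-dst/ > /tmp/fr-list
cat /tmp/fr-list   # lists 2023-01/a.log but nothing under 2024-01/
```

Run the dry run once per planned filter set; if any path shows up in two listings, the parallel jobs would collide on it.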

For enterprise environments, these approaches prove more reliable:

# Using parallel with file lists: -R preserves the relative paths (otherwise
# every file lands flat in the module root) and -m batches many files into
# each rsync invocation instead of spawning one process per file
cd /home/directory && find . -type f | parallel -j 4 -m rsync -aR {} backups@1.1.1.1::Home

# With checksum verification (-c), one job per top-level folder
parallel -j 4 rsync -acv {} backups@1.1.1.1::Home ::: \
  /home/directory/folder1 /home/directory/folder2 /home/directory/folder3

Modify your rsyncd.conf for better parallel handling:

[Home]
path = /backup/home
max connections = 10              # the 11th simultaneous client is refused
lock file = /var/run/rsyncd.lock  # required for counting "max connections"
use chroot = no
read only = no