Optimized Bidirectional File Synchronization Between Remote Linux Servers: Lsyncd+Csync2 Implementation Guide



Synchronizing large file trees (200k+ files) between geographically distributed Linux servers presents unique technical constraints:

  • Bi-directional awareness: Changes may originate from either endpoint
  • Resource efficiency: VPS environments limit kernel modifications and RAM usage
  • Real-time responsiveness: cron-based sync rescans the whole tree on every run and still leaves a delay between runs

After evaluating multiple solutions, the lsyncd+csync2 combination addresses these requirements effectively:

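# lsyncd.conf.lua
# lsyncd watches /data through inotify. This example pushes changes with the
# built-in rsync layer; having lsyncd invoke csync2 instead requires a custom action.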
sync {
    default.rsync,
    source = "/data",
    target = "remote_server:/data",
    delay = 1,
    rsync = {
        archive = true,
        compress = true
    }
}

Step 1: Install required packages

# Ubuntu/Debian
sudo apt-get install lsyncd csync2 sqlite3

Step 2: Configure csync2 for bidirectional sync

# /etc/csync2/csync2.cfg
group mycluster {
    host server1.example.com;
    host server2.example.com;
    
    key /etc/csync2/key_mycluster;
    include /data;
    auto younger;
    backup-directory /var/backups/csync2;
    backup-generations 3;
}
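
Before the first run, both hosts need the same configuration file and the same pre-shared key named in the key line. A minimal sketch of that bootstrap, run from server1, might look like the following. The run helper, the DRY_RUN switch, and root SSH access to the peer are assumptions made for illustration; by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: bootstrap the shared key for the "mycluster" group from server1.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bootstrap_key() {
    # $1 = key file path, $2 = peer hostname
    key="$1"
    peer="$2"
    run csync2 -k "$key"                   # generate the pre-shared group key
    run scp "$key" "root@${peer}:${key}"   # copy it to the peer (assumes root SSH access)
    run csync2 -xv                         # first verbose check-and-sync pass
}

# Example: bootstrap_key /etc/csync2/key_mycluster server2.example.com
```

Setting DRY_RUN=0 executes the commands. The configuration file itself also has to be copied to the peer so that both hosts see an identical group definition.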

The system needs special handling for:

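  • File conflicts (the auto younger directive in csync2 lets the most recently modified copy win)
  • Network interruptions (csync2 keeps unsynced files marked dirty in its database and retries them on the next run)
  • Permission preservation (keep uid/gid mappings consistent across servers)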
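
The auto younger rule can be pictured as a plain modification-time comparison. The sketch below illustrates only the rule, not csync2's internals, and the younger_wins function name is hypothetical. Because the decision depends on timestamps, both servers should keep their clocks synchronized (for example via NTP).

```shell
#!/bin/sh
# Sketch: pick the "younger" (more recently modified) of two local files.

younger_wins() {
    # Print whichever file has the newer modification time.
    # If the times are equal, the second argument is printed.
    if [ "$1" -nt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example: younger_wins /data/report.txt /tmp/report.txt.from-peer
```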

# Tune lsyncd for large file trees
settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    maxDelays = 1000,
    nodaemon = false,
    inotifyMode = "Modify"
}
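
On large trees the inotify watch limit is worth checking as well: lsyncd places one watch per directory (not per file), and the kernel caps the total through fs.inotify.max_user_watches. A rough sketch of such a check follows; the helper names count_dirs and watches_ok are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the directory count of a tree with the inotify watch limit.

count_dirs() {
    # Count directories under the given path, including the path itself.
    find "$1" -type d | wc -l | tr -d ' '
}

watches_ok() {
    # $1 = directory count, $2 = watch limit; succeeds if the limit leaves headroom.
    [ "$1" -lt "$2" ]
}

# Example usage (the /data path comes from the configs in this guide):
# dirs=$(count_dirs /data)
# limit=$(cat /proc/sys/fs/inotify/max_user_watches)
# watches_ok "$dirs" "$limit" || echo "raise fs.inotify.max_user_watches above $dirs"
```

On some container-based VPS platforms this limit may not be adjustable from inside the guest, so it is worth checking before committing to an inotify-based design.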

Monitoring can be implemented through csync2's test mode, which checks whether all peers are in sync:

csync2 -T -v -G mycluster

The architecture easily extends to additional servers by:

  1. Adding hosts to csync2 group configuration
  2. Configuring new lsyncd endpoints
  3. Implementing proper key distribution

Solution    Bi-directional   Real-time   VPS Friendly
Unison      Yes              No          Yes
GlusterFS   Yes              Yes         No
DRBD        Yes              Yes         No

When dealing with a constantly growing file tree (~200k files) across geographically distant Linux servers, traditional solutions like cron-based rsync fall short. The core requirements are:

  • Bidirectional sync capability
  • Low-latency change propagation
  • Minimal resource consumption
  • VPS-friendly operation

After extensive testing, these approaches emerged as viable options:

1. Lsyncd + Csync2 Architecture

This combination provides event-driven synchronization with conflict resolution:


# Example lsyncd configuration (lsyncd.conf.lua)
settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    maxProcesses = 5
}

sync {
    default.rsync,
    source = "/path/to/source",
    target = "remote_user@remote_host::module",
    rsync = {
        archive = true,
        compress = true,
        acls = true,
        xattrs = true
    },
    delay = 5
}

Csync2 handles the bidirectional, conflict-aware part of the synchronization. Its configuration:


# csync2.cfg
group mycluster
{
    host server1.example.com;
    host server2.example.com;

    key /etc/csync2.key;
    include /path/to/shared/folder;
    exclude *~ .*;

    backup-directory /var/backups/csync2;
    backup-generations 3;
    auto younger;
}

2. The Unison Alternative

Unison is mature and stable, and remains a good fit for many use cases:


# Sample Unison profile (~/.unison/default.prf)
root = /path/to/local/folder
root = ssh://remote_user@remote_host//path/to/remote/folder

auto = true
batch = true
confirmbigdel = false
fastcheck = true
prefer = newer
retry = 3

Key lessons from production deployments:

  • For VPS environments, csync2's memory footprint (typically <100MB) is preferable to GlusterFS
  • Lsyncd's delay parameter (5-15 sec) balances responsiveness and batching
  • Always implement file locking for applications writing to synchronized directories
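
For writers on a single host, advisory locking with flock(1) from util-linux is one simple option. A minimal sketch follows, assuming a hypothetical locked_append helper and a lock file placed next to the target. Note that flock only coordinates processes on the same machine; it does not stop two servers from modifying the same file at once.

```shell
#!/bin/sh
# Sketch: serialize appends to a file in a synchronized directory.

locked_append() {
    # $1 = file to append to, $2 = line to write
    (
        # Take an exclusive lock on fd 9; blocks until other holders release it.
        flock -x 9 || exit 1
        printf '%s\n' "$2" >> "$1"
    ) 9>"$1.lock"
}

# Example: locked_append /data/app/events.log "job 42 finished"
```

The .lock files created this way live inside the synchronized tree, so they are candidates for an exclude pattern in csync2.cfg.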

Essential monitoring commands:


# Check lsyncd status
systemctl status lsyncd

# Run a verbose csync2 check-and-sync pass
csync2 -xv

# Monitor file change events
inotifywait -r -m -e modify,create,delete /sync/path

The same architecture extends to multiple nodes by:

  1. Adding entries to csync2's host list
  2. Configuring lsyncd to monitor all relevant directories
  3. Implementing a star topology (one hub node that syncs with every other node) to limit the number of peer connections

For environments expecting rapid growth, consider adding a Redis-based change journal to track file modifications across the cluster.