Synchronizing large file trees (200k+ files) between geographically distributed Linux servers presents unique technical constraints:
- Bi-directional awareness: Changes may originate from either endpoint
- Resource efficiency: VPS environments limit kernel modifications and RAM usage
- Real-time responsiveness: Traditional cron-based sync creates unnecessary overhead
After evaluating multiple solutions, the lsyncd+csync2 combination addresses these requirements effectively:
# lsyncd monitors filesystem events and triggers a sync run
lsyncd.conf:
sync {
    default.rsync,
    source = "/data",
    target = "remote_server:/data",
    delay = 1,
    rsync = {
        archive = true,
        compress = true
    }
}
Step 1: Install required packages
# Ubuntu/Debian
sudo apt-get install lsyncd csync2 sqlite3
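One VPS-specific gotcha worth handling at install time: lsyncd places an inotify watch on every directory in the tree, and a 200k-file hierarchy can easily exceed the kernel default (often 8192 watches). The value below is a common choice for large trees, not a requirement:

```shell
# Check the current inotify watch limit
cat /proc/sys/fs/inotify/max_user_watches

# Raise it persistently (524288 is a typical value for large trees)
echo 'fs.inotify.max_user_watches = 524288' | sudo tee /etc/sysctl.d/90-lsyncd.conf
sudo sysctl --system
```

No kernel modules are needed for this, which keeps it VPS-friendly.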
Step 2: Configure csync2 for bidirectional sync
# /etc/csync2/csync2.cfg
group mycluster {
    host server1.example.com;
    host server2.example.com;
    key /etc/csync2/key_mycluster;
    include /data;
    auto younger;
    backup-directory /var/backups/csync2;
    backup-generations 3;
}
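Every host in the group must hold an identical pre-shared key. csync2 can generate it, and it then has to be copied to each peer over a secure channel (the host name below matches the example config; adjust to your own):

```shell
# Generate the shared key on one host
sudo csync2 -k /etc/csync2/key_mycluster

# Distribute it to the other group member (same path on every host)
sudo scp /etc/csync2/key_mycluster server2.example.com:/etc/csync2/key_mycluster
```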
The system needs special handling for:
- File conflicts (the auto younger directive in csync2)
- Network interruptions (built-in retry mechanisms)
- Permission preservation (maintain uid/gid mapping)
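The conflict rule itself is simple: whichever replica's copy has the newer modification time wins. The following standalone sketch illustrates that comparison (it is a demonstration of the rule, not csync2 code; the file names are throwaway):

```shell
t=$(mktemp -d)
touch -d '2024-01-01' "$t/replica_a"   # older copy
touch -d '2024-06-01' "$t/replica_b"   # newer copy

# "auto younger": the file with the more recent mtime is kept
if [ "$t/replica_b" -nt "$t/replica_a" ]; then
    winner=replica_b
else
    winner=replica_a
fi
echo "keep: $winner"
```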
# Tune lsyncd for large file trees
settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    maxDelays = 1000,
    nodaemon = false,
    inotifyMode = "Modify"
}
Monitoring can be implemented through csync2's status checks:
csync2 -T -v -G mycluster
The architecture easily extends to additional servers by:
- Adding hosts to csync2 group configuration
- Configuring new lsyncd endpoints
- Implementing proper key distribution
| Solution | Bi-directional | Real-time | VPS Friendly |
|---|---|---|---|
| lsyncd + csync2 | Yes | Yes | Yes |
| Unison | Yes | No | Yes |
| GlusterFS | Yes | Yes | No |
| DRBD | Yes | Yes | No |
When dealing with a constantly growing file tree (~200k files) across geographically distant Linux servers, traditional solutions like cron-based rsync fall short. The core requirements are:
- Bidirectional sync capability
- Low-latency change propagation
- Minimal resource consumption
- VPS-friendly operation
After extensive testing, these approaches emerged as viable options:
1. Lsyncd + Csync2 Architecture
This combination provides event-driven synchronization with conflict resolution:
# Example lsyncd configuration (lsyncd.conf.lua)
settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    maxProcesses = 5
}

sync {
    default.rsync,
    source = "/path/to/source",
    target = "remote_user@remote_host::module",
    rsync = {
        archive = true,
        compress = true,
        acls = true,
        xattrs = true
    },
    delay = 5
}
Csync2 handles the actual synchronization with its configuration:
# csync2.cfg
group mycluster {
    host server1.example.com;
    host server2.example.com;
    key /etc/csync2.key;
    include /path/to/shared/folder;
    exclude *~;
    exclude .*;
    backup-directory /var/backups/csync2;
    backup-generations 3;
    auto younger;
}
2. The Unison Alternative
Although its development pace has fluctuated over the years, Unison remains stable and well suited to many use cases:
# Sample Unison profile (~/.unison/default.prf)
root = /path/to/local/folder
root = ssh://remote_user@remote_host//path/to/remote/folder
auto = true
batch = true
confirmbigdel = false
fastcheck = true
prefer = newer
retry = 3
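With batch = true and auto = true in the profile, a single unattended command performs the two-way sync (the profile name "default" matches the file above):

```shell
# Run the "default" profile non-interactively; suitable for a
# systemd timer or cron entry
unison default
```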
Key lessons from production deployments:
- For VPS environments, csync2's memory footprint (typically under 100 MB) is far smaller than a GlusterFS deployment's
- Lsyncd's delay parameter (5-15 sec) balances responsiveness and batching
- Always implement file locking for applications writing to synchronized directories
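The file-locking advice above can be sketched with flock(1): the writer takes an exclusive lock and publishes via an atomic rename, so a sync pass never picks up a half-written file. Paths and lock names here are throwaway examples:

```shell
t=$(mktemp -d)

(
    flock -x 9                      # exclusive lock on fd 9
    echo "payload" > "$t/data.new"  # write to a temporary name
    mv "$t/data.new" "$t/data"      # atomic rename while still locked
) 9>"$t/.data.lock"

result=$(cat "$t/data")
echo "$result"
```

Readers (including the sync job, if you wrap it) can take a shared lock with `flock -s` on the same lock file.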
Essential monitoring commands:
# Check lsyncd status
systemctl status lsyncd
# Verify csync2 synchronization
csync2 -xv
# Monitor file change events (requires the inotify-tools package)
inotifywait -r -m -e modify,create,delete /sync/path
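Since csync2 exits non-zero when a pass cannot complete, a thin wrapper is enough for basic alerting. This sketch uses `false` as a stand-in command so it runs anywhere; in production the argument would be `csync2 -x` and the echo would be a `logger` call:

```shell
# Run a sync command and report failures via its exit status
sync_check() {
    if ! "$@" >/dev/null 2>&1; then
        msg="sync failed: $*"
        echo "$msg"        # in production: logger -t csync2 "$msg"
    fi
}

sync_check false   # stand-in for: sync_check csync2 -x
```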
The same architecture extends to multiple nodes by:
- Adding entries to csync2's host list
- Configuring lsyncd to monitor all relevant directories
- Implementing a star topology for better performance
For environments expecting rapid growth, consider adding a Redis-based change journal to track file modifications across the cluster.
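One way to realize such a journal (all names here are illustrative: the stream key "filechanges", the /data path, and the field names are assumptions, not part of the setup above) is to pipe inotify events into a Redis stream:

```shell
# Requires inotify-tools and redis-cli; each filesystem event becomes
# one entry in the "filechanges" stream, which other nodes can read
# with XREAD to reconcile state
inotifywait -r -m -e modify,create,delete --format '%w%f %e' /data |
while read -r path events; do
    redis-cli XADD filechanges '*' path "$path" events "$events" >/dev/null
done
```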