Most Linux backup tools like rsync or tar need to scan the entire filesystem to identify changed files, which becomes painfully slow as data volumes grow. Even "incremental" backup solutions often rely on filesystem timestamps or full-tree comparisons.
The Linux kernel's inotify API provides exactly what we need - real-time notifications of file system events. Instead of scanning, we can get notified immediately when files change:
```sh
# Example inotifywait command to monitor a directory
inotifywait -m -r /path/to/watch \
    -e create -e modify -e delete -e move \
    --format '%w%f' | while read -r FILE
do
    echo "File changed: $FILE"
    # Add to backup queue here
done
```
Several mature projects implement this approach:
- incron: Like cron but triggered by inotify events instead of time
- lsyncd: Lightweight live sync daemon with inotify support
- fswatch: Cross-platform file change monitor with multiple backends
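As an illustration of the first of these, incron is driven by an incrontab where (per its documentation) `$@` expands to the watched path and `$#` to the event's file name; the paths and target host below are placeholders:

```
# incrontab -e  (hypothetical entry)
/important/data IN_CREATE,IN_MODIFY rsync -a $@/$# nas:/backup/
```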
For those who need more control, here's a Python example using pyinotify:
```python
import pyinotify
import subprocess

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        self.backup_file(event.pathname)

    def process_IN_MODIFY(self, event):
        self.backup_file(event.pathname)

    def backup_file(self, path):
        # Ship the changed file to the backup host as soon as it is seen
        subprocess.run(["rsync", "-az", path, "nas:/backup/"])

wm = pyinotify.WatchManager()
handler = EventHandler()
notifier = pyinotify.Notifier(wm, handler)
# Watch only the events the handler acts on; rec=True descends into subdirectories
mask = pyinotify.IN_CREATE | pyinotify.IN_MODIFY
wdd = wm.add_watch('/important/data', mask, rec=True)
notifier.loop()
```
While inotify is efficient, there are limitations to consider:
- inotify watches consume kernel memory (adjust /proc/sys/fs/inotify/max_user_watches if needed)
- Network filesystems may not support inotify properly
- For very high write volumes, consider batching changes
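The batching suggestion above can be sketched as a small helper that collects changed paths during a quiet window and then hands the whole batch over at once; the flush callback here is a stand-in for a single rsync invocation:

```python
import time

class ChangeBatcher:
    """Collect changed paths and flush them as one batch after a quiet window."""

    def __init__(self, flush, window=2.0):
        self.flush = flush        # callback receiving a sorted list of unique paths
        self.window = window      # seconds of quiet before flushing
        self.pending = set()
        self.last_add = 0.0

    def add(self, path):
        # Called from the event handler for every changed file
        self.pending.add(path)
        self.last_add = time.monotonic()

    def maybe_flush(self):
        # Call periodically (e.g. from the watcher's idle loop); flushes only
        # once no new event has arrived for `window` seconds
        if self.pending and time.monotonic() - self.last_add >= self.window:
            batch = sorted(self.pending)
            self.pending.clear()
            self.flush(batch)
            return batch
        return None
```

Duplicate events for the same path collapse into one entry, so a file rewritten a hundred times in a burst is transferred once.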
You can combine inotify with traditional tools for a hybrid approach: a scheduled full backup plus a continuously running watcher. Note that inotifywait does not belong in a crontab entry - without -m it blocks until a single event, and with -m it never exits - so run the watcher as a long-lived process instead:

```sh
# Weekly full backup (crontab entry, Sunday 03:00)
0 3 * * 0 rsync -a /data /backup/full

# inotify-driven incrementals: a long-running watcher, not a cron job
inotifywait -m -r -q -e create,modify,move --format '%w%f' /data |
while read -r FILE
do
    rsync -a "$FILE" /backup/incremental/
done
```
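One way to keep such a watcher running continuously is a small systemd unit; the unit name and script path below are hypothetical:

```ini
# /etc/systemd/system/inotify-backup.service  (hypothetical)
[Unit]
Description=inotify-driven incremental backup watcher
After=network-online.target

[Service]
ExecStart=/usr/local/bin/backup-watcher.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With Restart=on-failure, the watcher is brought back automatically if it crashes, which cron cannot do for a long-running pipeline.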
When implementing inotify-based backups, also plan for:
- Event queue overflows: during heavy write bursts the kernel can drop events (reported as IN_Q_OVERFLOW), so keep a periodic full sync as a safety net
- Network delays: implement retries or a local queue so a transient failure to reach the backup target does not lose changes
For mission-critical systems, consider combining inotify with:
- Database dumps before file backup
- Checksum verification
- Versioned backups using hardlinks