Efficient Linux Backup Using inotify: Real-time File Change Detection for Faster Incremental Backups



Most Linux backup tools like rsync or tar need to scan the entire filesystem to identify changed files, which becomes painfully slow as data volumes grow. Even "incremental" backup solutions often rely on filesystem timestamps or full-tree comparisons.

The Linux kernel's inotify API provides exactly what we need: real-time notifications of filesystem events. Instead of scanning, we can be notified the moment files change:

# Example inotifywait command to monitor a directory
inotifywait -m -r /path/to/watch \
    -e create -e modify -e delete -e move \
    --format '%w%f' | while IFS= read -r FILE
do
    echo "File changed: $FILE"
    # Add to backup queue here
done

Several mature projects implement this approach:

  • incron: Like cron but triggered by inotify events instead of time
  • lsyncd: Lightweight live sync daemon combining inotify with rsync
  • fswatch: Cross-platform file change monitor with multiple backends
  • watchexec: General-purpose file-watching utility that runs a command on change
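To illustrate the incron route, a single incrontab entry (installed with `incrontab -e`) can fire an rsync for each changed file. The watched path and backup target below are placeholders; `$@` and `$#` are incron's expansion symbols for the watched directory and the file that triggered the event:

```shell
# incrontab entry format: <path> <event mask> <command>
# $@ expands to the watched directory, $# to the triggering file name
/important/data IN_MODIFY,IN_CREATE rsync -az $@/$# nas:/backup/
```

Note that incron does not recurse into subdirectories, so each directory to be watched needs its own entry.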

For those who need more control, here's a Python example using pyinotify:

import pyinotify
import subprocess

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        self.backup_file(event.pathname)

    def process_IN_MODIFY(self, event):
        self.backup_file(event.pathname)

    def backup_file(self, path):
        subprocess.run(["rsync", "-az", path, "nas:/backup/"])

wm = pyinotify.WatchManager()
handler = EventHandler()
notifier = pyinotify.Notifier(wm, handler)

# Watch only the events we handle; rec=True recurses into existing
# subdirectories, auto_add=True watches directories created later
mask = pyinotify.IN_CREATE | pyinotify.IN_MODIFY
wdd = wm.add_watch('/important/data', mask, rec=True, auto_add=True)

notifier.loop()

While inotify is efficient, there are limitations to consider:

  • inotify watches consume kernel memory (raise fs.inotify.max_user_watches via sysctl, or /proc/sys/fs/inotify/max_user_watches, if needed)
  • Network filesystems such as NFS may not generate inotify events reliably
  • The kernel event queue can overflow during bursts; handle IN_Q_OVERFLOW by falling back to a full rescan
  • For very high write volumes, batch changes rather than syncing per event
  • Remote backup targets need error handling and retries, since events keep arriving while the network is down
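On the batching point: rather than invoking rsync once per event, collect changed paths and flush them as a single batch after a quiet interval. A minimal, library-agnostic sketch (the class name and the 5-second default are our choices, not part of any inotify library):

```python
import time

class ChangeBatcher:
    """Collect changed paths and flush them as one batch after a quiet
    interval, so a burst of inotify events yields one rsync run."""

    def __init__(self, quiet_seconds=5.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock          # injectable for testing
        self.pending = set()
        self.last_event = self.clock()

    def add(self, path):
        """Record a changed path (call this from the inotify event handler)."""
        self.pending.add(path)
        self.last_event = self.clock()

    def poll(self):
        """Return a sorted batch once the quiet interval has elapsed, else None."""
        if self.pending and self.clock() - self.last_event >= self.quiet_seconds:
            batch = sorted(self.pending)
            self.pending.clear()
            return batch
        return None
```

From a pyinotify handler you would call add(event.pathname) on each event and, on a timer, hand any non-None poll() result to a single rsync --files-from invocation instead of one rsync per file.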

You can combine inotify with traditional tools for a hybrid approach:

# Weekly full backup (crontab entry)
0 3 * * 0 rsync -a /data /backup/full

# Long-running incremental watcher: start once (e.g. from a systemd
# service or an @reboot cron entry), not once per minute from cron
inotifywait -m -r -q -e create,modify,move --format '%w%f' /data |
while IFS= read -r FILE; do rsync -a "$FILE" /backup/incremental/; done


For mission-critical systems, consider combining inotify with:

  • Database dumps before file backup
  • Checksum verification
  • Versioned backups using hardlinks
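The hardlink-based versioning mentioned above is what rsync's --link-dest option provides: files unchanged since the previous snapshot are hardlinked rather than copied, so each dated directory looks like a full backup but costs only the space of the changes. A sketch as a daily crontab entry (paths and schedule are placeholders; note that % must be escaped in crontab command lines):

```shell
# Daily snapshot at 02:00; unchanged files hardlink to yesterday's snapshot
0 2 * * * rsync -a --delete --link-dest=/backup/$(date -d yesterday +\%F) /data/ /backup/$(date +\%F)/
```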