Monitoring All File Accesses During a Process Lifetime: System-Level Tracing Techniques



When debugging complex applications or investigating security incidents, developers often need to track every file a process accesses throughout its entire execution. While lsof shows currently open files, it doesn't provide historical access data. Here are more comprehensive approaches:

The most direct method is tracing system calls using strace:

strace -f -e trace=open,openat,close,read,write -o /tmp/trace.log ./your_program

This will log all file operations including:

  • File openings (open, openat)
  • File closings (close)
  • Read/write operations (optional; drop read and write from the filter to keep the log small)
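The log that strace writes is plain text, so it can be reduced to a list of files actually opened. The following is a minimal sketch, assuming the usual `syscall(args) = result` line layout; the function name `opened_paths` and the sample lines are hypothetical, hand-written illustrations rather than real captured output.

```python
import re
from collections import Counter

# Matches completed open/openat lines such as:
#   1234 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
# Lines split by strace into "<unfinished ...>" / "resumed" halves are skipped.
OPEN_RE = re.compile(
    r'\b(?P<call>open|openat)\('       # syscall name
    r'(?:[^"]*?,\s*)?'                 # optional dirfd argument (openat only)
    r'"(?P<path>(?:[^"\\]|\\.)*)"'     # quoted path
    r'.*\)\s*=\s*(?P<ret>-?\d+)'       # return value after the closing paren
)

def opened_paths(lines):
    """Count paths from open/openat calls that returned a valid descriptor."""
    counts = Counter()
    for line in lines:
        m = OPEN_RE.search(line)
        if m and int(m.group("ret")) >= 0:
            counts[m.group("path")] += 1
    return counts

# Hypothetical sample lines in the format strace -f -o produces.
sample = [
    '1234 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3',
    '1234 openat(AT_FDCWD, "/missing", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '1235 open("/tmp/out.txt", O_WRONLY|O_CREAT, 0644) = 4',
    '1234 close(3) = 0',
]

if __name__ == "__main__":
    for path, n in opened_paths(sample).most_common():
        print(n, path)
```

To run it against a real trace, pass the open log file instead of the sample list, for example `opened_paths(open("/tmp/trace.log"))`.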

For enterprise-grade monitoring, configure audit rules:

# Monitor all files opened by specific process
auditctl -a exit,always -F arch=b64 -S openat -F pid=1234

# Or monitor files in specific directory
auditctl -w /path/to/watch -p war -k file_access

View the logs with ausearch (for example, ausearch -k file_access -i for the directory watch above) or summarize them with aureport.
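For scripted analysis, the raw audit records can also be grouped per process. The sketch below assumes the common key=value layout of raw audit records, where a SYSCALL record and its PATH records share the same `msg=audit(timestamp:serial)` identifier. The function name `paths_by_pid` and the sample records are hypothetical, and hex-encoded file names (used by audit for names with unusual characters) are not handled.

```python
import re
from collections import defaultdict

# Record type plus the "timestamp:serial" id that ties records of one event together.
EVENT_RE = re.compile(r'^type=(?P<type>\w+) msg=audit\((?P<id>[\d.]+:\d+)\):')
PID_RE = re.compile(r'\bpid=(\d+)')        # \b keeps this from matching ppid=
NAME_RE = re.compile(r'\bname="([^"]*)"')  # quoted path in a PATH record

def paths_by_pid(lines):
    """Map each PID to the paths named in PATH records of its syscall events."""
    pids = {}                   # event id -> pid from the SYSCALL record
    names = defaultdict(list)   # event id -> paths from PATH records
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        event_id = m.group("id")
        if m.group("type") == "SYSCALL":
            pid = PID_RE.search(line)
            if pid:
                pids[event_id] = int(pid.group(1))
        elif m.group("type") == "PATH":
            name = NAME_RE.search(line)
            if name:
                names[event_id].append(name.group(1))
    result = defaultdict(list)
    for event_id, pid in pids.items():
        result[pid].extend(names.get(event_id, []))
    return dict(result)

# Hypothetical, abbreviated records written by hand for illustration.
sample = [
    'type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=257 '
    'success=yes exit=3 items=1 ppid=1000 pid=1234 comm="cat" exe="/usr/bin/cat"',
    'type=PATH msg=audit(1700000000.123:456): item=0 name="/etc/hosts" nametype=NORMAL',
]

if __name__ == "__main__":
    print(paths_by_pid(sample))
```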

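If the process works within known directories, inotify can watch them. Note that inotify reports what happened to a path, not which process did it, so it suits cases where you control what else touches those directories: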

# Python example using pyinotify
import pyinotify

wm = pyinotify.WatchManager()
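# Close events come in two flavors; process_IN_CLOSE below handles both
mask = pyinotify.IN_OPEN | pyinotify.IN_CLOSE_WRITE | pyinotify.IN_CLOSE_NOWRITE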

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_OPEN(self, event):
        print(f"File opened: {event.pathname}")

    def process_IN_CLOSE(self, event):
        print(f"File closed: {event.pathname}")

notifier = pyinotify.Notifier(wm, EventHandler())
wdd = wm.add_watch('/path/to/watch', mask, rec=True)
notifier.loop()

For advanced users, kernel probes can trace VFS operations:

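# Trace opens with a kprobe on do_sys_open (run as root; %si holds the filename
# argument on x86-64, and the probed symbol can differ between kernel versions)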
echo 'p:myprobe do_sys_open filename=+0(%si):string' > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
cat /sys/kernel/debug/tracing/trace_pipe

Remember that comprehensive tracing adds overhead:

  • strace can slow execution by 10-100x
  • auditd has significant memory impact at scale
  • Inotify works best for targeted directories

Other utilities worth considering:

  • fatrace - Filesystem activity monitor
  • opensnoop from BPF tools
  • sysdig - Container-aware system exploration

When debugging complex applications or investigating security issues, developers often need to track every file accessed by a process throughout its entire lifetime. While tools like lsof show currently open files, they don't provide historical access data.

The most reliable method is using strace to monitor system calls:

strace -f -e trace=open,openat,close,creat,execve \
       -o process_trace.log \
       -s 1024 \
       ./your_application

This command logs the file-related calls named in the filter, including:

  • File openings (open, openat)
  • File creations (creat)
  • Execution of new programs (execve)

For production systems, Linux's audit subsystem provides more robust tracking:

# Install auditd if not present (Debian/Ubuntu)
sudo apt install auditd

# Add a syscall rule that logs openat calls made by a specific PID
sudo auditctl -a exit,always -F arch=b64 -S openat -F pid=1234

# View the logs
sudo ausearch -p 1234 -i

For applications where you can control the execution environment, intercepting file operations via library preloading can be effective:

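// file_tracer.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>

typedef int (*orig_open_func_t)(const char *pathname, int flags, ...);

int open(const char *pathname, int flags, ...) {
    // Look up the real open() once and cache the pointer.
    static orig_open_func_t orig_open;
    if (!orig_open)
        orig_open = (orig_open_func_t)dlsym(RTLD_NEXT, "open");

    // The third argument (mode) is only passed when the call may create a file.
    int mode = 0;
    if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }

    fprintf(stderr, "File accessed: %s\n", pathname);
    return orig_open(pathname, flags, mode);
}

// Compile with: gcc -shared -fPIC -o file_tracer.so file_tracer.c -ldl
// Usage: LD_PRELOAD=./file_tracer.so ./your_application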

While these methods are powerful, they impact performance:

  • strace can slow execution by 10-100x
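  • Auditd adds moderate overhead but is more efficient than strace
  • LD_PRELOAD has the least overhead and needs no recompilation of the target, but it only works for dynamically linked programs and only sees the libc functions you wrap (here just open, not openat or fopen)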

For complex applications, processing the logs can reveal valuable patterns:

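# Count opens per path (the path is the first quoted string on each openat line)
awk -F'"' '/openat\(/ {print $2}' process_trace.log | sort | uniq -c | sort -nr

# Generate a timeline (re-run strace with -tt so each line carries a timestamp after the PID)
awk -F'"' '/openat\(/ {split($1, f, " "); print f[2], $2}' process_trace.log > access_timeline.dat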
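When the patterns of interest are file types or directories rather than individual paths, a short script is easier to extend than a shell pipeline. This is a minimal sketch under the same assumption about strace's line format; the function name `summarize` and the sample lines are hypothetical. It counts every openat attempt, including failed ones.

```python
import os
import re
from collections import Counter

# Captures the first quoted string on an openat line, which is the path.
PATH_RE = re.compile(r'\bopenat\([^"]*"((?:[^"\\]|\\.)*)"')

def summarize(lines):
    """Return (by_extension, by_directory) counters of openat attempts."""
    by_ext, by_dir = Counter(), Counter()
    for line in lines:
        m = PATH_RE.search(line)
        if not m:
            continue
        path = m.group(1)
        # Files without an extension are grouped under "(none)".
        by_ext[os.path.splitext(path)[1] or "(none)"] += 1
        by_dir[os.path.dirname(path) or "."] += 1
    return by_ext, by_dir

# Hypothetical sample lines written by hand for illustration.
sample = [
    '1234 openat(AT_FDCWD, "/app/config.yaml", O_RDONLY) = 3',
    '1234 openat(AT_FDCWD, "/app/extra.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '1234 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3',
]

if __name__ == "__main__":
    by_ext, by_dir = summarize(sample)
    print("by extension:", by_ext.most_common())
    print("by directory:", by_dir.most_common())
```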