Optimizing lsof Performance on High-Traffic Linux Servers: Diagnosing File Access Bottlenecks in ext3 Filesystems


When running sudo lsof /tmp/incoming_data.txt on a busy server, a two-minute execution time isn't just an inconvenience - it's a symptom of deeper system interaction issues. The ext3 filesystem's metadata handling, combined with lsof's default behavior, creates a perfect storm of inefficiency.

# Typical output showing the bottleneck
$ time sudo lsof /tmp/incoming_data.txt
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
python3 30425 root    3r   REG  253,0 14372341 393217 /tmp/incoming_data.txt

real    2m15.43s
user    0m12.11s
sys     1m58.27s

For targeted file access checks, consider these lower-overhead approaches:

# Option 1: Using /proc directly (fastest)
find /proc/[0-9]*/fd -ls 2>/dev/null | grep "/tmp/incoming_data.txt"

# Option 2: Kernel inotify monitoring (persistent solution)
inotifywait -m /tmp -e open | grep incoming_data.txt

# Option 3: eBPF-based tracing (modern kernels)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { 
    if (str(args->filename) == "/tmp/incoming_data.txt") { 
        printf("%s[%d] opened %s\\n", comm, pid, str(args->filename)); 
    } 
}'

For ext3 filesystems, these kernel parameters can help:

# Keep dentry and inode caches in memory longer
echo 50 > /proc/sys/vm/vfs_cache_pressure

# Raise the system-wide limit on open file handles
echo 100000 > /proc/sys/fs/file-max
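
The same values can be set through the standard sysctl interface and persisted in /etc/sysctl.conf so they survive a reboot; a minimal sketch mirroring the values above:

# Apply immediately
sudo sysctl -w vm.vfs_cache_pressure=50
sudo sysctl -w fs.file-max=100000

# Persist across reboots
echo 'vm.vfs_cache_pressure = 50' | sudo tee -a /etc/sysctl.conf
echo 'fs.file-max = 100000' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p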

If you absolutely need lsof functionality, these flags improve performance:

# Avoid blocking kernel calls (-b), suppress warnings (-w), skip name lookups (-n)
sudo lsof -b -w -n /tmp/incoming_data.txt

# Restrict to specific process names (-a ANDs the -c filters with the file argument)
sudo lsof -a -c python -c java -c ruby /tmp/incoming_data.txt
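
To verify the gain, the slimmed-down invocation can be timed against the two-minute baseline; a quick check, assuming python is the process of interest:

time sudo lsof -a -b -w -n -c python /tmp/incoming_data.txt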

For persistent monitoring needs, consider implementing a dedicated file access service built on fanotify:

# Sample Go watcher using fanotify
package main

import (
    "log"

    "golang.org/x/sys/unix"
)

func main() {
    // Notification-only group: FAN_CLASS_NOTIF delivers plain events.
    // (FAN_OPEN_PERM would require permission responses and would block
    // every open on the mount until the watcher replies.)
    fd, err := unix.FanotifyInit(unix.FAN_CLOEXEC|unix.FAN_CLASS_NOTIF,
        unix.O_RDONLY|unix.O_LARGEFILE)
    if err != nil {
        log.Fatal(err)
    }

    // Watch every open on the mount that contains /tmp.
    err = unix.FanotifyMark(fd, unix.FAN_MARK_ADD|unix.FAN_MARK_MOUNT,
        unix.FAN_OPEN, unix.AT_FDCWD, "/tmp")
    if err != nil {
        log.Fatal(err)
    }

    // Event processing loop would go here: read fanotify_event_metadata
    // records from fd and resolve each event's Fd via /proc/self/fd/<n>.
}
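
Building and running the watcher follows the usual Go toolchain steps; the module path and binary name below are only examples, assuming the code is saved as main.go:

go mod init example.com/fdwatch
go get golang.org/x/sys/unix
go build -o fdwatch .

# fanotify requires CAP_SYS_ADMIN
sudo ./fdwatch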

To recap the original symptoms: running sudo lsof /tmp/incoming_data.txt on a production server with heavy TCP traffic showed two concerning behaviors:

  • Execution time exceeding 2 minutes
  • Consistent 99% CPU utilization during operation

Traditional alternatives like fuser showed similar performance characteristics. This suggests a systemic issue with how Linux handles file descriptor enumeration under heavy loads.

The root cause lies in how lsof gathers information:

1. Traverses /proc filesystem for all processes
2. Parses network socket information
3. Cross-references file descriptors
4. Filters results against your path argument

On servers with thousands of TCP connections (common in web servers, databases, etc.), steps 2 and 3 become particularly expensive.
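
To gauge how much work that enumeration involves on a given server, the descriptor and socket counts can be checked directly; both commands below are standard utilities:

# Total open file descriptors across all processes
find /proc/[0-9]*/fd -type l 2>/dev/null | wc -l

# Socket summary (TCP, UDP, ...)
ss -s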

Option 1: Targeted /proc Inspection

find /proc/[0-9]*/fd -ls 2>/dev/null | grep '/tmp/incoming_data.txt'

This skips network socket overhead by directly checking file descriptors.
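
If the owning PID and process name are needed rather than raw find output, a small loop over the same /proc symlinks works; this is a sketch of the identical idea:

for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "/tmp/incoming_data.txt" ]; then
        pid=${fd#/proc/}; pid=${pid%%/*}
        printf '%s (%s) has the file open\n' "$pid" "$(cat /proc/$pid/comm 2>/dev/null)"
    fi
done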

Option 2: Kernel inotify Monitoring

#!/bin/bash
inotifywait -m /tmp -e open |
while read path action file; do
    if [[ "$file" == "incoming_data.txt" ]]; then
        echo "File opened at $(date)"
    fi
done
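
For a one-off check instead of continuous monitoring, inotifywait can watch the file directly and exit after the first event:

# Blocks until the next open, then exits
inotifywait -e open /tmp/incoming_data.txt && echo "File opened at $(date)"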

Option 3: lsof With Reduced Overhead

sudo lsof -b -w -n /tmp/incoming_data.txt

The -b flag avoids kernel calls that can block on busy mounts, -w suppresses warnings, and -n skips host name resolution for any network files lsof still encounters.

Method              Execution Time     CPU Usage
Standard lsof       120s               99%
/proc inspection    0.8s               15%
inotify setup       0.1s (initial)     <1%

For monitoring scenarios:

  • Use auditd rules for security-conscious environments (see the example after this list)
  • Implement eBPF tracing for low-overhead monitoring
  • Consider FUSE-based solutions for specialized cases
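
For the auditd route mentioned above, a watch rule plus ausearch covers most cases; the key name incoming_data_watch is arbitrary:

# Record every read of the file
sudo auditctl -w /tmp/incoming_data.txt -p r -k incoming_data_watch

# Review which processes touched it
sudo ausearch -k incoming_data_watch --interpret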