How to Display File Paths Instead of FD Numbers in strace Read/Write Syscall Tracing


When debugging Linux processes with strace, one common frustration appears in the output of read/write system calls:

read(3, "data", 1024)
write(5, "response", 8)

The numeric file descriptors (3 and 5 here) require a manual lookup in /proc/$PID/fd/ to identify the actual files. strace decodes error codes and formats strings out of the box, but by default it does not resolve file descriptors to paths.
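The manual lookup amounts to reading the symlinks under /proc/$PID/fd/. A minimal Python sketch of that step follows; the function name list_fd_paths is illustrative, and the demo inspects the script's own process so it needs no extra privileges.

```python
import os


def list_fd_paths(pid):
    """Map each open descriptor of `pid` to the target of its /proc symlink."""
    fd_dir = f"/proc/{pid}/fd"
    mapping = {}
    for name in os.listdir(fd_dir):
        try:
            mapping[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except (OSError, ValueError):
            # The descriptor was closed between listdir() and readlink()
            continue
    return mapping


if __name__ == "__main__":
    # Demo on this process; pass another PID to inspect a different one
    for fd, target in sorted(list_fd_paths(os.getpid()).items()):
        print(fd, target)
```

Targets are not always file paths: sockets and pipes show up as strings such as socket:[12345] or pipe:[67890].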

The obvious solution of checking earlier open() calls works for:

  • Short-running processes
  • When you've captured the entire execution

But fails for:

  • Long-running processes (days/weeks)
  • When attaching to existing processes with strace -p $PID
  • Programs that dynamically manage file descriptors
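The last case is worth seeing concretely: the kernel hands out the lowest free descriptor number, so a number seen in an earlier open() may refer to a different file a moment later. The sketch below (file names and the function name are illustrative) opens one file, closes it, opens another, and records what /proc/self/fd reported each time.

```python
import os
import tempfile


def demonstrate_fd_reuse():
    """Open, close, and reopen to show one FD number naming two files."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.log")
        second = os.path.join(tmp, "second.log")

        fd_a = os.open(first, os.O_CREAT | os.O_WRONLY, 0o600)
        path_a = os.readlink(f"/proc/self/fd/{fd_a}")
        os.close(fd_a)

        # The lowest free number is handed out again, so fd_b reuses fd_a
        fd_b = os.open(second, os.O_CREAT | os.O_WRONLY, 0o600)
        path_b = os.readlink(f"/proc/self/fd/{fd_b}")
        os.close(fd_b)
    return fd_a, path_a, fd_b, path_b


if __name__ == "__main__":
    fd_a, path_a, fd_b, path_b = demonstrate_fd_reuse()
    print(f"fd {fd_a} -> {path_a}")
    print(f"fd {fd_b} -> {path_b}")
```

In a single-threaded process both lines print the same descriptor number with different paths, which is why a mapping built from old open() calls can silently go stale.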

1. Using strace's -y and -yy Flags

The modern solution comes from strace itself in versions 4.11+:

strace -yy -p $PID

This produces output like the following (the paths are placeholders):

read(3</path/to/input>, "data", 1024) = 4
write(5</path/to/output>, "response", 8) = 8

The -y flag appends the path to each file descriptor where one is available, while -yy additionally prints protocol-specific details for socket descriptors, such as the local and remote addresses.
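If the annotated output is going to be post-processed, the fd<...> form is easy to pick apart. The following Python sketch assumes the annotation follows the descriptor in angle brackets; the regular expression and the function name parse_annotated_call are illustrative, and socket annotations are handled only on a best-effort basis (IPv6 forms are not covered).

```python
import re

# syscall name, descriptor number, and the <...> annotation that follows it.
# A bracketed segment such as [addr->addr] may contain '>' and is taken whole.
ANNOTATED_FD = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'      # optional PID prefix added by -f
    r'(\w+)\((\d+)<((?:\[[^\]]*\]|[^>\[])*)>'
)


def parse_annotated_call(line):
    """Return (syscall, fd, annotation), or None if the FD is not annotated."""
    m = ANNOTATED_FD.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


if __name__ == "__main__":
    sample = 'read(3</var/log/app.log>, "data", 1024) = 4'
    print(parse_annotated_call(sample))
```

Lines without an annotation return None, which makes it simple to fall back to a /proc lookup for those calls.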

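2. Combining strace with /proc Lookups

For older strace versions or when you need more control:

strace -e trace=read,write -p "$PID" 2>&1 | awk -v pid="$PID" '
  match($0, /^(read|write)\([0-9]+/) {
    fd = substr($0, RSTART, RLENGTH); sub(/^[a-z]+\(/, "", fd)
    cmd = "readlink /proc/" pid "/fd/" fd; path = "?"
    cmd | getline path; close(cmd)
    print $0, "->", path
  }'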

This awk command:

  1. Captures read/write syscalls
  2. Extracts the FD number
  3. Looks up the path via /proc

3. Automated Tracing Script

For comprehensive monitoring:

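#!/bin/bash
# Usage: ./trace_fds.sh PID
PID=${1:?usage: $0 PID}
TRACE_FILE="/tmp/strace_$PID.log"

# trace=file only matches syscalls that take a path, so name read/write explicitly.
# -yy is omitted because the awk stage below does the FD lookup itself.
strace -f -e trace=read,write -p "$PID" -o "$TRACE_FILE" &
STRACE_PID=$!
trap 'kill "$STRACE_PID" 2>/dev/null' EXIT

# Follow the log as it grows instead of re-reading the whole file every 5 seconds
tail -n +1 -F "$TRACE_FILE" | awk -v root="$PID" '
match($0, /(read|write)\([0-9]+/) {
    # With -f and -o, each line starts with the PID that made the call
    pid = ($1 ~ /^[0-9]+$/) ? $1 : root
    fd = substr($0, RSTART, RLENGTH); sub(/^[a-z]+\(/, "", fd)
    cmd = "readlink /proc/" pid "/fd/" fd; path = "?"
    cmd | getline path; close(cmd)
    # Insert <path> after the FD, mirroring the strace -y output style
    print substr($0, 1, RSTART + RLENGTH - 1) "<" path ">" substr($0, RSTART + RLENGTH)
}'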

Persistent FD Mapping

For processes that recycle file descriptors:

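# A process can have only one tracer at a time, so capture everything in a single run
strace -e trace=open,openat,creat,close,read,write -o trace.log -p "$PID"

# Correlate later, replaying opens and closes in order:
awk '
function fdnum(s) { sub(/^[a-z]+\(/, "", s); sub(/[^0-9].*/, "", s); return s }
/^(open|openat|creat)\(/ && $NF ~ /^[0-9]+$/ && match($0, /"[^"]*"/) {
    fd_map[$NF] = substr($0, RSTART + 1, RLENGTH - 2)
}
/^close\(/ && $NF == 0 { delete fd_map[fdnum($0)] }
/^(read|write)\(/ { fd = fdnum($0); print $0, "#", (fd in fd_map ? fd_map[fd] : "?") }
' trace.log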
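The same correlation can be sketched in Python, which is easier to extend. This version assumes a single strace log that holds open, close, read, and write lines in the order they happened; the regular expressions and the function name annotate are illustrative, not part of strace.

```python
import re

# A successful open-style call: first quoted argument is the path, result is the FD
OPEN_RE = re.compile(r'^(?:open|openat|creat)\(.*?"((?:[^"\\]|\\.)*)".*\)\s+=\s+(\d+)')
CLOSE_RE = re.compile(r'^close\((\d+)\)\s+=\s+0')
IO_RE = re.compile(r'^(read|write)\((\d+)')


def annotate(lines):
    """Replay opens and closes in order and tag each read/write with a path."""
    fd_map = {}
    annotated = []
    for line in lines:
        line = line.rstrip("\n")
        m = OPEN_RE.match(line)
        if m:
            fd_map[int(m.group(2))] = m.group(1)
            continue
        m = CLOSE_RE.match(line)
        if m:
            # Forget the mapping so a recycled number is not misattributed
            fd_map.pop(int(m.group(1)), None)
            continue
        m = IO_RE.match(line)
        if m:
            path = fd_map.get(int(m.group(2)), "?")
            annotated.append(f"{line}  # {path}")
    return annotated


if __name__ == "__main__":
    log = [
        'openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3',
        'read(3, "127.0.0.1", 4096) = 9',
        'close(3) = 0',
        'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
        'read(3, "root", 4096) = 4',
    ]
    print("\n".join(annotate(log)))
```

Descriptors that were already open when strace attached never appear in an open line, so they are reported as "?"; a /proc lookup at attach time can seed the map for those.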

eBPF/bpftrace Alternative

For production systems where strace overhead is unacceptable:

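#!/usr/bin/bpftrace

#include <linux/sched.h>
#include <linux/fdtable.h>

// Tracepoints expose the syscall arguments by name. With kprobes on
// __x64_sys_read, arg0 is a struct pt_regs pointer, not the FD.
tracepoint:syscalls:sys_enter_read,
tracepoint:syscalls:sys_enter_write
{
    $fd = args->fd;
    $task = (struct task_struct *)curtask;
    // Sketch only: prints just the final path component (the dentry name),
    // assumes $fd is a valid index into the descriptor table, and depends
    // on the kernel's struct layout and on your bpftrace version.
    $name = $task->files->fdt->fd[$fd]->f_path.dentry->d_name.name;

    printf("%s(%d) called on %s\n", probe, $fd, str($name));
}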

When choosing a solution, consider:

Method                  Overhead  Resolution  Production Safe
strace -yy              High      Good        No
strace + /proc lookup   Medium    Partial     Maybe
eBPF (bpftrace)         Low       Excellent   Yes

When using strace to monitor system calls on Linux, you'll notice that read/write operations only display file descriptor numbers (e.g., read(3, buf, 1024)). While you can manually check /proc/$PID/fd/ to find the corresponding filenames, this becomes impractical for:

  • Long-running processes (days/weeks)
  • When attaching to existing processes with strace -p
  • Automated monitoring scenarios

strace does not resolve paths for read/write calls by default, but these built-in options help:

# Show syscalls that take a file name argument (open, stat, unlink, ...)
strace -e trace=file -p PID

# Alternative format showing both fd and filename
strace -y -p PID

The -y flag is particularly useful as it annotates file descriptors with path information when available (the path below is a placeholder):

read(3</path/to/file>, "data", 1024) = 4

For more comprehensive monitoring, SystemTap provides better visibility:

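# Install SystemTap (matching kernel debug symbols are also required)
sudo apt-get install systemtap

# fd_paths.stp: print the path behind each read/write FD of the target process
# Run with: sudo stap -x PID fd_paths.stp
# Assumes the task_fd_lookup() and fullpath_struct_file() tapset functions
# are available in your SystemTap version.
probe syscall.read, syscall.write {
    if (pid() != target()) next
    file = task_fd_lookup(task_current(), fd)
    printf("%s(fd=%d path=%s) %s\n", name, fd, fullpath_struct_file(task_current(), file), argstr)
}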

Here's a Python script that combines strace output with fd information:

import os
import re
import subprocess
import sys

FD_CALL = re.compile(r'\b(read|write)\((\d+)')


def resolve_fd(pid, fd):
    """Return the current target of /proc/<pid>/fd/<fd>, or '?' if it is gone."""
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except OSError:
        return "?"


def trace_with_paths(pid):
    # Resolve each FD when its line arrives; a snapshot taken once at
    # startup goes stale as soon as the process opens or closes a file.
    cmd = ["strace", "-e", "trace=read,write", "-p", str(pid)]
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    try:
        for line in proc.stderr:
            annotated = FD_CALL.sub(
                lambda m: f"{m.group(1)}({m.group(2)}<{resolve_fd(pid, m.group(2))}>",
                line)
            print(annotated, end="")  # the line already ends with a newline
    finally:
        proc.terminate()


if __name__ == "__main__":
    trace_with_paths(int(sys.argv[1]))

When implementing these solutions, be aware of:

  • The overhead of constantly resolving /proc/$PID/fd/
  • Race conditions with rapidly changing file descriptors
  • Permission requirements for accessing /proc entries
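The last two points can be told apart in code, which helps when deciding whether to retry or to rerun with more privileges. A minimal sketch follows; the status strings and the function name resolve_fd_status are illustrative.

```python
import os


def resolve_fd_status(pid, fd):
    """Return (status, target) for /proc/<pid>/fd/<fd>.

    status is "ok", "closed" (the FD or the process is gone, typically a
    race with close() or exit), or "denied" (insufficient privileges).
    """
    try:
        return "ok", os.readlink(f"/proc/{pid}/fd/{fd}")
    except FileNotFoundError:
        return "closed", None
    except PermissionError:
        return "denied", None


if __name__ == "__main__":
    # stdin of this process, then a descriptor number that is not open
    print(resolve_fd_status(os.getpid(), 0))
    print(resolve_fd_status(os.getpid(), 999999))
```

A "denied" result usually means the tracer is running as a different user than the target; a "closed" result on a descriptor that was just used is the race condition mentioned above and is generally safe to report as unknown.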

For production environments, consider these specialized tools:

  • ltrace: For library calls including file operations
  • auditd: Kernel-level auditing with path tracking
  • bpftrace: Modern eBPF-based tracing with path resolution