When debugging Linux processes with strace, one common frustration appears in the output of read/write system calls:
read(3, "data", 1024)
write(5, "response", 8)
The numeric file descriptors (3 and 5 in this case) require a manual lookup in /proc/$PID/fd/ to identify the actual files. strace already resolves error codes and formats strings, but by default it does not resolve these file descriptors to paths.
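The manual lookup amounts to reading a symlink under /proc. A minimal Python sketch, assuming a Linux /proc filesystem; the helper name resolve_fd is hypothetical and not part of strace:

```python
import os


def resolve_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd>, or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except OSError:
        # Descriptor not open, process gone, or insufficient permissions
        return None


if __name__ == "__main__":
    # Demonstrate on this process: open a file and resolve its descriptor
    with open(os.devnull) as f:
        print(f.fileno(), "->", resolve_fd(os.getpid(), f.fileno()))
```

Regular files resolve to a path, while sockets and pipes resolve to labels such as socket:[inode] or pipe:[inode].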
The obvious solution of checking earlier open() calls works for:
- Short-running processes
- When you've captured the entire execution
But fails for:
- Long-running processes (days/weeks)
- When attaching to existing processes with strace -p $PID
- Programs that dynamically manage file descriptors
1. Using strace's -y and -yy Flags
The modern solution comes from strace itself in versions 4.11+:
strace -yy -p $PID
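This produces output like (path and addresses are illustrative):
read(3</var/log/app.log>, "data", 1024) = 4
write(5<TCP:[127.0.0.1:8080->127.0.0.1:54321]>, "response", 8) = 8
The -y flag appends the path each descriptor refers to, in angle brackets, while -yy additionally prints protocol-specific details for sockets, such as the local and peer addresses.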
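If the annotated output is post-processed, the descriptor annotations can be pulled out with a regular expression. The sketch below assumes the fd<target> notation that strace -y and -yy print; the function name and sample lines are hypothetical, and string arguments that happen to contain a similar pattern would produce false matches.

```python
import re

# An annotated descriptor looks like 3</path> and is followed by ',' or ')'
# (argument position) or by end of line (return value). The lazy match plus
# lookahead keeps the '->' inside -yy socket annotations intact.
FD_ANNOTATION = re.compile(r"(\d+)<(.*?)>(?=[,)]|\s*$)")


def parse_annotated_fds(line):
    """Return [(fd, target), ...] for every annotated descriptor in a line."""
    return [(int(fd), target) for fd, target in FD_ANNOTATION.findall(line)]


if __name__ == "__main__":
    samples = [
        'read(3</var/log/app.log>, "data", 1024) = 4',
        'write(5<TCP:[127.0.0.1:8080->127.0.0.1:54321]>, "response", 8) = 8',
    ]
    for sample in samples:
        print(parse_annotated_fds(sample))
```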
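2. Combining strace with /proc Lookups
For older strace versions or when you need more control:
strace -e trace=read,write -p "$PID" 2>&1 | awk -v pid="$PID" '/^(read|write)\(/ {
    fd = $0; sub(/^[a-z]+\(/, "", fd); sub(/[^0-9].*/, "", fd)   # keep only the fd number
    cmd = "readlink /proc/" pid "/fd/" fd; path = "?"
    cmd | getline path; close(cmd)
    print $0, "->", path }'
This awk command:
- Matches read/write syscalls
- Extracts the full FD number, not just its first digit
- Looks up the current target of that FD via /proc/$PID/fd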
3. Automated Tracing Script
For comprehensive monitoring:
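#!/bin/bash
# Usage: pass the target PID as the first argument
PID=$1
TRACE_FILE="/tmp/strace_$PID.log"
# Trace read/write in the background; with -f and -o every line starts with the calling PID
strace -f -e trace=read,write -p "$PID" -o "$TRACE_FILE" &
STRACE_PID=$!
trap 'kill "$STRACE_PID" 2>/dev/null' EXIT
# Follow the log as it grows instead of re-reading the whole file every few seconds
tail -n +1 -F "$TRACE_FILE" | awk '
match($0, /(read|write)\([0-9]+/) {
    fd = substr($0, RSTART, RLENGTH)
    sub(/^[a-z]+\(/, "", fd)
    # $1 is the PID or TID that made the call; its fd table is the one to consult
    cmd = "readlink /proc/" $1 "/fd/" fd
    path = "?"
    cmd | getline path
    close(cmd)
    sub(/\([0-9]+/, "(" fd "[" path "]")
    print
    fflush()
}'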
Persistent FD Mapping
For processes that recycle file descriptors:
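# Only one tracer can attach to a process at a time, so log every call to a single file
strace -e trace=open,openat,creat,close,read,write -o trace.log -p $PID
# Correlate later:
awk '
function fdnum(s) { sub(/^[a-z]+\(/, "", s); sub(/[^0-9].*/, "", s); return s }
/^(open|openat|creat)\(/ && $NF ~ /^[0-9]+$/ {
    # Successful open: the last field is the new fd, the first quoted string is the path
    if (match($0, /"[^"]*"/)) fd_map[$NF] = substr($0, RSTART + 1, RLENGTH - 2)
}
/^close\(/ { delete fd_map[fdnum($0)] }
/^(read|write)\(/ { fd = fdnum($0); print $0, (fd in fd_map ? fd_map[fd] : "?") }
' trace.log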
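The same correlation can be sketched in Python, which makes the handling of descriptor reuse explicit. This assumes a single strace log with one call per line and no PID prefix; the function name annotate and the sample lines are hypothetical. Descriptors opened before strace attached stay unresolved and are shown as "?".

```python
import re

# Successful open-style call: first quoted string is the path, result is the fd
OPEN_RE = re.compile(r'^(?:open|openat|creat)\(.*?"([^"]*)".*\)\s+=\s+(\d+)')
CLOSE_RE = re.compile(r'^close\((\d+)')
IO_RE = re.compile(r'^(read|write)\((\d+)')


def annotate(lines):
    """Replay a trace in order and label each read/write with the path its fd had then."""
    fd_map = {}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        m = OPEN_RE.match(line)
        if m:
            fd_map[int(m.group(2))] = m.group(1)
            continue
        m = CLOSE_RE.match(line)
        if m:
            # Forget the mapping so a reused number is not mislabeled
            fd_map.pop(int(m.group(1)), None)
            continue
        m = IO_RE.match(line)
        if m:
            out.append(f"{line}  # {fd_map.get(int(m.group(2)), '?')}")
    return out


if __name__ == "__main__":
    sample = [
        'openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3',
        'read(3, "127.0.0.1", 4096) = 9',
        'close(3) = 0',
        'openat(AT_FDCWD, "/tmp/out.txt", O_WRONLY|O_CREAT, 0644) = 3',
        'write(3, "ok", 2) = 2',
    ]
    print("\n".join(annotate(sample)))
```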
BPF/bcc Alternative
For production systems where strace overhead is unacceptable:
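#!/usr/bin/bpftrace
// Struct layouts vary by kernel version; headers are only needed when BTF is unavailable
#ifndef BPFTRACE_HAVE_BTF
#include <linux/sched.h>
#include <linux/fdtable.h>
#include <linux/fs.h>
#endif
// Syscall tracepoints expose the fd directly; with kprobes on __x64_sys_* arg0 is a pt_regs pointer
tracepoint:syscalls:sys_enter_read,
tracepoint:syscalls:sys_enter_write
{
$fd = args->fd;
$task = (struct task_struct *)curtask;
// Walks the fd table without locking or bounds checks; yields only the final path component
$name = $task->files->fdt->fd[$fd]->f_path.dentry->d_name.name;
printf("%s(%d) called on %s\n", probe, $fd, str($name));
}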
When choosing a solution, consider:
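Method | Overhead | Resolution | Production Safe |
---|---|---|---|
strace -yy | High | Good (full paths, socket addresses) | No |
strace + /proc lookup | High (strace plus one lookup per call) | Partial (races with FD reuse) | No |
BPF (bpftrace) | Low | Depends on the script (the one above prints only the final path component) | Yes |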
When using strace to monitor system calls on Linux, you'll notice that read/write operations only display file descriptor numbers (e.g., read(3, buf, 1024)). While you can manually check /proc/$PID/fd/ to find the corresponding filenames, this becomes impractical for:
- Long-running processes (days/weeks)
- When attaching to existing processes with strace -p
- Automated monitoring scenarios
strace does not resolve paths for read/write calls by default, but these built-in options help:
# Show syscalls that take a filename argument (open, stat, unlink, etc.)
strace -e trace=file -p PID
# Annotate every file descriptor with the path it refers to
strace -y -p PID
The -y flag is particularly useful as it annotates file descriptors with path information when available (path is illustrative):
read(3</var/log/app.log>, "data", 1024) = 4
For more comprehensive monitoring, SystemTap provides better visibility:
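# Install SystemTap (matching kernel debug symbols are also required)
sudo apt-get install systemtap
# Save the probe below as a .stp file and run it with: sudo stap FILE.stp -x PID
probe syscall.read, syscall.write {
    if (pid() != target()) next
    # task_fd_lookup() and fullpath_struct_file() come from the standard tapsets;
    # availability depends on the SystemTap version
    try { path = fullpath_struct_file(task_current(), task_fd_lookup(task_current(), fd)) }
    catch { path = "?" }
    printf("%s(fd=%d path=%s) %s\n", name, fd, path, argstr)
}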
Here's a Python script that combines strace output with fd information:
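import os
import re
import subprocess
import sys


def snapshot_fds(pid):
    """Return {fd: target} for the descriptors currently open in pid."""
    fd_map = {}
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            fd_map[int(fd)] = os.readlink(f"{fd_dir}/{fd}")
        except (OSError, ValueError):
            continue  # descriptor closed between listdir() and readlink()
    return fd_map


def trace_with_paths(pid):
    # Get fd mappings; a reused descriptor keeps its old label until the next refresh
    fd_map = snapshot_fds(pid)
    # Run strace on read/write only and annotate its output (strace writes to stderr)
    cmd = ["strace", "-e", "trace=read,write", "-p", str(pid)]
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    for line in proc.stderr:
        m = re.match(r'(read|write)\((\d+)', line)
        if m:
            fd = int(m.group(2))
            if fd not in fd_map:
                # Descriptor opened after the snapshot: refresh once
                try:
                    fd_map = snapshot_fds(pid)
                except OSError:
                    pass  # process exited
            line = f"{m.group(1)}({fd}={fd_map.get(fd, '?')}{line[m.end():]}"
        print(line, end="")


if __name__ == "__main__":
    trace_with_paths(int(sys.argv[1]))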
When implementing these solutions, be aware of:
- The overhead of constantly resolving /proc/$PID/fd/
- Race conditions with rapidly changing file descriptors
- Permission requirements for accessing /proc entries
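One way to trade lookup overhead against staleness is a short-lived cache that also distinguishes permission failures from closed descriptors. This is only a sketch under stated assumptions: the class name FdCache, the one-second default, and the placeholder strings are all hypothetical choices, not part of any tool mentioned here.

```python
import os
import time


class FdCache:
    """Cache /proc/<pid>/fd lookups briefly to limit repeated readlink calls."""

    def __init__(self, pid, ttl=1.0):
        self.pid = pid
        self.ttl = ttl        # seconds an entry is trusted (hypothetical default)
        self._entries = {}    # fd -> (expiry time, target)

    def lookup(self, fd):
        now = time.monotonic()
        hit = self._entries.get(fd)
        if hit and hit[0] > now:
            return hit[1]     # may be stale if the fd was reused within ttl
        try:
            target = os.readlink(f"/proc/{self.pid}/fd/{fd}")
        except PermissionError:
            target = "<permission denied>"
        except OSError:
            target = "<closed>"   # fd closed or process exited
        self._entries[fd] = (now + self.ttl, target)
        return target

    def invalidate(self, fd):
        """Drop a cached entry, e.g. when a close() for this fd is observed."""
        self._entries.pop(fd, None)


if __name__ == "__main__":
    cache = FdCache(os.getpid())
    with open(os.devnull) as f:
        print(cache.lookup(f.fileno()))
```

A longer ttl lowers the /proc overhead but widens the window in which a recycled descriptor is reported with its previous target.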
For production environments, consider these specialized tools:
- ltrace: For library calls including file operations
- auditd: Kernel-level auditing with path tracking
- bpftrace: Modern eBPF-based tracing with path resolution