When troubleshooting disk I/O bottlenecks on Linux systems, standard iostat
shows device-level metrics but lacks process visibility. This becomes problematic when you need to determine whether a specific process is causing excessive disk activity.
While iostat doesn't natively support per-process monitoring, these alternatives provide the granularity you need:
1. Using pidstat (from sysstat package)
pidstat -d -p [PID] 1 5
# Example monitoring PID 1234:
pidstat -d -p 1234 1 5
# Output shows kB_read/s and kB_wrtn/s per process
2. Leveraging iotop
sudo iotop -o -p [PID]
# Example for PID 5678:
sudo iotop -o -p 5678
# Displays real-time disk I/O per process
For deeper analysis, combine these tools with process monitoring:
# Find disk-intensive processes
sudo iotop -bot -d 5
# Then drill down with strace
sudo strace -p [PID] -e trace=open,openat,read,write,close
# Note: modern libc opens files via openat, so include it in the trace set
Key metrics to watch for:
- kB_read/s consistently above 1024 (1 MB/s) may indicate heavy reads
- kB_wrtn/s spikes often correlate with write-heavy operations
- await above 10 ms (in iostat -x output) suggests disk contention
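As a rough sketch, the thresholds above can be applied mechanically with awk. The input here is a canned pidstat-style sample, not live output, and the column positions ($3 = PID, $5 = kB_wrtn/s) assume one particular pidstat -d layout; sysstat versions differ, so verify the columns against your own output first.

```shell
# Flag processes whose write rate exceeds 1024 kB/s (~1 MB/s).
# The two input lines are fabricated sample data for illustration.
printf '%s\n' \
  '10:00:01  1000  1234   120.00  5400.00  postgres' \
  '10:00:01  1000  5678     0.50     2.00  bash' |
awk '$5 > 1024 { printf "PID %s: %.0f kB_wrtn/s (heavy writer)\n", $3, $5 }'
```

The same filter can be appended to a live `pidstat -d 1` pipeline once the column numbers are confirmed.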
For example, to monitor a PostgreSQL server:
# First identify the PostgreSQL PID(s)
pgrep -x postgres
# Then monitor their I/O; pgrep -d, emits the comma-separated list pidstat -p expects
pidstat -d -p $(pgrep -d, -x postgres) 1 10
Remember that sustained high I/O might indicate poor query patterns or missing indexes rather than hardware issues.
All of these tools rely on per-process I/O accounting maintained by the kernel, which you can also query directly.
Linux exposes process-specific I/O statistics through the /proc/[pid]/io
interface. This file contains counters for:
- read_bytes: Bytes read from storage
- write_bytes: Bytes written to storage
- cancelled_write_bytes: Bytes not written due to truncation
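A quick way to see these counters is to inspect your own shell (this assumes a Linux kernel built with task I/O accounting, which mainstream distributions enable):

```shell
# Dump the byte counters for the current shell; reading another
# user's /proc/[pid]/io generally requires root.
grep -E '^(read_bytes|write_bytes|cancelled_write_bytes):' /proc/$$/io
```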
The pidstat tool from the sysstat package provides the most convenient way to monitor process I/O:
# Install sysstat if needed
sudo apt-get install sysstat
# Monitor disk I/O for process ID 1234 every 2 seconds
pidstat -d -p 1234 2
For more control, here's a bash script that samples process I/O over time:
#!/bin/bash
# Sample a process's cumulative I/O counters at a fixed interval.
PID=$1
INTERVAL=5  # seconds

if [ -z "$PID" ]; then
    echo "Usage: $0 <PID>" >&2
    exit 1
fi

while true; do
    if [ -f "/proc/$PID/io" ]; then
        echo "===== $(date) ====="
        grep -E 'read_bytes|write_bytes' "/proc/$PID/io"
        sleep "$INTERVAL"
    else
        echo "Process $PID not found" >&2
        exit 1
    fi
done
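The counters in /proc/[pid]/io are cumulative since process start, so turning them into rates requires two samples. A minimal sketch (io_rates is an ad-hoc helper name, not a standard tool):

```shell
# Sample read_bytes/write_bytes twice and print per-second rates.
io_rates() {
    pid=$1; interval=${2:-1}
    r1=$(awk '/^read_bytes:/  {print $2}' "/proc/$pid/io")
    w1=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
    sleep "$interval"
    r2=$(awk '/^read_bytes:/  {print $2}' "/proc/$pid/io")
    w2=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
    echo "read:  $(( (r2 - r1) / interval )) B/s"
    echo "write: $(( (w2 - w1) / interval )) B/s"
}

# Example: rates for the current shell over one second
io_rates $$ 1
```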
When analyzing the output:
- Compare read/write rates against your disk's capabilities (check with hdparm -Tt /dev/sdX)
- Look for sustained high values rather than temporary spikes
- Combine with iotop to see real-time rankings
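To put a measured rate in context, divide it by the sequential throughput hdparm reports. The numbers below are purely illustrative:

```shell
# A hypothetical 42 MB/s measured write rate against a disk with a
# 180 MB/s sequential ceiling (substitute your own measurements).
awk -v rate_mb=42 -v disk_mb=180 'BEGIN {
    printf "utilization: %.0f%%\n", 100 * rate_mb / disk_mb
}'
```

Sustained utilization near 100% of the sequential ceiling is a strong sign the disk, not the process, is the limiting factor.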
For complex investigations, SystemTap can trace I/O operations at the syscall level:
# Trace read/write operations for process 1234
stap -e 'probe syscall.read, syscall.write {
    if (pid() == 1234) {
        printf("%s(%d) %s %d\n", execname(), pid(), name, $count)
    }
}'
Common pitfalls to avoid:
- Not accounting for filesystem caching (bypass or flush the page cache in your tests if your benchmark tool supports it)
- Ignoring indirect I/O from child processes
- Missing temporary spikes by using too long a sampling interval
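To account for indirect I/O from children, the per-process counters can be summed over a parent and its direct children. This is a sketch: total_write_bytes is an ad-hoc helper, it assumes pgrep (procps) is installed, and deeper descendants would need recursion.

```shell
# Sum write_bytes for a PID and its direct children.
total_write_bytes() {
    parent=$1
    for p in "$parent" $(pgrep -P "$parent" 2>/dev/null); do
        awk '/^write_bytes:/ {print $2}' "/proc/$p/io" 2>/dev/null
    done | awk '{s += $1} END {print s + 0}'
}

# Example: total for the current shell and its children
total_write_bytes $$
```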