How to Identify Process-Specific Disk I/O Bottlenecks in Linux: Tools and Techniques


When troubleshooting Linux performance issues, disk I/O often becomes the bottleneck. While tools like sar (from the sysstat package) report aggregate statistics, identifying the specific process causing the I/O requires different approaches.

The most effective real-time tool is iotop, which provides per-process I/O statistics:

sudo iotop -oP

This shows only processes (rather than threads) that are actively doing I/O, along with their read/write rates. For continuous monitoring, use iotop's batch mode instead of wrapping its interactive interface in watch:

sudo iotop -obP -d 1
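
In batch mode the same information can also be appended to a file for later review; a small example (the log path and sample count are arbitrary):

# -t adds a timestamp to each line, -n limits the number of 1-second samples
sudo iotop -obPt -d 1 -n 300 >> /tmp/iotop-$(date +%F).log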

For more detailed tracing while the problem is happening, SystemTap can attribute VFS reads and writes to individual processes:

stap -e 'probe vfs.read.return, vfs.write.return {
    printf("%s %d %s %d\n", execname(), pid(), probefunc(), $return)
}'

Two other lightweight options:

1. Using pidstat:

pidstat -d 1

This reports per-process read and write rates (kB_rd/s, kB_wr/s) at 1-second intervals.
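
To rank offenders over a longer window, the Average: summary lines of a finite pidstat run can be sorted; a rough sketch, assuming the column layout of recent sysstat releases (PID in field 3, kB_wr/s in field 5, command in field 8):

# 60 one-second samples, then list the heaviest writers from the summary lines
pidstat -d 1 60 | awk '/^Average:/ && $3 ~ /^[0-9]+$/ {print $5, $3, $8}' | sort -rn | head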

2. Using /proc Filesystem:

# Print cumulative read/write byte counters for every process
for pid in /proc/[0-9]*; do
    if [ -r "$pid/io" ]; then
        echo -n "${pid#/proc/}: "
        grep -E '^(read|write)_bytes' "$pid/io" | tr '\n' ' '
        echo
    fi
done
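
Because these counters are cumulative since process start, large absolute values mostly point at long-running processes; diffing two snapshots gives the current rate. A minimal sketch for a single process (the PID and interval are placeholders, and reading another user's /proc/<pid>/io requires root):

pid=1234 interval=5   # placeholders: process to watch and sample window in seconds
r1=$(awk '/^read_bytes/ {print $2}' /proc/$pid/io)
w1=$(awk '/^write_bytes/ {print $2}' /proc/$pid/io)
sleep "$interval"
r2=$(awk '/^read_bytes/ {print $2}' /proc/$pid/io)
w2=$(awk '/^write_bytes/ {print $2}' /proc/$pid/io)
echo "pid $pid: $(( (r2 - r1) / interval )) B/s read, $(( (w2 - w1) / interval )) B/s written"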

For persistent monitoring, consider setting up atop with logging:

atop -w /var/log/atop.log 60

This writes one sample every 60 seconds to a raw log file that can be replayed later with atop -r.
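
To replay such a log around an incident window, atop's -r, -b and -e options can be combined (the times below are examples in hhmm form); pressing d inside the viewer switches to the per-process disk view:

# Replay the raw log between 13:00 and 13:05, then press 'd' for disk stats
atop -r /var/log/atop.log -b 1300 -e 1305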

While sar doesn't show per-process data directly, you can correlate its output with process data collected during the same period. For your specific case with high utilization at 13:01, pull that window out of the daily sysstat data file (saDD is binary, so read it with sar -f rather than grep; the day suffix below assumes today's file):

sar -d -f /var/log/sysstat/sa$(date +%d) -s 13:00:00 -e 13:02:00

Combine this with process data collected around that time.
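
Since sar's collector only records system-wide counters, the per-process side has to be captured separately if there is to be anything to correlate against. One rough approach, assuming a root crontab and an arbitrary log path, is to append short pidstat runs every minute:

# Root crontab entry: five 1-second pidstat samples per minute, appended to a log
* * * * *  /usr/bin/pidstat -d 1 5 >> /var/log/pidstat-io.log 2>&1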


When your Linux system experiences sudden performance degradation, disk I/O is often the bottleneck. As the sar -d output shows, the system hit severe I/O waits between 13:00 and 13:01, with 88.59% utilization and a 141 ms await time - clear indicators of an I/O-bound system.

While sar shows aggregate disk activity, these tools help identify culprit processes:

# 1. iotop (requires root)
sudo iotop -oP

# 2. pidstat (from sysstat package)
pidstat -dl 1

# 3. dstat (shows the single most I/O-expensive process per interval)
dstat --top-io
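
dstat can also show the heaviest block-level consumer next to the application-level one, which helps separate buffered writes from traffic that actually reaches the disk; for example, sampling every 5 seconds:

# Application-level vs block-level top consumers
dstat --top-io --top-bio 5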

Although sar itself doesn't track per-process stats, you can correlate its timestamps with audit data - provided a matching audit rule was already active during the incident (the disk_io key below is just an example name):

# Check audit records for the incident window (requires a rule such as
#   auditctl -a always,exit -F arch=b64 -S write -k disk_io
# to have been loaded beforehand)
ausearch -ts "12:58:00" -te "13:01:00" -k disk_io

# Or pull PIDs out of the raw audit log and map still-running ones to commands
grep 'key="disk_io"' /var/log/audit/audit.log | 
  awk -F'pid=' '{print $2}' | 
  cut -d' ' -f1 | 
  sort -u | 
  xargs -I{} ps -p {} -o cmd

For persistent I/O issues, consider these approaches:

# 1. strace file-related syscalls of a suspect process (pgrep -d, joins multiple PIDs with commas)
strace -f -e trace=file -p "$(pgrep -d, problematic_process)"

# 2. bcc-tools for deep inspection
/usr/share/bcc/tools/biosnoop

# 3. SystemTap one-liner for block-layer I/O profiling
stap -e 'probe ioblock.request {
  printf("%d %s %s %d\n", pid(), execname(), devname, size)
}'
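
If a top-like per-process summary at the block layer is easier to digest than a per-request trace, the bcc-tools package also ships biotop (the path below assumes the stock package layout):

# Refresh a per-process block I/O table every 5 seconds
/usr/share/bcc/tools/biotop 5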

Create a cron job to capture I/O offenders during peaks:

#!/bin/bash
# Run as root: iotop requires root privileges
LOG=/var/log/io_peaks.log
echo "$(date) - Checking I/O intensive processes" >> "$LOG"
# Snapshot of the busiest processes by CPU and memory
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head -n 10 >> "$LOG"
# One batch sample of per-process I/O in kB/s, with a timestamp per line
iotop -botqqk -n 1 >> "$LOG"

Configure this to run every 5 minutes in crontab.
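
A matching root crontab entry might look like this (the script path is just an example):

# m h dom mon dow   command
*/5 * * * * /usr/local/sbin/io_peaks.sh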

Key metrics to analyze:

  • High await + high %util = Disk bottleneck
  • Consistent rd_sec/s spikes = Read-heavy process
  • Sudden wr_sec/s bursts = Write-intensive operation
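
To watch the same kinds of columns live while reproducing the problem, iostat from the same sysstat package reports them per device:

# Extended per-device statistics (r/s, w/s, await, %util), refreshed every second
iostat -dx 1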