When troubleshooting Linux performance issues, disk I/O often becomes the bottleneck. While tools like sar and sysstat show aggregate statistics, identifying the specific process causing high I/O requires different approaches.
The most effective real-time tool is iotop, which provides per-process I/O statistics:
sudo iotop -oP
This limits output to processes that are actively doing I/O (-o) and aggregates threads per process (-P). For continuous, non-interactive monitoring, use iotop's batch mode rather than wrapping it in watch (iotop is a full-screen curses program that never exits on its own, so watch would not work):
sudo iotop -obP -d 1
For more detailed tracing, whose output you can log for later analysis, SystemTap can attribute VFS reads and writes to the processes issuing them:
stap -e 'probe vfs.read.return, vfs.write.return {
  printf("%s %d %s %d\n", execname(), pid(), probefunc(), $return)
}'
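If you want to keep that trace and rank processes by bytes transferred, a small shell wrapper is enough. This is only a sketch: the 60-second window (which also includes SystemTap's module compile time), the /tmp/vfs_io.log path, and the awk summary are my own illustrative choices, not part of the original one-liner.
# Sketch: capture 60 seconds of VFS I/O, then total bytes per process name
sudo timeout 60 stap -e 'probe vfs.read.return, vfs.write.return {
  printf("%s %d %s %d\n", execname(), pid(), probefunc(), $return)
}' > /tmp/vfs_io.log
# Count only successful calls ($4 > 0) and show the heaviest processes first
awk '$4 > 0 { bytes[$1] += $4 } END { for (p in bytes) print bytes[p], p }' /tmp/vfs_io.log | sort -rn | head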
Two simpler alternatives:
1. Using pidstat:
pidstat -d 1
Shows per-process disk I/O with 1-second intervals.
2. Using the /proc Filesystem:
# Requires root to read the io stats of other users' processes
for pid in $(ls /proc | grep '^[0-9]'); do
  if [ -r /proc/$pid/io ]; then
    echo -n "$pid: "
    grep '^read_bytes' /proc/$pid/io
  fi
done
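If you'd rather see the heaviest consumers at a glance, the same /proc data can be summed and sorted. A minimal sketch (run as root); the read+write total, the head limit, and the cmdline lookup are my additions to the loop above:
# Sketch: rank processes by total bytes read + written since they started
for p in /proc/[0-9]*; do
  [ -r "$p/io" ] || continue
  rb=$(awk '/^read_bytes/ {print $2}' "$p/io" 2>/dev/null)
  wb=$(awk '/^write_bytes/ {print $2}' "$p/io" 2>/dev/null)
  printf '%s\t%s\t%s\n' "$(( ${rb:-0} + ${wb:-0} ))" "${p#/proc/}" "$(tr '\0' ' ' < "$p/cmdline")"
done | sort -rn | head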
For persistent monitoring, consider setting up atop with logging:
atop -w /var/log/atop.log 60
This writes a raw log recording per-process disk I/O at 60-second intervals; note that atop -w itself does not rotate the file (distribution packages usually rotate the daily logs via the atop service).
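To look back at a past spike, replay the raw file with atop's reader mode. The log path below simply matches the command above, and the key bindings in the comment are standard atop behaviour:
# Sketch: replay the raw log; press 't' to step forward through samples,
# 'b' to jump to a specific time (e.g. 13:00), and 'd' for the disk-oriented per-process view
atop -r /var/log/atop.log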
While sar doesn't show per-process data directly, you can correlate its output with process data collected during the same period. For your specific case with high utilization at 13:01, pull the disk statistics for that window from the daily sysstat data file (the sa files are binary, so read them with sar -f rather than grep; substitute the day-of-month suffix for the day in question):
sar -d -f /var/log/sysstat/sa$(date +%d) -s 13:00:00 -e 13:05:00
Then combine this with the process-level data you collected around the same time.
When your Linux system experiences sudden performance degradation, disk I/O is often the bottleneck. As shown in the sar -d output, the system experienced severe I/O waits between 13:00 and 13:01, with 88.59% utilization and a 141 ms await time, clear indicators of an I/O-bound system.
While sar shows aggregate disk activity, these tools help identify the culprit processes:
# 1. iotop (requires root)
sudo iotop -oP
# 2. pidstat (from sysstat package)
pidstat -dl 1
# 3. dstat (shows the single most I/O-expensive process)
dstat --top-io
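dstat's plugins can also be combined; --top-bio reports the process doing the most block I/O, which complements the VFS-level --top-io view. The 5-second interval below is just an example:
# Sketch: most expensive I/O process (VFS) and most expensive block-I/O process, every 5 seconds
dstat --top-io --top-bio 5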
Although sar itself doesn't track per-process stats, you can correlate its timestamps with audit records from auditd:
# Search audit events tagged disk_io in the incident window
# (only returns results if a matching audit rule with that key was loaded beforehand; see the sketch after this block)
ausearch -ts "12:58:00" -te "13:01:00" -k disk_io
# Or map PIDs found in audit records back to running commands
# (the ps lookup only succeeds for PIDs that are still alive)
grep "avc.*io" /var/log/audit/audit.log |
awk -F'pid=' '{print $2}' |
cut -d' ' -f1 |
xargs -I{} ps -p {} -o cmd
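As noted above, the disk_io key must exist before the incident for ausearch to find anything. A watch rule like the following would create it; the watched path is purely illustrative and should point at whatever directory you suspect:
# Sketch: audit reads/writes under a suspect data directory and tag them with the disk_io key
# (/var/lib/mysql is only an example path, not from the original answer)
auditctl -w /var/lib/mysql -p rw -k disk_io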
For persistent I/O issues, consider these approaches:
# 1. strace file-access syscalls of a suspect process (high overhead; attach only briefly)
strace -f -e trace=file -p $(pgrep problematic_process)
# 2. bcc-tools for deep inspection (per-process block I/O with latency)
/usr/share/bcc/tools/biosnoop
# 3. SystemTap block-I/O profile (save as blkio.stp and run: sudo stap blkio.stp)
probe ioblock.request {
  printf("%d %s %s %d\n", pid(), execname(), devname, size)
}
Create a cron job to capture I/O offenders during peaks:
#!/bin/bash
# Must run as root so iotop can read other processes' I/O counters
LOG=/var/log/io_peaks.log
echo "$(date) - Checking I/O intensive processes" >> "$LOG"
# Top CPU consumers for context, then a one-shot batch iotop sample in kB
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu |
head -n 10 >> "$LOG"
iotop -botqqk -n 1 >> "$LOG"
Configure this to run every 5 minutes in crontab.
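A matching schedule entry might look like the following; the script path and the use of /etc/cron.d (which takes a user field) are assumptions, so adapt them to wherever you actually save the script:
# Sketch: /etc/cron.d/io-peaks - run the capture script every 5 minutes as root
*/5 * * * * root /usr/local/bin/io_peaks.sh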
Key metrics to analyze:
- High await + high %util = disk bottleneck
- Consistent rd_sec/s spikes = read-heavy process
- Sudden wr_sec/s bursts = write-intensive operation
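Those columns come straight from sar -d, so you can also watch them live while reproducing the load; -p just prints real device names, and the 1-second interval with 5 samples is an arbitrary choice:
# Sketch: live per-device view of await, %util, rd_sec/s and wr_sec/s
sar -d -p 1 5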