Linux Performance: How to Identify Processes Blocked on Disk I/O Wait



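When your Linux server shows a high load average but CPU usage appears normal and there's no swapping activity, disk I/O contention is often the culprit. On Linux, the load average counts tasks in uninterruptible sleep (D state) as well as runnable ones, so processes blocked on disk push it up without using any CPU. The challenge lies in identifying which processes are actually blocked waiting for I/O operations to complete.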
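A quick way to test whether blocked tasks explain a high load is to count runnable (R) and uninterruptible (D) threads yourself and compare the total with /proc/loadavg. The bash sketch below assumes a Linux /proc filesystem. `stat_state` and `count_states` are hypothetical helper names, not standard tools:

```shell
#!/usr/bin/env bash
# Sketch (assumes Linux /proc): count threads that are runnable (R) or in
# uninterruptible sleep (D) and print them next to the 1-minute load
# average. stat_state and count_states are hypothetical helper names.

# Print the one-letter state from a /proc/<pid>/stat line.
# The comm field is in parentheses and may itself contain spaces or ")",
# so strip everything up to the LAST ") " before taking the next field.
stat_state() {
    local rest
    rest=${1##*") "}
    printf '%s\n' "${rest%% *}"
}

count_states() {
    local r=0 d=0 f line s load1
    for f in /proc/[0-9]*/task/[0-9]*/stat; do
        # Threads can exit between the glob and the read; skip those.
        read -r line 2>/dev/null < "$f" || continue
        s=$(stat_state "$line")
        case "$s" in
            R) r=$((r + 1)) ;;
            D) d=$((d + 1)) ;;
        esac
    done
    read -r load1 _ < /proc/loadavg
    echo "load1=$load1 running=$r uninterruptible=$d"
}
```

To use it, source the file and run `count_states`. The counts are an instantaneous snapshot, while the load average decays over time, so expect only rough agreement. The counting shell itself will usually show up as one running thread.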

While iotop shows active I/O operations, we need different tools to reveal blocked processes:


# 1. Using dstat for comprehensive I/O metrics
dstat -td --disk-util --top-bio

# 2. Checking process states with ps
ps -eo stat,pid,user,command | awk '$1 ~ /D/ {print $0}'
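D state is often brief. A single ps snapshot can miss a process that keeps stalling, or it can catch one that stalled only once. The following hedged bash sketch takes several snapshots and counts how often each PID is seen in D state. `tally_dstate` and `sample_dstate` are hypothetical helpers, and the ten-sample default is arbitrary:

```shell
#!/usr/bin/env bash
# Sketch: sample process states repeatedly and report how often each
# process was seen in uninterruptible sleep (D). tally_dstate and
# sample_dstate are hypothetical helper names.

# Read "STAT PID COMM" lines (as printed by ps -eo stat=,pid=,comm=) on
# stdin and print "count pid comm" for every process whose state starts
# with D, most frequently blocked first.
tally_dstate() {
    awk '
        $1 ~ /^D/ {
            pid = $2
            # Drop STAT and PID so the rest of the line is the command name,
            # which may contain spaces.
            $1 = ""; $2 = ""
            sub(/^ +/, "")
            n[pid " " $0]++
        }
        END { for (k in n) print n[k], k }
    ' | sort -rn
}

# Take SAMPLES snapshots, INTERVAL seconds apart (defaults: 10 x 1 s).
sample_dstate() {
    local samples=${1:-10} interval=${2:-1} i
    for ((i = 0; i < samples; i++)); do
        ps -eo stat=,pid=,comm=
        sleep "$interval"
    done | tally_dstate
}
```

For example, `sample_dstate 30 1` watches for 30 seconds. A PID that appears in most samples is a stronger suspect than one that appears once. Keeping the tally as a separate stdin filter also makes it easy to check against saved ps output.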

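For lower-level analysis, SystemTap's ioblock tapset can log each block I/O request together with the process that issued it:


# SystemTap script: log each block I/O request with the issuing process
# (save as e.g. iotrace.stp and run: sudo stap iotrace.stp). Buffered
# writes are often submitted by kernel flusher threads, not the writer.
probe ioblock.request {
    printf("%d %s %s %d\n", pid(), execname(), devname, size)
}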

Let's examine a real MySQL server case:


# First identify D-state processes
$ ps -eo state,pid,cmd | grep '^D'
D 29843 /usr/sbin/mysqld

# Then measure its disk read/write rates with pidstat
$ pidstat -d -p 29843 1 5
Linux 5.4.0-91-generic (db-server)     02/20/2023     _x86_64_    (8 CPU)

02:30:01 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
02:30:02 PM   112     29843      0.00   1024.00      0.00  mysqld
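pidstat -d derives its kB_rd/s and kB_wr/s figures from the per-task counters in /proc/<pid>/io. Below is a rough bash approximation. It assumes a kernel with task I/O accounting and permission to read the target's /proc entry (root for another user's process). `io_counter` and `io_rate` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: per-process disk throughput from /proc/<pid>/io, similar in
# spirit to pidstat -d. io_counter and io_rate are hypothetical helpers.

# Print one field (e.g. read_bytes) from /proc/<pid>/io text on stdin.
# Lines look like "read_bytes: 4096". Exits 1 if the field is missing.
io_counter() {
    awk -v key="$1:" '$1 == key { print $2; found = 1 } END { if (!found) exit 1 }'
}

# Print kB/s read and written by PID over INTERVAL seconds (default 1).
# read_bytes/write_bytes count bytes sent to or fetched from the storage
# layer, so reads served from the page cache are not included.
io_rate() {
    local pid=$1 interval=${2:-1} t0 t1 r0 r1 w0 w1
    t0=$(cat "/proc/$pid/io") || return 1
    sleep "$interval"
    t1=$(cat "/proc/$pid/io") || return 1
    r0=$(io_counter read_bytes <<< "$t0"); r1=$(io_counter read_bytes <<< "$t1")
    w0=$(io_counter write_bytes <<< "$t0"); w1=$(io_counter write_bytes <<< "$t1")
    awk -v r="$((r1 - r0))" -v w="$((w1 - w0))" -v s="$interval" \
        'BEGIN { printf "kB_rd/s=%.2f kB_wr/s=%.2f\n", r / 1024 / s, w / 1024 / s }'
}
```

For example, `io_rate 29843 5` would print average rates over five seconds for the mysqld PID shown above. Unlike pidstat, this sketch ignores cancelled_write_bytes (data that was dirtied but truncated before it reached disk).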

Other useful techniques include:

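  • vmstat 1 to watch the wa column (CPU time spent waiting for I/O) and the b column (processes in uninterruptible sleep)
  • iostat -x 1 for device-level metrics
  • biosnoop from the BCC tools to trace each block I/O with its PID, process name, and latency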
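vmstat's column set can differ between versions. A script that watches for I/O stalls is therefore safer if it locates the b and wa columns by header name rather than by position. A minimal sketch follows. `vmstat_io` is a hypothetical filter for vmstat output on stdin:

```shell
#!/usr/bin/env bash
# Sketch: pull the "b" (tasks in uninterruptible sleep) and "wa" (CPU time
# spent waiting for I/O) columns out of vmstat output by header name
# rather than by position. vmstat_io is a hypothetical helper.

vmstat_io() {
    awk '
        # The second header line names the columns (r b swpd ... wa st).
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Data lines start with a number; print once the header is known.
        ("b" in col) && $1 ~ /^[0-9]+$/ {
            print "blocked=" $col["b"], "iowait=" $col["wa"] "%"
        }
    '
}

# Usage: vmstat 1 5 | vmstat_io
```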

Key indicators of I/O wait bottlenecks:


# iostat -x: a 15 ms await with %util at 96 suggests a saturated disk
Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda         0.00     5.00   10.00  150.00    80.00  1200.00    16.00     2.50   15.00    5.00   16.00   6.00  96.00
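The figures in such a line can be cross-checked with two relations. Little's law gives average requests in flight = IOPS × await, and the utilization law gives busy fraction = IOPS × svctm. A small sketch follows. `iostat_check` is a hypothetical helper that takes r/s, w/s, await and svctm from an iostat -x line:

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an iostat -x line with Little's law and the
# utilization law. iostat_check is a hypothetical helper; arguments are
# r/s, w/s, await (ms) and svctm (ms) copied from iostat output.
iostat_check() {
    awk -v r="$1" -v w="$2" -v await_ms="$3" -v svctm_ms="$4" 'BEGIN {
        iops = r + w
        # Requests in flight = arrival rate (per s) x time in system (s).
        printf "expected avgqu-sz: %.2f\n", iops * await_ms / 1000
        # Busy fraction = arrival rate x service time, as a percentage.
        printf "expected %%util:    %.1f\n", iops * svctm_ms / 10
        # Time each request spends queued rather than being serviced.
        printf "queueing delay:    %.1f ms\n", await_ms - svctm_ms
    }'
}

# Figures from the sda line above:
iostat_check 10 150 15 6
```

For the sda line, the sketch prints an expected avgqu-sz of 2.40 (iostat reports 2.50), an expected %util of 96.0, and a queueing delay of 9.0 ms. In other words, requests spend longer waiting in the queue than being serviced. Treat this as a rough check only. Newer sysstat releases deprecate svctm and no longer show it. Also, %util near 100% does not imply saturation for devices that serve requests in parallel, such as SSDs and RAID arrays.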

Unlike CPU-bound processes, which show up clearly in top or htop, processes stalled on I/O consume no CPU while they wait, which makes them easy to overlook.

Besides iotop, which shows only processes actively doing I/O, these commands reveal processes waiting for I/O:


# 1. Using dstat for real-time monitoring
dstat -ta --top-io

# 2. Checking process states with ps
ps -eo state,pid,cmd | grep "^D"

# 3. Comprehensive view with pidstat
pidstat -d 1

Processes in "D" state (uninterruptible sleep) in ps output are typically waiting for I/O. Common scenarios include:

  • Database transactions waiting on slow storage
  • Log rotation processes blocked on large file operations
  • Backup jobs competing for disk access

For deeper investigation, combine these approaches:


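# Record I/O wait events system-wide for 10 s, then print them; each event
# carries comm, pid and delay fields for the task that waited.
# The tracepoint needs schedstats: sudo sysctl kernel.sched_schedstats=1
sudo perf record -e sched:sched_stat_iowait -a -- sleep 10
sudo perf script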

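# Active I/O scheduler, queue depth limit, and in-flight reads/writes
cat /sys/block/sdX/queue/scheduler
cat /sys/block/sdX/queue/nr_requests
cat /sys/block/sdX/inflight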

# See which processes hold files open under a directory (lsof +D recurses, so it can be slow on large trees)
sudo lsof +D /var/lib/mysql
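/sys/block/<dev>/inflight holds two counters: the read requests and the write requests currently in flight on that device. The minimal sketch below assumes Linux sysfs and a device name you supply (sda in the usage line is only an example). `parse_inflight` and `watch_inflight` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: watch in-flight requests for one block device via sysfs.
# parse_inflight and watch_inflight are hypothetical helpers.

# Turn the two whitespace-separated counters of an inflight file
# (reads, then writes) into a labelled line.
parse_inflight() {
    local reads writes
    read -r reads writes
    echo "reads_inflight=$reads writes_inflight=$writes"
}

# Sample /sys/block/DEV/inflight COUNT times, one second apart.
watch_inflight() {
    local dev=${1:?usage: watch_inflight DEV [COUNT]} count=${2:-10} i
    for ((i = 0; i < count; i++)); do
        printf '%s ' "$(date +%T)"
        parse_inflight < "/sys/block/$dev/inflight" || return 1
        sleep 1
    done
}

# Usage: watch_inflight sda 5
```

Counts that stay high from one sample to the next, rather than spiking briefly, match the sustained queueing that leaves processes in D state.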

Consider this troubleshooting session for a stuck backup process:


$ ps -eo state,pid,cmd | grep "^D"
D 29847 /usr/bin/rsync -avz /data backup-server:/backups

$ sudo strace -p 29847
[...]
read(4, 0x7ffd3a1f2000, 131072)    = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[...]

$ sudo lsof -p 29847
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
rsync   29847 root    4r   REG   8,16 4294967296 123456 /data/large_vm_image.qcow2

The lsof output shows that file descriptor 4, the one passed to read(), is a 4 GiB VM image (/data/large_vm_image.qcow2), so the process is blocked reading that file.
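A task in uninterruptible sleep does not react to signals, so attaching strace to it may appear to hang. In that case /proc can still answer two questions. First, where in the kernel is the task waiting (/proc/<pid>/wchan)? Second, how far has the blocked file descriptor advanced (the pos: line of /proc/<pid>/fdinfo/<fd>)? The sketch below assumes Linux /proc, GNU stat, and read access to the target (root for another user's process). `wait_channel` and `fd_progress` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: inspect a (possibly blocked) process through /proc.
# wait_channel and fd_progress are hypothetical helpers.

# Print the file behind PID's descriptor FD, the current file offset
# (from the "pos:" line of /proc/PID/fdinfo/FD) and the file size.
fd_progress() {
    local pid=$1 fd=$2 path pos size
    path=$(readlink "/proc/$pid/fd/$fd") || return 1
    pos=$(awk '$1 == "pos:" { print $2 }' "/proc/$pid/fdinfo/$fd") || return 1
    # stat -L follows the /proc symlink to the open file itself (GNU stat).
    size=$(stat -L -c %s "/proc/$pid/fd/$fd") || return 1
    echo "path=$path pos=$pos size=$size"
}

# Kernel function the task is sleeping in (may read 0 without privileges).
wait_channel() {
    cat "/proc/$1/wchan" && echo
}

# Usage (as root for another user's process):
#   wait_channel 29847
#   fd_progress 29847 4
```

Calling fd_progress a few times and comparing pos with size shows whether the read is creeping forward slowly or not moving at all. If wchan alone is not enough, /proc/<pid>/stack (readable only by root) gives the full kernel stack of a blocked task.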

When you identify I/O-bound processes:

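  • Prioritize critical processes with ionice (I/O classes only take effect with a scheduler that honors them, such as BFQ)
  • Consider filesystem tuning (noatime; on ext4, data=writeback, which gives up data-ordering guarantees after a crash)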
  • Implement rate limiting for non-critical jobs
  • Evaluate storage upgrade options
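Here is a hedged sketch of the ionice and rate-limiting measures for a job such as the rsync above. `deprioritize` is a hypothetical helper, and the PID and bandwidth figures are illustrative only:

```shell
#!/usr/bin/env bash
# Sketch: lower the I/O and CPU priority of an already-running process.
# deprioritize is a hypothetical helper; run it as root for processes
# you do not own.
deprioritize() {
    local pid=${1:?usage: deprioritize PID}
    # Idle I/O class (-c 3): the process only gets disk time when no one
    # else needs it. Only schedulers that honor I/O priorities (e.g. BFQ)
    # act on this setting.
    ionice -c 3 -p "$pid" || return 1
    # Lower its CPU priority as well, so it also yields the processor.
    renice -n 19 -p "$pid" >/dev/null || return 1
}

# Usage: sudo bash -c '. ./deprioritize.sh; deprioritize 29847'
# For new jobs, rsync's --bwlimit caps the transfer rate (in units of
# 1024 bytes per second by default), for example:
#   nice -n 19 ionice -c 3 rsync -avz --bwlimit=20000 /data backup-server:/backups
```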