Alternative Methods to Identify I/O Bound Processes When iotop is Unavailable


When troubleshooting system performance, encountering processes in the uninterruptible sleep state (D state) often indicates I/O contention. While iotop is the go-to tool for monitoring I/O-bound processes, systems with numerous D state processes may require alternative diagnostic approaches.
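Before reaching for heavier tooling, it is worth listing the D-state processes themselves; ps can also show the kernel wait channel (wchan) each one is blocked in:

# Show uninterruptible-sleep processes and the kernel function they wait in
ps -eo state,pid,wchan:32,comm | awk '$1 ~ /^D/'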

The Linux /proc filesystem provides raw I/O statistics that we can parse:

# Note: /proc/<pid>/io is only readable for your own processes unless run as root.
# read_bytes/write_bytes count actual storage I/O, unlike rchar/wchar in the same
# file, which also include data served from the page cache.
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
    if [ -f /proc/$pid/io ]; then
        # Anchor the patterns: a bare /write_bytes/ would also match cancelled_write_bytes
        read_bytes=$(awk '/^read_bytes:/ {print $2}' /proc/$pid/io 2>/dev/null)
        write_bytes=$(awk '/^write_bytes:/ {print $2}' /proc/$pid/io 2>/dev/null)
        if [ -n "$read_bytes" ] || [ -n "$write_bytes" ]; then
            cmd=$(ps -p $pid -o comm= 2>/dev/null)
            echo "$pid $cmd Read: $read_bytes Write: $write_bytes"
        fi
    fi
done | sort -k4 -nr | head -n 10    # field 4 = read bytes; use -k6 to rank by writes

For more sophisticated analysis, SystemTap provides kernel-level instrumentation:

# SystemTap requires arrays shared between probes to be declared global
global io_count

probe vfs.read.return {
    if (bytes_read > 0) {
        io_count[pid(), execname()] += bytes_read
    }
}

probe vfs.write.return {
    if (bytes_written > 0) {
        io_count[pid(), execname()] += bytes_written
    }
}

probe timer.s(5) {
    foreach ([pid, exec] in io_count-) {
        printf("%d %s: %d bytes\n", pid, exec, io_count[pid, exec])
    }
    delete io_count
}
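
Save the script to a file and run it with stap (the filename io_top.stp is just illustrative); this requires the systemtap package plus debug symbols matching the running kernel:

sudo stap io_top.stp

The script prints per-process byte counts every five seconds, sorted in descending order, then resets the counters.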

When investigating a slow PostgreSQL database, combine multiple tools:

# Monitor overall disk I/O
vmstat 1 10

# Check process-specific I/O (requires root). Quoting the pgrep output lets
# strace attach to every postgres PID; note that strace slows the traced
# processes noticeably, so keep sessions short on production systems.
sudo strace -f -p "$(pgrep postgres)" -e trace=read,write -tt 2>&1 | \
    grep -v '= 0$'    # drop zero-byte reads and writes

# Alternative using perf: record block-request issue events system-wide,
# then count events per process (field 1 of perf script output is the comm)
sudo perf record -e block:block_rq_issue -a sleep 10
sudo perf script | awk '{print $1}' | sort | uniq -c | sort -nr

Key metrics to consider when analyzing I/O-bound processes:

  • Consistent high read/write operations in /proc/[pid]/io
  • Frequent I/O system calls in strace output
  • Block device latency in perf traces
  • Correlation between D state processes and disk utilization (see the sketch after this list)
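
To eyeball that last correlation, here is a minimal sketch, assuming the busy device is sda (substitute your own disk):

# Sample the D-state process count next to device utilization
while sleep 1; do
    d=$(ps -eo state= | grep -c '^D')
    # take %util (last column) from the second iostat report, i.e. the live interval
    util=$(iostat -dx 1 2 | awk '/^sda/ {u=$NF} END {print u}')
    echo "$(date +%T)  D-state: $d  sda %util: $util"
done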

As noted above, when many processes sit in D (uninterruptible sleep) state, iotop output becomes hard to interpret even where the tool is available. Several standard utilities offer effective alternatives for spotting I/O-intensive processes:

Using vmstat with awk

vmstat -n 1 | awk '{print $1, $2, $9, $10, $16}'

This narrows the output to r (runnable processes), b (processes blocked on I/O), bi (blocks in), bo (blocks out), and wa (I/O wait percentage), updated every second.

dstat for Comprehensive Monitoring

dstat -td --disk-util --top-io --top-bio

This combination shows disk utilization along with top I/O processes.
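
Note that upstream dstat is no longer maintained; where a distribution has dropped it, the dool fork is designed as a drop-in replacement, so the same flags should work:

dool -td --disk-util --top-io --top-bio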

pidstat for Process-Level Metrics

pidstat -d 1

Sample output:

03:15:42 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
03:15:43 PM     0      1045      0.00    152.00      0.00  jbd2/sda1-8
03:15:43 PM  1000      2256    128.00      0.00      0.00  mysqld
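
To watch a single suspect over time, such as the mysqld process in the sample above, pass its PID (pgrep -o selects the oldest match):

# Per-second disk I/O for one process only
pidstat -d -p "$(pgrep -o mysqld)" 1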

For a scripted view, procfs again supplies the raw counters; this variant ranks processes by bytes read and includes the full command line:

#!/bin/bash
# Rank processes by bytes read (field 2); run as root to see all processes
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
  if [ -d /proc/$pid ]; then
    io=$(cat /proc/$pid/io 2>/dev/null)
    if [ $? -eq 0 ]; then
      read_bytes=$(echo "$io" | grep '^read_bytes' | awk '{print $2}')
      write_bytes=$(echo "$io" | grep '^write_bytes' | awk '{print $2}')
      cmd=$(ps -p $pid -o cmd= 2>/dev/null)
      echo "$pid $read_bytes $write_bytes $cmd"
    fi
  fi
done | sort -k2 -nr | head -n 10

A few environment-specific notes:

  • For containerized environments, check /sys/fs/cgroup/blkio/ (see the example after this list)
  • In Kubernetes, use kubectl top pod --containers with metrics-server installed
  • For historical analysis, configure sar (sysstat package) with appropriate intervals
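
For the cgroup case, a sketch under cgroup v1 (the container path is a placeholder that depends on your runtime and cgroup driver):

# Cumulative bytes read/written per block device for one container
cat /sys/fs/cgroup/blkio/docker/<container-id>/blkio.throttle.io_service_bytes
# On cgroup v2 hosts the equivalent counters are in the cgroup's io.stat file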

For modern kernels, consider these BPF tools:

# Install the BCC tool collection (on Debian/Ubuntu the tools are
# installed with a -bpfcc suffix, e.g. biosnoop-bpfcc / biotop-bpfcc)
sudo apt install bpfcc-tools

# Trace individual block I/O requests with the issuing process and latency
sudo biosnoop

# top-like summary of block I/O by process
sudo biotop
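
If bpftrace is available instead of BCC, the equivalent one-liner counts block-layer requests per process name for as long as it runs:

sudo bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'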