How to Monitor Real-Time IOPS on a Running Linux Server: Tools and Methods



While theoretical IOPS values (e.g., roughly 90 for a 7.2K SATA drive, 180 for 10K SAS) provide baseline expectations, production systems often behave very differently. Unlike one-off storage benchmarks, live IOPS monitoring helps identify:

  • Actual workload patterns during peak hours
  • Storage subsystem saturation points
  • Application-level IO bottlenecks

The iostat tool (from the sysstat package) is the quickest way to read live IOPS, as long as its columns are interpreted correctly:

iostat -dx 1 5 | grep -E 'Device|sd[a-z]|nvme[0-9]'

Key columns:

  • r/s + w/s = Total IOPS
  • rkB/s + wkB/s = Throughput
  • await = Average response time (ms); newer sysstat versions report r_await and w_await separately for reads and writes

For persistent monitoring, create a script like the following:

    #!/bin/bash
    # Append a timestamped total-IOPS sample for one device every INTERVAL seconds
    INTERVAL=5
    DEVICE=sda

    while true; do
      # Take two reports; the second one covers the last $INTERVAL seconds.
      # In plain "iostat -d" output, column 2 (tps) is total transfers per second, i.e. IOPS.
      IOPS=$(iostat -d "$DEVICE" "$INTERVAL" 2 | awk -v dev="$DEVICE" '$1 == dev {iops = $2} END {print iops}')
      echo "$(date '+%FT%T') $IOPS" >> /var/log/iops_monitor.log
    done
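
To keep it collecting across logouts, detach the script with nohup and summarize the log later with awk. A minimal sketch, assuming the script above was saved as iops_monitor.sh (an illustrative name) and writes to the log path shown:

    # start the collector in the background, detached from the current shell
    nohup ./iops_monitor.sh >/dev/null 2>&1 &

    # average the logged samples (IOPS is column 2 of each log line)
    awk '{sum += $2; n++} END {if (n) printf "avg IOPS: %.1f over %d samples\n", sum / n, n}' /var/log/iops_monitor.log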

    For deeper visibility into where I/O originates, two kernel-level tracing options are available:

    1. SystemTap for Kernel-Level Tracing:

    stap -e 'global io;
    probe ioblock.request {
      io[devname] <<< 1
    }
    probe timer.s(1) {
      foreach(dev in io) {
        printf("%s: %d IOPS\n", dev, @count(io[dev]))
      }
      delete io
    }'

    2. BPF-based Tools from the bcc collection (requires Linux 4.4+):

    biosnoop   # per-process block I/O with timestamps and latency
    biotop 1   # per-process I/O leaderboard, refreshed every second
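
    If the bcc tools are not installed, a rough equivalent is a bpftrace one-liner. The sketch below counts block-layer requests per process each second; it assumes bpftrace is available and uses the block:block_rq_issue tracepoint (for buffered writes the name shown may be a kernel writeback thread rather than the originating application):

    bpftrace -e 'tracepoint:block:block_rq_issue { @iops[comm] = count(); }
                 interval:s:1 { print(@iops); clear(@iops); }'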

    For distributed systems:

    • Prometheus + node_exporter (metrics labeled by device; see the query sketch after this list)
    • Grafana dashboards with per-device IOPS alerts
    • Datadog's storage integration with anomaly detection
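
    As a concrete example of the Prometheus route, node_exporter's per-device counters can be turned into IOPS with a rate() expression. The PromQL below is a sketch: the metric names are the standard node_exporter ones, while the device regex and 5-minute window are illustrative:

    # total IOPS per device over the last 5 minutes
    sum by (device) (
        rate(node_disk_reads_completed_total{device=~"sd.*|nvme.*"}[5m])
      + rate(node_disk_writes_completed_total{device=~"sd.*|nvme.*"}[5m])
    )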

    Remember that:

    • RAID changes the math: reads are spread across members, but each logical write costs multiple physical operations (two for RAID 1/10, four for RAID 5)
    • Cached writes appear instantaneous until writeback kicks in; check the Dirty and Writeback counters in /proc/meminfo (see the command below)
    • SSDs that look fast in short bursts can slow noticeably under sustained load
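
    A quick way to see how much written data is still sitting in the page cache, waiting to become real disk I/O:

    watch -n 1 "grep -E 'Dirty|Writeback' /proc/meminfo"   # dirty pages not yet flushed to disk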

    If you prefer raw counters over iostat's formatted output, you can parse /proc/diskstats directly:

    #!/bin/bash
    DEVICE="sda"

    # /proc/diskstats fields: 1 major, 2 minor, 3 name, 4 reads completed, ..., 8 writes completed.
    # Match the device name exactly (field 3) so partitions like sda1 are not picked up as well.
    read -r _ _ _ reads1 _ _ _ writes1 _ < <(awk -v dev="$DEVICE" '$3 == dev' /proc/diskstats)
    sleep 1
    read -r _ _ _ reads2 _ _ _ writes2 _ < <(awk -v dev="$DEVICE" '$3 == dev' /proc/diskstats)

    # The counters are cumulative, so the one-second delta is the IOPS
    echo "Read IOPS:  $((reads2 - reads1))"
    echo "Write IOPS: $((writes2 - writes1))"
    

    For production environments, consider these robust tools:

    • sysdig: Real-time system monitoring with IOPS metrics (example invocations after this list)
    • atop: Advanced performance monitor with disk pressure indicators
    • Prometheus + node_exporter: For long-term IOPS tracking and visualization
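
    As a quick illustration of the first two (exact chisel names and key bindings vary by version, so treat this as a sketch):

    atop 1                   # interactive; press 'd' to rank processes by disk activity
    sysdig -c topprocs_file  # top processes by file read/write bytes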

    Here's a Python script that continuously monitors IOPS:

    import time

    def get_io_counts(device):
        """Return cumulative (reads_completed, writes_completed) for a block device."""
        with open('/proc/diskstats') as f:
            for line in f:
                parts = line.split()
                # Field 3 is the device name; match it exactly so 'sda' does not also match 'sda1'
                if len(parts) > 7 and parts[2] == device:
                    return int(parts[3]), int(parts[7])
        return 0, 0

    device = 'sda'
    prev_reads, prev_writes = get_io_counts(device)

    while True:
        time.sleep(1)
        curr_reads, curr_writes = get_io_counts(device)
        # The counters only ever grow, so the one-second delta is the IOPS
        read_iops = curr_reads - prev_reads
        write_iops = curr_writes - prev_writes
        print(f"IOPS - Reads: {read_iops}, Writes: {write_iops}")
        prev_reads, prev_writes = curr_reads, curr_writes
    

    When analyzing IOPS data:

    • Compare against your disk's rated capability (90 for SATA, 180 for 10k SAS/FC); a quick worked example follows this list
    • Watch for sustained high IOPS that may indicate performance bottlenecks
    • Consider both read and write patterns when evaluating storage performance
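
    As a rough, back-of-envelope check using the figures above: four 10K SAS drives in RAID 10 should serve roughly 4 × 180 = 720 read IOPS, but only about 720 / 2 = 360 write IOPS, since every logical write lands on two members. Sustained measurements near those ceilings point to the spindles, not the application, as the bottleneck.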