Analyzing High %iowait in iostat Output: IBM Server iSCSI Storage Performance Investigation



Your iostat output reveals several critical performance indicators:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.79    0.01    4.53   72.22    0.00   10.45

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              95.63        48.88       240.95  485589164 2393706728
sdb              29.20       350.49       402.08 3481983365 3994494696

The 72.22% iowait means the CPU sits idle waiting for outstanding I/O to complete for nearly three-quarters of the time, which explains why file transfers to sdb (your iSCSI target) are slow. For reference (the snippet after this list lets you track the figure over time):

  • Healthy systems typically show under 5% iowait
  • 20-30% indicates moderate contention
  • Over 50% represents a severe I/O bottleneck
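To watch whether iowait improves as you make changes, a minimal sampling loop over iostat's CPU report works (a sketch; the awk field positions assume the six-column avg-cpu data line shown above):

# Sample %iowait every 5 seconds; iostat's first report is the
# since-boot average, so keep only the last (interval) report.
while true; do
    iow=$(iostat -c 5 2 | awk '$1 ~ /^[0-9.]+$/ && NF == 6 {last = $4} END {print last}')
    echo "$(date '+%H:%M:%S') iowait=${iow}%"
done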

Comparing sda (local disk) and sdb (iSCSI):

Metric               sda       sdb
Transactions/s       95.63     29.20
Blocks read/s        48.88    350.49
Blocks written/s    240.95    402.08
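Dividing total block traffic by transactions gives the average request size (iostat blocks are 512-byte sectors on Linux): sdb moves (350.49 + 402.08) / 29.20 ≈ 25.8 blocks ≈ 13 KB per transaction, versus (48.88 + 240.95) / 95.63 ≈ 3.0 blocks ≈ 1.5 KB on sda. sdb is completing fewer, larger requests, which is consistent with each request paying a network round trip on the iSCSI path.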

For your IBM server connected via iSCSI, consider these diagnostic commands:

# Check iSCSI session details
iscsiadm -m session -P 3

# Monitor network throughput
sar -n DEV 1 10

# Check multipath status (if configured)
multipath -ll

# Check block-layer request queue size (the SCSI queue_depth is separate)
cat /sys/block/sdb/queue/nr_requests
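The -P 3 session dump is long; the negotiated parameters that most affect throughput can be pulled out directly (exact labels can vary slightly between open-iscsi versions):

# Show negotiated iSCSI parameters (digests add CPU cost; burst and
# segment lengths bound how much data moves per sequence)
iscsiadm -m session -P 3 | grep -Ei 'HeaderDigest|DataDigest|MaxRecvDataSegmentLength|MaxXmitDataSegmentLength|FirstBurstLength|MaxBurstLength'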

Based on your output, focus on:

  1. Network configuration: Check MTU settings and confirm jumbo frames are configured end-to-end (see the ping test after this list)
  2. SCSI queue depth: Increase device queue depth if default is too low
  3. Filesystem alignment: Verify proper partition alignment for iSCSI volumes
  4. TCP/IP tuning: Consider adjusting TCP window sizes for WAN iSCSI

Before making changes, establish a baseline:

# Sequential write test
dd if=/dev/zero of=/mnt/iscsi/testfile bs=1M count=1024 oflag=direct
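A matching sequential read baseline can reuse the file just written (iflag=direct bypasses the page cache so you measure the iSCSI path, not RAM):

# Sequential read test
dd if=/mnt/iscsi/testfile of=/dev/null bs=1M iflag=direct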

# Random I/O test against the iSCSI mount
fio --name=randwrite --directory=/mnt/iscsi --ioengine=libaio \
    --iodepth=32 --rw=randwrite --bs=4k --direct=1 --size=1G \
    --numjobs=4 --runtime=60 --group_reporting

For RHEL 5 systems with similar issues, these commands often help:

# Increase SCSI command timeout
echo 180 > /sys/block/sdb/device/timeout

# Adjust read-ahead
blockdev --setra 8192 /dev/sdb

# Modify I/O scheduler (try deadline for iSCSI)
echo deadline > /sys/block/sdb/queue/scheduler

# Increase TCP buffers
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
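None of these survive a reboot. One way to persist them on RHEL 5 is to put the sysctl values in /etc/sysctl.conf and replay the /sys and blockdev settings from /etc/rc.local:

# Append to /etc/sysctl.conf
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Append to /etc/rc.local
echo 180 > /sys/block/sdb/device/timeout
blockdev --setra 8192 /dev/sdb
echo deadline > /sys/block/sdb/queue/scheduler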

Create a continuous monitoring script to track improvements:

#!/bin/bash
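# Append a timestamped snapshot every 5 minutes; the log grows without
# bound, so prune or rotate it once the investigation is over.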
while true; do
    date >> /var/log/iscsi_perf.log
    iostat -xk 1 5 >> /var/log/iscsi_perf.log
    iscsiadm -m session -P 3 >> /var/log/iscsi_perf.log
    sleep 300
done
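Saved as, say, iscsi_monitor.sh (the name is just an example), it can run detached across logouts:

chmod +x iscsi_monitor.sh
nohup ./iscsi_monitor.sh &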

Recapping the key signals from the iostat output above:

  • 72.22% iowait: the CPU spends most of its time waiting for I/O completion
  • sdb transaction rate: only 29.20 tps versus 95.63 tps on sda, even though sdb moves far more data, pointing to larger but slower requests
  • sdb throughput: 350.49 blocks/s read against 402.08 blocks/s written, so the device is under heavy load in both directions

Beyond the diagnostics above, watch real-time traffic on the iSCSI interface while a transfer runs:

# Monitor real-time network traffic
iftop -i ethX -P  # Replace ethX with your iSCSI interface

From there, two tuning steps to try:

1. TCP parameter tuning:

# Add to /etc/sysctl.conf
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
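After editing the file, load the values without rebooting:

# Apply settings from /etc/sysctl.conf immediately
sysctl -p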

2. iSCSI queue depth adjustment:

# Check current queue_depth
cat /sys/block/sdb/device/queue_depth

# Temporarily increase (example for testing)
echo 64 > /sys/block/sdb/device/queue_depth
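The echo is lost on reboot (and on a device rescan), so if 64 tests well, reapply it at boot the same way as the other /sys settings:

# Append to /etc/rc.local
echo 64 > /sys/block/sdb/device/queue_depth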

Use fio to test raw device performance. Warning: the sequential write below writes directly to /dev/sdb and will destroy any data and filesystem on it, so run it only against an unused LUN:

# Sequential write test
fio --name=seqwrite --filename=/dev/sdb --bs=1M --size=10G \
    --rw=write --direct=1 --ioengine=libaio

# Random read test
fio --name=randread --filename=/dev/sdb --bs=4k --size=10G \
    --rw=randread --direct=1 --ioengine=libaio --numjobs=4

For a lighter-weight CSV log of just sdb's extended statistics:

#!/bin/bash
# Log one extended-stats sample for sdb every 30 seconds as CSV.
# Field positions assume sysstat's `iostat -d -x -k` layout on RHEL 5:
# $1=Device $4=r/s $5=w/s $6=rkB/s $7=wkB/s $10=await $12=%util
LOG_FILE="/var/log/iscsi_perf_$(date +%Y%m%d).log"

echo "timestamp,device,tps,read_kB/s,write_kB/s,await,util" > "$LOG_FILE"
while true; do
    # Take two reports; the first is the since-boot average, so keep the last.
    iostat -d -x -k sdb 1 2 | awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" \
        '$1 == "sdb" {split($0, f)}
         END {printf "%s,%s,%.2f,%s,%s,%s,%s\n", ts, f[1], f[4]+f[5], f[6], f[7], f[10], f[12]}' >> "$LOG_FILE"
    sleep 30
done