Diagnosing and Resolving kworker High IO Usage with Zero Disk Write on Apache AWS Servers



When running Apache on an AWS Linux AMI with EBS storage, you might encounter a situation where kworker processes report 90%+ in iotop's IO column while showing zero disk reads or writes of their own. This manifests as high load averages (>8) despite normal memory availability.

# Typical iotop output showing the anomaly:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 2.37 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND             
 3730 be/4 root        0.00 B      0.00 B     0.00 % 91.98 % [kworker/u8:1]
  774 be/3 root        0.00 B   1636.00 K     0.00 % 15.77 % [jbd2/xvda1-8]

The critical observation is that these kernel workers become inactive when Apache is stopped. This points to filesystem/journaling operations triggered by web server activity rather than actual disk bottlenecks.
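
A quick way to confirm that correlation is to stop Apache for a few seconds and watch iotop; the commands below assume a systemd-managed httpd unit (on older Amazon Linux, use service httpd stop/start instead):

# Stop Apache, sample IO activity a few times, then start it again
sudo systemctl stop httpd
sudo iotop -b -o -n 3     # batch mode, only show threads doing IO, three samples
sudo systemctl start httpd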

The perf report shows most CPU samples in Xen scheduling hypercalls under the swapper task, which largely corresponds to idle vCPUs yielding to the hypervisor rather than to the IO problem itself:

Samples: 114K of event 'cpu-clock'
-  83.58% swapper [kernel.kallsyms] [k] xen_hypercall_sched_op
   + xen_hypercall_sched_op
   + default_idle
   + arch_cpu_idle
   - cpu_startup_entry
        70.16% cpu_bringup_and_idle
      - 29.84% rest_init
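
For reference, a profile like this can be captured system-wide with perf and then browsed interactively (the 30-second window is arbitrary):

# Record a system-wide profile with call graphs, then inspect it
sudo perf record -a -g -- sleep 30
sudo perf report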

Try these diagnostic commands to identify the root cause:

# Check what the kernel worker is actually doing (strace is not informative here,
# since kworkers are kernel threads and issue no syscalls; sample its kernel stack instead)
sudo cat /proc/[kworker_PID]/stack

# Monitor kernel workqueue events (run as root; a plain sudo does not apply to the redirection)
echo 1 > /sys/kernel/debug/tracing/events/workqueue/enable
cat /sys/kernel/debug/tracing/trace_pipe

# Check journaling activity
sudo dmesg | grep -i jbd2
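
Because kworker flush threads are usually driven by page-cache writeback, it also helps to watch how much dirty data accumulates while Apache is serving traffic; one rough way is:

# Watch dirty and writeback page totals (values in kB)
watch -n 1 "grep -E 'Dirty|Writeback' /proc/meminfo"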

Based on similar cases, these adjustments often help:

# Adjust dirty_ratio to reduce writeback pressure
echo 10 > /proc/sys/vm/dirty_ratio
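
To make this persistent across reboots, the equivalent sysctl settings can go in a drop-in file; the filename and the background value of 5 below are illustrative:

# /etc/sysctl.d/99-writeback.conf (example values)
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

# Apply without rebooting
sudo sysctl --system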

# Modify ext4 journaling parameters (for xvda1)
# journal_data_writeback sets a default mount option and takes effect on the next mount
tune2fs -o journal_data_writeback /dev/xvda1
# Removing the journal entirely requires the filesystem to be unmounted and gives up
# crash consistency; only consider it for disposable, easily rebuilt instances
tune2fs -O ^has_journal /dev/xvda1
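
One way to verify the journal configuration before and after these changes is to inspect the superblock:

# Show feature flags (has_journal) and default mount options
sudo tune2fs -l /dev/xvda1 | grep -Ei 'features|default mount options'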

# Cap prefork worker count and recycle children periodically to limit per-process churn
<IfModule mpm_prefork_module>
    MaxRequestWorkers        150
    MaxConnectionsPerChild   1000
</IfModule>
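
These directives only matter if the prefork MPM is actually in use, which is worth confirming first:

# Check which MPM the server was built with / has loaded
httpd -V | grep -i mpm
apachectl -M 2>/dev/null | grep -i mpm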

For deeper analysis, use this SystemTap script to track kworker activity:

probe kernel.function("process_one_work") {
    if (execname() == "kworker/u8:1") {
        # $work->func is a function pointer, so resolve it to a kernel symbol name
        printf("Work item: %s\n", symname($work->func));
    }
}
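
Running the probe requires SystemTap plus debuginfo for the running kernel (package names vary by distribution); assuming the script is saved as kworker_work.stp, the name being arbitrary:

# Compile and run the probe (Ctrl-C to stop)
sudo stap -v kworker_work.stp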

AWS EBS configurations that can alleviate this:

# Switch to a Provisioned IOPS volume type for consistent performance
aws ec2 modify-volume --volume-id vol-12345 --volume-type io1 --iops 4000

# Consider gp3 instead of gp2 for better baseline
aws ec2 modify-volume --volume-id vol-12345 --volume-type gp3
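
The modification can be confirmed afterwards with a describe call, here using the same placeholder volume ID:

# Check the volume's current type and IOPS once the modification completes
aws ec2 describe-volumes --volume-ids vol-12345 \
    --query 'Volumes[0].{Type:VolumeType,Iops:Iops}' --output table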

Recently, I encountered a puzzling performance issue on our AWS Linux AMI running Apache with EBS storage. The server showed:

  • Consistently high load average (>8)
  • kworker processes consuming >90% IO shown in iotop
  • Zero disk read but significant disk write activity
  • Apache processes showing substantial disk writes

# Sample iotop output
Total DISK READ: 0.00 B/s | Total DISK WRITE: 2.37 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 3730 be/4 root        0.00 B      0.00 B    0.00 % 91.98 % [kworker/u8:1]
  774 be/3 root        0.00 B   1636.00 K    0.00 % 15.77 % [jbd2/xvda1-8]

The first clue was that both kworker and jbd2 processes disappeared when Apache was stopped. This pointed to some filesystem interaction triggered by Apache activity.

Using perf to analyze kernel activity revealed significant time spent in Xen hypercalls and filesystem operations:

# perf report snippet
-  83.58%  swapper  [kernel.kallsyms]  [k] xen_hypercall_sched_op
+   1.73%  httpd   [kernel.kallsyms]  [k] __d_lookup_rcu
+   1.08%  httpd   [kernel.kallsyms]  [k] xen_hypercall_xen_version

After digging deeper, I identified several potential culprits:

  1. Journaling filesystem overhead: The jbd2 process indicates ext4 journal activity
  2. Metadata operations: Apache creating/deleting many small files (logs, sessions, cache)
  3. EBS performance characteristics: High IOPS for metadata operations

To gather more evidence, I used these commands:

# Check filesystem metadata operations across all Apache processes
# (the parent httpd mostly supervises, so attach to every worker PID)
sudo strace -f -e trace=file -p "$(pgrep -d ' ' httpd)" 2>&1 | grep -v ENOENT

# Monitor inode operations
sudo inotifywait -rme modify,attrib,move,create,delete /var/www/html/

# Check filesystem mount options
mount | grep xvda1
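
For the journaling suspicion specifically, kernels that expose jbd2 statistics in procfs make it easy to see how often and how long journal transactions are committing; the directory name matches the jbd2 thread seen in iotop:

# Per-journal transaction statistics for the root volume
cat /proc/fs/jbd2/xvda1-8/info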

Based on the findings, I implemented these changes:

# /etc/fstab modifications for EBS volume
/dev/xvda1  /  ext4  defaults,noatime,nodiratime,data=writeback,commit=60  0  1
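
Of these options, noatime (which already implies nodiratime on modern kernels) and commit= can be applied to the live system with a remount, while changing data= on the root filesystem generally requires setting it via tune2fs and rebooting:

# Apply the access-time and commit-interval changes without a reboot
sudo mount -o remount,noatime,commit=60 /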

# Apache/PHP configuration for better filesystem interaction
# (these directives are not MPM-specific, so they belong at server or vhost level
# rather than inside an <IfModule mpm_prefork_module> block)

# Reduce keepalive timeout to minimize idle persistent connections
KeepAlive On
KeepAliveTimeout 2
MaxKeepAliveRequests 100

# Move PHP session storage off the local filesystem (needs mod_php and the phpredis extension)
php_value session.save_handler redis
php_value session.save_path "tcp://redis.example.com:6379"
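
Before relying on Redis-backed sessions, it is worth confirming the web server can reach the endpoint at all; redis.example.com is the placeholder host from the config above, and redis-cli may need to be installed separately:

# Quick connectivity check against the session store
redis-cli -h redis.example.com -p 6379 ping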

For more severe cases, consider:

  • Moving session storage to Redis/Memcached
  • Using tmpfs for temporary files (an example fstab entry is shown below)
  • Implementing a CDN for static assets
  • Upgrading to EBS Provisioned IOPS volumes
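
A minimal tmpfs entry for Apache or PHP scratch space might look like the following; the mount point and size are illustrative, and anything stored there is lost on reboot:

# Example /etc/fstab entry keeping temporary files in RAM instead of on EBS
tmpfs  /var/tmp/apache  tmpfs  defaults,noatime,size=256m,mode=1777  0  0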

After implementing these changes, the kworker IO wait dropped significantly, and server load returned to normal levels.