Linux High Load Average Diagnosis: CPU Underutilization with Zero I/O Wait in Ruby/Unicorn Environment


When load averages exceed CPU core counts while CPU idle stays healthy (65-70%) and I/O wait is negligible (0.02-0.03%), we're dealing with one of Linux's more subtle performance puzzles. Linux load counts both runnable (R) and uninterruptible-sleep (D) tasks, so the traditional explanation would be tasks stuck in D state waiting on I/O; that doesn't apply here, as ps shows no processes in D state.

# Quick load check
uptime
 15:42:30 up 3 days,  2:15,  1 user,  load average: 4.61, 4.46, 4.48

# Process state distribution
ps -eo state | sort | uniq -c
      4 R
    169 S
      2 Z
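
Because load is averaged over minutes while ps is a single snapshot, short-lived D-state tasks can hide between invocations. A quick sampling loop catches them (a sketch; adjust the iteration count and interval to taste):

# Repeatedly snapshot process states to catch transient D-state tasks
for i in $(seq 1 20); do
  ps -eo pid,stat,wchan:24,args | awk '$2 ~ /^D/'
  sleep 0.5
done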

The combination of Ruby 1.9's Global VM Lock (GVL) and Unicorn's multi-process forking model produces scheduling behavior that generic CPU metrics miss:

# Sample Unicorn worker stuck analysis
strace -p $(pgrep -f 'unicorn.*worker' | head -1) -T -tt -o /tmp/unicorn-trace.log

# Common patterns found:
# 1. Contended mutexes in Ruby land
# 2. PGconn blocking calls (despite fast DB response)
# 3. Garbage collection pauses
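
Since strace -T appends each call's elapsed time in angle brackets, the trace above can be mined for the slowest calls. A rough pass (assuming the log path used above; the awk split is a sketch and can misfire on payloads that themselves contain angle brackets):

# Rank the 20 slowest syscalls recorded in the trace
awk -F'[<>]' 'NF >= 3 && $(NF-1) > 0.1 {print $(NF-1), $1}' /tmp/unicorn-trace.log \
  | sort -rn | head -20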

ESXi scheduling can introduce hidden latency spikes. Check for CPU ready time:

# ESXTOP metrics to monitor (host-side)
CPU: %RDY - Should be < 5%
CPU: %MLMTD - Should be 0%
MEM: %ACTV - Compare with guest view
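
From inside the guest, the closest proxy for hypervisor-induced delay is the steal counter in /proc/stat, though VMware guests frequently report zero there even when %RDY is high, which is why the host-side esxtop numbers above are authoritative. A minimal sampling loop:

# Watch the cumulative steal counter (9th field of the aggregate "cpu" line)
while sleep 5; do
  echo "$(date '+%H:%M:%S') steal jiffies: $(awk '/^cpu / {print $9}' /proc/stat)"
done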

Despite modest network throughput (200 KB/s), TCP connection churn can still show up as load; note the 65 sockets in TIME_WAIT below:

# Socket allocation status
ss -s
Total: 174 (kernel 199)
TCP:   154 (estab 72, closed 66, orphaned 0, synrecv 0, timewait 65/0), ports 0

# Connection latency measurement
tcpping -C -x 5 postgres-host 5432
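
If tcpping isn't installed, a rough connect-time loop using bash's /dev/tcp gives a similar signal (same host and port as above; millisecond resolution requires GNU date):

# Time five TCP handshakes to the database port
for i in 1 2 3 4 5; do
  t0=$(date +%s%N)
  timeout 2 bash -c 'exec 3<>/dev/tcp/postgres-host/5432' &&
    echo "connect: $(( ( $(date +%s%N) - t0 ) / 1000000 )) ms"
done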

Here's a correlation script to tie run-queue depth to load over time:

#!/bin/bash
# Log timestamp, 1/5/15-min load averages, and runnable-task count
# (vmstat's 'r' column) every 5 seconds for after-the-fact correlation
while true; do
  loads=$(uptime | awk -F'load average: ' '{print $2}')
  runq=$(vmstat 1 2 | tail -1 | awk '{print $1}')
  echo "$(date '+%H:%M:%S') ${loads} r=${runq}" >> /tmp/load_r.txt
  sleep 5
done

# Generate flame graphs (stackcollapse-perf.pl and flamegraph.pl come from
# Brendan Gregg's FlameGraph repo; perf -p wants a comma-separated PID list)
perf record -F 99 -g -p "$(pgrep -f unicorn | paste -sd, -)" -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > unicorn.svg

Tune these Unicorn and OS parameters in your environment:

# /etc/sysctl.conf additions
kernel.sched_latency_ns = 10000000
kernel.sched_migration_cost_ns = 5000000
vm.dirty_ratio = 10
vm.swappiness = 1
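
Apply and verify without a reboot:

# Load the new values and confirm they took effect
sudo sysctl -p
sysctl kernel.sched_latency_ns kernel.sched_migration_cost_ns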

# unicorn.rb adjustments
worker_processes 3 # cores + 1 for a 2-core VM whose workers block on I/O
preload_app true
# 'and' (not '&&') so the assignment parses correctly; the respond_to?
# guard matters because only REE implements copy_on_write_friendly=
GC.respond_to?(:copy_on_write_friendly=) and
  GC.copy_on_write_friendly = true

Even with fast queries, connection setup can cause queuing:

# PgBouncer config snippet (pgbouncer.ini)
[databases]
maindb = host=primary-db port=5432 dbname=app_prod pool_mode=transaction
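
Once traffic flows through the pooler, its admin console shows whether connection churn is being absorbed (assuming PgBouncer's default listen port 6432):

# Inspect pool utilization via PgBouncer's admin console
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c 'SHOW POOLS;'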

# Connection pool check
SELECT client_addr, state, count(*) 
FROM pg_stat_activity 
GROUP BY 1,2;

To recap the raw numbers: on a healthy box, load average tracks CPU utilization, yet here load consistently exceeds the 2-core capacity (peaking at 6.50) while the CPU stays 55-70% idle. The absence of significant I/O wait (0.02-0.03% iowait) or disk utilization (0.2-0.37%) is what makes this case puzzling.

# Sample sar output showing the discrepancy
# Time          CPU busy (all)   ldavg-5
12:45:01 PM            33.44%       3.31
01:25:01 PM            41.60%       6.50
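
sysstat can record both series going forward so spikes can be lined up after the fact:

# Log CPU utilization and run-queue/load once a minute, in the background
sar -u 60 > /tmp/sar_cpu.log &
sar -q 60 > /tmp/sar_load.log &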

The Ruby 1.9 + Unicorn combination suggests several investigation paths:

# Check Unicorn worker status
$ ps aux | grep unicorn
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
app      12345  2.3  4.7 1023456 245678 ?      S    09:00   1:23 unicorn_rails worker[1]
app      12346  2.1  4.6 1022156 244321 ?      S    09:00   1:19 unicorn_rails worker[2]
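
Both workers sit in S state here; sampling their kernel wait channel shows what they're actually blocked on (socket reads, futexes, and so on):

# Poll the wait channel of each Unicorn worker every 2 seconds
while sleep 2; do
  ps -o pid,stat,wchan:30,etime -p "$(pgrep -f 'unicorn.*worker' | paste -sd, -)"
done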

While network throughput appears normal (200KB/s), latency between application and PostgreSQL could cause workers to stall:

# Measure PostgreSQL connection latency (run in a Rails console)
require 'benchmark'
Benchmark.measure { ActiveRecord::Base.connection.execute("SELECT 1") }
# => 0.000500   0.000020   0.000520 (  0.510520)
# Real time (0.51s) dwarfs CPU time: the worker spends the difference blocked
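
The same check from the shell, where connection setup is included in the timing (hypothetical host, user, and database names):

# Time the full cycle: TCP connect + auth + SELECT 1
time psql -h primary-db -U app -d app_prod -c 'SELECT 1' >/dev/null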

The ESXi 5.1 virtualization layer introduces potential CPU scheduling issues:

# Check for CPU steal time ('st' is the last vmstat column; /proc/cpuinfo
# does not report steal, and VMware guests often show 0 here even under
# contention, which is why host-side esxtop remains authoritative)
$ vmstat 1 2 | tail -1 | awk '{print "steal:", $NF}'
steal: 0

When standard metrics don't reveal the issue, we need deeper tools:

# Install and run perf for CPU profiling
$ sudo apt-get install linux-tools-common linux-tools-generic
$ sudo perf top -p "$(pgrep -f unicorn | paste -sd, -)"

# Sample output showing Ruby GC activity
  23.45%  ruby       [.] gc_mark_children
  18.76%  ruby       [.] rb_vm_exec
  12.34%  ruby       [.] rb_yield_0

Tuning Unicorn and Ruby parameters can help mitigate load issues:

# unicorn.rb configuration adjustments
worker_processes 2 # Match vCPU count
timeout 30         # Reduce from default 60
preload_app true   # Enable copy-on-write

# Ruby 1.9 GC tuning: these are read once at interpreter startup, so export
# them in the shell or init script that launches Unicorn (ENV[] assignments
# inside unicorn.rb are too late to affect the GC)
export RUBY_GC_MALLOC_LIMIT=40000000  # delay malloc-triggered GC (1.9 default: 8MB)
export RUBY_HEAP_MIN_SLOTS=500000     # pre-size the heap; a common Rails starting point
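
Since the variables are only read at startup, confirm the running master actually inherited them:

# Inspect the environment of the live Unicorn master
tr '\0' '\n' < /proc/"$(pgrep -f 'unicorn.*master' | head -1)"/environ | grep '^RUBY'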

Traditional tools might miss Ruby-specific bottlenecks:

# Install rbtrace for live process inspection (the target process must
# have loaded the gem, e.g. require 'rbtrace' in config.ru)
$ gem install rbtrace
$ rbtrace -p UNICORN_PID -e 'GC.stat'

# Sample output showing memory pressure
{
  :count=>23,
  :heap_used=>100,
  :heap_length=>132,
  :heap_increment=>0,
  :heap_live_num=>40123,
  :heap_free_num=>677
}
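
Polling GC.stat over time ties collector activity back to the load spikes; if the count climbs in step with ldavg, GC tuning is the right lever:

# Sample the GC run counter every 10 seconds (PID of a traced worker)
while sleep 10; do
  echo "$(date '+%H:%M:%S') $(rbtrace -p UNICORN_PID -e 'GC.stat[:count]')"
done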