Java Process Blocked for 120+ Seconds: Diagnosing High Load Kernel Hangs on Linux Servers


When Java processes consistently show 400%+ CPU usage and the kernel reports tasks being blocked for over 120 seconds, we're dealing with severe system contention. The key indicators are:

[timestamp] INFO: task java:21547 blocked for more than 120 seconds.
[timestamp] INFO: task kjournald:190 blocked for more than 120 seconds.
[timestamp] INFO: task flush-202:0:709 blocked for more than 120 seconds.

The blocked tasks suggest the kernel is detecting unresponsive processes. Several factors could contribute:

  • I/O Wait Dominance: When kjournald (the ext3 journaling thread; ext4 uses jbd2) and the flush writeback workers are blocked, the storage subsystem itself is struggling to keep up
  • Memory Pressure: With no swap configured there is no headroom to page out, so allocations stall and the OOM killer may fire
  • CPU Saturation: 400%+ Java CPU suggests thread contention or GC issues

Run these during normal operation to establish baselines:

# Check I/O wait (the 'wa' column)
vmstat 1 10 | awk 'NR>2 {print $16}'

# Monitor disk latency
iostat -xmd 2

# Check memory pressure
free -m && grep -E 'MemFree|Swap' /proc/meminfo

# Identify Java thread states
jstack -l <pid> | grep "java.lang.Thread.State" | sort | uniq -c
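
When a hang is actually in progress, the kernel can dump the stacks of every task stuck in uninterruptible sleep, which is exactly what the hung-task detector is complaining about. A minimal sketch using the standard SysRq interface (assumes SysRq is enabled and syslog can still write to disk):

# Allow SysRq triggers if they are currently disabled
echo 1 > /proc/sys/kernel/sysrq

# Dump stack traces of all blocked (D-state) tasks into the kernel log
echo w > /proc/sysrq-trigger

# Review the traces (they also go to the console if syslog is wedged)
dmesg | tail -n 100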

For Ubuntu 10.04 with 2.6.x kernels:

# Disable memory overcommit (add to /etc/sysctl.conf)
vm.overcommit_memory = 2
vm.overcommit_ratio = 80

# Adjust dirty page thresholds
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15

# Disable automatic NUMA balancing if present (the knob exists only on 3.8+ kernels)
echo 0 > /proc/sys/kernel/numa_balancing
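
To apply the sysctl changes without a reboot and confirm they took effect, something like the following should work; the values are simply the ones suggested above, not universal defaults:

# Reload everything in /etc/sysctl.conf
sysctl -p

# Or set and verify individual parameters
sysctl -w vm.dirty_ratio=15
sysctl vm.overcommit_memory vm.overcommit_ratio vm.dirty_background_ratio vm.dirty_ratio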

Add these JVM flags to prevent GC-induced hangs:

-XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode
-XX:+UseTLAB
-XX:ParallelGCThreads=<CPU cores>
-XX:+DisableExplicitGC
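
Before crediting or blaming the collector, it is worth confirming whether GC pauses actually line up with the hung-task timestamps. A minimal GC-logging invocation might look like this; the heap size, log path, and application jar are placeholders rather than values from the original setup:

java -Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -Xloggc:/var/log/java_gc.log \
     -jar your-app.jar    # placeholder application

# Look for long stop-the-world pauses near the kernel warnings
grep 'Total time for which application threads were stopped' /var/log/java_gc.log | tail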

If hangs persist after tuning:

# Disable the hung-task check entirely (hides the symptom; not recommended for production)
echo 0 > /proc/sys/kernel/hung_task_timeout_secs

# Better alternative - adjust timeout to 300s
echo 300 > /proc/sys/kernel/hung_task_timeout_secs
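
Writes to /proc do not survive a reboot, so once a timeout value is settled on, persist it (this assumes the kernel exposes the hung-task sysctls, which 2.6.3x does):

# Add to /etc/sysctl.conf
kernel.hung_task_timeout_secs = 300
kernel.hung_task_warnings = 10    # cap how many warnings are logged

# Reload
sysctl -p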

For kjournald issues, verify your filesystem mount options:

# Example fstab entry with safer options
/dev/sdX1 / ext4 noatime,nodiratime,data=writeback,barrier=0 0 1

Note: Disable barriers only if you have battery-backed RAID controllers.
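
The atime-related options can be tested on a live system with a remount; the journal mode (data=) cannot be switched on a mounted root filesystem, so that part still requires the fstab edit plus a reboot. /dev/sdX1 is the same placeholder device as above:

# Apply noatime/nodiratime without rebooting
mount -o remount,noatime,nodiratime /

# Confirm the active options and that the journal is present
mount | grep ' / '
tune2fs -l /dev/sdX1 | grep -i 'journal\|mount options'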


To recap the symptoms: Java processes spike beyond 400% CPU utilization on Linux servers (particularly Ubuntu 10.04 with 2.6.3x kernels), while the console repeats the same "blocked for more than 120 seconds" messages for java, kjournald, and flush-202:0 shown above.

Several critical patterns emerge:

  • Occurs across both VM and bare-metal environments
  • Persists after hardware migration
  • Involves both Java processes and kernel threads (kjournald, flush)
  • Console output becomes primary diagnostic channel (dmesg inaccessible)

Based on the evidence, the most likely triggers for the task blocking are:

  • I/O subsystem congestion (kjournald involvement)
  • Memory pressure (no swap remaining)
  • CPU starvation (400%+ Java utilization)
  • Kernel-level deadlocks
  • Missing irqbalance service
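
A quick way to narrow this list down during an incident is to see which processes are sitting in uninterruptible sleep (D state), since those are the tasks the hung-task detector reports:

# D-state tasks and the kernel function they are waiting in
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

# Rough picture of overall contention: count of processes per state
ps -eo stat= | cut -c1 | sort | uniq -c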

First, establish monitoring before crashes:

#!/bin/bash
# Continuous system monitoring script
while true; do
  echo "===== $(date) =====" >> /var/log/system_monitor.log
  top -b -n 1 | head -20 >> /var/log/system_monitor.log
  vmstat 1 5 >> /var/log/system_monitor.log
  iostat -dx 1 5 >> /var/log/system_monitor.log
  sleep 30
done
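
One way to keep that script running across logouts (the path and filename are arbitrary):

chmod +x /usr/local/bin/system_monitor.sh    # assuming the script was saved there
nohup /usr/local/bin/system_monitor.sh >/dev/null 2>&1 &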

While investigating, implement these adjustments:

# Temporary kernel parameter changes (tighter dirty-page limits than the 5/15 above, forcing earlier writeback)
echo 300 > /proc/sys/kernel/hung_task_timeout_secs
echo 1 > /proc/sys/vm/dirty_background_ratio
echo 5 > /proc/sys/vm/dirty_ratio

# Install critical services
apt-get install irqbalance sysstat
service irqbalance start
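
On Ubuntu, sysstat's background collection is disabled by default, so enable it and confirm irqbalance is actually spreading interrupts; the file paths below follow Debian/Ubuntu packaging conventions:

# Turn on periodic sar data collection
sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
service sysstat restart

# Verify irqbalance and interrupt distribution across CPUs
service irqbalance status
head -20 /proc/interrupts

# Check CPU and iowait history once sar has samples
sar -u 1 3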

For the high-CPU Java processes:

# Add these JVM flags (a G1 alternative to the CMS set above; G1 needs Java 7u4+ for production use):
-XX:+UseG1GC 
-XX:MaxGCPauseMillis=200 
-XX:ParallelGCThreads=4 
-XX:ConcGCThreads=2 
-XX:InitiatingHeapOccupancyPercent=35
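
G1 only became a fully supported collector in Java 7u4; on the Java 6 builds common on Ubuntu 10.04 it is experimental, so check the JVM version and confirm the flags were actually accepted (PrintFlagsFinal is available on reasonably recent HotSpot builds):

java -version

# Show the values the JVM actually settled on for these flags
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintFlagsFinal -version \
  | grep -E 'UseG1GC|MaxGCPauseMillis'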

Given kjournald involvement, examine filesystem performance:

# Check filesystem mount options
mount | grep -E '(ext3|ext4|xfs)'

# Recommended options for database/Java workloads:
defaults,noatime,nodiratime,data=writeback,barrier=0

Longer term:

  1. Upgrade to a newer LTS Ubuntu (14.04+) with a modern kernel
  2. Implement proper process cgroups for Java (see the sketch after this list)
  3. Add swap space if memory constrained
  4. Consider switching to XFS for better journaling performance
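
A minimal cgroup v1 sketch for item 2, capping the JVM's CPU and write bandwidth so a runaway GC or flush storm cannot starve the rest of the box; the group name, limits, and device numbers are illustrative only:

# Create a cgroup for the JVM (cgroup v1 layout, e.g. Ubuntu 14.04 with cgroup-lite)
mkdir -p /sys/fs/cgroup/cpu/java /sys/fs/cgroup/blkio/java

# Cap CPU at roughly 4 cores (quota/period = 400000/100000)
echo 100000 > /sys/fs/cgroup/cpu/java/cpu.cfs_period_us
echo 400000 > /sys/fs/cgroup/cpu/java/cpu.cfs_quota_us

# Throttle writes to ~50 MB/s on block device 8:0 (adjust major:minor to your disk)
echo "8:0 52428800" > /sys/fs/cgroup/blkio/java/blkio.throttle.write_bps_device

# Move the running Java process into the group
echo <pid> > /sys/fs/cgroup/cpu/java/tasks
echo <pid> > /sys/fs/cgroup/blkio/java/tasks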

When console is the only output channel, use serial console redirection:

# In /etc/default/grub:
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
GRUB_TERMINAL=serial

# Then update-grub and configure serial logging
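
GRUB_TERMINAL=serial also needs a matching serial command, and the output has to be captured at the other end of the link; a fuller sketch (the baud rate, unit number, and capture device are assumptions to adapt to your hardware):

# /etc/default/grub (in addition to the lines above)
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"

# Apply the change
update-grub

# On the machine or console server at the far end of the serial cable
screen /dev/ttyUSB0 115200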