Debugging Extreme LOC (Local Timer) Interrupt Spikes in Linux: Causes and Solutions


Local timer interrupts (LOC) drive the Linux scheduler tick. Each CPU core normally receives them at the kernel's HZ rate (usually 100, 250, or 1000 times per second), and less often when a core sits idle on a tickless kernel. Spikes reaching millions per second (as shown in your Munin graphs) indicate a serious system anomaly.

# Typical expected LOC rate (per core)
Expected: 100-1000 interrupts/sec/core
Your case: 100M+ per core (abnormal as a per-second rate; note that /proc/interrupts itself reports cumulative counts since boot)
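Because /proc/interrupts reports counts accumulated since boot, a per-second rate has to be derived from two samples. The following is a minimal sketch of that calculation; the function name loc_rates and the snapshot file names are illustrative and not part of any standard tool.

```bash
# loc_rates BEFORE AFTER SECONDS
# BEFORE and AFTER are copies of /proc/interrupts taken SECONDS apart.
# Prints one "cpuN rate" line per core (interrupts per second, truncated).
loc_rates() {
  awk -v dt="$3" '
    $1 == "LOC:" {
      # The numeric fields after the label are the per-CPU counters.
      for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) {
        if (NR == FNR) before[i] = $i      # first file: remember the count
        else printf "cpu%d %d\n", i - 2, ($i - before[i]) / dt
      }
    }' "$1" "$2"
}

# Example:
#   cat /proc/interrupts > /tmp/loc.before; sleep 10
#   cat /proc/interrupts > /tmp/loc.after
#   loc_rates /tmp/loc.before /tmp/loc.after 10
```

A rate close to CONFIG_HZ on a busy core is normal; only a rate far above it points to the kind of anomaly discussed here.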

The most common causes for such extreme LOC spikes include:

  • Kernel Scheduling Issues: Problems with CFS (Completely Fair Scheduler) or improper timer configuration
  • CPU Frequency Scaling: Aggressive power management causing constant frequency switches
  • Hardware Problems: CPU cache issues or motherboard clock signal problems
  • Kernel Bugs: Particularly in older kernels (your 2.6.24 is quite dated)

To investigate further, run these commands:

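# Inspect timer state: jiffies and per-CPU tick devices (may require root)
grep -i "jiffies" /proc/timer_list

# Monitor interrupts in real-time (-d highlights changes between refreshes)
watch -d -n 1 "grep 'LOC:' /proc/interrupts"

# Check CPU frequency scaling (on older systems the equivalent tool is cpufreq-info)
cpupower frequency-info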

1. Kernel Upgrade:
Your 2.6.24 kernel is very old, and many timer-related bugs were fixed in later releases. Upgrade to at least the 4.x series.

# For Ubuntu 18.04 (installs the hardware enablement kernel):
sudo apt install --install-recommends linux-generic-hwe-18.04

2. Timer Configuration:
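The following boot parameters select the HPET clocksource, enable high-resolution timers, and turn dynamic ticks off. With dynamic ticks off, every core ticks at a constant HZ rate, which can help rule out tickless-mode bugs:

# Append to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run update-grub and reboot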
clocksource=hpet nohz=off highres=on
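Before forcing a clocksource on the command line, it may be worth confirming that the kernel actually offers it. This is a small sketch that reads the standard clocksource sysfs directory; the helper name has_clocksource and its optional directory argument (present only to make the function testable) are assumptions for illustration.

```bash
# has_clocksource NAME [SYSFS_DIR]
# Succeeds if NAME appears in the kernel's list of available clocksources.
has_clocksource() {
  dir="${2:-/sys/devices/system/clocksource/clocksource0}"
  # available_clocksource is one space-separated line, e.g. "tsc hpet acpi_pm"
  tr ' ' '\n' < "$dir/available_clocksource" | grep -qx "$1"
}

# Example:
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
#   has_clocksource hpet && echo "hpet is available"
```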

3. CPU Isolation:
Isolate the affected cores from general workload:

# Example for isolating CPUs 4-7
sudo cset shield -c 4-7

Create a monitoring script to track LOC changes:

#!/bin/bash
while true; do
  date >> /var/log/loc_monitor.log
  grep -i loc /proc/interrupts >> /var/log/loc_monitor.log
  mpstat -P ALL 1 1 >> /var/log/loc_monitor.log
  sleep 5
done

For persistent issues, use Ftrace to analyze timer events:

# Run as root. Event tracing needs no function tracer; nop keeps the trace small
echo nop > /sys/kernel/debug/tracing/current_tracer
# Quote the glob so the shell does not expand it
echo 'timer:*' > /sys/kernel/debug/tracing/set_event
echo 1 > /sys/kernel/debug/tracing/tracing_on
sleep 10
echo 0 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace > /tmp/timer_trace.txt
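Once a trace has been captured, counting which timer callbacks fire most often usually narrows down the source of the load. The following is a rough sketch, assuming the trace contains timer_expire_entry or hrtimer_expire_entry lines with a function= field (the usual ftrace event format, though the exact fields vary by kernel version); top_timer_functions is a hypothetical helper name.

```bash
# top_timer_functions TRACE_FILE
# Counts expired timers per callback and prints "count function", busiest first.
top_timer_functions() {
  awk '/expire_entry:/ {
         for (i = 1; i <= NF; i++)
           if (index($i, "function=") == 1)   # field starts with function=
             count[substr($i, 10)]++          # strip the 9-character prefix
       }
       END { for (f in count) printf "%d %s\n", count[f], f }' "$1" |
    sort -rn
}

# Example:
#   top_timer_functions /tmp/timer_trace.txt | head
```

A callback such as tick_sched_timer dominating the list would suggest the load is the scheduler tick itself; anything else points at a specific subsystem or driver re-arming its timer.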

Local timer interrupts (LOC) are per-CPU timer events generated by the local APIC (Advanced Programmable Interrupt Controller) for scheduling purposes. In your case, the /proc/interrupts output shows very high LOC counts (100M+ per core), which is consistent with the saturation in the CPU graph.

The most common causes for such spikes include:

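  • Kernel timer frequency: Older kernels (like 2.6.24) may be built with CONFIG_HZ=1000, so every busy core takes 1000 timer interrupts per second
  • CPU-bound processes: Tight loops in userspace keep a core out of idle, so the tick never stops on that core
  • Virtualization overhead: If the system runs as a VM guest, each timer interrupt may have to be emulated by the hypervisor, which makes it more expensive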

First, check your current HZ setting:

# Check kernel timer frequency
grep CONFIG_HZ /boot/config-$(uname -r)

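# Monitor interrupt distribution (timer, rescheduling, and function-call interrupts)
watch -n 1 "grep -E 'LOC|RES|CAL' /proc/interrupts"

# Summarize process accounting per user (accounting must be enabled and have collected data)
sudo apt-get install acct
sa -m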

1. Kernel Timer Configuration

For server workloads, consider rebuilding the kernel with lower HZ:

# In kernel config:
CONFIG_HZ_250=y
CONFIG_HZ=250
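To put a HZ value in perspective: a core that never goes idle takes roughly HZ timer interrupts per second, so its cumulative LOC count is about HZ multiplied by uptime. At 1000 Hz, a count of 100M is reached after about 100,000 seconds, or roughly 28 hours. The sketch below does this arithmetic; expected_loc_count is an illustrative name, and the estimate ignores tickless idle (which lowers the real count) and extra high-resolution timer events (which raise it).

```bash
# expected_loc_count HZ UPTIME_SECONDS
# Rough estimate of the per-core LOC counter for a purely periodic tick.
expected_loc_count() {
  awk -v hz="$1" -v up="$2" 'BEGIN { printf "%.0f\n", hz * up }'
}

# Example: compare the result against the LOC row of /proc/interrupts
#   expected_loc_count 250 "$(cut -d' ' -f1 /proc/uptime)"
```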

2. Tickless Kernel Mode

Kernels from 2.6.21 onward support tickless idle (NO_HZ), which stops the periodic tick on idle cores:

# Check available options
grep NO_HZ /boot/config-$(uname -r)

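# For newer kernels (3.10+) built with CONFIG_NO_HZ_FULL, adaptive ticks are
# selected with the nohz_full= boot parameter, not at runtime.
# On kernels that expose this file, it lists the CPUs currently in that mode:
cat /sys/devices/system/cpu/nohz_full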

3. CPU Isolation

Keep selected cores out of the general scheduler and let them run without the periodic tick:

# Add to GRUB_CMDLINE_LINUX in /etc/default/grub
isolcpus=2,3,6,7 nohz_full=2,3,6,7 rcu_nocbs=2,3,6,7

# Then regenerate the GRUB configuration and reboot
sudo update-grub
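After the reboot, it is worth confirming that the parameters actually reached the kernel. A minimal sketch that extracts one parameter from the kernel command line; cmdline_param is a hypothetical helper, and its optional file argument exists only so the function can be tried against a saved copy.

```bash
# cmdline_param NAME [CMDLINE_FILE]
# Prints the value of NAME=value from the kernel command line, if present.
cmdline_param() {
  tr ' ' '\n' < "${2:-/proc/cmdline}" |
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

# Example (after reboot):
#   cmdline_param isolcpus
#   cmdline_param nohz_full
```

Empty output means the parameter is not on the running kernel's command line, in which case the GRUB configuration was probably not regenerated or a different boot entry was used.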

Here's how we fixed similar issues on a database server:

#!/bin/bash
# Sample monitoring script: logs timer and rescheduling interrupt counts
# plus per-CPU utilization roughly every 6 seconds
while true; do
    date +"%T" >> interrupts.log
    grep -E 'LOC|RES' /proc/interrupts >> interrupts.log
    mpstat -P ALL 1 1 >> cpu_stats.log
    sleep 5
done

# After analysis, we implemented:
# 1. Kernel rebuild with CONFIG_HZ=250
# 2. CPU isolation for dedicated DB processes
# 3. Upgrade to newer LTS kernel (3.13+)

For persistent cases, use perf to analyze timer usage:

perf stat -e irq_vectors:local_timer_entry -a sleep 10
perf top -e irq_vectors:local_timer_entry

Consider upgrading to a modern kernel (4.19+ LTS) which includes:

  • Improved tickless operation
  • Better timer coalescing
  • Dynamic tick adjustments

Remember to test changes in staging before production deployment. The optimal solution often combines kernel tuning, CPU isolation, and software configuration adjustments.