Optimizing Xen Virtual Machine Performance: Solving CPU0 Interrupt Flooding from eth1 Network Interface


When examining the /proc/interrupts output on this Ubuntu 11.04 Xen virtual machine, we observe a severe imbalance where CPU0 handles over 113 million eth1 interrupts while other cores show zero activity for this network interface. This creates a performance bottleneck where:

  • First CPU core reaches 100% utilization
  • Other cores remain underutilized
  • Network throughput becomes limited by single-core performance

The interrupt distribution reveals Xen's default behavior with virtual network interfaces:

283: 113720624 0 0 0 0 0 0 0 xen-dyn-event eth1

This shows all eth1 interrupts are processed by CPU0 through Xen's dynamic event channel mechanism. Unlike physical hardware where we could use irqbalance, Xen virtual devices require different approaches.
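To quantify the imbalance from a script rather than by eye, one can sum the per-CPU columns of the eth1 line. A minimal sketch, assuming the column layout shown above (IRQ number, one count per CPU, then the handler labels):

```shell
# Sum the per-CPU interrupt counts on the eth1 line and report what
# fraction landed on the first CPU column; non-numeric fields (the
# handler labels) terminate the loop.
awk '/eth1/ {
    total = 0
    for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
    if (total > 0)
        printf "CPU0 share: %.1f%% of %d interrupts\n", 100 * $2 / total, total
}' /proc/interrupts
```

On the machine above this would report CPU0 handling essentially 100% of eth1's interrupts.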

1. Enabling SMP Affinity for Xen Network Devices

First, check current affinity settings:

# cat /proc/irq/283/smp_affinity 
01

To distribute across first 4 CPUs (hex mask):

# echo 0f > /proc/irq/283/smp_affinity

For persistent configuration, create a startup script:

#!/bin/sh
# Strip the trailing colon so the IRQ number forms a valid /proc/irq path
IRQ=$(awk '/eth1/ {sub(":", "", $1); print $1; exit}' /proc/interrupts)
echo 0f > /proc/irq/$IRQ/smp_affinity
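The 0f value is a hexadecimal bitmask with one bit per CPU (bit 0 = CPU0). To derive the mask for an arbitrary number of CPUs rather than hard-coding it, a small sketch:

```shell
# Set bits 0..N-1 to cover the lowest N CPUs; printf %x renders the
# hex form expected by /proc/irq/<n>/smp_affinity.
ncpus=4
mask=$(printf '%x' $(( (1 << ncpus) - 1 )))
echo "affinity mask for first $ncpus CPUs: $mask"
```

Substituting $(nproc) for the hard-coded 4 covers all online CPUs.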

2. Xen Network Backend Configuration

Modify Xen domain configuration (in /etc/xen/):

vif = [
    'mac=00:16:3e:xx:xx:xx,bridge=xenbr1,backend_per_cpu=1'
]

The backend_per_cpu option enables a separate backend queue per vCPU.

3. Alternative: Using netfront Multi-queue

For newer Xen and kernel versions with multi-queue netfront support, load the module with the queue limit raised (recent kernels name the parameter max_queues; confirm with modinfo xen-netfront on your build):

modprobe xen-netfront max_queues=4
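To read back the value the driver actually accepted, module parameters are exposed under sysfs. A sketch, with the caveat that the parameter name and its availability vary by kernel build (check modinfo xen-netfront on your system):

```shell
# Read the netfront queue limit if the module and parameter exist;
# print a fallback message instead of failing on other systems.
cat /sys/module/xen_netfront/parameters/max_queues 2>/dev/null \
    || echo "xen-netfront not loaded or parameter unavailable"
```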

Then verify with:

# ethtool -l eth1
Pre-set maximums:
RX:     4
TX:     4
Current hardware settings:
RX:     1
TX:     1

Set queues to match vCPU count:

# ethtool -L eth1 combined 4

After implementing changes, verify interrupt distribution:

# watch -n1 'grep eth1 /proc/interrupts'
283: 28740624 28459210 28134872 28385918 ...

Measure throughput improvement using ab:

ab -n 100000 -c 100 http://localhost/

Compare CPU utilization across cores with mpstat -P ALL 1.
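Absolute counters include history from before the change, so totals can still look skewed even when new interrupts are spreading. Diffing two snapshots of the eth1 line shows where fresh interrupts land; a sketch:

```shell
# Capture the eth1 counters twice, one second apart, and print the
# per-CPU delta; only newly arrived interrupts appear in the diff.
before=$(grep eth1 /proc/interrupts)
sleep 1
after=$(grep eth1 /proc/interrupts)
printf '%s\n%s\n' "$before" "$after" | awk '
    NR == 1 { for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) prev[i] = $i }
    NR == 2 { for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                  printf "CPU%d: +%d\n", i - 2, $i - prev[i] }'
```

With affinity working, several CPUs should show non-zero deltas instead of CPU0 alone.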

A few caveats:

  • Some Xen versions require the xen_netback.percpu=1 kernel parameter
  • Virtual NIC performance may be limited by the host's physical NIC capabilities
  • Consider using PV drivers instead of emulated network devices

For dynamic environments, implement automated balancing:

#!/bin/bash
# awk + tr yield a clean IRQ number; cut -d: would keep leading spaces
IRQ=$(grep eth1 /proc/interrupts | awk '{print $1}' | tr -d ':')
while true; do
    for cpu in $(seq 0 $(($(nproc)-1))); do
        mask=$((1 << $cpu))
        printf "%x" $mask > /proc/irq/$IRQ/smp_affinity
        sleep 5
    done
done

In /proc/interrupts, the xen-dyn-event label indicates dynamic event channels, while xen-percpu-ipi marks processor-specific interrupts.


On the hypervisor side, add these parameters to your VM configuration file (.cfg):

vif = ['mac=00:16:3e:xx:xx:xx,bridge=xenbr0,backend=dom0,queues=4']
extra = "console=hvc0 xen_emul_unplug=never"

Add these to /etc/default/grub under GRUB_CMDLINE_LINUX:

GRUB_CMDLINE_LINUX="... xen-pciback.hide=(01:00.0) pci=nomsi"

Then update GRUB and reboot:

$ sudo update-grub
$ sudo reboot

After changes, verify with:

$ watch -n 1 'grep eth1 /proc/interrupts'
$ xentop -b | grep -A 1 YOUR_VM_NAME

If hardware support is limited, consider RPS (Receive Packet Steering):

# Enable RPS for eth1
$ echo 7 | sudo tee /sys/class/net/eth1/queues/rx-0/rps_cpus
$ echo 4096 | sudo tee /sys/class/net/eth1/queues/rx-0/rps_flow_cnt
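The 7 written to rps_cpus is the same kind of hex CPU bitmask as smp_affinity: bits 0-2 set, i.e. CPUs 0-2. A small sketch decoding which CPUs a given mask selects (the mask value is illustrative):

```shell
# Decode an rps_cpus-style hex mask into the list of CPUs it selects.
mask=0x7
for cpu in 0 1 2 3 4 5 6 7; do
    if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then
        echo "CPU$cpu selected"
    fi
done
```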

Compare before/after with:

$ ab -k -c 100 -n 10000 http://your-service/
$ sar -u ALL -P ALL 1 30