Linux Network Optimization: Deep Dive into NAPI vs Adaptive Interrupts for High Throughput (400Mbps+) on BCM5709 NICs



When dealing with high network throughput scenarios (~400Mbps and above), traditional interrupt handling mechanisms can become a serious bottleneck. The Linux kernel offers two primary approaches to mitigate this:

// Typical interrupt handler registration; in a NAPI driver the handler
// disables device interrupts and defers the real work via napi_schedule()
static irqreturn_t example_interrupt(int irq, void *dev_id) {
    // Ack/disable device interrupts, then napi_schedule(&bp->napi)
    return IRQ_HANDLED;
}

Adaptive-rx/Adaptive-tx are hardware-dependent features that dynamically adjust interrupt rates based on network load. The Broadcom BCM5709 NIC technically supports this, but the bnx2 driver implementation has limitations:

# Checking adaptive settings via ethtool (bnx2 reports the fields but
# typically rejects attempts to enable adaptive mode)
$ ethtool -c eth0 | grep -i adaptive
Adaptive RX: off  TX: off

NAPI (New API) is a pure software solution that switches from interrupt-driven to polling mode under load. Key components include:

// NAPI initialization in driver code
netif_napi_add(netdev, &bp->napi, bnx2_poll, BNX2_NAPI_WEIGHT);

// Polling function structure
static int bnx2_poll(struct napi_struct *napi, int budget) {
    int work_done = 0;
    // Packet processing logic updates work_done
    if (work_done < budget) {
        napi_complete(napi);
        bnx2_enable_int(bp);
    }
    return work_done;
}

For BCM5709 NICs running the bnx2 driver, consider these kernel parameters:

# Lower the per-poll packet budget (there is no per-queue napi_weight
# sysfs file; net.core.dev_weight is the system-wide knob)
echo 16 > /proc/sys/net/core/dev_weight

# Adjust interrupt coalescing (if adaptive fails)
ethtool -C eth0 rx-usecs 100 tx-usecs 100
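When picking coalescing values, it helps to sanity-check them against the expected packet rate: at ~400Mbps with full-size frames, rx-usecs directly bounds how many interrupts per second the NIC can raise. A back-of-the-envelope sketch (all numbers illustrative):

```shell
# Estimate packets/sec at a given throughput and frame size, and the
# interrupt ceiling implied by an rx-usecs coalescing value.
mbps=400
frame_bytes=1500
rx_usecs=100
pps=$(( mbps * 1000000 / 8 / frame_bytes ))
max_irqs=$(( 1000000 / rx_usecs ))
echo "~${pps} packets/sec; rx-usecs=${rx_usecs} caps interrupts at ${max_irqs}/sec"
```

With these inputs the packet rate (~33k/sec) is over three times the interrupt ceiling (10k/sec), so each interrupt amortizes several packets.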

When dealing with bnx2 driver limitations, you might need to patch the driver source:

// In bnx2.c, cap the per-poll work budget (the stock driver uses a
// hardcoded NAPI weight of 64):
static int bnx2_poll_work(struct napi_struct *napi, int budget) {
    int work_done = min(budget, 16);
    ...
}

After applying changes, verify with:

# Monitor interrupt rates
watch -n 1 "cat /proc/interrupts | grep eth0"

# Check packet processing statistics
ethtool -S eth0 | grep -E 'rx_missed|rx_no_buffer'
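The counters in /proc/interrupts are cumulative, so one snapshot says little; the useful number is the delta between two samples. The parsing step is just summing the per-CPU columns for the NIC's lines, sketched here with inlined sample data (a real run would read /proc/interrupts twice and diff the totals):

```shell
# Sum all per-CPU interrupt counts for lines matching "eth0" in
# /proc/interrupts-style input (inlined here so the sketch is self-contained).
sample='28:   1000   2000   PCI-MSI-edge   eth0-rx-0
29:    500    700   PCI-MSI-edge   eth1'
total=$(echo "$sample" | awk '/eth0/ {
    for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9]+$/) s += $i
} END { print s + 0 }')
echo "eth0 total interrupts: $total"
```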

It is worth unpacking both mechanisms in more detail. At 400Mbps and above, interrupt-driven processing wastes CPU cycles servicing per-packet interrupts; NAPI attacks the problem in software, while adaptive coalescing relies on the hardware.

NAPI operates through a hybrid approach combining interrupts and polling:

// Typical NAPI initialization in a driver
netif_napi_add(netdev, &bp->napi, bnx2_poll, BNX2_NAPI_WEIGHT);

// Poll function skeleton
static int bnx2_poll(struct napi_struct *napi, int budget)
{
    struct bnx2 *bp = container_of(napi, struct bnx2, napi);
    int work_done = 0;
    
    // Process packets until budget exhausted or queue empty
    while (work_done < budget) {
        if (!rx_work_pending(bp))   // placeholder for the real RX ring check
            break;
        // Packet processing logic
        work_done++;
    }
    
    if (work_done < budget) {
        napi_complete(napi);
        bnx2_enable_ints(bp);
    }
    return work_done;
}
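The budget accounting above can be simulated outside the kernel to build intuition: each poll round drains up to the budget, and only when the queue empties before the budget is spent does the driver re-enable interrupts. A toy model in plain shell (not driver code; the packet count and weight are arbitrary):

```shell
# Toy model of NAPI budget accounting: 100 queued packets, weight 64.
queued=100
budget=64
rounds=0
while [ "$queued" -gt 0 ]; do
    work_done=0
    while [ "$work_done" -lt "$budget" ] && [ "$queued" -gt 0 ]; do
        queued=$((queued - 1))        # "process" one packet
        work_done=$((work_done + 1))
    done
    rounds=$((rounds + 1))
    if [ "$work_done" -lt "$budget" ]; then
        echo "round $rounds: drained after $work_done packets -> re-enable interrupts"
    else
        echo "round $rounds: budget exhausted ($work_done packets) -> keep polling"
    fi
done
```

With 100 packets and a weight of 64, the first round exhausts the budget and stays in polling mode; the second drains the remaining 36 and falls back to interrupts.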

Adaptive interrupts dynamically adjust coalescing parameters based on traffic patterns. Hardware support is required, and implementation varies by NIC:

# Checking adaptive RX support via ethtool
$ ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

The bnx2 driver (for BCM5709) has specific limitations regarding adaptive interrupt support despite hardware capabilities. Forcing polling mode requires kernel modifications:

// Potential driver modification to force polling earlier: shrink the
// NAPI weight passed to netif_napi_add() (bnx2's default is 64)
#define BNX2_NAPI_WEIGHT 16

netif_napi_add(dev, &bnapi->napi, bnx2_poll, BNX2_NAPI_WEIGHT);

For systems handling ~400Mbps traffic, consider these runtime tweaks:

# Increase netdev_max_backlog
echo 3000 > /proc/sys/net/core/netdev_max_backlog

# Size the per-queue RFS flow table (requires
# net.core.rps_sock_flow_entries to be set globally as well)
echo 32 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt

# Disable interrupt moderation entirely (minimizes latency but raises
# the interrupt rate; only sensible with spare CPU headroom)
ethtool -C eth0 rx-usecs 0 tx-usecs 0
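The echo-based settings above are lost on reboot; the sysctl-backed one can be persisted in a drop-in fragment (the path and filename are illustrative):

```
# /etc/sysctl.d/90-net-throughput.conf (illustrative)
net.core.netdev_max_backlog = 3000
```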

Monitor interrupt behavior to validate your configuration:

# Watch interrupt distribution
watch -n 1 'cat /proc/interrupts | grep eth0'

# Check softirq handling
watch -n 1 'cat /proc/softirqs | grep NET_RX'

For NICs without proper adaptive interrupt support, consider:

# Manual coalescing settings (microseconds)
ethtool -C eth0 rx-usecs 50 rx-frames 32

# Enable RPS (Receive Packet Steering)
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

# Consider XPS for transmit queues
echo f > /sys/class/net/eth0/queues/tx-0/xps_cpus
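The values written to rps_cpus and xps_cpus are hexadecimal CPU bitmasks (f covers CPUs 0-3). A small helper to build a mask for the first N CPUs (illustrative; real deployments usually pin queues to CPUs local to the NIC's NUMA node):

```shell
# Build a hex CPU bitmask covering the first N CPUs.
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}
cpu_mask 4    # CPUs 0-3
cpu_mask 8    # CPUs 0-7
```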