Optimizing Linux Network Latency: taskset vs cpuset for CPU Affinity Binding


When tuning Linux network applications for minimal latency, CPU affinity becomes crucial. The kernel provides two primary mechanisms: taskset (part of util-linux) and cpuset (via cgroups). While both achieve similar results, their implementations differ significantly.
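
Which cgroup layout a system uses affects the cpuset examples below; one quick check is the filesystem type mounted at /sys/fs/cgroup:

# Prints "cgroup2fs" on a unified (v2) hierarchy, "tmpfs" on a v1/hybrid layout
stat -fc %T /sys/fs/cgroup/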

taskset operates at the process level using the sched_setaffinity() system call, modifying the CPU affinity mask directly. Example usage:

taskset -c 2 ./network_app
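
taskset can also retarget a process that is already running; the PID lookup below simply reuses the example binary name:

# Re-pin the running process to core 2 and print its old and new affinity lists
taskset -cp 2 $(pidof network_app)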

cpuset works through control groups (cgroups), creating isolated CPU partitions. It is more complex to set up but offers finer-grained resource management. With the cgroup v1 interface:

mkdir /sys/fs/cgroup/cpuset/lowlatency
echo 2 > /sys/fs/cgroup/cpuset/lowlatency/cpuset.cpus   # CPUs the group may use
echo 1 > /sys/fs/cgroup/cpuset/lowlatency/cpuset.mems   # NUMA memory node(s); use 0 on a single-node machine
echo $$ > /sys/fs/cgroup/cpuset/lowlatency/tasks        # move the current shell into the cpuset
./network_app                                           # inherits the shell's cpuset
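
On systems that have moved to the unified cgroup v2 hierarchy there is no separate cpuset mount; a roughly equivalent sketch (assuming the default /sys/fs/cgroup mount and an available cpuset controller) is:

echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control   # expose the cpuset controller to child groups
mkdir /sys/fs/cgroup/lowlatency
echo 2 > /sys/fs/cgroup/lowlatency/cpuset.cpus
echo 1 > /sys/fs/cgroup/lowlatency/cpuset.mems          # optional; empty means "inherit the parent's nodes"
echo $$ > /sys/fs/cgroup/lowlatency/cgroup.procs        # move the current shell, then launch the app
./network_app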

For your single-threaded TCP application with strict latency requirements:

  • taskset is simpler and sufficient when all you need is core pinning
  • cpuset adds NUMA memory-node awareness, stronger isolation, and other cgroup resource controls, and pairs naturally with pinning the NIC's IRQs
  • Both eliminate CPU-migration overhead; cpuset also keeps other managed tasks off the reserved core, but it cannot evict per-CPU kernel threads (full isolation needs isolcpus= or nohz_full=), so a quick check of what still runs on the core is shown below
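
For example, to list every task whose last-used CPU is the reserved core 2:

# psr is the processor each thread last ran on
ps -eLo pid,tid,psr,comm | awk '$3 == 2'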

Combine CPU affinity with these scheduling and CPU-frequency settings:

# Set CPU affinity and real-time priority
taskset -c 2 chrt -f 99 ./network_app

# Disable frequency scaling on the target core
echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
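
If the cpupower utility (from the kernel's tools) is installed, the same change plus a quick verification can be done per core:

# Set and then report the cpufreq policy for core 2 only
cpupower -c 2 frequency-set -g performance
cpupower -c 2 frequency-info -p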

Check affinity settings at runtime:

# For taskset (Cpus_allowed / Cpus_allowed_list show the effective mask)
grep Cpus_allowed /proc/$(pidof network_app)/status

# For cpuset
cat /proc/$(pidof network_app)/cgroup
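
To confirm which tasks ended up inside the cpuset itself:

# Every PID/TID currently attached to the lowlatency cpuset
cat /sys/fs/cgroup/cpuset/lowlatency/tasks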

Measure context switches with:

perf stat -e cs -p $(pidof network_app)
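
Counting CPU migrations alongside context switches confirms that the pinning actually holds (migrations should read 0); the trailing sleep 10 only bounds the sampling window:

# Context switches and CPU migrations for the process over 10 seconds
perf stat -e cs,migrations -p $(pidof network_app) -- sleep 10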

Also combine core pinning with these network-stack optimizations:

# Pin the NIC's IRQ to CPU 0 (bitmask 0x1); stop irqbalance first so it does not override this
echo 1 > /proc/irq/[irq_number]/smp_affinity
# Raise socket buffer limits
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
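
The [irq_number] placeholder can be looked up in /proc/interrupts; the interface name eth0 below is only an example:

# One line per queue/vector; the first column is the IRQ number
grep eth0 /proc/interrupts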

Verify your optimizations with:

# taskset: count events on the pinned core while the app runs
perf stat -e cycles,instructions,cache-misses -C 2 -- taskset -c 2 ./network_app
# cpuset: the process is already confined to the group, so attach to it by PID
perf stat -e cycles,instructions,cache-misses -p $(pidof network_app)

Remember to monitor both user-space and kernel-space execution times when analyzing latency improvements.
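
One simple way to split those out, assuming the sysstat package's pidstat is available, is to sample per-second user vs. system CPU time for the pinned process:

# %usr vs %system for the process, refreshed every second
pidstat -u -p $(pidof network_app) 1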