Optimizing Linux Network Latency: taskset vs cpuset for CPU Affinity Binding


When tuning Linux network applications for minimal latency, CPU affinity becomes crucial. The kernel provides two primary mechanisms: taskset (part of util-linux) and cpuset (via cgroups). While both achieve similar results, their implementations differ significantly.
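
Which cgroup layout a system uses affects the cpuset examples below; one quick check is the filesystem type mounted at /sys/fs/cgroup:

# Prints "cgroup2fs" on a unified (v2) hierarchy, "tmpfs" on a v1/hybrid layout
stat -fc %T /sys/fs/cgroup/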

taskset operates at the process level using the sched_setaffinity() system call, modifying the CPU affinity mask directly. Example usage:

taskset -c 2 ./network_app
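
taskset can also retarget a process that is already running; the PID lookup below simply reuses the example binary name:

# Re-pin the running process to core 2 and print its old and new affinity lists
taskset -cp 2 $(pidof network_app)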

cpuset works through control groups (cgroups), creating isolated CPU partitions. It is more complex to set up but offers finer-grained resource management. With the cgroup v1 interface:

mkdir /sys/fs/cgroup/cpuset/lowlatency
echo 2 > /sys/fs/cgroup/cpuset/lowlatency/cpuset.cpus   # CPUs the group may use
echo 1 > /sys/fs/cgroup/cpuset/lowlatency/cpuset.mems   # NUMA memory node(s); use 0 on a single-node machine
echo $$ > /sys/fs/cgroup/cpuset/lowlatency/tasks        # move the current shell into the cpuset
./network_app                                           # inherits the shell's cpuset
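
On systems that have moved to the unified cgroup v2 hierarchy there is no separate cpuset mount; a roughly equivalent sketch (assuming the default /sys/fs/cgroup mount and an available cpuset controller) is:

echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control   # expose the cpuset controller to child groups
mkdir /sys/fs/cgroup/lowlatency
echo 2 > /sys/fs/cgroup/lowlatency/cpuset.cpus
echo 1 > /sys/fs/cgroup/lowlatency/cpuset.mems          # optional; empty means "inherit the parent's nodes"
echo $$ > /sys/fs/cgroup/lowlatency/cgroup.procs        # move the current shell, then launch the app
./network_app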

For your single-threaded TCP application with strict latency requirements:

  • taskset is simpler and sufficient when all you need is core pinning
  • cpuset adds NUMA memory-node awareness, stronger isolation, and other cgroup resource controls, and pairs naturally with pinning the NIC's IRQs
  • Both eliminate CPU-migration overhead; cpuset also keeps other managed tasks off the reserved core, but it cannot evict per-CPU kernel threads (full isolation needs isolcpus= or nohz_full=), so a quick check of what still runs on the core is shown below
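
For example, to list every task whose last-used CPU is the reserved core 2:

# psr is the processor each thread last ran on
ps -eLo pid,tid,psr,comm | awk '$3 == 2'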

Combine CPU affinity with these scheduling and CPU-frequency settings:

# Set CPU affinity and real-time priority
taskset -c 2 chrt -f 99 ./network_app

# Disable frequency scaling on the target core
echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
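
If the cpupower utility (from the kernel's tools) is installed, the same change plus a quick verification can be done per core:

# Set and then report the cpufreq policy for core 2 only
cpupower -c 2 frequency-set -g performance
cpupower -c 2 frequency-info -p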

Check affinity settings at runtime:

# For taskset (Cpus_allowed / Cpus_allowed_list show the effective mask)
grep Cpus_allowed /proc/$(pidof network_app)/status

# For cpuset
cat /proc/$(pidof network_app)/cgroup
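
To confirm which tasks ended up inside the cpuset itself:

# Every PID/TID currently attached to the lowlatency cpuset
cat /sys/fs/cgroup/cpuset/lowlatency/tasks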

Measure context switches with:

perf stat -e cs -p $(pidof network_app)
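
Counting CPU migrations alongside context switches confirms that the pinning actually holds (migrations should read 0); the trailing sleep 10 only bounds the sampling window:

# Context switches and CPU migrations for the process over 10 seconds
perf stat -e cs,migrations -p $(pidof network_app) -- sleep 10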

Also combine core pinning with these network-stack optimizations:

# Pin the NIC's IRQ to CPU 0 (bitmask 0x1); stop irqbalance first so it does not override this
echo 1 > /proc/irq/[irq_number]/smp_affinity
# Raise socket buffer limits
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
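
The [irq_number] placeholder can be looked up in /proc/interrupts; the interface name eth0 below is only an example:

# One line per queue/vector; the first column is the IRQ number
grep eth0 /proc/interrupts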

Verify your optimizations with:

# taskset: count events on the pinned core while the app runs
perf stat -e cycles,instructions,cache-misses -C 2 -- taskset -c 2 ./network_app
# cpuset: the process is already confined to the group, so attach to it by PID
perf stat -e cycles,instructions,cache-misses -p $(pidof network_app)

Remember to monitor both user-space and kernel-space execution times when analyzing latency improvements.
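
One simple way to split those out, assuming the sysstat package's pidstat is available, is to sample per-second user vs. system CPU time for the pinned process:

# %usr vs %system for the process, refreshed every second
pidstat -u -p $(pidof network_app) 1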