When tuning Linux network applications for minimal latency, CPU affinity becomes crucial. There are two primary mechanisms: taskset (a userspace utility from util-linux) and cpusets (a kernel feature exposed via cgroups). While both achieve similar results, their implementations differ significantly.
taskset operates at the process level using the sched_setaffinity()
system call, modifying the CPU affinity mask directly. Example usage:
taskset -c 2 ./network_app
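taskset can also inspect or retarget a process that is already running; a quick sketch (note that pidof may return multiple PIDs if several instances exist):
# Show, then change, the affinity of the running process
taskset -cp $(pidof network_app)
taskset -cp 2 $(pidof network_app)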
cpuset works through control groups (cgroups), creating isolated CPU partitions. This is more complex but offers better resource management. The commands below assume the legacy cgroup v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset:
mkdir /sys/fs/cgroup/cpuset/lowlatency
echo 2 > /sys/fs/cgroup/cpuset/lowlatency/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/lowlatency/cpuset.mems  # NUMA node local to CPU 2 (node 0 on single-socket systems)
echo $$ > /sys/fs/cgroup/cpuset/lowlatency/tasks
./network_app
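On newer distributions the unified cgroup v2 hierarchy replaces the layout above; assuming the v2 cpuset controller is available (kernel 5.0+), the equivalent setup is:
# Enable the cpuset controller for child groups, then create and populate one
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/lowlatency
echo 2 > /sys/fs/cgroup/lowlatency/cpuset.cpus
echo 0 > /sys/fs/cgroup/lowlatency/cpuset.mems
echo $$ > /sys/fs/cgroup/lowlatency/cgroup.procs
./network_app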
For your single-threaded TCP application:
- taskset is simpler and sufficient when you just need core pinning
- cpuset provides better isolation, including NUMA memory node awareness, and accommodates more complex resource constraints alongside IRQ steering
- Both eliminate CPU migration overhead; cpuset can additionally keep other managed tasks off the reserved core, but excluding kernel threads requires boot-time isolation (see the sketch below)
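For the strictest isolation, reserve the core at boot so the scheduler never places other userspace tasks (and most kernel housekeeping) there. A sketch of the kernel command-line parameters for dedicating CPU 2, added via your bootloader configuration:
isolcpus=2 nohz_full=2 rcu_nocbs=2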
Combine CPU affinity with these techniques:
# Set CPU affinity and SCHED_FIFO priority (99 is the maximum; it can starve housekeeping tasks, so test carefully)
taskset -c 2 chrt -f 99 ./network_app
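Confirm the policy and priority took effect:
chrt -p $(pidof network_app)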
# Disable frequency scaling on the target core
echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
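Confirm the governor change, and check which governors the cpufreq driver supports:
cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors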
Check affinity settings at runtime:
# For taskset
grep Cpus_allowed /proc/$(pidof network_app)/status
# For cpuset
cat /proc/$(pidof network_app)/cgroup
Measure context switches with:
perf stat -e cs -p $(pidof network_app)
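If the sysstat package is installed, pidstat breaks this down further into voluntary versus involuntary context switches per second, which separates blocking waits from preemption:
pidstat -w -p $(pidof network_app) 1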
Also pair CPU affinity with these network stack optimizations:
# Pin the NIC's IRQ; the value is a hex CPU bitmask (0x4 = CPU 2, matching the app's core).
# Stop the irqbalance service first, or it may overwrite this setting.
echo 4 > /proc/irq/[irq_number]/smp_affinity
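To find the IRQ number, look the interface up in /proc/interrupts (eth0 is a placeholder; substitute your NIC's interface or driver name):
grep eth0 /proc/interrupts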
# Raise socket buffer ceilings (the limits SO_RCVBUF/SO_SNDBUF can request)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
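For a latency-critical TCP path, kernel busy polling is also worth testing; it burns CPU on the pinned core in exchange for lower wakeup latency. The 50 µs values below are illustrative starting points, not tuned recommendations:
sysctl -w net.core.busy_read=50
sysctl -w net.core.busy_poll=50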
Verify your optimizations with:
perf stat -e cycles,instructions,cache-misses -C 2 -- taskset -c 2 ./network_app
# Or attach to the already-running, cgroup-pinned process:
perf stat -e cycles,instructions,cache-misses -p $(pidof network_app) -- sleep 10
Remember to monitor both user-space and kernel-space execution times when analyzing latency improvements.
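perf's event modifiers give a quick way to split the two; a sketch counting user-space versus kernel-space cycles over a 10-second window:
perf stat -e cycles:u,cycles:k -p $(pidof network_app) -- sleep 10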