How to Monitor Per-Thread CPU Utilization: Tracking System/User/Wait Time Metrics


When profiling multi-threaded applications, standard tools like top or htop often fall short by not breaking down CPU usage into system/user/wait percentages at the thread level. This granular data is crucial for identifying:

  • Threads stuck in I/O wait
  • Excessive system call overhead
  • Uneven workload distribution

The Linux kernel exposes per-thread statistics through several interfaces:

# View thread CPU usage breakdown
cat /proc/[pid]/task/[tid]/stat

# Alternative using ps
ps -eLo pid,tid,pcpu,stat,wchan:32,comm | grep [process_name]
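
If you parse the stat file yourself, the fields of interest are utime (field 14), stime (field 15) and, for wait time, delayacct_blkio_ticks (field 42, the thread's aggregated block I/O delay). Field numbers follow proc(5), and the delay-accounting value may read 0 unless delay accounting is enabled on your kernel. A minimal Python parsing sketch:

import os

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second (usually 100)

def read_thread_times(pid, tid):
    """Return user, system, and block-I/O delay time (seconds) for one thread."""
    with open(f"/proc/{pid}/task/{tid}/stat") as f:
        raw = f.read()
    # comm (field 2) may contain spaces, so split after the closing ')'
    fields = raw.rpartition(")")[2].split()
    return {
        "user_s": int(fields[11]) / CLK_TCK,        # field 14: utime
        "system_s": int(fields[12]) / CLK_TCK,      # field 15: stime
        "blkio_wait_s": int(fields[39]) / CLK_TCK,  # field 42: delayacct_blkio_ticks
    }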

The perf tool provides the most comprehensive thread-level metrics:

# Monitor all threads of a process
perf stat -e 'cpu-clock,task-clock,cs,cache-references,cache-misses' -p [pid] -I 1000

# Track specific threads
perf stat -t [tid] -e 'cpu-clock:u,cpu-clock:k' sleep 5

For programmatic access, here's a Python script using psutil:

import os
import psutil

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second (usually 100)

def get_thread_stats(pid):
    proc = psutil.Process(pid)
    for thread in proc.threads():
        tid = thread.id
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                # Split after ')' so a comm containing spaces can't shift the fields
                fields = f.read().rpartition(")")[2].split()
                utime = int(fields[11]) / CLK_TCK  # field 14: user time (s)
                stime = int(fields[12]) / CLK_TCK  # field 15: system time (s)
                print(f"Thread {tid}: User={utime:.2f}s, System={stime:.2f}s")
        except OSError as e:
            print(f"Error reading stats for TID {tid}: {e}")

For production systems, eBPF provides low-overhead tracing:

// BCC program to track thread CPU usage (attach as a kprobe on finish_task_switch)
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct cpu_times {
    u64 utime;
    u64 stime;
    u64 gtime;
};

BPF_HASH(stats, u32, struct cpu_times);

int trace_sched_switch(struct pt_regs *ctx, struct task_struct *prev) {
    // task->pid is the thread ID; utime/stime/gtime are cumulative,
    // so store the latest snapshot rather than summing on every switch
    u32 tid = prev->pid;
    struct cpu_times val = {};
    val.utime = prev->utime;
    val.stime = prev->stime;
    val.gtime = prev->gtime;
    stats.update(&tid, &val);
    return 0;
}
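
A minimal Python loader sketch for the program above, assuming the C source is stored in a string called bpf_text; finish_task_switch is often compiled with an .isra suffix, so it is attached by regex here:

import time
from bcc import BPF

b = BPF(text=bpf_text)  # bpf_text holds the C program shown above
# Match finish_task_switch even when suffixed (e.g. finish_task_switch.isra.0)
b.attach_kprobe(event_re=r"^finish_task_switch", fn_name="trace_sched_switch")

while True:
    time.sleep(5)
    for tid, t in b["stats"].items():
        # On recent kernels utime/stime/gtime are cumulative nanoseconds
        print(f"TID {tid.value}: user={t.utime} system={t.stime} guest={t.gtime}")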

For continuous monitoring:

  • Use pidstat -t 1 for rolling per-thread %usr/%system figures (sar -P ALL 1 adds per-CPU context)
  • Configure collectd with the processes plugin
  • Implement custom logging from the proc filesystem data, as in the Python examples above

On Windows systems, similar data is exposed through the Thread performance counter object, which splits CPU time into user and privileged (kernel) time per thread:

# PowerShell command
Get-Counter '\Thread(*)\% User Time','\Thread(*)\% Privileged Time' -Continuous

When optimizing multi-threaded applications, system-wide CPU metrics often don't tell the whole story. What we really need is visibility into individual thread behavior - particularly the split between user CPU%, system CPU%, and I/O wait percentages. This granular data helps identify:

  • Threads stuck in I/O wait states
  • Imbalanced workload distribution
  • Kernel-space bottlenecks

While top -H shows threads (LWPs), it reports only a single combined %CPU per thread. pidstat -t adds a per-thread %usr/%system split, but its %wait column measures time spent waiting for a CPU, not I/O wait. For true per-thread utilization metrics, we need to go deeper.

The Linux perf subsystem provides the most detailed thread-level metrics. Here's a complete monitoring solution:

# Sample perf command for thread monitoring
perf stat -e 'cpu-clock,task-clock,cs,cpu-migrations,page-faults' \
    -e 'sched:sched_stat_iowait' \
    -e 'sched:sched_switch' \
    -p $(pgrep -f your_application) \
    --per-thread \
    sleep 10
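
Note that the sched:sched_stat_* tracepoints only fire when scheduler statistics are enabled (typically sysctl kernel.sched_schedstats=1 on kernels built with CONFIG_SCHEDSTATS), so enable that first if the iowait event never reports anything.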

For continuous logging, create a script like this:

#!/bin/bash
# Continuously log per-thread user/kernel CPU time from perf stat's CSV output
APP_PID=$(pgrep -f target_application)
LOG_FILE="thread_stats_$(date +%Y%m%d).csv"

# perf's -x, column order varies slightly between versions, so keep the raw
# row and just prefix a timestamp; each --per-thread row starts with "comm-tid"
echo "timestamp,perf_csv_row" > "$LOG_FILE"

while true; do
    timestamp=$(date +%s)
    perf stat -e 'cpu-clock:u,cpu-clock:k' -p "$APP_PID" --per-thread \
        -x, -o tmp.csv -- sleep 1

    # Drop perf's comment and blank lines, then append the data rows
    awk -v ts="$timestamp" '!/^#/ && NF { print ts "," $0 }' tmp.csv >> "$LOG_FILE"
done

For more advanced users, eBPF provides lower-overhead monitoring:

# bpftrace script for thread CPU tracking
#!/usr/bin/bpftrace
// Usage: sudo ./thread_cpu.bt <pid>
// Samples the target process at 99 Hz; a thread's CPU share over each
// 1-second window is roughly its sample count divided by 99.

BEGIN
{
    printf("Sampling threads of PID %d at 99 Hz... hit Ctrl-C to end.\n", $1);
}

profile:hz:99
/pid == $1/
{
    // pid is the process (tgid); tid/comm identify the thread on-CPU
    @samples[tid, comm] = count();
}

interval:s:1
{
    time("%H:%M:%S\n");
    print(@samples);
    clear(@samples);
}

For analyzing collected data:

  • Grafana with Prometheus metrics
  • Python pandas for CSV analysis (a minimal sketch follows below)
  • Flame graphs for thread activity visualization
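
As an example of the pandas route, here's a minimal sketch; it assumes your logs have already been flattened into a CSV with hypothetical timestamp, tid, user_pct, system_pct and wait_pct columns (adjust the names to whatever your logger emits):

import pandas as pd

# Hypothetical column layout: timestamp, tid, user_pct, system_pct, wait_pct
df = pd.read_csv("thread_stats_20240101.csv")

# Per-thread averages expose imbalance and kernel-heavy threads
summary = df.groupby("tid")[["user_pct", "system_pct", "wait_pct"]].mean()
print(summary.sort_values("system_pct", ascending=False).head(10))

# Threads spending most of their time blocked on I/O
waiters = summary[summary["wait_pct"] > 50]
print(f"{len(waiters)} thread(s) spend over half their time in I/O wait")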