When processes show high sy (system CPU) usage in vmstat or top, the first step is to pinpoint which system calls are responsible. While strace is the classic tool, it introduces significant overhead and may not be practical for production systems.
The Linux perf tool provides low-overhead system call monitoring:
# System-wide syscall count (Ctrl+C to stop)
perf top -e raw_syscalls:sys_enter --sort comm
# Top syscalls for specific PID (replace 1234)
perf stat -e 'syscalls:sys_enter_*' -p 1234 -- sleep 10
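The perf stat output above lists every syscall tracepoint, most of them with a count of zero. A minimal sketch for ranking them, assuming perf's CSV mode (`-x,`), where field 1 is the count and field 3 the event name; `rank_perf_events` is a hypothetical helper name, not part of perf:

```shell
#!/bin/sh
# rank_perf_events [LIMIT]
# Reads `perf stat -x,` CSV from stdin and prints "count event",
# highest count first. Lines whose first field is not a positive
# integer (for example "<not counted>") are skipped.
rank_perf_events() {
    limit="${1:-10}"
    awk -F, '$1 ~ /^[0-9]+$/ && $1 > 0 { print $1, $3 }' |
        sort -k1,1nr |
        head -n "$limit"
}
```

perf stat writes its report to stderr, so a typical call would look like `sudo perf stat -x, -e 'syscalls:sys_enter_*' -p 1234 -- sleep 10 2>&1 | rank_perf_events 5`.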
For immediate troubleshooting, try these approaches:
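# Live syscall count per process name (requires root; Ctrl+C prints the totals)
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Per-syscall breakdown for one process over 10 seconds (bcc tool; named syscount-bpfcc on Debian/Ubuntu)
sudo syscount -p $(pidof -s your_process) -d 10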
For deeper investigations, SystemTap can capture syscall patterns over time:
# Install SystemTap (Debian/Ubuntu)
sudo apt-get install systemtap
# Sample script: the five process names that issue the most syscalls while your_command runs
stap -e '
global syscalls
probe syscall.* { syscalls[execname()]++ }
probe end {
  foreach (proc in syscalls- limit 5) {
    printf("%s: %d syscalls\n", proc, syscalls[proc])
  }
}
' -c "your_command"
For quick checks without installation:
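# /proc keeps no per-process syscall counter, but it shows the syscall a process is executing right now (number and arguments)
sudo cat /proc/$(pidof -s your_process)/syscall
# Cumulative user and system CPU time in clock ticks (fields 14 and 15 of /proc/PID/stat; assumes the process name contains no spaces)
awk '{print "utime=" $14, "stime=" $15}' /proc/$(pidof -s your_process)/stat
# SysRq 'm' dumps memory statistics, not syscall data, to the kernel log; useful only to rule out memory pressure
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo m | sudo tee /proc/sysrq-trigger
dmesg | tail -20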
Remember to correlate syscall data with other metrics:
# Combined view with pidstat
pidstat -w -p $(pidof process) 1 5
# Context switches vs syscalls
grep -E 'voluntary|nonvoluntary' /proc/$(pidof process)/status
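The counters in /proc/PID/status are cumulative, so a single reading says little about current behavior. A minimal sketch that samples them twice and prints the difference; `ctxt_switches` and `ctxt_delta` are hypothetical helper names, and the 5-second default interval is an arbitrary choice:

```shell
#!/bin/sh
# ctxt_switches FILE
# Prints "voluntary nonvoluntary" from a /proc/PID/status style file.
ctxt_switches() {
    awk '/^voluntary_ctxt_switches:/    { v = $2 }
         /^nonvoluntary_ctxt_switches:/ { n = $2 }
         END { print v + 0, n + 0 }' "$1"
}

# ctxt_delta PID [SECONDS]
# Prints how many context switches of each kind happened in the interval.
ctxt_delta() {
    pid="$1"
    secs="${2:-5}"
    before=$(ctxt_switches "/proc/$pid/status") || return 1
    sleep "$secs"
    after=$(ctxt_switches "/proc/$pid/status") || return 1
    # Word-split the two samples into $1..$4.
    set -- $before $after
    echo "voluntary=$(( $3 - $1 )) nonvoluntary=$(( $4 - $2 ))"
}
```

As a rough rule, a high voluntary rate tends to go with blocking syscalls (I/O, locks, sleeps), while a high nonvoluntary rate usually points to CPU contention rather than syscall load.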
When you notice high system CPU usage (sy in vmstat), the first step is identifying which processes are making excessive system calls. While top shows overall CPU usage, we need more granular tools:
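# Basic process monitoring, sorted by total CPU (procps top has no per-process system-CPU sort field; SYSCPU is an atop column)
$ top -o %CPU
# Per-process user/system split with pidstat: read the %system column (ps has no %sys format specifier)
$ pidstat -u 1 5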
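Where neither pidstat nor atop is installed, kernel-mode CPU time per process can be read from /proc directly. A minimal sketch, assuming the field layout documented in proc(5), where utime is field 14 and stime is field 15; `stat_times` and `top_sys_procs` are hypothetical helper names:

```shell
#!/bin/sh
# stat_times
# Reads the contents of a /proc/PID/stat file on stdin and prints
# "utime stime comm" (CPU times in clock ticks).
stat_times() {
    awk '{
        line = $0
        lp = index(line, "(")
        # comm may contain spaces or ")" itself, so locate the LAST ")".
        rp = 0
        for (i = length(line); i > 0; i--)
            if (substr(line, i, 1) == ")") { rp = i; break }
        comm = substr(line, lp + 1, rp - lp - 1)
        split(substr(line, rp + 2), f, " ")
        # f[1] is field 3 (state), so field 14 is f[12] and field 15 is f[13].
        print f[12], f[13], comm
    }'
}

# top_sys_procs [LIMIT]
# Prints "pid utime stime comm" for the processes with the most
# accumulated system time.
top_sys_procs() {
    for s in /proc/[0-9]*/stat; do
        pid=${s#/proc/}
        pid=${pid%/stat}
        # A process may exit between the glob and the read.
        t=$(cat "$s" 2>/dev/null | stat_times)
        [ -n "$t" ] && echo "$pid $t"
    done | sort -k3,3nr | head -n "${1:-10}"
}
```

These values are totals since each process started, so long-running daemons rank high even when idle; running `top_sys_procs` twice a few seconds apart and comparing stime gives a fairer picture.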
For real-time monitoring, strace is powerful but heavy. Here are lighter alternatives:
# Using perf for system-wide sampling
$ sudo perf top -e raw_syscalls:sys_enter --sort comm,dso
# Focused monitoring on specific PID
$ sudo perf trace -p [PID]
sysdig provides exactly what you need, a "top for system calls":
# Install sysdig
$ curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash
# Top-like interface for system calls
$ sudo sysdig -c topscalls
# Filter for specific process
$ sudo sysdig -c topscalls proc.name=[process_name]
For deeper analysis, we can use bpftrace to create custom instrumentation:
# Count system calls by process
$ sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Breakdown by specific syscall type
$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'
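Attaching to every `sys_enter_*` tracepoint can be slow to start. A cheaper variant keys the map on the syscall number that `raw_syscalls:sys_enter` exposes, and the numbers then need translating back to names. A minimal sketch of that lookup, assuming an x86_64 kernel header at /usr/include/asm/unistd_64.h (the path differs by distribution and architecture); `syscall_name` is a hypothetical helper:

```shell
#!/bin/sh
# syscall_name NUMBER [HEADER]
# Prints the syscall name for NUMBER by scanning "#define __NR_<name> <nr>"
# lines in HEADER. Prints nothing if the number is not found.
syscall_name() {
    nr="$1"
    hdr="${2:-/usr/include/asm/unistd_64.h}"
    awk -v nr="$nr" '
        $1 == "#define" && $2 ~ /^__NR_/ && $3 == nr {
            sub(/^__NR_/, "", $2)
            print $2
            exit
        }' "$hdr"
}
```

The numbers would come from a one-liner such as `sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[args->id] = count(); }'` (recent bpftrace releases prefer `args.id`), after which `syscall_name 202` should print `futex` on x86_64.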
Let's walk through a real-world scenario where MySQL was showing high system CPU:
# First identify the MySQL process ID
$ pgrep -f mysqld
# Then monitor its system calls (-f follows every thread; mysqld is multithreaded)
$ sudo strace -f -c -p [mysql_pid]
# After 30 seconds, press Ctrl+C to see the summary
# Alternative with sysdig: files with the most I/O bytes for mysqld
$ sudo sysdig -c topfiles_bytes proc.name=mysqld
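The strace -c summary is sorted by time, which can hide a cheap syscall that is called extremely often. A minimal sketch that re-ranks a saved summary by call count, assuming the usual column order (% time, seconds, usecs/call, calls, errors, syscall); `strace_top_calls` is a hypothetical helper:

```shell
#!/bin/sh
# strace_top_calls [LIMIT]
# Reads an `strace -c` summary on stdin and prints "calls seconds syscall",
# most frequently called first. The header, separator and total rows are
# skipped. The errors column may be blank, so the name is taken from the
# last field.
strace_top_calls() {
    awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $4, $2, $NF }' |
        sort -k1,1nr |
        head -n "${1:-5}"
}
```

To use it, save the summary with strace's `-o` option (for example `sudo strace -f -c -o /tmp/mysql.summary -p [mysql_pid]`, where the file path is an arbitrary choice), then run `strace_top_calls 10 < /tmp/mysql.summary`.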
For production systems, consider setting up continuous monitoring:
# Systemd service for sysdig
[Unit]
Description=Sysdig system call monitor
After=network.target
[Service]
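# Note: -w writes a binary capture file (read it back with sysdig -r), not a text log, and it grows until the service is stopped
ExecStart=/usr/bin/sysdig -c topscalls -w /var/log/sysdig/scalls.log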
Restart=always
[Install]
WantedBy=multi-user.target