When analyzing NFS server performance on SUSE Linux Enterprise Server 10, the standard /proc/net/rpc/nfsd metrics only tell part of the story. To truly understand client-specific bottlenecks, we need deeper instrumentation that reveals:
- Per-client file access patterns
- RPC call distribution and latency
- Disk I/O contention between clients
- Network stack overhead
The nfsstat utility breaks the counters down by RPC operation; -c selects the client-side statistics and -l prints them in list form:
# Refresh the per-operation counters every 2 seconds
watch -n 2 "nfsstat -c -n -l"
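When you need those counters programmatically rather than on a terminal, the two-row header/value layout of nfsstat output is straightforward to parse. A minimal sketch (the sample text below is illustrative, not captured from a real host):

```python
def parse_rpc_stats(text):
    """Pair the 'calls ...' header row of nfsstat output with the
    row of counts that immediately follows it."""
    lines = [l.split() for l in text.splitlines() if l.strip()]
    for names, values in zip(lines, lines[1:]):
        if names and names[0] == 'calls':
            return dict(zip(names, map(int, values)))
    return {}

sample = """Client rpc stats:
calls      retrans    authrefrsh
10188245   23         10188532
"""
stats = parse_rpc_stats(sample)
```

The same function works on the server block of `nfsstat -s`, since it also starts its value rows with a `calls` header.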
For more granular analysis, combine iotop with nfsd debug logging:
# Monitor disk I/O while tracing NFS operations
iotop -oP &
rpcdebug -m nfsd -s all   # enable nfsd debug output (visible via dmesg)
This SystemTap script tracks per-request latency by client (variable names such as client_ip come from the nfsd tapset and can differ between SystemTap versions):
probe nfsd.proc.return {
    # @entry() is only valid inside .return probes
    printf("%s [%s] %s %dus\n", client_ip, execname(),
           name, gettimeofday_us() - @entry(gettimeofday_us()))
}
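Output lines in that printf format ("addr [task] OP Nus") can then be aggregated into per-client averages offline; a small sketch:

```python
import re
from collections import defaultdict

def per_client_avg(lines):
    """Average the latency column of the SystemTap output above,
    grouped by the leading client address field."""
    sums = defaultdict(lambda: [0, 0])  # addr -> [total_us, count]
    for line in lines:
        m = re.match(r'(\S+) \[\S+\] \S+ (\d+)us', line)
        if m:
            addr, us = m.group(1), int(m.group(2))
            sums[addr][0] += us
            sums[addr][1] += 1
    return {addr: total / n for addr, (total, n) in sums.items()}

out = ["10.0.0.5 [nfsd] READ 120us",
       "10.0.0.5 [nfsd] READ 80us",
       "10.0.0.7 [nfsd] WRITE 300us"]
avgs = per_client_avg(out)
```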
Key metrics to capture:

| Metric | Tool | Interpretation |
|---|---|---|
| RPC queue time | stap nfsd_rpc_queue.stp | Network/CPU contention |
| VFS latency | funclatency-bpfcc vfs_* | Filesystem overhead |
| Disk service time | iostat -x 1 | Storage bottleneck |
When multiple HPC clients access shared datasets, we observed:
# Client A (simulation job)
nfsv4 WRITE 32768 bytes: avg_latency=142ms p95=312ms
# Client B (visualization tool)
nfsv4 READ 131072 bytes: avg_latency=89ms p95=201ms
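The avg/p95 figures above are derived from raw per-RPC latency samples; with such samples in hand, a nearest-rank percentile is easy to compute. A sketch (the sample latencies below are made up):

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that is >= pct%
    of the data. Index is ceil(n * pct / 100) - 1 on the sorted list."""
    ordered = sorted(samples)
    k = max(0, -(-len(ordered) * pct // 100) - 1)  # ceil via negated floor div
    return ordered[int(k)]

lat_ms = [120, 142, 98, 310, 155, 140, 133, 160, 290, 150]
avg = sum(lat_ms) / len(lat_ms)
p95 = percentile(lat_ms, 95)
```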
The solution involved implementing client-specific tuning:
- NFS server thread pools (RDMA_NFSD_THREAD_COUNT=16)
- Per-export caching policies
- Network QoS tagging (DSCP class mapping)
For continuous monitoring, we developed this Python collector:
from prometheus_client import Gauge

nfs_client_ops = Gauge('nfs_client_operations',
                       'Per-client NFS operations',
                       ['client_ip', 'operation'])

def collect_metrics():
    # Caveat: /proc/net/rpc/nfsd holds aggregate counters only; its 'net'
    # line is "net <packets> <udp> <tcp> <tcpconn>" and carries no client
    # IPs. True per-client labels require /proc/fs/nfsd/clients data.
    with open('/proc/net/rpc/nfsd') as f:
        for line in f:
            if line.startswith('net'):
                _, packets, udp, tcp, tcpconn = line.split()
                nfs_client_ops.labels(client_ip='all',
                                      operation='net_packets').set(float(packets))
This integrates with existing monitoring stacks for historical analysis of client performance patterns.
When troubleshooting NFS performance issues on SUSE Linux Enterprise Server 10 (or any Linux distribution), the standard /proc/net/rpc/nfsd statistics often don't provide enough granularity. Here's how to dig deeper into client-specific behavior and server-side bottlenecks.
Beyond basic NFS stats, these tools provide critical insights:
# 1. nfsstat for RPC breakdown
nfsstat -rc # Client RPC stats
nfsstat -s # Server RPC stats
# 2. Client connection tracking (requires kernel >= 5.3)
cat /proc/fs/nfsd/clients/*/info
# 3. Per-client IO monitoring
iotop -oP # Requires root
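The per-client info files can be parsed programmatically. This sketch assumes the 'address: "ip:port"' line format used by recent kernels (the clients/ directory itself only exists on kernel 5.3 and later); the sample text is illustrative:

```python
import re

def client_addresses(info_text):
    """Pull client IPs out of an nfsd client 'info' file, assuming
    lines of the form: address: "10.0.0.1:768"."""
    return re.findall(r'address:\s+"([0-9.]+):\d+"', info_text)

sample = 'clientid: 0x1234\naddress: "192.168.10.21:768"\n'
ips = client_addresses(sample)
```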
To identify which files on the export are being touched (note that inotify events carry no client identity, so attribute them to clients by correlating timestamps against the connection list):
# Install inotify-tools
sudo zypper install inotify-tools
# Log time, directory, event type, and filename for each access
inotifywait -m -r /exported/nfs/share --timefmt '%T' --format '%T %w %e %f' > nfs_access.log &
# List currently connected client IPs for correlation (kernel >= 5.3)
awk -F'"' '/address/ {print $2}' /proc/fs/nfsd/clients/*/info | cut -d: -f1
Use a combination of tools:
# Network traffic per client
nethogs -t eth0
# Alternative method using iptables accounting (a rule with no -j target only counts)
sudo iptables -I INPUT -p tcp --dport 2049
sudo iptables -I OUTPUT -p tcp --sport 2049
sudo watch -n 1 'iptables -nvxL'
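The cumulative byte counters from those rules only become useful as rates. A sketch that turns two counter snapshots into bytes/sec, clamping negative deltas from a counter reset (the snapshot values below are made up):

```python
def throughput_bps(prev_bytes, curr_bytes, interval_s):
    """Convert two cumulative byte counters into a rate, tolerating
    counter resets (e.g. after iptables -Z) by clamping to zero."""
    delta = curr_bytes - prev_bytes
    return max(delta, 0) / interval_s

# Two snapshots taken 2 seconds apart
rate = throughput_bps(1_048_576, 3_145_728, 2.0)
```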
Break down RPC call types and latency:
# Detailed RPC statistics
rpcinfo -s
rpcinfo -p
# XDR decoding analysis (requires nfsd debug logging; run as root)
rpcdebug -m nfsd -s all   # or: echo 1 > /proc/sys/sunrpc/nfsd_debug
dmesg | grep nfsd | awk '/xid/ {print $6,$7,$8}'
When NFS waits on storage:
# Check IO wait at OS level
vmstat 1 5
# Per-disk latency
iostat -x 1
# NFS-specific disk stats
cat /proc/fs/nfsd/pool_stats
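pool_stats is a small table: a '#'-prefixed header naming the columns, then one row of counters per nfsd thread pool (the exact column set varies by kernel version). A parsing sketch, with an illustrative sample rather than real output:

```python
def parse_pool_stats(text):
    """Parse /proc/fs/nfsd/pool_stats into one dict per thread pool,
    keyed by the column names from the '#' header line."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip('# ').split()
    return [dict(zip(header, map(int, row.split()))) for row in lines[1:]]

sample = ("# pool packets-arrived sockets-enqueued threads-woken threads-timedout\n"
          "0 3625 110 3515 2\n")
pools = parse_pool_stats(sample)
```

A high sockets-enqueued relative to threads-woken suggests requests waiting for a free nfsd thread.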
Here's a script that combines multiple metrics:
#!/bin/bash
while true; do
    # Connected client IPs (-h drops the filename prefix grep adds for multiple files)
    clients=$(grep -ohP '([0-9]{1,3}\.){3}[0-9]{1,3}' /proc/fs/nfsd/clients/*/info | sort -u)
    # RPC stats
    rpc_stats=$(nfsstat -s | awk '/call|retrans/ {print $1,$2}')
    # Disk latency (the await column position varies by sysstat version)
    disk_lat=$(iostat -x | awk '/^sd/ {print $1,$10}')
    # Output timestamped data
    echo "$(date) | Clients: $clients | RPC: $rpc_stats | Disk: $disk_lat"
    sleep 5
done
For long-term analysis, consider these approaches:
# Send data to Graphite/InfluxDB
echo "nfs.client.$(hostname).rpc $(nfsstat -s | awk '/^calls/ {getline; print $1; exit}') $(date +%s)" | \
  nc graphite.example.com 2003
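The same push can be scripted in Python: Graphite's plaintext protocol is just "path value timestamp" per line over TCP port 2003. A sketch (graphite.example.com is the placeholder host from above; send_metric is only defined here, not called):

```python
import socket
import time

def graphite_line(path, value, ts=None):
    """Format one metric in Graphite's plaintext protocol:
    '<path> <value> <unix-timestamp>\n'."""
    ts = int(ts if ts is not None else time.time())
    return f"{path} {value} {ts}\n"

def send_metric(line, host='graphite.example.com', port=2003):
    # One TCP connection per send keeps the sketch simple; batch lines
    # into a single connection in practice.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(line.encode())

line = graphite_line('nfs.server01.rpc_calls', 10188245, ts=1700000000)
```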
# Or use collectd with its NFS plugin (available options depend on the
# collectd version; see collectd.conf(5))
LoadPlugin nfs
<Plugin nfs>
    ReportV2 no
    ReportV3 no
    ReportV4 yes
</Plugin>