When your server shows load averages spiking to 20-30 while CPU sits at 98% idle, you're dealing with a classic I/O wait scenario. The vmstat output reveals telltale signs:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 1 0 1298952 0 0 0 0 0 0 0 9268 7 5 70 19 0
Key observations from your data (a quick way to watch these values live is sketched right after this list):
- wa (wait) percentage frequently hits double digits (13-19%)
- bi (blocks in) shows periodic spikes (240 at maximum)
- Load spikes correlate with I/O operations
- Context switches (cs) increase dramatically during high-load periods
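To keep an eye on these numbers while the load climbs, a small watch loop helps. This is a minimal sketch, assuming GNU awk (for strftime) and the 17-column vmstat layout shown above; the 10% threshold is just an illustrative cut-off:
# Print a timestamped line whenever I/O wait exceeds 10%
vmstat 5 | awk 'NR > 2 && $16 > 10 { print strftime("%H:%M:%S"), "wa =", $16 "%", "blocked =", $2 }'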
While NFS could certainly be the culprit, let's gather more evidence before concluding. Try these diagnostic commands:
# Check NFS server statistics
nfsstat -c
nfsstat -s
# Monitor NFS operations in real-time
mountstats /mount/point
# Alternative: check per-mount NFS stats
cat /proc/fs/nfsfs/volumes
For SAN/FC storage issues:
# Check block device latency
iostat -x 1
# Identify processes causing I/O
iotop -oPa
# Check SCSI layer for errors
dmesg | grep -i scsi
Since this is a VPS on FC SAN:
- Check for hypervisor-level contention (virsh runs on the hypervisor host, so you may need to ask your provider):
virsh nodecpustats
- Verify SAN queue depth:
cat /sys/block/sdX/queue/nr_requests
- Monitor multipath I/O if applicable:
multipath -ll
Potential fixes to implement and measure:
# For NFS: adjust mount options (rsize/wsize changes usually need a full
# umount/mount rather than a remount; intr is a no-op on modern kernels)
mount -o remount,rsize=32768,wsize=32768,async,intr,tcp /nfs/mount
# For general I/O: tune kernel write-back behaviour (these are aggressive
# values that favour buffering over prompt flushing; test carefully)
echo 100 > /proc/sys/vm/dirty_ratio              # % of RAM dirty before writers block
echo 50 > /proc/sys/vm/dirty_background_ratio    # % of RAM dirty before background flushing starts
echo 5000 > /proc/sys/vm/dirty_expire_centisecs  # dirty data older than 50s becomes eligible for flush
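If any of these values help, note that settings written through /proc do not survive a reboot. A minimal sketch for persisting them, assuming a distribution that reads /etc/sysctl.d/ (the file name is arbitrary):
# Persist the write-back tuning (values mirror the echo commands above)
cat > /etc/sysctl.d/90-dirty-writeback.conf <<'EOF'
vm.dirty_ratio = 100
vm.dirty_background_ratio = 50
vm.dirty_expire_centisecs = 5000
EOF
sysctl -p /etc/sysctl.d/90-dirty-writeback.conf  # apply immediately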
Remember to baseline performance before/after changes using:
vmstat 1 10
iostat -x 1 10
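To keep the comparison honest, capture each run to a timestamped file so you can diff before and after. A minimal sketch (duration and file locations are arbitrary):
# Capture a 60-second baseline; run once before and once after each change
ts=$(date +%Y%m%d-%H%M%S)
vmstat 1 60 > /var/tmp/baseline-vmstat-$ts.log &
iostat -x 1 60 > /var/tmp/baseline-iostat-$ts.log &
wait
echo "baseline saved with suffix $ts"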
When your server shows high load averages (20-30) while maintaining 98% CPU idle time, this typically indicates I/O wait issues. The vmstat output clearly shows significant time spent in the 'wa' state (up to 19%) coinciding with increased I/O operations.
The blocked-process count (the 'b' column) rises in step with those 'wa' spikes.
While your VPS sits on Fibre Channel SAN storage, NFS can still introduce latency (see the round-trip check after this list) due to:
- Network round-trip time
- NFS server processing time
- NFS protocol overhead (especially with sync writes)
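A quick sanity check on the network leg is to compare raw round-trip time to the NFS server against the per-operation latency NFS reports. A minimal sketch; nfs-server is a placeholder for your actual server:
# Raw network round-trip time to the NFS server
ping -c 20 nfs-server
# Confirm the mount really is using TCP to port 2049
ss -tn | grep ':2049'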
To isolate NFS issues, try these commands:
# Check per-operation NFS latency (round-trip and execution times per mount)
nfsiostat 2 5
# Identify processes waiting on I/O
iotop -o
# Check disk latency
iostat -x 1 5
If NFS is the bottleneck:
# Consider these mount options (note: nolock disables NLM locking, which can
# break applications that rely on file locks; intr is ignored on modern kernels):
mount -o rsize=65536,wsize=65536,hard,intr,noatime,nodiratime,tcp,nolock [server]:/[export] /mnt
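After remounting, confirm which options the client actually negotiated, since the server can cap rsize/wsize below what you asked for:
# Show the options in effect for each NFS mount (rsize/wsize, proto, vers, ...)
nfsstat -m
# Or read the raw mount entries directly
grep ' nfs' /proc/mounts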
For SAN-related issues:
# Check queue depths and device mapper settings
cat /sys/block/sdX/queue/nr_requests
dmsetup status
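To compare queue settings across all SAN-backed devices at once, a small loop over sysfs is handy. A minimal sketch, assuming your FC LUNs appear as sd* devices:
# Print request queue depth and I/O scheduler for every sd device
for q in /sys/block/sd*/queue; do
    dev=$(basename $(dirname $q))
    echo "$dev: nr_requests=$(cat $q/nr_requests) scheduler=$(cat $q/scheduler)"
done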
Don't overlook other possibilities:
- Check for memory pressure (even with free memory shown)
- Verify if the hypervisor is throttling I/O
- Test with local storage to establish baseline performance (see the dd comparison below)
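For the local-versus-NFS comparison, a crude but telling test is a direct-I/O sequential write to each location. A minimal sketch, assuming /var/tmp sits on the local (SAN-backed) filesystem and /nfs/mount is the NFS mount from the earlier example; if oflag=direct is rejected, drop it and add conv=fsync instead:
# Sequential write to local storage, bypassing the page cache
dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=512 oflag=direct
rm -f /var/tmp/ddtest
# The same write against the NFS mount for comparison
dd if=/dev/zero of=/nfs/mount/ddtest bs=1M count=512 oflag=direct
rm -f /nfs/mount/ddtest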
Remember that vmstat's 'wa' includes all I/O wait, not just disk. Network filesystems can trigger this through different mechanisms than local storage.