When dealing with NFS mounts across high-speed 10Gbps networks, we often expect symmetric read/write performance. However, as demonstrated in this scenario, writes can become unexpectedly sluggish while reads maintain expected throughput. Let's dive deep into this peculiar behavior.
Our setup shows excellent raw network performance:
# iperf results showing ~9.8Gbps throughput
$ iperf -c 192.168.1.101 -t 60
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[ 3]  0.0-60.0 sec   68.7 GBytes  9.83 Gbits/sec
Local disk performance also meets expectations:
# Disk write test
$ dd if=/dev/zero of=/mnt/test/rnd2 bs=1M count=1000
1048576000 bytes (1.0 GB) copied, 6.87342 s, 153 MB/s
While reads achieve ~140MB/s (matching local disk speed), writes plateau at just 18-20MB/s. A roughly 7x gap with a healthy network and healthy disks points to protocol behavior and mount/export configuration rather than a hardware constraint.
NFSv4's write performance is heavily influenced by:
- Commit latency (sync operations)
- Write gathering efficiency
- Server reply timing
Here's a diagnostic command to monitor NFS operations:
# Monitor NFS operations in real-time
$ nfsstat -o all -c
Client rpc stats:
calls      retrans    authrefrsh
32768      12         0
Client nfs v4:
null          read          write         commit
0        0%   8192    25%   4096    12%   20480   63%
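An unusually high proportion of COMMIT calls relative to WRITEs, as seen here, suggests the client is flushing data synchronously in small batches rather than streaming large unstable writes. For per-operation round-trip times (how long each WRITE and COMMIT actually takes), the mountstats utility shipped with nfs-utils gives a finer breakdown; a quick look against the mount point used in this setup:
# Per-operation counts, bytes and round-trip times for the mount
$ mountstats /mnt/test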
After extensive testing, these server-side adjustments proved most effective:
# /etc/nfs.conf adjustments
[nfsd]
threads=16
tcp=yes
vers4=y
vers4.0=y
vers4.1=y
vers4.2=y
# Note: fsid=0 and crossmnt are per-export options, not nfs.conf keys; set them in /etc/exports, e.g.
# /mnt/test 192.168.1.0/24(rw,fsid=0,crossmnt,no_root_squash,sync,no_subtree_check)
[mountd]
manage-gids=y
[statd]
port=32765
outgoing-port=32766
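Changes to /etc/nfs.conf only take effect after the NFS server is restarted. A quick restart-and-verify, assuming the systemd unit is named nfs-server as on most current distributions:
# Restart the server and confirm the configured thread count is active
$ systemctl restart nfs-server
$ cat /proc/fs/nfsd/threads
16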
The mount options that yielded the best results. The critical change is dropping the client-side sync option used in the original mount (and avoiding noac): sync forces every write to be committed to the server's stable storage before the call returns, which by itself explains the 18-20MB/s plateau, while noac disables attribute caching and adds an extra attribute revalidation round trip to almost every operation:
# Optimized mount options
mount -t nfs4 -o \
rw,noatime,nodiratime,hard,proto=tcp,\
vers=4.2,rsize=65536,wsize=65536,timeo=14,\
retrans=2,sec=sys 192.168.1.101:/mnt/test /mnt/test
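If the mount should persist across reboots, the equivalent /etc/fstab entry is a single line; the _netdev flag is added here as a common convention so the mount waits for networking:
# /etc/fstab (one line)
192.168.1.101:/mnt/test  /mnt/test  nfs4  rw,noatime,nodiratime,hard,proto=tcp,vers=4.2,rsize=65536,wsize=65536,timeo=14,retrans=2,sec=sys,_netdev  0  0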
These sysctl settings significantly improved write performance:
# /etc/sysctl.d/nfs.conf
# NFS client settings
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
# TCP stack optimization
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_max_syn_backlog=8192
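The settings can be applied without a reboot. Keep in mind that the sunrpc.* keys only exist once the sunrpc module is loaded, so apply the file after the NFS client has been started at least once:
# Apply the file above and confirm one of the values took effect
$ sysctl -p /etc/sysctl.d/nfs.conf
$ sysctl sunrpc.tcp_slot_table_entries
sunrpc.tcp_slot_table_entries = 128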
After implementing these changes, verify with:
# Write performance test
$ dd if=/dev/zero of=/mnt/test/largefile bs=1M count=8192
8589934592 bytes (8.6 GB) copied, 61.2342 s, 140 MB/s
# Read verification
$ dd if=/mnt/test/largefile of=/dev/null bs=1M
8589934592 bytes (8.6 GB) copied, 58.7654 s, 146 MB/s
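Plain dd against an NFS mount partly measures the client's page cache. Adding conv=fdatasync forces the final flush and COMMIT to be included in the timing, and dropping the client cache (as root) before the read keeps that figure honest as well:
# Write test that includes the final commit to the server
$ dd if=/dev/zero of=/mnt/test/largefile bs=1M count=8192 conv=fdatasync
# Drop the client page cache before re-reading
$ echo 3 > /proc/sys/vm/drop_caches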
For persistent issues, these diagnostic commands help:
# Check NFS server threads
$ cat /proc/net/rpc/nfsd|grep th
th 16 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000
# Monitor NFS latency
$ nfsiostat /mnt/test 5
To recap the baseline before going further: iperf consistently shows ~9.8Gbps of raw throughput and local disk writes reach ~150MB/s, yet NFS writes crawl at 18-20MB/s with unexplained latency spikes. The configuration the problem was first observed with looked like this:
# Server exports configuration
/mnt/test 192.168.1.0/24(rw,no_root_squash,insecure,sync,no_subtree_check)
# Client mount options
mount -t nfs4 -o sync,vers=4.0,rsize=1048576,wsize=1048576,hard 192.168.1.101:/mnt/test /mnt/test
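Before tuning anything, it is worth confirming the options actually in effect rather than the ones requested, since the server negotiates some of them (wsize in particular) downward without warning:
# Server: show the effective export options
$ exportfs -v
# Client: show the negotiated mount options (vers, proto, wsize, ...)
$ nfsstat -m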
The behavior persists even when:
- Testing with 100Mbps equipment (ruling out NIC issues)
- Switching server/client roles
- Adjusting MTU sizes and link aggregation
- Applying various sysctl network optimizations
Three critical areas impact NFS write performance:
1. Protocol Version Matters
NFSv4 compound operations cut per-request round trips, and NFSv4.1 and later add sessions (with session trunking), both of which can significantly improve performance over v3 or a plain v4.0 mount:
# Recommended mount options for NFSv4:
mount -t nfs4 -o vers=4.2,noatime,hard,rsize=65536,wsize=65536,timeo=14,retrans=2 \
192.168.1.101:/mnt/test /mnt/test
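It also pays to confirm which protocol versions the server actually advertises, because the client silently falls back to an older minor version if 4.2 is disabled there:
# On the server: list enabled NFS protocol versions (+ enabled, - disabled)
$ cat /proc/fs/nfsd/versions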
2. Server-Side Tuning
Essential /etc/nfs.conf adjustments (the right thread count depends on core count and client load; the earlier example used 16, while 64 suits a busier server):
[nfsd]
threads=64
tcp=y
vers4=y
vers4.0=y
vers4.1=y
vers4.2=y
[exportfs]
# debug=all   # enable only while troubleshooting; debug logging costs throughput
[mountd]
# debug=all   # likewise, keep disabled in production
manage-gids=y
3. Network Stack Optimization
Critical sysctl parameters for 10Gbps networks:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
sunrpc.tcp_slot_table_entries = 128
sunrpc.udp_slot_table_entries = 128
Use these commands to verify actual performance:
# Measure sequential write performance
time dd if=/dev/zero of=/mnt/test/testfile bs=1M count=4096 oflag=direct
# Monitor NFS operations in real-time
nfsiostat -d 5
# Check server-side performance
cat /proc/net/rpc/nfsd
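Retransmissions are another silent throughput killer; the client-side RPC counters make them easy to spot, and a steadily climbing retrans column usually means packet loss or an overloaded server:
# Client RPC statistics, including the retrans counter
$ nfsstat -rc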
For environments requiring high-performance writes:
1. Parallel NFS (pNFS)
Modern Linux kernels support pNFS for distributed I/O, but only when the server can actually grant layouts (for example a flexfiles- or block-layout capable server); against a plain knfsd export the client simply falls back to ordinary NFSv4.1 I/O through the metadata server:
# Client mount with NFSv4.1 (pNFS is negotiated automatically where the server supports it)
mount -t nfs4 -o minorversion=1 192.168.1.101:/mnt/test /mnt/test
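Whether the client actually obtained pNFS layouts, rather than quietly falling back to plain NFSv4.1, can be checked from the per-mount statistics; a non-zero LAYOUTGET count indicates layouts are being granted:
# Look for LAYOUTGET activity on NFSv4.1+ mounts
$ grep -i layoutget /proc/self/mountstats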
2. RDMA over Converged Ethernet (RoCE)
For maximum performance with compatible hardware:
# Install required packages
yum install rdma-core libibverbs-utils
# Mount with RDMA support
mount -t nfs4 -o proto=rdma,port=20049 192.168.1.101:/mnt/test /mnt/test
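The mount will only succeed if the server side is also listening on an RDMA transport; on a Linux knfsd server this has traditionally been enabled through the portlist interface (newer nfs-utils releases expose equivalent rdma settings in /etc/nfs.conf):
# On the server: load the RDMA transport and listen on the standard NFS/RDMA port
$ modprobe svcrdma
$ echo "rdma 20049" > /proc/fs/nfsd/portlist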
Before declaring victory:
- Verify NFS server threads are sufficient (grep th /proc/net/rpc/nfsd, as shown earlier)
- Check for network packet drops (ethtool -S eth0)
- Monitor server CPU during transfers (mpstat -P ALL 1)
- Ensure no disk I/O bottlenecks (iostat -dx 1)