Troubleshooting Slow NFS Write Performance: Optimization Strategies for 10Gbps Networks


When dealing with NFS mounts across high-speed 10Gbps networks, we often expect symmetric read/write performance. However, as demonstrated in this scenario, writes can become unexpectedly sluggish while reads maintain expected throughput. Let's dive deep into this peculiar behavior.

Our setup shows excellent raw network performance:

# iperf results showing ~9.8Gbps throughput
$ iperf -c 192.168.1.101 -t 60
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  68.7 GBytes  9.83 Gbits/sec

Local disk performance also meets expectations:

# Disk write test
$ dd if=/dev/zero of=/mnt/test/rnd2 bs=1M count=1000
1048576000 bytes (1.0 GB) copied, 6.87342 s, 153 MB/s

While reads achieve ~140MB/s (matching local disk speed), writes plateau at just 18-20MB/s. A roughly 7x gap like this points to how NFS handles and commits writes rather than to any hardware constraint.

NFSv4's write performance is heavily influenced by:

  • Commit latency (sync operations)
  • Write gathering efficiency
  • Server reply timing

Here's a diagnostic command to monitor NFS operations:

# Monitor NFS operations in real-time
$ nfsstat -o all -c
Client rpc stats:
calls    retrans    authrefrsh
32768    12         0

Client nfs v4:
null     read      write     commit
0 0%     8192 25%  4096 12%  20480 63%
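
A commit share that high (commits dwarfing writes in the sample above) is the signature of synchronous write behavior. For a per-operation breakdown that includes round-trip times, the mountstats tool from nfs-utils is worth a look; a quick check, assuming the share is mounted at /mnt/test:

# Per-operation counts and timings for the mount (nfs-utils)
$ mountstats /mnt/test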

After extensive testing, these server-side adjustments proved most effective:

# /etc/nfs.conf adjustments
[nfsd]
threads=16
tcp=y
vers4=y
vers4.0=y
vers4.1=y
vers4.2=y

# fsid=0 and crossmnt are per-export options; they belong in
# /etc/exports rather than in nfs.conf

[mountd]
manage-gids=y

[statd]
port=32765
outgoing-port=32766
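
After editing /etc/nfs.conf, the NFS server needs a restart for the new thread count to take effect. A quick way to confirm it stuck, assuming a systemd-based distribution:

# Restart nfsd and confirm the running thread count (should match threads=16)
$ sudo systemctl restart nfs-server
$ cat /proc/fs/nfsd/threads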

The mount options that yielded optimal results:

# Optimized mount options. Note the absence of "sync" and "noac":
# forcing synchronous, uncached writes on the client is exactly what
# keeps throughput pinned to commit latency.
mount -t nfs4 -o \
rw,noatime,nodiratime,hard,proto=tcp,\
vers=4.2,rsize=65536,wsize=65536,timeo=14,\
retrans=2,sec=sys 192.168.1.101:/mnt/test /mnt/test
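
Servers can silently clamp rsize and wsize below what the client requests, so it is worth checking what was actually negotiated:

# Show the options in effect for each mounted NFS share
$ nfsstat -m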

These sysctl settings significantly improved write performance:

# /etc/sysctl.d/nfs.conf
# NFS client settings
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128

# TCP stack optimization
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_max_syn_backlog=8192
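
Files under /etc/sysctl.d/ are only read at boot, so load them by hand the first time. Note that the sunrpc.* entries only exist once the sunrpc module is loaded (i.e. once NFS is in use):

# Apply the new settings immediately and spot-check one of them
$ sudo sysctl -p /etc/sysctl.d/nfs.conf
$ sysctl net.core.wmem_max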

After implementing these changes, verify with:

# Write performance test
$ dd if=/dev/zero of=/mnt/test/largefile bs=1M count=8192
8589934592 bytes (8.6 GB) copied, 61.2342 s, 140 MB/s

# Read verification
$ dd if=/mnt/test/largefile of=/dev/null bs=1M
8589934592 bytes (8.6 GB) copied, 58.7654 s, 146 MB/s
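
One caveat with plain dd from /dev/zero: data still sitting in the client page cache when dd exits inflates the result. Adding conv=fdatasync makes the timing include the flush to the server, which is the number that actually matters here:

# Write test that includes the time to flush data to the server
$ dd if=/dev/zero of=/mnt/test/largefile bs=1M count=8192 conv=fdatasync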

For persistent issues, these diagnostic commands help:

# Check NFS server threads
$ cat /proc/net/rpc/nfsd|grep th
th 16 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000

# Monitor NFS latency
$ nfsiostat /mnt/test 5

To recap the original symptoms: even with an otherwise healthy 10Gbps path, NFS writes were the clear outlier. iperf consistently showed ~9.8Gbps of raw throughput and the local disk sustained ~150MB/s writes, yet NFS writes crawled along at 18-20MB/s with unexplained latency spikes.

The server and client configuration at the start of troubleshooting looked like this:

# Server exports configuration
/mnt/test 192.168.1.0/24(rw,no_root_squash,insecure,sync,no_subtree_check)

# Client mount options
mount -t nfs4 -o sync,vers=4.0,rsize=1048576,wsize=1048576,hard 192.168.1.101:/mnt/test /mnt/test
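
It is also worth comparing /etc/exports against what the server actually applies, since exportfs fills in defaults (wdelay, root_squash, and so on) that never appear in the file:

# List active exports with the full set of effective options
exportfs -v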

The behavior persists even when:

  • Testing with 100Mbps equipment (ruling out NIC issues)
  • Switching server/client roles
  • Adjusting MTU sizes and link aggregation (see the MTU check after this list)
  • Applying various sysctl network optimizations
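
Since MTU changes were part of that testing, one quick sanity check is whether jumbo frames really survive the whole path: a 9000-byte MTU leaves room for an 8972-byte ICMP payload once the 28 bytes of IP and ICMP headers are subtracted.

# Verify 9000-byte frames pass unfragmented end to end (assumes MTU 9000)
ping -M do -s 8972 -c 4 192.168.1.101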

Three critical areas impact NFS write performance:

1. Protocol Version Matters

NFSv4 introduces compound operations that cut round trips, and NFSv4.1 and later add sessions with trunking support; both can significantly improve performance:

# Recommended mount options for NFSv4:
mount -t nfs4 -o vers=4.2,noatime,hard,rsize=65536,wsize=65536,timeo=14,retrans=2 \
    192.168.1.101:/mnt/test /mnt/test
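
Before pinning vers=4.2 on the client, check which versions the server side actually has enabled; the running nfsd exposes this through procfs:

# NFS versions the running server will accept ("+" means enabled)
cat /proc/fs/nfsd/versions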

2. Server-Side Tuning

Server-side /etc/nfs.conf adjustments (a higher thread count here for a busier server; the debug=all lines are useful while troubleshooting and should be removed afterwards):

[nfsd]
threads=64
tcp=y
vers4=y
vers4.0=y
vers4.1=y
vers4.2=y

[exportfs]
debug=all

[mountd]
debug=all
manage-gids=y
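
With debug=all set, mountd and exportfs get chatty; on a systemd-based server the output lands in the journal (unit names vary slightly between distributions):

# Follow NFS debug output while reproducing the slow writes
journalctl -u nfs-server -u nfs-mountd -f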

3. Network Stack Optimization

Critical sysctl parameters for 10Gbps networks:

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
sunrpc.tcp_slot_table_entries = 128
sunrpc.udp_slot_table_entries = 128

Use these commands to verify actual performance:

# Measure sequential write performance
time dd if=/dev/zero of=/mnt/test/testfile bs=1M count=4096 oflag=direct

# Monitor NFS operations in real-time
nfsiostat -d 5

# Check server-side performance
cat /proc/net/rpc/nfsd

For environments requiring high-performance writes:

1. Parallel NFS (pNFS)

Modern Linux kernels support pNFS for distributed I/O:

# Server: enable pNFS on the export in /etc/exports (requires NFSv4.1+
# and a layout-capable filesystem; see the "pnfs" option in exports(5))
/mnt/test 192.168.1.0/24(rw,sync,no_subtree_check,pnfs)

# Client mount with pNFS
mount -t nfs4 -o minorversion=1 192.168.1.101:/mnt/test /mnt/test
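
To confirm the client really obtained pNFS layouts rather than silently falling back to routing all I/O through the metadata server, look for LAYOUTGET operations in the per-mount statistics:

# Non-zero LAYOUTGET counts indicate pNFS layouts are being used
grep -i layoutget /proc/self/mountstats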

2. RDMA over Converged Ethernet (RoCE)

For maximum performance with compatible hardware:

# Install required packages
yum install rdma-core libibverbs-utils

# Mount with RDMA support
mount -t nfs4 -o proto=rdma,port=20049 192.168.1.101:/mnt/test /mnt/test
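
The server side needs an RDMA listener on the same port. One common way to add it, assuming the kernel's NFS/RDMA modules are available (module naming varies a little between kernel versions):

# Load the server-side RDMA transport and listen on port 20049
modprobe svcrdma
echo "rdma 20049" > /proc/fs/nfsd/portlist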

Before declaring victory:

  • Verify NFS server threads are sufficient (grep th /proc/net/rpc/nfsd)
  • Check for network packet drops (ethtool -S eth0)
  • Monitor server CPU during transfers (mpstat -P ALL 1)
  • Ensure no disk I/O bottlenecks (iostat -dx 1)