Optimizing NFS Client Timeout Configuration to Handle Server Disconnections


When working with NFS mounts, the default behavior of hanging indefinitely during server disconnections can severely impact application performance. The current configuration shows:

<remote-host-ip>:/path/to/origin /shared/point nfs defaults 0 0

The defaults keyword expands to these effective mount options:

hard,proto=tcp,timeo=600,retrans=2

These NFS client parameters control timeout behavior:

  • timeo: Initial timeout in deciseconds (600 = 60 seconds)
  • retrans: Number of retransmissions before a major timeout (default 2)
  • hard/soft: Whether to keep retrying indefinitely (hard) or return an error once retries are exhausted (soft)

For more responsive failure detection, modify your fstab entry:

<remote-host-ip>:/path/to/origin /shared/point nfs rw,nosuid,nodev,noexec,soft,timeo=30,retrans=3 0 0

Key changes:

  • soft - Fail operations after retries instead of hanging
  • timeo=30 - 3-second initial timeout (30 deciseconds)
  • retrans=3 - 3 retransmissions after the initial request (4 attempts total)

The worst-case detection time for a soft mount can be estimated from the backoff behavior described in nfs(5). Over TCP the client uses linear backoff, so each retransmission waits timeo deciseconds longer than the one before:

timeo * (1 + 2 + ... + (retrans + 1)) / 10 seconds

For our example (timeo=30, retrans=3):
30 * (1 + 2 + 3 + 4) / 10 = 300 / 10 = 30 seconds maximum

(Over UDP the timeout instead doubles after each retransmission.)
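The backoff arithmetic can be sanity-checked with a small helper. This is an illustrative sketch only: it mirrors the nfs(5) rules (linear backoff over TCP, doubling over UDP) and ignores the kernel's caps on the maximum per-request timeout, so treat the results as estimates.

```python
def nfs_worst_case_seconds(timeo: int, retrans: int, proto: str = "tcp") -> float:
    """Estimate the worst-case soft-mount failure time, in seconds.

    timeo is in deciseconds.  Over TCP each retransmission waits timeo
    longer than the previous one (linear backoff); over UDP the wait
    doubles each time.  Kernel caps on the timeout are ignored here.
    """
    if proto == "tcp":
        # timeo * (1 + 2 + ... + (retrans + 1))
        total_ds = timeo * sum(range(1, retrans + 2))
    else:
        # timeo + 2*timeo + 4*timeo + ...
        total_ds = sum(timeo * 2 ** i for i in range(retrans + 1))
    return total_ds / 10

print(nfs_worst_case_seconds(30, 3))   # the tuned example over TCP
print(nfs_worst_case_seconds(600, 2))  # the defaults: several minutes
```

With the defaults (timeo=600, retrans=2) the estimate comes out to minutes, which is why a dead server feels like a frozen system.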

When using the soft option, implement proper error handling in your code:

import errno

try:
    with open("/shared/point/file.txt") as f:
        data = f.read()
except OSError as e:  # IOError is an alias of OSError in Python 3
    if e.errno == errno.EIO:
        print("NFS server unavailable")
    else:
        raise
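For operations that should survive a brief outage, that error handling can be wrapped in a simple retry loop. The sketch below is illustrative: the function name, attempt count, and delay are arbitrary choices, not part of any NFS API; EIO is what a soft mount returns when the server stays unreachable.

```python
import errno
import time

def read_with_retry(path, attempts=3, delay=2.0):
    """Read a file on a soft NFS mount, retrying transient EIO failures.

    Any error other than EIO is propagated immediately; if all attempts
    fail with EIO, the last error is re-raised.
    """
    last_err = None
    for _ in range(attempts):
        try:
            with open(path) as f:
                return f.read()
        except OSError as e:
            if e.errno != errno.EIO:
                raise  # not a soft-mount timeout; don't mask real errors
            last_err = e
            time.sleep(delay)  # give the server a chance to come back
    raise last_err
```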

If you must use hard mounts, you will often see the intr option recommended:

<remote-host-ip>:/path/to/origin /shared/point nfs rw,hard,intr,timeo=30,retrans=3 0 0

Historically this allowed processes to be interrupted (killed) while waiting for NFS operations. Note, however, that intr has been a no-op since Linux 2.6.25; on modern kernels only a fatal signal such as SIGKILL can interrupt a process blocked on a hard mount.

Check NFS client statistics to tune your timeout values:

nfsstat -c
cat /proc/fs/nfsfs/servers

A retrans counter that climbs steadily relative to the total number of calls suggests your timeo value is too low for the network.

For advanced tuning, TCP keepalive parameters affect how quickly dead NFS-over-TCP connections are detected. Be aware these are system-wide settings that apply to every TCP socket, not just NFS:

# Shorten TCP keepalive timing (affects all TCP connections)
echo 10 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_intvl
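These echo commands take effect immediately but do not survive a reboot. To persist them, they can go in a sysctl drop-in file (the filename below is an arbitrary choice):

```
# /etc/sysctl.d/90-nfs-keepalive.conf
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 20
```

Apply without rebooting via sysctl --system.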

Every sysadmin working with NFS mounts has faced this scenario: a remote server goes down, and suddenly your local system becomes unresponsive when trying to access the mounted directory. The ls command hangs, file operations freeze, and your productivity grinds to a halt.

From your /etc/fstab entry:

<remote-host-ip>:/path/to/origin /shared/point nfs defaults 0 0

And the actual mount parameters shown by mount:

<remote-host-ip>:/path/to/origin on /shared/point type nfs4 (rw,relatime,vers=4.1,
rsize=1048576,wsize=1048576,namelen=255,hard,proto=tcp,port=0,timeo=600,
retrans=2,sec=sys,clientaddr=<my-ip>,local_lock=none,addr=<remote-ip>)

The culprit here is the combination of hard mount option with default timeout values. Let's break down the relevant parameters:

hard          # The mount will retry indefinitely
timeo=600     # Timeout in deciseconds (60 seconds)
retrans=2     # Number of retries before major timeout

For applications that need to fail fast, use these options in your /etc/fstab:

<remote-host-ip>:/path/to/origin /shared/point nfs rw,soft,timeo=50,retrans=3,intr 0 0

Key changes:

  • soft instead of hard: Allows operations to fail after retries
  • timeo=50: 5-second timeout (50 deciseconds)
  • retrans=3: Limited retry attempts
  • intr: Historically allowed process interruption (a no-op since Linux 2.6.25)

For more granular control, you can specify timeouts when mounting:

mount -t nfs -o soft,timeo=30,retrans=1 <remote-host>:/path /mnt/point

After remounting, verify with:

mount | grep nfs

To test the timeout behavior:

# Simulate network failure
iptables -A OUTPUT -d <remote-ip> -j DROP

# Try accessing the mount
time ls /shared/point  # Should fail within your specified timeout

# Restore connection
iptables -D OUTPUT -d <remote-ip> -j DROP

For critical systems, consider using autofs, which mounts shares on demand and unmounts them when idle. Note that --timeout below is the idle-unmount time in seconds, not an I/O timeout:

# /etc/auto.master
/-    /etc/auto.nfs --timeout=30

# /etc/auto.nfs
/shared/point -fstype=nfs,soft,timeo=30 <remote-host>:/path/to/origin
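On systemd-based distributions, a similar on-demand behavior is available without autofs by putting automount options directly in /etc/fstab. This is a sketch; see systemd.mount(5) for the x-systemd.* option names:

```
<remote-host-ip>:/path/to/origin /shared/point nfs rw,soft,timeo=30,retrans=3,noauto,x-systemd.automount,x-systemd.idle-timeout=30 0 0
```

The share is mounted on first access and unmounted after 30 seconds idle, so a dead server only affects processes that actually touch the path.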