Optimizing NFS Client Timeout Configuration to Handle Server Disconnections



When working with NFS mounts, the default behavior of hanging indefinitely during server disconnections can severely impact application performance. The current configuration shows:

<remote-host-ip>:/path/to/origin /shared/point nfs defaults 0 0

With "defaults", the kernel actually applies these mount options:

hard,proto=tcp,timeo=600,retrans=2

These NFS client parameters control timeout behavior:

  • timeo: Initial timeout in deciseconds (600 = 60 seconds)
  • retrans: Number of retries before failure (default 2)
  • hard/soft: Recovery behavior after retries exhausted
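The options a mounted filesystem actually uses can be read from /proc/mounts. A minimal parsing sketch (the helper name and sample line are illustrative, not part of any NFS tooling):

```python
def nfs_mount_options(mounts_text):
    """Map mount point -> option dict for NFS entries in /proc/mounts-style text."""
    opts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fstype, comma-separated options, ...
        if len(fields) >= 4 and fields[2].startswith("nfs"):
            opts[fields[1]] = dict(
                kv.split("=", 1) if "=" in kv else (kv, True)
                for kv in fields[3].split(",")
            )
    return opts

# A line shaped like the entry above (the IP is a placeholder):
sample = "10.0.0.5:/path/to/origin /shared/point nfs4 rw,hard,proto=tcp,timeo=600,retrans=2 0 0"
print(nfs_mount_options(sample)["/shared/point"]["timeo"])  # prints 600
```

On a live system you would pass it open("/proc/mounts").read().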

For more responsive failure detection, modify your fstab entry:

<remote-host-ip>:/path/to/origin /shared/point nfs rw,nosuid,nodev,noexec,soft,timeo=30,retrans=3 0 0

Key changes:

  • soft - Fail operations with an error after retries are exhausted, instead of hanging
  • timeo=30 - 3-second initial timeout (30 deciseconds)
  • retrans=3 - Up to 3 retransmissions after the original request (4 attempts total)

The worst-case detection time depends on the transport. Over TCP the client uses linear backoff: each successive retransmission waits another timeo longer (per nfs(5)), so the total is roughly:

timeo * (1 + 2 + ... + (retrans + 1)) / 10 seconds

For our example (timeo=30, retrans=3):
30 * (1 + 2 + 3 + 4) / 10 = 300 / 10 = 30 seconds maximum

(Over UDP the timeout doubles after each retransmission instead, giving roughly timeo * (2^(retrans+1) - 1) / 10 seconds.)
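This arithmetic can be sketched as a small helper, assuming the backoff behaviour described in nfs(5) (timeout grows linearly per retransmission over TCP, doubles over UDP) and ignoring the per-retry timeout cap:

```python
def worst_case_seconds(timeo, retrans, proto="tcp"):
    """Rough worst-case time before a soft NFS mount returns an error.

    timeo is in deciseconds; retrans counts retransmissions after the
    original request. Backoff model per nfs(5): linear for TCP,
    doubling for UDP. The cap on per-retry timeouts is ignored.
    """
    attempts = retrans + 1                               # original + retries
    if proto == "tcp":
        total = timeo * attempts * (attempts + 1) // 2   # timeo * (1+2+...+attempts)
    else:
        total = timeo * (2 ** attempts - 1)              # timeo * (1+2+4+...)
    return total / 10

print(worst_case_seconds(30, 3))    # the example above: 30.0 seconds
print(worst_case_seconds(600, 2))   # the defaults: several minutes
```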

When using the soft option, implement proper error handling in your code:

import errno

try:
    with open("/shared/point/file.txt") as f:
        data = f.read()
except OSError as e:  # IOError is an alias of OSError in Python 3
    if e.errno == errno.EIO:
        print("NFS server unavailable")
    else:
        raise
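Since a soft mount surfaces a transient outage as EIO, a small retry wrapper can ride out short network blips. This helper is a hypothetical sketch (the injectable reader exists only so it can be exercised without a real NFS mount):

```python
import errno
import time

def read_with_retry(path, attempts=3, delay=1.0, reader=None):
    """Read a file, retrying on EIO -- the errno a soft NFS mount
    returns when the server stops responding."""
    if reader is None:
        def reader(p):
            with open(p) as f:
                return f.read()
    last_err = None
    for _ in range(attempts):
        try:
            return reader(path)
        except OSError as e:
            if e.errno != errno.EIO:
                raise               # only retry NFS-style I/O errors
            last_err = e
            time.sleep(delay)
    raise last_err
```

Pair the delay with your mount's worst-case detection time so the retries do not stack into a long stall.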

If you must use hard mounts, you will often see the intr option suggested:

<remote-host-ip>:/path/to/origin /shared/point nfs rw,hard,intr,timeo=30,retrans=3 0 0

This was intended to let processes be interrupted (killed) while waiting for NFS operations. Note, however, that intr has been a no-op since kernel 2.6.25: it is accepted for compatibility, and a process blocked on a hard mount can only be interrupted by fatal signals such as SIGKILL.

Check NFS client statistics to tune your timeout values:

nfsstat -c
cat /proc/fs/nfsfs/servers

Watch the retrans counter relative to calls: if it climbs quickly, requests are timing out and being resent, which suggests your timeo is too aggressive for the network (or the server is overloaded).
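That ratio can be computed from the nfsstat output. This sketch assumes the usual Linux layout, where a "calls retrans authrefrsh" header line is followed by a line of counters:

```python
def rpc_retrans_ratio(nfsstat_output):
    """Return retransmissions per RPC call from `nfsstat -c` text."""
    lines = nfsstat_output.splitlines()
    for i, line in enumerate(lines):
        if line.split()[:2] == ["calls", "retrans"]:
            calls, retrans = map(int, lines[i + 1].split()[:2])
            return retrans / calls if calls else 0.0
    raise ValueError("no RPC stats found in nfsstat output")

sample = """Client rpc stats:
calls      retrans    authrefrsh
22547      5          22549
"""
print(rpc_retrans_ratio(sample))   # a rising ratio means requests are timing out
```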

For advanced tuning, these kernel parameters control how quickly dead TCP peers are detected. Note that they are system-wide and affect every TCP connection on the host, not just NFS:

# Detect dead TCP peers faster (system-wide settings)
echo 10 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_intvl
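With these values, an idle connection to a dead peer is declared down after roughly tcp_keepalive_time plus tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl apart:

```python
def keepalive_detection_seconds(keepalive_time, probes, intvl):
    """Approximate seconds before TCP keepalive declares an idle peer dead:
    first probe after keepalive_time, then `probes` probes intvl apart."""
    return keepalive_time + probes * intvl

print(keepalive_detection_seconds(10, 3, 20))  # → 70 seconds with the values above
```

Keepalives only fire on idle connections; an in-flight request is still governed by the NFS timeo/retrans settings.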

Every sysadmin working with NFS mounts has faced this scenario: a remote server goes down, and suddenly your local system becomes unresponsive when trying to access the mounted directory. The ls command hangs, file operations freeze, and your productivity grinds to a halt.

From your /etc/fstab entry:

<remote-host-ip>:/path/to/origin /shared/point nfs defaults 0 0

And the actual mount parameters shown by mount:

<remote-host-ip>:/path/to/origin on /shared/point type nfs4 (rw,relatime,vers=4.1,
rsize=1048576,wsize=1048576,namelen=255,hard,proto=tcp,port=0,timeo=600,
retrans=2,sec=sys,clientaddr=<my-ip>,local_lock=none,addr=<remote-ip>)

The culprit here is the combination of hard mount option with default timeout values. Let's break down the relevant parameters:

hard          # The mount will retry indefinitely
timeo=600     # Timeout in deciseconds (60 seconds)
retrans=2     # Number of retries before major timeout

For applications that need to fail fast, use these options in your /etc/fstab:

<remote-host-ip>:/path/to/origin /shared/point nfs rw,soft,timeo=50,retrans=3,intr 0 0

Key changes:

  • soft instead of hard: Allows operations to fail after retries
  • timeo=50: 5-second timeout (in deciseconds)
  • retrans=3: Limited retry attempts
  • intr: Historically allowed process interruption; a no-op since kernel 2.6.25

For more granular control, you can specify timeouts when mounting:

mount -t nfs -o soft,timeo=30,retrans=1 <remote-host>:/path /mnt/point

After remounting, verify with:

mount | grep nfs

To test the timeout behavior:

# Simulate network failure
iptables -A OUTPUT -d <remote-ip> -j DROP

# Try accessing the mount
time ls /shared/point  # Should fail within your specified timeout

# Restore connection
iptables -D OUTPUT -d <remote-ip> -j DROP
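The same check can be scripted. This sketch (names are illustrative) times an access attempt and treats either success or a prompt error as "responsive", since a soft mount is supposed to fail fast rather than hang:

```python
import os
import time

def responsive_within(path, limit_seconds, prober=None):
    """Return (responsive, elapsed_seconds) for listing `path`.

    A prompt OSError (e.g. EIO from a soft mount) still counts as
    responsive; only exceeding limit_seconds counts as a hang.
    `prober` is injectable for testing; defaults to os.listdir.
    """
    if prober is None:
        prober = os.listdir
    start = time.monotonic()
    try:
        prober(path)
    except OSError:
        pass                      # failing fast is the desired behaviour
    elapsed = time.monotonic() - start
    return elapsed <= limit_seconds, elapsed
```

With the firewall rule in place, responsive_within("/shared/point", 30) should come back True once the soft timeout options are active.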

For critical systems, consider using autofs with timeout settings:

# /etc/auto.master
/-    /etc/auto.nfs --timeout=30

# /etc/auto.nfs
/shared/point -fstype=nfs,soft,timeo=30 <remote-host>:/path/to/origin