When working with NFS mounts, the default behavior of hanging indefinitely during server disconnections can severely impact application performance. The current configuration shows:
<remote-host-ip>:/path/to/origin /shared/point nfs defaults 0 0
With defaults, the effective mount options include:
hard,proto=tcp,timeo=600,retrans=2
These NFS client parameters control timeout behavior:
- timeo: Initial timeout in deciseconds (600 = 60 seconds)
- retrans: Number of retries before failure (default 2)
- hard/soft: Recovery behavior after retries exhausted
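To make the units concrete, here is a minimal Python sketch that splits an option string like the one above and converts timeo to seconds. The function name parse_nfs_options is hypothetical and chosen only for illustration.

```python
def parse_nfs_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "hard") map to True.
    """
    options = {}
    for item in option_string.split(","):
        key, sep, value = item.strip().partition("=")
        if key:
            options[key] = value if sep else True
    return options


opts = parse_nfs_options("hard,proto=tcp,timeo=600,retrans=2")
timeo_seconds = int(opts["timeo"]) / 10  # timeo is given in tenths of a second
print(timeo_seconds, int(opts["retrans"]), "hard" in opts)
```

The values stay strings until converted, so the same helper also works for non-numeric options such as proto.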
For more responsive failure detection, modify your fstab entry:
<remote-host-ip>:/path/to/origin /shared/point nfs rw,nosuid,nodev,noexec,soft,timeo=30,retrans=3 0 0
Key changes:
soft - Fail operations after retries instead of hanging
timeo=30 - 3 second initial timeout (30 deciseconds)
retrans=3 - Total of 3 attempts (original + 2 retries)
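A rough worst-case detection time can be estimated as:
timeo * (2^(retrans-1)) / 10 seconds
For our example (timeo=30, retrans=3):
30 * (2^(2)) / 10 = 30 * 4 / 10 = 12 seconds maximum
Treat this as an approximation: according to nfs(5), the client backs off exponentially over UDP but linearly over TCP (each retransmission adds timeo to the timeout), so measure the real delay on your own system.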
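The estimate can be expressed as a small Python helper for comparing candidate settings. This is only a sketch of the formula as written in this answer, not a model of what the kernel actually does; the function name is hypothetical.

```python
def estimated_detection_seconds(timeo, retrans):
    """Estimate from the text: timeo * 2^(retrans-1) / 10, in seconds.

    Mirrors the formula in this answer only; it does not model the
    kernel's real backoff behavior.
    """
    if retrans < 1:
        raise ValueError("retrans must be at least 1")
    return timeo * 2 ** (retrans - 1) / 10


print(estimated_detection_seconds(30, 3))   # the tuned example: 12.0
print(estimated_detection_seconds(600, 2))  # the default options: 120.0
```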
When using the soft option, implement proper error handling in your code:
import errno

try:
    with open("/shared/point/file.txt") as f:
        data = f.read()
except OSError as e:
    # A soft mount reports an I/O error once its retries are exhausted
    if e.errno == errno.EIO:
        print("NFS server unavailable")
    else:
        raise
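If several call sites need the same handling, it can be wrapped in a helper. The sketch below is one possible shape, not a definitive implementation: the name read_or_default and the set of errno values treated as "server unreachable" are assumptions to adjust for your kernel and mount options.

```python
import errno

# errno values treated as "server unreachable" in this sketch (an assumption).
NFS_UNAVAILABLE = {errno.EIO, errno.ETIMEDOUT}


def read_or_default(path, default=None):
    """Return the file's text, or `default` if the read fails with one of
    the errno values above. Any other error propagates to the caller."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        if e.errno in NFS_UNAVAILABLE:
            return default
        raise
```

A caller could then write read_or_default("/shared/point/file.txt", default="") and fall back to cached or empty data while the server is down.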
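If you must use hard mounts, older guides recommend adding the intr option:
<remote-host-ip>:/path/to/origin /shared/point nfs rw,hard,intr,timeo=30,retrans=3 0 0
On kernels up to 2.6.25 this allows processes to be interrupted while waiting for NFS operations. According to nfs(5), the option is ignored on later kernels and kept only for backward compatibility; there, a process blocked on NFS can still be terminated with SIGKILL.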
Check NFS client statistics to tune your timeout values:
nfsstat -c
cat /proc/fs/nfsfs/servers
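Look at the retrans counter in the client RPC statistics: a count that is large relative to the number of calls, or that keeps growing, suggests requests are timing out and being resent.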
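As a rough way to track that counter over time, the sketch below computes the ratio of retransmissions to calls. It assumes the Linux client exposes a line of the form "rpc <calls> <retrans> <authrefresh>" in /proc/net/rpc/nfs; check this against your own kernel before relying on it. The function name and sample text are illustrative only.

```python
def retrans_ratio(rpc_stats_text):
    """Return retransmissions / calls from client RPC statistics text.

    Assumes a line of the form "rpc <calls> <retrans> <authrefresh>".
    Returns None if no such line is found or no calls were made.
    """
    for line in rpc_stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "rpc":
            calls, retrans = int(fields[1]), int(fields[2])
            return retrans / calls if calls else None
    return None


# Sample text in the assumed format; on a real client you might read
# /proc/net/rpc/nfs instead.
sample = "net 0 0 0 0\nrpc 2000 50 2001\n"
print(retrans_ratio(sample))
```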
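For advanced tuning, these system-wide TCP keepalive parameters may help dead connections be noticed sooner. They apply to every TCP socket that enables keepalive, not only NFS, and values written this way are lost at reboot: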
# Reduce TCP timeout for NFS connections
echo 10 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_intvl
Every sysadmin working with NFS mounts has faced this scenario: a remote server goes down, and suddenly your local system becomes unresponsive when trying to access the mounted directory. The ls command hangs, file operations freeze, and your productivity grinds to a halt.
From your /etc/fstab entry:
<remote-host-ip>:/path/to/origin /shared/point nfs defaults 0 0
And the actual mount parameters shown by mount:
<remote-host-ip>:/path/to/origin on /shared/point type nfs4 (rw,relatime,vers=4.1,
rsize=1048576,wsize=1048576,namelen=255,hard,proto=tcp,port=0,timeo=600,
retrans=2,sec=sys,clientaddr=<my-ip>,local_lock=none,addr=<remote-ip>)
The culprit here is the combination of the hard mount option with the default timeout values. Let's break down the relevant parameters:
hard # The mount will retry indefinitely
timeo=600 # Timeout in deciseconds (60 seconds)
retrans=2 # Number of retries before major timeout
For applications that need to fail fast, use these options in your /etc/fstab:
<remote-host-ip>:/path/to/origin /shared/point nfs rw,soft,timeo=50,retrans=3,intr 0 0
Key changes:
- soft instead of hard: Allows operations to fail after retries
- timeo=50: 5-second timeout (in deciseconds)
- retrans=3: Limited retry attempts
- intr: Allows process interruption (per nfs(5), ignored on kernels after 2.6.25)
To try different values without editing fstab, pass the options directly to mount:
mount -t nfs -o soft,timeo=30,retrans=1 <remote-host>:/path /mnt/point
After remounting, verify with:
mount | grep nfs
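The same check can be scripted. The sketch below scans text in the /proc/mounts layout (device, mount point, type, options, then two numeric fields) and reports whether each NFS mount is soft or hard. The host name "server" and the sample lines are made up for illustration; on a live system you would read /proc/mounts itself.

```python
def find_nfs_mounts(mounts_text):
    """Yield (mount_point, options) for NFS entries in /proc/mounts-style text.

    Each line is expected to hold: device, mount point, type, options, ...
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            yield fields[1], fields[3].split(",")


# Hypothetical sample input in the /proc/mounts layout.
sample = (
    "server:/path/to/origin /shared/point nfs4 "
    "rw,relatime,vers=4.1,soft,proto=tcp,timeo=30,retrans=1 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
for mount_point, options in find_nfs_mounts(sample):
    mode = "soft" if "soft" in options else "hard"
    timing = [o for o in options if o.startswith(("timeo=", "retrans="))]
    print(mount_point, mode, timing)
```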
To test the timeout behavior:
# Simulate network failure
iptables -A OUTPUT -d <remote-ip> -j DROP
# Try accessing the mount
time ls /shared/point # Should fail within your specified timeout
# Restore connection
iptables -D OUTPUT -d <remote-ip> -j DROP
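To record how long the failure actually takes rather than reading it off the time output, a small probe like the following can be run while the rule is in place. It is a sketch under the assumption that the mount is soft; on a hard mount the listdir call would block just like ls. The function name probe_directory is hypothetical.

```python
import os
import time


def probe_directory(path):
    """List a directory and report (succeeded, elapsed_seconds, errno_or_None)."""
    start = time.monotonic()
    try:
        os.listdir(path)
        return True, time.monotonic() - start, None
    except OSError as exc:
        return False, time.monotonic() - start, exc.errno


ok, elapsed, err = probe_directory("/shared/point")
print(f"ok={ok} elapsed={elapsed:.1f}s errno={err}")
```

Comparing the elapsed value against your timeo and retrans settings shows whether the mount fails as quickly as intended.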
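For critical systems, consider mounting on demand with autofs. Its --timeout value is the idle time in seconds before an unused mount is unmounted, which is separate from the NFS timeo option: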
# /etc/auto.master
/- /etc/auto.nfs --timeout=30
# /etc/auto.nfs
/shared/point -fstype=nfs,soft,timeo=30 <remote-host>:/path/to/origin