Troubleshooting Persistent CIFS/Samba Mount Hangs on Linux During Idle Connections


When mounting Windows shares via CIFS/Samba on Linux systems (specifically Gentoo in this case), users may encounter unresponsive mounts that:

  • Cause processes to enter D state (uninterruptible sleep)
  • Make affected processes immune to SIGTERM/SIGKILL
  • Block filesystem operations (ls, save dialogs, etc.)
  • Persist for 5+ minutes before self-resolving

This typically occurs due to:

  1. Session timeout: Windows SMB servers may drop idle connections after 15 minutes by default
  2. TCP keepalive misconfiguration: With the Linux default of 7200 seconds before the first keepalive probe, a broken connection can go undetected for a long time
  3. Network instability: Packet loss during session maintenance

1. Mount Parameter Optimization:

mount -t cifs -o username=WindowsUser,password=pass,uid=user,vers=3.0,actimeo=0,soft,serverino //192.168.0.103/Users /mnt/windowsbox

Key parameters:

  • vers=3.0: Force SMB3 protocol
  • actimeo=0: Disable attribute caching
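  • soft: Return an error to the application instead of blocking when the server stops responding
  • serverino: Use inode numbers supplied by the server instead of generating them on the client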
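Passing password= on the command line leaves it in the shell history. A minimal sketch, assuming a credentials file at the hypothetical path /etc/cifs-credentials (holding username= and password= lines, readable only by root), prints an /etc/fstab entry with the same options so the share can be mounted later with just "mount /mnt/windowsbox". The helper name cifs_fstab_line and the added _netdev option are illustrative choices, not part of the original setup.

```shell
#!/bin/bash
# Print an /etc/fstab entry for a CIFS share (illustrative helper, not part of any tool).
# Arguments: share, mount point, credentials file, local uid
cifs_fstab_line() {
    local share="$1" mountpoint="$2" creds="$3" uid="$4"
    # Options from the mount command above, with credentials= replacing
    # username=/password= and _netdev marking the entry as network-dependent
    local opts="credentials=${creds},uid=${uid},vers=3.0,actimeo=0,soft,serverino,_netdev"
    printf '%s %s cifs %s 0 0\n' "$share" "$mountpoint" "$opts"
}

# Review the printed line, then append it to /etc/fstab as root
cifs_fstab_line //192.168.0.103/Users /mnt/windowsbox /etc/cifs-credentials user
```

An fstab entry also matters for any script that remounts the share by mount point alone.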

2. System-level TCP Tuning:

# Add to /etc/sysctl.conf
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 10
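These three values combine into a worst-case detection time for a silent peer: the idle time before the first probe, plus the number of probes multiplied by the interval between them. A small sketch of that arithmetic follows; the function name keepalive_detect_seconds is made up for illustration, and the sysctls only affect sockets that have SO_KEEPALIVE enabled.

```shell
#!/bin/bash
# Worst-case seconds before the kernel declares an idle, silent peer dead:
# idle time before the first probe + (number of probes * interval between probes)
keepalive_detect_seconds() {
    local idle="$1" probes="$2" intvl="$3"
    echo $(( idle + probes * intvl ))
}

keepalive_detect_seconds 60 5 10    # values from the sysctl.conf lines above
```

With the values above this prints 110, so a dead connection should be noticed after roughly 110 seconds of silence. After editing /etc/sysctl.conf, running sysctl -p loads the new values without a reboot.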

When hangs occur, try these instead of standard kill:

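# Find processes stuck in D state (lsof +D walks the mount and can hang as well)
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# Lazy unmount: detaches the mount point now, cleans up once it is no longer busy
umount -l /mnt/windowsbox

# Last resort: reboots immediately WITHOUT syncing or unmounting filesystems
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger  # Emergency reboot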

Create a watchdog script (/usr/local/bin/cifs_watchdog):

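#!/bin/bash
TIMEOUT=30
TARGET="/mnt/windowsbox"   # needs an /etc/fstab entry so that "mount $TARGET" works

# Both probes run under timeout so a hung mount is less likely to block the watchdog itself
if [[ $(timeout "$TIMEOUT" stat -c "%s" "$TARGET/.testfile" 2>&1) =~ "Transport endpoint is not connected" ]]; then
    logger "CIFS watchdog: Mount disconnected, remounting"
    umount -l "$TARGET"
    mount "$TARGET"
elif ! timeout "$TIMEOUT" touch "$TARGET/.testfile"; then
    logger "CIFS watchdog: Mount hung, forcing unmount"
    umount -l "$TARGET"
    mount "$TARGET"
fi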

Add to crontab:

*/5 * * * * /usr/local/bin/cifs_watchdog
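Because the watchdog runs unattended from cron, every command that touches the share should be time-limited. One way to sketch this is a small wrapper around timeout; the name run_probe and the 2-second kill delay are assumptions chosen for illustration.

```shell
#!/bin/bash
# Run a command with a time limit; return 0 only if it finished successfully in time.
# -k 2 sends SIGKILL two seconds after SIGTERM if the command is still running.
run_probe() {
    local limit="$1"; shift
    timeout -k 2 "$limit" "$@" >/dev/null 2>&1
}

# Hypothetical usage inside a watchdog:
#   run_probe 30 ls -A /mnt/windowsbox || logger -t cifs_watchdog "probe failed"
```

A process that is truly stuck in uninterruptible sleep reacts to neither signal until the kernel wait ends, so even this wrapper may block in the worst case. It mainly covers the more common situation where the probe is slow but still killable.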

When mounting Windows shares via CIFS/Samba on Linux (Gentoo in this case), stubborn hangs can leave processes in uninterruptible D state. The typical scenario starts with a plain mount command:


# Classic mount command that eventually hangs
mount -t cifs -o username=WindowsUsername,password=thepassword,uid=pistos //192.168.0.103/Users /mnt/windowsbox

Key behavioral patterns observed:

  • Processes become unkillable (even with SIGKILL)
  • Network disconnection of Windows machine doesn't resolve the issue
  • Hangs typically occur after periods of idle connection
  • Manual umount either hangs or reports "device busy"

The root cause typically stems from CIFS/Samba's default behavior regarding session management. Two primary factors contribute:


# Common culprits found in kernel log (dmesg); exact wording varies by kernel version:
[ 1234.567890] CIFS VFS: Server not responding
[ 1235.678901] CIFS VFS: cifs_reconnect tcp session mangled

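1. Session Timeout Mismatch: Windows SMB servers typically drop idle sessions after 15 minutes. The Linux CIFS client checks an idle connection with SMB echo requests at the interval set by the echo_interval mount option (60 seconds by default, per mount.cifs(8)), so the two sides can disagree about whether a session is still alive.

2. TCP/IP Stack Differences: Windows and Linux handle TCP keepalives differently, so one side can drop a connection that the other still considers open.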
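To see which timing-related options are actually in effect for a mount, the option string in /proc/mounts can be inspected. The sketch below assumes the running kernel reports options such as echo_interval there, which may not hold on older kernels; mount_option_value is a hypothetical helper name.

```shell
#!/bin/bash
# Print the value of one mount option for a mount point, reading a file in
# /proc/mounts format (fields: device mountpoint fstype options dump pass).
mount_option_value() {
    local mounts_file="$1" mountpoint="$2" option="$3"
    awk -v mp="$mountpoint" -v opt="$option" '
        $2 == mp {
            n = split($4, parts, ",")
            for (i = 1; i <= n; i++) {
                # Options look like name=value; compare the part before "="
                eq = index(parts[i], "=")
                if (eq > 0 && substr(parts[i], 1, eq - 1) == opt)
                    print substr(parts[i], eq + 1)
            }
        }' "$mounts_file"
}

# Example: mount_option_value /proc/mounts /mnt/windowsbox echo_interval
```

An empty result means the option is not listed for that mount point, either because the share is not mounted or because the kernel does not report that option.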

Immediate Workarounds

For existing hung processes:


# Forcefully unmount (dangerous but sometimes necessary)
umount -f -l /mnt/windowsbox

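# Note: this only toggles the CIFS Unix extensions for later mounts (1 is already
# the default); it does not release processes that are already stuck
echo 1 > /proc/fs/cifs/LinuxExtensionsEnabled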

Preventive Measures

Modify your mount options to include these critical parameters:


mount -t cifs -o username=WinUser,password=pass,uid=localuser,\
vers=3.0,actimeo=120,hard,\
cache=strict,serverino //192.168.0.103/Users /mnt/windowsbox

Key options explained:

  • vers=3.0: Forces SMB3 protocol which handles disconnections better
  • actimeo=120: Sets attribute cache timeout to 2 minutes
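  • hard: Makes file operations block and retry until the server responds; this protects data but is also what leaves processes waiting in D state, so use soft if returning an error is preferable
  • cache=strict: Caches file data only while the client holds an oplock from the server (the default cache mode on current kernels)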

For enterprise environments, consider these kernel-level tweaks:


# Add to /etc/sysctl.conf
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

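# CIFS module parameter: refuse the older SMB1 and SMB2.0 dialects
# (security hardening; takes effect the next time the cifs module is loaded)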
echo "options cifs disable_legacy_dialects=Y" > /etc/modprobe.d/cifs.conf

Create a watchdog script to monitor and recover stuck mounts:


#!/bin/bash
MOUNT_POINT="/mnt/windowsbox"
TIMEOUT=60

# A directory listing that fails or does not return within TIMEOUT seconds
# marks the mount as hung
if ! timeout "$TIMEOUT" ls -A "$MOUNT_POINT" >/dev/null 2>&1; then
    echo "$(date) - Detected hung mount at $MOUNT_POINT"
    umount -l "$MOUNT_POINT"
    sleep 2
    mount -a
    logger -t cifs_watchdog "Recovered stuck CIFS mount $MOUNT_POINT"
fi

Set this as a cron job running every 5 minutes for persistent monitoring.

When problems occur, gather these diagnostics:


# Check current CIFS connections
cat /proc/fs/cifs/DebugData

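# View sessions on a Samba server (run on the server; not applicable to a Windows host)
smbstatus -vv

# Capture SMB traffic (replace eth0 with the actual interface name)
tcpdump -i eth0 -w cifs_debug.pcap port 445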
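Since hangs tend to resolve themselves after a few minutes, it helps to capture the client-side state while the problem is still visible. A minimal sketch of a collector is shown below; the function name collect_cifs_diag, the output file names, and the /tmp default are assumptions, and the packet capture is left out because it has to keep running.

```shell
#!/bin/bash
# Save a snapshot of CIFS diagnostics into a timestamped directory and print its path.
# Sources that are missing or unreadable are skipped instead of aborting the run.
collect_cifs_diag() {
    local base="${1:-/tmp}"
    local dir
    dir="$base/cifs-diag-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dir" || return 1

    # Client-side session state (present only while the cifs module is loaded)
    if [ -r /proc/fs/cifs/DebugData ]; then
        cat /proc/fs/cifs/DebugData > "$dir/DebugData.txt" 2>/dev/null
    fi

    # Kernel messages mentioning CIFS; dmesg may need root on some systems
    dmesg 2>/dev/null | grep -i cifs > "$dir/dmesg-cifs.txt" || true

    # Current CIFS mounts and their effective options
    grep ' cifs ' /proc/mounts > "$dir/mounts.txt" 2>/dev/null || true

    echo "$dir"
}

# Example: collect_cifs_diag /var/tmp
```

Running it as root from the watchdog just before the lazy unmount would preserve the state of the hung session for later analysis.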