How to Ensure remote-fs.target Starts Only After GlusterFS NFS Service is Fully Operational



When dealing with network-based filesystems like GlusterFS exports mounted via NFS, we often face a race condition between service startup and mount attempts. The systemd dependency chain (Requires= plus After=) only guarantees that the required unit's start job has finished, not that the service behind it is ready to answer requests.

Your existing unit definitions show proper structural dependencies:

# glusterfsd.service
[Unit]
After=network.target glusterd.service

# remote-fs.target 
[Unit]
Requires=glusterfsd.service
After=glusterfsd.service remote-fs-pre.target

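For a Type=simple service, systemd marks the unit "active" as soon as its main process has been forked. Other service types use other signals, but none of them tells systemd that Gluster's NFS exports are actually being served, and those exports need additional initialization time. We therefore need an explicit readiness check.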
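What "started" means is defined per service Type= in systemd.service(5). The sketch below paraphrases that table as a lookup. The function name `active_means` is hypothetical and only the common types are covered; on a live system you could feed it the output of `systemctl show -p Type --value glusterfsd.service`. None of these events says anything about whether NFS exports are being served.

```shell
#!/bin/bash
# Hypothetical lookup: the event after which systemd treats a service of the
# given Type= as started (paraphrased from systemd.service(5)).
active_means() {
    case "$1" in
        simple)  echo "main process forked" ;;
        exec)    echo "main binary executed" ;;
        forking) echo "parent process exited" ;;
        oneshot) echo "main process exited" ;;
        notify)  echo "READY=1 sent via sd_notify" ;;
        dbus)    echo "BusName= acquired" ;;
        *)       echo "unknown or unlisted Type=$1" >&2; return 1 ;;
    esac
}

# Example on a live system:
#   active_means "$(systemctl show -p Type --value glusterfsd.service)"
```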

Create a helper service that actively verifies NFS availability:

# /etc/systemd/system/gluster-nfs-ready.service
[Unit]
Description=GlusterFS NFS Readiness Check
After=glusterfsd.service

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c 'until showmount -e localhost &>/dev/null; do sleep 1; done'
# Keep the unit "active" after the check succeeds so it is not re-run
RemainAfterExit=yes
TimeoutSec=300

[Install]
WantedBy=remote-fs.target
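The inline `until` loop never gives up on its own; it relies on TimeoutSec=300 to kill it, and the journal then only records a generic timeout. As a sketch, assuming you move the check into a script of your own, a loop with its own deadline can report what it was waiting for. The function name `wait_until` and its argument order are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: rerun COMMAND until it succeeds or TIMEOUT seconds pass.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 once the deadline has passed.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$((SECONDS + timeout))
    until "$@"; do
        if ((SECONDS >= deadline)); then
            echo "timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example, `wait_until 300 1 showmount -e localhost` mirrors the unit above. The timeout message goes to stderr, which systemd forwards to the journal for services.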

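Keep in mind that network mount units are ordered after remote-fs-pre.target and before remote-fs.target by default, so a drop-in on remote-fs.target delays the target itself but not the individual mounts; to hold a mount back too, order it after the readiness service (for example via After= in the mount unit). Then add a drop-in for remote-fs.target: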

# /etc/systemd/system/remote-fs.target.d/10-gluster-wait.conf
[Unit]
Requires=gluster-nfs-ready.service
After=gluster-nfs-ready.service
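If you deploy this with a script, the drop-in can be written from a heredoc. A minimal sketch; the function name `install_gluster_dropin` and its optional root-directory argument are assumptions, the latter so the function can be tried out in a scratch directory:

```shell
#!/bin/bash
# Hypothetical installer for the remote-fs.target drop-in shown above.
# Optional argument: root directory (default /), handy for a dry run.
install_gluster_dropin() {
    local root=${1:-/}
    local dir="${root%/}/etc/systemd/system/remote-fs.target.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/10-gluster-wait.conf" <<'EOF'
[Unit]
Requires=gluster-nfs-ready.service
After=gluster-nfs-ready.service
EOF
}
```

After writing the file on a real system, run `systemctl daemon-reload` and `systemctl enable gluster-nfs-ready.service`; `systemctl cat remote-fs.target` should then list the drop-in.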

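For the mount unit itself, the NFS client can retry on its own. The retry= and timeo= options are NFS mount options documented in nfs(5), not a systemd feature; retry= is measured in minutes and timeo= in tenths of a second: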

# /etc/systemd/system/stor.mount
[Unit]
After=glusterfsd.service
ConditionPathExists=/stor

[Mount]
What=node04:/stor
Where=/stor
Type=nfs
Options=retry=10,timeo=30

[Install]
WantedBy=remote-fs.target
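Since retry= counts minutes while timeo= counts tenths of a second, it is easy to misjudge how long mount.nfs will keep trying. The following hypothetical helper, `nfs_timing`, converts the values from an Options= string into seconds using the units from nfs(5):

```shell
#!/bin/bash
# Hypothetical helper: translate retry= (minutes) and timeo= (tenths of a
# second) from an NFS Options= string into seconds, per nfs(5).
nfs_timing() {
    local opts=$1 opt retry="" timeo=""
    local IFS=,
    for opt in $opts; do
        case "$opt" in
            retry=*) retry=${opt#retry=} ;;
            timeo=*) timeo=${opt#timeo=} ;;
        esac
    done
    [[ -n $retry ]] && echo "retry window: $((retry * 60)) s"
    [[ -n $timeo ]] && echo "per-request timeout: $((timeo / 10)).$((timeo % 10)) s"
    return 0
}
```

For `retry=10,timeo=30` it reports a 600 s retry window and a 3.0 s per-request timeout. Because systemd applies DefaultTimeoutStartSec= (90 s unless changed) to mount units, a retry window that long is cut short unless you also raise TimeoutSec= in the [Mount] section.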

For enterprise deployments, consider combining both approaches:

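# /etc/systemd/system/gluster-nfs-probe.service
[Unit]
Description=GlusterFS NFS export probe
After=glusterfsd.service
# Remote mounts are ordered after remote-fs-pre.target, so ordering the probe
# before it (and pulling the target in) makes the mounts wait for the probe.
Wants=remote-fs-pre.target
Before=remote-fs-pre.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/gluster-nfs-probe.sh
RemainAfterExit=yes
TimeoutSec=0

[Install]
WantedBy=multi-user.target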

Sample probe script (/usr/local/bin/gluster-nfs-probe.sh):

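#!/bin/bash
# Wait until the local NFS server exports /stor, for at most
# MAX_RETRIES * INTERVAL seconds (60 s with these values).
EXPORT=/stor
MAX_RETRIES=30
INTERVAL=2

for i in $(seq 1 "$MAX_RETRIES"); do
    # Match the export path exactly; a bare grep for /stor would also match /storage.
    if showmount -e localhost 2>/dev/null | grep -q "^${EXPORT}[[:space:]]"; then
        exit 0
    fi
    sleep "$INTERVAL"
done

echo "export ${EXPORT} not available after $((MAX_RETRIES * INTERVAL))s" >&2
exit 1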
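The probe's correctness hinges on how it matches the export list: an unanchored `grep -q '/stor'` also succeeds for /storage or /data/stor. To check the matching logic without a running NFS server, it can be moved into a function that reads `showmount -e` output on stdin. A sketch; the function name `has_export` is hypothetical, and it assumes the usual layout of a header line followed by one export path per line:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if EXPORT is listed verbatim in
# `showmount -e` output read from stdin. Line 1 is the
# "Export list for HOST:" header; each later line starts with a path.
has_export() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

`showmount -e localhost | has_export /stor` can then stand in for the grep inside the probe loop.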

Use these commands to verify the solution:

systemd-analyze critical-chain remote-fs.target
journalctl -u gluster-nfs-ready.service -u remote-fs.target --since "5 minutes ago"
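The journal shows what happened during boot; to confirm afterwards that /stor is really mounted, `findmnt /stor` or `mountpoint -q /stor` from util-linux are the usual tools. For a scripted check that can be tested against sample data, a hypothetical helper `is_mounted` can read /proc/mounts-style lines on stdin:

```shell
#!/bin/bash
# Hypothetical helper: succeed if MOUNTPOINT appears in the second field of
# /proc/mounts-style input on stdin and, when FSTYPE is given, the third
# field matches it too.
# Usage: is_mounted MOUNTPOINT [FSTYPE] < /proc/self/mounts
is_mounted() {
    awk -v mp="$1" -v fs="${2:-}" '
        $2 == mp && (fs == "" || $3 == fs) { found = 1 }
        END { exit !found }'
}
```

`is_mounted /stor nfs < /proc/self/mounts` succeeds once the NFS mount is in place; an NFSv4 mount would report its type as nfs4 instead.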

When working with distributed filesystems like GlusterFS, we often encounter timing issues where NFS exports aren't immediately available after the service starts. The standard After= and Requires= directives in systemd only ensure that the required unit has been started, not that it is fully operational.

Your current unit files show good practice with proper dependency declarations:

[Unit]
Description=GlusterFS brick processes (stopping only)
After=network.target glusterd.service

And in remote-fs.target:

[Unit]
Requires=glusterfsd.service
After=glusterfsd.service remote-fs-pre.target

The logs clearly show the race condition:

Apr 14 16:16:22 node04 systemd[1]: Started GlusterFS
Apr 14 16:16:22 node04 systemd[1]: Mounting /stor...
Apr 14 16:16:23 node04 mount[2960]: mount.nfs: mounting node04:/stor failed

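Systemd considers the service "started" as soon as its start job completes, but Gluster's NFS exports take additional time to become available. The unit's own description, "stopping only", suggests that starting glusterfsd.service does no real work, so "Started GlusterFS" in the log says nothing about whether /stor is exported yet.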

Solution 1: Health Check Script

Create a helper service that verifies NFS availability:

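[Unit]
Description=GlusterFS NFS readiness check
After=glusterfsd.service
# Order before remote-fs-pre.target (and pull it in) so remote mounts wait for the check
Wants=remote-fs-pre.target
Before=remote-fs-pre.target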

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-gluster-nfs.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Sample check script:

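#!/bin/bash
# Poll for up to 30 s until the local NFS server exports /stor.
for i in {1..30}; do
    # Anchor the match so that an export such as /storage does not count.
    if showmount -e localhost 2>/dev/null | grep -q '^/stor[[:space:]]'; then
        exit 0
    fi
    sleep 1
done
echo "/stor not exported by localhost after 30 attempts" >&2
exit 1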

Solution 2: Mount Unit Retry Logic

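Modify your mount unit so that mount.nfs retries. Note that with soft, the client returns an I/O error to applications once retrans retransmissions have timed out, which with retrans=1 and a 1-second timeo=10 can happen quickly under load: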

[Unit]
Description=GlusterFS NFS Mount
After=glusterfsd.service
Requires=glusterfsd.service

[Mount]
What=node04:/stor
Where=/stor
Type=nfs
Options=soft,retry=5,timeo=10,retrans=1

Solution 3: Systemd Path Unit

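Trigger the mount when a readiness file appears. By default a path unit activates the service with the same name, so name the mount unit for /stor explicitly with Unit=:

[Unit]
Description=Watch for Gluster NFS readiness

[Path]
PathExists=/var/run/gluster-nfs.ready
Unit=stor.mount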

[Install]
WantedBy=multi-user.target

For production systems, I recommend combining approaches:

[Unit]
Description=GlusterFS NFS Mount
After=glusterfs-nfs-ready.service
Requires=glusterfs-nfs-ready.service

[Mount]
What=node04:/stor
Where=/stor
Type=nfs
Options=soft,retry=5,timeo=10

With the readiness service, saved as glusterfs-nfs-ready.service so that its name matches the mount unit's dependencies:

[Unit]
Description=GlusterFS NFS readiness
After=glusterfsd.service
Before=remote-fs.target

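[Service]
Type=oneshot
# Oneshot services have no start timeout by default; cap the wait so a missing export cannot block boot forever
ExecStart=/usr/bin/bash -c 'until showmount -e localhost 2>/dev/null | grep -q "^/stor[[:space:]]"; do sleep 1; done'
ExecStart=/usr/bin/touch /var/run/gluster-nfs.ready
RemainAfterExit=yes
TimeoutStartSec=300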

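Because the mount unit both Requires= and is ordered After= the readiness service, the mount is attempted only once /stor appears in the export list, and a check that fails is recorded in the systemd journal under the readiness service.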