Troubleshooting GlusterFS Mount Failures on Boot: Client-Server Configuration on Ubuntu



When running GlusterFS 3.5 on Ubuntu 12.04 in a client-server configuration, the client mounts a volume served from the same host via 127.0.0.1. In this setup the fstab mount can fail at boot with errors like these:

[2014-06-17 08:20:53.000373] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2014-06-17 08:20:53.000427] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 127.0.0.1

The root cause is service ordering under Ubuntu's Upstart init. The mount is attempted before the GlusterFS server (glusterd) is accepting connections on 127.0.0.1:24007. Key observations from the boot logs:

  • Mount operations start too early (during "Mount network filesystems")
  • GlusterFS server starts later (after "System V runlevel compatibility")
  • The upstart job mounting-glusterfs.conf fails to properly delay mounting
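A glusterd process being present and glusterd actually accepting connections on port 24007 are two different states. The "Connection refused" in the log above means the second was not yet true. Below is a minimal readiness probe, assuming bash with /dev/tcp support and coreutils `timeout`. The function name glusterd_listening is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if something accepts TCP connections on
# HOST:PORT (defaults 127.0.0.1:24007, glusterd's management port).
# Uses bash's /dev/tcp redirection; `timeout` caps each probe at 2 seconds.
glusterd_listening() {
    local host="${1:-127.0.0.1}" port="${2:-24007}"
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `glusterd_listening && echo "glusterd is up"`. A refused connection, as in the boot log, makes the probe fail immediately rather than waiting for the timeout.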

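Option 1: Systemd-based Solution (Ubuntu 15.04+)

This oneshot service runs the mount only after glusterd.service has started. It assumes /var/www/shared/public/uploads already has an fstab entry. Adding noauto to that entry stops systemd from also attempting its own early mount.
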
# /etc/systemd/system/glusterfs-mount.service
[Unit]
Description=Mount GlusterFS Volumes
After=glusterd.service
Requires=glusterd.service

[Service]
Type=oneshot
ExecStart=/bin/mount /var/www/shared/public/uploads
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Option 2: Upstart Workaround (Ubuntu 12.04-14.10)

# /etc/init/glusterfs-mount.conf
description "Delayed GlusterFS mount"
start on (started glusterfs-server and net-device-up IFACE=lo)
task
exec /bin/mount -a -t glusterfs
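The job above relies on `mount -a -t glusterfs`, so it only picks up fstab entries of type glusterfs that do not carry the noauto option. The sketch below lists those entries. It assumes a plain /etc/fstab without escaped spaces, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the mount points that `mount -a -t glusterfs`
# would consider, i.e. active fstab lines of type glusterfs without noauto.
# Simplified: does not decode escaped spaces (\040) in fstab fields.
gluster_auto_mountpoints() {
    local fstab="${1:-/etc/fstab}"
    awk '
        /^[ \t]*#/ || NF == 0 { next }      # skip comments and blank lines
        $3 != "glusterfs" { next }          # only GlusterFS entries
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") next   # mount -a skips noauto entries
            print $2
        }
    ' "$fstab"
}
```

Running `gluster_auto_mountpoints /etc/fstab` before rebooting shows whether the volume you care about will be picked up by the job.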

Alternatively, an if-up.d hook can run the mount once the loopback interface is up and glusterd is running:

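#!/bin/sh
# /etc/network/if-up.d/gluster-mount (must be executable)
[ "$IFACE" = "lo" ] || exit 0
# Wait in the background (up to ~120 s) so ifup itself is not blocked
( n=0
  until pgrep -x glusterd >/dev/null; do
    n=$((n + 1)); [ "$n" -ge 120 ] && exit 1; sleep 1
  done
  mount /var/www/shared/public/uploads ) >/dev/null 2>&1 &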
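The polling in the hook above can be factored into a reusable helper with an explicit timeout, so that a boot script never waits indefinitely. A minimal sketch follows; wait_for is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: run CMD once per second until it succeeds or
# SECONDS have elapsed. Returns 0 on success, 1 on timeout.
# Usage: wait_for SECONDS CMD [ARGS...]
wait_for() {
    local timeout="$1" elapsed=0
    shift
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1            # gave up: CMD never succeeded in time
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for 120 pgrep -x glusterd && mount /var/www/shared/public/uploads`. Any command that exits 0 when ready can serve as the condition, including a port probe.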

When troubleshooting persistent issues:

# Check service status
sudo service glusterfs-server status

# Verify brick availability
sudo gluster volume status public_uploads detail

# Test manual mount
sudo umount /var/www/shared/public/uploads
sudo mount -v -t glusterfs 127.0.0.1:/public_uploads /var/www/shared/public/uploads
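After a manual mount, confirm that the mount point is really backed by GlusterFS rather than being an empty local directory. GlusterFS client mounts appear in /proc/mounts with the filesystem type fuse.glusterfs. The check below is a sketch: is_gluster_mounted is a hypothetical name, and the optional second argument exists only so the check can be run against a sample file.

```shell
#!/bin/bash
# Hypothetical helper: succeed if PATH is mounted with type fuse.glusterfs.
# Reads /proc/mounts by default; a trailing slash on PATH is ignored.
is_gluster_mounted() {
    local target="${1%/}" mounts="${2:-/proc/mounts}"
    awk -v t="$target" '
        $2 == t && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }
    ' "$mounts"
}
```

For example, `is_gluster_mounted /var/www/shared/public/uploads || echo "not mounted"`. The same check works as a monitoring probe.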

For mission-critical systems, implement these safeguards:

  • Add retry logic in mount scripts with exponential backoff
  • Configure monitoring for mount point availability
  • Implement fallback to local storage if GlusterFS is unavailable
  • Consider using autofs for on-demand mounting
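The first safeguard, retry with exponential backoff, might look like the following sketch. The function names and the doubling schedule are illustrative assumptions, not part of GlusterFS.

```shell
#!/bin/bash
# Delay before retry n (1-based) is BASE * 2^(n-1) seconds.
backoff_delay() {
    local base="$1" attempt="$2"
    echo $(( base * (1 << (attempt - 1)) ))
}

# Hypothetical helper: retry_backoff MAX_ATTEMPTS BASE_DELAY CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS runs have failed, sleeping
# with exponential backoff in between; returns CMD's last exit status.
retry_backoff() {
    local max="$1" base="$2" attempt=1 status
    shift 2
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        sleep "$(backoff_delay "$base" "$attempt")"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_backoff 6 1 mount /var/www/shared/public/uploads` tries up to six times. It sleeps 1, 2, 4, 8 and 16 seconds between attempts, for 31 seconds of waiting in total.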

The core problem is that Ubuntu's init system attempts to mount GlusterFS volumes before both network connectivity and the GlusterFS server service are fully operational. This is why the boot-time errors show the connection to port 24007 being refused.

The existing fstab entry:

127.0.0.1:/private_uploads /var/www/shared/private/uploads glusterfs defaults,_netdev 0 0

The _netdev flag delays the mount until the network is up, but it does not account for GlusterFS service readiness. The Upstart job (mounting-glusterfs.conf) likewise waits only for the network, not for the GlusterFS server.

Option 1: Systemd Mount Unit (Ubuntu 15.04+)

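Create a systemd mount unit that explicitly depends on both network-online.target and the GlusterFS server unit. The example uses glusterfs-server.service. Depending on the package version, the server unit may instead be named glusterd.service (check with systemctl list-unit-files | grep gluster):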

# /etc/systemd/system/var-www-shared-public-uploads.mount
[Unit]
Description=Mount GlusterFS Volume
Requires=network-online.target glusterfs-server.service
After=network-online.target glusterfs-server.service

[Mount]
What=127.0.0.1:/public_uploads
Where=/var/www/shared/public/uploads
Type=glusterfs
Options=_netdev,defaults

[Install]
WantedBy=multi-user.target
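Systemd requires a mount unit to be named after the mount point it controls, with the path escaped. The canonical way to derive the name is `systemd-escape --path --suffix=mount /var/www/shared/public/uploads`. For illustration only, here is a simplified bash re-implementation of that escaping. mount_unit_name is a hypothetical helper and handles plain ASCII paths only.

```shell
#!/bin/bash
# Hypothetical, simplified version of `systemd-escape --path --suffix=mount`:
# collapse/strip slashes, map "/" to "-", keep [alnum] ":" "_" and non-leading
# ".", and escape every other byte as \xNN. ASCII paths only.
mount_unit_name() {
    local path out="" ch i
    path=$(printf '%s' "$1" | tr -s '/')
    path="${path#/}"; path="${path%/}"
    if [ -z "$path" ]; then
        printf '%s\n' "-.mount"          # the root file system
        return
    fi
    for (( i = 0; i < ${#path}; i++ )); do
        ch="${path:i:1}"
        case "$ch" in
            /) out+="-" ;;
            [[:alnum:]:_]) out+="$ch" ;;
            .) if [ "$i" -eq 0 ]; then out+='\x2e'; else out+="."; fi ;;
            *) out+=$(printf '\\x%02x' "'$ch") ;;
        esac
    done
    printf '%s.mount\n' "$out"
}
```

For the mount point used here, the result is var-www-shared-public-uploads.mount, which matches the file name above. After creating the file, run systemctl daemon-reload and enable the unit by that name.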

Option 2: Upstart Workaround (Ubuntu 12.04)

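For older Ubuntu versions using Upstart, start the mount from the server's own start event. Upstart emits started events only for its own jobs, not for SysV scripts in /etc/init.d. This therefore assumes glusterfs-server is installed as an Upstart job (/etc/init/glusterfs-server.conf):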

# /etc/init/mount-gluster.conf
description "Mount GlusterFS volumes after service start"
start on started glusterfs-server

script
    mount -a -t glusterfs
end script

Option 3: Fstab with Custom Mount Helper

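Create a wrapper script that waits for the server before calling the real mount helper. Because mount(8) runs /sbin/mount.<type>, the wrapper is only invoked for fstab entries whose type field is glusterfs-wrapper: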

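#!/bin/bash
# /sbin/mount.glusterfs-wrapper (chmod 755; used for fstab type "glusterfs-wrapper")
# Wait up to ~60 s for the server unit, then hand off to the real helper
for _ in $(seq 1 60); do
    systemctl is-active --quiet glusterfs-server && break
    sleep 1
done
exec /sbin/mount.glusterfs "$@"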

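On systemd, the same dependency can also be declared directly in fstab without any wrapper. x-systemd.requires= adds both a Requires= and an After= dependency on the named unit, so x-systemd.after= is redundant but harmless: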

127.0.0.1:/public_uploads /var/www/shared/public/uploads glusterfs defaults,_netdev,x-systemd.requires=glusterfs-server.service,x-systemd.after=glusterfs-server.service 0 0
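Editing fstab options by hand across several hosts is error-prone. The hedged sketch below appends one option to a single entry's options column. It reads fstab text on stdin and writes to stdout, so the result can be reviewed before it replaces the live file. fstab_add_option is a hypothetical name, and awk rewrites matching lines with single spaces between fields.

```shell
#!/bin/bash
# Hypothetical helper: append OPTION to the 4th (options) column of fstab
# entries whose 2nd column equals MOUNTPOINT, unless the option is already
# present. Reads stdin, writes stdout; comment lines pass through unchanged.
fstab_add_option() {
    local mountpoint="$1" option="$2"
    awk -v mp="$mountpoint" -v opt="$option" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) { print; next }   # already present
            $4 = $4 "," opt                           # rebuilds the line
        }
        { print }
    '
}
```

For example, `fstab_add_option /var/www/shared/public/uploads x-systemd.requires=glusterfs-server.service < /etc/fstab > /tmp/fstab.new`, then compare with `diff /etc/fstab /tmp/fstab.new` before installing the new file. Running it twice leaves the entry unchanged.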

After implementing any solution, verify boot sequence with:

systemd-analyze plot > boot.svg

Or for Upstart:

grep -E 'gluster|mount' /var/log/syslog
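Besides syslog, the GlusterFS client writes its own log, by default under /var/log/glusterfs/ in a file named after the mount point. The sketch below is a small filter for error-level (E) lines in the format shown at the top of this page. gluster_errors is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print error-level lines from GlusterFS logs, i.e. lines
# shaped like "[2014-06-17 08:20:53.000373] E [socket.c:2161:...] ...".
# The pattern is unanchored so it also matches lines embedded in syslog.
# Reads the named files, or stdin when none are given.
gluster_errors() {
    grep -E '\[[^]]+\] E \[' "$@"
}
```

For example, `gluster_errors /var/log/glusterfs/var-www-shared-public-uploads.log` (assuming the default log location) should show no new connection-refused entries after a clean boot.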

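For more resilient mounting, autofs can mount the volume on first access instead of at boot. Remove or comment out the corresponding fstab entry so that only one mechanism manages the mount point: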

# /etc/auto.master
/- /etc/auto.gluster --timeout=60

# /etc/auto.gluster
/var/www/shared/public/uploads -fstype=glusterfs 127.0.0.1:/public_uploads