GlusterFS High Availability: Understanding Server Node Failure Handling and Data Replication Mechanisms


When you mount a Gluster volume from one specific server node and that node fails, the mount point on clients can hang or fail to remount, because the named server is the only address the client has for fetching the volume configuration:


# Example mount command that creates single-point dependency
mount -t glusterfs gluster-server1:/myvolume /mnt/gluster

The key point most docs miss is that hard-coding a single server in the mount command defeats the purpose of HA. Instead, use a name that resolves to more than one node:


# Proper HA mount using the virtual IP or DNS round-robin
mount -t glusterfs gluster-cluster.example.com:/myvolume /mnt/gluster
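
If you must point the mount at one specific node, the native client also accepts fallback volfile servers so the initial mount survives that node being down. A minimal sketch, assuming two additional nodes named gluster-server2 and gluster-server3 (very old releases used the singular backupvolfile-server option instead):


# Fall back to other nodes if gluster-server1 is unreachable at mount time
mount -t glusterfs \
  -o backup-volfile-servers=gluster-server2:gluster-server3 \
  gluster-server1:/myvolume /mnt/gluster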

Gluster's replication works fundamentally differently from rsync/unison:

  • Active-active synchronous replication: every write goes to all replicas as it happens (not periodic file-level copying like rsync)
  • Automatic healing when a node reconnects
  • No manual sync commands needed (see the quick demonstration below)
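
You can see the "no manual sync" point for yourself: write a file through the client mount and it is already present on every brick. The brick path below matches the layout in the next example and is otherwise illustrative; only read from bricks for inspection, never write to them directly:


# On any client
echo "replication test" > /mnt/gluster/demo.txt

# On each of the three servers: the file is already in the brick directory
ls -l /bricks/brick1/demo.txt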

A proper 3-node replicated volume setup:


gluster volume create myvolume replica 3 \
  server1:/bricks/brick1 \
  server2:/bricks/brick1 \
  server3:/bricks/brick1
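
The volume still has to be started before clients can mount it, and on a replica 3 volume it is worth enabling client quorum so an isolated node cannot accept writes on its own. A sketch with standard option names; check gluster volume set help on your release:


gluster volume start myvolume

# Only allow writes while a majority of replicas is reachable (split-brain protection)
gluster volume set myvolume cluster.quorum-type auto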

For production clients, implement these resilience patterns:


#!/bin/bash
# Auto-(re)mount script: retries the Gluster mount with a growing back-off
MAX_RETRIES=3
MOUNT_POINT="/mnt/gluster"
VOLUME="gluster-cluster.example.com:/myvolume"

mount_gluster() {
  for i in $(seq 1 "$MAX_RETRIES"); do
    mount -t glusterfs "$VOLUME" "$MOUNT_POINT" && return 0
    sleep $((i * 2))
  done
  echo "Failed to mount $VOLUME after $MAX_RETRIES attempts" >&2
  return 1
}

# The mount is unhealthy if it is missing from /proc/mounts or if a simple
# stat on it hangs (a common symptom of a dead volfile server)
if ! grep -qs " $MOUNT_POINT " /proc/mounts || \
   ! timeout 5 stat -t "$MOUNT_POINT" >/dev/null 2>&1; then
  umount -l "$MOUNT_POINT" 2>/dev/null   # lazy-unmount any stale FUSE mount
  mount_gluster || exit 1
fi
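
One simple way to run the check above periodically is a cron entry; the script path and interval here are only examples:


# /etc/cron.d/gluster-mount-check
*/5 * * * * root /usr/local/bin/gluster-mount-check.sh >> /var/log/gluster-mount-check.log 2>&1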

When a failed server rejoins:

  1. Gluster detects which entries need synchronization (tracked as extended attributes on the bricks; see the sketch below)
  2. Background healing processes start automatically
  3. Clients remain operational during healing
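
Under the hood, the replication translator records this pending work as extended attributes on the surviving brick copies. A rough way to peek at them on a server (the file path is hypothetical, and the exact attribute names depend on the volume):


# Non-zero trusted.afr.* counters mean pending heals toward the failed brick
getfattr -d -m 'trusted.afr.' -e hex /bricks/brick1/path/to/some/file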

Check heal status at any time:


gluster volume heal myvolume info
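
If the same file was modified on both sides while the replica was split, it will not heal automatically; it shows up here instead:


gluster volume heal myvolume info split-brain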

Essential settings for true HA:


# These are volume options, set per volume with "gluster volume set"
# (transport type - tcp, rdma or both - is chosen at volume-creation time, not here)
gluster volume set myvolume cluster.granular-entry-heal on
gluster volume set myvolume cluster.background-self-heal-count 8
gluster volume set myvolume cluster.ensure-durability on
gluster volume set myvolume performance.cache-size 2GB
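
After setting options, confirm what the volume actually uses; gluster volume get is available on reasonably recent releases:


gluster volume get myvolume cluster.granular-entry-heal
gluster volume get myvolume all | grep -E 'heal|durability|cache-size'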

GlusterFS operates as a distributed network filesystem that aggregates storage servers into a single unified namespace. The building blocks are bricks (directories on the servers) grouped into volumes. A typical replicated volume is created and started like this:

# Typical Gluster volume creation command
gluster volume create test-volume replica 3 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1
gluster volume start test-volume
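
The create command assumes the three servers already form one trusted storage pool. If they do not, peer them first from any single node (hostnames as in the example above):


# Run once, e.g. from server1
gluster peer probe server2
gluster peer probe server3
gluster peer status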

When you mount a Gluster volume using:

mount -t glusterfs server1:/test-volume /mnt/gluster

The server named in the mount command is only used to fetch the volume definition; after that, the client maintains connections to all bricks in the volume's trusted storage pool. If the mounted server (server1 in this case) fails after the mount is established:

  • The client automatically fails over to other available servers
  • Existing file handles remain valid through the FUSE layer
  • New operations are routed to healthy servers
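
An easy way to convince yourself of this on a test cluster, deliberately crude and not for production:


# On server1: simulate a hard node failure
systemctl poweroff

# On the client: I/O continues against server2 and server3
dd if=/dev/zero of=/mnt/gluster/failover-test bs=1M count=10 conv=fsync
ls -l /mnt/gluster/failover-test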

Unlike rsync/unison, which perform periodic synchronization:

# Traditional rsync approach (not how Gluster works)
rsync -azv /source/dir/ server2:/destination/dir/

GlusterFS provides:

  • Real-time synchronous replication (for replica volumes)
  • Automatic conflict detection, with configurable split-brain resolution policies
  • Self-healing capabilities when nodes return
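
The self-healing is performed by a self-heal daemon (shd) running on every server; you can check that it is online:


gluster volume status test-volume shd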

For optimal failover, clients should:

  1. Use DNS round-robin for server addresses
  2. Specify fallback volfile servers with the backup-volfile-servers mount option (older releases used the singular backupvolfile-server)

# /etc/fstab entry with failover support; _netdev delays the mount until the network is up
server1:/test-volume /mnt/gluster glusterfs defaults,_netdev,backup-volfile-servers=server2:server3 0 0
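
To test the entry without rebooting, re-read fstab and confirm the volume shows up as a FUSE glusterfs mount:


mount -a
findmnt -t fuse.glusterfs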

When troubleshooting failed mounts:

# Check Gluster client logs
tail -f /var/log/glusterfs/mnt-gluster.log

# Verify volume status
gluster volume status test-volume
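
It also helps to confirm that the node is still part of the trusted pool and that the brick processes are running:


gluster peer status
gluster pool list
gluster volume status test-volume detail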

For a 3-node replica volume where server1 fails:

# On healthy nodes, check replica status
gluster volume heal test-volume info

# After server1 returns, trigger healing
gluster volume heal test-volume

# Monitor healing progress (newer releases use "info summary" instead of "info healed")
gluster volume heal test-volume info healed
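
If healing stalls because a file ended up in split-brain, recent releases can resolve it per file from the CLI. The policy below keeps the most recently modified copy; the file path is given relative to the volume root and is hypothetical here:


gluster volume heal test-volume split-brain latest-mtime /path/inside/volume/file.txt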