In a 2-node GlusterFS 3.7.6 setup with Pacemaker+Corosync for high availability, we observe an interesting behavior during cluster recovery from a full shutdown. When both nodes lose power and only one of them boots back up, the brick process fails to start automatically even though the glusterd service is running.
# Symptom when single node comes up first
[root@node1 ~]# gluster volume status data
Status of volume: data
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick node1:/gluster_data N/A N/A N N/A
NFS Server on localhost N/A N/A N N/A
In a 2-node replicated volume, glusterd performs a peer handshake before it will spawn brick processes. This is a safety mechanism against serving a stale volume configuration and against split-brain. After a cold start, the bricks stay offline until:
- glusterd can reach at least one other peer and confirm its local volume definition is current
- the peer handshake completes successfully
- server quorum, if enabled, is established
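To see which condition is still blocking the bricks, inspect the peer and volume state on the node that came up first (standard gluster CLI, volume name taken from the output above):
# The missing peer is listed as Disconnected until it returns
gluster peer status
# The brick stays at Online: N while the handshake is pending
gluster volume status data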
Here are three approaches to handle this situation:
1. Forced Start with a Full Heal
# On the first booted node:
gluster volume start data force
gluster volume heal data full
# Note: if server-side quorum is enabled and not met, glusterd may refuse these commands.
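After a forced start, the heal commands below (standard gluster CLI) let you watch recovery once the second node returns:
# Entries still pending heal; the list should drain as replication catches up
gluster volume heal data info
# Files that ended up in split-brain, if any
gluster volume heal data info split-brain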
2. Automatic Recovery Script
Create a systemd service that runs after glusterd starts:
[Unit]
Description=GlusterFS single node recovery
After=glusterd.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/gluster-recovery.sh
[Install]
WantedBy=multi-user.target
Script content (/usr/local/bin/gluster-recovery.sh):
#!/bin/bash
# Give glusterd a moment to finish initializing before probing peers
sleep 10
# Count peers that are actually connected; a bare grep for "Connected"
# would also match "Disconnected" lines, so match the full state string
PEER_STATUS=$(gluster peer status | grep -c "(Connected)")
if [ "$PEER_STATUS" -eq 0 ]; then
    # No peers reachable: bring the volume up on this node alone and trigger a full heal
    gluster volume start data force
    gluster volume heal data full
fi
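Assuming the unit above is saved as /etc/systemd/system/gluster-recovery.service (the name is illustrative), wire it in and test the script by hand before trusting it at boot:
chmod +x /usr/local/bin/gluster-recovery.sh
systemctl daemon-reload
systemctl enable gluster-recovery.service
# Manual test on a node whose peer is down: the brick should flip to Online: Y
/usr/local/bin/gluster-recovery.sh
gluster volume status data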
3. Using Pacemaker Resource Agent
Modify your Pacemaker configuration to handle this case (the glusterd and volume OCF agents ship in the glusterfs-resource-agents package):
pcs resource create glusterfs ocf:glusterfs:glusterd \
    op monitor interval="30s"
pcs resource create gvol_data ocf:glusterfs:volume \
    volname="data" op monitor interval="30s" \
    op start timeout="300s" op stop timeout="120s"
pcs constraint colocation add gvol_data with glusterfs INFINITY
pcs constraint order glusterfs then gvol_data
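After defining the resources, a quick sanity check confirms that the colocation and ordering constraints registered:
pcs status
pcs constraint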
For production environments, I recommend combining methods 2 and 3:
- Use Pacemaker for normal HA operation
- Implement the recovery script as a safety net
- Add monitoring for split-brain conditions
Also consider these GlusterFS tuning parameters in your volume configuration. Keep in mind that enabling server-side quorum on a 2-node cluster makes the single-node startup case stricter by design, so weigh split-brain protection against availability:
gluster volume set data cluster.quorum-type auto
gluster volume set data cluster.server-quorum-type server
gluster volume set data network.ping-timeout 20
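To confirm the options took effect, check the volume's reconfigured options:
# The three settings should appear under "Options Reconfigured"
gluster volume info data
# 3.7 can also query a single option directly
gluster volume get data cluster.server-quorum-type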
When implementing a 2-node high availability setup with GlusterFS 3.7.6, a critical issue emerges during full cluster outages. If both nodes power off and only one node boots first, the brick process fails to start automatically despite the Gluster service running. This behavior contradicts the expected HA functionality where a single surviving node should maintain service availability.
# Typical error output when single node starts first
gluster volume status
Status of volume: data
Gluster process TCP Port RDMA Port Online Pid
-----------------------------------------------------
Brick node1:/data N/A N/A N N/A
GlusterFS enforces quorum rules under which a majority of nodes must be online to keep a volume available. In a 2-node configuration this creates a chicken-and-egg problem after a full outage: the lone node will not activate its bricks until it can see a peer, and no peer will appear until the second node comes back.
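glusterd's own log usually states why it held the bricks back; on a default install it lives under /var/log/glusterfs, so a quick grep is a reasonable first diagnostic (exact messages vary by release):
grep -iE "quorum|disconnected" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -n 20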
We can relax the volume's quorum settings so a lone node is allowed to serve, at the cost of weaker split-brain protection. These are volume options set through the CLI (not glusterd.vol entries):
# Run once from any node while the cluster is healthy
gluster volume set data cluster.quorum-type fixed
gluster volume set data cluster.quorum-count 1
# Leave server-side quorum disabled (the default) so bricks can start without a peer
gluster volume set data cluster.server-quorum-type none
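Once a lone node's brick is up, a quick functional check from the mount point confirms writes are accepted with a single replica (the /var/www/html path is taken from the Apache setup referenced further down):
echo "single-node write test" > /var/www/html/.gluster-write-test
cat /var/www/html/.gluster-write-test && rm /var/www/html/.gluster-write-test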
Additionally, create a startup script to force brick activation:
#!/bin/bash
# /usr/local/bin/gluster-force-start
# Wait for glusterd to initialize
sleep 10
# Count peers that are actually connected; a peer that is down still shows
# "Peer in Cluster (Disconnected)", so match the connected state explicitly
PEER_STATUS=$(gluster peer status | grep -c "(Connected)")
if [ "$PEER_STATUS" -eq 0 ]; then
    # If no peers are reachable, force the local volume start
    gluster volume start data force
    logger "GlusterFS: Forced volume start in single-node mode"
fi
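To invoke this at boot you still need a hook that runs after glusterd; a minimal systemd unit, assuming the script path above and an illustrative unit name of gluster-force-start.service:
# /etc/systemd/system/gluster-force-start.service
[Unit]
Description=Force GlusterFS volume start when no peers are reachable
Requires=glusterd.service
After=glusterd.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/gluster-force-start
[Install]
WantedBy=multi-user.target
Enable it with chmod +x /usr/local/bin/gluster-force-start, systemctl daemon-reload and systemctl enable gluster-force-start.service.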
For proper HA integration, modify your Pacemaker resource configuration:
# Sample Pacemaker (crm shell) configuration using the glusterfs-resource-agents OCF agents
primitive glusterfs-server ocf:glusterfs:glusterd \
    op monitor interval="30s" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s"
primitive gluster-volume ocf:glusterfs:volume \
    params volname="data" \
    op monitor interval="30s" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="120s"
colocation volume-with-glusterd inf: gluster-volume glusterfs-server
order glusterd-then-volume inf: glusterfs-server:start gluster-volume:start
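One way to load this into a running cluster is through the crm shell; assuming the snippet is saved to a file (the path is illustrative):
crm configure load update /tmp/gluster-resources.crm
crm configure show | grep -A3 gluster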
After implementing these changes, test your failover scenario:
- Power off both nodes completely
- Start Node1 alone and verify brick status within 60 seconds
- Verify the Gluster volume is mounted and Apache can serve content from it
- Start Node2 and verify automatic healing occurs
- Power off Node2 and confirm Node1 maintains service
# Verification commands
gluster volume status
mount | grep gluster
df -h /var/www/html
crm_mon -1
For mission-critical deployments, consider these architectural improvements:
- Add a third (even lightweight) arbiter node to maintain proper quorum (see the sketch after this list)
- Implement geo-replication to a cold standby site
- Move to a newer GlusterFS release; the 3.7 series is long out of support
- Configure ZFS underneath Gluster for additional data integrity
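If you do add a third node, note that GlusterFS 3.7 already supports arbiter bricks, which hold only metadata and provide a quorum vote without a third full data copy. A rough sketch for a newly created volume (hostnames and brick paths are placeholders):
gluster peer probe node3
gluster volume create data replica 3 arbiter 1 \
    node1:/gluster_data node2:/gluster_data node3:/gluster_arbiter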