How to Configure VRRP (Keepalived) for One-Way Failover: Preventing Failed Master from Regaining Priority


In high-availability setups that use VRRP (via Keepalived), a common requirement is one-way failover: once the master (Machine A) fails, the backup (Machine B) takes over and keeps the master role until an operator intervenes. Automatic failback, which is the default when the recovered node has the higher priority, isn't always desirable, especially when the master's failure might indicate deeper system issues.

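We can achieve this by combining three techniques in keepalived.conf: a BACKUP initial state with nopreempt, a notify_master script that lowers the node's priority, and a tracked health check. The first two appear in Machine A's instance definition: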


# On Machine A (Original Master)
vrrp_instance VI_1 {
    state BACKUP           # Required for nopreempt to take effect
    priority 100           # Original master priority
    nopreempt              # Critical: prevent automatic failback
    preempt_delay 0        # Has no effect while nopreempt is set
    notify_master "/path/to/disable_vrrp.sh"  # Self-demotion script; runs on every transition to MASTER
    
    # Standard VRRP config continues...
    interface eth0
    virtual_router_id 51
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}

Create /path/to/disable_vrrp.sh on Machine A:


#!/bin/bash
# Runs whenever Machine A transitions to MASTER, including the first election after startup
logger "VRRP auto-demotion triggered - keeping backup state"
# Permanently reduce priority below backup node
sed -i 's/priority 100/priority 50/' /etc/keepalived/keepalived.conf
# Reload Keepalived to apply changes
systemctl reload keepalived
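
The sed call above rewrites any line that contains the text priority 100 and reloads Keepalived even when nothing changed. A slightly more defensive variant is sketched below. It assumes GNU sed and a configuration file with a single vrrp_instance; the function name set_vrrp_priority and its arguments are hypothetical and not part of Keepalived.

```shell
#!/bin/bash
# Hypothetical helper: set the VRRP priority in a keepalived.conf-style file.
# Usage: set_vrrp_priority <config-file> <new-priority>
# Returns 0 if the file was rewritten, 1 if there was nothing to change.
set_vrrp_priority() {
    local conf="$1" new="$2"
    [ -f "$conf" ] || return 1
    # Bail out if the file has no "priority <number>" directive at all.
    grep -Eq '^[[:space:]]*priority[[:space:]]+[0-9]+' "$conf" || return 1
    # Bail out if the directive already carries the requested value.
    if grep -Eq "^[[:space:]]*priority[[:space:]]+${new}([[:space:]]|\$)" "$conf"; then
        return 1
    fi
    # On priority lines only, replace the first number; indentation and
    # trailing comments are left untouched.
    sed -i -E "/^[[:space:]]*priority[[:space:]]+[0-9]+/ s/[0-9]+/${new}/" "$conf"
}
```

In the demotion script this could be used as set_vrrp_priority /etc/keepalived/keepalived.conf 50 && systemctl reload keepalived, so that the reload only happens when the priority actually changed.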

For more sophisticated failure detection:


vrrp_script chk_service {
    script "/usr/bin/pgrep nginx"  # pgrep exits non-zero when no nginx process is found
    interval 2
    weight -50  # Significant penalty if service fails
}

vrrp_instance VI_1 {
    track_script {
        chk_service
    }
    # Rest of config...
}

To test the setup:

  1. Start both nodes with Machine A as initial master (priority 100)
  2. Simulate failure on Machine A (stop keepalived or crash the service)
  3. Machine B should take over the VIP
  4. Restart keepalived on Machine A and verify that it stays in BACKUP state
  5. Check the logs for self-demotion script execution

Operational notes:

  • Set proper permissions on the demotion script (owner root:root, mode 700)
  • Implement log rotation for script outputs
  • Consider using version-controlled configuration management
  • Document the manual recovery procedure for the operations team
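
Steps 3 and 4 of the test come down to checking which node currently holds the VIP. A minimal sketch of that check follows. The helper name holds_vip is hypothetical; it reads the one-line-per-address output of ip -o -4 addr show on stdin, and the interface and address are the ones used in the configuration above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` output supplied on stdin.
# Usage: ip -o -4 addr show dev eth0 | holds_vip 192.168.1.100
holds_vip() {
    local vip="$1"
    # Match "inet <vip>/" as a fixed string so that 192.168.1.10
    # is not mistaken for 192.168.1.100.
    grep -qF -- "inet ${vip}/"
}
```

Run on Machine B after step 3, the check should succeed; run on Machine A after step 4, it should fail, because Machine A is expected to remain in BACKUP state without the VIP.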

In a standard VRRP implementation using Keepalived, when the original master (Machine A) recovers from failure, it will automatically reclaim the master status due to its higher priority. This behavior isn't always desirable in production environments where we want to maintain the failed-over state until manual validation.

The most effective way to achieve this is by implementing a sticky backup approach through Keepalived's configuration. Here's how to modify your keepalived.conf:


vrrp_script chk_service {
    script "/usr/local/bin/check_service_health.sh"
    interval 2
    weight -50
}

vrrp_instance VI_1 {
    state BACKUP            # Both nodes configured as BACKUP
    interface eth0
    virtual_router_id 51
    priority 100            # Original master's base priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_service
    }
    nopreempt               # Critical for preventing automatic failback
}

  • nopreempt: Prevents a higher-priority node from taking over from a lower-priority master that is still advertising
  • state BACKUP: Required on both nodes for nopreempt to take effect; the first master is then chosen by priority
  • Weighted health checks: A failing check with weight -50 lowers the node's priority by 50 for as long as the failure lasts
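
The arithmetic behind the weighted health check can be made explicit with a small model. This is a sketch, not Keepalived code: the function name effective_priority is hypothetical, and it ignores a weight of 0, rise/fall thresholds, and any limits Keepalived places on the resulting value.

```shell
#!/bin/bash
# Hypothetical model of how a tracked script's weight shifts VRRP priority.
# Usage: effective_priority <base-priority> <weight> <script-exit-status>
effective_priority() {
    local base="$1" weight="$2" status="$3" prio
    prio=$base
    if [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
        # Negative weight: its magnitude is subtracted while the check fails.
        prio=$((base + weight))
    elif [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
        # Positive weight: it is added while the check succeeds.
        prio=$((base + weight))
    fi
    echo "$prio"
}
```

With the configuration above, effective_priority 100 -50 1 prints 50 and effective_priority 100 -50 0 prints 100. Note that with nopreempt set, a higher effective priority alone does not make a node take over from a running master.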

Create a health check script (/usr/local/bin/check_service_health.sh) that will manage priority:


#!/bin/bash
# Succeed only while an operator has explicitly allowed failback;
# otherwise the check fails and the -50 weight applies
if [ -f "/etc/keepalived/manual_failback_allowed" ]; then
    exit 0
else
    exit 1
fi

When you're ready to restore the original master:

  1. SSH into the original master (Machine A)
  2. Create the trigger file: touch /etc/keepalived/manual_failback_allowed
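  3. Restart Keepalived: systemctl restart keepalived
  4. Remove the file once failback has succeeded if the next failover should again hold until you intervene

Because nopreempt is set, Machine A does not reclaim the VIP just because its priority has recovered. If it is still in BACKUP state after step 3, restarting Keepalived on Machine B makes B release the VIP so that A can take over.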
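
The flag-file handling in this procedure can be wrapped in a few small functions so that operators do not have to type the path by hand. This is a minimal sketch: the function names and the FAILBACK_FLAG variable are hypothetical, and the default path is the one the health check script reads.

```shell
#!/bin/bash
# Flag file read by the health check; set FAILBACK_FLAG to another path
# to try these helpers outside /etc.
FAILBACK_FLAG="${FAILBACK_FLAG:-/etc/keepalived/manual_failback_allowed}"

# Same test the health check performs: status 0 only while failback is allowed.
failback_allowed() { [ -f "$FAILBACK_FLAG" ]; }

# Operator actions (hypothetical names) that open and close the failback window.
allow_failback()  { touch "$FAILBACK_FLAG"; }
revoke_failback() { rm -f "$FAILBACK_FLAG"; }
```

A failback would then be allow_failback followed by systemctl restart keepalived on Machine A, and revoke_failback once the VIP is back where you want it.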

For a more permanent solution, you can lower Machine A's base priority in its configuration after failover (this assumes Machine B's priority is above 90):


# On Machine A after recovery:
sed -i 's/priority 100/priority 90/' /etc/keepalived/keepalived.conf
systemctl restart keepalived