In high-availability setups using VRRP (via Keepalived), a common requirement is to implement one-way failover where the backup server (Machine B) should permanently take over when the master (Machine A) fails, until manual intervention occurs. The default VRRP behavior of automatic reversion isn't always desirable - especially when the master's failure might indicate deeper system issues.
We can achieve this by combining three techniques in keepalived.conf:
# On Machine A (Original Master)
vrrp_instance VI_1 {
    state BACKUP        # Start as BACKUP; nopreempt is only honored in this state
    priority 100        # Original master priority
    nopreempt           # Critical: prevents automatic reclaim of MASTER
    notify_master "/path/to/disable_vrrp.sh"   # Self-demotion hook

    # Standard VRRP config continues...
    interface eth0
    virtual_router_id 51
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
Create /path/to/disable_vrrp.sh on Machine A:
#!/bin/bash
# notify_master hook: fires when Machine A transitions to MASTER state
logger "VRRP auto-demotion triggered - keeping backup state"
# Permanently reduce priority below the backup node's
sed -i 's/priority 100/priority 50/' /etc/keepalived/keepalived.conf
# Reload Keepalived to apply the change
systemctl reload keepalived
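Since this sed edits the live configuration, it is worth rehearsing the substitution against a scratch copy first. A minimal sketch (the scratch file is an assumption for the rehearsal, not part of the setup):

```shell
# Rehearse the self-demotion sed on a stand-in copy, so a pattern typo
# never corrupts the real /etc/keepalived/keepalived.conf.
scratch=$(mktemp)
printf 'vrrp_instance VI_1 {\n    priority 100\n}\n' > "$scratch"
sed -i 's/priority 100/priority 50/' "$scratch"
grep 'priority' "$scratch"    # expect: priority 50
rm -f "$scratch"
```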
For more sophisticated failure detection:
vrrp_script chk_service {
    script "/usr/bin/pgrep nginx"   # exits non-zero by itself when nginx is absent;
                                    # shell operators like "|| exit 1" are not
                                    # interpreted here, so keep the command plain
    interval 2
    weight -50                      # subtracted from priority while the check fails
}

vrrp_instance VI_1 {
    track_script {
        chk_service
    }
    # Rest of config...
}
- Start both nodes with Machine A as the initial master (priority 100)
- Simulate a failure on Machine A (stop keepalived or crash the tracked service)
- Verify that Machine B takes over the VIP
- Restart keepalived on Machine A and verify it stays in BACKUP state
- Check the logs for execution of the self-demotion script
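For the takeover and stays-in-BACKUP checks above, a quick hedged sketch of verifying which node currently holds the VIP (the address 192.168.1.100 comes from the example config; adapt it to your setup):

```shell
# Sketch: report whether this node currently holds the example VIP.
# Output depends on which node you run it on.
addrs=$(ip -4 addr show 2>/dev/null || true)
case "$addrs" in
  *"192.168.1.100"*) echo "VIP present on this node" ;;
  *)                 echo "VIP absent on this node"  ;;
esac
```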
- Set proper permissions on the demotion script (root:root 700)
- Implement log rotation for script outputs
- Consider using version-controlled configuration management
- Document the manual recovery procedure for operations team
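To support the last point, the manual recovery procedure on Machine A can be sketched as a short runbook snippet. This is a demo against a scratch copy (the paths and priority values are the ones assumed in the examples above; on the real host you would target /etc/keepalived/keepalived.conf):

```shell
# Hedged runbook sketch: undo the self-demotion left by disable_vrrp.sh.
conf=$(mktemp)
printf '    priority 50\n' > "$conf"             # state left by the demotion script
sed -i 's/priority 50/priority 100/' "$conf"     # restore the original priority
grep 'priority' "$conf"                          # expect: priority 100
rm -f "$conf"
# Then, on the real host: systemctl reload keepalived
```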
In a standard VRRP implementation using Keepalived, when the original master (Machine A) recovers from failure, it will automatically reclaim the master status due to its higher priority. This behavior isn't always desirable in production environments where we want to maintain the failed-over state until manual validation.
The most effective way to achieve this is by implementing a sticky backup approach through Keepalived's configuration. Here's how to modify your keepalived.conf:
vrrp_script chk_service {
    script "/usr/local/bin/check_service_health.sh"
    interval 2
    weight -50
}

vrrp_instance VI_1 {
    state BACKUP        # Both nodes configured as BACKUP
    interface eth0
    virtual_router_id 51
    priority 100        # Original master's base priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret123
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_service
    }
    nopreempt           # Critical for preventing automatic failback
}
- nopreempt: prevents a higher-priority node from automatically reclaiming MASTER once another node holds it
- state BACKUP: nopreempt is only honored when the initial state is BACKUP, so both nodes start as BACKUP and the first election decides the master
- Weighted health checks: the tracked script adjusts the effective priority dynamically (here, 50 is subtracted while the check fails)
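These three mechanisms interact through keepalived's effective-priority arithmetic. A small sketch of the calculation, using the numbers from the config above (Machine B's priority of 90 is an assumption, since it is not shown):

```shell
# Illustration only: how a failing tracked script swings the election.
base=100                      # Machine A's configured priority
weight=-50                    # applied while chk_service exits non-zero
peer=90                       # Machine B's priority (assumed, not from the config)

effective=$((base + weight))  # A's priority while the check fails
echo "A effective: $effective, B: $peer"    # prints: A effective: 50, B: 90
if [ "$effective" -lt "$peer" ]; then
    echo "Machine B wins the election"
fi
```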
Create a health check script (/usr/local/bin/check_service_health.sh) that will manage priority:
#!/bin/bash
# Check if manual failback is allowed
if [ -f "/etc/keepalived/manual_failback_allowed" ]; then
    exit 0
else
    exit 1
fi
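The gate above can be exercised locally with a throwaway flag file before wiring it into keepalived (the temp path here is a stand-in for /etc/keepalived/manual_failback_allowed):

```shell
# Sketch: the script's exit status flips purely on the flag file's existence.
flag=$(mktemp -u)                                   # path that does not exist yet
[ -f "$flag" ] && echo "allowed" || echo "blocked"  # prints: blocked
touch "$flag"
[ -f "$flag" ] && echo "allowed" || echo "blocked"  # prints: allowed
rm -f "$flag"
```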
When you're ready to restore the original master:
- SSH into the original master (Machine A)
- Create the trigger file:
touch /etc/keepalived/manual_failback_allowed
- Restart Keepalived:
systemctl restart keepalived
- Remove the file after a successful failback to re-arm the sticky-backup behavior for the next failure
For a more permanent solution, you can modify Machine A's configuration after failover:
# On Machine A after recovery:
sed -i 's/priority 100/priority 90/' /etc/keepalived/keepalived.conf
systemctl restart keepalived