High-Availability Clustering Showdown: Keepalived vs Corosync vs Pacemaker vs Heartbeat for Firewall/Router Failover


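When implementing failover for critical infrastructure such as firewalls or routers, Linux offers several mature technologies. They are not direct equivalents: Keepalived is a standalone VRRP daemon, Corosync and Heartbeat are cluster messaging layers, and Pacemaker is a resource manager that runs on top of a messaging layer. Each has a distinct architecture and set of use cases: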

# Basic VRRP configuration example for Keepalived (primary node)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.100/24
    }
}

Keepalived implements VRRP (Virtual Router Redundancy Protocol) with:

  • Lightweight design (single daemon)
  • Health checking through tracked scripts and interfaces (vrrp_script, track_interface)
  • Direct virtual IP failover

Corosync provides cluster messaging with:

  • Totem protocol for reliable messaging
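  • An RDMA/InfiniBand transport in older releases (transport: iba in Corosync 2.x; check your version, as newer releases may no longer include it)
  • Quorum systems for split-brain prevention

# Sample Corosync configuration (corosync.conf); with udpu the peers must also be listed in a nodelist section, omitted here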
totem {
    version: 2
    cluster_name: my_cluster
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
    }
}
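
The quorum bullet above comes down to simple majority arithmetic: a partition may run resources only if it holds more than half of the expected votes. The following is a minimal Python sketch of that rule, assuming one vote per node; the function names are illustrative and are not part of Corosync.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if total_votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the visible partition holds a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


# A two-node cluster loses quorum as soon as one node disappears,
# while a three-node cluster survives the loss of one node.
print(has_quorum(1, 2))
print(has_quorum(2, 3))
```

This is why a two-node firewall pair usually needs special handling, such as Corosync's two_node quorum option combined with fencing, or a third vote from a quorum device.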

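Pacemaker is the cluster resource manager. It runs on top of Corosync (older releases could also use Heartbeat as the messaging layer) and provides:

  • Policy-based resource management
  • Fencing (STONITH) capabilities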
  • Complex dependency chains

Example firewall failover configuration (a group or colocation constraint is still needed to keep both resources on the same node):

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

pcs resource create FirewallService systemd:firewalld \
    op monitor interval=60s

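Heartbeat is the oldest of the four and is now largely a legacy option. It:

  • Uses its own older UDP-based messaging (broadcast, multicast, or unicast) rather than the Totem protocol
  • Lacks Corosync's advanced features, such as its quorum subsystem
  • Mainly exists for backward compatibility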

For pure router/firewall failover:

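  1. Simple virtual IP failover: Keepalived is ideal
  2. Complex HA requirements (multiple dependent resources, fencing): Corosync+Pacemaker
  3. InfiniBand environments: Corosync is the natural choice, provided the installed version still offers the RDMA transport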

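Advanced firewall example with Pacemaker. This assumes a resource agent named ocf:heartbeat:iptables is installed; it does not appear to be part of the stock resource-agents set, so check with pcs resource agents ocf:heartbeat and confirm its parameters with pcs resource describe before relying on it: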

pcs resource create FW-Rule ocf:heartbeat:iptables \
    chain="INPUT" protocol="tcp" port="22" \
    action="accept" rule_file="/etc/iptables.rules" \
    op monitor interval="120s"

Common issues and solutions:

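  • Split-brain scenarios: configure and test fencing so that a node that loses contact is powered off or isolated before its resources start elsewhere
  • VRRP priority conflicts: give every node a distinct priority and the same virtual_router_id, and make sure no other VRRP group on the segment reuses that ID
  • Corosync communication: verify that the transport (multicast or unicast) matches on all nodes and that the Corosync UDP ports are open between them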

When building a Linux-based firewall failover solution, you're essentially choosing between two architectural approaches:

  • VRRP-based solutions (Keepalived)
  • Messaging-layer solutions (Corosync+Pacemaker)

Keepalived

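Keepalived primarily implements VRRP (Virtual Router Redundancy Protocol) for IP failover. The configuration below is for the primary node; the backup node uses state BACKUP and a lower priority: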

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
}

Corosync+Pacemaker

A more comprehensive cluster management solution:

# Sample corosync.conf (with udpu the peers must also be listed in a nodelist section, omitted here)
totem {
    version: 2
    cluster_name: my_firewall_cluster
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastport: 5405
    }
}

For iptables/nftables failover, consider these approaches:

Keepalived Approach

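vrrp_script chk_firewall {
    # -n skips reverse DNS lookups, which can stall a check that runs every 2 seconds
    script "/usr/sbin/iptables -n -L >/dev/null 2>&1"
    interval 2
    # Positive weight: priority is raised by 50 while the check succeeds
    weight 50
}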

vrrp_instance VI_1 {
    track_script {
        chk_firewall
    }
    # ... rest of config
}
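
To see why a positive weight causes a failover, it helps to work through the priority arithmetic. The sketch below is a simplified Python model, not Keepalived code: it assumes a backup node with base priority 90 (a value not given above), the documented behaviour that a positive weight is added while the script succeeds and a negative weight is applied when it fails, and clamping of the result to the range 1 to 254. Tie-breaking on equal priority is not modelled.

```python
def effective_priority(base: int, weight: int, script_ok: bool) -> int:
    """Priority after applying a tracked script's weight (simplified model)."""
    if weight > 0 and script_ok:
        base += weight          # bonus while the check passes
    elif weight < 0 and not script_ok:
        base += weight          # penalty while the check fails
    return max(1, min(254, base))


def elect_master(nodes: dict, weight: int) -> str:
    """nodes maps a name to (base_priority, script_ok); highest priority wins."""
    return max(nodes, key=lambda n: effective_priority(nodes[n][0], weight, nodes[n][1]))


# Hypothetical pair: fw1 is the primary (100), fw2 the backup (90), weight 50.
healthy = {"fw1": (100, True), "fw2": (90, True)}
fw1_broken = {"fw1": (100, False), "fw2": (90, True)}

print(elect_master(healthy, 50))     # fw1: 150 beats 140
print(elect_master(fw1_broken, 50))  # fw2: 140 beats 100
```

The weight has to exceed the gap between the two base priorities; with a weight of 5 in this example, a failing fw1 (100) would still outrank a healthy fw2 (95).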

Pacemaker Approach

# crm configure (crmsh syntax); assumes a custom or third-party ocf:heartbeat:iptables agent is installed
primitive p_firewall ocf:heartbeat:iptables \
    params chain="INPUT" policy="DROP" \
    op monitor interval="30s"

Key metrics for firewall failover:

Solution      Failover Time   Throughput Impact
Keepalived    ~1-3s           Low
Corosync      ~500ms-2s       Medium
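
For Keepalived, the failover time is bounded below by how long a backup waits before declaring the master dead. A rough Python sketch of the VRRPv2 timer formula (RFC 3768) follows, where the backup's priority and the advertisement interval are the only inputs; the backup priority of 90 is an assumed value for illustration.

```python
def skew_time(priority: int) -> float:
    """VRRPv2 skew time in seconds: higher-priority backups time out sooner."""
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be in 1..254")
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """Seconds without advertisements before a backup takes over (VRRPv2)."""
    return 3 * advert_int + skew_time(priority)


# With advert_int 1 and an assumed backup priority of 90:
print(master_down_interval(90, 1.0))
```

With advert_int 1 the interval comes out a little above 3.6 seconds, so figures toward the lower end of the range in the table would require a shorter advertisement interval.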

Combining Keepalived with custom health checks:

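# With no weight set, a failing check puts any instance that tracks it into the FAULT state
vrrp_script chk_fw_rules {
    # Needs a shell for $(...) and the pipe; if your Keepalived version runs scripts without one, move this test into a script file
    script "test $(iptables-save | wc -l) -gt 50"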
    interval 5
    fall 2
    rise 2
    timeout 5
}
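
The fall and rise settings add hysteresis: the script's state only flips after that many consecutive results in the opposite direction, so a single slow or failed run does not trigger a failover. The Python sketch below models that behaviour under the assumption that the check starts in the healthy state; it is an illustration of the idea, not Keepalived's implementation.

```python
def track_state(results, fall=2, rise=2, initially_up=True):
    """Return the reported state after each check result.

    results: iterable of booleans (True = check passed).
    The state flips to down after `fall` consecutive failures and
    back to up after `rise` consecutive successes.
    """
    up = initially_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        if ok == up:
            streak = 0
        else:
            streak += 1
            if streak >= (rise if ok else fall):
                up = ok
                streak = 0
        states.append(up)
    return states


# One isolated failure is ignored; two in a row mark the check as down,
# and two successes in a row bring it back.
checks = [True, False, True, False, False, True, True]
print(track_state(checks))
```

With interval 5 and fall 2, a real rule-set problem is therefore reported after roughly two check periods rather than immediately.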