Implementing High Availability Postfix Mail Servers with Real-Time Synchronization and Failover



Building a truly synchronized Postfix mail server pair requires addressing several technical challenges beyond basic MX backup configuration. The key requirements include:

  • Real-time mailstore synchronization (not just queue forwarding)
  • Consistent state maintenance during failover
  • Automatic reconciliation after primary server recovery
  • DNS-based failover mechanism

The solution combines multiple technologies:

Primary Server (mail1.example.com) ──┬── DRBD (Block-level replication)
                                     ├── Corosync/Pacemaker (Cluster management)
                                     └── Postfix (MTA)
                                     
Secondary Server (mail2.example.com) ──┬── DRBD (Replica)
                                       ├── Corosync/Pacemaker
                                       └── Postfix

1. Mailstore Synchronization with DRBD

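Configure DRBD to replicate the block device that backs /var/mail. The host names in the "on" sections must match the output of uname -n on each node: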

resource mailstore {
  protocol C;
  device /dev/drbd0;
  disk /dev/sdb1;
  meta-disk internal;
  
  on mail1.example.com {
    address 192.168.1.10:7789;
  }
  
  on mail2.example.com {
    address 192.168.1.20:7789;
  }
}

2. Cluster Configuration with Pacemaker

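Set up the failover cluster. Disabling STONITH, as done below, is acceptable only for testing; without fencing, a two-node DRBD cluster is exposed to split brain: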

pcs cluster setup --name postfix-cluster mail1.example.com mail2.example.com
pcs cluster start --all
pcs property set stonith-enabled=false
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s
pcs resource create PostfixService systemd:postfix op monitor interval=60s
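
The commands above create the virtual IP and the Postfix service but do not tie them to each other or to the DRBD device, so Pacemaker could start them on different nodes. The following is a minimal sketch of the missing resources and constraints, assuming pcs 0.9 syntax (to match "pcs cluster setup --name"), an ext4 filesystem on /dev/drbd0, and the hypothetical resource names MailDRBD, MailDRBDClone and MailFS. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch: tie DRBD, the filesystem, the virtual IP and Postfix together.
# MailDRBD, MailDRBDClone and MailFS are hypothetical names; ext4 is assumed.
# Syntax follows pcs 0.9 (newer pcs releases use "promotable" clones instead).

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_mail_stack() {
  # DRBD resource managed by Pacemaker, promoted on exactly one node
  run pcs resource create MailDRBD ocf:linbit:drbd drbd_resource=mailstore op monitor interval=30s
  run pcs resource master MailDRBDClone MailDRBD master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  # Mount the replicated device only where DRBD is primary
  run pcs resource create MailFS ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/var/mail fstype=ext4
  run pcs constraint colocation add MailFS with MailDRBDClone INFINITY with-rsc-role=Master
  run pcs constraint order promote MailDRBDClone then start MailFS
  # The virtual IP and Postfix follow the mounted filesystem
  run pcs constraint colocation add VirtualIP with MailFS INFINITY
  run pcs constraint colocation add PostfixService with VirtualIP INFINITY
  run pcs constraint order MailFS then VirtualIP
  run pcs constraint order VirtualIP then PostfixService
}

configure_mail_stack
```

Review the printed commands first, then rerun with DRY_RUN=0 on one cluster node once they match your pcs version.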

3. Postfix Configuration for Multi-Server Operation

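Configure a transport map so that mail is relayed to whichever node currently holds the virtual IP. Reference the map with transport_maps = hash:/etc/postfix/transport in main.cf and rebuild it with postmap /etc/postfix/transport after each change: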

/etc/postfix/transport:
*   relay:[192.168.1.100]:10025

/etc/postfix/master.cf:
10025      inet  n       -       n       -       -       smtpd
  -o content_filter=
  -o mynetworks=192.168.1.0/24
  -o receive_override_options=no_header_body_checks

Handling MX Records for Failover

Use DNS with low TTL values:

example.com.   300 IN MX 10 mail1.example.com.
example.com.   300 IN MX 20 mail2.example.com.
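
A low TTL only helps if the published records actually carry it. The following sketch checks MX answer lines in the format shown above; check_mx_ttl is a hypothetical helper that reads the lines on standard input, and the 300-second limit is taken from the records in this section.

```shell
#!/bin/bash
# Sketch: validate MX answer lines of the form
#   example.com. 300 IN MX 10 mail1.example.com.
# check_mx_ttl is a hypothetical helper; it reads such lines on stdin.

check_mx_ttl() {
  max_ttl="$1"
  awk -v max="$max_ttl" '
    $3 == "IN" && $4 == "MX" {
      seen++
      # Field 2 is the TTL, field 6 the mail exchanger
      if ($2 + 0 > max + 0) { printf "TTL too high: %s (%s)\n", $6, $2; bad = 1 }
    }
    END {
      if (seen == 0) { print "no MX records found"; exit 2 }
      if (bad) exit 1
      print "OK: " seen " MX records with TTL <= " max
    }'
}

# Typical use (requires dig):
#   dig +noall +answer MX example.com | check_mx_ttl 300
```

A recursive resolver reports the remaining TTL, which counts down, so the check uses "less than or equal" rather than an exact match.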

Monitoring and Alerting

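Sample Nagios check for cluster status. Nagios reads the plugin's exit code, so the command exits with status 2 when the check fails; adjust the grep pattern to the summary line your crm_mon version prints:

define command {
  command_name    check_pacemaker
  command_line    /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -C "crm_mon -1 | grep -q '2 nodes online' && echo OK || (echo CRITICAL && exit 2)"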
}

Common failure modes to watch for:

  • Split-brain scenario: Monitor DRBD status with drbdadm status
  • Mail queue buildup: Inspect the queue with postqueue -p and retry deferred mail with postqueue -f
  • DNS propagation delays: Use 300-second TTL for MX records
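
The split-brain item can be turned into an automated check. The following is a minimal sketch that classifies the text printed by drbdadm status; drbd_health is a hypothetical helper, and the tokens it matches (connection:StandAlone, disk:UpToDate, peer-disk:UpToDate) are assumed from the DRBD 9 status format, so verify them against the output on your own nodes.

```shell
#!/bin/bash
# Sketch: classify "drbdadm status" output read from stdin.
# drbd_health is a hypothetical helper. The key:value tokens follow the
# DRBD 9 status format; DRBD 8.4 reports state via /proc/drbd instead.
# Exit codes follow the Nagios convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

drbd_health() {
  status=$(cat)
  # Split the output into one key:value token per line
  tokens=$(printf '%s\n' "$status" | tr -s '[:space:]' '\n')
  # StandAlone means the nodes have stopped talking to each other, which
  # is the state a detected split brain leaves behind.
  if printf '%s\n' "$tokens" | grep -q '^connection:StandAlone$'; then
    echo "CRITICAL: peer is StandAlone (possible split brain)"; return 2
  fi
  # Any other connection: token means the link is not fully established
  if printf '%s\n' "$tokens" | grep -q '^connection:'; then
    echo "WARNING: replication link not established"; return 1
  fi
  disks=$(printf '%s\n' "$tokens" | grep -E '^(peer-)?disk:' || true)
  if [ -z "$disks" ]; then
    echo "UNKNOWN: no disk state found"; return 3
  fi
  # Every disk: and peer-disk: token must be UpToDate
  if printf '%s\n' "$disks" | grep -v -q ':UpToDate$'; then
    echo "WARNING: a disk is not UpToDate"; return 1
  fi
  echo "OK: connected and UpToDate"; return 0
}

# Typical use on either node:
#   drbdadm status mailstore | drbd_health
```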

Building a true HA Postfix system requires more than just MX backup configuration. The fundamental challenge lies in maintaining complete mailbox synchronization between primary and secondary servers, not just handling temporary mail queuing during outages.

We need a solution that combines:

  • Frequent mailbox synchronization (Dovecot Maildir + rsync)
  • Postfix queue replication
  • Automatic failover with Postfix's backup MX
  • DNS-based traffic redirection

1. Mailbox Synchronization with Dovecot and rsync

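First, configure Dovecot to use Maildir format. Maildir stores each message as a separate file, so rsync can copy new messages individually instead of re-transferring a whole mailbox file: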

# /etc/dovecot/conf.d/10-mail.conf
mail_location = maildir:~/Maildir

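Then run a synchronization script from cron. The script assumes that mailboxes live under /var/vmail and that backup-server is reachable over SSH with key-based authentication; adjust the path to match mail_location: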

#!/bin/bash
rsync -az --delete --rsh='ssh -p 22' /var/vmail/ backup-server:/var/vmail/
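
When the job runs frequently, a slow transfer can overlap with the next run. The following sketch guards against that; with_lock is a hypothetical helper, and the lock path in the usage comment is an assumption.

```shell
#!/bin/bash
# Sketch: skip a sync run when the previous one is still active.
# with_lock is a hypothetical helper; it uses mkdir because directory
# creation is atomic. A run that is killed leaves the lock behind, so
# remove it by hand in that case.

with_lock() {
  lock_dir="$1"; shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "previous run still active, skipping" >&2
    return 75
  fi
  rc=0
  "$@" || rc=$?          # run the wrapped command and keep its exit code
  rmdir "$lock_dir"
  return "$rc"
}

# Typical use in the cron script:
#   with_lock /var/run/vmail-sync.lock \
#     rsync -az --delete --rsh='ssh -p 22' /var/vmail/ backup-server:/var/vmail/
```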

2. Postfix Queue Replication

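Place the Postfix queue on replicated or shared storage (for example GlusterFS, or NFS with reliable locking) so that the standby can take over queued mail. Only one Postfix instance should run against a given queue directory at a time, so start Postfix on the standby only after the primary has stopped: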

# /etc/postfix/main.cf
queue_directory = /mnt/postfix-queue

3. DNS Configuration

Set up proper MX records with different priorities:

example.com.    IN  MX  10 mail1.example.com.
example.com.    IN  MX  20 mail2.example.com.

4. Automatic Failover Script

Create a health check script to trigger DNS updates:

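#!/bin/bash
# Probe SMTP on the primary; give up after 5 seconds
if ! nc -z -w 5 mail1.example.com 25; then
  # Point the priority-10 MX at the secondary.
  # ZONE_ID, RECORD_ID and API_TOKEN are placeholders for your own values.
  curl -X PUT "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID" \
    -H "Authorization: Bearer API_TOKEN" \
    -H "Content-Type: application/json" \
    --data '{"type":"MX","name":"example.com","content":"mail2.example.com","priority":10}'
fi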
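
A single failed probe is a weak signal, and rewriting DNS on every transient timeout causes flapping. The following sketch requires several consecutive failures before reporting that a failover is due; record_probe, the state-file path and the threshold of 3 are all hypothetical choices.

```shell
#!/bin/bash
# Sketch: only report "fail over" after several consecutive failed probes.
# record_probe and the state-file layout are hypothetical.

# record_probe STATE_FILE THRESHOLD PROBE_CMD [ARGS...]
# Runs the probe, tracks consecutive failures in STATE_FILE, and returns
# 0 when the failure count has reached THRESHOLD, 1 otherwise.
record_probe() {
  state_file="$1"; threshold="$2"; shift 2
  failures=0
  [ -f "$state_file" ] && failures=$(cat "$state_file")
  if "$@"; then
    failures=0                      # any success resets the counter
  else
    failures=$((failures + 1))
  fi
  echo "$failures" > "$state_file"
  [ "$failures" -ge "$threshold" ]
}

# Typical use from cron, once a minute:
#   if record_probe /var/run/mail1.failures 3 nc -z -w 5 mail1.example.com 25; then
#     # run the curl call from the script above
#   fi
```

With a one-minute cron interval and a threshold of 3, the DNS change is sent after roughly three minutes of continuous failure.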

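For IMAP failover, consider placing a proxy such as Perdition in front of both servers. Perdition is an IMAP/POP3 proxy rather than a synchronization tool: it routes client connections to a backend, while the mailboxes themselves still have to be replicated by one of the methods above. The block below only illustrates the intended topology; consult the perdition.conf manual page for the directive names your version accepts: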

# /etc/perdition/perdition.conf
server mail1.example.com {
  protocol imap;
  port 143;
}

server mail2.example.com {
  protocol imap;
  port 143;
  backup_of mail1.example.com;
}

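Implement monitoring with Nagios or Prometheus. The rule below assumes a custom metric, postfix_last_sync_time_seconds, that the synchronization job must export itself: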

# Sample Prometheus alert rule
- alert: PostfixSyncDelay
  expr: time() - postfix_last_sync_time_seconds > 300
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Postfix synchronization delayed on {{ $labels.instance }}"
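
The alert above only fires if something publishes postfix_last_sync_time_seconds. One possible way, sketched here under the assumption that node_exporter runs with its textfile collector enabled, is to let the sync job write the metric to a .prom file; write_sync_metric, the file name and the directory in the usage comment are hypothetical.

```shell
#!/bin/bash
# Sketch: publish the time of the last successful sync for the alert above.
# write_sync_metric and the directory path are hypothetical; node_exporter
# must be started with --collector.textfile.directory pointing at the same
# directory.

write_sync_metric() {
  dir="$1"
  tmp="$dir/postfix_sync.prom.$$"
  {
    echo "# HELP postfix_last_sync_time_seconds Unix time of the last successful mail sync."
    echo "# TYPE postfix_last_sync_time_seconds gauge"
    echo "postfix_last_sync_time_seconds $(date +%s)"
  } > "$tmp"
  # Rename so the collector never reads a half-written file
  mv "$tmp" "$dir/postfix_sync.prom"
}

# Typical use at the end of the sync job, only when rsync succeeded:
#   rsync -az --delete /var/vmail/ backup-server:/var/vmail/ \
#     && write_sync_metric /var/lib/node_exporter/textfile_collector
```

Because the timestamp is written only after a successful run, a failing or stalled job lets the metric age until the 300-second threshold in the rule is crossed.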

Remember that complete real-time synchronization is challenging. Consider trade-offs between:

  • Performance impact of constant synchronization
  • Storage requirements for complete duplication
  • Network bandwidth between locations

For many implementations, hourly rsync jobs combined with Postfix queue replication offer a reasonable balance between reliability and performance, at the cost that mail delivered since the last run is missing from the secondary until the next sync.