Building a truly synchronized Postfix mail server pair requires addressing several technical challenges beyond basic MX backup configuration. The key requirements include:
- Real-time mailstore synchronization (not just queue forwarding)
- Consistent state maintenance during failover
- Automatic reconciliation after primary server recovery
- DNS-based failover mechanism
The solution combines multiple technologies:
Primary Server (mail1.example.com) ──┬── DRBD (Block-level replication)
                                     ├── Corosync/Pacemaker (Cluster management)
                                     └── Postfix (MTA)
Secondary Server (mail2.example.com) ──┬── DRBD (Replica)
                                       ├── Corosync/Pacemaker
                                       └── Postfix
1. Mailstore Synchronization with DRBD
Configure DRBD for /var/mail replication:
resource mailstore {
    protocol C;
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;
    on mail1.example.com {
        address 192.168.1.10:7789;
    }
    on mail2.example.com {
        address 192.168.1.20:7789;
    }
}
2. Cluster Configuration with Pacemaker
Set up the failover cluster:
pcs cluster setup --name postfix-cluster mail1.example.com mail2.example.com
pcs cluster start --all
# Testing only: with DRBD, a production cluster needs STONITH (fencing) to prevent split-brain
pcs property set stonith-enabled=false
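pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s
pcs resource create PostfixService systemd:postfix op monitor interval=60s
# Keep Postfix on the node that holds the virtual IP, and start it after the IP
pcs constraint colocation add PostfixService with VirtualIP INFINITY
pcs constraint order VirtualIP then PostfixService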
3. Postfix Configuration for Multi-Server Operation
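Configure a transport map that relays mail to the cluster's virtual IP (enable it with transport_maps = hash:/etc/postfix/transport in main.cf and run postmap /etc/postfix/transport after each change):
/etc/postfix/transport:
* relay:[192.168.1.100]:10025
/etc/postfix/master.cf:
10025 inet n - n - - smtpd
    -o content_filter=
    -o mynetworks=192.168.1.0/24
    -o receive_override_options=no_header_body_checks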
Handling MX Records for Failover
Use DNS with low TTL values:
example.com. 300 IN MX 10 mail1.example.com.
example.com. 300 IN MX 20 mail2.example.com.
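MX fallback happens on the sending side: the sending MTA sorts the MX set by preference and tries the lowest value first, moving on only when that host cannot be reached. A minimal Python sketch of that ordering, using the two records above as sample data (the function name delivery_order is illustrative, not part of any library):

```python
import random
from itertools import groupby


def delivery_order(mx_records, rng=random):
    """Return hostnames in the order a sending MTA would try them.

    mx_records: iterable of (preference, hostname) tuples.
    Lower preference values are tried first; hosts that share a
    preference are shuffled so load spreads across them.
    """
    ordered = []
    by_pref = sorted(mx_records, key=lambda record: record[0])
    for _, group in groupby(by_pref, key=lambda record: record[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


records = [(20, "mail2.example.com."), (10, "mail1.example.com.")]
print(delivery_order(records))
```

Because the sender performs the fallback, mail2 starts receiving mail as soon as mail1 stops answering, whatever the TTL is. The low TTL only matters when the records themselves are changed.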
Monitoring and Alerting
Sample Nagios check for cluster status:
define command {
    command_name check_pacemaker
    # Exit status 2 signals CRITICAL to Nagios. Echoing the word alone would still exit 0.
    command_line /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -C "crm_mon -1 | grep -q '2 nodes online' && echo OK || (echo CRITICAL && exit 2)"
}
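If the grep pattern proves brittle across Pacemaker versions, the same check can be written as a small plugin that parses the node list. The sketch below assumes crm_mon -1 prints a line of the form "Online: [ node1 node2 ]"; verify that against your version's output. The function names are illustrative.

```python
import re

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def online_nodes(crm_mon_output):
    """Extract node names from the 'Online: [ ... ]' line (assumed format)."""
    match = re.search(r"Online:\s*\[\s*(.*?)\s*\]", crm_mon_output)
    return match.group(1).split() if match else []


def check_cluster(crm_mon_output, expected=2):
    """Return (exit_code, message) in Nagios plugin style."""
    nodes = online_nodes(crm_mon_output)
    if len(nodes) >= expected:
        return OK, f"OK - {len(nodes)} nodes online: {', '.join(nodes)}"
    return CRITICAL, f"CRITICAL - only {len(nodes)} of {expected} nodes online"


# Sample text standing in for real `crm_mon -1` output.
sample = """\
Node List:
  * Online: [ mail1.example.com ]
  * OFFLINE: [ mail2.example.com ]
"""
code, message = check_cluster(sample)
print(code, message)
```

A real plugin would read the text from subprocess.run(["crm_mon", "-1"], capture_output=True, text=True).stdout and finish with sys.exit(code).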
Common issues to watch for:
- Split-brain scenario: monitor DRBD status with drbdadm status
- Mail queue buildup: flush the deferred queue with postqueue -f once the active node is reachable again
- DNS propagation delays: use a 300-second TTL for MX records
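DRBD status monitoring can be automated by scanning the drbdadm status text for unhealthy states. This sketch assumes the DRBD 9 style of output, with key:value tokens such as disk:UpToDate and a connection: token that appears only when the peer is not connected. Check the format on your installation; the function name is illustrative.

```python
import re


def drbd_problems(status_output):
    """Return a list of warnings found in `drbdadm status` text."""
    problems = []
    for key, value in re.findall(r"([\w-]+):(\S+)", status_output):
        if key == "connection" and value != "Connected":
            # StandAlone or Connecting usually means the peers are not replicating.
            problems.append(f"connection is {value}")
        elif key in ("disk", "peer-disk") and value != "UpToDate":
            problems.append(f"{key} is {value}")
    return problems


# Sample text standing in for real output after a split-brain.
split_brain = """\
mailstore role:Primary
  disk:UpToDate
  mail2.example.com connection:StandAlone
"""
print(drbd_problems(split_brain))
```

An empty list means no problem was detected; anything else is worth an alert.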
Building a true HA Postfix system requires more than just MX backup configuration. The fundamental challenge lies in maintaining complete mailbox synchronization between primary and secondary servers, not just handling temporary mail queuing during outages.
We need a solution that combines:
- Frequent mailbox synchronization (Dovecot + rsync)
- Postfix queue replication
- Automatic failover with Postfix's backup MX
- DNS-based traffic redirection
1. Mailbox Synchronization with Dovecot and rsync
First, configure Dovecot to use Maildir format (required for proper synchronization):
# /etc/dovecot/conf.d/10-mail.conf
mail_location = maildir:~/Maildir
Then set up a cron job for regular synchronization:
#!/bin/bash
# One-way sync: --delete removes files on the backup that no longer exist on the primary
rsync -az --delete --rsh='ssh -p 22' /var/vmail/ backup-server:/var/vmail/
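If a sync run takes longer than the cron interval, two rsync processes can overlap. One way to guard against that is a lock file. The sketch below, with hypothetical helper names and lock path, builds the same rsync command and shows a non-blocking lock that a second run fails to acquire (Linux/Unix only, since it relies on fcntl):

```python
import fcntl
import os
import tempfile


def rsync_command(src, dest, port=22, dry_run=False):
    """Build the rsync argument list used by the cron job."""
    cmd = ["rsync", "-az", "--delete", f"--rsh=ssh -p {port}"]
    if dry_run:
        cmd.append("--dry-run")  # list changes without applying them
    return cmd + [src, dest]


def try_lock(path):
    """Take an exclusive, non-blocking lock; return the open file or None."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "vmail-sync.lock")
first = try_lock(lock_path)
second = try_lock(lock_path)  # simulates an overlapping cron run
print(first is not None, second is None)
print(rsync_command("/var/vmail/", "backup-server:/var/vmail/", dry_run=True))
```

In the real job, the holder of the lock would pass the command list to subprocess.run(cmd, check=True) and release the lock by closing the file.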
2. Postfix Queue Replication
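Place the Postfix queue directory on shared storage (NFS or GlusterFS) so the secondary can take over queued mail. Only one Postfix instance may use a given queue directory at a time, so mount it on the active node only: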
# /etc/postfix/main.cf
queue_directory = /mnt/postfix-queue
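If the shared filesystem is not mounted, Postfix would otherwise start against an empty local directory under the mount point. A small pre-start guard can catch that. This is a sketch with a hypothetical function name, reusing the /mnt/postfix-queue path from the setting above:

```python
import os


def queue_ready(path):
    """True only if path is a mount point the current user can write to."""
    return os.path.ismount(path) and os.access(path, os.W_OK)


print(queue_ready("/mnt/postfix-queue"))
```

A wrapper script or a systemd ExecStartPre hook could refuse to start Postfix when this returns False.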
3. DNS Configuration
Set up proper MX records with different priorities:
example.com. IN MX 10 mail1.example.com.
example.com. IN MX 20 mail2.example.com.
4. Automatic Failover Script
Create a health check script to trigger DNS updates:
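#!/bin/bash
# Promote mail2 to the preferred MX when mail1 stops answering on port 25.
# ZONE_ID, RECORD_ID, and API_TOKEN are placeholders for your Cloudflare values.
if ! nc -z -w 5 mail1.example.com 25; then
    curl -X PUT "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID" \
        -H "Authorization: Bearer API_TOKEN" \
        -H "Content-Type: application/json" \
        --data '{"type":"MX","name":"example.com","content":"mail2.example.com","priority":10}'
fi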
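A single failed probe is a weak signal, and acting on it can flip DNS during a brief network blip. One common refinement is to require several consecutive failures before triggering the update. The sketch below shows that logic in Python; the class name and the threshold of 3 are assumptions for illustration.

```python
import socket


def smtp_port_open(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverTrigger:
    """Fire once after `threshold` consecutive failed probes."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one probe result; return True when failover should fire."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold


# Simulated probe results instead of real network calls.
trigger = FailoverTrigger(threshold=3)
print([trigger.record(result) for result in [True, False, False, False, False]])
```

In production, each scheduled run would call trigger.record(smtp_port_open("mail1.example.com")) and issue the DNS update only when it returns True. The counter would need to persist between runs, for example in a small state file.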
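For IMAP client failover, consider an IMAP proxy such as Perdition. Note that Perdition routes client connections to a backend server and does not synchronize mailboxes itself. The block below is a schematic of the intended routing rather than literal perdition.conf syntax:
# /etc/perdition/perdition.conf
server mail1.example.com {
    protocol imap;
    port 143;
}
server mail2.example.com {
    protocol imap;
    port 143;
    backup_of mail1.example.com;
}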
Implement monitoring with Nagios or Prometheus:
# Sample Prometheus alert rule
- alert: PostfixSyncDelay
  expr: time() - postfix_last_sync_time_seconds > 300
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Postfix synchronization delayed on {{ $labels.instance }}"
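The postfix_last_sync_time_seconds metric is not exported by Postfix itself, so the sync job has to publish it. One option, assuming node_exporter runs with its textfile collector pointed at a directory of your choosing, is to write a .prom file after each successful sync. The function and file names below are illustrative.

```python
import os
import tempfile
import time


def write_sync_metric(directory, timestamp=None):
    """Atomically write the last-sync gauge in Prometheus text format."""
    timestamp = time.time() if timestamp is None else timestamp
    body = (
        "# HELP postfix_last_sync_time_seconds Unix time of the last successful mail sync.\n"
        "# TYPE postfix_last_sync_time_seconds gauge\n"
        f"postfix_last_sync_time_seconds {timestamp:.0f}\n"
    )
    final_path = os.path.join(directory, "postfix_sync.prom")
    # Write to a temp file first so the collector never reads a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, final_path)
    return final_path


demo_dir = tempfile.mkdtemp()  # stand-in for the textfile collector directory
with open(write_sync_metric(demo_dir)) as handle:
    print(handle.read())
```

Calling write_sync_metric only after rsync exits successfully means a stalled or failing sync lets the timestamp age until the alert fires.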
Remember that complete real-time synchronization is challenging. Consider trade-offs between:
- Performance impact of constant synchronization
- Storage requirements for complete duplication
- Network bandwidth between locations
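For most implementations, hourly rsync jobs combined with Postfix queue replication offer a reasonable balance between reliability and performance, provided you can accept that mail delivered since the last rsync run (up to an hour's worth) will be missing from the secondary after a failover.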