MySQL Failover Guide: Promoting Slave to Master During Disaster Recovery

While MySQL master-slave replication is well-documented, the actual failover process remains surprisingly obscure in official documentation. Most tutorials stop at replication setup without addressing the crucial transition when disaster strikes.

Before executing failover, ensure your slave meets these requirements:

SHOW SLAVE STATUS\G
-- Verify Seconds_Behind_Master is 0 or acceptable
-- Check Slave_IO_Running and Slave_SQL_Running are both 'Yes'
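
The checks above can also be scripted. The following is a minimal sketch, assuming one row of SHOW SLAVE STATUS has already been fetched into a dictionary (for example with a dictionary cursor). The function name promotion_blockers and the max_lag_seconds parameter are illustrative and not part of MySQL or any client library.

```python
def promotion_blockers(status, max_lag_seconds=0):
    """Return a list of reasons why this slave should not be promoted yet.

    `status` is one row of SHOW SLAVE STATUS as a dict (column name -> value).
    An empty list means the checks described above all pass.
    """
    blockers = []

    # Both replication threads should report 'Yes'.
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            blockers.append(f"{thread} is {status.get(thread)!r}, expected 'Yes'")

    # Seconds_Behind_Master is NULL (None) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        blockers.append("Seconds_Behind_Master is NULL (lag unknown)")
    elif int(lag) > max_lag_seconds:
        blockers.append(f"lag is {int(lag)}s, above the {max_lag_seconds}s limit")

    return blockers
```

Once the master is already unreachable, Slave_IO_Running typically reports 'Connecting' rather than 'Yes', so a check like this is most useful as a routine health probe that runs before any failure occurs.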

1. Stop replication on the slave:

STOP SLAVE;
RESET SLAVE ALL;

2. Promote slave to master:

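RESET MASTER;  -- Caution: deletes all binary logs and clears the GTID history
-- Allow writes (super_read_only exists in MySQL 5.7.8 and later):
SET @@GLOBAL.read_only = OFF;
SET @@GLOBAL.super_read_only = OFF;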

The most reliable way to redirect clients to the new master involves DNS or network changes:

Lower the DNS TTL in advance (300 seconds recommended)
Assign the master's IP to the slave machine (if using static IPs)
Example Linux command: ifconfig eth0:0 MASTER_IP netmask 255.255.255.0 up
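
When the switch is made through DNS, it helps to confirm that the record actually resolves to the new master before declaring the failover complete. The sketch below is one way to poll for that, assuming the application connects through a hostname. The function name wait_for_dns, its parameters, and the injectable resolve and sleep hooks are all illustrative choices.

```python
import socket
import time


def wait_for_dns(hostname, expected_ip, timeout=600, interval=10,
                 resolve=socket.gethostbyname, sleep=time.sleep):
    """Poll until `hostname` resolves to `expected_ip` or `timeout` seconds pass.

    `resolve` and `sleep` are injectable so the function can be exercised
    without touching the network. Returns True on success, False on timeout.
    """
    waited = 0
    while True:
        try:
            if resolve(hostname) == expected_ip:
                return True
        except OSError:
            # Resolution failures are treated as "not switched over yet".
            pass
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

The timeout should be longer than the TTL configured for the record, since resolvers may keep serving the old address until the TTL expires.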

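A critical step that is often overlooked is exporting and importing user privileges:

# On the original master (before failure), from the shell.
# Dump the mysql system schema with its rows; --no-data would omit the grants.
mysqldump --databases mysql > privileges.sql

# On the new master, from the shell:
mysql < privileges.sql
# Then, in the mysql client, reload the grant tables:
FLUSH PRIVILEGES;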

Implement connection retry logic in your application:

# Python example with retry
import time

import mysql.connector
from mysql.connector import Error

def create_connection():
    retries = 3
    for attempt in range(retries):
        try:
            return mysql.connector.connect(
                host='master_host',
                user='app_user',
                password='password',
                database='app_db'
            )
        except Error as e:
            if attempt == retries - 1:
                raise
            time.sleep(2)
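
The fixed two-second pause above can be generalized. The sketch below shows one possible variant, assuming exponential backoff suits the application. The name connect_with_retry and its parameters are illustrative, and connect stands for any zero-argument callable, such as a lambda wrapping mysql.connector.connect.

```python
import time


def connect_with_retry(connect, retries=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call `connect()` until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once `retries` attempts have failed.
    """
    for attempt in range(retries):
        try:
            return connect()
        except Exception:
            # In real code, narrow this to the driver's error class.
            if attempt == retries - 1:
                raise
            # Waits of 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(base_delay * (2 ** attempt), max_delay))
```

Calling connect() again on every attempt matters during failover: a fresh connection re-resolves the hostname, so the application picks up the new master once DNS has switched.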

After promoting the slave:

Reconfigure remaining slaves to replicate from new master
Update monitoring systems with new master location
Document the failure scenario for future improvements

For production environments, consider:

MySQL Group Replication
Orchestrator for automated failover
ProxySQL for connection routing


Most MySQL replication tutorials stop at configuration, but the real challenge comes during failover. Here's what actually works in production environments when your master server becomes unavailable.

First, ensure your slave can assume the master's network identity:

# Example IP takeover (Linux)
sudo ifconfig eth0:0 MASTER_IP netmask 255.255.255.0 up
# Or using floating IP with cloud providers
aws ec2 assign-private-ip-addresses --network-interface-id eni-123456 \
    --private-ip-addresses MASTER_IP

Execute these commands on the slave to promote it:

STOP SLAVE;
RESET SLAVE ALL;
RESET MASTER;  # Caution: deletes all binary logs and clears the GTID history
SET GLOBAL read_only = OFF;
FLUSH PRIVILEGES;  # Only needed if the grant tables were edited directly

Common permission issues after promotion:

# Check replication user permissions
SELECT * FROM mysql.user WHERE User='repl_user'\G

# Typical fix if permissions differ (GRANT ... IDENTIFIED BY was removed in MySQL 8.0):
CREATE USER IF NOT EXISTS 'repl_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';

For quick failover, save this as promote_slave.sh:

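#!/bin/bash
# Abort on the first failed command so a failed promotion never repoints the app
set -euo pipefail

# Fail if not running as root
if [ "$(id -u)" -ne 0 ]; then echo "Root required"; exit 1; fi

# Network config
ip addr add MASTER_IP/24 dev eth0 label eth0:0

# MySQL promotion (password is quoted so special characters survive)
mysql -uroot -p"$MYSQL_ROOT_PW" <<-EOF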
STOP SLAVE;
RESET SLAVE ALL;
RESET MASTER;
SET GLOBAL read_only=OFF;
FLUSH PRIVILEGES;
EOF

# Update app configuration
sed -i 's/old_master_ip/NEW_MASTER_IP/' /etc/myapp/config.ini
systemctl restart myapp-service

After promotion, verify with:

SHOW MASTER STATUS;
SELECT Host, User FROM mysql.user WHERE User LIKE 'repl%';
SHOW PROCESSLIST;
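
These verification queries can be folded into a small check. The sketch below assumes the caller has already run SHOW MASTER STATUS, SHOW SLAVE STATUS and SELECT @@GLOBAL.read_only and passes the results in. The last two queries are not in the list above and are added here as an assumption about what a complete check would cover. The function name promotion_problems is illustrative.

```python
def promotion_problems(master_status, slave_status_rows, read_only):
    """Return a list of problems found after promoting a slave.

    master_status:     one SHOW MASTER STATUS row as a dict, or None if empty
    slave_status_rows: all rows returned by SHOW SLAVE STATUS (a list)
    read_only:         value of @@GLOBAL.read_only (0 or 1)
    """
    problems = []

    # SHOW MASTER STATUS returns nothing when binary logging is disabled,
    # in which case no other server can replicate from this one.
    if not master_status or not master_status.get("File"):
        problems.append("binary logging appears to be disabled")

    # After RESET SLAVE ALL, SHOW SLAVE STATUS should return no rows.
    if slave_status_rows:
        problems.append("replication settings are still present")

    # Applications cannot write while read_only is ON.
    if int(read_only) != 0:
        problems.append("read_only is still ON")

    return problems
```

An empty list means the server accepts writes, no longer follows the old master, and is writing a binary log that other servers can replicate from.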

Operational checklist:

Lower DNS TTLs in advance (critical!)
Configure monitoring on the new master
Document the exact timeline of the failure
Test connection pooling behavior (a common source of problems)

For re-introducing the original master as a slave:

# On new master (note the File and Position values):
SHOW MASTER STATUS;
# On original master (substitute those values below):
CHANGE MASTER TO
  MASTER_HOST='new_master_ip',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000XXX',
  MASTER_LOG_POS=XXX;
START SLAVE;
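
Copying the binary log coordinates by hand is error-prone under pressure. The sketch below builds the statement from one SHOW MASTER STATUS row, assuming that row is available as a dictionary with the File and Position columns. The function name build_change_master and the simple quoting helper are illustrative and not part of any MySQL client library.

```python
def build_change_master(master_status, host, user, password):
    """Build a CHANGE MASTER TO statement from one SHOW MASTER STATUS row."""

    def quote(value):
        # Minimal escaping for a single-quoted SQL string literal.
        return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO\n"
        f"  MASTER_HOST={quote(host)},\n"
        f"  MASTER_USER={quote(user)},\n"
        f"  MASTER_PASSWORD={quote(password)},\n"
        f"  MASTER_LOG_FILE={quote(master_status['File'])},\n"
        # int() rejects anything that is not a valid log position.
        f"  MASTER_LOG_POS={int(master_status['Position'])};"
    )
```

The resulting string can be printed for review before it is run on the original master, which keeps a human in the loop for a step that is hard to undo.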

