When MySQL replication suddenly stops with Slave_SQL_Running: No and error 1396 on a CREATE USER statement, the accounts on the master and slave have usually drifted out of sync. The slave status shows the failure:
SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Last_Errno: 1396
Last_Error: Error 'Operation CREATE USER failed for 'user'@'ip'' on query.
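To detect this condition from a script, the vertical status output can be turned into a dictionary. The following is a minimal sketch, not an official client API: the function names are hypothetical, and it assumes each field fits on one line, which may not hold for long Last_Error values.

```python
def parse_slave_status(text):
    """Parse the vertical-format output of SHOW SLAVE STATUS into a dict of strings."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only: values such as Last_Error may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_create_user_conflict(status):
    """True when the SQL thread stopped on error 1396 for a CREATE USER statement."""
    return (
        status.get("Slave_SQL_Running") == "No"
        and status.get("Last_Errno") == "1396"
        and "CREATE USER" in status.get("Last_Error", "")
    )


sample = """\
*************************** 1. row ***************************
Slave_SQL_Running: No
Last_Errno: 1396
Last_Error: Error 'Operation CREATE USER failed for 'user'@'ip'' on query.
"""
status = parse_slave_status(sample)
print(is_create_user_conflict(status))
```

All values are kept as strings, so numeric fields such as Last_Errno need converting before any arithmetic comparison.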
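The error occurs because:
- The user already exists on the slave, typically because it was created there directly rather than through replication
- The grant tables on the master and slave have drifted apart, so a statement that succeeded on the master fails on the slave
- The account was removed on the slave by editing mysql.user directly without FLUSH PRIVILEGES, so the server still holds it in memory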
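For a quick fix, skip the failing event on the slave. The slave's copy of the account is left as it is, so it may still differ from the master's: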
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
For a lasting fix, create accounts on the master only and let replication carry them to the slave. For example, for the replication account itself:
-- On master:
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
-- On slave:
STOP SLAVE;
CHANGE MASTER TO MASTER_USER='repl_user', MASTER_PASSWORD='password';
START SLAVE;
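When the account already exists on the slave, drop it there so the replicated CREATE USER can succeed:
-- On slave:
SET sql_log_bin = 0; -- keep this manual fix out of the slave's own binary log
DROP USER IF EXISTS 'problem_user'@'ip';
SET sql_log_bin = 1;
STOP SLAVE;
START SLAVE; -- the SQL thread retries the failed CREATE USER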
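As a last resort, these my.cnf settings suppress the error. slave-skip-errors is read only at server startup and silently skips every 1396 error, so the accounts on the two servers can drift further apart: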
[mysqld]
slave-skip-errors = 1396
replicate-wild-ignore-table=mysql.%
Monitor replication status with:
SHOW SLAVE STATUS\G
SELECT * FROM performance_schema.replication_applier_status_by_worker;
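For complex cases, examine the binary log on the master, starting at the Relay_Master_Log_File and Exec_Master_Log_Pos values reported by SHOW SLAVE STATUS: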
SHOW BINLOG EVENTS IN 'mysql-bin.00000X' FROM [position] LIMIT 10;
mysqlbinlog --start-position=[pos] /var/lib/mysql/mysql-bin.00000X
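The file name and position placeholders above can be filled in from the slave status fields. This is a rough sketch under assumptions: the helper name and the dict input shape are hypothetical, the sample coordinates are made up, and the binary log directory is the one used in the command above.

```python
def binlog_inspection_commands(status, limit=10, binlog_dir="/var/lib/mysql"):
    """Build the SQL statement and mysqlbinlog argv that start at the failing event.

    `status` is a dict of SHOW SLAVE STATUS fields (hypothetical input shape).
    """
    # Coordinates in the master's binary log up to which the SQL thread has executed.
    log_file = status["Relay_Master_Log_File"]
    position = int(status["Exec_Master_Log_Pos"])  # raises ValueError if not numeric
    sql = f"SHOW BINLOG EVENTS IN '{log_file}' FROM {position} LIMIT {int(limit)};"
    argv = ["mysqlbinlog", f"--start-position={position}", f"{binlog_dir}/{log_file}"]
    return sql, argv


# Sample values for illustration only.
sql, argv = binlog_inspection_commands(
    {"Relay_Master_Log_File": "mysql-bin.000042", "Exec_Master_Log_Pos": "1547"}
)
print(sql)
print(" ".join(argv))
```

Both commands are meant to be run against the master, since that is where the named binary log file lives.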
When your MySQL replication breaks with Slave_SQL_Running: No and error 1396 on a CREATE USER statement, it typically indicates a synchronization issue between master and slave. Here's the anatomy of this specific failure:
SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Last_Errno: 1396
Last_Error: Error 'Operation CREATE USER failed for 'user'@'ip'' on query
The error occurs because:
- The user already exists on the slave, possibly with different credentials
- Privilege tables are out of sync between master and slave
- A replication filter kept an earlier DROP USER from being applied, so the account still exists on the slave
For immediate recovery (bearing in mind that this is only a temporary fix):
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
Warning: This is only safe if the events that follow in the binary log do not depend on the skipped CREATE USER.
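To fix this properly without skipping events:
-- On master: generate one SHOW GRANTS statement per matching account, then run the generated statements
SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';')
FROM mysql.user WHERE user='problem_user'\G
-- On slave: remove the conflicting account
DROP USER IF EXISTS 'problem_user'@'ip';
-- Restart replication so the CREATE USER from the master is applied
START SLAVE;
-- Re-apply the grants captured on the master only if any GRANT events were skipped earlier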
Best practices for user management in replication:
-- Create users with IF NOT EXISTS so an existing account produces a warning instead of an error:
CREATE USER IF NOT EXISTS 'user'@'ip' IDENTIFIED BY 'password';
-- Name the authentication plugin explicitly so both servers use the same one:
CREATE USER 'user'@'ip' IDENTIFIED WITH mysql_native_password AS '*hash';
-- From the shell, consider adding the mysql schema to your sync checks:
pt-table-checksum --replicate=percona.checksums --databases=mysql
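If you're using replication filters, check how they treat the mysql database:
replicate-ignore-db=mysql # Can cause our issue: statement-based events are filtered by the default database, so account changes may be applied inconsistently
Instead, use:
replicate-wild-ignore-table=mysql.% # More precise: matches on table names rather than the default database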
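To audit a configuration file for filters that touch the mysql database, a simple line scan is usually enough. The sketch below is illustrative only: the function name is hypothetical, and it assumes plain option = value lines with # comments, without handling quoted values or included files.

```python
def find_mysql_filters(cnf_text):
    """Return (line_number, option, value) for replication filters that name the mysql schema."""
    hits = []
    for number, raw in enumerate(cnf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        option, value = (part.strip() for part in line.split("=", 1))
        option = option.replace("_", "-").lower()  # my.cnf accepts both spellings
        if not option.startswith("replicate-"):
            continue
        # "mysql" for database-level options, "mysql.<pattern>" for table-level ones
        if value == "mysql" or value.startswith("mysql."):
            hits.append((number, option, value))
    return hits


sample = """\
[mysqld]
server-id = 2
replicate-ignore-db=mysql   # database-level filter
replicate_wild_ignore_table = mysql.%
replicate-do-db = app
"""
for hit in find_mysql_filters(sample):
    print(hit)
```

The scan only reports which filters are present. Whether a given filter actually affects account-management statements depends on the server version and binary log format, so confirm it against the documentation for your release.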
Set up alerts for these key metrics:
Seconds_Behind_Master > threshold
Slave_SQL_Running = No
Last_Errno != 0
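These three conditions are straightforward to evaluate once the status fields are available as a dictionary. The following is a minimal sketch: the function name, the dict input shape, and the 60-second default threshold are assumptions chosen for illustration.

```python
def replication_alerts(status, max_lag_seconds=60):
    """Return a list of alert messages for a dict of SHOW SLAVE STATUS fields."""
    alerts = []
    if status.get("Slave_SQL_Running") == "No":
        alerts.append("Slave_SQL_Running = No")
    errno = int(status.get("Last_Errno") or 0)
    if errno != 0:
        alerts.append(f"Last_Errno = {errno}")
    lag = status.get("Seconds_Behind_Master")
    # The server reports NULL here when replication is not running.
    if lag in (None, "", "NULL"):
        alerts.append("Seconds_Behind_Master is NULL")
    elif int(lag) > max_lag_seconds:
        alerts.append(f"Seconds_Behind_Master = {lag} exceeds {max_lag_seconds}")
    return alerts


broken = {"Slave_SQL_Running": "No", "Last_Errno": "1396", "Seconds_Behind_Master": "NULL"}
healthy = {"Slave_SQL_Running": "Yes", "Last_Errno": "0", "Seconds_Behind_Master": "3"}
print(replication_alerts(broken))
print(replication_alerts(healthy))
```

Treating a NULL lag as an alert is a design choice: a stopped SQL thread reports no lag at all, so a plain threshold comparison would miss it.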
Consider tools such as Percona PMM for monitoring and Orchestrator for automated failover.