When your MySQL slave shows Slave_SQL_Running_State: System lock with a high Seconds_Behind_Master (402263 in this case), it indicates the SQL thread is blocked while applying relay log events. The processlist shows two critical threads:
1. IO Thread: "Waiting for master to send event" (healthy state)
2. SQL Thread: "System lock" (problem state)
In MySQL 5.6 with InnoDB tables, this typically occurs due to:
- Disk I/O bottlenecks: Especially on EBS volumes with provisioned IOPS
- Single-threaded replication: All events are applied sequentially
- Long-running transactions: From the master being replayed
- Metadata locking: DDL operations causing contention
Run these while the issue is occurring:
-- Check for locking issues
SHOW ENGINE INNODB STATUS\G
-- Examine replication delay details (the performance_schema replication tables
-- were added in MySQL 5.7; on 5.6 rely on SHOW SLAVE STATUS instead)
SELECT * FROM performance_schema.replication_applier_status_by_worker;
-- Check for long-running transactions
SELECT * FROM information_schema.innodb_trx ORDER BY trx_started ASC LIMIT 10;
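If you suspect lock contention rather than pure I/O, the InnoDB information_schema tables (present in 5.6) can show which transaction is blocking which. A minimal sketch:
-- Confirm whether the SQL thread is waiting on a row lock rather than on disk I/O
SELECT
    w.requesting_trx_id,
    w.blocking_trx_id,
    rt.trx_mysql_thread_id AS waiting_thread,
    bt.trx_mysql_thread_id AS blocking_thread,
    bt.trx_query           AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx rt ON rt.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx bt ON bt.trx_id = w.blocking_trx_id;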
1. Optimize EBS Performance
For Amazon EC2 with EBS:
# Check current disk stats
iostat -dxm 5
# If %util is consistently > 80%, consider:
- Upgrade to gp3 volume type
- Increase provisioned IOPS
- Use RAID 0 across multiple EBS volumes
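You can cross-check I/O pressure from inside MySQL as well; these are standard InnoDB status counters, and values that stay above zero across repeated samples usually corroborate a saturated volume. A quick check:
-- Pending InnoDB I/O operations; persistently non-zero values across samples
-- suggest the storage layer is the bottleneck
SHOW GLOBAL STATUS LIKE 'Innodb_data_pending%';
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_pending%';
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Innodb_buffer_pool_wait_free', 'Innodb_log_waits');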
2. Tune MySQL Parameters
Add these to my.cnf:
[mysqld]
# Reduce disk I/O pressure
innodb_flush_log_at_trx_commit = 2   # relaxes flush-per-commit; acceptable durability trade-off on a replica
sync_binlog = 0                      # only if this slave's binlog is not needed for point-in-time recovery or chained replicas
# Enable parallel replication (MySQL 5.6 only parallelizes across schemas/databases)
slave_parallel_workers = 4
# The following require MySQL 5.7+ and have no effect on 5.6:
# slave_parallel_type = LOGICAL_CLOCK
# slave_preserve_commit_order = 1
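slave_parallel_workers is dynamic, so you can try it without a full restart by stopping only the SQL thread; the my.cnf entry then just makes it permanent. A sketch:
-- Apply the parallel-apply setting at runtime (MySQL 5.6+); the worker count
-- here is illustrative, size it to your schema count and CPU
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;
-- Verify it took effect
SHOW VARIABLES LIKE 'slave_parallel_workers';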
3. Handle Large Transactions
For batch operations on master:
-- On master, break into smaller chunks
START TRANSACTION;
DELETE FROM large_table WHERE id BETWEEN 1 AND 10000;
COMMIT;
-- Instead of one huge statement such as:
DELETE FROM large_table WHERE id < 1000000;
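If the purge has to cover a large range in one job, a loop that deletes in small batches keeps each replicated transaction short, so the slave's single-threaded applier never stalls on one multi-minute event. A sketch with hypothetical names (large_table, its indexed id column, and the procedure name are illustrative; the 10,000-row chunk size is arbitrary):
-- Chunked delete: with autocommit on, each DELETE commits as its own small transaction
DELIMITER //
CREATE PROCEDURE purge_large_table_in_chunks()
BEGIN
    REPEAT
        DELETE FROM large_table WHERE id < 1000000 ORDER BY id LIMIT 10000;
    UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;

CALL purge_large_table_in_chunks();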
If the issue persists, examine the specific event causing the lock:
-- Find the current position
SHOW SLAVE STATUS\G
# Then examine the corresponding relay log event from the shell
mysqlbinlog --start-position=345413702 --stop-position=345413863 relay-log.000197
Look for DDL statements (ALTER TABLE) or large DML operations that might be blocking.
- Monitor Seconds_Behind_Master with CloudWatch/PMM and set up alerts when delay exceeds thresholds (a heartbeat-based lag check is sketched after this list)
- Consider upgrading to MySQL 5.7+ for better parallel replication
- Schedule heavy writes during low-traffic periods
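Seconds_Behind_Master can read 0 or NULL while the applier is actually stuck, so a heartbeat row gives a more trustworthy lag number. A minimal sketch, assuming a hypothetical ops.heartbeat table that something on the master (cron, an event, or pt-heartbeat) refreshes every few seconds:
-- On the master, a job keeps the row fresh, e.g.:
--   UPDATE ops.heartbeat SET ts = NOW() WHERE id = 1;
-- On the slave, the row's age is the real replication lag:
SELECT TIMESTAMPDIFF(SECOND, ts, NOW()) AS replication_lag_seconds
FROM ops.heartbeat
WHERE id = 1;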
When monitoring MySQL replication, seeing Slave_SQL_Running_State: System lock with significant delay (Seconds_Behind_Master: 402263 in this case) indicates a serious performance bottleneck. The slave is processing relay logs but spending excessive time waiting for internal locks.
From your SHOW SLAVE STATUS output, several critical points emerge:
- All tables are InnoDB (good for transactional consistency)
- EBS storage on AWS EC2 (potential I/O bottleneck)
- Significant replication lag (402263 seconds, roughly 4.7 days)
- No immediate SQL errors reported
- Relay log space consumption is high (19834085375 bytes, about 19.8 GB)
The "System lock" state typically occurs when:
- Single-threaded SQL applier gets blocked by large transactions
- Storage subsystem cannot keep up with write I/O
- Schema changes or DDL operations are being replicated
- InnoDB internal lock contention occurs
Immediate Actions
-- Check for long-running transactions:
SELECT * FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 1 HOUR;
-- Monitor InnoDB status:
SHOW ENGINE INNODB STATUS\G
-- Consider temporarily stopping replication to investigate:
STOP SLAVE;
START SLAVE; -- After investigation
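Stopping only the SQL thread is usually safer for investigation: the IO thread keeps pulling binlogs from the master, so you don't add network catch-up time on top of the apply backlog. A sketch:
-- Pause only the applier; the IO thread keeps downloading relay logs
STOP SLAVE SQL_THREAD;
-- ... inspect innodb_trx, SHOW ENGINE INNODB STATUS, the relay log event, etc. ...
-- Resume applying once done
START SLAVE SQL_THREAD;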
Configuration Optimizations
Add these to your my.cnf (adjust based on instance size):
[mysqld]
slave_parallel_workers = 4          # MySQL 5.6+; on 5.6 this parallelizes per database only
# slave_parallel_type = DATABASE    # variable exists only in MySQL 5.7+; 5.6 behaves this way implicitly
innodb_flush_log_at_trx_commit = 2 # For better I/O on slaves
innodb_buffer_pool_size = 12G # For r4.xlarge or larger
innodb_io_capacity = 2000 # For EBS optimized instances
innodb_read_io_threads = 16
innodb_write_io_threads = 16
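After restarting with the new settings (or applying the dynamic ones with SET GLOBAL), it is worth verifying what the server actually picked up. A quick check:
-- Confirm the running values match what was put in my.cnf
SHOW GLOBAL VARIABLES WHERE Variable_name IN (
    'slave_parallel_workers',
    'innodb_flush_log_at_trx_commit',
    'innodb_buffer_pool_size',
    'innodb_io_capacity',
    'innodb_read_io_threads',
    'innodb_write_io_threads'
);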
Using Performance Schema (MySQL 5.6+)
-- Check lock waits:
SELECT * FROM performance_schema.events_waits_current
WHERE EVENT_NAME LIKE '%lock%';
-- Identify slow transactions:
SELECT * FROM performance_schema.events_statements_history_long
WHERE SQL_TEXT IS NOT NULL
ORDER BY TIMER_WAIT DESC LIMIT 10;
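Note that events_statements_history_long is not collected by default; the consumer usually has to be enabled first. A sketch, assuming you have privileges to update the performance_schema setup tables:
-- Enable the long history consumers so the queries above return rows
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME IN ('events_statements_history_long', 'events_waits_history_long');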
EBS-Specific Optimizations
For Amazon EBS volumes:
- Use Provisioned IOPS (io1) volumes for database storage
- Ensure EBS optimization is enabled for the EC2 instance
- Consider RAID 0 across multiple EBS volumes for better throughput
- Monitor EBS burst balance and IOPS usage
One production case showed similar symptoms where a slave was stuck in "System lock" for hours. The solution involved:
1. Identifying a batch DELETE operation in the relay log
2. Increasing innodb_buffer_pool_size from 4G to 16G
3. Setting slave_parallel_workers = 8
4. Migrating to gp2 volumes sized for ~3000 baseline IOPS (gp2 provides 3 IOPS per GB, so roughly a 1 TB volume)
This reduced replication lag from 6+ hours to under 5 minutes.
Implement these checks:
-- Monitor the age of the oldest open transaction (a proxy for a stalled applier):
SELECT
    NOW() AS checked_at,
    SEC_TO_TIME(TIMESTAMPDIFF(SECOND,
        MIN(trx_started), NOW())) AS oldest_trx_age
FROM information_schema.innodb_trx;
-- Set up alerts for:
-- Seconds_Behind_Master > 300
-- Slave_SQL_Running_State = "System lock" for > 5 minutes
For persistent issues:
- Consider rebuilding the slave from a fresh snapshot
- Evaluate moving to Amazon RDS, which manages replica provisioning and monitoring for you
- Implement delayed replication for critical systems (a MASTER_DELAY sketch follows)
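Delayed replication is built into MySQL 5.6 via CHANGE MASTER TO MASTER_DELAY; the one-hour value below is illustrative:
-- Keep this replica intentionally N seconds behind, as a safety net against
-- bad statements on the master
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;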