Optimizing MySQL InnoDB Performance for High-Volume Insert Operations During Long-Running SELECT Queries


4 views

When dealing with a 300M+ row InnoDB table experiencing concurrent inserts and long-running SELECT queries, we're seeing transaction isolation characteristics in action. The processlist shows multiple transactions stuck in COMMIT state while one large query (ID 40) executes:

SELECT * FROM large_table 
WHERE (large_table.hotspot_id = 3000064)  
ORDER BY discovered_at 
LIMIT 799000, 1000

Contrary to MyISAM, InnoDB implements MVCC (Multi-Version Concurrency Control). However, several factors can still cause blocking:

  • Transaction Isolation Level: REPEATABLE READ (InnoDB default) creates read views that must be maintained
  • Undo Log Retention: Long-running queries prevent purge operations
  • Index Scans: The ORDER BY operation forces a filesort with large offsets

For your specific case with hotspot_id filtering, consider these optimizations:

-- Create optimized composite index
ALTER TABLE large_table 
ADD INDEX idx_hotspot_discovered (hotspot_id, discovered_at);

-- Rewrite the query to use indexed seek
SELECT * FROM large_table 
FORCE INDEX (idx_hotspot_discovered)
WHERE hotspot_id = 3000064
ORDER BY discovered_at
LIMIT 1000 OFFSET 799000;

For a write-heavy table with analytical queries:

  1. Read Replicas: Offload SELECT queries to replicas
  2. Partitioning: Range-partition by discovered_at
  3. Queue Processing: Buffer inserts using Redis/Kafka

While MyISAM seems tempting for its simpler locking model, consider:

Factor InnoDB MyISAM
Concurrent Inserts Possible with row-level locks Table locks during SELECT
Crash Safety ACID compliant Prone to corruption
Full-Text Search MySQL 5.6+ supported Historically better

Key my.cnf adjustments for high-volume inserts:

[mysqld]
innodb_buffer_pool_size = 12G  # 70-80% of available RAM
innodb_io_capacity = 2000      # For SSD storage
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_purge_threads = 4       # Helps with undo log cleanup
innodb_flush_neighbors = 0     # Disable for SSDs
innodb_flush_log_at_trx_commit = 2 # Trade durability for speed

For pagination queries, consider keyset pagination instead:

-- First page
SELECT * FROM large_table
WHERE hotspot_id = 3000064
ORDER BY discovered_at
LIMIT 1000;

-- Subsequent pages (using last seen value)
SELECT * FROM large_table
WHERE hotspot_id = 3000064 
AND discovered_at > '2023-01-01 12:00:00'
ORDER BY discovered_at
LIMIT 1000;

When working with large InnoDB tables (300M+ rows in this case), we expect SELECT queries to coexist peacefully with INSERT operations due to InnoDB's MVCC architecture. However, the processlist shows multiple COMMIT operations stuck waiting while a large SELECT executes:

|  40 | user | localhost | dbname | Query |   33 | Sorting result | 
SELECT * FROM large_table WHERE hotspot_id = 3000064 ORDER BY discovered_at LIMIT 799000, 1000

The InnoDB status reveals several critical indicators:

SEMAPHORES:
RW-shared spins 2119697, OS waits 1037149
Modified db pages 1439
Buffer pool hit rate 1000 / 1000

Immediate Workarounds

For the specific query shown, implement pagination optimization:

-- Instead of:
SELECT * FROM large_table WHERE hotspot_id = 3000064 ORDER BY discovered_at LIMIT 799000, 1000

-- Use keyset pagination:
SELECT * FROM large_table 
WHERE hotspot_id = 3000064 AND discovered_at > '2023-01-01 00:00:00'
ORDER BY discovered_at LIMIT 1000

Architectural Changes

Consider these schema modifications for the large table:

-- Add covering index
ALTER TABLE large_table ADD INDEX idx_cover (hotspot_id, discovered_at, device_address);

-- Partition by hotspot_id
ALTER TABLE large_table PARTITION BY RANGE (hotspot_id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2000000),
    PARTITION p2 VALUES LESS THAN (3000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

Critical my.cnf adjustments for heavy-write workloads:

[mysqld]
innodb_buffer_pool_size = 12G         # 50-70% of available RAM
innodb_io_capacity = 2000             # For SSD storage
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_neighbors = 0            # Disable for SSD
innodb_purge_threads = 4
innodb_adaptive_hash_index = OFF      # For large tables

While MyISAM seems tempting, consider these tradeoffs:

-- MyISAM creates table-level locks during:
--   SELECT + INSERT scenarios
--   Heavy read workloads with concurrent writes

-- TokuDB alternative (if available):
ALTER TABLE large_table ENGINE=TokuDB;
SET GLOBAL tokudb_cache_size=4G;

Essential queries for diagnosing similar issues:

-- Check for lock waits
SELECT * FROM performance_schema.events_waits_current 
WHERE EVENT_NAME LIKE '%innodb%lock%';

-- Monitor InnoDB history list length
SHOW ENGINE INNODB STATUS\G