Optimizing Slow MySQL Queries on Amazon RDS: Performance Comparison with Local Development Environment


When migrating from a local MySQL instance to Amazon RDS, many developers encounter unexpected performance degradation, especially with complex queries that ran smoothly on development machines. A query that executes in 200ms locally but takes 1300ms on a large RDS instance (db.m6g.8xlarge in the case described below) is particularly puzzling.

Before diving into solutions, let's examine the critical factors that could cause this performance gap:

-- Example diagnostic queries to run on both environments
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW ENGINE INNODB STATUS;
EXPLAIN ANALYZE [your_problem_query];
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
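The two Innodb_buffer_pool_read% counters are easier to compare as a single hit ratio. The following is a minimal sketch, assuming Performance Schema is enabled so that performance_schema.global_status is readable; the column aliases are illustrative. The counters are cumulative since server start, so a freshly restarted instance will report a low ratio until the buffer pool warms up.

```sql
-- Buffer pool hit ratio: share of logical reads served from memory.
-- Innodb_buffer_pool_read_requests = logical reads
-- Innodb_buffer_pool_reads         = reads that had to go to disk
SELECT
  CAST(req.VARIABLE_VALUE AS UNSIGNED) AS logical_reads,
  CAST(rd.VARIABLE_VALUE AS UNSIGNED)  AS disk_reads,
  ROUND(
    100 * (1 - CAST(rd.VARIABLE_VALUE AS UNSIGNED)
               / NULLIF(CAST(req.VARIABLE_VALUE AS UNSIGNED), 0)),
    2
  ) AS hit_pct
FROM performance_schema.global_status AS req
JOIN performance_schema.global_status AS rd
  ON req.VARIABLE_NAME = 'Innodb_buffer_pool_read_requests'
 AND rd.VARIABLE_NAME  = 'Innodb_buffer_pool_reads';
```

Run it in both environments after executing the problem query a few times. A noticeably lower hit_pct on RDS suggests the query is reading from EBS rather than from memory.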

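The physical distance between your application and the RDS instance adds network overhead to every round trip. From a client outside the instance's AWS region this can account for 50-300ms depending on location; from a client in the same region it is typically far lower. Test with:

-- Time this from the client: the server does almost no work,
-- so the elapsed time approximates one network round trip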
SELECT 1 FROM DUAL;
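To separate network time from server time for the real query, the server-side duration can be read back from Performance Schema and compared with the wall-clock time the client measured. This is a sketch, assuming Performance Schema and its events_statements_history consumer are enabled (worth verifying on your instance) and MySQL 8.0.16 or later for PS_CURRENT_THREAD_ID().

```sql
-- Server-side duration of the most recent statements in this session.
-- TIMER_WAIT and LOCK_TIME are reported in picoseconds; divide by 1e9 for ms.
SELECT
  EVENT_ID,
  LEFT(SQL_TEXT, 60)         AS sql_text,
  ROUND(TIMER_WAIT / 1e9, 2) AS server_ms,
  ROUND(LOCK_TIME / 1e9, 2)  AS lock_ms
FROM performance_schema.events_statements_history
WHERE THREAD_ID = PS_CURRENT_THREAD_ID()
ORDER BY EVENT_ID DESC
LIMIT 5;
```

If the client sees 1300ms but server_ms is much smaller, the gap is network transfer and client overhead rather than query execution.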

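Even with "the same" configuration, RDS differs from a local install in several important ways:

  • Effective parameter values may differ. For example, innodb_flush_neighbors defaults to 0 in MySQL 8.0 (it was 1 in 5.7), so compare the actual values on both sides instead of assuming them
  • Storage IOPS and throughput are allocated and capped per volume, unlike a local SSD
  • A virtualized instance with network-attached EBS storage behaves differently from a laptop with a directly attached disk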

1. Query Tuning

Rewrite problematic queries to use covering indexes:

-- Before
SELECT * FROM orders WHERE user_id = 100 AND status = 'completed';

-- After (with composite index on (user_id, status))
SELECT id, user_id, status FROM orders 
WHERE user_id = 100 AND status = 'completed';
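The composite index mentioned above could be created and verified as follows. This is a sketch: the index name idx_orders_user_status is hypothetical, and it assumes id is the primary key of orders, which InnoDB stores implicitly in every secondary index.

```sql
-- Hypothetical index name; assumes orders.id is the primary key
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- Verify that the query is answered from the index alone
EXPLAIN
SELECT id, user_id, status
FROM orders
WHERE user_id = 100 AND status = 'completed';
-- A covering read shows "Using index" in the Extra column
```

Because the index then contains every column the query selects, MySQL does not need to visit the table rows, which matters most when those rows are not already in the buffer pool.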

2. RDS Parameter Groups

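Consider adjusting these parameters in a custom DB parameter group. Treat the values as starting points to test against your own storage limits, not universal recommendations. The two I/O thread settings are static, so they only take effect after a reboot: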

innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_neighbors = 0
innodb_read_io_threads = 8
innodb_write_io_threads = 8
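After applying a parameter group (and rebooting for static parameters), it helps to confirm what the server is actually using. A small sketch, assuming Performance Schema is enabled; the same query can be run locally to compare the two environments side by side.

```sql
-- Effective values of the I/O-related settings discussed above
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.global_variables
WHERE VARIABLE_NAME IN (
  'innodb_io_capacity',
  'innodb_io_capacity_max',
  'innodb_flush_neighbors',
  'innodb_read_io_threads',
  'innodb_write_io_threads',
  'innodb_buffer_pool_size'
)
ORDER BY VARIABLE_NAME;
```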

3. Storage Optimization

For IO-bound workloads, provisioned IOPS (io1) storage often performs better than gp2:

-- Check cumulative I/O per InnoDB data file (totals since server start, not a rate)
SELECT * FROM sys.io_global_by_file_by_bytes 
WHERE file LIKE '%ibdata%' OR file LIKE '%.ibd';

When all else fails, consider these architectural changes:

  • Implement read replicas for reporting queries
  • Use ElastiCache for query result caching
  • Migrate to Aurora MySQL for better concurrency

Establish performance baselines with these tools:

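-- Performance Schema is enabled through the DB parameter group
-- (performance_schema = 1, a static parameter that needs a reboot)
-- or by turning on Performance Insights. Confirm that it is on:
SHOW VARIABLES LIKE 'performance_schema';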

-- Sample monitoring query
SELECT event_name, count_star, avg_timer_wait/1000000000 as avg_ms
FROM performance_schema.events_waits_summary_global_by_event_name
ORDER BY sum_timer_wait DESC LIMIT 10;
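Wait events show where the server spends time overall; a per-statement baseline is often more directly useful when chasing one slow query. The following sketch uses the statement digest summary table, assuming Performance Schema is enabled with statement digests collected; the aliases are illustrative.

```sql
-- Top statements in the current schema by total time.
-- Timer columns are in picoseconds; divide by 1e9 for ms.
SELECT
  LEFT(DIGEST_TEXT, 80)          AS query_pattern,
  COUNT_STAR                     AS executions,
  ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_ms,
  ROUND(SUM_TIMER_WAIT / 1e9, 2) AS total_ms,
  SUM_ROWS_EXAMINED              AS rows_examined,
  SUM_CREATED_TMP_DISK_TABLES    AS tmp_disk_tables
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = DATABASE()
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

Capturing this output before and after each change gives a like-for-like comparison that does not depend on client-side timing.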

Recently I encountered a puzzling scenario where our production MySQL query that executes in 200ms on my local MacBook (16GB RAM, M1 Pro) takes 1300ms on Amazon RDS using db.m6g.8xlarge (32 vCPUs, 128GB RAM). Both environments:

  • Run MySQL 8.0.28
  • Have identical schema and indexes
  • Have no query cache (it was removed in MySQL 8.0)
  • Contain the same dataset (~5GB)

After thorough investigation, several key factors emerged:


-- The problematic query (simplified):
SELECT o.order_id, c.customer_name, p.product_name 
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 100;

Using EXPLAIN ANALYZE revealed:

Metric            Local        RDS
Execution Time    ~200ms       ~1300ms
Network Latency   0ms          ~150ms (roundtrip)
IOPS              Local SSD    EBS gp3 (3000 IOPS baseline)
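For reference, this kind of comparison can be reproduced by running the same statement under EXPLAIN ANALYZE in each environment (available from MySQL 8.0.18). Note that EXPLAIN ANALYZE actually executes the query and reports server-side timings only, so network latency has to be measured separately from the client.

```sql
-- Executes the query and prints actual time, rows, and loops per plan step
EXPLAIN ANALYZE
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 100;
```

Running it twice in a row helps distinguish a cold buffer pool (first run dominated by disk reads) from a plan that is slow even when the data is cached.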

1. RDS Parameter Tuning:


-- Modified RDS parameter group:
innodb_buffer_pool_size = 12G  # Was 8G
innodb_io_capacity = 2000      # Was 300
innodb_flush_neighbors = 0     # For SSD storage
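Whether a larger buffer pool helps depends on how the data size compares to it. A quick sketch for checking both figures, using approximate sizes from information_schema (InnoDB statistics are estimates, so treat the result as a rough guide):

```sql
-- Approximate data + index size per application schema
SELECT
  TABLE_SCHEMA,
  ROUND(SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'sys', 'performance_schema', 'information_schema')
GROUP BY TABLE_SCHEMA;

-- Configured buffer pool size for comparison
SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 2) AS buffer_pool_gb;
```

When the working set fits in the buffer pool, repeated runs should be served from memory, and any remaining gap is more likely due to the plan or the network than to storage.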

2. Query Refactoring:


-- Optimized version (MAX_EXECUTION_TIME aborts the SELECT after 500 ms; it is a guard, not a speed-up):
SELECT /*+ MAX_EXECUTION_TIME(500) */ 
  o.order_id, c.customer_name, p.product_name 
FROM orders o FORCE INDEX (idx_status_created)
STRAIGHT_JOIN customers c ON o.customer_id = c.customer_id
STRAIGHT_JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 100;
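The FORCE INDEX hint only helps if idx_status_created matches the filter and sort order. The index definition is not shown here, so the sketch below assumes it covers (status, created_at); check the real definition first and create the index only if it is missing.

```sql
-- Inspect the existing definition, if any
SHOW INDEX FROM orders WHERE Key_name = 'idx_status_created';

-- Assumed definition: equality column first, then the ORDER BY column
CREATE INDEX idx_status_created ON orders (status, created_at);

-- Check the plan without hints before resorting to FORCE INDEX
EXPLAIN
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 100;
```

With such an index, the plan for orders should avoid "Using filesort", since the rows for one status can be read in created_at order and the scan can stop after 100 matches. If the optimizer already picks the index on its own, the hints are better left out so the plan can adapt as the data changes.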

For latency-sensitive applications:

  • Consider RDS Proxy for connection pooling
  • Evaluate Aurora MySQL for better I/O performance
  • Implement read replicas for reporting queries
  • Use ElastiCache for caching frequent queries

After optimizations:

Original RDS: 1300ms → Optimized: 350ms
Network overhead: 150ms → 50ms (using RDS Proxy)
IOPS utilization: 90% → 40%

The key lesson: Cloud databases require different tuning approaches than local development environments.