Optimizing MySQL for 192 Trillion Records: Indexing, Partitioning, and Alternative Storage Solutions



When dealing with 192 trillion records (or even 26 quadrillion in the alternative scenario), traditional database approaches hit fundamental limitations. For a sense of scale: 192 trillion 8-byte keys alone occupy roughly 1.5 PB before any row, page, or index overhead. The core issues become:

  • Index storage overhead (typically 10-30% of table size)
  • B-tree traversal depth at scale
  • Memory pressure for query execution plans

For your current schema, consider these adjustments:

-- Revised table structure with composite indexes
CREATE TABLE ref (
  id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  rel_id BIGINT(20) UNSIGNED NOT NULL,
  p1 BIGINT(20) UNSIGNED NOT NULL,
  p2 BIGINT(20) UNSIGNED DEFAULT NULL,
  p3 BIGINT(20) UNSIGNED DEFAULT NULL,
  s BIGINT(20) UNSIGNED NOT NULL,
  p4 BIGINT(20) UNSIGNED DEFAULT NULL,
  p5 BIGINT(20) UNSIGNED DEFAULT NULL,
  p6 BIGINT(20) UNSIGNED DEFAULT NULL,
  PRIMARY KEY (id),
  -- Serves the hot lookup (rel_id + p3 + p4) and covers s, avoiding row lookups
  KEY composite_lookup (rel_id,p3,p4,s),
  -- Covers every column; note it largely duplicates InnoDB's clustered PK,
  -- so weigh its storage cost at this scale
  KEY covering_idx (id,rel_id,p1,p2,p3,p4,p5,p6)
) ENGINE=InnoDB
-- Hash partitioning spreads writes evenly; id lookups prune to one partition
PARTITION BY HASH(id)
PARTITIONS 1024;
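
With HASH(id) partitioning, an equality predicate on id prunes to a single partition. A quick way to confirm pruning (the literal id is just an example):

EXPLAIN SELECT rel_id, p1, p2 FROM ref WHERE id = 12345;
-- The partitions column of the plan should name exactly one partition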

For the single-column bigint scenario, consider these alternatives:

// RocksDB configuration example
#include <rocksdb/db.h>
#include <string>

rocksdb::Options options;
options.create_if_missing = true;
options.OptimizeForPointLookup(10);      // block cache size in MB, tuned for point reads
options.IncreaseParallelism(16);         // background flush/compaction threads
options.OptimizeLevelStyleCompaction();

rocksdb::DB* db;
rocksdb::Status status = rocksdb::DB::Open(options, "/path/to/db", &db);

// Insertion: the key itself carries the information, so the value stays empty
std::string key = "0000000000000001";    // in practice, an 8-byte encoded id
status = db->Put(rocksdb::WriteOptions(), key, "");

// Existence check: ok() means present, IsNotFound() means absent
std::string value;
status = db->Get(rocksdb::ReadOptions(), key, &value);
bool exists = status.ok();

Based on your access patterns:

  • Composite lookups dominate (rel_id + p3 + p4, 90%+ of queries)
  • Point lookups by primary key (WHERE id=?)
  • Bulk inserts arrive in bursts (see the batching sketch below)
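
For the insert bursts, multi-row VALUES lists cut per-statement overhead substantially. A minimal sketch against the revised schema; the literal values are placeholders:

INSERT INTO ref (rel_id, p1, p2, p3, s, p4, p5, p6)
VALUES
  (1, 10, 20, 30, 40, 50, 60, 70),
  (1, 11, 21, 31, 41, 51, 61, 71),
  (2, 12, 22, 32, 42, 52, 62, 72);
-- Batches of a few thousand rows per statement are a common starting point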

At this scale, hardware becomes part of the solution:

Component   Recommendation
---------   --------------
Storage     NVMe SSDs in RAID 10
Memory      512GB+ for the InnoDB buffer pool
CPU         High clock speed (>3.5GHz) for single-threaded MySQL operations
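
On MySQL 8.0 the buffer pool can be sized without a config-file edit; a minimal sketch, assuming the 512GB figure above on a dedicated server (value in bytes):

SET PERSIST innodb_buffer_pool_size = 549755813888;   -- 512GB, resizable at runtime
SET PERSIST_ONLY innodb_buffer_pool_instances = 16;   -- read-only; applies after restart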

For a phased approach:

  1. Benchmark with 1B records using different storage engines (a data-generation sketch follows this list)
  2. Implement partitioning based on access patterns
  3. Consider sharding if single-node performance limits are hit
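
For step 1, synthetic rows can be generated server-side. A minimal sketch, assuming the revised schema above and MySQL 8.0's recursive CTEs; run it repeatedly from the client to approach 1B rows:

-- One batch of 1M synthetic rows
SET SESSION cte_max_recursion_depth = 1000000;
INSERT INTO ref (rel_id, p1, p2, p3, s, p4, p5, p6)
WITH RECURSIVE seq (n) AS (
  SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT n % 100000, n, n, n % 1000, n, n % 1000, NULL, NULL FROM seq;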

When dealing with tables containing 192 trillion records, we're operating at a scale that requires rethinking conventional database approaches. The reference table structure shows a typical relational pattern:

CREATE TABLE ref (
  id INTEGER(13) NOT NULL AUTO_INCREMENT,
  rel_id INTEGER(13) NOT NULL,
  p1 INTEGER(13) NOT NULL,
  p2 INTEGER(13) DEFAULT NULL,
  p3 INTEGER(13) DEFAULT NULL,
  s INTEGER(13) NOT NULL,
  p4 INTEGER(13) DEFAULT NULL,
  p5 INTEGER(13) DEFAULT NULL,
  p6 INTEGER(13) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY (s),
  KEY (rel_id),
  KEY (p3),
  KEY (p4)
);

The workload consists of three primary operations:

-- Frequent lookup (90%+ of operations)
SELECT id, s FROM ref WHERE rel_id="$rel_id" AND p3="$p3" AND p4="$p4"

-- Primary key access
SELECT rel_id, p1, p2, p3, p4, p5, p6 FROM ref WHERE id="$id"

-- Batch inserts (occasional)
INSERT INTO ref (rel_id, p1, p2, p3, s, p4, p5, p6)
VALUES ("$rel_id", "$p1", "$p2", "$p3", "$s", "$p4", "$p5", "$p6")

At this scale, the default InnoDB storage engine may not be optimal. Consider these alternatives (note that TokuDB is deprecated in current Percona Server releases, with MyRocks as its usual successor):

-- TokuDB for better compression and write performance
ALTER TABLE ref ENGINE=TokuDB;

-- Or MyRocks for SSD-optimized storage
SET default_storage_engine=rocksdb;
CREATE TABLE ref (...) ENGINE=ROCKSDB;
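
Neither engine ships with stock MySQL (TokuDB came with Percona Server, MyRocks with Percona Server and MariaDB), so check availability before converting:

SHOW ENGINES;
-- ROCKSDB must be listed with Support = YES for the statements above to work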

Horizontal partitioning is essential, and beyond native partitions that means application-level sharding across tables. For the first approach (192T records):

-- Shard by rel_id, modulo the shard count
CREATE TABLE ref_0 LIKE ref;
CREATE TABLE ref_1 LIKE ref;
-- ... up to N shards

-- Application-level routing logic
function get_shard(rel_id) {
    return "ref_" + (rel_id % 100);
}
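
With this routing, a lookup for rel_id 4217 resolves to shard ref_17 (4217 % 100 = 17); the literals below are illustrative:

SELECT id, s FROM ref_17 WHERE rel_id=4217 AND p3=678 AND p4=910;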

For the second approach (26Q records as BigInt keys):

-- Ultra-simple schema for existence checks
CREATE TABLE bigint_store (
    id BIGINT(20) UNSIGNED PRIMARY KEY
) ENGINE=ROCKSDB
PARTITION BY KEY(id) PARTITIONS 1024;
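
Usage is then a covered primary-key probe plus duplicate-tolerant inserts; a sketch with a placeholder id:

-- Existence check
SELECT 1 FROM bigint_store WHERE id = 12345 LIMIT 1;

-- Insert-if-absent without erroring on duplicates
INSERT IGNORE INTO bigint_store (id) VALUES (12345);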

For pure existence checks, specialized systems may outperform MySQL. Mind the memory cost, though: a Bloom filter at a 1% error rate needs roughly 9.6 bits per element, so 1 trillion keys is on the order of 1.2 TB of RAM:

// Redis with the RedisBloom module (node-redis client)
redisClient.bf.reserve('exists', 0.01, 1000000000000);

-- ClickHouse for analytical workloads
CREATE TABLE ref_uint64 (id UInt64)
ENGINE = MergeTree()
ORDER BY id;
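
ClickHouse favors scans over point reads, but an existence probe is still straightforward, and the ORDER BY id sorting key keeps it cheap:

SELECT count() > 0 AS found FROM ref_uint64 WHERE id = 12345;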

Early benchmarks with 100M rows show promising results (0.0004s queries), but extrapolating to trillion-scale requires:

-- Test with representative data volumes
EXPLAIN ANALYZE 
SELECT id, s FROM ref 
WHERE rel_id=12345 AND p3=678 AND p4=910;

-- Monitor with performance_schema (digest_text normalizes literals to ?)
SELECT * FROM performance_schema.events_statements_summary_by_digest
WHERE digest_text LIKE '%WHERE `rel_id` = ?%';