High IO Wait on EXT4 jbd2 During MySQL Commits in Python Multiprocessing Indexer – Optimization Guide



When implementing a document indexer with Python multiprocessing (4 parallel workers), we hit a severe performance wall. Each worker:

# Simplified per-worker loop; cursor/connection are this worker's own MySQL handles
def process_document(doc):
    text = extract_text(doc)
    cursor.execute("INSERT INTO documents VALUES (...)")
    connection.commit()  # THIS is where hell breaks loose

System monitoring shows jbd2 (EXT4 journaling daemon) pegged at 99.9% IO, forcing CPU stalls during every MySQL commit operation.

EXT4's journal ensures filesystem consistency but creates significant overhead for:

  • Small, frequent transactions (exactly what we're doing)
  • Metadata-heavy operations (database commits qualify)
  • Concurrent writers (our 4 processes)
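
To put a number on that first point before changing anything, here is a rough timing sketch (not from the original post). It assumes a PyMySQL connection with hypothetical credentials and a throwaway table named commit_test, and compares per-row commits against committing every 100 rows:

import time
import pymysql  # assumption: PyMySQL driver; any DB-API driver works the same way

# Hypothetical credentials and test table:
# CREATE TABLE commit_test (id INT AUTO_INCREMENT PRIMARY KEY, payload TEXT)
connection = pymysql.connect(host="localhost", user="indexer",
                             password="...", database="docs")

def time_inserts(rows, commit_every):
    """Time `rows` single-row INSERTs, committing every `commit_every` rows."""
    cursor = connection.cursor()
    start = time.perf_counter()
    for i in range(1, rows + 1):
        cursor.execute("INSERT INTO commit_test (payload) VALUES (%s)", ("x",))
        if i % commit_every == 0:
            connection.commit()
    connection.commit()  # flush any final partial batch
    return time.perf_counter() - start

print(f"commit per row : {time_inserts(1000, 1):.2f}s")
print(f"commit per 100 : {time_inserts(1000, 100):.2f}s")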

The nuclear option (barrier=0) disables write barriers entirely:

/etc/fstab
UUID=... / ext4 defaults,barrier=0,noatime 0 1

DANGER: Without UPS, this risks filesystem corruption during power loss. With UPS? Probably safe.

Before going nuclear, try these:

1. Batch Commits

def worker(docs):
    for i, doc in enumerate(docs, start=1):
        cursor.execute(...)
        if i % 100 == 0:  # Commit every 100 docs
            connection.commit()
    connection.commit()  # Flush whatever is left in the final partial batch
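
A variant worth trying (not in the original post) is to buffer rows in a plain list and hand each batch to executemany, which cuts client/server round trips as well as commits. A sketch, assuming a DB-API driver such as PyMySQL and a hypothetical (id, body) column layout:

def worker(docs, connection, batch_size=100):
    cursor = connection.cursor()
    batch = []
    for doc in docs:
        batch.append((doc.id, extract_text(doc)))  # hypothetical column layout
        if len(batch) >= batch_size:
            cursor.executemany(
                "INSERT INTO documents (id, body) VALUES (%s, %s)", batch)
            connection.commit()  # one redo/journal flush per batch, not per row
            batch.clear()
    if batch:  # don't lose the final partial batch
        cursor.executemany(
            "INSERT INTO documents (id, body) VALUES (%s, %s)", batch)
        connection.commit()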

2. Tune Journal Settings

# Lengthen the ext4 journal commit interval (default 5 s) so metadata
# updates are batched into fewer journal flushes
mount -o remount,commit=60 /

# Set an explicit journal size (requires removing and re-creating the
# journal, e.g. tune2fs -O ^has_journal first, on an unmounted filesystem)
tune2fs -J size=1024 /dev/sdX

3. Filesystem Alternatives

If you can reformat the data volume, XFS often handles concurrent, metadata-heavy write patterns like this one better than ext4:

# WARNING: mkfs destroys all existing data on the device
mkfs.xfs -f -l size=1024m /dev/sdX

When using InnoDB:

[mysqld]
innodb_flush_log_at_trx_commit = 2  # Trade durability for speed
innodb_doublewrite = 0             # If you're REALLY brave
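
After restarting mysqld it's worth confirming the settings actually took effect. A minimal check from Python, assuming an open PyMySQL cursor:

# Print the InnoDB flush settings currently in effect
for name in ("innodb_flush_log_at_trx_commit", "innodb_doublewrite"):
    cursor.execute("SHOW VARIABLES LIKE %s", (name,))
    print(cursor.fetchone())  # e.g. ('innodb_flush_log_at_trx_commit', '2')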

Verify improvements with:

iostat -x 1
# Look for %util & await on your data disk

Now for the full write-up. When scaling up our Python document indexer from single-process to multiprocessing (4 parallel workers), we encountered severe performance degradation during MySQL commits. The surprising culprit? EXT4's journaling daemon (jbd2) maxing out at 99% IO utilization, causing CPU stalls.

Here's what happens under the hood:

# Python worker pseudocode
def process_document(doc):
    text = extract_text(doc)
    cursor.execute("INSERT INTO documents VALUES (...)")
    connection.commit()  # ← IO storm begins

Each commit forces:

  1. An InnoDB redo-log flush: innodb_flush_log_at_trx_commit=1 (the default) syncs the log for full ACID durability
  2. EXT4 journal writes: journaled metadata is written twice, once to the journal and again when checkpointed to its final location
  3. fsync() calls that block until everything reaches stable storage
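
The fsync() in step 3 usually dominates, because every commit has to wait for it. A small stdlib-only sketch (the file path is an assumption; point it at the same device as your MySQL datadir) measures the raw fsync latency floor of that disk:

import os
import time

# Assumption: /var/lib/mysql lives on the disk you care about
path = "/var/lib/mysql/fsync_probe.tmp"
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
try:
    samples = []
    for _ in range(100):
        os.write(fd, b"x" * 512)
        start = time.perf_counter()
        os.fsync(fd)  # force the write (and the journal update) to stable storage
        samples.append(time.perf_counter() - start)
    samples.sort()
    print(f"median fsync: {samples[len(samples) // 2] * 1000:.2f} ms")
finally:
    os.close(fd)
    os.unlink(path)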

Before making filesystem changes, verify with:

# Check IO wait distribution
iotop -oP

# Monitor jbd2 specifically (a "D" state means it is blocked waiting on IO)
watch -n 1 'ps -eo pid,comm,stat | grep jbd2'

# Measure actual commit latency
import time
start = time.time()
conn.commit()
print(f"Commit took {time.time()-start:.4f}s")

Option 1: Filesystem Mount Tweaks

Add these to /etc/fstab (a reboot is the simplest way to apply them, since data= cannot be changed on a live remount):

/dev/sda1  /  ext4  defaults,barrier=0,data=writeback  0  1

Tradeoffs:

  • barrier=0: Disables write barriers (OK with UPS)
  • data=writeback: Journal only metadata
  • Risk: Potential corruption on power loss
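
To confirm the new options actually applied after rebooting, you can read the live mount options back. A small sketch; note that the exact spelling of the barrier flag varies by kernel version:

# Print the options currently in effect for the root ext4 mount.
# Look for data=writeback and a barrier-related flag (barrier=0 / nobarrier).
with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype, options, *_ = line.split()
        if mountpoint == "/" and fstype == "ext4":
            print(options)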

Option 2: MySQL Configuration

[mysqld]
innodb_flush_log_at_trx_commit=2  # Don't flush every transaction
innodb_flush_method=O_DIRECT      # Bypass OS cache
innodb_doublewrite=0              # Careful! Disables crash safety
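
Of these, innodb_flush_log_at_trx_commit is a dynamic variable, so you can A/B-test it without restarting the server (the other two generally require a restart). A sketch, assuming a connection with privileges to set global variables; run_indexing_benchmark is a placeholder for your own timed run:

# Temporarily relax per-commit flushing for a benchmark run, then restore the default.
cursor.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2")
try:
    run_indexing_benchmark()  # placeholder: your timed indexing run
finally:
    cursor.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 1")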

Option 3: Batch Processing Pattern

Our final implementation reduced commits by 90%:

from queue import Queue

class BatchWriter:
    def __init__(self, connection, batch_size=1000):
        self.connection = connection
        self.queue = Queue()
        self.batch_size = batch_size

    def add_document(self, doc):
        self.queue.put(doc)
        if self.queue.qsize() >= self.batch_size:
            self.flush()

    def flush(self):
        # Drain the buffered docs inside a single transaction: one commit per batch
        with self.connection.cursor() as cur:
            while not self.queue.empty():
                doc = self.queue.get()
                cur.execute("INSERT ...", doc)
        self.connection.commit()
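
A usage sketch for the class above (connect_to_mysql, load_documents, split_into_chunks and all_doc_paths are placeholders, not from the original post). The important detail is the final flush() so the last partial batch is not lost:

from multiprocessing import Pool

def index_worker(doc_paths):
    connection = connect_to_mysql()        # placeholder: one connection per process
    writer = BatchWriter(connection, batch_size=1000)
    for doc in load_documents(doc_paths):  # placeholder: yields parsed documents
        writer.add_document(doc)
    writer.flush()                         # write the final partial batch
    connection.close()

if __name__ == "__main__":
    chunks = split_into_chunks(all_doc_paths, 4)  # placeholder: one chunk per worker
    with Pool(processes=4) as pool:
        pool.map(index_worker, chunks)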

Approach     Documents/sec   jbd2 IO%
Original     42              99%
barrier=0    210             75%
Batching     380             15%