Diagnosing and Mitigating Redis CPU Spikes: Advanced Troubleshooting and Resource Limitation Techniques



When Redis consistently hits high CPU utilization (100%+), it typically indicates one of these scenarios:

  1. Complex operations blocking the event loop (KEYS *, long-running Lua scripts)
  2. Excessive client connections overwhelming single-threaded processing
  3. Frequent persistence operations (BGSAVE/AOF rewrite)
  4. Memory pressure leading to swap usage

First, enable proper logging in redis.conf:

loglevel verbose
logfile /var/log/redis/redis-server.log

Essential diagnostic commands when Redis is responsive:

redis-cli --latency-history
redis-cli info cpu
redis-cli slowlog get 25
redis-cli client list
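
The counters reported by `info cpu` are cumulative, so a single reading says little; the difference between two readings gives utilization over the interval. A minimal Python sketch of that calculation, assuming the text output of `redis-cli info cpu` is available as a string (the helper names `parse_info` and `cpu_utilization` are made up for illustration):

```python
def parse_info(text):
    """Parse 'key:value' lines from INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# CPU"
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def cpu_utilization(before, after, elapsed_seconds):
    """Return server CPU use (sys + user) as a fraction of one core."""
    def total(sample):
        return float(sample["used_cpu_sys"]) + float(sample["used_cpu_user"])
    return (total(after) - total(before)) / elapsed_seconds


# Two hand-written samples taken 10 seconds apart
first = parse_info("# CPU\nused_cpu_sys:10.0\nused_cpu_user:20.0\n")
second = parse_info("# CPU\nused_cpu_sys:12.5\nused_cpu_user:25.0\n")
print(cpu_utilization(first, second, 10.0))  # 0.75
```

A value near 1.0 suggests that one core is saturated, which is the ceiling for command execution on Redis's main thread.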

While Redis doesn't natively support CPU throttling, these OS-level solutions work:

Using cgroups (Linux):

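# Create cgroup (cgroup v1 paths; cgcreate/cgclassify come from the libcgroup tools)
sudo cgcreate -g cpu:/redis-limited

# Set CPU limit (50% of a single core)
# A plain "echo ... > file" redirect is not covered by sudo, so pipe through tee
echo 50000 | sudo tee /sys/fs/cgroup/cpu/redis-limited/cpu.cfs_quota_us
echo 100000 | sudo tee /sys/fs/cgroup/cpu/redis-limited/cpu.cfs_period_us

# Apply to the Redis process
sudo cgclassify -g cpu:redis-limited $(pgrep redis-server)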
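
The two values are related by quota = cores × period. As a small sketch of that arithmetic (the function name is hypothetical), this computes the `cpu.cfs_quota_us` value for a target share of CPU:

```python
def cfs_quota_us(cores, period_us=100_000):
    """Return the cpu.cfs_quota_us value that caps a cgroup at `cores` CPUs.

    The kernel lets the group run for `quota` microseconds in every period,
    so 0.5 cores with a 100 ms period means 50 ms of CPU time per period.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return int(round(cores * period_us))


print(cfs_quota_us(0.5))  # 50000, the value used above
print(cfs_quota_us(2))    # 200000, two full cores
```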

Using cpulimit:

sudo cpulimit -l 50 -p $(pgrep redis-server)

For queue systems experiencing periodic bursts:

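import random
import time
import redis

r = redis.Redis()  # assumes a local instance on the default port

# Retry with jittered backoff while Redis is loading or timing out under load
def push_to_queue(item, max_retries=5):
    for _ in range(max_retries):
        try:
            return r.rpush('queue', item)
        except (redis.exceptions.BusyLoadingError, redis.exceptions.TimeoutError):
            time.sleep(random.uniform(0.1, 0.5))
    raise RuntimeError('could not push to queue after retries')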
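
Retries with a fixed delay range can still arrive in waves when many clients fail at once. One common alternative is capped exponential backoff with jitter. The sketch below only computes the delays; `backoff_delays` and its parameters are illustrative names, not part of redis-py:

```python
import random


def backoff_delays(max_retries=5, base=0.1, cap=5.0, rng=random.random):
    """Yield one sleep duration per retry using 'full jitter' backoff.

    The upper bound doubles on each attempt (base, 2*base, 4*base, ...)
    up to `cap`; the actual delay is drawn uniformly below that bound.
    """
    for attempt in range(max_retries):
        upper = min(cap, base * (2 ** attempt))
        yield rng() * upper


# With a fixed "random" source the upper bounds become visible
print(list(backoff_delays(max_retries=4, rng=lambda: 1.0)))
# [0.1, 0.2, 0.4, 0.8]
```

Each yielded value would be passed to `time.sleep()` before the next attempt.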

With the Redis exporter for Prometheus running, alert on sustained CPU usage:

# Sample Prometheus alert rule
- alert: RedisHighCPU
  expr: rate(redis_cpu_sys_seconds_total[1m]) + rate(redis_cpu_user_seconds_total[1m]) > 0.9
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Redis CPU usage critically high on {{ $labels.instance }}"
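
The 0.9 threshold is easier to reason about once it is clear that the rate of a CPU-seconds counter is a fraction of one core. A rough sketch of that arithmetic follows; it ignores the counter-reset handling and extrapolation that Prometheus `rate()` performs, and the function name is illustrative:

```python
def per_second_rate(samples):
    """Approximate a rate over (timestamp_seconds, counter_value) samples.

    Uses only the first and last sample in the window; a result of 0.9
    means the process consumed 0.9 CPU-seconds per wall-clock second.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (v1 - v0) / (t1 - t0)


# Hand-written scrape results, one every 15 seconds
window = [(0, 100.0), (15, 113.5), (30, 127.0), (45, 140.5), (60, 154.0)]
print(per_second_rate(window))  # 0.9
```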

For systems requiring strict CPU control:

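# Example using Redis Streams with capped processing
last_id = '$'  # start with entries added after this consumer starts
while True:
    # Process at most 100 entries per iteration; block up to 5 s when idle
    items = r.xread(count=100, block=5000, streams={'workstream': last_id})
    if not items:
        continue
    entries = items[0][1]
    process_batch(entries)
    # Resume after the last entry seen; re-reading from '$' would skip
    # anything that arrived while the previous batch was being processed
    last_id = entries[-1][0]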
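
Capping the batch size bounds the work per iteration but not the work per second. If a ceiling on throughput is wanted, the loop can pause between batches. This is a sketch under the assumption that a fixed items-per-second budget is acceptable; `pause_for_rate` is a hypothetical helper:

```python
def pause_for_rate(items_done, elapsed_seconds, max_items_per_second):
    """Return how long to sleep so the average rate stays under the cap.

    `items_done` items should take at least items_done / max_rate seconds;
    if less time has elapsed, the difference is the pause still owed.
    """
    if max_items_per_second <= 0:
        raise ValueError("max_items_per_second must be positive")
    minimum_time = items_done / max_items_per_second
    return max(0.0, minimum_time - elapsed_seconds)


# 100 items finished in 0.25 s with a cap of 200 items/s: wait another 0.25 s
print(pause_for_rate(100, 0.25, 200))  # 0.25
# Already slower than the cap: no pause needed
print(pause_for_rate(100, 1.0, 200))   # 0.0
```

Inside the consumer loop the result would be passed to `time.sleep()` after each batch, with the elapsed time measured via `time.monotonic()`.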

When Redis acts as a queuing agent in mission-critical systems, periodic CPU spikes (often exceeding 100% utilization) typically indicate one of the following:

  • Queue processing bottlenecks
  • Inefficient Lua scripts
  • Keyspace scanning operations
  • Memory pressure triggering eviction policies

First configure proper logging to capture diagnostic data:

# /etc/redis/redis.conf
loglevel debug
logfile /var/log/redis/redis-debug.log

Then apply the log level at runtime, persist it to redis.conf, and follow the log (the restart is only needed to pick up a changed logfile path):

redis-cli config set loglevel debug
redis-cli config rewrite
sudo systemctl restart redis-server
tail -f /var/log/redis/redis-debug.log

Use these Redis CLI commands during high CPU events:

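# Check slow logs (the slowlog-log-slower-than threshold is in microseconds)
redis-cli slowlog get 10

# Watch rolling server stats and round-trip latency
redis-cli --stat
redis-cli --latency

# Memory analysis (evicted_keys is reported in the stats section)
redis-cli info memory | grep -E 'used_memory|maxmemory'
redis-cli info stats | grep evicted_keys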
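
Raw slow-log output is hard to scan during an incident, so it can help to total the time per command. A minimal sketch, assuming entries shaped like the dicts redis-py returns from `slowlog_get()` (with `command` and `duration` keys, duration in microseconds); `summarize_slowlog` is an illustrative name:

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Total slow-log duration (microseconds) per command name, largest first."""
    totals = defaultdict(int)
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(empty)"
        totals[name] += entry["duration"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hand-written sample entries; real ones would come from the slow log
sample = [
    {"command": b"KEYS *", "duration": 250000},
    {"command": b"LRANGE workqueue 0 -1", "duration": 40000},
    {"command": b"keys user:*", "duration": 120000},
]
print(summarize_slowlog(sample))
# [('KEYS', 370000), ('LRANGE', 40000)]
```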

For queue-heavy workloads, implement these safeguards:

Option A: Cgroup-based throttling

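# Create CPU limit via cgroups (cgroup v1 paths)
sudo cgcreate -g cpu:/redis-throttled
REDIS_PID=$(pgrep redis-server)
# Pipe through tee because a plain redirect is not covered by sudo
echo 50000 | sudo tee /sys/fs/cgroup/cpu/redis-throttled/cpu.cfs_quota_us
# cgroup.procs moves every thread of the process; tasks would move only one
echo "$REDIS_PID" | sudo tee /sys/fs/cgroup/cpu/redis-throttled/cgroup.procs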

Option B: Redis configuration tuning

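# redis.conf adjustments
# Eviction policy once maxmemory is reached (allkeys-lru can evict queue keys too)
maxmemory-policy allkeys-lru
# After 500 ms a running Lua script can be stopped with SCRIPT KILL
lua-time-limit 500
# Log commands slower than 10000 microseconds (10 ms)
slowlog-log-slower-than 10000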

For queue processing workloads:

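# Instead of one LPOP round trip per item, read and trim in batches.
# r is a redis.Redis client. Assumes a single consumer: LRANGE followed
# by LTRIM is not atomic, so concurrent consumers could see the same items.
while True:
    # Batch process up to 100 items at once
    items = r.lrange('workqueue', 0, 99)
    if not items:
        break
    process_batch(items)
    # Drop exactly the items that were processed; trimming a fixed 100
    # would discard items pushed after a short final batch was read
    r.ltrim('workqueue', len(items), -1)

# Use pipelining for bulk inserts (one round trip instead of one per item)
pipe = r.pipeline()
for item in new_items:
    pipe.rpush('workqueue', item)
pipe.execute()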
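
A single pipeline holding a very large number of commands is buffered in client memory and reaches the server as one long burst, so it can help to flush it in bounded chunks. A sketch, assuming `client` is a redis-py style object exposing `pipeline()`; the helper names and the chunk size of 500 are illustrative choices:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def push_in_chunks(client, key, items, size=500):
    """RPUSH items through one pipeline per chunk; returns chunks sent."""
    sent = 0
    for chunk in chunked(items, size):
        pipe = client.pipeline()
        for item in chunk:
            pipe.rpush(key, item)
        pipe.execute()
        sent += 1
    return sent


print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```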

Create a monitoring script to detect anomalies:

#!/bin/bash
CPU_THRESHOLD=90
ALERT_EMAIL="admin@example.com"

# %CPU (column 9 of top) for the first redis-server process found
cpu_usage=$(top -bn1 | awk '/redis-server/ {print $9; exit}')

# Skip the comparison when no redis-server process is running
if [[ -n "$cpu_usage" ]] && (( $(echo "$cpu_usage > $CPU_THRESHOLD" | bc -l) )); then
    echo "Redis CPU Alert: $cpu_usage%" | mail -s "Redis Performance Alert" "$ALERT_EMAIL"
    # Capture diagnostics
    redis-cli info > /tmp/redis_emergency_info.txt
    redis-cli slowlog get > /tmp/redis_slowlog.txt
fi
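
A single sample above the threshold can be a harmless blip, so alerting only after several consecutive high samples reduces noise. A sketch of that check in Python, with made-up names and a window of three samples as an assumption:

```python
from collections import deque


class SustainedThreshold:
    """Report True only after `required` consecutive samples exceed `limit`."""

    def __init__(self, limit, required=3):
        self.limit = limit
        # Keeps only the most recent `required` comparison results
        self.recent = deque(maxlen=required)

    def observe(self, value):
        self.recent.append(value > self.limit)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


detector = SustainedThreshold(limit=90, required=3)
readings = [95, 97, 40, 92, 93, 99]
print([detector.observe(cpu) for cpu in readings])
# [False, False, False, False, False, True]
```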