Understanding kswapd0 High IO Usage with Zero Disk Activity: PostgreSQL Performance Deep Dive


When monitoring a slow PostgreSQL query with iotop, you may encounter a puzzling scenario: kswapd0 shows 99.99% IO usage while reporting zero disk activity. This points to memory pressure rather than a storage bottleneck:

# Typical iotop output showing the anomaly
TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
27 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [kswapd0]

kswapd0 is Linux's memory page reclaim daemon. The 99.99% IO usage indicates:

  • Intense memory scanning operations (not physical disk I/O)
  • Memory pressure triggering constant page evaluation
  • Potential thrashing where the system spends more time managing memory than executing processes
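
You can confirm that this is scanning rather than swapping by sampling the kernel's reclaim counters; a minimal sketch, assuming the counter names exposed in /proc/vmstat on recent kernels:

# Sample the reclaim counters twice, a few seconds apart; rising pgscan_kswapd
# with a flat pswpout means pages are being scanned but not written to swap
grep -E '^(pgscan_kswapd|pgsteal_kswapd|pswpin|pswpout)' /proc/vmstat
sleep 5
grep -E '^(pgscan_kswapd|pgsteal_kswapd|pswpin|pswpout)' /proc/vmstat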

When your heavy query runs, PostgreSQL's memory behavior creates this situation:

-- Check PostgreSQL's shared buffer and per-query memory settings
SELECT name, setting, unit FROM pg_settings 
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem');

-- See which queries are currently active (long-running queries are the usual memory hogs)
SELECT * FROM pg_stat_activity 
WHERE state = 'active' ORDER BY query_start;
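
If shared_buffers is undersized for the working set, the buffer cache hit ratio will usually show it. A quick sanity check against the standard statistics views (not part of the original diagnosis, just a common companion query):

-- Share of block reads served from shared_buffers, per database
SELECT datname,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 1) AS cache_hit_pct
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;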

Before investing in new hardware, verify these PostgreSQL configurations:

# Critical memory parameters to review
shared_buffers = 25% of available RAM (but not > 8GB)
effective_cache_size = 50-75% of total RAM
work_mem = (RAM - shared_buffers) / (max_connections * 3)
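
As a worked example, on a hypothetical 16 GB server with max_connections = 100, those guidelines translate roughly to the following (illustrative values, not a recommendation):

# postgresql.conf -- illustrative sizing for a 16 GB server
shared_buffers = 4GB              # 25% of RAM
effective_cache_size = 10GB       # ~60% of RAM; planner hint only, allocates nothing
work_mem = 40MB                   # (16GB - 4GB) / (100 * 3), rounded down
maintenance_work_mem = 1GB        # for VACUUM / CREATE INDEX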

Try these immediate improvements before hardware upgrades:

-- Optimize the problematic query with EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM large_table WHERE complex_conditions;

-- Create targeted indexes
CREATE INDEX CONCURRENTLY idx_improvement 
ON large_table (critical_columns) 
WHERE frequently_used_conditions;

For hardware improvements, focus on:

  • Faster storage (NVMe) for swap and temporary files, if some paging is unavoidable
  • More RAM, using higher-density modules if the motherboard's slots are already full
  • A processor with a larger cache hierarchy, if profiling shows the reclaim scans are CPU-bound

Let's dig a little deeper into what is actually happening when kswapd0 reports 99.99% IO utilization but 0% disk read/write activity.

kswapd0 is often described as the kernel's swap daemon, but its real job is reclaiming memory pages when free memory runs low. The high IO percentage means it is working constantly, yet the zero disk activity suggests:

  • The system is under memory pressure but not necessarily swapping to disk
  • kswapd0 is scanning memory pages aggressively (page reclamation)
  • This creates CPU overhead rather than disk I/O bottlenecks
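
Because the overhead lands on the CPU rather than the disks, a CPU-oriented view of the thread is more telling; a minimal sketch, assuming pidstat from the sysstat package is installed:

# Per-second CPU usage of the kswapd0 kernel thread
pidstat -u -p $(pgrep -x kswapd0) 1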

Before considering hardware upgrades, let's verify PostgreSQL's memory settings:

-- Check current PostgreSQL memory configuration (run in psql)
SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
SHOW effective_cache_size;

# Linux memory pressure metrics (run in a shell)
grep -E '^(SwapCached|SwapTotal|SwapFree|Committed_AS)' /proc/meminfo
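
On kernels 4.20 and newer with PSI enabled, the pressure-stall interface gives an even more direct signal of memory pressure:

# 'some' = share of wall-clock time at least one task stalled waiting on memory
cat /proc/pressure/memory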

Here are concrete steps to improve performance without immediate hardware upgrades:

# 1. Optimize PostgreSQL configuration (postgresql.conf)
shared_buffers = 25% of available RAM (but not > 8GB)
effective_cache_size = 50-75% of total RAM
work_mem = raise per session for the heavy query (e.g. SET work_mem = '256MB'), since it is allocated per sort or hash operation, not per connection

# 2. Linux kernel parameters (/etc/sysctl.conf or /etc/sysctl.d/)
# Discourage swapping of anonymous pages
vm.swappiness = 1
# Reclaim dentry/inode caches less aggressively (balanced page reclamation)
vm.vfs_cache_pressure = 50

# 3. Monitor with better tools
sudo perf top -p $(pgrep kswapd0)
sudo bpftrace -e 'kprobe:shrink_page_list { @[comm] = count(); }'
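
A minimal sketch of applying the sysctl values from step 2 without a reboot (the file name under /etc/sysctl.d/ is just an example):

# Apply immediately, then persist across reboots
sudo sysctl -w vm.swappiness=1 vm.vfs_cache_pressure=50
printf 'vm.swappiness = 1\nvm.vfs_cache_pressure = 50\n' | sudo tee /etc/sysctl.d/90-pg-memory.conf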

Consider these metrics before upgrading:

  • If a large share of active backends in pg_stat_activity (say, more than 30%) are sitting in wait_event_type = 'IO'
  • When vmstat 1 shows sustained high si/so values
  • If EXPLAIN ANALYZE shows sorts and hashes spilling to temp files (a cluster-wide check is sketched below)
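
The temp-file signal is easy to check cluster-wide from the standard statistics view (counters are cumulative since the last stats reset):

-- Temp files spilled to disk per database; steady growth means work_mem is too small
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_spill
FROM pg_stat_database
ORDER BY temp_bytes DESC;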

For complex queries, first try adding appropriate indexes:

-- Example index for common slow query pattern
CREATE INDEX CONCURRENTLY idx_orders_customer_date 
ON orders (customer_id, order_date DESC)
WHERE status = 'completed';
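
For context, that partial index targets query shapes like the one below (the selected columns are illustrative; only customer_id, order_date, and status come from the index definition above):

-- Served by idx_orders_customer_date: filter on status and customer, newest first
SELECT order_id, order_date, amount
FROM orders
WHERE status = 'completed'
  AND customer_id = 42
ORDER BY order_date DESC
LIMIT 20;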

Often the biggest gains come from query restructuring. Here's an example of optimizing a slow aggregate query:

-- Before: Full table scan with heavy aggregation
EXPLAIN ANALYZE 
SELECT customer_id, SUM(amount) 
FROM transactions 
WHERE date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY customer_id;

-- After: Materialized view with pre-aggregated yearly totals
CREATE MATERIALIZED VIEW customer_yearly_totals AS
SELECT customer_id,
       DATE_TRUNC('year', date) AS year,
       SUM(amount) AS yearly_total
FROM transactions
GROUP BY customer_id, DATE_TRUNC('year', date);

-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_cust_year ON customer_yearly_totals (customer_id, year);
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_yearly_totals;
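
The refresh itself still has to be scheduled somewhere; if the pg_cron extension is available, a nightly job could look like this (the schedule is an arbitrary example):

-- Refresh the pre-aggregated totals every night at 03:00
SELECT cron.schedule('0 3 * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY customer_yearly_totals$$);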