PostgreSQL Temporary Tables Performance Issue: High CPU and Disk I/O Under Low Workload



During routine monitoring of our PostgreSQL 9.1.2 database, we observed unusual behavior: postmaster processes were consuming 40-50% CPU per connection and writing approximately 15MB/s to disk, even during periods of minimal database activity (tens of inserts per minute). Performance monitoring tools showed:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17057 postgres  20   0  236m  33m  13m R 45.0  0.1  73:48.78 postmaster
17188 postgres  20   0  219m  15m  11m R 42.3  0.0  61:45.57 postmaster
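
To cross-check what those backends were doing from inside the database, pg_stat_activity can be queried as well (a minimal example using the PIDs from the top output above; column names are the 9.1 ones, later releases renamed procpid/current_query to pid/query):

SELECT procpid, usename, waiting, current_query
FROM pg_stat_activity
WHERE procpid IN (17057, 17188);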

Upon examining the database filesystem, we discovered thousands of small (8KB) temporary files in the $PG_DATA/base/ directory that were being constantly rewritten:

-rw-------. 1 postgres postgres 8192 Jul  3 23:08 t12_1430975
-rw-------. 1 postgres postgres 8192 Jul  3 23:08 t16_1432736
-rw-------. 1 postgres postgres 8192 Jul  3 23:08 t28_1439066
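
The numeric suffix of each file name is the table's relfilenode, so a file can be traced back to the temporary table that owns it with a catalog lookup (run in the affected database; the value below is taken from the listing above):

SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relfilenode = 1430975;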

Strace output revealed intensive file operations on these temporary table files:

open("base/16388/t24_1435947", O_RDWR) = 9
lseek(9, 0, SEEK_END) = 8192
ftruncate(9, 0) = 0
write(9, "\\0\\0\\0\\0\\0\\0\\0\\0\\1\\0\\0\\0000\\0\\360\\37\\360\\37\\4 \\0\\0\\0\\0b1\\5\\0\\2\\0\\0\\0"..., 8192) = 8192

The issue stemmed from our replication middleware creating temporary tables for each physical table in the database (approximately 50 tables) for every new connection (10-20 connections), using:

CREATE TEMPORARY TABLE... ON COMMIT DELETE ROWS;

This resulted in:

  • 500-1000 temporary tables existing simultaneously
  • Constant maintenance overhead from PostgreSQL
  • Excessive filesystem operations on small 8KB files

We modified our approach to use regular tables with proper cleanup mechanisms:

-- Replace temporary tables with regular tables
CREATE TABLE replication_buffer_<table_id>_<connection_id> (
    -- Same schema as original table
);

-- Cleanup procedure: drop every buffer table that belongs to a connection
CREATE OR REPLACE FUNCTION cleanup_replication_buffers(conn_id text) RETURNS void AS $$
DECLARE
    buf record;
BEGIN
    FOR buf IN
        SELECT schemaname, tablename
        FROM pg_tables
        WHERE tablename LIKE 'replication_buffer_%_' || conn_id
    LOOP
        EXECUTE 'DROP TABLE IF EXISTS '
                || quote_ident(buf.schemaname) || '.' || quote_ident(buf.tablename);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
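
On disconnect, the middleware then only needs a single call per connection (the argument is whatever connection identifier was used in the table names):

SELECT cleanup_replication_buffers('<connection_id>');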

For systems requiring temporary tables, consider these optimizations:

-- 1. Create temporary tables in a separate tablespace on faster storage
CREATE TABLESPACE fast_temp LOCATION '/mnt/ssd/temp_tables';
CREATE TEMPORARY TABLE temp_table (...) TABLESPACE fast_temp;
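--    Alternatively, point the whole session at the fast tablespace via
--    temp_tablespaces instead of a per-table TABLESPACE clause:
SET temp_tablespaces = 'fast_temp';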

-- 2. Increase temp_buffers so temporary tables are cached in session memory
--    (must be set before the session first touches a temp table)
SET temp_buffers = '16MB';

-- 3. Use UNLOGGED tables for better performance
CREATE UNLOGGED TABLE session_data_<id> (...);

After implementing regular tables:

  • CPU usage dropped from ~40% per connection to <5%
  • Disk I/O reduced from 15MB/s to negligible levels
  • Connection establishment time improved by 80%

Key takeaways from this troubleshooting experience:

  1. Temporary tables in PostgreSQL have significant overhead when used in large quantities
  2. The ON COMMIT DELETE ROWS option triggers frequent filesystem operations
  3. Each PostgreSQL backend maintains its own set of temporary table files
  4. For replication buffers, regular tables with proper cleanup may be more efficient

For future implementations, we're evaluating PostgreSQL's logical decoding feature as a replacement for our custom replication solution.
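
Logical decoding is only available from PostgreSQL 9.4 onward, so this depends on an upgrade; a minimal sketch using the bundled test_decoding plugin (slot name is illustrative) looks like:

-- Requires wal_level = logical and max_replication_slots > 0
SELECT * FROM pg_create_logical_replication_slot('repl_slot', 'test_decoding');

-- Periodically consume the changes that have accumulated in the slot
SELECT * FROM pg_logical_slot_get_changes('repl_slot', NULL, NULL);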


During routine monitoring of our PostgreSQL 9.1.2 database, we noticed something peculiar: postmaster processes were consuming 40-50% CPU each and writing 15MB/s to disk - while the application was virtually idle (just 10 inserts per minute). The pg_stat_activity view showed all connections as idle, yet they were hammering the system.


# Sample from iotop showing disk writes
Total DISK READ: 0.00 B/s | Total DISK WRITE: 195.97 M/s
TID  PRIO  USER     DISK WRITE      COMMAND                                                                                                                        
17962 be/4 postgres   14.83 M/s  postgres: aggw aggw [local] idle
17084 be/4 postgres   15.53 M/s  postgres: aggw aggw [local] idle
17963 be/4 postgres   15.00 M/s  postgres: aggw aggw [local] idle

Using strace, we discovered intense activity around files matching the pattern tNN_NNNNNNN in the $PG_DATA/base/ directory. These 8KB temporary files were being constantly rewritten:


# Example strace output showing file operations
open("base/16388/t24_1435947", O_RDWR) = 9
lseek(9, 0, SEEK_END) = 8192
ftruncate(9, 0) = 0
write(9, "\0\0\0\0\0\0\0\0\1\0\0\0000\0\360\37\360\37\4 \0\0\0\0b1\5\0\2\0\0\0"..., 8192) = 8192

The root cause emerged when we examined our replication middleware's architecture. For each connection (10-20 typically), we were creating temporary tables for all ~50 database tables:


CREATE TEMPORARY TABLE repl_orders (
    LIKE orders INCLUDING ALL
) ON COMMIT DELETE ROWS;

The implementation had worked well with MySQL but caused severe performance degradation in PostgreSQL due to:

  • Massive metadata overhead per temp table
  • File-per-table architecture in PostgreSQL
  • ON COMMIT DELETE ROWS requiring constant truncation

We tested several solutions:

Option 1: Permanent tables with connection-specific names


CREATE TABLE repl_conn1_orders (
    LIKE orders INCLUDING ALL
);
-- Cleanup on disconnect:
DROP TABLE repl_conn1_orders;
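
A hypothetical refinement of this option: derive the connection-specific suffix on the database side from pg_backend_pid() instead of having the middleware track names (any unique per-connection token would work equally well):

DO $$
BEGIN
    -- Creates e.g. repl_conn12345_orders for backend PID 12345
    EXECUTE format('CREATE TABLE repl_conn%s_orders (LIKE orders INCLUDING ALL)',
                   pg_backend_pid());
END;
$$;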

Option 2: UNLOGGED tables (PostgreSQL 9.1+)


CREATE UNLOGGED TABLE session_orders (
    LIKE orders INCLUDING ALL
);
-- Doesn't generate WAL writes
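-- Trade-off: unlogged tables are truncated after a crash or unclean
-- shutdown, which is acceptable for a transient replication buffer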

Option 3: Schema-per-connection with search_path


CREATE SCHEMA connection_1;
SET search_path TO connection_1, public;
CREATE TABLE orders (LIKE public.orders INCLUDING ALL);
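
Cleanup on disconnect is then a single statement that removes the schema and everything in it:

DROP SCHEMA connection_1 CASCADE;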

PostgreSQL temporary tables have specific performance characteristics:

  1. Each temp table is backed by physical on-disk files (unlike some other DBMSs)
  2. ON COMMIT DELETE ROWS causes expensive truncation operations
  3. Mass creation (50+ per connection) creates metadata overhead
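
Point 1 is easy to verify: even an empty temporary table is backed by a real file under base/ (the returned path below is illustrative):

CREATE TEMPORARY TABLE t_demo (id int);
SELECT pg_relation_filepath('t_demo');
-- e.g. base/16388/t3_1440021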

For our replication middleware, we ultimately implemented a hybrid approach using UNLOGGED tables with connection-specific prefixes and a cleanup daemon.
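
As a sketch of what the cleanup daemon has to do under the pg_backend_pid()-based naming assumed above (illustrative, not our exact implementation), it looks for buffer tables whose owning backend has disappeared:

SELECT tablename
FROM pg_tables
WHERE tablename ~ '^repl_conn[0-9]+_'
  AND substring(tablename from '^repl_conn([0-9]+)_')::int
      NOT IN (SELECT procpid FROM pg_stat_activity);
-- The daemon then issues a DROP TABLE for each table returned.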