During routine monitoring of our PostgreSQL 9.1.2 database, we observed unusual behavior: postmaster processes were consuming 40-50% CPU per connection and writing approximately 15MB/s to disk - even during periods of minimal database activity (tens of inserts per minute). Performance monitoring tools showed:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17057 postgres 20 0 236m 33m 13m R 45.0 0.1 73:48.78 postmaster
17188 postgres 20 0 219m 15m 11m R 42.3 0.0 61:45.57 postmaster
Upon examining the database filesystem, we discovered thousands of small (8KB) temporary files under the $PGDATA/base/ directory being constantly rewritten:
-rw-------. 1 postgres postgres 8192 Jul 3 23:08 t12_1430975
-rw-------. 1 postgres postgres 8192 Jul 3 23:08 t16_1432736
-rw-------. 1 postgres postgres 8192 Jul 3 23:08 t28_1439066
Strace output revealed intensive file operations on these temporary table files:
open("base/16388/t24_1435947", O_RDWR) = 9
lseek(9, 0, SEEK_END) = 8192
ftruncate(9, 0) = 0
write(9, "\0\0\0\0\0\0\0\0\1\0\0\0000\0\360\37\360\37\4 \0\0\0\0b1\5\0\2\0\0\0"..., 8192) = 8192
The issue stemmed from our replication middleware creating temporary tables for each physical table in the database (approximately 50 tables) for every new connection (10-20 connections), using:
CREATE TEMPORARY TABLE... ON COMMIT DELETE ROWS;
This resulted in:
- 500-1000 temporary tables existing simultaneously
- Constant maintenance overhead from PostgreSQL
- Excessive filesystem operations on small 8KB files
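The scale of the problem follows directly from the arithmetic. A minimal sketch, using the table and connection counts from the text (everything else is derived):

```python
# Rough arithmetic behind the temp-table explosion described above.
TABLES_PER_CONNECTION = 50   # temp tables created per connection (from the text)
MIN_CONNS, MAX_CONNS = 10, 20  # typical concurrent connections (from the text)

min_tables = TABLES_PER_CONNECTION * MIN_CONNS
max_tables = TABLES_PER_CONNECTION * MAX_CONNS
print(min_tables, max_tables)  # 500 1000

# Each temp table is backed by at least one 8KB file, so simply holding
# them open leaves the backends 500-1000 small files to keep truncating.
```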
We modified our approach to use regular tables with proper cleanup mechanisms:
-- Replace temporary tables with regular tables
CREATE TABLE replication_buffer_<table_id>_<connection_id> (
-- Same schema as original table
);
-- Implement cleanup procedure: drop every buffer table for a connection
CREATE OR REPLACE FUNCTION cleanup_replication_buffers(conn_id text) RETURNS void AS $$
DECLARE buf text;
BEGIN
  FOR buf IN SELECT tablename FROM pg_tables
             WHERE tablename LIKE 'replication_buffer_%_' || conn_id LOOP
    EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(buf);
  END LOOP;
END;
$$ LANGUAGE plpgsql;
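When the middleware tears buffers down itself rather than calling into the database, the same cleanup can be generated client-side. A sketch with illustrative table ids and connection id (not from the original implementation):

```python
# Client-side counterpart of the server-side cleanup procedure.
def drop_statements(table_ids, conn_id):
    """Yield one DROP TABLE statement per replication buffer table."""
    for table_id in table_ids:
        # Table names follow the replication_buffer_<table_id>_<connection_id>
        # convention shown above.
        yield f"DROP TABLE IF EXISTS replication_buffer_{table_id}_{conn_id};"

stmts = list(drop_statements(["orders", "customers"], "conn7"))
print(stmts[0])  # DROP TABLE IF EXISTS replication_buffer_orders_conn7;
```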
For systems requiring temporary tables, consider these optimizations:
-- 1. Put temporary tables on faster storage via a dedicated tablespace
CREATE TABLESPACE fast_temp LOCATION '/mnt/ssd/temp_tables';
SET temp_tablespaces = 'fast_temp';  -- new temp tables now land on the SSD
-- 2. Raise temp_buffers (not work_mem) so temp-table pages stay in session memory
SET temp_buffers = '16MB';
-- 3. Use UNLOGGED tables to skip WAL (note: unlike temp tables, these are visible to all sessions)
CREATE UNLOGGED TABLE session_data_<id> (...);
After implementing regular tables:
- CPU usage dropped from ~40% per connection to <5%
- Disk I/O reduced from 15MB/s to negligible levels
- Connection establishment time improved by 80%
Key takeaways from this troubleshooting experience:
- Temporary tables in PostgreSQL have significant overhead when used in large quantities
- The ON COMMIT DELETE ROWS option triggers frequent filesystem operations
- Each PostgreSQL backend maintains its own set of temporary table files
- For replication buffers, regular tables with proper cleanup may be more efficient
For future implementations, we're evaluating PostgreSQL's logical decoding feature as a replacement for our custom replication solution.
A closer look at the live system made the contradiction obvious: pg_stat_activity reported every connection as idle, yet the same backends were saturating CPU and disk.
# Sample from iotop showing disk writes
Total DISK READ: 0.00 B/s | Total DISK WRITE: 195.97 M/s
TID PRIO USER DISK WRITE COMMAND
17962 be/4 postgres 14.83 M/s postgres: aggw aggw [local] idle
17084 be/4 postgres 15.53 M/s postgres: aggw aggw [local] idle
17963 be/4 postgres 15.00 M/s postgres: aggw aggw [local] idle
Using strace on these idle backends, we again saw the truncate-and-rewrite cycle shown earlier, concentrated on 8KB files matching the pattern tNN_NNNNNNN under the $PGDATA/base/ directory.
The root cause emerged when we examined our replication middleware's architecture. For each connection (10-20 typically), we were creating temporary tables for all ~50 database tables:
CREATE TEMPORARY TABLE repl_orders (
LIKE orders INCLUDING ALL
) ON COMMIT DELETE ROWS;
The implementation had worked well with MySQL but caused severe performance degradation in PostgreSQL due to:
- Massive metadata overhead per temp table
- File-per-table architecture in PostgreSQL
- ON COMMIT DELETE ROWS requiring constant truncation
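The truncation overhead is enough to explain the observed write rate. A back-of-the-envelope sanity check, assuming (our assumption, not measured) that each commit truncates and rewrites one 8KB page per temp table; the commit rate below is purely illustrative:

```python
TEMP_TABLES = 50        # temp tables per connection (from the text)
PAGE_BYTES = 8192       # bytes rewritten per table per commit (per strace)
COMMITS_PER_SEC = 40    # assumed commit rate of one busy backend

bytes_per_sec = TEMP_TABLES * PAGE_BYTES * COMMITS_PER_SEC
print(bytes_per_sec / 1_000_000)  # 16.384 -- same order as the ~15 MB/s observed
```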
We tested several solutions:
Option 1: Permanent tables with connection-specific names
CREATE TABLE repl_conn1_orders (
LIKE orders INCLUDING ALL
);
-- Cleanup on disconnect:
DROP TABLE repl_conn1_orders;
Option 2: UNLOGGED tables (PostgreSQL 9.1+)
CREATE UNLOGGED TABLE session_orders (
LIKE orders INCLUDING ALL
);
-- No WAL writes; note contents are truncated after a crash and not replicated
Option 3: Schema-per-connection with search_path
CREATE SCHEMA connection_1;
SET search_path TO connection_1, public;
CREATE TABLE orders (LIKE public.orders INCLUDING ALL);
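A side benefit of the schema-per-connection variant is one-statement teardown; a sketch assuming the schema name from the example above:

```sql
-- Drop the connection's schema and every buffer table inside it
DROP SCHEMA connection_1 CASCADE;
```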
PostgreSQL temporary tables have specific performance characteristics:
- Each temp table creates physical files (unlike some other DBMSs)
- ON COMMIT DELETE ROWS causes expensive truncation operations
- Mass creation (50+ per connection) creates metadata overhead
For our replication middleware, we ultimately implemented a hybrid approach using UNLOGGED tables with connection-specific prefixes and a cleanup daemon.
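The core decision the cleanup daemon makes can be sketched as follows (illustrative only; the real daemon would read table names from pg_tables and live sessions from pg_stat_activity):

```python
def orphaned_buffers(buffer_tables, live_conn_ids, prefix="repl_"):
    """Return buffer tables whose owning connection is gone.

    Names follow the option-1 pattern <prefix><conn_id>_<table>,
    e.g. 'repl_conn1_orders'.
    """
    orphans = []
    for name in buffer_tables:
        if not name.startswith(prefix):
            continue  # not one of our buffer tables
        conn_id = name[len(prefix):].split("_", 1)[0]
        if conn_id not in live_conn_ids:
            orphans.append(name)
    return orphans

tables = ["repl_conn1_orders", "repl_conn2_orders", "public_orders"]
print(orphaned_buffers(tables, {"conn1"}))  # ['repl_conn2_orders']
```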