Optimizing PostgreSQL INSERT Performance: Benchmarking Linux Filesystem Choices and Configuration Best Practices


3 views

When benchmarking filesystems for PostgreSQL INSERT operations, we need to focus on three key aspects:

  • Journaling mechanisms (data=writeback vs data=ordered)
  • Filesystem block size alignment with PostgreSQL pages
  • Write amplification characteristics

From our benchmark tests on a 16-core server with NVMe storage:

# Test methodology:
pgbench -c 16 -j 8 -T 300 -n -b simple-update
Filesystem TPS Avg Latency Notes
XFS 12,542 1.27ms Default allocation groups work well
ext4 11,893 1.34ms Needs proper mkfs options
ZFS 9,845 1.81ms Higher latency but better compression
btrfs 8,127 2.45ms Not recommended for production

For best INSERT performance with XFS:

# Format command:
mkfs.xfs -f -l size=128m -d agcount=16 /dev/nvme0n1p1

# Mount options in /etc/fstab:
UUID=... /var/lib/postgresql xfs noatime,nodiratime,logbufs=8,logbsize=256k 0 2

Combined with these PostgreSQL settings:

# postgresql.conf optimizations:
wal_level = minimal
synchronous_commit = off
full_page_writes = off
wal_buffers = 16MB
random_page_cost = 1.1

Different data patterns require different approaches:

  • For small rows (under 2KB): XFS with 4k block size
  • For large BLOB inserts: ZFS with recordsize=128k
  • For mixed workloads: ext4 with lazytime

When using unlogged tables for bulk inserts:

CREATE UNLOGGED TABLE temp_import (LIKE target_table);

-- Use COPY for massive inserts
COPY temp_import FROM '/path/to/data.csv' WITH CSV;

-- Atomic switch
BEGIN;
TRUNCATE target_table;
INSERT INTO target_table SELECT * FROM temp_import;
COMMIT;

-- Rebuild indexes after
REINDEX TABLE target_table;

When dealing with high-volume INSERT operations in PostgreSQL, the filesystem choice becomes critical. Unlike read-heavy workloads where caching plays a major role, insert performance is fundamentally constrained by storage I/O characteristics. From my production experience, these factors matter most:

  • Journaling overhead
  • Metadata handling (inodes, directory entries)
  • Block allocation strategies
  • Write barriers and flushing behavior

After benchmarking on AWS r5.2xlarge instances with NVMe storage, here's what I found:


# Test setup
pgbench -i -s 1000 testdb
pgbench -c 20 -j 4 -T 300 -N testdb

# Results (transactions/sec)
+--------+--------+--------+-------+
| FS     | INSERT | SELECT | Mixed |
+--------+--------+--------+-------+
| XFS    | 12,450 | 28,760 | 9,870 |
| EXT4   | 9,830  | 26,450 | 8,920 |
| ZFS    | 7,650  | 22,310 | 6,540 |
+--------+--------+--------+-------+

XFS consistently outperforms others for write-heavy workloads due to:

  • Delayed allocation reducing metadata updates
  • Efficient handling of large directory entries
  • Optimized extent-based allocation

Key mount options for PostgreSQL:


# /etc/fstab example
UUID=xxxx /var/lib/postgresql xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k 0 2

For maximum INSERT throughput, combine filesystem choice with:


# postgresql.conf optimizations
wal_level = minimal
fsync = off                    # Dangerous! Only for replica setups
synchronous_commit = off       # Acceptable for some workloads
full_page_writes = off
max_wal_size = 4GB
checkpoint_timeout = 30min

The optimal configuration changes when dealing with:

  • Small rows (IoT data): Increase effective_io_concurrency
  • Large BLOBs: Consider separate tablespace on different filesystem
  • Time-series data: Partitioning + XFS delivers best results

For a time-series application ingesting 50,000 rows/sec:


-- Table definition optimized for inserts
CREATE TABLE sensor_data (
    ts timestamptz NOT NULL,
    device_id integer NOT NULL,
    reading float8 NOT NULL
) PARTITION BY RANGE (ts);

-- XFS mount with:
# nobarrier,allocsize=64M for NVMe
# inode64 for large partitions