When benchmarking filesystems for PostgreSQL INSERT operations, we need to focus on three key aspects:
- Journaling mechanisms (data=writeback vs data=ordered)
- Filesystem block size alignment with PostgreSQL pages (see the quick check below this list)
- Write amplification characteristics
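As a quick check of the alignment point, PostgreSQL exposes its compile-time page sizes; compare them with the filesystem block size chosen at mkfs time (4 kB for the defaults used below):
-- Both report 8192 bytes in a stock build, so each page spans two 4 kB filesystem blocks
SHOW block_size;
SHOW wal_block_size;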
From our benchmark tests on a 16-core server with NVMe storage:
# Test methodology:
pgbench -c 16 -j 8 -T 300 -n -b simple-update
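# -c 16 / -j 8: 16 client connections over 8 worker threads; -T 300: run for 300 seconds
# -n: skip vacuuming; -b simple-update: built-in script whose writes are an UPDATE on
#    pgbench_accounts plus an INSERT into pgbench_history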
Filesystem | TPS | Avg Latency | Notes |
---|---|---|---|
XFS | 12,542 | 1.27ms | Default allocation groups work well |
ext4 | 11,893 | 1.34ms | Needs proper mkfs options |
ZFS | 9,845 | 1.81ms | Higher latency but better compression |
btrfs | 8,127 | 2.45ms | Not recommended for production |
For best INSERT performance with XFS:
# Format command:
mkfs.xfs -f -l size=128m -d agcount=16 /dev/nvme0n1p1
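# -l size=128m: larger internal log (journal); -d agcount=16: one allocation group per
# core so concurrent backends can allocate blocks in parallel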
# Mount options in /etc/fstab:
UUID=... /var/lib/postgresql xfs noatime,nodiratime,logbufs=8,logbsize=256k 0 2
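# noatime/nodiratime skip access-time updates on data files; logbufs=8 and logbsize=256k
# give XFS more and larger in-memory log buffers for metadata-heavy write bursts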
Combined with these PostgreSQL settings:
# postgresql.conf optimizations:
wal_level = minimal
max_wal_senders = 0          # required by wal_level = minimal (no streaming replication)
synchronous_commit = off
full_page_writes = off
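# Note: synchronous_commit = off can lose the most recently committed transactions on a
# crash, and full_page_writes = off risks torn-page corruption unless the storage stack
# guarantees atomic 8 kB writes (e.g. ZFS copy-on-write datasets)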
wal_buffers = 16MB
random_page_cost = 1.1
Different data patterns require different approaches:
- For small rows (under 2KB): XFS with 4k block size
- For large BLOB inserts: ZFS with recordsize=128k
- For mixed workloads: ext4 with lazytime
When using unlogged tables for bulk inserts:
CREATE UNLOGGED TABLE temp_import (LIKE target_table);
-- Use COPY for massive inserts
COPY temp_import FROM '/path/to/data.csv' WITH CSV;
-- Atomic switch
BEGIN;
TRUNCATE target_table;
INSERT INTO target_table SELECT * FROM temp_import;
COMMIT;
-- Rebuild indexes after
REINDEX TABLE target_table;
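If the staging table is created (or truncated) in the same transaction as the load, COPY also accepts the FREEZE option, which writes the rows already frozen and spares a later vacuum/freeze pass over the imported data. A minimal variant of the load step above, using the same table names:
BEGIN;
CREATE UNLOGGED TABLE temp_import (LIKE target_table);
-- FREEZE is only allowed because temp_import was created in this same transaction
COPY temp_import FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;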
When dealing with high-volume INSERT operations in PostgreSQL, the filesystem choice becomes critical. Unlike read-heavy workloads where caching plays a major role, insert performance is fundamentally constrained by storage I/O characteristics. From my production experience, these factors matter most:
- Journaling overhead
- Metadata handling (inodes, directory entries)
- Block allocation strategies
- Write barriers and flushing behavior
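On the PostgreSQL side, flushing behavior is governed by wal_sync_method; checking what the server actually uses is a one-liner (pg_test_fsync, shipped with PostgreSQL, can benchmark the alternatives on a candidate filesystem):
-- fdatasync is the default on Linux; the best-performing method can differ per filesystem
SHOW wal_sync_method;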
After benchmarking on AWS r5.2xlarge instances with NVMe storage, here's what I found:
# Test setup
pgbench -i -s 1000 testdb
pgbench -c 20 -j 4 -T 300 -N testdb
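# -N (--skip-some-updates) is the simple-update workload: it still updates pgbench_accounts
# and inserts into pgbench_history, but skips the contended pgbench_tellers/branches updates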
# Results (transactions/sec)
+--------+--------+--------+-------+
| FS | INSERT | SELECT | Mixed |
+--------+--------+--------+-------+
| XFS | 12,450 | 28,760 | 9,870 |
| EXT4 | 9,830 | 26,450 | 8,920 |
| ZFS | 7,650 | 22,310 | 6,540 |
+--------+--------+--------+-------+
XFS consistently outperforms others for write-heavy workloads due to:
- Delayed allocation reducing metadata updates
- Efficient handling of large directory entries
- Optimized extent-based allocation
Key mount options for PostgreSQL:
# /etc/fstab example
UUID=xxxx /var/lib/postgresql xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k 0 2
For maximum INSERT throughput, combine filesystem choice with:
# postgresql.conf optimizations
wal_level = minimal          # also requires max_wal_senders = 0, as noted above
fsync = off # Dangerous! Only for data you can rebuild (e.g. throwaway replicas or initial bulk loads)
synchronous_commit = off # Acceptable for some workloads
full_page_writes = off
max_wal_size = 4GB
checkpoint_timeout = 30min
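# A larger max_wal_size plus a longer checkpoint_timeout spaces checkpoints out,
# smoothing the periodic write/fsync bursts that otherwise stall sustained inserts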
The optimal configuration changes when dealing with:
- Small rows (IoT data): Increase effective_io_concurrency
- Large BLOBs: Consider a separate tablespace on a different filesystem (sketched right after this list)
- Time-series data: Partitioning + XFS delivers best results
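For the separate-tablespace option mentioned above, the mechanics are a single empty directory on the other filesystem (owned by the postgres OS user) handed to CREATE TABLESPACE; blob_space and the mount path here are placeholder names:
-- Hypothetical mount point for the BLOB-friendly filesystem (e.g. a ZFS dataset)
CREATE TABLESPACE blob_space LOCATION '/mnt/zfs_blobs/pgdata';
CREATE TABLE document_store (
    id bigserial PRIMARY KEY,
    payload bytea
) TABLESPACE blob_space;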
For a time-series application ingesting 50,000 rows/sec:
-- Table definition optimized for inserts
CREATE TABLE sensor_data (
    ts timestamptz NOT NULL,
    device_id integer NOT NULL,
    reading float8 NOT NULL
) PARTITION BY RANGE (ts);
-- Mount the underlying XFS filesystem with:
--   allocsize=64M for NVMe (nobarrier only helps on older kernels; the option was removed in recent ones)
--   inode64 for large partitions
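The parent table above will not accept rows until partitions exist, so partition creation has to keep ahead of the ingest clock (via cron or an extension such as pg_partman). A minimal sketch with daily range partitions and a BRIN index, which suits append-only timestamps; the dates are placeholders:
-- One partition per day; create the next day's partition before midnight
CREATE TABLE sensor_data_2024_01_01 PARTITION OF sensor_data
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');
-- BRIN keeps the index tiny for monotonically increasing timestamps
CREATE INDEX ON sensor_data_2024_01_01 USING brin (ts);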