During routine benchmarking of a PostgreSQL 9.1 installation on Ubuntu 12.10 with a software RAID 1 (mdadm) configuration, we observed unexpectedly high COMMIT latency (22ms avg) compared to a local development machine (0.4ms avg). The test case involved simple single-row INSERT transactions:
BEGIN;
INSERT INTO test (foo) VALUES ('bar');
COMMIT; -- This was the bottleneck
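One way to isolate the COMMIT cost from the INSERT itself is to time only the commit call. The following is a minimal sketch, not the harness used for the numbers in this post: the function name `time_commits` is hypothetical, and sqlite3 is used only as a stand-in so the code runs anywhere. Any DB-API connection that opens a transaction implicitly (psycopg2 does by default) can be passed instead.

```python
import sqlite3
import time


def time_commits(conn, n):
    """Run n single-row INSERT transactions; return each COMMIT's duration in ms.

    conn is any DB-API 2.0 connection that begins a transaction implicitly
    on the first statement (sqlite3 and psycopg2 both do by default).
    """
    durations_ms = []
    cur = conn.cursor()
    for _ in range(n):
        cur.execute("INSERT INTO test (foo) VALUES ('bar')")
        start = time.perf_counter()
        conn.commit()  # only the COMMIT is inside the timed region
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    cur.close()
    return durations_ms


if __name__ == "__main__":
    # Stand-in database (assumption): replace with a PostgreSQL connection
    # to measure the real server.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE test (foo TEXT)")
    demo.commit()
    timings = time_commits(demo, 100)
    print("avg COMMIT: %.3f ms" % (sum(timings) / len(timings)))
```

Against an in-memory database the figures are meaningless; the point is the measurement shape, which carries over unchanged to a real server connection.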
The pg_test_fsync utility revealed significant performance differences:
Server (RAID 1):
fsync: 30.524 ops/sec (≈32.8ms per operation)
fdatasync: 11.920 ops/sec (≈83.9ms per operation)
Local (Single Disk):
fsync: 34.593 ops/sec (≈28.9ms per operation)
fdatasync: 68.871 ops/sec (≈14.5ms per operation)
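The per-operation figures are simply the reciprocal of the reported rates. A small sketch of that arithmetic, using only the four rates listed above (the helper name `latency_ms` is illustrative):

```python
def latency_ms(ops_per_sec):
    """Convert a pg_test_fsync rate (ops/sec) to mean milliseconds per operation."""
    return 1000.0 / ops_per_sec


# Rates reported above, in ops/sec.
rates = {
    "fsync": {"server": 30.524, "local": 34.593},
    "fdatasync": {"server": 11.920, "local": 68.871},
}

for method, r in rates.items():
    slowdown = r["local"] / r["server"]
    print("%-9s server %.1f ms, local %.1f ms, server is %.1fx slower"
          % (method, latency_ms(r["server"]), latency_ms(r["local"]), slowdown))
```

The fdatasync row is the one that matters most here, since fdatasync is the default wal_sync_method on Linux.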
Key hardware specifications:
- Server: Dual 3TB SATA (Seagate ST3000DM001) in mdadm RAID 1
- Local: Single consumer-grade SATA disk
- Filesystem: ext4 with default options
The mdadm array itself reported a clean two-device RAID 1; alignment is checked separately, and write barriers remain a possible cause of the slow syncs:
/dev/md2:
Version : 1.2
Raid Level : raid1
Array Size : 2917156159 (2782.02 GiB)
Raid Devices : 2
State : clean
Partition alignment verification:
sudo parted /dev/sdb unit s print
Sector size (logical/physical): 512B/4096B
Partition Start: 26218496s (properly aligned)
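The alignment claim can be checked by hand: a partition is aligned when its starting byte offset is a multiple of the physical sector size. A short sketch of that check, using the start sector and sector sizes from the parted output (the function name `is_aligned` is illustrative):

```python
def is_aligned(start_sector, logical_bytes=512, boundary_bytes=4096):
    """True if the partition's starting byte offset falls on the given boundary."""
    return (start_sector * logical_bytes) % boundary_bytes == 0


start = 26218496  # partition start in 512-byte logical sectors, from parted

print("4 KiB aligned:", is_aligned(start))
print("1 MiB aligned:", is_aligned(start, boundary_bytes=1024 * 1024))
```

A legacy start sector of 63 would fail the same check, which is the classic misalignment case on 4096-byte-sector drives.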
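For ext4 on RAID configurations (barrier=0 and data=writeback trade crash safety for speed; with a volatile disk write cache, a power loss can corrupt the filesystem, so use them only with a battery-backed or disabled write cache):
# /etc/fstab options for the PostgreSQL data directory (including WAL):
/dev/md2 /var/lib/postgresql ext4 defaults,noatime,nodiratime,data=writeback,barrier=0 0 1
# mkfs options if reformatting a striped array (RAID 0/5/6/10); stride and
# stripe-width have no effect on RAID 1, which does not stripe:
mkfs.ext4 -E stride=16,stripe-width=32 /dev/md2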
Critical postgresql.conf parameters for commit performance:
# WAL settings
wal_level = minimal
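synchronous_commit = off # COMMIT returns before the WAL flush; a crash can lose the latest commits but does not corrupt data
wal_sync_method = fdatasync # Linux default; compare alternatives with pg_test_fsync
wal_buffers = 16MB
commit_delay = 10000 # Microseconds to wait before a WAL flush so concurrent commits can share it
commit_siblings = 5 # Minimum number of other open transactions before commit_delay applies
# Storage tuning
random_page_cost = 2.0 # Lower for RAID
effective_io_concurrency = 2 # Roughly the number of drives in the mirror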
For write-heavy workloads:
-- Consider batched transactions
BEGIN;
INSERT INTO test (foo) VALUES ('bar1');
INSERT INTO test (foo) VALUES ('bar2');
...
INSERT INTO test (foo) VALUES ('bar100');
COMMIT; -- Single commit overhead for all 100 rows
-- Or use UNLOGGED tables for data you can afford to lose (they skip WAL but are truncated after a crash)
CREATE UNLOGGED TABLE temp_data (...);
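To see why batching helps, consider a rough upper bound on insert throughput for a single connection. This is a sketch under stated assumptions: each COMMIT costs the measured 22ms, the INSERT statements themselves are treated as free, and the function name `rows_per_sec` is illustrative.

```python
def rows_per_sec(commit_latency_ms, rows_per_commit):
    """Upper bound on rows/sec for one connection when COMMIT dominates the cost."""
    commits_per_sec = 1000.0 / commit_latency_ms
    return commits_per_sec * rows_per_commit


COMMIT_MS = 22.0  # average COMMIT latency measured on the server

for batch in (1, 10, 100):
    print("%3d rows per commit: up to %.0f rows/sec"
          % (batch, rows_per_sec(COMMIT_MS, batch)))
```

Real throughput will be lower because the inserts are not free, but the scaling with batch size is the part that matters.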
When RAID isn't enough:
- Test with HW RAID controller with battery-backed cache
- Consider separate WAL disk (SSD recommended)
- Evaluate ZFS with dedicated SLOG device
When benchmarking simple INSERT transactions, I discovered COMMIT operations were taking 22ms compared to 0.4ms on a slower development machine. The pg_test_fsync results revealed the core issue:
fdatasync: 11.920 ops/sec (server) vs 68.871 ops/sec (dev machine)
fsync: 30.524 ops/sec (server) vs 34.593 ops/sec (dev machine)
The problematic server runs Software RAID 1 with these specifications:
Array Size: 2917156159 (2782.02 GiB)
Disks: 2x Seagate ST3000DM001-9YN166 (SATA)
Filesystem: ext4 with default options
Key findings from storage diagnostics:
- Partition alignment confirmed with 4096B physical sectors
- No SMART errors detected
- Individual disk tests showed similar performance
- Mount options: rw,noatime
For this RAID configuration, consider these postgresql.conf adjustments:
# Reduce fsync pressure
wal_buffers = 16MB
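synchronous_commit = off # Risks losing the latest commits after a crash, but not corruption
commit_delay = 10000 # Microseconds; only affects commits that wait for a WAL flush
commit_siblings = 5
# WAL sync method
wal_sync_method = fdatasync # Already the Linux default; compare the others with pg_test_fsync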
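Add these to /etc/sysctl.conf for better I/O performance (apply with sysctl -p):
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.swappiness = 1
Read-ahead is not a sysctl setting; set it with blockdev, and rerun the command at boot because it does not persist:
blockdev --setra 4096 /dev/md2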
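Remount with optimized parameters. ext4 does not allow the journaling mode (data=writeback) to be changed on a remount, so set that one in /etc/fstab instead. barrier=0 risks filesystem corruption on power loss unless the write cache is battery-backed:
mount -o remount,noatime,nodiratime,barrier=0 /dev/md2 /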
For critical systems where 22ms commits are unacceptable:
1. Consider battery-backed write cache controller
2. Test with XFS filesystem
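3. Evaluate ZFS with a dedicated SLOG device (disabling the ZIL removes the sync bottleneck but gives up durability for recent commits)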
4. Upgrade to NVMe storage for WAL
Create a monitoring view to track commit latency:
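-- Requires the pg_stat_statements extension with utility-statement tracking
-- (pg_stat_statements.track_utility) so that COMMIT is recorded.
-- total_time is cumulative per statement, so the mean is total_time / calls.
-- pg_stat_statements in 9.1 keeps only totals, so a per-call maximum is not available.
CREATE VIEW commit_stats AS
SELECT
sum(total_time) / nullif(sum(calls), 0) AS avg_commit_time,
sum(calls) AS transactions
FROM pg_stat_statements
WHERE query ILIKE 'commit%';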
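Because pg_stat_statements stores cumulative counters (calls and total_time) rather than one row per execution, a mean latency has to be computed as total time divided by calls. A minimal sketch of that calculation over rows fetched from the view's source table; the function name `mean_time` and the sample numbers are hypothetical, and the result is in whatever unit total_time uses on your version:

```python
def mean_time(rows):
    """Mean time per call from (calls, total_time) pairs; None if there were no calls."""
    rows = list(rows)
    total_calls = sum(calls for calls, _ in rows)
    if total_calls == 0:
        return None
    return sum(total for _, total in rows) / total_calls


# Hypothetical sample: two COMMIT entries (e.g. from different databases).
sample = [(100, 2200.0), (300, 120.0)]
print("mean per COMMIT:", mean_time(sample))
```

Taking a plain average of the total_time column would instead average the cumulative sums, which says nothing about individual commits.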