Zero-Downtime PostgreSQL Upgrade Strategies for Large Production Databases (9.4 to 9.5)

When dealing with a 10GB+ production PostgreSQL database (like your 9.4 instance), traditional upgrade methods become problematic. The pg_upgradecluster approach requires stopping the database, while pg_dump/restore operations take prohibitively long (7+ hours in your case). The autovacuum errors you encountered during restoration indicate serious performance bottlenecks.

Here are three battle-tested methods for enterprise PostgreSQL upgrades:

# Method 1: Logical replication with Slony
# Install Slony on both servers (Debian/Ubuntu package names)
sudo apt-get install slony1-2-bin postgresql-9.5-slony1-2

# Basic slonik configuration example
cluster name = upgrade_cluster;
node 1 admin conninfo = 'host=primary dbname=mydb user=slony';
node 2 admin conninfo = 'host=replica dbname=mydb user=slony';
init cluster (id=1, comment = 'Primary Node');
store node (id=2, comment = 'Replica Node', event node=1);
store path (server=1, client=2, conninfo='host=primary dbname=mydb user=slony');
store path (server=2, client=1, conninfo='host=replica dbname=mydb user=slony');

For PostgreSQL 9.4 and later, the pglogical extension offers a simpler setup and better performance than trigger-based replication:

-- On primary (9.4)
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(node_name := 'provider', dsn := 'host=primary dbname=mydb');

-- On replica (9.5)
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(node_name := 'subscriber', dsn := 'host=replica dbname=mydb');
SELECT pglogical.create_subscription(
    subscription_name := 'upgrade_sub',
    provider_dsn := 'host=primary dbname=mydb',
    replication_sets := ARRAY['default'],
    synchronize_data := true
);

When ready to cut over:

  1. Stop writes to primary
  2. Verify replication lag is zero
  3. Promote replica
  4. Reconfigure applications
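
Step 2's check can be scripted so the cutover aborts if replication is not healthy. A minimal sketch, assuming psql access to the subscriber and the pglogical subscription created above; host and database names are placeholders:

```shell
#!/bin/sh
# Hypothetical cutover guard: do not promote until the pglogical
# subscription reports 'replicating'. Host/db names are placeholders.
sub_status() {
  psql -At -h replica -d mydb -c \
    "SELECT status FROM pglogical.show_subscription_status('upgrade_sub');"
}

wait_for_sync() {
  # Poll a few times, then give up so the cutover can be aborted cleanly.
  for i in 1 2 3 4 5; do
    [ "$(sub_status)" = "replicating" ] && return 0
    sleep 2
  done
  return 1
}
```

A 'replicating' status confirms the apply worker is running; for a true zero-lag guarantee you would additionally compare WAL positions after stopping writes.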

For large tables, consider these performance tweaks during migration:

-- On the source server (9.4): logical decoding settings (these require a restart)
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;

-- On the target server (9.5): speed up the initial copy
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET autovacuum = off;  -- re-enable after the migration

For more complex scenarios (selective table sets, multi-master), Bucardo is another option:

# Install Bucardo
sudo apt-get install bucardo

# Configure the sync (Bucardo 5 syntax)
bucardo add db primary dbname=mydb host=primary
bucardo add db replica dbname=mydb host=replica
bucardo add all tables relgroup=upgrade_group db=primary
bucardo add sync upgrade_sync relgroup=upgrade_group dbs=primary:source,replica:target
bucardo start

Upgrading PostgreSQL on a production server with a 10GB+ database presents unique challenges. Traditional methods like pg_upgradecluster or pg_dump/restore often require unacceptable downtime windows, especially when indexes need rebuilding or autovacuum interferes with the process.

The fundamental issues with conventional approaches:

  • pg_upgradecluster requires stopping the database during the entire upgrade process
  • Dump/restore operations become dramatically slower as the database grows
  • Autovacuum tasks compete for resources during restoration
  • Index creation becomes a major bottleneck (as seen in your 7+ hour restore)

Here are proven methods for major version upgrades without service interruption:

1. Logical Replication with Slony or Londiste

Example Slony setup for minimal downtime:


# On master (9.4)
slonik <<EOF
cluster name = upgrade_cluster;
node 1 admin conninfo = 'dbname=mydb host=master user=slony';
node 2 admin conninfo = 'dbname=mydb host=new_server user=slony';

init cluster (id=1, comment = 'Master Node');
store node (id=2, comment = 'Slave Node', event node=1);
store path (server=1, client=2, conninfo='dbname=mydb host=master user=slony');
store path (server=2, client=1, conninfo='dbname=mydb host=new_server user=slony');

create set (id=1, origin=1, comment='All tables');
set add table (set id=1, origin=1, id=1, fully qualified name='public.users');
# Add all other tables...

subscribe set (id=1, provider=1, receiver=2, forward=yes);
EOF

2. Using pglogical Extension

For PostgreSQL 9.4+, pglogical offers native logical replication:


-- On source (9.4)
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(node_name := 'provider', dsn := 'host=master dbname=mydb');

SELECT pglogical.create_replication_set('upgrade_set');
SELECT pglogical.replication_set_add_table('upgrade_set', 'public.users');
-- Add all tables

-- On target (9.5)
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(node_name := 'subscriber', dsn := 'host=new_server dbname=mydb');

SELECT pglogical.create_subscription(
    subscription_name := 'upgrade_sub',
    provider_dsn := 'host=master dbname=mydb',
    replication_sets := ARRAY['upgrade_set'],
    synchronize_data := true
);

When switching to the new server:

  1. Put the application in read-only mode briefly (seconds)
  2. Verify replication lag is zero
  3. Promote the new server
  4. Update connection strings
  5. Re-enable writes
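
Step 1's brief write freeze can be enforced at the database level rather than in application code. A sketch, assuming superuser access to the old primary; host names are placeholders:

```shell
#!/bin/sh
# Hypothetical write freeze on the old primary: make the database
# default to read-only for new sessions, then disconnect current
# clients so they reconnect under the new default.
freeze_writes() {
  psql -h master -d postgres -c \
    "ALTER DATABASE mydb SET default_transaction_read_only = on;" &&
  psql -h master -d mydb -c \
    "SELECT pg_terminate_backend(pid) FROM pg_stat_activity
      WHERE datname = 'mydb' AND pid <> pg_backend_pid();"
}
```

Note that ALTER DATABASE ... SET only affects new sessions, which is why the existing backends are terminated; sessions can still override the setting explicitly, so this is a safety net, not a hard guarantee.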

To prevent index creation from becoming a bottleneck:


-- Raise maintenance_work_mem for the session first, so it applies to the build
SET maintenance_work_mem = '256MB';
-- Then create indexes concurrently after the data load
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Essential checks during the upgrade process:


-- Check replication status
SELECT * FROM pglogical.show_subscription_status();

-- Verify data consistency
SELECT count(*) FROM users; -- Compare between servers
SELECT md5(array_agg(t)::text) FROM (SELECT * FROM users ORDER BY id) t;
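
The checksum query above is easy to wrap in a script that compares every table between the two servers. A sketch with placeholder host names, assuming each table has an id column to order by:

```shell
#!/bin/sh
# Hypothetical consistency check: compare an ordered-row md5 per table
# between the old and new servers; fails on the first mismatch.
table_md5() {  # $1 = host, $2 = table
  psql -At -h "$1" -d mydb -c \
    "SELECT md5(array_agg(t)::text) FROM (SELECT * FROM $2 ORDER BY id) t;"
}

compare_tables() {
  for tbl in "$@"; do
    old=$(table_md5 master "$tbl")
    new=$(table_md5 new_server "$tbl")
    if [ "$old" != "$new" ]; then
      echo "mismatch: $tbl" >&2
      return 1
    fi
  done
  echo "all tables match"
}
```

Run it as, for example, `compare_tables users orders invoices` during the read-only window, before re-enabling writes on the new server.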

For more complex environments:


bucardo add db master dbname=mydb host=master
bucardo add db new_server dbname=mydb host=new_server
bucardo add all tables relgroup=upgrade_group db=master
bucardo add sync upgrade_sync relgroup=upgrade_group dbs=master:source,new_server:target
bucardo start