PostgreSQL Performance Optimization: REINDEX vs VACUUM After Mass Row Deletions



When working with PostgreSQL, especially older releases such as 8.2.3, keeping table performance healthy after large-scale deletions is crucial. The scenario here:

  • Logging tables with millions of rows
  • Monthly purges of data older than 30 days
  • Current practice of running REINDEX after deletions
  • Concern about whether VACUUM operations should be included

For optimal performance after mass deletions, you need to consider three operations:

-- Basic maintenance commands (old-style option syntax; the parenthesized
-- form VACUUM (VERBOSE, ANALYZE) only exists in PostgreSQL 9.0+)
REINDEX TABLE logging_table;
VACUUM VERBOSE ANALYZE logging_table;
VACUUM FULL logging_table; -- Use with caution: takes an exclusive lock

In PostgreSQL 8.2.3 (where autovacuum exists but is disabled by default), manual VACUUM is critical because:

  • Deletes mark rows as "dead" but don't reclaim space
  • Indexes retain pointers to dead tuples
  • Table statistics become inaccurate without ANALYZE
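
The effect of the first point can be demonstrated on a throwaway table (a sketch; the table name is hypothetical, and the n_dead_tup statistic shown in the final query requires PostgreSQL 8.3+):

-- Deletes leave dead tuples behind; the file does not shrink
CREATE TABLE vacuum_demo (id integer, payload text);
INSERT INTO vacuum_demo SELECT i, 'x' FROM generate_series(1, 100000) AS i;
DELETE FROM vacuum_demo WHERE id <= 90000;

-- On 8.3+ the dead rows are visible here before vacuuming:
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'vacuum_demo';

-- Plain VACUUM marks the space reusable for future inserts
VACUUM vacuum_demo;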

While REINDEX helps, it's not a complete solution:

-- REINDEX only addresses index bloat
REINDEX TABLE large_log_table; -- Blocks writes while rebuilding
-- Compare to:
VACUUM VERBOSE ANALYZE large_log_table; -- Non-blocking; less intrusive

For your logging table scenario:

  1. Delete old records
  2. Run VACUUM (ANALYZE)
  3. Periodically REINDEX (weekly/monthly)

Here's a complete maintenance script for logging tables:

-- Delete old records
DELETE FROM application_logs WHERE log_date < NOW() - INTERVAL '30 days';

-- Follow with VACUUM ANALYZE
VACUUM VERBOSE ANALYZE application_logs;

-- Monthly reindex (schedule during low traffic)
REINDEX TABLE application_logs;

-- For very large tables, consider:
VACUUM FULL VERBOSE ANALYZE application_logs; -- Locks table

In PostgreSQL 8.2.3:

  • VACUUM ANALYZE updates statistics for query planner
  • Plain VACUUM is non-blocking (preferred for production)
  • VACUUM FULL rewrites table (requires exclusive lock)
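
Since 8.2 does not run autovacuum by default, the plain-VACUUM step is worth automating. One option is the vacuumdb utility driven by cron (a sketch; the database name, table name, and schedule are placeholders):

# /etc/crontab entry: nightly at 02:30, vacuum and analyze the logging
# table only (-z also runs ANALYZE, -t restricts the run to one table)
30 2 * * * postgres vacuumdb -z -t application_logs mydb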

Check table and index bloat with (note: the n_dead_tup column was added in PostgreSQL 8.3; on 8.2, read the dead-row counts from VACUUM VERBOSE output instead):

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum 
FROM pg_stat_user_tables 
WHERE relname = 'application_logs';

When dealing with logging tables where we regularly purge millions of rows (as with 30-day retention policies), proper maintenance becomes critical. Unlike modern PostgreSQL versions, where autovacuum is enabled by default, version 8.2.3 requires manual intervention.

When you DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days':

1. Dead tuples accumulate (on 8.3+, visible via SELECT n_dead_tup FROM pg_stat_user_tables)
2. Indexes maintain pointers to deleted rows
3. Table statistics become outdated

While REINDEX cleans up orphaned index entries, it's often overkill:

-- Check index size before rebuilding (pg_indexes_size requires 9.0+; on
-- 8.2 use pg_total_relation_size('logs') - pg_relation_size('logs'))
SELECT pg_size_pretty(pg_indexes_size('logs'));

-- Full reindex (locks table)
REINDEX TABLE logs;

-- Concurrent alternative (REINDEX CONCURRENTLY only arrived in PostgreSQL 12;
-- CREATE INDEX CONCURRENTLY is available from 8.2, but cannot run inside a
-- transaction block)
CREATE INDEX CONCURRENTLY logs_new_idx ON logs(created_at);
DROP INDEX logs_idx;
ALTER INDEX logs_new_idx RENAME TO logs_idx;
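
One caveat with the swap above: an interrupted CREATE INDEX CONCURRENTLY leaves the new index behind marked invalid, so it is worth verifying before dropping the old one (a sketch, reusing the index name from the example):

-- A failed concurrent build leaves indisvalid = false
SELECT c.relname, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'logs_new_idx';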

For your use case, I recommend:

-- Basic vacuum (space reclamation)
VACUUM VERBOSE logs;

-- With statistics update (helps query planner; note VERBOSE must
-- precede ANALYZE in the old-style syntax)
VACUUM VERBOSE ANALYZE logs;

-- Aggressive vacuum (for one-time cleanup)
VACUUM FULL ANALYZE logs;

For a logging table called app_logs:

-- 1. Delete old records
BEGIN;
DELETE FROM app_logs WHERE created_at < NOW() - INTERVAL '30 days';
COMMIT;

-- 2. Vacuum with statistics
VACUUM ANALYZE app_logs;

-- 3. Conditionally reindex (DO blocks require PostgreSQL 9.0+ and
--    n_dead_tup requires 8.3+; on 8.2, script this check externally)
DO $$
BEGIN
    IF (SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'app_logs') > 1000000 THEN
        EXECUTE 'REINDEX TABLE app_logs';
    END IF;
END $$;

Create a monitoring view (pg_indexes_size and the tuple-count columns require PostgreSQL 9.0+ and 8.3+ respectively):

CREATE VIEW table_metrics AS
SELECT 
    relname,
    n_live_tup,
    n_dead_tup,
    pg_size_pretty(pg_relation_size(relid)) AS size,
    pg_size_pretty(pg_indexes_size(relid)) AS idx_size,
    last_vacuum,
    last_autovacuum,
    last_analyze
FROM pg_stat_user_tables;
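
The view can then drive ad-hoc checks, for example flagging tables whose dead-tuple counts suggest a vacuum is overdue (the threshold is illustrative):

SELECT relname, n_dead_tup, size, idx_size
FROM table_metrics
WHERE n_dead_tup > 100000
ORDER BY n_dead_tup DESC;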