PostgreSQL Performance Optimization: REINDEX vs VACUUM After Mass Row Deletions



When working with PostgreSQL, especially older releases such as 8.2.3, keeping table performance healthy after large-scale deletions is crucial. The scenario here:

  • Logging tables with millions of rows
  • Monthly purges of data older than 30 days
  • Current practice of running REINDEX after deletions
  • Concern about whether VACUUM operations should be included

For optimal performance after mass deletions, you need to consider three operations:

-- Basic maintenance commands (old-style option syntax; the parenthesized
-- form VACUUM (VERBOSE, ANALYZE) only exists in PostgreSQL 9.0+)
REINDEX TABLE logging_table;
VACUUM VERBOSE ANALYZE logging_table;
VACUUM FULL logging_table; -- Use with caution: takes an exclusive lock

In PostgreSQL 8.2.3 (where autovacuum exists but is disabled by default), manual VACUUM is critical because:

  • Deletes mark rows as "dead" but don't reclaim space
  • Indexes retain pointers to dead tuples
  • Table statistics become inaccurate without ANALYZE
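
The effect of the first point can be demonstrated on a throwaway table (a sketch; the table name is hypothetical, and the n_dead_tup statistic shown in the final query requires PostgreSQL 8.3+):

-- Deletes leave dead tuples behind; the file does not shrink
CREATE TABLE vacuum_demo (id integer, payload text);
INSERT INTO vacuum_demo SELECT i, 'x' FROM generate_series(1, 100000) AS i;
DELETE FROM vacuum_demo WHERE id <= 90000;

-- On 8.3+ the dead rows are visible here before vacuuming:
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'vacuum_demo';

-- Plain VACUUM marks the space reusable for future inserts
VACUUM vacuum_demo;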

While REINDEX helps, it's not a complete solution:

-- REINDEX only addresses index bloat
REINDEX TABLE large_log_table; -- Blocks writes while rebuilding
-- Compare to:
VACUUM VERBOSE ANALYZE large_log_table; -- Non-blocking; less intrusive

For your logging table scenario:

  1. Delete old records
  2. Run VACUUM (ANALYZE)
  3. Periodically REINDEX (weekly/monthly)

Here's a complete maintenance script for logging tables:

-- Delete old records
DELETE FROM application_logs WHERE log_date < NOW() - INTERVAL '30 days';

-- Follow with VACUUM ANALYZE
VACUUM VERBOSE ANALYZE application_logs;

-- Monthly reindex (schedule during low traffic)
REINDEX TABLE application_logs;

-- For very large tables, consider:
VACUUM FULL VERBOSE ANALYZE application_logs; -- Locks table

In PostgreSQL 8.2.3:

  • VACUUM ANALYZE updates statistics for query planner
  • Plain VACUUM is non-blocking (preferred for production)
  • VACUUM FULL rewrites table (requires exclusive lock)
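
Since 8.2 does not run autovacuum by default, the plain-VACUUM step is worth automating. One option is the vacuumdb utility driven by cron (a sketch; the database name, table name, and schedule are placeholders):

# /etc/crontab entry: nightly at 02:30, vacuum and analyze the logging
# table only (-z also runs ANALYZE, -t restricts the run to one table)
30 2 * * * postgres vacuumdb -z -t application_logs mydb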

Check table and index bloat with (note: the n_dead_tup column was added in PostgreSQL 8.3; on 8.2, read the dead-row counts from VACUUM VERBOSE output instead):

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum 
FROM pg_stat_user_tables 
WHERE relname = 'application_logs';

When dealing with logging tables where we regularly purge millions of rows (as with 30-day retention policies), proper maintenance becomes critical. Unlike modern PostgreSQL versions, where autovacuum is enabled by default, version 8.2.3 requires manual intervention.

When you DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days':

1. Dead tuples accumulate (on 8.3+, visible via SELECT n_dead_tup FROM pg_stat_user_tables)
2. Indexes maintain pointers to deleted rows
3. Table statistics become outdated

While REINDEX cleans up orphaned index entries, it's often overkill:

-- Check index size before rebuilding (pg_indexes_size requires 9.0+; on
-- 8.2 use pg_total_relation_size('logs') - pg_relation_size('logs'))
SELECT pg_size_pretty(pg_indexes_size('logs'));

-- Full reindex (locks table)
REINDEX TABLE logs;

-- Concurrent alternative (REINDEX CONCURRENTLY only arrived in PostgreSQL 12;
-- CREATE INDEX CONCURRENTLY is available from 8.2, but cannot run inside a
-- transaction block)
CREATE INDEX CONCURRENTLY logs_new_idx ON logs(created_at);
DROP INDEX logs_idx;
ALTER INDEX logs_new_idx RENAME TO logs_idx;
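
One caveat with the swap above: an interrupted CREATE INDEX CONCURRENTLY leaves the new index behind marked invalid, so it is worth verifying before dropping the old one (a sketch, reusing the index name from the example):

-- A failed concurrent build leaves indisvalid = false
SELECT c.relname, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'logs_new_idx';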

For your use case, I recommend:

-- Basic vacuum (space reclamation)
VACUUM VERBOSE logs;

-- With statistics update (helps query planner; note VERBOSE must
-- precede ANALYZE in the old-style syntax)
VACUUM VERBOSE ANALYZE logs;

-- Aggressive vacuum (for one-time cleanup)
VACUUM FULL ANALYZE logs;

For a logging table called app_logs:

-- 1. Delete old records
BEGIN;
DELETE FROM app_logs WHERE created_at < NOW() - INTERVAL '30 days';
COMMIT;

-- 2. Vacuum with statistics
VACUUM ANALYZE app_logs;

-- 3. Conditionally reindex (DO blocks require PostgreSQL 9.0+ and
--    n_dead_tup requires 8.3+; on 8.2, script this check externally)
DO $$
BEGIN
    IF (SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'app_logs') > 1000000 THEN
        EXECUTE 'REINDEX TABLE app_logs';
    END IF;
END $$;

Create a monitoring view (pg_indexes_size and the tuple-count columns require PostgreSQL 9.0+ and 8.3+ respectively):

CREATE VIEW table_metrics AS
SELECT 
    relname,
    n_live_tup,
    n_dead_tup,
    pg_size_pretty(pg_relation_size(relid)) AS size,
    pg_size_pretty(pg_indexes_size(relid)) AS idx_size,
    last_vacuum,
    last_autovacuum,
    last_analyze
FROM pg_stat_user_tables;
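
The view can then drive ad-hoc checks, for example flagging tables whose dead-tuple counts suggest a vacuum is overdue (the threshold is illustrative):

SELECT relname, n_dead_tup, size, idx_size
FROM table_metrics
WHERE n_dead_tup > 100000
ORDER BY n_dead_tup DESC;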