When dealing with long-running PostgreSQL queries that have become unresponsive, database administrators typically try these methods in sequence:
- `pg_cancel_backend(pid)` - The polite SIGINT approach
- `pg_terminate_backend(pid)` - The forceful SIGTERM approach
- OS-level `kill` commands
But what happens when all these standard methods fail? This usually occurs when:
- The backend process is stuck in an uninterruptible state (D state)
- There's a kernel-level deadlock
- The process is waiting for hardware resources
- PostgreSQL's internal state tracking has become corrupted
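Before escalating, it's worth confirming what the stuck backend is actually waiting on. Here's a minimal check against `pg_stat_activity` (PID 12345 is a placeholder, and the wait_event columns assume PostgreSQL 9.6 or later):

-- Inspect the stuck backend's state and wait events
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS runtime,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE pid = 12345;

If `wait_event_type` stays on `IO` or the row never changes between checks, you're likely looking at one of the failure modes above rather than a merely slow query.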
When standard termination methods fail, try this escalation path:
-- First attempt (SIGINT)
SELECT pg_cancel_backend(12345);
-- Second attempt (SIGTERM)
SELECT pg_terminate_backend(12345);
# From the OS level
kill 12345
kill -9 12345 # Nuclear option
# For containerized environments
docker kill --signal=SIGKILL container_name
When even SIGKILL fails, you're dealing with a kernel-level issue. Try these advanced techniques:
# Find processes in D state
ps aux | awk '$8 ~ /D/ { print $0 }'
# Alternative process inspection
cat /proc/12345/status | grep State
# Dump kernel stack traces for all tasks to the kernel log (requires root)
echo t > /proc/sysrq-trigger
# Inspect the kernel stack of the stuck process directly
cat /proc/12345/stack
For Kubernetes environments:
# Start a debug pod that shares the node's host namespaces
kubectl debug node/node-name -it --image=alpine
# Enter the host's mount namespace (PID 1 is the host's init)
nsenter --target 1 --mount
# Kill the stuck backend by its host PID
kill -9 12345
To avoid getting into this situation:
- Set proper `statement_timeout` and `lock_timeout` parameters
- Implement query monitoring with tools like `pg_stat_activity` (a sample query follows this list)
- Consider connection poolers with automatic termination features
- Use PostgreSQL's background worker infrastructure for long operations
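For the monitoring bullet above, here's a minimal sketch of a check you could run on a schedule; the five-minute threshold is an arbitrary example, not a recommendation:

-- Flag active queries that have been running for more than five minutes
SELECT pid, usename, state,
       now() - query_start AS runtime,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '5 minutes'
ORDER BY query_start;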
When a process enters 'D' state (uninterruptible sleep), it's typically waiting for:
- Disk I/O completion
- Network filesystem responses
- Hardware device readiness
The only ways out are:
- The awaited resource becomes available
- A system reboot
When you can't reboot immediately:
# Force filesystem sync (may help with stuck I/O)
sync
# Drop filesystem caches (safe for data, but expect a temporary performance hit in production)
echo 3 > /proc/sys/vm/drop_caches
# For NFS issues
umount -f /mnt/nfs
We've all been there - a PostgreSQL query gets stuck, consuming resources and blocking other operations. The usual approach is to use:
SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE ...;
But what happens when this fails? Here's what I've learned from years of PostgreSQL administration.
When `pg_cancel_backend` fails, the next step is to use its more aggressive cousin:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE state = 'active' AND query_start < NOW() - INTERVAL '1 hour';
This immediately terminates the backend process rather than politely asking it to stop.
In rare cases where `pg_terminate_backend` doesn't work, you might need to:
- Restart the PostgreSQL service (not ideal for production)
- Use OS-level commands to kill the process
For Linux systems:
sudo kill -9 [postgres_pid]
Preventing the problem is better than fixing it. Consider these approaches:
-- Set statement timeout
ALTER DATABASE your_db SET statement_timeout = '30s';
-- Use per-transaction timeouts in your application (SET LOCAL applies inside a transaction block)
SET LOCAL statement_timeout = '5min';
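The prevention list earlier also mentioned `lock_timeout`, which never got an example. One way to set it, assuming a role named app_user (the name and value are illustrative):

-- Give up on lock acquisition after 10 seconds for this role (example values)
ALTER ROLE app_user SET lock_timeout = '10s';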
If you frequently encounter this problem, investigate:
- Long-running transactions (check `pg_stat_activity`)
- Lock contention (use the `pg_locks` view; see the blocking query after this list)
- Resource bottlenecks (monitor CPU, memory, I/O)
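For the lock contention bullet, a blocking-chain query built on `pg_blocking_pids()` (available since PostgreSQL 9.6) is often quicker than reading `pg_locks` by hand:

-- Show blocked sessions alongside the sessions blocking them
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));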
Remember that a `kill -9` on a single backend forces PostgreSQL to restart every backend and go through crash recovery, so always prefer graceful termination when possible.