When mod_fcgid cannot terminate a process gracefully (the process ignores or cannot act on the initial SIGTERM), it escalates to SIGKILL after the configured timeout. This behavior appears in the logs as:
[Thu Aug 02 01:17:32 2012] [warn] mod_fcgid: process 26460 graceful kill fail, sending SIGKILL
The described symptoms suggest a cascading failure pattern:
- Sudden CPU saturation (0% idle)
- I/O wait spikes
- No OOM killer involvement
- Low traffic conditions (~200 requests/10min)
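If sysstat was already collecting data, the historical record can confirm this pattern after the fact. A minimal sketch, assuming the default /var/log/sa location and an incident window around 01:00 (adjust the saDD file to the incident date):
# CPU: look for %idle collapsing toward 0 during the window
sar -u -f /var/log/sa/sa02 -s 01:00:00 -e 01:30:00
# Run queue length and load average
sar -q -f /var/log/sa/sa02 -s 01:00:00 -e 01:30:00
# Block I/O transfer rates
sar -b -f /var/log/sa/sa02 -s 01:00:00 -e 01:30:00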
Check these critical mod_fcgid directives in your Apache configuration:
FcgidIdleTimeout 120
FcgidProcessLifeTime 3600
FcgidMaxProcesses 100
FcgidMaxProcessesPerClass 10
FcgidIdleScanInterval 30
FcgidZombieScanInterval 5
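To see how close the server actually gets to FcgidMaxProcessesPerClass, a rough sketch that counts live PHP/FCGI processes per owning user (under Virtualmin each domain runs as its own user, so this approximates per-class usage):
# Count PHP/FCGI processes per user, highest first
ps -eo user:20,comm | awk '/php|fcgi/ {count[$1]++} END {for (u in count) print count[u], u}' | sort -rn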
Run these during incident reproduction:
# Monitor process states (the [f] keeps grep from matching itself)
watch -n 1 'ps -eo pid,state,cmd | grep [f]cgid'
# I/O wait analysis
iotop -oP
# Memory/CPU/IO pressure indicators (requires kernel 4.20+ with PSI enabled)
cat /proc/pressure/{memory,cpu,io}
For comparison, here's a robust PHP-FPM pool configuration that avoids similar issues:
[www]
user = apache
group = apache
listen = /var/run/php-fpm/www.sock
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 10
pm.process_idle_timeout = 60s
pm.max_requests = 1000
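If a site is moved to PHP-FPM, the pool can report its own process counts, which makes this kind of incident much easier to see. A minimal sketch, assuming pm.status_path = /fpm-status is added to the pool above and the cgi-fcgi helper (from the fcgi/libfcgi package) is installed:
# Query the FPM status page directly over the pool's UNIX socket
SCRIPT_NAME=/fpm-status SCRIPT_FILENAME=/fpm-status REQUEST_METHOD=GET \
    cgi-fcgi -bind -connect /var/run/php-fpm/www.sock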
Virtualmin's process management adds these potential factors:
- Per-domain PHP handler configurations
- Cron-initiated maintenance tasks
- Log rotation handling
- Custom wrapper script timeouts
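To see what Virtualmin has actually written out per domain, grepping the generated Apache configuration and wrapper scripts is usually enough; a rough sketch (paths differ between /etc/httpd and /etc/apache2 layouts):
# Locate per-domain fcgid wrapper directives and related settings
grep -RniE 'fcgiwrapper|fcgid' /etc/httpd /etc/apache2 2>/dev/null
# Virtualmin typically places the wrapper scripts in each domain's home directory
ls -l /home/*/fcgi-bin/ 2>/dev/null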
Implement this shell snippet to capture future incidents:
#!/bin/bash
LOG=/var/log/httpd/error_log
OUT=/var/log/fcgid_monitor.log
PATTERN="mod_fcgid.*graceful kill fail"

# Follow the error log and snapshot system state each time a forced kill is logged
tail -Fn0 "$LOG" | while read -r line; do
    if [[ "$line" =~ $PATTERN ]]; then
        {
            echo "FCGID kill detected at $(date)"
            ps auxf      # process tree at the moment of the kill
            vmstat 1 5   # short CPU/memory/IO sample
        } >> "$OUT"
    fi
done
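For a short investigation it is enough to start the capture script detached; the path below is only an example:
nohup /usr/local/sbin/fcgid_capture.sh >/dev/null 2>&1 &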
When mod_fcgid encounters process management issues, the typical sequence looks like this in the Apache logs:
[Thu Aug 02 01:17:32 2012] [warn] mod_fcgid: process 26460 graceful kill fail, sending SIGKILL
[Thu Aug 02 01:17:33 2012] [warn] mod_fcgid: process 26461 graceful kill fail, sending SIGKILL
This indicates Apache tried to terminate the FCGI processes gracefully (SIGTERM) but had to escalate to SIGKILL when they did not respond. The cascade effect suggests one of the following:
- Processes were stuck in uninterruptible sleep (D state)
- System-wide resource starvation (memory/IO)
- Kernel-level contention
During such events, check these critical metrics simultaneously:
# Sample diagnostic commands:
dstat -tam --top-io --top-mem 5 # Combined resource monitoring
iotop -oPa # Show active I/O processes
vmstat 1 10 # System-wide memory pressure
Key indicators of trouble:
- Memory: High swap usage (si/so in vmstat) or low free memory
- IO: Elevated await time in iostat or blocked processes in D state
- CPU: High system time (sy%) versus user time (us%)
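To confirm the D-state suspicion directly, list blocked processes together with the kernel wait channel they are sleeping in; a minimal sketch:
# Processes in uninterruptible sleep and what they are waiting on
ps -eo pid,state,wchan:32,cmd | awk '$2 ~ /^D/'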
For Virtualmin environments, these configuration directives often need tuning:
# In /etc/apache2/mods-available/fcgid.conf
FcgidMaxProcesses 200 # Lower than the default (1000) to bound total PHP footprint
FcgidProcessLifeTime 3600 # Prevent memory leaks
FcgidIdleTimeout 300 # Reclaim idle processes
FcgidIOTimeout 45 # Adjust for slow storage
FcgidBusyTimeout 300 # Kill hung processes
A common pitfall is setting FcgidMaxProcesses too high without considering the following (a sizing sketch follows this list):
- Available RAM: ~20MB per process × 200 = 4GB required
- Storage speed: Slow disks increase process hang probability
- CPU cores: Context switching overhead with many processes
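Rather than assuming 20MB, measuring the real per-process footprint gives a safer ceiling for FcgidMaxProcesses. A rough sketch (the process name may be php-cgi, php5.cgi or the wrapper name depending on the handler; RSS slightly overstates the cost because shared pages are counted per process):
# Average and total resident memory of PHP CGI processes, in MB
ps -C php-cgi -o rss= | awk '{sum+=$1; n++} END {if (n) printf "%d procs, avg %.1f MB, total %.0f MB\n", n, sum/n/1024, sum/1024}'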
Implement these proactive measures:
#!/bin/bash
# Monitor the Apache error log and restart Apache gracefully when FCGI kill
# failures appear; back off after a restart so the same log lines do not
# trigger repeated restarts.
while true; do
    if tail -n 50 /var/log/apache2/error.log | grep -q "graceful kill fail"; then
        echo "[$(date)] FCGI failures detected" >> /var/log/fcgi_watchdog.log
        apachectl graceful
        sleep 600   # cool-down after a restart
    fi
    sleep 60
done
For more sophisticated monitoring, an alert rule along these lines flags bursts of kills. Apache does not expose such a counter itself, so apache_mod_fcgi_kills_total here stands in for a metric produced by a log-based exporter:
# Sample Prometheus alert rule
- alert: FCGIKillRateHigh
  expr: rate(apache_mod_fcgi_kills_total[5m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High rate of FCGI process kills ({{ $value }}/s)"
When standard logs aren't enough:
# Trace process deaths in real-time:
strace -f -e trace=signal -o /tmp/fcgi_signals.log -p "$(pgrep -d, -f fcgid)"   # pgrep -d, joins PIDs with commas for strace
# Check for memory allocation failures:
dmesg | grep -i oom
grep -i kill /var/log/kern.log   # /var/log/messages on RHEL/CentOS-style systems
# Profile PHP processes (the usual FCGI workload):
apt-get install php-xhprof   # package name varies by distribution; xhprof is also available via PECL
# Add to php.ini:
xhprof.output_dir=/tmp/xhprof
The described scenario (low traffic but high resource usage) often stems from:
- Scheduled jobs triggering PHP memory spikes
- Database maintenance tasks consuming IOPS
- Filesystem checks running concurrently
Solution approach:
# Identify cron jobs scheduled around 01:00 (the incident window); crontab lines
# are "minute hour ...", so match an hour field of 1:
grep -rE '^[^#][^[:space:]]*[[:space:]]+1[[:space:]]' /etc/crontab /etc/cron.d/ /var/spool/cron/ 2>/dev/null
# Check for competing maintenance:
ls -la /etc/cron.daily/
cat /etc/logrotate.conf
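Where the culprit turns out to be a scheduled maintenance job, deprioritising it is often easier than rescheduling it; a sketch of a crontab entry (the script path is only an example) that runs the job at idle I/O class and lowest CPU priority:
# Crontab entry: run nightly maintenance with idle I/O and lowest CPU priority
17 3 * * * ionice -c3 nice -n 19 /usr/local/sbin/nightly_maintenance.sh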
For Virtualmin servers handling ~50 sites:
<IfModule mod_fcgid.c>
FcgidMaxProcessesPerClass 10
FcgidMaxProcesses 150
FcgidMinProcessesPerClass 3
FcgidProcessLifeTime 7200
FcgidIdleTimeout 180
FcgidIOTimeout 60
FcgidBusyTimeout 120
FcgidConnectTimeout 15
FcgidOutputBufferSize 65536
</IfModule>
# In php.ini:
memory_limit = 128M # Site-specific override in Virtualmin
max_execution_time = 90
realpath_cache_size = 256k
opcache.enable=1
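To verify the overrides actually reach the SAPI that mod_fcgid runs (the CLI can load a different php.ini), check both interpreters and compare the loaded configuration file; a quick sketch:
# CLI and CGI SAPIs may read different ini files; compare both
php -i     | grep -E 'Loaded Configuration File|memory_limit|max_execution_time|opcache.enable'
php-cgi -i | grep -E 'Loaded Configuration File|memory_limit|max_execution_time|opcache.enable'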