Debugging mod_fcgid SIGKILL Errors in Virtualmin: Graceful Termination Failures and System Resource Analysis



When mod_fcgid asks a FastCGI process to exit and the process is still alive after the configured timeout, it escalates to SIGKILL. The escalation appears in the Apache error log as:

[Thu Aug 02 01:17:32 2012] [warn] mod_fcgid: process 26460 graceful kill fail, sending SIGKILL
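
The escalation in that log line can be reproduced by hand, which helps when testing whether a given worker responds to SIGTERM at all. The following is a minimal sketch, not mod_fcgid's actual implementation: terminate_with_escalation and is_running are hypothetical helper names, and the one-second polling interval is an assumption.

```shell
#!/bin/sh
# Sketch of a "SIGTERM first, SIGKILL after a grace period" sequence.
# Helper names are hypothetical; this is not mod_fcgid's own code.

# kill -0 also succeeds for an unreaped zombie, so additionally check the
# state reported by ps where that is available.
is_running() {
  kill -0 "$1" 2>/dev/null || return 1
  case $(ps -o stat= -p "$1" 2>/dev/null) in
    Z*) return 1 ;;
  esac
  return 0
}

# Usage: terminate_with_escalation PID [GRACE_SECONDS]
terminate_with_escalation() (
  pid=$1
  grace=${2:-3}

  # Ask politely first; if the signal cannot be sent, the process is gone.
  kill -TERM "$pid" 2>/dev/null || { echo "not running"; exit 0; }

  # Poll once per second; escalate when the grace period is used up.
  waited=0
  while is_running "$pid"; do
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL "$pid" 2>/dev/null
      echo "graceful kill fail, sent SIGKILL"
      exit 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "graceful exit"
)
```

Calling terminate_with_escalation with the PID of a suspect worker and a short grace period shows which branch it takes. A process stuck in uninterruptible sleep will not respond to SIGTERM and will not die from SIGKILL until its pending I/O completes.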

The symptoms reported alongside these errors suggest a cascading failure pattern:

  • Sudden CPU saturation (0% idle)
  • I/O wait spikes
  • No OOM killer involvement
  • Low traffic conditions (~200 requests/10min)

Check these critical mod_fcgid directives in your Apache configuration:

FcgidIdleTimeout 120
FcgidProcessLifeTime 3600
FcgidMaxProcesses 100
FcgidMaxProcessesPerClass 10
FcgidIdleScanInterval 30
FcgidZombieScanInterval 5

Run these during incident reproduction:

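# Monitor process states; the bracket keeps grep from matching itself.
# Adjust the pattern to your worker name (often php-cgi behind a wrapper script).
watch -n 1 "ps -eo pid,state,cmd | grep '[f]cgid'"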

# I/O wait analysis
iotop -oP

# Pressure stall information for memory, CPU, and I/O (requires Linux 4.20+ with PSI enabled)
cat /proc/pressure/{memory,cpu,io}

For comparison, here is a PHP-FPM pool configuration with bounded worker counts and periodic recycling. PHP-FPM runs its own process manager, so it sidesteps mod_fcgid's kill logic:

[www]
user = apache
group = apache
listen = /var/run/php-fpm/www.sock
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 10
; pm.process_idle_timeout only takes effect with pm = ondemand; it is ignored under pm = dynamic
pm.process_idle_timeout = 60s
pm.max_requests = 1000

Virtualmin's process management adds these potential factors:

  • Per-domain PHP handler configurations
  • Cron-initiated maintenance tasks
  • Log rotation handling
  • Custom wrapper script timeouts

Implement this shell snippet to capture future incidents:

#!/bin/bash
# Snapshot process and memory state each time mod_fcgid escalates to SIGKILL
LOG=/var/log/httpd/error_log
OUT=/var/log/fcgid_monitor.log
PATTERN="mod_fcgid.*graceful kill fail"

tail -Fn0 "$LOG" | while IFS= read -r line; do
  if [[ "$line" =~ $PATTERN ]]; then
    {
      echo "FCGID Kill detected at $(date)"
      ps auxf
      vmstat 1 5
    } >> "$OUT"
  fi
done

When mod_fcgid encounters process management issues, the typical sequence looks like this in Apache logs:

[Thu Aug 02 01:17:32 2012] [warn] mod_fcgid: process 26460 graceful kill fail, sending SIGKILL
[Thu Aug 02 01:17:33 2012] [warn] mod_fcgid: process 26461 graceful kill fail, sending SIGKILL

This indicates Apache tried to gracefully terminate FCGI processes (SIGTERM) but had to escalate to SIGKILL when processes didn't respond. The cascade effect suggests either:

  • Processes were stuck in uninterruptible sleep (D state)
  • System-wide resource starvation (memory/IO)
  • Kernel-level contention

During such events, check these critical metrics simultaneously:

# Sample diagnostic commands:
dstat -tam --top-io --top-mem 5  # Combined resource monitoring
iotop -oPa                       # Show active I/O processes
vmstat 1 10                      # System-wide memory pressure

Key indicators of trouble:

  • Memory: High swap usage (si/so in vmstat) or low free memory
  • IO: Elevated await time in iostat or blocked processes in D state
  • CPU: High system time (sy%) versus user time (us%)
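
These indicators can be checked mechanically against vmstat output. The sketch below assumes the standard procps column order (r, b, swpd, free, buff, cache, si, so, bi, bo, in, cs, us, sy, id, wa, st); flag_vmstat is a hypothetical helper name, and the thresholds are simply "anything above zero", which is an assumption to tune.

```shell
#!/bin/sh
# Flag vmstat samples that show swap traffic, blocked tasks, or system
# time exceeding user time. Reads vmstat output on stdin.
flag_vmstat() {
  awk '
    $1 !~ /^[0-9]+$/ { next }            # skip the two header lines
    {
      n++
      flags = ""
      if ($7 + $8 > 0) flags = flags " swap"       # si/so columns
      if ($2 > 0)      flags = flags " blocked"    # b column (D-state tasks)
      if ($14 > $13)   flags = flags " sys>user"   # sy versus us
      if (flags != "") print "sample " n ":" flags
    }'
}

# Live usage:
#   vmstat 1 10 | flag_vmstat
```

Note that the first sample vmstat prints reports averages since boot, so flags on sample 1 describe history rather than the current second.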

For Virtualmin environments, these configuration directives often need tuning:

# In /etc/apache2/mods-available/fcgid.conf
# Cap total processes; the default of 1000 is too high for most hosts
FcgidMaxProcesses 200
# Recycle long-lived processes to limit memory leaks
FcgidProcessLifeTime 3600
# Reclaim idle processes
FcgidIdleTimeout 300
# Adjust for slow storage
FcgidIOTimeout 45
# Kill hung processes
FcgidBusyTimeout 300

A common pitfall is setting FcgidMaxProcesses too high without considering:

  • Available RAM: ~20MB per process × 200 = 4GB required
  • Storage speed: Slow disks increase process hang probability
  • CPU cores: Context switching overhead with many processes
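
The RAM arithmetic above is worth redoing with measured numbers rather than the 20MB estimate. A minimal sketch follows; the three function names are hypothetical, and the php-cgi process name in the usage comment is an assumption about how the workers appear in ps.

```shell
#!/bin/sh
# RAM needed for N processes at a given per-process footprint (both in MB).
# Usage: required_ram_mb PROCESSES PER_PROCESS_MB
required_ram_mb() {
  echo $(( $1 * $2 ))
}

# Largest process count that fits a RAM budget (integer division rounds down).
# Usage: max_fcgid_processes RAM_BUDGET_MB PER_PROCESS_MB
max_fcgid_processes() {
  echo $(( $1 / $2 ))
}

# Average resident set size in MB from a list of RSS values in KiB on stdin.
avg_rss_mb() {
  awk '{ sum += $1 } END { if (NR) printf "%d\n", sum / NR / 1024 }'
}

# Live usage (process name is an assumption):
#   ps -C php-cgi -o rss= | avg_rss_mb
```

For example, required_ram_mb 200 20 reproduces the 4000 MB figure above, while max_fcgid_processes 2048 20 shows that a 2 GB budget only supports about 100 such processes.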

Implement these proactive measures:

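#!/bin/bash
# Gracefully restart Apache when NEW "graceful kill fail" lines appear.
# Comparing counts avoids restarting every minute on the same old log lines.
LOG=/var/log/apache2/error.log
last=$(grep -c "graceful kill fail" "$LOG" 2>/dev/null)
while true; do
  sleep 60
  now=$(grep -c "graceful kill fail" "$LOG" 2>/dev/null)
  # A lower count means the log was rotated; only a higher count triggers action
  if [ "${now:-0}" -gt "${last:-0}" ]; then
    echo "[$(date)] FCGI failures detected" >> /var/log/fcgi_watchdog.log
    apachectl graceful
  fi
  last=$now
done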

For more sophisticated monitoring:

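# Sample Prometheus alert rule
# apache_mod_fcgi_kills_total is a placeholder metric name; it has to be exported
# by a custom collector that counts "graceful kill fail" log lines.
- alert: FCGIKillRateHigh
  # rate() is per second, so scale it to kills per minute
  expr: rate(apache_mod_fcgi_kills_total[5m]) * 60 > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High rate of FCGI process kills ({{ $value }}/min)"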
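
The alert rule assumes a counter named apache_mod_fcgi_kills_total exists, which mod_fcgid does not appear to export on its own. One way to supply it is a small script that writes the count in Prometheus text format for node_exporter's textfile collector. This is a sketch under that assumption; emit_fcgid_kill_metric is a hypothetical helper, and the directory in the usage comment is an example path.

```shell
#!/bin/sh
# Print the number of mod_fcgid SIGKILL escalations found in an Apache
# error log, in Prometheus text exposition format.
# Usage: emit_fcgid_kill_metric /path/to/error.log
emit_fcgid_kill_metric() {
  # grep -c prints 0 when nothing matches and nothing when the file is missing
  count=$(grep -c 'mod_fcgid: .*graceful kill fail' "$1" 2>/dev/null)
  printf '# TYPE apache_mod_fcgi_kills_total counter\n'
  printf 'apache_mod_fcgi_kills_total %s\n' "${count:-0}"
}

# Cron usage (directory is an assumption; write to a temp file and rename so
# the collector never reads a half-written file):
#   emit_fcgid_kill_metric /var/log/apache2/error.log > /var/lib/node_exporter/fcgid.prom.tmp \
#     && mv /var/lib/node_exporter/fcgid.prom.tmp /var/lib/node_exporter/fcgid.prom
```

The count drops back toward zero when the log is rotated. Prometheus treats a decreasing counter as a reset, so rate() keeps working across rotations.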

When standard logs aren't enough:

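# Trace signals delivered to FCGI processes in real time.
# Quoting the PID list lets strace attach to every match; adjust the pattern
# to your worker name (often php-cgi).
strace -f -e trace=signal -o /tmp/fcgi_signals.log -p "$(pgrep -d ' ' -f fcgid)"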

# Check for memory allocation failures:
dmesg | grep -i oom
grep -i kill /var/log/kern.log

# Profile PHP processes (common FCGI target).
# The package name varies by distribution and PHP version:
apt-get install php-xhprof
# Add to php.ini:
xhprof.output_dir=/tmp/xhprof

The described scenario (low traffic but high resource usage) often stems from:

  1. Scheduled jobs triggering PHP memory spikes
  2. Database maintenance tasks consuming IOPS
  3. Filesystem checks running concurrently

Solution approach:

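# Identify cron jobs scheduled in the 01:00 hour (the kills above were logged at 01:17).
# Crontab lines start with "minute hour ...", so match an hour field of 1:
grep -rE '^[[:space:]]*[0-9*/,-]+[[:space:]]+0?1[[:space:]]' /etc/crontab /etc/cron.d/ /var/spool/cron/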

# Check for competing maintenance:
ls -la /etc/cron.daily/
cat /etc/logrotate.conf
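
To see whether the kills cluster around a scheduled job, it helps to bucket them by hour before reading crontabs. The sketch below assumes the Apache error log timestamp layout shown earlier, with the time of day in the fourth whitespace-separated field; kills_by_hour is a hypothetical helper name.

```shell
#!/bin/sh
# Count "graceful kill fail" events per hour of day.
# Reads log files given as arguments, or stdin when none are given.
kills_by_hour() {
  awk '/graceful kill fail/ {
         split($4, t, ":")      # $4 is HH:MM:SS in "[Thu Aug 02 01:17:32 2012]"
         hits[t[1]]++
       }
       END { for (h in hits) print h, hits[h] }' "$@" | sort
}

# Live usage:
#   kills_by_hour /var/log/apache2/error.log
```

A single dominant hour points at cron or logrotate; an even spread points back at general resource limits.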

A starting configuration for a Virtualmin server hosting roughly 50 sites:

<IfModule mod_fcgid.c>
  FcgidMaxProcessesPerClass 10
  FcgidMaxProcesses 150
  FcgidMinProcessesPerClass 3
  FcgidProcessLifeTime 7200
  FcgidIdleTimeout 180
  FcgidIOTimeout 60
  FcgidBusyTimeout 120
  FcgidConnectTimeout 15
  FcgidOutputBufferSize 65536
</IfModule>

; In php.ini (comments there start with ";", not "#"):
; memory_limit can be overridden per site in Virtualmin
memory_limit = 128M
max_execution_time = 90
realpath_cache_size = 256k
opcache.enable=1