How to Permanently Kill a Zombie Logstash Process in Linux: Complete Termination Guide


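When a Java process like Logstash keeps coming back after SIGTERM or SIGKILL, and even after a reboot, something is restarting it. SIGKILL cannot be caught or ignored, so what reappears is a new process with a new PID. You're dealing with one of:

  • A process manager or supervisor respawning it
  • A true zombie (state Z in ps), which is already dead and only waiting for its parent to reap it
  • The init system (SysV init, Upstart, or systemd) starting it again on exit or at boot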

First, check whether it is being respawned by inspecting the process ancestry (2591 is the example PID used throughout):

# Check process tree
pstree -p 2591
ps -ef --forest | grep logstash

# Check service status
service logstash status
chkconfig --list logstash
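
If pstree is not installed, the parent chain can also be walked by hand. The following is a minimal sketch that assumes a Linux /proc filesystem; the function name ancestry is made up for this example.

```shell
# Print "PID NAME" for a process and each of its ancestors, child first.
# Assumes Linux with /proc mounted; 'ancestry' is a hypothetical helper name.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {print $2; exit}' "/proc/$pid/status")
        ppid=$(awk '/^PPid:/ {print $2; exit}' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        pid=$ppid
    done
}

# Example: show the ancestry of the current shell
ancestry "$$"
```

Called as ancestry 2591, the last lines of the output show which supervisor (init, a wrapper script, cron) sits above the Java process.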

1. Disable the Service First

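# For SysV init systems (RHEL/CentOS):
service logstash stop
chkconfig logstash off

# For Upstart jobs (e.g. RHEL 6, job file /etc/init/logstash.conf):
initctl stop logstash

# Debian/Ubuntu equivalent of "chkconfig off":
update-rc.d -f logstash remove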
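
Which of these commands applies depends on the init system that is actually running. The sketch below guesses it with two common heuristics (the /run/systemd/system directory and the output of initctl version); the function name detect_init is hypothetical, and the sysvinit branch is only a fallback, not a positive identification.

```shell
# Guess which init system is in charge so the matching commands can be used.
# Heuristics only; 'detect_init' is a hypothetical helper name.
detect_init() {
    if [ -d /run/systemd/system ]; then
        echo systemd
    elif command -v initctl >/dev/null 2>&1 &&
         initctl version 2>/dev/null | grep -q upstart; then
        echo upstart
    else
        echo sysvinit   # fallback: neither systemd nor Upstart was detected
    fi
}

case "$(detect_init)" in
    systemd)  echo "use: systemctl stop logstash && systemctl disable logstash" ;;
    upstart)  echo "use: initctl stop logstash" ;;
    sysvinit) echo "use: service logstash stop && chkconfig logstash off" ;;
esac
```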

2. Nuclear Termination Sequence

When standard methods fail:

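# Kill every process whose command line mentions logstash
# (-r: do nothing if pgrep finds no match)
pgrep -f logstash | xargs -r kill -9

# Kill an entire process group (replace PGID with the output of: ps -o pgid= -p 2591)
pkill -9 -g PGID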

# Remove PID files manually
rm -f /var/run/logstash/logstash.pid
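
Going straight to kill -9 gives Logstash no chance to flush its pipeline. A gentler variant, sketched below, sends SIGTERM first and escalates to SIGKILL only after a timeout. It assumes Linux /proc, and the helper names is_running and terminate_pid are invented for this example. A zombie (state Z) is treated as already dead, because no signal can affect it.

```shell
# Return success if the PID exists and is not a zombie (state Z).
is_running() {
    [ -r "/proc/$1/status" ] || return 1
    state=$(awk '/^State:/ {print $2; exit}' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# Send SIGTERM, wait up to $2 seconds (default 10), then send SIGKILL.
terminate_pid() {
    target=$1
    remaining=${2:-10}
    kill -TERM "$target" 2>/dev/null || return 0   # already gone (or not permitted)
    while [ "$remaining" -gt 0 ] && is_running "$target"; do
        sleep 1
        remaining=$((remaining - 1))
    done
    if is_running "$target"; then
        kill -KILL "$target" 2>/dev/null
    fi
    return 0
}

# Usage (example PID from above): terminate_pid 2591 15
```

This only helps once the service has been stopped or disabled; otherwise init starts a replacement as soon as the old process exits.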

3. Advanced Systemd Tactics (if applicable)

systemctl stop logstash
systemctl disable logstash
systemctl mask logstash  # Prevents all future starts

Afterwards, confirm that nothing is left behind:

# Verify no Java processes remain
ps aux | grep '[j]ava'

# Check for deleted files that logstash still holds open
lsof +L1 | grep logstash

# Clean up temporary files (-prune keeps find from descending into directories it removes)
find /tmp -name "*logstash*" -prune -exec rm -rf {} +

For processes that respawn after reboot:

# Check cron jobs
crontab -l
ls -la /etc/cron*

# Verify rc.local
cat /etc/rc.local

# Check user profiles
cat ~/.bashrc
cat ~/.bash_profile
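
Reading each of these files by eye is easy to get wrong. A small sketch that searches a list of candidate locations for a keyword is shown below; the function name find_autostart_refs and the exact list of paths are assumptions, so extend the list to match your system.

```shell
# List files under the given paths that mention a keyword (case-insensitive).
# Missing or unreadable paths are skipped silently.
# 'find_autostart_refs' is a hypothetical helper name.
find_autostart_refs() {
    keyword=$1
    shift
    grep -ril -- "$keyword" "$@" 2>/dev/null
    return 0
}

# Example: the locations checked above, plus common system-wide ones
find_autostart_refs logstash /etc/rc.local /etc/crontab /etc/cron.d \
    /etc/cron.daily /etc/cron.hourly /etc/inittab /etc/init /etc/init.d \
    "$HOME/.bashrc" "$HOME/.bash_profile"
```

Per-user crontabs are not plain files in /etc, so crontab -l still has to be run for each user that might own the job.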

When all else fails:

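# Save a core dump for later analysis, then kill
gcore 2591
kill -9 2591

# Only relevant if 2591 runs in its own PID namespace (e.g. a container):
# signal that namespace's PID 1. On a plain host this targets the real init,
# which the kernel protects from SIGKILL.
nsenter -t 2591 -m -p -- kill -9 1

# Make the process the OOM killer's preferred victim (oom_adj range: -17 to 15).
# This does not terminate it by itself.
echo 15 > /proc/2591/oom_adj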


When a Java process seems to shrug off SIGTERM and even SIGKILL (-9), and keeps coming back after every termination attempt, it is tempting to call it a "zombie process on steroids". Strictly speaking it is not a zombie: each kill succeeds, and a supervisor immediately starts a replacement with a new PID. This particular case involves Logstash 5.0.0~alpha5 running on RHEL 6.7 with OpenJDK 1.8.0_101.

The key indicators from your logs show this behavior:

Sep 15 13:22:17 test init: logstash main process (2546) killed by KILL signal
Sep 15 13:22:17 test init: logstash main process ended, respawning

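This shows that init is managing the process and respawning it whenever it dies. The "main process ended, respawning" wording is Upstart's, which RHEL 6 uses as its init, so the job is most likely defined in /etc/init/logstash.conf rather than in a SysV init script. The process itself might also fork children that survive the kill attempt.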
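
To see how often this has been happening, the respawn messages can be counted. The sketch below assumes syslog writes to /var/log/messages (the RHEL default) and that the message format matches the two lines quoted above; count_respawns is a hypothetical helper name.

```shell
# Count how often init reported respawning a given job in a syslog-style file.
# Prints 0 when there are no matches.
count_respawns() {
    job=$1
    logfile=$2
    grep -cF -- "init: $job main process ended, respawning" "$logfile" || true
}

# Example: count logstash respawns in the default RHEL syslog file
if [ -r /var/log/messages ]; then
    count_respawns logstash /var/log/messages
fi
```

A count that rises by one after every kill -9 confirms that init, not the JVM, is the reason the process keeps returning.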

Here's the step-by-step approach to completely eliminate this persistent process:

# 1. First identify the parent process and entire process tree
pstree -p | grep logstash
# or
ps -ef --forest | grep logstash

# 2. If managed by init, stop the service first so it is not respawned
service logstash stop
# or, for an Upstart job (as the respawn messages above suggest)
initctl stop logstash
# or for systems without the service command
/etc/init.d/logstash stop

# 3. Kill the entire process group (more effective than killing individual PIDs);
#    replace PID with the process ID
kill -9 -- -"$(ps -o pgid= -p PID | tr -d ' ')"

# 4. For systemd systems (though RHEL 6 uses upstart/sysvinit)
systemctl stop logstash
systemctl disable logstash

# 5. Verify no remaining processes
ps aux | grep '[l]ogstash'

# 6. Remove any stale PID file, which can make the init script misreport the service state
rm -f /var/run/logstash/logstash.pid
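
Deleting the PID file while the process is still alive makes later status checks misleading. A more careful variant, sketched below, removes the file only when the recorded PID no longer exists. It assumes the file contains a single PID and that /proc is available; remove_stale_pidfile is a hypothetical helper name.

```shell
# Remove a PID file only when the PID it records no longer exists.
# Returns 1 (and keeps the file) if the process is still alive.
remove_stale_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0
    recorded=$(tr -cd '0-9' < "$pidfile")      # keep digits only
    if [ -n "$recorded" ] && [ -d "/proc/$recorded" ]; then
        echo "PID $recorded is still alive; leaving $pidfile in place" >&2
        return 1
    fi
    rm -f -- "$pidfile"
}

# Example: the PID file path used above
remove_stale_pidfile /var/run/logstash/logstash.pid
```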

If the process still persists, consider these nuclear options:

# 1. Freeze the process with SIGSTOP (stopped state T) so it cannot fork or
#    react, then kill it while stopped
kill -STOP PID
kill -KILL PID

# 2. Use gdb to attach and make the process call exit() itself
#    (the cast is needed when libc debug info is not installed)
gdb -p PID
(gdb) call (void)exit(1)
(gdb) detach
(gdb) quit

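# 3. Where ps is unavailable (e.g. minimal containers), read the process group ID
#    from field 5 of /proc/PID/stat (assumes the command name contains no spaces)
#    and signal the whole group
kill -9 -- -"$(cut -d ' ' -f 5 /proc/PID/stat)"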

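# 4. As a last resort, remount the root filesystem read-only. This affects every
#    service on the host and only stops respawns that need to write to disk;
#    undo with: mount -o remount,rw /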
mount -o remount,ro /

For your specific Logstash case, consider these configuration improvements:

# In /etc/logstash/logstash.yml add:
pipeline:
  workers: 1
  batch:
    size: 125
    delay: 5

# For Upstart jobs, limit respawning in /etc/init/logstash.conf
# (give up if the job is respawned more than 5 times within 60 seconds):
respawn limit 5 60

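Remember that the JVM installs its own signal handlers: SIGTERM triggers the shutdown hooks, and a slow or blocked hook can make the process appear to ignore the signal. SIGKILL, by contrast, cannot be caught or ignored by any process. Adding short, well-behaved JVM shutdown hooks in your application code helps it clean up properly and exit on SIGTERM.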
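
The difference between the two signals is easy to reproduce without Logstash. The sketch below uses a plain sleep that ignores SIGTERM as a stand-in for a process with a stuck handler; the function name sigkill_demo is made up for this example, and the whole thing takes about two seconds to run.

```shell
# Show that SIGTERM can be ignored while SIGKILL cannot.
# 'sigkill_demo' is a hypothetical name; 'sleep' stands in for the JVM.
sigkill_demo() {
    # Child that ignores SIGTERM; exec makes the sleep itself the target PID
    sh -c 'trap "" TERM; exec sleep 30' &
    victim=$!
    sleep 1                        # give the child time to install its trap

    kill -TERM "$victim"
    sleep 1
    if kill -0 "$victim" 2>/dev/null; then
        echo "still alive after SIGTERM"
    fi

    kill -KILL "$victim"
    rc=0
    wait "$victim" || rc=$?
    echo "exit status after SIGKILL: $rc"   # 137 = 128 + signal number 9
}

sigkill_demo
```

If a process with the same name is still visible after a successful SIGKILL, compare the PIDs: a different PID means a supervisor has started a new instance.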