Debugging Orphaned TCP Ports: When netstat Shows Open Ports Without Associated Processes


During routine server maintenance, I encountered a peculiar networking anomaly: port 5666 showed as LISTEN in netstat output, yet no owning process was listed for it. This occurred on an EC2 instance that normally ran NRPE monitoring:

sudo netstat -tulnp | grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      -

The standard diagnostic tools contradicted netstat:

sudo lsof -i :5666      # No output
ps aux | grep '[n]rpe'  # No process running (the bracket keeps grep from matching itself)

Upon checking kernel messages, I found critical clues:

dmesg | tail -20
[132031.611745] BUG: unable to handle kernel paging request
[132031.612338] IP: [] tcp_v4_do_rcv+0x0/0x820
[132031.613024] Pid: 0, comm: swapper Not tainted 2.6.16-xenU #1

The old Linux kernel (2.6.16) had known stability problems in its TCP stack. After the fault shown above, the kernel evidently never cleaned up the listening socket, so the port stayed held with no owning process.
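Fault messages like these scroll out of `dmesg | tail` quickly, so it can help to filter the whole kernel log for the usual markers. The following is a minimal sketch, assuming log lines shaped like the excerpt above; the function name `find_kernel_faults` is hypothetical and the marker list is an assumption, not an exhaustive set:

```shell
# find_kernel_faults: read kernel log lines on stdin and print only the
# lines that carry a common fault marker (marker list is an assumption).
find_kernel_faults() {
    grep -E 'BUG:|Oops|general protection fault|Call Trace'
}

# Typical use (needs permission to read the kernel log):
#   dmesg | find_kernel_faults
```

Any line it prints is worth reading in context, since the lines around the marker usually name the failing function.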

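A similar symptom can be reproduced deliberately with a kernel module that opens a listening socket from kernel context, where there is no user-space process for netstat to report:

/* Experimental module that reproduces the symptom: a listening socket
 * created from kernel context, so netstat has no PID to report.
 * sock_create_kern() is called with its pre-4.2 signature; since Linux 4.2
 * it takes a struct net * (e.g. &init_net) as its first argument. */
#include <linux/module.h>
#include <linux/in.h>
#include <linux/net.h>
#include <net/tcp.h>

static struct socket *sock;

static int __init port_hijack_init(void) {
    /* sin_addr is zero-initialised, i.e. INADDR_ANY (0.0.0.0) */
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(5666) };
    int err = sock_create_kern(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
    if (err)
        return err;
    err = kernel_bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    if (!err)
        err = kernel_listen(sock, 1);
    if (err) {
        sock_release(sock);
        return err;
    }
    printk(KERN_INFO "Port 5666 bound without a user-space owner\n");
    return 0;
}

/* Unloading the module releases the socket and frees the port. */
static void __exit port_hijack_exit(void) {
    sock_release(sock);
}

module_init(port_hijack_init);
module_exit(port_hijack_exit);
MODULE_LICENSE("GPL");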

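On the EC2 instance, the fix was a kernel upgrade:

# Upgrade kernel on Ubuntu/Debian
sudo apt-get update
# The sed expression strips the version and ABI fields from `uname -r`,
# leaving the kernel flavour, so this installs linux-image-<flavour>
sudo apt-get install linux-image-$(uname -r|sed 's,[^-]*-[^-]*-,,')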
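The sed expression in that install command is easy to misread, so here is a small sketch of what it does to a release string. The helper name `kernel_flavour` and the sample string `5.4.0-42-generic` are illustrative assumptions; `2.6.16-xenU` is the release from the log above:

```shell
# kernel_flavour: apply the same sed expression as the upgrade command,
# which removes everything up to and including the second hyphen.
kernel_flavour() {
    printf '%s\n' "$1" | sed 's,[^-]*-[^-]*-,,'
}

kernel_flavour "5.4.0-42-generic"   # prints: generic
kernel_flavour "2.6.16-xenU"        # prints: 2.6.16-xenU (only one hyphen, nothing stripped)
```

The second case shows that the expression only helps on release strings of the form version-ABI-flavour; on other formats it leaves the string unchanged, so the resulting package name should be checked before installing.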

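The following sysctl changes are sometimes suggested for recovering the port without a reboot. Neither one resets the TCP stack or releases a socket that is already in the LISTEN state, so treat them as tuning knobs rather than a fix:

# Run as root. Sends RSTs instead of silently dropping connections when the accept queue overflows
echo 1 > /proc/sys/net/ipv4/tcp_abort_on_overflow
# Fast recycling of TIME_WAIT sockets (breaks clients behind NAT; removed in Linux 4.12)
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle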

Modern monitoring solutions should include kernel version checks:

#!/bin/bash
MIN_KERNEL="2.6.32"
CURRENT_KERNEL=$(uname -r | cut -d'-' -f1)

if [ "$(printf '%s\n' "$MIN_KERNEL" "$CURRENT_KERNEL" | sort -V | head -n1)" != "$MIN_KERNEL" ]; then
    echo "CRITICAL: Kernel $CURRENT_KERNEL is vulnerable to orphaned port bugs"
    exit 2
fi

echo "OK: Kernel $CURRENT_KERNEL meets minimum version $MIN_KERNEL"
exit 0

During routine server maintenance, I encountered a puzzling scenario where TCP port 5666 appeared open in netstat output, yet no process was actively bound to it. This manifested through the following diagnostic commands:

# Initial discovery
netstat -ln --program
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State   PID/Program name
tcp   0      0      0.0.0.0:5666    0.0.0.0:*       LISTEN  -

The expected NRPE daemon (part of Opsview monitoring) wasn't running, and attempts to start it failed immediately. Standard process inspection tools came up empty:

lsof -i :5666
# No output
ps aux | grep '[n]rpe'
# No relevant processes

When basic diagnostics didn't reveal the culprit, I expanded the investigation scope:

  1. Socket listing with ss:
     ss -tulnp | grep 5666
     # Showed empty process field
  2. Kernel socket table inspection:
     # List sockets in the LISTEN state (st column 0A)
     grep " 0A " /proc/net/tcp
     # Port 5666 appears as hex 1622 in the local_address column
     grep -a ":1622 " /proc/net/tcp
  3. Kernel message buffer:
     dmesg | grep -i "5666\|nrpe\|kernel"
     # Revealed critical kernel panics
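The /proc/net/tcp lookup in the steps above can be scripted so the hex conversion is not done by hand. This is a minimal sketch, assuming the standard /proc/net/tcp column layout (local address in column 2, state in column 4, inode in column 10); the function names `port_to_hex` and `listening_inode` are hypothetical:

```shell
# port_to_hex: print a decimal port as the 4-digit uppercase hex used in /proc/net/tcp.
port_to_hex() {
    printf '%04X\n' "$1"
}

# listening_inode: print the socket inode of each LISTEN (state 0A) entry for a port.
# $1 = decimal port, $2 = table file (defaults to /proc/net/tcp).
listening_inode() {
    hex=$(port_to_hex "$1")
    awk -v port="$hex" '
        NR > 1 {                            # skip the header line
            n = split($2, parts, ":")       # local_address is ADDR:PORT in hex
            if (parts[n] == port && $4 == "0A") print $10
        }' "${2:-/proc/net/tcp}"
}

# Typical use:
#   listening_inode 5666
```

The inode it prints is the handle that ties the kernel's socket entry to a file descriptor in some process, which is what the next check needs.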

The EC2 instance was running an outdated and unstable Linux kernel (2.6.16). During a kernel panic:

  • The NRPE process terminated abnormally
  • Kernel TCP stack maintained the listening socket state
  • Cleanup routines never executed properly

This created an "orphaned port" scenario where the network stack believed the port was in use, but no user-space process owned it.
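One way to confirm that a listening socket really has no user-space owner is to take its inode from /proc/net/tcp and look for a matching `socket:[inode]` link under each process's fd directory. The sketch below assumes it runs as root (other users' fd directories are unreadable otherwise); the function name `socket_owners` and the optional proc-root argument are hypothetical conveniences:

```shell
# socket_owners: print the PID of every process holding an fd on a socket inode.
# $1 = socket inode, $2 = proc root (defaults to /proc).
# Prints nothing when no process owns the socket.
socket_owners() {
    inode=$1
    root=${2:-/proc}
    for fd in "$root"/[0-9]*/fd/*; do
        [ -L "$fd" ] || continue                      # skip unmatched globs and non-links
        if [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$inode]" ]; then
            pid=${fd#"$root"/}                        # strip the proc root prefix
            echo "${pid%%/*}"                         # keep only the PID component
        fi
    done | sort -un
}

# Typical use, with an inode taken from /proc/net/tcp:
#   socket_owners 12345
```

Empty output for the inode of a LISTEN socket is the scripted equivalent of the "-" that netstat shows in the PID column.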

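Immediate Resolution:

# tcp_tw_reuse only lets new outgoing connections reuse TIME_WAIT sockets;
# on its own it does not free a socket stuck in LISTEN
echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
# Then reboot the instance, which is what actually clears the orphaned listener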

Permanent Fix:

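# For AWS EC2 instances specifically. The exact package name depends on the
# distribution release; confirm it with `apt-cache search linux-image` first.
sudo apt-get install linux-image-$(uname -r)-ec2
sudo reboot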

Preventive Monitoring (sample Nagios check):

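#!/bin/bash
# Run as root: lsof cannot see sockets owned by other users otherwise
PORT=5666
# Match the exact port among listening TCP sockets (avoids matching e.g. :56660)
NETSTAT=$(netstat -ltn | grep -E ":${PORT}[[:space:]]")
PROCESS=$(lsof -nP -iTCP:"${PORT}" -sTCP:LISTEN)

if [ -z "$PROCESS" ] && [ -n "$NETSTAT" ]; then
  echo "CRITICAL: Orphaned port ${PORT} detected"
  exit 2
fi
echo "OK: Port ${PORT} has an owning process or is not listening"
exit 0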

For persistent cases, these techniques can help:

# Trace the network system calls of a test connection to the port
strace -f -e trace=network nc -zv 127.0.0.1 5666

# Kernel module inspection (if suspecting firewall modules)
lsmod | grep nf_
modinfo nf_conntrack

Commands differ between distributions. On RHEL-based systems, where SELinux may be involved, consider:

sudo semodule -DB  # Rebuild the SELinux policy with dontaudit rules disabled, so hidden denials get logged
sudo restorecon -Rv /usr/local/nagios/  # Restore the default SELinux file contexts