How to Diagnose and Resolve High numothersock Socket Usage in Linux Servers


On MediaTemple DV servers, and on other Linux systems that report it, the numothersock metric in QoS alerts is the number of non-TCP sockets currently open. This includes:

  • UNIX domain sockets (AF_UNIX) for inter-process communication
  • UDP sockets for DNS lookups and other connectionless protocols
  • Raw sockets and other specialized socket types
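To get a rough feel for how these categories add up on a given machine, the kernel's socket tables under /proc/net can be counted directly. The following is a minimal sketch, not the exact accounting used by the QoS alert: the helper names `count_table_entries` and `count_non_tcp_sockets` are made up for illustration, and it assumes each table has exactly one header line.

```python
from pathlib import Path

# /proc/net tables that list non-TCP sockets (assumed: one header line each).
NON_TCP_TABLES = ("unix", "udp", "udp6", "raw", "raw6")


def count_table_entries(text: str) -> int:
    """Count socket rows in a /proc/net table, skipping the header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def count_non_tcp_sockets(proc_net: str = "/proc/net") -> dict:
    """Return {table_name: socket_count} for every table that exists."""
    counts = {}
    for name in NON_TCP_TABLES:
        path = Path(proc_net) / name
        if path.is_file():
            counts[name] = count_table_entries(path.read_text())
    return counts


if __name__ == "__main__":
    counts = count_non_tcp_sockets()
    print(counts, "total:", sum(counts.values()))
```

The total is only an approximation of numothersock, but a large gap between the categories usually shows where to look first.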

Here's how to identify the processes creating these sockets:

# Method 1: Using the ss command (modern replacement for netstat)
# -x: UNIX domain sockets, -u: UDP sockets, -a: all states, -p: owning process
ss -xuap

# Sample output:
# u_str  ESTAB   0    0    /var/run/docker.sock 12345    * 0    users:(("docker",pid=4567,fd=3))
# udp    UNCONN   0    0    0.0.0.0:68          0.0.0.0:*    users:(("dhclient",pid=8910,fd=6))

# Method 2: Using lsof for detailed process information
sudo lsof -U -i udp -n -P

# Breakdown of flags:
# -U: Show UNIX domain sockets
# -i udp: Show UDP sockets
# -n: Do not resolve host names (show numeric addresses)
# -P: Do not resolve port names (show numeric ports)
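When the ss output runs to hundreds of lines, it helps to tally sockets per process name. The sketch below parses the `users:(("name",pid=N,fd=M))` column shown in the sample output above; `tally_by_process` is a hypothetical helper, and the regular expression assumes that column layout.

```python
import re
from collections import Counter

# Matches each ("name",pid=N entry inside the users:(...) column of ss -p.
USER_RE = re.compile(r'\("([^"]+)",pid=(\d+)')


def tally_by_process(ss_output: str) -> Counter:
    """Count how many socket lines each process name appears on."""
    counts = Counter()
    for line in ss_output.splitlines():
        # A socket held through several fds of one process counts once.
        owners = {name for name, _pid in USER_RE.findall(line)}
        counts.update(owners)
    return counts


sample = """\
u_str  ESTAB   0    0    /var/run/docker.sock 12345    * 0    users:(("docker",pid=4567,fd=3))
udp    UNCONN   0    0    0.0.0.0:68          0.0.0.0:*    users:(("dhclient",pid=8910,fd=6))
"""

for name, n in tally_by_process(sample).most_common():
    print(n, name)
```

In practice the input would come from something like `subprocess.run(["ss", "-xuap"], capture_output=True, text=True).stdout`, run as root so that the process column is populated.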

Based on my experience troubleshooting similar issues, these are frequent offenders:

1. Docker and Container Runtimes

Docker heavily uses UNIX domain sockets for communication:

# Check docker socket usage
sudo ls -l /var/run/docker.sock
sudo ss -xp | grep docker

2. DNS Resolvers

Applications making frequent DNS queries can accumulate UDP sockets:

# Monitor DNS queries
sudo tcpdump -i any port 53 -n

3. Custom Applications

Poorly written applications might leak sockets. Here's a Python example of proper socket handling:

import socket

def send_udp_probe(payload=b'test', addr=('8.8.8.8', 53)):
    # Socket objects are context managers, so the descriptor is released
    # even if sendto() raises
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(5.0)  # Prevent hanging on blocking calls
        sock.sendto(payload, addr)
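To check whether a suspect application is actually leaking, one option is to count how many of its file descriptors are sockets and watch whether the number keeps growing. This is a minimal sketch that relies on the Linux convention of socket descriptors in /proc/<pid>/fd being symlinks of the form `socket:[inode]`; `count_socket_fds` is an illustrative name, not an existing tool.

```python
import os


def count_socket_fds(fd_dir: str) -> int:
    """Count entries in a /proc/<pid>/fd directory that point at sockets."""
    count = 0
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith("socket:["):
            count += 1
    return count


if __name__ == "__main__":
    # Inspect the current process where /proc is available; substitute a
    # real PID (and run as root) to inspect another process.
    if os.path.isdir("/proc/self/fd"):
        print(count_socket_fds("/proc/self/fd"))
```

A count that rises steadily under constant load, and never falls back, is a strong hint that sockets are being opened without being closed.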

When you've identified the problematic processes, consider these approaches:

DNS Caching

For DNS-heavy applications, implement client-side caching:

# Python example using cachetools
from cachetools import TTLCache
import socket

dns_cache = TTLCache(maxsize=100, ttl=300)  # 5 minute TTL

def cached_dns_lookup(hostname):
    if hostname not in dns_cache:
        dns_cache[hostname] = socket.gethostbyname(hostname)
    return dns_cache[hostname]

Set up proactive monitoring with these commands:

# Continuous monitoring with watch (the count includes ss's header line)
watch -n 5 "ss -x | wc -l"

# Historical tracking (add to cron)
echo "$(date +%s),$(ss -x | wc -l)" >> /var/log/socket_count.log
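Once the cron job has collected a few hours of data, the `timestamp,count` lines can be reduced to a single growth figure. The sketch below assumes the log format produced by the echo command above (epoch seconds, a comma, then the count); `parse_socket_log` and `growth_per_hour` are hypothetical helper names.

```python
def parse_socket_log(text: str) -> list:
    """Parse 'epoch_seconds,count' lines into (timestamp, count) tuples."""
    samples = []
    for line in text.splitlines():
        try:
            ts, count = line.strip().split(",")
            samples.append((int(ts), int(count)))
        except ValueError:
            continue  # skip blank or malformed lines
    return samples


def growth_per_hour(samples: list) -> float:
    """Average change in socket count per hour between first and last sample."""
    if len(samples) < 2:
        return 0.0
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (c1 - c0) * 3600 / (t1 - t0)


if __name__ == "__main__":
    try:
        with open("/var/log/socket_count.log") as fh:
            print(growth_per_hour(parse_socket_log(fh.read())))
    except OSError:
        print("log file not found yet")
```

A figure near zero suggests normal churn, while a consistently positive one points to a leak worth tracing.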

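For persistent issues, consider raising the system-wide file handle limit. Check the current value first: on many modern systems the default is already far higher than 16384, in which case the commands below would lower it. On a virtualized container, the numothersock limit itself is typically set by the hosting provider and is not changed by this setting: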

# Check current limits
cat /proc/sys/fs/file-max

# Temporary increase (surge situations)
sudo sysctl -w fs.file-max=16384

# Permanent change
echo "fs.file-max = 16384" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
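On OpenVZ/Virtuozzo containers, the numothersock usage and limit are normally exposed in /proc/user_beancounters. That platform detail is an assumption here, since the text above does not say which virtualization is in use. Under that assumption, this sketch pulls the held, maxheld, barrier, limit and failcnt columns for the resource; `read_beancounter` is an illustrative name and the sample values are invented.

```python
def read_beancounter(text: str, resource: str = "numothersock"):
    """Return the counters for one resource, or None if it is not listed."""
    for line in text.splitlines():
        fields = line.split()
        if resource in fields:
            idx = fields.index(resource)
            values = fields[idx + 1: idx + 6]
            if len(values) == 5:
                keys = ("held", "maxheld", "barrier", "limit", "failcnt")
                return dict(zip(keys, map(int, values)))
    return None


# Invented sample in the user_beancounters column layout.
sample = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        1234567    2345678   14372700   14790164          0
            numothersock        310        358        360        360         12
"""

info = read_beancounter(sample)
print(info)
```

A non-zero failcnt means socket creation has already been refused at least once, which is usually what triggers the alert. Reading the real file generally requires root.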

When dealing with Linux server resource issues, socket exhaustion is a common pain point that often shows up as cryptic monitoring alerts. The numothersock metric specifically tracks non-TCP sockets, including UNIX domain sockets used for inter-process communication and UDP sockets used for DNS resolution.

Here are the most effective commands to identify socket consumers:

# Show all open UNIX domain sockets with process info
ss -xap | grep -v "Netid" | sort -k5

# Display UDP socket count by process (most UDP sockets are UNCONN, so do not filter on ESTAB)
ss -uap | grep -o 'users:(("[^"]*"' | cut -d'"' -f2 | sort | uniq -c | sort -n

# Real-time socket monitoring (requires sysdig)
sudo sysdig -pc -c topconns

These patterns often indicate specific applications:

  • PHP-FPM: Numerous /var/run/php-fpm.sock connections
  • Docker: /var/run/docker.sock with high connection count
  • DNS: Port 53 UDP sockets from systemd-resolved or named
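For the UNIX-socket patterns in particular, grouping the kernel's socket table by path makes a dominant socket file stand out. The sketch below assumes the usual /proc/net/unix layout, in which the path is the optional eighth column; `unix_socket_paths` is a hypothetical helper and the sample rows are invented.

```python
from collections import Counter


def unix_socket_paths(proc_net_unix_text: str) -> Counter:
    """Count UNIX domain sockets per bound path from /proc/net/unix text."""
    counts = Counter()
    for line in proc_net_unix_text.splitlines()[1:]:  # skip header row
        if not line.strip():
            continue
        # Num RefCount Protocol Flags Type St Inode [Path]
        fields = line.split(None, 7)
        path = fields[7].strip() if len(fields) > 7 else "(unnamed)"
        counts[path] += 1
    return counts


# Invented rows in the /proc/net/unix format.
sample = """\
Num       RefCount Protocol Flags    Type St Inode Path
0000000000000000: 00000002 00000000 00010000 0001 01 20001 /var/run/php-fpm.sock
0000000000000000: 00000003 00000000 00000000 0001 03 20002 /var/run/php-fpm.sock
0000000000000000: 00000003 00000000 00000000 0001 03 20003
0000000000000000: 00000002 00000000 00010000 0001 01 20004 /var/run/docker.sock
"""

for path, n in unix_socket_paths(sample).most_common():
    print(n, path)
```

On a live system, pass in the contents of /proc/net/unix. Client ends of connections generally show up without a path, so the named entries mostly reflect listeners and the server side of accepted connections.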

Let's trace socket creation in real-time:

# Install required tools
sudo apt install sysdig

# Capture connect and accept events as they happen
sudo sysdig -p "%proc.name %fd.name" "evt.type=connect or evt.type=accept"

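For persistent issues, these sysctl parameters can relieve related pressure. Note that they tune socket buffer memory and TCP connection teardown; they do not reduce the non-TCP socket count itself: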

# Increase socket buffer sizes
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216

# Adjust socket recycling behavior
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
sudo sysctl -w net.ipv4.tcp_fin_timeout=30

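Create a script like this and run it from cron to log socket usage:

#!/bin/bash
DATE=$(date +%Y-%m-%d_%H-%M-%S)
LOG="/var/log/socket-count-$DATE.log"
# Total UNIX domain sockets (the count includes the ss header line)
ss -x | wc -l > "$LOG"
# Per-command socket counts; NR>1 skips lsof's header row
lsof -U | awk 'NR>1 {print $1}' | sort | uniq -c >> "$LOG"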

For deep inspection using modern kernel capabilities:

# Trace socket creation system calls
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_socket { printf("%s: %d\n", comm, args->type); }'
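The bpftrace one-liner prints the raw `type` argument of socket(), which is a number and may carry flag bits such as SOCK_CLOEXEC or SOCK_NONBLOCK on top of the socket type. A small helper can turn those numbers back into names. This is a sketch assuming the Linux convention that the type lives in the low four bits; `decode_socket_type` is an illustrative name.

```python
import socket

SOCK_TYPE_MASK = 0xF  # assumed Linux layout: low four bits hold the type


def decode_socket_type(raw_type: int) -> str:
    """Map the raw socket() type argument to a name like SOCK_DGRAM."""
    base = raw_type & SOCK_TYPE_MASK  # strip SOCK_CLOEXEC / SOCK_NONBLOCK flags
    try:
        return socket.SocketKind(base).name
    except ValueError:
        return f"UNKNOWN({base})"


for raw in (1, 2, 524290):
    print(raw, decode_socket_type(raw))
```

Lines that decode to SOCK_DGRAM or SOCK_RAW, or that come from processes known to use AF_UNIX, are the ones that count toward numothersock; SOCK_STREAM alone does not tell TCP and UNIX stream sockets apart, so the address family would need to be printed as well.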