When you encounter the "memcached dead but subsys locked" message on CentOS, it indicates a specific service management state where:
- The process appears dead from the service manager's perspective (for example, to the service command)
- The system still holds a lock file for the service, which can block a clean restart
- The actual process might still be running (as shown in your ps output)
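The decision behind that message can be sketched roughly as follows. This is a simplified illustration, not the actual status function shipped with CentOS; the function name subsys_state is hypothetical, and the liveness test with kill -0 only works for processes you are allowed to signal, so run it as root for a daemon owned by another user.

```shell
#!/bin/sh
# Simplified sketch of how a SysV-style status check classifies a service.
# $1 = PID file to inspect, $2 = lock file to inspect (both paths are caller-supplied).
subsys_state() {
    pidfile=$1
    lockfile=$2
    pid=""
    # Read the recorded PID, if the PID file exists
    [ -f "$pidfile" ] && pid=$(cat "$pidfile" 2>/dev/null)
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    elif [ -f "$pidfile" ]; then
        echo "dead but pid file exists"
    elif [ -f "$lockfile" ]; then
        echo "dead but subsys locked"
    else
        echo "stopped"
    fi
}
```

Called as subsys_state /var/run/memcached.pid /var/lock/subsys/memcached, it reports "dead but subsys locked" whenever the lock file is present but no usable PID file is, even if a memcached process started without a PID file is still alive.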
Your ps output shows memcached is actually running under user nobody:
nobody 21983 0.0 1.8 60272 19912 ? Ssl 16:46 0:00 memcached -d -p 11211 -u nobody -c 1024 -m 64
The port appears properly bound, as shown in netstat:
tcp 0 0 :::11211 :::* LISTEN
udp 0 0 0.0.0.0:11211 0.0.0.0:*
This typically occurs when:
- PID file exists but process isn't properly registered in service manager
- Improper shutdown left lock files in /var/lock/subsys/
- Multiple instances attempting to bind to same port
1. Clean up existing locks:
sudo rm -f /var/lock/subsys/memcached
sudo rm -f /var/run/memcached.pid
2. Verify no conflicting processes:
sudo lsof -i :11211
ps aux | grep memcached
3. Force-clean the service state:
sudo service memcached stop
sudo pkill -9 memcached
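As a more cautious variant of the cleanup in step 1, the removal can be made conditional on the recorded process being gone. The sketch below is an illustration only; clean_stale_state is a hypothetical helper name, and it consults nothing but the PID file, so a memcached started without a PID file still has to be found with lsof or ps as in step 2.

```shell
#!/bin/sh
# Remove the PID and lock files only when no live process owns the recorded PID.
# $1 = PID file, $2 = lock file.
# Returns 0 if the files were removed (or were already absent), 1 if the process is alive.
clean_stale_state() {
    pidfile=$1
    lockfile=$2
    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile" 2>/dev/null)
        # kill -0 only tests for existence; it needs permission to signal the PID
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "PID $pid is still alive; leaving $pidfile and $lockfile in place" >&2
            return 1
        fi
    fi
    rm -f "$pidfile" "$lockfile"
    return 0
}
```

Run as root, for example: clean_stale_state /var/run/memcached.pid /var/lock/subsys/memcached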
Add proper PID file handling in /etc/sysconfig/memcached:
PORT="11211"
USER="nobody"
MAXCONN="1024"
CACHESIZE="64"
OPTIONS="-l 127.0.0.1 -P /var/run/memcached.pid"
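To see what command line these settings produce, the file can be sourced like any other set of shell assignments. The helper below is a sketch under that assumption; memcached_cmdline is a hypothetical name, the fallback values are illustrative rather than package defaults, and the flag order is one plausible arrangement, not necessarily the one your init script uses.

```shell
#!/bin/sh
# Print the memcached command line implied by a sysconfig-style file.
# $1 = path to the config file (plain VAR="value" shell assignments).
memcached_cmdline() {
    conf=$1
    (
        # Illustrative fallbacks, used only when the file does not set a value
        PORT=11211
        USER=nobody
        MAXCONN=1024
        CACHESIZE=64
        OPTIONS=""
        # Source the file inside a subshell so the caller's variables are untouched
        [ -r "$conf" ] && . "$conf"
        echo "memcached -d -p $PORT -u $USER -m $CACHESIZE -c $MAXCONN${OPTIONS:+ $OPTIONS}"
    )
}
```

Running memcached_cmdline /etc/sysconfig/memcached before a restart makes it easy to confirm that the -P option and its path actually reach the daemon.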
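Create a proper init script at /etc/init.d/memcached:
#!/bin/sh
#
# chkconfig: - 55 45
# description: memcached

# Load the init helpers that provide daemon, killproc, and status
. /etc/rc.d/init.d/functions

PIDFILE=/var/run/memcached.pid
LOCKFILE=/var/lock/subsys/memcached

start() {
    [ -f "$LOCKFILE" ] && return 0
    # -P makes memcached write the PID file itself; the path must be
    # writable by the user given to -u
    daemon --pidfile "$PIDFILE" /usr/bin/memcached -d -p 11211 -u nobody -c 1024 -m 64 -P "$PIDFILE"
    retval=$?
    [ $retval -eq 0 ] && touch "$LOCKFILE"
    return $retval
}

stop() {
    # With no PID file there is nothing to kill, but clear any stale lock
    [ ! -f "$PIDFILE" ] && { rm -f "$LOCKFILE"; return 0; }
    killproc -p "$PIDFILE" /usr/bin/memcached
    retval=$?
    [ $retval -eq 0 ] && rm -f "$LOCKFILE"
    return $retval
}

# Dispatch on the action passed by the service command
case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; start ;;
    status)  status -p "$PIDFILE" memcached ;;
    *)       echo "Usage: $0 {start|stop|restart|status}"; exit 2 ;;
esac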
After implementing fixes, verify with:
sudo service memcached restart
sudo tail -n 50 /var/log/memcached.log   # present only if logging to this file is configured
sudo service memcached status
The "dead but subsys locked" message typically appears when the init script's status check finds no running memcached process matching its PID file, yet the lock file /var/lock/subsys/memcached still exists. This state often indicates that the process terminated unexpectedly, or was started or stopped outside the service framework, so the lock file was never removed.
From your system observations:
# Network status:
tcp 0 0 :::11211 :::* LISTEN
udp 0 0 0.0.0.0:11211 0.0.0.0:*
# Process status:
nobody 21983 0.0 1.8 60272 19912 ? Ssl 16:46 0:00 memcached -d -p 11211 -u nobody -c 1024 -m 64
Several scenarios can lead to this state:
- Improper service shutdown
- Resource leaks preventing cleanup
- PID file remaining after process termination
- Incorrect SELinux contexts
First, check for stale PID files:
ls -l /var/run/memcached/
cat /var/run/memcached/memcached.pid
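Verify the service status. systemctl and journalctl exist only on systemd-based releases (CentOS 7 and later); on CentOS 6, use service memcached status instead: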
systemctl status memcached
journalctl -u memcached -n 50
Method 1: Clean Restart
systemctl stop memcached
pkill -9 memcached
rm -f /var/run/memcached/memcached.pid
systemctl start memcached
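Before resorting to pkill -9, it is usually worth giving the process a chance to exit on SIGTERM. The function below is a minimal sketch of that escalation; stop_gracefully is a hypothetical helper name and the 5-second default is an arbitrary choice.

```shell
#!/bin/sh
# Ask a process to exit with SIGTERM, then fall back to SIGKILL.
# $1 = PID, $2 = seconds to wait before escalating (default 5).
stop_gracefully() {
    pid=$1
    timeout=${2:-5}
    # If the signal cannot be delivered, the process is gone or not ours to stop
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 fails once the process no longer exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Still present after the grace period: force termination
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

For example, stop_gracefully "$(cat /var/run/memcached/memcached.pid)" 10 waits up to ten seconds before sending SIGKILL.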
Method 2: Configuration Check
Review your memcached configuration (/etc/sysconfig/memcached):
PORT="11211"
USER="nobody"
MAXCONN="1024"
CACHESIZE="64"
OPTIONS=""
Method 3: SELinux Context Verification
ls -Z /usr/bin/memcached
restorecon -v /usr/bin/memcached
Consider implementing these best practices:
- Add proper logging to your memcached service:
OPTIONS="-vv >> /var/log/memcached.log 2>&1"
- Implement a monitoring script:
#!/bin/bash
if ! pgrep -x "memcached" > /dev/null
then
systemctl restart memcached
fi
For persistent issues, try running memcached in foreground debug mode:
memcached -vv -u nobody -p 11211
Check system resource limits:
ulimit -a
cat /proc/$(pgrep -o memcached)/limits
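Since every client connection consumes a file descriptor, the line of interest in that output is usually "Max open files". The sketch below pulls out the soft limit and compares it with the -c value; both function names are hypothetical, and requiring the limit to be strictly greater than the connection count is an assumption that leaves room for listening sockets and other descriptors.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# $1 = path to the limits file.
soft_nofile_limit() {
    # Columns: "Max" "open" "files" <soft> <hard> <units>
    awk '/^Max open files/ { print $4 }' "$1"
}

# Succeed if the soft limit leaves room for the requested connection count.
# $1 = path to the limits file, $2 = connection count (the -c / MAXCONN value).
limit_covers_maxconn() {
    limit=$(soft_nofile_limit "$1")
    maxconn=$2
    # "unlimited" is treated as sufficient
    [ "$limit" = "unlimited" ] && return 0
    [ -n "$limit" ] && [ "$limit" -gt "$maxconn" ]
}
```

For example: limit_covers_maxconn "/proc/$(pgrep -o memcached)/limits" 1024 || echo "open-file limit may be too low for -c 1024"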