Debugging Systemd Memory Leaks: Analyzing 4GB RAM Consumption After Prolonged Uptime


While monitoring our CentOS 7 server with 16GB RAM, we observed systemd's memory footprint growing steadily at roughly 200MB/day, reaching nearly 4GB after 18 days of uptime. The following top snapshot shows the affected processes:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  1 root      20   0 3247784 2.920g   1800 S   3.0 18.9 287:41.35 systemd
737 root      20   0   27416   2524   1304 S   2.7  0.0 225:32.66 systemd-logind
548 root      20   0   82276  34652  34516 S   1.7  0.2 160:20.16 systemd-journal

The journalctl output revealed a telling pattern: rapid SSH session churn from rsync operations:

Feb 14 10:02:13 hostname systemd-logind[737]: New session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname systemd[1]: Started Session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname systemd-logind[737]: Removed session 6467482.
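
To gauge how heavy the churn really is, the session-creation events can be counted straight from the journal (a quick check, not part of the original investigation):

# Count logind "New session" events since yesterday (rough churn rate)
journalctl -u systemd-logind --since yesterday | grep -c "New session"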

To identify potential memory leaks, we ran these investigative commands:

# Summarize systemd's memory mappings by backing object (blank name = anonymous)
sudo cat /proc/1/maps | awk '{print $6}' | sort | uniq -c | sort -n

# Monitor cgroup memory usage, ordered by memory
systemd-cgtop -m

# Track systemd memory allocation under valgrind
# (not practical against the running PID 1; only useful when booting a test
# container or VM with systemd as its init under valgrind)
sudo valgrind --tool=memcheck --leak-check=full /usr/lib/systemd/systemd
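
If the leak is real, most of the growth should show up as anonymous (heap) memory rather than file-backed mappings. A minimal way to check that from /proc, assuming a kernel that exposes these smaps fields (the CentOS 7 kernel does):

# Sum anonymous and private-dirty memory for PID 1 (values reported in kB)
sudo awk '/^Anonymous:/ {a+=$2} /^Private_Dirty:/ {d+=$2} END {print "Anonymous:", a, "kB"; print "Private_Dirty:", d, "kB"}' /proc/1/smaps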

We implemented these adjustments to mitigate the issue:

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=100M
RuntimeMaxUse=50M

# /etc/systemd/system.conf
[Manager]
DefaultMemoryAccounting=yes
DefaultTasksAccounting=yes
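
These files are only read when the respective processes (re)start, so the changes must be applied explicitly. One way to do that without a reboot (daemon-reexec re-reads system.conf by re-executing PID 1, which also tends to release some of its accumulated memory):

# Pick up journald.conf changes
sudo systemctl restart systemd-journald
# Re-execute the manager so system.conf changes (accounting defaults) take effect
sudo systemctl daemon-reexec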

We created a watchdog script to track systemd memory growth:

#!/bin/bash
# Watchdog: log systemd (PID 1) memory usage and react when it crosses a threshold.

LOG_FILE="/var/log/systemd_memory.log"
THRESHOLD=1073741824 # 1GB

while true; do
    # RSS of PID 1 in kB, converted to bytes
    MEM_USAGE=$(ps -p 1 -o rss= | awk '{print $1*1024}')
    TIMESTAMP=$(date +"%Y-%m-%d %T")

    if [ "$MEM_USAGE" -gt "$THRESHOLD" ]; then
        echo "$TIMESTAMP - WARNING: systemd memory usage $MEM_USAGE bytes" >> "$LOG_FILE"
        # Restarting journald frees journal-related memory; reclaiming PID 1's own
        # heap requires re-executing the manager (systemctl daemon-reexec).
        systemctl restart systemd-journald
    else
        echo "$TIMESTAMP - OK: $MEM_USAGE bytes" >> "$LOG_FILE"
    fi

    sleep 3600
done
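
To keep the watchdog running across reboots, it can be wrapped in a small service unit; the unit name and script path below are hypothetical:

# /etc/systemd/system/systemd-mem-watchdog.service (hypothetical unit)
[Unit]
Description=Watch systemd (PID 1) memory usage

[Service]
ExecStart=/usr/local/bin/systemd_memory_watchdog.sh
Restart=always

[Install]
WantedBy=multi-user.target

# Install and start (after copying the script to the ExecStart path)
systemctl daemon-reload
systemctl enable systemd-mem-watchdog.service
systemctl start systemd-mem-watchdog.service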

After testing multiple versions, we found these patterns:

  • systemd v219 (CentOS 7 default) shows progressive leaks
  • v230+ includes memory management improvements
  • v245 introduced better session cleanup handling
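
To confirm which build a given host is actually running before planning an upgrade:

# Report the installed systemd version
systemctl --version | head -n 1
rpm -q systemd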

Beyond memory, the systemd, systemd-logind, and dbus-daemon processes were together consuming about 10.7% CPU on the quad-core host, in step with the session churn shown above.

While the CPU usage could be explained by the SSH activity, the memory growth persisted independently. To confirm the leak:

# Track memory usage over time
watch -n 3600 "ps -eo pid,user,%mem,command --sort=-%mem | head -n 5 >> memory_log.txt"
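
watch only keeps running while the terminal stays open, and its log lines carry no timestamps, so a detached variant of the same sampling is often more practical (a sketch; log path assumed):

# Sample PID 1 RSS (kB) hourly with a timestamp, detached from the terminal
nohup bash -c 'while true; do echo "$(date "+%F %T") $(ps -p 1 -o rss=) kB"; sleep 3600; done' >> /var/log/systemd_rss.log 2>&1 &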

Before implementing permanent solutions, these commands can help manage the issue:

# Reduce journal retention (temporary relief)
journalctl --vacuum-size=100M

# Restart affected services
systemctl restart systemd-journald
systemctl restart systemd-logind
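
To see how much space the vacuum actually reclaimed, check the journal's footprint before and after:

# Report current on-disk journal usage
journalctl --disk-usage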

After extensive testing, these configurations resolved the memory leak:

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=100M
RuntimeMaxUse=100M
MaxRetentionSec=1week

# /etc/systemd/system.conf
[Manager]
DefaultMemoryAccounting=yes
DefaultTasksAccounting=yes
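
Once the accounting defaults are active (after a daemon-reexec or reboot), per-unit memory figures become visible and can be spot-checked; sshd.service here is just an example unit:

# Confirm accounting took effect for a given unit
systemctl show -p MemoryAccounting sshd.service
# Live per-cgroup memory view
systemd-cgtop -m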

For systems with heavy session churn:

# /etc/systemd/logind.conf
[Login]
RemoveIPC=yes
KillUserProcesses=yes
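
Note that KillUserProcesses=yes also terminates background processes left behind at logout (including screen and tmux sessions), so weigh it against how the server is used. Applying the change is just a logind restart:

# Apply the logind.conf changes
sudo systemctl restart systemd-logind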

Create a monitoring script to track systemd memory usage:

#!/bin/bash
# systemd_mem_monitor.sh - log PID 1's memory share once an hour
while true; do
    DATE=$(date +%Y-%m-%d_%H:%M:%S)
    # %mem of PID 1; '=' suppresses the header line, tr strips padding
    MEM_USAGE=$(ps -p 1 -o %mem= | tr -d ' ')
    echo "$DATE - systemd memory usage: $MEM_USAGE%" >> /var/log/systemd_mem.log
    sleep 3600
done
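
Because the script appends to /var/log/systemd_mem.log indefinitely, a small logrotate drop-in keeps that log bounded (hypothetical file name):

# /etc/logrotate.d/systemd_mem (hypothetical drop-in)
/var/log/systemd_mem.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}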

For critical systems where restarting isn't an option, consider cgroup limits:

# Create a systemd slice drop-in capping user sessions (2G is an example value)
mkdir -p /etc/systemd/system/user.slice.d
cat > /etc/systemd/system/user.slice.d/50-MemoryLimit.conf <<EOF
[Slice]
MemoryLimit=2G
EOF
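
The drop-in is only picked up once the manager reloads its configuration; MemoryLimit= is the cgroup-v1 directive understood by CentOS 7's systemd (newer releases use MemoryMax=). To apply and verify:

# Load the new drop-in and confirm the limit is active on the slice
systemctl daemon-reload
systemctl show -p MemoryLimit user.slice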

After implementing these changes:

  • Monitor memory usage for at least one full business cycle
  • Consider upgrading to CentOS 8/9 or newer systemd versions if possible
  • Review all custom unit files for potential memory leaks