Investigating Disk Space Discrepancy: When df and du Show Different Results on Linux Servers with Docker


Recently, while managing a Kubernetes worker node running Ubuntu 18.04, I ran into a puzzling situation: df reported 85% disk usage (1.5TB used), yet du could account for only about 313GB. Here's how I uncovered the truth behind the discrepancy.

The first symptom was Kubernetes evicting pods due to node disk pressure, which prompted me to check disk usage:

df -h --total
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       1.8T  1.5T  276G  85% /

Yet when running:

sudo du -BG -s /*
313G    /var
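
To quantify the gap, you can compare the used-byte figure the filesystem reports against what du can actually attribute to files. This is a quick sketch using GNU coreutils options; both commands print raw bytes:

# Bytes the filesystem reports as used
df -B1 --output=used /

# Bytes du can attribute to visible files (-x stays on this filesystem)
sudo du -sx -B1 /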

This apparent contradiction reveals an important distinction:

  • df reports filesystem-level statistics, including blocks still allocated to deleted files that some process holds open
  • du walks the directory tree and sums only the files it can actually see (as the short demo after this list shows)
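
You can reproduce the effect in miniature: hold a deleted file open and watch df and du disagree. This is a throwaway demo; /tmp/demo.bin is an arbitrary name and it assumes /tmp lives on the root filesystem rather than tmpfs:

dd if=/dev/zero of=/tmp/demo.bin bs=1M count=512   # create a 512MB file
tail -f /tmp/demo.bin &                            # hold it open in the background
rm /tmp/demo.bin                                   # delete it while still open
df -h / && sudo du -sh /tmp                        # df still counts the 512MB, du no longer sees it
kill %1                                            # closing the handle finally frees the space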

The real culprit emerged when checking for deleted but still referenced files:

sudo lsof +L1
COMMAND    PID     USER   FD   TYPE DEVICE SIZE/OFF NLINK    NODE NAME
dockerd   1234     root   10u   REG   8,2 104857600     0 1234567 /var/lib/docker/overlay2/... (deleted)

This showed dockerd holding open large deleted files under /var/lib/docker/overlay2, which meant the filesystem could not reclaim their space.
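
To get a rough total of how much space these deleted-but-open files pin down, you can sum the SIZE/OFF column of the same lsof output. This is only an estimate (it skips the header and assumes the column holds byte sizes for regular files):

sudo lsof +L1 2>/dev/null | awk 'NR>1 {sum += $7} END {printf "%.1f GiB held open by deleted files\n", sum/1024^3}'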

In containerized environments, several scenarios can lead to this:

  1. Container logs growing unchecked
  2. Applications writing to container filesystems
  3. Build cache accumulating
  4. Volumes not being properly cleaned

Here are actionable commands to reclaim space:

Clean Docker system:

docker system prune -a --volumes

Find the largest directories under /var/lib/docker:

sudo du -h /var/lib/docker | sort -h | tail -n 20

Check container log sizes:

sudo find /var/lib/docker/containers -name "*.log" -exec ls -lh {} \;
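
If one container's log dominates, you can empty it in place without restarting the container; truncation keeps the open file handle valid. The container ID below is a placeholder:

sudo truncate -s 0 /var/lib/docker/containers/<container-id>/<container-id>-json.log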

To avoid recurrence:

# Set Docker log rotation in /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
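
After editing /etc/docker/daemon.json, restart the daemon to apply the change; note that the new log-opts only affect containers created afterwards. A quick way to verify (logtest and the nginx image are just examples):

sudo systemctl restart docker

# New containers pick up the rotation settings; verify on a throwaway container
docker run -d --name logtest nginx
docker inspect --format '{{json .HostConfig.LogConfig}}' logtest
docker rm -f logtest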

Regular maintenance script:

#!/bin/bash
# Weekly cleanup for Docker hosts
docker system prune -f                 # remove stopped containers, unused networks, dangling images
journalctl --vacuum-size=100M          # shrink archived systemd journals to ~100MB
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;   # empty plain-text logs without breaking open handles
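
One way to schedule it (an example layout, not a requirement) is to drop the script into /etc/cron.weekly, which Ubuntu runs automatically; cleanup.sh is whatever filename you saved the script as:

sudo install -m 755 cleanup.sh /etc/cron.weekly/docker-cleanup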

For more comprehensive analysis:

sudo ncdu /
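
ncdu isn't installed by default on Ubuntu, and the -x flag keeps it on a single filesystem so network mounts and pseudo-filesystems don't skew the picture:

sudo apt-get install -y ncdu
sudo ncdu -x /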

Or for large files:

sudo find / -type f -size +100M -exec ls -lh {} \;

For K8s worker nodes, add a DiskPressure check to your monitoring:

kubectl get nodes -o jsonpath='{.items[*].status.conditions[?(@.type=="DiskPressure")]}'
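
For a quicker human-readable view of the same condition, kubectl describe works too (the node name is a placeholder):

kubectl describe node <node-name> | grep -A 8 Conditions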

When your Kubernetes node starts evicting pods due to disk pressure but du can't account for all the used space, you're facing a classic Linux storage mystery. To recap, here is the condensed diagnostic and cleanup workflow:

# Quick diagnostic commands
df -h              # Shows filesystem usage
du -sh /*          # Summarizes directory sizes
lsof +L1           # Lists deleted but held-open files
docker system df   # Checks Docker disk usage

In containerized environments, Docker/containerd frequently consumes space in ways that are hard to attribute from du output alone:

# Check Docker disk usage
docker system df
docker ps -s       # Shows container sizes
docker images      # Shows image sizes

# Common cleanup commands
docker system prune -a --volumes
docker image prune --all --filter "until=24h"

Processes holding references to deleted files create "ghost" storage usage:

# Find processes with deleted files (root needed to see other users' processes)
sudo lsof | grep deleted

# Alternative approach using /proc
sudo find /proc/*/fd -ls 2>/dev/null | grep '(deleted)'

# To free space, either:
# 1. Restart the holding process, or
# 2. Truncate the file:
> /proc/[PID]/fd/[FD]
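
For example, applied to the dockerd entry shown earlier (PID 1234, file descriptor 10), the truncation would look like this; it's illustrative only, so substitute the PID and FD from your own lsof output. The sh -c wrapper is needed because the redirection itself must run as root:

sudo sh -c ': > /proc/1234/fd/10'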

For thorough analysis, try these methods:

# NCurses interactive disk usage analyzer
ncdu /

# Find large directories (adjust --max-depth as needed)
du -xh --max-depth=3 / 2>/dev/null | sort -h | tail -n 20

# Check for mount points hiding space
mount | grep -v '^/dev'
lsblk
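
If you suspect files were written to a directory before a filesystem was later mounted over it, a bind mount of / exposes what is hidden underneath. This is a sketch: /mnt/rootfs is an arbitrary mount point and <mount-point> stands for any path that mount or lsblk shows as a separate filesystem:

sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs          # bind mount of / without its submounts
sudo du -sh /mnt/rootfs/<mount-point>    # anything non-trivial here is hiding under the mount
sudo umount /mnt/rootfs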

For K8s worker nodes, implement these preventative measures:

# Set up automatic garbage collection in kubelet
--eviction-hard=memory.available<1Gi,nodefs.available<10%
--image-gc-high-threshold=85
--image-gc-low-threshold=80

# Example cron job for hourly node cleanup
0 * * * * /usr/bin/docker system prune -f --filter "until=24h"
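
On kubeadm-provisioned nodes these settings usually live in the kubelet configuration file rather than on the command line (the flag forms are deprecated in newer releases). Assuming the default kubeadm path, you can check what is currently set with something like:

sudo grep -iE 'evictionHard|imageGCHighThresholdPercent|imageGCLowThresholdPercent' /var/lib/kubelet/config.yaml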