Recently, while managing a Kubernetes worker node running Ubuntu 18.04, I encountered a puzzling situation: df reported 85% disk usage (1.5TB used), while du only accounted for about 313GB. Here's how I uncovered the truth behind this discrepancy.
Kubernetes began evicting pods due to disk pressure on the node, which prompted me to check disk usage:
df -h --total
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 1.8T 1.5T 276G 85% /
Yet when running:
sudo du -BG -s /*
313G /var
This apparent contradiction actually reveals important system behavior:
- df reports filesystem-level statistics, including space used by deleted files that are still held open
- du measures the actual size of files reachable under the directories it scans
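To convince yourself that this is what df is counting, you can reproduce the effect with a throwaway file (the /tmp/ghost.bin name and 1GB size below are arbitrary):
fallocate -l 1G /tmp/ghost.bin    # create a test file
exec 3</tmp/ghost.bin             # keep a file descriptor open in this shell
rm /tmp/ghost.bin                 # unlink the name; the inode stays allocated
df -h /                           # usage still includes the 1GB
sudo du -sh /tmp                  # the file no longer appears here
sudo lsof +L1 | grep ghost        # shows the deleted-but-open file
exec 3<&-                         # close the descriptor; df drops back down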
The real culprit emerged when checking for deleted but still referenced files:
sudo lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
dockerd 1234 root 10u REG 8,2 104857600 0 1234567 /var/lib/docker/overlay2/... (deleted)
This showed Docker containers holding onto large deleted files.
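To estimate how much space these deleted-but-open files pin down, you can sum the SIZE/OFF column of the lsof output (treat the result as a rough upper bound, since a file held by several descriptors is counted more than once):
sudo lsof +L1 2>/dev/null | awk 'NR>1 { sum += $7 } END { printf "%.1f GiB held by deleted files\n", sum/1024/1024/1024 }'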
In containerized environments, several scenarios can lead to this:
- Container logs growing unchecked
- Applications writing to container filesystems
- Build cache accumulating
- Volumes not being properly cleaned
Here are actionable commands to reclaim space:
Clean Docker system:
docker system prune -a --volumes
Find large files in /var/lib/docker:
sudo du -h /var/lib/docker | sort -h | tail -n 20
Check container log sizes:
sudo find /var/lib/docker/containers -name "*.log" -exec ls -lh {} \;
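If a single container's json-file log turns out to be the offender, truncate it rather than delete it, so the file descriptor Docker holds on it stays valid (the container ID below is a placeholder):
sudo truncate -s 0 /var/lib/docker/containers/<container-id>/<container-id>-json.log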
To avoid recurrence:
# Set Docker log rotation in /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
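Keep in mind that log-opts only apply to containers created after the change, and the Docker daemon has to be restarted to pick up the new daemon.json; on a busy node, drain it first if you can (the container ID below is a placeholder for verifying the setting):
sudo systemctl restart docker
docker inspect --format '{{.HostConfig.LogConfig}}' <container-id>    # check the log driver and options on a newly created container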
Regular maintenance script:
#!/bin/bash
# Weekly cleanup
docker system prune -f            # remove stopped containers, dangling images, unused networks
journalctl --vacuum-size=100M     # cap the systemd journal at ~100MB
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;   # empty plain-text logs without breaking open file handles
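One way to schedule it is a cron.d entry pointing at the script (the node-cleanup.sh name and the Sunday 3 a.m. slot are just examples):
sudo install -m 755 node-cleanup.sh /usr/local/bin/node-cleanup.sh
echo '0 3 * * 0 root /usr/local/bin/node-cleanup.sh' | sudo tee /etc/cron.d/node-cleanup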
For more comprehensive analysis:
sudo ncdu /
Or, to list individual files over 100MB on the root filesystem:
sudo find / -xdev -type f -size +100M -exec ls -lh {} \; 2>/dev/null
For K8s worker nodes, add these to your monitoring:
kubectl get nodes -o jsonpath='{.items[*].status.conditions[?(@.type=="DiskPressure")]}'
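The raw jsonpath output is hard to read across many nodes; a variant like the one below prints one node per line with its DiskPressure status (True means the node is under pressure):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'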
When your Kubernetes node starts evicting pods due to disk pressure but du can't account for all the used space, you're facing a classic Linux storage mystery. The root cause usually falls into a handful of categories, and a few quick diagnostics narrow it down:
# Quick diagnostic commands
df -h # Shows filesystem usage
du -sh /* # Summarizes directory sizes
lsof +L1 # Lists deleted but held-open files
docker system df # Checks Docker disk usage
In containerized environments, Docker/containerd frequently consumes space in ways du won't show:
# Check Docker disk usage
docker system df
docker ps -s # Shows container sizes
docker images # Shows image sizes
# Common cleanup commands
docker system prune -a --volumes
docker image prune --all --filter "until=24h"
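docker system df also reports build cache separately; if that line is large, it can be cleared on its own (docker builder prune needs a BuildKit-era Docker, roughly 18.09 or newer):
docker system df -v         # verbose breakdown, including build cache
docker builder prune -f     # drop the build cache; it is rebuilt on demand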
Processes holding references to deleted files create "ghost" storage usage:
# Find processes with deleted files
lsof | grep deleted
# Alternative approach using /proc
find /proc/*/fd -ls | grep '(deleted)'
# To free the space, either:
# 1. Restart the process holding the file, or
# 2. Truncate the open file descriptor in place (the process keeps running, but its file is emptied):
> /proc/[PID]/fd/[FD]
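When the holder is a containerized process, you usually don't want to bounce the whole node; mapping the PID back to its container lets you restart just that one (a sketch assuming the Docker runtime, with <PID> taken from the lsof output):
docker ps -q | xargs docker inspect --format '{{.State.Pid}} {{.Name}}' | grep <PID>
docker restart <container-name>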
For thorough analysis, try these methods:
# NCurses interactive disk usage analyzer
ncdu /
# Find large directories (adjust --max-depth as needed)
du -h --max-depth=2 / 2>/dev/null | sort -h | tail -n 20
# Check for mount points hiding space
mount | grep -v '^/dev'
lsblk
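Space can also hide underneath a mount point: files written to a directory before a filesystem was mounted over it are counted by df but invisible to du. A bind mount of / exposes them (using /mnt/rootfs as a temporary mount point here):
sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs    # view of / without any overlaid mounts
sudo du -sh /mnt/rootfs/*          # compare against du -sh /*
sudo umount /mnt/rootfs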
For K8s worker nodes, implement these preventative measures:
# Kubelet flags: eviction thresholds and automatic image garbage collection
--eviction-hard=memory.available<1Gi,nodefs.available<10%
--image-gc-high-threshold=85
--image-gc-low-threshold=80
# Example cron job for node cleanup
0 * * * * /usr/bin/docker system prune -f --filter "until=24h"