When your root partition (/dev/sda1) suddenly reports 100% usage (220G/220G) but then magically recovers to 5% (9.3G/220G) without manual intervention, you're dealing with one of Linux's more puzzling storage mysteries. Let's break down the diagnostic approach.
When space disappears then reappears, start with these commands in sequence:
# 1. Check mounted filesystems
df -hT --exclude-type=tmpfs --exclude-type=devtmpfs
# 2. Find large directories (run as root)
sudo du -h --max-depth=1 / 2>/dev/null | sort -h -r
# 3. Check for deleted-but-open files
sudo lsof +L1 | grep deleted
# 4. Monitor changes in real-time
watch -n 5 "df -h /; echo; du -h --max-depth=1 / | sort -h -r"
Based on your output, these are the likely culprits (each can be verified with the commands below):
- Log file explosion: check /var/log and run journalctl --disk-usage
- Docker/container storage: verify with docker system df
- Temporary mounts: the /var/lib/ureadahead/debugfs entry mirroring root suggests a mounting artifact
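A quick sketch for checking each of these (the ureadahead path is taken from the output quoted above):
# How much space the systemd journal itself uses
journalctl --disk-usage
# Total size of /var/log (root needed to descend into every subdirectory)
sudo du -sh /var/log
# Docker's own accounting of images, containers and volumes (if installed)
command -v docker >/dev/null && docker system df
# Is the debugfs path a real separate mount? (no output means it is not currently a mount point)
findmnt /var/lib/ureadahead/debugfs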
Create this emergency cleanup script (disk_emergency.sh):
#!/bin/bash
# Rotate and compress large logs
sudo logrotate -f /etc/logrotate.conf
sudo journalctl --vacuum-size=200M
# Clear package manager cache
sudo apt-get clean
sudo apt-get autoclean
# Remove old kernels (Ubuntu)
sudo apt-get purge $(dpkg -l | awk '/^ii +linux-image-[0-9]/{print $2}' | grep -v "$(uname -r)")
# Docker cleanup
command -v docker && docker system prune -af
# Find and list large files
echo "Top 10 space consumers:"
sudo find / -type f -size +100M -exec ls -lh {} + 2>/dev/null | sort -k5 -h -r | head -10
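Make the script executable and run it when usage spikes (the filename is just the one chosen above):
chmod +x disk_emergency.sh
sudo ./disk_emergency.sh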
Set up inotify to track changes:
# Install inotify-tools
sudo apt install inotify-tools
# Monitor root directory changes
# Run this as root so watches can be set everywhere and /var/log is writable
inotifywait -m -r --format '%w%f %e' --exclude '^/proc|^/sys' / | while read file action
do
    if [[ "$action" == *"DELETE"* || "$action" == *"CREATE"* ]]; then
        echo "$(date) - $action - $file" >> /var/log/disk_changes.log
    fi
done
For ext4 filesystems (common on Ubuntu):
# Check for filesystem errors (read-only check; results are only indicative while the filesystem is mounted)
sudo fsck -nf /dev/sda1
# View reserved blocks (typically 5%)
sudo tune2fs -l /dev/sda1 | grep -i 'block count'
# Check for mounted snapshots
grep sda1 /proc/mounts
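Reserved blocks alone can explain part of a df/du mismatch; here is a rough calculation from the tune2fs output (a sketch that assumes /dev/sda1 is the right device):
# Reserved block count x block size = space set aside for root, in GiB
sudo tune2fs -l /dev/sda1 | awk '/^Reserved block count/{r=$4} /^Block size/{b=$3} END{printf "Reserved for root: %.1f GiB\n", r*b/1024/1024/1024}'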
The spontaneous recovery suggests one of the following (the checks after this list help confirm which):
- A crashed process holding file descriptors released them
- A temporary mount (possibly debugfs) was unmounted
- Log rotation or systemd journal cleanup executed
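To work out which of these actually happened, the journal usually holds the evidence (a sketch; unit names can differ between distributions):
# Did logrotate or a journal vacuum run around the time the space came back?
journalctl -u logrotate.service -u systemd-journald --since "2 days ago" | grep -Ei 'rotat|vacuum'
# Is anything still holding deleted files open right now?
sudo lsof +L1 | grep -i deleted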
Add to your /etc/crontab:
0 * * * *  root (date; df -h /) >> /var/log/disk_usage.log
30 * * * * root (date; du -h --max-depth=1 / 2>/dev/null | sort -h -r) >> /var/log/disk_usage.log
Configure systemd journal limits in /etc/systemd/journald.conf:
[Journal]
SystemMaxUse=500M
RuntimeMaxUse=100M
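Restart journald afterwards and confirm the journal has shrunk to the new cap:
sudo systemctl restart systemd-journald
journalctl --disk-usage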
When your Linux server suddenly reports 100% disk usage (220G/220G) and then mysteriously drops to 5% (9.3G/220G) without any file deletions, one of a handful of classic scenarios is at play. Start by watching the numbers continuously:
# Real-time monitoring command I now keep running:
watch -n 5 "df -h; echo; du -sh /* | sort -rh | head -n 10"
Based on your du output showing only 3.3G of usage versus df reporting 43G, these are the likely suspects:
# Investigate open deleted files (common with log rotation)
sudo lsof +L1 | grep -i deleted
# Check for mount point leaks
findmnt --verify
# Audit tmpfs usage
sudo du -sh /tmp /var/tmp /dev/shm
Your case shows classic signs of unflushed log files held by running processes after rotation. Try this forensic approach:
# Identify which services hold deleted logs
sudo ls -l /proc/*/fd | grep deleted
# Example output you might see:
#   /proc/1234/fd/4 -> /var/log/nginx/access.log.1 (deleted)
# Force log rotation and flush
sudo systemctl restart rsyslog
sudo journalctl --vacuum-size=100M
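If restarting the service is not an option, the space held by a deleted-but-open file can also be reclaimed directly through /proc. The PID and fd number below are the hypothetical values from the example output above, not real ones:
# Truncate the deleted file via its still-open descriptor (frees the space immediately)
sudo truncate -s 0 /proc/1234/fd/4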
Create a cron job with this monitoring script (/usr/local/bin/disk_guardian.sh):
#!/bin/bash
THRESHOLD=90
CURRENT=$(df --output=pcent / | tail -1 | tr -dc '0-9')
if [ "$CURRENT" -ge "$THRESHOLD" ]; then
logger "DiskGuardian: Cleaning triggered at ${CURRENT}% usage"
# Rotate and compress logs
logrotate -f /etc/logrotate.conf
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;
# Clear package manager cache
apt-get clean || yum clean all
# Clear tmp directories
find /tmp -type f -mtime +1 -delete
find /var/tmp -type f -mtime +1 -delete
# Optional: Restart services holding deleted logs
systemctl restart nginx php-fpm mysql
fi
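The script above is meant to be driven by cron; make it executable (sudo chmod +x /usr/local/bin/disk_guardian.sh) and add an entry like this to /etc/crontab (the 15-minute interval is my own arbitrary choice):
*/15 * * * * root /usr/local/bin/disk_guardian.sh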
When the issue recurs, bring out the heavier diagnostic tools:
# 1. ncdu (NCurses Disk Usage)
sudo apt install ncdu
ncdu -x /
# 2. Track file creations in real-time
sudo apt install inotify-tools
sudo inotifywait -m -r -e create --format '%w%f' /var | tee file_creations.log
# 3. Check for rogue containers/docker
docker system df
podman ps -a --size
Finally, add these to your /etc/sysctl.conf:
# Tune cache reclaim and swap behaviour (eases memory pressure; it does not free disk space by itself)
vm.vfs_cache_pressure = 50
vm.swappiness = 10
And cap tmpfs sizes in /etc/fstab so /tmp cannot grow unbounded:
tmpfs /tmp tmpfs size=512M,nr_inodes=10k,mode=1777 0 0
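To apply these without a reboot (a minimal sketch; the remount only works if /tmp is already mounted as tmpfs):
# Reload sysctl settings
sudo sysctl -p
# Remount /tmp with the new size cap
sudo mount -o remount,size=512M /tmp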