Recently I encountered a puzzling situation on an Amazon Linux EC2 instance: df -h reported 6.5GB used, while du --max-depth=1 -h / only accounted for 3.6GB. Disk usage was also creeping up slowly but steadily (about 1K per minute) with no obvious files consuming the space.
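A quick first step is to quantify the gap with standard tools (run du as root so permission errors don't hide anything):
# What the filesystem reports as used
df -h /
# What visible files account for, staying on this one filesystem
sudo du -sh -x /
# The difference is space held by something with no directory entry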
When facing such discrepancies, these are the areas I typically check:
- Deleted but still open files: the most likely cause; files removed from the directory tree while processes still hold them open and keep writing
- Disk fragmentation (less common on modern filesystems)
- Filesystem journal issues
- Mount point issues or hidden partitions
Running lsof | grep deleted revealed the problem:
httpd   1492 root  4w  REG 202,1 2147483648 5234 /var/log/httpd/access.log (deleted)
httpd   1493 root  4w  REG 202,1 2147483648 5234 /var/log/httpd/access.log (deleted)
Multiple Apache processes were holding open handles to deleted log files, continuing to write data that wouldn't show up in directory listings.
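The effect is easy to reproduce with a throwaway file (the path below is just an example):
# Open a descriptor, delete the file, keep writing to it
exec 3> /tmp/ghost.log       # the shell now holds fd 3 on the file
rm /tmp/ghost.log            # directory entry gone, inode still allocated
echo "still being written" >&3
ls -l /tmp/ghost.log         # No such file or directory
lsof +L1 | grep ghost.log    # ...yet the space is still in use
exec 3>&-                    # closing the descriptor finally frees it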
Here are the steps I took to resolve the issue:
1. Identify and Restart Offending Processes
# Find processes with deleted files
sudo lsof +L1
# For Apache specifically
sudo lsof | grep '/var/log/httpd' | grep deleted
# Gracefully restart Apache
sudo service httpd graceful
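It's worth confirming afterwards that the handles are gone and that df has fallen back in line with du:
# Should return nothing once Apache has released the old descriptors
sudo lsof +L1 | grep httpd
# df and du should now roughly agree
df -h / && sudo du -sh -x /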
2. Prevent Future Occurrences
Implement log rotation to properly handle log files:
# /etc/logrotate.d/httpd
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
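Before relying on the new config, a dry run shows what logrotate would do without touching any files:
# Debug mode: prints the planned actions without performing them
sudo logrotate -d /etc/logrotate.d/httpd
# Force one rotation to confirm the postrotate reload actually runs
sudo logrotate -f /etc/logrotate.d/httpd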
3. Alternative: Empty the File Instead of Deleting
For critical log files, consider truncating rather than deleting:
# Instead of rm:
> /var/log/largefile.log
# Or for multiple files:
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;
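Truncation frees the space immediately because the inode, and therefore any open file descriptor pointing at it, stays the same; only the length drops to zero. A quick way to see that, reusing the example file above:
# Same inode before and after, so open writers keep a valid handle
stat -c 'inode=%i size=%s' /var/log/largefile.log
> /var/log/largefile.log
stat -c 'inode=%i size=%s' /var/log/largefile.log
One caveat: a writer that isn't in append mode keeps its old offset and leaves a sparse gap at the start of the truncated file, which is harmless but can make the apparent size look large again.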
Create a simple monitoring script to alert when this occurs:
#!/bin/bash
THRESHOLD=80 # percent of disk usage
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')
# Skip lsof's header line so a clean system reports 0
DELETED_FILES=$(sudo lsof +L1 2>/dev/null | tail -n +2 | wc -l)
if [ "$DISK_USAGE" -gt "$THRESHOLD" ] || [ "$DELETED_FILES" -gt 0 ]; then
    echo "WARNING: disk usage at ${DISK_USAGE}% with ${DELETED_FILES} deleted-but-open files" | mail -s "Disk Alert" admin@example.com
fi
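To run the check periodically, drop it into cron; the script path below is only a placeholder for wherever you save it:
# /etc/cron.d/disk-alert (script location is hypothetical)
*/10 * * * * root /usr/local/bin/check-deleted-files.sh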
I recently encountered a puzzling situation on an Amazon Linux EC2 instance where df -h reported 6.5GB used while du -sh / only accounted for 3.6GB. The disk usage was steadily increasing by about 1KB per minute, yet I couldn't locate the missing files through normal means.
When facing disk space discrepancies, these are the usual suspects:
# Check for deleted files still held by processes
lsof | grep deleted
# Verify no hidden large files exist
find / -type f -size +100M -exec ls -lh {} \;
# Check for filesystem errors
fsck -n /dev/xvda1
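If none of those turn anything up, data hidden underneath a mount point is another classic cause; a bind mount lets you see what the underlying directory actually contains (using /mnt/rootfs here purely as a scratch mount point):
# Bind-mount the root filesystem elsewhere to look past anything mounted on top
sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs
sudo du -sh /mnt/rootfs/*
sudo umount /mnt/rootfs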
In my case, the issue turned out to be log files that were deleted while still being written to by applications. Here's how to confirm this:
# Find processes holding deleted files (prefix each hit with its PID)
for pid in $(ps -eo pid=); do
    ls -l /proc/"$pid"/fd 2>/dev/null | grep deleted | sed "s/^/PID $pid: /"
done
# Alternative method using lsof (+L1 lists open files with a link count below 1)
lsof +L1 | grep deleted
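To gauge how much space those handles are pinning, sum lsof's SIZE/OFF column (field positions can vary a little between lsof versions, so treat this as a rough estimate):
# Rough total of bytes held by deleted-but-open files
sudo lsof +L1 2>/dev/null | awk '/deleted/ {sum += $7} END {printf "%.1f MiB held by deleted files\n", sum/1024/1024}'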
To properly clean up the space, you have several options:
# Option 1: Restart the holding process
systemctl restart your-service
# Option 2: Truncate the file through proc
# First find the file descriptor
ls -l /proc/PID/fd/ | grep deleted
# Then truncate it
: > /proc/PID/fd/FD_NUMBER
# Option 3: Use logrotate to prevent future issues
cat << EOF > /etc/logrotate.d/yourapp
/var/log/yourapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
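copytruncate avoids having to signal the application at all, at the cost of a small window in which lines written between the copy and the truncate are lost; if that matters, prefer signalling the service to reopen its logs in a postrotate block instead. Either way, after the next rotation this should come back empty:
# No output means nothing is pinning deleted log files anymore
sudo lsof +L1 | grep /var/log/yourapp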
To avoid similar issues in the future:
- Implement proper log rotation for all services
- Monitor disk space with tools like ncdu or df
- Set up alerts when disk usage exceeds thresholds
- Consider using separate partitions for logs and temp files
For more comprehensive analysis, try these utilities:
# Install and use ncdu for interactive analysis
yum install -y ncdu
ncdu /
# Check for large directories
du -h --max-depth=1 / | sort -h
# Verify inode usage
df -i