Recently I encountered a puzzling situation on an Amazon Linux EC2 instance where `df -h` reported 6.5GB used while `du --max-depth=1 -h /` only accounted for 3.6GB. Disk usage was also climbing slowly but steadily (about 1KB per minute) without any obvious files consuming the space.
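For reference, here is roughly how I lined the two numbers up. The `-x` flag keeps `du` on the root filesystem so the comparison with `df` is apples to apples; treat this as a minimal sketch rather than a full audit:

```bash
# Filesystem-level view: what the kernel reports as allocated
df -h /

# File-level view: what du can reach by walking the directory tree
# -x stays on one filesystem, -s summarizes, -h is human-readable
sudo du -xsh /

# Per-directory breakdown to narrow down where the space should be
sudo du -x -h --max-depth=1 / | sort -h
```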
When facing such discrepancies, these are the areas I typically check:
- Deleted but still open files: the most likely cause; files removed while a process still holds them open and keeps writing to them
- Disk fragmentation (less common on modern filesystems)
- Filesystem journal issues
- Mount point issues or hidden partitions, including files shadowed underneath an active mount point (a quick check is sketched below)
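For that last item, one check worth keeping handy (it wasn't the culprit here) is bind-mounting the root filesystem at a temporary location, which exposes anything sitting underneath an active mount point:

```bash
# Bind-mount / at a temporary directory; files hidden under mount
# points such as /var or /tmp become visible through this view
sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs

# Anything large here that you can't see under / is being shadowed by a mount
sudo du -sh /mnt/rootfs/* | sort -h

# Clean up when done
sudo umount /mnt/rootfs
```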
Running `lsof | grep deleted` revealed the problem:

```
httpd   1492  root  4w  REG  202,1  2147483648  5234  /var/log/httpd/access.log (deleted)
httpd   1493  root  4w  REG  202,1  2147483648  5234  /var/log/httpd/access.log (deleted)
```
Multiple Apache processes were holding open handles to deleted log files, continuing to write data that wouldn't show up in directory listings.
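To gauge how much space these phantom files are actually holding, you can total the size column of the `lsof` output. Column positions can vary slightly between `lsof` versions, so treat this as a rough estimate:

```bash
# Sum the SIZE/OFF column (7th field) for open-but-deleted files
# and report the total in GB
sudo lsof +L1 2>/dev/null \
  | awk '$NF == "(deleted)" && $7 ~ /^[0-9]+$/ {total += $7}
         END {printf "%.2f GB held by deleted files\n", total / 1024 / 1024 / 1024}'
```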
Here are the steps I took to resolve the issue:
1. Identify and Restart Offending Processes
```bash
# Find processes with deleted files
sudo lsof +L1

# For Apache specifically
sudo lsof | grep '/var/log/httpd' | grep deleted

# Gracefully restart Apache
sudo service httpd graceful
```
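After the graceful restart, a quick re-check confirms the old handles are gone and the space has been released:

```bash
# Should return nothing once the deleted files have been released
sudo lsof +L1 | grep httpd

# df should drop back in line with du
df -h /
```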
2. Prevent Future Occurrences
Implement log rotation to properly handle log files:
```
# /etc/logrotate.d/httpd
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
```
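You can sanity-check the configuration with logrotate's debug mode before relying on it; the debug run only simulates and does not touch any files:

```bash
# Dry run: show what logrotate would do without rotating anything
sudo logrotate -d /etc/logrotate.d/httpd

# Force one real rotation to confirm the postrotate reload works
sudo logrotate -f /etc/logrotate.d/httpd
```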
3. Alternative: Empty the File Instead of Deleting
For critical log files, consider truncating rather than deleting. Truncation frees the blocks immediately while leaving the inode in place, so processes that have the file open never end up in the deleted-but-open state:
```bash
# Instead of rm:
> /var/log/largefile.log

# Or for multiple files:
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;
```
4. Monitor for Recurrence
Create a simple monitoring script to alert when this occurs:
```bash
#!/bin/bash
# Alert when the root filesystem is nearly full or when deleted-but-open
# files are detected
THRESHOLD=90   # percent of disk usage that triggers an alert

DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')
DELETED_FILES=$(sudo lsof +L1 2>/dev/null | grep -c '(deleted)')

if [ "$DISK_USAGE" -gt "$THRESHOLD" ] || [ "$DELETED_FILES" -gt 0 ]; then
    echo "WARNING: Disk issues detected" | mail -s "Disk Alert" admin@example.com
fi
```
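To run the check periodically, a cron entry along these lines works; the script path is just an example:

```bash
# /etc/cron.d/disk-alert  (run the check every 15 minutes as root)
*/15 * * * * root /usr/local/bin/check-disk.sh
```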
To recap the general approach: when `df` and `du` disagree like this, these are the usual suspects and the quick checks for each:
```bash
# Check for deleted files still held by processes
sudo lsof | grep deleted

# Verify no hidden large files exist
sudo find / -type f -size +100M -exec ls -lh {} \;

# Check for filesystem errors (-n makes no changes; results can be
# noisy on a mounted filesystem, so take them with a grain of salt)
sudo fsck -n /dev/xvda1
```
In my case, the issue turned out to be log files that were deleted while still being written to by applications. Here's how to confirm this:
```bash
# Find processes holding deleted files by walking /proc
for pid in $(ps -e -o pid=); do
    ls -l /proc/"$pid"/fd 2>/dev/null | grep deleted | sed "s/^/PID $pid: /"
done

# Alternative method using lsof
sudo lsof +L1 | grep '(deleted)'
```
To properly clean up the space, you have several options:
```bash
# Option 1: Restart the holding process
systemctl restart your-service

# Option 2: Truncate the file through /proc
# First find the file descriptor
ls -l /proc/PID/fd/ | grep deleted
# Then truncate it
: > /proc/PID/fd/FD_NUMBER

# Option 3: Use logrotate to prevent future issues
cat << EOF > /etc/logrotate.d/yourapp
/var/log/yourapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
```
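Whichever option you choose, it is worth confirming afterwards that the two views of the disk converge again:

```bash
# These should now be close to each other (some gap is normal because of
# filesystem metadata and reserved blocks)
df -h /
sudo du -xsh /

# And no deleted-but-open files should remain
sudo lsof +L1
```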
To avoid similar issues in the future:
- Implement proper log rotation for all services
- Monitor disk space with tools like `ncdu` or `df`
- Set up alerts when disk usage exceeds thresholds
- Consider using separate partitions for logs and temp files (see the example below)
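As a sketch of that last point: on EC2 this usually means attaching an extra EBS volume and mounting it at /var/log. The device name below is illustrative and will differ on your instance, so check `lsblk` first:

```bash
# Format the extra volume (stop log-writing services and copy any
# existing logs onto it before mounting it over /var/log)
sudo mkfs -t ext4 /dev/xvdf

# Persist the mount across reboots, then mount everything in fstab
echo '/dev/xvdf /var/log ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
sudo mount -a
```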
For more comprehensive analysis, try these utilities:
```bash
# Install and use ncdu for interactive analysis
# (on Amazon Linux this may require the EPEL repository)
sudo yum install -y ncdu
sudo ncdu /

# Check for large directories
sudo du -h --max-depth=1 / | sort -h

# Verify inode usage
df -i
```