Debugging Unexplained Disk Space Usage on EC2: When df Shows Higher Usage Than du


Recently I encountered a puzzling situation on an Amazon Linux EC2 instance where df -h reported 6.5GB used while du --max-depth=1 -h / only accounted for 3.6GB. Disk usage was also climbing slowly but steadily (about 1KB per minute), with no obvious files consuming the space.

When facing such discrepancies, these are the areas I typically check:

  • Deleted but still open files: the most common cause; files removed while a process still holds them open and keeps writing
  • Disk fragmentation (less common on modern filesystems)
  • Filesystem journal issues
  • Mount point issues or hidden partitions
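
To quantify the gap before digging deeper, compare what the filesystem reports with what the files on it account for (the -x flag keeps du on the root filesystem so other mounts and pseudo-filesystems don't skew the numbers):

# What the filesystem thinks is used
df -h /

# What the files actually account for, staying on one filesystem
sudo du -shx /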

Running lsof | grep deleted revealed the problem:

httpd     1492   root    4w      REG  202,1 2147483648     5234 /var/log/httpd/access.log (deleted)
httpd     1493   root    4w      REG  202,1 2147483648     5234 /var/log/httpd/access.log (deleted)

Multiple Apache processes were holding open handles to deleted log files, continuing to write data that wouldn't show up in directory listings.
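
To gauge how much space these phantom files are holding, you can sum the SIZE column of lsof's output. Treat the result as an upper bound, since a file held open by several processes (as in the output above) is counted once per process:

# Approximate total size of deleted-but-still-open regular files
sudo lsof | awk '$5 == "REG" && $NF == "(deleted)" {sum += $7} END {printf "%.1f MB\n", sum/1024/1024}'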

Here are the steps I took to resolve the issue:

1. Identify and Restart Offending Processes

# Find open files with a link count of zero (deleted but still held by a process)
sudo lsof +L1

# For Apache specifically
sudo lsof | grep '/var/log/httpd' | grep deleted

# Gracefully restart Apache
sudo service httpd graceful
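
After the graceful restart it's worth confirming that the handles were actually released before moving on:

# Should return nothing for httpd once the old workers have exited
sudo lsof +L1 | grep httpd

# df should now agree (roughly) with du
df -h /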

2. Prevent Future Occurrences

Implement log rotation to properly handle log files:

# /etc/logrotate.d/httpd
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
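
You can verify the rotation configuration without waiting for the nightly cron run; logrotate's -d flag performs a dry run, and -f forces an immediate rotation:

# Dry run: show what logrotate would do without touching the files
sudo logrotate -d /etc/logrotate.d/httpd

# Force a rotation to confirm the postrotate reload works end to end
sudo logrotate -f /etc/logrotate.d/httpd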

3. Alternative: Empty the File Instead of Deleting

For critical log files, consider truncating rather than deleting. Truncation empties the existing inode in place, so any process that still has the file open keeps writing to the same (now empty) file and the space is genuinely released:

# Instead of rm:
> /var/log/largefile.log

# Or for multiple files:
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;

Create a simple monitoring script to alert when this occurs:

#!/bin/bash
# Alert when root filesystem usage crosses a threshold, or when
# deleted-but-open files are holding space
THRESHOLD=90 # percent of the root filesystem
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
DELETED_FILES=$(sudo lsof +L1 2>/dev/null | grep -c '(deleted)')

if [ "$DISK_USAGE" -gt "$THRESHOLD" ] || [ "$DELETED_FILES" -gt 0 ]; then
    echo "WARNING: Disk issues detected" | mail -s "Disk Alert" admin@example.com
fi
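
Dropped somewhere like /usr/local/bin (the path and interval here are just examples), a cron entry keeps the check running regularly:

# /etc/cron.d/disk-alert - run the check every 15 minutes as root
*/15 * * * * root /usr/local/bin/disk-alert.sh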

To recap, when facing disk space discrepancies like this, these are the usual suspects to check from the command line:

# Check for deleted files still held by processes
lsof | grep deleted

# Verify no hidden large files exist
find / -type f -size +100M -exec ls -lh {} \;

# Check for filesystem errors (-n is a read-only check, but results can be
# unreliable while the filesystem is mounted)
fsck -n /dev/xvda1

In my case, as described above, the culprit was log files that had been deleted while still being written to by applications. Here's how to confirm this:

# Find processes holding deleted files by walking /proc
for pid in $(ps -e -o pid=); do
    ls -l /proc/$pid/fd 2>/dev/null | grep deleted | sed "s|^|PID $pid: |"
done

# Alternative method using lsof (+L1 lists open files with fewer than one link)
lsof +L1 | grep '(deleted)'

To properly clean up the space, you have several options:

# Option 1: Restart the process holding the file
systemctl restart your-service    # or: service your-service restart on sysvinit systems

# Option 2: Truncate the file through proc
# First find the file descriptor
ls -l /proc/PID/fd/ | grep deleted
# Then truncate it
: > /proc/PID/fd/FD_NUMBER

# Option 3: Use logrotate to prevent future issues
cat << EOF > /etc/logrotate.d/yourapp
/var/log/yourapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
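
If a single process holds several deleted files, a small loop over its /proc entries truncates them all at once. A sketch, run as root, with the PID standing in for the process you identified earlier:

PID=1492
for fd in /proc/$PID/fd/*; do
    # The symlink target ends in "(deleted)" for removed files
    if ls -l "$fd" 2>/dev/null | grep -q deleted; then
        : > "$fd"
    fi
done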

To avoid similar issues in the future:

  • Implement proper log rotation for all services
  • Monitor disk space with tools like ncdu or df
  • Set up alerts when disk usage exceeds thresholds
  • Consider using separate partitions for logs and temp files
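
For the last point, a dedicated volume for /var/log keeps a runaway log from filling the root filesystem. A hypothetical /etc/fstab entry, assuming a separate EBS volume has already been formatted and attached as /dev/xvdf:

# /etc/fstab - dedicated log volume (device name is an assumption)
/dev/xvdf  /var/log  ext4  defaults,noatime  0  2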

For more comprehensive analysis, try these utilities:

# Install and use ncdu for interactive analysis
# (on Amazon Linux this typically comes from the EPEL repository)
sudo yum install -y ncdu
ncdu -x /   # -x stays on the current filesystem

# Check for large directories
du -h --max-depth=1 / | sort -h

# Verify inode usage (a full inode table can also report "No space left on device")
df -i