Recently I encountered a puzzling situation on an Amazon Linux EC2 instance where `df -h` reported 6.5GB used while `du --max-depth=1 -h /` only accounted for 3.6GB. Disk usage was also climbing slowly but steadily (about 1KB per minute) without any obvious files consuming the space.
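For reference, here is roughly how I lined the two numbers up. The `-x` flag keeps `du` on the root filesystem so the comparison with `df` is apples to apples; treat this as a minimal sketch rather than a full audit:

```bash
# Filesystem-level view: what the kernel reports as allocated
df -h /

# File-level view: what du can reach by walking the directory tree
# -x stays on one filesystem, -s summarizes, -h is human-readable
sudo du -xsh /

# Per-directory breakdown to narrow down where the space should be
sudo du -x -h --max-depth=1 / | sort -h
```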
When facing such discrepancies, these are the areas I typically check:
- Deleted but still open files: the most likely cause; files removed while a process still holds them open and keeps writing to them
- Disk fragmentation (less common on modern filesystems)
- Filesystem journal issues
- Mount point issues or hidden partitions, including files shadowed underneath an active mount point (a quick check is sketched below)
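For that last item, one check worth keeping handy (it wasn't the culprit here) is bind-mounting the root filesystem at a temporary location, which exposes anything sitting underneath an active mount point:

```bash
# Bind-mount / at a temporary directory; files hidden under mount
# points such as /var or /tmp become visible through this view
sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs

# Anything large here that you can't see under / is being shadowed by a mount
sudo du -sh /mnt/rootfs/* | sort -h

# Clean up when done
sudo umount /mnt/rootfs
```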
Running `lsof | grep deleted` revealed the problem:

```
httpd   1492  root  4w  REG  202,1  2147483648  5234  /var/log/httpd/access.log (deleted)
httpd   1493  root  4w  REG  202,1  2147483648  5234  /var/log/httpd/access.log (deleted)
```
Multiple Apache processes were holding open handles to deleted log files, continuing to write data that wouldn't show up in directory listings.
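To gauge how much space these phantom files are actually holding, you can total the size column of the `lsof` output. Column positions can vary slightly between `lsof` versions, so treat this as a rough estimate:

```bash
# Sum the SIZE/OFF column (7th field) for open-but-deleted files
# and report the total in GB
sudo lsof +L1 2>/dev/null \
  | awk '$NF == "(deleted)" && $7 ~ /^[0-9]+$/ {total += $7}
         END {printf "%.2f GB held by deleted files\n", total / 1024 / 1024 / 1024}'
```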
Here are the steps I took to resolve the issue:
1. Identify and Restart Offending Processes
```bash
# Find processes with deleted files
sudo lsof +L1

# For Apache specifically
sudo lsof | grep '/var/log/httpd' | grep deleted

# Gracefully restart Apache
sudo service httpd graceful
```
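After the graceful restart, a quick re-check confirms the old handles are gone and the space has been released:

```bash
# Should return nothing once the deleted files have been released
sudo lsof +L1 | grep httpd

# df should drop back in line with du
df -h /
```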
2. Prevent Future Occurrences
Implement log rotation to properly handle log files:
```
# /etc/logrotate.d/httpd
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
```
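You can sanity-check the configuration with logrotate's debug mode before relying on it; the debug run only simulates and does not touch any files:

```bash
# Dry run: show what logrotate would do without rotating anything
sudo logrotate -d /etc/logrotate.d/httpd

# Force one real rotation to confirm the postrotate reload works
sudo logrotate -f /etc/logrotate.d/httpd
```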
3. Alternative: Empty the File Instead of Deleting
For critical log files, consider truncating rather than deleting. Truncation frees the blocks immediately while leaving the inode in place, so processes that have the file open never end up in the deleted-but-open state:
```bash
# Instead of rm:
> /var/log/largefile.log

# Or for multiple files:
find /var/log -type f -name "*.log" -exec truncate -s 0 {} \;
```
4. Monitor for Recurrence
Create a simple monitoring script to alert when this occurs:
```bash
#!/bin/bash
# Alert when the root filesystem is nearly full or when deleted-but-open
# files are detected
THRESHOLD=90   # percent of disk usage that triggers an alert

DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')
DELETED_FILES=$(sudo lsof +L1 2>/dev/null | grep -c '(deleted)')

if [ "$DISK_USAGE" -gt "$THRESHOLD" ] || [ "$DELETED_FILES" -gt 0 ]; then
    echo "WARNING: Disk issues detected" | mail -s "Disk Alert" admin@example.com
fi
```
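To run the check periodically, a cron entry along these lines works; the script path is just an example:

```bash
# /etc/cron.d/disk-alert  (run the check every 15 minutes as root)
*/15 * * * * root /usr/local/bin/check-disk.sh
```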
To recap the general approach: when `df` and `du` disagree like this, these are the usual suspects and the quick checks for each:
```bash
# Check for deleted files still held by processes
sudo lsof | grep deleted

# Verify no hidden large files exist
sudo find / -type f -size +100M -exec ls -lh {} \;

# Check for filesystem errors (-n makes no changes; results can be
# noisy on a mounted filesystem, so take them with a grain of salt)
sudo fsck -n /dev/xvda1
```
In my case, the issue turned out to be log files that were deleted while still being written to by applications. Here's how to confirm this:
```bash
# Find processes holding deleted files by walking /proc
for pid in $(ps -e -o pid=); do
    ls -l /proc/"$pid"/fd 2>/dev/null | grep deleted | sed "s/^/PID $pid: /"
done

# Alternative method using lsof
sudo lsof +L1 | grep '(deleted)'
```
To properly clean up the space, you have several options:
```bash
# Option 1: Restart the holding process
systemctl restart your-service

# Option 2: Truncate the file through /proc
# First find the file descriptor
ls -l /proc/PID/fd/ | grep deleted
# Then truncate it
: > /proc/PID/fd/FD_NUMBER

# Option 3: Use logrotate to prevent future issues
cat << EOF > /etc/logrotate.d/yourapp
/var/log/yourapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
```
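Whichever option you choose, it is worth confirming afterwards that the two views of the disk converge again:

```bash
# These should now be close to each other (some gap is normal because of
# filesystem metadata and reserved blocks)
df -h /
sudo du -xsh /

# And no deleted-but-open files should remain
sudo lsof +L1
```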
To avoid similar issues in the future:
- Implement proper log rotation for all services
- Monitor disk space with tools like `ncdu` or `df`
- Set up alerts when disk usage exceeds thresholds
- Consider using separate partitions for logs and temp files (see the example below)
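As a sketch of that last point: on EC2 this usually means attaching an extra EBS volume and mounting it at /var/log. The device name below is illustrative and will differ on your instance, so check `lsblk` first:

```bash
# Format the extra volume (stop log-writing services and copy any
# existing logs onto it before mounting it over /var/log)
sudo mkfs -t ext4 /dev/xvdf

# Persist the mount across reboots, then mount everything in fstab
echo '/dev/xvdf /var/log ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
sudo mount -a
```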
For more comprehensive analysis, try these utilities:
```bash
# Install and use ncdu for interactive analysis
# (on Amazon Linux this may require the EPEL repository)
sudo yum install -y ncdu
sudo ncdu /

# Check for large directories
sudo du -h --max-depth=1 / | sort -h

# Verify inode usage
df -i
```