How to Identify and Clean Up Space-Hogging Files on a Linux Web Server: A Sysadmin’s Guide


When your Linux web server's disk space keeps mysteriously disappearing despite removing obvious culprits like core dumps and old backups, it's time for some advanced forensic analysis. Let me walk through my battle-tested approach.

Start with the classic disk usage command to confirm which partition is filling up:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        50G   47G  1.2G  98% /

This shows all mounted filesystems and their usage. Then drill down with:

du -sh /*

or, sorted largest-first:

sudo du -h --max-depth=1 / | sort -h -r

Both reveal which top-level directories consume the most space; with the second form, drill into suspicious directories by increasing --max-depth.

The heavy-duty option is ncdu (NCurses Disk Usage):

sudo apt install ncdu   # Debian/Ubuntu
sudo yum install ncdu   # CentOS/RHEL
ncdu /

This interactive tool lets you navigate directories with the arrow keys and sorts entries by size in descending order. Pro tip: run it with -x to avoid crossing filesystem boundaries, or with --exclude to skip mounted network shares:

ncdu -x /
sudo ncdu / --exclude /mnt

Common culprits include (a one-pass size check follows the list):

  • Log files: sudo du -sh /var/log
  • Temporary files: sudo du -sh /tmp
  • Docker containers: docker system df
  • LVM snapshots: lvdisplay
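
To size several of the usual suspects in one pass, a minimal sketch (extend the directory list to match your setup):

sudo du -sh /var/log /tmp /var/lib/docker 2>/dev/null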

To list large files modified within the last 30 days:

find / -type f -size +100M -mtime -30 -exec ls -lh {} \;
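
To surface the biggest offenders first, a variant that sorts by size (a sketch; -xdev keeps find on one filesystem, and the size threshold is adjustable):

sudo find / -xdev -type f -size +100M -exec du -h {} + 2>/dev/null | sort -rh | head -20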

Find and delete old log files:

find /var/log -type f -name "*.log" -mtime +30 -delete
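
Before running anything with -delete, it's worth previewing the matches by swapping -delete for -print:

find /var/log -type f -name "*.log" -mtime +30 -print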

Recently I found a server where df showed 90% usage but du only accounted for 50%. The culprit? Deleted files still held by running processes:

sudo lsof | grep deleted

The fix was to restart the affected services, which releases the stale file handles and returns the space.
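
If a service can't be restarted right away, the space can usually be reclaimed by truncating the deleted file through the holding process's file descriptor (a sketch; substitute the PID and FD number that lsof reports):

# e.g. PID 1234 holds a deleted file open on file descriptor 7
sudo truncate -s 0 /proc/1234/fd/7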

To stop the problem from recurring, set up ongoing maintenance with tools like the following (a minimal logrotate example appears after the list):

  • logrotate for log management
  • cron jobs to clean temp files
  • Filesystem quotas
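
A minimal logrotate sketch for a hypothetical application logging under /var/log/myapp/ (save as /etc/logrotate.d/myapp):

/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}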

Example /etc/cron.d entry that removes /tmp files not accessed in over a week, running every Sunday at 03:00:

0 3 * * 0 root find /tmp -type f -atime +7 -delete
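
On systemd distributions, tmpfiles.d can handle this natively instead of cron (a sketch; drop it in /etc/tmpfiles.d/):

# /etc/tmpfiles.d/tmp.conf: age out /tmp entries untouched for 7 days
d /tmp 1777 root root 7d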


Another often-overlooked space consumer is the systemd journal. Audit its size and trim it down to a cap with:

journalctl --disk-usage
sudo journalctl --vacuum-size=100M
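
To cap the journal permanently instead of vacuuming by hand, set a limit in /etc/systemd/journald.conf and restart the daemon:

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=200M

sudo systemctl restart systemd-journald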

Web server logs can silently consume gigabytes. Check with:


# For Apache
sudo du -sh /var/log/apache2
sudo ls -lah /var/log/apache2/*.log

# For Nginx
sudo du -sh /var/log/nginx
sudo ls -lah /var/log/nginx/*.log
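
If one of these has ballooned, truncate it in place rather than deleting it; the server keeps the file handle open, so a plain rm runs into the same deleted-file trap described above:

sudo truncate -s 0 /var/log/nginx/access.log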

MySQL/MariaDB can accumulate huge binary logs and temporary files:


# Check MySQL data directory
sudo du -sh /var/lib/mysql

# Check binary logs
sudo mysql -e "SHOW BINARY LOGS;"
sudo mysql -e "PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;"

For recurring issues, set up a cron job that alerts you when disk usage crosses a threshold:

#!/bin/bash
# Warn when root filesystem usage exceeds THRESHOLD percent
# (delivery requires a configured mail transfer agent)
THRESHOLD=80
CURRENT=$(df / --output=pcent | tail -1 | tr -d '% ')

if [ "$CURRENT" -gt "$THRESHOLD" ]; then
  echo "WARNING: Disk usage at ${CURRENT}%" | mail -s "Disk Alert" admin@example.com
fi
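
Save it to a path of your choosing (here /usr/local/bin/disk-alert.sh, a hypothetical location), make it executable, and schedule it, for example every 30 minutes via /etc/cron.d:

sudo chmod +x /usr/local/bin/disk-alert.sh
# /etc/cron.d/disk-alert
*/30 * * * * root /usr/local/bin/disk-alert.sh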