When dealing with storage issues on Linux systems, identifying large files and directories is a common troubleshooting step. This article provides practical methods to recursively scan your filesystem and pinpoint space-consuming items.
The du (disk usage) command is the most straightforward tool for this task. Here's a powerful combination:
du -ahx --max-depth=1 /path/to/directory | sort -rh | head -n 20
This command:
- Shows human-readable sizes (-h)
- Includes files (-a)
- Prevents crossing filesystem boundaries (-x)
- Limits to one directory level (--max-depth=1)
- Sorts by size in reverse order
- Displays top 20 results
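For repeated use, the pipeline above can be wrapped in a small function (a sketch; the name top_usage and the default of 20 results are my own choices):

```shell
#!/bin/bash
# top_usage DIR [N] - print the N largest entries directly under DIR.
# Wraps: du -ahx --max-depth=1 | sort -rh | head
top_usage() {
    local dir=${1:?usage: top_usage DIR [N]}
    local n=${2:-20}
    du -ahx --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$n"
}

top_usage /var/log 10
```

The first output line is the directory's own total; the lines after it are its largest children.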
For interactive exploration, install ncdu (NCurses Disk Usage):
sudo apt install ncdu # Debian/Ubuntu
ncdu /path/to/scan
Key features:
- Interactive navigation with arrow keys
- Percentage-based visualization
- Option to delete files directly
- Fast scanning with progress indicator
To specifically target large files (e.g., >100MB):
find /path/to/search -type f -size +100M -exec ls -lh {} + | \
awk '{ print $5 ": " $9 }' | sort -hr
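Parsing ls output in awk is fragile: column positions shift with unusual filenames and locales. If your find is GNU find, -printf can emit size and path directly (a sketch, not the article's original command):

```shell
# Large files without parsing ls: GNU find prints "<bytes> <path>" itself.
find /path/to/search -type f -size +100M -printf '%s %p\n' \
    | sort -nr \
    | awk '{ size=$1; $1=""; printf "%.1fMB %s\n", size/1024/1024, substr($0,2) }'
```

Clearing $1 before printing the rest of the record keeps paths that contain spaces intact.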
Here's a bash script that provides a hierarchical view:
#!/bin/bash
# Print the largest entries under a path, indented by directory depth.
# Usage: ./du_tree.sh [max_depth] [top_n]
depth=${1:-3}
top=${2:-10}

du -ak /path/to/scan | sort -nr | \
awk -v depth="$depth" -v top="$top" '
{
    size = $1
    sub(/^[0-9]+[ \t]+/, "")            # keep the full path, even with spaces
    name = $0
    curr_depth = split(name, path, "/") - 1
    if (curr_depth <= depth && ++rank <= top) {
        indent = curr_depth * 4          # set indent before printing this entry
        printf "%" indent "s%.1fMB %s\n", "", size/1024, name
    }
}'
To find space consumed by particular file extensions:
find /var/log -name "*.log" -exec du -ch {} + | grep total$
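When find has more matches than fit on one command line, it runs du in batches and each batch prints its own "total" line, so the grep above can return several totals. Summing byte sizes in awk avoids that (a sketch assuming GNU find's -printf):

```shell
# Total size of all .log files, summed in a single pass (bytes -> MB).
find /var/log -name '*.log' -type f -printf '%s\n' \
    | awk '{ sum += $1 } END { printf "%.1f MB\n", sum/1024/1024 }'
```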
For a modern alternative written in Go, try gdu:
# Install
go install github.com/dundee/gdu/v4/cmd/gdu@latest
# Usage
gdu --show-disks /
When managing server storage or debugging disk space issues, identifying space-consuming files and directories is crucial. Modern systems often contain millions of files, making manual inspection impractical.
The most efficient approach combines GNU coreutils with sorting:
du -ah /path/to/directory | sort -rh | head -n 20
Breakdown:
- du: estimates file space usage
- -a: shows all files (not just directories)
- -h: human-readable format
- sort -rh: sorts human-readable numbers in reverse order
For finer control (find can additionally filter by file type, size, or modification time):
find /path -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 50
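For example, the scan can be restricted to recently changed files, which is often where runaway growth lives. A sketch; the 7-day window is illustrative:

```shell
# Largest files modified within the last 7 days.
find /path -type f -mtime -7 -exec du -h {} + 2>/dev/null \
    | sort -rh | head -n 50
```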
For users who prefer visual, interactive tools:
# NCurses-based
sudo apt install ncdu
ncdu /path/to/scan
# Graphical alternative
sudo apt install baobab
baobab
To examine directory structures at specific levels:
du -h --max-depth=3 / | sort -rh | head -n 15
For production environments:
# Real-time monitoring
inotifywait -m -r /path -e create,delete,modify,move
# Periodic reporting
#!/bin/bash
REPORT_FILE="/var/log/disk_usage_$(date +%Y%m%d).log"
du -ah / 2>/dev/null | sort -rh | head -100 > "$REPORT_FILE"
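Daily reports become more useful when compared against each other. A minimal sketch that diffs the two newest reports (the file naming follows the script above; the function name report_delta is my own):

```shell
#!/bin/bash
# report_delta DIR - compare the two newest disk_usage_*.log reports in DIR,
# printing lines that appear only in the latest one (new heavy paths).
report_delta() {
    local dir=${1:-/var/log}
    local latest prev
    latest=$(ls -1 "$dir"/disk_usage_*.log 2>/dev/null | sort | tail -n 1)
    prev=$(ls -1 "$dir"/disk_usage_*.log 2>/dev/null | sort | tail -n 2 | head -n 1)
    # Nothing to compare until at least two reports exist.
    [ -n "$latest" ] && [ "$latest" != "$prev" ] || return 0
    diff "$prev" "$latest" | sed -n 's/^> //p'
}
```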
For distributed systems:
# Parallel processing with GNU parallel (-X batches many files per du call)
find / -type f -print0 | parallel -0 -X du -h | sort -rh | head -n 50
A few operational cautions:
- I/O-intensive scans may impact production systems
- Consider running during off-peak hours
- For large filesystems, sample a subset first
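The sampling suggestion can be sketched as sizing a handful of random directories instead of walking the whole tree (shuf and xargs -d are GNU-specific assumptions):

```shell
# Estimate usage by sampling: size 5 random directories up to two levels deep.
find /path -mindepth 1 -maxdepth 2 -type d 2>/dev/null \
    | shuf -n 5 \
    | xargs -r -d '\n' du -sh 2>/dev/null \
    | sort -rh
```

If the sampled directories look uniformly small, a full scan is probably unnecessary; if one dominates, descend into it directly.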