When working with large directory structures, finding files exceeding a certain size threshold becomes essential for disk space management. While the du
command doesn't have a direct filter option, we can combine it with other tools for effective results.
# Basic syntax to find files >100MB in current directory
du -ah | awk '$1 > 100000'
For more precise control, we can combine find
with du
:
# Find files >500MB and show their sizes
find /path/to/dir -type f -size +500M -exec du -h {} \;
To get a hierarchical view of directory sizes while filtering by threshold:
# Show directories >1GB with human-readable format
du -h --threshold=1G | sort -hr
Here's a real-world example to find large log files:
# Find log files >100MB in /var/log
find /var/log -name "*.log" -size +100M -exec du -h {} \; | sort -rh
For generating comprehensive reports of large files:
# Generate report of files >50MB sorted by size
du -ah / | awk '$1 > 50000' | sort -nr > large_files_report.txt
Consider using ncdu
for interactive exploration:
# Install and run ncdu
sudo apt install ncdu
ncdu /path/to/scan
The du
command doesn't directly support filtering by file size, but we can combine it with other commands to achieve this. Here's the most effective approach:
# Find files larger than 100MB in current directory and subdirectories
find . -type f -size +100M -exec du -h {} + | sort -rh
# Alternative using xargs (faster for many files)
find . -type f -size +100M -print0 | xargs -0 du -h | sort -rh
For visualizing directory sizes with variable depth, consider these alternatives:
# Basic recursive size listing
du -h --max-depth=1
# Tree-like output with sizes (requires tree command)
tree -h --du
# Graphical representation (requires ncdu)
ncdu
Combine filters for more precise results:
# Find .log files > 50MB modified in last 30 days
find /var/log -name "*.log" -size +50M -mtime -30 -exec du -h {} +
# Exclude specific directories
find /path -type f -size +1G -not -path "./exclude_dir/*" -exec du -h {} +
Customize your output for better readability:
# CSV output with full paths
find . -type f -size +500M -exec du -b {} + | awk -F'\t' '{print $2 "," $1}' > large_files.csv
# JSON output (requires jq)
find . -type f -size +10M -exec du -b {} + | awk '{print "{\"size\": "$1", \"path\": \""$2"\"}"}' | jq -s .