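When dealing with massive directories containing thousands of Ruby files, a simple recursive search such as
grep -R --include="*.rb" "pattern" .
can become painfully slow, with no visibility into progress. Plain grep gives no indication of how much work remains or which files are being processed. (Writing grep -R "pattern" *.rb instead doesn't help: the shell expands *.rb to the Ruby files in the current directory only, so Ruby files in subdirectories are skipped.)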
The pv command (pipe viewer) is well suited to this scenario because it monitors data flowing through a pipeline. The trick is to structure the command so that pv has something meaningful to count: one item per file.
find . -name "*.rb" -print0 | pv -0 | xargs -0 grep "search_string"
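Here's what each part does:
- find . -name "*.rb" -print0: recursively finds all Ruby files and prints their names separated by null bytes
- pv -0: counts the null-terminated names as they pass (one per file) and shows the running count and rate; without -s it has no total, so it can't show a percentage or ETA. The count reflects names handed to xargs, which can run slightly ahead of grep.
- xargs -0 grep: splits its input on null bytes, so filenames containing spaces reach grep intact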
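To see why the -print0/-0 pairing matters, here is a small self-contained demo. The temporary directory and the file names (including "user model.rb") are hypothetical, created only to show the difference between null-terminated and whitespace-split filenames.

```shell
# Demo: null-terminated filenames survive spaces; whitespace-split ones don't.
# The directory layout and file names are hypothetical, created just for this demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/app/models"
printf 'class User\n  before_action :auth\nend\n' > "$demo_dir/app/models/user model.rb"
printf 'class Post\nend\n' > "$demo_dir/app/models/post.rb"

# Null-terminated: each name arrives intact, so grep finds the match.
safe_hits=$(find "$demo_dir" -name "*.rb" -print0 | xargs -0 grep -l "before_action" | wc -l)

# Whitespace-split: "user model.rb" becomes two bogus arguments, so grep
# reports missing files (silenced here) and finds nothing.
unsafe_hits=$(find "$demo_dir" -name "*.rb" | xargs grep -l "before_action" 2>/dev/null | wc -l)

echo "null-terminated matches:  $safe_hits"
echo "whitespace-split matches: $unsafe_hits"
rm -rf "$demo_dir"
```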
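For even better monitoring, add a second meter on the output. When two pv instances share a terminal, give each a name with -N and add -c (cursor positioning) so they draw on separate lines instead of overwriting each other:
find . -name "*.rb" -print0 | pv -c -N files -0 -s $(find . -name "*.rb" | wc -l) | \
xargs -0 grep -l "search_string" | pv -c -N matches -l > results.txt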
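This version:
- Uses -s to give pv the total file count, so it can show a percentage and ETA
- Adds a second pv (in line mode, -l) to count matching files as grep -l reports them
- Writes the list of matching files to results.txt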
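If you run this often, the pipeline can be wrapped in reusable functions. This is a minimal sketch under a few assumptions: the names meter and grep_rb_files are hypothetical, the file count uses GNU find's -printf (one dot per file, so odd names can't skew it), and meter falls back to plain cat when pv isn't installed. Each pv is named with -N and uses -c so the two status lines don't overwrite each other.

```shell
# Hypothetical helper: run pv with the given options if it is installed,
# otherwise pass the data through unchanged with cat.
meter() {
  if command -v pv >/dev/null 2>&1; then
    pv "$@"
  else
    cat
  fi
}

# Hypothetical wrapper: print the Ruby files under DIR that contain PATTERN,
# metering both the files searched and the matches found.
grep_rb_files() {
  local dir=$1 pattern=$2 total
  # One '.' per file (GNU find), so wc -c gives an exact file count.
  total=$(find "$dir" -type f -name "*.rb" -printf '.' | wc -c)
  find "$dir" -type f -name "*.rb" -print0 |
    meter -c -N files -0 -s "$total" |
    xargs -0 -r grep -l -- "$pattern" |   # -r: skip grep entirely if no files
    meter -c -N matches -l
}
```

Usage: grep_rb_files . "search_string" > results.txt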
If a few huge files dominate the search, you can track bytes instead of files:
find . -name "*.rb" -exec cat {} + | pv | grep "search_string"
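Note: This shows bytes processed rather than files, and because cat merges everything into one stream, grep can no longer report which file a match came from.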
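In byte mode, pv can only show a percentage and ETA if it knows the total size up front. A minimal sketch, assuming GNU find (for -printf, where %s is the file size in bytes) and a hypothetical helper named rb_bytes:

```shell
# Hypothetical helper: total size in bytes of all Ruby files under a directory
# (defaults to the current directory). GNU find prints each size; awk sums them.
rb_bytes() {
  find "${1:-.}" -type f -name "*.rb" -printf '%s\n' |
    awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```

Usage: find . -type f -name "*.rb" -exec cat {} + | pv -s "$(rb_bytes .)" | grep "search_string"

The %.0f format keeps large totals as plain integers instead of letting some awk implementations switch to exponent notation.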
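pv itself adds little overhead, and a few tweaks make the output more useful:
- Use the -H flag with grep to always show filenames (when xargs hands grep a single file, grep otherwise omits the name)
- Consider --color=always to keep match highlighting when piping results onward (it also writes color escape codes into any file you redirect to)
- For extremely large datasets, combine with GNU parallel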
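The -H tip is easy to verify: when only one file reaches grep, matching lines come out without a filename unless -H is given. A small demo; the temporary directory and the file name only.rb are hypothetical.

```shell
# Demo: with a single file argument, grep omits the filename unless -H is given.
# The directory and file name are hypothetical, created only for this demo.
dir=$(mktemp -d)
printf 'before_action :authenticate\n' > "$dir/only.rb"

without_h=$(find "$dir" -name "*.rb" -print0 | xargs -0 grep "before_action")
with_h=$(find "$dir" -name "*.rb" -print0 | xargs -0 grep -H "before_action")

echo "without -H: $without_h"
echo "with -H:    $with_h"
rm -rf "$dir"
```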
Searching a Rails application's 12,000 Ruby files:
find /path/to/rails_app -name "*.rb" -print0 | \
pv -0 -s $(find /path/to/rails_app -name "*.rb" | wc -l) | \
xargs -0 grep -nH "before_action"
This gives:
- Total file count
- Progress percentage
- ETA calculation
- Line numbers in matches
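The Rails example walks the directory tree twice: once to count files for -s and once to feed grep. On a large tree you can walk it once, save the null-separated list to a file, and count the null bytes. A minimal sketch; the helper name collect_rb_files and the list path are hypothetical.

```shell
# Hypothetical helper: walk DIR once, save the null-separated Ruby file names
# to LIST, and print how many files were found.
collect_rb_files() {
  local dir=$1 list=$2
  find "$dir" -type f -name "*.rb" -print0 > "$list"
  # Every name ends in exactly one null byte, so counting nulls counts files.
  tr -cd '\0' < "$list" | wc -c
}
```

Usage: total=$(collect_rb_files /path/to/rails_app /tmp/rb_files) && pv -0 -s "$total" /tmp/rb_files | xargs -0 grep -nH "before_action"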
When dealing with massive codebases, running recursive grep operations can feel like staring at a blank screen waiting for who-knows-how-long. A typical search command like:
grep -R --include="*.rb" "search_pattern" .
works fine for small projects, but becomes painfully opaque when scanning thousands of Ruby files across deep directory structures.
The pv command (pipe viewer) provides exactly what we need: real-time progress tracking through pipes. Here's how to combine it with grep:
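find . -name "*.rb" | pv -l | xargs grep "search_pattern"
With -l, pv counts lines (one per filename) instead of bytes, which gives us:
- A running count of files processed
- Throughput statistics (files per second)
- Elapsed time
Without a known total, though, pv can't show a percentage or estimate the time remaining.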
The basic version above has some limitations. Here's a more robust implementation:
find . -type f -name "*.rb" -print0 | \
pv -0 -s $(find . -type f -name "*.rb" | wc -l) | \
xargs -0 grep --color=auto "search_pattern"
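Key improvements:
- -print0 and -0 handle filenames containing spaces and other unusual characters
- -s supplies the total number of files; because -0 puts pv in line mode, the size is counted in items rather than bytes, which is what enables the percentage and ETA
- --color=auto keeps grep's highlighting when output goes to a terminal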
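Counting files with find ... | wc -l, as the -s totals in this article do, assumes no filename contains a newline. If that assumption might not hold, printing one character per file with GNU find's -printf gives an exact count. A small demo; the temporary directory and the file names are hypothetical.

```shell
# Demo: wc -l miscounts a name that contains a newline;
# printing one '.' per file (GNU find -printf) counts files exactly.
dir=$(mktemp -d)
touch "$dir/plain.rb" "$dir/$(printf 'odd\nname.rb')"

by_lines=$(find "$dir" -type f -name "*.rb" | wc -l)            # the odd name spans two lines
by_dots=$(find "$dir" -type f -name "*.rb" -printf '.' | wc -c)  # one dot per file

echo "wc -l count:       $by_lines"
echo "-printf '.' count: $by_dots"
rm -rf "$dir"
```

The robust count drops straight into the pipeline: pv -0 -s $(find . -type f -name "*.rb" -printf '.' | wc -c)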
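For truly massive searches, consider running several greps at once:
find . -type f -name "*.rb" -print0 | \
pv -0 -s $(find . -type f -name "*.rb" | wc -l) | \
xargs -0 -n 100 -P 4 grep -H "search_pattern"
The -P 4 flag lets xargs run up to 4 grep processes at a time, and -n 100 caps each batch at 100 files. Without -n, xargs packs as many names as fit onto one command line, so there may be only a single batch and nothing to run in parallel. This can speed up CPU-bound searches while keeping the progress display; -H keeps filenames in the output even when a batch holds a single file. Matches from concurrent greps can come out interleaved, so don't rely on output order.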
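One caveat with xargs -P: parallelism only happens if xargs produces more than one batch. This demo counts how many times xargs invokes a command for eight made-up names, with and without -n; echo prints one line per invocation.

```shell
# Demo: count the command invocations xargs makes for 8 hypothetical names.
# echo prints one line per invocation, so wc -l counts the batches.
default_batches=$(printf 'f%d.rb\0' 1 2 3 4 5 6 7 8 | xargs -0 -P 4 echo | wc -l)
capped_batches=$(printf 'f%d.rb\0' 1 2 3 4 5 6 7 8 | xargs -0 -P 4 -n 2 echo | wc -l)

echo "without -n: $default_batches invocation(s)"
echo "with -n 2:  $capped_batches invocation(s)"
```

With a single batch, -P 4 has nothing to spread across processes; with -n 2, xargs can run up to four of the four batches at once.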
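For those who prefer modern alternatives, ripgrep is often fast enough that progress matters less. It doesn't draw a live progress bar, but --stats prints a summary (files searched, matches, time spent) when the search finishes:
rg --stats "search_pattern" -g "*.rb"
While not a live meter like pv, this gives a useful overview with excellent performance.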