How to Display Progress Information When Using Linux find Command on Large Filesystems


When working with large storage systems or performing recursive searches across deep directory structures, the standard find command operates silently without any progress feedback. This becomes particularly problematic when:

  • Scanning multi-terabyte storage arrays
  • Processing directories with millions of files
  • Running complex find operations with multiple conditions

While find doesn't have built-in progress reporting, we can leverage its existing features:


# Basic directory counter
find /path/to/search -type d -exec sh -c 'echo "Processing: $1" >&2' sh {} \;

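# File count with percentage estimation
# Count first, then stream the results through a single awk process. A counter
# kept inside -exec sh -c would reset for every file, since each call is a new shell.
total=$(find /path -type f | wc -l)
find /path -type f | awk -v total="$total" '{ printf "Progress: %.1f%%\r", NR * 100 / total > "/dev/stderr" } END { print "" > "/dev/stderr" }'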

The pv (pipe viewer) tool can help visualize progress:


# Install pv if needed
sudo apt-get install pv  # Debian/Ubuntu
sudo yum install pv      # RHEL/CentOS

# Usage example
find /large/filesystem -type f | pv -l -s $(find /large/filesystem -type f | wc -l) > /dev/null
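The pv example above runs find twice, so the tree is walked once to get the total and again to produce the results. Below is a minimal sketch of one way to avoid the second walk, assuming it is acceptable to hold the file list in a temporary file: traverse once, save the NUL-delimited list, then replay it with an exact total. The function name `scan_once` and the placeholder action are illustrative only, not part of find or pv.

```shell
#!/bin/bash
# scan_once DIR: walk DIR a single time, then replay the saved list with progress.
# Hypothetical helper for illustration; replace the placeholder line with real work.
scan_once() {
    local dir=$1 list total count=0 file
    list=$(mktemp) || return 1

    # Single traversal: store NUL-delimited paths so unusual filenames survive.
    find "$dir" -type f -print0 > "$list"

    # The number of NUL bytes equals the number of files found.
    total=$(tr -cd '\0' < "$list" | wc -c)
    total=$((total))    # normalise any whitespace padding from wc

    while IFS= read -r -d '' file; do
        count=$((count + 1))
        # Progress goes to stderr so stdout carries only results.
        printf 'Progress: %3d%% (%d/%d)\r' $((100 * count / total)) "$count" "$total" >&2
        printf '%s\n' "$file"    # placeholder action: emit the path
    done < "$list"

    printf '\n' >&2
    rm -f "$list"
}
```

For example, `scan_once /large/filesystem > results.txt` shows the percentage on the terminal while results.txt receives the paths. The trade-off is temporary disk space proportional to the number of files.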

Here's a more comprehensive bash script that provides directory-level progress:


#!/bin/bash

SEARCH_PATH="$1"
TOTAL_DIRS=$(find "$SEARCH_PATH" -type d | wc -l)
CURRENT_DIR=0

find "$SEARCH_PATH" -type d -print0 | while IFS= read -r -d '' dir; do
    CURRENT_DIR=$((CURRENT_DIR + 1))
    PERCENT=$((100 * CURRENT_DIR / TOTAL_DIRS))
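    # Progress goes to stderr so it does not mix with the ls output below;
    # %-50.50s pads and truncates the path so old text is fully overwritten.
    printf "Scanning: %-50.50s [%3d%%]\r" "$dir" "$PERCENT" >&2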
    
    # Your actual find operations here
    find "$dir" -maxdepth 1 -type f -name "*.log" -exec ls -lh {} \;
done
echo ""

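Consider these alternatives when search speed or parallel processing matters. Neither fd nor ripgrep has a progress option of its own, so pipe their output through pv just as with find:


# fd (fd-find): no built-in progress flag, so let pv count the results
fd --hidden --follow "pattern" /search/path | pv -l > /dev/null

# ripgrep (rg): list the files matching a glob, counted by pv
rg --files --glob "pattern" /search/path | pv -l > /dev/null

# GNU parallel with find: --bar draws a progress bar as jobs complete
# (process_file is a placeholder; if it is a shell function, run `export -f process_file` first)
find /path -type f -print0 | parallel -0 --bar 'process_file {}'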

For interactive sessions, use the dialog package:


#!/bin/bash

SEARCH_PATH="$1"
TOTAL=$(find "$SEARCH_PATH" -type f | wc -l)
COUNT=0

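# Run dialog once and feed it percentages on stdin; starting a new dialog
# for every file would redraw the screen each time and never show a moving gauge.
find "$SEARCH_PATH" -type f -print0 | while IFS= read -r -d '' file; do
    COUNT=$((COUNT + 1))
    # Process file here (send any output elsewhere; stdout feeds the gauge)
    echo $((100 * COUNT / TOTAL))
done | dialog --gauge "Searching files..." 10 70 0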

Because find has no built-in progress reporting, the techniques below add it from the outside:

# Print each file as it is processed (to stderr, so results stay clean).
# find already passes the full path, so no $PWD prefix is needed.
find /path/to/search -type f -exec sh -c 'echo "Processing: $1" >&2' sh {} \;

# Count files and show progress (one awk process keeps the running count)
total=$(find /path -type f | wc -l)
find /path -type f | awk -v total="$total" '{ printf "Progress: %.2f%%\r", NR * 100 / total > "/dev/stderr" }'

1. Using fd (fd-find)

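The modern fd alternative has no progress bar of its own, but it is often faster than find and its output can be piped through pv for a running count:

fd --hidden --size +1M . /path/to/search | pv -l > results.txt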

2. Combining with pv

Pipe find output through pv for basic progress:

find /path -type f | pv -l -s $(find /path -type f | wc -l) > results.txt

Here's a comprehensive bash script with multiple progress indicators:

#!/bin/bash

SEARCH_PATH="$1"
LOG_FILE="/tmp/find_progress.log"

# Initialize counters
total_dirs=$(find "$SEARCH_PATH" -type d | wc -l)
processed_dirs=0

: > "$LOG_FILE"    # start with an empty log so reruns do not accumulate results
echo "Starting search in $SEARCH_PATH (Total dirs: $total_dirs)"

find "$SEARCH_PATH" -type d -print0 | while IFS= read -r -d '' dir; do
    processed_dirs=$((processed_dirs + 1))
    percentage=$((100 * processed_dirs / total_dirs))
    
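    # Update progress. The bar is built from padded spaces rather than seq,
    # because printf '#%.0s' with no arguments still prints one '#' at 0%.
    bar=$(printf '%*s' $((percentage / 2)) '')
    printf "\r[%-50s] %d%% Current: %-50s" \
        "${bar// /#}" \
        "$percentage" \
        "${dir:0:50}" >&2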
        
    # Process files in current directory
    find "$dir" -maxdepth 1 -type f -print >> "$LOG_FILE"
done

echo -e "\\nSearch complete. Results saved to $LOG_FILE"

When implementing progress indicators:

  • Counting directories up front means a second full traversal, so do it once at the start and reuse the total
  • Frequent console updates can slow down the search, especially one update per file
  • For fastest results, print a simple status line only every N directories
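The last point can be sketched as a small filter that passes find's output through unchanged and reports only every N entries. This is an illustrative helper, not a standard tool: the name `progress_every` and the default interval of 1000 are assumptions, and it expects one path per line, so filenames containing newlines would be miscounted.

```shell
#!/bin/bash
# progress_every N: copy stdin to stdout unchanged and report the running
# line count on stderr once every N lines, plus a final total.
# Hypothetical helper; a single awk process does the counting, so no extra
# process is started per entry.
progress_every() {
    awk -v n="${1:-1000}" '
        { print }
        NR % n == 0 {
            printf "Processed %d entries\n", NR > "/dev/stderr"
            fflush("/dev/stderr")    # show the update immediately
        }
        END { printf "Done: %d entries\n", NR > "/dev/stderr" }
    '
}
```

A typical use would be `find /path -type d | progress_every 500 > dirs.txt`, which writes the directory list to dirs.txt and prints a status line for every 500 directories.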

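For enterprise environments, log progress to a file that a monitoring system can ingest:

# Writing under /var/log usually requires root; point this at any writable path.
find /data -type d | while IFS= read -r dir; do
    echo "$(date '+%FT%T%z'),$dir" >> /var/log/find_progress.csv
    # Additional processing here
done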
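One caveat with the CSV log above: a directory name containing a comma or a double quote produces a malformed row. Here is a minimal sketch of a quoting-safe variant, assuming the consumer follows the usual CSV convention of wrapping a field in double quotes and doubling any embedded quotes. The function name `log_progress_csv` and its log-file argument are illustrative.

```shell
#!/bin/bash
# log_progress_csv FILE: read one path per line from stdin and append
# timestamp,"path" rows to FILE, with embedded double quotes doubled.
# Hypothetical helper; FILE can be any writable path.
log_progress_csv() {
    local log_file=$1 dir q='"'
    while IFS= read -r dir; do
        # ${dir//$q/$q$q} replaces every " with "" before the field is wrapped.
        printf '%s,"%s"\n' "$(date '+%FT%T%z')" "${dir//$q/$q$q}" >> "$log_file"
    done
}
```

It could be used as `find /data -type d | log_progress_csv /tmp/find_progress.csv`. Calling date once per row keeps the timestamps accurate but costs one process per directory, so on very large trees it may be worth logging only every N directories.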