How to Display Progress Information When Using Linux find Command on Large Filesystems



When working with large storage systems or performing recursive searches across deep directory structures, the standard find command operates silently without any progress feedback. This becomes particularly problematic when:

  • Scanning multi-terabyte storage arrays
  • Processing directories with millions of files
  • Running complex find operations with multiple conditions

While find doesn't have built-in progress reporting, we can leverage its existing features:


# Basic directory counter
find /path/to/search -type d -exec sh -c 'echo "Processing: $1" >&2' sh {} \;

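# File count with percentage estimation
# (the counter lives in one awk process; a counter inside -exec sh -c resets with every new shell)
total=$(find /path -type f | wc -l)
find /path -type f | awk -v total="$total" '{ printf "Progress: %.1f%%\r", NR * 100 / total > "/dev/stderr" } END { printf "\n" > "/dev/stderr" }'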

The pv (pipe viewer) tool can help visualize progress:


# Install pv if needed
sudo apt-get install pv  # Debian/Ubuntu
sudo yum install pv      # RHEL/CentOS

# Usage example (the inner find runs first to give pv its total, so the tree is walked twice)
find /large/filesystem -type f | pv -l -s "$(find /large/filesystem -type f | wc -l)" > /dev/null

Here's a more comprehensive bash script that provides directory-level progress:


#!/bin/bash

SEARCH_PATH="$1"
TOTAL_DIRS=$(find "$SEARCH_PATH" -type d | wc -l)
CURRENT_DIR=0

find "$SEARCH_PATH" -type d -print0 | while IFS= read -r -d '' dir; do
    CURRENT_DIR=$((CURRENT_DIR + 1))
    PERCENT=$((100 * CURRENT_DIR / TOTAL_DIRS))
    # Progress goes to stderr so it does not mix with the results on stdout
    printf "Scanning: %-50s [%3d%%]\r" "$dir" "$PERCENT" >&2

    # Your actual find operations here
    find "$dir" -maxdepth 1 -type f -name "*.log" -exec ls -lh {} +
done
echo ""

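Faster tools shorten the wait rather than report on it. Neither fd nor ripgrep documents a progress flag, but their output can be counted with pv as it streams, and GNU parallel has a real progress bar (--bar):


# fd (fd-find): no progress option, so count matches as they arrive
fd --hidden --follow "pattern" /search/path | pv -l > /dev/null

# ripgrep (rg): list the files it would search, counted the same way
rg --files /search/path | pv -l > /dev/null

# GNU parallel with find (process_file is a placeholder for your own command)
find /path -type f -print0 | parallel -0 --bar 'process_file {}'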

For interactive sessions, use the dialog package:


#!/bin/bash

SEARCH_PATH="$1"
TOTAL=$(find "$SEARCH_PATH" -type f | wc -l)
COUNT=0

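# Feed every percentage to a single dialog instance;
# starting a new dialog per file would redraw the whole screen each time
find "$SEARCH_PATH" -type f -print0 | while IFS= read -r -d '' file; do
    COUNT=$((COUNT + 1))
    echo $((100 * COUNT / TOTAL))
    # Process file here (keep its output off stdout, which feeds the gauge)
done | dialog --gauge "Searching files..." 10 70 0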

While find doesn't have built-in progress reporting, we can leverage these techniques:

# Print each file as find reaches it ({} already carries the path from the start point)
find /path/to/search -type f -exec sh -c 'for f; do echo "Processing: $f" >&2; done' sh {} +

# Count files and show progress (one awk process keeps the counter across files)
total=$(find /path -type f | wc -l)
find /path -type f | awk -v total="$total" '{ printf "Progress: %.2f%%\r", NR / total * 100 > "/dev/stderr" }'

1. Using fd (fd-find)

fd, a faster modern alternative, has no progress option among its documented flags, so count its results with pv as they arrive:

fd --hidden --size +1M . /path/to/search | pv -l > results.txt

2. Combining with pv

Pipe find output through pv for basic progress:

find /path -type f | pv -l -s $(find /path -type f | wc -l) > results.txt
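The pv command above walks the tree twice, once for the total and once for the real run. When the slow part is the per-file processing rather than the listing, one possible way around this is to save the list during a single walk and replay it. The sketch below assumes that situation; `list_once` is a hypothetical helper name, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# list_once DIR: walk DIR a single time, keep the file list in a temporary
# file, and print "<count> <listfile>" so later stages can reuse both.
# (Hypothetical helper; assumes no newlines in file names.)
list_once() {
    listfile=$(mktemp) || return 1
    find "$1" -type f > "$listfile"
    # Arithmetic expansion strips any padding that wc adds on some systems
    count=$(( $(wc -l < "$listfile") ))
    printf '%d %s\n' "$count" "$listfile"
}
```

A typical use would be `set -- $(list_once /large/filesystem)` followed by `pv -l -s "$1" "$2" | while IFS= read -r f; do ...; done`, so pv knows the total without a second traversal. Remove the temporary file when done.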

Here's a comprehensive bash script with multiple progress indicators:

#!/bin/bash

SEARCH_PATH="$1"
LOG_FILE="/tmp/find_progress.log"
: > "$LOG_FILE"   # start with an empty log; the loop below appends to it

# Initialize counters
total_dirs=$(find "$SEARCH_PATH" -type d | wc -l)
processed_dirs=0

echo "Starting search in $SEARCH_PATH (Total dirs: $total_dirs)"

find "$SEARCH_PATH" -type d -print0 | while IFS= read -r -d '' dir; do
    processed_dirs=$((processed_dirs + 1))
    percentage=$((100 * processed_dirs / total_dirs))
    
    # Update progress on stderr: one '#' per 2% (an empty bar at 0-1%)
    bar=$(printf '%*s' "$((percentage / 2))" '' | tr ' ' '#')
    printf '\r[%-50s] %d%% Current: %-50s' "$bar" "$percentage" "${dir:0:50}" >&2
        
    # Process files in current directory
    find "$dir" -maxdepth 1 -type f -print >> "$LOG_FILE"
done

printf '\nSearch complete. Results saved to %s\n' "$LOG_FILE"
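A percentage says how far along the scan is but not how long is left. As a rough sketch, a linear estimate can be derived from the counters the script already keeps: remaining = elapsed * (total - done) / done. The function name `eta_seconds` is an assumption for illustration, and the estimate is only as good as the assumption that directories take roughly equal time.

```shell
#!/bin/sh
# eta_seconds ELAPSED DONE TOTAL: linear estimate of the seconds remaining,
# using integer arithmetic. (Hypothetical helper name.)
eta_seconds() {
    if [ "$2" -le 0 ]; then
        echo 0   # nothing processed yet, so there is no rate to extrapolate
        return
    fi
    echo $(( $1 * ($3 - $2) / $2 ))
}
```

In bash, the built-in `SECONDS` variable counts seconds since the shell started, so inside the loop above something like `eta_seconds "$SECONDS" "$processed_dirs" "$total_dirs"` could supply the number to print next to the percentage.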

When implementing progress indicators:

  • Counting directories or files up front costs a full extra traversal, so do it once at the start and reuse the total
  • Frequent console updates can slow down the search
  • For the fastest results, print a simple status line only every N directories
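The last point can be sketched as a small pass-through filter that reports only every N entries. This is a minimal sketch, not a tuned implementation: `progress_every` is a hypothetical name, the default of 1000 is arbitrary, and it assumes one path per line (no newlines in names).

```shell
#!/bin/sh
# progress_every [N]: copy stdin to stdout unchanged and report the running
# line count on stderr once every N lines (N defaults to 1000).
# (Hypothetical helper; a single awk process avoids a shell loop per line.)
progress_every() {
    awk -v step="${1:-1000}" '
        { print }
        NR % step == 0 { printf "Seen %d entries\r", NR > "/dev/stderr" }
        END { printf "Seen %d entries total\n", NR > "/dev/stderr" }
    '
}
```

For example, `find /path -type d | progress_every 500 > dirs.txt` keeps the results in dirs.txt while the terminal shows a count that updates once per 500 directories. No total is needed, so the tree is walked only once.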

For enterprise environments, log progress to monitoring systems:

find /data -type d | while IFS= read -r dir; do
    echo "$(date '+%FT%T%z'),$dir" >> /var/log/find_progress.csv
    # Additional processing here
done
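Directory names may contain commas or double quotes, which would break the CSV lines written above. One way to guard against that, sketched here under the usual CSV convention of wrapping a field in double quotes and doubling any embedded quotes, is a small quoting helper. The name `csv_field` is hypothetical.

```shell
#!/bin/sh
# csv_field VALUE: print VALUE as a quoted CSV field, doubling any embedded
# double quotes. (Hypothetical helper name.)
csv_field() {
    escaped=$(printf '%s' "$1" | sed 's/"/""/g')
    printf '"%s"' "$escaped"
}
```

The log line in the loop could then be written as `echo "$(date '+%FT%T%z'),$(csv_field "$dir")"`, which keeps each record at two fields regardless of what the path contains.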