How to Compare Two Directories in Linux and Find Missing Files Using Command Line Tools



When working with file systems, a common task is comparing directories to identify files present in one location but missing in another. This occurs frequently during synchronization, backup verification, or when auditing file structures.

Linux offers several built-in tools that can be combined effectively:


# Using diff with process substitution
diff <(ls -1 dir1/) <(ls -1 dir2/)

# More robust version with find
diff <(find dir1 -type f -printf '%P\n' | sort) <(find dir2 -type f -printf '%P\n' | sort)

To list only the names that exist in dir1 but not in dir2, comm gives cleaner output than diff:


comm -23 <(ls dir1 | sort) <(ls dir2 | sort)

Here -2 suppresses lines unique to the second listing and -3 suppresses lines common to both, so only names found in dir1 alone remain.
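To make the column flags concrete, the sketch below builds two throwaway directories (every name under $work is hypothetical) and extracts each of comm's three columns in turn. It assumes bash, since process substitution is not available in plain sh.

```shell
# Build two small demo directories in a temporary location (hypothetical names).
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
touch "$work/dir1/a.txt" "$work/dir1/b.txt" "$work/dir2/b.txt" "$work/dir2/c.txt"

# Use one locale for both sort and comm so they agree on ordering.
export LC_ALL=C

# Helper for this demo only: a sorted, non-recursive listing of one directory.
list() { ls "$1" | sort; }

only_in_dir1=$(comm -23 <(list "$work/dir1") <(list "$work/dir2"))  # keep column 1
only_in_dir2=$(comm -13 <(list "$work/dir1") <(list "$work/dir2"))  # keep column 2
in_both=$(comm -12 <(list "$work/dir1") <(list "$work/dir2"))       # keep column 3

printf 'only in dir1: %s\n' "$only_in_dir1"
printf 'only in dir2: %s\n' "$only_in_dir2"
printf 'in both:      %s\n' "$in_both"

rm -rf "$work"
```

Each digit passed to comm names a column to hide, so -13 is the mirror image of -23 and answers the opposite question.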

For recursive comparisons with relative paths:


find dir1 -type f -printf '%P\n' | sort > dir1_files
find dir2 -type f -printf '%P\n' | sort > dir2_files
comm -23 dir1_files dir2_files

Combining find with grep avoids the sort step, because grep -F -x -v -f filters out every line that exactly matches a name from the second listing:


find dir1 -type f -printf '%P\n' | grep -Fxvf <(find dir2 -type f -printf '%P\n')
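The -x flag in that pipeline matters more than it looks. The following sketch, using hypothetical demo files and assuming bash with GNU find, shows how dropping -x lets a short name in dir2 hide a longer, genuinely missing name in dir1.

```shell
# Hypothetical demo data: "log" exists only in dir2, "log.old" only in dir1.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
touch "$work/dir1/log.old" "$work/dir1/notes" "$work/dir2/log" "$work/dir2/notes"

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file.
with_x=$(find "$work/dir1" -type f -printf '%P\n' |
    grep -Fxvf <(find "$work/dir2" -type f -printf '%P\n'))

# Without -x the pattern "log" matches inside "log.old", so the missing file
# is filtered out by mistake. grep exits 1 when it prints nothing.
without_x=$(find "$work/dir1" -type f -printf '%P\n' |
    grep -Fvf <(find "$work/dir2" -type f -printf '%P\n')) || true

printf 'with -x:    [%s]\n' "$with_x"
printf 'without -x: [%s]\n' "$without_x"

rm -rf "$work"
```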

When needing content verification (not just filenames):


diff -rq dir1 dir2

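The -q flag reports only whether files differ, not the differences themselves. Files that exist on one side only are reported as "Only in ..." lines.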
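As an illustration of what that report looks like, this sketch creates two hypothetical directories with one changed file, one identical file, and one file missing from dir2, then captures both the output and the exit status of diff.

```shell
# Hypothetical demo data.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
printf 'v1\n'   > "$work/dir1/config"
printf 'v2\n'   > "$work/dir2/config"   # same name, different content
printf 'same\n' > "$work/dir1/readme"
printf 'same\n' > "$work/dir2/readme"   # identical, so not reported
touch "$work/dir1/extra"                # missing from dir2

# LC_ALL=C keeps diff's messages untranslated.
# diff exits 0 when the trees match and 1 when anything differs.
status=0
report=$(cd "$work" && LC_ALL=C diff -rq dir1 dir2) || status=$?

printf '%s\n' "$report"
printf 'exit status: %s\n' "$status"

rm -rf "$work"
```

The exit status makes the command usable in scripts without parsing its output.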

For very large directories:


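rsync -n -aviu --delete dir1/ dir2/ | grep '^\*deleting'

The dry-run (-n) flag makes this safe for analysis. Because -i itemizes changes, removals print as "*deleting" lines. They name files that exist in dir2 but not in dir1, so swap the two arguments to list files missing from dir2.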



The comm command compares two sorted files line by line. Here's how to use it for directory comparison:

comm -23 <(ls dir1 | sort) <(ls dir2 | sort)

This outputs files present in dir1 but not in dir2. The options:

- -23 suppresses column 2 (lines unique to the second input) and column 3 (lines common to both)

- Process substitution (<(command)) treats command output as files

For more detailed comparison, combine diff with file listings:

diff <(ls -1 dir1 | sort) <(ls -1 dir2 | sort) | grep "^<"

The grep "^<" filters to show only files unique to the first directory.
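The filtered lines still carry diff's "< " marker. A small sketch, assuming bash and hypothetical demo directories, strips the marker with sed and uses the ">" lines for the opposite direction:

```shell
# Hypothetical demo data.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
touch "$work/dir1/a" "$work/dir1/b" "$work/dir2/b" "$work/dir2/c"
export LC_ALL=C

# Helper for this demo only: diff of the two sorted listings.
listing_diff() { diff <(ls -1 "$work/dir1" | sort) <(ls -1 "$work/dir2" | sort); }

# "<" lines come from the first listing, ">" lines from the second.
only_first=$(listing_diff | grep '^<' | sed 's/^< //')
only_second=$(listing_diff | grep '^>' | sed 's/^> //')

printf 'only in dir1: %s\n' "$only_first"
printf 'only in dir2: %s\n' "$only_second"

rm -rf "$work"
```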

A more robust solution using find:

find dir1 -type f -printf '%P\n' | sort > dir1_files
find dir2 -type f -printf '%P\n' | sort > dir2_files
comm -23 dir1_files dir2_files

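This handles filenames with spaces and most special characters better than parsing ls output, and it descends into subdirectories. Names containing newlines still break line-based tools, and sort and comm must run under the same locale (for example LC_ALL=C) or comm may report that its input is not sorted.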
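A quick way to check that claim is to run the same commands against names that contain spaces. The sketch below assumes GNU find (for -printf) and uses hypothetical directory and file names. The list files are written outside both trees so they do not show up in the results.

```shell
# Hypothetical demo data with spaces in both directory and file names.
work=$(mktemp -d)
mkdir -p "$work/dir1/2024 q1" "$work/dir2/2024 q1"
touch "$work/dir1/2024 q1/my report.txt" \
      "$work/dir1/2024 q1/summary.txt" \
      "$work/dir2/2024 q1/summary.txt"

# Same locale for sort and comm.
export LC_ALL=C

# %P prints each path relative to the starting directory.
find "$work/dir1" -type f -printf '%P\n' | sort > "$work/dir1_files"
find "$work/dir2" -type f -printf '%P\n' | sort > "$work/dir2_files"

missing=$(comm -23 "$work/dir1_files" "$work/dir2_files")
printf 'missing from dir2: %s\n' "$missing"

rm -rf "$work"
```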

For recursive directory comparison:

diff -rq dir1 dir2 | grep '^Only in dir1[/:]'

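Or using find with NUL-delimited relative paths, which also survive newlines in filenames. The lists are NUL-delimited, so comm needs -z (GNU coreutils) to read them:

(cd dir1 && find . -type f -print0 | sort -z) > dir1_files
(cd dir2 && find . -type f -print0 | sort -z) > dir2_files
comm -z -23 dir1_files dir2_files | tr '\0' '\n'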
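NUL-delimited lists only pay off if every stage of the pipeline reads them as such. The following sketch assumes bash and a GNU coreutils recent enough to offer comm -z. It uses a hypothetical file whose name contains a newline and collects the result in an array instead of printing it line by line.

```shell
# Hypothetical demo data, including a filename with an embedded newline.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
odd_name=$'odd\nname.txt'
touch "$work/dir1/$odd_name" "$work/dir1/plain.txt" "$work/dir2/plain.txt"

export LC_ALL=C

# Helper for this demo only: sorted, NUL-delimited relative paths of one tree.
list0() { (cd "$1" && find . -type f -print0 | sort -z); }

# Read NUL-delimited records into an array, one element per path.
missing=()
while IFS= read -r -d '' entry; do
    missing+=("$entry")
done < <(comm -z -23 <(list0 "$work/dir1") <(list0 "$work/dir2"))

printf 'missing entries: %d\n' "${#missing[@]}"

rm -rf "$work"
```

A line-based comparison would have split the odd name into two bogus entries. Here it stays a single array element.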

Here's a complete script to verify backup completeness:

#!/bin/bash
SOURCE="/data/important_files"
BACKUP="/backup/important_files"

echo "Files missing in backup:"
comm -23 <(find "$SOURCE" -type f -printf '%P\n' | sort) \
          <(find "$BACKUP" -type f -printf '%P\n' | sort)

echo "Checking file contents..."
diff -rq "$SOURCE" "$BACKUP"
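For unattended runs it can help to have the check return an exit status. One possible variation, sketched here with a hypothetical function name (verify_backup) and hypothetical demo directories, wraps the same two steps in a function. It assumes bash and GNU find.

```shell
# verify_backup is a hypothetical helper name, not part of any tool.
verify_backup() {
    local source_dir=$1 backup_dir=$2 missing
    missing=$(LC_ALL=C comm -23 \
        <(find "$source_dir" -type f -printf '%P\n' | LC_ALL=C sort) \
        <(find "$backup_dir" -type f -printf '%P\n' | LC_ALL=C sort))
    if [ -n "$missing" ]; then
        printf 'Files missing in backup:\n%s\n' "$missing"
        return 1
    fi
    # Every source file has a counterpart, so compare contents.
    # diff exits non-zero on any difference.
    diff -rq "$source_dir" "$backup_dir"
}

# Demo: a backup that lacks one file, then the same backup completed.
work=$(mktemp -d)
mkdir "$work/src" "$work/bak"
printf 'data\n' > "$work/src/a"
printf 'data\n' > "$work/src/b"
cp "$work/src/a" "$work/bak/a"

incomplete_status=0
incomplete_report=$(verify_backup "$work/src" "$work/bak") || incomplete_status=$?

cp "$work/src/b" "$work/bak/b"
complete_status=0
verify_backup "$work/src" "$work/bak" || complete_status=$?

printf '%s\n' "$incomplete_report"
printf 'incomplete: %s, complete: %s\n' "$incomplete_status" "$complete_status"

rm -rf "$work"
```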

For large directories:

- Use -print0 with xargs -0 for safety

- Consider mktemp for temporary files

- Parallel processing with parallel or xargs -P can speed up content comparisons
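The tips above can be combined roughly as follows. This is a sketch under stated assumptions: GNU or BSD xargs with -0, -P and -I, a hypothetical job count of 4, and hypothetical demo files. It runs cmp in parallel and prints every file in dir1 whose counterpart in dir2 differs or is missing.

```shell
# mktemp -d gives a private scratch directory (hypothetical demo data inside).
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
printf 'one\n'   > "$work/dir1/a"
printf 'one\n'   > "$work/dir2/a"   # identical
printf 'two\n'   > "$work/dir1/b"
printf 'TWO\n'   > "$work/dir2/b"   # differs
printf 'three\n' > "$work/dir1/c"   # missing from dir2

# -print0 / -0 keep odd filenames intact; -P 4 runs up to four cmp jobs at once.
# cmp -s is silent and fails when the files differ or the second one is absent.
# Parallel output arrives in no fixed order, so sort it at the end.
differing=$(cd "$work/dir1" && find . -type f -print0 |
    xargs -0 -P 4 -I{} sh -c 'cmp -s "$1" "$2/$1" || printf "%s\n" "$1"' _ {} "$work/dir2" |
    LC_ALL=C sort)

printf 'differs or missing:\n%s\n' "$differing"

rm -rf "$work"
```

Passing the path as a positional argument to sh -c, instead of splicing {} into the script text, keeps quotes and spaces in filenames from being interpreted as shell syntax.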

Other useful utilities:

- rsync -n -av --delete dir1/ dir2/ (dry run)

- fdupes for finding duplicate files

- tree for visual directory comparisons