How to Copy Directory Tree Excluding Empty Directories in Linux/Unix



When working with complex directory structures in Linux/Unix environments, you often encounter situations where you need to replicate a directory tree while excluding empty folders. This is particularly common when:

  • Migrating project structures between environments
  • Creating minimal deployment packages
  • Archiving only content-bearing directories

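Here's a reliable method using standard Unix tools. Run find from inside the source directory so that cpio receives relative paths; given absolute paths, cpio recreates the whole source path underneath the destination:

cd /source/path && find . -type f -print | cpio -pdm /destination/path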

Breakdown of the command:

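  • find ... -type f: Lists every regular file in the source tree; directories are never listed on their own, so empty ones are skipped
  • cpio -pdm: Copies the listed files in pass-through mode (-p), creating parent directories as needed (-d) and preserving modification times (-m)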
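Before running the pipeline, it can be useful to preview which files it would pick up. The following is a minimal Python sketch that approximates the output of find with -type f when run from inside the source directory. The helper name list_files_relative is hypothetical and not part of any tool mentioned here.

```python
import os
import tempfile

def list_files_relative(src):
    """Yield every regular file under src as a path relative to src."""
    for root, _dirs, files in os.walk(src):
        for name in files:
            full = os.path.join(root, name)
            # os.walk also reports symlinks and special files here;
            # keep only regular files, roughly like `find -type f`.
            if os.path.isfile(full) and not os.path.islink(full):
                yield os.path.relpath(full, src)

# Demo on a throwaway tree: one empty directory and one file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "B"))
    os.makedirs(os.path.join(tmp, "C"))
    open(os.path.join(tmp, "C", "e"), "w").close()
    for path in sorted(list_files_relative(tmp)):
        print(path)
```

Because only files are listed, the empty directory B never appears, which is the same reason the cpio copy leaves it out.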

For more control, rsync offers a robust solution:

rsync -a --include='*/' --include='*e*' --exclude='*' --prune-empty-dirs /source/A/ /destination/

This targets the example tree shown later in this article, where only directory C and its file e should be copied. The '*e*' pattern matches any file whose name contains the letter e, so adjust it to your own file names.

For complex filtering needs, a Python script provides maximum flexibility:

import os
import shutil

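def copy_non_empty(src, dst):
    """Copy files from src to dst, creating only directories that hold files."""
    for root, dirs, files in os.walk(src):
        rel_path = os.path.relpath(root, src)
        # normpath drops the trailing "." produced for the top-level directory
        dest_path = os.path.normpath(os.path.join(dst, rel_path))

        if files:  # Only create the directory if it directly contains files
            os.makedirs(dest_path, exist_ok=True)
            for name in files:
                shutil.copy2(os.path.join(root, name),
                             os.path.join(dest_path, name))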

copy_non_empty('/source/A', '/destination')
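As one illustration of the filtering flexibility mentioned above, the sketch below adds a predicate that decides which files are copied. The function name copy_filtered and its keep parameter are assumptions made for this example and do not come from the original script.

```python
import os
import shutil

def copy_filtered(src, dst, keep=lambda rel_path: True):
    """Copy files under src for which keep(rel_path) is true.

    Directories are created only when a kept file needs them, so empty
    directories, and directories whose files are all filtered out, are
    never created. Returns the number of files copied.
    """
    copied = 0
    for root, _dirs, files in os.walk(src):
        for name in files:
            source = os.path.join(root, name)
            rel_path = os.path.relpath(source, src)
            if not keep(rel_path):
                continue
            target = os.path.join(dst, rel_path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)  # copy2 also copies timestamps
            copied += 1
    return copied
```

A call such as copy_filtered('/source/A', '/destination', keep=lambda p: p.endswith('.txt')) would copy only .txt files and the directories that contain them.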

Consider these additional scenarios:

# Also keep directories whose names match a pattern, even when they are empty
cd /source && find . \( -type f -o -type d -name "*.git*" \) -print | cpio -pdm /destination

# Copy only certain file types (here *.txt), pruning directories left empty
rsync -a --include='*/' --include='*.txt' --exclude='*' --prune-empty-dirs src/ dst/

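For large directory trees, treat the following as rough expectations and measure on your own data:

  • The find/cpio method is often fast for local copies
  • rsync can report progress (for example with --progress) and skips files that are already up to date
  • Python offers the most flexibility but adds interpreter overhead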

The same requirement also comes up when backing up active data or analyzing file structures.

Given the example structure:

A
|-- B
|-- C
|   |-- D
|   `-- e
`-- F
    `-- G

Here e is the only file. B, D, and G are empty directories, and F contains nothing but G. We want to preserve only directories that contain files, resulting in:

C
`-- e
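To experiment with this example safely, the tree can be rebuilt in a temporary directory. The sketch below does that and computes which directories hold a file at any depth, which are the ones a copy should keep. The helper name dirs_with_files is hypothetical.

```python
import os
import tempfile

def dirs_with_files(root):
    """Return relative paths of directories that hold a file at any depth."""
    keep = set()
    for current, _dirs, files in os.walk(root):
        if files:
            rel = os.path.relpath(current, root)
            # A directory with files keeps all of its ancestors alive too.
            while rel not in keep:
                keep.add(rel)
                parent = os.path.dirname(rel)
                rel = parent if parent else "."
    return keep

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "A")
    for d in ("B", os.path.join("C", "D"), os.path.join("F", "G")):
        os.makedirs(os.path.join(a, d))
    open(os.path.join(a, "C", "e"), "w").close()
    print(sorted(dirs_with_files(a)))  # → ['.', 'C']
```

The "." entry stands for A itself, so only A and C survive, matching the expected result.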

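With rsync, the --prune-empty-dirs option does the job on its own:

rsync -a --prune-empty-dirs /source/path/A/ /destination/path/

Explanation of flags:

  • -a: Archive mode (recurses and preserves permissions, timestamps, symlinks, and, where permitted, ownership)
  • --prune-empty-dirs: Leaves out directories that would end up empty on the destination, including chains of nested empty directories
  • Trailing slash on /source/path/A/: Copies the contents of A rather than creating A inside the destination

Filter rules are only needed to restrict which files are copied. In that case, --include='*/' lets rsync descend into directories at every depth (per-level patterns such as '*/*/' are not required), one or more further --include patterns select the files, and --exclude='*' rejects everything else. Combining --include='*/' and --exclude='*' without any file pattern copies nothing, because every file is excluded and every directory is then pruned.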
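When rsync is driven from a script, assembling the argument list in one place helps keep the filter order correct. The following is a minimal sketch under that assumption. The function name build_rsync_command and its parameters are hypothetical, while the rsync options it emits (-a, --prune-empty-dirs, --dry-run, --include, --exclude) are standard.

```python
def build_rsync_command(src, dst, file_patterns=(), dry_run=False):
    """Build an rsync argument list that skips empty directories.

    file_patterns: optional glob patterns; when given, only matching
    files are copied. Order matters, since rsync applies the first
    filter rule that matches.
    """
    cmd = ["rsync", "-a", "--prune-empty-dirs"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be copied, copy nothing
    if file_patterns:
        cmd.append("--include=*/")  # descend into directories at any depth
        cmd.extend(f"--include={pattern}" for pattern in file_patterns)
        cmd.append("--exclude=*")   # reject everything not included above
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd.extend([src.rstrip("/") + "/", dst])
    return cmd

print(build_rsync_command("/source/path/A", "/destination/path/", ["*.txt"]))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a shell string means the patterns need no quoting.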

For systems without rsync, this combination works well:

cd /source/path
find A -type f -print | cpio -pdum /destination/path

Breakdown:

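  • cd /source/path and find A: Produce paths relative to /source/path, so the files land under /destination/path/A
  • find -type f: Lists only regular files
  • cpio -p: Pass-through mode (copies files directly, without creating an archive)
  • -d: Creates directories as needed
  • -u: Overwrites existing destination files unconditionally, even when they are newer
  • -m: Preserves modification times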

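Deeply nested structures need no extra include patterns, because '*/' already matches directories at any depth. To copy only selected files from a deep tree, add a file pattern between the directory include and the final exclude:

rsync -a \
--include='*/' \
--include='*.txt' \
--exclude='*' \
--prune-empty-dirs \
/source/path/A/ /destination/path/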

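After copying, check for leftover empty directories with:

find /destination/path -mindepth 1 -type d -empty

The command prints nothing when no empty directories remain. Append -delete to remove any that it does find.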
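The same check can be made from Python without deleting anything. This is a minimal sketch, and the helper name find_empty_dirs is hypothetical. It reports directories below the destination that have no entries at all, which matches what find reports for -type d -empty.

```python
import os

def find_empty_dirs(root):
    """Return directories below root (root itself excluded) with no entries."""
    empty = []
    for current, dirs, files in os.walk(root):
        # A directory is empty only if it has neither files nor subdirectories.
        if current != root and not dirs and not files:
            empty.append(current)
    return sorted(empty)
```

An empty list from find_empty_dirs('/destination/path') indicates that the copy contains no empty directories.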

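A few further notes on performance and transfers:

  • rsync is usually the better choice for remote transfers, since it can skip files that are already up to date
  • find/cpio may perform better for one-off local copies
  • Consider --numeric-ids with rsync for system-to-system transfers, so that ownership is kept by numeric UID/GID instead of being mapped by user and group name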