When working with complex directory structures in Linux/Unix environments, you often encounter situations where you need to replicate a directory tree while excluding empty folders. This is particularly common when:
- Migrating project structures between environments
- Creating minimal deployment packages
- Archiving only content-bearing directories
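Here's a reliable method using standard Unix tools:
cd /source/path && find . -type f -print | cpio -pdm /destination/path
Breakdown of the command:
cd /source/path && find . -type f
: Lists every regular file with a path relative to the source root. With an absolute path such as find /source/path, cpio would recreate the whole /source/path prefix underneath the destination.
cpio -pdm
: Copies in pass-through mode (-p), creating directories as needed (-d) and preserving modification times (-m)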
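The idea behind the pipeline (enumerate files first, then create only the directories those files need) can be sketched in Python as a dry run. This is a rough illustration rather than a reimplementation of cpio, and the name plan_copy is hypothetical.

```python
import os

def plan_copy(src, dst):
    """Return (source_file, destination_file) pairs for every file under src.

    Directories never appear on their own, so a directory with no files
    beneath it produces no entry and would never be created.
    """
    pairs = []
    for root, _dirs, files in os.walk(src):
        # Note: os.walk reports every non-directory entry here, which is
        # slightly broader than `find -type f` (regular files only).
        for name in sorted(files):
            src_file = os.path.join(root, name)
            # Paths are taken relative to src, as with `cd src && find . -type f`.
            rel = os.path.relpath(src_file, src)
            pairs.append((src_file, os.path.join(dst, rel)))
    return pairs
```

Printing the pairs before copying is a cheap way to confirm that the destination paths carry no unwanted prefix.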
For more control, rsync offers a robust solution:
rsync -a --include='*/' --include='*e*' --exclude='*' --prune-empty-dirs /source/A/ /destination/
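Here --include='*/' lets rsync descend into every directory, --include='*e*' keeps any name containing the letter e (file 'e' in the example case, and with it directory 'C'), --exclude='*' drops everything else, and --prune-empty-dirs removes the directories left empty. Adjust the file pattern as needed.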
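The same include-by-pattern idea can be approximated in Python. This is a minimal sketch, assuming shell-style matching with fnmatch (which is not identical to rsync's filter syntax); copy_matching is a hypothetical name.

```python
import fnmatch
import os
import shutil

def copy_matching(src, dst, pattern):
    """Copy files under src whose names match pattern, creating only needed directories."""
    copied = []
    for root, _dirs, files in os.walk(src):
        matches = fnmatch.filter(files, pattern)
        if not matches:
            continue  # nothing selected here, so this directory is never created
        dest_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(dest_dir, exist_ok=True)
        for name in matches:
            target = os.path.join(dest_dir, name)
            shutil.copy2(os.path.join(root, name), target)
            copied.append(target)
    return copied
```

For the example case, copy_matching('/source/A', '/destination', '*e*') would select the file 'e' and create only the directory that holds it.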
For complex filtering needs, a Python script provides maximum flexibility:
import os
import shutil

def copy_non_empty(src, dst):
    """Copy src into dst, creating only directories that directly contain files."""
    for root, dirs, files in os.walk(src):
        rel_path = os.path.relpath(root, src)
        dest_path = os.path.join(dst, rel_path)
        if files:  # Only create directory if it contains files
            # makedirs also creates any missing parent directories
            os.makedirs(dest_path, exist_ok=True)
            for file in files:
                # copy2 preserves timestamps and permission bits
                shutil.copy2(os.path.join(root, file),
                             os.path.join(dest_path, file))

copy_non_empty('/source/A', '/destination')
Consider these additional scenarios:
# Preserve empty directories that match specific patterns
cd /source && find . \( -type f -o -type d -name "*.git*" \) -print | cpio -pdm /destination
# Copy only certain file types (here *.txt) while keeping the directory structure
rsync -a --include='*/' --include='*.txt' --exclude='*' --prune-empty-dirs src/ dst/
For large directory trees:
- The find/cpio method is usually fast for local copies
- rsync provides better progress reporting (for example with --progress)
- Python offers the most flexibility but has higher overhead
When working with directory structures in Unix-like systems, you might encounter situations where you need to replicate a directory tree while automatically excluding empty directories. This is particularly common when:
- Migrating project assets
- Creating backups of active data
- Preparing deployment packages
- Analyzing file structures
Given the example structure (uppercase names are directories; e is a file inside C, and every other directory is empty):
A
|-- B
|-- C
|-- D
|-- e
|-- F
|-- G
We want to preserve only directories containing files, resulting in:
C
|-- e
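To make the expected result concrete, the sketch below rebuilds the example in a temporary directory and lists the directories that hold a file somewhere beneath them. The nesting used for the empty directories (D inside C, G inside F) is an assumption, since only the placement of e inside C follows from the stated result; dirs_with_files is a hypothetical helper.

```python
import os
import tempfile

def dirs_with_files(src):
    """Return relative paths of directories under src that contain a file at any depth."""
    keep = set()
    for root, _dirs, files in os.walk(src):
        if not files:
            continue
        rel = os.path.relpath(root, src)
        # Mark this directory and each of its ancestors below src.
        while rel not in ("", "."):
            keep.add(rel)
            rel = os.path.dirname(rel)
    return sorted(keep)

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "A")
    # Assumed layout: only C/e is a file; every other name is an empty directory.
    for d in ("B", os.path.join("C", "D"), os.path.join("F", "G")):
        os.makedirs(os.path.join(a, d))
    with open(os.path.join(a, "C", "e"), "w") as fh:
        fh.write("data\n")
    print(dirs_with_files(a))  # ['C']
```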
With rsync, --prune-empty-dirs is enough on its own:
rsync -a --prune-empty-dirs /source/path/A/ /destination/path/
Explanation of flags:
-a
: Archive mode (recursive; preserves permissions, timestamps, symlinks and, when run as root, ownership)
--prune-empty-dirs
: Drops directories that contain no files, including nested chains of empty directories (short form: -m)
--include='*/' and --exclude='*'
: Only needed when filtering by file name. Used without an --include for files, this pair excludes every file, so every directory ends up empty and is pruned, and nothing is copied. The pattern '*/' matches directories at any depth, so patterns such as '*/*/' add nothing.
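When the rsync call is driven from a script, the argument list can be assembled as below. This is a sketch under the assumption that rsync applies the first filter rule that matches, which is why the directory rule and file patterns come before the catch-all exclude; rsync_prune_command is a hypothetical helper and only builds the command without running it.

```python
import shlex

def rsync_prune_command(src, dst, file_patterns=()):
    """Build an rsync argument list that copies src's contents without empty directories."""
    cmd = ["rsync", "-a", "--prune-empty-dirs"]
    if file_patterns:
        # Rule order matters: the first matching rule decides.
        cmd.append("--include=*/")  # let rsync descend into every directory
        cmd += [f"--include={p}" for p in file_patterns]
        cmd.append("--exclude=*")   # drop everything not included above
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dst]
    return cmd

# Show the command as a shell line; pass the list to subprocess.run to execute it.
print(shlex.join(rsync_prune_command("/source/path/A", "/destination/path/")))
```

Passing file_patterns=("*.txt",) restricts the copy to matching files while still pruning directories that end up empty.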
For systems without rsync, this combination works well:
cd /source/path
find A -type f -print | cpio -pdum /destination/path
Breakdown:
find -type f
: Locates only files
cpio -p
: Pass-through mode
-d
: Creates directories as needed
-u
: Unconditional copy
-m
: Preserves modification times
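Deeply nested structures need no extra patterns: '*/' contains no interior slash, so rsync matches it against directory names at every depth. To copy only certain files (here *.txt, as an illustration), place the file pattern between the directory rule and the final exclude:
rsync -a \
--include='*/' \
--include='*.txt' \
--exclude='*' \
--prune-empty-dirs \
/source/path/A/ /destination/path/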
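After copying, remove any empty directories that slipped through with:
find /destination/path -type d -empty -delete
This deletes every empty directory under the destination, including ones that existed before the copy; drop -delete to list them first. Note that -empty and -delete are GNU/BSD extensions rather than POSIX.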
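Where find's -empty and -delete are unavailable, the cleanup step can be sketched in Python. This is an illustrative equivalent, not a drop-in replacement: unlike the find command it always keeps the top directory itself, and remove_empty_dirs is a hypothetical name.

```python
import os

def remove_empty_dirs(top):
    """Delete empty directories below top, deepest first; return the removed paths."""
    removed = []
    # Bottom-up order lets a parent become empty once its empty children are gone.
    for root, _dirs, _files in os.walk(top, topdown=False):
        if root == top:
            continue  # keep the destination root itself
        # Re-check on disk, because children may have been removed already.
        if not os.listdir(root):
            os.rmdir(root)
            removed.append(root)
    return removed
```

The returned list doubles as a report of what was pruned, which helps when checking the copy afterwards.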
For large directory trees:
- Rsync is generally faster for remote transfers
- find/cpio may perform better locally
- Consider --numeric-ids with rsync for system-to-system transfers