How to Gzip/Tar Multiple Subdirectories into Individual Compressed Archives in Linux


When working with directory structures in Linux, you'll often encounter scenarios where you need to compress multiple subdirectories separately. The standard tar -czf approach creates a single archive containing all subdirectories, which isn't always what you want.

Here's the most efficient method I've found after years of sysadmin work:

find /path/to/directory -maxdepth 1 -mindepth 1 -type d -exec tar -czf {}.tar.gz {} \;

Let's break down why this works:

  • -maxdepth 1: Only processes immediate subdirectories
  • -mindepth 1: Excludes the parent directory itself
  • -type d: Only matches directories
  • The -exec action runs tar once for each directory found, with {} expanding to its path (embedded forms like {}.tar.gz are a GNU find extension; POSIX only guarantees a bare {})
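
To make this concrete, here's a hypothetical run; the subdirectory names projectA and projectB are invented for illustration:

$ ls /path/to/directory
projectA  projectB
$ find /path/to/directory -maxdepth 1 -mindepth 1 -type d -exec tar -czf {}.tar.gz {} \;
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
$ ls /path/to/directory
projectA  projectA.tar.gz  projectB  projectB.tar.gz

Note the warning: each archive stores the full path (path/to/directory/projectA/...). The loop below uses -C so only the subdirectory name is stored.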

For those who prefer more readable shell scripts:

for dir in /path/to/directory/*/; do
  name=$(basename "$dir")
  # -C switches tar into the parent first, so each archive stores only
  # "name/..." rather than the full /path/to/directory prefix
  tar -czf "${name}.tar.gz" -C /path/to/directory "$name"
done
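
A common variation is collecting the archives somewhere other than the current directory; a minimal sketch, where /tmp/archives is just a placeholder destination:

mkdir -p /tmp/archives
for dir in /path/to/directory/*/; do
  name=$(basename "$dir")
  tar -czf "/tmp/archives/${name}.tar.gz" -C /path/to/directory "$name"
done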

Some edge cases to consider:

# For directories with spaces (or other unusual characters) in their names:
find . -maxdepth 1 -mindepth 1 -type d -print0 | while IFS= read -r -d '' dir; do
  tar -czf "${dir#./}.tar.gz" "$dir"   # strip find's leading ./ from the archive name
done

# Excluding certain directories (reusing the tar invocation from above):
find . -maxdepth 1 -mindepth 1 -type d ! -name "exclude_this*" -exec tar -czf {}.tar.gz {} \;
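
To exclude files inside each archive rather than whole directories, tar's own --exclude handles it; the '*.log' pattern here is only illustrative:

# Skip matching files within every archive
find . -maxdepth 1 -mindepth 1 -type d -exec tar --exclude='*.log' -czf {}.tar.gz {} \;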

When dealing with thousands of subdirectories, consider:

  • Using GNU parallel to process multiple directories simultaneously (example below)
  • Adding --use-compress-program=pigz for faster compression (requires pigz installed; sketch after the parallel example)
  • Using tar's --exclude option (or -X with an exclude file) to skip unnecessary files; note that -T/--files-from does the opposite, naming files to include

# Parallel processing example (-print0/-0 keeps odd filenames safe; parallel defaults to one job per core):
find . -maxdepth 1 -mindepth 1 -type d -print0 | parallel -0 -j 4 'tar -czf {}.tar.gz {}'
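
The pigz variant from the list above, as a sketch (assumes pigz is on your PATH; with --use-compress-program you drop the z flag because the compressor is named explicitly):

# pigz compresses with multiple threads inside each tar invocation
find . -maxdepth 1 -mindepth 1 -type d -exec tar --use-compress-program=pigz -cf {}.tar.gz {} \;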

Always verify your archives:

for archive in *.tar.gz; do
  if ! tar -tzf "$archive" >/dev/null; then
    echo "Corrupt archive: $archive" >&2
  fi
done
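
For a stronger check than listing, GNU tar's --diff (-d) compares each archive against the on-disk tree it was made from; run this from the directory where the archives were created:

for archive in *.tar.gz; do
  # -d exits non-zero if archive contents and the filesystem differ
  tar -dzf "$archive" || echo "Mismatch or error in: $archive" >&2
done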

Per-subdirectory archives like these, which keep the original directory hierarchy intact, are particularly useful for:

  • Creating incremental backups
  • Distributing modular components
  • Preparing datasets for transfer
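
For the backup use case in particular, date-stamping each archive keeps successive runs from overwriting one another; a minimal sketch (the %Y%m%d format is one choice among many):

for dir in /path/to/directory/*/; do
  name=$(basename "$dir")
  tar -czf "${name}-$(date +%Y%m%d).tar.gz" -C /path/to/directory "$name"
done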


For nested trees where you want an archive for every directory at every level, not just the top, route each path through sh -c so quoting stays safe:

find path/to/parent -mindepth 1 -type d -exec sh -c 'tar -czvf "${1%/}.tar.gz" "$1"' _ {} \;

This approach properly handles directories containing spaces and special characters; -mindepth 1 stops find from archiving the parent directory itself. Bear in mind that an archive made at an outer level also contains everything beneath it.
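
When the tree holds many directories, -exec ... + batches the paths and spawns far fewer shells; a sketch of the same idea:

find path/to/parent -mindepth 1 -type d -exec sh -c '
  for d; do
    tar -czvf "${d%/}.tar.gz" "$d"
  done
' _ {} +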


For frequent use, consider adding this to your .bashrc:

tar-subdirs() {
  if [ -z "$1" ] || [ ! -d "$1" ]; then
    echo "Usage: tar-subdirs /path/to/directory" >&2
    return 1
  fi
  # NUL-delimited pipeline keeps names with spaces or newlines intact
  find "$1" -maxdepth 1 -mindepth 1 -type d -print0 | while IFS= read -r -d '' dir; do
    tar -czvf "${dir}.tar.gz" "$dir"
  done
}

Usage becomes simply:

tar-subdirs /path/to/directory

To spot-check what actually went into each archive (as opposed to the integrity check shown earlier), list the first few entries:

for archive in *.tar.gz; do
  echo "Checking $archive..."
  tar -tzvf "$archive" | head -5
done