How to Create a Tarball Without Preserving Parent Directory Paths in Linux



When creating tarballs using the standard tar -cvjf command, the archive preserves the full directory structure of the source files. For example:

tar -cvjf backup.tar.bz2 /home/user/projects/myapp

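When extracted, this recreates home/user/projects/myapp under the destination directory. GNU tar strips the leading / when it creates the archive, so the path becomes relative, but every parent directory is kept. This is often undesirable when you only want myapp and its contents without the parent directory hierarchy.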

The simplest solution is to use tar's -C (change directory) option combined with relative paths:

tar -cvjf clean_backup.tar.bz2 -C /home/user/projects myapp

This command:

  1. Changes to /home/user/projects before adding any files
  2. Adds the myapp directory and its contents, stored under relative names (myapp/, myapp/...)
  3. Creates an archive that extracts directly to ./myapp
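The effect of -C is easy to see on a throwaway tree. The sketch below is only an illustration. It assumes GNU tar with bzip2 installed, and the scratch layout (projects/myapp/src/main.c under a mktemp directory) is hypothetical. It archives the same directory twice, once by absolute path and once with -C, and then lists the member names stored in each archive.

```shell
#!/bin/sh
# Sketch: compare archive member names with and without -C.
# Assumes GNU tar with bzip2; everything lives in a scratch directory.
set -eu

base=$(mktemp -d)                        # stands in for /home/user
mkdir -p "$base/projects/myapp/src"
echo 'int main(void) { return 0; }' > "$base/projects/myapp/src/main.c"

# Absolute source path: tar drops the leading '/' but keeps every parent directory.
# 2>/dev/null hides GNU tar's "Removing leading '/'" notice.
tar -cjf "$base/full.tar.bz2" "$base/projects/myapp" 2>/dev/null
echo "Without -C:"
tar -tjf "$base/full.tar.bz2"

# -C changes into the parent first, so member names start at myapp/.
tar -cjf "$base/clean.tar.bz2" -C "$base/projects" myapp
echo "With -C:"
tar -tjf "$base/clean.tar.bz2"
```

The first listing shows the scratch path (minus its leading slash) in front of every entry. The second shows only myapp/, myapp/src/ and myapp/src/main.c.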

For more complex path manipulation, GNU tar offers the --transform option:

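tar -cvjf transformed.tar.bz2 \
    --transform 's,^home/user/projects/,,' \
    /home/user/projects/myapp

This uses a sed-like expression to strip the parent prefix from member names, so entries are stored as myapp/..., just as with -C. The expression is matched after tar removes the leading /, which is why it starts with home/ rather than /home/.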
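Here is a minimal sketch of the same idea on a scratch directory. It assumes GNU tar, because --transform is a GNU extension. The scratch location comes from mktemp, so the prefix is computed at run time instead of being hard-coded. The file names are hypothetical.

```shell
#!/bin/sh
# Sketch: strip a parent prefix with --transform (GNU tar only).
set -eu

base=$(mktemp -d)
mkdir -p "$base/projects/myapp/src"
echo 'hello' > "$base/projects/myapp/src/readme.txt"

# The expression is matched after tar removes the leading '/', so the
# prefix must not start with '/'; ${base#/} drops it from the scratch path.
prefix="${base#/}/projects/"

tar -cjf "$base/transformed.tar.bz2" \
    --transform "s,^${prefix},," \
    "$base/projects/myapp" 2>/dev/null

tar -tjf "$base/transformed.tar.bz2"     # names now begin with myapp/
```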

Here's a complete example showing both creation and verification:

# Create archive without parent paths
tar -cvjf app_release.tar.bz2 -C /builds/v1.2.3 app

# Verify contents
tar -tvf app_release.tar.bz2 | head -5
# Should show paths starting with 'app/', not 'builds/v1.2.3/app/'

# Extract test
mkdir test_extract && cd test_extract
tar -xvjf ../app_release.tar.bz2
ls
# Should show 'app' directory directly in test_extract
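The visual checks above can be scripted, which helps in automated builds. The helper below is named check_prefix, a hypothetical name chosen for illustration. It assumes GNU tar, which detects bzip2 compression when listing. The helper reports any member that does not start with the expected top-level directory. The demo archives are built in a scratch directory rather than in /builds/v1.2.3.

```shell
#!/bin/sh
# Sketch: verify that every archive member sits under one top-level directory.
# Assumes GNU tar (auto-detects compression when listing with -t).
set -eu

# check_prefix ARCHIVE TOP  (hypothetical helper)
# Prints stray members to stderr and returns 1 if any exist.
check_prefix() {
    stray=$(tar -tf "$1" | grep -v "^$2/" || true)
    if [ -n "$stray" ]; then
        printf 'Unexpected paths in %s:\n%s\n' "$1" "$stray" >&2
        return 1
    fi
    echo "OK: every member of $1 is under $2/"
}

# Demo on a throwaway build tree (stands in for /builds/v1.2.3).
work=$(mktemp -d)
mkdir -p "$work/builds/app/bin"
echo 'echo hi' > "$work/builds/app/bin/run.sh"

tar -cjf "$work/good.tar.bz2" -C "$work/builds" app            # clean paths
tar -cjf "$work/bad.tar.bz2" "$work/builds/app" 2>/dev/null    # full paths

check_prefix "$work/good.tar.bz2" app
check_prefix "$work/bad.tar.bz2" app || echo "bad.tar.bz2 rejected as expected"
```

Because the helper returns non-zero on failure, a build script can stop before a badly rooted archive is published.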

This technique is particularly valuable when:

  • Creating deployment packages that need predictable extraction paths
  • Building Docker images where layer paths matter
  • Sharing code where recipients expect a single top-level directory
  • Automating builds where absolute paths would break portability

When creating tarballs of specific directories, you might have noticed that the archive preserves the full directory path structure by default. This means when you run:


tar -cvjf archive.tar.bz2 /path/to/folder/source

The resulting tarball stores its members as path/to/folder/source/..., so extraction recreates the whole path/to/folder/ hierarchy. This is often undesirable because:

1. It forces extracted files into specific directory locations
2. It makes the archive less portable between systems
3. It requires additional steps to move files after extraction

The most reliable method uses tar's -C (capital C) option to change the working directory before processing files:


tar -cvjf destination.tar.bz2 -C /path/to/folder source

This command:
1. First changes to /path/to/folder (-C option)
2. Then archives just the "source" directory
3. Preserves relative paths within source

For more complex path modifications, GNU tar offers the --transform option:


tar -cvjf destination.tar.bz2 --transform 's,^path/to/folder/,,' /path/to/folder/source

The --transform option uses sed-style replacement expressions to rewrite member names. This is particularly useful when:
- You need to modify multiple path components
- You want to replace paths rather than just remove them
- You need to handle complex archive restructuring
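As an example of replacing a path rather than removing it, the sketch below renames the top-level directory while archiving. It assumes GNU tar. The scratch tree and the release-1.0 name are hypothetical.

```shell
#!/bin/sh
# Sketch: rename the top-level directory inside the archive (GNU tar).
set -eu

base=$(mktemp -d)
mkdir -p "$base/folder/source/lib"
echo 'data' > "$base/folder/source/lib/data.txt"

# Every member name beginning with 'source' is rewritten to begin with
# 'release-1.0' (a hypothetical version label); files on disk are untouched.
tar -cjf "$base/release.tar.bz2" -C "$base/folder" \
    --transform 's,^source,release-1.0,' source

tar -tjf "$base/release.tar.bz2"
```

Extracting this archive produces a release-1.0/ directory, even though the source tree is still called source.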

Here are some real-world scenarios with solutions:

Example 1: Archive a web application's public directory


# Bad (preserves full path):
tar -czvf app.tar.gz /var/www/html/public

# Good (clean paths):
tar -czvf app.tar.gz -C /var/www/html public

Example 2: Create a backup of user home directories


# Without parent paths:
tar -cjf users_backup.tar.bz2 -C /home user1 user2 user3

# Same layout via --transform (covers every non-hidden entry in /home):
tar -cjf flat_backup.tar.bz2 --transform 's,^home/,,' /home/*

A few practical tips:

  • Always list an archive with -t before extracting it: tar -tvf archive.tar.gz
  • Combine these techniques with --exclude patterns for finer control over what gets archived
  • For maximum portability, use relative paths rather than absolute ones
  • Remember that -C is positional: it affects only the path arguments that follow it, so place it before the names it applies to
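Because -C is positional, a single command can also collect directories from different parents. This sketch assumes GNU tar and uses hypothetical scratch paths. Each -C here is absolute, so it does not depend on the one before it. A relative -C is resolved from the directory set by the previous -C.

```shell
#!/bin/sh
# Sketch: several -C options in one command; each applies to the names after it.
set -eu

base=$(mktemp -d)
mkdir -p "$base/srv/www/site" "$base/etc/app"
echo '<h1>hi</h1>' > "$base/srv/www/site/index.html"
echo 'port=80'     > "$base/etc/app/app.conf"

# site/ is read from $base/srv/www, then app/ is read from $base/etc.
tar -cjf "$base/bundle.tar.bz2" \
    -C "$base/srv/www" site \
    -C "$base/etc" app

tar -tjf "$base/bundle.tar.bz2"          # site/... and app/... at the top level
```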

For ultimate control over paths, you can combine find with tar:


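find /path/to/folder/source -mindepth 1 -printf '%P\n' | \
tar -cvjf destination.tar.bz2 --no-recursion -C /path/to/folder/source -T -

This approach:
1. Uses find to list each path relative to source (%P), with -mindepth 1 skipping source itself, which %P would otherwise print as an empty line
2. Pipes that list to tar's -T (files-from) option, where - means standard input
3. Uses -C so tar resolves those relative names inside source
4. Uses --no-recursion so tar adds only the listed names; without it, tar would descend into each listed directory and add its files a second time

The result holds the contents of source without a source/ prefix.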
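A file name that contains a newline would be split in two by a newline-separated list. A null-delimited version of the find | tar pipeline avoids that problem. This sketch assumes GNU find (for -printf) and GNU tar (for --null). The scratch tree is hypothetical, including the name with a space.

```shell
#!/bin/sh
# Sketch: null-delimited find | tar pipeline (GNU find + GNU tar).
set -eu

base=$(mktemp -d)
mkdir -p "$base/source/docs"
echo 'a' > "$base/source/top file.txt"          # name with a space
echo 'b' > "$base/source/docs/guide.txt"

# '%P\0' ends each relative path with a NUL byte; --null (placed before -T)
# makes tar split the list on NUL instead of newline.
find "$base/source" -mindepth 1 -printf '%P\0' |
    tar -cjf "$base/contents.tar.bz2" --null --no-recursion \
        -C "$base/source" -T -

tar -tjf "$base/contents.tar.bz2"        # docs/, docs/guide.txt, top file.txt
```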