When creating tarballs using the standard tar -cvjf
command, the archive preserves the full directory structure of the source files. For example:
tar -cvjf backup.tar.bz2 /home/user/projects/myapp
GNU tar strips the leading slash, so on extraction this recreates home/user/projects/myapp
inside the destination directory. This is often undesirable when you just want myapp
without the parent directory hierarchy.
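To see the problem concretely, the following minimal sketch archives a throwaway tree by absolute path and lists what was stored. It assumes GNU tar and `mktemp`; the directory and file names are hypothetical, and gzip (-z) stands in for bzip2 only to keep the sketch light on dependencies.

```shell
# Build a small throwaway tree (hypothetical names) in a temp directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/myapp"
echo "hello" > "$workdir/projects/myapp/README"

# Archive by absolute path. tar prints a notice that it is removing the
# leading "/" from member names, but every parent directory is kept.
tar -czf "$workdir/backup.tar.gz" "$workdir/projects/myapp"

# List member names: each one starts with the full source path minus "/".
listing=$(tar -tzf "$workdir/backup.tar.gz")
printf '%s\n' "$listing"
```

Every listed name carries the whole temp-directory path in front of myapp/, which is exactly the hierarchy that gets recreated on extraction.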
The simplest solution is tar's -C
(change directory) option combined with a relative path:
tar -cvjf clean_backup.tar.bz2 -C /home/user/projects myapp
This command:
- Changes to /home/user/projects before processing
- Adds the myapp directory and its contents under the relative path myapp/
- Creates an archive that extracts directly to ./myapp
For more complex path manipulation, GNU tar offers the --transform
option:
tar -cvjf transformed.tar.bz2 \
--transform 's,^home/user/projects/myapp/,,' \
/home/user/projects/myapp
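This uses a sed-like expression to strip the prefix from paths in the archive. The pattern has no leading slash because GNU tar removes it from member names before the expression is applied.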
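When experimenting with --transform it helps to see the names as they will be stored. The sketch below assumes GNU tar, uses a throwaway temp tree with hypothetical names, and uses gzip instead of bzip2 for convenience. Unlike the command above, it strips only up to the parent directory, so the archive keeps a top-level myapp/ entry.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/myapp/src"
echo "int main(void) { return 0; }" > "$workdir/projects/myapp/src/main.c"

# Prefix to strip, written without the leading "/" that tar removes first.
prefix="${workdir#/}/projects/"

# --show-transformed-names makes -v print names as stored in the archive
# rather than as found on disk.
stored=$(tar -czvf "$workdir/transformed.tar.gz" \
    --transform "s,^${prefix},," \
    --show-transformed-names \
    "$workdir/projects/myapp" 2>/dev/null)
printf '%s\n' "$stored"

# Confirm by listing the finished archive.
listing=$(tar -tzf "$workdir/transformed.tar.gz")
```

The comma delimiter in the expression avoids having to escape the slashes in the path; any character that does not occur in the prefix would work.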
Here's a complete example showing both creation and verification:
# Create archive without parent paths
tar -cvjf app_release.tar.bz2 -C /builds/v1.2.3 app
# Verify contents
tar -tvf app_release.tar.bz2 | head -5
# Should show paths starting with 'app/', not 'builds/v1.2.3/app/'
# Extract test
mkdir test_extract && cd test_extract
tar -xvjf ../app_release.tar.bz2
ls
# Should show 'app' directory directly in test_extract
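The manual verification above can also be scripted, which is handy in a build pipeline. This is a rough sketch, not a standard tool: `has_single_root` is a hypothetical helper name, the temp tree is made up for illustration, and the listing relies on GNU tar detecting the compression format on read.

```shell
# has_single_root ARCHIVE ROOT
# Succeeds only if every member name starts with "ROOT/".
has_single_root() {
    tar -tf "$1" |
        awk -v root="$2" 'index($0, root "/") != 1 { bad = 1 } END { exit bad }'
}

workdir=$(mktemp -d)
mkdir -p "$workdir/builds/app"
echo "1.2.3" > "$workdir/builds/app/VERSION"

# Archive created with -C, so members should all live under app/.
tar -czf "$workdir/app_release.tar.gz" -C "$workdir/builds" app

if has_single_root "$workdir/app_release.tar.gz" app; then
    echo "ok: archive extracts into a single app/ directory"
else
    echo "unexpected parent paths in archive" >&2
fi
```

The check uses a literal prefix comparison rather than a regular expression, so directory names containing dots or other special characters are handled as-is.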
This technique is particularly valuable when:
- Creating deployment packages that need predictable extraction paths
- Building Docker images where layer paths matter
- Sharing code where recipients expect flat directory structures
- Automating builds where absolute paths would break portability
When creating tarballs of specific directories, you might have noticed that the archive preserves the full directory path structure by default. This means when you run:
tar -cvjf archive.tar.bz2 /path/to/folder/source
The resulting tarball stores the entire path/to/folder/ hierarchy, which is recreated on extraction. This is often undesirable because it:
1. Forces extracted files into specific directory locations
2. Makes the archive less portable between systems
3. Requires additional steps to move files after extraction
The most reliable method uses tar's -C (capital C) option to change the working directory before processing files:
tar -cvjf destination.tar.bz2 -C /path/to/folder source
This command:
1. First changes to /path/to/folder (-C option)
2. Then archives just the "source" directory
3. Preserves relative paths within source
For more complex path modifications, GNU tar offers the --transform option:
tar -cvjf destination.tar.bz2 --transform 's,^path/to/folder/,,' /path/to/folder/source
The transform flag uses sed-style replacement patterns to rewrite paths. This is particularly useful when you need to:
- Modify multiple path components
- Replace paths rather than just remove them
- Handle complex archive restructuring
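As an illustration of replacing a path rather than removing it, the sketch below renames the top-level directory inside the archive. It assumes GNU tar; the temp tree and the name release-1.0 are hypothetical, and gzip is used instead of bzip2 for convenience.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/folder/source/docs"
echo "notes" > "$workdir/folder/source/docs/notes.txt"

# -C removes the parent path; --transform then renames the top-level
# directory from "source" to "release-1.0" inside the archive only.
tar -czf "$workdir/renamed.tar.gz" \
    -C "$workdir/folder" \
    --transform 's,^source,release-1.0,' \
    source

# The files on disk are untouched; only the stored names change.
listing=$(tar -tzf "$workdir/renamed.tar.gz")
printf '%s\n' "$listing"
```

Combining -C with a short anchored pattern keeps the expression independent of where the source tree happens to live on disk.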
Here are some real-world scenarios with solutions:
Example 1: Archive a web application's public directory
# Bad (preserves full path):
tar -czvf app.tar.gz /var/www/html/public
# Good (clean paths):
tar -czvf app.tar.gz -C /var/www/html public
Example 2: Create a backup of user home directories
# Without parent paths:
tar -cjf users_backup.tar.bz2 -C /home user1 user2 user3
# Strip the leading home/ prefix from every entry under /home:
tar -cjf flat_backup.tar.bz2 --transform 's,^home/,,' /home/*
- Always test your archives with -t (list) before extraction:
tar -tvf archive.tar.gz
- Combine these techniques with exclude patterns for more control
- For maximum portability, use relative paths rather than absolute
- Remember that -C applies to every path argument that follows it, until the next -C
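The last two tips can be sketched together: one archive built from two separate base directories, with an exclude pattern applied. This assumes GNU tar; all directory and file names are hypothetical, and gzip is used for convenience.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/site/public/cache" "$workdir/etc/conf"
echo "<h1>hi</h1>" > "$workdir/site/public/index.html"
echo "tmp" > "$workdir/site/public/cache/page.tmp"
echo "port=8080" > "$workdir/etc/conf/app.ini"

# Each -C applies to the path arguments after it, until the next -C.
# Absolute -C targets avoid surprises: a relative second -C would be
# resolved from the first -C directory, not from the shell's directory.
tar -czf "$workdir/bundle.tar.gz" \
    --exclude='public/cache' \
    -C "$workdir/site" public \
    -C "$workdir/etc" conf

listing=$(tar -tzf "$workdir/bundle.tar.gz")
printf '%s\n' "$listing"
```

The archive ends up with public/ and conf/ side by side at the top level, and the cache directory is left out.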
For ultimate control over paths, you can combine find with tar:
# -mindepth 1 skips the starting directory itself, for which %P prints an empty name
find /path/to/folder/source -mindepth 1 -printf "%P\n" | \
tar -cvjf destination.tar.bz2 --no-recursion -C /path/to/folder/source -T -
This approach:
1. Uses find to list relative paths (%P)
2. Pipes them to tar's -T (files-from) option
3. Uses -C so tar resolves those relative names from the source directory
4. Passes --no-recursion so directories listed by find are not traversed a second time
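A variation on the same idea is to let find filter the file list and to pass names NUL-separated, so spaces or newlines in file names survive the pipe. This is a sketch under assumptions: GNU tar and a find that supports -print0, a made-up temp tree, gzip instead of bzip2, and a subshell `cd` in place of -C so the names come out relative.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/source/sub dir"
echo "keep" > "$workdir/source/sub dir/a.txt"
echo "skip" > "$workdir/source/sub dir/b.log"

# Run find from inside the source directory so it emits relative names.
# --null must come before -T so tar reads NUL-terminated entries.
(
    cd "$workdir/source" &&
    find . -type f -name '*.txt' -print0 |
        tar -czf "$workdir/filtered.tar.gz" --null --no-recursion -T -
)

listing=$(tar -tzf "$workdir/filtered.tar.gz")
printf '%s\n' "$listing"
```

Only the .txt file is archived, stored under a ./ prefix relative to the source directory.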