How to Strip Leading Directory Paths When Extracting Tar Archives in Linux/Unix


When working with tar archives, you've probably encountered this frustrating scenario:

# Creating archive
tar cf /backups/project.tar /home/user/projects/webapp

# Extracting archive
tar xf /backups/project.tar -C /tmp

# Results in:
/tmp/home/user/projects/webapp

What we actually want is just the final directory (/tmp/webapp) without the full path hierarchy.

The most straightforward solution is using tar's --strip-components option:

tar xf /backups/project.tar --strip-components=3 -C /tmp

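The number indicates how many leading directory components to strip from each member name:

  • tar drops the leading / when it creates the archive, so the stored path is home/user/projects/webapp, which has 4 components
  • Stripping 3 (home, user, projects) leaves just webapp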
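The counting step can also be scripted. The following is a minimal sketch, assuming GNU tar (or bsdtar) and a throwaway directory from mktemp; the variable names workdir, first, and depth are illustrative only. It reads the first member name from the listing, counts its components, and strips all but the last one.

```shell
# Build a small throwaway archive so the example is self-contained
workdir=$(mktemp -d)
mkdir -p "$workdir/src/home/user/projects/webapp"
echo "hello" > "$workdir/src/home/user/projects/webapp/index.html"
tar cf "$workdir/project.tar" -C "$workdir/src" home/user/projects/webapp

# Take the first member name and count its slash-separated components
first=$(tar tf "$workdir/project.tar" | head -n 1)
depth=$(printf '%s\n' "${first%/}" | awk -F/ '{print NF}')
echo "first member: $first ($depth components)"

# Strip everything except the final directory
mkdir "$workdir/out"
tar xf "$workdir/project.tar" --strip-components=$((depth - 1)) -C "$workdir/out"
ls "$workdir/out"
```

This only works when the first member is the directory you want to keep, so it is worth checking the full listing before relying on it in automation.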

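You can avoid the problem when creating the archive by storing relative paths. Either cd to the parent directory first, or let tar do it with -C (change directory):

# Option 1: cd to the parent directory, then archive a relative path
cd /home/user/projects
tar cf webapp.tar webapp

# Option 2: have tar change directory itself before adding members
tar cf webapp.tar -C /home/user/projects webapp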

Now extracting will only create the webapp directory.
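To confirm what was stored, list the archive after creating it. This is a small sketch under the assumption of a scratch directory that mimics the layout above; workdir and listing are hypothetical names.

```shell
# Recreate the layout in a scratch directory
workdir=$(mktemp -d)
mkdir -p "$workdir/home/user/projects/webapp"
touch "$workdir/home/user/projects/webapp/app.js"

# -C switches directory before members are added, so only "webapp/..." is stored
tar cf "$workdir/webapp.tar" -C "$workdir/home/user/projects" webapp

# Every member name should begin with webapp
listing=$(tar tf "$workdir/webapp.tar")
echo "$listing"
```

Because the member names already start at webapp, no --strip-components is needed at extraction time.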

For more complex path manipulations, use --transform:

tar xf project.tar --transform='s|^home/user/projects/||' -C /tmp

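This applies a sed-style replace expression to each member name during extraction. Members whose names do not match the pattern keep their original paths. Note that --transform is a GNU tar option; bsdtar provides -s for the same purpose.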
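A transform expression is easy to get subtly wrong, so it helps to preview the rewritten names before extracting. The sketch below assumes GNU tar, whose --show-transformed-names option prints names after the transformation; the scratch paths are illustrative.

```shell
# Throwaway archive with the same layout as the example above
workdir=$(mktemp -d)
mkdir -p "$workdir/src/home/user/projects/webapp"
echo "ok" > "$workdir/src/home/user/projects/webapp/README"
tar cf "$workdir/project.tar" -C "$workdir/src" home/user/projects/webapp

# Preview: list member names as they would appear after the rewrite
tar tf "$workdir/project.tar" \
    --transform='s|^home/user/projects/||' --show-transformed-names

# Extract with the same expression
mkdir "$workdir/out"
tar xf "$workdir/project.tar" \
    --transform='s|^home/user/projects/||' -C "$workdir/out"
ls "$workdir/out"
```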

Example 1: Deploying Web Applications

# Archive created from development environment
tar cf app.tar /opt/dev/apps/production/v1.2.3/

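# Clean deployment to production: the stored path opt/dev/apps/production/v1.2.3
# has 5 components, so stripping 4 leaves v1.2.3/ under /var/www/html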
tar xf app.tar --strip-components=4 -C /var/www/html

Example 2: Backup Restoration

# Backup of user home directory
tar cf home_backup.tar /home/username/

# Restore just the Documents folder elsewhere
tar xf home_backup.tar --strip-components=2 \
    --wildcards '*/Documents/*' \
    -C /mnt/backup_drive

Keep these points in mind:

  • Count path components carefully: members with fewer components than the number you strip are silently skipped
  • For automation, consider using tar tf archive.tar | head -n 1 to inspect the structure
  • The --transform option is more flexible but requires regex knowledge
  • Always test extraction with -v (verbose) first to verify paths

If you need more advanced archive manipulation, these tools are worth a look:

  • bsdtar (libarchive) - Supports --strip-components and sed-style name substitution with -s
  • pax - POSIX standard archive tool; its -s option rewrites member names
  • unzip - For ZIP files, use -j to junk paths

The same problem shows up when backing up a web root:

# Creating archive
tar cf /var/www_bak/site.tar /var/www/site

# Extracting archive
tar xf /var/www_bak/site.tar -C /tmp

This results in the full directory structure being preserved:

/tmp/var/www/site

But often, you just want the final directory without recreating the entire path structure.

The --strip-components option is exactly what you need:

tar xf /var/www_bak/site.tar -C /tmp --strip-components=2

This extracts the site directory directly to:

/tmp/site

The number after --strip-components specifies how many leading directory components to remove:

Stored path: var/www/site (the leading / is dropped when the archive is created)
--strip-components=1 → www/site
--strip-components=2 → site
--strip-components=3 → contents of site directly
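The effect of each value can be checked on a scratch copy. This is a minimal sketch, assuming GNU tar or bsdtar; the index.html file and the out1, out2, out3 directories are illustrative.

```shell
# Archive a single file stored as var/www/site/index.html
workdir=$(mktemp -d)
mkdir -p "$workdir/src/var/www/site"
echo "<h1>hi</h1>" > "$workdir/src/var/www/site/index.html"
tar cf "$workdir/site.tar" -C "$workdir/src" var/www/site

# Extract with 1, 2, and 3 stripped components and show where the file lands
for n in 1 2 3; do
    mkdir "$workdir/out$n"
    tar xf "$workdir/site.tar" --strip-components="$n" -C "$workdir/out$n"
    echo "--strip-components=$n:"
    (cd "$workdir/out$n" && find . -type f)
done
```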

Here are some common use cases:

# Example 1: Extract Apache config without full path
tar xf configs.tar --strip-components=2 -C /etc/httpd

# Example 2: Deploy application files
tar xzf app_release.tar.gz --strip-components=1 -C /opt/myapp

# Example 3: Multiple level stripping
tar xf deep_archive.tar --strip-components=4 -C ~/projects

Sometimes it's better to create archives with the desired structure from the beginning:

cd /var/www
tar cf /backup/site.tar site/

Then extraction becomes simpler:

tar xf /backup/site.tar -C /tmp

--strip-components works well with other tar options:

# Verbose extraction
tar xvf archive.tar --strip-components=2

# With compression
tar xzpf archive.tar.gz --strip-components=1 -C /target

# Listing contents first (directory entries, which end in /, are filtered out)
tar tf archive.tar | grep -v '/$'  # Check file paths before extraction

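Be careful with:

  • Setting too high a number: members with fewer components than the number are skipped without an error
  • Relative vs absolute paths in the archive: count components from the tar tf listing, not from the path you typed when creating the archive
  • Symbolic links in the archive: link targets may no longer resolve once the leading directories are removed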
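The first two pitfalls can be reproduced in a scratch directory. This sketch assumes GNU tar or bsdtar and that mktemp returns an absolute path; the value 99 is simply an arbitrary number larger than any member's depth.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/site"
echo "data" > "$workdir/site/file.txt"

# An absolute path is given here, but the stored member name has no leading "/"
# (tar prints a notice about this on stderr, which is silenced below)
tar cf "$workdir/abs.tar" "$workdir/site" 2>/dev/null
first=$(tar tf "$workdir/abs.tar" | head -n 1)
echo "first member: $first"

# Stripping more components than any member has extracts nothing, with no error
mkdir "$workdir/out"
tar xf "$workdir/abs.tar" --strip-components=99 -C "$workdir/out"
count=$(find "$workdir/out" -type f | wc -l)
echo "files extracted with --strip-components=99: $count"
```

An empty target directory after a seemingly successful extraction is usually a sign that the strip count is too high.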