Block-Level Deduplication Filesystems for Linux: ASIS Alternatives and FUSE Implementations


True block-level deduplication differs fundamentally from file-level approaches like hardlinks. While hardlinks create multiple directory entries pointing to the same inode, block-level deduplication operates at the storage layer by identifying identical blocks across the entire storage pool.
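
To make the distinction concrete, here is a minimal shell sketch (file names are placeholders): a hardlink only saves space for whole identical files because both names resolve to one inode, while block-level deduplication shares individual extents between files that remain independent.


# File-level sharing: two names, one inode, one copy on disk
cp /etc/services file_a
ln file_a file_b
stat -c '%n inode=%i links=%h' file_a file_b

# Block-level sharing: independent files whose identical blocks are
# shared out-of-band (requires a filesystem such as Btrfs or XFS)
cp file_a file_c
duperemove -d file_a file_c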

Btrfs supports out-of-band (batch) block deduplication: the kernel exposes an extent-sharing ioctl, and userspace tools scan for identical blocks and ask the kernel to share them. It is not inline like NetApp's ASIS, but the on-disk result is similar:


# Deduplicate identical blocks across a Btrfs mount (out-of-band)
duperemove -rd /path/to/mount

ZFS, available on Linux through OpenZFS (formerly the ZFS on Linux project), implements block-level deduplication with its dedup property:


# Enable deduplication on a ZFS dataset
zfs set dedup=on pool/dataset
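
A quick follow-up check (dataset name as above): confirm the property is active, keeping in mind that it only applies to data written after it is set.


# Verify the property; existing data is not deduplicated retroactively
zfs get dedup pool/dataset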

LessFS implements block-level deduplication through FUSE with a backend database:


# Illustrative /etc/lessfs.cfg excerpt (exact key names vary between LessFS versions)
DEDUP_ENABLED=yes
DEDUP_BLOCK_SIZE=4096
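
Once configured, a LessFS volume is mounted by pointing the lessfs binary at its configuration file and a mount point; treat the invocation below as an assumption and check the lessfs documentation for your version.


# Assumed invocation: mount the volume described by lessfs.cfg
# (verify against your LessFS version's documentation)
lessfs /etc/lessfs.cfg /mnt/lessfs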

All these solutions require significant RAM for their deduplication tables; for ZFS, a common rule of thumb is roughly 5 GB of RAM per TB of deduplicated data. Reported write overheads are roughly:

  • Btrfs: 15-20% overhead during writes
  • ZFS: 5-15% overhead with ARC hits
  • LessFS: 25-40% overhead due to FUSE layer

ZFS deduplication requires careful planning:


# Check deduplication ratio
zpool list -o dedupratio
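
Part of that planning is sizing the deduplication table (DDT) before committing: zdb -S simulates dedup against data already in a pool and prints the expected table size and ratio (the pool name tank is illustrative).


# Simulate deduplication on existing data: prints a histogram of the
# would-be DDT plus the expected dedup ratio
zdb -S tank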

For Btrfs, dedupe passes can cause significant I/O load (and note that defragmentation splits shared extents, which can undo earlier dedupe savings):


# Monitor during operation
iotop -oPa
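
To keep a long dedupe pass from starving interactive workloads, it can be run in the idle I/O scheduling class (the path is a placeholder):


# Run the dedupe pass in the idle I/O class so foreground I/O takes priority
ionice -c 3 duperemove -rd /path/to/mount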

Another out-of-band option for Btrfs is dduper, which reuses the filesystem's existing checksum tree instead of re-hashing every block:


# Deduplicate a mounted Btrfs filesystem using its csum tree
# (see dduper --help for the full option list)
dduper --device /dev/sdx --dir /path/to/mount --fast-mode

NetApp's ASIS remains the reference point for block-level deduplication, but the options above are open source and run on standard Linux distributions. Here is a closer look at the most practical of them:

Originally from OpenSolaris, ZFS offers robust block-level deduplication:


# Create a ZFS pool with dedup enabled on its root dataset
# (dedup is a dataset property, so it is set with -O rather than -o)
zpool create -O dedup=on tank mirror /dev/sda /dev/sdb

# Verify the deduplication ratio (DEDUP column)
zpool list tank

Note that ZFS needs significant RAM for the dedup table (roughly 5 GB per TB of deduplicated data, as above).
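
Once deduplicated data has been written, the actual size of the dedup table, on disk and in core, can be inspected directly (pool name as above):


# Show DDT statistics: number of entries and on-disk/in-core size
zpool status -D tank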

Btrfs supports deduplication through external tools like duperemove:


# Install duperemove
sudo apt install duperemove

# Run block-level deduplication
duperemove -rdh /path/to/files

While not native to Btrfs, this approach provides good results for many use cases.
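
For large trees it is worth persisting the computed block hashes so that repeat runs only rescan changed files; duperemove supports this through a hash database (the database path below is arbitrary):


# Keep a hash database so subsequent runs are incremental
duperemove -rdh --hashfile=/var/cache/duperemove.db /path/to/files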

For specialized needs, consider these options:

  • VDO (Virtual Data Optimizer): now included in RHEL/CentOS (see the sketch after this list)
  • LessFS: A FUSE-based solution with deduplication
  • OpenDedup: A userspace deduplication filesystem
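
As a rough sketch of the VDO route (device name and sizes are placeholders): VDO sits below the filesystem as a device-mapper target and deduplicates inline, so any ordinary filesystem can be created on top of it.


# Create a VDO volume with inline deduplication backed by /dev/sdx
vdo create --name=vdo0 --device=/dev/sdx --vdoLogicalSize=10T

# Put an ordinary filesystem on top (-K skips the initial discard) and mount it
mkfs.xfs -K /dev/mapper/vdo0
mount /dev/mapper/vdo0 /mnt/vdo0

# Check physical vs. logical usage and space savings
vdostats --human-readable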

When implementing any deduplication solution, monitor these metrics:


# For ZFS:
zpool get dedupratio tank

# For Btrfs (allocation overview only; compare before and after a dedupe run):
btrfs filesystem df /path
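
btrfs filesystem df only reports allocation by block-group type; to see how much data is actually shared after a dedupe pass, a tool such as compsize compares referenced bytes against on-disk extents (the path is a placeholder):


# Referenced vs. on-disk bytes; a large gap indicates shared (deduped) extents
compsize /path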

Remember that deduplication always trades CPU/RAM for storage savings.

Block-level deduplication isn't always the right choice. Avoid it when:

  • Working with already compressed data
  • Storage is abundant but RAM is limited
  • Workloads involve many small, unique files