Debugging “No Space Left on Device” Error Despite Available Storage in Linux Backup Systems


When your backup system screams "no space left" while df -h shows ample free space, you're facing one of Linux's more puzzling storage issues. Let's examine the concrete symptoms from our case:

# Disk space shows 623GB free
$ df -h /dev/sdg1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdg1       2.7T  2.0T  623G  77% /mnt/backupsys/shd

# Inodes are plentiful
$ df -i /dev/sdg1
Filesystem        Inodes   IUsed     IFree IUse% Mounted on
/dev/sdg1      183148544 2810146 180338398    2% /mnt/backupsys/shd

Several technical factors could trigger this behavior:

# Check for filesystem quotas
$ repquota -a

# Verify user/group disk limits
$ edquota -u root

# Examine reserved blocks (typically 5% for root)
$ tune2fs -l /dev/sdg1 | grep "Reserved block count"
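
If the reserved-block check shows the default 5% reservation (roughly 135 GB on a 2.7 TB volume), that space is invisible to non-root writers and part of it can be reclaimed. The commands below are a minimal sketch for a dedicated backup filesystem only, never the root filesystem; the device name is taken from the df output above:

# Show the current reservation
$ tune2fs -l /dev/sdg1 | grep -i "reserved block"

# Lower the reservation from the 5% default to 1%
$ tune2fs -m 1 /dev/sdg1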

The timing suggests ACL activation modified file metadata:

# Before ACL activation:
$ getfacl /mnt/md0/somefile | wc -l
5

# After ACL activation:
$ getfacl /mnt/md0/somefile | wc -l
15

This metadata change forces rsync to treat every file as modified, so an "incremental" run can suddenly require full temporary copies instead of cheap hard links and exhaust the free space on the backup volume.
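
One way to confirm this before the next run is a dry run with itemized output; files whose content is unchanged but whose ACLs differ show an "a" in the change flags. This is a sketch: the destination path is a placeholder for the latest image tree inside the vault.

# -n: dry run, -i: itemize changes, -a: archive mode, -A: include ACLs
$ rsync -nia --acls /mnt/md0/ /mnt/backupsys/shd/<vault>/<latest-image>/tree/ | head -20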

For immediate resolution:

# 1. Force new full backup
$ dirvish --vault shd --init

# 2. Use a separate temp directory and a larger delta block size
$ rsync --block-size=32768 --temp-dir=/mnt/tmp ...

# 3. Check kernel message buffer for filesystem errors
$ dmesg | grep -i "ext3"

Modify your Dirvish configuration:

# /etc/dirvish/master.conf
bank:
    /mnt/backupsys/shd
rsync-options:
    --temp-dir=/mnt/backupsys/tmp
    --block-size=32768
    --partial-dir=.rsync-partial
xdev: 1
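
The --temp-dir named above must exist before the next run. A quick sketch, using the path from the configuration:

# Create the temporary directory and confirm it has room for the largest file
$ mkdir -p /mnt/backupsys/tmp
$ df -h /mnt/backupsys/tmp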

For filesystems handling many small files:

# /etc/fstab
/dev/sdg1  /mnt/backupsys/shd  ext3  defaults,noatime,data=writeback  0  2
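
The data= journaling mode cannot be changed on a live remount, so the new options need a full unmount/mount cycle (or a reboot). A minimal verification sketch:

# Re-mount with the new fstab options and confirm they are active
$ umount /mnt/backupsys/shd
$ mount /mnt/backupsys/shd
$ mount | grep /mnt/backupsys/shd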

Encountering "No space left on device" errors despite ample storage is especially frustrating on a Dirvish backup system such as this Ubuntu server. The failure typically manifests with these characteristics:

# Typical error pattern
rsync: write "/mnt/backup/target/file.XyZ123": No space left on device (28)
rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32)
filesystem full
write error, filesystem probably full

The system showed:

# Storage metrics
df -h
/dev/sdg1       2.7T  2.0T  623G  77% /mnt/backup

# Inode status
df -i
/dev/sdg1      183148544 2810146 180338398    2% /mnt/backup

The recent activation of Access Control Lists (ACLs) on the source filesystem appears to be the root cause:

/dev/md0 on /mnt/md0 type ext4 (rw,acl)

ACL changes trigger rsync to perceive all files as modified, causing:

  • Massive temporary file creation on the destination
  • Write failures once the temp directory or destination volume fills
  • Far more space consumed per image than the usual hard-link-based increment

The rsync temp file pattern reveals the issue:

*.eDJiD9
*.RHuUAJ
*.9tVK8Z

These temporary suffixes indicate rsync's failed attempts to create temporary copies during the ACL metadata transfer.
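
Leftover temporary files from the failed runs can quietly consume space inside the vault. As a hedged cleanup sketch (rsync's default temp naming is ".NAME.XXXXXX"; review the list before deleting anything):

# List stale rsync temp files older than a day under the backup mount
sudo find /mnt/backup -type f -name '.*.??????' -mtime +1 -ls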

Immediate Workaround

# For ext3/ext4 destinations: enable ACL support so the copied ACLs can be stored
sudo tune2fs -o acl /dev/sdg1
sudo mount -o remount,acl /mnt/backup
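
A quick check that both changes took effect (same device and mount point as above):

# acl should now appear among the superblock's default mount options
sudo tune2fs -l /dev/sdg1 | grep "Default mount options"

# and among the active mount options
mount | grep /mnt/backup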

Permanent Fixes

Option 1: Rsync parameter adjustment

rsync -av --inplace --partial --no-whole-file \
      --temp-dir=/var/tmp/rsync/ source/ target/

Option 2: Dirvish configuration update

# In /etc/dirvish/default.conf
temp: /var/tmp/dirvish
rsync-options: --inplace --partial --no-whole-file
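
The temp directory named above must exist and be writable before the next run; a minimal sketch:

# Create the temp directory referenced in default.conf and check its headroom
sudo mkdir -p /var/tmp/dirvish
sudo chmod 700 /var/tmp/dirvish
df -h /var/tmp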

For future backups:

# Monitor buffer usage
watch -n 5 'cat /proc/sys/fs/file-nr'

# Increase system limits
echo "fs.file-max = 2097152" >> /etc/sysctl.conf
sysctl -p

After implementing changes:

# Check ACL status
getfacl /mnt/backup/testfile

# Verify temp space usage
df -h /var/tmp
df -i /var/tmp

Key takeaways from this debugging experience:

  • ACL changes constitute metadata modifications that affect rsync behavior
  • Temp file handling becomes critical during mass metadata operations
  • Filesystem monitoring should include both storage and inode metrics (see the sketch below)
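
As a closing illustration, a small sketch of such a check; the mount point and 90% threshold are placeholders to adapt, not values from the original setup:

#!/bin/sh
# Warn when block or inode usage on the backup mount crosses a threshold
MOUNT=/mnt/backupsys/shd
THRESHOLD=90

BLOCK_USE=$(df -P  "$MOUNT" | awk 'NR==2 {gsub("%","",$5); print $5}')
INODE_USE=$(df -Pi "$MOUNT" | awk 'NR==2 {gsub("%","",$5); print $5}')

[ "$BLOCK_USE" -ge "$THRESHOLD" ] && echo "WARNING: block usage ${BLOCK_USE}% on $MOUNT"
[ "$INODE_USE" -ge "$THRESHOLD" ] && echo "WARNING: inode usage ${INODE_USE}% on $MOUNT"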