How to Calculate Total File Size from a List of Files in Linux (Using du, awk, and xargs)



When working with file lists in Linux, one common challenge is handling filenames containing spaces or special characters. Traditional approaches using simple shell commands often fail when encountering these cases.

# This would fail for filenames with spaces:
cat filelist.txt | xargs du -ch
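
To see why the command above misbehaves, here is a small sketch of how xargs tokenises its input. By default it splits on blanks as well as newlines, so one path containing a space becomes two arguments. With -d '\n' (a GNU xargs option), each line is passed through as a single argument. The function names and the sample path are made up for illustration.

```shell
# Print each argument xargs passes on, one per line, wrapped in brackets.
# Default mode: xargs splits its input on blanks and newlines.
show_args_default() {
    xargs printf '[%s]\n'
}

# GNU xargs only: -d '\n' makes every input line a single argument.
show_args_per_line() {
    xargs -d '\n' printf '[%s]\n'
}

printf '%s\n' '/tmp/my file.txt' | show_args_default    # path is split in two
printf '%s\n' '/tmp/my file.txt' | show_args_per_line   # path stays intact
```

The same splitting happens when du is the command, which is why du ends up looking for two files that do not exist.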

Here are several reliable methods to get the total size of files listed in a text file:

Method 1: Using while loop with IFS

total=0
while IFS= read -r file; do
    if [ -e "$file" ]; then
        size=$(du -b "$file" | awk '{print $1}')
        total=$((total + size))
    fi
done < filelist.txt
echo "Total size: $total bytes"

Method 2: Using find with -exec

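xargs -d '\n' -a filelist.txt sh -c 'find "$@" -type f -exec du -ch {} +' sh | grep 'total$'

Note: xargs -d '\n' passes each line to find as a single argument, so paths with spaces survive. This requires GNU xargs and du, and with very large file lists du may run more than once and print several "total" lines.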

Method 3: Python Alternative

For more complex cases, a Python script provides better error handling:

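#!/usr/bin/env python3
import os
import sys

total_size = 0
with open('filelist.txt') as f:
    for line in f:
        # Strip only the line terminator so names with leading or trailing spaces survive
        path = line.rstrip('\r\n')
        if not path:
            continue
        try:
            total_size += os.path.getsize(path)
        except OSError as err:
            # Covers missing files and permission errors; report and keep going
            print(f"Skipping {path}: {err}", file=sys.stderr)

print(f"Total size: {total_size} bytes")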

For lists with thousands of files, the while loop method is the slowest of the three because it starts a du and an awk process for every file. The Python script avoids spawning a process per file and makes it easy to report which paths were skipped and why.

To get both count and size:

count=0
total=0
while IFS= read -r file; do
    if [ -e "$file" ]; then
        size=$(du -b "$file" | awk '{print $1}')
        total=$((total + size))
        count=$((count + 1))
    fi
done < filelist.txt
echo "Files: $count, Total size: $total bytes"

Always include checks for:

  • Nonexistent files
  • Permission issues
  • Symbolic links (if you want to follow them)
  • Directories (whether to include their contents)
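
The checks listed above can be sketched as a small pre-flight pass over the list. This is only one possible arrangement: check_entry and check_list are hypothetical helper names, and the labels are arbitrary. Note that a path inside a directory you are not allowed to search will be reported as missing, and that the readability test always succeeds when run as root.

```shell
# Classify one path and print a short label.
# Prints nothing for a readable regular file.
check_entry() {
    path=$1
    if [ -L "$path" ] && [ ! -e "$path" ]; then
        echo "broken symlink: $path"
    elif [ ! -e "$path" ]; then
        echo "missing: $path"
    elif [ -d "$path" ]; then
        echo "directory: $path"
    elif [ ! -r "$path" ]; then
        echo "unreadable: $path"
    elif [ -L "$path" ]; then
        echo "symlink: $path"
    fi
}

# Report every problematic entry in the list file given as $1.
check_list() {
    while IFS= read -r file; do
        check_entry "$file"
    done < "$1"
}
```

Running check_list filelist.txt before summing shows which entries would be skipped or need a decision (follow the link, descend into the directory) before the total can be trusted.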

When working with file management on Linux systems (especially on resource-constrained devices like QNAP NAS), you might encounter situations where you need to calculate the total size of files listed in a text file. This commonly occurs when:

  • Managing backups or archives
  • Processing batch operations
  • Analyzing disk space usage

Here's what your file list might look like:

/share/archive/Bailey Test/BD006/0.tga
/share/archive/Bailey/BD007/1 version 1.tga
/share/archive/Bailey 2/BD007/example.tga

Here's a robust command that handles spaces in filenames and calculates the total size:

while IFS= read -r line; do
    [ -f "$line" ] && du -b "$line"
done < file_list.txt | awk '{sum += $1} END {printf "%.0f\n", sum}'

For output in kilobytes (divide by 1024 again for MB, and once more for GB):

while IFS= read -r line; do
    [ -f "$line" ] && du -b "$line"
done < file_list.txt | awk '{sum += $1} END {printf "%.1f KB\n", sum/1024}'
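
If you would rather have the unit picked automatically, a minimal sketch is an awk filter that keeps dividing by 1024 until the value drops below 1024. The helper name human_size is hypothetical, and the binary unit labels (KiB, MiB, ...) are a choice made here, not something the commands above produce.

```shell
# Convert a byte count read from stdin into a human-readable string.
# Uses binary units: 1 KiB = 1024 bytes.
human_size() {
    awk '{
        split("B KiB MiB GiB TiB", unit, " ")
        size = $1
        i = 1
        # Step up one unit at a time, stopping at TiB
        while (size >= 1024 && i < 5) {
            size /= 1024
            i++
        }
        printf "%.1f %s\n", size, unit[i]
    }'
}
```

It can be appended to any pipeline that prints a plain byte total, for example `... | awk '{sum += $1} END {printf "%.0f\n", sum}' | human_size`.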

Using xargs:

xargs -a file_list.txt -d '\n' stat -c %s | awk '{sum += $1} END {print sum}'

For systems without stat:

while IFS= read -r file; do
    [ -f "$file" ] && wc -c < "$file"
done < file_list.txt | awk '{sum += $1} END {print sum}'

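These solutions account for:

  • Spaces in filenames
  • Missing files
  • Large file lists

They expect Unix (LF) line endings. If the list was saved with Windows (CRLF) line endings, strip the carriage returns first, for example with tr -d '\r'.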
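
For lists that may have been saved on Windows, one way to be safe about line endings is to remove carriage returns before the paths reach xargs. The sketch below assumes GNU xargs (-d, -r) and GNU stat (-c). The function name sum_list_crlf is hypothetical. Be aware that tr removes every carriage return, including the rare one that is genuinely part of a filename, and that errors for missing files are discarded here.

```shell
# Sum the sizes (in bytes) of the files named in list file $1,
# tolerating both LF and CRLF line endings.
sum_list_crlf() {
    tr -d '\r' < "$1" |
        xargs -r -d '\n' stat -c %s -- 2>/dev/null |
        awk '{sum += $1} END {printf "%.0f\n", sum}'
}
```

Calling sum_list_crlf file_list.txt prints the total in bytes. Drop the 2>/dev/null if you want to see which entries could not be read.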

For very large file lists (thousands of files), the xargs method is generally fastest as it processes files in batches rather than one-by-one.