How to Split Large Files into Multiple ZIP Archives with Size Limit Using Command Line and Scripts


When dealing with large datasets or numerous files, we often need to create compressed archives. However, many systems have limitations on maximum file upload sizes (e.g., email attachments or cloud storage). A common requirement is to split files into multiple ZIP archives when they exceed a certain size threshold (like 10MB).

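Windows has no built-in zip command, and a plain zip invocation does not split its output, so one of these alternatives is needed:

# Using 7-Zip (Windows; also available on Linux/macOS as p7zip)
# -v10m writes 10 MB volumes: archive_name.7z.001, archive_name.7z.002, ...
7z a -v10m archive_name.7z folder_to_compress

# Using zip + split (Linux/macOS)
# The pieces (archive_part.aa, archive_part.ab, ...) are raw byte ranges, not
# standalone archives; rejoin them before extracting:
#   cat archive_part.* > archive_name.zip
zip -r archive_name.zip folder_to_compress
split -b 10m archive_name.zip archive_part.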
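
The pieces written by split must be concatenated in their original order before the archive can be opened. Where cat is not available, the reassembly could look roughly like the sketch below. The function name join_parts and its parameters are hypothetical, and the sketch assumes split's default suffixes (aa, ab, ...), which sort correctly as plain strings.

```python
import glob
import shutil


def join_parts(part_prefix, output_path):
    """Concatenate split pieces (part_prefix + 'aa', 'ab', ...) into one file.

    Assumes output_path does not itself start with part_prefix, otherwise
    the output file would be picked up as one of the parts.
    """
    # Lexicographic order matches the order in which split wrote the pieces.
    parts = sorted(glob.glob(glob.escape(part_prefix) + "*"))
    if not parts:
        raise FileNotFoundError(f"no parts found for prefix {part_prefix!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so large parts are fine
    return parts
```

For the example above, join_parts('archive_part.', 'archive_name.zip') would rebuild the original archive, which can then be extracted normally.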

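Here's a more flexible Python script that groups whole files into separate, independently extractable archives. It budgets by uncompressed file size, so each archive usually ends up smaller than the limit; a single file larger than the limit still gets an archive of its own: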

import os
import zipfile
from itertools import count

def zip_with_size_limit(folder_path, output_prefix, max_size_mb):
    max_size = max_size_mb * 1024 * 1024  # Convert MB to bytes
    current_size = 0
    part_num = count(1)
    current_zip = None
    
    for root, _, files in os.walk(folder_path):
        for file in files:
            file_path = os.path.join(root, file)
            file_size = os.path.getsize(file_path)
            
            if current_zip is None or current_size + file_size > max_size:
                if current_zip:
                    current_zip.close()
                zip_path = f"{output_prefix}_{next(part_num)}.zip"
                current_zip = zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED)
                current_size = 0
            
            current_zip.write(file_path, os.path.relpath(file_path, folder_path))
            current_size += file_size
    
    if current_zip:
        current_zip.close()

# Usage example:
zip_with_size_limit('/path/to/folder', 'output_archive', 10)

Windows PowerShell provides another approach:

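# PowerShell script
Add-Type -AssemblyName System.IO.Compression.FileSystem  # loads the ZipFile types

$source = "C:\path\to\folder"
$destination = "C:\output\archive"
$maxSize = 10MB

# Top-level files only; with -Recurse, entry names would also need relative paths
$files = Get-ChildItem -Path $source -File
$currentSize = 0
$part = 1
$zip = $null

foreach ($file in $files) {
    # Start a new archive for the first file, or when this file would exceed the limit
    if ($null -eq $zip -or $currentSize + $file.Length -gt $maxSize) {
        if ($null -ne $zip) { $zip.Dispose() }
        $zip = [System.IO.Compression.ZipFile]::Open("$($destination)_$part.zip", 'Create')
        $part++
        $currentSize = 0
    }

    # Out-Null discards the returned entry object so it is not echoed to the console
    [System.IO.Compression.ZipFileExtensions]::CreateEntryFromFile($zip, $file.FullName, $file.Name) | Out-Null
    $currentSize += $file.Length
}

if ($null -ne $zip) { $zip.Dispose() }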

Consider these additional scenarios:

  • Single files larger than the size limit (requires splitting the file itself)
  • Preserving directory structure in the archives
  • Adding progress indicators for large operations
  • Including checksum verification
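
Two of these scenarios, oversized single files and checksum verification, can be handled together. The following is a minimal sketch rather than a complete solution: split_file, its parameters, and the .partNNN naming are all hypothetical choices, and each chunk is read into memory in one piece, which is reasonable for limits around 10MB but not for very large chunk sizes.

```python
import hashlib
import os


def split_file(path, chunk_size, out_dir):
    """Split one file into numbered chunks.

    Returns a list of (chunk_path, sha256_hex) tuples in chunk order, so the
    receiver can verify every piece before concatenating them again.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    results = []
    with open(path, "rb") as src:
        index = 1
        while True:
            data = src.read(chunk_size)  # at most chunk_size bytes
            if not data:
                break
            chunk_path = os.path.join(out_dir, f"{base}.part{index:03d}")
            with open(chunk_path, "wb") as dst:
                dst.write(data)
            results.append((chunk_path, hashlib.sha256(data).hexdigest()))
            index += 1
    return results
```

The zero-padded suffix keeps the chunks in order when sorted by name, so they can be rejoined by simple concatenation.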

When dealing with thousands of files or very large archives:

  • Batch process files to minimize memory usage
  • Consider using temporary files for very large operations
  • Implement parallel processing where possible
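
One way to combine batching with parallel processing is to plan the groups of files first and then write each archive in its own worker. The sketch below is illustrative only: plan_batches, write_batch, and zip_in_parallel are hypothetical names, sizes are uncompressed sizes as in the script above, and a thread pool is used on the assumption that compression and file I/O spend most of their time outside the interpreter lock. A process pool would be an alternative if threads do not help on a given system.

```python
import os
import zipfile
from concurrent.futures import ThreadPoolExecutor


def plan_batches(folder_path, max_bytes):
    """Yield lists of file paths whose combined uncompressed size fits max_bytes.

    A single file larger than max_bytes is yielded as a batch of its own.
    """
    batch, batch_size = [], 0
    for root, _, files in os.walk(folder_path):
        for name in sorted(files):
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if batch and batch_size + size > max_bytes:
                yield batch
                batch, batch_size = [], 0
            batch.append(path)
            batch_size += size
    if batch:
        yield batch


def write_batch(zip_path, paths, folder_path):
    """Write one archive, storing paths relative to folder_path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, os.path.relpath(path, folder_path))
    return zip_path


def zip_in_parallel(folder_path, output_prefix, max_bytes, workers=4):
    """Plan all batches up front, then write the archives concurrently."""
    # Only file paths are held in memory here, never file contents.
    batches = list(plan_batches(folder_path, max_bytes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_batch, f"{output_prefix}_{i}.zip", batch, folder_path)
            for i, batch in enumerate(batches, start=1)
        ]
        return [f.result() for f in futures]
```

Planning before writing also means that archives created inside the source folder are not picked up by the directory walk.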

In data processing workflows, we often encounter situations where we need to archive files while adhering to strict size limitations. This requirement is particularly common when:

  • Preparing attachments for email systems with size restrictions
  • Uploading to systems with maximum file size limits
  • Creating backups optimized for storage media

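On Linux and macOS, the most direct approach is the zip command (Info-ZIP zip 3.0 or later) with the -s (split size) option:

zip -r -s 10m archive_name.zip source_folder/

This creates a split archive whose pieces are each capped at 10MB: archive_name.z01, archive_name.z02, and so on, with the final piece named archive_name.zip. The pieces are not independent archives, so all of them are needed for extraction. To extract, first merge them into a single archive:

zip -s 0 archive_name.zip --out complete.zip
unzip complete.zip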
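
Before relying on a merged archive, it can be worth checking that every member still passes its CRC check. The sketch below uses Python's zipfile module for that; verify_zip is a hypothetical helper name. Note that zipfile does not handle multi-disk archives, so this applies to the merged complete.zip, not to the individual .z01/.z02 pieces.

```python
import zipfile


def verify_zip(path):
    """Return (ok, detail) for a single, non-split ZIP archive.

    ok is False if the file is not a ZIP archive or if any member fails
    its CRC check.
    """
    if not zipfile.is_zipfile(path):
        return False, "not a ZIP archive"
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()  # name of the first corrupt member, or None
    except zipfile.BadZipFile as exc:
        return False, f"unreadable archive: {exc}"
    if bad is not None:
        return False, f"corrupt member: {bad}"
    return True, "ok"
```

A call such as verify_zip('complete.zip') reads every member once, so it takes roughly as long as a full extraction.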

For more control, here's a Python script using the zipfile module:

import os
import zipfile

def zip_with_size_limit(source_folder, output_base, max_size_mb):
    max_size = max_size_mb * 1024 * 1024
    current_zip = None
    current_size = 0
    part_num = 1
    
    for root, _, files in os.walk(source_folder):
        for file in files:
            file_path = os.path.join(root, file)
            file_size = os.path.getsize(file_path)
            
            if current_zip is None or current_size + file_size > max_size:
                if current_zip:
                    current_zip.close()
                zip_name = f"{output_base}_{part_num}.zip"
                current_zip = zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED)
                part_num += 1
                current_size = 0
                
            current_zip.write(file_path, os.path.relpath(file_path, source_folder))
            current_size += file_size
    
    if current_zip:
        current_zip.close()

# Usage example:
zip_with_size_limit('/path/to/source', 'output_archive', 10)

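The same per-file grouping can be written as a Bash script:

# Bash script example with additional checks
MAX_SIZE=$((10 * 1024 * 1024))  # 10 MB in bytes
COUNTER=1
CURRENT_SIZE=0
CURRENT_ZIP="archive_part_${COUNTER}.zip"

# IFS= and -r keep leading spaces and backslashes in file names intact
find source_folder/ -type f | while IFS= read -r file; do
    file_size=$(stat -c%s "$file")  # GNU stat; on macOS/BSD use: stat -f%z "$file"
    # Start a new archive once the current one holds data and this file would not fit
    if [ "$CURRENT_SIZE" -gt 0 ] && [ $((CURRENT_SIZE + file_size)) -gt "$MAX_SIZE" ]; then
        COUNTER=$((COUNTER + 1))
        CURRENT_ZIP="archive_part_${COUNTER}.zip"
        CURRENT_SIZE=0
    fi
    zip -q "$CURRENT_ZIP" "$file"  # appends to the archive if it already exists
    CURRENT_SIZE=$((CURRENT_SIZE + file_size))
done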

Special situations to account for:

  • Single files larger than the size limit (requires splitting at binary level)
  • Preserving file permissions and metadata
  • Handling Unicode filenames across different systems
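
On the permissions point: when Python's zipfile module adds a file from disk, it records the Unix mode bits in the entry's external attributes, but its extraction methods do not reapply them. A minimal sketch of restoring them on a Unix-like system follows; extract_with_permissions is a hypothetical helper name, and ownership and timestamps are not handled.

```python
import os
import zipfile


def extract_with_permissions(zip_path, dest_dir):
    """Extract an archive and reapply the Unix permission bits stored per entry."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest_dir)
            # The upper 16 bits of external_attr hold the Unix st_mode, if any.
            mode = (info.external_attr >> 16) & 0o777
            if mode:  # zero means no Unix mode was recorded for this entry
                os.chmod(extracted, mode)
```

Archives created by tools that do not record Unix modes are extracted with default permissions, since the stored mode is zero in that case.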

Tool         Command                                          Pros                Cons
7-Zip        7z a -v10m archive.7z folder                     Better compression  Windows-centric
RAR          rar a -v10m archive.rar folder                   Strong encryption   Proprietary
tar + split  tar cf - folder | split -b 10m - archive.tar.    Unix native         No compression