When dealing with large datasets or numerous files, we often need to create compressed archives. However, many systems have limitations on maximum file upload sizes (e.g., email attachments or cloud storage). A common requirement is to split files into multiple ZIP archives when they exceed a certain size threshold (like 10MB).
For Windows users, the built-in zip
command doesn't support size splitting natively, but we can use alternatives:
# Using 7-Zip (Windows)
7z a -v10m archive_name.7z folder_to_compress
# Using split command (Linux/macOS)
zip -r archive_name.zip folder_to_compress
split -b 10m archive_name.zip archive_part.
Here's a more flexible Python script that handles the splitting automatically:
import os
import zipfile
from itertools import count
def zip_with_size_limit(folder_path, output_prefix, max_size_mb):
max_size = max_size_mb * 1024 * 1024 # Convert MB to bytes
current_size = 0
part_num = count(1)
current_zip = None
for root, _, files in os.walk(folder_path):
for file in files:
file_path = os.path.join(root, file)
file_size = os.path.getsize(file_path)
if current_zip is None or current_size + file_size > max_size:
if current_zip:
current_zip.close()
zip_path = f"{output_prefix}_{next(part_num)}.zip"
current_zip = zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED)
current_size = 0
current_zip.write(file_path, os.path.relpath(file_path, folder_path))
current_size += file_size
if current_zip:
current_zip.close()
# Usage example:
zip_with_size_limit('/path/to/folder', 'output_archive', 10)
Windows PowerShell provides another approach:
# PowerShell script
$source = "C:\path\to\folder"
$destination = "C:\output\archive"
$maxSize = 10MB
$files = Get-ChildItem -Path $source -File
$currentSize = 0
$part = 1
foreach ($file in $files) {
if ($currentSize + $file.Length -gt $maxSize -or $zip -eq $null) {
if ($zip -ne $null) { $zip.Dispose() }
$zip = [System.IO.Compression.ZipFile]::Open("$($destination)_$part.zip", 'Create')
$part++
$currentSize = 0
}
[System.IO.Compression.ZipFileExtensions]::CreateEntryFromFile($zip, $file.FullName, $file.Name)
$currentSize += $file.Length
}
if ($zip -ne $null) { $zip.Dispose() }
Consider these additional scenarios:
- Single files larger than the size limit (requires splitting the file itself)
- Preserving directory structure in the archives
- Adding progress indicators for large operations
- Including checksum verification
When dealing with thousands of files or very large archives:
- Batch process files to minimize memory usage
- Consider using temporary files for very large operations
- Implement parallel processing where possible
In data processing workflows, we often encounter situations where we need to archive files while adhering to strict size limitations. This requirement is particularly common when:
- Preparing attachments for email systems with size restrictions
- Uploading to systems with maximum file size limits
- Creating backups optimized for storage media
The most efficient approach is using the zip
command with the -s
(split size) parameter:
zip -r -s 10m archive_name.zip source_folder/
This creates multiple ZIP files (archive_name.z01, archive_name.z02, etc.) each capped at 10MB. To extract:
zip -s 0 archive_name.zip --out complete.zip
unzip complete.zip
For more control, here's a Python script using the zipfile module:
import os
import zipfile
def zip_with_size_limit(source_folder, output_base, max_size_mb):
max_size = max_size_mb * 1024 * 1024
current_zip = None
current_size = 0
part_num = 1
for root, _, files in os.walk(source_folder):
for file in files:
file_path = os.path.join(root, file)
file_size = os.path.getsize(file_path)
if current_zip is None or current_size + file_size > max_size:
if current_zip:
current_zip.close()
zip_name = f"{output_base}_{part_num}.zip"
current_zip = zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED)
part_num += 1
current_size = 0
current_zip.write(file_path, os.path.relpath(file_path, source_folder))
current_size += file_size
if current_zip:
current_zip.close()
# Usage example:
zip_with_size_limit('/path/to/source', 'output_archive', 10)
When implementing size-limited archiving, consider these factors:
# Bash script example with additional checks
MAX_SIZE=10M
COUNTER=1
CURRENT_SIZE=0
CURRENT_ZIP="archive_part_${COUNTER}.zip"
find source_folder/ -type f | while read file; do
file_size=$(stat -c%s "$file")
if [ $((CURRENT_SIZE + file_size)) -gt $((10 * 1024 * 1024)) ]; then
COUNTER=$((COUNTER + 1))
CURRENT_ZIP="archive_part_${COUNTER}.zip"
CURRENT_SIZE=0
fi
zip -q -r "$CURRENT_ZIP" "$file"
CURRENT_SIZE=$((CURRENT_SIZE + file_size))
done
Special situations to account for:
- Single files larger than the size limit (requires splitting at binary level)
- Preserving file permissions and metadata
- Handling Unicode filenames across different systems
Tool | Command | Pros | Cons |
---|---|---|---|
7-Zip | 7z a -v10m archive.7z folder |
Better compression | Windows-centric |
RAR | rar a -v10m archive.rar folder |
Strong encryption | Proprietary |
tar + split | tar cf - folder | split -b 10m - archive.tar. |
Unix native | No compression |