When dealing with batch downloads from a URL list file, many developers encounter these common requirements:
- Processing URLs sequentially (one at a time)
- Preventing parallel downloads that might overload servers
- Maintaining simple bash-based solutions without complex dependencies
Here's the most straightforward approach using a while loop in bash:
while read -r url; do
curl -O "$url"
done < urls.txt
Key components:
- -O saves each file under its remote name
- -r stops read from interpreting backslashes
- Input redirection (< urls.txt) feeds the file into the loop
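If your list might contain blank lines or comment lines (an assumption about urls.txt, not something the basic loop requires), a slightly defensive sketch of the same idea looks like this:

# Sketch: same sequential loop, but skipping blank lines and '#' comments
# (assumes urls.txt may contain such lines; plain lists work unchanged)
while IFS= read -r url; do
    [[ -z "$url" || "$url" == \#* ]] && continue
    curl -O "$url"
done < urls.txt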
For production environments, consider adding these improvements:
while read -r url; do
if ! curl -fL --connect-timeout 30 --retry 3 -O "$url"; then
echo "Failed to download: $url" >> download_errors.log
fi
sleep 1 # Optional delay between requests
done < urls.txt
New flags explained:
- -f: fails silently on server errors
- -L: follows redirects
- --connect-timeout: sets the connection timeout threshold
- --retry: attempts retries on failures
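If you also want a record of the HTTP status returned for each URL, curl's --write-out option can capture it; the status.log file name below is just an example:

# Sketch: log the HTTP status code per URL via --write-out
# (status.log is an illustrative file name)
while read -r url; do
    status=$(curl -fL --connect-timeout 30 --retry 3 -s -O -w '%{http_code}' "$url")
    echo "$status $url" >> status.log
    sleep 1
done < urls.txt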
For those preferring xargs syntax:
cat urls.txt | xargs -n 1 -P 1 curl -O
Where:
- -n 1 passes one URL at a time
- -P 1 ensures single-process (sequential) execution
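If you have GNU xargs, the cat can be dropped by reading the list directly with -a (this flag is GNU-specific, not part of POSIX xargs):

# Sketch: GNU xargs reads the URL list itself, no cat needed
xargs -a urls.txt -n 1 -P 1 curl -fLO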
To create a clean download log:
while read -r url; do
filename=$(basename "$url")
echo "Downloading $filename..."
curl -s -O "$url" && echo "$url -> $filename" >> download_success.log
done < urls.txt
For large files that might interrupt:
while read -r url; do
curl -C - -O "$url"
done < urls.txt
The -C - flag tells curl to automatically resume interrupted downloads.
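On unstable connections it can help to pair resume with automatic retries; the retry count and delay below are arbitrary example values:

# Sketch: resume (-C -) combined with retries for flaky links
# (retry count and delay are example values)
while read -r url; do
    curl -C - --retry 5 --retry-delay 10 -fLO "$url"
done < urls.txt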
When automating downloads from a list of URLs, we often face two critical requirements:
- Processing URLs sequentially (one after another)
- Ensuring each download completes before starting the next
This becomes particularly important when dealing with rate-limited APIs, fragile servers, or when maintaining download order is crucial.
curl itself doesn't have a built-in queuing system, but we can achieve sequential downloading through these methods:
Basic Sequential Download
while read -r url; do
curl -O "$url"
done < urls.txt
Advanced Implementation with Error Handling
#!/bin/bash
LOG_FILE="download.log"
URL_FILE="urls.txt"

echo "$(date) Starting downloads" >> "$LOG_FILE"

while IFS= read -r url || [[ -n "$url" ]]; do
    filename=$(basename "$url")
    echo "Downloading $filename..." | tee -a "$LOG_FILE"
    if curl -f -L -O "$url"; then
        echo "Success: $filename" | tee -a "$LOG_FILE"
    else
        echo "Failed: $filename" >> "$LOG_FILE"
    fi
    sleep 1 # Optional delay between requests
done < "$URL_FILE"

echo "$(date) Download completed" >> "$LOG_FILE"
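To try it, save the script to a file and make it executable; the name download.sh here is only an example:

chmod +x download.sh
./download.sh
tail -f download.log    # follow progress from another terminal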
Using xargs for Parallel Control
With -P 1 this runs strictly one curl at a time; raise -P if limited parallelism is acceptable:
xargs -n 1 -P 1 curl -O < urls.txt
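For example, allowing a small, bounded amount of concurrency (the value 4 is only illustrative):

# Sketch: bounded parallelism with xargs (4 concurrent downloads is an example value)
xargs -n 1 -P 4 curl -fLO < urls.txt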
Python Implementation
For more complex scenarios:
import requests

with open('urls.txt', 'r') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        response = requests.get(url, stream=True)
        filename = url.split('/')[-1]
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
        print(f"Downloaded {filename}")
    except Exception as e:
        print(f"Failed to download {url}: {str(e)}")
Whichever approach you use, a few hardening steps are worth considering (see the sketch after this list):
- Add proper timeouts (--connect-timeout and --max-time in curl)
- Implement retry logic for failed downloads
- Consider bandwidth throttling (--limit-rate in curl)
- Add user-agent rotation for web scraping scenarios
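A single curl invocation combining these ideas might look like the sketch below; the timeout, retry, and rate values are placeholders, and the user-agent strings are purely illustrative:

# Sketch: one hardened download step combining the ideas above
# (all numeric values and user-agent strings are illustrative placeholders)
AGENTS=("Mozilla/5.0 (X11; Linux x86_64)" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
while read -r url; do
    ua="${AGENTS[RANDOM % ${#AGENTS[@]}]}"   # naive user-agent rotation
    curl -fL \
        --connect-timeout 30 --max-time 600 \
        --retry 3 --retry-delay 5 \
        --limit-rate 1M \
        -A "$ua" \
        -O "$url"
done < urls.txt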
For large-scale operations, curl 7.66.0 and newer can parallelize transfers itself:
curl -Z --parallel-max 2 --remote-name-all -K urls.txt # -K expects curl config syntax: one url = "..." line per download
Or consider specialized tools like aria2c or wget2 for better performance while maintaining control.
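As one concrete illustration, aria2c can read the same list and be held to a single download at a time (the flag values here are examples):

# Sketch: aria2c reading a URL list, limited to one concurrent download
aria2c --input-file=urls.txt --max-concurrent-downloads=1 --continue=true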