How to Sequentially Download URLs from a File Using curl with Single-Threaded Execution



When dealing with batch downloads from a URL list file, many developers encounter these common requirements:

  • Processing URLs sequentially (one-at-a-time)
  • Preventing parallel downloads that might overload servers
  • Maintaining simple bash-based solutions without complex dependencies

Here's the most straightforward approach using a while loop in bash:

while read -r url; do
    curl -O "$url"
done < urls.txt

Key components:

  • -O flag saves files with remote names
  • -r prevents backslash interpretation
  • Input redirection (< urls.txt) feeds the URL list into the loop

For production environments, consider adding these improvements:

while read -r url; do
    if ! curl -fL --connect-timeout 30 --retry 3 -O "$url"; then
        echo "Failed to download: $url" >> download_errors.log
    fi
    sleep 1 # Optional delay between requests
done < urls.txt

New flags explained:

  • -f: Treats HTTP errors (4xx/5xx) as failures instead of saving the error page
  • -L: Follows redirects
  • --connect-timeout 30: Gives up if the connection is not established within 30 seconds
  • --retry 3: Retries transient failures up to 3 times

For those preferring xargs syntax:

cat urls.txt | xargs -n 1 -P 1 curl -O

Where:

  • -n 1 passes one URL at a time
  • -P 1 ensures single-process execution
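
If some URLs contain spaces or quote characters, note that xargs does its own quote and whitespace processing, which can mangle them. A sketch using GNU xargs' -d option (a GNU extension, not portable) to treat each input line as exactly one argument:

# GNU xargs: one line = one argument, still one curl process at a time
xargs -d '\n' -n 1 -P 1 curl -fL -O < urls.txt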

To create a clean download log:

while read -r url; do
    filename=$(basename "$url")
    echo "Downloading $filename..."
    curl -s -O "$url" && echo "$url -> $filename" >> download_success.log
done < urls.txt

For large files whose downloads might be interrupted:

while read -r url; do
    curl -C - -O "$url"
done < urls.txt

The -C - flag tells curl to work out the resume offset from the partially downloaded file, so re-running the command continues an interrupted download instead of starting over.
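
If each download should keep retrying until it finishes, a wrapper loop is a natural extension. This is a sketch that assumes the partial file stays in the working directory:

while read -r url; do
    # Retry until the transfer succeeds, resuming from the partial file each time
    until curl -fL -C - -O "$url"; do
        echo "Retrying $url..." >&2
        sleep 5
    done
done < urls.txt

Note that if a server does not support range requests, curl will keep failing on the partial file and this loop will retry indefinitely, so cap the number of attempts if that matters.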


When automating downloads from a list of URLs, we often face two critical requirements:

  1. Processing URLs sequentially (one after another)
  2. Ensuring each download completes before starting the next

This becomes particularly important when dealing with rate-limited APIs, fragile servers, or when maintaining download order is crucial.

curl itself doesn't have a built-in queuing system, but we can achieve sequential downloading through these methods:

Basic Sequential Download

while read -r url; do
    curl -O "$url"
done < urls.txt

Advanced Implementation with Error Handling

#!/bin/bash

LOG_FILE="download.log"
URL_FILE="urls.txt"

echo "$(date) Starting downloads" >> $LOG_FILE

while IFS= read -r url || [[ -n "$url" ]]; do
    filename=$(basename "$url")
    echo "Downloading $filename..." | tee -a $LOG_FILE
    
    if curl -f -L -O "$url"; then
        echo "Success: $filename" | tee -a $LOG_FILE
    else
        echo "Failed: $filename" >> $LOG_FILE
    fi
    
    sleep 1  # Optional delay between requests
done < "$URL_FILE"

echo "$(date) Download completed" >> $LOG_FILE

Using xargs for Parallel Control

The -P flag controls concurrency: with -P 1 the downloads stay strictly sequential, and raising it allows limited parallelism:

xargs -n 1 -P 1 curl -O < urls.txt
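
If a few downloads in flight at once are acceptable, a sketch with a higher -P value (4 here is arbitrary) keeps one URL per curl invocation while allowing limited concurrency:

# Up to 4 concurrent curl processes, each handling a single URL
xargs -n 1 -P 4 curl -fL -O < urls.txt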

Python Implementation

For more complex scenarios:

import requests

# Read the URL list, skipping blank lines
with open('urls.txt', 'r') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        # Stream the body so large files are not held entirely in memory
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()  # treat HTTP 4xx/5xx as failures
        filename = url.split('/')[-1]
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
        print(f"Downloaded {filename}")
    except Exception as e:
        print(f"Failed to download {url}: {str(e)}")

Whichever approach you choose, consider these hardening steps (a sketch combining them follows this list):

  • Add proper timeouts (--connect-timeout and --max-time in curl)
  • Implement retry logic for failed downloads
  • Consider bandwidth throttling (--limit-rate in curl)
  • Add user-agent rotation for web scraping scenarios
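
A minimal sketch combining those flags; the --limit-rate value and the user-agent string are placeholders, not values from the original examples:

# Hardened sequential download: timeouts, retries, bandwidth cap, custom user agent
while read -r url; do
    curl -fL \
        --connect-timeout 30 \
        --max-time 600 \
        --retry 3 \
        --limit-rate 1M \
        -A "example-downloader/1.0" \
        -O "$url" || echo "Failed: $url" >> download_errors.log
done < urls.txt

Here --connect-timeout only bounds connection setup, while --max-time bounds the whole transfer.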

For large-scale operations, curl 7.66.0 and later can run transfers in parallel on its own (use --parallel-max to cap concurrency):

curl -Z -K urls.txt  # parallel transfers driven by a curl config file
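
Note that -K reads curl's config-file syntax, so the file needs url = lines rather than bare URLs. A minimal sketch of such a file (the URLs are placeholders):

# urls.txt in curl config syntax
remote-name-all
url = "https://example.com/file1.zip"
url = "https://example.com/file2.zip"

With that file, curl -Z -K urls.txt downloads the entries in parallel and saves each under its remote name.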

Or consider specialized tools like aria2c or wget2 for better performance while maintaining control.
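
For example, aria2c can consume the same URL list while being pinned to one transfer at a time (a sketch; the flag values are illustrative):

# One download at a time, resuming partial files on re-run
aria2c -i urls.txt -j 1 -c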