Efficient Bulk File Downloading Using curl and xargs: Solving Multiple URL Processing Issues


When trying to download multiple files listed in a text file, many developers intuitively try this common pattern:

cat listfile.txt | xargs curl -O

While this works for the first URL, every subsequent response is printed to stdout instead of being saved. The reason is that curl's -O (--remote-name) option applies to a single URL: with one -O and several URLs on the command line, only the first file is written to disk and the rest are dumped to stdout.
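
You can reproduce the behavior by passing two URLs with a single -O (the example.com URLs are placeholders):

# curl saves the first file, but dumps the body of the second to stdout
curl -O https://example.com/a.txt https://example.com/b.txt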

The correct approach leverages xargs to spawn multiple curl processes (one per URL) while preserving the -O behavior for each file:

xargs -n 1 curl -O < listfile.txt

Key components:

  • -n 1 tells xargs to pass exactly one argument per command
  • Input redirection (<) is cleaner than using cat
  • Each curl instance gets its own -O option
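
All of the commands here assume listfile.txt contains one URL per line, for example:

https://example.com/file1.zip
https://example.com/file2.zip
https://example.com/file3.zip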

For better performance and control, consider these enhanced versions:

# Download with 4 parallel connections
xargs -P 4 -n 1 curl -O < listfile.txt

# Add a per-file progress bar and retry failed downloads
# (progress bars interleave if combined with -P)
xargs -n 1 curl -O --progress-bar --retry 3 < listfile.txt

# Fail on HTTP errors and follow redirects, retrying transient failures
xargs -P 4 -n 1 curl -fLO --retry 3 < listfile.txt

# Custom output directory (curl 7.73 and newer)
mkdir -p downloads && xargs -n 1 curl -O --output-dir downloads < listfile.txt
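
On curl versions older than 7.73.0, which lack --output-dir, running xargs from inside the target directory has the same effect:

# Note the ../listfile.txt path, since the subshell changes directory
mkdir -p downloads && (cd downloads && xargs -n 1 curl -O < ../listfile.txt)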

For URLs requiring authentication or special headers:

# With basic auth
xargs -n 1 curl -O -u user:password < listfile.txt

# With custom headers
xargs -n 1 curl -O -H "Authorization: Bearer token" < listfile.txt
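
Note that -u user:password exposes credentials to other local users via the process list; curl's --netrc option reads them from a ~/.netrc file instead (the machine, login, and password values below are placeholders):

# ~/.netrc (chmod 600):
#   machine example.com
#   login user
#   password secret
xargs -n 1 curl -O --netrc < listfile.txt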

Although xargs + curl covers most cases, two alternatives are worth knowing:

# Using GNU parallel (faster for many files)
parallel -j 4 curl -O {} :::: listfile.txt
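
GNU parallel can also record each download's exit status with its --joblog option (the log file name here is arbitrary):

# Writes one line per job to download.log, including runtime and exit status
parallel -j 4 --joblog download.log curl -fO {} :::: listfile.txt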

# Pure bash approach (covered in more detail below); IFS= and -r preserve
# leading whitespace and backslashes in each line
while IFS= read -r url; do curl -O "$url"; done < listfile.txt

For production use, add proper error handling:

# Log successes and failures separately; -f makes curl treat HTTP errors as
# failures, and the trailing "sh" becomes $0 so each URL lands in $1
xargs -n 1 sh -c 'curl -fO "$1" && echo "$1" >> success.log || echo "$1" >> failed.log' sh < listfile.txt

# Continue past individual errors, then send an email if anything failed
xargs -P 4 -n 1 curl -f -O < listfile.txt || mail -s "Download errors" admin@example.com
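
The || branch works because GNU xargs exits with status 123 when any invocation of the command fails, and 0 only when every invocation succeeds:

xargs -P 4 -n 1 curl -f -O < listfile.txt
echo "exit status: $?"   # 0 if every download succeeded, 123 if any failed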

One caveat on argument parsing: if your URLs contain spaces or quote characters, use GNU xargs's -d '\n' option so each input line is treated as a single argument (this also disables xargs's quote processing):

xargs -d '\n' -n 1 curl -O < listfile.txt
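
BSD and macOS xargs lack -d; a portable equivalent (assuming no URL itself contains a newline) is to convert newlines to NUL bytes and split on those with -0:

tr '\n' '\0' < listfile.txt | xargs -0 -n 1 curl -O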

For complex scenarios, a while-read loop offers more flexibility:

while IFS= read -r url; do
  curl -O "$url"
done < listfile.txt

This method is particularly useful when you need to perform additional processing for each URL.
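
For example, the loop below skips files that are already present and logs failures; it is a sketch that assumes the local file name is everything after the URL's last slash:

while IFS= read -r url; do
  file=${url##*/}                       # derive the local name from the URL
  if [ -e "$file" ]; then
    echo "skipping $file (already exists)"
    continue
  fi
  curl -fO "$url" || echo "$url" >> failed.log
done < listfile.txt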