How to Prevent wget from Creating index.html When Redirecting Output in Cron Jobs



Many developers run into unexpected behavior when using wget in cron jobs. The command appears to work perfectly in an interactive shell session:

wget example.com > /dev/null 2>&1

The same command in cron:

*/5 * * * * wget example.com > /dev/null 2>&1

Mysteriously creates index.html files in the user's home directory, despite the output redirection.

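The issue comes from how wget handles the download itself, not from cron. By default, wget saves the response body to a file named after the URL (index.html for a bare domain) in the current working directory. Shell redirection such as > /dev/null 2>&1 only affects wget's standard output and standard error, which is where its progress messages go. It never touches that file. This is different from curl, which writes the body to standard output by default. The command therefore behaves the same way in an interactive shell. What changes is where the file lands:

  • Interactively, index.html is written to whatever directory you are in, where it is easy to overlook
  • Cron normally starts jobs in the user's home directory, so the files appear there
  • Because wget does not overwrite an existing file by default, each later run adds index.html.1, index.html.2, and so on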
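Shell redirection applies only to a program's standard streams. Files the program opens by name are unaffected. The short sketch below illustrates this without needing network access. It uses a hypothetical stand-in function, fake_fetch, that mimics wget's default behavior: it saves a body to index.html and prints progress to standard error.

```shell
#!/bin/sh
# Sketch: "> /dev/null 2>&1" silences a command's output streams,
# but it does not stop the command from creating files by name.
# fake_fetch is a hypothetical stand-in for wget's default behavior.

fake_fetch() {
    printf '<html>body</html>\n' > index.html  # body saved to a file
    printf 'download progress...\n' >&2        # messages go to stderr
}

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

fake_fetch > /dev/null 2>&1   # prints nothing to the terminal

# The redirection did not prevent the file from being written.
ls -l "$workdir/index.html"
```

A real wget call without -O behaves the same way, which is why index.html keeps appearing.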

Here are three reliable approaches to solve this issue:

Method 1: Use --output-document

The most straightforward solution:

*/5 * * * * wget --output-document=/dev/null example.com > /dev/null 2>&1

Method 2: Silent Mode with -q

For completely silent operation:

*/5 * * * * wget -q -O /dev/null example.com

Method 3: Change Working Directory

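If you want to keep the downloaded file, run wget from a scratch directory instead (each run adds another numbered copy, such as index.html.1):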

*/5 * * * * cd /tmp && wget example.com
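Without -O, wget will not overwrite an existing index.html. It saves later downloads as index.html.1, index.html.2, and so on, so the scratch directory fills up over time. If the goal is to keep only the most recent copy, one option is a small helper that downloads to a temporary file and renames it into place on success. This is a sketch under stated assumptions: fetch_latest and FETCH are hypothetical names, and FETCH is assumed to take the output path followed by the URL, which matches wget -q -O FILE URL.

```shell
#!/bin/sh
# Sketch: keep only the latest download under a fixed file name.
# fetch_latest and FETCH are hypothetical names used for illustration.
# FETCH must accept an output path followed by a URL, as "wget -q -O" does.

FETCH=${FETCH:-"wget -q -O"}

fetch_latest() {
    dest=$1
    url=$2
    # Temp file in the same directory, so the final mv is a rename.
    tmp=$(mktemp "$dest.XXXXXX") || return 1

    # $FETCH is deliberately unquoted so "wget -q -O" splits into words.
    if $FETCH "$tmp" "$url"; then
        mv -f "$tmp" "$dest"   # replace the previous copy in one step
    else
        rm -f "$tmp"           # drop the empty or partial file
        return 1               # the previous copy is left untouched
    fi
}
```

For example, a script run from cron could call fetch_latest /var/tmp/example-latest.html https://example.com/. The destination path is only an illustration. Downloading to a temporary file first means a failed run cannot leave an empty file in place of the last good copy.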

For production systems, consider these additional best practices:

*/5 * * * * /usr/bin/wget --user-agent="Cron-job" --tries=1 --timeout=30 -q -O /dev/null https://example.com/ping

This configuration includes:

  • Full path to wget
  • Custom user agent for identification
  • Timeout and retry settings
  • Silent operation with output to /dev/null

For simple health checks, consider these alternatives to wget:

# Using curl
*/5 * * * * curl -s -o /dev/null https://example.com

# Using httping
*/5 * * * * httping -c 1 -q https://example.com > /dev/null

Let's look more closely at why the redirection does not stop wget from creating index.html, and at a few more ways to prevent it.

The key point is that wget's download and its console output are two separate things. When you run:

wget mysite.com > /dev/null 2>&1

the shell sets up the redirection before starting wget. That happens the same way under cron, which hands each command line to /bin/sh (or the shell named by SHELL in the crontab). The redirection discards only wget's standard output and standard error. The downloaded page is still saved to index.html in the current directory, because that is wget's default behavior. Under cron, the current directory is normally the user's home directory.
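One way to confirm where a cron job's relative files end up is to mimic cron's sparse environment from an interactive shell. The sketch below is only an approximation. It assumes a cron daemon that starts jobs in $HOME with a minimal PATH; the exact variables differ between cron implementations and distributions. cron_like is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run a command roughly the way cron would. Assumption: the
# daemon starts jobs in $HOME with a minimal environment; the exact
# variables vary between cron implementations.
# cron_like is a hypothetical helper name.

cron_like() {
    env -i HOME="$HOME" LOGNAME="$(id -un)" SHELL=/bin/sh \
        PATH=/usr/bin:/bin \
        /bin/sh -c "cd \"\$HOME\" && $1"
}

# Shows the working directory, and therefore where a bare
# "wget example.com" would save index.html.
cron_like 'pwd'
```

With wget installed and network access, running cron_like 'wget example.com > /dev/null 2>&1' should leave index.html in the home directory, just as the real cron job does.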

Here are several professional-grade approaches to solve this issue:

1. Explicit Output Control with wget Flags

The cleanest solution is using wget's built-in output control:

*/5 * * * * wget -q -O /dev/null mysite.com

Key flags:
- -q: Quiet mode (suppresses output)
- -O /dev/null: Explicit output destination

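2. Send the Body to Standard Output

Wrapping the command in /bin/sh -c does not change anything, because cron already runs it through a shell. What does work is telling wget to write the document to standard output with -O -, so the existing redirection discards it:

*/5 * * * * wget -O - mysite.com > /dev/null 2>&1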

3. Using curl Instead

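curl writes the response body to standard output by default, so it saves nothing to disk unless asked to. Here, -s hides the progress meter and -o /dev/null discards the body: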

*/5 * * * * curl -s -o /dev/null mysite.com

For production monitoring scenarios, consider these enhancements:

*/5 * * * * wget -q -O /dev/null --method=HEAD mysite.com || service passenger restart

This version:
- Uses HEAD method to reduce bandwidth
- Restarts Passenger if the request fails (wget exits with a non-zero status)
- Still prevents file creation
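When the check-and-restart logic outgrows a single crontab line, it can move into a small script. The sketch below uses a hypothetical function, check_and_restart, that runs a check command and runs a recovery command only if the check fails. Both are passed in as strings, so the logic can be exercised without a network. In practice, they would be the wget HEAD request and the service restart from the crontab line above.

```shell
#!/bin/sh
# Sketch: run a health check and, only if it fails, a recovery command.
# check_and_restart is a hypothetical helper; both arguments are shell
# command strings.

check_and_restart() {
    check_cmd=$1
    restart_cmd=$2

    # Healthy: the check exited 0, so there is nothing to do.
    if sh -c "$check_cmd" > /dev/null 2>&1; then
        return 0
    fi

    # Note the failure in syslog if logger exists; ignore logger errors.
    if command -v logger > /dev/null 2>&1; then
        logger -t healthcheck "check failed: $check_cmd" 2> /dev/null
    fi

    # Return the recovery command's exit status to the caller.
    sh -c "$restart_cmd"
}

# Example (values taken from the crontab line above; adjust as needed):
# check_and_restart 'wget -q -O /dev/null --method=HEAD mysite.com' \
#                   'service passenger restart'
```

Keeping the restart step's exit status as the function's return value lets cron's mail or a calling script notice when recovery itself fails.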