How to Pipe Output to a Command That Expects a Filename in Linux/Bash


We've all encountered this situation: you need to feed data to a command-line tool that stubbornly insists on reading from a file rather than standard input. The classic example is when you want to monitor progress with pv while processing a file:

# What we WANT to do (but can't, because foo expects a filename):
pv large_file.log | foo

The target program (a PHP script, for example) might open its input by name, or rely on file-specific operations like fseek() or ftell(), so you cannot simply pipe data into its standard input.

Linux provides special files in the /dev directory that can solve this exact problem:

pv large_file.log | foo /dev/stdin

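This works because, on Linux, /dev/stdin is a symbolic link to /proc/self/fd/0, the current process's standard input (file descriptor 0). When foo opens this "file", it is actually reading from the pipe. Sequential reads work, but the data still comes from a pipe, so a program that calls fseek() will still fail.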
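To try this without a real foo, here is a minimal sketch. needs_filename is a hypothetical stand-in for a tool that only accepts a path argument, and the example assumes a system that provides /dev/stdin.

```shell
#!/usr/bin/env bash
# needs_filename is a hypothetical stand-in for "foo":
# it refuses to run unless it is given exactly one path argument.
needs_filename() {
    if [ "$#" -ne 1 ]; then
        echo "usage: needs_filename FILE" >&2
        return 2
    fi
    # Open the path we were given and count its lines.
    wc -l < "$1" | tr -d ' '
}

# The pipe feeds stdin; /dev/stdin gives that stream a filename.
line_count=$(printf 'one\ntwo\nthree\n' | needs_filename /dev/stdin)
echo "lines read through /dev/stdin: $line_count"
```

The function never learns that its "file" is a pipe; it just opens the path it was handed.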

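Bash's process substitution achieves the same thing without routing the data through standard input, which helps when the command needs its stdin for something else or takes several input files:

foo <(pv large_file.log)

Bash replaces <(...) with a path such as /dev/fd/63 (or a named pipe on systems without /dev/fd) that foo can open like a file while pv feeds it in the background. The path still refers to a pipe, not a regular file, so it cannot be rewound or seeked.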
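A quick way to check what a command actually receives from process substitution is to inspect the argument. In this sketch, describe_arg is a hypothetical helper, not part of any tool; the exact path printed depends on the system.

```shell
#!/usr/bin/env bash
# describe_arg is a hypothetical helper that reports what kind of
# file its argument is, the way a picky command might check it.
describe_arg() {
    if [ -p "$1" ]; then
        echo "pipe"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

# Print the path bash substitutes for <(...), typically /dev/fd/N on Linux.
substituted_path=$(echo <(true))
echo "path handed to the command: $substituted_path"

# Classify the same kind of argument.
arg_kind=$(describe_arg <(printf 'data\n'))
echo "kind of file: $arg_kind"
```

On Linux the argument is classified as a pipe, which is why commands that need a regular file are not helped by process substitution.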

Some programs make multiple passes over the input, seek within it, or memory-map it. None of the pipe-based approaches support that, so in these cases you need a real file:

# Using a temporary file (when multiple reads or seeking are needed)
pv large_file.log > tempfile && foo tempfile; rm -f tempfile

# Same idea with a uniquely named file from mktemp
tmp=$(mktemp) && pv large_file.log > "$tmp" && foo "$tmp"; rm -f "$tmp"

Each approach has trade-offs:

  • The /dev/stdin approach has minimal overhead, but the data still arrives through a pipe
  • Process substitution also uses a pipe, with the kernel's pipe buffering
  • Temporary files cost disk I/O but allow random access
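The difference between a pipe and a real file shows up as soon as a program reads its input twice. The following is a small sketch of that; two_pass is a hypothetical stand-in for such a program, and the example assumes /dev/stdin and mktemp are available.

```shell
#!/usr/bin/env bash
# two_pass is a hypothetical stand-in for a tool that reads its input twice:
# once to count lines, once more to fetch the first line.
two_pass() {
    local lines first
    lines=$(wc -l < "$1" | tr -d ' ')   # first pass
    first=$(head -n 1 "$1")             # second pass
    printf '%s|%s\n' "$lines" "$first"
}

# Through a pipe, the first pass drains the data and the second finds nothing.
via_pipe=$(printf 'a\nb\n' | two_pass /dev/stdin)

# Through a temporary file, both passes see the full content.
tmp=$(mktemp) || exit 1
printf 'a\nb\n' > "$tmp"
via_file=$(two_pass "$tmp")
rm -f "$tmp"

echo "pipe: $via_pipe"
echo "file: $via_file"
```

On a Linux system the pipe run reports 2| (the first line is lost) while the file run reports 2|a.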

Here's how I recently processed a 10GB CSV file while monitoring progress:

pv huge_data.csv | csvtool --file /dev/stdin process-each-row

Many Linux commands and scripts are designed to work with files rather than standard input. This becomes problematic when you want to:

  • Pipe data from one command to another
  • Monitor progress with tools like pv
  • Process transformed data without creating temporary files

Most Unix-like systems provide special device files that can solve this problem:

# Basic usage with /dev/stdin
command_generating_output | target_command /dev/stdin

# Practical example with pv
pv large_file.log | processing_script /dev/stdin

For commands that strictly require actual filenames, consider these solutions:

Process Substitution

# For commands that need multiple inputs
diff <(command1) <(command2)

# With progress monitoring
process_file <(pv input_file)
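As a concrete version of the multiple-input case, here is a sketch that compares two unsorted lists without creating any temporary files. The sample data is made up for illustration.

```shell
#!/usr/bin/env bash
# diff needs two filenames; each <(...) supplies one, fed by its own pipeline.
# diff exits non-zero when the inputs differ, so "|| true" keeps the script going.
differences=$(diff <(printf 'b\na\nc\n' | sort) \
                   <(printf 'a\nc\nd\n' | sort) || true)

printf '%s\n' "$differences"
```

Lines starting with "<" appear only in the first list, and lines starting with ">" only in the second.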

Named Pipes (FIFOs)

mkfifo my_pipe
command1 > my_pipe &
command2 my_pipe
rm my_pipe
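A slightly more defensive variant of the same pattern might look like this. It is a sketch, with printf and cat standing in for command1 and command2, and it puts the FIFO in a private temporary directory so the name cannot clash with an existing file.

```shell
#!/usr/bin/env bash
# Create the FIFO inside a fresh temporary directory.
workdir=$(mktemp -d) || exit 1
fifo="$workdir/my_pipe"
mkfifo "$fifo"

# The writer runs in the background; opening a FIFO for writing
# blocks until a reader opens the other end.
printf 'alpha\nbeta\n' > "$fifo" &
writer_pid=$!

# cat stands in for command2: it is handed the FIFO's name, not stdin.
received=$(cat "$fifo")

wait "$writer_pid"     # collect the background writer
rm -rf "$workdir"      # remove the FIFO and its directory
printf '%s\n' "$received"
```

Unlike process substitution, the FIFO has a stable name, so the writer and reader can be started from different shells.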

For PHP scripts that expect filenames, you might need to modify the script slightly:

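<?php
// Read from standard input when the argument is missing, '-', or '/dev/stdin';
// otherwise open the named file.
$path = $argv[1] ?? '-';
if ($path === '-' || $path === '/dev/stdin') {
    $handle = fopen('php://stdin', 'r');
} else {
    $handle = fopen($path, 'r');
}
// Stop early if the input could not be opened.
if ($handle === false) {
    fwrite(STDERR, "Cannot open input: $path\n");
    exit(1);
}
?>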

When processing large files:

  • /dev/stdin adds minimal overhead
  • Named pipes work well for inter-process communication
  • For maximum performance, consider modifying the target command to accept stdin directly

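/dev/stdin is not required by POSIX, but most Unix-like systems provide it. Equivalent paths include:

# On systems that provide /dev/fd (Linux, macOS, the BSDs)
command_generating_output | target_command /dev/fd/0

# Linux procfs path (what /dev/stdin resolves to on Linux; also available under WSL and Cygwin)
command_generating_output | target_command /proc/self/fd/0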
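If a script has to run on several systems, it can probe for whichever of these paths exists instead of hard-coding one. This is a rough sketch; find_stdin_path and read_via_any_path are hypothetical helper names, and cat stands in for the target command.

```shell
#!/usr/bin/env bash
# find_stdin_path is a hypothetical helper: it prints the first candidate
# path that exists on this system, or fails if none does.
find_stdin_path() {
    local candidate
    for candidate in /dev/stdin /dev/fd/0 /proc/self/fd/0; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# read_via_any_path passes the discovered path to cat as a filename.
read_via_any_path() {
    local path
    path=$(find_stdin_path) || return 1
    cat "$path"
}

echoed=$(printf 'hello\n' | read_via_any_path)
echo "$echoed"
```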