We've all encountered this situation: you need to feed data to a command-line tool that stubbornly insists on reading from a file rather than standard input. The classic example is when you want to monitor progress with pv while processing a file:
# What we WANT to do (but can't, because foo expects a filename):
pv large_file.log | foo
The PHP script (or any similar program) might take only a filename argument, or rely on file-specific operations like fseek() or ftell(), so a plain pipe into its standard input does not work.
Linux provides special files in the /dev directory that can solve this exact problem:
pv large_file.log | foo /dev/stdin
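This works because /dev/stdin is a symbolic link to the current process's standard input (file descriptor 0). When foo opens this "file", it's actually reading from the pipe. Because the data still arrives through a pipe, this helps programs that read sequentially; calls such as fseek() will still fail.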
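To see what a program actually gets when it opens /dev/stdin, here is a minimal sketch. The function name stdin_kind is hypothetical (it is not part of any tool), and printf stands in for pv so the snippet runs without extra packages.

```shell
# stdin_kind: hypothetical helper that reports what /dev/stdin refers to
# for the current process.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo "pipe"     # fed by another command; sequential reads only
    elif [ -f /dev/stdin ]; then
        echo "file"     # redirected from a regular file; seeking works
    else
        echo "other"    # terminal, socket, device, ...
    fi
}

# printf stands in for `pv large_file.log` here.
printf 'some data\n' | stdin_kind
```

Running the same function with input redirected from a regular file (stdin_kind < some_file) takes the second branch, which is the case where seeking would work.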
For commands that insist on a filename argument, shells such as bash and zsh also offer process substitution:
foo <(pv large_file.log)
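The shell replaces <(...) with a path such as /dev/fd/63 (or a named pipe on systems without /dev/fd). That path appears as a file to foo while being fed by pv in the background. It is still a pipe, not a regular file, so it can be read only once and cannot be seeked.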
Some programs may perform multiple passes over the input file or use memory mapping. In these cases, you might need:
# Using a temporary file (when multiple reads are needed)
pv large_file.log > tempfile && foo tempfile && rm tempfile
# Using mktemp for a unique, safely created temporary file (removed even if foo fails)
tmp=$(mktemp) && pv large_file.log > "$tmp" && foo "$tmp"; rm -f "$tmp"
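The temporary-file route can also be wrapped in a small helper so that it slots into a pipeline. This is only a sketch: spool_and_run is a made-up name, it assumes mktemp is available, and cat stands in for both pv and foo.

```shell
# spool_and_run CMD [ARGS...]: hypothetical helper.
# Copies stdin into a temporary regular file, runs CMD with that file's
# path appended as the last argument, then removes the file.
# The body runs in a subshell so its variables do not leak to the caller.
spool_and_run() (
    tmp=$(mktemp) || exit 1
    cat > "$tmp" && "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Stand-in usage; the real pipeline would be: pv large_file.log | spool_and_run foo
printf 'line 1\nline 2\n' | spool_and_run cat
```

The trade-off is that the command starts only after all input has been written to disk, so the whole stream must fit in the temporary directory.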
While convenient, remember that:
- The /dev/stdin approach has minimal overhead
- Process substitution creates actual pipes with some buffering
- Temporary files use disk I/O but allow random access
Here's how I recently processed a 10GB CSV file while monitoring progress:
pv huge_data.csv | csvtool --file /dev/stdin process-each-row
Many Linux commands and scripts are designed to work with files rather than standard input. This becomes problematic when you want to:
- Pipe data from one command to another
- Monitor progress with tools like pv
- Process transformed data without creating temporary files
Most Unix-like systems provide special device files that can solve this problem:
# Basic usage with /dev/stdin
command_generating_output | target_command /dev/stdin
# Practical example with pv
pv large_file.log | processing_script /dev/stdin
For commands that strictly require actual filenames, consider these solutions:
Process Substitution
# For commands that need multiple inputs
diff <(command1) <(command2)
# With progress monitoring
process_file <(pv input_file)
Named Pipes (FIFOs)
mkfifo my_pipe
command1 > my_pipe &
command2 my_pipe
rm my_pipe
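A slightly more defensive version of the same idea creates the FIFO in a private temporary directory and waits for the writer before cleaning up. In this sketch, printf stands in for command1 and grep -c '' (which counts lines) stands in for command2; the variable names are illustrative only.

```shell
# Create the FIFO inside a private temporary directory to avoid name clashes.
dir=$(mktemp -d) || exit 1
fifo="$dir/input.fifo"
mkfifo "$fifo" || exit 1

# Writer runs in the background; it blocks until a reader opens the FIFO.
printf 'one\ntwo\nthree\n' > "$fifo" &

# Reader opens the FIFO by name, just as it would open a regular file.
line_count=$(grep -c '' "$fifo")

wait              # let the background writer finish
rm -rf "$dir"     # remove the FIFO and its directory
echo "lines read: $line_count"
```

Starting the writer in the background matters: opening a FIFO blocks until the other end is opened too, so running both sides in the foreground one after the other would hang.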
For PHP scripts that expect filenames, you might need to modify the script slightly:
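<?php
// Treat a missing argument, '-', or '/dev/stdin' as "read from standard input"
$path = $argv[1] ?? '-';
if ($path === '-' || $path === '/dev/stdin') {
    $handle = fopen('php://stdin', 'r');
} else {
    $handle = fopen($path, 'r');
}
if ($handle === false) {
    fwrite(STDERR, "Cannot open input: $path\n");
    exit(1);
}
?>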
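When the script itself cannot be changed, the same "-" convention could be handled by a thin shell wrapper instead. The sketch below is an assumption rather than anything taken from the script: resolve_input is a hypothetical helper that maps "-" (or a missing argument) to /dev/stdin and passes any other path through unchanged.

```shell
# resolve_input [PATH]: hypothetical helper.
# Prints /dev/stdin for "-" or an empty/missing argument, otherwise PATH.
resolve_input() {
    case "${1:-}" in
        ''|-) printf '%s\n' /dev/stdin ;;
        *)    printf '%s\n' "$1" ;;
    esac
}

resolve_input -
resolve_input data.csv
```

A wrapper script could then invoke the real program with "$(resolve_input "$1")" as its filename argument.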
When processing large files:
- /dev/stdin adds minimal overhead
- Named pipes work well for inter-process communication
- For maximum performance, consider modifying the target command to accept stdin directly
While /dev/stdin works on most Unix systems, alternatives include:
# Equivalent path on most Unix-like systems (widely supported, though not required by POSIX)
command_generating_output | target_command /dev/fd/0
# Linux procfs path (also present under WSL and Cygwin)
command_generating_output | target_command /proc/self/fd/0
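A script that has to run on several systems could probe for whichever of these paths exists. This is a sketch under that assumption: stdin_path is a hypothetical helper name, and cat stands in for target_command.

```shell
# stdin_path: hypothetical helper that prints the first available
# file path referring to standard input, or fails if none exists.
stdin_path() {
    for candidate in /dev/stdin /dev/fd/0 /proc/self/fd/0; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# cat stands in for target_command; printf stands in for the producer.
printf 'hello\n' | cat "$(stdin_path)"
```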