When working with command-line data processing, we often need to extract statistical metrics from numerical outputs. Here's a concise way to compute the average, maximum, and minimum values from a stream of numbers in Bash.
your_command_producing_numbers | awk '
BEGIN {
sum = 0
count = 0
min = ""
max = ""
}
{
if (min == "" || $1 < min) min = $1
if (max == "" || $1 > max) max = $1
sum += $1
count++
}
END {
printf "Min: %.2f\nMax: %.2f\nAverage: %.2f\n", min, max, sum/count
}'
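To sanity-check the script, you can pipe in a few literal values; the expected output is shown in comments:
printf '4\n1\n7\n2\n' | awk '
{
    if (min == "" || $1 < min) min = $1
    if (max == "" || $1 > max) max = $1
    sum += $1
    count++
}
END {
    printf "Min: %.2f\nMax: %.2f\nAverage: %.2f\n", min, max, sum/count
}'
# Min: 1.00
# Max: 7.00
# Average: 3.50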
Let's say you're processing log files to extract response times:
grep "response_time" app.log | awk '{print $NF}' | sed 's/ms//' | \
awk 'BEGIN {sum=0; count=0; min=""; max=""}
{
if (min=="" || $1max) max=$1
sum+=$1; count++
}
END {
print "Statistics:"
printf "Min: %.2fms\nMax: %.2fms\nAvg: %.2fms\n", min, max, sum/count
}'
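Incidentally, the sed step can be folded into the awk script itself. Here's a minimal sketch, assuming the response time is the last field and carries a literal "ms" suffix:
grep "response_time" app.log | awk '
{
    v = $NF
    sub(/ms$/, "", v)   # strip the unit
    v += 0              # coerce to a number
    if (count == 0 || v < min) min = v
    if (count == 0 || v > max) max = v
    sum += v
    count++
}
END {
    printf "Min: %.2fms\nMax: %.2fms\nAvg: %.2fms\n", min, max, sum/count
}'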
For more complex statistics, consider these tools:
Using datamash:
your_command | datamash min 1 max 1 mean 1
Using R in the pipeline:
your_command | Rscript -e 'd <- scan("stdin"); summary(d)'
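GNU datamash can also emit several statistics in a single pass; the operations below (median, and sstdev for sample standard deviation) come from its standard operation set, and the results are printed tab-separated on one line:
your_command | datamash min 1 max 1 mean 1 median 1 sstdev 1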
The pure AWK solution is most efficient for large datasets as it processes data in a single pass with minimal memory overhead. The other methods may be more readable but have higher startup costs.
A second way to structure the same single-pass computation is to seed min and max from the first input record instead of using empty-string sentinels:
your_command | awk '
# awk has no "inf" literal ("-inf" would just negate an unset
# variable, i.e. 0), so initialize min and max from the first record
NR == 1 { min = $1; max = $1 }
{
    sum += $1
    count++
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
}'
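If you need this regularly, one option is to keep the awk program in a file and invoke it with -f (stats.awk is just an illustrative name):
# stats.awk
NR == 1 { min = $1; max = $1 }
{
    sum += $1
    count++
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
}
Then run it as your_command | awk -f stats.awk, or awk -f stats.awk data.txt to read from a file.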
For those preferring separate commands:
# Get count and sum for average
your_command | awk '{sum+=$1} END {print "Average:",sum/NR}'
# Get maximum value
your_command | sort -n | tail -1
# Get minimum value
your_command | sort -n | head -1
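These can also be wrapped in a small shell function for your .bashrc or .zshrc (the function name stats is arbitrary):
stats() {
    awk '
    NR == 1 { min = $1; max = $1 }
    {
        sum += $1
        count++
        if ($1 > max) max = $1
        if ($1 < min) min = $1
    }
    END {
        if (count > 0)
            printf "Min: %.2f\nMax: %.2f\nAvg: %.2f\n", min, max, sum/count
    }'
}
# usage: your_command | stats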
To make the solution more robust:
your_command | awk '
# only process non-empty lines whose first field looks like a number;
# min and max are seeded from the first valid record, so no "inf"
# sentinels are needed
NF && $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
    sum += $1
    count++
    if (count == 1) { max = $1; min = $1 }
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    if (count > 0) {
        printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
    } else {
        print "No valid numbers found"
    }
}'
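You can verify the filtering with deliberately messy input; non-numeric and empty lines are simply skipped:
printf 'abc\n3.5\n\n-2\nN/A\n10\n' | awk '
NF && $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
    sum += $1
    count++
    if (count == 1) { max = $1; min = $1 }
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    if (count > 0) {
        printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
    } else {
        print "No valid numbers found"
    }
}'
# Average: 3.83
# Max: 10.00
# Min: -2.00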
For large datasets, the single-pass awk solution is significantly faster than multiple sort operations. The awk script processes numbers in O(n) time while sort-based solutions require O(n log n) time.
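A rough way to see the difference yourself (timings vary by machine; nums.txt is a throwaway test file generated here):
# generate one million shuffled integers as test data
seq 1000000 | shuf > nums.txt

# single pass with awk
time awk 'NR==1{min=max=$1} {if($1>max)max=$1; if($1<min)min=$1} END{print min, max}' nums.txt

# min and max via sort (first and last line of the sorted output)
time sort -n nums.txt | sed -n '1p;$p'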
Processing server response times:
grep 'response_time' logfile | awk '{print $NF}' | awk '
# assumes the last field is a bare number (no "ms" suffix)
{
    sum += $1
    count++
    if (count == 1) { max = $1; min = $1 }
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    if (count > 0)
        printf "Min: %.2f\nMax: %.2f\nAvg: %.2f\n", min, max, sum/count
}'
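For instance, with a logfile containing lines like these (a made-up sample where the last field is a bare number):
2024-01-15 10:00:01 GET /api/users response_time 120.5
2024-01-15 10:00:02 GET /api/orders response_time 89.2
2024-01-15 10:00:03 GET /api/users response_time 245.0
the pipeline above prints:
Min: 89.20
Max: 245.00
Avg: 151.57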