How to Calculate Average, Max and Min from List of Numbers in Bash Pipeline



When working with command-line data processing, we often need to extract statistical metrics from numerical outputs. Here's a robust way to compute average, maximum, and minimum values from a stream of numbers in Bash.

your_command_producing_numbers | awk '
BEGIN {
    sum = 0
    count = 0
    min = ""
    max = ""
}
{
    if (min == "" || $1 < min) min = $1
    if (max == "" || $1 > max) max = $1
    sum += $1
    count++
}
END {
    printf "Min: %.2f\nMax: %.2f\nAverage: %.2f\n", min, max, sum/count
}'
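
For a quick sanity check, replace your_command_producing_numbers with something deterministic such as seq 1 5 (GNU coreutils); the script then prints:

Min: 1.00
Max: 5.00
Average: 3.00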

Let's say you're processing log files to extract response times:

grep "response_time" app.log | awk '{print $NF}' | sed 's/ms//' | \
awk 'BEGIN {sum=0; count=0; min=""; max=""} 
     {
         if (min=="" || $1<min) min=$1
         if (max=="" || $1>max) max=$1
         sum+=$1; count++
     } 
     END {
         print "Statistics:"
         printf "Min: %.2fms\nMax: %.2fms\nAvg: %.2fms\n", min, max, sum/count
     }'
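
To verify the whole pipeline end to end, you can feed it a tiny hand-made log (the file name and line format here are purely illustrative):

printf 'GET /a response_time 120ms\nGET /b response_time 80ms\n' > app.log

Running the pipeline above against this file prints:

Statistics:
Min: 80.00ms
Max: 120.00ms
Avg: 100.00ms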

For more complex statistics, consider these tools:

Using datamash

your_command | datamash min 1 max 1 mean 1
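
For example, assuming GNU datamash is installed, the output is a single tab-separated line:

$ seq 1 10 | datamash min 1 max 1 mean 1
1	10	5.5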

Using R in the Pipeline

your_command | Rscript -e 'd <- scan("stdin"); summary(d)'
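
This prints R's five-number summary plus the mean; for seq 1 10 the output looks like:

$ seq 1 10 | Rscript -e 'd <- scan("stdin"); summary(d)'
Read 10 items
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    3.25    5.50    5.50    7.75   10.00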

The pure AWK solution is most efficient for large datasets as it processes data in a single pass with minimal memory overhead. The other methods may be more readable but have higher startup costs.
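
If you want to measure the difference yourself, a rough (machine-dependent) comparison looks like this:

seq 1 1000000 > nums.txt
time awk 'NR==1{min=max=$1} {sum+=$1; if($1>max)max=$1; if($1<min)min=$1}
          END{printf "%.2f %.2f %.2f\n", sum/NR, max, min}' nums.txt
time datamash min 1 max 1 mean 1 < nums.txt
time sort -n nums.txt | tail -1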


Here's a more complete treatment of the same problem: calculating average, maximum, and minimum values from a stream of numbers in Bash, including input validation and performance notes.

The most efficient approach is a single awk pass:

your_command | awk '
BEGIN {
    sum = 0
    count = 0
}
{
    sum += $1
    count++
    if (count == 1) { max = $1; min = $1 }  # seed from the first value; awk has no "inf" literal
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
}'
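
Seeding min and max from the first record is deliberate: awk has no infinity literal, so an initialization like max = -inf just negates an unset variable (i.e. 0) and silently breaks on all-negative input. The seeded version handles it correctly:

$ printf '%s\n' -5 -2 -9 | awk '
{
    sum += $1; count++
    if (count == 1) { max = $1; min = $1 }
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END { printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min }'
Average: -5.33
Max: -2.00
Min: -9.00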

For those preferring separate commands:

# Get count and sum for average
your_command | awk '{sum+=$1} END {print "Average:",sum/NR}'

# Get maximum value
your_command | sort -n | tail -1

# Get minimum value
your_command | sort -n | head -1
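
For instance, with seq 3 7 standing in for your_command:

$ seq 3 7 | awk '{sum+=$1} END {print "Average:",sum/NR}'
Average: 5
$ seq 3 7 | sort -n | tail -1
7
$ seq 3 7 | sort -n | head -1
3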

To make the solution more robust:

your_command | awk '
BEGIN {
    sum = 0
    count = 0
}
NF && $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
    sum += $1
    count++
    if (count == 1) { max = $1; min = $1 }
    if ($1 > max) max = $1
    if ($1 < min) min = $1
}
END {
    if (count > 0) {
        printf "Average: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
    } else {
        print "No valid numbers found"
    }
}'
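
A quick check: piping deliberately messy input through the script above skips the blank and non-numeric lines (12.5, -3, and 42 pass the regex; foo and the empty line are ignored):

$ printf '12.5\nfoo\n\n-3\n42\n' | awk '...same robust script as above...'
Average: 17.17
Max: 42.00
Min: -3.00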

For large datasets, the single-pass awk solution is significantly faster than multiple sort operations. The awk script processes numbers in O(n) time while sort-based solutions require O(n log n) time.

Processing server response times:

grep 'response_time' logfile | awk '{print $NF}' | awk '
BEGIN {sum=count=0}
{
    v = $1 + 0                      # numeric coercion strips a trailing unit like "ms"
    sum += v; count++
    if (count==1) {max=v; min=v}
    if (v>max) max=v
    if (v<min) min=v
}
END {
    if (count>0) printf "Avg: %.2f\nMax: %.2f\nMin: %.2f\n", sum/count, max, min
}'
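
One detail worth knowing: awk's numeric coercion uses only the leading numeric prefix of a string, which is why v = $1 + 0 turns a field like "142ms" into 142 and no separate sed step is needed here. A quick way to convince yourself:

$ printf '142ms\n98ms\n' | awk '{print $1 + 0}'
142
98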