Apache Log Analysis: Measuring Requests Per Second (req/sec) for Production Stress Testing


When analyzing Apache 2.2 logs for request rate measurement, the key fields in your LogFormat are:

%t - Time when request was received (for request counting)
%D - Time taken to process request in microseconds (for performance analysis)

Here's a practical approach using common Unix tools:

# Count requests per second from access.log
# (split on the brackets around %t, then drop the timezone offset)
awk -F'[][]' '{sub(/ .*/, "", $2); print $2}' access.log | sort | uniq -c

# Example output:
# 42 02/Nov/2023:15:00:01
# 38 02/Nov/2023:15:00:02
# 45 02/Nov/2023:15:00:03

To smooth out per-second noise, aggregate over fixed intervals instead:

#!/bin/bash
# Analyze req/sec in 10-second buckets
# (bucket key = full timestamp with the seconds rounded down to a multiple of 10)
awk -F'[][]' '{bucket=substr($2,1,19) "0"; counts[bucket]++}
END {for (b in counts) print b" (+10s): "counts[b]/10" req/sec"}' access.log | sort

To correlate request rate with performance (%D field):

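# Generate req/sec with average response time
# Assumes the LogFormat with a leading %v:%p field, so %t starts in $5 and %D is the last field
awk '{
   sec=substr($5,2);          # strip the leading "[" from the timestamp
   count[sec]++;
   total_time[sec]+=$NF;      # %D, in microseconds
}
END {
   for (s in count) {
      printf "%s: %d req/sec (avg %d μs)\n",
             s, count[s], total_time[s]/count[s]
   }
}' access.log | sort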

For continuous monitoring that keeps following the file across log rotation:

# Real-time monitoring using tail (-F reopens the file after rotation)
tail -F /var/log/apache2/access.log | awk -F'[][]' '{
   current_second=substr($2,13,8);   # HH:MM:SS portion of %t
   if (current_second != last_second) {
      if (last_second != "") print last_second": "count" req/sec";
      count=0;
      last_second=current_second;
   }
   count++;
}'

For longer-term analysis, consider this R approach:

library(ggplot2)
logs <- read.table("access.log", sep=" ", quote="\"", comment.char="",
           col.names=c("vhost_port","ip","ident","user","time","tz","request",
                       "status","size","referer","ua","latency"))
logs$time <- as.POSIXct(logs$time, format="[%d/%b/%Y:%H:%M:%S")
logs$second <- format(logs$time, "%H:%M:%S")

req_rate <- aggregate(request~second, data=logs, FUN=length)
ggplot(req_rate, aes(x=second, y=request)) + 
   geom_line(group=1) + 
   labs(title="Requests per Second", y="req/sec")

The key to measuring requests per second lies in properly configuring your Apache log format. Based on your example, you've already included the crucial elements:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D" combined

The relevant parameters for RPS calculation are:

  • %t: Time when the request was received
  • %D: Time taken to serve the request in microseconds
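
To make these two fields concrete, here is a minimal Python sketch that pulls %t and %D out of a single log line. The sample line is invented for illustration and assumes the LogFormat shown above, where %D is the last field; parse_line is a hypothetical helper name, not part of any library.

```python
from datetime import datetime

# Invented sample line matching the LogFormat above (hypothetical host, IP, values).
SAMPLE = ('example.com:80 203.0.113.7 - - [02/Nov/2023:15:00:01 +0000] '
          '"GET /index.html HTTP/1.1" 200 5120 "-" "curl/8.0" 1834')


def parse_line(line):
    """Return (timestamp, duration_us) for one access-log line."""
    # %t is the first bracketed field, so slice between "[" and the next "]".
    start = line.index('[') + 1
    end = line.index(']', start)
    timestamp = datetime.strptime(line[start:end], '%d/%b/%Y:%H:%M:%S %z')
    # %D is the last whitespace-separated field, in microseconds.
    duration_us = int(line.rsplit(None, 1)[1])
    return timestamp, duration_us


if __name__ == '__main__':
    ts, us = parse_line(SAMPLE)
    print(ts.isoformat(), f'{us / 1000:.3f} ms')
```

Note that %b parsing depends on the process locale, so this assumes English month abbreviations.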

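Here's a simple AWK script to calculate RPS from your access logs (it needs gawk, since mktime and strftime are gawk extensions):

# Calculate requests per second from Apache logs
gawk -F'[][]' '
BEGIN {
    # mktime() expects "YYYY MM DD HH MM SS", so map month names to numbers
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
    for (i = 1; i <= 12; i++) month[names[i]] = i
}
{
    # $2 looks like "02/Nov/2023:15:00:01 +0000"; split on "/", ":" and space
    split($2, p, "[/: ]")
    timestamp = mktime(p[3] " " month[p[2]] " " p[1] " " p[4] " " p[5] " " p[6])
    counts[timestamp]++
}
END {
    for (ts in counts) {
        printf "%s: %d requests\n", strftime("%Y-%m-%d %H:%M:%S", ts), counts[ts]
    }
}' /var/log/apache2/access.log | sort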

For more sophisticated analysis, here's a Python script that counts requests per minute and reports each minute's average req/s:

import re
from collections import defaultdict
from datetime import datetime

log_pattern = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]+)" '
    r'(?P<status>\d+) '
    r'(?P<size>\d+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)" '
    r'(?P<time_taken>\d+)'
)

def calculate_rps(logfile_path):
    time_counts = defaultdict(int)
    
    with open(logfile_path) as f:
        for line in f:
            match = log_pattern.search(line)
            if match:
                log_data = match.groupdict()
                timestamp = datetime.strptime(
                    log_data['timestamp'], 
                    '%d/%b/%Y:%H:%M:%S %z'
                )
                time_key = timestamp.strftime('%Y-%m-%d %H:%M:00')
                time_counts[time_key] += 1
    
    for minute, count in sorted(time_counts.items()):
        print(f"{minute}: {count} requests ({count/60:.2f} req/s)")

if __name__ == "__main__":
    calculate_rps('/var/log/apache2/access.log')
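
Per-minute averages can hide short bursts, which matter when sizing a stress test. The following is a minimal sketch, assuming the same log format, that counts requests per second and summarizes the result. The names per_second_counts and summarize are hypothetical, the demo lines are invented, and seconds with no requests are not counted as zeros.

```python
import re
import statistics
from collections import Counter

# Captures the per-second part of %t, e.g. "02/Nov/2023:15:00:01".
TIMESTAMP_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) ')


def per_second_counts(lines):
    """Count requests for each distinct second seen in the log lines."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(counts):
    """Return peak, mean, and median req/s over seconds that had traffic."""
    values = sorted(counts.values())
    if not values:
        return {}
    return {
        'peak': values[-1],
        'mean': statistics.mean(values),
        'median': statistics.median(values),
    }


if __name__ == '__main__':
    # Invented demo lines; in practice pass an open file object instead.
    demo = [
        'h:80 198.51.100.1 - - [02/Nov/2023:15:00:01 +0000] "GET / HTTP/1.1" 200 1 "-" "x" 10',
        'h:80 198.51.100.1 - - [02/Nov/2023:15:00:01 +0000] "GET / HTTP/1.1" 200 1 "-" "x" 12',
        'h:80 198.51.100.1 - - [02/Nov/2023:15:00:02 +0000] "GET / HTTP/1.1" 200 1 "-" "x" 11',
    ]
    print(summarize(per_second_counts(demo)))
```

Because any iterable of lines works, a real log can be processed with something like summarize(per_second_counts(open(path))).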

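For a ready-made solution, GoAccess can parse the same log. Note that it uses its own format specifiers rather than Apache's:

goaccess /var/log/apache2/access.log --log-format='%v:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u" %D' --date-format='%d/%b/%Y' --time-format='%H:%M:%S'

This opens an interactive terminal report. Its time-distribution panel breaks hits down by hour, and with %D in the format it also shows time-served figures. For per-second rates, use one of the scripts above.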

When comparing your stress test results with production logs:

  1. Filter logs to match your test time window
  2. Compare the RPS distribution with your test plan
  3. Check for anomalies in response times (%D) during peak periods
  4. Verify that response codes (%>s) remain stable
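
The checklist above can be sketched in Python as follows. This is a rough illustration, assuming the LogFormat shown earlier with %D as the last field. The function name window_stats, the demo lines, and the choice of a nearest-rank 95th percentile are assumptions made for the example.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Timestamp, status code, and trailing %D value (microseconds).
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) .* (?P<us>\d+)\s*$')


def window_stats(lines, start, end):
    """Summarize requests whose %t falls in [start, end).

    start and end must be timezone-aware datetimes with end > start.
    """
    statuses = Counter()
    durations = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m['ts'], '%d/%b/%Y:%H:%M:%S %z')
        if start <= ts < end:
            statuses[m['status']] += 1
            durations.append(int(m['us']))
    durations.sort()
    total = len(durations)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * total), 1-based.
    rank = (95 * total + 99) // 100
    return {
        'requests': total,
        'avg_rps': total / (end - start).total_seconds(),
        'statuses': dict(statuses),
        'p95_us': durations[rank - 1] if total else None,
    }


if __name__ == '__main__':
    # Invented demo lines; in practice pass an open file object instead.
    demo = [
        'h:80 198.51.100.1 - - [02/Nov/2023:15:00:01 +0000] "GET / HTTP/1.1" 200 10 "-" "x" 100',
        'h:80 198.51.100.1 - - [02/Nov/2023:15:00:02 +0000] "GET / HTTP/1.1" 500 10 "-" "x" 300',
    ]
    begin = datetime(2023, 11, 2, 15, 0, 0, tzinfo=timezone.utc)
    finish = datetime(2023, 11, 2, 15, 0, 10, tzinfo=timezone.utc)
    print(window_stats(demo, begin, finish))
```

Running it once over the production window and once over the stress-test window gives two summaries that can be compared field by field.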