While tools like GoAccess and AWStats provide comprehensive reports, sometimes you need surgical precision when troubleshooting live server issues. Text processing utilities give you real-time insights without configuration overhead.
This classic pipeline identifies top requesters for specific resources:
grep 'GET /admin.php' access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10
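For reference, here's how awk's default whitespace splitting numbers the fields that these one-liners rely on. The sample line is made up; numbering shifts if your LogFormat differs from the common combined format:

```shell
# Pull out the fields used throughout this thread from a sample line
printf '%s\n' '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /admin.php HTTP/1.1" 200 4523 "-" "curl/8.0"' |
  awk '{print "$1 =", $1; print "$6 =", $6; print "$7 =", $7; print "$9 =", $9}'
# $1 = 203.0.113.7   (client IP)
# $6 = "GET          (method, with the leading quote attached)
# $7 = /admin.php    (path)
# $9 = 200           (status)
```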
Bucket traffic by hour and client IP with time-based keys:
awk -F'[ :]' '{print $4":"$5"\t"$1}' access.log | sort | uniq -c | sort -nr
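Run against a few invented lines (two hits in hour 13, one in hour 14), the bucketing works like this:

```shell
printf '%s\n' \
  '203.0.113.7 - - [10/Oct/2024:13:01:02 +0000] "GET / HTTP/1.1" 200 1 "-" "curl/8.0"' \
  '203.0.113.7 - - [10/Oct/2024:13:30:00 +0000] "GET / HTTP/1.1" 200 1 "-" "curl/8.0"' \
  '203.0.113.7 - - [10/Oct/2024:14:00:01 +0000] "GET / HTTP/1.1" 200 1 "-" "curl/8.0"' |
  awk -F'[ :]' '{print $4":"$5"\t"$1}' | sort | uniq -c | sort -nr
# counts 2 for [10/Oct/2024:13 and 1 for [10/Oct/2024:14, each tagged with the IP
```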
Identify suspicious user agents with high request rates:
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -nr | head -20
Create a near-real-time status-code dashboard. Note that piping tail -f straight into sort never prints anything, because sort must read its entire input before it can emit a line; re-running the count under watch works instead:
watch -n 2 "tail -n 2000 access.log | awk '{print \$9}' | sort | uniq -c | sort -rn"
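Alternatively, awk can keep the running totals itself and redraw on every incoming line, which avoids buffering the stream through sort (a sketch; the escape sequence is the ANSI clear-screen code):

```shell
# Running per-status totals, redrawn as each new line arrives
tail -f access.log | awk '{
  count[$9]++
  printf "\033[H\033[2J"
  for (c in count) printf "%7d %s\n", count[c], c
}'
```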
Parse complex log formats with a custom field-separator class. The trailing + collapses runs of separators, so the bracketed date and quoted request don't produce empty fields; this prints client IP, method, path, and status:
awk -F'[ "[]+' '{print $1, $6, $7, $9}' access.log | column -t
Chain conditions to find 404s on POST requests (the opening quote of the request is part of $6, hence the escaped \"POST):
awk '$9 == 404 && $6 == "\"POST"' access.log | less
Enhance your analysis with geolocation data:
awk '{print $1}' access.log | sort -u | xargs -n1 geoiplookup | sort | uniq -c | sort -nr
While comprehensive log analyzers like GoAccess or AWStats are great for historical analysis, sometimes you need immediate insights from your Apache logs. Here are my go-to command-line solutions for real-time log parsing.
For quick request counting:
# Count total requests
grep -c "" access.log
# Count requests by HTTP method (tr strips the quote that precedes the method in $6)
awk '{print $6}' access.log | tr -d '"' | sort | uniq -c
# Count requests by status code
awk '{print $9}' access.log | sort | uniq -c | sort -rn
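As a sanity check, here's the status-code count run against three made-up log lines:

```shell
printf '%s\n' \
  '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"' \
  '203.0.113.7 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 196 "-" "curl/8.0"' \
  '198.51.100.2 - - [10/Oct/2024:13:55:38 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"' |
  awk '{print $9}' | sort | uniq -c | sort -rn
# counts 2 for status 200 and 1 for status 404
```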
Building on your example, here's a more robust version:
# Top 10 IPs accessing specific path (case insensitive)
grep -iE "/path/to/resource" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 10
# With full request details
grep -iE "/path/to/resource" access.log | awk '{print $1,$6,$7,$9}' | sort | uniq -c | sort -rn | head
For analyzing recent activity (last 15 minutes). The string comparison on $4 is safe within a calendar month, but misfires across month or year boundaries because the month name sits in the middle of the timestamp:
# Using GNU date (Linux)
awk -v d1="$(date --date="-15 min" "+[%d/%b/%Y:%H:%M:%S")" -v d2="$(date "+[%d/%b/%Y:%H:%M:%S")" '$4 >= d1 && $4 <= d2' access.log
# For BSD/MacOS
awk -v d1="$(date -v-15M "+[%d/%b/%Y:%H:%M:%S")" -v d2="$(date "+[%d/%b/%Y:%H:%M:%S")" '$4 >= d1 && $4 <= d2' access.log
Detecting suspicious patterns:
# Find SQL injection attempts
grep -iE "union.*select|1=1|' OR" access.log
# Find common exploit paths
grep -iE "(phpmyadmin|wp-admin|\.env|\.git)" access.log
# Find slow requests (>5 s; assumes your LogFormat appends %T, the request duration in seconds, as the last field)
awk '$NF > 5' access.log
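If your LogFormat records %D (microseconds) instead of %T, scale the value in awk. Like the line above, this assumes the duration is the last field:

```shell
# Requests slower than 5 s, duration converted from microseconds
awk '$NF > 5000000 {printf "%.2fs  %s %s\n", $NF / 1000000, $1, $7}' access.log
```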
# Top user agents
awk -F\" '{print $6}' access.log | sort | uniq -c | sort -rn | head -n 20
# Bot traffic only
grep -i "bot" access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -rn
For complex analysis:
# Requests from specific IP with 4xx/5xx errors
awk '$1 == "192.168.1.100" && $9 ~ /^[45]/' access.log
# POST requests to API endpoints returning errors
grep "POST /api/" access.log | awk '$9 >= 400 {print $1,$7,$9}' | sort | uniq -c | sort -rn
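One awk regex gotcha when matching status classes: alternation binds loosely, so /^4|5/ parses as (^4)|(5) and matches any code that merely contains a 5. A bracket class anchors both digits:

```shell
printf '%s\n' 200 250 404 503 | awk '/^4|5/'     # 250, 404, 503: the 5 branch is unanchored
printf '%s\n' 200 250 404 503 | awk '/^[45]/'    # 404, 503: true 4xx/5xx only
```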
For continuous monitoring, use watch. Single-quote the command so the $(date ...) substitution is re-evaluated on every refresh rather than once when you hit enter (otherwise the hour filter goes stale):
# Refresh top IPs for the current hour every 2 seconds
watch -n 2 'grep "$(date +%d/%b/%Y:%H)" access.log | awk "{print \$1}" | sort | uniq -c | sort -rn | head'