Powerful AWK & Grep One-Liners for Real-Time Apache Log Analysis


While tools like GoAccess and AWStats provide comprehensive reports, sometimes you need surgical precision when troubleshooting live server issues. Text processing utilities give you real-time insights without configuration overhead.

This classic pipeline identifies top requesters for specific resources:

grep 'GET /admin.php' access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10

Track traffic patterns per hour and client IP with time-based bucketing:

awk -F'[ :]' '{print $4":"$5"\t"$1}' access.log | sort | uniq -c | sort -nr
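
If you only want a per-hour request count without the per-IP breakdown, a minimal sketch assuming the standard combined format, where $4 holds the bracketed timestamp:

awk '{print substr($4, 2, 14)}' access.log | sort | uniq -c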

Identify suspicious user agents with high request rates:

awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -nr | head -20
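
A related sketch: count requests that arrive with no user agent at all, which are almost always scripted clients (the "-" test assumes Apache's convention of logging a missing user agent as a dash):

awk -F'"' '$6 == "-" || $6 == ""' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head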

Build a self-refreshing status-code dashboard. Note that sort cannot emit anything until its input ends, so piping tail -f into it never prints; wrap a bounded tail in watch instead:

watch -n 5 "tail -n 2000 access.log | awk '{print \$9}' | sort | uniq -c | sort -rn"
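
If you prefer a genuinely streaming view, a rough sketch for gawk or mawk that redraws a status-code table as each line arrives (the counts array and the clear-screen escape are illustrative choices, not part of the original pipeline):

tail -f access.log | awk '{
    counts[$9]++                      # running total per status code
    printf "\033[H\033[2J"            # clear the terminal before redrawing
    for (s in counts) printf "%7d  %s\n", counts[s], s
    fflush()                          # push output immediately
}'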

Parse the combined log format with a custom field separator, pulling out IP, timestamp, method, path, and status in one pass:

awk -F'[][ "]+' '{print $1, $4, $6, $7, $9}' access.log | column -t

Chain conditions to find 404s on POST requests:

awk '$9 == 404 && $6 == "\"POST"' access.log | less
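
To see which missing resources are requested most often (a hedged variant of the same idea, assuming $7 is the request path):

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head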

Enhance your analysis with geolocation data (requires the geoiplookup utility from the legacy GeoIP package):

awk '{print $1}' access.log | sort -u | xargs -n1 geoiplookup | sort | uniq -c | sort -nr
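
To focus the lookups on clients that are actually causing trouble, a sketch with the same geoiplookup dependency, restricted to 4xx/5xx responses:

awk '$9 ~ /^[45]/ {print $1}' access.log | sort -u | xargs -n1 geoiplookup | sort | uniq -c | sort -nr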

While comprehensive log analyzers like GoAccess or AWStats are great for historical analysis, sometimes you need immediate insights from your Apache logs. Here are my go-to command-line solutions for real-time log parsing.

For quick request counting:

# Count total requests
grep -c "" access.log

# Count requests by HTTP method (strip the leading quote from $6)
awk '{print $6}' access.log | tr -d '"' | sort | uniq -c

# Count requests by status code
awk '{print $9}' access.log | sort | uniq -c | sort -rn
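
A related quick check is total bandwidth served; this sketch assumes the combined format, where $10 is the response size in bytes (logged as "-" when empty, which awk treats as 0):

# Total bytes served, reported in MB
awk '{sum += $10} END {printf "%.1f MB\n", sum / 1048576}' access.log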

Building on your example, here's a more robust version:

# Top 10 IPs accessing specific path (case insensitive)
grep -iE "/path/to/resource" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 10

# With full request details
grep -iE "/path/to/resource" access.log | awk '{print $1,$6,$7,$9}' | sort | uniq -c | sort -rn | head

For analyzing recent activity (last 15 minutes):

# Using GNU date (Linux)
awk -v d1="$(date --date="-15 min" "+[%d/%b/%Y:%H:%M:%S")" -v d2="$(date "+[%d/%b/%Y:%H:%M:%S")" '$4 >= d1 && $4 <= d2' access.log

# For BSD/MacOS
awk -v d1="$(date -v-15M "+[%d/%b/%Y:%H:%M:%S")" -v d2="$(date "+[%d/%b/%Y:%H:%M:%S")" '$4 >= d1 && $4 <= d2' access.log
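
One caveat: the comparisons above are plain string comparisons on the bracketed timestamp, which work while the window stays inside a single month but break across month or year boundaries. A hedged GNU awk sketch that converts the timestamp to epoch seconds instead:

# GNU awk only: keep lines from the last 15 minutes, compared as epoch seconds
gawk -v cutoff="$(date --date='-15 min' +%s)" '{
    split(substr($4, 2), t, /[\/:]/)      # day, month name, year, hh, mm, ss
    mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
    if (mktime(t[3] " " mon " " t[1] " " t[4] " " t[5] " " t[6]) >= cutoff) print
}' access.log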

Detecting suspicious patterns:

# Find SQL injection attempts
grep -iE "union.*select|1=1|' OR" access.log

# Find common exploit paths
grep -iE "(phpmyadmin|wp-admin|\.env|\.git)" access.log
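
# A follow-up sketch: count which client IPs are sending these probes,
# so you know what to rate-limit or block
grep -iE "(phpmyadmin|wp-admin|\.env|\.git)" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head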

# Find slow requests (assumes a LogFormat that appends the request time, e.g. %T in seconds, as the last field)
awk '$NF > 5 {print $0}' access.log

Analyzing user agents:

# Top user agents
awk -F\" '{print $6}' access.log | sort | uniq -c | sort -rn | head -n 20

# Bot traffic only
grep -i "bot" access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -rn
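
# The inverse view, approximating human traffic by excluding obvious crawlers
# (a sketch; the pattern list is illustrative and will not catch every bot)
grep -viE "bot|crawl|spider|slurp" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head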

For complex analysis:

# Requests from specific IP with 4xx/5xx errors
awk '$1 == "192.168.1.100" && ($9 ~ /^[45]/) {print $0}' access.log

# POST requests to API endpoints returning errors
grep "POST /api/" access.log | awk '$9 >= 400 {print $1,$7,$9}' | sort | uniq -c | sort -rn
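
Going one step further, a hedged awk sketch that reports error counts against total requests per path (assumes the combined format, with $7 the path and $9 the status):

# Errors vs. total requests per path
awk '{total[$7]++; if ($9 >= 400) err[$7]++} END {for (p in err) printf "%6d of %-6d %s\n", err[p], total[p], p}' access.log | sort -rn | head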

For continuous monitoring, use watch:

# Refresh top IPs for the current hour every 2 seconds
# (outer single quotes so $(date ...) is re-evaluated on every refresh, not just once at launch)
watch -n 2 'grep "$(date +%d/%b/%Y:%H)" access.log | awk "{print \$1}" | sort | uniq -c | sort -rn | head'
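
To complement the refreshing summary, a minimal streaming sketch that prints only error responses as they arrive (fflush keeps gawk/mawk from buffering if you pipe the output elsewhere):

# Stream 4xx/5xx responses live: client, path, status
tail -f access.log | awk '$9 ~ /^[45]/ {print $1, $7, $9; fflush()}'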