Powerful AWK & Grep One-Liners for Real-Time Apache Log Analysis


While tools like GoAccess and AWStats provide comprehensive reports, sometimes you need surgical precision when troubleshooting live server issues. Text processing utilities give you real-time insights without configuration overhead.

This classic pipeline identifies top requesters for specific resources:

grep 'GET /admin.php' access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10

Track traffic patterns by bucketing requests per hour and client IP:

awk -F'[ :]' '{print $4":"$5"\t"$1}' access.log | sort | uniq -c | sort -nr
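
For an hour-by-hour total without the per-IP breakdown, the same field split works with the client address dropped (this still assumes the default log format, where the timestamp is the fourth field):

awk -F'[ :]' '{print $4":"$5}' access.log | sort | uniq -c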

Identify suspicious user agents with high request rates:

awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -nr | head -20

Create a rolling status-code dashboard. Piping tail -f straight into sort never displays anything, because sort waits for end of input, so refresh a recent window with watch instead:

watch -n 5 "tail -n 2000 access.log | awk '{print \$9}' | sort | uniq -c | sort -rn"
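
For a genuinely streaming tally rather than a refreshed snapshot, an awk array can keep running counts as lines arrive (a sketch; it assumes the status code is the ninth field, as elsewhere in these examples):

tail -f access.log | awk '{codes[$9]++; printf "\033[H\033[2J"; for (c in codes) printf "%7d  %s\n", codes[c], c}'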

Parse the combined log format with a regex field separator to pull out the client IP, method, path, and status code as aligned columns:

awk -F'[ "[]' '{print $1,$8,$9,$12}' access.log | column -t

Chain conditions to find 404s on POST requests:

awk '$9 == 404 && $6 == "\"POST"' access.log | less
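
To see which paths those failed POSTs are aimed at, the same condition feeds the usual count-and-rank pipeline:

awk '$9 == 404 && $6 == "\"POST" {print $7}' access.log | sort | uniq -c | sort -nr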

Enhance your analysis with geolocation data:

awk '{print $1}' access.log | sort -u | xargs -n1 geoiplookup | sort | uniq -c | sort -nr

While comprehensive log analyzers like GoAccess or AWStats are great for historical analysis, sometimes you need immediate insights from your Apache logs. Here are my go-to command-line solutions for real-time log parsing.

For quick request counting:

# Count total requests
grep -c "" access.log

# Count requests by HTTP method
awk '{print $6}' access.log | tr -d '"' | sort | uniq -c

# Count requests by status code
awk '{print $9}' access.log | sort | uniq -c | sort -rn
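
If you want proportions rather than raw counts, a single awk pass can print percentages (again assuming the status code is field 9):

# Status code distribution as percentages
awk '{count[$9]++} END {for (c in count) printf "%s %d (%.1f%%)\n", c, count[c], 100*count[c]/NR}' access.log | sort -k2 -rn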

Building on your example, here's a more robust version:

# Top 10 IPs accessing specific path (case insensitive)
grep -iE "/path/to/resource" access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 10

# With full request details
grep -iE "/path/to/resource" access.log | awk '{print $1,$6,$7,$9}' | sort | uniq -c | sort -rn | head

For analyzing recent activity (last 15 minutes). The string comparison below is fine as long as the window doesn't cross midnight, since the timestamps only sort lexically within a single day:

# Using GNU date (Linux)
awk -v d1="$(date --date="-15 min" "+[%d/%b/%Y:%H:%M:%S")" -v d2="$(date "+[%d/%b/%Y:%H:%M:%S")" '$4 >= d1 && $4 <= d2' access.log

# For BSD/macOS
awk -v d1="$(date -v-15M "+[%d/%b/%Y:%H:%M:%S")" -v d2="$(date "+[%d/%b/%Y:%H:%M:%S")" '$4 >= d1 && $4 <= d2' access.log
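
If you're on GNU awk and need the window to survive a midnight boundary, converting the timestamp to epoch seconds with mktime avoids the string comparison entirely (a sketch, assuming the standard %t timestamp in field 4):

# GNU awk only: compare epoch seconds instead of strings
gawk -v cutoff="$(date --date='-15 min' +%s)" '{
    split(substr($4, 2), t, "[/:]")   # [10/Oct/2023:13:55:36 -> day, month, year, h, m, s
    m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
    if (mktime(t[3] " " m " " t[1] " " t[4] " " t[5] " " t[6]) >= cutoff) print
}' access.log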

Detecting suspicious patterns:

# Find SQL injection attempts
grep -iE "union.*select|1=1|' OR" access.log

# Find common exploit paths
grep -iE "(phpmyadmin|wp-admin|\.env|\.git)" access.log
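
To see which of those probe targets turns up most often, the same pattern works with grep -o so only the matched text is counted:

# Count hits per probe target (folded to lowercase so case variants group together)
grep -ioE "(phpmyadmin|wp-admin|\.env|\.git)" access.log | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn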

# Find slow requests (>5 seconds) - assumes your LogFormat appends the
# request duration (e.g. %T, in seconds) as the last field
awk '$NF > 5 {print $0}' access.log
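
If your LogFormat appends %D (microseconds) rather than %T, scale the threshold accordingly; for example:

# %D logs microseconds, so 5 seconds = 5000000
awk '$NF > 5000000 {print $1, $7, $NF/1000000 "s"}' access.log
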
# Top user agents
awk -F\" '{print $6}' access.log | sort | uniq -c | sort -rn | head -n 20

# Bot traffic only
grep -i "bot" access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -rn

For complex analysis:

# Requests from specific IP with 4xx/5xx errors
awk '$1 == "192.168.1.100" && $9 ~ /^[45]/ {print $0}' access.log

# POST requests to API endpoints returning errors
grep "POST /api/" access.log | awk '$9 >= 400 {print $1,$7,$9}' | sort | uniq -c | sort -rn
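
The same report can be done in awk alone, keyed on the request fields themselves rather than a substring anywhere in the line:

# Pure-awk version: method, path prefix, and status are all checked as fields
awk '$6 == "\"POST" && $7 ~ /^\/api\// && $9 >= 400 {print $1, $7, $9}' access.log | sort | uniq -c | sort -rn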

For continuous monitoring, use watch:

# Refresh top IPs for the current hour every 2 seconds
# (single-quote the command so the date is re-evaluated on each refresh)
watch -n 2 'grep "$(date +%d/%b/%Y:%H)" access.log | awk "{print \$1}" | sort | uniq -c | sort -rn | head'
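
If you'd rather watch throughput than top talkers, a small awk state machine over tail -f prints a running requests-per-second count (a sketch; it assumes the timestamp sits in field 4, as in the default log format):

# Print each second's request count as that second completes
tail -f access.log | awk '{ if ($4 != last) { if (last != "") print substr(last, 2), count; last = $4; count = 0 } count++ }'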