While tools like Google Analytics provide excellent user behavior tracking, they miss critical server-side data that's essential for infrastructure optimization:
- Exact file request frequencies (including static assets)
- Server-generated 404/5xx errors that client-side JavaScript never sees
- Request patterns for non-HTML resources (API endpoints, media files)
- Geographic distribution derived from client IP addresses
Here are three practical ways to extract meaningful insights from your Nginx logs:
1. Command Line Power Tools
Basic but effective for quick insights:
# Top requested files
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -20
# 404 errors with originating URLs
grep ' 404 ' /var/log/nginx/access.log | awk '{print $7, $11}' | sort | uniq -c | sort -nr
# Requests by HTTP status
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -nr
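Beyond request counts, the same awk approach can aggregate transfer volume. A sketch assuming the default combined log format, where field 7 is the request path and field 10 is $body_bytes_sent:

```shell
# Total bytes served per URL, largest first
awk '{ bytes[$7] += $10 } END { for (u in bytes) print bytes[u], u }' \
    /var/log/nginx/access.log | sort -nr | head -10
```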
2. GoAccess - Real-time Terminal Dashboard
Installation and basic usage:
# Install
sudo apt-get install goaccess
# Generate HTML report
zcat /var/log/nginx/access.log.*.gz | goaccess - -a -o report.html --log-format=COMBINED
# Real-time monitoring
goaccess /var/log/nginx/access.log -o /var/www/html/report.html --real-time-html
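If the persistent real-time process is overkill, a cron entry keeps the HTML report reasonably fresh. A sketch (the paths are examples):

```
# crontab -e: rebuild the report every hour
0 * * * * goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED
```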
3. ELK Stack for Enterprise-grade Analysis
Sample Filebeat configuration for Nginx (/etc/filebeat/filebeat.yml):
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
    fields:
      type: nginx-access

processors:
  - dissect:
      # Name each field so the dissected values are actually usable downstream
      tokenizer: '%{clientip} %{ident} %{user} [%{timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{status} %{bytes} "%{referrer}" "%{agent}"'
      field: "message"
      target_prefix: "nginx"
For teams needing turnkey solutions:
- Loggly: Cloud-based with Nginx-specific dashboards
- Papertrail: Simple log aggregation with alerting
- Datadog: Full-stack observability including log correlation
For custom reporting needs:
from collections import defaultdict
import re

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
log_pattern = re.compile(
    r'(\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"'
)

def analyze_nginx_log(file_path):
    hits = defaultdict(int)
    errors = defaultdict(int)
    with open(file_path) as f:
        for line in f:
            match = log_pattern.match(line)
            if match:
                status = int(match.group(6))
                request = match.group(5)
                # "GET /path HTTP/1.1" -> /path
                resource = request.split()[1] if ' ' in request else request
                hits[resource] += 1
                if status >= 400:
                    errors[(resource, status)] += 1
    return hits, errors
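As a quick sanity check of the parsing approach, here is a self-contained snippet over two fabricated sample lines:

```python
import re
from collections import Counter

# Same combined-format pattern as in the script above
log_pattern = re.compile(
    r'(\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"'
)

# Two made-up sample lines for illustration
sample = [
    '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0"',
    '5.6.7.8 - - [01/Jan/2024:00:00:01 +0000] "GET /missing.png HTTP/1.1" 404 153 "-" "curl/8.0"',
]

status_counts = Counter()
for line in sample:
    m = log_pattern.match(line)
    if m:
        status_counts[int(m.group(6))] += 1

print(dict(status_counts))  # {200: 1, 404: 1}
```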
Ensure your analysis tools can handle rotated logs:
# Analyze multiple compressed logs
zgrep ' 404 ' /var/log/nginx/access.log.*.gz | awk '{print $7}' | sort | uniq -c
# Continuous monitoring with inotify
inotifywait -m /var/log/nginx -e create |
while read path action file; do
    if [[ "$file" =~ access.log ]]; then
        echo "New log file: $path$file"   # trigger your analysis here
    fi
done
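The same rotation concern applies to custom scripts. A sketch (the glob pattern is an assumption about your logrotate naming) that reads plain and gzip-compressed logs alike:

```python
import glob
import gzip

def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, errors="replace")

def iter_log_lines(pattern="/var/log/nginx/access.log*"):
    """Yield every line from all matching logs, rotated or not."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as f:
            yield from f
```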
Unlike Apache, which can run CGI-based analyzers such as AWStats directly, Nginx requires alternative approaches for log analysis. The need to track file request patterns and error responses (especially 404s) goes beyond what client-side analytics tools like Google Analytics provide.
First, ensure your Nginx log format captures sufficient detail. Here's an enhanced configuration example:
http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        '$request_time $upstream_response_time';

    access_log /var/log/nginx/access.log detailed;
}
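The two timing fields this format appends make slow-request hunting a one-liner. A sketch assuming the exact field order above, so $request_time lands second-to-last:

```shell
# Requests that took longer than 1 second, slowest first
awk '$(NF-1) > 1 { print $(NF-1), $7 }' /var/log/nginx/access.log | sort -nr | head -20
```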
For large-scale deployments, consider Elasticsearch-Logstash-Kibana:
# Sample Logstash configuration
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} - %{USER:ident} \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NUMBER:bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
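Once events are flowing, aggregations replace the awk one-liners. For example, counting 404s with a query like this (assuming Logstash's default logstash-* index naming and the response field extracted by the grok filter above):

```
POST logstash-*/_count
{ "query": { "term": { "response": "404" } } }
```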
For managed solutions without server maintenance, the hosted options mentioned earlier (Loggly, Papertrail, Datadog) apply; Splunk is a further enterprise-grade choice.
Create custom alerts for critical errors using Fail2Ban:
# /etc/fail2ban/filter.d/nginx-404.conf
[Definition]
# Fail2Ban requires the <HOST> tag to know which IP to act on
failregex = ^<HOST> -.*"(GET|POST|HEAD).*" 404 .*$

# /etc/fail2ban/jail.local
[nginx-404]
enabled = true
port = http,https
filter = nginx-404
logpath = /var/log/nginx/access.log
maxretry = 5
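Before enabling the jail, the filter can be checked against real traffic with Fail2Ban's bundled tester, which reports how many lines the regex matched:

```
fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-404.conf
```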