Nginx Log Analysis: Tracking File Hits, 404 Errors and Server Performance Metrics



While tools like Google Analytics provide excellent user behavior tracking, they miss critical server-side data that's essential for infrastructure optimization:

  • Exact file request frequencies (including static assets)
  • Server-generated 404 and 5xx errors not caught by client-side JavaScript
  • Request patterns for non-HTML resources (API endpoints, media files)
  • Geographic distribution derived from client IP addresses

Here are three practical ways to extract meaningful insights from your Nginx logs:

1. Command Line Power Tools

Basic but effective for quick insights:

# Top requested files
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -20

# 404 errors with originating URLs
grep ' 404 ' /var/log/nginx/access.log | awk '{print $7, $11}' | sort | uniq -c | sort -nr

# Requests by HTTP status
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -nr
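
The same field-counting approach extends to bandwidth. A sketch that sums bytes served per URL, assuming the default combined format where $body_bytes_sent is the 10th field:

# Bytes served per URL (top 10)
awk '{ bytes[$7] += $10 } END { for (u in bytes) print bytes[u], u }' /var/log/nginx/access.log | sort -nr | head -10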

2. GoAccess - Real-time Terminal Dashboard

Installation and basic usage:

# Install
sudo apt-get install goaccess

# Generate HTML report
zcat /var/log/nginx/access.log.*.gz | goaccess -a -o report.html --log-format=COMBINED

# Real-time monitoring
goaccess /var/log/nginx/access.log -o /var/www/html/report.html --real-time-html
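
If you re-run reports over rotated logs, newer GoAccess releases (1.4+) can persist parsed data on disk between runs; a sketch, so check the flags available in your version:

# Incremental parsing: restore previous state, persist new data
goaccess /var/log/nginx/access.log --log-format=COMBINED --restore --persist -o report.html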

3. ELK Stack for Enterprise-grade Analysis

Sample Filebeat configuration for Nginx (/etc/filebeat/filebeat.yml):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    type: nginx-access
  processors:
    - dissect:
        tokenizer: "%{clientip} %{ident} %{auth} [%{timestamp}] \"%{verb} %{request} HTTP/%{httpversion}\" %{status} %{bytes} \"%{referrer}\" \"%{agent}\""
        field: "message"
        target_prefix: "nginx"
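
Filebeat ships nothing without an output section; a minimal sketch pointing at a local Elasticsearch instance (the host is an assumption):

output.elasticsearch:
  hosts: ["localhost:9200"]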

For teams needing turnkey solutions:

  • Loggly: Cloud-based with Nginx-specific dashboards
  • Papertrail: Simple log aggregation with alerting
  • Datadog: Full-stack observability including log correlation
  • Splunk: Enterprise log search and analysis

For custom reporting needs, a short Python script can tally per-resource hits and error counts:

from collections import defaultdict
import re

# Matches the combined log format: IP, ident, user, [time], "request", status, bytes, "referer", "agent"
log_pattern = re.compile(r'(\S+) (\S+) (\S+) \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"')

def analyze_nginx_log(file_path):
    hits = defaultdict(int)
    errors = defaultdict(int)
    
    with open(file_path) as f:
        for line in f:
            match = log_pattern.match(line)
            if match:
                status = int(match.group(6))
                resource = match.group(5).split()[1] if ' ' in match.group(5) else match.group(5)
                
                hits[resource] += 1
                if status >= 400:
                    errors[(resource, status)] += 1
    
    return hits, errors
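
A minimal driver for the function above (the log path and top-20 cutoff are arbitrary choices):

hits, errors = analyze_nginx_log('/var/log/nginx/access.log')

# Top 20 most requested resources
for resource, count in sorted(hits.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f'{count:6d} {resource}')

# Most frequent error responses
for (resource, status), count in sorted(errors.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f'{count:6d} {status} {resource}')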

Ensure your analysis tools can handle rotated logs:

# Analyze multiple compressed logs
zgrep ' 404 ' /var/log/nginx/access.log.*.gz | awk '{print $7}' | sort | uniq -c

# Continuous monitoring with inotify
inotifywait -m /var/log/nginx -e create |
while read -r path action file; do
    if [[ "$file" == access.log* ]]; then
        echo "New log file detected: $file"  # replace with your analysis command
    fi
done

A note on tooling: unlike Apache, which can run analyzers like AWStats through its built-in CGI support, Nginx has no native log-analysis hooks, so everything above relies on external tools reading the access log.

Whichever approach you choose, make sure your Nginx log format captures enough detail. Here's an enhanced configuration that adds request and upstream timing:

http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" '
                       '$request_time $upstream_response_time';
    
    access_log /var/log/nginx/access.log detailed;
}
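
With the two timing variables at the end of each line, slow requests can be pulled out directly. A sketch assuming the "detailed" format above, where $request_time is the second-to-last field:

# Requests slower than 1 second: time and URL
awk '($(NF-1)+0) > 1 { print $(NF-1), $7 }' /var/log/nginx/access.log | sort -nr | head -20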

For large-scale deployments where Logstash handles ingestion instead of Filebeat, here is a sample pipeline configuration:

# Sample Logstash configuration
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} - %{USER:ident} \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NUMBER:bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
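
One common addition (not shown above) is a date filter, so events are indexed under the request time rather than the ingestion time:

filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}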

Create custom alerts for critical errors using Fail2Ban:

# /etc/fail2ban/filter.d/nginx-404.conf
[Definition]
failregex = ^<HOST> - .* "(GET|POST|HEAD).*" 404 .*$

# /etc/fail2ban/jail.local
[nginx-404]
enabled = true
port = http,https
filter = nginx-404
logpath = /var/log/nginx/access.log
maxretry = 5
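
To verify the filter before relying on it, fail2ban ships a test harness, and the client can confirm the jail is active:

# Dry-run the regex against the live log
fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-404.conf

# Reload and check jail status
sudo fail2ban-client reload
sudo fail2ban-client status nginx-404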