How to Process and Analyze Multiple Gzipped Log Files with AWStats for Nginx Access Logs


When working with AWStats to analyze Nginx access logs, many developers face the challenge of processing multiple compressed log files (like access.log.1.gz through access.log.40.gz). The default AWStats configuration typically points to a single uncompressed log file, leaving these valuable historical logs unanalyzed.

AWStats cannot read gzipped files directly, so the archives have to be decompressed on the way in, either by streaming them through a pipe or by creating temporary uncompressed copies. Here's the complete workflow:


1. Locate all gzipped log files
2. Uncompress them (either temporarily or permanently)
3. Process them sequentially with Awstats
4. Optionally: recompress or delete temporary files

The most efficient method is to use zcat to stream the uncompressed data straight into AWStats, with no temporary files. Feed the archives oldest first so AWStats sees the records in chronological order:


# Highest rotation number = oldest file, so sort in reverse version order
printf '%s\n' /var/log/nginx/access.log.*.gz | sort -rV | while read -r gzfile; do
    zcat "$gzfile" | /usr/lib/cgi-bin/awstats.pl -config=yourconfig -update -LogFile=-
done

Key points:

  • -LogFile=- tells AWStats to read from stdin; -update makes it add the records to its database instead of just printing a report
  • sort -rV (reverse version sort) feeds access.log.40.gz first and access.log.1.gz last; AWStats silently drops records older than the newest one already in its database, so the archives must be processed oldest first
  • No temporary storage needed

For environments where streaming isn't possible, create a temporary combined log:


TEMPFILE=$(mktemp)

# Concatenate oldest to newest so the combined file stays in chronological order
printf '%s\n' /var/log/nginx/access.log.*.gz | sort -rV | while read -r gzfile; do
    zcat "$gzfile" >> "$TEMPFILE"
done

/usr/lib/cgi-bin/awstats.pl -config=yourconfig -update -LogFile="$TEMPFILE"
rm "$TEMPFILE"

Create a script to handle daily log rotations and processing:


#!/bin/bash
CONFIG="yourconfig"
LOG_DIR="/var/log/nginx"
AWSTATS="/usr/lib/cgi-bin/awstats.pl"

# Process the compressed archives first, oldest to newest, so records stay chronological
find "$LOG_DIR" -name "access.log.*.gz" -print0 | sort -zrV | xargs -0 zcat | \
    "$AWSTATS" -config="$CONFIG" -update -LogFile=-

# Then process the current uncompressed log (the newest records)
"$AWSTATS" -config="$CONFIG" -update -LogFile="$LOG_DIR/access.log"

# Optionally, build a static HTML report from the updated database
# "$AWSTATS" -config="$CONFIG" -output -staticlinks > /var/www/html/awstats.yourconfig.html

When dealing with large numbers of gzipped logs:

  • Process files sequentially to avoid memory issues
  • Consider using pigz (parallel gzip) for faster decompression (see the sketch after this list)
  • Schedule processing during low-traffic periods
  • Monitor disk I/O during processing
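
As a sketch of the pigz point above (assuming the pigz package is installed), pigz -dc is a drop-in replacement for zcat, so only the decompression command changes:

# Same loop as before, but decompressing with pigz instead of zcat
printf '%s\n' /var/log/nginx/access.log.*.gz | sort -rV | while read -r gzfile; do
    pigz -dc "$gzfile" | /usr/lib/cgi-bin/awstats.pl -config=yourconfig -update -LogFile=-
done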

To analyze only a specific period (e.g., for a monthly report), select the matching archives before feeding them in; this assumes date-stamped rotation names such as access.log-2023-01-15.gz:


CONFIG="yourconfig"
AWSTATS="/usr/lib/cgi-bin/awstats.pl"
MONTH="2023-01"

# Date-stamped names sort chronologically, so plain glob order is already correct
for gzfile in /var/log/nginx/access.log-${MONTH}-*.gz; do
    if [ -f "$gzfile" ]; then
        zcat "$gzfile" | "$AWSTATS" -config="$CONFIG" -update -LogFile=-
    fi
done

When working with AWStats and Nginx, one common obstacle is processing archived log files in compressed format. Unlike the active access.log that AWStats handles easily, gzipped historical logs (access.log.1.gz through access.log.40.gz) require special configuration.

The most efficient approach involves:

  • Creating a shell script to process multiple gzipped files
  • Modifying AWStats configuration to handle compressed files
  • Setting up log rotation compatibility (see the logrotate sketch after the cron job below)

First, configure your AWStats config file (/etc/awstats/awstats.yourdomain.conf):

LogFile="/usr/bin/zcat /var/log/nginx/access.log.*.gz |"
LogFormat=1
SiteDomain="yourdomain.com"
HostAliases="www.yourdomain.com localhost 127.0.0.1"

Create a processing script (process_logs.sh):

#!/bin/bash

# Process all compressed logs, oldest first (highest rotation number = oldest file)
for i in $(ls -1vr /var/log/nginx/access.log.*.gz); do
    echo "Processing $i"
    /usr/lib/cgi-bin/awstats.pl -config=yourdomain -update -LogFile="zcat $i |"
done

# Process the current (uncompressed) log last
/usr/lib/cgi-bin/awstats.pl -config=yourdomain -update -LogFile="/var/log/nginx/access.log"

Set up a cron job for regular processing:

0 3 * * * /path/to/process_logs.sh > /var/log/awstats_processing.log 2>&1
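
For the log rotation compatibility point above, a logrotate prerotate hook can run the update immediately before access.log is renamed, so no entries slip through between cron runs. A minimal sketch, assuming the stock /etc/logrotate.d/nginx file is being extended (paths and the nginx PID file location vary by distribution):

/var/log/nginx/access.log {
    daily
    rotate 40
    compress
    delaycompress
    sharedscripts
    prerotate
        # Count the current log before it is rotated away
        /usr/lib/cgi-bin/awstats.pl -config=yourdomain -update \
            -LogFile="/var/log/nginx/access.log" >/dev/null
    endscript
    postrotate
        # Tell nginx to reopen its log files
        [ -f /run/nginx.pid ] && kill -USR1 "$(cat /run/nginx.pid)"
    endscript
}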

For large log collections:

  • Set DNSLookup=0 to skip reverse DNS resolution (usually the single biggest speedup on large logs)
  • Enable LoadPlugin="hashfiles" so the history and DNS cache files are stored as hashed DB files rather than flat text
  • Use LoadPlugin="decodeutfkeys" for international domains
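
In the config file these look like the following (directive and plugin names as in the stock AWStats model config; hashfiles needs the Perl DB_File module):

# /etc/awstats/awstats.yourdomain.conf
DNSLookup=0                    # do not resolve IPs to hostnames (reports show IPs)
LoadPlugin="hashfiles"         # faster load/save of AWStats history and DNS cache files
LoadPlugin="decodeutfkeys"     # decode UTF-8 keys for international domains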

If AWStats drops records or reports them as corrupted, check a sample of an archived log against your configuration:

# Test the log format with a few lines from an archive:
zcat /var/log/nginx/access.log.1.gz | head -n 10 | \
/usr/lib/cgi-bin/awstats.pl -config=yourdomain -update -LogFile=- -showdropped -showcorrupted
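
LogFormat=1 in the AWStats config corresponds to the NCSA combined layout, which is what Nginx's predefined combined format writes; if nginx.conf defines a custom log_format, it has to match (or LogFormat must be changed to a matching custom string). For reference, the built-in definition is equivalent to:

# nginx's predefined "combined" format (shown for reference; it does not need
# to be declared in nginx.conf)
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';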