When dealing with Apache log files, command line tools offer the perfect balance of power and portability. They allow you to:
- Process logs offline without touching production servers
- Chain multiple operations together
- Handle large files efficiently
- Automate analysis through scripts
1. awk - The Swiss Army Knife
Perfect for extracting specific fields from Apache combined logs:
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10
This one-liner shows the top 10 IP addresses by request count.
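awk's per-field access also makes quick aggregations easy. As a rough sketch (assuming the combined log format, where the response size is field 10), you can total bytes served per client IP:
# Bytes sent per client IP, largest first ("-" sizes count as 0 in awk arithmetic)
awk '{bytes[$1] += $10} END {for (ip in bytes) print bytes[ip], ip}' access.log | sort -nr | head -10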
2. grep - Pattern Matching Powerhouse
Filter logs by status code, URL, or other patterns:
grep " 404 " access.log | awk '{print $7}' | sort | uniq -c | sort -nr
Finds all 404 errors and counts occurrences by URL.
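grep also works well as a pre-filter for a time slice. As a sketch (the date string is just an illustrative placeholder; field 9 is the status code and field 7 the URL in the combined format), this lists server errors for a single day:
# 5xx responses for one (illustrative) day, grouped by status and URL
grep "10/Oct/2023" access.log | awk '$9 ~ /^5/ {print $9, $7}' | sort | uniq -c | sort -nr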
3. GoAccess - Real-time Web Log Analyzer
Install with:
sudo apt-get install goaccess
Generate HTML report:
goaccess access.log --log-format=COMBINED -o report.html
4. Log Parser
Microsoft's free Log Parser (also covered below) can query Apache-style NCSA logs with SQL-like filtering. Note that cs-uri-stem is an IIS W3C field; the NCSA input format exposes a Request field instead (run logparser -h -i:NCSA for the full list):
logparser -i:NCSA -o:CSV "SELECT TOP 10 Request, COUNT(*) AS Hits FROM access.log GROUP BY Request ORDER BY Hits DESC" > top_pages.csv
Tracking Bot Traffic
grep -iE 'bot|spider|crawl' access.log | awk '{print $1, $12}' | sort | uniq
This lists crawler IPs alongside the first word of each User-Agent ($12 in the combined format).
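If you want the whole User-Agent string rather than its first word, splitting each line on double quotes is a common trick (the agent is then field 6 of the combined format); a rough sketch:
# Count requests per bot/crawler User-Agent
awk -F'"' 'tolower($6) ~ /bot|spider|crawl/ {print $6}' access.log | sort | uniq -c | sort -nr | head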
Identifying Slow Requests
For logs whose LogFormat appends %D (time taken to serve the request, in microseconds) as the last field:
awk '$NF > 1000000 {print $7, $NF/1000000 " sec"}' access.log | sort -k2 -nr | head
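With the same %D-as-last-field assumption, average and worst-case latency are one more awk pass away:
# Average and maximum request time across the whole log
awk '{sum += $NF; if ($NF > max) max = $NF} END {if (NR) printf "avg %.2f sec, max %.2f sec\n", sum/NR/1000000, max/1000000}' access.log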
- Use zcat for compressed logs: zcat access.log.*.gz | grep "POST"
- Combine tools with pipes for complex analysis
- Save common queries as shell scripts or aliases (see the example after this list)
- Consider logrotate for managing large log files
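For example, the top-IPs query from earlier fits nicely into a small shell function (the function name and log path below are just placeholders):
# In ~/.bashrc - re-run the "top 10 IPs" query with a single command
topips() { awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -nr | head -10; }
A function sidesteps the quoting headaches that awk one-liners cause inside aliases.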
While command line tools are great, you might need a more complete solution when:
- Dealing with multiple servers
- Needing historical trend analysis
- Requiring real-time monitoring
In these cases, look at the ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk.
Taking a closer look: Apache log files typically live in access.log and error.log, and command line tools remain the fastest way to extract valuable insights from them without complex setups. Unlike GUI tools or web-based analyzers, CLI tools process large files efficiently and integrate with your existing automation workflows.
The classic text-processing tools (awk, grep, sort, uniq, head) come pre-installed on most Linux/Unix systems:
# Top requested URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20

# Status code statistics
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Client IP analysis
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
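These building blocks chain naturally. For instance, a sketch of unique client IPs per day (assuming the combined format, whose timestamp field starts with [dd/Mon/yyyy):
# Unique visitor IPs per day
awk '{print substr($4, 2, 11), $1}' access.log | sort -u | awk '{print $1}' | sort | uniq -c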
GoAccess installs from the usual package managers:
sudo apt-get install goaccess   # Debian/Ubuntu
brew install goaccess           # macOS
Basic usage examples:
# Generate an HTML report (interactive)
goaccess access.log -o report.html --log-format=COMBINED

# Real-time monitoring in the terminal (the trailing - reads from stdin)
tail -f access.log | goaccess --log-format=COMBINED -a -
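Rotated, gzip-compressed logs can be piped in the same way; the report name here is just illustrative:
# Build one report across all rotated logs via stdin
zcat access.log.*.gz | goaccess --log-format=COMBINED -o full_history.html -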
While primarily web-based, AWStats has command line components:
perl awstats.pl -config=example -update -output -staticlinks > report.html
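AWStats data is normally refreshed on a schedule; a sketch of a crontab entry (the script path and the "example" config name are assumptions, match whatever you pass to -config):
# Update AWStats data for the "example" config every hour
0 * * * * perl /usr/lib/cgi-bin/awstats.pl -config=example -update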
Microsoft's Log Parser, a powerful free tool for Windows servers, uses the same SQL-like dialect; the example below targets IIS-style W3C logs (ex*.log), which is where cs-uri-stem comes from:
logparser.exe "SELECT TOP 20 cs-uri-stem, COUNT(*) as Hits FROM ex*.log GROUP BY cs-uri-stem ORDER BY Hits DESC"
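A comparable query against an Apache-format log would use the NCSA input format; treat the field names below as a sketch and confirm them with logparser -h -i:NCSA:
logparser.exe -i:NCSA "SELECT StatusCode, COUNT(*) AS Hits FROM access.log GROUP BY StatusCode ORDER BY Hits DESC"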
For complex patterns, Python offers flexibility:
python -c "from collections import Counter; import re; c = Counter(re.findall(r'\"(GET|POST)\s([^\s]+)', open('access.log').read())); print(c.most_common(10))"
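If you would rather stay with the tools already covered, the same method-and-path counting works in awk (assuming the combined format, where field 6 is the quoted method and field 7 the path):
# Top 10 method + path pairs
awk '{print $6, $7}' access.log | tr -d '"' | sort | uniq -c | sort -nr | head -10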
- Use zcat for compressed logs: zcat access.log.*.gz | grep "pattern"
- Process logs in parallel:
parallel -j 4 'grep "404" {}' ::: access.log*
- Monitor in real-time:
multitail -cS apache access.log
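Finally, several of these one-liners can be bundled into a small report script; the filename and default log path below are illustrative:
#!/bin/bash
# apache-summary.sh - quick summary of an Apache access log (combined format assumed)
LOG="${1:-/var/log/apache2/access.log}"

echo "== Top 10 client IPs =="
awk '{print $1}' "$LOG" | sort | uniq -c | sort -nr | head -10

echo "== Top 10 requested URLs =="
awk '{print $7}' "$LOG" | sort | uniq -c | sort -nr | head -10

echo "== Status code breakdown =="
awk '{print $9}' "$LOG" | sort | uniq -c | sort -nr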