nginx access logs record a User-Agent string for every request. Typical values look like this:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1
We need to extract just the browser name and major version. Here's how we can approach this with command-line tools:
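Assuming nginx's default combined log format, this gawk command extracts and summarizes browser versions (the three-argument match() is a gawk extension, so plain awk will not run it):
gawk -F'"' '{
    ua = $6                       # User-Agent is the 6th quote-delimited field
    name = ""; ver = ""
    # Test the most specific token first: Edge UAs also contain "Chrome",
    # and Chrome UAs also contain "Safari".
    if (match(ua, /(Edge|Edg|OPR)\/([0-9]+)/, m))        { name = m[1]; ver = m[2] }
    else if (match(ua, /(Firefox|Chrome)\/([0-9]+)/, m)) { name = m[1]; ver = m[2] }
    else if (match(ua, /Version\/([0-9]+).*Safari/, m))  { name = "Safari"; ver = m[1] }
    else if (match(ua, /MSIE ([0-9]+)/, m))              { name = "IE"; ver = m[1] }
    else if (match(ua, /Trident\/.*rv:([0-9]+)/, m))     { name = "IE"; ver = m[1] }  # IE11
    if (name == "Edg") name = "Edge"                     # Chromium-based Edge
    if (name == "OPR") name = "Opera"
    if (name != "") versions[name " " ver]++
}
END {
    for (v in versions) print versions[v], v
}' access.log | sort -nr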
For more robust parsing, consider these specialized tools:
- GoAccess: Real-time web log analyzer with built-in UA parsing
- AWStats: Advanced log file analyzer with detailed browser reports
- ELK Stack: For large-scale log analysis with User-Agent processor plugin
For more control, here's a Python script using the user-agents library:
from collections import defaultdict
from user_agents import parse  # third-party: pip install user-agents

counts = defaultdict(int)
with open('access.log') as f:
    for line in f:
        # In the combined log format the User-Agent is the 6th
        # quote-delimited field (adjust the index for a custom log_format)
        fields = line.split('"')
        if len(fields) < 6:
            continue  # skip malformed lines
        ua = parse(fields[5])
        key = f"{ua.browser.family} {ua.browser.version_string.split('.')[0]}"
        counts[key] += 1
for browser, count in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{count} {browser}")
Some User-Agent strings require special handling:
# Microsoft Edge (Chromium-based)
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43"
# Internet Explorer 11
"Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
For these cases, you'll need additional pattern matching rules in your parsing logic.
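As a rough illustration of what such rules might look like, here is a minimal Python sketch. The rule table `RULES` and the function `classify` are hypothetical names, and the patterns cover only the common desktop and iOS tokens; treat it as a starting point under those assumptions, not a complete parser. The key idea is ordering: tokens that appear in only one browser (`Edg`, `Trident`) are tested before generic ones (`Chrome`, `Safari`).

```python
import re

# Ordered rules (hypothetical): the first pattern that matches wins, so
# browser-specific tokens are checked before the generic ones they coexist with.
RULES = [
    ("Edge", re.compile(r"Edg(?:e|A|iOS)?/(\d+)")),
    ("IE", re.compile(r"Trident/.*rv:(\d+)")),      # IE11 has no MSIE token
    ("IE", re.compile(r"MSIE (\d+)")),              # IE10 and older
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]

def classify(ua):
    """Return (family, major_version), or None if no rule matches."""
    for family, pattern in RULES:
        m = pattern.search(ua)
        if m:
            return family, int(m.group(1))
    return None

edge = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43")
ie11 = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"

print(classify(edge))  # ('Edge', 79)
print(classify(ie11))  # ('IE', 11)
```

Adding a browser then means adding one row to the table in the right position, rather than growing a single regular expression.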
To get clean, sorted output like in your example, pipe the results through additional Unix tools:
your_parsing_command | sort -nr | head -20
When analyzing web traffic, we often need to identify browser usage patterns from Nginx access logs. The typical log format contains the User-Agent string, which looks like:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
For market share analysis, we usually only care about the browser family and major version (e.g., Chrome 91), not minor versions or operating systems.
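To make the log layout concrete, the sketch below pulls the User-Agent out of one line in nginx's default combined format. The regular expression `COMBINED_RE`, the helper `user_agent`, and the sample line (including its IP address and timestamp) are illustrative assumptions; a custom `log_format` would need a different pattern.

```python
import re

# Assumed layout: nginx's default "combined" format
#   addr - user [time] "request" status bytes "referer" "user agent"
COMBINED_RE = re.compile(
    r'^(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def user_agent(line):
    """Return the User-Agent field, or None if the line does not parse."""
    m = COMBINED_RE.match(line)
    return m.group("ua") if m else None

# Made-up sample line for illustration only
sample = ('203.0.113.7 - - [10/Jul/2021:13:55:36 +0000] "GET / HTTP/1.1" 200 612 '
          '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"')

print(user_agent(sample))
```

Matching the whole line is stricter than splitting on quotes, so malformed lines are rejected instead of yielding the wrong field.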
Here are three effective methods to extract and count browser major versions:
1. Using awk and sed
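awk -F\" '{print $6}' access.log | \
sed -nE -e 's/.*(Edge|Edg|OPR)\/.*/\1/p' -e t \
        -e 's/.*(Firefox|Chrome)\/.*/\1/p' -e t \
        -e 's/.*(Safari|MSIE|Trident).*/\1/p' | \
sort | uniq -c | sort -rn
The rules run in order and stop at the first hit, so a Chrome User-Agent (which also contains "Safari") is counted once, as Chrome. This counts browser families only. To include the major version:
awk -F\" '{print $6}' access.log | \
sed -nE -e 's/.*(Edge|Edg|OPR)\/([0-9]+).*/\1 \2/p' -e t \
        -e 's/.*(Firefox|Chrome)\/([0-9]+).*/\1 \2/p' -e t \
        -e 's/.*Version\/([0-9]+).*Safari.*/Safari \1/p' -e t \
        -e 's/.*(MSIE |Trident.*rv:)([0-9]+).*/IE \2/p' | \
sort | uniq -c | sort -rn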
2. Using Log Analysis Tools
For more sophisticated analysis, use GoAccess:
goaccess access.log --log-format=COMBINED --browsers-file=/path/to/browsers.list
Or AWStats with proper configuration for browser detection.
3. Python Script Solution
For maximum flexibility, here's a Python script:
import re
from collections import defaultdict

# Group 1 is the browser token, group 2 its major version
pattern = re.compile(
    r'(Firefox|Chrome|Safari|Opera|Edge|IE|MSIE|Trident)[/ ]+([0-9]+)',
    re.IGNORECASE
)
counts = defaultdict(int)
with open('access.log') as f:
    for line in f:
        # Extract User-Agent (6th quote-delimited field in combined log format)
        fields = line.split('"')
        if len(fields) < 6:
            continue  # skip malformed lines
        match = pattern.search(fields[5])
        if match:
            browser, version = match.groups()
            key = f"{browser}{version}"  # full major version, e.g. Chrome91
            counts[key] += 1
for browser, count in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{count} {browser}")
Some User-Agents require special handling:
- Internet Explorer 11 drops the MSIE token and identifies itself only through Trident/7.0 and rv:11.0
- Mobile browsers often include the word "Mobile"
- Bots and crawlers should be filtered out
Here's an enhanced pattern that handles these cases:
pattern = re.compile(
    r'(?:Firefox|Chrome|Safari|Opera|Edg|IE|MSIE|Trident|Android)[/ ]+([0-9]+)|'
    r'(?:iPhone|iPod|iPad).+Version/(\d+)',
    re.IGNORECASE
)
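The bot and mobile cases from the list above can be handled in a separate pass before any browser pattern is applied. The sketch below is one possible approach; `BOT_RE` is an assumed keyword heuristic (real crawlers do not all announce themselves this way), and `triage` is a hypothetical helper name.

```python
import re

# Assumed heuristic: common substrings that self-identifying crawlers use
BOT_RE = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

def triage(ua):
    """Return 'bot', 'mobile', or 'desktop' for a User-Agent string."""
    if BOT_RE.search(ua):
        return "bot"       # checked first: some crawlers also send "Mobile"
    if "Mobile" in ua:
        return "mobile"
    return "desktop"

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
iphone = ("Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 "
          "(KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1")
desktop = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")

for ua in (googlebot, iphone, desktop):
    print(triage(ua))
```

Lines classified as "bot" can then be skipped before counting, and the "mobile" label can be appended to the key if mobile and desktop versions should be reported separately.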
For better presentation, pipe the results to a simple bar chart:
python analyze_browsers.py | \
awk '{printf("%-8s ", $2); for(i=0;i<$1/50;i++) {printf("#")}; print ""}'
Or generate a CSV for spreadsheet import:
Browser,Count
Chrome9,1200
Firefox8,900
Safari14,600
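One way to produce output in this shape is Python's standard csv module. In the sketch below, `write_csv` is a hypothetical helper and the `counts` dictionary simply reuses the sample figures shown above; in practice it would be the dictionary built by the counting script.

```python
import csv
import io

def write_csv(counts, out):
    """Write browser counts to `out` as CSV, highest count first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Browser", "Count"])
    for browser, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([browser, count])

# Sample figures copied from the CSV above, for illustration only
counts = {"Firefox8": 900, "Chrome9": 1200, "Safari14": 600}

buf = io.StringIO()          # swap for open("browsers.csv", "w", newline="") to write a file
write_csv(counts, buf)
print(buf.getvalue(), end="")
```

Using csv.writer instead of manual string joins keeps the output valid even if a browser name ever contains a comma or a quote.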