When working with Apache web server logs, a common requirement is to process entries as they appear without re-reading the entire file. Traditional approaches like cron jobs or periodic scanning waste resources and introduce delays. What we really want is to detect and process log entries the moment they're written.
While tail -f shows new entries in real time, it doesn't provide a programmatic way to process them. We need a solution that:
- Maintains position in the file between runs
- Only processes new entries
- Can trigger specific actions on matches
- Handles log rotation gracefully
Here's a robust solution for Linux systems that uses inotify to monitor file changes:
#!/usr/bin/env python3
import re
import pyinotify

class LogEventHandler(pyinotify.ProcessEvent):
    def __init__(self, pattern, callback):
        self.pattern = re.compile(pattern)
        self.callback = callback
        self.last_pos = 0  # byte offset we have already processed

    def process_default(self, event):
        if event.mask & pyinotify.IN_MODIFY:
            with open(event.pathname, 'r') as f:
                f.seek(self.last_pos)
                new_lines = f.readlines()
                self.last_pos = f.tell()
            for line in new_lines:
                if self.pattern.search(line):
                    self.callback(line)

def process_matched_line(line):
    print(f"Matched line: {line.strip()}")
    # Add your custom logic here

wm = pyinotify.WatchManager()
handler = LogEventHandler(r'404', process_matched_line)
notifier = pyinotify.Notifier(wm, handler)
wdd = wm.add_watch('/var/log/apache2/access.log', pyinotify.IN_MODIFY)
notifier.loop()
For simpler cases, a polling shell script that remembers its byte offset between passes will do:
#!/bin/bash
LOG_FILE="/var/log/apache2/access.log"
TEMP_FILE="/tmp/last_position.tmp"
# Get last position or start at beginning
[ -f "$TEMP_FILE" ] && LAST_POS=$(cat "$TEMP_FILE") || LAST_POS=0
while true; do
    CURRENT_SIZE=$(stat -c %s "$LOG_FILE")
    if [ "$CURRENT_SIZE" -lt "$LAST_POS" ]; then
        # Log file was rotated
        LAST_POS=0
    fi
    if [ "$CURRENT_SIZE" -gt "$LAST_POS" ]; then
        # Read new content from the saved offset
        dd if="$LOG_FILE" bs=1 skip="$LAST_POS" 2>/dev/null | while read -r line; do
            if [[ "$line" =~ "404" ]]; then
                echo "Found 404 error: $line"
                # Add your action here
            fi
        done
        LAST_POS="$CURRENT_SIZE"
        echo "$LAST_POS" > "$TEMP_FILE"
    fi
    sleep 1
done
Production systems require special handling for log rotation. The Python example above only watches for modifications, so once the file is rotated away its watch follows the old inode and stops seeing new entries; the bash script detects rotation by checking for a size reduction. For more robust rotation handling:
# In the Python solution, include IN_DELETE_SELF and IN_MOVE_SELF in the watch mask,
# then add this to the process_default method to reset the offset after rotation:
if event.mask & (pyinotify.IN_DELETE_SELF | pyinotify.IN_MOVE_SELF):
    self.last_pos = 0
    # Re-watch by name so the replacement file (once it exists) is picked up
    wm.add_watch(event.pathname,
                 pyinotify.IN_MODIFY | pyinotify.IN_DELETE_SELF | pyinotify.IN_MOVE_SELF)
For high-traffic servers:
- Batch process multiple lines at once
- Use efficient pattern matching (pre-compiled regex)
- Consider buffering matches before taking action
- Offload intensive processing to separate threads (see the sketch after this list)
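As a rough sketch of how the batching and thread-offload ideas could plug into the pyinotify example above (handle_batch and BATCH_SIZE are illustrative placeholders, not part of any library):
import queue
import threading

BATCH_SIZE = 100            # illustrative; tune for your traffic
match_queue = queue.Queue()

def handle_batch(lines):
    # Hypothetical placeholder for the expensive work: alerting, database writes, etc.
    print(f"Processing {len(lines)} matched lines")

def worker():
    batch = []
    while True:
        batch.append(match_queue.get())
        if len(batch) >= BATCH_SIZE:
            handle_batch(batch)
            batch = []

threading.Thread(target=worker, daemon=True).start()

# Pass this as the callback to LogEventHandler: it only enqueues, so the
# inotify loop never blocks on slow processing.
def process_matched_line(line):
    match_queue.put(line)
A production version would also flush the batch on a timer so matches aren't held back indefinitely when traffic is quiet.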
When monitoring Apache logs, repeatedly scanning entire files or even recent chunks creates unnecessary overhead. The ideal solution should:
- Process entries exactly once
- Respond in real-time
- Minimize resource usage
- Handle log rotation gracefully
The tail -f command (follow mode) is specifically designed for this use case. Here's a basic implementation:
#!/bin/bash
LOG_FILE="/var/log/apache2/access.log"
tail -n0 -F "$LOG_FILE" | while read -r LINE
do
    # Your processing logic here
    if [[ "$LINE" =~ "404" ]]; then
        echo "Found 404 error: $LINE" >> /var/log/my_monitor.log
        # Trigger additional actions
    fi
done
The magic happens with these options:
- -n0: Start reading at the end of the file (skip existing lines)
- -F: Follow by name (handles log rotation)
- while read LINE: Processes each new line as it arrives
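If you'd rather consume that stream programmatically than in a shell loop, here is a minimal Python sketch; matching a space-delimited ' 404 ' is only a rough stand-in for properly parsing the status field:
#!/usr/bin/env python3
import subprocess

LOG_FILE = '/var/log/apache2/access.log'

# tail handles following and rotation (-n0 -F); Python just filters each line.
proc = subprocess.Popen(['tail', '-n0', '-F', LOG_FILE],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    if ' 404 ' in line:
        print(f"Found 404 error: {line.strip()}")
        # Trigger additional actions here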
For more complex filtering, consider this regex example:
#!/bin/bash
LOG_FILE="/var/log/apache2/error.log"
# Keep the pattern in a variable: quoting the right-hand side of =~ would make
# bash treat it as a literal string, and BASH_REMATCH would never be set.
ERROR_REGEX='PHP (Fatal|Parse) error'

tail -n0 -F "$LOG_FILE" | while read -r LINE
do
    if [[ "$LINE" =~ $ERROR_REGEX ]]; then
        # send_alert_email is your own notification helper
        send_alert_email "Critical PHP error detected: ${BASH_REMATCH[1]}"
        logger -t apache_monitor "PHP error: $LINE"
    fi
done
For busy servers, consider these optimizations:
#!/bin/bash
# Buffer lines and process them in batches for performance.
# Note: systime() is a gawk extension, so call gawk explicitly.
LOG_FILE="/var/log/apache2/access.log"
BUFFER_SIZE=100
TIMEOUT=5

tail -n0 -F "$LOG_FILE" | stdbuf -oL gawk -v buffer="$BUFFER_SIZE" -v timeout="$TIMEOUT" '
BEGIN { last_flush = systime() }
{
    lines[++count] = $0
    if (count >= buffer || systime() - last_flush > timeout) {
        process_buffer()
        last_flush = systime()
    }
}
function process_buffer(    i) {
    for (i = 1; i <= count; i++) {
        if (lines[i] ~ /POST \/admin/) {
            # Security alert action, for example:
            print "Possible admin probe: " lines[i]
        }
    }
    delete lines
    count = 0
}'
For production environments, consider these robust solutions:
- systemd journal: journalctl -f -u apache2
- SWATCH: The simple watcher utility
- Filebeat: Elastic's log shipper with processors
Always test your script with:
# Simulate log rotation
cd /var/log/apache2
mv access.log access.log.old
touch access.log
service apache2 restart   # or reload, so Apache reopens the new log file