While monitoring server performance last Thursday, I noticed Apache was consuming 100% of the CPU. A quick tail -f /var/log/apache2/access.log revealed an alarming pattern: hundreds of thousands of requests containing "via ggpht.com GoogleImageProxy" in the user agent string.
Google's image proxy service (ggpht.com) fetches and caches images for various Google products. In normal operation, this helps with:
- Image compression/optimization
- HTTPS conversion
- Content sanitization
But what we're seeing appears to be abnormal proxy behavior - possibly misconfigured scrapers or even a DDoS vector.
The log entries show these characteristics:
IP: 10.190.45.31 (load balancer)
User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com GoogleImageProxy)
Response: HTTP 200
Payload Size: ~2KB
Here's the .htaccess rule I implemented to block these requests:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ggpht\.com\ GoogleImageProxy [NC]
RewriteRule ^ - [F,L]
For Nginx users:
if ($http_user_agent ~* "ggpht\.com GoogleImageProxy") {
    return 403;
}
For more granular control, I recommend implementing rate limiting. As a starting point, here's a ModSecurity rule that denies matching requests outright (it does not rate limit on its own):
SecRule REQUEST_HEADERS:User-Agent "@contains ggpht.com GoogleImageProxy" \
    "id:1001,phase:1,t:none,log,deny,status:403,\
    msg:'Google Image Proxy Abuse Detected'"
Since the original IP is masked by your load balancer, ensure proper logging is implemented. For AWS ALB:
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn your-alb-arn \
    --attributes Key=access_logs.s3.enabled,Value=true \
                 Key=access_logs.s3.bucket,Value=your-log-bucket
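Create a custom metric in your monitoring system to track these requests. Prometheus does not parse access logs itself, so treat the following as a schematic metric definition for a log-to-metrics exporter rather than literal Prometheus config:
- name: apache_google_proxy_requests
  type: counter
  help: Count of requests via GoogleImageProxy
  match:
    user_agent: "*ggpht.com GoogleImageProxy*"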
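If you don't have an exporter handy, the same counter can be approximated with a few lines of Python. This is only a sketch: it counts lines containing the marker string and renders the result in the Prometheus text exposition format. The `_total` suffix and the function names are my own choices, not something from an existing tool.

```python
MARKER = "ggpht.com GoogleImageProxy"
# Assumed metric name: the name used above plus the conventional _total suffix.
METRIC = "apache_google_proxy_requests_total"


def count_proxy_requests(lines):
    """Count log lines that contain the GoogleImageProxy marker."""
    return sum(1 for line in lines if MARKER in line)


def render_metric(count):
    """Render the count in the Prometheus text exposition format."""
    return (
        f"# HELP {METRIC} Count of requests via GoogleImageProxy\n"
        f"# TYPE {METRIC} counter\n"
        f"{METRIC} {count}\n"
    )


if __name__ == "__main__":
    with open("/var/log/apache2/access.log", errors="replace") as f:
        print(render_metric(count_proxy_requests(f)), end="")
```

The output could be written to a file or served over HTTP for scraping; which of those fits depends on your monitoring setup.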
When your Apache server suddenly spikes to 100% CPU with log entries containing (via ggpht.com GoogleImageProxy), you're likely facing one of two scenarios:
# Typical malicious pattern
123.45.67.89 - - [15/Jan/2023:08:22:11 +0000] "GET /wp-content/uploads/image.jpg HTTP/1.1" 200 5432 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36 (via ggpht.com GoogleImageProxy)"
Legitimate GoogleImageProxy requests should:
- Come from Google IP ranges (verify via whois)
- Have X-Forwarded-For headers when proxied
- Maintain reasonable request rates (under 10 requests/sec)
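The three checks can be combined into a rough triage helper. The sketch below makes some assumptions: the two networks are simply the 66.102. and 172.217. prefixes used in this post's rewrite rule, widened to /16 blocks, and they are not a complete or authoritative list of Google ranges. The function names are hypothetical.

```python
import ipaddress

# Assumption: /16 blocks derived from the 66.102. and 172.217. prefixes
# used elsewhere in this post. Verify against whois before relying on them.
ASSUMED_GOOGLE_NETWORKS = [
    ipaddress.ip_network("66.102.0.0/16"),
    ipaddress.ip_network("172.217.0.0/16"),
]
MAX_REQUESTS_PER_SEC = 10  # "reasonable" ceiling from the checklist


def in_assumed_google_range(ip):
    """Return True if ip parses and falls inside one of the assumed networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in ASSUMED_GOOGLE_NETWORKS)


def looks_legitimate(client_ip, forwarded_for, requests_per_sec):
    """Apply all three checks: source range, X-Forwarded-For present, sane rate."""
    has_xff = bool(forwarded_for) and forwarded_for != "-"
    return (
        in_assumed_google_range(client_ip)
        and has_xff
        and requests_per_sec < MAX_REQUESTS_PER_SEC
    )
```

A request that fails any one check is not necessarily malicious, but it deserves a closer look in the logs.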
Create a custom Apache log format in httpd.conf:
LogFormat "%h %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxyformat
CustomLog /var/log/apache2/proxy_traffic.log proxyformat
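Lines written in the proxyformat layout can be parsed with a single regular expression. The sketch below assumes that exact field order, and treats the first X-Forwarded-For entry as the original client (a common convention, though the header can be forged). `parse_proxy_line` and the `client_ip` key are names I made up for illustration.

```python
import re

# One group per field of: %h %{X-Forwarded-For}i %l %u %t "%r" %>s %b "Referer" "User-Agent"
# The X-Forwarded-For field may itself contain ", " separators, hence the lazy match.
PROXY_LINE = re.compile(
    r'^(?P<host>\S+) (?P<xff>.*?) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"\s*$'
)


def parse_proxy_line(line):
    """Return a dict of fields for one proxyformat line, or None if it doesn't match."""
    m = PROXY_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    xff = entry["xff"]
    # Fall back to the connecting address when no X-Forwarded-For was sent ("-").
    entry["client_ip"] = entry["host"] if xff == "-" else xff.split(",")[0].strip()
    return entry
```

With this in place, per-client counts can be keyed on `client_ip` instead of the load balancer address in the first column.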
Add these rules to your .htaccess or Apache config:
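# Block requests that claim to be GoogleImageProxy but come from neither
# the 66.102.x.x nor the 172.217.x.x range. Both negated conditions must
# hold, so they are chained with the default AND rather than [OR].
# Behind a load balancer, REMOTE_ADDR is the balancer's address.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (via\ ggpht\.com\ GoogleImageProxy) [NC]
RewriteCond %{REMOTE_ADDR} !^66\.102\.
RewriteCond %{REMOTE_ADDR} !^172\.217\.
RewriteRule ^ - [F,L]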
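# Bandwidth throttling with mod_ratelimit: rate-limit is in KiB/s per
# connection, so this caps transfer speed, not the number of requests
<IfModule mod_ratelimit.c>
    <Location />
        SetOutputFilter RATE_LIMIT
        SetEnv rate-limit 30
    </Location>
</IfModule>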
For enterprise-grade protection:
# modsecurity.conf rules
SecRule REQUEST_HEADERS:User-Agent "@rx \(via ggpht\.com GoogleImageProxy\)" \
    "id:1002,\
    phase:1,\
    deny,\
    status:403,\
    msg:'Suspicious GoogleImageProxy traffic',\
    logdata:'Matched User-Agent: %{MATCHED_VAR}'"
Python script to analyze suspicious patterns:
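#!/usr/bin/env python3
"""List the ten client IPs sending the most GoogleImageProxy-tagged requests."""
import re
from collections import Counter


def analyze_logs(logfile):
    ip_counter = Counter()
    # Match the literal "(via ggpht.com GoogleImageProxy)" User-Agent suffix
    ua_pattern = re.compile(r'\(via ggpht\.com GoogleImageProxy\)')
    with open(logfile, errors='replace') as f:
        for line in f:
            if ua_pattern.search(line):
                # The first field of each log line is the client address (%h)
                ip = line.split()[0]
                ip_counter[ip] += 1
    return ip_counter.most_common(10)


if __name__ == '__main__':
    print(analyze_logs('/var/log/apache2/access.log'))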
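Total counts don't show bursts, so it can also help to look at the peak per-second rate for each IP. The sketch below relies on the one-second resolution of Apache's default timestamp and flags anything at or above the 10 requests/sec ceiling mentioned earlier. It assumes the standard combined log layout, and the function names are hypothetical.

```python
import re
from collections import Counter

MARKER = "(via ggpht.com GoogleImageProxy)"
# Capture the client address (first field) and the bracketed timestamp.
LINE_RE = re.compile(r'^(\S+) .*?\[([^\]]+)\]')


def peak_rate_per_ip(lines):
    """Map each IP to its highest number of marked requests within one second."""
    per_second = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        m = LINE_RE.match(line)
        if m:
            # Timestamps have one-second resolution, so (ip, timestamp) is a 1s bucket.
            per_second[(m.group(1), m.group(2))] += 1
    peaks = {}
    for (ip, _timestamp), hits in per_second.items():
        peaks[ip] = max(peaks.get(ip, 0), hits)
    return peaks


def over_threshold(lines, limit=10):
    """Return the sorted IPs whose peak rate reaches or exceeds limit per second."""
    return sorted(ip for ip, peak in peak_rate_per_ip(lines).items() if peak >= limit)
```

Calling `over_threshold(open('/var/log/apache2/access.log', errors='replace'))` gives a short list of addresses worth blocking or investigating first.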
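For AWS ELB users (the aws_elb resource below is a Classic Load Balancer; an ALB would use aws_lb instead):
# Classic ELB access logging (Terraform)
resource "aws_elb" "web" {
  name               = "web-lb"
  availability_zones = ["us-west-2a"]

  # aws_elb requires at least one listener; these values are placeholders
  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }

  access_logs {
    bucket   = "my-logs-bucket"
    interval = 5
    enabled  = true
  }
}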