Apache Performance Crisis: Diagnosing & Blocking ggpht.com GoogleImageProxy Flood Attacks



While monitoring server performance last Thursday, I noticed Apache was consuming 100% CPU capacity. A quick tail -f /var/log/apache2/access.log revealed an alarming pattern - hundreds of thousands of requests containing "via ggpht.com GoogleImageProxy" in the user agent string.

Google's image proxy service (ggpht.com) fetches and caches images for various Google products. In normal operation, this helps with:

  • Image compression/optimization
  • HTTPS conversion
  • Content sanitization

But what we're seeing appears to be abnormal proxy behavior - possibly misconfigured scrapers or even a DDoS vector.

The log entries show these characteristics:

IP: 10.190.45.31 (load balancer)
User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com GoogleImageProxy)
Response: HTTP 200
Payload Size: ~2KB

Here's the .htaccess rule I implemented to block these requests:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ggpht\.com\ GoogleImageProxy [NC]
RewriteRule ^ - [F,L]

For Nginx users:

if ($http_user_agent ~* "ggpht\.com GoogleImageProxy") {
    return 403;
}

For more granular control than a rewrite rule, ModSecurity can match the header directly. Note that this rule denies matching requests outright; it does not rate-limit them:

SecRule REQUEST_HEADERS:User-Agent "@contains ggpht.com GoogleImageProxy" \
    "id:1001,phase:1,t:none,log,deny,status:403,\
    msg:'Google Image Proxy Abuse Detected'"

Since the original IP is masked by your load balancer, ensure proper logging is implemented. For AWS ALB:

aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn your-alb-arn \
    --attributes Key=access_logs.s3.enabled,Value=true \
    Key=access_logs.s3.bucket,Value=your-log-bucket

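Create a custom metric in your monitoring system to track these requests. Prometheus does not parse log files itself, so this needs a log-to-metrics exporter such as mtail or grok_exporter. The snippet below is schematic rather than the exact syntax of either tool: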

- name: apache_google_proxy_requests
  type: counter
  help: Count of requests via GoogleImageProxy
  match: 
    user_agent: "*ggpht.com GoogleImageProxy*"
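If no log exporter is available, the same counter can be approximated with a small script. The following is only a sketch: it counts matching lines and renders them in the Prometheus text exposition format, reusing the metric name from the snippet above. The function names are hypothetical, and how the output reaches Prometheus (for example through node_exporter's textfile collector) is an assumption about your setup.

```python
MARKER = "(via ggpht.com GoogleImageProxy)"
METRIC = "apache_google_proxy_requests"  # name taken from the config above


def count_proxy_requests(lines):
    """Count log lines whose text contains the proxy marker."""
    return sum(1 for line in lines if MARKER in line)


def render_metric(count):
    """Render the count in the Prometheus text exposition format."""
    return (
        f"# HELP {METRIC} Count of requests via GoogleImageProxy\n"
        f"# TYPE {METRIC} counter\n"
        f"{METRIC} {count}\n"
    )
```

Typical use would be to pass an open access log to count_proxy_requests and write the rendered text to a file that your exporter picks up.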

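When your Apache server suddenly spikes to 100% CPU with log entries containing (via ggpht.com GoogleImageProxy), you're likely facing one of two scenarios: a misconfigured scraper, or a deliberate flood that uses the proxy as a DDoS vector. A typical suspicious entry looks like this: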

# Typical malicious pattern
123.45.67.89 - - [15/Jan/2023:08:22:11 +0000] "GET /wp-content/uploads/image.jpg HTTP/1.1" 200 5432 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36 (via ggpht.com GoogleImageProxy)"
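To work with entries like this programmatically, it helps to split them into named fields first. Below is a minimal sketch that assumes the standard Apache combined log format; the function name parse_line and the via_image_proxy key are made up for illustration.

```python
import re

# Regex for the Apache "combined" log format (an assumption about your LogFormat)
COMBINED = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
MARKER = "(via ggpht.com GoogleImageProxy)"


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not parse."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Flag requests whose User-Agent carries the image proxy suffix
    entry["via_image_proxy"] = MARKER in entry["agent"]
    return entry
```

Checking the agent field rather than the whole line avoids false positives when the marker text happens to appear in a URL or referer.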

Legitimate GoogleImageProxy requests should:

  • Come from Google IP ranges (verify via whois or a reverse DNS lookup)
  • Have X-Forwarded-For headers when proxied
  • Maintain reasonable request rates (as a rough rule of thumb, under 10 requests/sec)
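The first check in this list can be scripted with the standard ipaddress module. This is a sketch under a loud assumption: the two networks below are simply the prefixes used in the rewrite rule later in this post, widened to /16, and are not an authoritative list of Google ranges. Load the ranges Google currently publishes before relying on it. The names are hypothetical.

```python
import ipaddress

# ASSUMPTION: placeholder networks based on the prefixes in this post's
# rewrite rule; replace them with Google's currently published ranges.
ASSUMED_GOOGLE_NETWORKS = [
    ipaddress.ip_network("66.102.0.0/16"),
    ipaddress.ip_network("172.217.0.0/16"),
]


def in_assumed_google_range(ip, networks=ASSUMED_GOOGLE_NETWORKS):
    """Return True if ip parses and falls inside one of the given networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IP address (for example "-" or a hostname)
        return False
    return any(addr in net for net in networks)
```

A request that carries the proxy User-Agent but fails this check is a candidate for blocking; one that passes still needs the rate check.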

Create a custom Apache log format in httpd.conf:

LogFormat "%h %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxyformat
CustomLog /var/log/apache2/proxy_traffic.log proxyformat
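Once this format is in place, the client address can be recovered from the second field. The sketch below assumes a single trusted load balancer that appends the address it saw to X-Forwarded-For, so the rightmost entry is the one to trust. Entries further left are supplied by the client and can be forged. The function name client_ip is hypothetical.

```python
import re

# Matches the start of a "proxyformat" line: %h, X-Forwarded-For, %l, %u, [%t].
# The header may contain ", " separated addresses, so it is matched lazily.
PROXY_LINE = re.compile(
    r'^(?P<peer>\S+) (?P<xff>.*?) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\]'
)


def client_ip(line):
    """Return the address the load balancer saw, or the peer if no header was sent."""
    m = PROXY_LINE.match(line)
    if m is None:
        return None
    xff = m.group("xff").strip()
    if xff in ("", "-"):
        # Apache logs "-" when the header is absent
        return m.group("peer")
    # Rightmost entry is the one appended by the nearest proxy
    return xff.split(",")[-1].strip()
```

If more than one trusted proxy sits in front of Apache, the entry to trust moves further left by one position per proxy.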

Add these rules to your .htaccess or Apache config:

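# Block the proxy User-Agent unless the request comes from an expected range.
# Both REMOTE_ADDR conditions must hold (implicit AND); joining them with [OR]
# would match every address. The prefixes are examples only, and behind a load
# balancer REMOTE_ADDR is the balancer's address, not the client's.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} \(via\ ggpht\.com\ GoogleImageProxy\) [NC]
RewriteCond %{REMOTE_ADDR} !^66\.102\.
RewriteCond %{REMOTE_ADDR} !^172\.217\.
RewriteRule ^ - [F,L]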

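# Bandwidth throttling: mod_ratelimit caps transfer speed (KiB/s per response);
# it does not limit the number of requests. <Location> is only valid in the
# server or virtual host config, not in .htaccess.
<IfModule mod_ratelimit.c>
    <Location />
        SetOutputFilter RATE_LIMIT
        SetEnv rate-limit 30
    </Location>
</IfModule>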

For enterprise-grade protection:

# modsecurity.conf rules
# The id differs from the earlier rule because ModSecurity rejects duplicate ids
SecRule REQUEST_HEADERS:User-Agent "@rx \(via ggpht\.com GoogleImageProxy\)" \
    "id:1002,\
    phase:1,\
    deny,\
    status:403,\
    msg:'Suspicious GoogleImageProxy traffic',\
    logdata:'Matched User-Agent: %{MATCHED_VAR}'"

Python script to analyze suspicious patterns:

#!/usr/bin/env python3
import re
from collections import Counter

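def analyze_logs(logfile):
    """Return the ten addresses that sent the most proxy-tagged requests."""
    ip_counter = Counter()
    # Parentheses are escaped so they match the literal "(via ...)" suffix
    ua_pattern = re.compile(r'\(via ggpht\.com GoogleImageProxy\)')

    with open(logfile) as f:
        for line in f:
            if ua_pattern.search(line):
                # The first field of the combined log format is the client address
                ip = line.split()[0]
                ip_counter[ip] += 1

    return ip_counter.most_common(10)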

if __name__ == '__main__':
    print(analyze_logs('/var/log/apache2/access.log'))
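Total counts do not show whether a source exceeds the 10 requests/sec guideline mentioned earlier, so a per-second view is a useful complement. The following sketch buckets matching lines by address and timestamp. It assumes the default Apache timestamp, which has one-second resolution, and that the first field is the real client address, which is not the case behind a load balancer. The names peak_rates and over_limit are hypothetical.

```python
import re
from collections import Counter

MARKER = "(via ggpht.com GoogleImageProxy)"
TIMESTAMP = re.compile(r'\[([^\]]+)\]')


def peak_rates(lines):
    """Map each address to its highest number of marked requests in one second."""
    per_second = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        m = TIMESTAMP.search(line)
        if m is None:
            continue
        ip = line.split()[0]
        # The timestamp string itself serves as the one-second bucket key
        per_second[(ip, m.group(1))] += 1

    peaks = {}
    for (ip, _second), count in per_second.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks


def over_limit(peaks, limit=10):
    """Return the addresses whose peak rate exceeds limit, sorted for stable output."""
    return sorted(ip for ip, peak in peaks.items() if peak > limit)
```

Passing an open log file to peak_rates and the result to over_limit gives a shortlist of addresses worth blocking or investigating.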

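For AWS Classic Load Balancer (ELB) users who manage infrastructure with Terraform:

# Classic Load Balancer access logging
resource "aws_elb" "web" {
  name               = "web-lb"
  availability_zones = ["us-west-2a"]

  # aws_elb requires at least one listener; the ports here are placeholders
  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }

  access_logs {
    bucket   = "my-logs-bucket"
    interval = 5 # minutes; valid values are 5 and 60
    enabled  = true
  }
}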