Nginx Configuration: Bypassing WP Super Cache for Google Referrers (Including Country-Specific Domains)



When using WP Super Cache with Nginx, there are valid use cases where you might want to serve uncached content to visitors coming from search engines. This is particularly useful when you need to:

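  • Show the freshest version of a page to visitors who click through from search results (crawlers typically send no Referer header, so a referrer check does not affect them)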
  • Display dynamic elements (like personalized recommendations) to search-referred traffic
  • Test new content variations for organic traffic

The existing Nginx configuration has a few issues:

if ($http_referer ~* (www.google.com|www.google.co) ) {
    rewrite . /index.php break;
}

This implementation is problematic because:

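  • It requires the www. prefix and a domain beginning with google.co, so it misses referrers such as google.de or google.fr
  • The dots are unescaped and the pattern is unanchored, so it can match unrelated strings anywhere in the referrer (HTTP and HTTPS referrers both match, since no scheme is checked)
  • The rewrite might not properly bypass the cache in all scenarios: when the check sits inside location /, the break flag keeps the request in that location, so /index.php is not handed to the PHP location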
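As a minimal sketch, the original check could be tightened as shown here. It assumes the check sits inside location / and that a separate PHP location (such as location ~ \.php$) exists; the pattern is illustrative and deliberately loose about what follows "google.".

```nginx
# Sketch only: a stricter form of the original check.
# The pattern is quoted, anchored at the start, and escapes its dots.
# "last" restarts location matching, so /index.php can reach the
# PHP location instead of staying in the current one.
if ($http_referer ~* "^https?://(www\.)?google\.[a-z.]+(/|$)") {
    rewrite ^ /index.php last;
}
```

The Referer header is supplied by the client, so any pattern of this kind is a heuristic rather than a reliable check.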

Here's the improved version that properly handles all Google domains:

server {
    server_name website.com;
    
    location / {
        root /var/www/html/website.com;
        index index.php;
        
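        # Google referrer detection (including country-specific domains).
        # The pattern is quoted because it contains braces, and
        # $skip_cache is initialized so it is never read while unset.
        set $skip_cache 0;
        if ($http_referer ~* "^https?://(www\.)?google\.([a-z]{2,3}\.)?[a-z]{2,3}(/|$)") {
            set $skip_cache 1;
        }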
        
        # Existing WP Super Cache logic
        if (-f $request_filename) {
            break;
        }
        
        set $supercache_file '';
        set $supercache_uri $request_uri;
        
        if ($request_method = POST) {
            set $supercache_uri '';
        }
        
        if ($query_string) {
            set $supercache_uri '';
        }
        
        if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
            set $supercache_uri '';
        }
        
        # Bypass cache if Google referrer detected
        if ($skip_cache = 1) {
            set $supercache_uri '';
        }
        
        if ($supercache_uri ~ ^(.+)$) {
            set $supercache_file /wp-content/cache/supercache/$http_host/$1index.html;
        }
        
        if (-f $document_root$supercache_file) {
            rewrite ^(.*)$ $supercache_file break;
        }
        
        if (!-e $request_filename) {
            rewrite . /index.php last;
        }
    }
    
    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /var/www/html/website.com$fastcgi_script_name;
        include fastcgi_params;
    }
}

The enhanced configuration includes these optimizations:

  • Comprehensive Google domain matching: The regex now matches:
    • All country-specific TLDs (google.co.uk, google.de, etc.)
    • Both HTTP and HTTPS protocols
    • With or without www prefix
  • Proper cache bypass: Uses $skip_cache variable to ensure full bypass
  • Maintains existing functionality: Preserves all original WP Super Cache logic

To verify it's working:

  1. Use curl to simulate Google referrals:
    curl -e "https://www.google.com/" http://yourdomain.com/
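  2. Check your server logs (for example the PHP-FPM log, if enabled) to see whether the request reached PHP or was served as a static file
  3. Inspect the response headers with curl -I or browser developer tools:
    • The configuration above does not emit an X-Cache header, so X-Cache: HIT or X-Cache: MISS appears only if you add such a header yourself
    • Compare a Google-referred request with a direct one; a PHP-generated response usually carries different headers (for example Cache-Control or Last-Modified) than a static supercache file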
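Since the configuration has no cache-status header of its own, one option is to expose the bypass flag directly. The sketch below assumes the $skip_cache variable from the configuration above; the header name X-Skip-Cache is made up for illustration.

```nginx
server {
    # ... existing directives ...

    # Hypothetical debug header; the name X-Skip-Cache is illustrative.
    # nginx omits the header when the variable is empty, and a location
    # that declares its own add_header will not inherit this one.
    add_header X-Skip-Cache $skip_cache;
}
```

With this in place, curl -sI -e "https://www.google.com/" http://yourdomain.com/ should show X-Skip-Cache: 1, while a request without the referrer should show 0 or no such header. Remove the header once testing is done.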

While this solution works, consider these performance implications:

  • Increased server load during peak traffic from search engines
  • Potential delay in serving uncached content
  • Additional regex processing for each request

For high-traffic sites, you might want to implement this at the CDN level instead.

If Nginx configuration becomes too complex, consider:

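  1. Using WP Super Cache's rejected_user_agent filter (note that it matches on User-Agent rather than Referer, so it suits crawlers better than search-referred visitors)
  2. Implementing this logic in WordPress itself via wp_cache_skip_cache(), provided your plugin version offers it; this only helps when requests reach PHP instead of being served as static files by Nginx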
  3. Setting up a separate caching layer (Varnish, Fastly) with referral rules

When implementing WordPress caching with WP Super Cache, there are legitimate cases where you might want to bypass the cache for specific traffic sources. A common scenario is showing uncached content to visitors arriving from search engines like Google, while maintaining cached versions for other visitors.

The existing nginx configuration attempts to handle this by checking the referrer header, but has several issues:

if ($http_referer ~* (www.google.com|www.google.co) ) {
    rewrite . /index.php break;
}

This approach is problematic because:

  • It requires the www. prefix and a domain starting with google.co, so google.de and similar domains are missed
  • The dots in the pattern are unescaped, so each one matches any character
  • The rewrite might conflict with WP Super Cache's own logic

Here's an improved version that properly handles all Google referrers while maintaining cache functionality:

server {
    server_name website.com;
    root /var/www/html/website.com;
    index index.php;

    location / {
        set $bypass_cache 0;
        
        # Match any Google domain (including international TLDs)
        if ($http_referer ~* (www\.|)google\.(com|co|ad|ae|com.af|com.ag|com.ai|al|am|co.ao|com.ar|as|at|com.au|az|ba|com.bd|be|bf|bg|com.bh|bi|bj|com.bn|com.bo|com.br|bs|bt|co.bw|by|com.bz|ca|cd|cf|cg|ch|ci|co.ck|cl|cm|cn|com.co|co.cr|com.cu|cv|com.cy|cz|de|dj|dk|dm|com.do|dz|com.ec|ee|com.eg|es|com.et|fi|com.fj|fm|fr|ga|ge|gg|com.gh|com.gi|gl|gm|gr|com.gt|gy|com.hk|hn|hr|ht|hu|co.id|ie|co.il|im|co.in|iq|is|it|je|com.jm|jo|co.jp|co.ke|com.kh|ki|kg|co.kr|com.kw|kz|la|com.lb|li|lk|co.ls|lt|lu|lv|com.ly|com.ma|md|me|mg|mk|ml|com.mm|mn|ms|com.mt|mu|mv|mw|com.mx|com.my|co.mz|com.na|com.nf|com.ng|com.ni|ne|nl|no|com.np|nr|nu|co.nz|com.om|com.pa|com.pe|com.pg|com.ph|com.pk|pl|pn|com.pr|ps|pt|com.py|com.qa|ro|ru|rw|com.sa|com.sb|sc|se|com.sg|sh|si|sk|com.sl|sn|so|sm|sr|st|com.sv|td|tg|co.th|com.tj|tl|tm|tn|to|com.tr|tt|com.tw|co.tz|com.ua|co.ug|co.uk|com.uy|co.uz|com.vc|co.ve|vg|co.vi|com.vn|vu|ws|rs|co.za|co.zm|co.zw) ) {
            set $bypass_cache 1;
        }
        
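        # Serve existing static files directly, even for Google referrers
        if (-f $request_filename) {
            break;
        }

        # Hand Google-referred requests to WordPress, skipping the cache lookup.
        # "last" already ends rewrite processing, so no extra break is needed.
        if ($bypass_cache = 1) {
            rewrite ^ /index.php last;
        }

        # Original WP Super Cache logic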
        
        set $supercache_file '';
        set $supercache_uri $request_uri;
        
        if ($request_method = POST) {
            set $supercache_uri '';
        }
        
        if ($query_string) {
            set $supercache_uri '';
        }
        
        if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
            set $supercache_uri '';
        }
        
        if ($supercache_uri ~ ^(.+)$) {
            set $supercache_file /wp-content/cache/supercache/$http_host/$1index.html;
        }
        
        if (-f $document_root$supercache_file) {
            rewrite ^(.*)$ $supercache_file break;
        }
        
        if (!-e $request_filename) {
            rewrite . /index.php last;
        }
    }

    location ~ \.php$ {
        fastcgi_pass    127.0.0.1:9000;
        fastcgi_index   index.php;
        fastcgi_param   SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include         fastcgi_params;
    }
}

The key changes are:

1. Comprehensive Google domain matching - The regex lists roughly 190 Google domains explicitly (google.com plus country-specific ones)

2. Proper cache bypass logic - Uses a variable ($bypass_cache) to cleanly handle the conditional flow

3. Maintains original WP Super Cache functionality - The rest of the caching rules remain unchanged

To verify the configuration is working:

# Test with curl simulating Google referrer
curl -H "Referer: https://www.google.com/search?q=test" http://website.com

# Test with curl simulating direct traffic
curl http://website.com

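The first request should be generated by WordPress, while the second should be served from the supercache file when one exists. The configuration does not add an X-Cache header on its own, so X-Cache: MISS will only appear if you add such a header yourself; otherwise compare the response headers or timing of the two requests.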

While this solution works, evaluating the referrer header on every request adds a small amount of overhead. For high-traffic sites, consider:

  • Moving the referrer check to a separate location block for specific paths
  • Using nginx maps for more efficient pattern matching
  • Implementing this logic at the CDN level if using a service like Cloudflare

For better performance with many rules, consider using the nginx map directive:

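map $http_referer $bypass_cache {
    default 0;
    # Patterns are quoted because they contain braces
    "~*^https?://(www\.)?google\.(com|co|[a-z]{2})(/|$)" 1;
    "~*^https?://(www\.)?google\.com?\.[a-z]{2}(/|$)" 1;
    "~*^https?://([a-z0-9-]+\.)*search\.yahoo\.com(/|$)" 1;
}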

This can be placed in your nginx.conf http context, then referenced in your server block.
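A rough sketch of how the server block might then reference the variable. It assumes the map block above is defined in the http context, and that the rest of the WP Super Cache rules and the PHP location stay as in the earlier configuration.

```nginx
# Sketch only: relies on $bypass_cache being defined by the map block.
server {
    server_name website.com;
    root /var/www/html/website.com;
    index index.php;

    location / {
        # Static files are served directly
        if (-f $request_filename) {
            break;
        }

        # The map yields 0 or 1; nginx treats "0" and empty as false
        if ($bypass_cache) {
            rewrite ^ /index.php last;
        }

        # ... WP Super Cache lookup rules continue here ...
    }

    # ... PHP location as in the earlier configuration ...
}
```

With the map in place, the set $bypass_cache 0; line and the long regex condition are no longer needed in the location block, since the map computes the value once per request when the variable is first read.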