How to Perform Content Replacement in a Reverse Proxy Using Apache's Substitute Filter (mod_substitute)


When setting up reverse proxies, we often encounter situations where absolute URLs in the proxied content break functionality. The issue isn't with the URLs being requested (which mod_rewrite handles well), but with URLs embedded in the HTML, CSS, or JavaScript content being served.

While mod_rewrite is excellent for URL transformation, it doesn't process response bodies. For content replacement, we need mod_substitute. Key differences:

# mod_rewrite handles incoming requests
RewriteRule ^path/(.*)$ /newpath/$1 [L]

# mod_substitute processes response content
Substitute "s|old.domain|new.domain|i"

Here's a working configuration that handles both URL rewriting and content substitution:

<VirtualHost *:80>
    ServerName  proxy.example.com
    ServerAlias old.example.com
    
    # Basic proxy setup
    ProxyRequests Off
    <Proxy *>
        Require all granted
    </Proxy>
    
    # URL rewriting
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC]
    # In virtual-host context the matched path starts with "/", so capture what follows it
    RewriteRule ^/(.*)$ http://proxy.example.com/$1 [P,L]
    
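    # Content substitution (the documented context for Substitute is directory/.htaccess)
    <Location "/">
        AddOutputFilterByType SUBSTITUTE text/html
        AddOutputFilterByType SUBSTITUTE text/css
        AddOutputFilterByType SUBSTITUTE application/javascript

        # Rules run top to bottom, so rewrite the http:// form first;
        # the scheme-relative rule would otherwise consume it.
        Substitute "s|http://old.example.com/|https://proxy.example.com/|ni"
        Substitute "s|//old.example.com/|//proxy.example.com/|ni"
        # Prefix root-relative links only. [^/] skips scheme-relative "//host" URLs.
        # This pattern is a regex, so the n flag is omitted.
        Substitute "s|href=\"/([^/])|href=\"/path/$1|i"
    </Location>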
</VirtualHost>
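
Substitute only rewrites response bodies. Redirects carry the backend host in the Location header, which mod_proxy handles with ProxyPassReverse. The following is a minimal sketch of the more common ProxyPass layout. It assumes (this is not stated above) that old.example.com is the backend origin and that mod_proxy, mod_proxy_http, mod_headers, and mod_substitute are loaded:

```apache
<VirtualHost *:80>
    ServerName proxy.example.com

    ProxyRequests Off
    # Assumption: the backend is reachable as old.example.com
    ProxyPass        "/" "http://old.example.com/"
    # Rewrites Location, Content-Location and URI response headers
    ProxyPassReverse "/" "http://old.example.com/"

    # Ask the backend for uncompressed bodies so the filter can match text
    RequestHeader unset Accept-Encoding

    <Location "/">
        AddOutputFilterByType SUBSTITUTE text/html
        # Fixed-string, case-insensitive replacement of absolute links in the body
        Substitute "s|http://old.example.com/|http://proxy.example.com/|ni"
    </Location>
</VirtualHost>
```

With this split, headers are handled by ProxyPassReverse and only embedded links in the body need a Substitute rule.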

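The Substitute directive takes an expression of the form s/pattern/replacement/flags. The delimiter can be another character (| is convenient for URLs). The pattern is a regular expression unless the n flag is given. Supported flags:

i - case-insensitive matching
n - treat the pattern as a fixed string, not a regular expression
f - flatten the result of a substitution so later rules can match it (the default)
q - quick mode: do not flatten after each substitution; faster, but only safe when no later rule needs to match the output of this one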
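
A short sketch of how the flags change matching, based on the mod_substitute documentation: with n the pattern is compared as a fixed string, and without it the pattern is a regular expression in which $1 refers to a capture group. The hostnames are placeholders.

```apache
<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html

    # i only: regular expression. Dots are escaped and $1 reuses the captured path.
    # Listed first because rules are applied in order.
    Substitute "s|http://old\.example\.com(/[^\"]*)|https://new.example.com$1|i"

    # n + i: fixed string, case-insensitive. The dots match only literal dots,
    # and regex syntax such as ? or ( ) would be taken literally.
    Substitute "s|old.example.com|new.example.com|ni"
</Location>
```

Combining n with a regex pattern is a common reason a rule never matches, since characters such as ? and \ are then compared literally.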

Example with complex pattern:

# Replace both HTTP and HTTPS references (a regex with a backreference, so the n flag is omitted)
Substitute "s|https?://old\.example\.com(/[^\"]+)|//new.example.com$1|i"

If substitutions aren't working:

  1. Verify the MIME type is included in AddOutputFilterByType
  2. Check for content encoding (gzip/deflate): the filter cannot match text inside a compressed body, so you may need to stop the backend from compressing its responses
  3. Enable debug logging: LogLevel debug substitute:trace8
  4. Test with simple patterns first
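
The compression and logging checks can be sketched as follows. This assumes Apache 2.4 with mod_headers loaded (and mod_deflate for the commented alternative). Use only one of the two compression approaches.

```apache
# Option A: strip Accept-Encoding from the proxied request so the
# backend replies with plain text that SUBSTITUTE can match
RequestHeader unset Accept-Encoding

# Option B (alternative): decompress, substitute, then recompress for the client
# AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE text/html

# Trace mod_substitute only, keeping every other module at warn
LogLevel warn substitute:trace8
```

Trace logging is verbose, so it is best removed once the rule is confirmed to match.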

Content substitution adds overhead:

  • Works best for small to medium sites
  • Avoid using on large binary files
  • Consider caching transformed content
  • For high traffic sites, explore alternatives like:
      • Nginx's sub_filter directive
      • Varnish's vmod_replace
      • Custom middleware in your application

Beyond domain replacement, mod_substitute is useful for:

# Changing API endpoints in proxied JavaScript
Substitute "s|api.old.com/v1|api.new.com/v2|ni"

# Updating CDN URLs
Substitute "s|cdn.old.com|static.new.com/assets|ni"

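# Fixing mixed content warnings (blunt: rewrites every http:// in the body,
# including links to third-party sites that may not serve HTTPS)
Substitute "s|http://|https://|ni"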

The mod_substitute module provides powerful content replacement capabilities. The key configuration elements:

AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"

The flags used here are critical:

  • n: Treat the pattern as a fixed string rather than a regular expression
  • i: Case-insensitive matching

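A rule can silently miss URLs that are written differently from what the pattern expects, such as scheme-relative links:

# Problematic case (missing protocol)
<link href="//uat.site.co.jp/css/css.css

# Match from the double slash so no scheme is required.
# With the n flag the pattern is a literal string, so the dots must not be escaped.
Substitute "s|//uat.site.co.jp|//jp.uat.site2uk.co.uk|ni"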

For complex replacements, consider these patterns:

# Replace in URLs with various protocols (a regex, so the n flag is omitted)
Substitute "s|https?://uat\.site\.co\.jp|https://jp.uat.site2uk.co.uk|i"

# Handle JSON responses (fixed string, so no backslash escapes)
AddOutputFilterByType SUBSTITUTE application/json
Substitute "s|old.domain|new.domain|ni"

When processing large HTML files:

  • Limit SUBSTITUTE to specific content types
  • Use precise regular expressions
  • Consider caching transformed content
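
The points above might look like this in practice. The /app/ path and the 5m limit are illustrative assumptions, and SubstituteMaxLineLength requires Apache 2.4.11 or later:

```apache
<Location "/app/">
    # Only HTML served under /app/ (hypothetical path) passes through the filter
    AddOutputFilterByType SUBSTITUTE text/html

    # Precise, literal pattern that includes the full host name and slashes
    Substitute "s|//uat.site.co.jp/|//jp.uat.site2uk.co.uk/|ni"

    # mod_substitute works line by line and limits line length (default 1m).
    # Raise it only if the backend emits very long lines, such as minified HTML.
    SubstituteMaxLineLength 5m
</Location>
```

Scoping the filter to a Location keeps binary downloads and unrelated paths out of the substitution path entirely.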

When mod_substitute isn't sufficient:

# Using mod_proxy_html (more powerful but complex)
ProxyHTMLEnable On
ProxyHTMLURLMap uat.site.co.jp jp.uat.site2uk.co.uk
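
A slightly fuller sketch of the mod_proxy_html approach follows. mod_proxy_html only rewrites attributes it has been told about through ProxyHTMLLinks (Apache ships a sample proxy-html.conf with a complete list). The short list here is an illustrative subset, and the http:// scheme in the map is an assumption:

```apache
ProxyHTMLEnable On

# Elements and attributes whose URLs should be rewritten (subset for illustration)
ProxyHTMLLinks a      href
ProxyHTMLLinks link   href
ProxyHTMLLinks img    src
ProxyHTMLLinks script src
ProxyHTMLLinks form   action

# Map absolute and scheme-relative forms of the backend host
ProxyHTMLURLMap http://uat.site.co.jp http://jp.uat.site2uk.co.uk
ProxyHTMLURLMap //uat.site.co.jp      //jp.uat.site2uk.co.uk
```

Because it parses the markup, this approach only touches URLs in the listed attributes, whereas Substitute replaces matching text wherever it appears in the body.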