When setting up reverse proxies, we often encounter situations where absolute URLs in the proxied content break functionality. The issue isn't with the URLs being requested (which mod_rewrite handles well), but with URLs embedded in the HTML, CSS, or JavaScript content being served.
While mod_rewrite is excellent for URL transformation, it doesn't process response bodies. For content replacement, we need mod_substitute. Key differences:
# mod_rewrite handles incoming requests
RewriteRule ^path/(.*)$ /newpath/$1 [L]
# mod_substitute processes response content
Substitute "s|old.domain|new.domain|i"
Here's a working configuration that handles both URL rewriting and content substitution:
<VirtualHost *:80>
ServerName proxy.example.com
ServerAlias old.example.com
# Basic proxy setup
ProxyRequests Off
<Proxy *>
Require all granted
</Proxy>
# URL rewriting
RewriteEngine On
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC]
RewriteRule ^/(.*)$ http://proxy.example.com/$1 [P,L]
# Content substitution
AddOutputFilterByType SUBSTITUTE text/html
AddOutputFilterByType SUBSTITUTE text/css
AddOutputFilterByType SUBSTITUTE application/javascript
Substitute "s|//old.example.com/|//proxy.example.com/|ni"
Substitute "s|http://old.example.com/|https://proxy.example.com/|ni"
# Prefix root-relative links only; the lookahead skips protocol-relative //host URLs
Substitute "s|href=\"/(?!/)|href=\"/path/|i"
</VirtualHost>
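The Substitute directive treats the pattern as a regular expression by default and supports these flags:
i - case-insensitive match
n - treat the pattern as a fixed string instead of a regular expression
f - flatten the result so that later substitutions can match across the boundary of this one (the default)
q - do not flatten the buckets after each substitution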
Example with complex pattern:
# Replace both HTTP and HTTPS references (regex mode, so the n flag is omitted)
Substitute "s|https?://old\.example\.com(/[^\"]+)|//new.example.com$1|i"
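To illustrate the difference between fixed-string and regex matching, here is a small sketch. The host names are placeholders, and the two directives are meant as alternatives to compare, not as a pair to deploy together.

```apache
# Fixed-string mode (n): the pattern is compared literally, so dots match
# only dots and characters such as ? or ( have no special meaning.
Substitute "s|http://old.example.com/|https://new.example.com/|ni"

# Regex mode (no n): escape the dots, and use $1 to reuse a captured group.
Substitute "s|https?://old\.example\.com/([a-z]+)/|https://new.example.com/$1/|i"
```

A regex pattern combined with the n flag will silently match nothing, because the backslashes and quantifiers are then searched for as literal text.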
If substitutions aren't working:
- Verify the MIME type is included in AddOutputFilterByType
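- Check for content encoding (gzip/deflate): mod_substitute cannot match inside a compressed body, so either stop the backend from compressing or inflate the response before substituting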
- Enable debug logging: LogLevel debug substitute:trace8
- Test with simple patterns first
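The compression and logging checks above could be sketched as follows. This assumes mod_headers and mod_deflate are loaded, and the two options are alternatives; which one fits depends on whether the backend honours Accept-Encoding.

```apache
# Option 1 (mod_headers): ask the backend for an uncompressed response
# by removing Accept-Encoding from the proxied request.
RequestHeader unset Accept-Encoding

# Option 2 (mod_deflate): decompress, substitute, then recompress.
# Note that SetOutputFilter applies to every response in scope,
# not only to the MIME types listed in AddOutputFilterByType.
# SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE

# Verbose logging for mod_substitute only; remove once the rule works.
LogLevel warn substitute:trace8
```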
Content substitution adds overhead:
- Works best for small to medium sites
- Avoid using on large binary files
- Consider caching transformed content
- For high-traffic sites, explore alternatives such as:
  - Nginx's sub_filter module
  - Varnish's vmod_replace
  - Custom middleware in your application
Beyond domain replacement, mod_substitute is useful for:
# Changing API endpoints in proxied JavaScript
Substitute "s|api.old.com/v1|api.new.com/v2|ni"
# Updating CDN URLs
Substitute "s|cdn.old.com|static.new.com/assets|ni"
# Fixing mixed content warnings
Substitute "s|http://|https://|ni"
When working with reverse proxies, we often encounter situations where absolute paths in proxied content break functionality. Unlike URL rewriting with mod_rewrite, modifying the actual content stream requires different techniques.
The mod_substitute module provides powerful content replacement capabilities. The key configuration elements:
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
The flags used here are critical:
- n: Treat the pattern as a fixed string rather than a regular expression (every occurrence is replaced either way)
- i: Case-insensitive matching
The original configuration might fail for several reasons:
# Problematic case (missing protocol)
<link href="//uat.site.co.jp/css/css.css
# Solution: match the leading // explicitly; the escaped dots need regex mode, so the n flag is dropped
Substitute "s|//uat\.site\.co\.jp|//jp.uat.site2uk.co.uk|i"
For complex replacements, consider these patterns:
# Replace in URLs with various protocols (regex mode, so no n flag)
Substitute "s|https?://uat\.site\.co\.jp|https://jp.uat.site2uk.co.uk|i"
# Handle JSON responses
AddOutputFilterByType SUBSTITUTE application/json
Substitute "s|old\.domain|new.domain|i"
When processing large HTML files:
- Limit SUBSTITUTE to specific content types
- Use precise regular expressions
- Consider caching transformed content
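A minimal sketch of the first two points, assuming the pages that need rewriting live under a hypothetical /app/ path. SubstituteMaxLineLength is available in Apache 2.4.11 and later; the 5m value is only an illustration.

```apache
# Run the filter only under /app/ (hypothetical path) instead of site-wide.
<Location "/app/">
    AddOutputFilterByType SUBSTITUTE text/html
    # Fixed-string match (n) avoids regex evaluation for a literal host name.
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
    # Raise the per-line buffer limit (default 1m) for minified pages that
    # arrive as one very long line.
    SubstituteMaxLineLength 5m
</Location>
```

Scoping the filter this way keeps static assets and downloads outside /app/ from passing through mod_substitute at all.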
When mod_substitute isn't sufficient:
# Using mod_proxy_html (more powerful but complex)
ProxyHTMLEnable On
ProxyHTMLURLMap uat.site.co.jp jp.uat.site2uk.co.uk
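For context, a fuller mod_proxy_html setup might look like the sketch below. It assumes mod_proxy, mod_proxy_http, mod_proxy_html, mod_xml2enc and mod_headers are loaded, and that the backend is reachable over plain HTTP at uat.site.co.jp; adjust the scheme and hosts to the actual environment.

```apache
# Forward requests to the backend and fix Location headers on redirects.
ProxyPass        "/" "http://uat.site.co.jp/"
ProxyPassReverse "/" "http://uat.site.co.jp/"

# mod_proxy_html needs uncompressed markup to parse.
RequestHeader unset Accept-Encoding

ProxyHTMLEnable On
# Also rewrite URLs found in inline scripts and stylesheets.
ProxyHTMLExtended On
# Syntax: ProxyHTMLURLMap from-pattern to-pattern
ProxyHTMLURLMap "http://uat.site.co.jp" "https://jp.uat.site2uk.co.uk"
```

Unlike mod_substitute, this module parses the HTML and rewrites URL-bearing attributes, which is why it is better suited to markup than to JSON or plain text.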