Nginx Rewrite Rule: Replace Query String Question Mark with Underscore for Static HTML Mirroring



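When mirroring dynamic websites as static HTML (e.g., using wget), URLs with query strings pose a filesystem challenge. In a request, the question mark (?) starts the query string, so the web server looks up only the path in front of it. A mirrored file whose name contains a literal ? is therefore never matched, and Windows does not allow ? in filenames at all. A common workaround is converting ? to _ in the saved filenames and having the server apply the same conversion to incoming requests.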

Here's how to implement this in Nginx:


server {
    listen 80;
    server_name example.com;

    # Convert ? to _ for static files: /page?a=1 is served from /page_a=1.
    # rewrite never sees the query string, so it is appended from $args.
    if ($args) {
        rewrite ^(.*)$ $1_$args? last;
    }

    # Serve the rewritten files
    location / {
        try_files $uri $uri/ =404;
    }
}
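  • The whole query string, with every parameter, is kept in the filename; only the ? separator becomes _
  • rewrite matches only the path ($uri), never the query string, so the query is appended from $args; if ($args) skips requests that have no query
  • The trailing ? in the replacement keeps Nginx from re-appending the original arguments
  • The last flag stops processing further rewrite directives and starts the location search with the new URI
  • Combine with appropriate try_files directives for static file serving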
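
To check which filename the rule will look up for a given URL (for example, while verifying a mirror), the mapping can be reproduced outside Nginx. The Python sketch below assumes files are stored as <path>_<query>, with the path percent-decoded the way Nginx's $uri is and the query left raw the way $args is. The function name mirror_filename is hypothetical, and the sketch ignores the other normalization Nginx applies to $uri (such as merging repeated slashes).

```python
from urllib.parse import unquote, urlsplit


def mirror_filename(url: str) -> str:
    """Return the URI the ?-to-_ rule would look up for `url` (hypothetical helper).

    Assumption: the path is percent-decoded like Nginx's $uri, while the
    query string is kept raw like $args. Other $uri normalization is ignored.
    """
    parts = urlsplit(url)
    path = unquote(parts.path) or "/"
    if parts.query:  # mirrors `if ($args)`: an empty query leaves the path alone
        return f"{path}_{parts.query}"
    return path


if __name__ == "__main__":
    for url in (
        "http://example.com/search.php?q=nginx+rewrite",
        "http://example.com/list.php?page=2&sort=asc",
        "http://example.com/about.html",
    ):
        print(url, "->", mirror_filename(url))
```

Percent-encoded bytes in the query (such as %20) stay encoded in the filename, while the same bytes in the path are decoded; the files on disk have to follow the same convention.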

A variant that builds the new URI from $request_uri (the original request URI, query string included) uses a map in the http context:


map $request_uri $rewritten_uri {
    default $request_uri;
    # Split the raw request URI at the first ? into named captures
    "~^(?<path>[^?]*)\?(?<query>.*)$" "${path}_${query}";
}

server {
    # ... other server config ...
    
    if ($rewritten_uri != $request_uri) {
        rewrite ^ $rewritten_uri? last;
    }
    
    location / {
        root /path/to/static/files;
        try_files $uri $uri/ =404;
    }
}
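
The map regex splits $request_uri at the first ? only, so any later ? stays in the query part, and a bare trailing ? still produces a rewrite (to a path ending in _). Running the same pattern in Python shows this quickly. Note that Python's re module needs the (?P<name>...) spelling of named groups, while the PCRE library Nginx uses also accepts (?<name>...). rewrite_request_uri is a hypothetical name; the sketch models only the map, not how Nginx then handles the rewritten URI.

```python
import re

# Same pattern as the map entry; Python requires (?P<name>...) for named groups.
REQUEST_URI_RE = re.compile(r"^(?P<path>[^?]*)\?(?P<query>.*)$")


def rewrite_request_uri(request_uri: str) -> str:
    """Model the map: return path_query, or the input unchanged (the map default)."""
    match = REQUEST_URI_RE.match(request_uri)
    if match is None:
        return request_uri
    return f"{match.group('path')}_{match.group('query')}"


if __name__ == "__main__":
    for uri in ("/search.php?q=nginx+rewrite", "/about.html", "/a?b?c", "/page?"):
        print(uri, "->", rewrite_request_uri(uri))
```

Compared with the if ($args) version, this one rewrites /page? to /page_, because $request_uri still contains the bare ? even though $args is empty.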

While rewrite rules work, consider these optimizations:

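  • Normalize filenames during mirroring: wget on Unix-like systems keeps the literal ? in saved filenames by default, so rename them to use _ afterwards (--restrict-file-names=windows writes @ instead of ?, and --adjust-extension appends .html, which a try_files $uri.html fallback can match)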
  • Use hash-based directory structures for large numbers of files
  • Implement proper cache headers for static content
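
The rewrite rules only help if the files on disk already use _ where the URL has ?. A minimal Python sketch of that renaming step follows, assuming the mirror tool kept a literal ? in the saved filenames; rename_query_files is a hypothetical helper. Only the first ? is replaced, because the rules above put the whole remaining query string ($args), including any later ?, after the underscore.

```python
import os
import sys
from pathlib import Path
from typing import List, Tuple


def rename_query_files(root: str) -> List[Tuple[Path, Path]]:
    """Rename files under `root` so the first '?' in each name becomes '_'.

    Hypothetical helper: assumes the mirror kept literal '?' characters in
    filenames. Existing targets are never overwritten. Returns (old, new) pairs.
    """
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if "?" not in name:
                continue
            old = Path(dirpath) / name
            new = old.with_name(name.replace("?", "_", 1))
            if new.exists():
                print(f"skipping {old}: {new} already exists", file=sys.stderr)
                continue
            old.rename(new)
            renamed.append((old, new))
    return renamed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python rename_mirror.py /var/www/mirror
    for old, new in rename_query_files(sys.argv[1]):
        print(f"{old} -> {new}")
```

Running it once after each mirroring pass keeps the disk layout in step with the Nginx rules; a dry run is easy to add by skipping the rename call.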

Other methods to handle query strings in static mirrors:

  1. URL-encode the entire query string
  2. Use MD5 hashes of full URLs as filenames
  3. Implement a directory structure matching the URL path
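
The first two naming schemes can be sketched in a few lines of Python. Both functions are hypothetical: encoded_name percent-encodes everything in the query except unreserved characters, so characters such as &, = and + never appear in the filename, and hashed_path names the file after the MD5 of the full URL, placed in two levels of directories taken from the hash so no single directory grows too large (the hash-based layout from the optimization list). With the hashed scheme the server has to compute the same hash for every request, which plain rewrite rules cannot do, so it suits a lookup layer in front of the files rather than the Nginx rules above.

```python
import hashlib
from urllib.parse import quote, urlsplit


def encoded_name(url: str) -> str:
    """Scheme 1 (hypothetical): path + '_' + fully percent-encoded query."""
    parts = urlsplit(url)
    if not parts.query:
        return parts.path or "/"
    # safe="" encodes everything except letters, digits and _.-~
    return f"{parts.path}_{quote(parts.query, safe='')}"


def hashed_path(url: str, ext: str = ".html") -> str:
    """Scheme 2 (hypothetical): MD5 of the full URL, sharded as ab/cd/abcd....html."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"{digest[:2]}/{digest[2:4]}/{digest}{ext}"


if __name__ == "__main__":
    url = "http://example.com/search.php?q=nginx+rewrite"
    print(encoded_name(url))
    print(hashed_path(url))
```

MD5 is used here only as a stable, evenly distributed name, not for security.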

To recap the problem: in a URL, ? starts the query string, so Nginx looks up only the path in front of it and never finds a mirrored file whose name includes the query. The fix has two halves: save the mirrored files with _ in place of ? (during or after mirroring), and have Nginx rewrite each request the same way when it is served.

Here's the Nginx rewrite rule that accomplishes this transformation:

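location / {
    # rewrite sees only $uri (no query string), so append $args explicitly;
    # the trailing ? drops the arguments, so the second pass does not rewrite again
    if ($args) {
        rewrite ^(.*)$ $1_$args? last;
    }
    try_files $uri $uri/ =404;
}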

Consider this practical scenario where we want to serve mirrored content:

# Original URL: http://example.com/search.php?q=nginx+rewrite
# Becomes: /search.php_q=nginx+rewrite

server {
    listen 80;
    server_name example.com;
    root /var/www/mirror;

    location / {
        if ($args) {
            rewrite ^(.*)$ $1_$args? last;
        }

        # $uri.html also matches pages saved with wget --adjust-extension
        try_files $uri $uri.html $uri/ =404;
    }
}
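
With root /var/www/mirror, that location checks three candidates in order before returning 404. The Python sketch below walks through the same order for a given URL, which can help confirm that a mirrored file will actually be found. resolve_like_try_files is a hypothetical helper that models only this configuration (decoded path plus raw query, no index directive handling).

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlsplit


def resolve_like_try_files(url: str, root: str = "/var/www/mirror") -> Optional[Path]:
    """Return the path `try_files $uri $uri.html $uri/ =404` would pick, or None for 404.

    Hypothetical model of the example server block above.
    """
    parts = urlsplit(url)
    uri = unquote(parts.path) or "/"
    if parts.query:  # effect of: if ($args) { rewrite ^(.*)$ $1_$args? last; }
        uri = f"{uri}_{parts.query}"
    for candidate in (root + uri, root + uri + ".html"):  # $uri, then $uri.html
        if Path(candidate).is_file():
            return Path(candidate)
    if Path(root + uri).is_dir():  # $uri/ (Nginx would then apply its index rules)
        return Path(root + uri)
    return None  # =404
```

For example, a page that wget --adjust-extension saved as search.php?q=nginx+rewrite.html and that was then renamed to search.php_q=nginx+rewrite.html is found through the $uri.html candidate.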

For more complex scenarios, we can extend the solution:

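# Multiple query parameters need no extra handling: $args holds the whole
# query string, so /list?page=2&sort=asc becomes /list_page=2&sort=asc
if ($args) {
    rewrite ^(.*)$ $1_$args? last;
}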

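# Serve the _-named file from a fallback (e.g. try_files $uri $uri/ @static;);
# like any internal rewrite, this leaves the client-visible URL unchanged
location @static {
    rewrite ^(.*)$ $1_$args? break;   # break: keep handling in this location
    try_files $uri =404;              # requests without a query end at /path_ -> 404
}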

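# Alternative approach using map directive (map belongs in the http context)
map $request_uri $rewrite_uri {
    default $request_uri;
    "~^(?<base>[^?]*)\?(?<query>.*)$" "${base}_${query}";
}

server {
    # ... other server config ...

    # Only rewrite when the map actually changed something
    if ($rewrite_uri != $request_uri) {
        rewrite ^ $rewrite_uri? last;
    }

    # ... other server config ...
}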

When implementing this solution:

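  • Inside a location, rewrite ... last restarts the location search with the new URI (Nginx returns a 500 error after 10 such cycles); use break when the current location should serve the rewritten URI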
  • Minimize regex complexity for faster processing
  • If traffic volume is high, consider open_file_cache so repeated try_files lookups of the same files are answered from cached file information
  • Test with various query string patterns (special characters, encoding, etc.)
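
For that last point, a small smoke test against the running server makes regressions easy to spot. The Python sketch below requests a list of edge-case URLs and records the status codes. EDGE_CASES, smoke_test and the base URL are hypothetical placeholders to replace with URLs that exist in your mirror. The sketch bypasses any proxy settings from the environment so requests reach the local server directly.

```python
import sys
import urllib.error
import urllib.request
from typing import Dict, Iterable

# Hypothetical edge cases; replace them with URLs that exist in your mirror.
EDGE_CASES = (
    "/search.php?q=nginx+rewrite",    # '+' stays literal in $args
    "/search.php?q=nginx%20rewrite",  # percent-encoding stays raw in $args
    "/list.php?page=2&sort=asc",      # several parameters, one filename
    "/search.php?",                   # bare '?': $args is empty, no rewrite
)

# Ignore proxy settings from the environment so requests reach the server directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_of(base_url: str, path: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for base_url + path (4xx/5xx included)."""
    try:
        with _OPENER.open(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


def smoke_test(base_url: str, paths: Iterable[str] = EDGE_CASES) -> Dict[str, int]:
    """Map each path to the status code the server returned for it."""
    return {path: status_of(base_url, path) for path in paths}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python smoke_test.py http://localhost
    for path, status in smoke_test(sys.argv[1]).items():
        print(status, path)
```

Any 404 in the output points either at a missing file in the mirror or at a rule that builds a different name than the one on disk; the rewrite log described next shows which.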

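To troubleshoot, enable rewrite logging. rewrite_log on writes each rewrite decision to the error_log at the notice level; the debug level adds much more detail, but only on an Nginx binary built with --with-debug: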

error_log /var/log/nginx/error.log debug;
rewrite_log on;

Check the logs for rewrite processing details and verify the transformed URIs match your expectations.