When mirroring dynamic websites as static HTML (e.g., using wget), query strings containing question marks (?) pose a filesystem challenge. A web server treats everything after the ? as the query string rather than as part of the file path, so a saved file with a literal question mark in its name is never looked up. A common workaround is converting ? to _ in the saved filenames.
Here's how to implement this in Nginx:
server {
    listen 80;
    server_name example.com;

    # Convert ? to _ for static files.
    # rewrite matches only the path, never the query string,
    # so the query string is appended explicitly from $args.
    if ($is_args) {
        rewrite ^(.*)$ $1_$args? last;
    }

    # Serve the rewritten files
    location / {
        try_files $uri $uri/ =404;
    }
}
- This preserves the original query parameters while making them filesystem-friendly
- The last flag stops processing further rewrite rules
- Combine with appropriate try_files directives for static file serving
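As a variation on the same idea, the lookup can be expressed in try_files alone, without an if block. The following is a minimal sketch, not a tested production config; it assumes the mirrored files live under /var/www/mirror and were saved with _ in place of ?.

```nginx
server {
    listen 80;
    server_name example.com;
    root /var/www/mirror;  # assumed location of the mirrored files

    location / {
        # Try the plain path first, then "<path>_<query string>", then a directory.
        # With an empty query string the second candidate is "<path>_",
        # which normally does not exist and is skipped.
        try_files $uri "${uri}_${args}" $uri/ =404;
    }
}
```

The braces in ${uri} matter here: without them Nginx would read the underscore as part of the variable name.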
For a more comprehensive solution handling edge cases:
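# map must be declared in the http context, outside any server block
map $request_uri $rewritten_uri {
    default $request_uri;
    # $request_uri is the raw request URI, still percent-encoded
    "~^(?<path>[^?]*)\?(?<query>.*)$" "${path}_${query}";
}
server {
    # ... other server config ...
    if ($rewritten_uri != $request_uri) {
        rewrite ^ $rewritten_uri? last;
    }
    location / {
        root /path/to/static/files;
        try_files $uri $uri/ =404;
    }
}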
While rewrite rules work, consider these optimizations:
- Pre-process URLs during mirroring (for example, wget --adjust-extension appends .html to saved HTML pages whose names lack that suffix)
- Use hash-based directory structures for large numbers of files
- Implement proper cache headers for static content
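For the cache-header point, a minimal sketch could look like the following. The one-week lifetime is an assumed policy chosen for illustration, and the root path is the same placeholder used in the config above; adjust both to the mirror in question.

```nginx
location / {
    root /path/to/static/files;
    try_files $uri $uri/ =404;

    # Assumed policy: mirrored pages change rarely, so let clients cache them for a week.
    # expires sets both the Expires header and Cache-Control: max-age.
    expires 7d;
    add_header Cache-Control "public";
}
```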
Other methods to handle query strings in static mirrors:
- URL-encode the entire query string
- Use MD5 hashes of full URLs as filenames
- Implement a directory structure matching the URL path
When mirroring dynamic websites as static HTML files, query strings containing question marks (?) pose a significant challenge. Nginx treats everything after the ? as the query string, not as part of the file path, so it never looks for a file whose name contains one. The solution is to save the mirrored files with underscores in place of question marks and have Nginx rewrite incoming requests to match.
Here's the Nginx rewrite rule that accomplishes this transformation:
location / {
    # rewrite sees only the path, so append the query string from $args
    if ($is_args) {
        rewrite ^(.*)$ $1_$args? last;
    }
    try_files $uri $uri/ =404;
}
Consider this practical scenario where we want to serve mirrored content:
# Original URL: http://example.com/search.php?q=nginx+rewrite
# Becomes: /search.php_q=nginx+rewrite
server {
    listen 80;
    server_name example.com;
    root /var/www/mirror;

    location / {
        # rewrite sees only the path, so append the query string from $args
        if ($is_args) {
            rewrite ^(.*)$ $1_$args? last;
        }
        try_files $uri $uri.html $uri/ =404;
    }
}
For more complex scenarios, we can extend the solution:
# Handle multiple query parameters ($args holds the whole query string)
if ($is_args) {
    rewrite ^(.*)$ $1_$args? last;
}

# Preserve original URLs while serving static files
location @static {
    rewrite ^(.*)$ $1_$args? break;
    try_files $uri =404;
}

# Alternative approach using map directive (map belongs in the http context)
map $request_uri $rewrite_uri {
    default $request_uri;
    "~^(?<base>[^?]*)\?(?<query>.*)$" "${base}_${query}";
}
server {
    ...
    rewrite ^ $rewrite_uri? last;
    ...
}
When implementing this solution:
- Use rewrite ... last when the rewritten URI should go through location matching again; inside a location that serves the file itself, break avoids the extra location lookup
- Minimize regex complexity for faster processing
- Consider caching rewritten URLs if traffic volume is high
- Test with various query string patterns (special characters, encoding, etc.)
Enable Nginx debug logging to troubleshoot:
error_log /var/log/nginx/error.log debug;
rewrite_log on;
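Check the logs for rewrite processing details and verify the transformed URIs match your expectations. rewrite_log entries are written at the notice level, and the debug level only produces debug output if the Nginx binary was built with --with-debug.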
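Alongside the logs, a temporary response header can show which URI Nginx ended up serving. This is only a debugging sketch: the header name X-Debug-Uri is made up for illustration, and the line should be removed once the rewrite behaves as expected.

```nginx
location / {
    # Temporary debugging aid: expose the final internal URI in the response.
    # X-Debug-Uri is a hypothetical header name; "always" also adds it to error responses.
    add_header X-Debug-Uri $uri always;
    try_files $uri $uri/ =404;
}
```

Requesting a mirrored URL and inspecting the response headers then shows whether the question mark was replaced as intended.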