When managing multiple development sites under Apache virtual hosts, maintaining individual robots.txt files becomes cumbersome. Each time we deploy a site to production, we need to:
- Remove or modify the development robots.txt
- Ensure the production version is properly configured
- Maintain consistency across all environments
We can use Apache's mod_rewrite to intercept all robots.txt requests and serve a single master file:
```apache
# In httpd.conf or the virtual host configuration
RewriteEngine On
RewriteRule ^/robots\.txt$ /path/to/central-robots.txt [L]
```
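If the central file lives outside every DocumentRoot (recommended, see below), Apache 2.4 must also be granted read access to its directory or the request will return 403. A minimal sketch, assuming a hypothetical shared directory of `/var/www/shared`:

```apache
# Allow Apache to serve files from the shared config directory
# (the directory name is illustrative - match it to wherever you keep the file)
<Directory "/var/www/shared">
    Require all granted
</Directory>
```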
Option 1: Alias Directive
For simpler cases where all sites should share the exact same robots.txt:
```apache
Alias /robots.txt /var/www/shared/robots.txt
```
Option 2: Environment-Based Configuration
For more control based on the server environment (the block below only applies when Apache is started with the matching define, e.g. `httpd -D DEVELOPMENT`):

```apache
<IfDefine DEVELOPMENT>
    RewriteEngine On
    RewriteRule ^/robots\.txt$ /path/to/dev-robots.txt [L]
</IfDefine>
```
Here's a complete virtual host configuration example:
```apache
<VirtualHost *:80>
    ServerName dev-site1.example.com
    DocumentRoot "/var/www/site1"

    # Central robots.txt for all dev sites
    RewriteEngine On
    RewriteRule ^/robots\.txt$ /usr/local/apache/conf/no-crawl-robots.txt [L]

    # Other configurations...
</VirtualHost>
```
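The no-crawl file itself is typically just a blanket disallow, so every crawler is told to stay out of the development sites:

```
User-agent: *
Disallow: /
```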
- Place the shared robots.txt outside the document roots so it is never deployed along with a site
- Use absolute paths for reliability
- Test with `curl -I http://yoursite/robots.txt`
- Consider adding Cache-Control headers for the robots.txt file
If requests are not being rewritten, work through these checks:

```shell
# Check that mod_rewrite is loaded
apachectl -M | grep rewrite

# Watch the error log while testing the rule
tail -f /var/log/apache2/error_log

# Test the rule directly
curl -v http://localhost/robots.txt
```
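When the rule still misbehaves, Apache 2.4's per-module log levels provide detailed rewrite traces (the old `RewriteLog` directive was removed in 2.4):

```apache
# Dev only - trace3 is very verbose; trace1 through trace8 increase detail
LogLevel alert rewrite:trace3
```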
A minimal XAMPP virtual host using the same approach:

```apache
<VirtualHost *:80>
    ServerName dummy.example.com
    DocumentRoot "C:/xampp/htdocs"

    RewriteEngine On
    # /global-robots.txt is a URL-path here, so it resolves under the DocumentRoot
    RewriteRule ^/robots\.txt$ /global-robots.txt [L]
</VirtualHost>
```
For multi-site XAMPP environments, maintaining separate robots.txt files becomes tedious. Apache's mod_rewrite solves this through path rewriting, and the key is implementing it once at the server-config level rather than repeating it in every virtual host.
```apache
# In httpd.conf or vhosts.conf (RewriteMap is only valid in server/vhost context)
RewriteMap robots txt:/path/to/robots-mapping.txt
# Look up the file by hostname; unmapped hosts fall back to their own /robots.txt
RewriteRule ^/robots\.txt$ ${robots:%{SERVER_NAME}|/robots.txt} [L]
```
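The mapping file is the plain `txt:` RewriteMap format: one `key value` pair per line, here hostname to file path (hostnames and paths below are illustrative):

```
# robots-mapping.txt: <hostname> <file to serve>
dev-site1.example.com  /var/www/shared/no-crawl-robots.txt
dev-site2.example.com  /var/www/shared/no-crawl-robots.txt
www.example.com        /var/www/shared/production-robots.txt
```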
For Windows-based XAMPP installations, quote filesystem paths and use forward slashes:

```apache
# Windows path example - forward slashes work fine, and quotes protect spaces
RewriteRule ^/robots\.txt$ "C:/xampp/global_config/block-all-robots.txt" [L]
```
To additionally mark these responses with a noindex header (requires mod_headers), flag matching requests with an environment variable:

```apache
RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteRule .* - [E=NO_ROBOTS:1]
Header set X-Robots-Tag "noindex, nofollow" env=NO_ROBOTS
```
Verify your configuration using curl:

```shell
curl -I http://localhost/robots.txt
curl -v http://dev.site1/robots.txt
curl -v http://dev.site2/robots.txt
```
Every virtual host should return the same Content-Length (and body) when robots.txt is requested.
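That comparison is easy to script. A small sketch that pulls the Content-Length value out of a raw header blob such as `curl -sI` produces (the hostnames in the usage comment are placeholders):

```shell
# Extract the Content-Length value from an HTTP response-header blob.
# Strips carriage returns and matches the header name case-insensitively.
extract_content_length() {
  printf '%s\n' "$1" | tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }'
}

# Usage against live servers (network required):
#   for h in dev.site1 dev.site2; do
#     echo "$h: $(extract_content_length "$(curl -sI "http://$h/robots.txt")")"
#   done
```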
For high-traffic environments, consider caching the robots.txt content:
```apache
<FilesMatch "^robots\.txt$">
    Header set Cache-Control "max-age=86400, public"
</FilesMatch>
```