Apache Virtual Hosts: Centralized robots.txt Configuration for All Domains

When managing multiple development sites under Apache virtual hosts, maintaining individual robots.txt files becomes cumbersome. Each time we deploy a site to production, we need to:

  1. Remove or modify the development robots.txt
  2. Ensure the production version is properly configured
  3. Maintain consistency across all environments

We can use Apache's mod_rewrite to intercept all robots.txt requests and serve a single master file:

# In httpd.conf or a virtual host context (in .htaccess the per-directory
# prefix is stripped, so the leading slash in the pattern would not match)
RewriteEngine On
RewriteRule ^/robots\.txt$ /path/to/central-robots.txt [L]

Option 1: Alias Directive

For simpler cases where all sites should share the exact same robots.txt:

Alias /robots.txt /var/www/shared/robots.txt
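
With Apache 2.4's default authorization, a file outside every DocumentRoot is denied unless access is granted explicitly. A minimal grant for the shared directory used above:

# Allow Apache 2.4 to serve files from the shared location
<Directory "/var/www/shared">
    Require all granted
</Directory>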

Option 2: Environment-Based Configuration

For more control based on server environment:

<IfDefine DEVELOPMENT>
    RewriteEngine On
    RewriteRule ^/robots\.txt$ /path/to/dev-robots.txt [L]
</IfDefine>
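
The DEVELOPMENT define is not set automatically; it has to be passed when Apache starts, or declared in the configuration (Apache 2.4+):

# At startup
httpd -D DEVELOPMENT

# Or, in Apache 2.4+, near the top of httpd.conf
Define DEVELOPMENT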

Here's a complete virtual host configuration example:

<VirtualHost *:80>
    ServerName dev-site1.example.com
    DocumentRoot "/var/www/site1"
    
    # Central robots.txt for all dev sites
    RewriteEngine On
    RewriteRule ^/robots\.txt$ /usr/local/apache/conf/no-crawl-robots.txt [L]
    
    # Other configurations...
</VirtualHost>

A few practical tips:

  • Place the shared robots.txt outside all document roots so individual sites can't override it
  • Use absolute paths for reliability
  • Test with curl: curl -I http://yoursite/robots.txt
  • Consider adding Cache-Control headers for the robots.txt file

Common issues and solutions:

# Check if mod_rewrite is loaded
apachectl -M | grep rewrite

# Verify the rewrite rule works
tail -f /var/log/apache2/error_log

# Test the rule directly
curl -v http://localhost/robots.txt
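
In Apache 2.4 the old RewriteLog directive is gone; a per-module log level produces a detailed rewrite trace in the error log instead:

# In the server or vhost config (Apache 2.4+); trace1 through trace8 available
LogLevel alert rewrite:trace3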


For multi-site XAMPP environments, maintaining separate robots.txt files becomes tedious. A per-vhost rule works, but has to be repeated in every virtual host:

<VirtualHost *:80>
    ServerName dummy.example.com
    DocumentRoot "C:/xampp/htdocs"

    RewriteEngine On
    # /global-robots.txt is resolved relative to this DocumentRoot
    RewriteRule ^/robots\.txt$ /global-robots.txt [L]
</VirtualHost>

Apache's mod_rewrite provides a cleaner solution through path rewriting: implement the rule once at server config level rather than per vhost.

# In httpd.conf or vhosts.conf (RewriteMap is only valid at server/vhost level)
RewriteMap robots txt:/path/to/robots-mapping.txt
RewriteRule ^/robots\.txt$ ${robots:%{SERVER_NAME}} [L]
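
The txt: map is a plain file of whitespace-separated key/value pairs, one mapping per line (the hostnames and paths below are placeholders):

# /path/to/robots-mapping.txt
dev-site1.example.com    /usr/local/apache/conf/no-crawl-robots.txt
dev-site2.example.com    /usr/local/apache/conf/no-crawl-robots.txt
www.example.com          /var/www/site1/robots.txt

A fallback for unmapped hostnames can be given inline: ${robots:%{SERVER_NAME}|/path/to/default-robots.txt}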

For Windows-based XAMPP installations, quote filesystem paths (forward slashes work fine on Windows):

# Windows path example
RewriteRule ^/robots\.txt$ "C:/xampp/global_config/block-all-robots.txt" [L]

To signal no-indexing via an HTTP header as well (useful when the response is generated dynamically):

# Requires mod_headers in addition to mod_rewrite
RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteRule .* - [E=NO_ROBOTS:1]
Header set X-Robots-Tag "noindex, nofollow" env=NO_ROBOTS
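
If the goal is to keep crawlers away from every development response rather than just robots.txt, a blunter sketch (reusing the IfDefine convention from above) is to send the header unconditionally:

<IfDefine DEVELOPMENT>
    # Requires mod_headers
    Header set X-Robots-Tag "noindex, nofollow"
</IfDefine>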

Verify your configuration using curl:

curl -I http://localhost/robots.txt
curl -v http://dev.site1/robots.txt
curl -v http://dev.site2/robots.txt

Each virtual host should return the same Content-Length (and body) for robots.txt once the central rule is in place.

For high-traffic environments, consider caching the robots.txt content:

# Requires mod_headers
<FilesMatch "^robots\.txt$">
    Header set Cache-Control "max-age=86400, public"
</FilesMatch>