How to Block Web Crawlers/Bots in IIS 7.5 and 8.0 Using Request Filtering and URL Rewrite Rules


When dealing with unwanted web crawlers or bots in IIS, you have several effective approaches. Unlike Apache's .htaccess method, IIS requires different techniques that leverage its native modules. Let me share the most practical solutions I've implemented in production environments.

The most security-focused method uses IIS's built-in Request Filtering module, which evolved from the older URLScan tool. Here's a production-tested configuration:

<system.webServer>
    <security>
        <requestFiltering>
            <filteringRules>
                <filteringRule name="BlockBadBots" scanUrl="false" scanQueryString="false">
                    <scanHeaders>
                        <clear />
                        <add requestHeader="User-Agent" />
                    </scanHeaders>
                    <appliesTo>
                        <clear />
                    </appliesTo>
                    <denyStrings>
                        <clear />
                        <add string="YandexBot" />
                        <add string="SemrushBot" />
                        <add string="AhrefsBot" />
                    </denyStrings>
                </filteringRule>
            </filteringRules>
        </requestFiltering>
    </security>
</system.webServer>
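
The filteringRule block goes in the site's web.config (the same markup works at server scope in applicationHost.config). If you only want to shield part of a site, it can also be wrapped in a location element scoped to a path; the api path and rule name below are placeholders for illustration:

<configuration>
    <location path="api">
        <system.webServer>
            <security>
                <requestFiltering>
                    <filteringRules>
                        <!-- Same style of rule, applied only beneath /api -->
                        <filteringRule name="BlockBadBotsApi" scanUrl="false" scanQueryString="false">
                            <scanHeaders>
                                <add requestHeader="User-Agent" />
                            </scanHeaders>
                            <denyStrings>
                                <add string="SemrushBot" />
                            </denyStrings>
                        </filteringRule>
                    </filteringRules>
                </requestFiltering>
            </security>
        </system.webServer>
    </location>
</configuration>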

For more flexibility, the URL Rewrite module offers powerful pattern matching. This example blocks multiple bots simultaneously; note that with wildcard syntax the pattern must match the entire User-Agent header, hence the surrounding asterisks:

<rule name="BlockScrapers" patternSyntax="Wildcard" stopProcessing="true">
    <match url="*" />
    <conditions logicalGrouping="MatchAny">
        <add input="{HTTP_USER_AGENT}" pattern="YandexBot" />
        <add input="{HTTP_USER_AGENT}" pattern="SemrushBot" />
        <add input="{HTTP_USER_AGENT}" pattern="MJ12bot" />
    </conditions>
    <action type="CustomResponse" statusCode="403" 
            statusReason="Forbidden: Bot Access Denied" 
            statusDescription="This website restricts automated scraping" />
</rule>
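
If you prefer regular expressions (the module's default pattern syntax), the same block collapses into a single condition; the rule name here is just illustrative:

<rule name="BlockScrapersRegex" stopProcessing="true">
    <match url=".*" />
    <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="YandexBot|SemrushBot|MJ12bot" />
    </conditions>
    <action type="CustomResponse" statusCode="403" 
            statusReason="Forbidden: Bot Access Denied" 
            statusDescription="This website restricts automated scraping" />
</rule>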

For high-traffic sites, I recommend combining both methods, as sketched after this list:

  • Use Request Filtering for known malicious bots (lower overhead)
  • Use URL Rewrite for complex patterns or temporary blocks
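
The two features live side by side under system.webServer, so a combined web.config looks roughly like this (rule bodies omitted; they are the examples shown above):

<system.webServer>
    <security>
        <requestFiltering>
            <filteringRules>
                <!-- "BlockBadBots" filtering rule from the first example -->
            </filteringRules>
        </requestFiltering>
    </security>
    <rewrite>
        <rules>
            <!-- "BlockScrapers" rewrite rule from the second example -->
        </rules>
    </rewrite>
</system.webServer>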

For sophisticated protection, integrate with IP denylists using this rewrite map approach:

<rewrite>
    <rules>
        <rule name="DynamicBotBlock" stopProcessing="true">
            <match url=".*" />
            <conditions>
                <add input="{HTTP_USER_AGENT}" pattern="bot|crawl|spider" ignoreCase="true" />
                <add input="{BotBlockList:{REMOTE_ADDR}}" pattern="1" />
            </conditions>
            <action type="CustomResponse" statusCode="429" 
                    statusReason="Too Many Requests" 
                    statusDescription="Bot traffic quota exceeded" />
        </rule>
    </rules>
    <rewriteMaps>
        <rewriteMap name="BotBlockList">
            <add key="192.168.1.100" value="1" />
            <add key="10.0.0.15" value="1" />
        </rewriteMap>
    </rewriteMaps>
</rewrite>
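
The nested {BotBlockList:{REMOTE_ADDR}} lookup returns the map's default value (an empty string unless you set one) for any address that is not listed, so only the listed IPs ever match pattern="1". Declaring an explicit default makes that behaviour self-documenting:

<rewriteMap name="BotBlockList" defaultValue="0">
    <add key="192.168.1.100" value="1" />
    <add key="10.0.0.15" value="1" />
</rewriteMap>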

Always test using curl with custom user agents:

curl -A "YandexBot" http://yoursite.com/robots.txt

When a URL Rewrite rule fires, you should receive the status you configured (403 or 429 in the examples above). Requests denied by Request Filtering come back as 404 instead, logged with substatus 404.19 (Denied by filtering rule), so check the IIS logs rather than expecting a 403 from that module.
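
It is also worth confirming both sides of the behaviour (substitute your own hostname for yoursite.com); the -i flag prints the response status line so you can see exactly which rule answered:

curl -i -A "SemrushBot" http://yoursite.com/
curl -i -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" http://yoursite.com/

The first request should come back blocked, while the second, with an ordinary browser User-Agent, should still return 200 OK.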


Beyond straightforward User-Agent matching, the URL Rewrite module is particularly useful when you need to:

  • Combine multiple conditions (e.g., User-Agent + IP range)
  • Return custom response codes
  • Log blocked requests
<rule name="BlockAggressiveCrawlers" stopProcessing="true">
    <match url=".*" />
    <conditions logicalGrouping="MatchAny">
        <add input="{HTTP_USER_AGENT}" pattern="DotBot" />
        <add input="{HTTP_USER_AGENT}" pattern="Barkrowler" />
        <add input="{REMOTE_ADDR}" pattern="^192\.0\.2\.\d+" />
    </conditions>
    <action type="CustomResponse" 
            statusCode="403" 
            statusReason="Forbidden" 
            statusDescription="Automated crawling not permitted" />
</rule>

For sophisticated bot detection, combine multiple signals: the rule below only fires when a bot-like User-Agent is also asking for API-style content. Note that URL Rewrite has no built-in request-rate server variable, so rate-based blocking is better left to Dynamic IP Restrictions (sketched after the rule).

<rule name="AdvancedBotDetection">
    <match url="(.*)" />
    <conditions logicalGrouping="MatchAll">
        <add input="{HTTP_USER_AGENT}" pattern="(bot|crawler|spider)" />
        <add input="{HTTP_ACCEPT}" pattern="(application/json|text/xml)" />
    </conditions>
    <action type="Redirect" url="http://example.com/bot-blocked" appendQueryString="false" />
</rule>
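
As a rough sketch of that rate-based piece, assuming Dynamic IP Restrictions is available (built into IIS 8.0; a separate downloadable extension for IIS 7.5), a configuration along these lines denies clients that exceed a request-rate threshold. The numbers are placeholders to tune for your own traffic:

<system.webServer>
    <security>
        <dynamicIpSecurity denyAction="Forbidden">
            <!-- Deny clients that send more than 50 requests in any 5-second window -->
            <denyByRequestRate enabled="true"
                               maxRequests="50"
                               requestIntervalInMilliseconds="5000" />
        </dynamicIpSecurity>
    </security>
</system.webServer>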

When implementing bot blocking at scale:

  • Request Filtering rules execute before URL Rewrite rules
  • Complex regex patterns impact performance more than simple string matches
  • Consider using Failed Request Tracing to monitor blocking effectiveness (a sample trace configuration follows)
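
A minimal Failed Request Tracing setup for that purpose might look like the following, assuming the tracing feature is installed and enabled on the site; the status codes listed are just the ones the example rules return:

<system.webServer>
    <tracing>
        <traceFailedRequests>
            <add path="*">
                <traceAreas>
                    <!-- Capture rewrite and security events at full verbosity -->
                    <add provider="WWW Server" areas="Rewrite,Security" verbosity="Verbose" />
                </traceAreas>
                <failureDefinitions statusCodes="403,429" />
            </add>
        </traceFailedRequests>
    </tracing>
</system.webServer>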

Finally, one pattern from production experience:

<!-- Block Chinese search engines while allowing Google/Bing -->
<rule name="BlockChineseSE">
    <match url="(.*)" />
    <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="(Baiduspider|Sogou|360Spider)" />
    </conditions>
    <action type="AbortRequest" />
</rule>
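
A note on the AbortRequest action used above: it drops the connection without sending any HTTP response, which saves bandwidth on junk traffic but leaves no status code for the client or for simple monitoring. Testing it with curl typically shows an empty reply or a reset connection rather than a 403:

curl -A "Baiduspider" -i http://yoursite.com/

If you need a verifiable status code in the logs, swap the action for a CustomResponse 403 like the earlier rules.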