When implementing rate limiting in Nginx, we often need different rules for different types of traffic. A common requirement is to apply stricter limits to regular browsers while allowing legitimate bots (like Googlebot or Bingbot) higher request rates. The challenge comes from Nginx's configuration limitations when you try to combine limit_req directives with conditional logic.
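To illustrate the problem, the naive approach of wrapping limit_req in an if block is rejected at configuration time; nginx -t fails with an error along the lines of '"limit_req" directive is not allowed here'. A minimal (intentionally broken) sketch:

# This does NOT work: limit_req is only valid in http, server and
# location contexts, never inside an "if" block
location / {
    if ($http_user_agent ~* "Googlebot") {
        limit_req zone=bot burst=20 nodelay;
    }
}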
Here's a configuration that implements user-agent based rate limiting correctly:
http {
    # Classify user agents: known good bots get a label, bad bots get an
    # empty string, everything else keeps the client address
    map $http_user_agent $limit_key {
        default       $binary_remote_addr;
        "~*Googlebot" "googlebot";
        "~*Bingbot"   "bingbot";
        "~*Slurp"     "";
        "~*NastyBot"  "";
    }

    # Derive one key per zone from $limit_key. A request whose key is an
    # empty string is not counted against that zone at all.
    map $limit_key $browser_key {
        "googlebot" "";
        "bingbot"   "";
        default     $limit_key;    # client address for browsers, empty for bad bots
    }
    map $limit_key $bot_key {
        "googlebot" $binary_remote_addr;
        "bingbot"   $binary_remote_addr;
        default     "";
    }

    # Rate limit zones
    limit_req_zone $browser_key zone=browser:10m rate=1r/s;
    limit_req_zone $bot_key     zone=bot:10m     rate=10r/s;

    server {
        listen 80;
        server_name example.com;

        location / {
            # Block bad bots (empty $limit_key)
            if ($limit_key = "") {
                return 403;
            }

            # limit_req cannot be wrapped in an "if" block, so both limits
            # are declared; only the zone whose key is non-empty applies
            limit_req zone=browser burst=5  nodelay;
            limit_req zone=bot     burst=20 nodelay;

            # Your regular configuration
            try_files $uri $uri/ =404;
        }
    }
}
The solution uses Nginx's map directive to classify user agents and to derive a separate key for each rate-limit zone. Important notes:
- Legitimate bots are mapped to named values ("googlebot", "bingbot"); bad bots are mapped to an empty string, which triggers the 403 response
- The default case uses the client IP address ($binary_remote_addr), so regular visitors are rate limited per IP
- limit_req is not allowed inside an if block, so both limit_req directives are always declared; nginx does not account requests whose zone key is empty, which is how the right limit is selected per user agent
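By default nginx answers rate-limited requests with 503 and logs rejections at the error level. If you want different behaviour, the standard limit_req_status and limit_req_log_level directives can sit next to the limit_req lines; a small sketch, with values chosen purely for illustration:

location / {
    limit_req zone=browser burst=5  nodelay;
    limit_req zone=bot     burst=20 nodelay;

    # Return 429 Too Many Requests instead of the default 503
    limit_req_status 429;
    # Log rejected requests at warn instead of error
    limit_req_log_level warn;
}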
For more complex scenarios, you might want to apply different limits per path by reusing the same key variables in several location blocks (the corresponding zone definitions are sketched after the snippet):
# Multiple location blocks approach
location /api/ {
    # Special rate limits for API endpoints
    limit_req zone=browser_api burst=10 nodelay;
    limit_req zone=bot_api     burst=30 nodelay;
}

location / {
    # Regular content rate limits
    limit_req zone=browser_content burst=20 nodelay;
    limit_req zone=bot_content     burst=50 nodelay;
}
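The four zones referenced above are not defined in the snippet; a minimal sketch of the missing limit_req_zone entries, assuming the $browser_key and $bot_key maps from the first example and purely illustrative rates:

# In the http block; adjust the rates to your traffic
limit_req_zone $browser_key zone=browser_api:10m     rate=5r/s;
limit_req_zone $bot_key     zone=bot_api:10m         rate=20r/s;
limit_req_zone $browser_key zone=browser_content:10m rate=1r/s;
limit_req_zone $bot_key     zone=bot_content:10m     rate=10r/s;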
Since this solution relies on the User-Agent header, which anyone can spoof, you should also verify that requests claiming to be Googlebot really come from Google. Proper verification means a reverse DNS lookup on the client IP followed by a forward lookup (the name should resolve back within googlebot.com or google.com), which cannot be done in nginx configuration alone; a rough in-config approximation is to check the source IP range:
# Rough source-IP check for requests claiming to be Googlebot.
# 66.249.64.0/19 is the range commonly associated with Googlebot;
# check Google's published crawler address list before relying on it.
location / {
    set $googlebot "";
    if ($http_user_agent ~* "Googlebot") {
        set $googlebot 1;
    }
    if ($remote_addr !~ "^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.") {
        set $googlebot "${googlebot}0";
    }
    # "10" means: claims to be Googlebot but comes from an unexpected range
    if ($googlebot = "10") {
        return 403;
    }
    # Rest of your configuration
}
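If you prefer to keep that check declarative, a sketch using the geo module instead of a regex on $remote_addr could look like the following; the range and the variable names ($from_google_range, $fake_googlebot) are illustrative and would need to be kept in sync with Google's published Googlebot address list:

# Flag requests whose source IP is inside an expected Googlebot range
geo $from_google_range {
    default        0;
    66.249.64.0/19 1;    # illustrative; verify against Google's list
}

# A request is suspicious if it claims to be Googlebot but is outside the range
map "$from_google_range:$http_user_agent" $fake_googlebot {
    default            0;
    "~*^0:.*googlebot" 1;
}

server {
    # ...
    if ($fake_googlebot) {
        return 403;
    }
}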
To recap, the main challenges are:
- Properly identifying bots via User-Agent strings
- Applying different rate limits without putting limit_req inside 'if' blocks, where it is not allowed
- Rejecting malicious bots while allowing legitimate crawlers
Here's an alternative layout of the same configuration that uses more descriptive zone names and blocks bad bots at the server level:
http {
    # Classify user agents
    map $http_user_agent $limit_key {
        default                $binary_remote_addr;
        "~*Googlebot"          "googlebot";
        "~*Bingbot"            "bingbot";
        "~*Slurp"              "bad_bot";
        "~*(nastybot|evilbot)" "bad_bot";
    }

    # Key for the normal zone: empty for crawlers and bad bots,
    # so they are never counted against it
    map $limit_key $normal_key {
        "googlebot" "";
        "bingbot"   "";
        "bad_bot"   "";
        default     $limit_key;    # the client address for regular traffic
    }

    # Key for the crawler zone: only set for known crawlers
    map $limit_key $crawler_key {
        "googlebot" $binary_remote_addr;
        "bingbot"   $binary_remote_addr;
        default     "";
    }

    # Rate limit zones (requests with an empty key are not accounted)
    limit_req_zone $normal_key  zone=normal:10m   rate=1r/s;
    limit_req_zone $crawler_key zone=crawlers:10m rate=10r/s;

    server {
        listen 80;
        server_name example.com;

        # Block bad bots first; they are rejected outright, so no zone is needed
        if ($limit_key = "bad_bot") {
            return 403;
        }

        location / {
            # Both directives are declared; only the zone whose key is
            # non-empty for this request actually applies
            limit_req zone=normal   burst=5  nodelay;
            limit_req zone=crawlers burst=20 nodelay;

            # Your regular configuration
            try_files $uri $uri/ =404;
        }
    }
}
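If you want to roll the limits out carefully, newer nginx versions (1.17.1 and later) can evaluate them without enforcing anything via limit_req_dry_run; a sketch based on the configuration above:

location / {
    limit_req zone=normal   burst=5  nodelay;
    limit_req zone=crawlers burst=20 nodelay;

    # Requests are counted and would-be rejections are logged,
    # but nothing is actually limited
    limit_req_dry_run on;
}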
For complex configurations, you can also route known crawlers to their own named location. Nginx cannot select a location from a variable directly; named locations are only reachable through an internal redirect, so the map is combined with the error_page trick:
http {
    limit_req_zone $binary_remote_addr zone=normal:10m   rate=1r/s;
    limit_req_zone $binary_remote_addr zone=crawlers:10m rate=10r/s;

    # Map user agents to a routing decision
    map $http_user_agent $ua_route {
        default              "default";
        "~*Googlebot"        "crawler";
        "~*Bingbot"          "crawler";
        "~*(Slurp|nastybot)" "bad_bot";
    }

    server {
        listen 80;
        server_name example.com;

        # Default location with normal rate limiting
        location / {
            # Dispatch crawlers to the named location via an internal redirect
            error_page 418 = @crawlers;

            # Block bad bots
            if ($ua_route = "bad_bot") {
                return 403;
            }
            if ($ua_route = "crawler") {
                return 418;
            }

            limit_req zone=normal burst=5 nodelay;
            try_files $uri $uri/ =404;
        }

        # Special handling for known bots
        location @crawlers {
            limit_req zone=crawlers burst=20 nodelay;
            try_files $uri $uri/ =404;
        }
    }
}
After implementing, test your configuration with:
sudo nginx -t
sudo systemctl reload nginx
You can verify the rate limiting is working with curl commands. A single request will always succeed; repeat the same command quickly to see rejections once the limit is exceeded:
# Test normal rate limiting (repeat fast; excess requests should return 503,
# or whatever limit_req_status you configured)
curl -A "Mozilla" -I http://example.com
# Test crawler rate limiting (rejections should only start above 10 r/s)
curl -A "Googlebot" -I http://example.com
# Test bad bot blocking (should return 403 immediately)
curl -A "Slurp" -I http://example.com
- Regularly update your bot patterns as user agent strings evolve
- Consider adding crawler IP validation (as in the Googlebot check above) for additional security
- Monitor the size of your rate-limit zones (10m in the examples) against your traffic: one megabyte holds roughly 16,000 states, and new requests may be rejected if a zone runs out of space
- Adjust burst values according to your specific requirements; the sketch below shows how burst, nodelay, and delay interact
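As a rough guide to those parameters, here are three alternative forms (zone names reuse the earlier examples; the delay parameter requires nginx 1.15.7 or later):

# burst alone: up to 5 excess requests are queued and delayed
# so that they still conform to the 1 r/s rate
limit_req zone=normal burst=5;

# burst + nodelay: up to 5 excess requests are served immediately,
# anything beyond that is rejected
limit_req zone=normal burst=5 nodelay;

# burst + delay: the first 10 excess requests are served immediately,
# the next 10 are delayed, the rest are rejected
limit_req zone=crawlers burst=20 delay=10;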