Configuring NGINX for Blocking and Rate Limiting Bots

Managing bot traffic helps preserve server performance and keeps unwanted crawlers away. This guide sets up an NGINX configuration that blocks or rate-limits bots based on two text files: botblocklist.txt and botratelimit.txt. Because the bot lists live outside the main configuration, handling a new bot means editing one line in a text file rather than rewriting NGINX directives.

1. botblocklist.txt: Block List for Bots

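The botblocklist.txt file lists the bots that can be blocked. NGINX pulls this file into a map block with include, so each line must use map syntax: a pattern, a value, and a terminating semicolon. The ~* prefix makes the pattern a case-insensitive regular expression, so it matches anywhere in the User-Agent header. Patterns that contain spaces must be quoted. Every bot starts at 0, meaning no bots are blocked by default. Change the value to 1 to block a specific bot.

Example content of botblocklist.txt:

~*AhrefsBot 0;
~*MJ12bot 0;
~*SemrushBot 0;
~*Baiduspider 0;
~*DotBot 0;
~*YandexBot 0;
~*GPTBot 0;
"~*Sogou Spider" 0;
~*CCBot 0;
~*Yeti 0;
~*Rogerbot 0;
"~*Screaming Frog SEO Spider" 0;
~*Lumar 0;
~*OnCrawl 0;
~*Googlebot 0;
~*Bingbot 0;
~*Slurp 0;
~*DuckDuckBot 0;
~*FacebookExternalHit 0;
~*Pinterestbot 0;
~*Twitterbot 0;
~*LinkedInBot 0;
~*Amazonbot 0;
~*Amazonproductbot 0;

To block a bot, set its value to 1. For example, to block AhrefsBot, change its line to:

~*AhrefsBot 1;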

2. botratelimit.txt: Rate Limiting Bots

The botratelimit.txt file controls which bots are rate-limited. Each line uses NGINX map syntax (a pattern, a value, and a terminating semicolon), with ~* marking a case-insensitive regular expression and quotes around patterns that contain spaces. Initially, all bots are set to 0 (no rate limiting). Set the value to 1 for any bot you want to rate-limit.

Example content of botratelimit.txt:

~*AhrefsBot 0;
~*MJ12bot 0;
~*SemrushBot 0;
~*Baiduspider 0;
~*DotBot 0;
~*YandexBot 0;
~*GPTBot 0;
"~*Sogou Spider" 0;
~*CCBot 0;
~*Yeti 0;
~*Rogerbot 0;
"~*Screaming Frog SEO Spider" 0;
~*Lumar 0;
~*OnCrawl 0;
~*Googlebot 0;
~*Bingbot 0;
~*Slurp 0;
~*DuckDuckBot 0;
~*FacebookExternalHit 0;
~*Pinterestbot 0;
~*Twitterbot 0;
~*LinkedInBot 0;
~*Amazonbot 0;
~*Amazonproductbot 0;

To rate-limit a bot, change its value to 1. For example, to rate-limit Googlebot, change its line to:

~*Googlebot 1;

3. NGINX Configuration for Blocking and Rate Limiting

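In the http context of your NGINX configuration (map is not allowed inside server or location blocks), map the User-Agent header to $block_bot and $rate_limit_bot by including the botblocklist.txt and botratelimit.txt files. A request whose User-Agent matches no entry gets the default value 0.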

# Map for blocking bots
map $http_user_agent $block_bot {
    default 0;
    include /etc/nginx/botblocklist.txt;
}

# Map for rate-limiting bots
map $http_user_agent $rate_limit_bot {
    default 0;
    include /etc/nginx/botratelimit.txt;
}
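
The lookup performed by the map blocks above can be tried out offline before touching a live server. The following is a minimal Python sketch, not NGINX's own implementation: it assumes every entry is a case-insensitive regular expression (the ~* form), and that the first matching entry in file order wins. The function name map_lookup, the entry list, and the sample User-Agent strings are hypothetical and only serve the illustration.

```python
import re

def map_lookup(user_agent, entries, default="0"):
    """Mimic a regex-only map: return the value of the first matching pattern."""
    for pattern, value in entries:
        # re.search matches anywhere in the string, ignoring case,
        # which approximates a "~*pattern" map entry
        if re.search(pattern, user_agent, re.IGNORECASE):
            return value
    return default  # no entry matched, like "default 0;"

# Hypothetical entries in file order: (pattern, value)
entries = [("AhrefsBot", "1"), ("Googlebot", "0"), ("Sogou Spider", "1")]

bot_ua = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"

print(map_lookup(bot_ua, entries))      # 1
print(map_lookup(browser_ua, entries))  # 0
```

Checking a few real User-Agent strings from your access log this way can show whether a pattern is too broad (matching ordinary browsers) or too narrow (missing the bot) before the list goes live.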

4. Rate Limit Zone Definition

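Define a rate-limiting zone in the http context to handle requests from bots flagged in the botratelimit.txt file. NGINX accepts rates only in requests per second (r/s) or requests per minute (r/m), so 5 requests per 10 seconds from the same IP address is written as 30r/m. The limit_req directive is not allowed inside an if block, so the selection happens through the zone key instead: requests with an empty key are not counted, and an extra map keeps the key empty for every client except flagged bots.

# Zone key: the client IP for flagged bots, empty (not limited) for everyone else
map $rate_limit_bot $bot_limit_key {
    default "";
    1       $binary_remote_addr;
}

limit_req_zone $bot_limit_key zone=bot_zone:10m rate=30r/m;

5. Server Block for Handling Bot Behavior

In your NGINX server block, return 403 to bots flagged in botblocklist.txt and apply the rate-limiting zone, which only affects bots flagged in botratelimit.txt.

server {
    listen 80;
    server_name your_domain.com;

    # Block bots whose entry in botblocklist.txt is set to 1
    if ($block_bot) {
        return 403;
    }

    # Rate-limit bots whose entry in botratelimit.txt is set to 1;
    # requests with an empty $bot_limit_key pass through unlimited
    limit_req zone=bot_zone burst=10 nodelay;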

    location / {
        proxy_pass http://backend;
    }
}
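
To get a feel for what a rate combined with burst=10 and nodelay means for a single client, the behaviour can be approximated with a small model. The sketch below is a simplified Python simulation, not NGINX's actual code: it tracks an "excess" counter that grows by one per request and drains at the configured rate, and rejects a request when the excess would go above the burst. The function name simulate_limit_req is hypothetical, and the rate of 0.5 requests per second is an assumption that corresponds to 5 requests per 10 seconds.

```python
def simulate_limit_req(arrivals, rate_per_sec, burst):
    """Return one bool per arrival time (seconds): True if accepted."""
    excess = 0.0   # requests currently above the steady rate
    last = None    # time of the last accepted request
    results = []
    for t in arrivals:
        if last is None:
            new_excess = 0.0  # first request from this client
        else:
            # drain at the configured rate, then count this request
            new_excess = max(0.0, excess - rate_per_sec * (t - last) + 1.0)
        if new_excess > burst:
            results.append(False)  # rejected; state is left unchanged
        else:
            excess, last = new_excess, t
            results.append(True)   # accepted immediately (nodelay)
    return results

# 12 requests at the same instant, then 2 more two seconds later
arrivals = [0.0] * 12 + [2.0, 2.0]
results = simulate_limit_req(arrivals, rate_per_sec=0.5, burst=10)
print(sum(results[:12]))  # 11
print(results[12:])       # [True, False]
```

In this model, a bot that sends 12 requests at once gets 11 through (one plus the burst of 10) and one rejected. After two seconds, one slot has drained, so exactly one more request is accepted. NGINX answers rejected requests with status 503 unless limit_req_status is set to something else.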

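6. Testing and Reloading NGINX

Once you have changed the configuration or either text file, test the configuration and reload NGINX. The included files are read only when the configuration is loaded, so edits to them take effect only after a reload. A reload applies the new settings without dropping active connections, and running the syntax check first keeps a typo in a list file from taking the server down.

sudo nginx -t && sudo systemctl reload nginx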

Final Notes

This setup gives you per-bot control over blocking and rate limiting through the botblocklist.txt and botratelimit.txt files alone. A new bot is handled by adding one line to the relevant file and reloading NGINX, with no change to the main configuration, which keeps the bot policy easy to update and maintain as the list grows.