What's in this article

      Bad bots can cause trouble, attacking websites and even trying to bring them down. Even legitimate bots can get out of control, making huge numbers of requests to websites. Let's see how we can deal with this issue...

      The problem with bots

      "Bots" in this context refers not to walking, talking robots, but to web bots: automated programmes that perform simple tasks at speed.

      An example is "Googlebot", Google's own bot, which is constantly discovering new web pages and indexing content so Google can show results when we search.

      Other bots are not so useful and could be designed to attack websites, trying to bring them down!

      But even legitimate bots can get out of control. We've seen bots belonging to web applications go mad, making huge numbers of requests to websites. Sometimes it's because of a bug or glitch; other times it's by design, but it still causes us issues.

      That huge number of requests can put a massive load on the server and cause websites to struggle or even go down.

      Our websites are really for humans to use, so it can get to the point where we need to take action...

      Some bots are legitimate, some are bad!

      As mentioned, some bots are "good". Googlebot is a good bot; we wouldn't want to block that, or no one would find our website!

      The same goes for some bots from SEO tools and other legitimate software that needs to look at sites in order to assess their content.

      Human Security covers the list of the common good bots in their article.

      Unwanted bots often originate from other countries, such as Russia or China, when all we actually care about is users in the UK. You may well want traffic from other countries, but as a UK business we want to focus there.

      Find which bots are targeting your website

      You can see which bots are hitting your site in your Apache access logs.

      On a typical Linux server, the access log lives here:

      /var/log/apache2/access.log

      You can follow requests live over SSH with:

      tail -f /var/log/apache2/access.log
      

      In Plesk, the logs are under:

      /var/www/vhosts/<domain.tld>/logs/
      

      Here's an example of bad bots in the Apache access log (sample entries generated with ChatGPT):

      192.241.200.101 - - [14/Sep/2025:12:03:15 +0000] "GET /wp-login.php HTTP/1.1" 200 3267 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
      
      185.220.101.4 - - [14/Sep/2025:12:05:22 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "AhrefsBot/7.0 (+http://ahrefs.com/robot/)"
      
      45.146.165.55 - - [14/Sep/2025:12:07:09 +0000] "GET /admin/ HTTP/1.1" 404 502 "-" "Mozilla/5.0 (compatible; CensysInspect/1.1; +https://about.censys.io/)"
      
      185.191.171.35 - - [14/Sep/2025:12:08:46 +0000] "GET /?author=1 HTTP/1.1" 301 178 "-" "python-requests/2.26.0"
      
      222.186.30.112 - - [14/Sep/2025:12:10:02 +0000] "POST /xmlrpc.php HTTP/1.1" 200 712 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0"
      

      Why these look suspicious:

      • MJ12bot, AhrefsBot, CensysInspect → known crawlers that often hammer sites heavily.
      • python-requests → usually scripts or scrapers, not browsers.
      • Scanning patterns like /wp-login.php, /xmlrpc.php, /admin/, /?author=1 are often used in brute force or vulnerability scans.
      • IP ranges (like 185.220.x.x) are sometimes Tor exit nodes or botnets.
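      To see which user agents are making the most requests, you can tally them straight from the access log. Here's a minimal sketch (bash), assuming the standard combined log format, where the user agent is the sixth double-quoted field; the helper name and log path are just illustrative, so adjust for your server:

```shell
# Print the most frequent user agents in an Apache access log
# (combined log format: the user agent is the 6th "-delimited field).
top_agents() {
  awk -F'"' '{print $6}' "$1" | sort | uniq -c | sort -rn | head
}

# Usage (path will vary by server):
# top_agents /var/log/apache2/access.log
```

      Any bot dominating that list is a candidate for blocking or rate limiting.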

       


      .htaccess rules to block bots

      Here are some example rules you can add to your .htaccess file to block certain bots. Note that the exact syntax of .htaccess rules can vary depending on your server technology stack, so you may need to research further to find rules that work for you.

      For most websites, you'll have an .htaccess file in the root of your site, and you can add some rules there to block bad bots.


      1. Locate the .htaccess file in your website's root
      2. Add the rules on this page
      3. Test your site!

      Warning! - You can break your site by adding incorrect or malformed rules to your .htaccess file. I recommend adding one rule at a time and checking your site still loads in the browser before moving on to the next. If in doubt, add them on a development or staging site first.

      Blocking bots by name

      We can block bots by HTTP_USER_AGENT in our .htaccess file.

      Here's the general format to block the bots named in the rules below (substitute BOTNAME, BOTNAME2 and BOTNAME3):

      RewriteEngine On
      RewriteCond %{HTTP_USER_AGENT} (BOTNAME|BOTNAME2|BOTNAME3) [NC]
      RewriteRule (.*) - [F,L]

      So to block a bot called SiteAuditBot, we would add the following rule:

      RewriteEngine On
      RewriteCond %{HTTP_USER_AGENT} (SiteAuditBot) [NC]
      RewriteRule (.*) - [F,L]
      

      SiteAuditBot is actually a legitimate bot used by the Semrush SEO software; however, it can still make a lot of requests to our site, so we can block it if we want to.

      You can block more bots in the same rule by adding "|" between their names.
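      For example, to extend the rule above to also cover MJ12bot and AhrefsBot (two of the crawlers from the log sample earlier):

```apache
RewriteEngine On
# Match any of the three names anywhere in the user agent, case-insensitively ([NC])
RewriteCond %{HTTP_USER_AGENT} (SiteAuditBot|MJ12bot|AhrefsBot) [NC]
# Return 403 Forbidden for matching requests
RewriteRule (.*) - [F,L]
```

      You can check a rule like this is working by spoofing the user agent from the command line, e.g. curl -A "MJ12bot" -I https://your-domain.example/ (substitute your own domain), which should return a 403 once the rule is in place.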

      Blocking bots via referrer

      We can also use HTTP_REFERER to block bots originating from certain domains:

      # Block via Referrer
      <IfModule mod_rewrite.c>
          RewriteEngine On
          RewriteCond %{HTTP_REFERER} ^http://(.*)spamreferrer1\.org [NC,OR]
          RewriteCond %{HTTP_REFERER} ^http://(.*)bandwidthleech\.com [NC,OR]
          RewriteCond %{HTTP_REFERER} ^http://(.*)contentthieves\.ru [NC]
          RewriteRule (.*) - [F,L]
      </IfModule>

      Source: https://chemicloud.com/kb/article/block-bad-bots-and-spiders-using-htaccess/

      Here's another approach you can try for blocking bots by name, using SetEnvIfNoCase:

      SetEnvIfNoCase User-Agent "Yandex" bad_bot    
      SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot    
      SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
      
      <IfModule mod_authz_core.c>
       <Limit GET POST>
        <RequireAll>
         Require all granted
         Require not env bad_bot
        </RequireAll>
       </Limit>
      </IfModule>

      Source: https://stackoverflow.com/questions/30936220/how-to-block-bot-bot-via-htaccess


      Create a blacklist of certain bots in .htaccess

      You can also block bots individually if you know their names.

      Turn on the RewriteEngine and set the base path to the site root using RewriteBase, then list the bots you want to block:

      RewriteEngine on
      RewriteBase /
      RewriteCond %{HTTP_USER_AGENT} almaden [OR]
      RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
      RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
      ...

      Source: https://perishablepress.com/ultimate-htaccess-blacklist/ - see here for complete list of bots to block

      So, this will block the bots called almaden, Anarchie etc.

      Blocking bots by IP address in .htaccess

      When we block by IP address, we can block bots, but also unwanted users in general who may be attacking our website.

      This is nice and simple to do using the deny directive (this is Apache 2.2 syntax, still supported on Apache 2.4 via mod_access_compat):

      deny from 123.123.123.123

      You can also block a range of IP addresses, using CIDR notation:

      To block the range 123.123.123.0 – 123.123.123.255, use 123.123.123.0/24
      
      To block the range 123.123.64.0 – 123.123.127.255, use 123.123.64.0/18

      Source: inmotionhosting.com
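      If you're ever unsure what a given CIDR range actually covers, you can check an address against it with a little shell arithmetic. A sketch (bash) with hypothetical helper names:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if the IP (arg 1) falls inside the CIDR range (arg 2)
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")          # network part before the slash
  bits=${2#*/}                        # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

      For example, in_cidr 123.123.127.255 123.123.64.0/18 succeeds, while in_cidr 123.123.128.1 123.123.64.0/18 does not - confirming where the /18 range ends.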

      We sometimes find we have to block a range of IP addresses when our servers are being hit hard from certain geographical locations!

      Another way to block by IP address is to use the following rules:

      To block multiple IPs:

      # Block multiple IPs
      <IfModule mod_rewrite.c>
          RewriteEngine On
          RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.1$ [OR]
          RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.7$ [OR]
          RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.9$
          RewriteRule (.*) - [F,L]
      </IfModule>

      For blocking a range of IPs:

      # Block a range of IPs
      <IfModule mod_rewrite.c>
          RewriteEngine On
          RewriteCond %{REMOTE_ADDR} ^123\. [OR]
          RewriteCond %{REMOTE_ADDR} ^111\.222\. [OR]
          RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
          RewriteRule (.*) - [F,L]
      </IfModule>

      Source: https://chemicloud.com/kb/article/block-bad-bots-and-spiders-using-htaccess/

      Blocking AI bots in robots.txt

      Your website's robots.txt is another file for managing access to your site. We use it to stop certain directories and pages being crawled and indexed by Google, such as a login page or a test page, but we can also use it for blocking bots. Note that robots.txt is purely advisory: well-behaved bots honour it, while bad bots are free to ignore it.

      Not everyone is a fan of letting AI bots run amok on their website, and if you are part of this school of thought then there is good news - you can block 'em! (The major AI crawlers state that they respect robots.txt.)

      Your robots.txt file is also found in your website's root directory:


      So if you want to block ChatGPT's crawler and Google's Gemini training bot, you can add the following rules:

      #Block OpenAI’s GPTBot
      
      User-agent: GPTBot
      Disallow: /
      
      #Block Google’s Gemini
      
      User-agent: Google-Extended
      Disallow: /

      Source: https://datadome.co/learning-center/block-ai-bots/

      An alternative to blocking: Limiting

      There is an alternative to totally blocking a bot or user, which is to rate-limit them.

      This means that if they make more than a certain number of requests within a certain time window, we then stop them accessing our server or web page.

      See this article for rate limiting using Cloudflare.
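      Before reaching for a CDN, you can at least identify which IP addresses would trip a rate limit by counting requests per IP in your access log. A quick sketch (bash); the helper name is illustrative, and the threshold and log path are yours to choose:

```shell
# List IP addresses that appear more than a given number of times in an
# Apache access log - likely candidates for rate limiting or blocking.
busiest_ips() {  # usage: busiest_ips <logfile> <threshold>
  awk '{print $1}' "$1" | sort | uniq -c | sort -rn |
    awk -v n="$2" '$1 > n {print $2}'
}

# Usage (path will vary by server):
# busiest_ips /var/log/apache2/access.log 1000
```

      Any address this prints made more requests than your threshold and may be worth a closer look in the log before you block it.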


      Another alternative: Use software

      There are some great pre-existing software products you can use instead to detect, block and manage bot access to your site, here are some examples:

      Cloudflare Bot Management

      Cloudflare says it can manage good and bad bots in real time, with speed and accuracy, by harnessing data from the millions of Internet properties on its network.

      As a huge CDN they certainly have access to a lot of data regarding internet traffic and have the infrastructure to block bad bots.

      Read more here: https://www.cloudflare.com/en-gb/application-services/products/bot-management/

      DataDome

      Stop sophisticated bots on your websites, mobile apps, and APIs with real-time, easy-to-use, accurate protection from DataDome.

      DataDome prevents fraud, data breaches and site performance issues resulting from bad bots hitting your website. It also prevents bad bots from affecting marketing and sales analytics by adding unwanted hits.

      Discover more about DataDome here: https://datadome.co/products/bot-protection/

      Sucuri

      With a range of services including a WAF (Website Application Firewall), monitoring and detection, and a performance boost, Sucuri offers a highly technical team of security professionals distributed around the world, each trained in identifying and fixing the issues you might be faced with - they pitch themselves as an extension of your existing team.

      Their plugin offers a host of features for your WordPress based website. 

      Read more on their site here: https://sucuri.net/website-firewall-a/bot-mitigation/

      Search Google 

      The above are just a few solutions; you'll find many more with a quick Google search for bot management tools.

      What's next?

      Bots can be a real pest, plaguing your website with fake "traffic" and even threatening to bring it down altogether.

      Hopefully this article helps to clarify some of the issues and solutions around controlling bots access to your website.

      Get in touch if you need help with bots, or post below if you have any comments on this article.

      Article by David Reeder.
