AI Bots — Who is Blocking and Why?

I wrote an article in April covering some of the arguments for and against blocking “AI bots” – at the time, particularly GPTBot and Google-Extended – and the potential consequences of doing so. If my Twitter/X feed is anything to go by, the consensus within the SEO industry seems to be very much against blocking AI bots, with the reasonable premise being that it is or will become important for brands to appear in the answers/outputs of Large Language Models (LLMs), in the same way that it’s important to appear in Google search results today.

However, a very significant chunk of authoritative sites are choosing to block one or many AI bots. This could well be linked to a number of large media brands signing deals with OpenAI – perhaps considering robots.txt exclusion to be part of their leverage. Examples include Dotdash Meredith, Vox Media and The Atlantic, the Financial Times, AP, Axel Springer, and News Corp. I said in that April article that to hope to damage the potential for AI-written competitors to your site, you’d probably need significant collective or mass action in most verticals. Evidently, the calculation is that some of these publishing giants represent a pretty big chunk of the available content on some topics all on their own.

It’s worth mentioning at this point that robots.txt is not enforced by law of any kind. It’s an internet norm and there is a negative publicity cost to ignoring it (which I’ll mention again shortly), but you’d have to go a little further than a robots.txt line to fully block traffic.
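To make that concrete, here is a minimal sketch in Python of what a robots.txt block looks like and how a well-behaved crawler would consult it. The rules, the example.com URL and the choice of Python's standard-library urllib.robotparser are my own illustrative assumptions, not taken from any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI bot tokens site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/some-article"  # placeholder URL, not a real page

# A polite crawler asks this question before fetching anything.
gptbot_allowed = parser.can_fetch("GPTBot", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print(gptbot_allowed)     # False
print(googlebot_allowed)  # True
```

The thing to notice is that the check runs on the crawler's side. Nothing in the file itself stops a bot that simply skips this step, which is why robots.txt alone doesn't fully block traffic.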

Now, I want to look a little closer at the expanded range of blockable AI bots that have appeared this year, as well as at who is blocking them and why.

AI bot timeline: The new arrivals

Let’s take a quick look at the timeline:

  • 2008 – Start of Common Crawl

  • 7th August 2023 – GPTBot (OpenAI)

  • 28th September 2023 – Google-Extended

  • November 2023 – First known documentation of PerplexityBot

  • 14th June 2024 – Applebot-Extended

  • June 2024 – PerplexityBot controversies

  • 25th July 2024 – OpenAI announces SearchGPT prototype, accompanied by OAI-SearchBot

This isn’t exhaustive but covers some of the main events. I wasn’t able to find any concrete timeline for Anthropic, the main player I’ve not mentioned in this timeline.
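If you want to see which of these bots a given site is turning away, something like the following rough sketch would do it. The user-agent tokens come from the timeline above, plus CCBot, which is Common Crawl's crawler; the function name blocked_ai_bots and the sample robots.txt are hypothetical. It only tests the site root, so partial blocks on specific directories would not show up.

```python
from urllib.robotparser import RobotFileParser

# Tokens from the timeline above; CCBot is Common Crawl's crawler.
AI_BOTS = [
    "CCBot",
    "GPTBot",
    "Google-Extended",
    "PerplexityBot",
    "Applebot-Extended",
    "OAI-SearchBot",
]


def blocked_ai_bots(robots_txt: str, bots=AI_BOTS) -> list[str]:
    """Return the bots that the given robots.txt text disallows from "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


# Hypothetical robots.txt: two bots blocked site-wide, one only partially.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /private/
"""

blocked = blocked_ai_bots(SAMPLE)
print(blocked)  # ['CCBot', 'GPTBot']
```

PerplexityBot is not reported here because its rule only covers /private/, which is the limitation mentioned above.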

With OpenAI, Google and Apple, there seems to be a playbook of “scrape everything we need, then publicly announce how to block crawling”, which feels a touch disingenuous, and definitely feeds into the argument that little is achieved by blocking so late in that process.

Perplexity also got themselves into a whole mess around whether they, in fact, even respect robots.txt rules. Supposedly, they were outsourcing crawling to a third party, who didn’t, and robots.txt, as mentioned above, is not a law but rather a commonly respected internet norm. Nonetheless, their partner in AWS got a touch upset about this, as did much of the tech press.

Anyway, without further ado…
