I will make three arguments for leaving your website totally unblocked for AI bots:
1. Present-day traffic acquisition
2. Present-day brand exposure
3. Future developments in LLM-powered tools
Present-day traffic acquisition
I recently saw Wil Reynolds speak at SMX Munich, where he made a powerful case for ChatGPT (and similar tools) as a significant present-day or near-future acquisition channel. He’s outlined a similar argument in this blog post, along with a methodology for getting a feel for how affected your business might be. I recommend you check it out.
This will definitely vary from one business to the next. That said, in my experience so far, generative AI is not a like-for-like replacement for search; I’ve made that case elsewhere. It’s a different tool with different uses. But you should assess this for your business.
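If you want to put rough numbers on this, one crude starting point is counting how many of your visits arrive with a referrer from an LLM chat tool. Below is a minimal sketch assuming an Apache/nginx combined-format access log; the log path and the list of referrer hostnames are illustrative and will need adjusting to your own setup.

```python
# Sketch: tally hits referred by LLM chat tools in a combined-format
# access log. The path, format, and hostname list are assumptions.
import re
from collections import Counter
from urllib.parse import urlparse

# Referrer hostnames to watch for; extend as new tools appear.
LLM_REFERRERS = {"chat.openai.com", "chatgpt.com", "perplexity.ai"}

# The "combined" log format ends each line with: "REFERRER" "USER-AGENT"
LINE_RE = re.compile(r'"([^"]*)" "[^"]*"$')

counts = Counter()
with open("access.log") as log:  # hypothetical path
    for line in log:
        match = LINE_RE.search(line.strip())
        if match:
            host = urlparse(match.group(1)).hostname
            if host in LLM_REFERRERS:
                counts[host] += 1

for host, hits in counts.most_common():
    print(f"{host}: {hits} referred hits")
```

Bear in mind that referrers are often stripped or missing, so treat any number this produces as a floor rather than a measurement.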
In the case of “Google-Extended,” we also have to consider whether blocking it affects Google Search as an acquisition channel. For now, Google says it does not, a claim some people are understandably skeptical of. Either way, this may change rapidly if and when Google introduces generative AI search features.
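For clarity on what’s actually being toggled here: GPTBot is OpenAI’s documented crawler token, while Google-Extended is a robots.txt control token rather than a separate crawler; Googlebot still fetches the pages, and the token only governs whether they can be used for Google’s generative AI models. The kind of block this whole debate is about looks roughly like this (the blanket Disallow is illustrative, not a recommendation):

```
# Illustrative robots.txt directives, not a recommendation:
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```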
Present-day brand exposure
Also at SMX Munich, I saw Rand Fishkin make the case that digital marketers get too hung up on bottom-of-funnel attribution, which is increasingly difficult anyway, and should instead take a leaf out of the book of pre-web marketers, who valued impressions, footfall, and similar “vanity metrics.” I agree! In fact, I wrote about it back in 2019. That post has since fallen victim to a slightly questionable site migration, but I’ve reuploaded it here.
On a similar basis, maybe I should care not only about whether ChatGPT (or other LLM outputs, such as AI-written content) drives me traffic, but also simply about whether it mentions my brand and products, preferably in the same way I would.
If I prevent these models from accessing the pages where I talk about my products, and if I also accept the argument above that blocking access meaningfully limits what content the models can ingest, then I am making it less likely that my brand and products will be mentioned accurately, or indeed at all.
This could be particularly impactful when I launch a new product or rebrand: anything new would be ingested only via external sources, which may be less than positive or, again, inaccurate.
Future developments in LLM-powered tools
What if we accept that current tools built on generative AI are not major acquisition channels? Will that always be the case? What if I block GPTBot now, and then in a year or two, OpenAI launches a search engine built on the index its crawler has been compiling?
Perhaps at that point, one might make a swift U-turn. But would it be swift enough? These models are often not at Google’s level when it comes to quickly ingesting new content. Presumably, though, to be a competitive search engine, they would have to be? Or would they use Bing’s index and crawler? One might also argue that these models could use the (original?) content itself as an authority signal, as opposed to (for example) links, user signals, or branded search volume. Personally, I find that impractical and therefore unlikely, but it’s all a big unknown, and that’s the point with all this: the uncertainty itself is not an attractive proposition.
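Given that uncertainty, it’s at least worth knowing what you’re blocking today. Here is a small check using only Python’s standard library; the domain and the list of agent tokens are placeholders to swap for your own.

```python
# Sketch: report whether a site's live robots.txt blocks common AI crawler
# tokens. "example.com" and the token list are placeholders.
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "Google-Extended", "CCBot"]

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetch and parse the live file

for agent in AI_AGENTS:
    verdict = "allowed" if parser.can_fetch(agent, "https://example.com/") else "blocked"
    print(f"{agent}: {verdict}")
```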
On top of that, a search engine is only one of the likelier possibilities; a few years ago, we would not have imagined ChatGPT would be as impactful as it has been.