Is your website blocking AI crawlers without you knowing?
Your website may be blocking AI crawlers right now without you knowing. Since July 1, 2025, Cloudflare, which handles roughly 20% of all web traffic, blocks AI crawlers by default for new customers. If nobody on your team changed that setting, ChatGPT, Claude, and Perplexity may be locked out of your site.
Here’s the thing. Most GEO advice starts with content: write FAQ pages, add schema, earn citations. All useful. But none of it counts if the AI crawler never gets through the door. The door check comes first, and almost nobody runs it.
It is not just the CDN. A lot of robots.txt files got a “block all AI bots” snippet in 2023 or 2024, back when the only question was training data. A Reuters Institute study found 48% of the most widely used news sites across ten countries block OpenAI’s crawlers, and many never distinguished the training bot from the search bots that send buyers back.
How do I check if my website is blocking AI crawlers?
To check if your website is blocking AI crawlers, open yourdomain.com/robots.txt and look for a User-agent line naming GPTBot, ClaudeBot, OAI-SearchBot, or PerplexityBot with a Disallow rule beneath it. Then log into your CDN, Cloudflare for most small businesses, and check the AI bot blocking setting. Five minutes, no developer needed.
Step by step:
- Type your domain into the browser and add `/robots.txt` to the end.
- Scan for the bot names above. A `Disallow: /` line under any of them means that bot is told to stay out of your whole site.
- Watch for the wildcard. A `User-agent: *` followed by `Disallow: /` blocks everything, AI bots included.
- Log into Cloudflare (or your CDN), open the bot settings, and look for the AI crawler block. On new accounts it is on by default.
- If you have server logs, search them for “GPTBot” or “PerplexityBot”. Rows of 403 responses mean your server is turning the bots away regardless of what robots.txt says.
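If you would rather script the robots.txt part of that checklist, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample file content and the `check_ai_bots` helper are made up for illustration, and Python's parser is only one interpretation of the rules; a real crawler may read the same file slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from the checklist above.
AI_BOTS = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"]

# Hypothetical robots.txt content, used here instead of a live fetch.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def check_ai_bots(robots_txt, bots=AI_BOTS):
    """Return {bot_name: True/False} for access to the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, "/") for bot in bots}


if __name__ == "__main__":
    for bot, allowed in check_ai_bots(SAMPLE_ROBOTS_TXT).items():
        print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

To run it against your own site, paste the contents of your robots.txt into the string, or use the parser's `set_url()` and `read()` methods to fetch the file directly.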
The CDN step is the one everybody skips. Your robots.txt can say “come in” while the firewall in front of it says no, and from the outside both look identical: the AI simply never mentions you.
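For the server-log check, a rough sketch like the one below counts response codes per bot. It assumes access-log lines in the common "combined" format; the sample lines, IP addresses, and user-agent strings are invented for illustration, and `count_statuses` is a hypothetical helper. Keep in mind that a block at the CDN edge may never reach your own server logs, so no rows at all for a bot is also a signal worth chasing.

```python
import re
from collections import Counter

BOT_NAMES = ["GPTBot", "PerplexityBot"]

# Hypothetical log lines; real user-agent strings will look different.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET / HTTP/1.1" 403 153 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 403 153 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

# The status code is the three-digit field right after the quoted request line.
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_statuses(lines, bots=BOT_NAMES):
    """Return {bot_name: Counter({status_code: hits})} for the given log lines."""
    counts = {bot: Counter() for bot in bots}
    for line in lines:
        match = STATUS_RE.search(line)
        if not match:
            continue  # Skip lines that do not look like access-log entries.
        lowered = line.lower()
        for bot in bots:
            if bot.lower() in lowered:
                counts[bot][match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for bot, statuses in count_statuses(SAMPLE_LOG).items():
        print(bot, dict(statuses))
```

A bot whose counter is mostly 403s is being turned away at the server, whatever robots.txt says.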
Which AI crawlers should I allow?
Allow the search crawlers that put your name in AI answers: OAI-SearchBot and ChatGPT-User for ChatGPT, Claude-SearchBot for Claude, PerplexityBot for Perplexity. Training bots like GPTBot and CCBot are a separate choice. Cloudflare Radar data from June 2026 shows GPTBot is the most blocked AI crawler, and many sites block every OpenAI bot in one go.
The distinction that matters: some bots collect training data and send you nothing back. Others power live search answers, and those answers are where your buyers are deciding. Block the first group if you like. Blocking the second group makes you invisible.
| Crawler | Operator | What it does | Allow it? |
|---|---|---|---|
| OAI-SearchBot | OpenAI | Powers ChatGPT search answers | Yes |
| ChatGPT-User | OpenAI | Fetches pages live when a user asks | Yes |
| Claude-SearchBot | Anthropic | Powers Claude web search (new in 2026) | Yes |
| PerplexityBot | Perplexity | Powers Perplexity answers | Yes |
| GPTBot | OpenAI | Collects model training data | Your call |
| ClaudeBot | Anthropic | Collects model training data | Your call |
| CCBot | Common Crawl | Open training dataset | Your call |
In a single-day sample of 4,047 robots.txt files parsed by Cloudflare on March 30, 2026, 13.8% mentioned GPTBot in their rules, more than any other AI crawler. The internet is genuinely split on training bots. Fine. Just don’t let that argument cost you the search bots.
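The split in the table above can be written down as two sets, which makes it easy to sort whatever blocks you find into "fix this" and "your call". This is only a sketch: the `triage` helper and its output keys are hypothetical names, and the bot lists are copied from the table rather than from any official registry.

```python
# Grouping copied from the table above.
SEARCH_BOTS = {"OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "PerplexityBot"}
TRAINING_BOTS = {"GPTBot", "ClaudeBot", "CCBot"}


def triage(blocked_bots):
    """Sort blocked bot names into visibility problems and policy choices."""
    blocked = set(blocked_bots)
    return {
        # Blocking these keeps you out of AI search answers.
        "fix_now": sorted(blocked & SEARCH_BOTS),
        # Blocking these is a training-data decision, not a visibility bug.
        "your_call": sorted(blocked & TRAINING_BOTS),
        # Anything not in the table needs a manual look.
        "unknown": sorted(blocked - SEARCH_BOTS - TRAINING_BOTS),
    }


if __name__ == "__main__":
    print(triage(["GPTBot", "OAI-SearchBot", "SomeOtherBot"]))
```

Feed it the names you found blocked in robots.txt or at the CDN, and anything under `fix_now` is the part that costs you visibility.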
Why check crawler access before any other GEO work?
Check crawler access before any other GEO work because a blocked crawler makes the rest worthless. Schema, FAQ pages, citations: none of it counts if the bot never reads the page. And the traffic at stake is good traffic. AI search visitors are worth 4.4x as much as organic search visitors, per Semrush research from June 2025.
One honest caveat on that number: Semrush measured it across 500+ digital marketing topics, a field where AI adoption runs hottest. Your category may sit lower. The direction still holds, because an AI-referred visitor has already compared options before they click.
So the order of operations is: door first, content second. Once the bots can read you, the actual GEO work begins. We covered that side in the GEO primer.
How do I fix a robots.txt that blocks AI crawlers?
To fix a robots.txt that blocks AI crawlers, remove or change the Disallow lines for the search bots you want in, then re-test the file. In Cloudflare, set the AI crawler control to allow verified search bots. Re-check after every site migration or plugin update, because these settings reset quietly.
A clean setup that keeps your training-data choice separate from your visibility looks like this:
```
# Search bots: these put you in AI answers
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Training bot: block it or not, your choice
User-agent: GPTBot
Disallow: /
```
Two warnings from sites we’ve looked at. First, rule order and wildcards trip people up. Under the Robots Exclusion Protocol (RFC 9309), a crawler that finds a group naming it should follow that group and ignore the User-agent: * block, but not every crawler resolves the two the same way, so test the file after editing. Second, robots.txt is voluntary. It controls the polite bots. The CDN setting is what actually enforces access, in both directions.
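To see the wildcard issue in practice, here is a small sketch that tests a file combining a blanket `User-agent: *` block with one specific allow rule. It uses Python's standard-library parser, which lets the specific group win; that is an assumption about this one parser, not a promise about how any given AI crawler behaves. The file content and the `allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: block everyone, then let one search bot back in.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(robots_txt, bot, path="/"):
    """Return True if this parser lets `bot` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for bot in ["OAI-SearchBot", "PerplexityBot"]:
        verdict = "allowed" if allowed(ROBOTS_TXT, bot) else "BLOCKED"
        print(f"{bot}: {verdict}")
```

Here the named bot gets through and every other bot falls back to the wildcard block, which is exactly the kind of result to confirm after each edit.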
Then verify the result the way your buyers would: ask ChatGPT and Perplexity about your category and see if you’re named. We run that exact count, dated and engine by engine, in the free AI-visibility audit.