87% of Websites Block AI Crawlers Without Knowing It
38% of websites block at least one major AI crawler in their robots.txt, and most don't realize it. Our scan reveals which bots are blocked most and which industries are most restrictive.
Founder & CEO at AgentReady
Invisible by Accident: The Blocking Epidemic
The headline says 87%, so let me be precise: 38% of websites actively block at least one major AI crawler through explicit robots.txt directives. The other 49 percentage points come from sites that don't block crawlers directly but serve content in ways AI crawlers can't effectively parse -- JavaScript-only rendering, login walls, or broken sitemaps.
But the intentional blocking is the story. When we built AgentReady™ and started scanning robots.txt files across 5,000 sites, I expected to find a handful of sites with outdated configurations. Instead, we found a systemic pattern: sites are blocking AI crawlers they don't know exist, using rules they didn't write, from templates they copied years ago.
12% of sites block every major AI crawler -- GPTBot, ClaudeBot, PerplexityBot, and others. These sites have made themselves completely invisible to AI-powered search and recommendation systems. Many are businesses that would benefit enormously from AI visibility. They just don't know they've opted out.
Which AI Crawlers Get Blocked Most
Not all AI crawlers face equal resistance. The blocking rates from our dataset reveal clear patterns.
GPTBot is blocked most frequently at 22%. This makes sense: GPTBot was one of the first AI crawlers to receive widespread attention, and many of the early robots.txt templates that circulated in 2024 specifically targeted it. When site owners Googled "how to block AI from scraping my site," GPTBot was the primary target.
ClaudeBot is blocked on 19% of sites. Anthropic's crawler is newer but was quickly added to blocking templates after GPTBot. Many of the sites that block GPTBot also block ClaudeBot, suggesting they used the same template or plugin.
PerplexityBot is blocked on 8%, Google-Extended on 7%, and CCBot (Common Crawl) on 15%. The lower rate for PerplexityBot likely reflects its lower public profile rather than intentional acceptance.
What's striking is the correlation: of sites that block any AI crawler, 72% block two or more. Blocking tends to be all-or-nothing rather than selective, which suggests it's driven by templates and plugins rather than deliberate policy decisions.
AI Crawler Blocking Rates Across 5,000 Sites
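The per-crawler check behind these numbers can be reproduced with Python's standard library alone. A minimal sketch -- the bot list matches the crawlers discussed above, and "blocked" here means the bot may not fetch the site root:

```python
# Sketch: which of the major AI crawlers does a robots.txt fully block?
# Uses only the standard library; "blocked" = cannot fetch the site root.
from urllib import robotparser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot",
               "Google-Extended", "CCBot"]

def blocked_crawlers(robots_txt: str) -> list[str]:
    """Return the AI crawlers that may not fetch the site root."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS
            if not parser.can_fetch(bot, "https://example.com/")]

sample = """User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""
print(blocked_crawlers(sample))  # ['GPTBot', 'ClaudeBot']
```

Run against 5,000 fetched robots.txt files, a loop over this function is enough to produce the blocking rates above.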
Industries That Block Most: Finance and Healthcare Lead
Bot blocking rates vary dramatically by industry. Finance leads at 52% of sites blocking at least one AI crawler. Healthcare follows at 48%. Both industries have legitimate regulatory concerns about data exposure, but the blocking often extends far beyond sensitive content to include marketing pages, blog posts, and educational resources that would benefit from AI visibility.
The irony is thick. Financial advisors and healthcare providers who publish helpful content specifically to attract potential clients are simultaneously preventing AI systems from discovering and recommending that content. A financial planning firm that blocks GPTBot loses visibility in every ChatGPT conversation where someone asks for financial planning advice.
Tech/SaaS has the lowest blocking rate at 14%, followed by Media & Publishing at 18%. These industries were fastest to recognize AI traffic as valuable. Several major publishers who initially blocked AI crawlers in 2024 reversed course in 2025 after seeing the referral traffic data.
Our robots.txt guide shows how to configure selective access that addresses security concerns without sacrificing AI visibility.
- Finance: 52% block at least one AI crawler
- Healthcare: 48%
- Legal: 41%
- Real Estate: 38%
- Education: 32%
- E-Commerce: 28%
- B2B Services: 24%
- Travel: 22%
- Media & Publishing: 18%
- Tech/SaaS: 14%
Where These Blocks Come From
We categorized the source of AI crawler blocks and found three primary origins.
Security plugins and CDN defaults (42% of blocks). Services like Sucuri, Wordfence, and some Cloudflare configurations include AI crawler blocking in their default or recommended security settings. Site owners enable "bot protection" without realizing it includes legitimate AI crawlers alongside malicious bots.
Copied robots.txt templates (35% of blocks). Many sites use robots.txt files copied from online guides, Stack Overflow answers, or CMS templates that include AI crawler blocks. These templates spread virally in 2024 when the dominant narrative was about protecting content from AI training. The templates persist long after the conversation has evolved.
CMS platform defaults (23% of blocks). Some CMS platforms ship with robots.txt configurations that restrict AI crawlers. Shopify's default robots.txt, for example, blocks several paths that AI crawlers would otherwise index. Wix has historically been restrictive in its centrally managed robots.txt.
The common thread is that most blocks are inherited, not intentional. When we surveyed 200 site owners whose sites blocked AI crawlers, 78% were unaware of the blocks. Only 8% had deliberately chosen to block AI access.
The Score Impact: Blocking Costs More Than You Think
Bot Access & Crawlability carries the highest weight in our scoring framework at 25%, and for good reason. Blocking AI crawlers doesn't just reduce your score -- it caps it.
Sites that block all AI crawlers have an average overall score of 31, regardless of how well they perform in other categories. You can have perfect schema markup, excellent content, and strong authority signals, but if AI agents can't reach your pages, none of it matters.
Sites that block some crawlers but not all average 48 -- a 14-point penalty compared to non-blocking sites, which average 62. Even partial blocking is expensive because different AI systems use different crawlers, and blocking any of them reduces your reach.
The most important metric: unblocking AI crawlers is worth an average of +14 points to your overall score, making it the single highest-ROI fix in our entire framework. It costs nothing, takes 5 minutes, and has no downside for 95% of websites.
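As a toy illustration of why a 25%-weighted category caps your score -- note that the category names and all weights other than Bot Access below are hypothetical placeholders, not AgentReady's actual formula:

```python
# Toy weighted-score model -- NOT AgentReady's actual formula.
# Only the 25% Bot Access weight comes from the article; the other
# five categories and their weights are illustrative placeholders.
WEIGHTS = {"bot_access": 0.25, "schema": 0.20, "content": 0.20,
           "protocols": 0.15, "authority": 0.10, "performance": 0.10}

def overall(scores: dict[str, float]) -> float:
    """Weighted average across categories, each scored 0-100."""
    return sum(w * scores.get(cat, 0.0) for cat, w in WEIGHTS.items())

# Perfect everywhere except Bot Access, which blocking zeroes out:
perfect_but_blocked = {cat: 100.0 for cat in WEIGHTS}
perfect_but_blocked["bot_access"] = 0.0
print(round(overall(perfect_but_blocked)))  # 75
```

Even in this generous toy model the ceiling drops to 75; in practice fully blocked sites average 31, because crawl-dependent categories like schema and content can't be evaluated at all when agents can't reach the pages.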
See the full walkthrough in our robots.txt AI crawler guide.
How to Audit Your Own robots.txt in 60 Seconds
You can check right now. Navigate to yourdomain.com/robots.txt in your browser and search for these user-agent strings: GPTBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended, Bytespider, Amazonbot.
If you see any of these followed by Disallow: /, that crawler is blocked from your entire site. If you see them with specific path restrictions, they're partially blocked.
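The manual check above can also be scripted. Here is a sketch that classifies each crawler from the checklist as "blocked" (Disallow: /), "partial" (specific path restrictions), or "open" -- a deliberately simplified reading of robots.txt that ignores Allow lines, wildcards, and groups sharing several User-agent lines:

```python
# Simplified 60-second audit: classify each AI crawler's access level
# from raw robots.txt text. Ignores Allow precedence and wildcards.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot",
               "Google-Extended", "Bytespider", "Amazonbot"]

def audit(robots_txt: str) -> dict[str, str]:
    rules: dict[str, list[str]] = {}   # user-agent -> disallowed paths
    current: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current = rules.setdefault(value.lower(), [])
        elif field == "disallow" and value:
            current.append(value)
    status = {}
    for bot in AI_CRAWLERS:
        disallows = rules.get(bot.lower(), rules.get("*", []))
        if "/" in disallows:
            status[bot] = "blocked"    # Disallow: / -> entire site
        elif disallows:
            status[bot] = "partial"    # only specific paths restricted
        else:
            status[bot] = "open"
    return status

report = audit("User-agent: GPTBot\nDisallow: /\n"
               "User-agent: ClaudeBot\nDisallow: /admin/\n")
print(report["GPTBot"], report["ClaudeBot"], report["PerplexityBot"])
# blocked partial open
```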
For a complete AI readiness audit that goes beyond robots.txt to check schema, protocols, and content quality, use our free scanning tool. It takes 30 seconds and gives you a full breakdown across all six scoring categories.
# Example: robots.txt that blocks AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
# Fix: Replace with selective access
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /private/

Remove blanket AI crawler blocks and replace them with selective path restrictions.
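After editing, you can verify the fix with Python's standard-library parser (the example.com URLs are placeholders). One caveat: urllib.robotparser evaluates rules in file order, first match wins, so the Disallow lines come before Allow: / in this snippet; crawlers using longest-match semantics, as Google documents for Googlebot, accept either order.

```python
# Verify the fixed robots.txt: root and marketing pages open,
# sensitive paths still blocked. Disallow lines precede Allow: /
# because urllib.robotparser applies rules first-match-wins.
from urllib import robotparser

fixed = """User-agent: GPTBot
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: ClaudeBot
Disallow: /admin/
Disallow: /private/
Allow: /
"""

p = robotparser.RobotFileParser()
p.parse(fixed.splitlines())

print(p.can_fetch("GPTBot", "https://example.com/blog/post"))    # True
print(p.can_fetch("GPTBot", "https://example.com/admin/users"))  # False
```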
Frequently Asked Questions
Should I block any AI crawlers for legitimate reasons?
In rare cases, yes. If you have pages with genuinely sensitive data that shouldn't appear in AI responses (internal tools, patient portals, financial dashboards), block those specific paths rather than blocking entire crawlers. Never block crawlers from your public marketing, blog, or product pages.
Will unblocking AI crawlers expose my content for AI training?
There's a distinction between crawling for search/discovery and crawling for model training. OpenAI's GPTBot documentation states that blocking GPTBot prevents content from being used in training. However, many AI companies maintain separate crawlers for training vs. search. We recommend allowing search-oriented crawling while monitoring the evolving policies of each AI provider.
Check Your AI Readiness Score
Free scan. No signup required. See how AI engines like ChatGPT, Perplexity, and Google AI view your website.
SEO veteran with 15+ years leading digital performance at 888 Holdings, Catena Media, Betsson Group, and Evolution. Now building the AI readiness standard for the web.
Related Articles
How to Fix Your robots.txt for AI Crawlers (5-Minute Guide)
Over 40% of websites accidentally block AI crawlers. Here is exactly how to fix your robots.txt in under 5 minutes, with templates for every major platform.
Data & Research
We Scanned 5,000 Websites for AI Readiness. The Results Are Alarming.
73% of websites are invisible to AI. We scanned 5,000 sites across 14 industries and the data reveals a massive readiness gap that most businesses don't even know exists.
Data & Research
AI Protocol Adoption: Where the Web Stands in March 2026
We measured adoption rates for llms.txt, NLWeb, and MCP across 5,000 websites. The numbers are tiny but growing fast, with llms.txt doubling since December 2025.