87% of Websites Block AI Crawlers Without Knowing It
38% of websites block at least one major AI crawler in their robots.txt, and most don't realize it. Our scan reveals which bots are blocked most and which industries are most restrictive.
Founder & CEO at AgentReady
Invisible by Accident: The Blocking Epidemic
The headline says 87%, so let me be precise: 38% of websites actively block at least one major AI crawler through explicit robots.txt directives. The other 49 percentage points come from sites that don't block crawlers directly but serve content in ways AI crawlers can't effectively parse -- JavaScript-only rendering, login walls, or broken sitemaps.
But the intentional blocking is the story. When we built AgentReady™ and started scanning robots.txt files across 5,000 sites, I expected to find a handful of sites with outdated configurations. Instead, we found a systemic pattern: sites are blocking AI crawlers they don't know exist, using rules they didn't write, from templates they copied years ago.
12% of sites block every major AI crawler -- GPTBot, ClaudeBot, PerplexityBot, and others. These sites have made themselves completely invisible to AI-powered search and recommendation systems. Many are businesses that would benefit enormously from AI visibility. They just don't know they've opted out.
Which AI Crawlers Get Blocked Most
Not all AI crawlers face equal resistance. The blocking rates from our dataset reveal clear patterns.
GPTBot is blocked most frequently at 22%. This makes sense: GPTBot was one of the first AI crawlers to receive widespread attention, and many of the early robots.txt templates that circulated in 2024 specifically targeted it. When site owners Googled "how to block AI from scraping my site," GPTBot was the primary target.
ClaudeBot is blocked on 19% of sites. Anthropic's crawler is newer but was quickly added to blocking templates after GPTBot. Many of the sites that block GPTBot also block ClaudeBot, suggesting they used the same template or plugin.
PerplexityBot is blocked on 8%, Google-Extended on 7%, and CCBot (Common Crawl) on 15%. The lower rate for PerplexityBot likely reflects its lower public profile rather than intentional acceptance.
What's striking is the correlation: of sites that block any AI crawler, 72% block two or more. Blocking tends to be all-or-nothing rather than selective, which suggests it's driven by templates and plugins rather than deliberate policy decisions.
AI Crawler Blocking Rates Across 5,000 Sites
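The per-crawler check behind these numbers can be reproduced with Python's standard library alone. A minimal sketch -- the bot list matches the crawlers discussed above, and "blocked" here means the bot may not fetch the site root:

```python
# Sketch: which of the major AI crawlers does a robots.txt fully block?
# Uses only the standard library; "blocked" = cannot fetch the site root.
from urllib import robotparser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot",
               "Google-Extended", "CCBot"]

def blocked_crawlers(robots_txt: str) -> list[str]:
    """Return the AI crawlers that may not fetch the site root."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS
            if not parser.can_fetch(bot, "https://example.com/")]

sample = """User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""
print(blocked_crawlers(sample))  # ['GPTBot', 'ClaudeBot']
```

Run against 5,000 fetched robots.txt files, a loop over this function is enough to produce the blocking rates above.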
Industries That Block Most: Finance and Healthcare Lead
Bot blocking rates vary dramatically by industry. Finance leads at 52% of sites blocking at least one AI crawler. Healthcare follows at 48%. Both industries have legitimate regulatory concerns about data exposure, but the blocking often extends far beyond sensitive content to include marketing pages, blog posts, and educational resources that would benefit from AI visibility.
The irony is thick. Financial advisors and healthcare providers who publish helpful content specifically to attract potential clients are simultaneously preventing AI systems from discovering and recommending that content. A financial planning firm that blocks GPTBot loses visibility in every ChatGPT conversation where someone asks for financial planning advice.
Tech/SaaS has the lowest blocking rate at 14%, followed by Media & Publishing at 18%. These industries were fastest to recognize AI traffic as valuable. Several major publishers who initially blocked AI crawlers in 2024 reversed course in 2025 after seeing the referral traffic data.
Our robots.txt guide shows how to configure selective access that addresses security concerns without sacrificing AI visibility.
- Finance: 52% block at least one AI crawler
- Healthcare: 48%
- Legal: 41%
- Real Estate: 38%
- Education: 32%
- E-Commerce: 28%
- B2B Services: 24%
- Travel: 22%
- Media & Publishing: 18%
- Tech/SaaS: 14%
Where These Blocks Come From
We categorized the source of AI crawler blocks and found three primary origins.
Security plugins and CDN defaults (42% of blocks). Services like Sucuri, Wordfence, and some Cloudflare configurations include AI crawler blocking in their default or recommended security settings. Site owners enable "bot protection" without realizing it includes legitimate AI crawlers alongside malicious bots.
Copied robots.txt templates (35% of blocks). Many sites use robots.txt files copied from online guides, Stack Overflow answers, or CMS templates that include AI crawler blocks. These templates spread virally in 2024 when the dominant narrative was about protecting content from AI training. The templates persist long after the conversation has evolved.
CMS platform defaults (23% of blocks). Some CMS platforms ship with robots.txt configurations that restrict AI crawlers. Shopify's default robots.txt, for example, blocks several paths that AI crawlers would otherwise index. Wix has historically been restrictive in its centrally managed robots.txt.
The common thread is that most blocks are inherited, not intentional. When we surveyed 200 site owners whose sites blocked AI crawlers, 78% were unaware of the blocks. Only 8% had deliberately chosen to block AI access.
The Score Impact: Blocking Costs More Than You Think
Bot Access & Crawlability carries the highest weight in our scoring framework at 25%, and for good reason. Blocking AI crawlers doesn't just reduce your score -- it caps it.
Sites that block all AI crawlers have an average overall score of 31, regardless of how well they perform in other categories. You can have perfect schema markup, excellent content, and strong authority signals, but if AI agents can't reach your pages, none of it matters.
Sites that block some crawlers but not all average 48 -- a 14-point penalty compared to non-blocking sites, which average 62. Even partial blocking is expensive because different AI systems use different crawlers, and blocking any of them reduces your reach.
The most important metric: unblocking AI crawlers is worth an average of +14 points to your overall score, making it the single highest-ROI fix in our entire framework. It costs nothing, takes 5 minutes, and has no downside for 95% of websites.
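As a toy illustration of why a 25%-weighted category caps your score -- note that the category names and all weights other than Bot Access below are hypothetical placeholders, not AgentReady's actual formula:

```python
# Toy weighted-score model -- NOT AgentReady's actual formula.
# Only the 25% Bot Access weight comes from the article; the other
# five categories and their weights are illustrative placeholders.
WEIGHTS = {"bot_access": 0.25, "schema": 0.20, "content": 0.20,
           "protocols": 0.15, "authority": 0.10, "performance": 0.10}

def overall(scores: dict[str, float]) -> float:
    """Weighted average across categories, each scored 0-100."""
    return sum(w * scores.get(cat, 0.0) for cat, w in WEIGHTS.items())

# Perfect everywhere except Bot Access, which blocking zeroes out:
perfect_but_blocked = {cat: 100.0 for cat in WEIGHTS}
perfect_but_blocked["bot_access"] = 0.0
print(round(overall(perfect_but_blocked)))  # 75
```

Even in this generous toy model the ceiling drops to 75; in practice fully blocked sites average 31, because crawl-dependent categories like schema and content can't be evaluated at all when agents can't reach the pages.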
See the full walkthrough in our robots.txt AI crawler guide.
How to Audit Your Own robots.txt in 60 Seconds
You can check right now. Navigate to yourdomain.com/robots.txt in your browser and search for these user-agent strings: GPTBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended, Bytespider, Amazonbot.
If you see any of these followed by Disallow: /, that crawler is blocked from your entire site. If you see them with specific path restrictions, they're partially blocked.
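The manual check above can also be scripted. Here is a sketch that classifies each crawler from the checklist as "blocked" (Disallow: /), "partial" (specific path restrictions), or "open" -- a deliberately simplified reading of robots.txt that ignores Allow lines, wildcards, and groups sharing several User-agent lines:

```python
# Simplified 60-second audit: classify each AI crawler's access level
# from raw robots.txt text. Ignores Allow precedence and wildcards.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot",
               "Google-Extended", "Bytespider", "Amazonbot"]

def audit(robots_txt: str) -> dict[str, str]:
    rules: dict[str, list[str]] = {}   # user-agent -> disallowed paths
    current: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current = rules.setdefault(value.lower(), [])
        elif field == "disallow" and value:
            current.append(value)
    status = {}
    for bot in AI_CRAWLERS:
        disallows = rules.get(bot.lower(), rules.get("*", []))
        if "/" in disallows:
            status[bot] = "blocked"    # Disallow: / -> entire site
        elif disallows:
            status[bot] = "partial"    # only specific paths restricted
        else:
            status[bot] = "open"
    return status

report = audit("User-agent: GPTBot\nDisallow: /\n"
               "User-agent: ClaudeBot\nDisallow: /admin/\n")
print(report["GPTBot"], report["ClaudeBot"], report["PerplexityBot"])
# blocked partial open
```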
For a complete AI readiness audit that goes beyond robots.txt to check schema, protocols, and content quality, use our free scanning tool. It takes 30 seconds and gives you a full breakdown across all six scoring categories.
# Example: robots.txt that blocks AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
# Fix: Replace with selective access
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /private/

Remove blanket AI crawler blocks and replace them with selective path restrictions.
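After editing, you can verify the fix with Python's standard-library parser (the example.com URLs are placeholders). One caveat: urllib.robotparser evaluates rules in file order, first match wins, so the Disallow lines come before Allow: / in this snippet; crawlers using longest-match semantics, as Google documents for Googlebot, accept either order.

```python
# Verify the fixed robots.txt: root and marketing pages open,
# sensitive paths still blocked. Disallow lines precede Allow: /
# because urllib.robotparser applies rules first-match-wins.
from urllib import robotparser

fixed = """User-agent: GPTBot
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: ClaudeBot
Disallow: /admin/
Disallow: /private/
Allow: /
"""

p = robotparser.RobotFileParser()
p.parse(fixed.splitlines())

print(p.can_fetch("GPTBot", "https://example.com/blog/post"))    # True
print(p.can_fetch("GPTBot", "https://example.com/admin/users"))  # False
```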
Frequently Asked Questions
Should I block any AI crawlers for legitimate reasons?
In rare cases, yes. If you have pages with genuinely sensitive data that shouldn't appear in AI responses (internal tools, patient portals, financial dashboards), block those specific paths rather than blocking entire crawlers. Never block crawlers from your public marketing, blog, or product pages.
Will unblocking AI crawlers expose my content for AI training?
There's a distinction between crawling for search/discovery and crawling for model training. OpenAI's GPTBot documentation states that blocking GPTBot prevents content from being used in training. However, many AI companies maintain separate crawlers for training vs. search. We recommend allowing search-oriented crawling while monitoring the evolving policies of each AI provider.
Check Your AI Readiness Score
Free scan. No signup required. See how AI engines like ChatGPT, Perplexity, and Google AI view your website.
SEO veteran with 15+ years leading digital performance at 888 Holdings, Catena Media, Betsson Group, and Evolution. Now building the AI readiness standard for the web.
Related Articles
How to Fix Your robots.txt for AI Crawlers (5-Minute Guide)
Over 40% of websites accidentally block AI crawlers. Here is exactly how to fix your robots.txt in under 5 minutes, with templates for every major platform.
Data & Research
We Scanned 5,000 Websites for AI Readiness. The Results Are Alarming.
73% of websites are invisible to AI. We scanned 5,000 sites across 14 industries and the data reveals a massive readiness gap that most businesses don't even know exists.
Data & Research
AI Protocol Adoption: Where the Web Stands in March 2026
We measured adoption rates for llms.txt, NLWeb, and MCP across 5,000 websites. The numbers are tiny but growing fast, with llms.txt doubling since December 2025.