How to Fix Your robots.txt for AI Crawlers (5-Minute Guide)
Over 40% of websites accidentally block AI crawlers. Here is exactly how to fix your robots.txt in under 5 minutes, with templates for every major platform.
Founder & CEO at AgentReady
The Problem: You Are Probably Blocking AI Traffic
Here is a fact that surprises most website owners: your robots.txt file is very likely blocking one or more major AI crawlers right now, even if you never intended to.
Our research on AI crawler blocking found that over 40% of the top 100K websites block at least one major AI crawler. The most common cause is not deliberate blocking — it is overly restrictive default rules.
Many robots.txt files include a blanket User-agent: * rule with restrictive Disallow patterns. Others use CDN or security features that aggressively block unknown bots. Some CMS platforms ship with restrictive defaults that were written before AI crawlers existed.
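To make the failure mode concrete, here is a minimal hypothetical example of that accidental pattern: no AI crawler is named anywhere, so every one of them inherits the blanket restrictions.

```
# No AI crawler is named in this file, so GPTBot, ClaudeBot, and the
# rest all fall under the blanket group below — including the rule
# that blocks every URL with a query string.
User-agent: *
Disallow: /*?*
Disallow: /assets/
```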
The result: your content is invisible to ChatGPT, Claude, Perplexity, and other AI systems. You are missing citations, traffic, and the compound visibility benefits that come from being part of AI-generated responses.
The fix takes under 5 minutes. This guide walks through exactly what to do, with copy-paste templates and platform-specific instructions.
The 5 AI Crawlers You Need to Allow
Each major AI platform operates its own web crawler. These bots identify themselves with specific user agent strings that you can explicitly allow or block in your robots.txt. Here are the five that matter most in 2026:
GPTBot is OpenAI’s crawler, used to gather content for ChatGPT and OpenAI’s models. It is the highest-volume AI crawler and the one most commonly blocked. Blocking GPTBot means your content cannot appear in ChatGPT responses.
ClaudeBot is Anthropic’s crawler, used by Claude and Claude-based applications. It follows robots.txt directives strictly and is considered one of the most respectful AI crawlers in terms of request frequency.
PerplexityBot powers Perplexity AI, the AI-first search engine. Perplexity explicitly cites its sources, so being accessible to PerplexityBot directly translates to visible, linked citations.
Google-Extended is Google’s AI-specific robots.txt token, separate from Googlebot. It is a control token rather than a standalone crawler: Google’s existing crawlers honor it when deciding whether your content can be used to train and ground Gemini. Importantly, blocking Google-Extended does not affect your regular Google Search rankings; it only affects whether your content is available to Google’s generative AI products.
CCBot is the Common Crawl bot. Common Crawl’s open dataset is used as training data by many AI systems. Allowing CCBot ensures your content is represented in these foundational datasets.
- GPTBot — OpenAI (ChatGPT, API) — highest volume, most commonly blocked
- ClaudeBot — Anthropic (Claude) — respectful crawl rate, strict robots.txt compliance
- PerplexityBot — Perplexity AI — direct source citations with links
- Google-Extended — Google (Gemini, AI Overviews) — separate from Googlebot
- CCBot — Common Crawl — open dataset used by many AI training pipelines
Anatomy of an AI-Friendly robots.txt
A robots.txt file is a plain text file at your domain root (yoursite.com/robots.txt) that tells web crawlers which pages they can and cannot access. It uses a simple directive syntax: User-agent identifies the bot, Allow grants access, and Disallow restricts it.
The diagram below shows the anatomy of an AI-friendly robots.txt file. The key principle is explicit allows for AI crawlers combined with sensible disallows for sensitive paths. Do not rely on a blanket User-agent: * rule for AI bots — always specify them individually.
robots.txt Anatomy for AI Crawlers
The Copy-Paste AI-Friendly robots.txt Template
Here is a complete, production-ready robots.txt template. Copy this, customize the disallow paths for your site, and deploy. This template explicitly allows all five major AI crawlers while maintaining sensible restrictions on sensitive paths.
The template is organized in three sections: global rules that apply to all bots, explicit AI crawler rules, and metadata (sitemap location). Adjust the Disallow paths under User-agent: * to match your site’s actual directory structure.
# ============================================
# robots.txt - AI-Friendly Configuration
# Last Updated: 2026-03-06
# ============================================
# Default rules for all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /staging/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*.json$
# ============================================
# AI Crawler Access - Explicitly Allowed
# ============================================
# OpenAI (ChatGPT)
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Anthropic (Claude)
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Perplexity AI
User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# Google AI (Gemini, AI Overviews)
User-agent: Google-Extended
Allow: /
# Common Crawl (AI training data)
User-agent: CCBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
# ============================================
# Sitemap
# ============================================
Sitemap: https://yoursite.com/sitemap.xml

Complete AI-friendly robots.txt template
Platform-Specific Deployment Instructions
The process for editing robots.txt varies by platform. Here are step-by-step instructions for the most common setups.
WordPress: Navigate to your site root via FTP or file manager. The robots.txt file is in the root directory (same level as wp-config.php). If it does not exist, create it. If you use Yoast SEO, go to Yoast → Tools → File Editor → robots.txt. If you use Rank Math, go to Rank Math → General Settings → Edit robots.txt. After editing, clear any page cache.
Shopify: Shopify auto-generates robots.txt and does not allow direct editing via the admin panel. To customize it, go to Online Store → Themes → Edit Code → Add a new template called robots.txt.liquid. Shopify’s documentation provides the syntax for adding custom rules. This override approach lets you add AI crawler allows while keeping Shopify’s default rules.
Vercel / Next.js: Place a robots.txt file in your public/ directory. For dynamic generation, create a Route Handler at app/robots.txt/route.ts that returns the content with a text/plain content type. (Next.js 13.3+ also supports a built-in app/robots.ts metadata file that generates robots.txt for you.)
Cloudflare Pages: Place the file in your project’s build output root. Alternatively, use a Cloudflare Worker to dynamically serve robots.txt content.
Apache (.htaccess): If you need to redirect or override robots.txt, add a rewrite rule in your .htaccess file. Ensure the rule serves the correct file with text/plain content type.
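A sketch of such an .htaccess override (the /static/robots-ai.txt path is a placeholder; adjust it to your site layout):

```apache
# .htaccess — serve a maintained robots.txt copy at /robots.txt
RewriteEngine On
RewriteRule ^robots\.txt$ /static/robots-ai.txt [L]

# Belt and braces: .txt already maps to text/plain via mime.types,
# but force it explicitly for the served file.
<Files "robots-ai.txt">
    ForceType text/plain
</Files>
```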
Nginx: Serve robots.txt with a location block in your nginx configuration that points to the file path and sets the correct content type.
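A minimal nginx location block for this (the file path is a placeholder):

```nginx
# Serve robots.txt from an explicit path with the correct MIME type.
location = /robots.txt {
    alias /var/www/yoursite/robots.txt;  # placeholder path
    default_type text/plain;
}
```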
// app/robots.txt/route.ts
import { NextResponse } from "next/server";

export function GET() {
  const robotsTxt = `User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml`;

  return new NextResponse(robotsTxt, {
    headers: { "Content-Type": "text/plain" },
  });
}

Next.js App Router dynamic robots.txt
How to Verify Your Changes Are Working
After deploying your updated robots.txt, verify it with these four checks:
Step 1: Direct access. Navigate to yoursite.com/robots.txt in your browser. Confirm the AI crawler rules appear as expected. Check for syntax errors — extra spaces, missing colons, or incorrect indentation can cause parsing failures.
Step 2: Header verification. Use curl -I yoursite.com/robots.txt to check the response headers. The status code should be 200 and the content type should be text/plain. A 301 redirect to a different file or a 403 Forbidden response means something is intercepting or blocking access.
Step 3: Robots.txt report. Google retired its legacy robots.txt Tester in late 2023; use the robots.txt report in Search Console (Settings → robots.txt) to confirm Google can fetch and parse your file. For per-bot checks, test each AI crawler user agent against specific paths with a third-party robots.txt tester.
Step 4: CDN and firewall check. If you use Cloudflare, check your WAF and Bot Fight Mode settings. Some CDN configurations aggressively challenge or block bots regardless of robots.txt. Cloudflare’s Bot Fight Mode in particular is known to block legitimate AI crawlers. You may need to create a WAF rule that allows specific user agents.
Monitor your server logs over the following 7–14 days to confirm AI crawlers are actually visiting. If you allow GPTBot but never see it in your logs, there may be a firewall or CDN rule intercepting requests before they reach your server.
# 1. Check the file is accessible
curl -s -o /dev/null -w "%{http_code}" https://yoursite.com/robots.txt
# Expected: 200
# 2. Check content type headers
curl -I https://yoursite.com/robots.txt 2>/dev/null | grep -i content-type
# Expected: content-type: text/plain
# 3. View the actual content
curl https://yoursite.com/robots.txt
# 4. Check server logs for AI crawler visits (Linux)
grep -E "GPTBot|ClaudeBot|PerplexityBot" /var/log/nginx/access.log | tail -20

Quick verification commands
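You can also sanity-check your rules offline with Python's standard-library robots.txt parser before deploying. One caveat, noted in the comments: urllib.robotparser applies rules in file order (first match wins), while Google and most AI crawlers use longest-match semantics, so list specific Disallow lines before a broad Allow when testing this way. The sample robots.txt below is illustrative, not your live file.

```python
# check_robots.py — offline sanity check of robots.txt rules using the
# Python standard library. Caveat: urllib.robotparser is first-match,
# not longest-match, so put specific Disallow lines before "Allow: /".
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /

User-agent: GPTBot
Disallow: /admin/
Allow: /
"""

def check(robots_txt, bots=AI_BOTS, paths=("/", "/blog/example-post", "/admin/")):
    """Return {bot: {path: allowed}} for the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: {path: parser.can_fetch(bot, path) for path in paths}
            for bot in bots}

if __name__ == "__main__":
    for bot, verdicts in check(SAMPLE_ROBOTS).items():
        for path, allowed in verdicts.items():
            print(f"{bot:16} {path:22} {'ALLOWED' if allowed else 'BLOCKED'}")
```

Bots without their own group (ClaudeBot, PerplexityBot, and the rest in this sample) fall back to the `User-agent: *` group, which is exactly the inheritance behavior that causes accidental blocking.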
Beyond robots.txt: CDN and Firewall Considerations
robots.txt is necessary but not sufficient. Even if your robots.txt explicitly allows AI crawlers, other layers of your infrastructure might block them.
Cloudflare Bot Fight Mode is the most common offender. When enabled, it challenges or blocks bots that are not on Cloudflare’s verified bot list. As of early 2026, not all AI crawlers are on this list. Check your Cloudflare dashboard under Security → Bots and ensure your Bot Fight Mode settings are not interfering. Consider adding a custom WAF rule that skips the challenge for known AI user agents.
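As a sketch, such a rule might use a Cloudflare Rules language expression like the one below. Verify the field names and available Skip options against your plan's dashboard; Google-Extended is omitted because it is a robots.txt token with no user agent string of its own.

```
# Security → WAF → Custom rules
# Action: Skip (select the bot-protection products to skip)
(http.user_agent contains "GPTBot")
or (http.user_agent contains "ClaudeBot")
or (http.user_agent contains "PerplexityBot")
or (http.user_agent contains "CCBot")
```

Note that matching on user agent alone is spoofable; for stricter validation, pair this with the crawler IP ranges that OpenAI, Anthropic, and Perplexity publish.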
AWS WAF and CloudFront can similarly block bots based on user agent patterns. If you use AWS, check your WAF rules for any broad bot-blocking patterns.
Rate limiting is another consideration. AI crawlers are generally well-behaved, but if your rate limiting is too aggressive (e.g., 10 requests per minute per IP), you may inadvertently throttle them. Check your CDN and server rate limiting configuration.
Rendering gates are the final barrier. If your content is behind a JavaScript rendering wall and you do not have server-side rendering, AI crawlers (which typically do not execute JavaScript) will see an empty or skeleton page. Use server-side rendering or static generation for all content pages. Our complete AI readiness guide covers this in depth under the page speed factor.
After fixing robots.txt, run a free AgentReady scan to check for these additional access barriers. The Bot Access factor evaluates all layers, not just robots.txt.
- Cloudflare: Check Bot Fight Mode, create allow rules for AI user agents
- AWS WAF: Review bot detection rules for overly broad patterns
- Rate limiting: Ensure limits are not too restrictive for crawler traffic
- JavaScript rendering: Verify content is accessible without JS execution
- Geo-blocking: Confirm AI crawler IPs are not blocked by country-based rules
Frequently Asked Questions
Will allowing AI crawlers affect my site security?
No. AI crawlers only access publicly available pages, the same as any search engine crawler. They cannot access password-protected content, admin panels, or authenticated areas. Allowing them is no different from allowing Googlebot.
Can I allow AI crawlers but block specific pages?
Yes. Use the same Disallow rules you would use for any crawler. For example, you can allow GPTBot on your site but disallow /admin/ or /private/ paths. The syntax is identical to Googlebot rules.
How do I know if AI crawlers are visiting my site?
Check your server access logs for these user agent strings: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and CCBot. Most hosting control panels and analytics tools can filter by user agent.
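If you prefer a summary over raw grep output, a short script can tally hits per crawler from an access log. This is a minimal sketch assuming a standard combined log format where the user agent appears somewhere in each line:

```python
# count_ai_crawlers.py — tally access-log lines per AI crawler user agent.
import re
import sys
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot")
BOT_PATTERN = re.compile("|".join(re.escape(bot) for bot in AI_BOTS))

def count_ai_crawler_hits(log_lines):
    """Return a Counter of hits per AI crawler found in the log lines."""
    counts = Counter()
    for line in log_lines:
        match = BOT_PATTERN.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts

if __name__ == "__main__":
    # Usage: python count_ai_crawlers.py /var/log/nginx/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for bot, hits in count_ai_crawler_hits(f).most_common():
            print(f"{bot:16} {hits}")
```

Run it weekly during the 7–14 day monitoring window to confirm each allowed crawler actually shows up.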
Should I block any AI crawlers?
Only if you have a specific legal or business reason, such as protecting copyrighted training data. For most websites, the traffic and citation benefits of allowing AI crawlers far outweigh any concerns. Our recommendation is to allow all major AI crawlers by default.
Check Your AI Readiness Score
Free scan. No signup required. See how AI engines like ChatGPT, Perplexity, and Google AI view your website.
Scan Your Site Free

SEO veteran with 15+ years leading digital performance at 888 Holdings, Catena Media, Betsson Group, and Evolution. Now building the AI readiness standard for the web.
Related Articles
87% of Websites Block AI Crawlers Without Knowing It
38% of websites block at least one major AI crawler in their robots.txt, and most don't realize it. Our scan reveals which bots are blocked most and which industries are most restrictive.
Guides: The Complete Guide to Making Your Website AI-Ready in 2026
Everything you need to know about making your website visible to AI systems in 2026 — the 8 factors that determine whether AI agents cite your content or skip it entirely.
Guides: How to Create the Perfect llms.txt File (With Templates)
The llms.txt file tells AI models what your site is about and where to find key content. Here is exactly how to create one, with copy-paste templates for every site type.