Crawl Health
TL;DR
Crawl Health (7% weight) evaluates technical fundamentals like HTTP status codes, redirect chains, sitemap accuracy, and page indexability. Poor crawl health triggers a score floor that caps your total.
Last updated: 2026-03-09
What It Measures
The Crawl Health factor examines the technical reliability of your site from a crawler's perspective. It checks HTTP status codes across your pages, flagging 4xx errors (pages not found), 5xx errors (server failures), and redirect issues (chains, loops, or incorrect targets). It evaluates your XML sitemap for accuracy: whether listed URLs are actually accessible, whether the sitemap includes all important pages, and whether
lastmod dates are accurate. The factor also checks for duplicate content issues such as non-canonical pages competing with originals, mixed HTTP/HTTPS versions of the same page, and trailing-slash inconsistencies. Every one of these issues represents a potential failure point where an AI crawler encounters something unexpected and may skip your content.

Why It Matters for AI
AI crawlers are efficient but not forgiving. When a crawler encounters a 404 error, a redirect loop, or a server timeout, it does not retry indefinitely. It moves on. If your sitemap lists 1,000 pages but 200 of them return errors, the crawler's trust in your sitemap degrades. It may reduce crawl frequency or skip your site in future cycles. At 7% of your total score, this factor has a modest direct weight. But like speed, its real danger is in triggering a score floor. Widespread crawl health problems cap your entire score regardless of how well you perform on other factors. Clean crawl health is table stakes for AI readiness. It is not glamorous, but it is foundational. See How Scoring Works for the complete factor model.
How to Check Yours
Run a site crawl using a tool like Screaming Frog, Sitebulb, or Ahrefs Site Audit. Look for pages returning 4xx or 5xx status codes, redirect chains longer than two hops, and redirect loops. Compare your XML sitemap against actual crawl results. Are there URLs in the sitemap that return errors? Are there important pages missing from the sitemap? Check for canonical tag issues: canonical tags that point to the wrong URL, and pages without canonical tags that create duplicate content signals. Your AgentReady™ scan reports crawl health issues organized by severity, making it easy to prioritize fixes.
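The sitemap-versus-crawl comparison can be sketched in a few lines. This is a minimal illustration, not part of any crawl tool's API: it assumes you already have crawl results as a mapping of URL to HTTP status, and it flags both sitemap entries that returned errors and healthy pages the sitemap omits.

```python
from xml.etree import ElementTree

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract <loc> values from an XML sitemap string."""
    root = ElementTree.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]


def sitemap_issues(sitemap_xml: str, crawl_status: dict[str, int]) -> dict[str, list[str]]:
    """Compare sitemap entries against crawl results (url -> HTTP status)."""
    listed = sitemap_urls(sitemap_xml)
    return {
        # Listed URLs that return errors erode crawler trust in the sitemap.
        "errors_in_sitemap": [u for u in listed if crawl_status.get(u, 0) >= 400],
        # Healthy crawled pages the sitemap fails to mention.
        "missing_from_sitemap": [
            u for u, s in crawl_status.items() if s == 200 and u not in listed
        ],
    }
```

Feeding this the output of any crawler gives you the two lists most worth fixing first: sitemap entries that 404 and indexable pages the sitemap never mentions.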
Example: Clean XML Sitemap Entry
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/guides/ai-readiness</loc>
    <lastmod>2026-03-09</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/product</loc>
    <lastmod>2026-03-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```
How to Improve
Start with the highest-severity issues. Fix or remove pages returning 5xx server errors — these indicate infrastructure problems that affect AI crawler reliability. Set up proper 301 redirects for removed pages instead of letting them return 404s. Shorten redirect chains to a single hop where possible. Update your XML sitemap to include only pages that return 200 status codes and are intended for indexing. Remove URLs that redirect, return errors, or have
noindex tags. Ensure your sitemap lastmod dates accurately reflect when content was last meaningfully updated. Implement consistent canonical tags across your site. Choose a canonical URL pattern (with or without www, with or without trailing slash) and enforce it site-wide. Monitor crawl health continuously, not just during audits. Set up alerts for new 5xx errors and broken redirects. For related improvements, review Bot Access and Speed.
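Enforcing one canonical URL pattern is easiest with a single normalization function used everywhere (canonical tags, sitemap generation, redirects). A minimal sketch, assuming the policy choices are https, no www, and no trailing slash; the specific pattern is yours to pick, consistency is what matters:

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL to one site-wide pattern: https, no 'www.',
    and no trailing slash (except for the root path)."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    netloc = netloc.lower().removeprefix("www.")
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")
    if not path:
        path = "/"
    # Drop fragments: crawlers treat them as the same resource.
    return urlunsplit(("https", netloc, path, query, ""))
```

Routing every emitted URL through one function like this prevents the mixed HTTP/HTTPS and trailing-slash inconsistencies the factor penalizes.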
Frequently Asked Questions
How many errors are too many?
There is no single threshold. The factor evaluates the ratio of healthy pages to problematic ones. A site with 1,000 pages and 5 errors is in good shape. A site with 100 pages and 30 errors has a serious crawl health problem. Focus on bringing the error ratio as close to zero as practical.
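The ratio framing is simple arithmetic; a hypothetical helper just to make the two examples above concrete:

```python
def healthy_ratio(total_pages: int, error_pages: int) -> float:
    """Share of crawled pages that are healthy (1.0 is ideal)."""
    if total_pages == 0:
        return 0.0
    return (total_pages - error_pages) / total_pages
```

Here 1,000 pages with 5 errors scores 0.995, while 100 pages with 30 errors scores 0.70.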
Do soft 404s count as errors?
Yes. A soft 404 is a page that returns a 200 status code but displays error-like content. These are problematic because they waste crawler resources and confuse AI systems about which pages have real content. Fix them by returning proper 404 status codes or redirecting to relevant pages.
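A soft 404 can only be caught heuristically, since the status code itself looks healthy. A rough sketch, assuming a hand-tuned phrase list matched against your own error template (the phrases below are illustrative, not a standard):

```python
# Hypothetical phrase list -- tune to match your site's actual error template.
ERROR_PHRASES = ("page not found", "no longer available", "nothing here")


def looks_like_soft_404(status_code: int, body_text: str) -> bool:
    """Heuristic: a 200 response whose body reads like an error page."""
    if status_code != 200:
        return False  # real error statuses are handled by the normal crawl checks
    text = body_text.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)
```

Run a check like this over your 200-status pages during audits; anything it flags should be changed to return a real 404 or redirected to relevant content.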
How often should I update my sitemap?
Your sitemap should update automatically whenever pages are added, removed, or significantly changed. If you use a CMS, most SEO plugins handle this automatically. For custom sites, regenerate the sitemap as part of your deployment process. Stale sitemaps with inaccurate lastmod dates reduce crawler trust.
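For custom sites, regenerating the sitemap in the deployment pipeline can be as small as a function over your page inventory. A minimal sketch, assuming you track each page's real last-modified date (the input shape here is hypothetical):

```python
from datetime import date
from xml.etree import ElementTree
from xml.etree.ElementTree import Element, SubElement

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Render (url, last_modified) pairs as sitemap XML."""
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = SubElement(urlset, "url")
        SubElement(entry, "loc").text = url
        # lastmod should reflect real content changes, not deploy timestamps.
        SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ElementTree.tostring(urlset, encoding="unicode")
```

Because the lastmod value comes from stored page metadata rather than the build time, every deploy produces dates that stay truthful, which is exactly what keeps crawler trust intact.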