984 Sites, 12 Industries: What Actually Predicts AI Citations?
We built a scoring framework, ran it on 984 websites, then checked which ones actually get cited by AI. The correlation is real — but only in specific industries. Here's what we found.
Founder & CEO at AgentReady
The Honest Question We Had to Answer
When we built the AgentReady scoring framework, we made a bet: that the factors we measure — schema markup, bot access, content quality, AI protocols, authority signals, crawl efficiency — actually correlate with whether AI systems cite a website. That's the whole thesis. And we needed to test it.
So we ran a proper study. 984 websites across 12 industries. We scored each one, then used a panel of 87 AI-generated queries to check which sites actually appeared in AI responses (ChatGPT, Perplexity, and Claude). We calculated Spearman rank correlation coefficients between readiness scores and citation rates for each industry.
The results were more nuanced than we expected — and we're going to share them with complete transparency, including the parts that complicated our thesis.
The Overall Finding: ρ=0.025 (Weak, But Not the Full Story)
Across all 984 sites and all 12 industries combined, the Spearman correlation between AI readiness score and citation rate is ρ=0.025. That's statistically negligible.
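For readers who want to sanity-check numbers like this on their own data: Spearman's ρ is simply Pearson correlation computed on rank-transformed values. Here's a minimal, dependency-free sketch — the scores and rates below are illustrative placeholders, not figures from our dataset:

```python
def average_ranks(values):
    """Rank transform with average ranks for ties (1-based)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative only: readiness scores vs. citation rates for five sites.
scores = [62, 87, 45, 91, 73]
rates = [12.0, 9.0, 3.0, 40.0, 11.0]
rho = spearman_rho(scores, rates)
```

Because ρ operates on ranks rather than raw values, it captures monotonic relationships ("higher score, more citations") without assuming linearity — which is why it's the standard choice for score-vs-rate comparisons like this one.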
If you stopped there, you'd conclude our scoring framework predicts nothing. But stopping there would be wrong — because the aggregate masks dramatic industry-level variation. When we broke out the data by sector, the picture changed completely.
The key insight from our v3 study is this: current AI citations are driven primarily by brand recognition (how well-known a site is from training data), not by technical readiness. The more established a brand, the more AI systems cite it — regardless of schema markup, llms.txt, or crawl optimization. This is the Brand Fame Effect, and it explains why Wikipedia, WebMD, and Investopedia get cited constantly despite not being technically 'AI-ready' by our criteria.
But brand recognition isn't the whole story either.
Where Correlation Is Real: The YMYL Industries
When we isolated YMYL (Your Money or Your Life) industries — sectors where AI systems apply extra source quality scrutiny — the correlations jumped dramatically.
Healthcare: ρ=0.72 — By far the strongest. Medical sites with strong E-E-A-T signals (author credentials, clinical citations, schema markup) are cited significantly more often than technically weaker peers. AI systems are most cautious about medical misinformation, so they reward trust signals.
Government: ρ=0.40 — Second highest. Government sites with clear entity signals, structured service data, and accessible content are cited more reliably. Regulatory authority matters.
Education: ρ=0.35 — Strong in the .edu domain, where institutional credibility intersects with technical readiness.
Insurance: ρ=0.33 — Similar pattern to healthcare, with policy structure and trust signals driving citation rates.
Finance: ρ=0.20 — Meaningful but more modest, reflecting the complexity of financial information and AI systems' caution about specific recommendations.
For all other industries (tech, e-commerce, media, travel, etc.), correlations were below ρ=0.15 — weak enough that brand recognition likely explains most of the remaining variation.
[Chart: Spearman Correlation — AI Readiness Score vs. Citation Rate, by industry]
Why YMYL Industries Are the Exception
The strong correlations in healthcare, government, and education aren't accidental. AI systems apply a quality filter called E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) when generating responses in high-stakes domains. That filter is more legible for AI systems when sites have structured signals — author credentials, institutional affiliations, citations, schema markup.
In other words: in YMYL industries, technical readiness acts as a proxy for credibility. Schema markup isn't just a crawlability signal — it's a trust signal. E-E-A-T signals aren't just SEO factors — they're the machine-readable version of authority that AI citation systems rely on.
For brands in healthcare, government, education, insurance, and finance: technical AI readiness is already delivering measurable citation advantages. The correlation data makes this actionable, not hypothetical.
What This Means for Your AI Strategy
The study doesn't invalidate technical AI readiness — it contextualizes it. Here's what we conclude:
If you're in a YMYL industry: AI readiness improvements have a direct, measurable impact on citation rates today. Healthcare, government, education, insurance, and finance organizations should treat AI readiness as a top-priority initiative.
If you're in a non-YMYL industry: Technical readiness is about positioning for the future, not capturing citations today. As AI systems shift to real-time crawl-based citation, brand recognition will matter less and technical signals will matter more. Sites investing now will have a structural advantage when that shift accelerates.
For everyone: The premium we observed for sites scoring 95+ suggests that once brand recognition and technical optimization combine, citation rates jump meaningfully. Smaller brands that achieve technical excellence may punch above their weight — especially in industries where dominant brands don't maintain clean technical signals.
You can explore the full dataset and download the raw correlation data on our /research page.
Frequently Asked Questions
How was the citation check conducted?
We used a panel of 87 industry-specific queries across ChatGPT, Perplexity, and Claude. For each site in the study, we checked whether the domain or brand was mentioned in AI responses to relevant queries. Citation rate is defined as the percentage of applicable queries where the site was mentioned.
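That definition can be made concrete with a short sketch. The query strings and engine names below are hypothetical, and we're assuming here that a mention by any one of the three engines counts as a citation for that query (the study text doesn't pin down that detail):

```python
def citation_rate(query_results):
    """query_results: {query: {engine: True if the site was mentioned}}.

    Citation rate = percentage of applicable queries where the site
    was mentioned by at least one engine (an assumption for this sketch).
    """
    applicable = list(query_results)
    hits = sum(1 for q in applicable if any(query_results[q].values()))
    return 100.0 * hits / len(applicable)

# Hypothetical results for one healthcare site across three queries.
example = {
    "best diabetes management tips": {"chatgpt": True, "perplexity": False, "claude": True},
    "how does insulin resistance work": {"chatgpt": False, "perplexity": False, "claude": False},
    "safe blood sugar ranges": {"chatgpt": False, "perplexity": True, "claude": False},
}
rate = citation_rate(example)  # mentioned in 2 of 3 applicable queries
```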
Why is the overall correlation so weak?
Current AI citations are heavily influenced by brand recognition from training data — how well-known a site is. Well-known brands get cited regardless of technical readiness. The correlation becomes meaningful only in YMYL industries, where AI systems apply stricter source quality filters that technical signals help satisfy.
Does this mean AI readiness scoring is useless for non-YMYL sites?
No. As AI systems shift from training-data citations to real-time crawl-based citation (already underway with Perplexity and Google AI Overviews), technical readiness will increasingly drive visibility for all industries. Sites investing in technical AI readiness now are positioning for that shift — not optimizing for the current state.
Check Your AI Readiness Score
Free scan. No signup required. See how AI engines like ChatGPT, Perplexity, and Google AI view your website.
Scan Your Site Free
SEO veteran with 15+ years leading digital performance at 888 Holdings, Catena Media, Betsson Group, and Evolution. Now building the AI readiness standard for the web.
Related Articles
We Scanned 5,000 Websites for AI Readiness. The Results Are Alarming.
73% of websites are invisible to AI. We scanned 5,000 sites across 14 industries and the data reveals a massive readiness gap that most businesses don't even know exists.
Opinion
The Brand Fame Paradox: Why Famous Sites Get AI Citations Without Being Ready
We ran the study. The honest answer: overall, AI readiness scores have a weak correlation with AI citations (ρ=0.025). Famous brands dominate. But the story isn't that simple — and there's a clear path for everyone else.
Guides
The Complete Guide to Making Your Website AI-Ready in 2026
Everything you need to know about making your website visible to AI systems in 2026 — the 8 factors that determine whether AI agents cite your content or skip it entirely.