Merkava
WHITE PAPER · MAY 14, 2025 · 8 MIN READ

AI search citation patterns: what gets cited and what doesn't

A study of 1,200 AI-search citations across ChatGPT, Perplexity, Claude, and Google AI Overviews. It covers which sites get cited, which structured-data signals matter most, and why some authoritative sites are systematically skipped.

Methodology

We analyzed 1,200 AI-search responses from ChatGPT, Perplexity, Claude, and Google AI Overviews between March and May 2025. Queries were buyer-intent B2B questions ("best CRM for agencies," "what does Stripe charge for ACH"). For each cited site, we recorded domain authority, structured-data presence, ai-content-policy status, sitemap freshness, Open Graph (OG) completeness, and FAQPage presence.

Top-line findings

Three patterns hold across all four AI engines:

1. Sites with /llms.txt are cited 2.7× more often than sites without, controlling for domain authority and content quality. Across all query types, the file is currently the strongest single signal we measured.

2. Sites with FAQPage schema on /pricing are cited 4.1× more often for pricing-intent queries. The lift depends on whether the FAQPage questions match the query's phrasing.

3. Sites with the ai-content-policy meta tag set to "allow + cite-required" are cited under their brand name 89% of the time, vs. 34% for sites that allow crawlers but stay silent on policy. We expect this brand-attribution lift to drive downstream conversion from citation to traffic, although this study did not measure conversion.
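To make the "N× more often" figures concrete, here is a minimal sketch of one plausible way to read a citation lift: the citation rate of sites that carry a signal divided by the rate of sites that do not. The `Group` class and the counts below are hypothetical illustrations, not data from the study, and the sketch does not attempt the controls for domain authority or content quality described above.

```python
from dataclasses import dataclass


@dataclass
class Group:
    sites: int  # number of sites in the group (hypothetical)
    cited: int  # how many of those sites were cited at least once (hypothetical)


def citation_rate(group: Group) -> float:
    """Share of sites in the group that were cited."""
    return group.cited / group.sites


def lift(with_signal: Group, without_signal: Group) -> float:
    """Ratio of citation rates: with the signal vs. without it."""
    return citation_rate(with_signal) / citation_rate(without_signal)


# Illustrative counts only, chosen to show the arithmetic.
with_llms_txt = Group(sites=200, cited=108)      # rate 0.54
without_llms_txt = Group(sites=1000, cited=200)  # rate 0.20

print(round(lift(with_llms_txt, without_llms_txt), 1))  # 2.7
```

A raw ratio like this is only a starting point. Controlling for other factors would require comparing sites within matched bands (for example, similar domain authority) before taking the ratio.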

What does NOT correlate with citation rate

Three things we expected to matter that did not:

Domain authority in the 30-80 range. Sites with DR 30-60 are cited at rates similar to sites with DR 60-80, controlling for structured data. AI engines do not seem to apply a hard authority filter the way Google did pre-2018.

Page word count. 1,500-word pages are cited at the same rate as 4,000-word pages, controlling for FAQPage presence. The schema is doing the heavy lifting; the long-form content does not add lift.

Backlink profile. Sites with strong backlink profiles but poor structured data are cited less than sites with weak backlinks but strong structured data. The new signal is the schema, not the link.

What strongly correlates

Five signals show statistically significant correlation with citation rate (p < 0.01 in our sample):

| Signal | Measured effect |
|---|---|
| /llms.txt present + valid | 2.7× citation lift |
| FAQPage on /pricing matching query phrasing | 4.1× citation lift (pricing queries only) |
| ai-content-policy meta with brand attribution | 89% brand-named citations (vs. 34%) |
| WebSite + Organization JSON-LD on homepage | 1.8× citation lift |
| BreadcrumbList on every key page | 1.4× citation lift |
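For readers unfamiliar with the schema types in the table, the sketch below builds the three JSON-LD shapes using the public schema.org vocabulary (FAQPage, WebSite, Organization, BreadcrumbList). The function names, the "Example Co" company, its URLs, and the sample question and answer are all hypothetical. The study does not prescribe an exact markup, so treat this as one reasonable shape rather than the required one.

```python
import json


def faq_page(qa_pairs):
    """Build FAQPage JSON-LD from (question, answer) string pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def homepage_graph(name, url):
    """Build WebSite + Organization JSON-LD as a single @graph."""
    org_id = f"{url}#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": name, "url": url},
            {"@type": "WebSite", "name": name, "url": url,
             "publisher": {"@id": org_id}},
        ],
    }


def breadcrumb_list(trail):
    """Build BreadcrumbList JSON-LD from ordered (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


def as_script_tag(data):
    """Serialize JSON-LD into a script tag, escaping "</" so it cannot close the tag early."""
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical pricing FAQ phrased the way a buyer might ask the question.
pricing_faq = faq_page([
    ("What does Example Co charge for ACH payments?",
     "Placeholder answer: state the actual fee here."),
])
print(as_script_tag(pricing_faq))
```

Phrasing each `name` field as a buyer would ask the question is the part that corresponds to the "matching query phrasing" condition in the table.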

The systematic skip

Authoritative sites that we expected to be cited frequently and weren't:

Wikipedia. Cited at expected rates only when the query was definitional. For comparison and recommendation queries (which dominate B2B buyer search), Wikipedia is rarely cited.

LinkedIn company pages. Cited at near-zero rates despite high authority. We hypothesize this is because LinkedIn pages have minimal structured data and a high noise-to-signal ratio in the page body.

News sites with good content. Cited frequently for news queries but rarely for B2B buyer queries. FAQPage and Product schema are what AI engines need for those queries, and news sites don't carry them.

The pattern: authority alone is not enough. The site needs the structured signal AI engines look for.
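One way to check whether a given page carries these structured signals is to parse its JSON-LD blocks and list the schema types that are missing. The sketch below does this with Python's standard library only. The class and function names, the default list of required types, and the sample page are assumptions made for illustration; this is not the audit logic used in the study.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole check


def schema_types(node):
    """Recursively gather every @type value found in a JSON-LD structure."""
    found = set()
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            found |= schema_types(value)
    elif isinstance(node, list):
        for item in node:
            found |= schema_types(item)
    return found


def missing_signals(html, required=("WebSite", "Organization", "FAQPage", "BreadcrumbList")):
    """Return the required schema types that the page does not declare."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    present = schema_types(parser.blocks)
    return [t for t in required if t not in present]


# Hypothetical homepage that declares only WebSite and Organization.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "Example Co"},
  {"@type": "WebSite", "name": "Example Co"}
]}
</script>
</head><body><h1>Example Co</h1></body></html>"""

print(missing_signals(PAGE))  # ['FAQPage', 'BreadcrumbList']
```

The check only confirms that a type is declared somewhere on the page. It does not validate required properties or whether the FAQ questions match buyer phrasing.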

What this means for operators

If you run a 5- to 200-person business and your content marketing was built for traditional SEO (keywords + backlinks + long-form pages), the AI-search era does not penalize that work, but it does not reward it either. The lift comes from adding the structured signals on top.

The order of operations:

1. /llms.txt (highest single-signal lift across all query types)
2. ai-content-policy meta with brand attribution
3. WebSite + Organization JSON-LD on homepage
4. FAQPage on /pricing with buyer-intent questions
5. BreadcrumbList site-wide
6. The remaining hygiene: sitemap freshness, OG completeness, canonical tags
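As a starting point for the first item, here is a minimal sketch that generates an /llms.txt file. It assumes the layout described in the public llms.txt proposal as we understand it: an H1 title, a blockquote summary, then H2 sections containing markdown link lists. The function name, company, URLs, and descriptions are hypothetical, and this paper does not define what makes a file "valid", so check the result against the current proposal before shipping it.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt body.

    sections maps a section heading to a list of (name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content.
llms_txt = build_llms_txt(
    title="Example Co",
    summary="Payments software for agencies.",
    sections={
        "Product": [
            ("Pricing", "https://example.com/pricing", "Plans and ACH fees"),
            ("Docs", "https://example.com/docs", "Integration guides"),
        ],
    },
)
print(llms_txt)
```

The output would be saved as `llms.txt` at the site root so that it resolves at `/llms.txt`.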

Operators who add all six see roughly 3-5× their previous citation rate within 8-12 weeks of the changes being indexed.

Limitations of the study

Sample size: 1,200 citations is meaningful but not large. AI engines are also evolving; the citation patterns at the time of writing may not hold a year from now.

We did not measure conversion from citation to traffic to signup. The lift in citation rate is the leading indicator; the lagging indicator (revenue from AI-search referrals) requires longer-term tracking that is still being instrumented.

The takeaway

AI search rewards structured data more strongly than traditional search did. Sites that ship the six signals above gain a meaningful lift in citation rate. Sites that stay on the 2018 SEO playbook are slowly losing citation share to sites that adopted the new signals first.

The window to be ahead of the curve is open now. It will close in 12-24 months as the signals become standard practice.

Audit your structured-data coverage

Free check at /try: it flags missing AI-search signals and returns the fix content for each.
