Merkava
BLOG · MAY 2 2026 · 7 MIN READ

The 10 SEO fixes most sites are missing for AI search

When ChatGPT, Perplexity, Claude, or Google AI Overview decide which sites to cite, they look at specific structured-data signals. Most websites have fewer than half of them. Here are the 10 fixes that move the needle, ranked by AI-search impact.

The way websites get found has changed in the last 18 months, and most operators haven't updated their sites for it.

Search used to be: type keywords, click a link. Now a meaningful share of buying-intent queries happens inside ChatGPT, Perplexity, Claude, Gemini, and Google's AI Overview. Those engines don't pick the top 10 blue links. They read the web, build an answer, and cite a few sources. Whether your domain shows up in the citation depends on whether the engine could parse your site cleanly.

Sites with specific structured-data signals (JSON-LD schema, an explicit AI-content policy, clean canonicals) get cited. Sites without them get parsed inconsistently or skipped. The fixes below are the ones that cover the most ground per hour of work.

This list is what we audit when an operator runs a free check at withmerkava.com/try. The audit is run by Beacon — Merkava's SEO agent — and the same 10 fixes show up most often, ranked here by AI-search impact.

1. /llms.txt — the file most sites are missing

What it is. A plain-text file at the root of your domain that tells AI crawlers what your site is, where the canonical pages are, and your AI-content policy. It's the AI-era cousin of robots.txt.

Why it matters. llms.txt is a proposed convention rather than a ratified standard, and AI crawlers don't uniformly document whether they read it. Without it, your site is parsed generically. Where it is read, the AI gets a curated overview written in your voice.

The fix. A text file at yourdomain.com/llms.txt listing your top product / solutions / pricing pages with one-line descriptions. Roughly 30-50 lines for most sites. Copy-paste from a template; takes ~20 minutes the first time.
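
As a rough illustration of that template, here is a minimal Python sketch that renders an llms.txt body from a list of pages. The layout (a title line, a one-line summary, then sections of links with descriptions) follows the public llms.txt proposal as we understand it. The Page class, the build_llms_txt name, and the example.com URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render an llms.txt body: title, one-line summary, then link lists per section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # One line per page: link plus a short description.
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

Write the returned string to llms.txt at the site root on each deploy so the file tracks your current page list.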

2. WebSite + Organization JSON-LD on the homepage

What it is. Two structured-data blocks in the <head> of your homepage that tell search engines who you are, what site this is, and how to surface internal search.

Why it matters. Without WebSite schema, AI engines have trouble attributing citations to your brand. With it, your domain shows up as the source name (not as a URL fragment).

The fix. A <script type="application/ld+json"> block with your name, URL, and brand. About 15 lines of JSON. Goes once on the homepage; persists.
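
A hedged sketch of what generating that block could look like in Python. It emits both entities in a single script tag using a JSON-LD @graph, which is one common layout; two separate script tags work as well. The homepage_jsonld name, the logo field, and the #organization / #website identifiers are illustrative choices, not requirements.

```python
import json


def homepage_jsonld(name: str, url: str, logo_url: str) -> str:
    """Return a <script> tag holding Organization + WebSite JSON-LD for the homepage."""
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": f"{url}#organization",
                "name": name,
                "url": url,
                "logo": logo_url,
            },
            {
                "@type": "WebSite",
                "@id": f"{url}#website",
                "name": name,
                "url": url,
                # Point the site at the organization defined above.
                "publisher": {"@id": f"{url}#organization"},
            },
        ],
    }
    # Escape "</" so a value can never close the surrounding <script> tag early.
    payload = json.dumps(graph, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```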

3. FAQPage schema on /pricing

What it is. Structured Q&A blocks that answer the questions buyers ask before signing up.

Why it matters. AI search engines love FAQPage schema for direct quotation. A pricing page with FAQPage schema gets cited when someone asks ChatGPT "what does X charge for Y."

The fix. 4-8 Q&A pairs grounded in your actual pricing, formatted as FAQPage entities. The questions should match how customers actually phrase them, not how you'd phrase them internally.
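
For illustration, a small Python sketch that turns question/answer pairs into a FAQPage structure using the schema.org Question and Answer types. The faq_jsonld name is hypothetical, and the pairs you pass in should come from your real pricing copy.

```python
def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
```

Serialize the result with json.dumps and place it in a script tag of type application/ld+json on the pricing page.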

4. BreadcrumbList on every key page

What it is. Navigation hierarchy as structured data — Home → Section → Page.

Why it matters. Google can use BreadcrumbList to show a breadcrumb trail in search listings. Without it, the listing shows your raw URL. With it, your domain is presented as a navigable structure.

The fix. A 10-line JSON-LD block per page describing the path. Auto-generatable from your URL structure; one-time setup.
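
One way to auto-generate the block from the URL structure, sketched in Python. It assumes each path segment maps to one breadcrumb level and that a slug such as seo-audit can be turned into a readable label by replacing hyphens; real sites usually want labels from page titles instead. The breadcrumb_jsonld name is hypothetical.

```python
from urllib.parse import urlsplit


def breadcrumb_jsonld(page_url: str, home_label: str = "Home") -> dict:
    """Derive a schema.org BreadcrumbList from the path segments of a URL."""
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [segment for segment in parts.path.split("/") if segment]

    crumbs = [(home_label, origin + "/")]
    path = ""
    for segment in segments:
        path += "/" + segment
        # Assumption: the slug is a usable label once hyphens become spaces.
        crumbs.append((segment.replace("-", " ").title(), origin + path))

    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(crumbs, start=1)
        ],
    }
```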

5. Open Graph completeness

What it is. og:title, og:description, og:image, og:url meta tags on every shareable page.

Why it matters. Missing OG tags break previews on LinkedIn, Slack, Twitter — every share that doesn't render a card is a lost impression. AI summarizers also use OG metadata as the canonical short-form description of your page.

The fix. Four meta tags per page. The image should be at least 1200×630 pixels. If you don't have OG images, generate them programmatically from page metadata.
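
A minimal Python sketch of rendering those four tags from page metadata. The og_tags name is hypothetical. Values are HTML-escaped so a quote or ampersand in a title can't break the attribute.

```python
from html import escape


def og_tags(title: str, description: str, image_url: str, page_url: str) -> str:
    """Render the four core Open Graph meta tags, one per line."""
    properties = {
        "og:title": title,
        "og:description": description,
        "og:image": image_url,
        "og:url": page_url,
    }
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}">'
        for name, value in properties.items()
    )
```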

6. Twitter Card meta

What it is. twitter:card, twitter:title, twitter:image on every page.

Why it matters. Twitter/X share previews depend on these. Without them, a shared link renders as a flat URL.

The fix. Three meta tags per page, mostly mirrors of your OG tags.
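
Since these tags mostly mirror Open Graph, they can be derived from the same metadata. A short Python sketch, assuming the OG values are already available as a dict keyed by property name; the twitter_tags_from_og name is hypothetical, and summary_large_image is one of the documented card types.

```python
from html import escape


def twitter_tags_from_og(og: dict[str, str], card: str = "summary_large_image") -> str:
    """Mirror og:title and og:image into the three Twitter Card meta tags."""
    tags = {
        "twitter:card": card,
        "twitter:title": og["og:title"],
        "twitter:image": og["og:image"],
    }
    # Twitter Card tags use the name attribute, not property.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags.items()
    )
```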

7. AI-content-policy meta tag

What it is. A meta tag that explicitly tells AI crawlers your policy: allow, summarize-ok, cite-required, brand attribution rules.

Why it matters. Most sites either block AI crawlers entirely (losing the citation traffic) or stay silent (and get parsed inconsistently). An explicit policy gets you cited correctly with brand attribution preserved.

The fix. One meta tag site-wide. The tag is a convention rather than a ratified standard; an example format: <meta name="ai-content-policy" content="allow; summarize-ok; cite-required; brand=YourBrand">.

8. Sitemap.xml regeneration + ping

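What it is. A sitemap regenerated from the actual filesystem state, with an accurate lastmod per URL, plus a notification to search engines on every change.

Why it matters. Stale sitemaps mean new pages aren't crawled for weeks. Regenerating on every deploy and notifying on change closes that gap.

The fix. A sitemap generator that runs on every deploy, plus a daily cron that notifies the search engines if anything changed. Google retired its sitemap ping endpoint in 2023 and documents that it ignores priority and changefreq, so for Google rely on lastmod and a sitemap submitted in Search Console; Bing accepts change notifications through IndexNow. ~50 lines of script for most sites.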
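
A sketch of the generation half in Python, assuming a static build directory of .html files where index.html maps to its folder URL and file modification time stands in for lastmod. The collect_entries and build_sitemap names are hypothetical, and notifying search engines is left out.

```python
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def collect_entries(site_root: Path, base_url: str) -> list[tuple[str, date]]:
    """Map every .html file under site_root to a (URL, last-modified date) pair."""
    entries = []
    for html_file in sorted(site_root.rglob("*.html")):
        rel = html_file.relative_to(site_root).as_posix()
        if rel == "index.html":
            path = ""
        elif rel.endswith("/index.html"):
            # pricing/index.html is served as /pricing/
            path = rel[: -len("index.html")]
        else:
            path = rel
        modified = date.fromtimestamp(html_file.stat().st_mtime)
        entries.append((base_url.rstrip("/") + "/" + path, modified))
    return entries


def build_sitemap(entries: list[tuple[str, date]]) -> str:
    """Serialize (URL, date) pairs as a sitemap.xml document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Comparing the new output with the previously published sitemap is a cheap way to decide whether anything changed before notifying anyone.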

9. Canonical link tags on every page

What it is. A <link rel="canonical"> on every page pointing to its preferred URL.

Why it matters. Without canonicals, duplicate-content variants (utm-tagged URLs, /index.html vs /, http vs https) split your ranking signal. With them, the signal consolidates on one URL.

The fix. One link tag per page. Auto-injectable based on URL.
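
To make "auto-injectable based on URL" concrete, here is a Python sketch that collapses the three variants named above (utm-tagged URLs, /index.html vs /, http vs https) onto one preferred URL. It assumes https is the preferred scheme and that only utm_ parameters are tracking noise; both are assumptions to adjust per site. The function names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: extend with your own tracking parameters


def canonical_url(url: str) -> str:
    """Normalize a URL variant to the single preferred form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Keep meaningful query parameters, drop tracking ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Force https, lowercase the host, and drop any fragment.
    return urlunsplit(("https", parts.netloc.lower(), path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Render the <link rel="canonical"> tag for a requested URL."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```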

10. Internal linking on under-linked pages

What it is. A check of which pages have fewer than 3 internal inbound links, plus a fix that adds links from semantically related pages.

Why it matters. Pages with fewer than 3 internal inbound links get crawled rarely and rank poorly. Closing the gap is one of the highest-leverage technical SEO moves.

The fix. A "related content" block in your footer or sidebar that closes the link loop. One template, applied to all under-linked pages.
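
The check itself can be sketched in a few lines of Python, assuming you already have a map from each page to the internal pages it links to (from a crawl or your build step). The under_linked name and the shape of the input are hypothetical; the threshold of 3 comes from the rule above.

```python
from collections import Counter


def under_linked(outlinks: dict[str, set[str]], min_inbound: int = 3) -> dict[str, int]:
    """Return pages with fewer than min_inbound internal inbound links, with their counts."""
    inbound: Counter[str] = Counter()
    for source, targets in outlinks.items():
        for target in targets:
            # Ignore self-links and links to pages outside the known set.
            if target != source and target in outlinks:
                inbound[target] += 1
    return {page: inbound[page] for page in outlinks if inbound[page] < min_inbound}
```

The pages it returns are the candidates for the "related content" block.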

How long this takes if you do it yourself

A motivated developer can ship all 10 fixes in 6-12 hours of focused work. The bottleneck is not the technical lift; it's deciding what to put in the FAQ schema, choosing OG image styles, and writing the /llms.txt content. Those are content decisions, not engineering decisions, and they're what slow down the in-house attempt.

If your site is in a Git repo and you'd rather have an SEO agent ship the fixes as PRs you review and merge, that's what Merkava's SEO agent (Beacon) does. Same 10 fixes, plus a weekly recheck so they stay landed as the site evolves.

Run the audit on your domain

Free, no signup. Returns a gap list across all 10 fixes plus the exact fix content for each gap. Shareable report URL.
