Technical GEO: Is Your Website Ready for AI Crawlers?

Technical GEO audits your robots.txt for AI bot access, measures content freshness and citation risk, calculates a citability score, and verifies schema markup and SSR.

Robots.txt Auditing for AI Bots

The first thing Technical GEO checks is your robots.txt file. This file controls which bots can crawl your website — and many sites are unknowingly blocking AI crawlers.

Screenshot: Robots.txt audit showing allowed/blocked status for each AI crawler

AI crawlers MentionLayer checks for:

| Bot | Company | Used By |
| --- | --- | --- |
| GPTBot | OpenAI | ChatGPT, Microsoft Copilot |
| ClaudeBot | Anthropic | Claude |
| PerplexityBot | Perplexity | Perplexity.ai |
| Google-Extended | Google | Gemini |
| Bytespider | ByteDance | TikTok AI |
| CCBot | Common Crawl | Training data for many models |
| Amazonbot | Amazon | Alexa AI |

Common issues found:
- Blanket disallow: "Disallow: /" for all bots blocks AI crawlers too
- Overly restrictive rules: Blocking entire directories that contain valuable content
- Stale rules: Old robots.txt that doesn't account for new AI crawlers
- No explicit allow: A bot-specific "Allow" rule is missing where one is needed to override a broader "Disallow" (crawling is permitted by default, so "Allow" only matters as an exception to a block)

MentionLayer shows you exactly which bots are allowed and which are blocked, with specific line numbers in your robots.txt. It generates a recommended robots.txt that allows AI crawlers while maintaining any legitimate blocks you have.
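The allowed/blocked check can be approximated with Python's standard `urllib.robotparser`. This is a minimal sketch, not MentionLayer's implementation: the `audit_robots` helper, the `SAMPLE` file, and the `example.com` URL are hypothetical, and the bot names are the user-agent tokens from the table above.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the table above.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "Bytespider", "CCBot", "Amazonbot",
]


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {bot: True if robots_txt lets that bot fetch url}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical robots.txt: GPTBot is blocked everywhere,
# every other bot is only kept out of /admin/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

report = audit_robots(SAMPLE)
for bot, allowed in report.items():
    print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```

Bots without their own group fall back to the `User-agent: *` group, which is why a blanket "Disallow: /" under `*` blocks every AI crawler at once.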

Content Freshness & Citation Risk

AI models prefer fresh, authoritative content. Technical GEO analyzes your key pages for freshness signals.

Screenshot: Content freshness report showing pages ranked by last-modified date and citation risk

What gets checked:
- Last-modified dates on key pages (homepage, product pages, about page)
- Publish dates on blog posts and articles
- Content staleness indicators: References to past years, outdated statistics, deprecated features
- Update velocity: How often you publish or update content

Citation risk assessment: Pages with outdated content are a citation risk. If an AI model cites your page that says "in 2023, the market is expected to grow..." — that makes your brand look stale. Technical GEO flags pages with the highest citation risk so you can prioritize updates.
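One staleness indicator, references to past years, is simple to illustrate. The sketch below assumes a plain regular-expression scan; the `past_year_mentions` helper is hypothetical, and the way Technical GEO actually detects stale references is not described here.

```python
import re
from datetime import date
from typing import Optional

# Four-digit years from 1900 to 2099, matched as whole words.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def past_year_mentions(text: str, current_year: Optional[int] = None) -> list[int]:
    """Return the distinct years mentioned in text that precede current_year."""
    if current_year is None:
        current_year = date.today().year
    years = {int(match) for match in YEAR_PATTERN.findall(text)}
    return sorted(year for year in years if year < current_year)


page_text = "In 2023, the market is expected to grow. Last updated 2025."
print(past_year_mentions(page_text, current_year=2025))
```

A scan like this will also flag legitimate historical references ("founded in 2012") and other four-digit numbers, so its output is a list of candidates for review rather than a verdict.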

Content freshness score factors:
- Average page age across key pages
- Percentage of pages updated in the last 90 days
- Blog publishing frequency
- Presence of "last updated" dates visible on pages
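The first two factors can be computed directly from last-modified dates. The sketch below is illustrative only: the `freshness_factors` helper and the sample pages are hypothetical, and it does not combine the factors into a score because no weighting for them is given here.

```python
from datetime import date


def freshness_factors(last_modified: dict[str, date], today: date) -> dict[str, float]:
    """Compute average page age and the share of pages updated in the last 90 days."""
    if not last_modified:
        raise ValueError("at least one page is required")
    ages = [(today - modified).days for modified in last_modified.values()]
    recent = sum(1 for age in ages if age <= 90)
    return {
        "average_age_days": sum(ages) / len(ages),
        "percent_updated_90d": 100 * recent / len(ages),
    }


# Hypothetical key pages with their last-modified dates.
pages = {
    "/": date(2025, 5, 1),
    "/pricing": date(2025, 3, 1),
    "/about": date(2024, 6, 1),
}
factors = freshness_factors(pages, today=date(2025, 6, 1))
print(factors)
```

Passing `today` explicitly keeps the calculation reproducible; a real report would use the scan date.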

Citability Scoring Explained

Citability measures how easy it is for AI models to extract specific, citable information from your website.

Screenshot: Citability score breakdown showing per-page scores and improvement suggestions

High-citability content has:
- Clear H2/H3 heading structure (AI models use headings to navigate)
- Specific data points and statistics (numbers are more citable than generalities)
- FAQ sections with concise answers (AI models love Q&A format)
- Lists and comparison tables (structured data is easier to parse)
- Author attribution and publication dates (trust signals)
- Proper schema markup (helps AI understand content structure)

Low-citability content has:
- Wall-of-text paragraphs with no headings
- Vague marketing copy with no specifics
- Content behind JavaScript rendering that AI crawlers can't access
- Missing meta descriptions and structured data
- No FAQ or knowledge-base style content

The citability score (0-100) is based on:
- Heading structure quality (20%)
- Specific, quotable statements per page (25%)
- FAQ/structured content presence (20%)
- Schema markup completeness (15%)
- SSR/crawlability (10%)
- Content depth and originality (10%)
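The percentages above add up to 100, so one plausible reading is a weighted sum of per-factor subscores. The sketch below assumes exactly that, with each subscore already on a 0-100 scale. The factor names, the `citability_score` helper, and the sample subscores are hypothetical; how MentionLayer measures each factor is not specified here.

```python
# Weights from the list above, expressed as fractions of the total score.
WEIGHTS = {
    "heading_structure": 0.20,
    "quotable_statements": 0.25,
    "faq_structured_content": 0.20,
    "schema_completeness": 0.15,
    "ssr_crawlability": 0.10,
    "depth_originality": 0.10,
}


def citability_score(subscores: dict[str, float]) -> float:
    """Combine per-factor subscores (each 0-100) into one 0-100 score."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Hypothetical subscores for a single page.
page = {
    "heading_structure": 80,
    "quotable_statements": 60,
    "faq_structured_content": 50,
    "schema_completeness": 40,
    "ssr_crawlability": 100,
    "depth_originality": 70,
}
score = citability_score(page)
print(f"{score:.1f}")
```

Under this assumption, the largest single lever is the 25% weight on specific, quotable statements: raising that subscore by 20 points moves the total by 5.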

MentionLayer provides page-by-page citability scores with specific recommendations for improvement.

Schema & SSR Verification

Technical GEO verifies that your website's technical foundation supports AI crawling and understanding.

Screenshot: Schema verification showing detected vs missing schema types and SSR status

Schema markup check: MentionLayer scans your website for JSON-LD structured data and reports which types are present vs missing. Key schema types for AI visibility:
- Organization: Your company identity
- Product / Service: What you sell
- FAQ (the schema.org type is FAQPage): Common questions (highly cited by AI models)
- Article: Blog posts and content pieces
- BreadcrumbList: Site navigation structure
- Review: Customer reviews on your site
- HowTo: Step-by-step guides
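A present-vs-missing report can be sketched with the standard library alone by collecting every `<script type="application/ld+json">` block and reading its `@type` values. The `JsonLdCollector` class, the `detected_types` helper, and the sample HTML are hypothetical, and the FAQ entry is written as `FAQPage`, its schema.org type name. This is an illustration, not MentionLayer's scanner.

```python
import json
from html.parser import HTMLParser

KEY_TYPES = {"Organization", "Product", "Service", "FAQPage",
             "Article", "BreadcrumbList", "Review", "HowTo"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every JSON-LD script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _types_in(node) -> set[str]:
    """Collect @type values from a node, a list of nodes, or an @graph."""
    found: set[str] = set()
    if isinstance(node, list):
        for child in node:
            found |= _types_in(child)
    elif isinstance(node, dict):
        declared = node.get("@type", [])
        found.update([declared] if isinstance(declared, str) else declared)
        found |= _types_in(node.get("@graph", []))
    return found


def detected_types(html: str) -> set[str]:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found: set[str] = set()
    for block in collector.blocks:
        try:
            found |= _types_in(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing the scan
    return found


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}
</script>
</head><body><h1>Example Co</h1></body></html>
"""
found = detected_types(SAMPLE_HTML)
print("present:", sorted(found & KEY_TYPES))
print("missing:", sorted(KEY_TYPES - found))
```

The sketch only reads top-level nodes and `@graph` arrays; types nested deeper, such as a Review embedded in a Product, would need a full recursive walk.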

Server-Side Rendering (SSR) check: AI crawlers often can't execute JavaScript. If your content is client-side rendered (React SPA without SSR), AI bots may see a blank page. Technical GEO tests your key pages from a bot's perspective and flags any pages that require JavaScript to render content.
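A rough version of this bot's-eye test is to look at the raw HTML, with no JavaScript executed, and measure how much visible text it contains. The sketch below assumes that heuristic: the `VisibleText` class, the `needs_javascript` helper, the 200-character threshold, and both sample pages are hypothetical, not MentionLayer's actual test.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script>, <style>, <noscript>, and <title>."""

    SKIP = {"script", "style", "noscript", "title"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def needs_javascript(raw_html: str, min_chars: int = 200) -> bool:
    """Flag a page whose raw HTML carries less than min_chars of visible text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_chars


# A client-side rendered shell: the content only appears after bundle.js runs.
SPA_SHELL = (
    '<html><head><title>App</title></head><body>'
    '<div id="root"></div><script src="/bundle.js"></script></body></html>'
)
# A server-rendered page: the content is already in the HTML.
SSR_PAGE = (
    "<html><body><h1>Pricing</h1><p>"
    + "Plans start at a flat monthly rate. " * 10
    + "</p></body></html>"
)

print(needs_javascript(SPA_SHELL))
print(needs_javascript(SSR_PAGE))
```

In practice the raw HTML would come from fetching the page the way a crawler does, without a browser, and the threshold would need tuning per site, since short pages can be fully server-rendered and still fall under it.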

Recommendations generated:
- Missing schema types with ready-to-use JSON-LD code
- SSR configuration suggestions for your framework
- Meta tag improvements for better AI understanding
- Canonical URL and sitemap optimization tips

Frequently Asked Questions

What does the Technical GEO scan check?

Four things: (1) Your robots.txt file: which AI crawlers are allowed and which are blocked, (2) Content freshness: how recently your key pages were updated and whether stale content is hurting citability, (3) Citability score: a composite measure of how easy it is for AI models to extract and cite information from your website, (4) Schema and SSR: which JSON-LD schema types are present or missing, and whether your key pages render their content without JavaScript.

Why should I unblock AI crawlers in robots.txt?

If GPTBot, ClaudeBot, or PerplexityBot are blocked in your robots.txt, those crawlers are told not to fetch your pages, so the AI models behind them cannot read your website content directly. They'll rely on third-party sources (forums, press, reviews) to learn about your brand, and those sources may be incomplete or inaccurate. Unblocking AI crawlers lets them access your authoritative content directly.

What AI crawlers exist?

The main ones are GPTBot (OpenAI/ChatGPT), ClaudeBot (Anthropic/Claude), PerplexityBot (Perplexity), Google-Extended (Gemini), Bytespider (TikTok), CCBot (Common Crawl), and Amazonbot. MentionLayer checks for all of these in your robots.txt.

What is a citability score?

Citability measures how easy it is for AI models to extract useful, citable facts from your website. High citability means your content has clear headings, specific data points, FAQ sections, and structured markup. Low citability means AI models struggle to pull concrete information from your pages.

Next Steps