GEO and AEO: a new layer your website needs to pass
GEO and AEO are a new layer on top of classical SEO. They describe how well your website can be used by AI agents — and how to test it systematically.
For years, SEO was the single standard for how visible a website was. Search engines indexed pages, ranked them and sent traffic. The game was about keywords, links and structure. That still matters. But something has shifted in the last two years that changes what "visible" actually means.
AI agents now mediate user decisions. ChatGPT answers questions directly. Perplexity summarises pages instead of listing them. Claude reads documents and makes recommendations. A user asking "which QA consultant should I hire" may never visit a website at all — they receive a synthesised answer based on what AI could read, understand and trust from across the web.
That is where GEO and AEO come in.
What GEO and AEO actually mean
GEO — Generative Engine Optimization — is about making your content readable and citable by large language models. It is not about gaming an algorithm. It is about ensuring that when an AI agent crawls your page, it can extract meaning, context and trust signals efficiently.
AEO — Answer Engine Optimization — is a related concept. It focuses specifically on structuring content so that AI answer engines can pick it up and present it in responses to users. If your expertise does not appear in AI-synthesised answers, you are effectively invisible to a growing segment of your audience.
Neither replaces SEO. They extend it. A site that ranks well in Google but returns an error to Google-Extended or blocks all AI crawlers still has a problem — it just shows up in a different report.
Why this matters for testing
Most developers I talk to think of AI agent readiness as a marketing concern. It is not. It is a configuration and architecture concern with directly testable outcomes.
Does your robots.txt block GPTBot? Does it allow it but block everything else that matters? Is there an llms.txt file that gives AI agents a structured index of your content? Can your pages respond to Accept: text/markdown requests, or do they always return HTML? Do your blog posts have a structured sitemap that AI crawlers can follow efficiently?
These are not hypothetical questions. They are checks. Each one has a pass or fail state, and each failure has a consequence for how well AI agents can use your content.
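To make that concrete, here is a minimal sketch of what two of those checks look like in practice, written in Python with the requests library. The domain and paths are placeholders: example.com stands in for the site under test, and the sitemap may live somewhere other than /sitemap.xml if robots.txt says so.

```python
import requests

SITE = "https://example.com"  # placeholder: the site under test

def resource_exists(path: str) -> bool:
    """Pass if the resource answers with HTTP 200."""
    try:
        resp = requests.get(f"{SITE}{path}", timeout=10, allow_redirects=True)
        return resp.status_code == 200
    except requests.RequestException:
        return False

# Two of the simplest pass/fail checks: an llms.txt index for AI agents and
# a sitemap at its conventional location.
for path in ("/llms.txt", "/sitemap.xml"):
    print(f"{path}: {'PASS' if resource_exists(path) else 'FAIL'}")
```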
What to test
When I started thinking about AI agent readiness systematically, I identified five categories of checks that cover the most important failure points.
Discoverability — whether AI agents can find and navigate your content at all. This includes robots.txt configuration, a proper sitemap, HTTP link headers pointing to canonical resources and the ability to serve markdown via content negotiation. If an agent cannot index your content efficiently, none of the other optimisations matter.
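One lightweight probe for the header part of this is to look at what a normal page response advertises in its Link header. The sketch below only reports what is there; whether a site uses rel="alternate" with a markdown type to point at agent-friendly versions is a convention I am assuming here, not a settled standard, and the URL is a placeholder.

```python
import requests

# Placeholder URL; point this at a real content page.
resp = requests.get("https://example.com/blog/some-post", timeout=10)

# The Link header (RFC 8288) can point to canonical URLs and alternate
# representations. We only report what the server actually sends.
link_header = resp.headers.get("Link")
if link_header is None:
    print("WARN: no Link header on this page")
else:
    for entry in link_header.split(","):
        print("Link entry:", entry.strip())
```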
Bot access control — whether your access rules are intentional. Many sites have robots.txt files written years ago that accidentally block modern AI crawlers. Others block all bots indiscriminately without realising they are also blocking legitimate agents. The right configuration requires a deliberate decision about which agents to allow, restrict or authenticate.
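A quick way to find out whether the current rules are deliberate is to ask robots.txt the same question on behalf of several user agents and compare the answers. The sketch below uses Python's urllib.robotparser; the list of crawler tokens is illustrative and will need updating as new agents appear.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # placeholder: the site under test

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()

# An illustrative set of crawler tokens; adjust to the agents you care about.
AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot", "Googlebot"]

for agent in AGENTS:
    verdict = "allowed" if rp.can_fetch(agent, f"{SITE}/") else "blocked"
    print(f"{agent:16} {verdict} for /")
```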
API and protocol surface — whether your site exposes structured access points. This includes an API catalogue (at /api.json or equivalent), OAuth discovery endpoints, MCP server declarations and A2A agent cards. These are newer standards, but they define how agentic systems interact with your platform beyond passive reading.
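Because several of these standards are still settling, the most honest test is simply to probe the discovery endpoints and record what comes back. In the sketch below, /api.json is the catalogue location mentioned above and the OAuth metadata path comes from RFC 8414; the MCP and A2A paths are my assumptions about where such declarations might live and should be treated as placeholders.

```python
import requests

SITE = "https://example.com"  # placeholder: the site under test

DISCOVERY_PATHS = [
    "/api.json",                                # API catalogue, as discussed above
    "/.well-known/oauth-authorization-server",  # OAuth metadata (RFC 8414)
    "/.well-known/mcp.json",                    # assumed location for an MCP declaration
    "/.well-known/agent.json",                  # assumed location for an A2A agent card
]

for path in DISCOVERY_PATHS:
    try:
        status = requests.get(f"{SITE}{path}", timeout=10).status_code
    except requests.RequestException:
        status = None
    print(f"{path:45} -> {status}")
```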
Commerce protocols — for sites that sell or transact, whether they expose machine-readable pricing, payment and checkout signals that AI agents can reason about.
Content signals — whether your pages signal their own purpose and trust level. This is about metadata, schema markup and directives that help agents decide whether to cite, summarise or skip your content.
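A minimal version of that check is to collect the JSON-LD blocks on a page and list the schema.org types they declare. The sketch below assumes Python with requests and BeautifulSoup and uses a placeholder URL; it only reports what is present, and deciding whether the markup fits the page is still a human judgement.

```python
import json

import requests
from bs4 import BeautifulSoup

# Placeholder URL; point this at a real content page.
resp = requests.get("https://example.com/blog/some-post", timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

# Collect the schema.org types declared in JSON-LD blocks, if any.
types = []
for script in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(script.string or "")
    except json.JSONDecodeError:
        continue
    items = data if isinstance(data, list) else [data]
    types += [item.get("@type") for item in items if isinstance(item, dict)]

print("JSON-LD types found:", types if types else "none")
```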
The Agent-Ready Scanner
To make this testable without assembling the checks by hand every time, I built the Agent-Ready Scanner. It accepts a URL, runs a set of checks across all five categories and returns a report with pass, fail and warning states for each check.
The scanner covers 18 checks across the five categories. Each failed check includes an explanation and a fix prompt — a structured description of what needs to change and why. The idea is not to produce a score for bragging rights, but to give teams a concrete starting point for a conversation about what they actually want AI agents to be able to do with their site.
The most common failures I see: missing llms.txt, robots.txt that blocks all AI crawlers without distinction, no content negotiation support and no sitemap at all. These are fixable in an afternoon once you know they exist.
How to approach this as a tester
The way I approach AI agent readiness is the same way I approach any integration testing problem. Start with what the agent actually does — reads, navigates, requests, interprets — and verify each step.
That means: make real requests with the headers an AI crawler would send. Check what robots.txt returns for specific user agents. Verify that the sitemap is reachable, valid and up to date. Request pages with Accept: text/markdown and confirm the response format and headers are correct.
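The content negotiation check, for example, looks roughly like the sketch below (Python with requests, placeholder URL). A failure here is information rather than a broken test: it simply means the page has no markdown representation to offer.

```python
import requests

URL = "https://example.com/blog/some-post"  # placeholder: a real content page

resp = requests.get(URL, headers={"Accept": "text/markdown"}, timeout=10)
content_type = resp.headers.get("Content-Type", "")
vary = resp.headers.get("Vary", "")

if "text/markdown" in content_type and "accept" in vary.lower():
    # The server honoured the Accept header and declared it via Vary,
    # so caches will not hand the markdown variant to browsers by mistake.
    print("PASS: markdown served via content negotiation")
elif "text/markdown" in content_type:
    print("WARN: markdown served, but Vary does not include Accept")
else:
    print(f"FAIL: asked for text/markdown, got {content_type or 'no content type'}")
```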
Where it gets interesting is in the signal layer. A page can technically be accessible and still fail to communicate what it is about. Schema markup missing, author information absent, no structured way for an agent to understand whether this is a product page, a blog post or a documentation reference. Those are quality issues — and quality issues are exactly what testers are for.
What this changes for QA
AI agent readiness is not a separate discipline. It is a natural extension of integration and configuration testing, applied to a new class of consumer: the autonomous agent.
The way I see it, a tester who understands HTTP, configuration files and structured content has everything they need to start. The domain knowledge is shallow enough to learn in a day. The testing discipline is what makes the difference between a one-time check and a systematic quality gate.
If AI agents are increasingly mediating how users discover and evaluate services, then making sure your site works for agents is not optional. It is the new baseline.
Summary
- GEO and AEO extend classical SEO for AI-mediated discovery
- the key testable areas are discoverability, bot access control, API surface, commerce protocols and content signals
- most failures are configuration issues: missing files, wrong bot rules, no content negotiation
- the Agent-Ready Scanner provides structured, reproducible checks with actionable fix prompts
- treating AI agent readiness as an integration testing problem is the fastest way to get systematic coverage
Is your site ready for AI?
Check whether AI tools read your content correctly — or fill in the gaps themselves. A quick test that shows what AI sees and what it ignores.
Test my site