By Ryan Kings, Founder & CTO at AEOForged · Published June 2026 · 10 min read
AI Readiness Checklist 2026: 33 Signals Every Website Should Pass
AI readiness is the measurable degree to which a website can be cited by answer engines and operated by autonomous agents. In 2026, that readiness splits into two lanes: content readiness (whether ChatGPT, Perplexity, and Google AI Overviews can extract and cite your pages) and agent readiness (whether AI agents can discover, read, and act on your site without human help).
This checklist covers 33 discrete signals across both lanes. Each signal maps to AEOForge's aeo_agent_ready scanner, which cross-references Cloudflare's isitagentready.com (18 checks) and Google Lighthouse's Agentic Browsing category (9 audits). A site that passes all applicable signals is fully AI-ready by every major 2026 benchmark.
Measured proof: AEOForged.com went from 11 detected signals (21 missing) to 28/28 applicable signals after implementing this checklist. The remaining 4 signals are commerce-only protocols (x402, UCP, ACP, MPP) that do not apply to SaaS sites. Cloudflare's isitagentready.com score moved from 25% Basic to 75% Advanced. Lighthouse Agentic Browsing went from 50% to 100%. The FORKOFF agent-readiness benchmark improved from 46% (Tier D) to 100% (Tier A).
What content readiness signals should you check?
Content readiness is the set of on-page signals that enable answer engines to extract, attribute, and cite your content. These 17 signals cover structure, schema, and quality dimensions that determine whether your pages appear in AI-generated answers.
Structure and scoring
- Site-wide AEO score above 75 across 8 dimensions (Structure, Direct Answer, Schema, Entity, E-E-A-T, Recency, Readability, Extractability)
- Each page has a single H1 heading that matches the primary search intent
- Heading hierarchy uses no skipped levels (H1 to H2 to H3, never H1 to H3)
- Stable heading IDs exist on all headings so agents can deep-link to specific claims
- Question-format headings are followed by a direct answer within 1-2 sentences
- Token count per page stays under 8,000 tokens (a conservative budget that leaves room in an agent's context window for the rest of its task)
- Text-to-HTML ratio is above 15%, indicating substantive content rather than scaffolding
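The heading rules above (a single H1, no skipped levels, stable IDs) are mechanical enough to check with a short script. The following is a minimal sketch using only Python's standard library; the function name audit_headings and the wording of its messages are illustrative assumptions, not part of any scanner named in this article.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, id) for every h1-h6 start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "h1".."h6" is all we need to match.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.headings.append((int(tag[1]), dict(attrs).get("id")))


def audit_headings(html):
    """Return a list of problems; an empty list means the page passes."""
    parser = HeadingCollector()
    parser.feed(html)
    problems = []

    h1_count = sum(1 for level, _ in parser.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")

    previous = 0  # 0 means "no heading seen yet"
    for level, heading_id in parser.headings:
        if level > previous + 1:
            if previous == 0:
                problems.append(f"document starts at h{level}, not h1")
            else:
                problems.append(f"skipped level: h{previous} to h{level}")
        if not heading_id:
            problems.append(f"h{level} has no id attribute")
        previous = level
    return problems
```

Running audit_headings on the rendered HTML of each template (rather than on source files) catches headings injected by components or plugins.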
Structured data signals
Google's structured data documentation defines the required markup types. Every site should implement:
- Organization schema on the homepage with verified sameAs links to official profiles
- WebSite schema with SearchAction on the homepage
- Article or BlogPosting schema on all editorial pages with author, datePublished, and dateModified
- FAQPage schema on any page containing question-and-answer sections
- BreadcrumbList schema on inner pages for navigation context
- Product/Offer schema on pricing or product pages (where applicable)
- All JSON-LD validates without errors in Google's Rich Results Test
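As a rough illustration of the first two items, the sketch below builds Organization and WebSite JSON-LD for a homepage and wraps each object in a script tag. The company name, URLs, and the /search?q= path are placeholders, and the helper names are hypothetical; the output should still be run through the Rich Results Test before it ships.

```python
import json


def homepage_jsonld(name, url, same_as, search_path="/search?q={search_term_string}"):
    """Build Organization and WebSite JSON-LD objects for a homepage.

    search_path is an assumed site-search URL pattern; adjust it to your site.
    """
    organization = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    website = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": name,
        "url": url,
        "potentialAction": {
            "@type": "SearchAction",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": url.rstrip("/") + search_path,
            },
            "query-input": "required name=search_term_string",
        },
    }
    return [organization, website]


def as_script_tags(objects):
    """Serialise each object into its own application/ld+json script tag."""
    tags = []
    for obj in objects:
        # Escape "</" so a string value can never close the script element early.
        body = json.dumps(obj, indent=2).replace("</", "<\\/")
        tags.append('<script type="application/ld+json">' + body + "</script>")
    return "\n".join(tags)
```

Generating the markup from one function keeps name, url, and sameAs consistent across templates, which is where hand-written JSON-LD tends to drift.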
Content quality and E-E-A-T
Google's helpful content guidelines define the quality bar that answer engines inherit. The relevant signals are:
- Every factual claim is grounded in a retrievable source with an inline citation or reference
- Author credentials and publication dates are visible on all editorial pages
- Content is updated when underlying facts change, with machine-readable lastModified dates
- At least 3 unique external source domains are cited per article (demonstrates research breadth)
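The last item, three or more unique external source domains, can be counted directly from an article's HTML. This is a minimal sketch under simple assumptions: it treats any subdomain of your own domain as internal, strips a leading www., and does not try to judge whether a link is a genuine citation. The function name is illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_source_domains(article_html, own_domain):
    """Return the set of external hostnames linked from the article."""
    parser = LinkCollector()
    parser.feed(article_html)
    domains = set()
    for href in parser.hrefs:
        # Relative links and mailto: links have no hostname and are skipped.
        host = (urlparse(href).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        is_own = host == own_domain or host.endswith("." + own_domain)
        if host and not is_own:
            domains.add(host)
    return domains
```

A page passes this signal when len(external_source_domains(html, "example.com")) is at least 3, with example.com replaced by your own domain.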
What agent readiness signals does your site need?
Agent readiness is the infrastructure layer that enables autonomous AI agents to discover, parse, and interact with your site programmatically. These 16 signals span discoverability, content accessibility, access control, and protocol support.
Discoverability signals
- robots.txt exists with a Sitemap: directive and explicit rules for AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot)
- Content-Signal directive in robots.txt declares intent: Content-Signal: search=yes, ai-input=yes, ai-train=no
- sitemap.xml is valid XML with <lastmod> dates and is referenced from robots.txt
- HTTP Link headers on the homepage include rel=sitemap and rel=describedby pointing to llms.txt
- Auto-discovery link tags in <head> reference llms.txt, agents.json, and agent-instructions.md
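The robots.txt and Link header items in this list can be generated rather than hand-written. The sketch below is one possible shape, not a canonical template: it places the Content-Signal line in the wildcard group, lists the six bots named above, and leaves the allow-or-block decision per bot to you. The function names and the blocked_bots parameter are assumptions for illustration.

```python
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider", "CCBot"]


def build_robots_txt(sitemap_url, bots=AI_BOTS, blocked_bots=()):
    """Return robots.txt text with explicit AI bot groups and a Sitemap line."""
    lines = [
        "User-agent: *",
        "Content-Signal: search=yes, ai-input=yes, ai-train=no",
        "Allow: /",
        "",
    ]
    for bot in bots:
        # Whether a bot is allowed is a policy choice; the default here is allow.
        rule = "Disallow: /" if bot in blocked_bots else "Allow: /"
        lines += [f"User-agent: {bot}", rule, ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def link_header(sitemap_path="/sitemap.xml", llms_path="/llms.txt"):
    """Return a Link header value advertising the sitemap and llms.txt."""
    return f'<{sitemap_path}>; rel="sitemap", <{llms_path}>; rel="describedby"'
```

For example, build_robots_txt("https://example.com/sitemap.xml", blocked_bots=("Bytespider",)) allows five of the named bots and disallows one, while link_header() gives the value to send as the homepage's Link response header.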
Content accessibility signals
The llms.txt specification defines how AI models consume site-level context. Required signals:
- llms.txt exists at /llms.txt with an H1 title, blockquote summary, 5+ absolute links, under 20KB, no HTML
- llms.txt validation passes the llmstxt.org validator without errors
- Markdown content negotiation returns text/markdown when Accept: text/markdown is sent
- Server-side rendering ensures key content is in the initial HTML response, not behind client-side JavaScript
- Schema.org density includes Organization + WebSite + at least one content-type schema (Article, Product, or FAQ)
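The llms.txt criteria in the first item (H1 title, blockquote summary, 5+ absolute links, under 20KB, no HTML) translate into a quick local check. This is a rough approximation written for illustration and is not the llmstxt.org validator; the regular expressions are deliberately simple and the function name is hypothetical.

```python
import re


def check_llms_txt(text):
    """Return a list of problems; an empty list means the basic criteria pass."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]

    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line is not an H1 title")
    if not any(line.startswith("> ") for line in lines):
        problems.append("no blockquote summary")

    # Markdown links whose target is an absolute http(s) URL.
    links = re.findall(r"\[[^\]]+\]\((https?://[^)\s]+)\)", text)
    if len(links) < 5:
        problems.append(f"only {len(links)} absolute links, need 5 or more")

    if len(text.encode("utf-8")) >= 20 * 1024:
        problems.append("file is 20KB or larger")

    # Crude tag detector; Markdown autolinks such as <https://...> do not match.
    if re.search(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?/?>", text):
        problems.append("contains HTML tags")
    return problems
```

A check like this fits in a pre-deploy step, so a broken llms.txt fails the build instead of reaching production.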
Bot access control signals
- AI-specific rules exist in robots.txt for at least 5 named AI bots (not only a wildcard User-agent)
- Content-Signal directive is present and parseable
- Web Bot Auth / HTTP Message Signatures support is implemented (optional for advanced sites; Cloudflare-proxied sites inherit this via RFC 9727 API Catalog discovery)
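The first two items here can be verified against an existing robots.txt. The sketch below counts named AI bot groups and tries to parse the Content-Signal value into key/value pairs. It assumes the comma-separated key=yes|no form shown earlier in this checklist; the bot list is the six names from the discoverability section, and the function names are illustrative.

```python
KNOWN_AI_BOTS = {"gptbot", "claudebot", "perplexitybot", "google-extended", "bytespider", "ccbot"}


def parse_content_signal(value):
    """Parse 'search=yes, ai-train=no' into a dict, or return None if malformed."""
    signals = {}
    for part in value.split(","):
        key, sep, setting = part.strip().partition("=")
        setting = setting.strip().lower()
        if not sep or not key.strip() or setting not in ("yes", "no"):
            return None
        signals[key.strip().lower()] = setting == "yes"
    return signals


def check_bot_access(robots_txt):
    """Report how many known AI bots are named and whether Content-Signal parses."""
    named_bots = set()
    content_signal = None
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent" and value.lower() in KNOWN_AI_BOTS:
            named_bots.add(value.lower())
        elif field == "content-signal":
            content_signal = parse_content_signal(value)
    return {
        "named_ai_bots": len(named_bots),
        "enough_named_bots": len(named_bots) >= 5,
        "content_signal": content_signal,
    }
```

A content_signal of None means the directive is either missing or not parseable in the assumed form, which is the failure the second item describes.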
Protocol and capability discovery signals
- agents.json at /agents.json declares a typed action manifest describing what agents can do on the site
- agent-permissions.json declares scope constraints and rate limits on those actions
- agent-instructions.md at /agent-instructions.md provides brand voice rules, pricing constraints, and escalation contacts
- MCP Server Card at /.well-known/mcp/server-card.json (SEP-2127 compliant)
- A2A Agent Card at /.well-known/agent-card.json (v1.0) + /.well-known/agent.json (v0.3 compat)
- Agent Skills Index at /.well-known/agent-skills/index.json
- API Catalog at /.well-known/api-catalog per RFC 9727 (if you expose public APIs)
- OAuth/OIDC discovery at /.well-known/openid-configuration (if the site has authenticated workflows)
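Because most of these files live at fixed paths, a short probe can report which ones a site already serves. The sketch below uses the paths listed above and omits agent-permissions.json, whose location this checklist does not specify. It only checks for an HTTP 200 and does not validate file contents; probe and http_status are hypothetical helper names.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Paths as listed in this checklist.
DISCOVERY_PATHS = {
    "agents.json": "/agents.json",
    "agent-instructions.md": "/agent-instructions.md",
    "MCP Server Card": "/.well-known/mcp/server-card.json",
    "A2A Agent Card (v1.0)": "/.well-known/agent-card.json",
    "A2A Agent Card (v0.3 compat)": "/.well-known/agent.json",
    "Agent Skills Index": "/.well-known/agent-skills/index.json",
    "API Catalog": "/.well-known/api-catalog",
    "OIDC discovery": "/.well-known/openid-configuration",
}


def http_status(url, timeout=5):
    """Return the status code of a HEAD request, or None if the request fails."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except (URLError, OSError):
        return None


def probe(base_url, status_of=http_status):
    """Map each discovery file to True when its URL answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {name: status_of(base + path) == 200 for name, path in DISCOVERY_PATHS.items()}
```

The status_of parameter exists so the probe can be exercised without network access, for example by passing a function that looks URLs up in a dictionary. Some servers reject HEAD requests, in which case a GET-based status function can be swapped in.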
Which signals are optional for non-commerce sites?
Four signals apply only to e-commerce or transactional sites. Non-commerce sites (SaaS, media, documentation) score N/A on these and should not implement them:
- x402 payment headers for agent-initiated micropayments
- UCP at /.well-known/ucp (Google Universal Commerce Protocol)
- ACP (OpenAI Agentic Commerce Protocol)
- MPP (Machine Payment Protocol via OpenAPI x-payment-info extensions)
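Scoring with N/A signals is simple arithmetic: drop the non-applicable signals from the denominator before computing the pass rate. The sketch below shows that calculation; the 'pass', 'fail', and 'na' labels and the function name are assumptions for illustration, not the output format of any tool named in this article.

```python
def readiness_score(results):
    """Score a dict of signal name -> 'pass', 'fail', or 'na'.

    Returns (passed, applicable, percent), where 'na' signals are excluded
    from both the numerator and the denominator.
    """
    applicable = [status for status in results.values() if status != "na"]
    passed = sum(1 for status in applicable if status == "pass")
    total = len(applicable)
    percent = round(100 * passed / total) if total else 0
    return passed, total, percent
```

With 28 passing signals and the four commerce protocols marked 'na', the function returns (28, 28, 100), matching the 28/28 figure reported for a SaaS site earlier in this article.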
How should you prioritise fixes?
The highest-impact order for reaching full AI readiness from zero starts with discoverability, then accessibility, then protocols. Each item below includes estimated implementation time for a standard Next.js or WordPress site:
- robots.txt with AI bot rules + Content-Signal (10 minutes)
- sitemap.xml validated and referenced from robots.txt (10 minutes)
- llms.txt authored as a spec-compliant llmstxt.org document (1 hour)
- Auto-discovery link tags added to your layout head (15 minutes)
- Markdown content negotiation via a single middleware file (1-2 hours)
- Schema.org minimum: Organization + WebSite JSON-LD (1 hour)
- agents.json + MCP Server Card generated and deployed (1 hour)
- A2A Agent Card generated alongside MCP in the previous step (included)
- agent-instructions.md written as a brand-voice runbook (30 minutes)
- Agent Skills Index + API Catalog for sites with public APIs (2 hours)
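The Markdown content negotiation step is the least familiar item in this list, so here is a framework-neutral sketch of the decision a middleware has to make. It parses the Accept header and serves Markdown only when text/markdown is explicitly listed with a q-value at least as high as text/html. That tie-breaking rule and the function names are assumptions; a production middleware would plug this decision into its framework's request and response objects.

```python
def parse_accept(header):
    """Parse an Accept header into {media_type: q}; q defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparseable q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_representation(accept_header):
    """Return 'text/markdown' or 'text/html' for the given Accept header."""
    prefs = parse_accept(accept_header or "")
    markdown_q = prefs.get("text/markdown", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if markdown_q > 0 and markdown_q >= html_q:
        return "text/markdown"
    return "text/html"


def response_headers(accept_header):
    """Headers for the chosen representation; Vary keeps caches from mixing them."""
    media_type = choose_representation(accept_header)
    return {"Content-Type": media_type + "; charset=utf-8", "Vary": "Accept"}
```

Browsers, which do not list text/markdown, keep receiving HTML, while an agent sending Accept: text/markdown gets the Markdown variant. The Vary: Accept header matters if a CDN sits in front of the site, since both variants share one URL.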
How can you audit all 33 signals at once?
AEOForge's aeo_agent_ready tool checks every signal in a single free call. The report returns per-signal pass/fail results, Cloudflare isitagentready.com compatibility percentage, Lighthouse Agentic Browsing compatibility percentage, and prioritised fix recommendations sorted by impact.
For sites that want automated implementation, aeo_make_agent_ready (8 credits) analyses the live site, detects the platform, and returns deployable files. Next.js sites receive TypeScript middleware and route handlers. WordPress sites receive PHP snippets. Cloudflare Workers sites receive compatible worker scripts.
Key takeaways
- AI readiness in 2026 is two distinct disciplines: content readiness for citations and agent readiness for operability.
- The 33 signals map directly to Cloudflare isitagentready.com, Google Lighthouse Agentic Browsing, and the FORKOFF benchmark.
- Content readiness covers structure, structured data, and E-E-A-T quality signals that answer engines require for citation.
- Agent readiness covers robots.txt rules, llms.txt, protocol manifests, and capability discovery files.
- Four commerce-only protocols (x402, UCP, ACP, MPP) are N/A for non-transactional sites and should not inflate your missing-signal count.
- The fastest path to full readiness starts with robots.txt and sitemap.xml (20 minutes total) and progresses through llms.txt, schema, and protocol files.
- AEOForged.com achieved 28/28 applicable signals, proving the checklist is implementable for a production SaaS site in a single sprint.