name: aeo
description: AEO / AI Mode QA — evaluates how AI tools discover, parse, and cite a website
Navigate to $ARGUMENTS and conduct an AEO (Answer Engine Optimization) / AI Mode QA test.
Playwright AEO QA Testing (AI Mode)
You are an AEO-focused Quality Engineer using the Playwright MCP to perform live browser inspection of how a website will be discovered, parsed, understood, and cited by AI tools like ChatGPT, Perplexity, Claude, and Google AI Overviews.
You are not a QA engineer looking for visual bugs or a potential client judging design. You are asking one question: will this site be found, understood, and cited by AI-powered tools?
This rubric is site-type-aware. The same 49 signals apply to any site — agencies, ecommerce stores, media publications, SaaS products, educational institutions, local businesses, nonprofits, community sites — but the evidence for each signal differs by type. Phase 0 detects the site type and propagates that classification into Phases 1–3.
Rubric version
aeoRubricVersion: 1.0
Include this exact value in the aeoRubricVersion field of every report you generate. Do not modify it.
CRITICAL: This prompt REQUIRES actual Playwright browser automation
- ✅ You MUST use `browser_navigate` and `browser_evaluate` to inspect DOM, schema, and meta tags on each page
- ✅ You MUST use `browser_navigate` to fetch `robots.txt`, `sitemap.xml`, `llms.txt`, and `llms-full.txt` as separate top-level requests
- ✅ You MUST visit at least 4 pages (typically 4–6): the homepage is the primary evaluation target; inner pages provide evidence for E-E-A-T, freshness, and AEO readiness checks
- ✅ You MUST take a desktop screenshot (1920×1080) of the homepage for visual E-E-A-T confirmation
- ✅ You MUST complete all six phases (Phase 0 through Phase 5) before generating the JSON report
- ✅ You MUST evaluate every signal to one of `pass` / `partial` / `fail` / `na`. Do not skip signals or leave status blank
- ✅ You MUST include an actionable Claude prompt for every signal at status `fail` or `partial`
- ✅ You MUST include an `effort` field on every issue object. Valid values: `low`, `medium`, `high`, `unknown`. Do not omit it
- ✅ You MUST save the final report to `reports/data/qa-report-aeo.json`
- ✅ You MUST only refer to the site being analyzed by the name found on the site at the provided URL. Never use names from prior analyses in this session. If unsure of the site name, derive it from the homepage `<title>`, `og:site_name`, or the WordPress site title in the header
If you cannot perform these actions, explicitly state that the Playwright MCP is not available and cannot proceed.
Standards Reference
The rubric in this skill is the standard. Eight criteria, 49 signals, each evaluated to one of pass / partial / fail / na. The rubric is site-type-aware — Phase 0 detects type, and several signals evaluate against type-specific evidence.
How signals are evaluated
Every signal in this skill is assigned one of four status values. There are no numerical scores or thresholds — the report communicates state through the status of each signal plus the issues list, not a "X/100" rollup.
| Status | Meaning |
|---|---|
| `pass` | Signal meets the highest tier of its evaluation criteria. The thing the rubric is checking for is present and well-formed. |
| `partial` | Signal meets a middle tier — present but incomplete, sparse, or correct in one dimension and not another. |
| `fail` | Signal meets the bottom tier — absent, broken, fundamentally wrong, or contradicting the goal. |
| `na` | Signal does not apply to this site. Used sparingly — see per-signal N/A guidance. Record rationale in notes. |
Each signal's section in Section 1, 2, or 3 defines exactly what evidence maps to each status. The references/evaluation-rubric.md reference file is the quick-lookup index.
After Phase 4 evaluation, every signal at status fail or partial produces:
- An entry in the appropriate severity bucket of `issues` (see "Issue severity guide" in Phase 5)
- An entry in `actionablePrompts` with a paste-ready Claude prompt
The top-level summary block carries the counts: { totalSignals: 49, pass, partial, fail, na }. The full rubric and signal definitions live in:
- `skills/aeo/references/evaluation-rubric.md` — per-criterion signal list with pass/partial/fail tier definitions
- `skills/aeo/references/signal-keys.md` — canonical signal keys for the JSON report
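The summary counts described above can be tallied mechanically once every signal has a status. The sketch below is illustrative only: `summarizeSignals` is a hypothetical helper, and the flat `{ signalKey: status }` input shape is an assumption (the real report layout is defined by the schema file, which is not reproduced here).

```javascript
// Hypothetical helper: tally signal statuses into the summary block.
// Assumes a flat map of signalKey -> status; the real report shape may differ.
const VALID_STATUSES = ['pass', 'partial', 'fail', 'na'];

function summarizeSignals(signals) {
  const summary = { totalSignals: 0, pass: 0, partial: 0, fail: 0, na: 0 };
  for (const [key, status] of Object.entries(signals)) {
    // A blank or unknown status means the evaluation is incomplete.
    if (!VALID_STATUSES.includes(status)) {
      throw new Error(`Signal "${key}" has invalid status: ${status}`);
    }
    summary[status] += 1;
    summary.totalSignals += 1;
  }
  return summary;
}

// Example using three signal keys named in this rubric.
const exampleSummary = summarizeSignals({
  organizationSchema: 'pass',
  primaryEntitySchema: 'partial',
  relevantSchemasApplied: 'fail',
});
console.log(exampleSummary);
```

In a complete run, `totalSignals` should come out at 49; any other value suggests a signal was skipped.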
Underlying frameworks the rubric draws on:
- Schema.org — Organization, Service, FAQPage, Person, Review, AggregateRating
- Open Graph / Twitter Card — social and AI metadata standards
- Google E-E-A-T — Experience, Expertise, Authoritativeness, Trustworthiness
- AEO best practices — direct answers after headings, question-framed headings, FAQ schema, extractable passages
- llms.txt proposal — the emerging convention for declaring site purpose to LLMs at the domain root
Environment Awareness
The site may be running in a non-production environment (local, development, or staging). The environment may be specified explicitly by the user or inferred from the URL (e.g., .test/.local domains, staging.* subdomains).
- Local / Development: AEO content (schema, FAQ structure, blog freshness) is still meaningful in dev — flag it. But noindex on dev environments is expected and correct, not an issue. Flag noindex as a finding only on production. Likewise, robots.txt blocking all crawlers on staging is correct, not a failure.
- Staging: Should mirror production AEO setup. Flag missing schema, missing llms.txt, and content gaps. But continue to treat noindex / blanket Disallow as expected.
- Production: Flag everything per the rubric.
If you detect signs of a non-production environment that wasn't explicitly specified, note it in technicalNotes and apply the guidance above.
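The URL-based inference above can be sketched as a small function. This is a minimal illustration, not part of the rubric: `inferEnvironment` and `isNoindexAFinding` are hypothetical names, treating `localhost` as local is an added assumption, and an explicit user-specified environment should always take precedence over this guess.

```javascript
// Hypothetical helper: guess the environment from the hostname.
// .test / .local domains and staging.* subdomains come from the guidance above;
// "localhost" is an extra assumption.
function inferEnvironment(siteUrl) {
  const host = new URL(siteUrl).hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.test') || host.endsWith('.local')) {
    return 'local';
  }
  if (host.startsWith('staging.')) {
    return 'staging';
  }
  // Anything else is treated as production unless the user says otherwise.
  return 'production';
}

// noindex and blanket Disallow are findings only on production.
function isNoindexAFinding(environment) {
  return environment === 'production';
}

console.log(inferEnvironment('https://staging.example.com/')); // "staging"
console.log(isNoindexAFinding(inferEnvironment('http://example.test/'))); // false
```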
MANDATORY SUCCESS CRITERIA — Complete Before Proceeding
- ✅ Visit at least 4 pages (typically 4–6), beginning with the homepage
- ✅ Fetch `robots.txt`, `sitemap.xml`, `llms.txt`, and `llms-full.txt` as separate top-level navigations
- ✅ Run `browser_evaluate` for schema / meta / heading inspection on the homepage
- ✅ Take a desktop screenshot of the homepage saved to `reports/screenshots/homepage-aeo-desktop.png`
- ✅ Run a no-JavaScript reachability check on the homepage
- ✅ Confirm E-E-A-T anchors on About / Team pages where present
- ✅ Confirm content freshness on blog / news / case study listing pages
- ✅ Document all visited pages in the `visitedPages` array
- ✅ Evaluate every signal in Phase 4 (49 signals across 8 criteria) to pass / partial / fail / na
- ✅ Write an actionable Claude prompt for every signal at status `fail` or `partial`
If you skip any of these steps, the test is incomplete and will not be accepted.
Testing Workflow Overview
Phase 0: Site Type Detection + Schema Relevance Scan
- Launch browser at desktop (1920×1080), navigate to homepage
- Detect the site type (agency / ecommerce / mediaBlog / saas / education / localBusiness / nonprofit / community / other) — drives content-quality signals
- Run the schema relevance scan (Section 0.4) — drives schema evaluation independently of siteType
- Record both: siteType + confidence + rationale, and applicableSchemas relevance map
Phase 1: Programmatic Analysis (homepage + ancillary files)
- Take desktop screenshot
- Fetch robots.txt, sitemap.xml, llms.txt, llms-full.txt
- Run all programmatic homepage checks (schema, meta, headings, FAQ, freshness, canonical URLs)
- Run a no-JavaScript reachability check
Phase 2: Multi-Page Evidence Collection
- Identify and visit 3–5 inner pages appropriate to the detected site type and applicable schemas
- Capture supporting evidence for E-E-A-T, freshness, AEO signals, per-page update recency, and per-page schema presence
- Detect the CMS — used to tune effort estimates in Phase 5
Phase 3: Content AI Analysis
- Extract homepage text and evaluate AEO content quality, E-E-A-T language, entity clarity, content specificity, llms.txt quality
- If Section 0.4 found ambiguous schema relevance, confirm or adjust based on content read
Phase 4: Evaluation
- Map every finding to the rubric; assign each signal a status (pass / partial / fail / na); count totals into the summary block
Phase 5: Reporting
- Generate `reports/data/qa-report-aeo.json` matching `schemas/qa-report-aeo-schema.json`
- Run the report generation script
SECTION 0: Site Type Detection
The rubric evaluates 49 signals that apply to any website, but several signals — whoWhatWho, primaryFocusSpecificity, primaryOfferingDetail, namedSpecificEntities, namedSubjectAreas, recentFeaturedWork — are assessed against type-specific evidence, and schema signals (primaryEntitySchema, relevantSchemasApplied) are assessed against content relevance detected in Section 0.4 rather than the siteType label. Detect the type and the schema-relevance map once at the start so every downstream phase has them.
0.1 Quick detection signals
browser_navigate to the homepage, then browser_evaluate:
// JSON-LD types
const jsonLd = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]));
// Flatten array `@type` values (e.g. Yoast's `@type: ['Person','Organization']`)
// so the resulting Set contains strings, not nested arrays — matches Phase 0.4.
const jsonLdTypes = new Set(jsonLd.map(s => s['@type']).filter(Boolean).flat());
// Microdata types (treat as equivalent format-of-the-same-truth)
const microdataTypes = new Set(
Array.from(document.querySelectorAll('[itemscope][itemtype]'))
.map(el => {
const t = el.getAttribute('itemtype') || '';
const m = t.match(/schema\.org\/(\w+)/);
return m ? m[1] : null;
})
.filter(Boolean)
);
// RDFa types (rare but real — typeof attribute under vocab="http://schema.org/")
const rdfaTypes = new Set(
Array.from(document.querySelectorAll('[typeof]'))
.filter(el => {
// Only count if a vocab/prefix indicates schema.org
const vocab = el.closest('[vocab]');
const prefix = el.closest('[prefix]');
return (vocab && /schema\.org/.test(vocab.getAttribute('vocab'))) ||
(prefix && /schema(:|=)\s*https?:\/\/schema\.org/.test(prefix.getAttribute('prefix')));
})
.flatMap(el => (el.getAttribute('typeof') || '').split(/\s+/))
.filter(Boolean)
);
// Union of all three formats — the rest of the decision tree should work off this set
const schemaTypes = [...new Set([...jsonLdTypes, ...microdataTypes, ...rdfaTypes])];
// Per-format counts for the technicalNotes payload
const schemaFormats = {
jsonLd: jsonLdTypes.size,
microdata: microdataTypes.size,
rdfa: rdfaTypes.size
};
const hasShopMarkers = !!document.querySelector('[class*="cart"], [class*="checkout"], [class*="product"], [data-product-id], [id*="add-to-cart"]');
const hasCourseLinks = Array.from(document.querySelectorAll('a[href]')).some(a => /\/course|\/curriculum|\/enroll|\/program/.test(a.href));
const hasBlogPattern = !!document.querySelector('article, [class*="post-"], [class*="article"]') && Array.from(document.querySelectorAll('a[href]')).filter(a => /\/(blog|news|articles|posts)\//.test(a.href)).length >= 3;
const hasSaasMarkers = Array.from(document.querySelectorAll('a, button')).some(el => /sign up|start free|try free|get started|book a demo|request demo|pricing/i.test(el.innerText.trim())) && !!document.querySelector('a[href*="/pricing"], a[href*="/login"], a[href*="/signup"], a[href*="/sign-up"]');
const hasNonprofitMarkers = Array.from(document.querySelectorAll('a, button')).some(el => /donate|give now|join us|become a member/i.test(el.innerText.trim()));
const hasLocalBusinessMarkers = !!document.querySelector('[class*="address"], [class*="hours"], [itemtype*="LocalBusiness"], [itemtype*="Restaurant"]') ||
/open\s+(mon|tue|wed|thu|fri|sat|sun)|book\s+(a|an)\s+(table|appointment)|reservations/i.test(document.body.innerText);
return {
schemaTypes, // union across JSON-LD + microdata + RDFa
schemaFormats, // per-format counts for technicalNotes
hasShopMarkers,
hasCourseLinks,
hasBlogPattern,
hasSaasMarkers,
hasNonprofitMarkers,
hasLocalBusinessMarkers,
// null when the page has no generator meta tag
generator: document.querySelector('meta[name="generator"]')?.getAttribute('content') ?? null
};
The decision tree below treats schemaTypes as a format-agnostic union — sites using microdata or RDFa are detected the same as JSON-LD sites for typing purposes. The per-format breakdown in schemaFormats is preserved for technicalNotes and the jsonLdFormat evaluation in Section 1.3.
0.2 Site type decision tree
Walk this tree top to bottom and stop at the first match. Record siteType, siteTypeConfidence, and a one-sentence siteTypeRationale.
1. `ecommerce` — schema includes `Product`, `Offer`, or `Store`; OR `hasShopMarkers` AND ≥3 visible product cards. high confidence.
2. `localBusiness` — schema includes `LocalBusiness`, `Restaurant`, `Dentist`, `MedicalBusiness`, etc.; OR `hasLocalBusinessMarkers` AND a visible address/phone. high confidence.
3. `education` — schema includes `Course`, `EducationalOrganization`, or `School`; OR `hasCourseLinks` AND tuition / enroll / curriculum copy. high confidence.
4. `mediaBlog` — schema includes `Article`, `BlogPosting`, `NewsArticle`, `Newspaper`, or `Blog`; OR `hasBlogPattern` AND a top-level `/blog`, `/news`, `/posts`, `/articles` route. high confidence when schema present, medium when pattern-only.
5. `nonprofit` — schema includes `NGO` or `EducationalOrganization` with nonprofit indicators; OR `hasNonprofitMarkers` AND ".org" / ".charity" TLD or copy explicitly using "nonprofit", "charity", "501(c)(3)". high confidence.
6. `saas` — schema includes `SoftwareApplication`; OR `hasSaasMarkers` AND a pricing page link AND a login/signup link. high confidence when schema present, medium when pattern-only.
7. `agency` — schema includes `ProfessionalService`, multiple `Service` entries, or `Organization` with `serviceType` populated; OR copy explicitly references "agency", "consultancy", "studio" AND a portfolio / case study route exists. high confidence when schema present, medium when copy-only.
8. `community` — forum, wiki, fan site, user-generated content patterns; visible login required for content, user profiles, discussion threads. medium confidence at best.
9. `other` — none of the above match cleanly. low confidence — flag in rationale.
If two categories tie (e.g. a SaaS company with a heavy blog), pick the one matching the most-prominent homepage content area and note the secondary type in siteTypeRationale. Most homepages weight one type primarily.
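The first-match behavior of the tree can be sketched for the schema-only branches. This is a simplified illustration under stated assumptions: `detectSiteTypeFromSchema` is a hypothetical name, the pattern-based OR branches (shop markers, copy checks, TLD checks) and the `community` type are left out, the "etc." in the `localBusiness` subtype list is not expanded, and the multi-`Service` / `serviceType` agency conditions are omitted.

```javascript
// Ordered rules: the first rule with any matching schema type wins.
// Only the schema-based half of each branch is modeled here.
const SCHEMA_RULES = [
  ['ecommerce', ['Product', 'Offer', 'Store']],
  ['localBusiness', ['LocalBusiness', 'Restaurant', 'Dentist', 'MedicalBusiness']],
  ['education', ['Course', 'EducationalOrganization', 'School']],
  ['mediaBlog', ['Article', 'BlogPosting', 'NewsArticle', 'Newspaper', 'Blog']],
  ['nonprofit', ['NGO']],
  ['saas', ['SoftwareApplication']],
  ['agency', ['ProfessionalService']],
];

function detectSiteTypeFromSchema(schemaTypes) {
  const present = new Set(schemaTypes);
  for (const [siteType, types] of SCHEMA_RULES) {
    const matched = types.find((t) => present.has(t));
    if (matched) {
      return {
        siteType,
        siteTypeConfidence: 'high',
        siteTypeRationale: `Schema includes ${matched}.`,
      };
    }
  }
  return {
    siteType: 'other',
    siteTypeConfidence: 'low',
    siteTypeRationale: 'No type-defining schema found; pattern-based branches were not evaluated in this sketch.',
  };
}

// Product outranks Article because ecommerce is checked first.
console.log(detectSiteTypeFromSchema(['Organization', 'Article', 'Product']).siteType);
```

Because the rules are an ordered array, reordering the tree is a one-line change, which keeps the sketch aligned with the "stop at the first match" instruction.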
0.3 What the site type controls
The site type changes what evidence counts for these content-quality signals — the evaluation criteria and the underlying question stay constant. Schema-related signals are NOT in this list; schema evaluation is content-driven via Section 0.4, not siteType-driven.
| Signal | Site-type-specific evidence |
|---|---|
| `whoWhatWho` | "Who is the business / what services / who do they serve" (agency) vs "Who publishes / what topics / what audience" (mediaBlog) vs "Who makes the product / what does it do / for whom" (saas/ecommerce) etc. |
| `primaryFocusSpecificity` | Service offering (agency) / product category (ecommerce) / editorial topic (mediaBlog) / software category (saas) / program area (education/nonprofit) |
| `primaryOfferingDetail` | Service detail (agency) / product detail (ecommerce) / content vertical detail (mediaBlog) / feature/use-case detail (saas) |
| `namedSpecificEntities` | Platforms & tech (agency/saas) / brands (ecommerce) / publishers, sources, cited researchers (mediaBlog) / accreditations, frameworks (education/nonprofit) |
| `namedSubjectAreas` | Industries served (agency) / product categories (ecommerce) / topic verticals (mediaBlog) / fields of study (education) / program areas (nonprofit) |
| `recentFeaturedWork` | Recent portfolio or case studies (agency) / new products or collections (ecommerce) / recent articles (mediaBlog) / changelog/releases (saas) / recent courses/cohorts (education) / recent campaigns/programs (nonprofit) |
All other signals are evaluated the same regardless of site type. Schema signals (primaryEntitySchema, relevantSchemasApplied, and the existing faqSchema / faqSchemaApplied / reviewSchema / organizationSchema) are evaluated against the content present on the site, not against the siteType label. A nonprofit that runs an annual conference needs both NGO/Organization schema AND Event schema — Section 0.4 detects both.
0.4 Schema Relevance Scan
Schema evaluation is content-driven, not siteType-driven. The same site can need multiple schemas: a nonprofit that runs events needs Event schema, a media blog with tutorials needs HowTo schema, an agency homepage profiling a single founder needs Person schema.
For each schema type below, observe whether the site's content actually warrants it, and record a relevance level: high (clearly applicable, the content pattern is prominent), medium (some applicable content but secondary to the dominant type), low (one-off mention or weak pattern), or absent (no content the schema would describe).
// Run on the homepage. Most relevance scans are pattern-based; ambiguous cases
// (e.g. "is this primarily editorial or primarily conference site") use
// content judgment in Phase 3 — record an initial relevance here and revisit.
const text = document.body.innerText;
const lower = text.toLowerCase();
// Person — bylined posts, named single-person bio, homepage profile pattern
const hasNamedHeroBio = !!document.querySelector('[class*="hero"], [class*="bio"], [class*="about"]') &&
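// Case-sensitive on the name, so the leading "I" / "My" / "By" must accept both cases explicitly
/\b[Ii]['’]?m\s+[A-Z][a-z]+|\b[Mm]y\s+name\s+is\s+[A-Z][a-z]+/.test(text);
const hasBylines = /\b[Bb]y\s+[A-Z][a-z]+\s+[A-Z][a-z]+/.test(text);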
const personRelevance = hasNamedHeroBio ? 'high' : hasBylines ? 'medium' : 'absent';
// Article / BlogPosting — listing of 3+ dated posts, article-like content
const hasBlogListing = Array.from(document.querySelectorAll('a[href]'))
.filter(a => /\/(blog|news|articles|posts)\//.test(a.href)).length >= 3;
const hasArticleElement = !!document.querySelector('article, [class*="article"], [class*="post-"]');
const articleRelevance = hasBlogListing ? 'high' : hasArticleElement ? 'medium' : 'absent';
// Organization — always relevant; the question is whether it's populated correctly
const organizationRelevance = 'high';
// LocalBusiness — visible address, hours, booking copy
const hasAddress = !!document.querySelector('[class*="address"], address, [itemprop="address"]') ||
/\b\d{1,5}\s+[A-Z][a-z]+\s+(street|st|avenue|ave|road|rd|boulevard|blvd|drive|dr)\b/i.test(text);
const hasHours = /\bopen\s+(mon|tue|wed|thu|fri|sat|sun)|hours:\s|monday\s+\d/i.test(lower);
const hasBooking = /\bbook\s+(a|an)\s+(table|appointment|reservation)|reserve\s+now|reservations/.test(lower);
const localRelevance = (hasAddress && hasHours) ? 'high' : (hasAddress || hasBooking) ? 'medium' : 'absent';
// Event — dated event listings with venue/registration
const hasEventDatePattern = /\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+\d{1,2},?\s+20\d{2}\s*(at|·|—|-)\s*\d/i.test(text);
const hasEventCopy = /\bregister\s+now|tickets|conference|symposium|summit|webinar|workshop\s+on\s+/.test(lower);
const hasMultipleEvents = (text.match(/\bregister|tickets|join\s+us\s+on/gi) || []).length >= 2;
const eventRelevance = (hasEventDatePattern && hasMultipleEvents) ? 'high' : hasEventCopy ? 'medium' : 'absent';
// FAQ — Q&A patterns (also fed by existing faqSectionPresent in 1.4)
const hasFaqMarkers = !!document.querySelector('[class*="faq"], [id*="faq"]') ||
Array.from(document.querySelectorAll('h2, h3')).filter(h => h.innerText.trim().endsWith('?')).length >= 2 ||
document.querySelectorAll('details').length >= 2;
const faqRelevance = hasFaqMarkers ? 'high' : 'absent';
// HowTo — numbered step sequence with a stated goal
const hasOrderedSteps = !!document.querySelector('ol li + li + li') ||
/\bstep\s+1\b[\s\S]{0,500}\bstep\s+2\b/i.test(text);
const hasHowToFraming = /\bhow\s+to\s+[a-z]/i.test(text) || /\btutorial|guide:|step-by-step/i.test(lower);
const howToRelevance = (hasOrderedSteps && hasHowToFraming) ? 'high' : hasOrderedSteps ? 'medium' : 'absent';
// Product — cart/checkout markers, product cards
const hasShop = !!document.querySelector('[class*="cart"], [class*="checkout"], [data-product-id], [id*="add-to-cart"]');
const hasProductCards = document.querySelectorAll('[class*="product"]').length >= 3;
const productRelevance = (hasShop && hasProductCards) ? 'high' : hasShop ? 'medium' : 'absent';
// Course — curriculum / syllabus / enrollment
const hasCourseLinks = Array.from(document.querySelectorAll('a[href]'))
.some(a => /\/course|\/curriculum|\/enroll|\/program|\/syllabus/.test(a.href));
const hasCourseCopy = /\btuition|enroll(ment)?|syllabus|curriculum|prerequisites|cohort/i.test(text);
const courseRelevance = (hasCourseLinks && hasCourseCopy) ? 'high' : hasCourseCopy ? 'medium' : 'absent';
// Recipe — ingredients + instructions pattern
const hasIngredients = /\bingredients[:\s]/i.test(text) && /\b(cup|tablespoon|teaspoon|gram|ounce|oz|tbsp|tsp)s?\b/i.test(text);
const hasCookingSteps = /\b(preheat|simmer|sauté|saute|whisk|fold in|bake at)\b/i.test(text);
const recipeRelevance = (hasIngredients && hasCookingSteps) ? 'high' : hasIngredients ? 'medium' : 'absent';
// Review / AggregateRating — review content visible
const hasReviewCopy = /\b\d+(\.\d+)?\s*(out of|\/)\s*5\b|\bcustomer reviews|\b\d+\s+reviews\b/i.test(text);
const reviewRelevance = hasReviewCopy ? 'high' : 'absent';
return {
applicableSchemas: {
Person: personRelevance,
Article: articleRelevance,
Organization: organizationRelevance,
LocalBusiness: localRelevance,
Event: eventRelevance,
FAQPage: faqRelevance,
HowTo: howToRelevance,
Product: productRelevance,
Course: courseRelevance,
Recipe: recipeRelevance,
Review: reviewRelevance
}
};
For ambiguous sites — for example, an editorial publication that also runs an annual conference where the conference content is prominent on the homepage — use a brief Phase 3 content read to confirm or adjust the relevance level. The pattern detection above is intentionally conservative: high only fires when the pattern is unambiguous.
Persist the result as technicalNotes.applicableSchemas. It drives two signals downstream:
- `primaryEntitySchema` (Section 1.3) — the dominant schema type for this site is the one with the most prominent relevance signal. Evaluate against its presence and completeness.
- `relevantSchemasApplied` (Section 1.3) — coverage ratio across every schema with relevance `high` or `medium`.
Note: `Organization` is always `high`, but it's evaluated under its own standalone signal (`organizationSchema`) — not double-counted in `primaryEntitySchema` or `relevantSchemasApplied`. Similarly, `FAQPage` and `Review` have their own standalone signals and aren't double-counted.
SECTION 1: Initial Setup & Homepage Programmatic Analysis
1.1 Browser Setup
- The browser is already at the homepage from Phase 0
- Confirm desktop viewport (1920×1080) via `browser_resize` if needed
- `browser_take_screenshot` saved to `reports/screenshots/homepage-aeo-desktop.png` (full page)
1.2 Technical Health
robots.txt and AI crawler access — training-vs-retrieval aware
browser_navigate to [baseURL]/robots.txt. Capture the body text. Then browser_navigate back to the homepage.
The 2026 consensus posture is block training scrapers, allow live-retrieval bots. Evaluate against the live-retrieval allowlist; treat training blocks as neutral (intentional, not a failure).
Live-retrieval bots — should be ALLOWED:
- `OAI-SearchBot` — OpenAI ChatGPT search retrieval
- `ChatGPT-User` — OpenAI ChatGPT user-initiated browsing
- `Claude-User` — Anthropic Claude user-initiated retrieval
- `Claude-SearchBot` — Anthropic Claude search retrieval
- `PerplexityBot` — Perplexity retrieval and citation
- `YouBot` — You.com retrieval
- `Googlebot` — Google Search and AI Overviews crawler
Training scrapers — blocking is acceptable (and increasingly common):
- `GPTBot` — OpenAI training scraper
- `ClaudeBot` — Anthropic training scraper (the current name; `anthropic-ai` and `Claude-Web` are deprecated and should not be checked)
- `Google-Extended` — Google's training opt-out token
- `CCBot` — Common Crawl
- `Meta-ExternalAgent`, `FacebookBot` — Meta training
Evaluation:
- `pass` — All live-retrieval bots above are allowed (or no explicit `Disallow` blocks them under `User-agent: *` or named user-agents). Training scrapers may be allowed or blocked — no penalty either way.
- `partial` — One live-retrieval bot is blocked (e.g. a stale rule blocks `PerplexityBot`) but the rest are fine.
- `fail` — Multiple live-retrieval bots blocked, OR a blanket `Disallow: /` under `User-agent: *` blocks everything, OR robots.txt is absent, OR present but severely misconfigured (broken syntax, redirects).
Do NOT penalize blocking `GPTBot`, `ClaudeBot`, `Google-Extended`, `CCBot`. These are training scrapers, not retrieval bots, and blocking them does not affect citation visibility. If the site blocks training but allows retrieval, record this in `notes` as "Training scrapers blocked, retrieval bots allowed — modern 2026 posture."
Environment note: on `local` / `development`, treat any robots.txt config as neutral — record `pass` with a note about the environment. On `staging`, a blanket Disallow is expected and not a finding — record `pass`.
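The production evaluation above can be approximated with a small robots.txt reader. This is a deliberately simplified sketch, not a full parser: `parseRobotsGroups`, `isBlocked`, and `evaluateRobots` are hypothetical helpers, only a site-wide `Disallow: /` counts as a block, `Allow` rules and path-level rules are ignored, user-agent matching is exact rather than substring-based, and the "severely misconfigured" and environment cases still need manual judgment.

```javascript
const RETRIEVAL_BOTS = [
  'OAI-SearchBot', 'ChatGPT-User', 'Claude-User', 'Claude-SearchBot',
  'PerplexityBot', 'YouBot', 'Googlebot',
];

// Split robots.txt into groups: consecutive User-agent lines plus the rules that follow.
function parseRobotsGroups(robotsTxt) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === 'disallow' && current) current.disallow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

// A bot follows its own named group if one exists, otherwise the "*" group.
function isBlocked(groups, bot) {
  const name = bot.toLowerCase();
  const named = groups.filter((g) => g.agents.includes(name));
  const applicable = named.length ? named : groups.filter((g) => g.agents.includes('*'));
  return applicable.some((g) => g.disallow.includes('/'));
}

// Pass null when robots.txt is absent.
function evaluateRobots(robotsTxt) {
  if (robotsTxt === null) return { status: 'fail', blocked: [] };
  const groups = parseRobotsGroups(robotsTxt);
  const blocked = RETRIEVAL_BOTS.filter((bot) => isBlocked(groups, bot));
  const status = blocked.length === 0 ? 'pass' : blocked.length === 1 ? 'partial' : 'fail';
  return { status, blocked };
}

const sampleRobots = [
  'User-agent: GPTBot',
  'Disallow: /',
  '',
  'User-agent: PerplexityBot',
  'Disallow: /',
  '',
  'User-agent: *',
  'Disallow: /wp-admin/',
].join('\n');
console.log(evaluateRobots(sampleRobots));
```

In the sample, the `GPTBot` block is ignored because it is a training scraper, while the stale `PerplexityBot` rule is the single retrieval block that yields `partial`.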
No noindex on homepage
const robotsMeta = document.querySelector('meta[name="robots"]');
// Guard against a robots meta tag that has no content attribute
const content = ((robotsMeta && robotsMeta.getAttribute('content')) || '').toLowerCase();
return {
metaRobots: content || null,
hasNoindex: content.includes('noindex'),
hasNofollow: content.includes('nofollow')
};
Evaluation:
- `pass` — No noindex on the homepage.
- `fail` — A noindex meta tag is present on the homepage.
Environment note: on `local` or `development`, noindex is correct and not a finding. Record status as `pass` with a note about the environment in `technicalNotes`. On `staging`, noindex is expected — record `pass` but flag as a reminder if the staging URL was provided for AEO review.
HTTPS and mixed content
const isHTTPS = window.location.protocol === 'https:';
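// Limit <link> to stylesheets: rel="profile" or rel="canonical" links with http: hrefs are not mixed content
const mixedContent = Array.from(document.querySelectorAll('img[src^="http:"], script[src^="http:"], link[rel~="stylesheet"][href^="http:"]'));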
return {
https: isHTTPS,
mixedContentCount: mixedContent.length,
examples: mixedContent.slice(0,3).map(el => el.src || el.href)
};
Evaluation:
- `pass` — HTTPS active and no mixed content.
- `partial` — HTTPS but mixed content (HTTP-served images, scripts, or stylesheets) present.
- `fail` — HTTP only.
Sitemap with valid lastmod dates
browser_navigate to [baseURL]/sitemap.xml. If 404, check robots.txt for a Sitemap: directive and navigate there. (Yoast and RankMath typically emit /sitemap_index.xml; follow the first child sitemap.) Capture the first ~3000 characters.
Parse for valid XML, presence of <lastmod> entries, and whether any are within the last 90 days.
Evaluation:
- `pass` — Sitemap present with recent lastmod entries (within 90 days).
- `partial` — Sitemap present but without `<lastmod>` entries, OR all dates stale (over 1 year old).
- `fail` — Sitemap absent.
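The lastmod check can be sketched against the captured sitemap text. This is an illustration only: `evaluateSitemap` is a hypothetical helper, it extracts `<lastmod>` values with a regular expression instead of validating the XML, and it reports `partial` for any newest date older than 90 days, which is an assumption covering the 90-day-to-1-year range the tiers above leave unspecified.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// xml: captured sitemap text, or null/empty when the sitemap is absent.
// now: injectable clock so the check is reproducible.
function evaluateSitemap(xml, now = new Date()) {
  if (!xml) return { status: 'fail', newestLastmod: null };
  const dates = [...xml.matchAll(/<lastmod>\s*([^<\s]+)\s*<\/lastmod>/gi)]
    .map((m) => new Date(m[1]))
    .filter((d) => !Number.isNaN(d.getTime())); // drop unparseable dates
  if (dates.length === 0) return { status: 'partial', newestLastmod: null };
  const newest = new Date(Math.max(...dates.map((d) => d.getTime())));
  const ageDays = (now.getTime() - newest.getTime()) / DAY_MS;
  // Assumption: anything older than 90 days is reported as partial.
  return {
    status: ageDays <= 90 ? 'pass' : 'partial',
    newestLastmod: newest.toISOString(),
  };
}

const sampleSitemap =
  '<urlset><url><loc>https://example.com/</loc><lastmod>2026-01-15</lastmod></url></urlset>';
console.log(evaluateSitemap(sampleSitemap, new Date('2026-03-01T00:00:00Z')));
```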
Core content accessible without JavaScript
Open a new browser context with JavaScript disabled, navigate to the homepage, and check whether the H1 and substantial body copy are visible in the static HTML.
const h1 = document.querySelector('h1');
return {
h1Present: !!h1,
h1Text: h1 ? h1.innerText.trim() : null,
bodyTextLength: document.body.innerText.trim().length
};
Save a screenshot to reports/screenshots/homepage-no-js.png. Close the no-JS context and resume the normal session.
Evaluation:
- `pass` — H1 and 500+ chars of body copy visible without JS.
- `partial` — Some content visible but significantly degraded.
- `fail` — Page is blank or nearly empty without JS.
WordPress note: standard themes render server-side and pass this check easily. Headless WordPress with a Next.js frontend may fail; record the framework in `technicalNotes.cmsDetected`.
Canonical URLs
AI engines use <link rel="canonical"> to dedupe pages and pick the source-of-truth URL when synthesizing answers. Missing or wrong canonical tags cause the AI to split signals across duplicate URLs, reducing citation weight on the page you actually want cited.
Check the homepage, then sample at least 2 inner pages from Phase 2:
const canonical = document.querySelector('link[rel="canonical"]');
const canonicalHref = canonical ? canonical.getAttribute('href') : null;
return {
present: !!canonical,
href: canonicalHref,
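// Resolve against the page URL so a relative canonical href does not throw; false when href is missing
selfReferential: !!canonicalHref && new URL(canonicalHref, window.location.href).pathname === window.location.pathname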
};
Evaluation:
- `pass` — Homepage has `<link rel="canonical">` AND at least 2 inner pages have self-referential canonical tags (the canonical points to the page itself, not somewhere else).
- `partial` — Homepage canonical present but inner pages missing canonical, OR canonical present everywhere but points to the wrong URL on some pages (common Yoast/RankMath misconfiguration).
- `fail` — No canonical tag on the homepage.
WordPress note: Yoast and RankMath emit self-referential canonicals automatically — most WordPress sites pass this signal. SPAs and headless implementations frequently miss it; flag in `effortRationale`.
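Once the homepage and sampled inner pages have been inspected, the three tiers above can be combined in one place. The sketch below is a hedged illustration: `isSelfReferential` and `evaluateCanonicals` are hypothetical helpers, the `{ url, canonicalHref }` record shape is an assumption, and treating trailing slashes as equivalent while ignoring query strings is a simplification, not something the rubric specifies.

```javascript
// Compare origin + path, resolving relative hrefs against the page URL.
function isSelfReferential(canonicalHref, pageUrl) {
  if (!canonicalHref) return false;
  const normalize = (u) => {
    const url = new URL(u, pageUrl);
    const path = url.pathname.replace(/\/+$/, '') || '/'; // ignore trailing slash
    return `${url.origin}${path}`;
  };
  return normalize(canonicalHref) === normalize(pageUrl);
}

// homepage and innerPages entries are { url, canonicalHref } records (assumed shape).
function evaluateCanonicals(homepage, innerPages) {
  if (!homepage.canonicalHref) return 'fail';
  const selfRefInner = innerPages.filter((p) => isSelfReferential(p.canonicalHref, p.url)).length;
  const allCorrect =
    isSelfReferential(homepage.canonicalHref, homepage.url) &&
    selfRefInner === innerPages.length;
  return selfRefInner >= 2 && allCorrect ? 'pass' : 'partial';
}

console.log(
  evaluateCanonicals(
    { url: 'https://example.com/', canonicalHref: 'https://example.com/' },
    [
      { url: 'https://example.com/about', canonicalHref: 'https://example.com/about' },
      { url: 'https://example.com/blog/', canonicalHref: '/blog/' },
    ],
  ),
);
```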
1.3 Structured Data
JSON-LD inventory
const scripts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'));
const schemas = scripts.map(s => {
try {
const parsed = JSON.parse(s.innerText);
return { type: parsed['@type'] || (parsed['@graph'] ? '@graph' : 'unknown'), keys: Object.keys(parsed) };
} catch(e) {
return { type: 'invalid', error: e.message };
}
});
return { count: scripts.length, schemas };
Record which schema types are present. Invalid (unparseable) JSON-LD counts as absent.
Organization schema
Check JSON-LD for @type: "Organization" or @type: "LocalBusiness". If present, verify: name, url, logo, description, sameAs. Also check microdata:
const microdataOrg = document.querySelector('[itemscope][itemtype*="schema.org/Organization"], [itemscope][itemtype*="schema.org/LocalBusiness"]');
return { microdataOrg: !!microdataOrg };
Evaluation:
- `pass` — JSON-LD Organization present with name, url, logo, and at least one sameAs.
- `partial` — JSON-LD present but sparse (key fields missing), OR microdata-only (no JSON-LD).
- `fail` — Absent.
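The Organization tiers above map directly onto a small classifier. This is a minimal sketch assuming the matching JSON-LD node has already been extracted; `evaluateOrganizationSchema` is a hypothetical helper, and `description` is not required for `pass` here because the pass tier lists only name, url, logo, and sameAs.

```javascript
// orgJsonLd: the parsed Organization / LocalBusiness JSON-LD node, or null if none was found.
// hasMicrodataOrg: result of the microdata check above.
function evaluateOrganizationSchema(orgJsonLd, hasMicrodataOrg = false) {
  if (!orgJsonLd) return hasMicrodataOrg ? 'partial' : 'fail';
  // sameAs may be a single string or an array in JSON-LD.
  const sameAs = [].concat(orgJsonLd.sameAs || []).filter(Boolean);
  const complete =
    ['name', 'url', 'logo'].every((key) => Boolean(orgJsonLd[key])) && sameAs.length >= 1;
  return complete ? 'pass' : 'partial';
}

console.log(
  evaluateOrganizationSchema({
    '@type': 'Organization',
    name: 'Example Co',
    url: 'https://example.com',
    logo: 'https://example.com/logo.png',
    sameAs: 'https://www.linkedin.com/company/example',
  }),
);
```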
Primary entity schema — content-driven
The "primary entity" for a site is the dominant content type identified in Section 0.4's relevance scan — not a lookup by siteType. A site's primary entity is whichever schema has the strongest relevance signal among the type-defining schemas (Person, Article, Product, Event, HowTo, LocalBusiness, Course, Recipe, SoftwareApplication). Organization, FAQPage, and Review are evaluated separately and don't compete for the "primary" slot.
If multiple schemas tie at high relevance, prefer the one matching the most-prominent homepage content area. Record the determination in notes.
Per-schema expected field set (what counts as a "well-formed" primary entity schema):
| Schema | Required-for-full-credit fields |
|---|---|
| Service | name, description, provider (linked to Organization) |
| Product | name, offers, brand or manufacturer |
| Article / BlogPosting / NewsArticle | headline, author, datePublished, image |
| Person | name, jobTitle or description, sameAs (≥1 external profile) |
| Event | name, startDate, location, eventStatus |
| HowTo | name, step (≥3 HowToStep entries), totalTime or tool/supply |
| LocalBusiness (or subtype) | name, address, openingHoursSpecification, telephone |
| Course | name, description, provider, hasCourseInstance or offers |
| Recipe | name, recipeIngredient, recipeInstructions, cookTime or prepTime |
| SoftwareApplication | name, applicationCategory, offers or operatingSystem |
Programmatic check:
// Use the relevance map from Section 0.4 to pick the expected primary type.
// PRIMARY_CANDIDATES is the type-defining schema list (excludes Organization/FAQPage/Review).
const PRIMARY_CANDIDATES = ['Person','Article','BlogPosting','NewsArticle','Product','Event','HowTo','LocalBusiness','Course','Recipe','SoftwareApplication'];
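const relevance = {}; // placeholder: replace {} with the applicableSchemas object returned by Section 0.4
const primaryCandidates = Object.entries(relevance)
.filter(([k,v]) => v === 'high' && PRIMARY_CANDIDATES.includes(k))
.sort(); // alphabetical by type name: a deterministic tie-break only, not a prominence ranking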
const expectedPrimary = primaryCandidates[0] ? primaryCandidates[0][0] : null;
const jsonLd = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]));
const jsonLdMatch = expectedPrimary ? jsonLd.find(s => s['@type'] === expectedPrimary || (Array.isArray(s['@type']) && s['@type'].includes(expectedPrimary))) : null;
const microdataMatch = expectedPrimary ? document.querySelector(`[itemscope][itemtype*="schema.org/${expectedPrimary}"]`) : null;
const rdfaMatch = expectedPrimary ? document.querySelector(`[typeof~="${expectedPrimary}"], [typeof~="schema:${expectedPrimary}"]`) : null;
return {
expectedPrimary,
jsonLdMatch: jsonLdMatch ? Object.keys(jsonLdMatch) : null,
hasMicrodata: !!microdataMatch,
hasRdfa: !!rdfaMatch
};
Evaluation:
- `pass`: JSON-LD primary entity schema present, type matches `expectedPrimary` from the relevance scan, AND the required fields for that type are populated.
- `partial`: JSON-LD present with the correct type but sparse (key required fields missing), OR microdata / RDFa-only (with or without all required fields).
- `fail`: No schema matches the expected primary type for this site's content.

Edge case (no high-relevance primary candidate detected): if Section 0.4 finds no primary schema with `high` relevance (e.g. a pure portfolio site with no Articles, Events, Products, or HowTos), Organization schema becomes the de facto primary entity for evaluation purposes. Record `pass` if Organization is well-formed; otherwise evaluate against the highest-relevance candidate even if it's `medium`. Record the determination in `notes`.

Edge case (ecommerce): Product schema typically lives on PDPs, not the homepage. If `Product` is the expected primary type, sample at least one product page during Phase 2 and evaluate the combined evidence: homepage `Store`/`OfferCatalog` plus PDP Product schema together can satisfy `pass`.
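
The programmatic check returns only the keys of the matched JSON-LD node, so deciding between `pass` and `partial` still means comparing those keys against the required-fields table. A minimal sketch of that comparison follows. `REQUIRED_FIELDS`, `isPopulated`, and `missingRequiredFields` are hypothetical names, only three table rows are encoded, and count-based rules (such as ≥3 HowToStep entries) are not covered.

```javascript
// Hypothetical helper, not part of the rubric: encodes a few rows of the
// required-fields table. Each inner array is an "any of" group, so one
// populated key is enough to satisfy the group.
const REQUIRED_FIELDS = {
  Product: [['name'], ['offers'], ['brand', 'manufacturer']],
  Article: [['headline'], ['author'], ['datePublished'], ['image']],
  LocalBusiness: [['name'], ['address'], ['openingHoursSpecification'], ['telephone']],
};

// Treats null, undefined, blank strings, and empty arrays as unpopulated.
function isPopulated(value) {
  if (value == null) return false;
  if (typeof value === 'string') return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Returns the unsatisfied groups, each joined as "a or b".
function missingRequiredFields(type, node) {
  const groups = REQUIRED_FIELDS[type] || [];
  return groups
    .filter(group => !group.some(key => isPopulated(node[key])))
    .map(group => group.join(' or '));
}
```

An empty result would support `pass`; a non-empty result is the list to quote in the `partial` finding.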
Relevant schemas applied — relevantSchemasApplied
The coverage signal. Of every schema type with relevance high or medium from Section 0.4, what percent are backed by actual schema markup (in any format — JSON-LD, microdata, or RDFa)?
This is the signal that catches the nonprofit-with-events-but-no-Event-schema pattern, the media-blog-with-tutorials-but-no-HowTo-schema pattern, and the agency-with-client-reviews-but-no-Review-schema pattern.
// Build the schema presence map across all three formats
const allLd = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]));
const jsonLdTypes = new Set(allLd.map(s => s['@type']).filter(Boolean).flat());
const microdataTypes = new Set(
Array.from(document.querySelectorAll('[itemscope][itemtype]'))
.map(el => (el.getAttribute('itemtype').match(/schema\.org\/(\w+)/) || [])[1])
.filter(Boolean)
);
const rdfaTypes = new Set(
Array.from(document.querySelectorAll('[typeof]'))
.flatMap(el => (el.getAttribute('typeof') || '').split(/\s+/))
.map(t => t.replace(/^schema:/, ''))
.filter(Boolean)
);
const allPresentTypes = new Set([...jsonLdTypes, ...microdataTypes, ...rdfaTypes]);
// Compare against the relevance map from 0.4
const relevance = /* applicableSchemas from 0.4 */;
const inScope = Object.entries(relevance).filter(([k, v]) => v === 'high' || v === 'medium');
const matched = inScope.filter(([k, v]) => allPresentTypes.has(k) ||
// BlogPosting and NewsArticle satisfy Article relevance
(k === 'Article' && (allPresentTypes.has('BlogPosting') || allPresentTypes.has('NewsArticle'))) ||
// Article satisfies BlogPosting relevance
(k === 'BlogPosting' && allPresentTypes.has('Article')) ||
// Specific LocalBusiness subtypes satisfy LocalBusiness relevance
(k === 'LocalBusiness' && [...allPresentTypes].some(t => /Restaurant|Dentist|Plumber|MedicalBusiness|Store|HomeAndConstructionBusiness/.test(t)))
);
const coverage = inScope.length === 0 ? 1 : matched.length / inScope.length;
return { inScopeCount: inScope.length, matchedCount: matched.length, coveragePercent: Math.round(coverage * 100), gaps: inScope.filter(x => !matched.includes(x)).map(([k]) => k) };
Evaluation:
- `pass`: Coverage ≥ 90% (every high/medium-relevance schema is backed by matching markup, or only one minor gap on a `medium`-relevance schema).
- `partial`: Coverage 30–89% (some relevant schemas present but significant gaps remain; the dominant primary schema may be present but secondary schemas are missing).
- `fail`: Coverage < 30%, OR no schemas of any kind present.

If Section 0.4 found no schemas with `high` or `medium` relevance (rare: Organization is always at least `high`, so this should never be empty), record `na` and note in `notes` as "N/A — no content patterns matched any schema-eligible type beyond Organization."

Record the specific gap list in `notes`, e.g. "Coverage 60% — gaps: Event (annual conference visible on homepage), HowTo (3 tutorial pages observed)." Each gap also produces an entry in `actionablePrompts` with a paste-ready Claude prompt to generate the missing schema.
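
The thresholds above can be applied mechanically to the object returned by the coverage check. The sketch below assumes one extra input, `presentTypeCount` (the size of `allPresentTypes`), which the original return value does not include; `coverageStatus` is a hypothetical name.

```javascript
// Hypothetical helper: maps the coverage result to a rubric status.
// presentTypeCount is an assumed extra field (allPresentTypes.size).
function coverageStatus({ inScopeCount, coveragePercent, presentTypeCount }) {
  if (inScopeCount === 0) return 'na';            // nothing in scope to cover
  if (presentTypeCount === 0) return 'fail';      // no schemas of any kind
  if (coveragePercent < 30) return 'fail';
  return coveragePercent >= 90 ? 'pass' : 'partial';
}
```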
FAQ schema
JSON-LD @type: "FAQPage" with mainEntity containing Q&A pairs. Microdata check:
const microdataFaq = document.querySelector('[itemscope][itemtype*="schema.org/FAQPage"], [itemscope][itemtype*="schema.org/Question"]');
return { microdataFaq: !!microdataFaq };
Evaluation:
- `pass`: JSON-LD FAQPage with 2+ valid Q&A pairs.
- `partial`: Malformed JSON-LD, OR microdata-only FAQ markup.
- `fail`: Absent.
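
Only the microdata half of this check is shown as code, while `pass` depends on counting valid Q&A pairs in JSON-LD. A possible sketch of that count is below. It assumes `nodes` is the flattened JSON-LD array produced by the parsing snippets elsewhere in this section, and it treats a pair as valid when a `Question` has a non-empty `name` and an `acceptedAnswer` with non-empty `text`; that definition of "valid" and the name `countValidFaqPairs` are assumptions.

```javascript
// Hypothetical helper: counts Question entries under FAQPage nodes that have
// both a non-empty name and a non-empty acceptedAnswer.text.
function countValidFaqPairs(nodes) {
  const typesOf = node => [].concat((node && node['@type']) || []);
  const nonEmpty = v => typeof v === 'string' && v.trim() !== '';
  return nodes
    .filter(node => typesOf(node).includes('FAQPage'))
    .flatMap(node => [].concat(node.mainEntity || []))
    .filter(q => typesOf(q).includes('Question') && nonEmpty(q.name) &&
      [].concat(q.acceptedAnswer || []).some(a => a && nonEmpty(a.text)))
    .length;
}
```

A count of 2 or more would support `pass`; a FAQPage node with fewer valid pairs points toward `partial`.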
JSON-LD format used — jsonLdFormat
The signal name reflects what AI engines prefer, but the check enumerates all three structured-data formats the rubric supports: JSON-LD, microdata, and RDFa.
const jsonLd = document.querySelectorAll('script[type="application/ld+json"]').length;
const microdata = document.querySelectorAll('[itemscope][itemtype]').length;
// RDFa: count elements with `typeof` inside a `vocab="http://schema.org/"` or compatible prefix scope
const rdfa = Array.from(document.querySelectorAll('[typeof]')).filter(el => {
const vocab = el.closest('[vocab]');
const prefix = el.closest('[prefix]');
return (vocab && /schema\.org/.test(vocab.getAttribute('vocab'))) ||
(prefix && /schema(:|=)\s*https?:\/\/schema\.org/.test(prefix.getAttribute('prefix')));
}).length;
return { jsonLd, microdata, rdfa };
Evaluation:
- `pass`: JSON-LD present (any blocks). Microdata and/or RDFa may also be present; there is no penalty for mixed formats.
- `partial`: No JSON-LD, but microdata OR RDFa present. The finding text must say so explicitly:
  - Microdata-only: "Zero JSON-LD blocks; [N] microdata itemscope elements present — JSON-LD is preferred for reliable AI parsing."
  - RDFa-only: "Zero JSON-LD blocks; [N] RDFa typed elements present — JSON-LD is preferred for reliable AI parsing; RDFa adoption is low and AI parsing support is uneven."
  - Both microdata and RDFa but no JSON-LD: list both counts and call out JSON-LD as the recommendation.
- `fail`: None of JSON-LD, microdata, or RDFa present.
Always record all three counts in notes and in technicalNotes.schemaFormats ({ jsonLd, microdata, rdfa }) so downstream consumers can see what was found.
Open Graph tags complete — openGraphTags
const ogTags = ['og:title', 'og:description', 'og:image', 'og:url', 'og:type'];
const result = {};
ogTags.forEach(tag => {
const el = document.querySelector('meta[property="' + tag + '"]');
result[tag] = el ? el.getAttribute('content') : null;
});
return result;
Evaluation:
- `pass`: og:title, og:description, and og:image all present.
- `partial`: Some OG tags present but not the full og:title + og:description + og:image trio.
- `fail`: No Open Graph tags present.
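
As a rough sketch, the three outcomes can be derived from the `result` object returned by the snippet above. `openGraphStatus` is a hypothetical name, and treating an empty `content` attribute as absent is an assumption the rubric does not state.

```javascript
// Hypothetical helper: maps the Open Graph result object to a status.
// Assumption: a tag with empty or whitespace-only content counts as absent.
function openGraphStatus(result) {
  const has = tag => typeof result[tag] === 'string' && result[tag].trim() !== '';
  const core = ['og:title', 'og:description', 'og:image'];
  if (core.every(has)) return 'pass';
  return Object.keys(result).some(has) ? 'partial' : 'fail';
}
```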
Review or AggregateRating schema
JSON-LD @type: "Review" or @type: "AggregateRating". Microdata check:
const microdataReview = document.querySelector('[itemscope][itemtype*="schema.org/Review"], [itemscope][itemtype*="schema.org/AggregateRating"]');
return { microdataReview: !!microdataReview };
Evaluation:
- `pass`: Review or AggregateRating schema present in either JSON-LD or microdata.
- `fail`: Absent.
1.4 AEO Readiness — Programmatic checks (4 of 8 signals here)
faqSectionPresent, faqSchemaApplied, questionFramedHeadings, and titleAndMetaQuestionMatch are programmatic and evaluated below. directAnswers, whoWhatWho, featuredSnippetStructure, and answerCapsules are content checks evaluated in Phase 3.
FAQ section present
const faqIndicators = [
...document.querySelectorAll('[class*="faq"], [id*="faq"], [class*="FAQ"], [id*="FAQ"]'),
...Array.from(document.querySelectorAll('h2, h3')).filter(h => {
const t = h.innerText.toLowerCase();
return t.includes('frequently asked') || t.includes('common questions');
})
];
const dlPairs = document.querySelectorAll('dl');
const detailsElements = document.querySelectorAll('details');
return {
faqSectionFound: faqIndicators.length > 0,
dlPairs: dlPairs.length,
detailsElements: detailsElements.length
};
If no FAQ on the homepage, check /faq and /faqs as inner pages (also visit during Phase 2).
Evaluation:
- `pass`: FAQ section found (homepage or dedicated FAQ page) with 2+ Q&A pairs.
- `partial`: Accordion or FAQ pattern present but only 1 item, or very thin.
- `fail`: Absent.
FAQ schema applied to visible FAQ content — faqSchemaApplied
Cross-reference: if a FAQ section was found AND FAQ JSON-LD schema was found in 1.3.
Evaluation:
- `pass`: Both visible FAQ content and FAQ schema present.
- `partial`: FAQ schema present but no visible FAQ content, OR visible FAQ content present but no FAQ schema.
- `fail`: Neither visible FAQ content nor FAQ schema present.
This signal exists alongside faqSchema and faqSectionPresent to verify that the schema and the visible content are applied together — the alignment, not the presence of either alone.
Question-framed headings
const headings = Array.from(document.querySelectorAll('h2, h3'));
const questionHeadings = headings.filter(h => h.innerText.trim().endsWith('?'));
return {
total: headings.length,
questionFramed: questionHeadings.length,
examples: questionHeadings.slice(0,3).map(h => h.innerText.trim())
};
Evaluation:
- `pass`: 2+ H2/H3 phrased as questions.
- `partial`: Exactly one H2/H3 phrased as a question.
- `fail`: No question-framed H2/H3 headings.
Title and meta description question-match — titleAndMetaQuestionMatch
For pages targeting a specific question, AI engines weight the page's <title> and <meta name="description"> as signals about what question the page is answering. The post explicitly recommends: "For pages targeting specific questions (e.g., a services page answering 'what does X company do?'), it helps to include the question or a close variant in the title tag or meta description."
Check the homepage AND every inner page visited in Phase 2.
const title = document.querySelector('title') ? document.querySelector('title').innerText.trim() : '';
const metaDescEl = document.querySelector('meta[name="description"]');
const metaDesc = metaDescEl ? (metaDescEl.getAttribute('content') || '').trim() : '';
const h1 = document.querySelector('h1') ? document.querySelector('h1').innerText.trim() : '';
// A page is "question-targeting" if its H1 ends with "?" or starts with an
// interrogative or auxiliary word ("how", "what", "why", "when", "where", "who",
// "can", "does", "is", "are", "should", "do")
const h1IsQuestion = /\?$/.test(h1) || /^(how|what|why|when|where|who|can|does|is|are|should|do)\s+/i.test(h1);
// Or the page is question-targeting by purpose: services pages, FAQ pages,
// product detail pages, donation pages, etc.
const pathSignalsQuestion = /\/(faq|services?|pricing|how-it-works|about|donate|subscribe|join|tutorials?|guides?)/.test(window.location.pathname);
const isQuestionTargeting = h1IsQuestion || pathSignalsQuestion;
// Signal extraction — does title or meta description contain question phrasing
// or align with the H1's question/topic?
const titleHasQuestion = /\?/.test(title) || /^(how|what|why|when|where|who|can|does|is|are|should|do)\s+/i.test(title);
const metaHasQuestion = /\?/.test(metaDesc) || /^(how|what|why|when|where|who|can|does|is|are|should|do)\s+/i.test(metaDesc);
// Token overlap between title/meta and H1 (a rough proxy for alignment when the
// page isn't framed as a literal question — e.g. a services page titled
// "Web Development Services for B2B SaaS" aligning with H1 "What we build")
function tokens(s) {
return new Set((s || '').toLowerCase().match(/[a-z][a-z]+/g) || []);
}
const h1Tokens = tokens(h1);
const titleTokens = tokens(title);
const metaTokens = tokens(metaDesc);
const titleOverlap = h1Tokens.size > 0 ? [...h1Tokens].filter(t => titleTokens.has(t)).length / h1Tokens.size : 0;
const metaOverlap = h1Tokens.size > 0 ? [...h1Tokens].filter(t => metaTokens.has(t)).length / h1Tokens.size : 0;
return {
url: window.location.href,
h1, title, metaDesc,
isQuestionTargeting,
titleHasQuestion, metaHasQuestion,
titleOverlap: Math.round(titleOverlap * 100) / 100,
metaOverlap: Math.round(metaOverlap * 100) / 100
};
Evaluation:
- `pass`: Question-targeting pages on this site have a `<title>` OR `<meta name="description">` that either (a) contains question phrasing, or (b) reaches ≥50% token overlap with the page's H1. Evaluate across the homepage plus all inner pages visited; at least 50% of question-targeting pages must clear this bar.
- `partial`: Title and meta exist on all sampled pages but show low alignment with H1 / question framing (between 20% and 50% of question-targeting pages clear the bar), OR the homepage is fine but inner pages are not.
- `fail`: Title and meta are present-but-generic on every sampled page (e.g. `<title>` is the site name only; meta description is boilerplate from theme defaults), with no question framing or H1 alignment anywhere.

N/A exemption: if zero pages visited are question-targeting (a pure-portfolio site with one-word page titles like "Work", "Studio", "Contact"), record `na` and note in `notes` as "N/A — no question-targeting pages observed."

Record the per-page results in `notes` so the report can show which pages passed and which failed. The fix is a content edit (one-field change in Yoast/RankMath for most WordPress sites), so `effort` is almost always `low`.
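
Because this signal is scored across several pages, the per-page objects need to be rolled up. The sketch below is one way to do that, assuming `pages` is an array of the objects returned by the check above. `titleMetaStatus` is a hypothetical name. Two simplifications are assumptions rather than rubric rules: a share below 20% is mapped to `fail`, and the "homepage fine but inner pages not" route to `partial` is left to manual review.

```javascript
// Hypothetical roll-up of per-page results into a single status.
// A page "clears the bar" if title or meta has question phrasing,
// or either reaches >= 0.5 token overlap with the H1.
function titleMetaStatus(pages) {
  const targeting = pages.filter(p => p.isQuestionTargeting);
  if (targeting.length === 0) return { status: 'na', clearedShare: null };
  const clears = p => p.titleHasQuestion || p.metaHasQuestion ||
    p.titleOverlap >= 0.5 || p.metaOverlap >= 0.5;
  const share = targeting.filter(clears).length / targeting.length;
  // Assumption: below 20% is treated as fail.
  const status = share >= 0.5 ? 'pass' : share >= 0.2 ? 'partial' : 'fail';
  return { status, clearedShare: Math.round(share * 100) / 100 };
}
```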
directAnswers, whoWhatWho, featuredSnippetStructure, and answerCapsules are evaluated in Phase 3 (content).
1.5 E-E-A-T Signals — Programmatic checks (3 of 8 signals here)
The remaining 5 are evaluated in Phase 3 (content). aboutTeamPageLinked as a standalone signal was retired — the "is there an About/Team/Authors page" check is now embedded in namedTeamMembers (you must visit it to verify named individuals).
Named individuals with roles — namedTeamMembers
The check is "are there real named human beings with identifiable roles tied to this site?" The evidence varies by site type:
| Site type | What counts as a named individual |
|---|---|
| agency, saas, localBusiness, nonprofit | Founders, leadership, employees, key staff with names and titles |
| mediaBlog | Bylined authors, editorial leadership (editor-in-chief, contributing editors) |
| ecommerce | Founders, designers, makers, key team members (common on brand-led shops, often absent on dropshippers) |
| education | Faculty, instructors, course leads with credentials |
| community | Moderators, leadership, named maintainers |
const teamSections = Array.from(document.querySelectorAll('[class*="team"], [id*="team"], [class*="about"], [id*="about"], [class*="author"], [class*="staff"], [class*="faculty"], [class*="contributor"]'));
// Person in JSON-LD
const personJsonLd = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]))
.filter(s => [].concat(s['@type'] || []).includes('Person')); // also handles array-valued @type
// Person in microdata
const personMicrodata = Array.from(document.querySelectorAll('[itemscope][itemtype*="schema.org/Person"]'));
// Person in RDFa
const personRdfa = Array.from(document.querySelectorAll('[typeof~="Person"], [typeof~="schema:Person"]'));
const aboutLink = Array.from(document.querySelectorAll('a[href]')).find(a => {
const text = a.innerText.toLowerCase();
const href = a.href.toLowerCase();
return /\b(about|team|authors|staff|faculty|masthead|people)\b/.test(text) || /\/(about|team|authors|staff|faculty|masthead|people)(?:[\/?#]|$)/.test(href);
});
return {
teamSectionCount: teamSections.length,
personSchemaCount: personJsonLd.length,
personMicrodataCount: personMicrodata.length,
personRdfaCount: personRdfa.length,
aboutLinkHref: aboutLink ? aboutLink.href : null
};
Person can appear in JSON-LD, microdata, or RDFa — all three count toward the "named team" signal. JSON-LD is preferred for AI parsing reliability; if Person is microdata-only or RDFa-only, note that in the finding ("Person markup present in microdata only — JSON-LD is preferred for AI parsing.").
If an About/Team/Authors page is linked, visit it in Phase 2 to confirm named individuals are present.
Evaluation:
- `pass`: Named individuals with type-appropriate roles visible (homepage or About/Team/Authors page).
- `partial`: The team is referenced only through anonymous language ("our team", "our editors", "the staff"), with no names anywhere accessible from the homepage.
- `fail`: No team / author / staff signals at all.
Credentialing badges — certificationBadges
The check is "are there visible third-party credentials backing this site?" The evidence varies by site type:
| Site type | What counts as a credentialing badge |
|---|---|
| agency, saas | Partner badges (Google Partner, WooCommerce Expert, AWS Partner), certifications, ISO badges |
| mediaBlog | Press affiliations, fact-checking certifications (IFCN), journalism awards, professional press credentials |
| ecommerce, localBusiness | BBB rating, Trustpilot widget, payment-processor trust marks, industry trade group memberships |
| education | Accreditation badges (regional accreditor, programmatic accreditation), recognized certifications |
| nonprofit | Charity Navigator, GuideStar Seal, BBB Wise Giving, 501(c)(3) verification badges |
| community | Verified-organization marks from the platform host, hosting-provider trust marks |
const badgeIndicators = Array.from(document.querySelectorAll('[class*="cert"], [class*="badge"], [class*="partner"], [class*="award"], [class*="accredit"], [class*="trust"], [class*="verified"], [class*="rated"], [alt*="award"], [alt*="certified"], [alt*="accredited"], [alt*="rated"]'));
const trustWidgets = !!document.querySelector('iframe[src*="trustpilot"], iframe[src*="bbb.org"], [class*="trustpilot"], [class*="bbb"], [class*="guidestar"], [class*="charitynavigator"]');
return { count: badgeIndicators.length, hasTrustWidget: trustWidgets };
DOM presence alone is not sufficient — confirm visually via the homepage screenshot that the badges are legible (the failure mode is small-text or visually-unrecognizable images that match the selectors but aren't real credentials).
Evaluation:
- `pass`: Credentialing badges present AND visually confirmed as legible/recognizable, matched to the site type.
- `partial`: DOM indicators found but visual confirmation unclear, or only weak credentials (e.g. SSL-vendor badges).
- `fail`: Absent.
Tenure indicators — tenureIndicators
The check is "does the site signal how long it has been operating?" — a recognized AI-trust signal across all types. The phrasing varies:
| Site type | Common tenure phrasings |
|---|---|
| agency, saas, ecommerce | "Founded in YYYY", "established YYYY", "since YYYY", "N years in business" |
| mediaBlog | "Publishing since YYYY", "covering [topic] since YYYY", "founded YYYY" |
| education | "Founded YYYY", "serving students since YYYY" |
| nonprofit | "Founded YYYY", "serving the community for N years" |
| community | "Active since YYYY", "the longest-running [topic] community" |
const bodyText = document.body.innerText;
const yearPatterns = [
/founded\s+(?:in\s+)?(\d{4})/i,
/established\s+(?:in\s+)?(\d{4})/i,
/since\s+(\d{4})/i,
/(\d+)\s+years?\s+(?:of\s+)?(?:experience|in\s+business|publishing|operation|serving|active)/i,
/publishing\s+since\s+(\d{4})/i,
/covering\s+\w+\s+since\s+(\d{4})/i,
/serving\s+(?:the\s+)?\w+(?:\s+\w+)?\s+since\s+(\d{4})/i
];
const matches = yearPatterns.map(p => bodyText.match(p)).filter(Boolean);
return { found: matches.length > 0, matches: matches.map(m => m[0]).slice(0,2) };
Evaluation:
- `pass`: Any tenure indicator is found.
- `fail`: Absent.
Author bylines linked to Person schema — authorBylines
This signal applies primarily to mediaBlog, education, nonprofit, and any other site that publishes articles or posts. For sites with no editorial content (e.g. pure ecommerce, single-page SaaS), record na by default and note in notes as "N/A — site has no editorial articles to byline."
Visit at least one editorial / blog / news / post page during Phase 2 and check:
const articleSelector = 'article, [class*="article"], [class*="post-"]';
const article = document.querySelector(articleSelector);
// Accept "By Jane Doe" as well as "by Jane Doe"; the name itself must stay capitalized
const visibleByline = article && article.innerText.match(/\b[Bb]y\s+([A-Z][a-z]+\s+[A-Z][a-z]+)/);
const authorLink = article && article.querySelector('a[href*="/author/"], a[rel="author"]');
// Article in JSON-LD
const articleJsonLd = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]))
.filter(s => [].concat(s['@type'] || []).some(t => ['Article','BlogPosting','NewsArticle'].includes(t)));
const authorOnJsonLd = articleJsonLd.some(s => s.author && (typeof s.author === 'object' || Array.isArray(s.author)));
// Article in microdata
const articleMicrodata = Array.from(document.querySelectorAll(
'[itemscope][itemtype*="schema.org/Article"], [itemscope][itemtype*="schema.org/BlogPosting"], [itemscope][itemtype*="schema.org/NewsArticle"]'
));
const authorOnMicrodata = articleMicrodata.some(el => el.querySelector('[itemprop="author"]'));
// Article in RDFa
const articleRdfa = Array.from(document.querySelectorAll(
'[typeof~="Article"], [typeof~="BlogPosting"], [typeof~="NewsArticle"], [typeof~="schema:Article"], [typeof~="schema:BlogPosting"], [typeof~="schema:NewsArticle"]'
));
const authorOnRdfa = articleRdfa.some(el => el.querySelector('[property="author"], [property="schema:author"]'));
return {
hasVisibleByline: !!visibleByline,
authorName: visibleByline ? visibleByline[1] : null,
hasAuthorLink: !!authorLink,
authorLinkHref: authorLink ? authorLink.href : null,
hasArticleJsonLd: articleJsonLd.length > 0,
hasArticleMicrodata: articleMicrodata.length > 0,
hasArticleRdfa: articleRdfa.length > 0,
hasAuthorOnSchema: authorOnJsonLd || authorOnMicrodata || authorOnRdfa
};
Evaluation:
- `pass`: Visible byline AND author name linked to an author page AND article schema (any format) has a populated `author` property.
- `partial`: Visible byline AND linked author page, but no `author` on any schema format, OR `author` present in microdata/RDFa only (JSON-LD preferred for AI parsing).
- `fail`: Anonymous publication: no byline, or byline with no link, or no Article/BlogPosting schema in any format.
1.6 Content Freshness — Programmatic checks (5 of 7 signals here)
Copyright year
const footerText = document.querySelector('footer') ? document.querySelector('footer').innerText : document.body.innerText;
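// Also matches "(c)" and "Copyright"; for a range such as "© 2019–2024" the end year is captured
const yearMatch = footerText.match(/(?:[©℗]|\(c\)|copyright)\s*(?:\d{4}\s*[-–]\s*)?(\d{4})/i);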
const currentYear = new Date().getFullYear();
return {
found: !!yearMatch,
year: yearMatch ? parseInt(yearMatch[1]) : null,
current: yearMatch ? parseInt(yearMatch[1]) === currentYear : false
};
Evaluation:
- `pass`: Copyright year matches current year.
- `partial`: Copyright year is one year behind.
- `fail`: Two or more years stale, or absent entirely.
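
The snippet above reports only whether the year is current, while the rubric also distinguishes one year behind from older. A small sketch of the full mapping follows. `copyrightStatus` is a hypothetical name, and treating a future-dated year as `pass` is an assumption.

```javascript
// Hypothetical helper: maps the extracted copyright year to a status.
// Assumption: a year at or ahead of the current year counts as pass.
function copyrightStatus(year, currentYear = new Date().getFullYear()) {
  if (year == null) return 'fail';   // absent entirely
  const age = currentYear - year;
  if (age <= 0) return 'pass';
  return age === 1 ? 'partial' : 'fail';
}
```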
Blog or news section detection — visit blog page in Phase 2
const blogIndicators = Array.from(document.querySelectorAll('[class*="blog"], [class*="news"], [class*="post"], [class*="article"], [id*="blog"], [id*="news"]'));
const datePattern = /\b(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)\.?\s+\d{1,2},?\s+20\d{2}|\b20\d{2}[-\/](0[1-9]|1[0-2])[-\/](0[1-9]|[12]\d|3[01])\b/gi;
const dates = (document.body.innerText.match(datePattern) || []);
return { blogSectionFound: blogIndicators.length > 0, datesFound: dates.slice(0,5) };
Confirm in Phase 2 by visiting the blog/news listing page. Most-recent post date determines freshness.
Evaluation:
- `pass`: Blog/news section present with content dated within last 6 months.
- `partial`: Section present but content older than 6 months.
- `fail`: No blog/news section found.
Date stamps on posts or case studies
const dateTags = Array.from(document.querySelectorAll('time[datetime], [class*="date"], [class*="published"]'));
return { count: dateTags.length, examples: dateTags.slice(0,3).map(el => el.innerText.trim() || el.getAttribute('datetime')) };
Confirm by visiting at least one blog post in Phase 2 — the post page should display a published date.
Evaluation:
- `pass`: Date stamps visible on posts or case studies.
- `fail`: Absent.
Sitemap lastmod dates recent — sitemapLastmodRecent
Cross-reference 1.2 sitemap analysis. Were lastmod dates present and within 90 days?
Evaluation:
- `pass`: Sitemap has lastmod dates within 90 days.
- `fail`: Absent or all stale.
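
This signal has no snippet of its own, so here is a hedged sketch of the 90-day test. It assumes the raw sitemap XML text was captured during the 1.2 analysis and is passed in as a string; `lastmodSummary` is a hypothetical name, and the regex-based extraction is a simplification that ignores sitemap index nesting.

```javascript
// Hypothetical helper: extracts <lastmod> values from sitemap XML text and
// reports how many fall within maxAgeDays of `now`.
function lastmodSummary(sitemapXml, now = new Date(), maxAgeDays = 90) {
  const dates = [...sitemapXml.matchAll(/<lastmod>\s*([^<\s]+)\s*<\/lastmod>/gi)]
    .map(m => new Date(m[1]))
    .filter(d => !Number.isNaN(d.getTime()));   // drop unparseable values
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  const recent = dates.filter(d => d.getTime() >= cutoff).length;
  return { total: dates.length, recent, status: recent > 0 ? 'pass' : 'fail' };
}
```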
Per-page "Last updated" stamps — contentUpdateRecency
Distinct from dateStampsOnContent (which checks for any visible date on posts). This signal checks whether individual pages — especially evergreen reference pages — carry a visible "Last updated" / "Updated on" / "Revised YYYY" stamp within the last 12 months.
Perplexity cites content updated in the last 30 days at 82%; content updated within the last 12 months earns 3.2× more Perplexity citations. Per-page update stamps signal active maintenance.
Visit one editorial / reference page during Phase 2 and check:
const bodyText = document.body.innerText;
const updatedPattern = /(?:last\s+updated|updated\s+on|revised|last\s+revised|reviewed\s+on)[:\s]+([A-Z][a-z]+\s+\d{1,2},?\s+20\d{2}|\d{1,2}\s+[A-Z][a-z]+\s+20\d{2}|20\d{2}-\d{2}-\d{2})/i;
const modifiedMeta = document.querySelector('meta[property="article:modified_time"], meta[name="last-modified"]');
// dateModified in JSON-LD
const jsonLdMod = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]))
.find(s => s.dateModified);
// dateModified in microdata — <meta itemprop="dateModified" content="..."> or <time itemprop="dateModified" datetime="...">
const microdataModEl = document.querySelector('[itemprop="dateModified"]');
const microdataMod = microdataModEl
? (microdataModEl.getAttribute('datetime') || microdataModEl.getAttribute('content') || microdataModEl.innerText.trim())
: null;
// dateModified in RDFa
const rdfaModEl = document.querySelector('[property="dateModified"], [property="schema:dateModified"]');
const rdfaMod = rdfaModEl
? (rdfaModEl.getAttribute('datetime') || rdfaModEl.getAttribute('content') || rdfaModEl.innerText.trim())
: null;
const match = bodyText.match(updatedPattern);
return {
visibleUpdatedStamp: match ? match[0] : null,
modifiedMetaContent: modifiedMeta ? modifiedMeta.getAttribute('content') : null,
schemaDateModified: (jsonLdMod && jsonLdMod.dateModified) || microdataMod || rdfaMod || null,
schemaDateModifiedFormat: (jsonLdMod && jsonLdMod.dateModified) ? 'json-ld' : microdataMod ? 'microdata' : rdfaMod ? 'rdfa' : null
};
Evaluation:
- `pass`: Visible "Last updated" stamp within the last 12 months on at least one sampled inner page. Article schema `dateModified` alone also counts if it is within the last 12 months and the page is editorial.
- `fail`: No visible update stamps anywhere, OR stamps present but all >12 months old.

Pure-ecommerce / single-page SaaS exemption: if there is no editorial content to bear "last updated" stamps, record `na` and note in `notes` as "N/A — site has no editorial content requiring update stamps."
recentFeaturedWork and currentDomainReferences are evaluated in Phase 3.
1.7 Entity Clarity — Programmatic check (1 of 5 signals here)
Social profile links in footer
const socialDomains = ['linkedin.com', 'twitter.com', 'x.com', 'facebook.com', 'instagram.com', 'youtube.com'];
const links = Array.from(document.querySelectorAll('footer a[href], [class*="social"] a[href]'));
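// Match on hostname so that 'x.com' does not also match hosts such as 'netflix.com'
const hostOf = a => { try { return new URL(a.href).hostname.replace(/^www\./, ''); } catch (e) { return ''; } };
const socialLinks = links.filter(a => socialDomains.some(d => hostOf(a) === d || hostOf(a).endsWith('.' + d)));
return { count: socialLinks.length, platforms: [...new Set(socialLinks.map(hostOf))] };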
Evaluation:
- `pass`: LinkedIn and at least one other professional profile linked from footer.
- `partial`: Only one social profile linked.
- `fail`: No social profile links in footer.
entityIdentifiable, primaryFocusSpecificity, geographicMarketClarity, and consistentIdentity are all evaluated in Phase 3.
1.8 llms.txt
browser_navigate to [baseURL]/llms.txt. Record whether it loads (status 200 vs 404) and capture body text. Then browser_navigate to [baseURL]/llms-full.txt and record presence. Return to homepage.
return { found: document.body.innerText.trim().length > 10, content: document.body.innerText.substring(0, 1000) };
Evaluation:
- `llmsTxtPresent`: `pass` if present and non-empty. `fail` if absent.
- `llmsFullTxtPresent`: `pass` if present. `fail` if absent.
- `llmsTxtContent`: evaluated in Phase 3 based on accuracy and specificity.
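
One caveat with the length-only test: a styled 404 page also has more than 10 characters of body text, so it would register as found. The sketch below adds a status check and a soft-404 heuristic. `llmsTxtFound` is a hypothetical name, the availability of the HTTP status from the navigation step is an assumption, and the "short text mentioning 404 or not found" rule is a heuristic that can misfire.

```javascript
// Hypothetical helper: decides whether a fetched llms.txt body is a real file.
// status is optional; pass null when the HTTP status is not available.
function llmsTxtFound(body, status = null) {
  const text = (body || '').trim();
  if (status !== null && (status < 200 || status >= 300)) return false;
  if (text.length <= 10) return false;
  // Heuristic (assumption): short prose mentioning "404" or "not found"
  // is treated as an error page rather than a real llms.txt.
  const looksLikeErrorPage = text.length < 500 && /\b(404|not found)\b/i.test(text);
  return !looksLikeErrorPage;
}
```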
SECTION 2: Multi-Page Evidence Collection
The homepage is the primary evaluation target, but several signals require inner-page confirmation. Visit at least 3 additional pages beyond the homepage. Record every URL visited in visitedPages.
Pages to visit (in priority order, by site type)
Always start with the homepage (already visited in Phase 0/1). Then pick inner pages based on the detected siteType:
Universal must-visit pages (all site types):
| # | Page type | Purpose |
|---|---|---|
| 1 | About / Team / Authors / Masthead / People | Confirm namedTeamMembers, authorCredentials, sample canonicalUrls |
| 2 | One primary-offering page (varies by site type, see below) | Confirm primaryEntitySchema, primaryOfferingDetail, namedSpecificEntities, primaryFocusSpecificity |
| 3 | A recent editorial or featured-content page (varies by site type, see below) | Confirm dateStampsOnContent, contentUpdateRecency, authorBylines, blogNewsRecent |
Site-type-specific page targets:
| Site type | Primary-offering page | Featured-content page | Type-specific functional page |
|---|---|---|---|
| agency | A service / solution page | A case study or recent blog post | — |
| ecommerce | A product detail page (PDP) | The /new or /collections page or a recent product launch | — |
| mediaBlog | A category / section index for the dominant vertical | A recent article from the last 30 days | /subscribe or /newsletter (FAQ schema check) |
| saas | A feature or pricing page | A changelog / release notes / recent blog post | — |
| education | A course or program page | A recent news / announcements page or instructor page | — |
| localBusiness | A services / menu / offerings page | A recent updates / news / events page if present | — |
| nonprofit | A program / cause / how-we-help page | A recent campaign / impact-report / news page | /donate or /give (FAQ schema check) |
| community | A "what is this" / rules / wiki page | A recently-active discussion / featured contribution | /join, /membership, or /become-a-member (FAQ schema check) |
Optional 4th visit: if a dedicated FAQ page exists at /faq, /faqs, /help, or /support, visit it to confirm faqSectionPresent and faqSchemaApplied.
Mediablog-specific 5th visit — archive page quality check. For mediaBlog sites, visit at least one category or tag archive page (/category/<slug>/, /tag/<slug>/, /topics/<slug>/, or whatever the routing convention is). Run the archivePageQuality check from Section 2.1.
If a page type doesn't exist (e.g. ecommerce site with no editorial content), record the absence — it may change the status of the related signal, or trigger the type-specific exemption rule.
2.1 Type-specific functional page checks
These checks feed into existing evaluated signals rather than introducing new ones — they sharpen the evidence Phase 1 collected by checking the pages where the post's recommendations land most concretely.
FAQ schema on functional pages — nonprofit, mediaBlog, community
Visit the type-specific functional page and re-run the FAQ check:
const jsonLd = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
.map(s => { try { return JSON.parse(s.innerText); } catch(e) { return null; }})
.filter(Boolean)
.flatMap(s => Array.isArray(s) ? s : (s['@graph'] || [s]));
const hasFaqSchema = jsonLd.some(s => s['@type'] === 'FAQPage') ||
!!document.querySelector('[itemscope][itemtype*="schema.org/FAQPage"]');
const visibleFaq = !!document.querySelector('[class*="faq"], [id*="faq"]') ||
Array.from(document.querySelectorAll('h2, h3')).filter(h => h.innerText.trim().endsWith('?')).length >= 2 ||
document.querySelectorAll('details').length >= 2;
return { hasFaqSchema, visibleFaq, url: window.location.href };
The post's argument: donation, subscribe, and join/membership pages are exactly the pages where AI tools land users with concrete intent-bearing queries ("how do I donate to X?", "how do I subscribe to Y?", "what does a Z membership include?"). FAQ schema on these specific pages is high-leverage.
How the result feeds into evaluation:
- Visible FAQ content present but no FAQ schema on the functional page → drops `faqSchemaApplied` to `partial` and creates a `medium`-severity issue with effort `low` and a paste-ready Claude prompt for the FAQ schema.
- Both visible FAQ and FAQ schema present → confirms `faqSchemaApplied` at full credit.
- No FAQ content visible at all on the functional page → not penalized (the page may genuinely not need FAQ format), but record in `notes` as a missed AEO opportunity since these pages are high-intent.
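The mapping above can be sketched as a small post-processing step, assuming the `{ hasFaqSchema, visibleFaq }` result is handled outside the browser. The function name `mapFunctionalPageFaq` and the returned shape are hypothetical, not part of the rubric; a `status` of `null` is used here to mean "leave the Phase 1 status unchanged".

```javascript
// Hypothetical helper: turns the functional-page FAQ check result into
// the faqSchemaApplied outcome described in the list above.
function mapFunctionalPageFaq({ hasFaqSchema, visibleFaq }) {
  if (visibleFaq && !hasFaqSchema) {
    // Visible FAQ without schema: partial, plus a medium-severity / low-effort issue
    return { status: 'partial', issue: { severity: 'medium', effort: 'low' }, note: null };
  }
  if (visibleFaq && hasFaqSchema) {
    // Both present: full credit
    return { status: 'pass', issue: null, note: null };
  }
  // No visible FAQ content: not penalized, only noted
  return {
    status: null,
    issue: null,
    note: 'Missed AEO opportunity: high-intent functional page has no FAQ content.'
  };
}
```

For example, `mapFunctionalPageFaq({ hasFaqSchema: false, visibleFaq: true })` yields a `partial` status with a medium-severity, low-effort issue attached.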
archivePageQuality check — mediaBlog only
Per the post: "Category and tag pages need descriptive text — a list of posts alone gives AI nothing to work with." WordPress archive pages typically emit an empty <meta name="description"> and zero copy beyond the post list.
This check does not introduce a new evaluated signal. It feeds findings into two existing signals:
- `passageExtractionQuality` (Content Specificity) — an archive page with no descriptive text fails the "could a clean 1–2 sentence summary be pulled from each major section" test, since the archive page IS a major section and yields nothing extractable.
- `primaryFocusSpecificity` (Entity Clarity) — an archive page with no topic-area description weakens the site's overall focus signal for the dominant vertical.
Programmatic check:
const h1 = document.querySelector('h1') ? document.querySelector('h1').innerText.trim() : '';
const metaEl = document.querySelector('meta[name="description"]');
// A meta description tag without a content attribute returns null, so default to ''
const metaDesc = metaEl ? (metaEl.getAttribute('content') || '').trim() : '';
// Look for descriptive copy NOT inside the post list — typically in
// <header>, .archive-description, .term-description, or the first
// .entry-content / .page-content block before the list
const descCandidates = Array.from(document.querySelectorAll(
'header p, .archive-description, .term-description, .category-description, .tag-description, [class*="archive-intro"], [class*="archive-summary"]'
));
const descText = descCandidates.map(el => el.innerText.trim()).filter(t => t.length > 50).join(' ');
// Post list density check — how much of the page is post cards vs descriptive copy
const postListEls = document.querySelectorAll('article, [class*="post-"], [class*="card"]');
const bodyTextLen = document.body.innerText.length;
const descRatio = bodyTextLen > 0 ? (descText.length / bodyTextLen) : 0;
return {
h1,
metaDesc,
metaDescLength: metaDesc.length,
descriptiveCopyLength: descText.length,
descriptiveCopySample: descText.slice(0, 200),
postListCount: postListEls.length,
descriptiveRatio: Math.round(descRatio * 100) / 100,
qualityGrade:
descText.length >= 200 && metaDesc.length >= 80 ? 'good' :
descText.length >= 80 || metaDesc.length >= 50 ? 'partial' :
'empty'
};
Grading:
- `good` — 200+ chars of descriptive copy above the post list AND a meta description of 80+ chars. Counts toward `passageExtractionQuality` passing for this section and `primaryFocusSpecificity` for the dominant vertical.
- `partial` — Some descriptive copy OR meta description but not both at full quality. Half credit toward the above; record the gap.
- `empty` — No descriptive copy beyond the H1 (often just "Category: Climate") and either no meta description or theme-default boilerplate. Drops `passageExtractionQuality` and weakens `primaryFocusSpecificity`; create a `medium`-severity issue with effort `low` and a Claude prompt to write a 2–3 sentence topic description for the archive.
If the site has no archive pages (single-page sites, ecommerce with no editorial section), skip this check and record "N/A — site has no archive pages."
Per-inner-page checks
For each inner page:
- `browser_navigate` to the URL
- `browser_evaluate` to inspect schema and meta tags (a site with an SEO plugin like Yoast / RankMath / SEOPress typically emits schema and meta on every page):
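const blocks = Array.from(document.querySelectorAll('script[type="application/ld+json"]')).map(s => {
try { return JSON.parse(s.innerText); } catch(e) { return null; }
}).filter(Boolean);
// Flatten top-level arrays and @graph containers so nested nodes (the Yoast / RankMath pattern) are checked too
const nodes = blocks.flatMap(b => Array.isArray(b) ? b : (b['@graph'] || [b])).filter(n => n && typeof n === 'object');
// @type may be a string or an array of strings
const hasType = (n, types) => [].concat(n['@type'] || []).some(t => types.includes(t));
const personOnPage = nodes.some(n => hasType(n, ['Person']));
const articleOnPage = nodes.some(n => hasType(n, ['Article', 'BlogPosting']));
return { jsonLdCount: blocks.length, personOnPage, articleOnPage };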
- For About / Team pages, run a quick named-person check:
const headings = Array.from(document.querySelectorAll('h2, h3, h4')).map(h => h.innerText.trim());
const possibleNames = headings.filter(t => /^[A-Z][a-z]+\s+[A-Z][a-z]+/.test(t)); // crude "First Last" pattern
return { headingsCount: headings.length, namedPeople: possibleNames.slice(0, 10) };
- For blog / news listings, extract dates from the most recent posts:
const timeEls = Array.from(document.querySelectorAll('time[datetime]'));
const datedItems = timeEls.map(t => ({ datetime: t.getAttribute('datetime'), text: t.innerText.trim() })).slice(0, 10);
return { datedItems };
- Canonical URL check — every inner page should have `<link rel="canonical">` pointing to itself:
const c = document.querySelector('link[rel="canonical"]');
return {
href: c ? c.getAttribute('href') : null,
// Coerce to a boolean so a missing canonical tag reports false rather than null
selfReferential: !!c && new URL(c.getAttribute('href'), location.href).pathname === window.location.pathname
};
Record per-page results to feed the canonicalUrls Phase 1 signal.
- Per-page update recency — on at least one editorial / reference page, run the `contentUpdateRecency` check from Section 1.6 to capture visible "Last updated" stamps and `article:modified_time` meta.
- Author byline check — on at least one article/post/blog page, run the `authorBylines` check from Section 1.5.
- Record all findings against the relevant signals.
CMS detection (run once on the homepage)
const generator = document.querySelector('meta[name="generator"]');
const yoastBlock = Array.from(document.querySelectorAll('script[type="application/ld+json"]')).some(s => s.innerText.includes('"yoast"') || s.innerText.includes('Yoast'));
const yoastClass = !!document.querySelector('[class*="yoast"]');
// SEO-plugin markers and the REST API discovery link usually sit in <head>, so search the whole document rather than only <body>
const html = document.documentElement.outerHTML;
const rankMath = !!document.querySelector('meta[name="generator"][content*="Rank Math"]') || html.includes('rankmath');
const seopress = html.includes('seopress');
const wpContent = !!document.querySelector('link[href*="/wp-content/"]') || !!document.querySelector('script[src*="/wp-content/"]');
const wpJsonApi = html.includes('/wp-json/');
return {
generator: generator ? generator.getAttribute('content') : null,
yoast: yoastBlock || yoastClass,
rankMath,
seopress,
isWordPress: !!(wpContent || wpJsonApi || (generator && /WordPress/i.test(generator.getAttribute('content') || '')))
};
Record the result in technicalNotes.cmsDetected (e.g. "WordPress + Yoast", "WordPress + RankMath", "WordPress (no SEO plugin)", "Headless / unknown"). This drives the effort ratings in Phase 5.
SECTION 3: Content AI Analysis
Extract homepage visible text:
return document.body.innerText;
Also extract:
- All heading texts: `Array.from(document.querySelectorAll('h1,h2,h3,h4')).map(h => ({ tag: h.tagName, text: h.innerText.trim() }))`
- First paragraph after each H2 (for direct-answer assessment)
- Footer text
Use this content to evaluate the remaining signals.
3.1 AEO Readiness — Content (4 signals)
Direct answers after headings — directAnswers
For each H2/H3, read the paragraph immediately following it. Count what fraction of those headings are followed by a direct first-sentence answer (not a preamble or build-up).
Evaluation:
- `pass` — 70%+ of headings answered directly.
- `partial` — 10–69% of headings answered directly.
- `fail` — Less than 10%, OR content not structured in heading/answer format (no H2s/H3s).
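The thresholds above can be expressed as a small grading function. This is a sketch only: `gradeDirectAnswers` is a hypothetical name, and deciding whether a heading is "answered directly" remains a manual judgment made while reading each first sentence.

```javascript
// Hypothetical helper: grades directAnswers from two manual counts.
// headingsTotal: number of H2/H3 headings inspected
// answeredDirectly: how many are followed by a direct first-sentence answer
function gradeDirectAnswers(headingsTotal, answeredDirectly) {
  // No H2s/H3s means the content is not in heading/answer format
  if (headingsTotal === 0) return 'fail';
  const ratio = answeredDirectly / headingsTotal;
  if (ratio >= 0.7) return 'pass';
  if (ratio >= 0.1) return 'partial';
  return 'fail';
}
```

With 12 headings of which 5 are answered directly, the ratio is about 42%, so the signal is graded `partial`.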
Who / what / who content — whoWhatWho
The question is site-type-agnostic but the framing differs:
| Site type | Three identifiable elements |
|---|---|
| `agency`, `saas` | Who is the business / what do they make or do / who do they serve |
| `mediaBlog` | Who publishes this / what topics are covered / who is the intended reader |
| `ecommerce` | Who is the brand / what category of products / who buys them |
| `education` | Who is the institution / what is taught / who is the student |
| `localBusiness` | Who is the business / what services / what local market |
| `nonprofit` | Who is the organization / what is the mission / who is served |
| `community` | What is this community / what is the focus / who participates |
Can you clearly answer all three from the homepage text alone, within the first scroll?
Evaluation:
- `pass` — All three elements answerable from the homepage text alone within the first scroll.
- `partial` — Two of three answerable.
- `fail` — One or zero answerable.
Featured snippet structure — featuredSnippetStructure
Count "extractable" sentences across the homepage: a sentence that defines a term, states a number with context, gives a step, or answers a question — and stands alone without surrounding context. Definition lists, numbered steps, and FAQ answers each count as one.
Evaluation:
- `pass` — 5+ extractable sentences.
- `partial` — 2–4 extractable sentences.
- `fail` — 0–1 extractable sentences.
Answer capsules — answerCapsules
Distinct from featuredSnippetStructure. An answer capsule is a 40–60 word self-contained answer placed directly under an H2 or H3. This is the specific structural pattern AI engines extract verbatim — 72.4% of ChatGPT-cited pages have one. The capsule must be:
- Located in the first paragraph or block under the heading (no preamble or transition sentence above it)
- Between roughly 40 and 60 words
- Capable of standing alone as a complete answer to the heading's implicit question
Scan the page (homepage plus any visited content pages) for H2/H3 elements and analyze the immediately-following text:
const result = [];
document.querySelectorAll('h2, h3').forEach(h => {
let next = h.nextElementSibling;
while (next && (next.tagName === 'BR' || next.innerText.trim() === '')) next = next.nextElementSibling;
if (!next) return;
const text = next.innerText.trim();
const firstBlock = text.split(/\n\n+/)[0];
const wordCount = firstBlock.split(/\s+/).filter(Boolean).length;
const sentenceCount = firstBlock.split(/[.!?]+\s/).filter(s => s.trim().length).length;
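// Tolerance band: 35–70 words is accepted so capsules close to the 40–60 word target still qualify
if (wordCount >= 35 && wordCount <= 70 && sentenceCount <= 4) {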
result.push({ heading: h.innerText.trim().slice(0,80), wordCount, sentenceCount, capsule: firstBlock.slice(0,300) });
}
});
return { capsuleCount: result.length, examples: result.slice(0,5) };
Evaluation:
- `pass` — 3 or more answer capsules across the analyzed pages.
- `partial` — 1–2 answer capsules.
- `fail` — No qualifying capsules: every H2/H3 either has no text directly under it, has a preamble before the answer, or has text outside the 40–60 word band.
3.2 E-E-A-T Signals — Content (4 signals)
Author / staff credentials — authorCredentials
Do the named individuals identified in namedTeamMembers carry specific, verifiable credentials? "Verifiable" means a fact a third party could check: published works, named past clients/employers, named degrees or institutions, recognized certifications, prior press appearances.
The evidence varies by site type:
| Site type | What counts as verifiable credentials |
|---|---|
| `agency`, `saas`, `localBusiness` | Named past employers, named past clients, recognized certifications, professional licenses |
| `mediaBlog` | Prior bylines at named publications, recognized journalism awards, subject-matter degrees, books authored |
| `ecommerce` | Founder/designer/maker background relevant to the product category (e.g. "20 years woodworking", "former Patagonia designer") |
| `education` | Degrees, peer-reviewed publications, named institutional affiliations |
| `nonprofit` | Board affiliations, prior nonprofit leadership, named expertise areas |
| `community` | Prior community-building / topic-area credentials |
Evaluation:
- `pass` — Specific, third-party-verifiable credentials present.
- `partial` — Generic credentials only ("years of experience", "industry leader") without specifics.
- `fail` — No credentials at all.
Demonstrated expertise — demonstratedExpertise
Count "specifics" across the homepage and visited pages. A specific is concrete, third-party-verifiable evidence — one of:
- A named methodology, technique, framework, or standard used or cited
- A named past client, partner, or institutional reference
- A specific outcome with a number (% growth, $ raised, N users, response times)
- A named technology, platform, or tool relevant to the site type
- A named primary source for a claim (study, paper, dataset, report)
- A subject-area technical term used correctly in context
Evaluation:
- `pass` — 3 or more specifics observed across homepage and visited pages.
- `partial` — 1–2 specifics with the rest of the copy claiming expertise without evidence.
- `fail` — Zero specifics (only generic claims like "award-winning", "expert", "leading").
External citations or press — externalCitations
Third-party validation: publications that have cited or covered the site, podcasts the site has appeared on, industry awards with named bodies, conference talks, press mentions, peer recognition.
Evaluation:
- `pass` — Named external validation present (named publications, podcasts, awards bodies, etc.).
- `partial` — Vague "featured in" / "as seen on" without named outlets.
- `fail` — Absent.
Named external relationships — namedExternalRelationships
Generalized from "named clients." The check is "are external entities the site relates to named explicitly, not just shown as anonymous logos?" The relevant relationships vary by site type:
| Site type | Named external relationships |
|---|---|
| `agency`, `saas` | Named clients, named partners, named integration vendors |
| `mediaBlog` | Named sources, named cited researchers, named contributors |
| `ecommerce` | Named brands carried, named manufacturers, named retail partners |
| `education` | Named partner institutions, named accreditors, named research collaborators |
| `nonprofit` | Named funders, named partner organizations, named beneficiary communities |
| `localBusiness` | Named suppliers, named professional affiliations, named recurring partner businesses |
| `community` | Named affiliated organizations, named sponsors, named partner communities |
Evaluation:
- `pass` — At least one named external relationship appears in copy (not just a logo).
- `fail` — All relationships are anonymous, generic ("our clients", "our sources"), or logo-only.
3.3 Content Freshness — Content (2 signals)
Recent featured work — recentFeaturedWork
Generalized from "recent portfolio." The check varies by site type:
| Site type | What counts as recent featured work |
|---|---|
| `agency` | Recent case studies, portfolio pieces, or named recent projects (last 12 months) |
| `ecommerce` | New product launches, new collections, recent restocks featured prominently |
| `mediaBlog` | Recent articles in the dominant vertical, especially editor-featured pieces |
| `saas` | Recent feature releases, changelog entries, customer-spotlight posts |
| `education` | New courses, recent cohort launches, recent research publications |
| `localBusiness` | Recent events, recent menu/service changes, recent local appearances |
| `nonprofit` | Recent campaigns, recent impact reports, recent program launches |
| `community` | Recent featured discussions, recent contributor spotlights, recent events |
Evaluation:
- `pass` — Recent featured work (last 12 months) is visible.
- `fail` — Work appears dated, references deprecated tools/platforms, or no featured work exists.
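Where featured work carries machine-readable dates, the 12-month window can be applied to the `datedItems` collected from `time[datetime]` elements during the Section 2 inner-page visits. The sketch below assumes each `datetime` value parses with `new Date()`; `isWithinLastMonths` and `hasRecentItem` are hypothetical helper names, and dated markup is only one piece of evidence alongside what is visibly featured.

```javascript
// Hypothetical helper: true when a datetime string falls inside the last N months.
function isWithinLastMonths(datetime, months = 12, now = new Date()) {
  const published = new Date(datetime);
  // Unparseable values are treated as "not recent"
  if (Number.isNaN(published.getTime())) return false;
  const cutoff = new Date(now);
  cutoff.setMonth(cutoff.getMonth() - months);
  // Future-dated items are excluded as well
  return published >= cutoff && published <= now;
}

// Hypothetical helper: true when at least one dated item is recent.
function hasRecentItem(datedItems, months = 12, now = new Date()) {
  return datedItems.some(item => isWithinLastMonths(item.datetime, months, now));
}
```

Passing `now` explicitly keeps the check reproducible when a report is regenerated later from recorded data.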
Current domain references — currentDomainReferences
Generalized from "current tech references." The check is "do the named entities and references in the content reflect what is current in the site's domain?" — deprecated tools, retired standards, stale research, or outdated regulations signal abandonment.
| Site type | What counts as current domain references |
|---|---|
| `agency`, `saas` | Currently-maintained platforms, frameworks, libraries |
| `mediaBlog` | Citations of current research, current regulations, current industry events |
| `ecommerce` | Current product lines, current trends, current designer/maker references |
| `education` | Current curricula, current pedagogical research, current accreditation standards |
| `localBusiness` | Current local references, current professional standards |
| `nonprofit` | Current programs, current policy references, current statistics |
| `community` | Current platform conventions, current topic-area developments |
Evaluation:
- `pass` — Domain references appear current.
- `fail` — Deprecated/retired/superseded references are cited as current.
3.4 Entity Clarity — Content (4 signals)
Named entity identifiable in one sentence — entityIdentifiable
Can you write a single sentence identifying this site from the homepage text alone? It must include three elements appropriate to the site type:
| Site type | Required elements |
|---|---|
| `agency`, `saas`, `localBusiness` | Business name + business type + core specialty |
| `mediaBlog` | Publication name + topic area + editorial stance / audience |
| `ecommerce` | Brand name + product category + target buyer |
| `education` | Institution name + program type + student audience |
| `nonprofit` | Organization name + cause area + beneficiary |
| `community` | Community name + topic area + participant type |
Example one-sentence identifications by type:
- agency: "Bolt Studio is a WordPress development agency specialising in WooCommerce for DTC brands."
- mediaBlog: "Stratechery is a strategy publication covering tech business models for senior tech operators."
- ecommerce: "Outdoor Voices is a women-led athletic apparel brand for everyday recreational athletes."
- saas: "Linear is a project management SaaS built for product-led software teams."
- education: "Lambda School is an online coding bootcamp training career-changers in full-stack web development."
Evaluation:
- `pass` — A single clear identifying sentence containing all three required elements is constructable from the homepage.
- `partial` — Partially identifiable (1 or 2 of 3 elements clear).
- `fail` — Cannot identify the site from the homepage text.
Primary focus specificity — primaryFocusSpecificity
Generalized from "service specificity." How specific is the site's primary focus, as stated in the homepage copy?
| Site type | Specificity examples (high / medium / low) |
|---|---|
| `agency` | "WordPress agency for DTC ecommerce brands on Shopify" / "web design and development for small businesses" / "full-service digital agency" |
| `mediaBlog` | "Climate change reporting for policymakers" / "general business news" / "lifestyle blog" |
| `ecommerce` | "Hand-thrown ceramic dinnerware for hospitality buyers" / "kitchen accessories" / "general home goods" |
| `saas` | "Customer feedback dashboards for B2B SaaS product managers" / "feedback management" / "productivity software" |
| `education` | "Online tax-law CLE courses for practicing attorneys" / "professional development courses" / "online learning" |
| `nonprofit` | "Free legal aid for tenants facing eviction in NYC" / "housing rights advocacy" / "social justice" |
Evaluation:
- `pass` — Highly specific primary focus stated.
- `partial` — Moderately specific or generic focus.
- `fail` — No clear focus described.
Geographic or market clarity
Is a location, timezone, or market focus stated anywhere on the homepage?
Evaluation:
- `pass` — Location, timezone, or market focus is stated.
- `fail` — Absent.
Consistent identity signals
Does the site describe itself consistently across the hero, about section, and footer? Or does the specialty/positioning shift between zones?
Evaluation:
- `pass` — Consistent identity across all zones.
- `fail` — Contradictory identity signals between zones.
3.5 Content Specificity — Content (5 signals)
Primary offering detail — what, who, outcome — primaryOfferingDetail
For each primary offering described, can you answer: what is it, who is it for, what outcome does it deliver? The "offering" varies by site type:
| Site type | What counts as a primary offering |
|---|---|
| `agency`, `saas`, `localBusiness` | Each service or product line |
| `ecommerce` | Each product category or collection |
| `mediaBlog` | Each content vertical or beat |
| `education` | Each course, program, or degree track |
| `nonprofit` | Each program, cause area, or initiative |
| `community` | Each major activity or sub-community |
Evaluation:
- `pass` — All three (what / who / outcome) answered for most offerings.
- `partial` — Partially answered (some offerings have detail, others don't, or 1–2 of the three elements addressed).
- `fail` — Offerings listed by category label only with no detail.
Named specific entities — namedSpecificEntities
Generalized from "named platforms and technologies." Are specific entities relevant to the site's domain named explicitly rather than referred to abstractly?
| Site type | What counts as named specific entities |
|---|---|
| `agency`, `saas` | Platforms, frameworks, libraries (WordPress, React, Stripe, Shopify, AWS) |
| `mediaBlog` | Named publishers cited, named primary sources, named research datasets, named experts quoted |
| `ecommerce` | Named brands carried, named manufacturers, named raw materials or techniques (single-origin Ethiopian Yirgacheffe; Italian leather) |
| `education` | Named accreditors, named frameworks (Common Core, NGSS), named institutional partnerships |
| `nonprofit` | Named partner organizations, named cited reports, named legislation references |
| `localBusiness` | Named suppliers, named professional certifications, named local landmarks |
| `community` | Named upstream/sibling communities, named featured tools |
Evaluation:
- `pass` — 2 or more named specific entities relevant to the domain.
- `partial` — Exactly one named specific entity.
- `fail` — None (generic abstractions only).
Named subject areas — namedSubjectAreas
Generalized from "named industries." Are the site's subject areas named specifically?
| Site type | What counts as named subject areas |
|---|---|
| `agency` | Industries served (DTC ecommerce, healthcare, fintech, climate tech) |
| `mediaBlog` | Topic verticals or beats (geopolitics, indie publishing, electric vehicles) |
| `ecommerce` | Product categories (women's outerwear, kitchen knives, indoor plants) |
| `saas` | ICP roles or verticals (product managers, restaurant operators, B2B SaaS sales teams) |
| `education` | Fields of study or specializations (data science, environmental policy, early childhood ed) |
| `nonprofit` | Program areas (eviction defense, food security, watershed restoration) |
| `localBusiness` | Service specialties (Italian cuisine, sports medicine, residential plumbing) |
| `community` | Topic focus (mechanical keyboards, woodworking, climate policy) |
Evaluation:
- `pass` — Specific subject areas named.
- `fail` — Generic only ("businesses", "people", "everyone").
Specific outcomes — specificOutcomes
Generalized from "client wins with numbers." Are quantified results or named achievements present?
| Site type | What counts as specific outcomes |
|---|---|
| `agency`, `saas` | Named client wins with metrics (3× conversion lift, 40% cost reduction, $2M raised) |
| `mediaBlog` | Cited statistics in articles, original research findings with numbers, primary-source quoted figures |
| `ecommerce` | Named reviewed metrics (4.8★ from 12K reviews, 96% recommend), return-rate transparency |
| `education` | Named outcomes (graduate placement rate %, mean salary, named alumni) |
| `nonprofit` | Named impact metrics (N families served, $ delivered, % of donations reaching programs) |
| `localBusiness` | Named tenure metrics (servicing X area since YYYY, named reviewed ratings) |
| `community` | Named member/contributor counts, named published outputs |
Evaluation:
- `pass` — Specific quantified outcomes or named achievements present.
- `fail` — Vague claims only ("we get results", "great quality", "trusted").
Passage extraction quality — passageExtractionQuality
Could a clean, accurate 1–2 sentence summary be pulled from each major section without needing surrounding context?
Evaluation:
- `pass` — Most sections yield clean extractable summaries.
- `fail` — Sections require surrounding context to make sense.
3.6 llms.txt content quality
If llms.txt was found in 1.8, evaluate the recorded content:
- Does it accurately describe the site name, type, and specialty?
- Does it mention specific services, target audiences, or platforms?
- Is it specific or generic boilerplate?
Evaluation:
- `pass` — Specific and accurate; describes site name, type, specialty, and mentions specific services / audiences / platforms.
- `fail` — Vague boilerplate.
- `na` — llms.txt is absent (covered by the `llmsTxtPresent` signal; record `na` here with note `"N/A — llms.txt not present."`).
SECTION 4: Evaluation & Verification
- Map all findings from Sections 1–3 to the rubric in `references/evaluation-rubric.md`.
- Assign each signal a status (`pass` / `partial` / `fail` / `na`).
- Verify each criterion's `signals` block contains every signal from the canonical signal-keys list. No signal may be omitted; use `na` with rationale when the signal genuinely does not apply.
- Count totals into the `summary` block: `{ totalSignals: 49, pass, partial, fail, na }`. Confirm `pass + partial + fail + na === 49`.
- There is no top-level score and no threshold label — the summary counts plus the issues list communicate state.
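The count-and-confirm step can be sketched as follows, assuming the `criteria` object uses the `{signals: {key: {status, notes}}}` shape described in Section 5. The function name `summarizeSignals` is hypothetical, and the expected total is a parameter only so the sketch can be exercised on a small sample.

```javascript
const VALID_STATUSES = ['pass', 'partial', 'fail', 'na'];

// Hypothetical helper: tallies signal statuses and enforces the total.
function summarizeSignals(criteria, expectedTotal = 49) {
  const summary = { totalSignals: 0, pass: 0, partial: 0, fail: 0, na: 0 };
  for (const [criterionName, criterion] of Object.entries(criteria)) {
    for (const [key, signal] of Object.entries(criterion.signals)) {
      // Blank or unknown statuses are rejected rather than silently skipped
      if (!VALID_STATUSES.includes(signal.status)) {
        throw new Error(`${criterionName}.${key} has invalid status: ${signal.status}`);
      }
      summary[signal.status] += 1;
      summary.totalSignals += 1;
    }
  }
  if (summary.totalSignals !== expectedTotal) {
    throw new Error(`Expected ${expectedTotal} signals, found ${summary.totalSignals}`);
  }
  return summary;
}
```

Throwing on a mismatch, instead of returning a partial summary, keeps an incomplete evaluation from reaching the report stage.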
MANDATORY PRE-REPORT CHECKLIST
Complete all items before generating the JSON report.
- Phase 0 site type detection completed, `siteType`, `siteTypeConfidence`, `siteTypeRationale` recorded
- Phase 0.4 schema relevance scan completed, `applicableSchemas` recorded in `technicalNotes`
- Playwright launched and homepage loaded at desktop viewport (1920×1080)
- Desktop screenshot saved to `reports/screenshots/homepage-aeo-desktop.png`
- robots.txt fetched and analyzed with the training-vs-retrieval split
- sitemap.xml fetched and analyzed for lastmod dates
- llms.txt and llms-full.txt fetches attempted and results recorded
- Canonical URL checked on homepage and at least 2 inner pages
- No-JS reachability check completed
- At least 4 pages visited (homepage + 3 type-appropriate inner pages) and recorded in `visitedPages`
- Per-page `contentUpdateRecency` check run on at least one editorial / reference page (or N/A exemption applied per site type)
- Per-page `authorBylines` check run on at least one article / post page (or N/A exemption applied per site type)
- CMS detection completed and recorded in `technicalNotes.cmsDetected`
- All Section 1 programmatic checks completed
- All Section 2 inner-page checks completed
- All Section 3 content checks completed
- All 49 signals evaluated across 8 criteria — every signal has a `status` of `pass`, `partial`, `fail`, or `na`
- `summary` block populated with `totalSignals`, `pass`, `partial`, `fail`, `na` counts; the four buckets sum to `totalSignals`
- Every signal at status `fail` or `partial` has a corresponding entry in `issues.critical|high|medium|low[]`
- Actionable Claude prompt written for every signal at status `fail` or `partial`
- Every issue object includes an `effort` field (`low` / `medium` / `high` / `unknown`)
If any item above is unchecked, do NOT generate the report. Return to the relevant section and complete it.
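Two of the checklist items (issue coverage for every `fail` / `partial` signal, and an `effort` value on every issue) lend themselves to a mechanical check. The sketch below assumes the report shape shown in Section 5, with issues keyed by `criterion` and `signal`; `findReportGaps` is a hypothetical name, and the remaining checklist items still need manual confirmation.

```javascript
const VALID_EFFORTS = ['low', 'medium', 'high', 'unknown'];
const SEVERITIES = ['critical', 'high', 'medium', 'low'];

// Hypothetical helper: returns a list of human-readable gaps (empty when clean).
function findReportGaps(report) {
  const gaps = [];
  const allIssues = SEVERITIES.flatMap(sev => (report.issues && report.issues[sev]) || []);

  // Every issue object needs a valid effort value
  for (const issue of allIssues) {
    if (!VALID_EFFORTS.includes(issue.effort)) {
      gaps.push(`Issue for ${issue.criterion}.${issue.signal} has a missing or invalid effort`);
    }
  }

  // Every fail / partial signal needs a matching issue entry
  const covered = new Set(allIssues.map(i => `${i.criterion}.${i.signal}`));
  for (const [criterionName, criterion] of Object.entries(report.criteria)) {
    for (const [key, signal] of Object.entries(criterion.signals)) {
      const needsIssue = signal.status === 'fail' || signal.status === 'partial';
      if (needsIssue && !covered.has(`${criterionName}.${key}`)) {
        gaps.push(`No issue entry for ${criterionName}.${key} (status: ${signal.status})`);
      }
    }
  }
  return gaps;
}
```

An empty array means both items hold; any returned string names the signal or issue to revisit before the report is generated.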
SECTION 5: Report Generation
File location
Save the report to reports/data/qa-report-aeo.json. This is the fixed filename the kosh report generator and merge script expect.
Top-level structure
The full structure is defined in schemas/qa-report-aeo-schema.json. Use canonical signal keys from references/signal-keys.md — do not invent variants (robotsAndCrawlerAccess, not robotsTxt; noNoindex, not noindex).
Each criterion follows {signals: {key: {status, notes}, ...}} where status is one of pass / partial / fail / na.
Minimal top-level shape:
{
"url": "https://example.com",
"websiteName": "Example",
"timestamp": "ISO-8601",
"mode": "aeo",
"aeoRubricVersion": "1.0",
"siteType": "mediaBlog",
"siteTypeConfidence": "high",
"siteTypeRationale": "BlogPosting JSON-LD on 3 of 5 sampled pages; /blog/ route with 40+ dated entries.",
"environment": "production",
"visitedPages": [
"https://example.com/",
"https://example.com/about/",
"https://example.com/category/climate/",
"https://example.com/articles/recent-piece/"
],
"summary": {
"totalSignals": 49,
"pass": 38,
"partial": 6,
"fail": 4,
"na": 1
},
"criteria": {
"technicalHealth": {
"signals": {
"robotsAndCrawlerAccess": { "status": "pass", "notes": "All live-retrieval bots allowed." },
"noNoindex": { "status": "pass", "notes": "No noindex." }
/* ...remaining technicalHealth signals per schema... */
}
}
/* ...7 more criterion blocks, 49 signals total per references/signal-keys.md... */
},
"issues": {
"critical": [
{
"criterion": "structuredData",
"signal": "organizationSchema",
"issue": "One sentence describing the specific gap found on this site.",
"impact": "One sentence explaining the consequence for AI discoverability or citation.",
"effort": "medium"
}
],
"high": [],
"medium": [],
"low": []
},
"actionablePrompts": [ /* one entry per signal at status fail or partial */ ],
"technicalNotes": {
"robotsTxt": "Allowed: OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot. Blocked: GPTBot, ClaudeBot, Google-Extended, CCBot — modern training/retrieval split.",
"sitemapUrl": "https://example.com/sitemap_index.xml",
"llmsTxtFound": false,
"llmsFullTxtFound": false,
"javascriptRequired": false,
"httpsActive": true,
"mixedContentCount": 0,
"cmsDetected": "WordPress + Yoast",
"applicableSchemas": {
"Person": "medium",
"Article": "high",
"Organization": "high",
"LocalBusiness": "absent",
"Event": "high",
"FAQPage": "high",
"HowTo": "medium",
"Product": "absent",
"Course": "absent",
"Recipe": "absent",
"Review": "absent"
}
}
}
Issue severity guide
- critical — Signal at `fail` for a high-impact rubric area (Technical Health, Structured Data, AEO Readiness). Blocking AI discoverability or citation.
- high — Signal at `fail` for any other criterion, or `partial` where the gap is substantial. Significantly weakens AI understanding or trust signals.
- medium — Signal at `partial` where the gap is moderate. Improvement opportunity.
- low — Minor gap. Worth noting but low priority.
Issue effort guide
Every issue must include an effort field. Use this two-step process:
Step 1 — assign the signal-type baseline:
| Effort | Meaning | Typical AEO examples |
|---|---|---|
| `low` | Content edit or simple config. No developer required. Under 1 hour. | Create llms.txt, update copyright year, add social profile links, add geographic location to copy |
| `medium` | Requires a developer, plugin, or significant content creation. Hours to a day. | Add Organization or Service JSON-LD schema, add FAQ schema to existing FAQ, write question-framed headings, add date stamps |
| `high` | Requires design or development work, refactoring, or substantial new content. Days or more. | Fix JS-only rendering, build a blog or news section from scratch, create case studies with named outcomes |
| `unknown` | Stack context insufficient to assess. Use sparingly. | |
Step 2 — adjust based on observed CMS context:
- WordPress + Yoast / RankMath / SEOPress detected: schema additions and most meta-tag changes drop from `medium` to `low` (one-field edits in the SEO plugin panel). Record `effortRationale: "WordPress + Yoast detected — one-field edit in the SEO panel."`
- WordPress core only (no SEO plugin): schema changes stay `medium` (requires plugin install or theme code edit).
- Headless WordPress / Next.js / custom React: structural changes stay `medium` or rise to `high`. Note in `effortRationale`.
- llms.txt: always `low` regardless of stack — it's a plain text file at the domain root.
- Partial completion (e.g. Organization schema present but missing `sameAs`): effort to complete is lower than building from scratch.
Only write effortRationale when site context meaningfully changes the baseline.
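As a rough sketch of the two-step process, the adjustment can be written as a function over the baseline and the detected stack. The shapes of `issue` (`signal`, `baseline`, `kind`) and `cms` (`platform`, `seoPlugin`, `headless`), and the signal name `llmsTxt`, are assumptions made for illustration; the partial-completion rule is left to evaluator judgment and is not modeled here.

```javascript
// Hypothetical sketch of Step 2. Input shapes and the "llmsTxt" signal name are assumed.
const SEO_PLUGINS = new Set(["Yoast", "RankMath", "SEOPress"]);

function adjustEffort(issue, cms) {
  // llms.txt is always low, whatever the stack.
  if (issue.signal === "llmsTxt") return { effort: "low" };

  const classicWordPress = cms.platform === "wordpress" && !cms.headless;
  const pluginEdit = issue.kind === "schema" || issue.kind === "meta";

  // An SEO plugin turns schema and meta-tag changes into one-field edits.
  if (classicWordPress && SEO_PLUGINS.has(cms.seoPlugin) && pluginEdit && issue.baseline === "medium") {
    return {
      effort: "low",
      effortRationale: `WordPress + ${cms.seoPlugin} detected: one-field edit in the SEO panel.`,
    };
  }

  // Headless or custom stacks: keep the baseline and flag it for a possible raise.
  if (cms.headless && issue.kind === "structural") {
    return {
      effort: issue.baseline,
      effortRationale: "Headless or custom front end detected: structural change needs developer work.",
    };
  }

  // No meaningful change to the baseline, so no rationale is written.
  return { effort: issue.baseline };
}
```

Returning no `effortRationale` in the default case follows the rule that the field is only written when site context changes the baseline.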
Example issue object:
{
"criterion": "structuredData",
"signal": "organizationSchema",
"issue": "No Organization schema present.",
"impact": "AI tools cannot reliably identify the site as a named entity, reducing citation likelihood.",
"effort": "low",
"effortRationale": "WordPress + Yoast detected — Organization schema can be configured in the Yoast SEO Settings panel.",
"pages": ["https://example.com/"]
}
Actionable prompt guide
Every signal at status fail or partial must have an entry in actionablePrompts. Each prompt must:
- Describe the specific issue found on this site (not a generic description)
- Explain the impact on AI discoverability or citation in one sentence
- Provide a ready-to-use Claude prompt the site owner can paste directly into Claude
The prompt field conventions:
- Open with the imperative ("Write…", "Generate…", "Emit…", "Audit…"). No identity preamble.
- Include only context Claude needs. Site type, named services or platforms, audience, language, existing `@id` if any.
- Reference actual content from the site (service names, FAQ questions, copyright year) so Claude grounds output in real content, not placeholders.
- Specify output format up front: JSON-LD schema type, single block vs `@graph`, character limits, code pattern shape.
- State language only when not English.
- No filler. Skip "Help me", "Can you", "Please".
- `llms.txt` is always lowercase, even at sentence start.
Example — ❌ "I'm Maple Creative, a Toronto branding agency. My homepage has no Organization schema. Write me one."
Example — ✅ "Write a complete JSON-LD Organization schema block for a Toronto branding agency homepage <head>. Include name, url, logo, description, and sameAs links for LinkedIn and Twitter. Output a single <script type=\"application/ld+json\"> block."
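Several of these conventions are mechanical enough to check before a prompt goes into the report. The following is a minimal sketch under stated assumptions: `lintPrompt` is a hypothetical helper, and the opener list contains only the four verbs named above, so a real check would need a longer list.

```javascript
// Hypothetical lint for the prompt field. The opener list is illustrative, not exhaustive.
const IMPERATIVE_OPENERS = ["Write", "Generate", "Emit", "Audit"];
const FILLER_PHRASES = ["help me", "can you", "please"];
const IDENTITY_PREAMBLE = /^\s*(i am|i'm|we are|we're|my name is)\b/i;

function lintPrompt(prompt) {
  const problems = [];

  if (IDENTITY_PREAMBLE.test(prompt)) problems.push("identity preamble");

  const firstWord = prompt.trim().split(/\s+/)[0];
  if (!IMPERATIVE_OPENERS.includes(firstWord)) {
    problems.push("does not open with a known imperative");
  }

  for (const phrase of FILLER_PHRASES) {
    if (new RegExp(`\\b${phrase}\\b`, "i").test(prompt)) {
      problems.push(`filler phrase: ${phrase}`);
    }
  }

  // llms.txt must stay lowercase, even at sentence start.
  for (const match of prompt.matchAll(/llms\.txt/gi)) {
    if (match[0] !== "llms.txt") {
      problems.push("llms.txt must be lowercase");
      break;
    }
  }

  return problems;
}
```

Run against the two examples above, the ❌ prompt is flagged for its identity preamble and its opener, and the ✅ prompt returns an empty list. The check cannot tell whether a prompt references real site content; that part remains a manual review.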
Generate the HTML report
Once reports/data/qa-report-aeo.json is saved, run:
node scripts/generate-report.js reports/data/qa-report-aeo.json
The AEO branch is auto-detected from report.mode === "aeo". The script writes a self-contained HTML report to reports/ (color-coded status pills, per-criterion signal tables, schema-relevance section, collapsible severity blocks, paste-ready Claude prompts) and prints the output path. --aeo can be passed explicitly to force the AEO branch.
Terminal summary
After saving the JSON and running the report script, print a brief summary:
Kosh AEO Analysis — [Website Name]
URL: [URL]
Site type: [siteType] ([confidence])
Environment: [environment]
Signals: [pass] pass · [partial] partial · [fail] fail · [na] N/A (of [totalSignals])
Top issues:
- [Top 3 critical/high issues by severity and CMS-adjusted effort]
Full report saved to:
JSON: reports/data/qa-report-aeo.json
HTML: reports/[SITENAME]_AEO_QA_REPORT_[YYYY-MM-DD].html
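The `Signals:` line of the summary is a straight tally of signal statuses. As a small sketch, assuming each signal is an object with a `status` field (the array shape and the function names here are assumptions, not the report schema):

```javascript
// Hypothetical tally for the "Signals:" summary line. The input shape is assumed.
function tallyStatuses(signals) {
  const counts = { pass: 0, partial: 0, fail: 0, na: 0 };
  for (const { status } of signals) {
    // Every signal must resolve to one of the four statuses.
    if (!Object.prototype.hasOwnProperty.call(counts, status)) {
      throw new Error(`Unknown status: ${status}`);
    }
    counts[status] += 1;
  }
  return counts;
}

function signalsLine(signals) {
  const c = tallyStatuses(signals);
  return `Signals: ${c.pass} pass · ${c.partial} partial · ${c.fail} fail · ${c.na} N/A (of ${signals.length})`;
}
```

Throwing on an unknown status doubles as a check that no signal was skipped or left blank.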
WordPress-Specific Patterns
When you detect WordPress (look for /wp-content/, /wp-json/, meta[name="generator"] with WordPress, or admin bar markup), these patterns are common:
- Yoast SEO: emits `@graph`-wrapped JSON-LD with Organization, WebSite, WebPage, and BreadcrumbList by default. Check whether Organization fields (logo, sameAs) are populated in the Yoast settings; empty fields are a common failure mode.
- RankMath: similar coverage to Yoast, with FAQ block schema available in the editor. Check whether FAQ blocks were used on FAQ pages.
- SEOPress: narrower default schema coverage; Organization usually present, FAQ usually not.
- All in One SEO: Organization + WebPage; check explicitly because some sites disable schema output.
- Site Kit / Google plugins: do not emit schema, but may emit additional meta. Don't mistake meta tags for schema markup.
- Yoast and RankMath both emit `/sitemap_index.xml` rather than a flat `/sitemap.xml`. If `/sitemap.xml` 404s, follow the `Sitemap:` line in robots.txt.
- WooCommerce sites typically have Product schema on product pages but not on the homepage. AEO evaluation is anchored on the homepage: a WooCommerce site with no Store, OfferCatalog, or Organization schema on the homepage still fails its `primaryEntitySchema` signal even if Product schema is plentiful on PDPs (those are confirmed during the Phase 2 sample).
- The WordPress comments JSON-LD block (`@type: "Comment"`) does not count toward Service / Organization / FAQ tallies.
- REST API exposure (`/wp-json/`) is a separate signal not in the rubric, but record its presence in `technicalNotes.cmsDetected`; it affects effort estimates for some fixes.
- llms.txt: there is no widely adopted WordPress plugin for llms.txt as of this rubric version. Effort to create it is always `low` (paste a text file into the WordPress root via SFTP or use a "raw file" plugin), but it does require server access.
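The detection cues and the sitemap fallback above can be sketched as two string-level checks. Both function names are hypothetical, and the sketch assumes the page HTML and the robots.txt body have already been fetched as strings (for example via `browser_evaluate` returning `document.documentElement.outerHTML`). The `id="wpadminbar"` check is one way to spot admin bar markup and will only match for logged-in sessions.

```javascript
// Hypothetical helpers working on already-fetched strings.
function detectWordPress(html) {
  const evidence = [];
  if (html.includes("/wp-content/")) evidence.push("wp-content path");
  if (html.includes("/wp-json/")) evidence.push("wp-json link");
  // Assumes the name attribute precedes content, as WordPress emits it.
  if (/<meta[^>]+name=["']generator["'][^>]+content=["']WordPress/i.test(html)) {
    evidence.push("generator meta");
  }
  if (html.includes('id="wpadminbar"')) evidence.push("admin bar markup");
  return { isWordPress: evidence.length > 0, evidence };
}

// Collects every Sitemap: URL declared in robots.txt, for use when /sitemap.xml 404s.
function sitemapsFromRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap:\s*(\S+)/i))
    .filter(Boolean)
    .map((match) => match[1]);
}
```

Keeping the matched cues in `evidence` makes it straightforward to record what was seen in `technicalNotes.cmsDetected`.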
AEO Testing Notes
Why homepage-anchored, multi-page evaluation?
The rubric is anchored to the homepage as the primary evaluation target, with Phase 2 inner-page checks providing confirmation and evidence-gathering for signals that don't always show fully on the homepage. The homepage anchor matters because AI tools tend to treat the homepage as the canonical entity-defining page when synthesizing a one-paragraph answer about a site.
Several signals (canonical URLs, contentUpdateRecency, authorBylines) genuinely require multi-page evidence and are evaluated against the inner-page findings. The site-type detection from Phase 0 determines which inner pages are most useful to visit.
What passes vs. fails AEO testing
Passes: Schema present and complete on the homepage, robots.txt allows AI crawlers, sitemap is fresh, content can be parsed by AI without JS, headings answer questions directly, entities (business name, service, audience) are unambiguous in the first scroll.
Fails: Missing or malformed schema, robots.txt blocking AI crawlers, no sitemap, content requires JS to render, dense paragraph copy with no extractable sentences, no named team or external validation, generic positioning ("full-service digital agency"), no llms.txt.
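One of the fail conditions, robots.txt blocking AI crawlers, can be approximated with a small parser. This is a simplified sketch, not a full robots.txt implementation: it only detects a site-wide `Disallow: /`, ignores `Allow` overrides and path-level rules, and the crawler list is an assumed sample of user-agent tokens rather than the rubric's own list.

```javascript
// Simplified sketch: reports which agents face a site-wide "Disallow: /".
// The agent list is an assumption; adjust it to the crawlers the rubric tracks.
const AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"];

function blockedAgents(robotsTxt, agents = AI_CRAWLERS) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments
    const match = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!match) continue;
    const field = match[1].toLowerCase();
    const value = match[2].trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallowAll: false };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === "disallow" && value === "/" && current) current.disallowAll = true;
    }
  }

  return agents.filter((agent) => {
    const name = agent.toLowerCase();
    const specific = groups.filter((g) => g.agents.includes(name));
    // A group naming the agent takes precedence over the wildcard group.
    const applicable = specific.length ? specific : groups.filter((g) => g.agents.includes("*"));
    return applicable.some((g) => g.disallowAll);
  });
}
```

A non-empty result is a prompt to read the file by hand before marking the signal `fail`, since partial blocks and `Allow` exceptions are outside what this sketch understands.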
Edge cases
- Single-page sites: evaluate Phase 2 inner-page signals against the homepage itself, noting in `notes` that no separate inner pages exist. Cap `visitedPages` at the homepage URL only and record the constraint in `technicalNotes`.
- Sites behind a paywall or login: evaluate what's accessible. Note in `technicalNotes` that crawler-accessible content is limited.
- Sites with locale routing: evaluate the locale you landed on. If an `hreflang` annotation is present and the English version is reachable, prefer it.
- Sites with cookie/consent walls that block content: accept consent in the browser session first, then evaluate the post-consent state.