audit-website-aeo - SKILL.md Agent Skill

name: audit-website-aeo
description: Audits a live website for AI-engine discoverability (AEO/GEO). Crawls the site, runs 16 deterministic checks plus a 6-dimension content evaluation, and produces a scored report (A-F) with prioritized fixes. Use to get a baseline before improve-aeo-geo, or to measure progress after changes.

Audit Website AEO/GEO Skill

You audit a live website the way an AI agent would — crawling its pages, parsing structure, and judging whether the content is citation-worthy for ChatGPT, Claude, Perplexity, and Google AI Overviews.

The audit has two halves:

Foundational (50%) — 16 deterministic pass/fail checks run by a script. Reproducible, no judgment.
Intelligence (50%) — 6 content-quality dimensions you score by reading the pages, using the rubric below.

Final score = 0.5 × foundational + 0.5 × intelligence, mapped to an A-F grade.

This skill produces a diagnosis. To then fix a codebase, hand off to the improve-aeo-geo skill.

Workflow

Follow this sequence exactly.

Step 1: Get inputs

Ask the user for:

Website URL (required) — the live site to audit.
Crawl depth (optional) — how many pages to crawl. Default 10, max 30.
Output location (optional) — where to save the report. Default: current directory, or workspace/<customer-name>/ if working a customer project.

If the user already gave a URL when invoking the skill, don't re-ask — just confirm crawl depth and proceed.

Step 2: Run the deterministic audit

Run the bundled script from this skill's scripts/ directory. It requires only Node 18+ — no npm install.

node <skill-path>/scripts/aeo-audit.mjs <url> --max-pages=10 --out=<output-dir>/aeo-audit.json

The script crawls (sitemap + robots.txt + internal links), runs the 16 checks per page, aggregates site-wide, and writes a JSON report. It also prints a human-readable summary. Tell the user the foundational score and the failed checks.

If the script errors (site unreachable, 0 pages crawled), report the error and stop — don't fabricate a score.
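
The inputs from Step 1 map onto the command above roughly as follows. This is a minimal sketch: `buildAuditArgs` is a hypothetical helper, not part of the bundled script, and clamping an out-of-range crawl depth (rather than rejecting it) is an assumption.

```javascript
// Hypothetical helper: builds the argument list for the audit command in Step 2.
const DEFAULT_MAX_PAGES = 10; // default crawl depth from Step 1
const MAX_PAGES_LIMIT = 30;   // maximum crawl depth from Step 1

function buildAuditArgs(scriptPath, url, outputDir, maxPages) {
  // The WHATWG URL parser throws on malformed input.
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // Assumption: a missing or non-integer depth falls back to the default,
  // and out-of-range values are clamped to 1..30.
  const requested = Number.isInteger(maxPages) ? maxPages : DEFAULT_MAX_PAGES;
  const depth = Math.min(Math.max(requested, 1), MAX_PAGES_LIMIT);
  return [
    scriptPath,
    parsed.href,
    `--max-pages=${depth}`,
    `--out=${outputDir}/aeo-audit.json`,
  ];
}
```

The resulting array could be passed to `spawnSync("node", args, { stdio: "inherit" })` from `node:child_process`. Whether the script signals failure through its exit status is not stated here, so check both the exit status and the printed summary before continuing.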

Step 3: Read the JSON report

Read the aeo-audit.json file. The key fields:

scoring.foundationalScore — the deterministic score (0-100). This is final — do not change it.
checks — the 16 site-wide checks with pass/fail and details.
pagesForReview — up to 5 representative pages (home + richest content pages), each with an aiView object containing title, metaDescription, h1, headings, schemaTypes, jsonLdSummary, textExcerpt, internalLinkCount, author, publishedDate, modifiedDate. Use these for Step 4.
prioritizedFixes, worstPages, coverage, heuristicIntelligenceSignals — supporting context. The heuristic signals are a deterministic prior — a sanity check, not the real evaluation.
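
As a rough illustration of pulling these fields out of the parsed report, the sketch below works on an already-parsed object. Only the field names listed above come from this skill. The shape of each `checks` entry (`id`, `passed`) and the `url` property on each review page are assumptions, so adjust them to the JSON the script actually writes.

```javascript
// Sketch: extract the fields Step 4 and Step 5 need from a parsed aeo-audit.json.
function summarizeReport(report) {
  const score = report?.scoring?.foundationalScore;
  if (typeof score !== "number") {
    throw new Error("Report has no scoring.foundationalScore");
  }
  const checks = Array.isArray(report.checks) ? report.checks : [];
  const pages = Array.isArray(report.pagesForReview) ? report.pagesForReview : [];
  return {
    foundationalScore: score, // final as reported; never adjusted here
    // Assumption: each check entry carries `id` and a boolean `passed`.
    failedCheckIds: checks.filter((c) => c.passed === false).map((c) => c.id),
    // Assumption: each review page carries `url` next to its `aiView`.
    reviewPages: pages.map((p) => ({
      url: p.url,
      h1: p.aiView?.h1,
      excerptLength: (p.aiView?.textExcerpt ?? "").length,
    })),
  };
}
```

The input object can be obtained with `JSON.parse(fs.readFileSync(path, "utf8"))` from `node:fs`.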

Step 4: Score the 6 intelligence dimensions

You are an AI agent that just found this site via web search. A user asked you a question and you landed here. Decide: would you cite this site in your answer?

Read the textExcerpt, headings, and metadata of each page in pagesForReview. Then score all 6 dimensions below, each 0-5, using only what you actually observed (no assumptions about pages you didn't see). Write the rationale before the score.

Rubric (0-5 each)

Answer Readiness — If a user asked a question about this site's topic, could you find a direct answer here? The #1 factor — content answering questions in the first paragraph gets 4.8x more citations.

0 = No answers; purely promotional or navigational
1 = Vague content that talks around topics but never directly answers
2 = Some answers exist but buried deep, not in opening paragraphs
3 = Several questions answerable; some definition-first or FAQ-style content
4 = Most common questions answerable; answers lead sections
5 = Exceptional (dedicated FAQ blocks, definition-first paragraphs, Q&A format throughout)

Quotability — Can you extract a clean, self-contained 40-60 word passage to quote? Comparison tables get 2.8x citations; FAQ blocks +156%.

0 = No extractable content (interactive-only, single dense block)
1 = Content requires full-page context; no passage stands alone
2 = A few passages extractable but most need surrounding context
3 = Several self-contained paragraphs; some lists or structured blocks
4 = Good quotability (tables, lists, FAQ sections, clear answer blocks)
5 = Highly quotable (comparison tables, step-by-step blocks, definition paragraphs throughout)

Evidence Density — Statistics, data points, named sources, in-text citations? Adding in-text citations = +115% visibility; statistics = +40% citation rate.

0 = No evidence; only marketing copy and vague claims
1 = Vague claims only ("best in class", "industry leading")
2 = Mostly generalities; rare specific data points
3 = Some statistics and named sources; cites a few external sources
4 = High density (numbers, dates, named sources, links to references)
5 = Exceptional (statistics every 150-200 words, in-text citations throughout, verifiable metrics)

Content Depth — Enough substance to thoroughly answer questions on the topic? Long-form (2000+ words) gets 3x more citations.

0 = Empty or placeholder content only
1 = Minimal (a few sentences, no real substance)
2 = Thin (surface-level, missing key details a user would need)
3 = Adequate (covers main points but lacks sub-topics or examples)
4 = Rich (comprehensive coverage, multiple sub-topics, examples, data)
5 = Exceptional (authoritative depth, multi-faceted, a go-to reference)

Freshness — Current enough to cite confidently? 76% of ChatGPT's most-cited pages were updated in the last 30 days.

0 = No date signals; content appears abandoned or timeless-generic
1 = Dates present but clearly outdated (2+ years, stale references)
2 = Moderately dated; no "last updated" indicator
3 = Reasonably current OR explicit "last updated" date visible
4 = Recent content with update timestamps and current references
5 = Clearly current (recent dates, active maintenance evident)

Structural Clarity — Does the HTML parse cleanly into readable text? A prerequisite — clean heading hierarchy = 3.2x more citations.

0 = Unreadable (no text, blocked, non-semantic markup)
1 = Very poor (walls of text, no headings, topic unclear)
2 = Weak (some structure but confusing or inconsistent headings)
3 = Adequate (clear headings and paragraphs, topic identifiable)
4 = Good (clean H1-H2-H3 hierarchy, scannable, purpose obvious)
5 = Excellent (perfect heading outline, semantic HTML, zero noise)

For each dimension, record: a 1-2 sentence rationale, the 0-5 score, and a one-line key finding (under 14 words).
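
One way to keep the per-dimension records consistent is a small validator. The sketch below is illustrative only: the record shape (`dimension`, `rationale`, `score`, `keyFinding`) is hypothetical, and treating scores as whole numbers is an assumption based on the rubric's integer anchors.

```javascript
// The six dimension names from the rubric above.
const DIMENSIONS = [
  "Answer Readiness",
  "Quotability",
  "Evidence Density",
  "Content Depth",
  "Freshness",
  "Structural Clarity",
];

// Returns a list of problems; an empty list means the record is well-formed.
function validateDimensionRecord(record) {
  const problems = [];
  if (!DIMENSIONS.includes(record.dimension)) {
    problems.push("unknown dimension");
  }
  if (typeof record.rationale !== "string" || record.rationale.trim() === "") {
    problems.push("rationale is missing");
  }
  // Assumption: scores are whole numbers from 0 to 5.
  if (!Number.isInteger(record.score) || record.score < 0 || record.score > 5) {
    problems.push("score must be an integer from 0 to 5");
  }
  // "Under 14 words" means at most 13.
  const words = String(record.keyFinding ?? "").trim().split(/\s+/).filter(Boolean);
  if (words.length === 0 || words.length >= 14) {
    problems.push("key finding must be 1 to 13 words");
  }
  return problems;
}
```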

Step 5: Compute the final score

Intelligence score = average(6 dimension scores) × 20, which rescales the 0-5 average to 0-100.
Final score = round(0.5 × foundationalScore + 0.5 × intelligenceScore).
Grade from the final score:

Grade	Range
A+	95-100
A	90-94
A-	85-89
B+	80-84
B	75-79
B-	70-74
C+	65-69
C	60-64
C-	55-59
D	40-54
F	below 40
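
The Step 5 arithmetic can be sketched as below. The function names are hypothetical, and keeping the intelligence score unrounded until the final `round` is an assumption, since only the final score is stated to be rounded.

```javascript
// Lower bound of each grade band, highest first, taken from the table above.
const GRADE_BANDS = [
  [95, "A+"], [90, "A"], [85, "A-"],
  [80, "B+"], [75, "B"], [70, "B-"],
  [65, "C+"], [60, "C"], [55, "C-"],
  [40, "D"],
];

function gradeFor(finalScore) {
  for (const [minimum, grade] of GRADE_BANDS) {
    if (finalScore >= minimum) return grade;
  }
  return "F"; // below 40
}

function computeFinalScore(foundationalScore, dimensionScores) {
  if (dimensionScores.length !== 6) {
    throw new Error("Expected exactly 6 dimension scores");
  }
  const total = dimensionScores.reduce((sum, s) => sum + s, 0);
  const intelligenceScore = (total / 6) * 20; // 0-5 average rescaled to 0-100
  const finalScore = Math.round(0.5 * foundationalScore + 0.5 * intelligenceScore);
  return { intelligenceScore, finalScore, grade: gradeFor(finalScore) };
}
```

For example, a foundational score of 72 with dimension scores 3, 4, 2, 3, 3, 4 gives an intelligence score of about 63.3, a final score of 68, and a grade of C+.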

Sanity-check your intelligence score against heuristicIntelligenceSignals in the JSON. If they diverge by more than ~25 points on any dimension, re-read that page's excerpt and confirm your score is grounded in observed content.
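
The divergence check might look like the sketch below. It rests on two loudly stated assumptions: that `heuristicIntelligenceSignals` is an object keyed by dimension name, and that its values are on a 0-100 scale, so each 0-5 score is multiplied by 20 before comparing. Neither is confirmed here, so verify against the actual JSON.

```javascript
// Sketch: list the dimensions whose agent score and heuristic prior disagree
// by more than `threshold` points on an assumed 0-100 scale.
function findDivergentDimensions(agentScores, heuristicSignals, threshold = 25) {
  return Object.entries(agentScores)
    .map(([dimension, score]) => ({
      dimension,
      agent: score * 20, // 0-5 rescaled to 0-100
      heuristic: heuristicSignals[dimension], // assumed 0-100
    }))
    .filter(
      (row) =>
        typeof row.heuristic === "number" &&
        Math.abs(row.agent - row.heuristic) > threshold
    );
}
```

Any dimension returned is a prompt to re-read the excerpt, not an instruction to adopt the heuristic value.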

Step 6: Write the audit report

Write a Markdown report to <output-dir>/aeo_audit_report.md using the format in Report Format below. Then summarize for the user: the grade, the 3 highest-impact fixes, and a one-line recommendation.

Step 7: Hand off

If the user wants to act on the findings:

To fix a codebase → recommend the improve-aeo-geo skill, passing this report as input.
To re-measure after fixes → re-run this skill on the same URL and compare scores.

The 16 deterministic checks

Run by the script. For reference (id — what it verifies — points):

Check	Verifies	Pts
`title`	`<title>` present, 10+ chars	10
`meta-description`	Meta description present, 50+ chars	10
`canonical`	`<link rel="canonical">` present	8
`h1`	Exactly one `<h1>`	8
`schema`	At least 1 JSON-LD block	8
`schema-types`	A recognized schema.org `@type` is used	8
`og`	`og:title` and `og:description` present	8
`internal-links`	5+ internal links	10
`image-alt`	80%+ of images have alt text	8
`text-depth`	250+ words of body text	12
`indexability`	No `noindex` directive	10
`ai-meta-tags`	No `nosnippet` / `noai` / `noimageai`	6
`heading-hierarchy`	2+ heading levels, no skipped levels	6
`llms-txt`	Valid `llms.txt` (heading + links + 100+ chars)	10
`ai-bot-access`	robots.txt does not block 9 major AI crawlers	12
`rss-feed`	RSS or Atom feed discoverable	8
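
As an illustration of what one per-page check involves, the `heading-hierarchy` rule could be approximated as below. This is a sketch, not the bundled script's logic: it assumes the input is the list of heading levels in document order, and it reads "no skipped levels" as "no heading sits more than one level deeper than the heading before it".

```javascript
// Sketch: `levels` is e.g. [1, 2, 3, 2] for h1, h2, h3, h2 in document order.
function checkHeadingHierarchy(levels) {
  // Requires at least two distinct heading levels.
  if (new Set(levels).size < 2) return false;
  for (let i = 1; i < levels.length; i++) {
    // Going deeper by more than one level (h1 then h3) counts as a skip.
    // Returning to any shallower level is allowed.
    if (levels[i] - levels[i - 1] > 1) return false;
  }
  return true;
}
```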

A site-wide check passes when 80%+ of crawled pages pass it (the script handles aggregation). Foundational score = earned points ÷ 142 × 100.
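
For reference, the aggregation and scoring rule works out as in the sketch below. It is for understanding only: the script's reported score stays authoritative and is not recomputed during an audit. The function names and input shapes are hypothetical, and the script may round its result, which this sketch does not.

```javascript
// Point values copied from the checks table above.
const CHECK_POINTS = {
  "title": 10, "meta-description": 10, "canonical": 8, "h1": 8,
  "schema": 8, "schema-types": 8, "og": 8, "internal-links": 10,
  "image-alt": 8, "text-depth": 12, "indexability": 10, "ai-meta-tags": 6,
  "heading-hierarchy": 6, "llms-txt": 10, "ai-bot-access": 12, "rss-feed": 8,
};
const TOTAL_POINTS = Object.values(CHECK_POINTS).reduce((a, b) => a + b, 0); // 142

// `pageResults` is an array of booleans, one per crawled page.
function passesSiteWide(pageResults) {
  if (pageResults.length === 0) return false;
  const passed = pageResults.filter(Boolean).length;
  return passed / pageResults.length >= 0.8;
}

// `siteWideResults` maps check id to true/false after aggregation.
function foundationalScore(siteWideResults) {
  let earned = 0;
  for (const [id, points] of Object.entries(CHECK_POINTS)) {
    if (siteWideResults[id]) earned += points;
  }
  return (earned / TOTAL_POINTS) * 100;
}
```

A site that fails only `llms-txt` and `rss-feed` earns 124 of 142 points, or about 87.3.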

Report Format

# AEO/GEO Audit — [domain]

**Audited:** [date] · **Pages crawled:** [N]

## Score

| | Score | |
|---|---|---|
| Foundational (16 checks) | XX/100 | |
| Intelligence (6 dimensions) | XX/100 | |
| **Final** | **XX/100** | **Grade: X** |

[One-sentence verdict on AI-citation readiness.]

## Foundational Checks

[Table of the 16 checks: ✓/✗, label, detail. Group failures at the top.]

## Intelligence Evaluation

For each of the 6 dimensions: score (X/5 → XX/100), rationale, key finding.

## Prioritized Fixes

Numbered list, highest impact first. For each: what to change, why it matters,
impact/effort. Pull from `prioritizedFixes` and your dimension findings.

## Weakest Pages

[From `worstPages` — URL and per-page %.]

## Recommendation

[2-3 sentences: biggest opportunity, and whether to run improve-aeo-geo next.]

Rules

Never fabricate the crawl. Always run the script. If it fails, report the failure — don't invent pages or scores.
The foundational score is the script's output. Don't recompute or adjust it.
Score intelligence only from observed content. Base every dimension score on textExcerpt / headings / metadata in pagesForReview. No assumptions about unseen pages.
Rationale before score. Write why, then the number — for every dimension.
One report file, saved to the output directory. Don't scatter partial outputs.
This skill diagnoses; it does not edit code. Code fixes are the job of improve-aeo-geo.

Research References

Sources for the statistics cited in the rubric:

Claim	Source
Quotations = +41% visibility; Statistics = +33%; Cite Sources = +28%; in-text citations = +115% for lower-ranked sites	Aggarwal et al., "GEO: Generative Engine Optimization," KDD 2024 (arXiv)
44.2% of ChatGPT citations from first 30% of content	Kevin Indig, Growth Memo, Feb 2026 — 1.2M AI answers
Comparison tables 2.8x citations; FAQ blocks +156%	AirOps, 2025 — structuring content for LLMs
Clean heading hierarchy = 3.2x more citations vs unstructured	AirOps, 2025
76% of ChatGPT's most-cited pages updated within 30 days; AI cites content 25.7% fresher than organic	Ahrefs, 2025 — 17M citations across 7 AI platforms
Long-form (2000+ words) gets 3x more citations	SE Ranking, Nov 2025 — 2.3M pages, 295K domains