---
name: static-seo
description: >
  Audits and improves SEO for static HTML sites. Use when the user asks to
  audit, set up, or improve SEO on a static site (Hugo, Jekyll, 11ty,
  Gatsby, Next.js static export, hand-rolled HTML, or wp-static-clone
  output), or mentions head metadata, structured data, JSON-LD, sitemaps,
  IndexNow, Open Graph images, schema endpoints, NLWeb, hreflang, or
  search engine indexing for a static site. For Astro projects, use
  astro-seo instead; its recipes produce less hand-rolled boilerplate
  by routing through @jdevalk/astro-seo-graph.
---
# Static SEO

Audits and improves the SEO setup of a static HTML site against nine categories: head metadata, structured data, content quality, Open Graph images, sitemaps and indexing, agent discovery, performance, redirects, and CI validation. Recipes are platform-neutral: raw `<meta>` tags, raw JSON-LD, hand-rolled `sitemap.xml`, generic CI tooling. The audit framework parallels `astro-seo` but without the `@jdevalk/astro-seo-graph` spine.

Code recipes live in `AGENTS.md`; read it when you need to implement a specific fix. This file has the workflow and audit checklist.
## Workflow
- Detect the project — confirm this is a static site, identify the build tool, find where to apply changes.
- Audit — score nine categories and produce actionable findings.
- Improve — generate or modify files to close the gaps. Recipes are in `AGENTS.md`.
- Metadata pass — invoke `metadata-check` on every short string the skill generated (titles, descriptions, schema `description` fields, FAQ answers).
- Verify — run any build, validate the output, remind the user about non-file tasks (Search Console, Bing Webmaster Tools, IndexNow key submission).
## Phase 0: Detect the project
Confirm the basics before auditing:
- It is actually a static site. Look for built HTML: `index.html` at the repo root, or under `_site/` (Jekyll, 11ty), `public/` (Hugo, Gatsby), `dist/` (Vite / Astro static), `out/` (Next.js with `output: 'export'`). If everything is server-rendered (Express, PHP, dynamic routes), this is the wrong skill: point at `astro-seo` for Astro, `wp-readme-optimizer` for WP plugin pages, or recommend a generic SEO audit instead.
- Source build tool. Drives where to apply head metadata changes:
  - `_config.yml` / `_layouts/` → Jekyll
  - `hugo.toml` / `config.toml` / `config.yaml` / `themes/` → Hugo
  - `.eleventy.js` / `eleventy.config.js` → 11ty (`_includes/` alone is ambiguous, since Jekyll uses it too)
  - `gatsby-config.js` / `src/components/SEO.*` → Gatsby
  - `next.config.js` with `output: 'export'` and `pages/_document.tsx` (or `app/layout.tsx` with the App Router) → Next.js static export
  - `astro.config.mjs` → Astro (in which case use `astro-seo` instead)
  - No build config, just HTML files → hand-rolled or scraped (e.g. `wp-static-clone` output). Edit the HTML directly or use a post-process script.
- Canonical site URL. Search the repo for the production origin, typically in `_config.yml` (`url`), `config.toml` (`baseURL`), `astro.config`, or hardcoded into a layout. If it's missing, empty, or `localhost`, flag it as a blocking issue before anything else. Canonicals, sitemaps, OG image URLs, and JSON-LD `@id` values all derive from this.
- Deployment target. Read `vercel.json`, `netlify.toml`, `wrangler.toml`, or `public/_headers` to determine the host. Drives redirect and header syntax in Phase 2.
- Is the site multilingual? Check for locale subdirectories (`/en/`, `/nl/`, `/de/`) or build-tool i18n config. If yes, hreflang matters; if no, skip it.
- What's already in `<head>`? Open one built HTML file and inventory: title, description, canonical, robots, Open Graph, Twitter cards, JSON-LD, hreflang. The audit in Phase 1 is faster if you can reference what's there.
Ask only what you can't detect.
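A minimal sketch of the build-tool detection, in Python with only the standard library. The marker table is an assumption drawn from the list above plus a few newer file names (`hugo.toml`, `eleventy.config.js`, `next.config.mjs`); extend it for the projects you actually meet. It returns a label, not a verdict, so still open the config before editing templates.

```python
import re
from pathlib import Path

# Marker files per generator, checked in order; the first match wins.
# `_includes/` is deliberately absent: Jekyll and 11ty both use it.
MARKERS: list[tuple[str, tuple[str, ...]]] = [
    ("astro", ("astro.config.mjs", "astro.config.ts")),  # route to astro-seo
    ("jekyll", ("_config.yml", "_layouts")),
    ("hugo", ("hugo.toml", "config.toml", "config.yaml", "themes")),
    ("eleventy", (".eleventy.js", "eleventy.config.js")),
    ("gatsby", ("gatsby-config.js", "gatsby-config.ts")),
]
NEXT_CONFIGS = ("next.config.js", "next.config.mjs", "next.config.ts")
# Matches `output: 'export'` or `output: "export"` inside a Next.js config.
NEXT_EXPORT = re.compile(r"""output\s*:\s*['"]export['"]""")


def detect_build_tool(root: Path) -> str:
    """Best-effort label for whatever produces the HTML under `root`."""
    for name in NEXT_CONFIGS:
        config = root / name
        if config.exists():
            # Without output: 'export', Next.js renders on a server.
            return "nextjs-export" if NEXT_EXPORT.search(config.read_text()) else "nextjs-server"
    for tool, markers in MARKERS:
        if any((root / marker).exists() for marker in markers):
            return tool
    # No build config: hand-rolled or scraped HTML (e.g. wp-static-clone output).
    if next(root.rglob("*.html"), None) is not None:
        return "hand-rolled"
    return "unknown"
```

For example, `detect_build_tool(Path("."))` at the root of a Jekyll repo returns `"jekyll"`; `"nextjs-server"` means the project is out of scope for this skill.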
## Phase 1: Audit
Score each category out of 10. For each, give 2–4 specific findings that quote the actual HTML, config, or template. Within each category, checks are tiered:
- Must — ship blockers. A failure causes visible SEO regression.
- Should — standard practice. Skipping costs reach.
- Nice — forward-looking or situational. Useful but not baseline for every site.
Skip Nice checks for small personal sites unless the user asks for the full treatment.
### 1. Head metadata (/10)
- Must — every page has a `<title>` and `<meta name="description">`.
- Must — `<link rel="canonical">` set, with tracking parameters stripped, derived from the production origin.
- Must — canonical omitted when the page is `noindex` (per Google's recommendation).
- Must — `<title>` length 30–65 characters, `<meta name="description">` length 70–200 characters (the SERP-truncation bounds; same as `metadata-check` defaults).
- Should — `<meta name="robots">` includes `max-snippet:-1`, `max-image-preview:large`, `max-video-preview:-1`.
- Should — Open Graph (`og:title`, `og:description`, `og:image`, `og:url`, `og:type`, `og:site_name`) on every page.
- Should — Twitter Card tags that duplicate Open Graph (`twitter:title`, `twitter:description`, `twitter:image`) omitted, since X (Twitter) falls back to the `og:*` equivalents. Keep `twitter:card` itself: it has no Open Graph equivalent.
- Should — `hreflang` alternates on multilingual sites. Skip if monolingual.
- Should — single `<meta name="robots">` and single `og:image` per page (legacy tooling sometimes emits duplicates).
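The canonical, robots, and duplicate-tag checks can be scripted against one built page. This is a sketch assuming Python's `html.parser`; `HeadInventory`, `head_issues`, and the tracking-parameter prefixes are hypothetical names, and the length bounds are left to the title/description check under CI validation.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit

# Assumed tracking-parameter prefixes; extend for the platforms you use.
TRACKING_PREFIXES = ("utm_", "fbclid", "gclid", "mc_")


class HeadInventory(HTMLParser):
    """Collects canonical, robots, and og:image tags from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []
        self.robots: list[str] = []
        self.og_images: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonicals.append(a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots.append(a.get("content", "").lower())
        elif tag == "meta" and a.get("property") == "og:image":
            self.og_images.append(a.get("content", ""))


def head_issues(html: str, origin: str) -> list[str]:
    inv = HeadInventory()
    inv.feed(html)
    issues: list[str] = []
    noindex = any("noindex" in content for content in inv.robots)
    if len(inv.robots) > 1:
        issues.append("duplicate <meta name=robots>")
    if len(inv.og_images) > 1:
        issues.append("duplicate og:image")
    if noindex:
        # A noindex page should not also declare a canonical.
        if inv.canonicals:
            issues.append("canonical present on a noindex page")
        return issues
    if len(inv.canonicals) != 1:
        issues.append(f"expected one canonical, found {len(inv.canonicals)}")
        return issues
    url = urlsplit(inv.canonicals[0])
    if f"{url.scheme}://{url.netloc}" != origin.rstrip("/"):
        issues.append(f"canonical is not on the production origin: {inv.canonicals[0]}")
    if any(key.startswith(TRACKING_PREFIXES) for key, _ in parse_qsl(url.query)):
        issues.append("canonical carries tracking parameters")
    return issues
```

Run it over each built page with the production origin from Phase 0; an empty list means the page passes these particular checks, nothing more.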
### 2. Structured data / JSON-LD (/10)
- Must — at least one `<script type="application/ld+json">` block on every important page.
- Should — linked `@graph` rather than a flat `Article` object: entities wired with `@id` references so a `BlogPosting` can point at its `Person` author and `WebPage` parent.
- Should — `WebSite`, `Blog` / `WebPage`, `Person` / `Organization`, `BlogPosting` / `Article`, `BreadcrumbList`, `ImageObject` all present where relevant.
- Should — trust signals: `publishingPrinciples`, `copyrightHolder`, `copyrightYear`, `knowsAbout`, `SearchAction`.
- Must — validates in Rich Results Test and ClassySchema.
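To make "linked `@graph`" concrete, here is a sketch that emits `WebSite`, `Person`, `WebPage`, and `BlogPosting` nodes wired by `@id` instead of nesting, plus a check for references that point at no node. The `#website` / `#person` fragment scheme and the `post` dict keys are assumptions, not a required convention; any stable, unique `@id` URIs work.

```python
import json


def build_graph(origin: str, post: dict) -> dict:
    """A linked JSON-LD @graph: nodes refer to each other by @id, not by nesting."""
    site_id = f"{origin}/#website"
    person_id = f"{origin}/#person"
    page_url = f"{origin}{post['path']}"
    page_id = f"{page_url}#webpage"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "WebSite", "@id": site_id, "url": f"{origin}/",
             "name": post["site_name"], "publisher": {"@id": person_id}},
            {"@type": "Person", "@id": person_id, "name": post["author"]},
            {"@type": "WebPage", "@id": page_id, "url": page_url,
             "isPartOf": {"@id": site_id}},
            {"@type": "BlogPosting", "@id": f"{page_url}#article",
             "headline": post["title"], "datePublished": post["published"],
             "author": {"@id": person_id}, "mainEntityOfPage": {"@id": page_id}},
        ],
    }


def dangling_refs(graph: dict) -> set[str]:
    """@id references that point at no node in the graph (a common wiring bug)."""
    ids = {node["@id"] for node in graph["@graph"]}
    refs = set()
    for node in graph["@graph"]:
        for value in node.values():
            if isinstance(value, dict) and set(value) == {"@id"}:
                refs.add(value["@id"])
    return refs - ids


def script_tag(graph: dict) -> str:
    # Escape "</" so a string value can never close the <script> element early.
    body = json.dumps(graph, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```

Keeping every entity at the top level of `@graph` means a template can add `BreadcrumbList` or `ImageObject` nodes without restructuring the others; `dangling_refs` is a cheap CI guard before the Rich Results Test.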
### 3. Content quality (/10)
- Must — every page has a unique `<title>` and `<meta name="description">` (no duplicate metadata across the site).
- Must — exactly one `<h1>` per page.
- Should — `<title>` and `<meta name="description">` audited via `metadata-check` (Phase 2.5) for front-loading, concreteness, and SERP fit.
- Should — body prose audited via `readability-check` for individual long-form posts (don't bulk-audit).
- Should — every `<img>` has an `alt` attribute (or `alt=""` / `role="presentation"` for decorative).
- Should — internal links use root-relative paths (`/foo/`) not absolute (`https://...`), which survives domain changes and previews.
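A sketch of the heading, alt-text, and link checks for one page, again with Python's `html.parser`. `BodyAudit` and `body_issues` are hypothetical helpers; they treat `alt=""` and `role="presentation"` as valid, flag `<img>` tags with neither, and flag `<a>` links that hard-code the production origin instead of a root-relative path.

```python
from html.parser import HTMLParser


class BodyAudit(HTMLParser):
    """Counts <h1> tags, <img> tags lacking alt text, and absolute internal links."""

    def __init__(self, origin: str) -> None:
        super().__init__()
        self.origin = origin.rstrip("/")
        self.h1_count = 0
        self.missing_alt: list[str] = []
        self.absolute_internal: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "h1":
            self.h1_count += 1
        elif tag == "img":
            # alt="" and role="presentation" both mark an image as decorative.
            if "alt" not in attributes and attributes.get("role") != "presentation":
                self.missing_alt.append(attributes.get("src") or "?")
        elif tag == "a":
            href = attributes.get("href") or ""
            if href.startswith(self.origin + "/"):
                self.absolute_internal.append(href)


def body_issues(html: str, origin: str) -> list[str]:
    audit = BodyAudit(origin)
    audit.feed(html)
    issues = []
    if audit.h1_count != 1:
        issues.append(f"expected exactly one <h1>, found {audit.h1_count}")
    issues += [f"<img> without alt: {src}" for src in audit.missing_alt]
    issues += [f"internal link is absolute: {href}" for href in audit.absolute_internal]
    return issues
```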
### 4. Open Graph images (/10)
- Must — every page has an OG image.
- Must — 1200×675 (Google Discover minimum 1200px wide, 16:9 ratio).
- Should — JPEG (social platforms don't reliably support WebP / AVIF for OG).
- Should — generated at build time (satori-cli, sharp, ImageMagick, Vercel OG, Bannerbear) rather than uploaded manually per page — prevents drift.
- Should — URL derived deterministically from the slug (e.g. `/og/<slug>.jpg`) so adding a page automatically gets an image.
- Should — every `<img>` in body content has an `alt` attribute (or `alt=""` / `role="presentation"`). Validated in CI via `html-proofer` or similar.
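Two pieces of the OG-image checklist are easy to automate: deriving the URL from the slug, and confirming the generated file really is a 1200×675 JPEG. The sketch below reads the dimensions from the JPEG's start-of-frame segment with the standard library only. `og_image_url` assumes the `/og/<slug>.jpg` layout mentioned above; the other names are hypothetical.

```python
import struct
from pathlib import Path

OG_SIZE = (1200, 675)  # width, height from the Must check above


def og_image_url(origin: str, slug: str) -> str:
    """Assumed /og/<slug>.jpg convention: every new page gets an image path."""
    return f"{origin.rstrip('/')}/og/{slug.strip('/')}.jpg"


def jpeg_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the first start-of-frame segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError(f"corrupt marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF0..SOF15 carry the frame size, except DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length  # skip marker bytes plus the segment payload
    raise ValueError("no SOF segment found")


def check_og_image(path: Path) -> list[str]:
    if path.suffix.lower() not in (".jpg", ".jpeg"):
        return [f"{path.name}: not a JPEG"]
    width, height = jpeg_size(path.read_bytes())
    if (width, height) != OG_SIZE:
        return [f"{path.name}: {width}x{height}, expected {OG_SIZE[0]}x{OG_SIZE[1]}"]
    return []
```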
### 5. Sitemaps and indexing (/10)
- Must — `/sitemap.xml` (or `/sitemap_index.xml` for split sitemaps) reachable, valid XML, every URL returns 200.
- Must — `robots.txt` present at the site root and references the sitemap.
- Must — RSS feed exists, advertised via `<link rel="alternate" type="application/rss+xml">`, contains full post content (not truncated excerpts).
- Should — split per-section if the site has multiple content types (`sitemap-posts.xml`, `sitemap-pages.xml`), which makes indexing easier to debug in GSC.
- Should — `<lastmod>` populated from git commit timestamps (most accurate) or frontmatter dates. Avoid build timestamps, which mark every URL as changed on every deploy, and filesystem `mtime` from a CI checkout, which is the checkout date, not the content date. For git timestamps in CI, fetch full history (e.g. `fetch-depth: 0` on `actions/checkout`); a shallow clone gives every file the same commit date.
- Should — IndexNow integrated. Verification key as a static `/<key>.txt` route at the site root, plus a build-or-deploy hook that POSTs new URLs to `https://api.indexnow.org/IndexNow`. Gate the submission on the production host (e.g. `CF_PAGES_BRANCH=main`, `VERCEL_ENV=production`, `CONTEXT=production`). Unconditional submission pings the endpoint from local builds with URLs the production host hasn't served yet, which gets the key marked invalid (403) and forces rotation.
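A sketch of the sitemap, `robots.txt`, and IndexNow pieces in Python. Assumptions: `git log -1 --format=%cI -- <path>` supplies `<lastmod>`, the `Content-Signal` value and its placement inside the `User-agent` group are a placeholder policy, and the production check mirrors the three environment variables named above. `submit_indexnow` returns `None` without touching the network unless one of those variables indicates production.

```python
import json
import os
import subprocess
import urllib.request
from collections.abc import Mapping
from xml.sax.saxutils import escape

INDEXNOW_ENDPOINT = "https://api.indexnow.org/IndexNow"
DEFAULT_SIGNAL = "ai-train=yes, search=yes, ai-input=yes"  # placeholder policy


def git_lastmod(path: str) -> str:
    """Committer date (ISO 8601) of the last commit touching `path`, or ''."""
    result = subprocess.run(["git", "log", "-1", "--format=%cI", "--", path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def sitemap_xml(entries: list[tuple[str, str]]) -> str:
    """entries: (absolute URL, lastmod) pairs; an empty lastmod is omitted."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for loc, lastmod in entries:
        stamp = f"<lastmod>{lastmod}</lastmod>" if lastmod else ""
        lines.append(f"  <url><loc>{escape(loc)}</loc>{stamp}</url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def robots_txt(origin: str, content_signal: str = DEFAULT_SIGNAL) -> str:
    return (f"User-agent: *\nContent-Signal: {content_signal}\nAllow: /\n\n"
            f"Sitemap: {origin.rstrip('/')}/sitemap.xml\n")


def is_production(env: Mapping[str, str]) -> bool:
    """True when any of the host variables above says this is the production deploy."""
    return (env.get("CF_PAGES_BRANCH") == "main"
            or env.get("VERCEL_ENV") == "production"
            or env.get("CONTEXT") == "production")


def indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    return {"host": host, "key": key,
            "keyLocation": f"https://{host}/{key}.txt", "urlList": urls}


def submit_indexnow(host, key, urls, env=None):
    """POST new URLs; returns the HTTP status, or None when skipped."""
    env = os.environ if env is None else env
    if not urls or not is_production(env):
        return None  # local or preview build: never ping IndexNow
    request = urllib.request.Request(
        INDEXNOW_ENDPOINT, method="POST",
        data=json.dumps(indexnow_payload(host, key, urls)).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"})
    # urlopen raises HTTPError on 4xx, e.g. 403 when the key file doesn't match.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Run `git_lastmod` from the repo root after a full-history checkout; on a shallow clone it reports the same commit for every file.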
### 6. Agent discovery (/10)
- Should — schema endpoints (`/schema/<type>.json`) exposing corpus-wide JSON-LD per content type. Static JSON files committed to the repo or regenerated by the build.
- Should — schema map (`/schemamap.xml`) listing every schema endpoint, with a `Schemamap:` directive in `robots.txt`.
- Should — `llms.txt` at the site root listing pages (title + description) for LLM consumers. Static text file; one line per page.
- Should — markdown-alternate URLs (`/blog/post.md` next to `/blog/post/`) serving clean markdown with YAML frontmatter so AI agents can consume content without HTML parsing. Either commit `.md` siblings, or negotiate on the `Accept` header with a Cloudflare Transform Rule that rewrites the URL to the `.md` sibling. Cloudflare Pages strips custom `Vary` headers, so don't count on `Vary: Accept`; rely on the Transform Rule's URL rewrite alone (recipe in `AGENTS.md`).
- Should — API catalog at `/.well-known/api-catalog` per RFC 9727, as `application/linkset+json` (RFC 9264). Lists schema endpoints, schemamap, RSS feed, and any site-specific APIs. Static JSON file; since it has no file extension, configure the host to send that content type explicitly.
- Should — Content Signals directive in `robots.txt` (`Content-Signal: ai-train=yes, search=yes, ai-input=yes` or your preferred policy). One line, IETF draft, low cost.
- Should — `Link` header on `/*` pointing to discovery files (sitemap, `llms.txt`, api-catalog, schemamap). Agents reading response headers find them without parsing HTML. Host-specific (`_headers` on Cloudflare Pages / Netlify, `vercel.json` on Vercel, server config elsewhere).
- Nice — MCP server card at `/.well-known/mcp/server-card.json` and / or A2A agent card at `/.well-known/agent-card.json`. Only relevant when the site exposes an MCP server or A2A agent.
- Nice — `<link rel="nlweb">` pointing to a conversational endpoint. NLWeb adoption is early; the tag is one line and worth having, but not a scoring blocker in 2026.
- Nice — ARD (Agentic Resource Discovery) catalog at `/.well-known/ai-catalog.json`, listing what the domain offers (MCP server, A2A agent, OKF bundle, site-specific APIs). It is a v0.9 draft, so optional, but it is the discovery layer through which the MCP / A2A cards and OKF bundle get found. Two details to get right: the base spec (Agent-Card/ai-catalog) names the media-type field `mediaType` while the ARD layer (ards-project/ard-spec) names it `type`, so emit both keys with the same value and the entry validates under either reading (both specs require consumers to ignore unknown keys); and ARD adds an optional `representativeQueries` array of sample prompts per entry. Static JSON file.
- Nice — OKF (Open Knowledge Format) bundle: a tree of typed Markdown concept files (one per page, paths mirroring canonical URLs) packaged as a single `.tar.gz`, with index files in between. Serving and discovery are explicit non-goals of OKF; that's what the ARD catalog entry is for. v0.9 draft, optional. There's no registered media type yet, so the interim string is `application/okf-bundle+gzip` (tracked in knowledge-catalog#111 and ard-spec#27); label it as interim, since it may change. Best generated from the same source the rest of the site derives from rather than a hand-maintained copy.
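Three of the agent-discovery files are small enough to generate in a few lines. This sketch assumes the llmstxt.org layout for `llms.txt` (H1, blockquote summary, one bullet per page) and the RFC 9727 shape for the API catalog (a linkset whose `anchor` is the catalog URL and whose `item` links point at each resource). The `rel` names passed to `discovery_link_header` are illustrative: `api-catalog` is the relation RFC 9727 registers, the others are your choice.

```python
import json


def llms_txt(site_name: str, summary: str, pages: list[dict]) -> str:
    """H1, blockquote summary, then one bullet per page (title, URL, description)."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    lines += [f"- [{p['title']}]({p['url']}): {p['description']}" for p in pages]
    return "\n".join(lines) + "\n"


def api_catalog(origin: str, paths: list[str]) -> str:
    """RFC 9727 catalog body; serve with Content-Type: application/linkset+json."""
    origin = origin.rstrip("/")
    linkset = {"linkset": [{
        "anchor": f"{origin}/.well-known/api-catalog",
        "item": [{"href": f"{origin}{path}"} for path in paths],
    }]}
    return json.dumps(linkset, indent=2) + "\n"


def discovery_link_header(links: dict[str, str]) -> str:
    """`Link` header value built from {rel: path}."""
    return ", ".join(f'<{path}>; rel="{rel}"' for rel, path in links.items())
```

Write the outputs to `llms.txt` and `.well-known/api-catalog` in the build directory, and put the header value on `/*` in whichever host file Phase 0 identified.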
### 7. Performance (/10)
- Must — hashed assets serve `Cache-Control: public, max-age=31536000, immutable` (or equivalent) so they only download once.
- Should — primary web font preloaded as `woff2` via `<link rel="preload" as="font" type="font/woff2" crossorigin>`.
- Should — `No-Vary-Search` response header strips UTM parameters from the cache key (so `/?utm_source=x` and `/` share a cache entry).
- Should — no render-blocking JavaScript on first paint (defer or async non-critical scripts).
- Should — images use modern formats (WebP / AVIF for in-page content; JPEG only for OG cards) and `loading="lazy"` below the fold.
- Should — Core Web Vitals in the green: LCP ≤ 2.5 s, INP ≤ 200 ms, CLS ≤ 0.1. Read them from field data (PageSpeed Insights / CrUX) when the site has enough traffic; a default Lighthouse run is a lab test that can't measure INP and reports Total Blocking Time as a proxy.
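A sketch that writes the two caching headers into a `_headers` file (the syntax Cloudflare Pages and Netlify share) and into an equivalent `vercel.json` `headers` block. It assumes fingerprinted assets live under `/assets/`; adjust the prefix to match your build output. The `No-Vary-Search` value lists the ignored query parameters as a structured-field inner list.

```python
UTM_PARAMS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")
IMMUTABLE = "public, max-age=31536000, immutable"


def no_vary_search(params=UTM_PARAMS) -> str:
    """Structured-field value telling caches to ignore these query parameters."""
    return "params=(" + " ".join(f'"{p}"' for p in params) + ")"


def headers_file(asset_prefix: str = "/assets/") -> str:
    """_headers text for Cloudflare Pages / Netlify: a path line, then indented headers."""
    return "\n".join([
        f"{asset_prefix}*",
        f"  Cache-Control: {IMMUTABLE}",
        "/*",
        f"  No-Vary-Search: {no_vary_search()}",
    ]) + "\n"


def vercel_headers(asset_prefix: str = "/assets/") -> dict:
    """Equivalent `headers` block to merge into vercel.json."""
    return {"headers": [
        {"source": f"{asset_prefix}(.*)",
         "headers": [{"key": "Cache-Control", "value": IMMUTABLE}]},
        {"source": "/(.*)",
         "headers": [{"key": "No-Vary-Search", "value": no_vary_search()}]},
    ]}
```

Only apply the immutable rule to paths whose filenames change when their content changes; HTML pages must stay revalidatable.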
### 8. Redirects and error handling (/10)
- Must — `_redirects` / `vercel.json` / server config maintained for every URL that ever existed and moved.
- Must — `301` not `302` for permanent moves (`308` is equivalent; Vercel's `permanent: true` emits `308`).
- Must — `404.html` (or equivalent) returns a 404 status, not 200. Verify with `curl -I` against a deliberately wrong URL.
- Should — fuzzy-match suggestion on the 404 page ("did you mean…?") for typos in URLs. Static implementation: pre-build a JSON index of valid slugs, fuzzy-match client-side.
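Since the redirect map should live in one place and be emitted for whichever host you deploy to, here is a sketch that renders the same old-to-new mapping as `_redirects` lines and as a `vercel.json` fragment, plus a `curl -I`-style status probe for the 404 check. The `statusCode` key is used instead of `permanent: true` to pin the status to 301; treat that as an assumption to confirm against Vercel's current docs.

```python
import urllib.error
import urllib.request


def redirects_file(moves: dict[str, str]) -> str:
    """`_redirects` lines (Netlify, Cloudflare Pages): source, destination, status."""
    return "".join(f"{old} {new} 301\n" for old, new in moves.items())


def vercel_redirects(moves: dict[str, str]) -> dict:
    """vercel.json `redirects`; statusCode pins 301 (permanent: true sends 308)."""
    return {"redirects": [
        {"source": old, "destination": new, "statusCode": 301}
        for old, new in moves.items()
    ]}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # don't follow: a 3xx surfaces as HTTPError, like `curl -I`


def status_of(url: str) -> int:
    """HEAD a URL and return the raw status code without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(urllib.request.Request(url, method="HEAD")) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 301, 404, ... all arrive here
```

`status_of("https://example.com/this-does-not-exist/")` should return `404`; a `200` there means the host is serving `404.html` with the wrong status.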
### 9. CI validation (/10)
- Must — broken-link checker in CI. linkinator or lychee on every push that touches content. Internal and external links both — internal catches build regressions, external catches link rot. Schedule a weekly run for external-only checks.
- Should — HTML validation via html-proofer or W3C Validator — catches malformed markup that breaks crawlers.
- Should — Lighthouse CI on every push, with score thresholds (e.g. SEO ≥ 95, Performance ≥ 80).
- Should — JSON-LD validation in CI via the Schema.org JSON-LD validator (no official CLI; pipe through a Node script that POSTs to the API).
- Should — title and description length validation in CI: a small script that walks built HTML, extracts `<title>` and `<meta name="description">`, flags anything outside SERP-truncation bounds (title 30–65, description 70–200).
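A minimal version of that CI script, assuming the built pages sit under one directory and the 30–65 / 70–200 bounds above. It also reports exact duplicates, which covers the "unique title and description" Must from content quality. Exclude pages such as `404.html` or redirect stubs if they trip it.

```python
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path

TITLE_BOUNDS = (30, 65)
DESCRIPTION_BOUNDS = (70, 200)


class TitleDescription(HTMLParser):
    """Extracts the document <title> text and the meta description content."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body":
            self._in_body = True  # an inline SVG <title> is not the page title
        elif tag == "title" and not self._in_body:
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_site(build_dir: Path) -> list[str]:
    """Length-bound and duplicate checks across every built HTML page."""
    problems: list[str] = []
    seen: dict[tuple[str, str], list[str]] = defaultdict(list)
    for page in sorted(build_dir.rglob("*.html")):
        parser = TitleDescription()
        parser.feed(page.read_text(encoding="utf-8"))
        name = page.relative_to(build_dir).as_posix()
        for field, text, (low, high) in (
            ("title", parser.title.strip(), TITLE_BOUNDS),
            ("description", parser.description, DESCRIPTION_BOUNDS),
        ):
            if not low <= len(text) <= high:
                problems.append(f"{name}: {field} is {len(text)} chars (want {low}-{high})")
            if text:
                seen[(field, text)].append(name)
    for (field, _), pages in seen.items():
        if len(pages) > 1:
            problems.append(f"duplicate {field} on: {', '.join(pages)}")
    return problems
```

In a CI step, call it on the build directory (for example `audit_site(Path("public"))`) and fail the job when the returned list is non-empty.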
## Phase 2: Improve
Based on the audit, produce concrete code. Always ask before overwriting. Read AGENTS.md for detailed recipes.
Branch on the Phase 0 findings. If the site uses a known build tool (Hugo, Jekyll, 11ty, Gatsby, Next.js), apply changes at the source level (templates, config, frontmatter). For hand-rolled or scraped HTML, post-process the built output. The recipes in `AGENTS.md` are output-shape recipes: each shows the HTML you want to end up with, plus a per-tool note on where to plumb it through.
AGENTS.md sections: Head metadata (the canonical block), JSON-LD graph, OG image generation (build-script options), sitemap.xml and robots.txt, IndexNow, llms.txt, Markdown alternates, API catalog, Content Signals, Link headers, Performance headers, Redirects by host, CI workflows.
## Phase 2.5: Metadata and readability pass
Invoke the metadata-check skill on every short string the skill generated or modified: page titles, meta descriptions, schema description fields, FAQ answers, and any frontmatter excerpt values you wrote. It checks front-loading, concreteness, filler, active voice, title / description duplication, difficult words, SERP-truncation fit (title 30–65, description 70–200), and one-idea-per-field. Apply fixes directly. Skip the pass for technical strings (URLs, schema @id values, enum values).
If the project has a blog or docs section, mention as a follow-up that the readability-check skill can audit individual posts for multi-paragraph prose quality — but don't audit the entire content corpus yourself.
## Phase 3: Verify
- Build the site (or rebuild if mid-iteration). Surface any build warnings.
- Spot-check the built HTML: one page's `<head>` should be clean, canonical correct, JSON-LD graph present and linked.
- Run the homepage through Rich Results Test and ClassySchema.
- Confirm `/sitemap.xml` exists, returns 200, and references the right URLs.
- Confirm `/robots.txt` references the sitemap and includes Content Signals.
- If IndexNow is wired, confirm the key verification route returns the key at `/<key>.txt`.
- Run the local broken-link check (`linkinator http://localhost:8080 --recurse` against a local server, or point linkinator at the build directory, which it serves itself).
- Remind the user about tasks that can't be automated:
  - Register the site in Google Search Console and Bing Webmaster Tools.
  - Submit the sitemap in both.
  - Generate an IndexNow key and commit the `.txt` verification file.
  - Install Plausible or equivalent privacy-friendly analytics.
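Some of these checks can be scripted against a local preview server. The sketch assumes the preview serves the built site and that `robots.txt` names the production origin; `robots_issues`, `indexnow_key_ok`, and `verify` are hypothetical helpers, and `fetch` lets `urllib` raise `HTTPError` on any 4xx/5xx response, so a missing file fails loudly.

```python
import urllib.request


def fetch(url: str) -> str:
    """GET a URL and return the body; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def robots_issues(robots: str, site_origin: str) -> list[str]:
    lines = [line.strip() for line in robots.splitlines()]
    # Directive names are case-insensitive; the value is the sitemap URL.
    sitemaps = {line.split(":", 1)[1].strip() for line in lines
                if line.lower().startswith("sitemap:")}
    origin = site_origin.rstrip("/")
    issues = []
    if not {f"{origin}/sitemap.xml", f"{origin}/sitemap_index.xml"} & sitemaps:
        issues.append("robots.txt does not reference the sitemap")
    if not any(line.lower().startswith("content-signal:") for line in lines):
        issues.append("robots.txt has no Content-Signal line")
    return issues


def indexnow_key_ok(body: str, key: str) -> bool:
    """The verification file should contain the key itself."""
    return body.strip() == key


def verify(preview: str, site_origin: str, key=None) -> list[str]:
    preview = preview.rstrip("/")
    issues = robots_issues(fetch(f"{preview}/robots.txt"), site_origin)
    fetch(f"{preview}/sitemap.xml")  # raises if the sitemap is missing
    if key is not None and not indexnow_key_ok(fetch(f"{preview}/{key}.txt"), key):
        issues.append(f"/{key}.txt does not contain the IndexNow key")
    return issues
```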
## Output format
## Static SEO audit: [site name]
### Score
| Category | Score |
| ------------------------------------- | ----: |
| 1. Head metadata | x/10 |
| 2. Structured data / JSON-LD | x/10 |
| 3. Content quality | x/10 |
| 4. Open Graph images | x/10 |
| 5. Sitemaps and indexing | x/10 |
| 6. Agent discovery | x/10 |
| 7. Performance | x/10 |
| 8. Redirects and error handling | x/10 |
| 9. CI validation | x/10 |
| **Total** | xx/90 |
### Findings
[Grouped by category. Quote actual HTML / config. Be specific.]
### Files generated or changed
[List with short description of each.]
### Next steps
[Non-file tasks: GSC, Bing Webmaster Tools, IndexNow key, analytics.]
## Key principles
- Audit the output, fix at the source. The skill checks the built HTML, but recommends changes at whatever layer produces it (Hugo template, Jekyll layout, 11ty include, post-process script). Don't recommend a fix that won't survive the next build.
- Static = host-portable. Every recipe should work on any static host. Where syntax differs (`_redirects` vs `vercel.json`), give both forms.
- Topics, not keyphrases. When reviewing content, focus on topical coverage and readability, not keyword density.
- Agent discovery matters now. Schema endpoints, schema map, llms.txt, API catalog, markdown alternates, MCP / A2A cards — the crawler is no longer the only consumer.
- Defer to `astro-seo` for Astro. That skill produces less hand-rolled boilerplate via `@jdevalk/astro-seo-graph`. `static-seo` is for everything else.