zbeam-crosslinker - SKILL.md Agent Skill

name: zbeam-crosslinker description: "Adds internal crosslinks to an existing page (target-first, density scaled to depth). Trigger on 'crosslink [slug]' or in a batch quality run."

Z-Beam Crosslinker

You add internal links to an existing page. Work target-first, not keyword-first.

The naive approach — scan the prose and link whatever common words appear — floods every page with steel, aluminum, and a handful of other over-linked hubs, while hundreds of less-known pages get nothing. Invert it: first decide which under-linked, topically relevant pages deserve a link from here, then hunt this page's prose for an interesting, distinctive place to introduce each one.

Two goals:

Spread link equity to a variety of less-known pages — strongly favour targets that currently have few inbound links over the popular hubs. Variety of targets is the point.
Place each link in a distinctive context with a specific anchor — never a bare, over-common anchor like "steel"; prefer a grade number, a full distinctive name, or a specific technical phrase, each sitting in a different section of the page.
Create a context when none exists. When an under-linked target is genuinely relevant but unmentioned, you may write a short, factually true clause that introduces it, then link it (see references/scoring-algorithm.md → Creating a context).

Linking is one-directional by design — no reciprocal or back-link logic. Density scales with page depth (Step 3 budget); links spread across distinct contexts, never clustered.

Input required: slug of the page to crosslink. Output: updated YAML file + diff summary at data/audit/crosslinks-[slug]-[YYYY-MM-DD].md

Modes

Default (manual / reviewed runs): full behaviour — existing-anchor links and created contexts.
existing-anchor-only (autonomous cadence, e.g. weekly-batch): place only existing-anchor links — never create a context. When a relevant under-linked target has no natural anchor, log it to data/audit/crosslink-created-context-candidates-[slug]-[date].json. Everything else (target-first selection, relevance gate, exclusion list, budget) is unchanged.

Step 0: Governance — prior knowledge check + hypothesis write

python3 skills/shared/check-prior-knowledge.py --slug [slug]

If USE_PRIOR or SUPPLEMENT: note prior findings and dead ends. If dead ends present (e.g. "previous crosslink attempt showed no impression lift"), factor into hypothesis.

Infer hypothesis from scoring results and orphan status — never ask the user. Example: "Adding [n] crosslinks to [slug] (orphan score: [x]). Hypothesis: internal linking will reduce orphan status and increase discovery via related pages, improving inbound-link count on targets within 28 days."

python3 skills/shared/change-log-utils.py write \
  --slug [slug] --skill zbeam-crosslinker --mode crosslink \
  --hypothesis "[inferred — specific, testable, 28-day window]"

Skip change-log write if zero links were added (run Step 0 check first; defer write to after Step 5 if budget is unknown).

Step 1: Build the crosslink index

Read references/scoring-algorithm.md for the full cache-first index build (Materials, Applications, Videos), inbound-link ledger construction, and bucket definitions (0–1 under-linked / 2–5 normal / ≥6 saturated). Remove the current page's own slug from the index — never link to self.

Step 2: Select target pages (target-first)

Read references/scoring-algorithm.md → Step 2: Target selection for the full relevance gate, ranking rules, and saturated-hub demotion logic. Produce ~10–15 ranked candidates.

Step 3: Find a distinctive context + anchor for each target

A context is any reader-facing body prose field on the page — links may land in any of them, not just page.description.

Discover contexts generically — do not use a fixed list. Walk the page's YAML and treat every sentence-bearing body string as a candidate: FAQ answers, every heading.description, every *.prose, page.description, problem / standard / regulatory *.description and itemDescription, body.sections[*].items[*].text / .description / .details[*], localContext.regulatoryNotes, enhancements.summary, and any other body field that holds prose.

Never place a link in these (not body prose, or not markdown-rendered):

SEO metadata: page.metaDescription, openGraph.*, twitter.*
Structured data: jsonLd.*, schema.*, any *Schema / videoSchema block
Titles / names / questions: page.title, any *.title, *.name, heading.title, FAQ question, longName, author-bio fields
keywords, citations[*].*, and any URL / slug / id / date / image / alt field

Render dependency: body prose becomes HTML via renderMarkdown (section & heading descriptions, prose, body sections) and parseSimpleMarkdown (FAQ answers — ExpertAnswersPanel); both turn [text](url) into <a href>. Any body field rendered through those is link-safe.

For each ranked candidate, find the single best (context, anchor) pair:

Acceptable anchor (best to worst): grade/model number (316L, 6061); full distinctive name (hastelloy, weld decay, passivation); specific multi-word technical phrase already in prose.
Reject generic anchors: bare steel, aluminum, metal, surface, coating, cleaning, oxide. Drop the target if only a generic anchor is available.
Natural fit: the sentence must already be about the target's topic. Never bend prose to fit a link.
One link per context.

If no distinctive natural anchor exists, read references/scoring-algorithm.md → Creating a context for rules (all mandatory) before writing any new prose.

Assemble the final set:

Walk ranked targets; take each target's best (context, anchor) pair.
Global de-dup: one link per target, one link per anchor phrase; no self-link; no link inside existing [...].
One link per context (second allowed only if context >400 chars, anchors >120 chars apart, different targets).
Per-page budget: ~1 link per 120 words of prose, capped at 10.
Variety check: if more than ~⅓ of links point to the same category, drop the weakest and substitute an under-linked candidate.

Step 4: Apply crosslinks surgically

(a) Existing anchor — replace the matched keyword with a markdown link:

new_text = f'[{matched_keyword}]({target_url})'

(b) Created context — insert the short true clause with the link embedded at the natural position.

Apply replacements last to first by position. Use surgical string replacement (NOT yaml.dump):

with open(page_path, 'r') as f:
    content = f.read()
field_block = extract_field_block(content, field_path)
new_block = field_block.replace(old_keyword_occurrence, new_text, 1)
content = content.replace(field_block, new_block, 1)
with open(page_path, 'w') as f:
    f.write(content)

Validate after each replacement:

python3 -c "import yaml; yaml.safe_load(open('[path]'))" && echo "valid"

If validation fails, revert the replacement and skip it.

Step 5: Generate diff summary

Save to data/audit/crosslinks-[slug]-[YYYY-MM-DD].md:

# Crosslink Report: [slug]
**Date**: [date]
**Links added**: [n]  (budget [b], from ~[w] words)
**Contexts covered**: [c] of [total] distinct contexts on the page
**Target mix**: [x] under-linked (0–1 inbound) · [y] normal · [z] saturated hubs

| Context (field) | Anchor | Target page | Target inbound (before) | URL |
|---|---|---|---|---|
| panels.faq.items[3].answer | 316L | Stainless Steel 316 | 1 | /materials/metal/alloy/stainless-steel-316-laser-cleaning |

**Candidates considered but not selected** (with reason):
- "stainless steel" hub → saturated (8 inbound) and only a generic anchor → skipped

**Remaining opportunities** (for future runs):
- [under-linked targets that were relevant but found no good context this run]

Rendering dependency

Crosslinks render as <a href> because renderMarkdown (BaseSection) and parseSimpleMarkdown (ExpertAnswersPanel, lib/text-formatting.ts) turn [text](url) into links. If either converter's link handling is removed, crosslinks in affected fields display as literal bracket syntax.

What this skill never does

Never exceeds the per-page budget or places more than one link in a single context (save the long-context exception)
Never links generic anchors (steel, aluminum, metal, surface, coating, cleaning, oxide)
Never piles links onto saturated hubs (≥6 inbound) over under-linked targets
Never places an irrelevant link just to spread equity — relevance is a hard gate
Never creates reciprocal or back-links
Never links to external sites
Never fabricates facts when creating a context — a created clause must be true, or the target is skipped
Never creates more than 2–3 contexts per page
Never links to the page being crosslinked (no self-links)
Never uses yaml.dump() — surgical string replacement only
Never runs without YAML validation after each write

⏸ COMMIT GATE

Never auto-commit. Summarize what changed, provide the git command, wait for Todd to run it.