content-expand

name: content-expand
description: >
  Expand and optimize existing articles. Adds new sections, deepens thin content, and applies the same quality pipeline as write-article (anti-slop, BLUF, GEO signals, humanization, multi-schema). Includes web search research about the topic and auto-discovery of internal links from your content collection. Replaces content-optimize. Flags: [file], --dry-run, --lang, --tone, --site-dir, --domain, --skip-research, --light.
codesift_tools:
  always:
    - analyze_project
    - index_status
    - index_folder
    - index_file
    - plan_turn
    - get_file_tree
    - search_text

by_stack: {}

zuvo:content-expand — Expand & Optimize Articles

Take an existing article, research the topic, add what's missing, and optimize the result — all in one pass. Same quality standards as zuvo:write-article.

Scope: Expand thin content, add missing sections, research and integrate new facts, optimize prose quality, update SEO/schema. Out of scope: Encoding/markdown fixes (zuvo:content-audit + content-fix), writing from scratch (zuvo:write-article), CMS API, PDF/Word.

Mandatory File Loading

Phase 0 (always):

../../shared/includes/env-compat.md -- Agent dispatch
../../shared/includes/run-logger.md -- Run logging
../../shared/includes/banned-vocabulary/core.md -- Shared anti-slop rules, tone matrix, G12, fallback behavior
../../shared/includes/humanization-rules.md -- Writing constraints + voice matching
../../shared/includes/domain-profile-registry.md -- 17 niche profiles
../../shared/includes/seo-page-profile-registry.md -- SEO profiles
../../shared/includes/no-pause-protocol.md -- HARD: no mid-loop pauses (batch/multi-section)

Lazy-loaded (Phase 2 only):

../../shared/includes/prose-quality-registry.md -- PQ1-PQ18 (scoring)
../../shared/includes/adversarial-loop-docs.md -- Cross-model review (end of Phase 2)
../../shared/includes/article-output-schema.md -- JSON output
../../shared/includes/retrospective.md -- RETRO PROTOCOL

Print CORE FILES LOADED: for items 1-7 of the Phase 0 list. After language detection, print LANGUAGE FILE: with the resolved banned-vocabulary language file and whether English fallback was used. Lazy items are loaded inline when needed.

Safety Gates

GATE 1 — File Type

.md, .mdx supported. .html tolerated (tag-stripped, WARNING). Binary → STOP.
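The gate can be sketched as a small classifier. This is a minimal illustration, not the skill's actual implementation: the function name, the NUL-byte heuristic for binary detection, and the choice to STOP on unlisted extensions are all assumptions.

```python
from pathlib import Path

SUPPORTED = {".md", ".mdx"}
TOLERATED = {".html"}  # processed after tag stripping, with a WARNING

def check_file_type(path: str, head: bytes) -> str:
    """Return 'ok', 'warn', or 'stop' for GATE 1.

    `head` is the first chunk of the file. A NUL byte is treated as a
    sign of binary content (an assumed heuristic, not from the spec).
    """
    if b"\x00" in head:
        return "stop"
    suffix = Path(path).suffix.lower()
    if suffix in SUPPORTED:
        return "ok"
    if suffix in TOLERATED:
        return "warn"
    return "stop"  # assumption: unlisted extensions are rejected
```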

GATE 2 — Dirty File Check

git status --porcelain -- <file>. Uncommitted changes → STOP. Require commit or stash. --dry-run mode: proceed with WARNING (no file mutation).
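A possible shape for this check, assuming the command is run from inside the repository. The helper names are hypothetical; the decision logic is kept separate from the subprocess call so it can be exercised without a git checkout.

```python
import subprocess

def decide_gate2(porcelain_output: str, dry_run: bool = False) -> str:
    """Map `git status --porcelain` output to 'ok', 'warn', or 'stop'."""
    if not porcelain_output.strip():
        return "ok"  # no uncommitted changes reported for this file
    # Dirty file: --dry-run never mutates, so it may proceed with a WARNING.
    return "warn" if dry_run else "stop"

def run_gate2(file_path: str, dry_run: bool = False) -> str:
    """Run the check against a real repository."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", file_path],
        capture_output=True, text=True, check=True,
    )
    return decide_gate2(result.stdout, dry_run)
```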

GATE 3 — Write Scope

Allowed: input file, <file>.content-expand-backup, zuvo/reports/. FORBIDDEN: everything else.
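One way to enforce the allowlist is to resolve every target path before comparing it. In this sketch the function name is hypothetical, and it assumes that zuvo/reports/ is relative to the project root.

```python
from pathlib import Path

def is_write_allowed(target: str, input_file: str, project_root: str) -> bool:
    """Check a write target against the GATE 3 allowlist."""
    root = Path(project_root).resolve()
    tgt = (root / target).resolve()       # resolving collapses any ".." segments
    src = (root / input_file).resolve()
    backup = src.with_name(src.name + ".content-expand-backup")
    reports = root / "zuvo" / "reports"   # assumed location of the reports dir
    return tgt == src or tgt == backup or reports in tgt.parents
```

Resolving first means a path such as zuvo/reports/../../blog/other.md is judged by where it really lands, not by its prefix.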

Arguments

| Argument | Behavior |
|---|---|
| `[file]` | Required. Path to `.md`/`.mdx` file to expand |
| `--dry-run` | Show proposed changes without writing. Print before/after diff |
| `--lang <code>` | Language override (default: auto-detect) |
| `--tone <value>` | `casual` / `technical` / `formal` / `marketing` (default: auto-detect from existing text) |
| `--site-dir <path>` | Content collection root; enables internal link discovery + voice matching |
| `--domain <niche>` | Override domain detection (one of 17 niche IDs from `domain-profile-registry.md`) |
| `--skip-research` | Skip web search; expand using existing content + LLM knowledge only |
| `--rewrite` | Allow rewriting existing sentences that have concrete defects (factual errors, grammar, outdated info). Without this flag, existing text is never modified. |
| `--light` | Skip reporting, backlog, knowledge curation. Just expand and write. |

Phase 0 — Read & Score

0.1 Validate + Extract

File type validation (GATE 1). Dirty file check (GATE 2).
Parse frontmatter. Mark mutable fields (title, description, keywords, author). All others immutable.
Extract protected regions: fenced code blocks, MDX components. Store with position markers.
Language detection: --lang → frontmatter → content analysis → fallback structural-only.
Resolve the active language file from ../../shared/includes/banned-vocabulary/languages/ using the normalized base code. If missing: load en.md and emit WARNING: banned-vocabulary fallback -> en.
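The language-file fallback might look like the following. It assumes language files are named `<base>.md` (consistent with en.md above) and that "normalized base code" means the lowercase part before any region suffix; the function name is hypothetical.

```python
from pathlib import Path

def resolve_language_file(lang_code: str, languages_dir: Path) -> tuple[Path, bool]:
    """Return (language file, used_fallback) for a code such as 'pl-PL'."""
    # Assumed normalization: lowercase, drop the region ('pl_PL' -> 'pl').
    base = lang_code.strip().lower().replace("_", "-").split("-")[0]
    candidate = languages_dir / f"{base}.md"
    if candidate.is_file():
        return candidate, False
    print("WARNING: banned-vocabulary fallback -> en")
    return languages_dir / "en.md", True
```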

0.2 Domain Detection

Cascade: --domain → scan --site-dir articles for niche signals per domain-profile-registry.md → fallback general. YMYL niches emit credentials WARNING.

0.3 Voice Profile

If --site-dir has 3+ blog articles: extract voice profile per humanization-rules.md (rhythm, person, formality). Inconclusive → default rules only.

0.4 Score

Dispatch prose-quality-scorer + structure-analyst agents in parallel per env-compat.md. Score across 6 dimensions (PQ1-PQ17 from prose-quality-registry.md). Short articles get the same scrutiny as long ones — every sentence matters more when there are fewer of them.

Record: before-scores, tier, weak sections, thin sections (<100 words), missing elements per niche profile.
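Thin-section detection can be approximated with a word count per H2 section. This sketch assumes sections are delimited by `## ` headings and that H3 content counts toward its parent H2. It does not skip headings that appear inside fenced code blocks, which the real pipeline handles through protected regions.

```python
import re

def find_thin_sections(markdown: str, min_words: int = 100) -> list[str]:
    """Return the H2 headings whose body has fewer than `min_words` words."""
    # With a capture group, re.split yields [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^## +(.+)$", markdown, flags=re.MULTILINE)
    thin = []
    for heading, body in zip(parts[1::2], parts[2::2]):
        if len(body.split()) < min_words:
            thin.append(heading.strip())
    return thin
```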

0.5 Internal Link Discovery

If --site-dir provided: scan the content collection (Glob *.md + *.mdx), read titles/descriptions of 10-20 articles, identify 5 most related by topic overlap. These become internal link candidates for Phase 2.
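The spec does not define how "topic overlap" is measured. As an illustration only, the sketch below ranks candidates by Jaccard similarity over the words in each title and description. The metric, the function names, and the four-letter minimum word length are assumptions.

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercased words longer than three characters."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) > 3}

def rank_link_candidates(article_text: str, others: dict[str, str], top_n: int = 5) -> list[str]:
    """Rank other articles (path -> title + description) by Jaccard overlap."""
    base = _tokens(article_text)
    scored = []
    for path, summary in others.items():
        cand = _tokens(summary)
        union = base | cand
        score = len(base & cand) / len(union) if union else 0.0
        scored.append((score, path))
    # Highest score first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for score, path in scored[:top_n] if score > 0]
```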

Print:

SETUP: [file] | [lang] | [tone] | Domain: [niche] | Voice: [matched|default]
SCORE: [composite]/100 ([tier]) | Weak: [N sections] | Thin: [N sections]
LINKS: [N candidates found | no site-dir]

Phase 1 — Research

Skip if --skip-research set. Print: "Research skipped. Expanding from existing content + LLM knowledge."

1.1 Topic extraction

From article content + frontmatter, identify: primary topic, subtopics covered, subtopics missing (from Phase 0 scoring).

1.2 Web search

Perform 2-4 targeted web searches about the topic — NOT competitor analysis, but factual research for expansion:

Facts, statistics, recent developments about the topic
Specific details the article is missing (from Phase 0 thin-section analysis)
Named sources, dates, version numbers for entity grounding

1.3 Fact sheet

Assemble research into a structured fact sheet (same format as write-article topic-researcher output). Each fact has source attribution + year. Tag conflicts as [CONFLICT].

If web search unavailable: note research_limited: true, proceed with LLM knowledge only (clearly marked).
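The exact fact-sheet format comes from write-article and is not reproduced here. As a rough sketch of the attribution and conflict-tagging idea, with a hypothetical `Fact` shape and field names:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Fact:
    key: str      # hypothetical identifier for what the fact measures
    value: str
    source: str
    year: int
    conflict: bool = False

def tag_conflicts(facts: list[Fact]) -> list[Fact]:
    """Mark every fact whose key has more than one distinct value."""
    values = defaultdict(set)
    for fact in facts:
        values[fact.key].add(fact.value)
    for fact in facts:
        fact.conflict = len(values[fact.key]) > 1
    return facts

def render(fact: Fact) -> str:
    """One fact-sheet line with source attribution and year."""
    tag = " [CONFLICT]" if fact.conflict else ""
    return f"- {fact.key}: {fact.value} ({fact.source}, {fact.year}){tag}"
```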

Phase 2 — Expand + Optimize

This is the core phase. Expand THEN optimize in one pass, with no separate optimization phase.

2.1 Backup

Copy original to <file>.content-expand-backup. All work on temp copy. --dry-run: skip backup, work in memory only.

2.2 Expand (PRESERVE ORIGINAL TEXT)

DEFAULT: Do not rewrite existing text. Rewriting human prose with AI prose triggers detection on the entire article. Expand by ADDING content around what exists.

What you CAN always do:

Add new paragraphs between or after existing ones
Add new H2/H3 sections for missing subtopics
Add intro/conclusion if missing

What you CANNOT do without --rewrite:

Rephrase existing sentences
Merge or reorder existing paragraphs
Change existing word choices or headings

With --rewrite: Rewrites are allowed ONLY for sentences with a concrete defect: factual error, grammatical mistake, nonsensical claim, or outdated information. Each rewrite must state the reason (e.g., "Fixed: claimed population is 5M, actual is 8.3M per 2025 census"). Style preferences ("sounds better") are NOT a valid reason.

For ALL new content, apply humanization-rules.md constraints:

Sentence variation (fragments + long), contractions, parenthetical asides
Entity grounding (specific versions, dates, names from research)
Voice matching if profile available — match the existing article's voice, not your default
No throat-clearing (G12), BLUF per section (G9), max 300 words/section (G6)
Stats and volatile practical facts must remain traceable to the fact sheet (G11)
Do not build paragraph rhythm around repeated source lead-ins such as Według X (2025)
Use a hard source-name budget: the same full institution/source name should appear no more than once per section and no more than 3 times in the whole article body, excluding a compact ## Źródła section
Use (odczyt: miesiąc rok) only for volatile practical facts, not for stable historical description
Do not append a process-heavy research appendix such as Źródła wykorzystane przy aktualizacji
End public articles with a compact ## Źródła section unless the host project explicitly forbids visible sources
In ## Źródła, keep 3-6 grouped bullets max: source title + link only
Group repeated institutions into one bullet whenever possible, e.g. - **APSARA National Authority:** [Temple page](...), [Restoration update](...)
Do not add filler closers like ## Na koniec if they only restate the article
Prefer practical expansion blocks (specific H2s, lists, route notes, local comparisons, FAQ at the end) over replacing the opening with smoother narrative prose
For practical/service-intent articles, add or preserve a short ## W skrócie block immediately after the italic lead and before the first image or H2. Keep 3-5 bullets max.
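The source-name budget from the list above lends itself to a mechanical check. The sketch below is one possible approach under stated assumptions: sections are split on `## ` headings, the ## Źródła section is the last section, and mentions inside headings are not counted. The function name is hypothetical.

```python
import re

def source_budget_violations(markdown: str, source_name: str,
                             per_section: int = 1, per_article: int = 3) -> list[str]:
    """Report places where a full source name exceeds its budget."""
    # Drop the compact sources section (assumed to be the final section).
    body = re.split(r"^## Źródła\s*$", markdown, flags=re.MULTILINE)[0]
    # Index 0 is the text before the first H2.
    sections = re.split(r"^## .*$", body, flags=re.MULTILINE)
    counts = [section.count(source_name) for section in sections]
    problems = [f"section {i}: {c} mentions"
                for i, c in enumerate(counts) if c > per_section]
    if sum(counts) > per_article:
        problems.append(f"article: {sum(counts)} mentions")
    return problems
```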

Use these exact output shapes when applicable:

## W skrócie
- [Najważniejsza decyzja / odpowiedź]
- [Najważniejszy wymóg / koszt / limit]
- [Najważniejsza opcja / wyjątek]
- [Najważniejszy warunek praktyczny]

## Źródła
- **[Instytucja / grupa źródeł]:** [Tytuł 1](...), [Tytuł 2](...)
- **[Instytucja / grupa źródeł]:** [Tytuł 3](...)
- **[Instytucja / grupa źródeł]:** [Tytuł 4](...)

2.3 Internal Links

Insert 2-5 contextual internal links from Phase 0 candidates. Validate each via Glob. Unverified → [UNVERIFIED LINK], not auto-inserted.
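A sketch of the verification split. The spec validates through the agent's Glob tool; this example substitutes a plain file-existence check relative to the site directory. The function name is hypothetical.

```python
from pathlib import Path

def split_link_candidates(candidates: list[str], site_dir: Path) -> tuple[list[str], list[str]]:
    """Separate candidates whose target file exists from those that do not."""
    verified, unverified = [], []
    for rel_path in candidates:
        bucket = verified if (site_dir / rel_path).is_file() else unverified
        bucket.append(rel_path)
    return verified, unverified
```

Only the first list would be eligible for insertion. Entries in the second would be reported as [UNVERIFIED LINK].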

2.4 SEO + Schema

Read domain-profile-registry.md for niche-appropriate schema:

Detect existing @type. If specific (Recipe, HowTo, Event) → preserve and merge. If BlogPosting only → upgrade per niche.
FAQ: if article now has Q&A content + niche allows FAQ → add FAQPage schema.
Inspect the local content schema/config before mutating frontmatter. Treat the schema as the source of truth for allowed fields.
OG tags: write og:title, og:description, og:type, og:image only if those fields are clearly accepted by the local schema. If not, inherit layout-level OG behavior and record OG: inherited-from-layout.
dateModified / modifiedDate / updatedDate: Find the exact modification-date field already accepted by the collection schema (common names: modifiedDate, updatedDate, dateModified, lastmod, updated) and update that field. If no such field exists and schema support is unclear, do not invent modifiedDate; leave frontmatter unchanged and record Date field: unchanged (schema-blocked). Never touch publishDate / date / publishedDate — that's the original publication date.
Meta title/description: only update if currently MISSING. Never rewrite existing meta that the author wrote.
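The modification-date rule above can be illustrated as follows. This assumes the set of schema-accepted field names has already been read from the local content config. The function name, the priority order among candidate names, and the success message are assumptions.

```python
# Common names listed in the rule above, checked in this (assumed) order.
DATE_FIELD_CANDIDATES = ["modifiedDate", "updatedDate", "dateModified", "lastmod", "updated"]

def touch_modified_date(frontmatter: dict, schema_fields: set[str], today: str) -> tuple[dict, str]:
    """Update the schema-approved modification field, or leave frontmatter alone."""
    field = next((name for name in DATE_FIELD_CANDIDATES if name in schema_fields), None)
    if field is None:
        # Never invent a field the schema does not accept.
        return dict(frontmatter), "Date field: unchanged (schema-blocked)"
    updated = dict(frontmatter)
    updated[field] = today  # publication-date fields are never touched
    return updated, f"Date field: {field} updated"  # hypothetical status text
```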

2.5 Anti-slop Review

Run anti-slop check on expanded content (same as write-article Phase 4):

Hard/soft banned vocabulary per banned-vocabulary/core.md + active language file + --tone
G12 anti-patterns (throat-clearing, superlatives, keyword density)
BLUF compliance (G9), chunkability (G6), citation compliance (G11)
Dispatch anti-slop-reviewer agent (read ../../skills/write-article/agents/anti-slop-reviewer.md) for all articles regardless of length.

Apply this review to human-facing prose only. Do not count frontmatter keys, schema-only metadata, file paths, URLs, image names, JSON-LD/schema, code, or raw source lists as banned-vocabulary violations unless the task explicitly asks to audit those zones as prose.

Fix all CRITICAL violations. Fix WARNINGs if localized.

2.6 Re-score + Rollback

Score expanded article (same 6 dimensions). If ANY dimension regressed → revert that section's changes. Never deliver a lower composite score.
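The regression check reduces to a per-dimension comparison. The dimension names in the usage example are placeholders, since the six real dimensions are defined in prose-quality-registry.md.

```python
def regressed_dimensions(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Return the dimensions whose score dropped after expansion."""
    # A dimension missing from `after` is treated as a regression.
    return sorted(name for name, score in before.items()
                  if after.get(name, 0.0) < score)

# Usage with placeholder dimension names:
before = {"clarity": 70, "structure": 60}
after = {"clarity": 75, "structure": 55}
needs_rollback = regressed_dimensions(before, after)
```

An empty list means nothing regressed. Any other result names the dimensions that should trigger a revert of the responsible section.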

2.7 Adversarial Review

Load adversarial-loop-docs.md now. Run: adversarial-review --json --mode article --files "<temp-file>" (fallback: --json --mode audit + WARNING). CRITICAL → fix. WARNING → fix if localized. If the script returns status: "timeout" or exits 124, record Adversarial review: skipped (timeout) and continue without blocking the article.
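The timeout rule can be sketched as a pure function over the script's exit code and stdout. The function name and the "completed" message are assumptions; only the skipped message comes from the text above.

```python
import json

def interpret_review(returncode: int, stdout: str) -> str:
    """Classify an adversarial-review run without ever blocking the article."""
    skipped = "Adversarial review: skipped (timeout)"
    if returncode == 124:  # exit status named in the rule above
        return skipped
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        payload = {}
    if isinstance(payload, dict) and payload.get("status") == "timeout":
        return skipped
    return "Adversarial review: completed"  # hypothetical label
```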

2.8 Replace Original

Protected regions re-inserted. Frontmatter immutable fields preserved. Replace original with expanded temp copy. Delete backup. --dry-run: print diff only, no file changes.
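A minimal sketch of the extract and re-insert round trip for fenced code blocks. MDX components are not handled here, and the `@@PROTECTED_n@@` placeholder format is hypothetical.

```python
import re

TICKS = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(rf"^{TICKS}.*?^{TICKS}[ \t]*$", re.MULTILINE | re.DOTALL)

def extract_protected(text: str) -> tuple[str, list[str]]:
    """Swap each fenced block for a numbered placeholder."""
    regions: list[str] = []

    def stash(match: re.Match) -> str:
        regions.append(match.group(0))
        return f"@@PROTECTED_{len(regions) - 1}@@"

    return FENCE_RE.sub(stash, text), regions

def reinsert_protected(text: str, regions: list[str]) -> str:
    """Restore the original blocks byte for byte."""
    for index, region in enumerate(regions):
        text = text.replace(f"@@PROTECTED_{index}@@", region)
    return text
```

Because the placeholders end with a delimiter, `@@PROTECTED_1@@` cannot be confused with `@@PROTECTED_10@@` during re-insertion.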

Phase 3 — Output

Before/After Summary

CONTENT-EXPAND COMPLETE
-----
File: [path]
Words: [before] → [after] (+[N] added)
Score: [before]/100 ([tier]) → [after]/100 ([tier])
Domain: [niche] | Schema: [type(s)]
Sections added: [N] | Sections expanded: [N]
Internal links: [N] added ([N] verified)
FAQ: [N items | none] | OG: [present | inherited-from-layout | schema-blocked]
Voice: [matched | default] | Research: [N facts used | skipped | limited]

Run: <ISO-8601-Z>	content-expand	<project>	-	-	<VERDICT>	<TASKS>	3-phase	<NOTES>	<BRANCH>	<SHA7>	<INCLUDES>	<TIER>
-----

Append via wrapper (REQUIRED). Never >> directly to ~/.zuvo/runs.log — the wrapper is the gate that verifies a retro entry exists for this run. Order: retro bash executed → wrapper invoked → completion claimed.

printf '%b\n' "$RUN_LINE" | ~/.zuvo/append-runlog

Expected stdout: OK: appended to runs.log (retro verified for <skill> on <project>). If exit 2 with RETRO_REQUIRED — go execute the retro bash from retrospective.md first; never bypass with ZUVO_SKIP_RETRO_GATE=1. After the wrapper succeeds, print a Logs: evidence line (tail -1 ~/.zuvo/retros.log, grep -c "^" ~/.zuvo/retros.md, tail -1 ~/.zuvo/runs.log) before claiming completion. Printing the markdown retro section without executing the bash leaves all three log files empty.

VERDICT: PASS (expanded, no regressions), WARN (research_limited or voice delta MED+), FAIL (adversarial blockers), BLOCKED (dirty file, binary input). NOTES: [file basename] [before]->[after] +[N]words [tier] (max 80 chars).
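The verdict and NOTES rules might be encoded as below. The precedence among verdicts (BLOCKED, then FAIL, then WARN, then PASS) is an assumption, as are the function and parameter names.

```python
def pick_verdict(blocked: bool, adversarial_blockers: bool,
                 research_limited: bool, voice_delta_med_plus: bool) -> str:
    """Choose one verdict; the precedence order is assumed, not specified."""
    if blocked:
        return "BLOCKED"
    if adversarial_blockers:
        return "FAIL"
    if research_limited or voice_delta_med_plus:
        return "WARN"
    return "PASS"

def build_notes(basename: str, before: int, after: int, added_words: int, tier: str) -> str:
    """Format the NOTES field and enforce the 80-character limit."""
    notes = f"{basename} {before}->{after} +{added_words}words {tier}"
    return notes[:80]
```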

Reporting (skip with `--light`)

Unless --light: write report to zuvo/reports/content-expand-YYYY-MM-DD.md. Write JSON per article-output-schema.md. Run knowledge curation per knowledge-curate.md. Persist findings per backlog-protocol.md.

Retrospective (REQUIRED)

Follow retrospective.md. Gate check → structured questions → TSV emit → markdown append.

Next steps: zuvo:review [file] | zuvo:content-audit [file] | zuvo:ship