name: brand-docx
description: >-
  Brand-aware Word engine. Use to (1) EXTRACT a company's brand from a Word template into a reusable "Brand Profile", (2) COMPREHEND the template with the model (optional), (3) VERIFY it, (4) GENERATE new on-brand .docx documents FROM a saved profile. Trigger on "extract our brand", "learn/match this template", "use our brand kit", "generate a branded report from our profile", or when a ./brand-kit exists. For one-off Word edits with NO saved brand profile, use the docx skill instead. NOT for .pptx (brand-pptx), .xlsx (brand-xlsx), or PDFs.
brand-docx
Use this skill when the user wants a reusable Word brand kit or wants to create
a new on-brand .docx from a company template and variable content.
This is an AI-agent skill for Codex and Claude Code. The user should not need to
write JSON or run shell commands. The agent converts the user's content into an
IntermediateDocument, invokes the internal engine, verifies the output, and
returns the generated .docx.
The seven verbs: three deterministic + four model-assisted
Every brand skill (brand-docx, brand-pptx, brand-xlsx) implements the same
contract. The deterministic core is extract / verify / generate; on top of it
sit the optional learning verbs comprehend / learn / propose-overrides /
refine, each fail-closed (the engine validates every proposal and authors every
value).
| Verb | Input | Output |
|---|---|---|
| extract | a company .docx template | a reusable Brand Profile |
| comprehend (optional, model-driven) | a saved profile + a model-authored comprehension.json | the profile with a validated, cached comprehension block |
| verify | a saved Brand Profile | QA findings + a verdict |
| generate | content (an IntermediateDocument) + a profile | a new on-brand .docx |
| learn (deterministic distillation) | the profile's cross-run generation history | recurring QA findings distilled into shell-frozen overrides, advisory until --accept |
| propose-overrides (model-driven) | the recurring remainder learn could not bind + a model-authored proposal | shell-backed corrections through the same fail-closed sink, advisory until --accept |
| refine | end-of-generation user feedback (text or a screenshot) as a refinement.json delta | the existing comprehension overlaid for FUTURE generations, advisory until --accept |
comprehend is optional: generate works on the deterministic profile alone.
When a current comprehension is present, generate additionally reconciles the
template's preserved cover/index structures with the new content. See
reference/comprehension.md for the full step.
Hard Rules
- Treat python scripts/cli.py ... as an internal engine command, not the user-facing workflow. scripts/cli.py is a LAUNCHER that locates the engine root by itself: it works from this skill folder AND from the repo/plugin root (set BRAND_DOCS_ROOT to override). Never guess deeper paths like scripts/brandkit/....
- Run the dependency preflight before starting extract / comprehend / verify / generate, and report missing or unusable dependencies before proceeding.
- Extract opens the source template read-only and saves brand-kit/<name>/template/shell.docx byte-for-byte.
- Generate opens the saved shell and resolves every semantic block through profile.json.
- Do not put style names, colors, fonts, or brand identifiers in an IntermediateDocument.
- If the user did not provide a template or enough content, ask for the missing input.
- Return the generated file path plus a QA summary.
- Consult profile.json.artifact_catalog before generation when the user asks to mimic a specific piece of the template.
Preflight (always first)
Before doing any work, run:
python scripts/cli.py doctor
Use its output to decide the run mode:
- If a required Python dependency is missing, install/repair it before extraction or generation; the core engine is not ready.
- If only visual renderers are missing or unusable (soffice plus pdftoppm or optional PyMuPDF/fitz), the core L0 workflow can still run, but a full visual audit cannot be claimed. Tell the user what is missing, include the install/repair hint printed by doctor, and either proceed with degraded QA or install the renderer first.
- If optional OCR (tesseract) is missing, the visual audit can still run, but rendered residual-text proof is incomplete. Report that limitation when judging stale placeholders or field caches.
- For --qa deep or --qa strict, prefer repairing/installing renderers before generation. If the environment cannot run them, deep generates a degraded manifest and strict fails with a visual proof blocker.
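The preflight rules above can be sketched as a small decision function. This is a minimal sketch, not the engine's code: the names RunMode and decide_run_mode, the three boolean inputs, and the returned labels are all hypothetical, and it assumes the agent has already read the doctor output and reduced it to these flags itself (the doctor output format is not described here).

```python
from dataclasses import dataclass, field


@dataclass
class RunMode:
    core_ready: bool      # False when a required Python dependency is missing
    visual_audit: str     # hypothetical labels: "full", "no-ocr", "unavailable"
    notes: list = field(default_factory=list)


def decide_run_mode(python_deps_ok, renderers_ok, ocr_ok, qa="auto"):
    """Hypothetical helper mirroring the preflight rules (not an engine API)."""
    if not python_deps_ok:
        # The core engine is not ready: repair before extract or generate.
        return RunMode(False, "unavailable",
                       ["install or repair required Python dependencies first"])
    notes = []
    if not renderers_ok:
        # L0 still runs, but a full visual audit cannot be claimed.
        notes.append("renderers missing: L0 only, no full visual audit")
        if qa in ("deep", "strict"):
            notes.append(f"--qa {qa}: prefer installing renderers before generation")
        return RunMode(True, "unavailable", notes)
    if not ocr_ok:
        # The visual audit runs, but residual-text proof is incomplete.
        notes.append("tesseract missing: rendered residual-text proof is incomplete")
        return RunMode(True, "no-ocr", notes)
    return RunMode(True, "full", notes)


mode = decide_run_mode(python_deps_ok=True, renderers_ok=False, ocr_ok=True, qa="strict")
print(mode.core_ready, mode.visual_audit, mode.notes)
```

Whatever shape the real check takes, the notes are what gets reported to the user before proceeding.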
Agent Workflow
- Run the dependency preflight above and report any degraded capability.
- Determine the brand name and locate the user-provided .docx template.
- If no matching brand-kit/<name> exists, extract one.
- Comprehend the template (optional, model-driven; see below). Skip when a current comprehension is already cached or no model is available.
- Convert the user's requested content into IntermediateDocument JSON.
- Generate the .docx with the internal engine.
- Run QA and report any warnings honestly.
- Feedback (only after returning the file): invite a refinement of the understanding for future documents (see below).
Before generation, inspect profile.json.artifact_catalog when the user asks
to mimic a specific template piece. It records OOXML parts, media parts,
paragraph/table styles, style details, sections/margins, paragraph samples,
and table counts.
Authoring the IntermediateDocument
The IDoc is where "correct document" becomes "great document". Author it role-first, against the profile, never style-first:
- Read brand-kit/<name>/PROFILE.md before writing a block. It lists the role table (with scope, placement, required slots), the brand palette tokens, and the template's structural order. Choose every block by MEANING from that table; the engine resolves it to the template's own artifacts.
- Respect the skeleton. Cover fields first (cover.title, cover.subtitle, extra cover fields), a toc block only where the template keeps one, then the freeform body.
- Shape the body for the reader. One heading level 1 per major section and a real 1-2-3 hierarchy below it (the TOC regenerates from exactly these headings); paragraphs of 2-5 sentences; list blocks for enumerations (bullet for unordered, number for sequences); every table and image with a caption block so derived indexes regenerate from real content; callout sparingly, for genuinely load-bearing notes; quote only for actual quotations.
- Color discipline. The named style already carries the brand: the default is NO run color. When emphasis is truly needed, reference a palette role (primary, text, ...) or a theme slot (accent1) from PROFILE.md, never a hex.
- Reuse before re-deriving. When a comprehension is present, prefer its component/section fragments (comprehension.fragments in profile.json) with {{slot}} values over hand-building a recurring layout.
- Never name a style, font, size, or hex. If a block needs something the role table cannot express, say so in the QA summary instead of inventing formatting: the resolver is the only author of values.
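The "never a hex" rule lends itself to a quick self-check before the draft reaches the engine. The sketch below is a hypothetical agent-side lint, not part of the engine: find_hex_colors is an invented name, and the shape of the draft dictionary (a blocks list with role, text, and color keys) is an assumption made only for illustration; the real schema is in reference/generation.md and examples/intermediate-document.example.json.

```python
import re

# Matches #RGB or #RRGGBB. Heuristic only: text such as "#123" would also match.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})\b")


def find_hex_colors(node, path="$"):
    """Walk any JSON-like value and return the paths of strings containing a hex color."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_hex_colors(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_hex_colors(value, f"{path}[{index}]"))
    elif isinstance(node, str) and HEX_COLOR.search(node):
        hits.append(path)
    return hits


# Hypothetical draft; the block keys are illustrative, not the real schema.
draft = {
    "blocks": [
        {"role": "heading", "level": 1, "text": "Quarterly results"},
        {"role": "paragraph", "text": "Revenue grew.", "color": "#FF0000"},
        {"role": "paragraph", "text": "Costs fell.", "color": "primary"},
    ]
}
print(find_hex_colors(draft))  # ['$.blocks[1].color']
```

A hit means the block should reference a palette role or theme slot instead, or drop the color entirely.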
Feedback (end of generation)
Ask for feedback only after you have returned the generated .docx and its
QA summary - never before or during generation. Invite the user to reply with
text or a screenshot of the document, and name the roles, palette colors, and
sections you actually used so the answer is concrete. A screenshot is your own
multimodal read; the engine only ever ingests the structured JSON delta you
distil from it.
Turn the answer into a small refinement delta of verbatim ids and merge it with
the refine verb (see reference/comprehension.md):
python scripts/cli.py refine --name <brand> --input refinement.json --accept
A refinement improves FUTURE generations of this brand only - it mutates the
saved profile, never the .docx you just produced. To apply it, generate again.
When the SAME QA finding recurs across runs, you can also propose a shell-bound
correction with propose-overrides: the comprehend-input bundle surfaces the
recurring generation_history, and you NAME a shell-backed re-point (a stub role to
an existing healthy role, a number_format mask the shell uses, or a captured demo
value) that the engine binds fail-closed (see
reference/comprehension.md). It is advisory until
--accept, improves FUTURE generations only, and every live correction surfaces
as an INFO override_applied finding in QA.
Internal Extract
python scripts/cli.py extract --name <brand> --template <template.docx> --scope project
Internal Comprehend (optional, model-driven)
Read reference/comprehension.md for the full guidance, the six questions, and the anti-overfitting directive. In short:
python scripts/cli.py comprehend-input --name <brand> # prints {facts, excerpt} for the model
python scripts/cli.py comprehend --name <brand> --input comprehension.json # the ONLY writer
Skip this verb when comprehension.status is present and its
source_shell_sha256 equals the live provenance.shell.sha256 (a current
comprehension is already cached). A re-extract resets it to absent; re-run
comprehend only then. Never re-run it at generate time.
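The skip rule can be expressed as a short check over the loaded profile. This is a sketch under stated assumptions: comprehension_is_current is a hypothetical helper, the nesting of the keys inside profile.json is inferred from the dotted names above, and status is assumed to be a string whose value is present or absent.

```python
def comprehension_is_current(profile: dict) -> bool:
    """Return True when the cached comprehension matches the live shell hash.

    Assumed layout (inferred from the dotted names, not confirmed):
      profile["comprehension"]["status"]
      profile["comprehension"]["source_shell_sha256"]
      profile["provenance"]["shell"]["sha256"]
    """
    comprehension = profile.get("comprehension") or {}
    shell = (profile.get("provenance") or {}).get("shell") or {}
    cached = comprehension.get("source_shell_sha256")
    live = shell.get("sha256")
    return (
        comprehension.get("status") == "present"
        and cached is not None
        and cached == live
    )


# Illustrative profile fragment; the hash values are placeholders.
profile = {
    "comprehension": {"status": "present", "source_shell_sha256": "abc123"},
    "provenance": {"shell": {"sha256": "abc123"}},
}
print(comprehension_is_current(profile))  # True
```

When the check returns False after a re-extract, that is the one case in which comprehend should run again.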
docx readiness. The Word extractor surfaces cover anchors, TOC/list fields when present, and text regions. A current comprehension can therefore steer cover fill, index regeneration, and demo-region clearing. If a document genuinely has no TOC/list field, do not force one; a ref into an empty field inventory is fail-closed and will be rejected.
Internal Verify
python scripts/cli.py verify --name <brand> --scope auto --qa auto
--qa selects the QA depth (see reference/visual-audit.md):
- fast: deterministic L0 only (schema, resolver targets, residual text, structural diffs).
- auto: L0 + L1 visual pixel proxies when renderers (soffice plus pdftoppm or optional PyMuPDF/fitz) are present; otherwise L0 plus a single INFO visual.unavailable.
- deep: L0 + L1 + a visual_manifest.json and per-page PNGs; if tesseract is installed the manifest also includes OCR text/hits. The orchestrator must then run the L2 step (see below).
- strict: deep visual audit plus gate errors when full render proof is unavailable or L1/OCR evidence is not clean.
Verify has no output to render, so all modes behave as L0 at verify time; the visual stages run at generate time.
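As a reading aid, the mode descriptions can be condensed into a lookup of which stages to expect. This is only a sketch: qa_stages and its stage labels are hypothetical names, and the behavior without renderers for deep and strict is taken from the preflight section rather than from engine code.

```python
QA_MODES = ("fast", "auto", "deep", "strict")


def qa_stages(qa, renderers_available, at_verify_time=False):
    """Hypothetical summary of the stages each --qa mode implies."""
    if qa not in QA_MODES:
        raise ValueError(f"unknown --qa mode: {qa!r}")
    # Verify has nothing to render, so every mode behaves as L0 there.
    if at_verify_time or qa == "fast":
        return ["L0"]
    if not renderers_available:
        if qa == "auto":
            return ["L0", "INFO visual.unavailable"]
        if qa == "deep":
            return ["L0", "degraded manifest"]
        return ["L0", "gate error: visual proof blocker"]
    stages = ["L0", "L1"]
    if qa in ("deep", "strict"):
        stages += ["visual_manifest.json + page PNGs", "L2 (orchestrator)"]
    if qa == "strict":
        stages.append("strict gate")
    return stages


print(qa_stages("deep", renderers_available=True))
```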
Internal Generate
python scripts/cli.py generate --name <brand> --input <intermediate-document.json> --output <output.docx> --scope auto --qa auto
See reference/comprehension.md, reference/profile-schema.md,
reference/generation.md, reference/visual-audit.md, and
examples/intermediate-document.example.json.
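When the agent drives the engine from code rather than a shell, building the command as an argument list avoids quoting problems with brand names or paths that contain spaces. The sketch below uses only the flags shown in the generate command above; generate_argv is a hypothetical helper, and the brand name and file names are placeholders.

```python
import shlex

QA_MODES = ("fast", "auto", "deep", "strict")


def generate_argv(brand, idoc_path, output_path, scope="auto", qa="auto"):
    """Build the documented generate command as an argv list (hypothetical helper)."""
    if qa not in QA_MODES:
        raise ValueError(f"unknown --qa mode: {qa!r}")
    return [
        "python", "scripts/cli.py", "generate",
        "--name", brand,
        "--input", idoc_path,
        "--output", output_path,
        "--scope", scope,
        "--qa", qa,
    ]


# Placeholder values for illustration only.
argv = generate_argv("acme", "report.idoc.json", "report.docx", qa="deep")
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run(argv, check=True) from the skill folder or the repo/plugin root, where the launcher is expected to resolve the engine.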
Visual audit (two-stage)
The engine renders the output and runs deterministic pixel proxies, but the qualitative visual judgement is yours (the orchestrator), never the engine's - the Python engine never calls a model. To run the full two-stage audit:
- Generate with --qa deep. The engine renders each page to a PNG, runs the L1 proxies, and writes visual_manifest.json next to the output in an <output-file>.visual/ dir, such as report.docx.visual/ (a side artifact; the .docx bytes never change).
- Read the manifest path from stdout (visual manifest: <path>).
- Open the PNGs listed in pages[*].png. For every entry in checklist, judge PASS/FAIL against the rendered pages, taking l1_findings and ocr.hits into account.
- If any checklist item FAILS (or an L1 visual.blank_page / visual.edge_bleed WARNING or visual.ocr_residual_text hit is confirmed as a real defect): repair the IntermediateDocument/content or the generated composition, regenerate, then re-run the audit. Loop until the checklist is clean, or until no further targeted repair can be justified without user input.
L1 findings are WARNING-only and never fail the gate by themselves; the real qualitative gate is your L2 judgement.
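The mechanical half of the audit, finding the manifest and listing the pages to open, might look like the sketch below. It assumes only what is stated above: stdout contains a line of the form visual manifest: <path>, and the manifest has a pages list whose entries carry a png key. The helper names and the sample stdout are hypothetical; the full manifest schema is in reference/visual-audit.md.

```python
import re

# Matches the documented stdout line "visual manifest: <path>".
MANIFEST_LINE = re.compile(r"^visual manifest:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def manifest_path_from_stdout(stdout: str):
    """Return the manifest path printed by generate, or None if absent."""
    match = MANIFEST_LINE.search(stdout)
    return match.group(1) if match else None


def page_pngs(manifest: dict) -> list:
    """Collect pages[*].png from a loaded manifest (other keys are ignored)."""
    return [page["png"] for page in manifest.get("pages", []) if "png" in page]


# Sample stdout is illustrative; only the "visual manifest:" line is documented.
stdout = "wrote report.docx\nvisual manifest: report.docx.visual/visual_manifest.json\n"
print(manifest_path_from_stdout(stdout))
print(page_pngs({"pages": [{"png": "page-1.png"}, {"png": "page-2.png"}]}))
```

The PASS/FAIL judgement on each checklist entry stays with the orchestrator; nothing in this sketch attempts it.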
During repair, treat the template as a source of reusable structure, not a rule
to preserve blindly. If inherited section breaks, front-matter scaffolding,
field-result caches, or other template structures create blank pages, stale
entries, overlaps, or visibly broken pagination, diagnose the structure as the
cause and make the smallest targeted composition change. It is acceptable to
collapse, move, or remove a template section break when preserving it damages the
final generated document. After every repair, regenerate and rerun --qa deep or
--qa strict.
Current Guarantees and Limits
Generation opens the saved .docx shell, clears detected demo text, and applies
only styles resolved from profile.json. L0 QA catches schema problems,
unresolved roles, markdown literals, and residual demo text.
When a current comprehension is present, generation also fills the cover slots in place (no duplicate title) and reconciles preserved indexes (a table of contents, a list of tables/figures) against the new content (regenerating, preserving, or purging stale entries) instead of carrying demo entries forward. Destructive reconciliation is bounded: a clear/remove is honored only when determinism corroborates and confidence clears a threshold, else the structure is kept with a warning.
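The bound on destructive reconciliation amounts to a two-condition guard. The sketch below illustrates the rule only: reconcile_action is a hypothetical name, the threshold is left as a parameter because its value is not stated here, and treating "clears a threshold" as greater-than-or-equal is an assumption.

```python
DESTRUCTIVE = {"clear", "remove"}


def reconcile_action(proposed, deterministic_corroboration, confidence, threshold):
    """Return (action, warning) for a proposed reconciliation step.

    A destructive action is honored only when the deterministic evidence
    corroborates it AND confidence clears the threshold (>= is assumed);
    otherwise the structure is kept and a warning is returned.
    """
    if proposed not in DESTRUCTIVE:
        return proposed, None
    if deterministic_corroboration and confidence >= threshold:
        return proposed, None
    return "keep", f"kept structure: '{proposed}' not corroborated or below threshold"


# Illustrative values; 0.8 is a placeholder, not the engine's threshold.
print(reconcile_action("remove", True, 0.9, threshold=0.8))
print(reconcile_action("remove", False, 0.9, threshold=0.8))
```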
Extraction also records a broad artifact_catalog: OOXML parts, media parts,
paragraph/table styles, style details, sections/margins, paragraph samples, and
table counts. Use it to understand and describe template conventions beyond the
roles that are directly generatable today.
The two-stage visual audit closes the "L0-only" gap: L1 deterministic pixel proxies catch rendered-layout defects L0 cannot see (blank/broken pages, content bleeding past the printable margins), and the L2 manifest drives the orchestrator's qualitative judgement and repair loop. See reference/visual-audit.md.
DOCX visual overflow requires render-time QA with LibreOffice because Word
layout is not deterministic from OOXML alone. When soffice and both PDF
rasterizers (pdftoppm, optional PyMuPDF/fitz) are absent (e.g. CI), the
visual audit degrades cleanly to L0 plus a single INFO
visual.unavailable; exit codes are unchanged and the skill does not claim a
full no-overflow visual proof.