ppt-master

By hugohe3 · Updated 6/16/2026

name: ppt-master
description: >
  AI-driven multi-format SVG content generation system. Converts source documents (PDF/DOCX/URL/Markdown) into high-quality SVG pages and exports to PPTX through multi-role collaboration. Use when user asks to "create PPT", "make presentation", "生成PPT", "做PPT", "制作演示文稿", or mentions "ppt-master".

PPT Master Skill

AI-driven multi-format SVG content generation system. Converts source documents into high-quality SVG pages through multi-role collaboration and exports to PPTX.

Core Pipeline: Source Document → Create Project → [Template] → Strategist → [Image_Generator] → Executor Live Preview → Quality Check → Post-processing → Export

[!CAUTION]

🚨 Global Execution Discipline (MANDATORY)

This workflow is a strict serial pipeline. The following rules have the highest priority — violating any one of them constitutes execution failure:

  1. SERIAL EXECUTION — Steps MUST be executed in order; the output of each step is the input for the next. Non-BLOCKING adjacent steps may proceed continuously once prerequisites are met, without waiting for the user to say "continue"
  2. BLOCKING = HARD STOP — Steps marked ⛔ BLOCKING require a full stop; the AI MUST wait for an explicit user response before proceeding and MUST NOT make any decisions on behalf of the user
  3. NO CROSS-PHASE BUNDLING — Cross-phase bundling is FORBIDDEN. (Note: the Eight Confirmations in Step 4 are ⛔ BLOCKING — the AI MUST present recommendations and wait for explicit user confirmation before proceeding. Once the user confirms, all subsequent non-BLOCKING steps — design spec output, SVG generation, speaker notes, and post-processing — may proceed automatically without further user confirmation)
  4. GATE BEFORE ENTRY — Each Step has prerequisites (🚧 GATE) listed at the top; these MUST be verified before starting that Step
  5. NO SPECULATIVE EXECUTION — "Pre-preparing" content for subsequent Steps is FORBIDDEN (e.g., writing SVG code during the Strategist phase)
  6. NO SUB-AGENT SVG GENERATION — Executor Step 6 SVG generation is context-dependent and MUST be completed by the current main agent end-to-end. Delegating page SVG generation to sub-agents is FORBIDDEN
  7. SEQUENTIAL PAGE GENERATION ONLY — In Executor Step 6, after the global design context is confirmed, SVG pages MUST be generated sequentially page by page in one continuous pass. Grouped page batches (for example, 5 pages at a time) are FORBIDDEN
  8. SPEC_LOCK RE-READ PER PAGE — Before generating each SVG page, Executor MUST read_file <project_path>/spec_lock.md. All colors / fonts / icons / images MUST come from this file — no values from memory or invented on the fly. Executor MUST also look up the current page's page_rhythm (anchor / dense / breathing), page_layouts (which template SVG to inherit, if any), and page_charts (which chart template to adapt, if any). Empty / absent entries are intentional Strategist signals — see executor-base.md §2.1. This rule exists to resist context-compression drift on long decks and to break the uniform "every page is a card grid" default
  9. SVG MUST BE HAND-WRITTEN, NOT SCRIPT-GENERATED — Every SVG page is written by the main agent directly, one page at a time (see rules 6 and 7). Writing or running a Python / Node / shell script that produces the SVG files in batch — looping over pages, templating from data, or emitting them via a generator — is FORBIDDEN, including under "save tokens", "quick draft", or "user is in a hurry" pretexts. The script-generation path was tried on a feature branch and abandoned: cross-page visual consistency depends on per-page authoring with full upstream context, which a generator script cannot reproduce

[!IMPORTANT]

🌐 Language & Communication Rule

  • Response language: match the user's input and source materials. Explicit user override (e.g., "请用英文回答") takes precedence.
  • Template format: design_spec.md MUST follow its original English template structure (section headings, field names) regardless of conversation language. Content values may be in the user's language.

[!IMPORTANT]

🔌 Compatibility With Generic Coding Skills

  • ppt-master is a repository-specific workflow, not a general application scaffold
  • Do NOT create .worktrees/, tests/, branch workflows, or generic engineering structure by default
  • On conflict with a generic coding skill, follow this skill unless the user explicitly says otherwise

Main Pipeline Scripts

| Script | Purpose |
| --- | --- |
| `${SKILL_DIR}/scripts/source_to_md/pdf_to_md.py` | PDF to Markdown |
| `${SKILL_DIR}/scripts/source_to_md/doc_to_md.py` | Documents to Markdown: native Python for DOCX/HTML/EPUB/IPYNB, pandoc fallback for legacy formats (.doc/.odt/.rtf/.tex/.rst/.org/.typ) |
| `${SKILL_DIR}/scripts/source_to_md/excel_to_md.py` | Excel workbooks to Markdown: supports .xlsx/.xlsm; legacy .xls should be resaved as .xlsx |
| `${SKILL_DIR}/scripts/source_to_md/ppt_to_md.py` | PowerPoint to Markdown |
| `${SKILL_DIR}/scripts/source_to_md/web_to_md.py` | Web page to Markdown (supports WeChat via curl_cffi) |
| `${SKILL_DIR}/scripts/project_manager.py` | Project init / validate / manage |
| `${SKILL_DIR}/scripts/analyze_images.py` | Image analysis |
| `${SKILL_DIR}/scripts/latex_render.py` | LaTeX formula rendering (manifest-driven PNG assets) |
| `${SKILL_DIR}/scripts/image_gen.py` | AI image generation (multi-provider) |
| `${SKILL_DIR}/scripts/svg_quality_checker.py` | SVG quality check |
| `${SKILL_DIR}/scripts/total_md_split.py` | Speaker notes splitting |
| `${SKILL_DIR}/scripts/finalize_svg.py` | SVG post-processing (unified entry) |
| `${SKILL_DIR}/scripts/svg_to_pptx.py` | Export to PPTX |
| `${SKILL_DIR}/scripts/update_spec.py` | Propagate a spec_lock.md color / font_family change across all generated SVGs |

For complete tool documentation, see ${SKILL_DIR}/scripts/README.md.

Windows note: if a python3 ... command fails (common on python.org installs, which provide python.exe but not python3.exe), rerun the same command with python instead.
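The fallback in the Windows note can be sketched as a small helper. This is a minimal illustration, assuming the interpreter is resolved through PATH; `pick_python` is a hypothetical name and not part of ppt-master, and a PATH lookup is only an approximation of "the python3 command fails".

```python
import shutil
from typing import Callable, Optional


def pick_python(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the first interpreter name found on PATH, preferring python3."""
    for name in ("python3", "python"):
        if which(name):
            return name
    raise RuntimeError("no Python interpreter found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out when checking the helper; normal callers would use the default.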

Template Index

| Index | Path | Purpose |
| --- | --- | --- |
| Layout templates | `${SKILL_DIR}/templates/layouts/layouts_index.json` | Query available page layout templates |
| Brand presets | `${SKILL_DIR}/templates/brands/brands_index.json` | Query available brand identity presets (color / typography / logo / voice) |
| Visualization templates | `${SKILL_DIR}/templates/charts/charts_index.json` | Query available visualization SVG templates (charts, infographics, diagrams, frameworks) |
| Icon library | `${SKILL_DIR}/templates/icons/` | See `${SKILL_DIR}/templates/icons/README.md`; search icons on demand with `ls templates/icons/<library>/ \| grep <keyword>` |

Standalone Workflows

| Workflow | Path | Purpose |
| --- | --- | --- |
| topic-research | `workflows/topic-research.md` | Pre-pipeline: gather web sources when the user supplies only a topic with no source files |
| template-fill | `workflows/template-fill-pptx.md` | Give a native PPTX template deck plus source material; select fitting pages (a page may be reused for several output slides) and fill text back without SVG conversion |
| create-template | `workflows/create-template.md` | Standalone layout template creation workflow |
| create-brand | `workflows/create-brand.md` | Standalone brand-only template creation (identity preset; no SVG page roster) |
| resume-execute | `workflows/resume-execute.md` | Phase B entry: resume execution in a fresh chat after Phase A (Step 1–5) completed in another session (split mode) |
| verify-charts | `workflows/verify-charts.md` | Chart coordinate calibration: run after SVG generation if the deck contains data charts |
| customize-animations | `workflows/customize-animations.md` | Object-level PPTX animation customization: run only when the user explicitly asks to tune animation order/effects/timing |
| live-preview | `workflows/live-preview.md` | Browser-based live preview: auto-started during generation and re-enterable any time the user mentions "live preview", "preview", "看效果", or wants to click/select a slide element |
| visual-review | `workflows/visual-review.md` | Per-page rubric-based visual self-check: run only when the user explicitly asks for a visual re-pass on the generated SVGs (between Executor and post-processing). Opt-in only; never invoked by the main pipeline. |

Workflow

Step 1: Source Content Processing

🚧 GATE: User has provided source material (PDF / DOCX / EPUB / URL / Markdown file / text description / conversation content — any form is acceptable).

No source content? When the user supplies only a topic name or requirements without any file or substantive description, run the topic-research workflow first, then return here with its products as input.

When the user provides non-Markdown content, convert immediately:

| User Provides | Command |
| --- | --- |
| PDF file | `python3 ${SKILL_DIR}/scripts/source_to_md/pdf_to_md.py <file>` |
| DOCX / Word / Office document | `python3 ${SKILL_DIR}/scripts/source_to_md/doc_to_md.py <file>` |
| XLSX / XLSM / Excel workbook | `python3 ${SKILL_DIR}/scripts/source_to_md/excel_to_md.py <file>` |
| CSV / TSV | Read directly as plain-text table source |
| PPTX / PowerPoint deck | `python3 ${SKILL_DIR}/scripts/source_to_md/ppt_to_md.py <file>` |
| EPUB / HTML / LaTeX / RST / other | `python3 ${SKILL_DIR}/scripts/source_to_md/doc_to_md.py <file>` |
| Web link | `python3 ${SKILL_DIR}/scripts/source_to_md/web_to_md.py <URL>` |
| WeChat / high-security site | `python3 ${SKILL_DIR}/scripts/source_to_md/web_to_md.py <URL>` (requires curl_cffi, included in requirements.txt) |
| Markdown | Read directly |
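The dispatch table above can be sketched as a lookup from file extension to converter script. This is an illustrative sketch only: `CONVERTERS` and `converter_for` are hypothetical names, the extension list is a partial reading of the table, and routing every unlisted extension to doc_to_md.py is an assumption based on the "other" row.

```python
from pathlib import Path
from typing import Optional

# Extension -> converter script under ${SKILL_DIR}/scripts/source_to_md/.
# None means "read directly, no conversion".
CONVERTERS = {
    ".pdf": "pdf_to_md.py",
    ".docx": "doc_to_md.py",
    ".epub": "doc_to_md.py",
    ".html": "doc_to_md.py",
    ".tex": "doc_to_md.py",
    ".rst": "doc_to_md.py",
    ".xlsx": "excel_to_md.py",
    ".xlsm": "excel_to_md.py",
    ".pptx": "ppt_to_md.py",
    ".md": None,
    ".csv": None,
    ".tsv": None,
}


def converter_for(source: str) -> Optional[str]:
    """Pick the converter script for a source file or URL (None = read directly)."""
    if source.startswith(("http://", "https://")):
        return "web_to_md.py"
    suffix = Path(source).suffix.lower()
    if suffix == ".xls":
        # The scripts table says legacy .xls should be resaved as .xlsx first.
        raise ValueError("legacy .xls is not supported; resave as .xlsx")
    # Assumption: unlisted document formats fall under the table's "other" row.
    return CONVERTERS.get(suffix, "doc_to_md.py")
```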

Office vector assets (EMF/WMF) from DOCX/PPTX sources: doc_to_md.py / ppt_to_md.py extract embedded Office vector images (.emf/.wmf) alongside bitmap images. After import-sources, these land in images/ together with image_manifest.json and are first-class assets in §VIII Image Resource List.

Do NOT convert EMF/WMF to PNG. The PPT Master pipeline preserves them as external references (finalize_svg.py skips them) and svg_to_pptx.py embeds them as PPTX-native media via image/x-emf / image/x-wmf MIME — PowerPoint renders them at full vector fidelity. Converting via LibreOffice/Inkscape introduces CJK font substitution drift and rasterization loss; the original EMF/WMF is always higher fidelity than the converted PNG.

Browser-based live preview cannot render EMF (will show blank) — this is expected; the PPTX output is the source of truth.

✅ Checkpoint — Confirm source content is ready, proceed to Step 2.


Step 2: Project Initialization

🚧 GATE: Step 1 complete; source content is ready (Markdown file, user-provided text, or requirements described in conversation are all valid).

python3 ${SKILL_DIR}/scripts/project_manager.py init <project_name> --format <format>

Format options: ppt169 (default), ppt43, xhs, story, etc. For the full format list, see references/canvas-formats.md.

Import source content (choose based on the situation):

| Situation | Action |
| --- | --- |
| Has source files (PDF/MD/etc.) | `python3 ${SKILL_DIR}/scripts/project_manager.py import-sources <project_path> <source_files...> --move` |
| User provided text directly in conversation | No import needed: content is already in conversation context; subsequent steps can reference it directly |

⚠️ MUST use --move (not copy): all source files — Step 1's generated Markdown, original PDFs / MDs / images — go into sources/ via import-sources --move. After execution they no longer exist at the original location. Intermediate artifacts (e.g., _files/) are handled automatically.
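The two Step 2 commands can be assembled as argument lists so that `--move` is never forgotten. A minimal sketch, assuming the commands are run through `subprocess`; `init_command` and `import_command` are hypothetical helper names, and only the flags shown in this step are used.

```python
from typing import List, Sequence


def init_command(skill_dir: str, project_name: str, fmt: str = "ppt169") -> List[str]:
    """Build the project init command; ppt169 is the documented default format."""
    script = f"{skill_dir}/scripts/project_manager.py"
    return ["python3", script, "init", project_name, "--format", fmt]


def import_command(skill_dir: str, project_path: str, sources: Sequence[str]) -> List[str]:
    """Build the import-sources command; --move is always appended."""
    if not sources:
        raise ValueError("import-sources needs at least one source file")
    script = f"{skill_dir}/scripts/project_manager.py"
    return ["python3", script, "import-sources", project_path, *sources, "--move"]
```

Either list can be passed to `subprocess.run(cmd, check=True)`; when the user supplied text only in conversation, no import command is built at all.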

✅ Checkpoint — Confirm project structure created successfully, sources/ contains all source files, converted materials are ready. Proceed to Step 3.


Step 3: Template Option

🚧 GATE: Step 2 complete; project directory structure is ready.

Default — free design. Proceed directly to Step 4. Do NOT query any *_index.json unless triggered. Do NOT ask the user. Do NOT proactively suggest, hint at, or fuzzy-match any template based on content, slug-like words, or vague style descriptions.

Template flow triggers ONLY on explicit directory paths supplied by the user in their initial message. The trigger rule is mechanical, not interpretive:

| User input contains | Step 3 action |
| --- | --- |
| One or more explicit template directory paths (each resolves to a directory containing design_spec.md with `kind: brand` / `kind: layout` / `kind: deck` in its YAML frontmatter) | Read each spec's kind, dispatch per the kind matrix below, fuse if multiple |
| Anything else: bare template names ("用 academic_defense"), style descriptions ("麦肯锡风格"), brand mentions ("招商银行风格"), vague intent ("想用个模板"), or silence | Skip Step 3, free design |

There is no slug matching, no name lookup, no fuzzy resolution. A name without a path does not trigger — the user must give a path the AI can cd into.
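The mechanical trigger can be sketched as a check that a user-supplied path is a directory holding a design_spec.md whose frontmatter declares a valid kind. This is an illustration under assumptions: the frontmatter is taken to be delimited by `---` lines with `kind:` on its own line, and `kind_from_spec_text` / `template_kind` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

VALID_KINDS = {"brand", "layout", "deck"}


def kind_from_spec_text(text: str) -> Optional[str]:
    """Read `kind` from a leading YAML frontmatter block, or return None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, _, value = line.partition(":")
        if key.strip() == "kind":
            kind = value.strip().strip("'\"")
            return kind if kind in VALID_KINDS else None
    return None


def template_kind(path: str) -> Optional[str]:
    """Return the template kind if `path` triggers Step 3, else None."""
    spec = Path(path) / "design_spec.md"
    if not spec.is_file():
        return None  # bare names and non-template paths never trigger
    return kind_from_spec_text(spec.read_text(encoding="utf-8"))
```

A `None` result means free design: nothing is looked up by name, and no index file is consulted.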

Style descriptions ("麦肯锡风格" / "Keynote 风" / "极简风" / etc.) never trigger Step 3. They flow into Strategist's Eight Confirmations as a style brief (color / typography / tone in confirmations e–g).

Bare names ("academic_defense", "招商银行", "anthropic") do NOT trigger Step 3 even if a matching directory exists in the library. The user must give a path. AI must not "helpfully" resolve a name to a path.

"What templates exist?" is out-of-band Q&A — answer by listing entries from brands_index.json / layouts_index.json / decks_index.json together with their paths. Listing alone does not advance the pipeline; the user must send a path back to trigger Step 3.

To create a new layout or deck, read workflows/create-template.md. To create a new brand, read workflows/create-brand.md.

Three template kinds

The architecture has three independent reference bundles. Full schema in docs/zh/templates-architecture.md. Summary:

| Kind | Physical dir | Contains | Frontmatter |
| --- | --- | --- | --- |
| brand | `templates/brands/<id>/` | identity-only segment: color / typography / logo / voice / icon style | `kind: brand` |
| layout | `templates/layouts/<id>/` | structure-only segment: canvas / page structure / page types / SVG roster | `kind: layout` |
| deck | `templates/decks/<id>/` | full replica: identity + structure + middle (template overview) segments | `kind: deck` |

Segment ownership (governs fusion override priority):

| Segment | Sections | Owner kind on fusion |
| --- | --- | --- |
| Identity | Color Scheme / Typography / Logo / Voice & Tone / Icon Style | brand |
| Structure | Canvas / Page Structure / Page Types / SVG Roster | layout |
| Middle | Template Overview (use cases / design intent) | deck (no other kind writes this) |

Single-path dispatch

| User path's kind | Step 3 action |
| --- | --- |
| `kind: brand` | design_spec.md + non-image assets → `<project>/templates/`; logo / illustration / icon bitmaps → `<project>/images/`. Strategist locks identity segment as truth; structure stays free. |
| `kind: layout` | design_spec.md + SVG roster → `<project>/templates/`; any bitmap assets → `<project>/images/`. Strategist locks structure; identity decided in Eight Confirmations e–g. |
| `kind: deck` | design_spec.md + template SVGs → `<project>/templates/`; logos / backgrounds / other bitmaps → `<project>/images/`. Strategist locks all segments; Eight Confirmations narrows to deck-content fields (audience / page count / outline / tone tweaks). |

```bash
TEMPLATE_DIR=<user-supplied path>
# Bitmaps join the project's single runtime image pool (images/, referenced as
# ../images/); the spec + template SVGs + other non-image assets stay in
# templates/ as design reference the Strategist/Executor read but never render.
# Quote the variable so template paths containing spaces still work.
cp -r "${TEMPLATE_DIR}"/* <project_path>/templates/
find <project_path>/templates -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.gif' -o -iname '*.webp' -o -iname '*.bmp' \) -exec mv {} <project_path>/images/ \;
```

The same split applies to all three kinds — bitmaps always land in images/, the rest in templates/. The spec's kind field tells Strategist how to read the templates/ side; downstream code doesn't distinguish. (Template SVGs in templates/ are reference material only — the rendered pages live in svg_output/ and reference images via ../images/.)

Multi-path fusion

When the user gives two or more paths of different kinds, Step 3 fuses them into a single <project>/templates/design_spec.md. Default granularity is whole-segment replacement: each of the identity / structure / middle segments is taken in its entirety from the highest-priority source for that segment, with no implicit field-level mixing.

Override priority by segment:

| Combination | Identity from | Structure from | Middle from |
| --- | --- | --- | --- |
| brand only | brand | (free design) | (none) |
| layout only | (free design) | layout | (none) |
| deck only | deck | deck | deck |
| brand + layout | brand | layout | (none) |
| brand + deck | brand (overrides deck) | deck | deck |
| layout + deck | deck | layout (overrides deck) | deck |
| brand + layout + deck | brand | layout | deck |

Field-level micro-adjustment (e.g. "use anthropic brand but primary changed to #FF0000") is not part of Step 3 fusion — it flows into Strategist Eight Confirmations e–g as a normal user request.
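The override-priority table reduces to one rule per segment: the owning kind wins if present, otherwise the deck supplies the segment, otherwise it stays free. A minimal sketch of that rule follows; `resolve_segments` is a hypothetical name, and `None` stands for "(free design)" or "(none)".

```python
from typing import Dict, Iterable, Optional


def resolve_segments(kinds: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each segment to the template kind that supplies it.

    None means free design (identity / structure) or no segment (middle).
    """
    present = set(kinds)
    unknown = present - {"brand", "layout", "deck"}
    if unknown:
        raise ValueError(f"unknown template kind(s): {sorted(unknown)}")

    def first_present(*candidates: str) -> Optional[str]:
        # Candidates are listed from highest to lowest priority.
        return next((k for k in candidates if k in present), None)

    return {
        "identity": first_present("brand", "deck"),
        "structure": first_present("layout", "deck"),
        "middle": first_present("deck"),
    }
```

For example, `resolve_segments(["brand", "deck"])` takes identity from the brand and both structure and middle from the deck, matching the "brand + deck" row.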

Same-kind multiple paths — conflict resolution

When the user gives two paths of the same kind (e.g. brands/anthropic + brands/google), Step 3 surfaces a conflict prompt before fusing — like resolving a git merge conflict:

AI: You gave two brands; segment-level conflicts detected:
    - Color Scheme (Anthropic orange-red vs Google multicolor)
    - Typography (Styrene/AnthropicSans vs GoogleSans/Roboto)
    - Logo (Anthropic mark vs Google mark)
    - Voice & Tone (restrained vs friendly)
    - Icon Style (stroke vs filled)

    Choose (a) all Anthropic / (b) all Google / (c) pick segment by segment?

Rules:

  • Default: no implicit ordering — every cross-source segment difference is reported as a conflict
  • Only when the user picks (c) does AI walk through each segment one by one
  • Field-level conflicts are out of scope — segment-level only
  • Three or more same-kind paths are not supported — ask the user to converge to at most two
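The same-kind rules above amount to counting paths per kind. The sketch below classifies each kind; `same_kind_status` and its three status labels are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Dict, List


def same_kind_status(kinds: List[str]) -> Dict[str, str]:
    """Classify each kind by how many paths the user supplied for it.

    'ok'          -> one path, nothing to resolve
    'conflict'    -> two paths, surface the conflict prompt before fusing
    'unsupported' -> three or more, ask the user to converge to at most two
    """
    status = {}
    for kind, count in Counter(kinds).items():
        if count == 1:
            status[kind] = "ok"
        elif count == 2:
            status[kind] = "conflict"
        else:
            status[kind] = "unsupported"
    return status
```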

Fused spec provenance

When fusion happens (any multi-path case), the resulting <project>/templates/design_spec.md carries a provenance block immediately under its H1:

> **Fused from:**
> - deck: `templates/decks/招商银行/` (base)
> - brand: `templates/brands/anthropic/` (identity override)
> - layout: `templates/layouts/academic_defense/` (structure override)
> - conflicts resolved: Color Scheme from anthropic(user picked a)

Single-path Step 3 does not add provenance (the source is self-evident from the copied files).

✅ Checkpoint — Default path proceeds to Step 4 without user interaction. If the user supplied one or more explicit template paths, those have been dispatched (or fused) into <project_path>/templates/ before advancing.


Step 4: Strategist Phase (MANDATORY — cannot be skipped)

🚧 GATE: Step 3 complete; default free-design path taken, or (if triggered) template files copied into the project.

First, read the role definition:

Read references/strategist.md

⚠️ Mandatory gate: before writing design_spec.md, Strategist MUST read_file templates/design_spec_reference.md and follow its full I–XI section structure. See strategist.md Section 1.

Eight Confirmations (full template: templates/design_spec_reference.md):

BLOCKING: present the Eight Confirmations as a single bundled recommendation set and wait for explicit user confirmation or modification before outputting Design Specification & Content Outline. This is the single core confirmation point — once confirmed, all subsequent steps proceed automatically.

  1. Canvas format
  2. Page count range
  3. Target audience
  4. Style objective
  5. Color scheme
  6. Icon usage approach
  7. Typography plan, including formula rendering policy
  8. Image usage approach

Confirm UI Auto-Launch (Mandatory — default visual confirmation surface): by default the Eight Confirmations are presented through an interactive local page (color swatches, live font previews, candidate picks); the chat path is the always-valid fallback. Steps:

  1. Write the recommendations to <project_path>/confirm_ui/recommendations.json (full schema + field mapping: scripts/docs/confirm_ui.md). Fields come in two kinds, plus plain values:
    • Enumerable (canvas / mode / visual_style / icons / formula policy / generation mode; plus image usage with a Custom path; plus AI source only when image usage may include ai): the page lists common options from confirm_ui/static/catalogs.json, so you only name the recommended canonical id in a recommend block.
      • canvas may be a catalog id like ppt169 or a custom size/prose.
      • style = mode + visual_style, two independent picks.
      • Icon ids are real libraries such as tabler-outline, or emoji for system emoji.
      • Image usage uses ai / web / provided / placeholder / none, or a custom prose plan when several sources must be combined. Never write bare "custom" for image usage; write the actual mixed plan, e.g. "AI cover + user product assets + web industry images".
      • Write image_ai_path only when recommending image_usage: "ai" or a custom plan that includes AI.
    • Generative (color, typography, generated-image style): author a few candidates.
      • Color: user-facing core palette with background / secondary_bg / primary / accent / secondary_accent / body_text.
      • Typography: CJK + Latin for heading and body with CSS preview stacks, plus body_size as the body baseline px.
      • Generated images (when recommended): image_strategy.candidates with rendering × palette combinations from strategist h.5.
    • Plain values: page_count / audience.
    • Only open fields show a Custom box: canvas, mode, visual_style, icons, image_usage, and typography custom text. Closed fields (image_ai_path, formula_policy, generation_mode, refine_spec) stay finite.
    • Set lang to the page language; visible candidate text should match lang, or provide bilingual name_zh / name_en and note_zh / note_en fields.
    • Reuse the same candidate thinking as strategist h.5.
  2. Launch the page in the background and wait for the browser confirmation (the child server runs detached; the parent command returns after result.json is freshly written). Run this command with a long tool timeout — 600000 ms — so the --wait (≈590 s budget) can complete:
    python3 ${SKILL_DIR}/scripts/confirm_ui/server.py <project_path> --daemon --wait
    
    Page opens at http://localhost:5050 — the same port as the Step 6 live preview (they never run at once: this page shuts down at the end of Step 4, freeing the port). If another project already holds 5050, the launcher auto-advances to the next free port (5051, …) and serves this project there — read the actual URL from the launch log and report that. When the user clicks Confirm, the command exits 0 and Step 4 reads result.json immediately; do not require a second chat confirmation. Launch or wait failure is non-fatal: if it fails or times out (flask missing, port blocked, no GUI / remote / web host, browser never confirms in time), do NOT troubleshoot. The detached page stays open, so a slow user may confirm after the wait returns — therefore on any non-zero exit, re-check <project_path>/confirm_ui/result.json once (a fresh status: confirmed) before dropping to the chat-summary fallback below.
  3. Always also print the eight recommendations as a short summary in chat, with the URL. This keeps the chat fallback valid whether or not the browser opened. If the page never appears, the user simply confirms or edits in chat as before.
  4. This is the ⛔ BLOCKING wait. Preferred page path: the --wait command returns after the page writes a fresh <project_path>/confirm_ui/result.json; immediately read that file and use its values. On a non-zero exit, re-check result.json once (per step 2) — a fresh status: confirmed still wins. Chat fallback path: only if no fresh result exists (page didn't open, wait timed out with no confirmation, or the user replies in chat with edits) take the chat values directly. Either path converges. A confirmed result.json is an explicit user choice: generation_mode: "split" means split mode was chosen; refine_spec: true means the refine-spec workflow was chosen.
  5. Close the confirm page (Mandatory cleanup — every path). Once you have the confirmed values (page or chat), shut the confirm server down before leaving Step 4 so it cannot keep holding port 5050 (which Step 6 live preview reuses):
    python3 ${SKILL_DIR}/scripts/confirm_ui/server.py <project_path> --shutdown
    
    This is idempotent and required regardless of whether Confirm was clicked: clicking Confirm already shuts the page down (this is then a no-op), but the chat-fallback path leaves the page running — without this cleanup it would block the live preview launch. Run it after reading the confirmation and before proceeding to Step 5.
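The launch, wait, re-check, and cleanup sequence above can be sketched as follows. This is an illustration under assumptions, not the skill's own code: `read_fresh_confirmation` and `confirm_via_page` are hypothetical names, `status` is assumed to be a top-level key of result.json, and "fresh" is approximated by comparing the file's modification time with the launch time.

```python
import json
import subprocess
import time
from pathlib import Path
from typing import Optional


def read_fresh_confirmation(result_path: Path, not_before: float) -> Optional[dict]:
    """Return result.json data only if it was written after `not_before` and is confirmed."""
    if not result_path.is_file() or result_path.stat().st_mtime < not_before:
        return None
    try:
        data = json.loads(result_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    # Assumption: `status` is a top-level key of result.json.
    if isinstance(data, dict) and data.get("status") == "confirmed":
        return data
    return None


def confirm_via_page(skill_dir: str, project_path: str) -> Optional[dict]:
    """Launch the page, wait, re-check result.json once, and always shut down."""
    server = f"{skill_dir}/scripts/confirm_ui/server.py"
    result_path = Path(project_path) / "confirm_ui" / "result.json"
    started = time.time()
    try:
        try:
            # 600 s mirrors the 600000 ms tool timeout around the ~590 s --wait budget.
            subprocess.run(["python3", server, project_path, "--daemon", "--wait"], timeout=600)
        except (subprocess.TimeoutExpired, OSError):
            pass  # launch or wait failure is non-fatal: no troubleshooting
        # Same read on success and on failure; None means "use the chat fallback".
        return read_fresh_confirmation(result_path, started)
    finally:
        try:
            # Idempotent cleanup so port 5050 is free for the Step 6 live preview.
            subprocess.run(["python3", server, project_path, "--shutdown"], check=False)
        except OSError:
            pass
```

A `None` return corresponds to the chat-fallback path; the shutdown in `finally` runs on every path, as the cleanup step requires.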

Honoring the confirmation (result.json is authoritative — Mandatory): the confirmed values override your own recommendations when you write design_spec.md / spec_lock.md. A user who changed any field changed it on purpose. In particular, map image_usage to §VIII Acquire Via (its value names differ from §h options — translate):

| result.json.image_usage | §VIII Acquire Via | h.5 + Step 5 generation |
| --- | --- | --- |
| ai (or a custom plan that includes AI) | ai rows | Run h.5 (lock rendering + palette); Step 5 generates |
| web | web rows | None |
| provided | user rows | None: never generate |
| placeholder | placeholder rows | None |
| none | no image rows (§h option A) | None |

When the confirmed image_usage is not ai (and the plan has no AI part), do NOT run h.5, do NOT write ai rows, and do NOT generate images in Step 5 — regardless of what you recommended. The same "confirmed value wins" rule applies to every field (color → §III, typography → §IV, etc.).
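The translation from a confirmed image_usage value to its Acquire Via row type can be sketched as a lookup. `ACQUIRE_VIA`, `acquire_via`, and `runs_generation` are hypothetical names; for a custom prose plan, whether it includes AI is passed in explicitly as an assumption, because this sketch does not try to parse free text.

```python
from typing import Optional

# Canonical image_usage value -> §VIII Acquire Via row type (None = no image rows).
ACQUIRE_VIA = {
    "ai": "ai",
    "web": "web",
    "provided": "user",
    "placeholder": "placeholder",
    "none": None,
}


def acquire_via(image_usage: str) -> Optional[str]:
    """Translate a canonical image_usage value to its Acquire Via row type."""
    if image_usage not in ACQUIRE_VIA:
        # Custom prose plans mix sources, so rows are decided one by one.
        raise ValueError("custom plan: decide Acquire Via per row")
    return ACQUIRE_VIA[image_usage]


def runs_generation(image_usage: str, custom_plan_includes_ai: bool = False) -> bool:
    """True only when h.5 and Step 5 generation should run."""
    if image_usage in ACQUIRE_VIA:
        return image_usage == "ai"
    return custom_plan_includes_ai
```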

Opt-out: if the user has said they don't want the page (e.g. "不要网页" / "just confirm in chat" / "纯聊天确认"), skip the launch entirely (step 2) and present the Eight Confirmations in chat as before — steps 1, 3, 4 still apply (recommendations summary in chat; wait; take chat values).

The page is a confirmation surface only — Strategist still authors every recommendation; the page never generates content.

Mandatory — split-mode note (not a ninth confirmation): after listing the eight confirmation details, you MUST append exactly one short line (rendered in the user's language, prefixed with 💡) about generation mode. Pick the variant by qualitative read of Phase A signals — recommended page count, source-material bulk, whether topic-research ran with substantial web-fetch accumulation:

| Signal read | Line content |
| --- | --- |
| Heavy (long page count / bulky sources / heavy web-fetch accumulation) | State estimated page count and large source size; recommend switching to split mode after Step 5: stop this chat, open a fresh window and input `继续生成 projects/<project_name>` to enter Phase B (SVG generation + export); no response or "continue" = default continuous mode. |
| Normal (default) | State scale is moderate, default continuous mode generates in one go; if mid-way window switch is desired, input `继续生成 projects/<project_name>` after Step 5 to switch to split mode. |

This line is required output every run — the user must always see the mode choice exists. Whether to act on it is the user's call. When the Confirm UI is used, this choice also appears as the in-page generation-mode toggle and is captured in result.json (generation_mode); the chat-summary fallback still prints this line.

Mandatory — spec-refinement note (not a ninth confirmation): after the split-mode line, you MUST append one short opt-in line (rendered in the user's language, prefixed with 💡) telling the user they may refine the spec first — Strategist will produce the full design spec, then stop for review/revision of any part of it before any generation, via the refine-spec workflow. Default is OFF: no request → the spec is written in one go and the pipeline auto-proceeds as usual. Only when the user explicitly asks in chat (e.g. "refine the spec first") or confirms refine_spec: true through Confirm UI does the refine-spec workflow take over after the Eight Confirmations. This line, like the split-mode line, is required output every run — the user must see the choice exists; whether to act on it is theirs. When the Confirm UI is used, this choice also appears as the in-page refine-spec toggle and is captured in result.json (refine_spec); the chat-summary fallback still prints this line.

Formula rendering policy lives inside item 7 (Typography plan):

| Policy | Behavior |
| --- | --- |
| mixed (default) | Strategist renders complex formula-worthy expressions as PNG assets; simple inline expressions remain editable text / Unicode |
| render-all | Strategist renders every formula-worthy expression as PNG assets |
| text-only | No formula rendering; formulas remain editable text / Unicode |
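The three policies can be read as a single decision per formula-worthy expression. A minimal sketch: `should_render` is a hypothetical name, and the complex-versus-simple judgment is assumed to be made by the Strategist and passed in as a flag.

```python
def should_render(policy: str, is_complex: bool) -> bool:
    """Decide whether one formula-worthy expression becomes a PNG asset."""
    if policy == "render-all":
        return True
    if policy == "mixed":
        return is_complex  # simple inline expressions stay editable text / Unicode
    if policy == "text-only":
        return False
    raise ValueError(f"unknown formula policy: {policy!r}")
```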

After the Eight Confirmations are approved and before outputting design_spec.md / spec_lock.md, if the confirmed formula policy is mixed or render-all and the content contains formula-worthy expressions, Strategist MUST:

  1. Identify explicit LaTeX and any source expressions that should be faithfully structured as formulas.
  2. Write <project_path>/images/formula_manifest.json with only the formulas selected for rendering.
  3. Run:
    python3 ${SKILL_DIR}/scripts/latex_render.py <project_path>
    
  4. Include the rendered formula PNGs as Acquire Via: formula, Status: Rendered, Type: Latex Formula rows in design_spec.md §VIII Image Resource List; also list them in spec_lock.md images with | no-crop.

The formula renderer uses a provider fallback chain by default: codecogs,quicklatex,mathpad,wikimedia. The first three are color-aware; Wikimedia is an availability fallback. Formula PNGs are transparent by default: manifest background is the temporary render matte and transparency-removal reference, not a retained final background unless transparent: false is set for that item. Do not scan spec_lock.md for $...$ or $$...$$. Dollar-delimited math in source material is only a signal for Strategist; the renderer consumes the explicit manifest.

If the user provided images or formula PNGs were rendered, run analysis before outputting the design spec:

python3 ${SKILL_DIR}/scripts/analyze_images.py <project_path>/images

⚠️ Image handling: NEVER directly read / open / view image files (.jpg, .png, etc.). All image info comes from analyze_images.py output or the Design Spec's Image Resource List.

Output:

  • <project_path>/design_spec.md — human-readable design narrative
  • <project_path>/spec_lock.md — machine-readable execution contract (skeleton: templates/spec_lock_reference.md); Executor re-reads before every page

✅ Checkpoint — Phase deliverables complete, auto-proceed to next step:

## ✅ Strategist Phase Complete
- [x] Eight Confirmations completed (user confirmed via Confirm UI `result.json` or chat fallback)
- [x] Split-mode note appended below the eight items (heavy or normal variant)
- [x] Spec-refinement opt-in line appended (default OFF; only the user's explicit request enters the refine-spec workflow)
- [x] Design Specification & Content Outline generated
- [x] Execution lock (spec_lock.md) generated
- [ ] **Next**: Auto-proceed to [Image_Generator / Executor] phase

Step 5: Image Acquisition Phase (Conditional)

🚧 GATE: Step 4 complete; Design Specification & Content Outline generated and user confirmed. Any formula rows already have Acquire Via: formula and Status: Rendered.

Trigger: At least one row in the resource list has Acquire Via: ai and/or Acquire Via: web. If every row is user, formula, or placeholder, skip to Step 6.

Always load the common framework:

Read references/image-base.md

Then lazy-load the path-specific reference for each row that actually needs it:

| Acquire Via | Load reference (only if any such row exists) | Run |
| --- | --- | --- |
| ai | `references/image-generator.md` | `python3 ${SKILL_DIR}/scripts/image_gen.py --manifest <project_path>/images/image_prompts.json` |
| web | `references/image-searcher.md` | `python3 ${SKILL_DIR}/scripts/image_search.py ...` |
| user / placeholder | (skip) | (skip) |

A deck with only ai rows never loads image-searcher.md; a deck with only web rows never loads image-generator.md. A mixed deck loads both, processes each row through its own path, and writes both image_prompts.json and image_sources.json.
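The trigger and lazy-loading rules can be sketched as a function from the Acquire Via values in the resource list to the references that need loading. `references_to_load` is a hypothetical name; an empty result corresponds to skipping straight to Step 6.

```python
from typing import Iterable, List


def references_to_load(acquire_via_values: Iterable[str]) -> List[str]:
    """List the Step 5 references needed for the given Acquire Via values."""
    present = set(acquire_via_values)
    if not present & {"ai", "web"}:
        return []  # only user / formula / placeholder rows: Step 5 is skipped
    refs = ["references/image-base.md"]  # common framework, always loaded
    if "ai" in present:
        refs.append("references/image-generator.md")
    if "web" in present:
        refs.append("references/image-searcher.md")
    return refs
```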

⚠️ In-pipeline ai path MUST use manifest mode — even when only 1 ai row exists. Write images/image_prompts.json first, then run image_gen.py --manifest, then image_gen.py --render-md to produce the image_prompts.md sidecar. The positional form (image_gen.py "prompt" ...) is reserved for out-of-pipeline one-off testing / single-image fixups — it skips manifest + sidecar, leaving no audit trail.

⚠️ Honor the confirmed image source: the ai generation path (Path A = image_gen.py API / Path B = host-native tool / Offline Manual) is not always chosen automatically. A confirmed choice other than auto wins, whether it came from chat (canonical) or, when the Confirm UI page was used, from result.json.image_ai_path. host-native forces Path B even when IMAGE_BACKEND is configured; api forces Path A; manual forces offline. The --manifest command above is Path A. Full selection rule: image-generator.md §7 Path Selection.
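As a rough illustration of the "confirmed choice wins" part of this rule only, the mapping can be written as a lookup. The function name and return strings are hypothetical; the behavior for `auto` is deliberately left to image-generator.md §7 and is not reproduced here.

```python
# Hypothetical sketch: map a confirmed, non-auto choice to the path it forces.
FORCED_PATHS = {
    "api": "Path A (image_gen.py API)",
    "host-native": "Path B (host-native tool)",
    "manual": "Offline Manual",
}


def forced_ai_path(confirmed_choice):
    """Return the forced path for a confirmed choice, or None for "auto".

    None means: apply the selection rule in image-generator.md §7 instead.
    Whether IMAGE_BACKEND is configured is intentionally not consulted here,
    because a confirmed non-auto choice overrides it.
    """
    return FORCED_PATHS.get(confirmed_choice)
```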

Workflow:

  1. Extract all rows with Status: Pending and Acquire Via ∈ {ai, web} from the design spec
  2. Generate prompts (ai rows) and/or run search (web rows) per image-base.md §2 dispatch table
  3. Verify every row reaches a terminal status: Generated (ai success), Sourced (web success), or Needs-Manual

✅ Checkpoint — Confirm acquisition attempted for every row:

## ✅ Image Acquisition Phase Complete
- [x] image_prompts.json created (when any ai rows processed)
- [x] image_prompts.md sidecar rendered (when any ai rows processed)
- [x] image_sources.json created (when any web rows processed)
- [x] Each row: status is `Generated` / `Sourced` / `Needs-Manual` (no `Pending` remaining)

Default — auto-proceed to Step 6. Only when the user's Step 4 response explicitly opted into split mode (in chat or via Confirm UI result.json with generation_mode: "split"), output the Phase A hand-off below and stop this conversation:

## ✅ Phase A Complete
- [x] Spec: `design_spec.md`, `spec_lock.md`
- [x] Resources: `sources/`, `images/`, `templates/`
- [ ] **Next**: open a fresh chat window and input `继续生成 projects/<project_name>` to enter Phase B via the [`resume-execute`](workflows/resume-execute.md) workflow.

On acquisition failure, do NOT halt — follow the Failure Handling rule in image-base.md §5: retry once, then mark the row Needs-Manual, report to user, and continue to the checkpoint above.
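The retry-once-then-continue behavior summarized above might look like the following sketch. It assumes a row is a dict and that `acquire` is a caller-supplied callable returning the success status; both are hypothetical stand-ins, and image-base.md §5 remains the authoritative rule.

```python
# Hypothetical sketch of "retry once, then mark Needs-Manual, and keep going".
def acquire_with_retry(row, acquire):
    """Call acquire(row) at most twice; never raise, always leave a terminal status."""
    for _attempt in range(2):  # first try plus exactly one retry
        try:
            row["status"] = acquire(row)  # expected to return "Generated" or "Sourced"
            return row
        except Exception as exc:
            row["last_error"] = str(exc)  # kept so the failure can be reported to the user
    row["status"] = "Needs-Manual"
    return row
```

Because the function never raises, a loop over all rows always reaches the checkpoint, with failed rows marked `Needs-Manual` rather than left `Pending`.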


Step 6: Executor Phase

🚧 GATE: Step 4 (and Step 5 if triggered) complete; all prerequisite deliverables are ready.

Read the execution references for this deck's locked mode + visual_style (from spec_lock.md):

Read references/executor-base.md                  # REQUIRED: common guidelines
Read references/shared-standards.md               # REQUIRED: SVG/PPT technical constraints
Read references/modes/<locked-mode>.md            # narrative skeleton (spec_lock.md `mode`)
Read references/visual-styles/<locked-style>.md   # aesthetic (spec_lock.md `visual_style`)

Read executor-base + shared-standards + the one locked mode file + the one locked visual-style file. For mode: custom or visual_style: custom, skip that preset file and follow mode_behavior / visual_style_behavior from spec_lock.md instead. Never glob modes/ or visual-styles/.
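The file-selection rule can be sketched as follows. This is an illustration under the assumption that `mode` and `visual_style` have already been read from spec_lock.md as plain strings; the preset names used in any call are placeholders, not real preset names.

```python
# Hypothetical sketch: build the exact list of references the Executor reads.
def executor_references(mode, visual_style):
    """Return the reference files for a locked mode and visual style.

    A value of "custom" skips the preset file; the caller then follows
    mode_behavior / visual_style_behavior from spec_lock.md instead.
    """
    refs = ["references/executor-base.md", "references/shared-standards.md"]
    if mode != "custom":
        refs.append(f"references/modes/{mode}.md")
    if visual_style != "custom":
        refs.append(f"references/visual-styles/{visual_style}.md")
    return refs
```

The list is built from the two locked values only, so no directory listing of `modes/` or `visual-styles/` is ever needed.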

Design Parameter Confirmation (Mandatory): before the first SVG, output key design parameters from the spec (canvas dimensions, color scheme, font plan, body font size). See executor-base.md §2.

Live Preview Auto-Startup (Mandatory): before the first SVG, automatically start the browser editor in live mode and keep it running continuously through Executor + Step 7 export:

python3 ${SKILL_DIR}/scripts/svg_editor/server.py <project_path> --live
  • Start it immediately when Executor begins; svg_output/ may be empty. Editor opens at http://localhost:5050; if another project already holds it, the launcher auto-advances to the next free port — read the actual URL from the launch log and report that.
  • Run it as a long-running side process/session; do not wait for it to exit before generating SVG pages. Do not wait for user confirmation after startup.
  • Service must keep running until one of: (a) the user clicks Exit preview in the browser, or (b) the user explicitly asks in chat to stop it. Generation continues even if the user closes the editor.
  • Do NOT read or apply submitted annotations during generation. Users may annotate at any time, but Executor proceeds without touching them. The window to apply annotations opens only after Step 7 completes — see workflows/live-preview.md.
  • The editor also supports staged direct edits (text content + SVG element attributes previewed immediately, then written to svg_output/ only when the user clicks Apply changes; Ctrl+Z / Undo drops staged edits) alongside annotation; re-export stays chat-driven. Full scope and editor details: see workflows/live-preview.md Notes.

Pre-generation Batch Read (Mandatory): before the first SVG, batch-read every distinct layout SVG referenced in spec_lock.page_layouts and every distinct chart SVG referenced in spec_lock.page_charts (plus any §VII backup charts). One read per file, up front — do not re-read these during page generation. See executor-base.md §1.0.

Per-page spec_lock re-read (Mandatory): before each SVG page, read_file <project_path>/spec_lock.md and use only its colors / fonts / icons / images, plus the per-page page_rhythm / page_layouts / page_charts lookups (resolves to template SVGs already loaded in the batch read above). Resists context-compression drift on long decks. See executor-base.md §2.1.

⚠️ Main-agent only: SVG generation MUST stay in the current main agent; page design depends on full upstream context. Do NOT delegate to sub-agents.

⚠️ Generation rhythm: generate pages sequentially, one at a time, in the same continuous context. Do NOT batch (e.g., 5 per group).

Visual Construction Phase: generate SVG pages sequentially, one at a time, in one continuous pass → <project_path>/svg_output/

Quality Check Gate (Mandatory) — after all SVGs, BEFORE annotation handling and speaker notes:

python3 ${SKILL_DIR}/scripts/svg_quality_checker.py <project_path>
  • Any error (banned SVG features, viewBox mismatch, spec_lock drift, etc.) MUST be fixed before proceeding — return to Visual Construction, regenerate that page, re-run check.
  • warning entries (low-res image, non-PPT-safe font tail, etc.): fix when straightforward, otherwise acknowledge and release.
  • Run against svg_output/ (not after finalize_svg.py — finalize rewrites SVG and masks violations).
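The gate logic above (errors block, warnings may be released) can be illustrated with a small triage helper. The findings structure here, a list of dicts with `page`, `severity`, and `message`, is purely hypothetical; the actual output format of svg_quality_checker.py is not described in this document.

```python
# Hypothetical sketch: split checker findings into blocking and non-blocking sets.
def triage(findings):
    """Return (pages_to_regenerate, warnings) from a list of finding dicts."""
    # Any page with at least one error must go back to Visual Construction.
    pages_to_regenerate = sorted({f["page"] for f in findings if f["severity"] == "error"})
    # Warnings are fixed when straightforward, otherwise acknowledged and released.
    warnings = [f for f in findings if f["severity"] == "warning"]
    return pages_to_regenerate, warnings
```

The gate is passed only when `pages_to_regenerate` is empty after a re-run of the checker.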

Logic Construction Phase: generate speaker notes → <project_path>/notes/total.md

✅ Checkpoint — Confirm all SVGs and notes are fully generated and quality-checked. Proceed directly to Step 7 post-processing:

## ✅ Executor Phase Complete
- [x] Live preview started and kept available at the reported URL
- [x] All SVGs generated to svg_output/
- [x] svg_quality_checker.py passed (0 errors)
- [x] Speaker notes generated at notes/total.md

Chart pages? If this deck contains data charts (bar / line / pie / radar / etc.), run the standalone verify-charts workflow before Step 7 to calibrate coordinates. AI models routinely introduce 10–50 px errors when mapping data to pixel positions; verify-charts eliminates that class of error. Skip if no chart pages.

Visual self-check (opt-in)? If the user explicitly asked for a per-page visual re-pass on the SVGs ("跑一下视觉自检 / 视觉回看", "visual review", "check pages visually", etc.), run the standalone visual-review workflow before Step 7. Do NOT run it by default and do NOT recommend it based on inferred model capability or deck size — trigger is user request only.


Step 7: Post-processing & Export

🚧 GATE: Step 6 complete; all SVGs generated to svg_output/; speaker notes notes/total.md generated.

🚧 Image readiness GATE (when Step 5 left ai rows in Needs-Manual): every expected file must exist at project/images/<filename> before running 7.1.

If files are missing: PAUSE, list the missing filenames, point the user to images/image_prompts.md (each ### Image N: block is paste-ready for ChatGPT / Gemini / Midjourney; auto-generated from image_prompts.json) and the required placement project/images/<filename>. Resume Step 7.1 only after all expected files are in place. finalize_svg.py and svg_to_pptx.py do not detect missing files at this layer — proceeding with gaps produces a deck with broken image references.
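Since the export scripts do not detect missing files, the readiness gate has to be checked up front. A minimal sketch, assuming the list of expected filenames has already been collected from the `Needs-Manual` rows (how that list is obtained is not shown and is left to the caller):

```python
# Hypothetical sketch: list expected image files that are not yet in place.
from pathlib import Path


def missing_images(project_path, expected_filenames):
    """Return the expected filenames that do not exist under <project_path>/images/."""
    images_dir = Path(project_path) / "images"
    return [name for name in expected_filenames if not (images_dir / name).is_file()]
```

A non-empty result corresponds to the PAUSE case: report those filenames to the user and resume Step 7.1 only once the list is empty.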

⚠️ Run the three sub-steps one at a time — each must complete successfully before the next. ❌ NEVER combine them into a single code block or shell invocation.

Canonical three-command pipeline (mirrors references/shared-standards.md §5):

Step 7.1 — Split speaker notes:

python3 ${SKILL_DIR}/scripts/total_md_split.py <project_path>

Step 7.2 — SVG post-processing (icon embedding / image crop & embed / text flattening / rounded rect to path):

python3 ${SKILL_DIR}/scripts/finalize_svg.py <project_path>

Step 7.3 — Export PPTX (embeds speaker notes by default):

python3 ${SKILL_DIR}/scripts/svg_to_pptx.py <project_path>
# Output (default-flow mode):
#   exports/<project_name>_<timestamp>.pptx           ← native pptx (canonical output, reads svg_output/)
#   backup/<timestamp>/svg_output/                    ← Executor SVG source backup (always written)
#
# Add --svg-snapshot to additionally emit the SVG-image preview pptx alongside the native pptx:
#   exports/<project_name>_<timestamp>_svg.pptx      ← SVG preview pptx (reads svg_final/)

The native pptx consumes svg_output/ directly so the converter can preserve high-fidelity primitives (icon <use> placeholders, image preserveAspectRatio → srcRect, rounded rect rx/ry → prstGeom roundRect). The svg_output/ snapshot in backup/<timestamp>/ is always written so the project can be re-exported from frozen SVG sources without re-running the LLM. The SVG-rendered preview pptx is opt-in via --svg-snapshot; live preview already provides the SVG visual reference, so it's only needed when you want a self-contained file to share. Pass -s output or -s final to force a single source if you need it.

Paragraph editability vs line fidelity — by default, mergeable dy-stacked paragraph blocks collapse into one editable PowerPoint text frame with multiple <a:p>, improving body-text editing and resize/reflow behavior. Add --no-merge only when the user explicitly asks for strict line-layout fidelity or when a layout-tight page must keep every dy-stacked line as its own text frame. The merge detector is conservative; mixed-layout text falls back to per-line frames.

Optional animation flags (the defaults already enable rich entrance animations — adjust only when the user asks for something different):

  • -t <effect> — page transition. Default fade. Options: fade / push / wipe / split / strips / cover / random / none.
  • -a <effect> — per-element entrance animation. Default auto (map effect from group id: chart→wipe, card-/step-/pillar-→fly, title/takeaway→fade; image-like ids hero / figure- / image / img- / kpi cycle a richer pool — zoom / dissolve / circle / box / diamond / wheel — so multiple images vary across the deck). Pass none to disable, a specific effect like fade, or mixed for the legacy 16-effect cycle. Requires top-level <g id="..."> groups (already required by Executor).
  • --animation-trigger {on-click,with-previous,after-previous} — Start mode (matches PowerPoint's animation-pane Start dropdown). Default after-previous (click-free cascade; pace via --animation-stagger). Use on-click for presenter-paced reveals, or with-previous for all-at-once.
  • --animation-config <path> — optional object-level sidecar. Default: <project_path>/animations.json when present.
  • --auto-advance <seconds> — kiosk-style auto-play.
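To make the `auto` mapping described for `-a` more concrete, here is an illustrative sketch. It is not the exporter's code: prefix matching, the `fade` fallback for unmatched ids, and the order in which the image pool is cycled are all assumptions made for the example. Exact behavior is documented in references/animations.md.

```python
# Hypothetical sketch of mapping top-level group ids to entrance effects.
from itertools import cycle

IMAGE_PREFIXES = ("hero", "figure-", "image", "img-", "kpi")
IMAGE_POOL = ("zoom", "dissolve", "circle", "box", "diamond", "wheel")
FIXED_EFFECTS = (
    ("chart", "wipe"),
    ("card-", "fly"),
    ("step-", "fly"),
    ("pillar-", "fly"),
    ("title", "fade"),
    ("takeaway", "fade"),
)


def assign_effects(group_ids, fallback="fade"):
    """Return one effect per group id, cycling a richer pool for image-like ids."""
    pool = cycle(IMAGE_POOL)
    effects = []
    for group_id in group_ids:
        if group_id.startswith(IMAGE_PREFIXES):
            effects.append(next(pool))  # consecutive images get different effects
            continue
        for prefix, effect in FIXED_EFFECTS:
            if group_id.startswith(prefix):
                effects.append(effect)
                break
        else:
            effects.append(fallback)  # assumption: unmatched ids fall back to fade
    return effects
```

The sketch also shows why top-level `<g id="...">` groups matter: without an id there is nothing to map an effect from.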

Optional custom animations (only when the user asks to tune animation order/effects/timing for specific objects):

Run the standalone customize-animations workflow. Default export already has global entrance animation; do not create animations.json unless object-level customization was requested.

Optional recorded narration (only when the user asks for narrated/video export):

Run the standalone generate-audio workflow. The AI picks a narration backend (edge by default, or a configured cloud provider such as ElevenLabs / MiniMax / Qwen / CosyVoice for high-quality or cloned voices), asks the user once (backend + voice + rate/settings + embed-or-not, all with recommended values), then executes notes_to_audio.py and (if chosen) re-exports the PPTX with --recorded-narration audio.

Do NOT call notes_to_audio.py directly without going through the workflow — --voice / --voice-id is required and the workflow produces the locale/provider-aware recommendation that makes the choice meaningful.

Full effect list, anchor logic, and limits: references/animations.md.

  • ❌ NEVER substitute cp for finalize_svg.py: finalize performs multiple critical processing steps.
  • ❌ NEVER force -s output for the legacy/preview pptx (PowerPoint's internal SVG parser drops icons and rounded corners). The default auto-split already gives native the high-fidelity source it needs without touching legacy.
  • ❌ NEVER use --only (it suppresses one of the two output files).

Post-export annotation window: the preview service from Step 6 typically remains running after export. If the user submitted annotations in the browser (during Executor or after export) and now asks to apply them — they may quote the browser prompt (Changes saved to svg_output... / 修改已保存到 svg_output...), say "apply my annotations" / "应用注解" / equivalent — run live-preview Step 2 to apply and re-export. Annotations submitted during generation are also handled here, not earlier.

Direct edits in the browser: the user may also stage text / SVG attribute edits in the preview. These land in svg_output/ only after the user clicks Apply changes. If they ask to "re-export" / "重新导出" after applying such edits, just re-run Step 7.2–7.3 (finalize + export); no annotation-application step is needed unless they also saved AI-needed annotations.

Preview not running? Any time the user mentions "live preview", "preview", "看效果", or wants to select/click a slide element and the service is not running, run live-preview Step 1 to start it. If the service is already running, just point them at the URL — do not restart.


Role Switching Protocol

Before switching roles, MUST first read the corresponding reference file. Output marker:

## [Role Switch: <Role Name>]
📖 Reading role definition: references/<filename>.md
📋 Current task: <brief description>

Reference Resources

| Resource | Path |
|---|---|
| Shared technical constraints | `references/shared-standards.md` |
| Canvas format specification | `references/canvas-formats.md` |
| Image-text layout patterns (Primary structures + Modifier layers, combine freely) | `references/image-layout-patterns.md` |
| Image layout sizing (math for side-by-side container dimensions) | `references/image-layout-spec.md` |
| SVG image embedding | `references/svg-image-embedding.md` |
| Icon library | `templates/icons/README.md` |

Notes

  • Local preview: python3 -m http.server -d <project_path>/svg_final 8000
  • Troubleshooting: on generation issues (layout overflow, export errors, blank images, etc.), check docs/faq.md for known solutions
Install via CLI
npx skills add https://github.com/hugohe3/ppt-master --skill ppt-master