ppt-skill - SKILL.md Agent Skill

name: ppt-skill
description: "Create production-grade, actually-usable presentations — not just decks that look nice. Use this skill whenever the user wants to make, edit, or generate slides, a deck, a PPT, a presentation, or slideshow. Triggers include: uploading a document/PDF/image (including .pptx as input) and asking for slides, generating from a topic, generating from an outline, or asking to make slides in any language for any purpose (pitch, report, lecture, proposal, sharing, etc.). This skill produces HTML slideshows (with optional PDF export). The skill REFUSES to skip clarification — it asks Socratic questions before generating anything, because a fast deck on the wrong topic wastes more time than two questions did."

PPT Skill — Production-Grade, Actually-Usable Presentations

Core Philosophy

Useful, not fast. Most AI PPT tools optimize for speed and produce decks that are merely "good enough to get by" (应付用). This skill optimizes for actual usability: the kind of deck a person would walk into a meeting with.

Four rules that override everything else:

1. Clarify before generating. Never write outline or slide content until Phase 1 is complete.
2. Show, don't tell, for visuals. Never describe styles in words. Generate previews and let the user pick.
3. Titles carry information, not labels. Cover the body and read only the titles: the reader must still grasp what each slide is about. Empty noun-phrase labels ("Market Analysis" / "Our Solution" / "Background") fail this test. For persuasive and reporting scenarios, write action titles that state the takeaway. For academic, educational, and reference slides, a precise topic phrase is fine. Forcing every title into a full sentence triggers Rule 4's translator-ese trap.
4. Write like a native, not a translator. Generated text must read as if a native speaker wrote it from scratch. AI's default output in Chinese (and other non-English languages) tends toward translationese (翻译腔) or a recognizable "AI writing style" (AI 文风): grammatically correct, but not how a native speaker would actually say it. This is the #1 quality killer for non-English decks. See references/patterns.md § Language-Native Phrasing.

Path Conventions

Two path values drive every reference. Both are placeholders, not shell variables — substitute the resolved literal path before running.

{SKILL_DIR} — the skill's install root (the directory holding this SKILL.md), typically /Users/<you>/.claude/skills/ppt-skill on macOS. Resolve once per session.
{RUN_DIR} — the per-run working directory for intermediate artifacts. Create at session start:

mkdir -p "./ppt-workspace/runs/$(date +%Y%m%d-%H%M%S)"/{extracted,template_thumbs,style_previews,qa_thumbs,unpacked}

Why placeholders, not shell variables: Claude Code's Bash tool does not persist shell state across calls — export SKILL_DIR=… in one Bash call is gone by the next. Placeholders (curly braces without $) make it obvious these are author-time substitutions.

Where things live (cwd-relative by default): inputs wherever the user supplied them; intermediates under {RUN_DIR}/; final deliverables in the current working directory (./<slides-slug>.html, optionally ./<slides-slug>.pdf) unless the user asked otherwise.
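As a rough illustration of the path conventions above, here is a minimal Python sketch of the same setup. It is not part of the skill's bundled scripts: `create_run_dir` and `resolve_placeholders` are hypothetical helper names, and the only details taken from this document are the subfolder names and the two placeholders.

```python
from datetime import datetime
from pathlib import Path

# Subfolder names copied from the mkdir command in this section.
RUN_SUBDIRS = ("extracted", "template_thumbs", "style_previews", "qa_thumbs", "unpacked")


def create_run_dir(base: Path = Path("./ppt-workspace/runs")) -> Path:
    """Create a timestamped run directory with its subfolders; return its absolute path."""
    run_dir = (base / datetime.now().strftime("%Y%m%d-%H%M%S")).resolve()
    for name in RUN_SUBDIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir


def resolve_placeholders(command: str, skill_dir: Path, run_dir: Path) -> str:
    """Replace {SKILL_DIR} and {RUN_DIR} with literal paths before a command is run."""
    return command.replace("{SKILL_DIR}", str(skill_dir)).replace("{RUN_DIR}", str(run_dir))
```

Because shell state does not persist between calls, the value returned by `create_run_dir` would be remembered as a literal string and substituted into every later command, rather than exported as a variable.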

Mode Routing

Identify the mode from the first message + uploaded files. Read the matching reference file before proceeding — it contains mode-specific Phase 0 details.

| Signal | Mode | Reference |
| --- | --- | --- |
| Uploaded .docx / .pdf / .md / image / .xlsx / .csv / .json / .pptx or provided a URL, asks for slides | from-files | `references/modes/from-files.md` |
| No file, just a topic, an outline, or a long pasted text body | from-prompt | `references/modes/from-prompt.md` |
| Image-only upload, no other content | from-prompt (Phase 0 classifies image: content / style / illustration) | `references/modes/from-prompt.md` |
| User references a previously-generated deck from this session (`{RUN_DIR}/slides.md` exists) | iterate | `references/modes/iterate.md` |

Note on .pptx uploads: Uploaded .pptx files are treated as from-files input. The skill extracts text, images, and layout information via extract_pptx.py and uses them as source content for a new HTML deck. The skill does not edit .pptx files in place.

Ambiguous? Ask. Especially: from-prompt with images (intent classification per Phase 0); multiple file uploads (which is primary content). For from-files vs iterate specifically — if {RUN_DIR}/slides.md exists from this session → iterate (edit slides.md, re-render); if user uploaded a fresh file → from-files. If both, ask which is canonical.
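The routing rules above can be sketched as a small decision function. This is an illustrative reading of the table and the tie-break paragraph, not code shipped with the skill; the function name and its three boolean inputs are assumptions made for the example.

```python
def route_mode(uploaded_file_or_url: bool, image_only: bool, prior_slides_exist: bool) -> str:
    """Return 'from-files', 'from-prompt', 'iterate', or 'ask' for a first message.

    uploaded_file_or_url: the user supplied a file or URL in this message.
    image_only: the upload consists solely of images, with no other content.
    prior_slides_exist: {RUN_DIR}/slides.md exists from this session.
    """
    if prior_slides_exist and uploaded_file_or_url:
        # Both signals present: the document says to ask which one is canonical.
        return "ask"
    if prior_slides_exist:
        return "iterate"
    if uploaded_file_or_url and image_only:
        # Image-only uploads go to from-prompt; Phase 0 then classifies the image.
        return "from-prompt"
    if uploaded_file_or_url:
        return "from-files"
    # No file: a topic, an outline, or pasted text.
    return "from-prompt"
```

Other ambiguous cases, such as multiple uploads with no obvious primary file, still call for a question to the user and are not modeled here.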

The Six Phases

Every mode runs phases in order — do not skip. 0: Context Inspection (mode-specific) → 1: Socratic Clarification (most important) → 2: Narrative & Outline → 3: Visual Style Selection → 4: Content Deep Expansion → 5: Format → Render → QA.

Exception: iterate skips Phases 1-3 because the prior run already settled them. Load slides.md, apply the diff in Phase 4, and re-render in Phase 5.
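As a compact restatement of the phase order and the iterate exception, the following sketch maps a mode to the phases it runs. The names `PHASES` and `phases_for` are hypothetical and exist only for this example.

```python
from typing import List

# Phase numbers and names as listed in this section.
PHASES = {
    0: "Context Inspection",
    1: "Socratic Clarification",
    2: "Narrative & Outline",
    3: "Visual Style Selection",
    4: "Content Deep Expansion",
    5: "Format, Render, QA",
}


def phases_for(mode: str) -> List[int]:
    """Return the phase numbers a mode runs, in order."""
    if mode == "iterate":
        # Phases 1-3 were settled by the prior run.
        return [0, 4, 5]
    return sorted(PHASES)
```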

Phase 0: Context Inspection

Mode-specific — don't ask the user things you can find out yourself. Each mode reference describes its own Phase 0 in detail (file extraction, URL fetching via WebFetch or auth-source skills like dingtalk-docs, .pptx extraction via extract_pptx.py, data-file summary stats, etc.). Read the matching mode reference before continuing.

Phase 1: Socratic Clarification (MANDATORY)

Do NOT generate any outline, slide content, narrative, or visual until Phase 1 is complete.

Walk through 7 base dimensions. For each: if the user already answered → skip; if unclear → ask ONE multiple-choice question via AskUserQuestion (proper option chips). Wait for the answer before the next dimension.

| # | Dimension | Suggested options |
| --- | --- | --- |
| 1 | Purpose | Make a decision / Understand a concept / Be persuaded / Be moved / Learn to do something |
| 2 | Audience | Colleagues / Boss or leadership / Clients or investors / Students / General public / Mixed |
| 3 | Setting | Live presentation / Big screen / Email attachment / Printed handout / Web page |
| 4 | Length | ~5 min (≤8 slides) / ~15 min (≤15 slides) / ~30 min (20-30) / Open |
| 5 | Key facts | What must absolutely appear (data, names, quotes, dates) — open-ended |
| 6 | Tone | Formal & rigorous / Professional & calm / Warm & friendly / Bold & striking / Quiet & human |
| 7 | Existing assets | Data, images, brand guidelines, reference material — open-ended |

from-prompt adds an 8th dimension — "likely pushback" — for pitches/proposals/consequential recommendations only. Defined in references/modes/from-prompt.md. Do NOT ask in from-files or iterate.

Rules for asking: one question per message; multiple-choice with 2-5 options; priority order 1 → 2 → 3 → 5 → 6 → 4 → 7, stop early when you have enough; don't ask what's already clear from uploaded files; ambiguous answers warrant follow-ups.
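The asking order can be sketched as a lookup over the priority sequence given above. This is only an illustration: `next_dimension` is a hypothetical helper, and the judgment of when there is "enough" to stop early stays with the agent and is not modeled.

```python
from typing import Optional, Set

# Dimension numbers and names from the table in this section.
DIMENSIONS = {
    1: "Purpose",
    2: "Audience",
    3: "Setting",
    4: "Length",
    5: "Key facts",
    6: "Tone",
    7: "Existing assets",
}

# Priority order stated in the rules for asking.
PRIORITY = (1, 2, 3, 5, 6, 4, 7)


def next_dimension(answered: Set[int]) -> Optional[int]:
    """Return the next dimension to ask about, or None when all are covered."""
    for dim in PRIORITY:
        if dim not in answered:
            return dim
    return None
```

For example, once Purpose, Audience, and Setting are known, `next_dimension({1, 2, 3})` points to Key facts (5) rather than Length (4).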

After clarification, present a brief summary back. Optionally classify a scenario (one of: business_pitch / work_report / product_intro / academic / education / creative_proposal / data_analysis / personal_sharing) — it is recorded in slides.md frontmatter and used to pick narrative arcs in Phase 2. If none cleanly fits, say so explicitly and propose the closest with reasoning. Wait for explicit confirmation before Phase 2.

Phase 2: Narrative & Outline

Stage 2a — narrative arc (2-3 alternatives). Based on scenario, propose 2-3 narrative arcs. Read references/patterns.md § "Narrative Arcs by Scenario" for curated arcs per scenario.

Do not skip 2a just because the user gave a structure. A section list is not a narrative arc. "deck has 4 parts: market / product / team / ask" is a table of contents — the arc question is still open (open with market pain? bold product claim? founder story?). Treat user-provided structure as content scope, then still ask which arc to use. Only skip 2a when the user explicitly named an arc.

Stage 2b — outline. After the arc is chosen, generate slide-by-slide outline with: slide number + slide kind in user-facing language (cover / divider / content / data / Q&A / closing — translate; never expose the internal slide_type enum); action title for every content/data slide (full sentence stating takeaway, never a noun-phrase label — single biggest quality lever); 1-3 sentences describing the slide. Chunk the outline when there are natural stage boundaries or when length × complexity overwhelms a single review pass.

Wait for outline approval before Phase 3.

Phase 3: Visual Style Selection (Show, Don't Tell)

Do not make the user describe a style in words. "Modern" / "clean" / "professional" mean something different to every user. Instead: pick 3 candidates from assets/styles/_index.md (42 presets); render the same sample slide in each; combine the renders into one compare.html via combine_style_previews.py and auto-open it in the browser; then ask via AskUserQuestion with 4 fixed slots (3 presets + "Browse all 42"; the "Other" auto-slot handles custom descriptions like "MUJI" / "ink wash"). The full procedure, including custom-style persistence, is in assets/styles/_index.md. Record the chosen preset in slides.md frontmatter under style_preset:.

Phase 4: Content Deep Expansion

Expand the Phase-2 outline into full per-slide content. Trust the model on this step, but give it good context and good principles — and flag the high-risk information explicitly for the user to verify. Full procedure in references/patterns.md § "Phase 4 Expansion Principles & Checklists":

Four positive principles (convergence not divergence; source-priority; elaboration allowed, fabrication forbidden; depth tracks source density).
Key Information Checklist (10 categories the user must verify — numerals, direct quotes, person+title, org/product names, dates, source attributions; rankings, causal claims, scope, comparison numbers).
High-risk slides table (data / case-study / quote / conclusion / cover).
Delivery callout to paste at Phase 5e along with the file paths.

Image placement: see references/images.md (single source of truth — intent classification, content vs decorative filter, rules by slide_type, size budgets).

Scope: from-files and from-prompt run the full expansion. iterate does not re-expand the whole deck; it expands only the affected slides.

Phase 5: Format → Render → QA

5a. Pick a format — two-way choice:

A. HTML slideshow — browser, keyboard arrows, easy link-share.
B. HTML + PDF — same HTML plus a 1280×720 .pdf (one slide per page) for email/WeChat/handout.

Defaults: from-files → HTML + PDF; from-prompt → HTML (or HTML+PDF if Phase 1 dim 3 said "email attachment"). Record in slides.md frontmatter output_format: ([html] / [html, pdf]) so iterate re-renders in the same format. HTML + PDF mechanics in references/render/html.md § "HTML + PDF output".
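The defaults above amount to a small lookup, sketched here under the assumption that the Phase 1 setting answer is available as a plain string. `default_output_format` is a hypothetical name; iterate is deliberately excluded because it reuses the output_format already recorded in slides.md.

```python
from typing import List, Optional


def default_output_format(mode: str, setting: Optional[str] = None) -> List[str]:
    """Return the default output_format value for a fresh (non-iterate) run."""
    if mode == "from-files":
        return ["html", "pdf"]
    if mode == "from-prompt":
        # Phase 1 dimension 3 (Setting) can upgrade the default to HTML + PDF.
        if setting is not None and setting.strip().lower() == "email attachment":
            return ["html", "pdf"]
        return ["html"]
    raise ValueError(f"no default for mode {mode!r}; iterate reuses the recorded format")
```

The user's explicit choice in 5a still overrides whatever this default suggests.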

5b. Build slides.md — Markdown body + YAML frontmatter, ## sN slide blocks. Spec in references/slides-md-spec.md.

5c. Render — read the HTML rendering reference (references/render/html.md), including the Layout Playbook section and the CJK Content Adaptation section (mandatory when deck content is ≥30% Chinese/Japanese/Korean — see the four rules in that section). For each slide: apply the layout selection matrix to pick the right layout, run content triage against capacity rules, and enforce visual hierarchy before generating HTML.

HTML: generate directly using the style preset file assets/styles/<preset>.md (contains CSS variables, typography scale, component HTML recipes, layout patterns, Do's/Don'ts, and CJK notes). Single self-contained HTML, 1280×720, inline CSS/JS, keyboard nav, print rules.

5d. Visual QA — references/render/html.md has the QA loop. HTML: python {SKILL_DIR}/scripts/qa/qa_serve.py <deck.html>. Fix obvious issues; re-verify in proportion to what changed.

Self-check before delivery (in addition to visual QA):

Read the takeaway of every content/data slide in order. Any that is a topic label rather than a claim → fix per Phase 4 rules.
Skim test: titles alone should narrate the deck.
Chart consistency: for each data slide, verify that count claims in the takeaway ("5 major scenarios", "three key drivers") match the chart's x_labels count and enumeration.
Image completeness (from-files mode only): if extracted/images_manifest.json exists, verify every image is either referenced in a slide body or deliberately tagged editorial: "skip" with a reason in the manifest.
Image rendering audit (HARD RULE): if images_manifest.json exists, run this check on the generated HTML:
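```
# Count manifest entries that are not deliberately tagged editorial: "skip".
EXPECTED=$(python3 -c "import json; m=json.load(open('{RUN_DIR}/extracted/images_manifest.json')); print(sum(1 for x in m if x.get('editorial')!='skip'))")
# grep -c counts matching lines, not occurrences, so several images on one line are counted once.
ACTUAL=$(grep -c 'data:image\|<img src=' "./<slides-slug>.html")
echo "Expected ≥ $EXPECTED images in HTML, found $ACTUAL"
```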
If ACTUAL < EXPECTED, images were lost during rendering. Go back to the Image Preparation step in references/render/html.md, regenerate images_ready.json, and re-render the missing images into the HTML. Do not deliver until this check passes.
Spot-check the high-risk categories from Phase 4 against the source.
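For decks where many images land on one line of HTML, the image rendering audit in the checklist above could also be approximated in Python by counting `<img>` tags directly. This is a sketch of an alternative, not the skill's own check: it assumes the manifest is a JSON list of objects with an optional `editorial` key (as the shell one-liner implies), the function names are hypothetical, and unlike the grep pattern it does not count `data:image` URIs used as CSS backgrounds.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class _ImgCounter(HTMLParser):
    """Counts <img> tags that carry a non-empty src attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> via handle_startendtag.
        if tag == "img" and dict(attrs).get("src"):
            self.count += 1


def expected_images(manifest: list) -> int:
    """Number of manifest entries not tagged editorial: "skip"."""
    return sum(1 for entry in manifest if entry.get("editorial") != "skip")


def rendered_images(html: str) -> int:
    """Number of <img> tags with a src in the rendered HTML."""
    parser = _ImgCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def audit_images(manifest_path: str, html_path: str) -> bool:
    """True when the HTML holds at least as many images as the manifest expects."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    html = Path(html_path).read_text(encoding="utf-8")
    return rendered_images(html) >= expected_images(manifest)
```

A `False` result would mean the same thing as ACTUAL < EXPECTED in the shell check: images were lost during rendering and the deck should not be delivered yet.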

5e. Deliver — paste the file path(s) in chat (default ./<slides-slug>.html and optionally ./<slides-slug>.pdf). Don't try to attach binaries. Paste the delivery callout from references/patterns.md § "Delivery callout" to align expectations on what the user must spot-check.

Critical Reminders

The user can interrupt and change direction at any point. Mid-Phase-4 "let's restart" → go back to Phase 1.
If unsure at any phase, ask. Asking is cheap. Building the wrong thing is expensive.
Don't narrate the phases to the user ("Now entering Phase 3"). Just do the work.
Don't use this skill's terminology in user-facing messages (slide_type, action_title, takeaway, slides.md, "ghost deck test" stay internal).
Languages: respond in whatever language the user uses; keep deck content in the user's language.

Reference Files (Progressive Disclosure)

Read only when the matching phase needs them — don't preload.

| File | When to read |
| --- | --- |
| `references/modes/<mode>.md` | Phase 0, after mode is identified |
| `references/extract/*.md` | from-files, per file type (docx, pdf, md, images, pptx, xlsx, json) |
| `references/patterns.md` | Phase 2 (narrative arcs) + Phase 4 (expansion principles, anti-patterns, language-native phrasing, delivery callout) |
| `references/images.md` | Phase 4 image placement rules |
| `assets/styles/_index.md` | Phase 3, to pick 3 candidates and run the preview procedure |
| `assets/styles/<preset>.md` | Phase 5c HTML, for CSS variables, typography scale, component recipes, layout patterns, Do's/Don'ts, CJK notes |
| `references/slides-md-spec.md` | Phase 5b, before writing slides.md |
| `references/fixtures/slides_minimal_valid.md` | Anytime you need a canonical validating example |
| `references/render/html.md` | Phase 5c HTML rendering + QA |
| `assets/viewport-base.css` | Phase 5c HTML, viewport scaling system |

Dependencies

Bundled scripts (under scripts/): extract/extract_docx.py / extract/extract_pdf.py / extract/extract_pptx.py, qa/combine_style_previews.py, qa/qa_serve.py.
Python: python-docx, pdfplumber, pymupdf, defusedxml, pandas (for .xlsx/.csv extraction).
Optional: Playwright for HTML→PDF.

Note: HTML rendering is done by you (the AI) — no external script needed. Read the style and component references and generate the HTML directly.