name: image-to-editable-ppt
description: Rebuild slide images, image-based or scanned PPT/PPTX files, and PDF decks into object-level editable PowerPoint (.pptx). Use whenever the user provides any visual slide source and wants slides they can edit, for example "make this PPT editable", "把图片/截图转成可编辑 PPT" (convert images/screenshots into an editable PPT), "this PDF is a scanned deck, restore it", recreating slides from screenshots, reconstructing slide objects, or preserving speaker notes, even if they do not say "convert". Not for authoring new presentations from scratch.
Image to Editable PPT
Overview
This skill rebuilds visual slide inputs into object-level editable PowerPoint .pptx files.
Inputs can be a single image, multiple images, a PDF, or an image-based PPT/PPTX. The output is always .pptx. The goal is not to wrap a full-slide screenshot inside PowerPoint; the goal is to use the editppt runtime and page-level prompts to decompose, reconstruct, validate, and assemble editable slides.
References
Each rule in this skill has exactly one authoritative home; the other files point to it instead of restating it.
- prompts/page-worker.md: execution template for page workers (ownership boundary, execution order, required outputs, and return format). The parent agent uses it when generating page-worker prompts.
- scripts/build-page-worker-prompt.py: skill-local prompt builder. It reads prompts/page-worker.md, fills run/page paths, writes worker-prompt.md, and prints the dispatch command template.
- references/cli-helper.md: CLI install check (Pre-Run Check), command tree, and command syntax examples. Read it when deciding which editppt command to call.
- references/manifest-schema.md: the single home for JSON field contracts of deck/page/image artifacts (required manifest fields, positioned-object coordinates, validation.json, and page_result.json shapes). Read it when writing or validating any run/page file.
- references/page-decision-tree.md: the single source of truth for page object decisions (background handling, foreground asset separation, native shapes, formulas, text-hints usage, the final self-check, and the fix-versus-warning split). Read it before reconstructing any page.
Entry Contract
These parent-level rules are stated once here; page-level rules live in the references above and are not restated in this file.
- The editppt CLI is a required runtime surface. If editppt --help fails, install it first by following the Pre-Run Check in references/cli-helper.md before doing anything else.
- First run editppt prepare <input...> to create a run directory. After that, all key state transitions are advanced only through editppt commands; never hand-write run/page state JSON. This keeps run state deterministic and resumable.
- Every page, including the only page of a single-page input, is rebuilt by a dispatched page worker. The parent agent only orchestrates and never rebuilds pages itself. If no subagent capability is available, stop and report this to the user; do not degrade into parent-agent page reconstruction.
- The parent agent must not write any page reconstruction artifact (manifest.json, page.pptx, preview.png, split_assets_contact.png, validation.json, or page_result.json). These files may only be produced by the page worker that owns the page directory.
- All image generation, image editing, background repair, transparent bitmap assets, and asset sheets go through editppt image generate/edit/batch.
- All page object decisions follow references/page-decision-tree.md, including its no-fallback rule for foreground visual objects and its rule that deterministic validation is a structure gate that never waives an object-source decision.
- manifest.json is the authoritative page build source: editppt run record validates page.pptx against it, and editppt run finalize rebuilds the final deck from recorded page manifests. Required fields and coordinate contracts are defined in references/manifest-schema.md.
- editppt prepare writes per-page text measurements (text_hints.json/text_hints.png). How page workers consume them is defined in references/page-decision-tree.md section 3.1.
- Page workers are driven by prompts generated from prompts/page-worker.md.
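The first rule above, checking that editppt --help works before anything else, could be automated roughly as follows. This is a minimal sketch: the helper name cli_available is hypothetical, and treating exit status 0 as "the CLI works" is an assumption, since the text only says the command must not fail.

```python
import shutil
import subprocess


def cli_available(command: str = "editppt") -> bool:
    """Return True when `<command> --help` can be started and exits with status 0."""
    # shutil.which resolves the command on PATH (an explicit path also works).
    if shutil.which(command) is None:
        return False
    try:
        result = subprocess.run(
            [command, "--help"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a zero exit status means the CLI is installed and usable.
    return result.returncode == 0
```

When the check returns False, the next step is still the Pre-Run Check in references/cli-helper.md; the sketch only detects the condition and installs nothing.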
Roles
The parent agent owns orchestration and user interaction:
- Run editppt prepare. The image backend is chosen automatically (Codex OAuth first, then API fallback), so the normal path needs no extra backend configuration command.
- Drive the run with editppt run next through dispatch → record → finalize, exactly as the Workflow phases below describe. Single-page inputs follow the same path: one page means one dispatched worker.
- Report progress, the final PPTX path, and the validation result to the user.
- Do not repeat page-level visual QA that page workers already completed; record and finalize re-validate deterministically.
Each page worker owns exactly one pages/page_NNN/ directory. Its full contract — ownership boundary, decision order, required outputs, and return format — is the prompt generated from prompts/page-worker.md; the rules it follows live in references/page-decision-tree.md and references/manifest-schema.md.
Workflow
Phase 1: Prepare
Read the prepare examples in references/cli-helper.md and the run/page file descriptions in references/manifest-schema.md.
editppt prepare <input...>
After this completes, there must be a run directory, deck_manifest.json, page_jobs.json, notes_manifest.json, and each page must have source.png plus page_request.json.
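The post-prepare file check described above can be sketched as a small read-only helper. The layout is partly assumed: the text names the files but not their exact locations, so this sketch places the deck-level files at the run root and the page files under pages/page_NNN/. The function name is hypothetical.

```python
from pathlib import Path

DECK_FILES = ("deck_manifest.json", "page_jobs.json", "notes_manifest.json")
PAGE_FILES = ("source.png", "page_request.json")


def missing_prepare_outputs(run_dir: str) -> list[str]:
    """List expected prepare outputs that are absent, as paths relative to run_dir."""
    root = Path(run_dir)
    # Assumption: deck-level files sit directly in the run directory.
    missing = [name for name in DECK_FILES if not (root / name).is_file()]
    # Assumption: page directories are named pages/page_NNN/ under the run root.
    page_dirs = sorted(p for p in (root / "pages").glob("page_*") if p.is_dir())
    if not page_dirs:
        missing.append("pages/page_*/")
    for page_dir in page_dirs:
        for name in PAGE_FILES:
            if not (page_dir / name).is_file():
                missing.append(f"pages/{page_dir.name}/{name}")
    return missing
```

An empty list means the run directory looks complete under these assumptions. The helper only reads the file system and never writes run state.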
Prepare also writes per-page text hints. Whenever editppt doctor or prepare reports that no PaddleOCR token is configured (offline fallback), ask the user once before dispatching any page: a free token from https://aistudio.baidu.com/account/accessToken stored via editppt config --paddle-ocr-token <token> makes the hints content-aware and noticeably improves text fidelity, and editppt run hints <run> regenerates the current run's hints in place. Tell the user the free personal quota is currently more than enough for this skill — applying is risk-free with no extra cost. Wait for their choice; if they decline or want to proceed, continue with the offline hints and do not ask again.
Phase 2: Dispatch Pages
Every prepared page is dispatched to a page worker, single-page inputs included. Read the run/dispatch examples in references/cli-helper.md and call repeatedly:
editppt run next <run>
When the dispatch stage is returned, the following steps are mandatory for each suggested page:
- python <skill-root>/scripts/build-page-worker-prompt.py <run> --page <page_id> --out <absolute-run-dir>/pages/<page_id>/worker-prompt.md
- Spawn a page worker using the current environment's available subagent/multi-agent tool.
- editppt run dispatch <run> --page <page_id> --agent-id <id> --prompt-file <absolute-run-dir>/pages/<page_id>/worker-prompt.md
--out and --prompt-file must be absolute paths to avoid the page directory being prepended again to relative paths. The prompt builder only writes the prompt and prints a dispatch command template; it does not create the worker, so run editppt run dispatch only after a real spawn succeeds.
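To make the absolute-path requirement concrete, the two command lines can be assembled like this. It is only a sketch: the function name is hypothetical, the run argument is passed through verbatim because the text does not say whether it is an id or a path, and nothing is executed, so the worker spawn still has to happen between the two commands.

```python
from pathlib import Path


def page_dispatch_commands(skill_root: str, run: str, run_dir: str,
                           page_id: str, agent_id: str):
    """Return (build_prompt_argv, dispatch_argv) sharing one absolute prompt path."""
    # resolve() turns a relative run directory into an absolute one, so the
    # page directory cannot be prepended to the prompt path a second time.
    prompt_file = Path(run_dir).resolve() / "pages" / page_id / "worker-prompt.md"
    builder = Path(skill_root).resolve() / "scripts" / "build-page-worker-prompt.py"
    build_prompt = [
        "python", str(builder), run,
        "--page", page_id,
        "--out", str(prompt_file),
    ]
    dispatch = [
        "editppt", "run", "dispatch", run,
        "--page", page_id,
        "--agent-id", agent_id,
        "--prompt-file", str(prompt_file),
    ]
    return build_prompt, dispatch
```

Both argv lists use only the flags shown above. The second one should be run only after a real worker has been spawned and its id is known.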
Concurrency slots come from page_jobs.json.max_concurrent_pages (default 6). In the normal flow prefer editppt run next; editppt run status is only for debugging or manual inspection.
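For manual inspection only, the slot arithmetic might look like the sketch below. It assumes max_concurrent_pages is a top-level key of page_jobs.json and that a missing key means the default of 6; the function name and the active_workers argument are hypothetical, and in the normal flow editppt run next remains the source of truth.

```python
import json
from pathlib import Path

DEFAULT_MAX_CONCURRENT_PAGES = 6  # default stated in the text


def free_dispatch_slots(page_jobs_path: str, active_workers: int) -> int:
    """Return how many more page workers could run at once, never below zero."""
    data = json.loads(Path(page_jobs_path).read_text(encoding="utf-8"))
    # Assumption: the limit is stored as a top-level integer field.
    limit = int(data.get("max_concurrent_pages", DEFAULT_MAX_CONCURRENT_PAGES))
    return max(0, limit - active_workers)
```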
Phase 3: Record
Read the record examples in references/cli-helper.md and the page_result.json description in references/manifest-schema.md.
After a worker returns, run:
editppt run record <run> --page <page_id> --agent-id <id>
This command validates page.pptx against manifest.json before recording. It fails if positioned objects are missing source-pixel coordinates, if the manifest cannot independently rebuild the page, or if validation.json does not contain top-level passed: true — a failed page is never recorded.
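The passed gate can be previewed before calling run record with a read-only check such as the one below. It is a sketch of only one of the conditions listed above (the top-level passed: true), not a replacement for editppt run record; the function name is hypothetical.

```python
import json
from pathlib import Path


def validation_passed(validation_path: str) -> bool:
    """True only when validation.json has a top-level `passed` that is exactly true."""
    try:
        data = json.loads(Path(validation_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A missing or unreadable file is treated as not passed.
        return False
    # `is True` rejects truthy stand-ins such as 1 or "true".
    return isinstance(data, dict) and data.get("passed") is True
```

A nested passed field, for example inside a per-check object, does not count here, matching the "top-level" wording.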
Handling a failed page: when a worker returns a failure (passed: false), when run record rejects the outputs, or when a dispatched worker is lost and will not return, do not hand-edit state files and do not rebuild the page yourself. Read the page's validation.json for the failure reason, fix the root cause (for example a missing image-backend login reported by the worker), then run:
editppt run reset <run> --page <page_id>
This returns the page to pending. Then rebuild the worker prompt and dispatch a new worker through the normal Phase 2 steps. Never re-dispatch without changing something first: a worker re-run under identical conditions fails identically. When the same page fails twice on the same root cause, the diagnosis is yours, not the user's — read the failed attempt's validation.json and artifacts, reproduce the failing command yourself if needed, and fix the underlying cause (backend login, missing tools, broken assets) before resetting again. Only surface a problem to the user when it genuinely requires something only the user has (credentials, a paid account decision, the original file); phrase it as the concrete action needed, never as a debugging question.
Phase 4: Finalize
Read the finalize examples in references/cli-helper.md.
When editppt run next <run> returns the finalize stage:
editppt run finalize <run>
finalize treats each recorded pages/page_NNN/manifest.json as the authoritative source: it rebuilds the final deck from page manifests in page order, then validates the resulting PPTX. page.pptx remains a page-level deliverability artifact for record-time checks.
Deck-level structural QA at this stage:
- The PPTX is a valid zip/package.
- Slide count matches the input page count.
- PDF/PPTX page mapping is correct.
- Media relationships are complete.
- All asset files referenced by the manifests exist.
- Media hashes match manifest provenance.
- Speaker notes hashes match.
- There is no invalid full-slide source raster plus editable text overlay pattern.
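The first two checks in the list above (valid zip package, slide count) can be approximated with the standard library alone. This is a sketch, not the validation editppt run finalize performs: it counts ppt/slides/slideN.xml parts by name, whereas a stricter check would follow the slide list in ppt/presentation.xml. The function name is hypothetical.

```python
import re
import zipfile

# Slide parts in a PPTX package conventionally live at ppt/slides/slideN.xml.
SLIDE_PART = re.compile(r"^ppt/slides/slide\d+\.xml$")


def pptx_slide_count(pptx_path: str) -> int:
    """Return the number of slide parts, or raise ValueError for a broken package."""
    if not zipfile.is_zipfile(pptx_path):
        raise ValueError(f"{pptx_path} is not a zip package")
    with zipfile.ZipFile(pptx_path) as package:
        bad_member = package.testzip()  # first member with a bad CRC, else None
        if bad_member is not None:
            raise ValueError(f"corrupt member: {bad_member}")
        return sum(1 for name in package.namelist() if SLIDE_PART.match(name))
```

Comparing the returned count with the number of prepared pages gives a quick sanity check. The remaining items in the list, such as media relationships and hashes, need the manifests and are not covered here.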
The final reply must report the final PPTX path and validation result.
State Principles
Agents continue only from file facts and editppt run next. Required states:
- pending: created by editppt prepare; restored by editppt run reset when a page must be re-dispatched.
- dispatched: editppt run dispatch records a real spawned worker.
- recorded: editppt run record validates required outputs and writes the result; only deliverable pages (validation.json top-level passed: true) reach this state.
- accepted/complete: written by editppt run finalize.
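The page states above can be read as a small transition table. The sketch below is a mental model only, with assumptions called out: it treats accepted/complete as one label, models reset only from dispatched (the failure cases described in Phase 3), and is not how editppt stores state. Agents must still advance state through the CLI, never by writing JSON.

```python
# (current state, command) -> next state, as described in the text.
# Assumption: reset is modelled only from "dispatched"; the real CLI may accept more.
TRANSITIONS = {
    ("pending", "editppt run dispatch"): "dispatched",
    ("dispatched", "editppt run record"): "recorded",
    ("dispatched", "editppt run reset"): "pending",
    ("recorded", "editppt run finalize"): "accepted/complete",
}


def next_state(state: str, command: str) -> str:
    """Return the state a page reaches, or raise ValueError if not modelled."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(
            f"{command!r} is not modelled from state {state!r}"
        ) from None
```

For example, a page cannot go from pending straight to recorded in this model, which mirrors the rule that run dispatch must record a real spawned worker first.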
imagegen-jobs.json is the page-local provenance/job record. Only these forced file states are kept:
- recorded: editppt image import has copied the selected output and written hash/metadata.
- processed: editppt image process-sheet has completed background removal and splitting.
Delivery Principles
- Each page is self-checked once by the page reconstructor; the evidence is written into structured fields in manifest.json and into validation.json.
- The final output must be a currently openable, structurally valid .pptx. A full-slide source.png with editable text overlaid on top is not an acceptable fallback.
- Whether an imperfection must be fixed inside its page or may ship as a recorded warning is governed by the "Fix versus Warning" section of references/page-decision-tree.md. A warning may never replace a missing required workflow step.
Updating This Skill
Reinstall through the installation channel, refresh the CLI from the updated skill directory, then restart the agent session and verify:
npx -y skills@latest add ningzimu/image-to-editable-ppt-skill \
--skill image-to-editable-ppt \
--agent <agent-id> \
--global
pipx install --force --editable <skill-root>/cli
editppt doctor