explainer

Turn a topic (or source document) into a visually dynamic HTML explainer deck and a narrated vertical video, end-to-end, using only local/free tools (Kokoro TTS, torchaudio forced alignment, Playwright, ffmpeg) plus this Claude session. Use when the user wants to "make an explainer video", "turn this into a Short/Reel/TikTok", "create an explainer deck", or "/explainer <topic>". Supports topic-only OR source-driven (ingest a PDF/URL and frame a real figure/screenshot); aspects 9:16, 16:9, 4:5; fixed theme. Generation only: it writes a labeled output dir + manifest.json; it does NOT post to social platforms.

By nemock · Updated 6/1/2026

/explainer

Generate an explainer deck + narrated 9:16 video from a topic. You (Claude) do the generation stages (research, scripting, deck authoring); a pure-Python pipeline does the deterministic media stages (narrate → align → render → mux). No paid APIs.

Architecture rule (do not violate)

You only author structured JSON (script.json, deck.json, optional meta.json) and wiki nodes. You never write raw deck HTML — the deck engine renders from deck.json, which preserves the determinism contract (PRD §8.6). The media pipeline makes zero LLM calls; once the JSON exists, it renders unattended.

Environment

  • Package: the cloned explainer-system repo (run explainer from there).
  • Interpreter: a Python venv with Kokoro / torch / Playwright / ffmpeg installed.
  • Console command (editable-installed into that venv): explainer (on PATH once the venv is active). If unavailable, use PYTHONPATH=<repo>/src python -m explainer.cli.
  • Shell discipline (CLAUDE.md): the media command is synchronous — run it in the foreground and let it finish (~50s for a ~20s video). No polling, no backgrounding.

Steps

1. Intake

Parse the topic and any flags (default aspect 9:16, fps 30, voice af_heart). Unless the user said --yes, briefly confirm: the angle/hook, target length, and aspect. One cheap confirmation, then proceed.

2. Scaffold

explainer scaffold "<slug>" --title "<title>" [--aspect 9:16] [--theme midnight] [--brand ACME] [--voice-source operator]

Themes (a family of looks — vary them across a channel, PRD §8.5): midnight (default, cool dark), paper (light), sunset (warm dark), forest (green dark), mono (yellow on near-black). Each carries a default motion personality. This creates outputs/<date>_<slug>/project.json and prints the project dir. Use that dir for everything below.

2b. Ingest source material (source-driven runs only)

If the user gave a PDF or URL, ingest it into sources/ + citations.json:

explainer ingest <project_dir> --pdf <path> [--pages "1-3,5"]
explainer ingest <project_dir> --url <url> [--full-page]

This extracts text and renders framed screenshots/figures. To feature a specific figure (not a whole page), it's fine to render a tight clip with PyMuPDF (page.get_pixmap(matrix=fitz.Matrix(3,3), clip=fitz.Rect(...))) into sources/. Read the extracted text to ground the script; reference an image in a figure slide (below).

3. Research (+ wiki)

  • First reuse prior knowledge: read wiki/INDEX.md and any relevant wiki/source-fact/* nodes so you don't re-research what's already captured.
  • Then use WebSearch/WebFetch to gather and verify current facts. Prefer primary sources.
  • Capture what you learn as wiki nodes (provenance compounds across videos):
    explainer wiki source "<source title>" --root . --topic "<topic>" --ref "<url-or-path>"
    explainer wiki fact "<short fact name>" --root . --topic "<topic>" \
        --body "<the atomized claim>" --source "<source slug>" --confidence high
    
  • Every claim that ends up on screen or in narration should trace to a fact you can cite.

4. Author script.json + deck.json

Pick a hook archetype for slide 1 (bold claim · question · surprising stat · "you've been doing X wrong" · visual reveal). The first slide must front-load the payoff — no title-card throat-clearing.

script.json — narration per slide (the slide field is the slide id, matched in deck.json):

{ "segments": [
  { "id": 0, "slide": "s1", "text": "<hook line — spoken>" },
  { "id": 1, "slide": "s2", "text": "<...>" }
] }

Write acronyms naturally ("MCP", "AI", "GPT-4") — the pronunciation lexicon speaks them as letters/words while captions still show the acronym. Add a <project>/lexicon.json ({"token": "spoken form"}) for any term the default lexicon misses. Spell out numbers you want read a certain way (e.g. "ninety seven million").
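The split between what is spoken and what is captioned can be pictured with a small sketch. This is not the tool's lexicon code: the function name `to_spoken` and the matching rule (whole-token, case-sensitive, longest key first) are assumptions made for illustration, and the spoken forms are made-up examples.

```python
import re

def to_spoken(text: str, lexicon: dict) -> str:
    """Return the TTS-facing text; the caption keeps the original `text`.

    Hypothetical helper: the real pipeline's matching rules may differ.
    """
    if not lexicon:
        return text
    # Longest tokens first so "GPT-4" wins over a shorter overlapping key.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(k) for k in keys) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: lexicon[m.group(1)], text)

caption = "MCP servers extend GPT-4."
spoken = to_spoken(caption, {"MCP": "M C P", "GPT-4": "G P T four"})
```

Here `caption` stays untouched for the on-screen text, while `spoken` is what a TTS engine would receive.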

deck.json — one slide per id. Pick the device that proves each point — the full catalog (schemas + when-to-use) is docs/visual-devices.md. Devices: narrative (hook · statement · build · reframe · highlight · punch · define · list · payoff · quote), data (stat · ring · statgrid · progress · pictograph · trend · delta · diagram · ranked · compare · timeline · waterfall · matrix · steps), source (figure), brand (cta, auto).

{ "title": "<deck title>", "slides": [
  { "id": "s1", "type": "hook", "kicker": "<label>", "headline": "<text>", "accent": ["word"] },
  { "id": "s2", "type": "stat", "kicker": "<label>", "value": "90%", "label": "<context>" },
  { "id": "s3", "type": "statgrid", "stats": [ { "value": "$2M", "label": "<a>" }, { "value": "3x", "label": "<b>" } ] },
  { "id": "s4", "type": "progress", "value": "73%", "label": "<context>" },
  { "id": "s5", "type": "diagram", "kicker": "<label>",
    "bars": [ { "label": "<a>", "value": 0.9, "kind": "good" }, { "label": "<b>", "value": 0.3, "kind": "bad" } ] },
  { "id": "s6", "type": "compare", "left": { "title": "<a>", "value": "<x>", "kind": "bad" },
    "right": { "title": "<b>", "value": "<y>", "kind": "good" } },
  { "id": "s7", "type": "steps", "steps": [ { "title": "<step 1>" }, { "title": "<step 2>" } ] },
  { "id": "s8", "type": "quote", "quote": "<verbatim line>", "attribution": "<who>" },
  { "id": "s9", "type": "figure", "kicker": "<source>", "image": "sources/<file>.png", "caption": "<desc>" },
  { "id": "s10", "type": "payoff", "headline": "<text>", "accent": ["word"], "subkicker": "<a · b · c>" }
] }

More devices (same slides array — see docs/visual-devices.md for when-to-use + every field):

  { "id": "n1", "type": "build", "headline": "<text>", "accent2": ["word"] },
  { "id": "n2", "type": "reframe", "before": "It's not about", "strike": "luck", "after": "timing" },
  { "id": "n3", "type": "highlight", "headline": "<text>", "mark": ["key", "phrase"] },
  { "id": "n4", "type": "punch", "word": "Ship.", "kind": "good" },
  { "id": "n5", "type": "define", "term": "<term>", "definition": "<gloss>" },
  { "id": "n6", "type": "list", "kicker": "3 truths", "items": ["<a>", "<b>", "<c>"] },
  { "id": "d1", "type": "ring", "value": "73%", "label": "<context>", "kind": "bad" },
  { "id": "d2", "type": "pictograph", "filled": 9, "total": 10, "label": "<context>", "kind": "bad" },
  { "id": "d3", "type": "trend", "kicker": "MRR", "points": [2,3,5,8,13], "end_label": "$13k", "kind": "good" },
  { "id": "d4", "type": "delta", "from": "$10k", "to": "$40k", "change": "+300%", "kind": "good" },
  { "id": "d5", "type": "ranked", "bars": [ { "label": "<a>", "value": 0.9, "display": "90%", "kind": "bad" } ] },
  { "id": "d6", "type": "timeline", "events": [ { "date": "Jan", "label": "<a>" }, { "date": "Sep", "label": "<b>" } ] },
  { "id": "d7", "type": "waterfall", "start": { "label": "Q1", "value": 40 },
    "steps": [ { "label": "Churn", "value": -12, "kind": "bad" } ], "end": { "label": "Q2", "value": 58 } },
  { "id": "d8", "type": "matrix", "x_axis": ["Low effort","High effort"], "y_axis": ["Low impact","High impact"],
    "points": [ { "label": "Ship", "x": 0.2, "y": 0.85, "kind": "good" } ] }

stat/statgrid/delta count their numbers up; progress/diagram/ranked bars grow; ring/trend arcs + lines draw on; steps/statgrid/list/timeline reveal in sequence — all driven deterministically by renderAt(t). figure frames an ingested screenshot (white card); image is relative to the project root. waterfall/matrix need ≥ 5s dwell — the engine auto-fits their columns/labels to the aspect, but they carry the most to parse, so favor ≤ 4 steps/points with short labels (see the doc).
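To make "driven deterministically by renderAt(t)" concrete, here is a minimal sketch of a count-up expressed as a pure function of time. It is an illustration only: `count_up`, `ease_out_cubic`, and the one-second duration are assumptions, not the deck engine's actual code or easing curve.

```python
def ease_out_cubic(p: float) -> float:
    """Fast start, gentle landing; p is progress in [0, 1]."""
    return 1.0 - (1.0 - p) ** 3

def count_up(target: float, t: float, start: float = 0.0, duration: float = 1.0) -> float:
    """Value shown at time t (seconds).

    No clocks, no randomness: the same t always yields the same value,
    so frame N renders identically on every run.
    """
    if duration <= 0:
        return target
    p = min(max((t - start) / duration, 0.0), 1.0)
    return target * ease_out_cubic(p)

fps = 30
# Sample every 10th frame of a one-second count-up toward 90.
frames = [round(count_up(90, n / fps)) for n in range(0, fps + 1, 10)]
```

Because the value depends only on `t`, frames can be rendered in any order (or re-rendered singly) without drifting from the audio timeline.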

Vary your devices (anti-monotony). Across a deck use 5+ distinct device types, and never repeat the same data device unless the data demands it — a deck that reuses one bar chart reads as templated. Use quote for verbatim talk-time one-liners; use stat when a single number carries the point.
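The variety rule can be checked before rendering with a few lines of Python. The sketch below is a hypothetical pre-flight lint, not part of the explainer CLI; `variety_warnings` and its messages are invented here, and the data-device set is copied from the catalog listed above.

```python
from collections import Counter

# Data devices as listed in the catalog above.
DATA_DEVICES = {
    "stat", "ring", "statgrid", "progress", "pictograph", "trend", "delta",
    "diagram", "ranked", "compare", "timeline", "waterfall", "matrix", "steps",
}

def variety_warnings(deck: dict, min_types: int = 5) -> list:
    """Return advisory warnings about device monotony in a deck.json dict."""
    counts = Counter(slide["type"] for slide in deck.get("slides", []))
    warnings = []
    if len(counts) < min_types:
        warnings.append(
            f"only {len(counts)} distinct device types (aim for {min_types}+)"
        )
    for device, n in sorted(counts.items()):
        if device in DATA_DEVICES and n > 1:
            warnings.append(f"data device '{device}' used {n} times")
    return warnings
```

The output is advisory by design: a repeated data device is acceptable when the data demands it, so a human (or Claude) still makes the final call.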

Each slide may set "transition" (rise · fade · pop · slide) to override the theme's default intro motion. Vary it across slides; the §8.4 anti-repetition rule applies, and the same transition on every slide reads as "templated". Rules: every slide has motion by construction; accent/accent2 highlight words in the theme colors; keep headlines tight (they auto-shrink past ~60 chars). Keep ids identical across script.json and deck.json. Aim for 4–6 slides for a ~20–40s Short.
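Since the slide field in script.json must match an id in deck.json, a quick cross-check before rendering can catch typos early. The helper below is a sketch under assumptions: `id_mismatches` is a made-up name, and whether a slide without narration is actually an error (the auto-appended CTA slide, for instance) is left to the reader.

```python
def id_mismatches(script: dict, deck: dict) -> dict:
    """Compare script.json segment slide refs against deck.json slide ids."""
    narrated = [seg["slide"] for seg in script.get("segments", [])]
    slide_ids = [slide["id"] for slide in deck.get("slides", [])]
    return {
        # Narration pointing at a slide that does not exist.
        "segments_without_slide": sorted(set(narrated) - set(slide_ids)),
        # Slides that no segment narrates.
        "slides_without_narration": sorted(set(slide_ids) - set(narrated)),
    }
```

In practice the two dicts would come from `json.load` on the project's script.json and deck.json; both result lists being empty means the files line up.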

4a. talk-time READ — write the script in the operator's real voice

If the brand carries a talk_time block ({tag, library?}), don't write generic AI prose — ground the script in the operator's own documented takes, stories, and quotes (a private "talk-time library" the operator maintains; it is not bundled with this tool). This is independent of who speaks (Kokoro or operator VO): it shapes the words. If the brand has no talk_time block, skip this step and write the script normally.

Surface the relevant material (read-only — never writes, never fabricates):

explainer talktime --brand <SLUG> [--topics "keyword,keyword"]

The library path comes from the brand's talk_time.library, a --library flag, or the EXPLAINER_TALKTIME_LIBRARY env var. It parses that library's INDEX.md, filters entries by the brand tag (+ optional topic keywords), and prints candidate quotes / positions / anecdotes / topics with absolute paths. Then:

  1. Read the candidate files you'll draw on (Read tool, the printed abspaths).
  2. Quote VERBATIM from quotes.md — use the exact one-liners as spoken lines.
  3. ADAPT freely from positions/ and anecdotes/ — paraphrase the reasoning/story into tight script prose.
  4. NEVER fabricate a take, stat, or story not in the library. If the library is thin on the topic, say so and write only what the source material supports (or narrow the angle).
  5. Watch the per-entry brand tags + any notes in INDEX.md (e.g. "overused — rotate", NDA cautions) — honor them.

Use --topics to narrow to the deck's subject; omit it to see everything tagged for the brand. This applies to both the daily (Kokoro) and weekly (operator VO) tiers.

5. (optional) meta.json for the manifest

Author meta.json with a summary + per-platform captions so the downstream poster has what it needs (this tool still does NOT post):

{ "summary": "<1-2 sentences>",
  "per_platform": [
    { "platform": "tiktok", "caption": "<hook-first caption>", "hashtags": ["#ai","#rag"],
      "link_placement": "none", "primary_asset": "video", "aspect": "9:16" }
  ],
  "sources": ["<url>", "<url>"] }

6. Render (pure-Python, synchronous)

explainer media outputs/<date>_<slug>

Runs narrate → align → deck → render → mux → manifest → qa and writes results.json. If a stage fails, it prints failed_stage; re-run a single stage with explainer <stage> <dir>.

The qa stage (motion/pacing) reports warnings in work/qa.json: visual dead air during speech (held frames while narrating — add motion or split the shot), over-long shots, and uniform cut rhythm. Read the warnings; if dead air is high, tighten pacing or split slides and re-render. Warnings are advisory, not fatal.

Decks include a subtle drifting ambient glow by default (keeps motion alive between word highlights → near-zero dead air). It roughly doubles render time (compositing the glow layer); set "ambient": false in project.json for ~2× faster renders when speed matters.

7. Report

Tell the user the output dir and the key artifacts:

  • deck/index.html (standalone, openable deck)
  • video/explainer_9x16.mp4
  • captions/captions.srt / .vtt
  • manifest.json (ready_for_post, AI-disclosure, per-platform captions)

Spot-check one rendered frame in work/frames/ to confirm layout/legibility before declaring done.

Aspects, platforms & length

  • --aspect 9:16|16:9|4:5|1:1, or render several at once with --aspects "9:16,1:1" (one project → one MP4 per aspect; layout is robust across aspects).
  • --platform <tiktok|reels|shorts|threads|linkedin|youtube|square> sets the aspect + a safe-zone bottom inset (captions clear the platform's UI chrome) and, where relevant, a default min length (e.g. tiktok ⇒ 60s).
  • --min-length <seconds>: if the rendered narration is shorter, the manifest gets a length_warning and ready_for_post:false. Meet it by deepening the script with a sourced beat (a new example / fact), never by padding (PRD §7) — then re-render.
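The --min-length rule above reduces to one comparison. The following is a sketch of that logic, not the manifest stage itself: the function name and the warning wording are hypothetical, while the `length_warning` and `ready_for_post` keys are the ones named above.

```python
from typing import Optional

def apply_min_length(manifest: dict, narration_seconds: float,
                     min_length: Optional[float]) -> dict:
    """Return a copy of the manifest with the min-length rule applied."""
    out = dict(manifest)
    if min_length is not None and narration_seconds < min_length:
        # Hypothetical message text; only the key names come from the doc.
        out["length_warning"] = (
            f"narration is {narration_seconds:.1f}s, "
            f"below the {min_length:.0f}s minimum"
        )
        out["ready_for_post"] = False
    return out
```

For example, a 42-second narration against a 60-second minimum would come back with `ready_for_post` set to false, which is the cue to deepen the script with a sourced beat and re-render.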

8. Validate + hand off (boundary stops here)

  • explainer validate <dir> — confirm the manifest is a complete, consistent handoff contract (videos exist, captions present, per-platform aspects rendered, disclosure set).
  • explainer handoff <dir>: emits handoff.json, a set of per-platform blotato-ready post specs (absolute media_file, composed text, title for YouTube, ai_label). A poster (the blotato-crosspost skill) consumes it: upload media_file, then create_post per entry. This tool never posts. The ai_disclosure block maps to the poster's AI toggle (e.g. TikTok's isAiGenerated); keep it set so publishes are compliant.

Optional: music bed

Set "music": "<path>" (and optionally "music_gain": 0.16) in project.json to mix a low royalty-free bed under the narration (recommended for 9:16; off for 16:9/deck). No audio ships with the tool — provide your own vetted, licensed track.

Branding & call-to-action (--brand <SLUG>)

Pass a brand slug to stamp the video with a brand and a CTA. Resolution is local-first, then global: ./brand/<SLUG>/ (the content project you run from), then the global location ($EXPLAINER_BRAND_DIR or ~/.claude/explainer-brands/<SLUG>/). A brand folder holds brand.json + assets:

{ "name": "ACME Co",
  "logo": "logo.png",            // transparent PNG — small corner watermark on EVERY slide + larger on the CTA
  "product": "product.png",      // optional — e.g. a book cover, shown on the CTA slide
  "watermark_corner": "bl",      // bl | br
  "accent": "#5b8cff",           // optional — tints the theme accent to brand color
  "lexicon": { "acme.example": "ACME dot example" },  // optional — brand-specific pronunciations
  "talk_time": { "tag": "<brand-tag>", "library": "/abs/path/to/library" },  // optional — operator's private take-library (step 4a)
  "cta": { "headline": "Get the thing.", "subkicker": "Out now",
           "url": "acme.example",
           "spoken": "Check out ACME — link in bio." } }

When --brand is set: assets are copied into the output dir (self-contained), the logo watermarks every slide in the safe-zone corner, and a CTA end slide is auto-appended (product + larger logo + headline/subkicker/url) with the cta.spoken line auto-narrated and synced. You don't author the CTA slide/segment — the pipeline adds them from the brand (author your own {"id":"cta","type":"cta"} slide / slide:"cta" segment only to override). The url is on-screen text only — the tool still never links out or posts.
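The local-first, then global lookup described at the top of this section can be sketched as follows. This is an illustration under stated assumptions: `resolve_brand_dir` is a made-up name, the env var is assumed to replace the default global root when set, and a folder is assumed to count as a brand only if it contains brand.json.

```python
import os
from pathlib import Path
from typing import Optional

def resolve_brand_dir(slug: str, cwd: Path, env: Optional[dict] = None) -> Optional[Path]:
    """Find a brand folder: project-local first, then the global root."""
    env = os.environ if env is None else env
    local = cwd / "brand" / slug
    # Assumption: EXPLAINER_BRAND_DIR, when set, replaces the default root.
    global_root = Path(
        env.get("EXPLAINER_BRAND_DIR")
        or Path.home() / ".claude" / "explainer-brands"
    )
    for candidate in (local, global_root / slug):
        if (candidate / "brand.json").is_file():
            return candidate
    return None
```

Checking the project folder first means a content project can override a shared brand simply by shipping its own ./brand/<SLUG>/ copy.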

Rotating CTAs (--cta <variant>). A brand folder may hold a hand-editable cta_library.json with named variants (e.g. book, newsletter), each with its own headline/subkicker/url/spoken + optional product image. explainer scaffold … --brand FFW --cta newsletter picks one; with no --cta, the library default is used, falling back to the cta block in brand.json. This lets the operator maintain rotating CTAs by hand without touching code, and an automated routine just passes the variant it chose.
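The selection order for CTAs (explicit variant, then library default, then brand.json) can be sketched in a few lines. The cta_library.json schema is not documented here, so the `variants` and `default` keys below are hypothetical, as is the `pick_cta` name.

```python
from typing import Optional

def pick_cta(brand: dict, library: Optional[dict], variant: Optional[str]) -> dict:
    """Choose a CTA block: --cta variant, else library default, else brand.json."""
    # Hypothetical schema: {"default": "<name>", "variants": {"<name>": {...}}}
    variants = (library or {}).get("variants", {})
    if variant is not None:
        if variant not in variants:
            raise KeyError(f"unknown CTA variant: {variant}")
        return variants[variant]
    default = (library or {}).get("default")
    if default in variants:
        return variants[default]
    return brand["cta"]
```

Failing loudly on an unknown variant is a design choice in this sketch: a typo in --cta should stop the run rather than silently fall back to a different call-to-action.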

Voiceover mode — the operator's real voice (--voice-source operator)

For higher-production pieces, the narration can be the operator reading the script, instead of Kokoro. The pipeline is audio-first, so a real recording aligns to the slides exactly like Kokoro does. Be a coach through this loop — keep it seamless, no external apps:

  1. Scaffold with --voice-source operator (everything else is the same).
  2. Author script.json + deck.json as usual. Offer a quick review checkpoint — let the operator tweak the script before recording (re-recording is the costly step).
  3. Record (integrated). Run explainer record <project_dir> — but launch it in the background so the conversation keeps moving. It opens a browser teleprompter that records the mic per segment (record / re-record / playback), saving straight into voiceover/. Tell the operator: "Teleprompter's open — read each segment, re-record any you flub, then click Finish." The command returns when they click Finish; its result lists recorded / missing segment ids.
  4. If any segment is missing, tell them which and re-launch record (already-recorded segments stay ✓). Don't proceed until all are recorded.
  5. Render. Run explainer media <project_dir> — the operator-narrate path assembles the clips, runs them through the local audio-cleanup skill (podcast-grade, −14 LUFS), aligns, and renders.
  6. Report the finished video and ask "what do you think?"

The tool can also run fully end-to-end with Kokoro (--voice-source kokoro, default) — give it a topic and it returns a finished video. Voiceover mode is the interactive, co-built tier.

Out of scope (current phase)

The following are deferred to later phases (see PRD): music beat-sync, operator --interview voice capture, C2PA embedding (needs c2patool + a signing cert; disclosure is currently carried in the manifest + poster AI toggle), automatic min-length deepening (you do it), and layout variants within a template. Don't fake them.

Install via CLI
npx skills add https://github.com/nemock/video-explainer-system --skill explainer
Repository Details
Stars: 2
Forks: 1
Branch: main
Path: SKILL.md