name: processing-wwdc-session
description: Use when the user drops an Apple Developer session video URL (developer.apple.com/videos/play/...) to research into the blog's workspace, or adds a screenshots/ folder to an already-processed session dir. Builds workspace/wwdcNN/{num}-{slug}/ with transcript, code, meta, notes, and a synthesized digest; compresses slide screenshots and folds their info back into the digest.
Processing a WWDC Session
Overview
Turns an Apple Developer video URL into a research folder under the gitignored
workspace/ so a later session can draft blog posts from one synthesized file
(digest.md). Two phases: A. Fetch & build runs the moment a URL is dropped.
B. Screenshots runs later, when the user adds screenshots/ — slides are not
available at fetch time, so the digest is written first and enriched afterward.
workspace/ is gitignored: this produces zero shippable changes. Do NOT touch
src/data/blog/, public/llms.txt, src/assets/, or any SEO file — those are for
published posts only.
Model policy — the digest must be Opus
Fetching, scraping, and file scaffolding (Phase A steps 1–5, Phase B steps 1–2) are
mechanical and fine on any model — run the whole skill on Sonnet if you like.
digest.md is the one high-judgment synthesis step and MUST be produced on Opus.
- If you (the orchestrator) are not running on Opus, spawn a subagent with the Opus model to do it (Agent/Task tool, `model: opus`). Give it the session dir; it reads `transcript.md`, `meta.md`, `code.md` (plus `screenshots/` in Phase B) and writes/updates `digest.md`.
- If you are already on Opus, write/update the digest directly.
- Never create or update the digest on a smaller model.
This applies to BOTH the initial write (Phase A step 6) and the screenshot enrichment (Phase B step 3).
Phase A — Fetch & build the entry
Trigger: a developer.apple.com/videos/play/wwdcYYYY/NNN URL.
1. Derive paths. Collection dir is `workspace/wwdc26/` (this blog's WWDC 2026 set). Session number `NNN` from the URL; slug = short kebab of the title (e.g. `241-foundation-models`). Create `workspace/wwdc26/{NNN}-{slug}/`.
2. Transcript. `ToolSearch` → `select:mcp__sosumi__fetchAppleVideoTranscript`, call it with `path: /videos/play/wwdcYYYY/NNN`. Write `transcript.md` with YAML frontmatter (title, source, session, collection, duration, fetched, via: sosumi.ai): raw Apple words, attribution intact.
3. Summary + Code tabs. These are JS-rendered tabs, not in the transcript and not reachable by WebFetch. Use Playwright: `ToolSearch` → `select:mcp__plugin_playwright_playwright__browser_navigate,mcp__plugin_playwright_playwright__browser_evaluate`. `browser_navigate` to the URL, then ONE `browser_evaluate` reading both (all tabs live in the DOM; no clicking):
   - Summary: `document.querySelector('.supplement.summary')?.innerText`
   - Code: `Array.from(document.querySelectorAll('.supplement.sample-code pre')).map(p => p.innerText)`
4. Write `meta.md`: description, key topics, chapter summary (from the Summary tab), a "Related sessions to fetch" checklist (talks named in this one), pointer to `code.md`. Write `code.md`: each snippet under a `## HH:MM — label` heading in a fenced block. Clean any `// Copy Code` / `// Insert code snippet` placeholder text that bleeds into the extraction.
5. Write `notes.md`: a stub for the user's own analysis (blog angles, open questions, code to reproduce). Keep separate from `transcript.md` so drafts pull from the user's words, not Apple's.
6. Write `digest.md` (Opus only; see Model policy), a comprehensive single-file synthesis: frontmatter listing sources, TL;DR, sections following the chapters, all code blocks, an Open Questions section, and blog angles. Leave `[shot: HH.MM.SS]` references out until Phase B (or add them only if screenshots already exist). If you are not on Opus, spawn an Opus subagent to read steps 1–5's files and write this.
7. Update `workspace/wwdc26/README.md`: add the session's tracker row.
If the target folder already exists, confirm before overwriting.
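As a rough illustration of step 1, path derivation could be sketched in Python as below. The regex, the `kebab` helper, its four-word cap, and the example title are assumptions made for illustration; only the `workspace/wwdc26/{NNN}-{slug}/` layout comes from the steps above.

```python
import re
from pathlib import Path

# Matches the URL shape named in the trigger: .../videos/play/wwdcYYYY/NNN
SESSION_URL = re.compile(r"developer\.apple\.com/videos/play/wwdc(\d{4})/(\d+)/?")


def parse_session_url(url: str) -> tuple[str, str]:
    """Return (year, session_number) from an Apple Developer video URL."""
    match = SESSION_URL.search(url)
    if match is None:
        raise ValueError(f"not a WWDC session URL: {url}")
    return match.group(1), match.group(2)


def kebab(title: str, max_words: int = 4) -> str:
    """Lowercase, hyphen-separated slug built from the first few title words.

    The word cap is an arbitrary choice for this sketch.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words[:max_words])


def session_dir(url: str, title: str, root: Path = Path("workspace/wwdc26")) -> Path:
    """Build the {NNN}-{slug} folder path under the collection dir (no I/O)."""
    _, number = parse_session_url(url)
    return root / f"{number}-{kebab(title)}"


# Hypothetical session title, chosen to reproduce the example folder name.
target = session_dir(
    "https://developer.apple.com/videos/play/wwdc2026/241/", "Foundation Models"
)
```

The function only computes the path. Checking `target.exists()` before creating anything leaves room for the overwrite confirmation.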
Phase B — Screenshots post-completion step
Trigger: a screenshots/ folder appears in an existing session dir (the user took
slide captures and dropped them in).
1. Compress in place. Run, from the repo root: `.agents/skills/processing-wwdc-session/compress-screenshots.sh <session-dir>/screenshots` (the script is bundled with this skill). It downscales + palette-quantizes each PNG to ≤200 KB (band 100–200 KB), preserving PNG format. Keep the original `Screenshot … HH.MM.SS.png` filenames: do NOT rename, do NOT make `.webp` siblings, do NOT move into `src/assets`.
2. Read every screenshot (the Read tool renders images). Extract what the transcript and Code tab do NOT capture: diagram structure, on-screen API values, demo token counts/durations, the CLI surface, and any code visible on a slide that is missing from `code.md`.
3. Fold back into `digest.md` (Opus only; see Model policy). Add `[shot: HH.MM.SS]` references at the relevant points (the `HH.MM.SS` is the time in the screenshot's filename, used as a stable handle to the file; it is NOT the video timecode). Add genuinely new facts, and flag discrepancies where a slide contradicts the transcript/Code tab. Add any slide-only code to `code.md` with a `// from screenshots/…` note. Update the README row if its screenshot/digest columns change. If you are not on Opus, spawn an Opus subagent that reads the screenshots and rewrites the affected digest sections.
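Because the `[shot: HH.MM.SS]` handle is just the time embedded in the filename, a reference check can be done mechanically. The sketch below is one possible approach, not part of the skill. It assumes the time sits immediately before `.png`, as in the filename pattern above, and the example filenames and digest text are made up.

```python
import re

SHOT_REF = re.compile(r"\[shot: (\d{2}\.\d{2}\.\d{2})\]")
FILENAME_TIME = re.compile(r"(\d{2}\.\d{2}\.\d{2})\.png$")


def shot_handles(filenames):
    """Map each HH.MM.SS handle to the screenshot filename that carries it."""
    handles = {}
    for name in filenames:
        match = FILENAME_TIME.search(name)
        if match:
            handles[match.group(1)] = name
    return handles


def unresolved_shots(digest_text, filenames):
    """Return handles cited in the digest that match no screenshot file."""
    known = shot_handles(filenames)
    cited = set(SHOT_REF.findall(digest_text))
    return sorted(ref for ref in cited if ref not in known)


# Hypothetical filenames and digest excerpt.
files = [
    "Screenshot 2026-06-09 at 14.03.27.png",
    "Screenshot 2026-06-09 at 14.05.10.png",
]
digest = "Token budget [shot: 14.03.27] and CLI flags [shot: 14.09.00]."
missing = unresolved_shots(digest, files)
```

An empty `missing` list means every reference resolves. A renamed screenshot would show up here as an unresolved handle.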
Group labs — transcribe from the SD video audio
Group labs (the 8xxx series) ship no sosumi transcript and no Summary/Code tab, but
the session page does publish a downloadable recording. Generate the transcript locally
from the video audio, then run the normal Opus digest step. This replaces the old
"write a `_No transcript available._` stub" for any lab whose video has posted.
Tooling (already installed): whisper-cli (Homebrew), model
/Users/jetbrains/Developer/whisper.cpp/models/ggml-large-v3.bin, VAD model
/Users/jetbrains/Developer/whisper.cpp/models/for-tests-silero-v6.2.0-ggml.bin,
ffmpeg, curl. Helper scripts in workspace/wwdc26/_tools/: process_lab.sh
(chains the whole media pipeline), whisper_to_transcript.py, check_transcript.py.
Per lab:
1. Get the SD URL. WebFetch the session page (`/videos/play/wwdc2026/NNNN/`) and read the SD Video link from Resources. It is in static HTML, so WebFetch works and you can probe many labs in parallel (no Playwright needed). The CDN URL carries a per-session UUID, so you must fetch it per lab; you cannot construct it. "Available soon" / "Live …" pages have no download link yet: skip and revisit later.
2. Download + extract. `curl -sL` the SD mp4 (~240 MB), then `ffmpeg -vn -ar 16000 -ac 1 -c:a pcm_s16le` → 16 kHz mono WAV.
3. Transcribe. VAD + `-mc 0` are MANDATORY, not optional: `whisper-cli -m <large-v3> -f NNNN.wav -l en --vad --vad-model <silero> -mc 0 -et 2.8 -oj -of NNNN -t 8`. Plain large-v3 (no VAD, default context) falls into a repetition hallucination loop on lab audio: in testing it emitted one phrase for 47 of 63 minutes (75 % of the session), and the output looks fine segment-by-segment. VAD skips the non-speech that triggers the loop; `-mc 0` stops the decoder conditioning on its own repeats. VAD is also faster (skips silence).
4. GATE: check for loops. `python3 _tools/check_transcript.py NNNN.json` exits 1 if any phrase runs ≥8 consecutive segments or is ≥15 % of all segments. Never build a digest off a transcript that fails the gate. On failure, re-run whisper / inspect the audio; do not proceed.
5. Convert to Zoom-style transcript. `python3 _tools/whisper_to_transcript.py NNNN.json /transcript.md --title " " --session NNNN --source --fetched` merges whisper's choppy segments into sentence-level numbered cues (`N` / `HH:MM:SS --> HH:MM:SS` / text). large-v3 does **not** diarize: do **not** invent speaker labels; capture the panel roster from the intro instead. Frontmatter records `via: whisper.cpp ggml-large-v3` + `transcription: machine-generated, no speaker diarization`.
6. Delete the mp4 + wav (~240/116 MB each; scratch).
7. `meta.md` + `digest.md` (Opus; see Model policy). Same as Phase A steps 4–6, but the digest must capture a live Q&A: panel roster, every question→answer exchange, and an "unconventional facts & takeaways" section for the off-the-cuff details that only surface in Q&A (corrected misconceptions, hard numbers, candid limitations). `code.md` stays `n/a`: labs show no code.
8. README row: Transcript = 🎙️ (machine transcript), Digest = ✅.
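The gate rule in step 4 (≥8 consecutive segments, or ≥15 % of all segments) can be sketched as follows. This is a guess at the logic, not the contents of `_tools/check_transcript.py`. It works on a plain list of segment texts so it makes no claim about whisper's JSON layout, and the normalization and the empty-transcript handling are assumptions.

```python
from collections import Counter
from itertools import groupby


def normalize(text):
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_loops(segments, max_run=8, max_share=0.15):
    """Return the reasons a transcript fails the gate; empty list means pass."""
    phrases = [normalize(s) for s in segments if s.strip()]
    if not phrases:
        # Assumption: an empty transcript should also block the digest.
        return ["transcript has no segments"]
    problems = []
    # Rule 1: the same phrase in max_run or more consecutive segments.
    for phrase, run in groupby(phrases):
        length = sum(1 for _ in run)
        if length >= max_run:
            problems.append(f"{phrase!r} runs {length} consecutive segments")
    # Rule 2: one phrase accounts for max_share or more of all segments.
    for phrase, count in Counter(phrases).items():
        if count / len(phrases) >= max_share:
            problems.append(f"{phrase!r} is {count}/{len(phrases)} segments")
    return problems


# Synthetic data: 100 distinct sentences, then the same list with a loop spliced in.
clean = [f"sentence number {i}" for i in range(100)]
looped = clean[:20] + ["Thank you."] * 8 + clean[20:]
scattered = ["okay" if i % 5 == 0 else f"line {i}" for i in range(100)]
```

A wrapper script would exit 1 whenever `find_loops` returns a non-empty list. Note that the share rule only makes sense on transcripts with many segments, since a five-segment file would trip it on every phrase.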
process_lab.sh NUM SLUG "TITLE" SD_URL does steps 2–6 (download → extract → whisper →
gate → convert → cleanup) and exits non-zero if the gate fails.
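For the convert step, merging choppy segments into sentence-level cues might look like the sketch below. It is not the actual `whisper_to_transcript.py`. The `(start_seconds, end_seconds, text)` tuple input and the rule of closing a cue on `.`, `?`, or `!` are assumptions, and only the `N` / `HH:MM:SS --> HH:MM:SS` / text cue layout comes from the step description.

```python
def hms(seconds):
    """Format a second count as HH:MM:SS, dropping fractions."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def merge_into_cues(segments):
    """Merge (start, end, text) segments until the text ends a sentence."""
    cues, buffer = [], []
    cue_start = last_end = None
    for start, end, text in segments:
        text = text.strip()
        if not text:
            continue  # skip empty segments entirely
        if cue_start is None:
            cue_start = start
        buffer.append(text)
        last_end = end
        if text.endswith((".", "?", "!")):
            cues.append((cue_start, last_end, " ".join(buffer)))
            buffer, cue_start = [], None
    if buffer:  # trailing fragment without closing punctuation
        cues.append((cue_start, last_end, " ".join(buffer)))
    return cues


def render(cues):
    """Render cues as numbered blocks, with no speaker labels."""
    lines = []
    for number, (start, end, text) in enumerate(cues, 1):
        lines += [str(number), f"{hms(start)} --> {hms(end)}", text, ""]
    return "\n".join(lines)


# Made-up segments for illustration.
segments = [
    (0.0, 2.5, " Welcome to the"),
    (2.5, 4.0, " group lab."),
    (4.0, 7.2, " First question?"),
]
cues = merge_into_cues(segments)
```

The cues stay unattributed on purpose, matching the no-diarization rule. The panel roster belongs in the digest, not in per-cue labels.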
Scaling many labs. whisper saturates the GPU, so run it serially, one lab at a time
(loop process_lab.sh over the lab list in a background batch). Parallelize only the cheap
parts: probe SD URLs with parallel WebFetch up front, and dispatch the Opus digest
subagents in waves (each reads its own transcript.md, writes meta+digest) while the next
whisper runs. Never run two whisper processes at once. A lab is ~6–9 min of whisper
(faster with VAD) + ~1 min download.
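The scheduling shape described here (one whisper run at a time, with digests overlapping the next run) can be sketched with a thread pool. `transcribe` and `digest` are hypothetical callables standing in for `process_lab.sh` and the Opus subagent dispatch, and the worker count and lab numbers are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def run_labs(labs, transcribe, digest, digest_workers=4):
    """Transcribe labs one at a time; hand each finished lab to a digest pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=digest_workers) as pool:
        pending = {}
        for lab in labs:
            # Serial by construction: the loop blocks until this lab is done,
            # so two transcriptions can never overlap.
            transcript = transcribe(lab)
            # The digest runs in the pool while the next transcription starts.
            pending[lab] = pool.submit(digest, lab, transcript)
        for lab, future in pending.items():
            results[lab] = future.result()
    return results


# Stand-in callables and made-up lab numbers, for illustration only.
order = []


def fake_transcribe(lab):
    order.append(lab)
    return f"transcript-{lab}"


def fake_digest(lab, transcript):
    return f"digest of {transcript}"


out = run_labs(["8001", "8002"], fake_transcribe, fake_digest)
```

If `transcribe` raises (for example when the gate fails), the loop stops before any digest is queued for that lab. Digests already submitted still finish as the pool shuts down.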
Bulk mode — many sessions at once (parallel subagents)
When a whole catalog is dropped (dozens of URLs), do NOT run Phase A serially per session. Split the work so the one shared resource (the Playwright browser) never races and the 20–60 KB transcripts stay OUT of the orchestrator's context:
- Build a manifest first. Parse the link list into `{num, title, url, folder}` (clean, stopword-stripped slugs) and create all folders. Always dispatch using the manifest's folder names: hand-typed slugs drift from the created dirs and make the `browser_evaluate` `filename` write fail with ENOENT.
- One fetcher subagent at a time (serial, OWNS the browser): for each session, `browser_navigate` + ONE `browser_evaluate` that saves Summary/Code to `<folder>/_supplement.json` via the `filename` argument (keeps the payload out of your context). Batches of ~16. Never run two fetchers at once: the single browser instance will race and cross-contaminate pages.
- Parallel synthesis subagents (Opus; see Model policy): one per session, waves of ~8. Each fetches its OWN sosumi transcript, reads `_supplement.json`, and writes all five files. They use sosumi + file tools ONLY (no browser), so a synthesis wave can run concurrently with the NEXT fetcher batch running in the background.
- Normalize after each fetch batch. `browser_evaluate` saves its result double-JSON-encoded (a JSON string wrapping the object). Decode it twice (`json.loads(json.loads(...))`) in a single normalization pass so synthesis reads clean JSON.
- Sync trackers from disk, don't hand-maintain. A script that scans folders (digest present? supplement summary/code non-empty?) idempotently regenerates the README table and the link-list ✅ markers, so the trackers survive context summarization.
- Labs / keynotes have no sosumi transcript. Group labs return "Transcript not found" from sosumi and have no Summary/Code tab. If the lab's video has posted, transcribe it from the SD audio (see the Group labs section above), not a `_No transcript available._` stub. Only stub a lab whose video is still "Available soon." Some keynotes genuinely have no transcript; stub those.
Give each synthesis subagent a one-line dispatch pointing at a shared instruction file
plus NUM/TITLE/URL/FOLDER, so prompts stay tiny and consistent across the fleet.
Overlap rule: synthesis (sosumi) + the next fetcher (browser) may run concurrently;
two browser users may not.
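The normalization step amounts to undoing one extra layer of JSON encoding. A minimal sketch, written so it is safe to run on a file that has already been normalized. The `decode_supplement` name and the sample payload are illustrative, and the `summary`/`code` keys follow the `{summary, code}` shape this skill expects.

```python
import json


def decode_supplement(raw: str) -> dict:
    """Decode a _supplement.json payload that may be JSON-encoded twice."""
    value = json.loads(raw)
    if isinstance(value, str):
        # The first pass produced a string, so the object is still wrapped.
        value = json.loads(value)
    if not isinstance(value, dict):
        raise ValueError("supplement is not a JSON object")
    return value


# Made-up payload for illustration.
payload = {"summary": "Chapter list", "code": ["let x = 1"]}
double_encoded = json.dumps(json.dumps(payload))
decoded = decode_supplement(double_encoded)
```

Writing `json.dumps(decoded)` back over the file makes the pass idempotent, so re-running it after a later fetch batch does no harm.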
Quick reference
| Need | Tool |
|---|---|
| Transcript | mcp__sosumi__fetchAppleVideoTranscript (path /videos/play/wwdcYYYY/NNN) |
| Summary + Code tabs | Playwright browser_navigate + browser_evaluate on .supplement.summary / .supplement.sample-code pre |
| Bulk fetch | ONE serial fetcher subagent (browser) → _supplement.json via filename; synthesis in parallel Opus subagents (sosumi only) |
| Compress slides | .agents/skills/processing-wwdc-session/compress-screenshots.sh <session-dir>/screenshots (pngquant + ImageMagick, ≤200 KB, PNG kept) |
| Group-lab transcript | WebFetch SD URL → _tools/process_lab.sh NUM SLUG "TITLE" URL (curl + ffmpeg + whisper-cli VAD + -mc 0 + gate + convert). whisper serial; never two at once |
| Loop check (gate) | _tools/check_transcript.py NNNN.json — exit 1 = hallucination loop, do NOT digest it |
| Stream-only session (Keynote) | no SD/HD mp4, only an HLS .m3u8 (read <video>/<source> src via Playwright). _tools/process_stream.sh NUM SLUG "TITLE" M3U8 AUDIO_FMT — yt-dlp grabs the audio-only rendition (audio-stereo-aac-128-English, NOT the audio-description track), then same ffmpeg→whisper→gate→convert. yt-dlp -F <m3u8> lists renditions |
| Write/update digest | Opus only — if orchestrator isn't on Opus, spawn an Opus subagent (Agent/Task model: opus) |
| Tracker | workspace/wwdc26/README.md |
Per-session files: transcript.md, meta.md, code.md, notes.md, digest.md, screenshots/.
Common mistakes
- Producing or updating the digest on a smaller model: the digest is the deliverable and the only real synthesis step; it must be Opus. Steps 1–5 (and screenshot compression) can be any model. If you're on Sonnet, spawn an Opus subagent for the digest; don't write it yourself.
- Skipping screenshot compression: raw Retina captures are ~2.5 MB each. Always run the script; preserve PNG.
- Renaming screenshots: the `[shot: HH.MM.SS]` handle is the filename's time. Renaming breaks every reference. Keep the originals.
- Not updating the digest after reading screenshots: Phase B exists to enrich the digest; reading the images without folding new facts (and discrepancies) back in defeats the point.
- Treating `[shot:]` as a video timecode: it is the screenshot filename's clock time, just a file handle.
- Touching shippable files: `workspace/` is gitignored scratch. No `llms.txt`, `src/data/blog/`, or `src/assets/` edits when processing a session.
- Trusting WebFetch for the Summary/Code tabs: they are JS-rendered; use Playwright. (The download/SD links are static HTML, so WebFetch is fine for those.)
- (Group lab) Transcribing without VAD + `-mc 0`: plain large-v3 hallucination-loops on lab audio (one phrase for most of the session) and looks fine per-segment. Always pass `--vad --vad-model <silero> -mc 0`, then run the `check_transcript.py` gate before any digest.
- (Group lab) Inventing speaker labels: large-v3 has no diarization. Keep cues unattributed; capture the panel roster from the intro.
- (Group lab) Running two whisper processes at once: whisper saturates the GPU; concurrency just contends. Serial whisper; parallelize SD-URL probes and Opus digests only.
- Leaving `// Copy Code` placeholder text in `code.md` from the `pre` extraction.
- (Bulk) Running two browser fetchers at once: the Playwright instance is shared and single; concurrent navigations race. Exactly one fetcher; parallelize only the sosumi synthesis.
- (Bulk) Hand-typing folder slugs per dispatch: they drift from the created dirs and the `filename` write fails (ENOENT), silently dropping that supplement. Always read folders from the manifest.
- (Bulk) Forgetting the double-encode: `browser_evaluate`'s saved `_supplement.json` is a JSON string wrapping the object; parse twice or synthesis sees a string, not `{summary, code}`.
Verification
- Folder has `transcript.md`, `meta.md`, `code.md`, `notes.md`, `digest.md`.
- Every screenshot is valid PNG and ≤200 KB (`find screenshots -name '*.png' -size +204800c` returns nothing).
- Screenshot filenames unchanged; each `[shot:]` in `digest.md` resolves to a real file.
- `digest.md` reflects slide-only facts and flags any slide↔transcript discrepancies.
- `README.md` tracker row is current.
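The file-presence and size checks lend themselves to a small script. The sketch below is one way to do it, not something bundled with the skill. It only tests the 8-byte PNG signature, which is weaker than fully decoding the image, and the `verify_session` name is made up.

```python
import tempfile
from pathlib import Path

REQUIRED = ("transcript.md", "meta.md", "code.md", "notes.md", "digest.md")
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 204800  # same threshold as the find check (200 KB)


def verify_session(folder: Path) -> list[str]:
    """Return verification failures for a session dir; empty means it passes."""
    failures = [f"missing {name}" for name in REQUIRED if not (folder / name).is_file()]
    shots = folder / "screenshots"
    if shots.is_dir():
        for png in sorted(shots.glob("*.png")):
            with png.open("rb") as handle:
                head = handle.read(len(PNG_SIGNATURE))
            if head != PNG_SIGNATURE:
                failures.append(f"{png.name} does not start with the PNG signature")
            if png.stat().st_size > MAX_BYTES:
                failures.append(f"{png.name} exceeds 200 KB")
    return failures


# Self-check against a throwaway folder with one missing file and two bad shots.
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp)
    for name in REQUIRED[:-1]:  # deliberately leave out digest.md
        (session / name).write_text("x")
    (session / "screenshots").mkdir()
    (session / "screenshots" / "ok.png").write_bytes(PNG_SIGNATURE + b"0" * 100)
    (session / "screenshots" / "big.png").write_bytes(PNG_SIGNATURE + b"0" * MAX_BYTES)
    (session / "screenshots" / "bad.png").write_bytes(b"not a png")
    report = verify_session(session)
```

The slide-fact and README checks in the list above still need a human or model read; only the mechanical items are covered here.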