name: michel-ui-demo-recorder description: Record polished UI demo videos and screenshots of a running web app using Playwright MCP — for client deliverables, release notes, feature walkthroughs, or bug repros. Produces an HD WebM video with chapter markers, a mandatory animated cursor overlay, and a mandatory subtitle bar that narrates each step (positioned deliberately so it never masks the UI being demonstrated), plus full-page screenshots at each step. Use this whenever the user asks to "record a demo", "create a screencast", "make a UI walkthrough video", "document this feature with video", "show the client how X works", "capture screenshots of the app", or anything similar — even when the user only says "make a video" or "take screenshots" in the context of a running frontend. Also use it when the user wants to demonstrate a workflow, generate marketing-quality footage of an app, or produce repeatable visual documentation.
UI Demo Recorder
Produce client-ready UI documentation (HD video + screenshots) from a running web app using the Playwright MCP video tools.
When to reach for this skill
- "Record a demo / walkthrough / screencast / video of the app"
- "Take screenshots of the feature working"
- "Show the client what X looks like"
- "Document this flow visually"
- "Make a GIF/video of clicking through Y"
The deliverable is always one or more of:
- WebM video (1440×900 HD by default), with optional chapter cards
- Full-page PNG screenshots at key moments
- An animated cursor overlay so viewers can follow what's being clicked (always present)
- A subtitle bar narrating each step (always present)
Non-negotiable: every video this skill produces MUST carry both the cursor overlay and the subtitle bar. They are injected via browser_evaluate (pure DOM), independent of .mcp.json — a run that omits them is a defective deliverable, not a stylistic choice. If you cannot inject them, stop and report why rather than shipping a bare recording.
Why a skill exists for this
The Playwright MCP video tools work, but they have several non-obvious gotchas that waste a lot of time if you discover them mid-recording:
- The video tools live behind an opt-in flag (
--caps=devtools) and are silently absent if the MCP server wasn't launched with it. - There are usually two different Playwright MCP tool prefixes in the deferred-tools list (a stock one and a plugin one). They use different browser contexts, and only one of them can record video. Mixing them mid-session causes "Browser is already in use" errors.
- The recorder doesn't render the OS cursor — videos look like the app is operating itself unless you inject a fake cursor.
- Any
location.reload()wipes the injected cursor — it must be reinjected. - The output file lands at the project root, not in whatever
--output-dirsays.
This skill codifies the working recipe so the model doesn't relearn it every time.
Pre-flight check (do this first, every time)
1. Confirm the Playwright MCP has --caps=devtools
Open .mcp.json (project root or ~/.claude/.mcp.json). The Playwright server entry must include --caps=devtools in its args. If it doesn't, video tools won't be exposed.
Working config:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": [
"@playwright/mcp@latest",
"--caps=devtools",
"--output-dir=./videos"
]
}
}
}
If you have to add the flag, tell the user — they have to reload MCP (/mcp reconnect or restart Claude Code) before browser_start_video becomes callable.
2. Confirm the right tool prefix
Use ToolSearch for mcp__playwright__browser_start_video. The video tools belong to the mcp__playwright__browser_* family. Use the same prefix for every browser interaction in the recording — browser_navigate, browser_click, browser_type, browser_snapshot, browser_wait_for, browser_evaluate, browser_close. Do not mix in mcp__plugin_playwright_playwright__browser_* — that's a different MCP server with a different browser context; mixing causes "Browser is already in use".
3. Get the app running — from a clean volume
See the michel-run-local-dev-stack skill for the full stack lifecycle. Packmind runs as a Docker Compose stack (PostgreSQL, Redis, NestJS API on container port :3000, React/Vite frontend on container port :4200, MCP server). Those are container-internal ports — from the host the API is reached only through the frontend's Vite proxy, and the frontend's host port is edition-dependent (4200 OSS / 4201 proprietary). For recordings, start from wiped volumes so stale Postgres schema/rows from a prior run don't leak into the footage, then confirm it's serving before recording:
export PACKMIND_EDITION="$(bash scripts/michel/resolve-edition.sh)" # oss | proprietary, from the git remote
docker compose down -v && docker compose up -d --build
PM_WEB="$(docker compose port frontend 4200 | sed 's#.*:##')" # host port: 4200 oss / 4201 proprietary
# wait for the API via the Vite proxy (NOT :3000 — that container port is not host-exposed)
until curl -sf "localhost:$PM_WEB/api/v0" >/dev/null; do sleep 1; done
# then wait for the frontend you'll actually record
until curl -sf "localhost:$PM_WEB/" >/dev/null; do sleep 1; done
The frontend you record lives at http://localhost:$PM_WEB (4200 OSS / 4201 proprietary) — navigate there, not at the API port.
On a freshly wiped database there is no account, so the first screen is the sign-up / login page. Create an account (or seed via the API) before recording the feature, otherwise the demo is just the auth screen. If a populated workspace matters, seed it via the UI or the API after the stack is up — do not record over stale state.
Recording recipe
The recipe in plain English:
- Make sure no browser tab is currently open on the Playwright side, and always call
browser_stop_videofirst (ignore any error it returns) to clear a recorder a prior session may have left open. Delete any stub WebMs it drops at the project root. - Start the video.
- Navigate to the app.
- Inject BOTH overlays — cursor and subtitle bar — in one
browser_evaluate. This step is mandatory, never skip it. - Add a chapter card.
- Drive the UI (click, type slowly, wait for visible feedback). Call
window.__setSubtitle(...)before every step. - Whenever the page reloads, reinject BOTH overlays (cursor + subtitle bar). A reload wipes them.
- Add chapter cards between major sections.
- Stop the video.
- Move the file from project root into
videos/. - Stop the app stack (
docker compose down— see themichel-run-local-dev-stackskill).
Starting the video (with the common pitfall)
Always call browser_stop_video before start_video, even on the first recording — wrap it so its error is ignored. An earlier browser session in this run frequently leaves a screencast open, and start_video then fails with Error: Screencast is already started. Stopping first is idempotent: if nothing was recording it's a harmless no-op; if something was, it clears it.
mcp__playwright__browser_stop_video() # ignore any error; deletes/returns stub WebMs at project root
mcp__playwright__browser_start_video(
filename="<name>.webm",
size={"width": 1440, "height": 900}
)
If you still get Error: Screencast is already started after that, the recorder is wedged to a stale browser context that a second stop_video won't clear on its own. Escalate — do not just retry stop+start in a loop:
mcp__playwright__browser_stop_video()again (ignore error).mcp__playwright__browser_close()— this tears down the wedged context that owns the orphaned screencast.- Delete the stub WebMs it drops at the project root.
- Retry
start_video. The fresh context starts clean.
This is the common failure mode on the fly worker, where an earlier browser session in the same run left both a screencast and its browser context alive — stopping the screencast alone is not idempotent there, because the next start_video reattaches to the same wedged context. Closing the context is what actually resets it.
If you get Browser is already in use for ... use --isolated, a non-recording Playwright session has the persistent profile locked. Call mcp__playwright__browser_close first, then start_video.
Injecting the cursor overlay
The Playwright video recorder is a CDP screencast of the viewport — the real OS cursor is never in the frame. To make the video readable, inject a DOM cursor that listens to mousemove/mousedown/mouseup (Playwright's pointer events do dispatch these in capture phase).
Run the script in scripts/inject-cursor.js (relative to this skill's directory, not the project repo you're working in — the agent cwd is usually a checked-out repo that has no such file) via browser_evaluate immediately after every navigate or reload. Inject the cursor and the subtitle bar (next section) together in a single browser_evaluate so neither is ever forgotten — call both injector bodies in one function:
mcp__playwright__browser_evaluate(function=`() => {
(<contents of scripts/inject-cursor.js>)();
(<contents of scripts/inject-subtitles.js>)({position: 'bottom'});
}`)
The cursor is:
- A blue radial-gradient disk with a soft glow
- Animates with a 220ms CSS transition so jumps look smooth
- Turns red and shrinks on
mousedown - Emits an expanding ring "ripple" on click
Reinject after every location.reload() — there is no initScript equivalent exposed in MCP, so it has to be a manual step.
Chapter cards
mcp__playwright__browser_video_chapter renders a full-screen card with a blurred backdrop over the page for a configurable duration.
Scale to flow complexity — do not use 3-6 cards by default:
- ≤5 action steps → 0-1 cards max (one opener only if the flow needs context)
- 6-12 steps → 2-3 cards (intro + sections)
- 12+ steps → up to 5 cards
Every card adds dead time. For short demos like "create a standard" or "open a recipe", skip chapter cards entirely or use a single 1000ms opener.
Duration is raw recording time — multiply by your speedup factor to get on-screen time. A duration: 2000 card at 4× speedup shows for only 500 ms after ffmpeg — too fast to read. Formula: duration = desired_screen_ms × speedup. For a 2 s on-screen hold at 4×, set duration: 8000. At 2× use 4000, at 3× use 6000.
Subtitle bar (continuous narration) — mandatory
Chapter cards interrupt the action. For inline narration that doesn't hide the UI — short captions that explain each step while the viewer watches the click happen — inject a subtitle bar via scripts/inject-subtitles.js. This is required on every recording, not optional. Bundle it with the cursor injection so both go in with one browser_evaluate (see the combined snippet above), and reinject both after every reload.
The script exposes window.__setSubtitle(text). Call it before each step:
mcp__playwright__browser_evaluate(function="() => window.__setSubtitle('Type the task name')")
Pass an empty string or null to hide the bar.
Pick the position deliberately. The default is bottom-center. A subtitle that sits over the very area the viewer needs to watch defeats the purpose — they'll either miss the action or miss the caption.
- If the action happens in the header/top bar (a form, search input, primary CTA at the top), use the bottom position (default).
- If the action happens in a footer, sticky action bar, or fixed bottom CTA, switch to the top position by passing
{position: 'top'}to the injector, or callwindow.__moveSubtitle('top')mid-recording. - If a single screen has critical content at both top and bottom, reposition between steps with
window.__moveSubtitleso the bar always sits on the inert side of the UI.
The bar fades + slides on text change (280ms) so swaps look intentional rather than glitchy. Reinject after location.reload() along with the cursor.
Scenario script (written narration for the PR) — mandatory
A video link alone is useless to a reviewer who can't or won't play it — typically the Product Manager the demo is for. Every recording MUST ship a short written scenario so the demo flow is understandable from text alone.
You already have the raw material: the captions you fed to __setSubtitle ARE the per-step narration. After recording, distill that sequence into a concise, bullet-point scenario:
- Audience: a non-technical reviewer who may never open the video. They should grasp the whole flow from the bullets.
- Synthetic and essential only —
login → key action(s) → observable result. Drop superficial UI steps (scrolling, opening menus, incidental clicks). Keep what proves the feature works. - One line per bullet. No screenshots-of-text, no implementation detail.
Write it to a sidecar file with the SAME basename as the final webm, in the artifacts dir Michel passes you:
# video: feature-demo.webm → sidecar: feature-demo.scenario.md
cat > "<ARTIFACTS_DIR>/feature-demo.scenario.md" <<'EOF'
- Log in as a Product Manager and open the Standards page
- Create a new standard "React naming conventions" with two rules
- The standard appears in the list; open its detail view
- Edit a rule inline and confirm the change persists after reload
EOF
Michel's publish step finds this sidecar by basename and inlines it directly under the video link in the PR body as What the video shows: — you do NOT edit the PR body yourself. If the sidecar is missing or empty, the PR shows _No scenario available — an error occurred during generation._ under the link instead — a visible defect, so always write it. For uses outside Michel (no automatic PR assembly), hand the same bullets to whoever embeds the video so they go right under the link.
Lead-in dead air
The recorder buffers for a second or two after start_video before useful frames appear. Combined with MCP tool round-trip latency, the first ~10–25 seconds of the WebM can show an empty page or about:blank. Mitigations:
- Set the first subtitle and a chapter card before the first
wait_for— gives the viewer something to read during the lead-in. - Keep the gap between
start_video,navigate, and the inject-overlaysbrowser_evaluateas tight as possible — no intermediate snapshots. - Always trim dead air in post — add
-ss 5before the speedup step (see post-processing section). This cuts the first 5 seconds from the raw file before any speedup is applied, removing blank openers at essentially zero quality cost.
Typing and clicking — make it watchable
browser_type(..., slowly=true)types one character at a time. Use it for any text the viewer should read.- Before each
browser_click, take a freshbrowser_snapshotto read current UI state — but click by text/CSS (next section), not byref=, so a re-render or navigation can't invalidate your target. - After actions that change the DOM, call
browser_wait_for(text=<expected new content>)instead of arbitrary sleeps — recordings made ofwait_forlook like a real user; recordings made of fixed sleeps look robotic. wait_for(text=...)must use the label the UI actually renders, not the one you assume. A guessed string that never appears burns the full 30s timeout. Read the real text from a freshbrowser_snapshotfirst, then wait on it. If a step is flaky to wait on by text, drive it viabrowser_evaluateinstead of waiting.
Do not use snapshot ref= tokens as click selectors
browser_snapshot output contains ref=eXX tokens. They look like selectors but are snapshot-internal handles, not selector-engine names — passing one to browser_click fails with Error: Unknown engine ref. Target elements by text or CSS instead: :text('Add task'), input[name="assignee"], etc. — or drive the interaction with browser_evaluate.
Clearing a text input (there is no browser_triple_click)
The Playwright MCP exposes no browser_triple_click, and browser_type appends — it does not replace existing text. The Packmind frontend is React (Chakra UI / @packmind/ui), so its inputs are controlled. To clear a field (e.g. before retyping into an edit drawer), set the value via browser_evaluate and dispatch an input event so React's controlled state updates:
mcp__playwright__browser_evaluate(function=`() => {
const el = document.querySelector('<selector>'); // e.g. input[name="name"]
const setter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, 'value').set;
setter.call(el, ''); // native setter so React sees the change
el.dispatchEvent(new Event('input', { bubbles: true }));
}`)
Using the native value setter (not el.value = '') is required for React-controlled inputs — a plain assignment is silently overwritten on the next render. After clearing, browser_type(slowly=true) the new text so the viewer reads it.
Chakra checkboxes/radios — click the control part, not the native input
Chakra UI v3 (@packmind/ui, on zag-js) renders a checkbox/radio as a composite: a hidden native <input> plus a visible control element carrying data-part="control". The native <input> is not interactable — clicking it (or the row's checkbox role) is a silent no-op, and you burn snapshot/retry cycles wondering why selection never toggles. Click the visible control part instead, via browser_evaluate:
mcp__playwright__browser_evaluate(function=`() => {
// the visible control toggles selection; the native <input> does not
document.querySelector('table tbody tr:has(a:text("<name>")) [data-part="control"]').click();
// select-all lives in the header row:
// document.querySelector('table thead [data-part="control"]').click();
}`)
This mirrors the e2e Page Objects, which click [data-part="control"] (see apps/e2e-tests/src/infra/pages/StandardsPage.ts). Select/combobox options follow the same idea — pick [data-part="item"] filtered by text.
Waiting on a Chakra dialog/drawer — target the overlay or the toast, never an inner button
After opening or submitting a dialog, do not wait_for an inner button's disabled/hidden state — a condition that never settles burns the full 30s timeout. zag-js carries the open state as data-state on the same node as role="dialog", so:
- Wait for OPEN with one combined selector (a descendant search finds nothing):
[role="dialog"][data-state="open"]. - Wait for the action to FINISH on the success-toast text the app actually renders (e.g.
moved to the selected space), or on the dialog node detaching — not on the button you clicked.
Same patterns the e2e POMs use: AbstractPackmindAppPage.ts for the drawer-open wait, StandardsPage.ts / SkillsPage.ts for toast waits.
When a click doesn't propagate (drag-drop and friends)
The Playwright browser_drag tool dispatches mouse events but does not reliably trigger libraries that use pointer-event sensors (e.g. @dnd-kit's PointerSensor). If you see the drag "land" on the wrong drop zone or no state change happens, don't fight it. Switch to driving state via the app's API:
mcp__playwright__browser_evaluate(function=`async () => {
// fetch + PATCH the relevant endpoint
// then location.reload()
}`)
Then reinject the cursor. The video still shows the UI visually changing — viewers don't see the difference.
Screenshots
Create the target dir before the first capture. browser_take_screenshot throws ENOENT if the directory in filePath doesn't exist — it will not create it. Run mkdir -p screenshots once up front:
mkdir -p screenshots
Use browser_take_screenshot(filePath=..., fullPage=true) for full-page PNGs at moments worth capturing as stills. Number them (01-initial.png, 02-typing.png, ...) so they sort correctly. Save them under screenshots/ inside the project.
Wait for real content before the first screenshot. The Packmind frontend is an SPA — on first paint it shows only a loading spinner, so a screenshot taken right after browser_navigate captures the spinner, not the page. Always browser_wait_for(text=<a label the loaded page renders>) (e.g. the sign-in heading) before capturing, exactly as for the video steps above.
If the project doesn't have ffmpeg/ImageMagick installed and you also want a GIF or MP4 from a sequence of stills (for an environment that can't play WebM), use a Python venv with Pillow (GIF) or imageio-ffmpeg (MP4) — both ship a usable binary so brew install isn't needed.
Stopping and packaging
mcp__playwright__browser_stop_video()
mcp__playwright__browser_close()
The WebM lands at the project root, not in videos/, regardless of what --output-dir says. Move it:
mv ./<name>.webm ./videos/
Stop the app stack you started in the background (docker compose down) — don't leave it running. See the michel-run-local-dev-stack skill for the teardown rationale.
Post-processing: trim dead air and speed up
Playwright recordings run at real-time wall-clock speed, which includes MCP round-trip latency. The raw WebM always feels sluggish. Always apply both steps after moving the file:
Step 1: check raw duration and choose speed multiplier
ffprobe -v quiet -show_entries format=duration -of csv=p=0 ./videos/<name>.webm
Pick multiplier based on raw duration:
- < 60s → 2× (
setpts=0.5*PTS) - 60–120s → 3× (
setpts=0.333*PTS) 120s → 4× (
setpts=0.25*PTS)
Step 2: trim lead-in and apply speedup in one pass
# Replace SETPTS with the value chosen above (0.5, 0.333, or 0.25)
ffmpeg -i ./videos/<name>.webm \
-vf "setpts=SETPTS*PTS,fps=30" \
-an -c:v libvpx-vp9 -b:v 2M \
-ss 5 \
./videos/<name>-fast.webm && mv ./videos/<name>-fast.webm ./videos/<name>.webm
-ss 5 (placed after -i, output-side) trims the first 5 seconds (blank lead-in) accurately. -an drops the silent audio track.
The fps=30 and explicit -c:v libvpx-vp9 -b:v 2M are mandatory — do not drop them. Playwright's recorder is a CDP screencast: a variable-frame-rate stream that emits a frame only when the page changes, leaving large gaps between PTS timestamps. Applying setpts alone and re-encoding with libvpx defaults keeps those sparse frames, producing a file that reports the full duration in its container metadata but holds almost no real frames — it plays for a few seconds then freezes or won't play at all. (Symptom seen in the wild: a 169s source produced a 256 KB "fast" file that still claimed 169s duration but only showed ~4s of content.) The fps=30 filter resamples the VFR stream to a constant 30 fps, regenerating real frames across the whole timeline; the explicit codec + bitrate guarantee a clean VP9 re-encode instead of a degenerate passthrough.
Verify the output before shipping it. A correct fast file's duration ≈ source duration × SETPTS, and its size is a meaningful fraction of the source. Check:
ffprobe -v error -show_entries format=duration,size -of default=noprint_wrappers=1 ./videos/<name>.webm
If the duration didn't shrink by the multiplier, or the file collapsed to a tiny fraction of the source (e.g. 6 MB source → 256 KB output), the re-encode is corrupt — do not hand it over. Re-run with fps=30 present.
If the recording has a real audio track (rare for app demos), remove -an and add -af "atempo=2.0" (only valid for 2×; higher speedups need chained atempo filters).
Why the subtitle minHold is 2500ms: at 2× speed that becomes ≈1.25 s on screen — long enough to read a short caption comfortably. At 3× it becomes ≈0.83 s — still readable for short captions. Do not lower minHold below 2000ms or subtitles will flash too fast after the speed-up. If you skip the ffmpeg step, the default hold will make subtitles feel unnecessarily long — only skip if ffmpeg is not available.
What "good" looks like
- Resolution 1440×900 or larger
- Smooth cursor that's clearly visible against the app's UI — present the whole time (no bare frames)
- A subtitle caption visible for every step, repositioned so it never masks the active UI
- 3–6 chapter cards (intro, 2–4 sections, outro)
- ~15–60 seconds total for focused flows (single feature/action); 60–90 s for multi-section walkthroughs — anything longer should be split or sped up more aggressively
- File size 4–15 MB for typical demos; if it's bigger, the run was too long
- No console errors visible in the video (close DevTools if it was open)
What to tell the user at the end
Hand over:
- The path of the final WebM
- The path of the
<basename>.scenario.mdsidecar (the written scenario that lands under the video link in the PR) - A one-line summary of the chapters
- Anything that was faked (e.g. "drag-drop was driven via the API because dnd-kit doesn't respond to MCP drag synth")
- A reminder that WebM may need a modern browser or VLC to play; offer to also produce an MP4 if the client uses a tool that doesn't accept WebM
Reference material
scripts/inject-cursor.js— paste-ready cursor overlayscripts/inject-subtitles.js— paste-ready subtitle bar with__setSubtitle/__moveSubtitlehelpersscripts/check-mcp-config.sh— one-liner to confirm--caps=devtoolsis setreferences/playwright-mcp-tools.md— table of the relevant tools and their gotchas