p2v-phase-1-script - SKILL.md Agent Skill

name: p2v-phase-1-script
description: Generate and validate video_script.jsonl from a specific paper URL or pasted paper content. Use this when running phase 1 of the paper-to-video pipeline.
metadata:
  short-description: P2V phase 1 script generation

P2V Phase 1: Script

When to use

Use this skill when the user wants to run phase 1 of the paper-to-video pipeline: turning a paper into a validated video_script.jsonl.

Inputs

paper URL or full paper text
output run directory (default: outputs/<video_id>-<timestamp>)
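A minimal sketch of how the default run directory could be derived. It assumes a YYYYMMDD-HHMMSS timestamp; the skill only fixes the outputs/<video_id>-<timestamp> pattern, and the helper name default_run_dir and the video_id "grokking" are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def default_run_dir(video_id: str, now: Optional[datetime] = None, root: str = "outputs") -> Path:
    """Return <root>/<video_id>-<timestamp> (hypothetical helper).

    The YYYYMMDD-HHMMSS timestamp format is an assumption; the skill only
    specifies the <video_id>-<timestamp> pattern.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")  # sortable and filesystem-safe
    return Path(root) / f"{video_id}-{stamp}"


run_dir = default_run_dir("grokking", datetime(2024, 5, 1, 9, 30, 0))
print(run_dir.as_posix())  # outputs/grokking-20240501-093000
```

Create the directory with run_dir.mkdir(parents=True, exist_ok=True) before saving into it. A sortable timestamp keeps repeated runs for the same video_id in chronological order.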

Workflow

Run a mandatory preparation pass before drafting any script lines.
Follow these local guides:
- docs/educational-video-pedagogy-framework.md
- docs/00-system-contract.md
Draft one coherent educational script from the preparation results, not directly from the raw paper text.
Enforce the contract fields in every record of video_script.jsonl.
Save the file as video_script.jsonl in the run directory.
Validate:

uv run python -c "from pathlib import Path; from paper2video.contracts.io import validate_artifact; validate_artifact(Path('<run_dir>/video_script.jsonl'), artifact_type='video_script'); print('video_script contract ok')"
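Before calling validate_artifact, a quick structural pre-check can report malformed lines with a precise line number. This is a hypothetical helper, not part of paper2video: it only confirms that each non-empty line parses as a JSON object and knows nothing about the contract fields, so validate_artifact remains the authoritative check.

```python
import json
from pathlib import Path


def precheck_jsonl(path: Path) -> list[dict]:
    """Parse every non-empty line as a JSON object; raise ValueError naming the bad line."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are skipped here; the real contract may reject them
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
            if not isinstance(obj, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object, got {type(obj).__name__}")
            records.append(obj)
    if not records:
        raise ValueError(f"{path}: no records found")
    return records
```

Run it on <run_dir>/video_script.jsonl first, then run the validation command above for the full contract check.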

Required output

<run_dir>/video_script.jsonl

Mandatory Preparation Pass (Internal, Phase-1 only)

Before writing the first record, the agent must do this internally:

Scientific extraction
- core claims
- mechanism details
- strongest evidence/ablations
- assumptions, caveats, and failure modes
Pedagogical recomposition
- learner-first sequence (not paper section order)
- narrative arc: hook -> setup -> mechanism -> evidence -> limits -> synthesis
- prerequisite and misconception map
Script planning
- chapter plan with explicit didactic objective per chapter
- segment purpose statements that justify each segment
- duration estimate based on paper complexity, evidence density, and mechanism depth

Do not ask the user for these artifacts. Build them internally, then emit only video_script.jsonl.
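The preparation pass stays internal, but a small planning structure makes the chapter-level requirements checkable. The sketch below is hypothetical (ChapterPlan and plan_problems are illustrative names, not contract fields). It checks that chapters follow the hook -> setup -> mechanism -> evidence -> limits -> synthesis arc without moving backwards, that every arc stage is covered, and that each chapter has a didactic objective and segment purpose statements. Treating every stage as mandatory is our reading of the arc.

```python
from dataclasses import dataclass, field

# Arc stages in the order the skill prescribes.
ARC = ("hook", "setup", "mechanism", "evidence", "limits", "synthesis")


@dataclass
class ChapterPlan:
    """Hypothetical internal planning record; never written to video_script.jsonl."""
    chapter_id: str
    arc_stage: str
    didactic_objective: str
    segment_purposes: list[str] = field(default_factory=list)


def plan_problems(chapters: list[ChapterPlan]) -> list[str]:
    """Return human-readable problems; an empty list means the plan passes."""
    problems = []
    furthest = -1  # index of the furthest arc stage reached so far
    for ch in chapters:
        if ch.arc_stage in ARC:
            idx = ARC.index(ch.arc_stage)
            if idx < furthest:
                problems.append(f"{ch.chapter_id}: {ch.arc_stage!r} comes after {ARC[furthest]!r}")
            furthest = max(furthest, idx)
        else:
            problems.append(f"{ch.chapter_id}: unknown arc stage {ch.arc_stage!r}")
        if not ch.didactic_objective.strip():
            problems.append(f"{ch.chapter_id}: missing didactic objective")
        if not ch.segment_purposes:
            problems.append(f"{ch.chapter_id}: no segment purpose statements")
    covered = {ch.arc_stage for ch in chapters}
    missing = [stage for stage in ARC if stage not in covered]
    if missing:
        problems.append("arc stages not covered: " + ", ".join(missing))
    return problems
```

Several chapters may share a stage (for example, two evidence chapters); only moving backwards in the arc is flagged.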

Complexity-To-Depth Policy (Required)

Before drafting, assign a complexity tier based on the paper's content:

tier_1 (simple conceptual paper): one main claim, light empirical evidence
tier_2 (moderate): multiple claims, some formal or empirical detail
tier_3 (dense empirical/mechanistic): many experiments/ablations and non-trivial mechanism
tier_4 (very dense): tier_3 plus multiple interacting mechanisms or heavy formal load
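One way to make the tier assignment repeatable is to record a few counts during scientific extraction and map them to a tier. This is a sketch under stated assumptions: the PaperProfile fields and the threshold of five experiments for "many" are illustrative choices, not values defined by this skill.

```python
from dataclasses import dataclass

# Illustrative cut-off for "many experiments/ablations"; the skill gives no number.
MANY_EXPERIMENTS = 5


@dataclass
class PaperProfile:
    """Counts filled in during scientific extraction (hypothetical fields)."""
    main_claims: int
    experiments_and_ablations: int
    nontrivial_mechanism: bool
    interacting_mechanisms: int
    heavy_formal_load: bool


def assign_tier(p: PaperProfile) -> str:
    # tier_3 needs both many experiments/ablations and a non-trivial mechanism.
    dense = p.nontrivial_mechanism and p.experiments_and_ablations >= MANY_EXPERIMENTS
    if dense and (p.interacting_mechanisms >= 2 or p.heavy_formal_load):
        return "tier_4"  # tier_3 plus interacting mechanisms or heavy formal load
    if dense:
        return "tier_3"
    if p.main_claims > 1 or p.heavy_formal_load:
        return "tier_2"  # multiple claims or notable formal detail
    return "tier_1"  # one main claim, light evidence
```

Because tier_4 requires explicit evidence of interacting mechanisms or heavy formal load, a paper with many ablations and one central mechanism lands in tier_3, which matches the default for ML empirical papers stated below.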

Use this mapping for script depth:

tier_1: 700-1100 words (~5-8 min)
tier_2: 1100-1700 words (~8-13 min)
tier_3: 1700-2600 words (~13-20 min)
tier_4: 2400-3600 words (~18-28 min)
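The mapping can be kept as data so a draft's length is checked against its tier. The sketch assumes a narration pace of about 135 words per minute, which roughly reproduces the minute ranges listed above; the function names are hypothetical.

```python
# Word ranges copied from the tier mapping above.
TIER_WORDS = {
    "tier_1": (700, 1100),
    "tier_2": (1100, 1700),
    "tier_3": (1700, 2600),
    "tier_4": (2400, 3600),
}


def runtime_minutes(words: int, wpm: float = 135.0) -> float:
    """Estimated spoken runtime; 135 words per minute is an assumed narration pace."""
    return words / wpm


def depth_status(words: int, tier: str) -> str:
    """Classify a draft length as below, within, or above its tier's word range."""
    low, high = TIER_WORDS[tier]
    if words < low:
        return "below"
    if words > high:
        return "above"
    return "within"
```

For example, a 2,200-word tier_3 draft is within range and comes out at roughly 16 minutes. Note that tier_3 and tier_4 overlap between 2,400 and 2,600 words, so length alone does not determine the tier.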

For ML empirical papers with broad ablations and mechanism discussion (such as grokking-style papers), default to tier_3 unless there is strong evidence for tier_4.

If the draft word count is below the tier minimum, expand it with:

deeper protocol/mechanism walkthrough
evidence decomposition (main curve + secondary curve + failure/negative case)
ablation interpretation and caveats
replication-oriented synthesis
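A sketch of the below-minimum check, assuming each spoken unit stores its text in a narration_text field (records without that field, such as chapter markers, contribute nothing). Counting words by whitespace is an approximation, and the helper names are hypothetical.

```python
import json

# Tier minimums from the depth mapping above.
TIER_MIN_WORDS = {"tier_1": 700, "tier_2": 1100, "tier_3": 1700, "tier_4": 2400}

# The four expansion levers, in the order the skill lists them.
EXPANSION_LEVERS = (
    "deeper protocol/mechanism walkthrough",
    "evidence decomposition (main curve + secondary curve + failure/negative case)",
    "ablation interpretation and caveats",
    "replication-oriented synthesis",
)


def narration_word_count(jsonl_text: str) -> int:
    """Sum whitespace-separated words in each record's narration_text."""
    total = 0
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            total += len(str(record.get("narration_text", "")).split())
    return total


def expansion_plan(jsonl_text: str, tier: str) -> tuple[int, list[str]]:
    """Return (words still missing, levers to apply); (0, []) when the draft is long enough."""
    shortfall = max(0, TIER_MIN_WORDS[tier] - narration_word_count(jsonl_text))
    return shortfall, (list(EXPANSION_LEVERS) if shortfall else [])
```

The shortfall tells you roughly how much material each lever needs to add; spread it across the levers rather than padding a single segment.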

Depth And Specificity Rules

The script must reflect expert-level understanding:

Include concrete paper details where possible:
- dataset/task setup
- model or method specifics
- key experimental findings
- important limitations
Avoid generic summaries that could apply to any paper.
Tie claims to evidence in the narration flow.
Use explicit transitions that preserve technical continuity.
Duration is paper-dependent:
- do not force a fixed runtime target
- simple papers can be shorter
- complex papers should expand enough to cover mechanism and evidence thoroughly
Do not collapse dense papers into a short executive summary.
- if the paper has multiple non-trivial empirical findings, include enough segments to teach each finding causally.

If the current draft feels generic, refine before finalizing.

Narration Voice Rules (Required)

narration_text must sound like an educational video, not a lecture outline:

Never use meta-outline phrasing inside narration text:
- avoid: "Chapter 1", "Chapter 2", "Section", "Lecture", "In this chapter"
Keep chapter metadata in fields (record_type=chapter, chapter_id) but keep spoken text natural.
Prefer direct viewer-facing transitions:
- examples: "Now let’s test this on...", "Next we inspect...", "Here’s the key result..."
Avoid production/meta instructions in narration:
- no references to the script-writing process, tiers, or internal planning artifacts.
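A rough linter for the voice rules can flag outline and production phrasing before finalizing. The outline patterns come from the avoid list above; the meta vocabulary (tier labels, "script-writing", planning-artifact names) is an assumed list. Keyword matching will produce false positives, for example on a paper that legitimately discusses storage tiers, so treat hits as prompts for review. Only narration_text is linted, since chapter metadata belongs in record fields.

```python
import re

# (label, pattern) pairs; matching is case-insensitive.
VOICE_RULES = [
    # Outline phrasing from the avoid list.
    ("outline: chapter number", r"\bchapter\s+\d+\b"),
    ("outline: 'in this chapter'", r"\bin this chapter\b"),
    ("outline: 'section'", r"\bsection\b"),
    ("outline: 'lecture'", r"\blecture\b"),
    # Production/meta vocabulary: an assumed list, extend as needed.
    ("meta: tier label", r"\btier[_ ]?\d\b"),
    ("meta: script-writing process", r"\bscript[- ]?writing\b"),
    ("meta: planning artifact", r"\b(?:preparation pass|chapter plan)\b"),
]


def voice_issues(narration_text: str) -> list[str]:
    """Return labels of every rule the narration text trips."""
    return [
        label
        for label, pattern in VOICE_RULES
        if re.search(pattern, narration_text, flags=re.IGNORECASE)
    ]


# Illustrative record; only chapter_id and narration_text are shown.
record = {"chapter_id": "c2", "narration_text": "In this chapter we add weight decay."}
print(voice_issues(record["narration_text"]))  # ["outline: 'in this chapter'"]
```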

Didactic Density Rules (Required)

Keep the script teachable for video viewers (not only technically correct):

One core idea per narration unit.
- each segment should deliver one primary teaching point plus at most one supporting point.
Control spoken numeric load.
- prefer "about", "roughly", or "on the order of" in speech.
- keep exact values for only the most important numbers in a segment.
- move secondary precision to visuals/overlays, not spoken prose.
Split dense units.
- if a segment contains more than two major claims, split it into two sequential segments.
Keep recaps short and retrieval-oriented.
- recap units should be concise and phrased as punchline reinforcement.
Appendix-grade statistics are optional in narration.
- exact p-values, detailed correlation coefficients, and low-priority appendix numbers should be omitted from spoken text unless essential to the main claim.
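Spoken numeric load can be approximated by counting numbers that are not softened by "about", "roughly", or "on the order of". In this sketch the budget of two exact numbers per segment and the p-value pattern are assumptions chosen for illustration; the rules above leave the exact budget to judgment, and the helper names are hypothetical.

```python
import re

# A number token: integers, decimals, thousands separators, optional percent sign.
NUMBER = re.compile(r"(?<![\w.])\d+(?:[.,]\d+)*%?")
# Hedges the skill recommends for spoken numbers, checked right before a number.
HEDGE_AT_END = re.compile(r"\b(?:about|roughly|on the order of)\s*$", re.IGNORECASE)
# Crude p-value detector, e.g. "p < 0.01" or "p=.05".
P_VALUE = re.compile(r"\bp\s*[<=>]\s*0?\.\d+", re.IGNORECASE)


def exact_numbers(text: str) -> list[str]:
    """Numbers spoken without a hedge word immediately before them."""
    return [m.group() for m in NUMBER.finditer(text) if not HEDGE_AT_END.search(text[: m.start()])]


def density_issues(text: str, max_exact: int = 2) -> list[str]:
    """Flag a segment that speaks too many exact numbers or a p-value (max_exact is an assumed budget)."""
    issues = []
    exact = exact_numbers(text)
    if len(exact) > max_exact:
        issues.append(f"{len(exact)} exact numbers ({', '.join(exact)}); move secondary ones to overlays")
    if P_VALUE.search(text):
        issues.append("p-value in narration; omit unless essential to the main claim")
    return issues
```

A flagged segment is a candidate for hedging its secondary numbers or moving them to an overlay; if it also carries more than two major claims, split it as described above.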