---
name: jlpt-n5-listening-variation-creator
description: "Create variations of real JLPT N5 listening questions from YouTube clips. Reads data.json (and optional PNG) from tobeprocessed/, applies a surgical entity swap, rewrites the Japanese dialogue, generates Imagen 3 image prompts, writes a TTS script, outputs derived-data.json and image.png, and leaves the folder in tobeprocessed/ for the tester skill to finalize. Use when the user asks to 'process a clip', 'create a variation', or 'generate listening question from clip'."
---
# JLPT N5 Listening Variation Creator
You are an expert JLPT N5 content engineer. Your goal is to create original variation listening questions by swapping entities in real clips while strictly preserving the underlying logic and structure.
## Core Directives

- CRITICAL: Concurrency Control (Locking): Before ANY other action, you MUST acquire an atomic lock.
  - List folders in `tobeprocessed/`.
  - Select a folder that does NOT contain `processing.lock/` OR `derived-data.json`.
  - Immediately run: `mkdir "path/to/clip_folder/processing.lock"`.
  - If the command fails, the folder is already taken. Start over.
  - Never read `data.json` or start analysis without a successful lock.
- CRITICAL: STOP FOR MANUAL APPROVAL: After generating `image.png` and performing self-validation (Step 10), you MUST STOP. Show the image and the dialogue to the user and WAIT for explicit approval. Never run the tester skill, never generate audio, and never move the folder until the user says "Onaylıyorum" ("I approve") or "Approved".
- Preserve Logic Pattern: If the source is a "Reconsideration" question, the variation must remain a "Reconsideration" question.
- Surgical Entity Swap: Change entities (objects, colors, quantities, times, locations) using `backend/vocabulary/data/n5_vocabulary.json` as the primary N5 vocabulary source.
- Read PNG When Available: If a Screenshot PNG is present in the clip folder, read it to understand the original question's visual composition and spatial layout.
- No Audio Analysis: Do NOT attempt to read or analyze `audio.mp3`.
- Output to derived-data.json: Save the variation as `derived-data.json` inside the clip folder.
- Handover to Tester: After finishing Step 10 (JSON and image ready), do NOT move the folder. Leave it in `tobeprocessed/` for the `jlpt-n5-listening-variation-tester` skill to finalize and move.
- Self-Validate: After writing `derived-data.json`, run the validator script (Step 8); after generating `image.png`, visually check the image (Step 10). Fix any errors.
- Clean Logic Formatting: When referencing panels in `logic.tr` or `logic.en`, use ONLY the panel number in parentheses, e.g., `(3)`. NEVER use the word "Panel" or labels like "Distractor_A".
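The Clean Logic Formatting rule is mechanical enough to lint before handing the JSON over. The sketch below is a hypothetical helper (`check_logic_text` is not one of the skill's scripts). It flags the word "panel" case-insensitively, so a Turkish "paneli" is caught too, plus any `Distractor_*` label, and, assuming every question has exactly four panels, it flags parenthesized numbers outside 1–4.

```python
import re
from typing import List

# Hypothetical lint for logic.tr / logic.en strings.
# Any form of the word "panel" (case-insensitive) or a Distractor_* label is forbidden.
FORBIDDEN_LABEL = re.compile(r"panel|distractor_\w+", re.IGNORECASE)
# Bare panel references look like "(3)".
PANEL_REF = re.compile(r"\((\d+)\)")


def check_logic_text(text: str) -> List[str]:
    """Return the formatting problems found in one logic string (empty list = clean)."""
    problems = []
    for match in FORBIDDEN_LABEL.finditer(text):
        problems.append(f"forbidden panel label: {match.group(0)!r}")
    for match in PANEL_REF.finditer(text):
        # Assumption: every question has exactly four panels, numbered 1-4.
        if not 1 <= int(match.group(1)) <= 4:
            problems.append(f"panel number out of range: {match.group(0)}")
    return problems
```

Run it on both `logic.tr` and `logic.en`; an empty list means the text follows the rule.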
## Input Structure

```
backend/listening/data/selectImage/listening-youtube-data/tobeprocessed/
  clip_XX_XXmXXs_XXmXXs/
    audio.mp3          ← IGNORE — do not read
    data.json          ← PRIMARY INPUT: dialogue, logic, analysis
    Screenshot *.png   ← READ if present: understand visual composition
    .done_slice        ← IGNORE
    processing.lock/   ← LOCK: If present, another agent is working here
```
## Output Structure

```
backend/listening/data/selectImage/listening-youtube-data/tobeprocessed/
  clip_XX_XXmXXs_XXmXXs/
    ...
    derived-data.json  ← NEW: your output
    image.png          ← NEW: your output
    processing.lock/   ← REMOVE only if you abort; keep if finished
```
## Workflow

### Step 1 — INGEST & LOCK

- If the user provides a clip folder name, verify it doesn't contain `processing.lock/` or `derived-data.json`.
- If no folder is provided, list `tobeprocessed/` and select the first folder that:
  - Does NOT contain `processing.lock/` (directory).
  - Does NOT contain `derived-data.json`.
- CRITICAL (Atomic Lock): Immediately attempt to create a directory named `processing.lock` inside the folder: `mkdir "backend/listening/data/selectImage/listening-youtube-data/tobeprocessed/<clip_folder>/processing.lock"`
- If the `mkdir` command fails, another agent beat you to it. Start Step 1 again and pick the next available folder.
- Once the lock is acquired, read `data.json` to understand: dialogue, logic pattern, key entities.
- If a Screenshot PNG is present, read it to understand the original visual scene composition.
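The lock works because creating a directory is atomic: when two agents race for the same folder, exactly one `mkdir` succeeds and the other gets an error. A minimal Python sketch of Step 1, assuming the agent can run Python against the `tobeprocessed/` path; `try_lock` and `claim_next_clip` are hypothetical names, and sorted folder order stands in for "the first folder".

```python
import os
from pathlib import Path
from typing import Optional

LOCK_NAME = "processing.lock"
OUTPUT_NAME = "derived-data.json"


def try_lock(clip_folder: Path) -> bool:
    """Try to claim one clip folder; True only if this call created the lock."""
    if (clip_folder / OUTPUT_NAME).exists():
        return False  # already processed, skip it
    try:
        # os.mkdir is atomic: only one caller can create the directory.
        os.mkdir(clip_folder / LOCK_NAME)
    except FileExistsError:
        return False  # another agent already holds the lock
    return True


def claim_next_clip(tobeprocessed: Path) -> Optional[Path]:
    """Return the first clip folder this agent managed to lock, or None if all are taken."""
    for clip_folder in sorted(p for p in tobeprocessed.iterdir() if p.is_dir()):
        if try_lock(clip_folder):
            return clip_folder
    return None
```

The `processing.lock` existence check from the list above is folded into the `mkdir` call itself, which avoids a check-then-create race.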
### Step 2 — ANALYZE

- Identify which of the 6 logic patterns this question uses (see `references/n5-listening-patterns.md`).
- Identify the critical entities (the objects/attributes being swapped in the traps).
- If the PNG was read, determine the `image_type` using the detection criteria in `references/imagen3-prompting-guide.md`:
  - `four_panel_grid`: 4 separate equal panels in a 2×2 grid, small numbers 1–4 in the top-left corner of each panel
  - `numbered_scene`: single scene with small position numbers 1–4 inside it
  - `map_diagram`: top-down street/area map with position numbers 1–4 on buildings
- Default to `four_panel_grid` if no PNG is present.
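The default and the allowed values can be pinned down in code, although recognizing the actual layout still needs a visual read of the screenshot. This hypothetical `resolve_image_type` helper therefore takes the visually detected type as an argument, validates it against the three documented types, and falls back to `four_panel_grid` when no `Screenshot *.png` exists in the clip folder.

```python
from pathlib import Path
from typing import Optional

IMAGE_TYPES = {"four_panel_grid", "numbered_scene", "map_diagram"}


def find_screenshot(clip_folder: Path) -> Optional[Path]:
    """Return the first 'Screenshot *.png' in the clip folder, if any."""
    matches = sorted(clip_folder.glob("Screenshot *.png"))
    return matches[0] if matches else None


def resolve_image_type(clip_folder: Path, detected: Optional[str] = None) -> str:
    """Use the visually detected type when a screenshot exists, else the documented default."""
    if find_screenshot(clip_folder) is None:
        return "four_panel_grid"
    if detected not in IMAGE_TYPES:
        raise ValueError(
            f"screenshot present but image_type {detected!r} is not one of {sorted(IMAGE_TYPES)}"
        )
    return detected
```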
### Step 3 — SUBSTITUTE (Surgical Swap)

- Select new N5-level entities from `backend/vocabulary/data/n5_vocabulary.json`.
- Swap entities while keeping the same logic pattern and trap structure.
- Colors are valid attributes — Imagen 3 generates full-color images.
- Verify the trap logic still works with the new entities (each distractor should fail exactly one or two criteria).
- The variation must be meaningfully different from the source clip.
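The "fail exactly one or two criteria" rule can be checked by counting attribute mismatches against the answer. This sketch assumes a hypothetical encoding in which each panel is a dict of the attributes the dialogue constrains (item, color, count, and so on), with distractors keyed by panel number. A distractor with zero mismatches would be a second correct answer.

```python
from typing import Dict, List

# Hypothetical shape: each panel maps attribute name -> value shown in that panel.
Panel = Dict[str, str]


def mismatch_count(answer: Panel, panel: Panel) -> int:
    """Count how many of the answer's criteria this panel violates."""
    return sum(1 for key, value in answer.items() if panel.get(key) != value)


def check_traps(answer: Panel, distractors: Dict[int, Panel]) -> List[str]:
    """Flag distractors that fail zero criteria or more than two."""
    errors = []
    for number, panel in sorted(distractors.items()):
        n = mismatch_count(answer, panel)
        if n not in (1, 2):
            errors.append(f"panel {number} fails {n} criteria (expected 1 or 2)")
    return errors
```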
### Step 4 — REWRITE DIALOGUE
- Write a new Japanese dialogue using only N5 grammar and vocabulary.
- Preserve the same grammar structures from the original (only entities change).
- Provide a Turkish translation of the intro, the dialogue lines, and the question for the `transcription_tr` section.
### Step 5 — GENERATE IMAGE PROMPT (Imagen 3 / Nano Banana)

- Follow the `references/imagen3-prompting-guide.md` rules strictly.
- Use the `image_type` determined in Step 2 to select the correct prompt template:
  - `four_panel_grid` → composite 2×2 grid prompt, small numbers 1–4 in the top-left corner of each panel
  - `numbered_scene` → single scene prompt with position numbers 1–4
  - `map_diagram` → top-down map prompt with position numbers 1–4
- Record `image_type` in `visual_prompts.image_type`.
- Record the panel-to-role mapping in `panel_map` (choose the correct panel's position deliberately and vary it across questions).
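One way to vary the answer position is to draw it at random while excluding the previous question's answer position. The role labels and the string keys `"1"` to `"4"` below are assumptions for illustration; follow `references/derived-data-schema.md` for the real `panel_map` shape.

```python
import random
from typing import Dict, Optional

# Hypothetical role labels; the real schema may name roles differently.
ROLES = ["Correct", "Distractor_A", "Distractor_B", "Distractor_C"]


def build_panel_map(previous_correct: Optional[int] = None,
                    rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Assign roles to panels 1-4, never reusing the previous question's answer position."""
    rng = rng or random.Random()
    candidates = [p for p in range(1, 5) if p != previous_correct]
    correct = rng.choice(candidates)
    others = [p for p in range(1, 5) if p != correct]
    rng.shuffle(others)
    panel_map = {str(correct): "Correct"}
    for position, role in zip(others, ROLES[1:]):
        panel_map[str(position)] = role
    return dict(sorted(panel_map.items()))
```

Passing a seeded `random.Random` makes the assignment reproducible when a variation has to be regenerated.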
### Step 6 — WRITE TTS SCRIPT

- Follow the `references/tts-guidelines.md` rules strictly.
- Voices: `Intro_Voice`, `Male_1`, `Female_1`.
- Required sequence: intro → 1s break → dialogue (0.5s between turns) → 1s break → question repeat.
- No mixed objects: each entry has EITHER `voice` + `text` OR `break`, never both.
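The required sequence maps directly onto a small list builder. In this sketch the field names `voice`, `text`, and `break` come from the rule above, while the break values (`"1s"`, `"0.5s"`) and reading the question repeat with `Intro_Voice` are assumptions to check against `references/tts-guidelines.md`.

```python
from typing import Dict, List, Tuple

DIALOGUE_VOICES = {"Male_1", "Female_1"}
Entry = Dict[str, str]


def build_tts_script(intro: str, dialogue: List[Tuple[str, str]], question: str) -> List[Entry]:
    """intro -> 1s break -> dialogue (0.5s between turns) -> 1s break -> question repeat."""
    script: List[Entry] = [{"voice": "Intro_Voice", "text": intro}, {"break": "1s"}]
    for turn, (voice, line) in enumerate(dialogue):
        if voice not in DIALOGUE_VOICES:
            raise ValueError(f"unexpected dialogue voice: {voice}")
        if turn > 0:
            script.append({"break": "0.5s"})  # pause between turns, not before the first one
        script.append({"voice": voice, "text": line})
    script.append({"break": "1s"})
    # Assumption: the narrator voice reads the question repeat.
    script.append({"voice": "Intro_Voice", "text": question})
    return script


def has_no_mixed_entries(script: List[Entry]) -> bool:
    """Every entry must be exactly {voice, text} or exactly {break}."""
    return all(set(entry) in ({"voice", "text"}, {"break"}) for entry in script)
```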
### Step 7 — OUTPUT derived-data.json

- Write `derived-data.json` to the clip folder inside `tobeprocessed/`.
- Include both the `transcription` (Japanese) and `transcription_tr` (Turkish) sections.
- Follow the schema in `references/derived-data-schema.md` exactly.
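Because the file mixes Japanese and Turkish text, writing it as UTF-8 with `ensure_ascii=False` keeps it human-readable during manual review instead of turning every character into a `\u` escape. A sketch, assuming only that `transcription` and `transcription_tr` are top-level keys; `write_derived_data` is a hypothetical helper, and the temporary-file rename is an optional precaution rather than something the skill requires.

```python
import json
from pathlib import Path


def write_derived_data(clip_folder: Path, data: dict) -> Path:
    """Write derived-data.json as readable UTF-8 after a minimal presence check."""
    for key in ("transcription", "transcription_tr"):
        if key not in data:
            raise KeyError(f"derived-data is missing required section: {key}")
    out = clip_folder / "derived-data.json"
    tmp = out.with_name(out.name + ".tmp")
    tmp.write_text(json.dumps(data, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
    tmp.replace(out)  # rename into place so no reader sees a half-written file
    return out
```

This does not replace the validator in Step 8; it only catches the two missing-section cases early.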
### Step 8 — SELF-VALIDATE JSON

- Run the mechanical validator script using the skill's own virtual environment:
  `skills/jlpt-n5-listening-variation-creator/venv/bin/python3 skills/jlpt-n5-listening-variation-tester/scripts/validate_derived_data.py <clip_folder>/derived-data.json`
- If PASS: proceed to Step 9.
- If FAIL: read the error output, identify which check(s) failed, fix the JSON accordingly, re-save, and re-run the validator.
- Maximum 3 attempts. If still failing after 3 attempts, stop and report the remaining errors to the user.
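The fix-and-rerun loop can be sketched with `subprocess.run`. This assumes the validator exits with status 0 on PASS and non-zero on FAIL (check the script to confirm); `validate_with_retries` is hypothetical, and `fix` stands in for the agent's manual JSON repair, receiving the validator's output after each failed run.

```python
import subprocess
from typing import Callable, List


def validate_with_retries(cmd: List[str], fix: Callable[[str], None], max_attempts: int = 3) -> bool:
    """Run the validator up to max_attempts times, calling fix() between failed runs."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # assumed PASS
        if attempt < max_attempts:
            fix(result.stdout + result.stderr)  # repair the JSON, then loop to re-run
    return False  # still failing: report the remaining errors to the user
```

For this skill, `cmd` would be the venv interpreter, the validator path, and `<clip_folder>/derived-data.json`, exactly as in the command given for this step.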
### Step 9 — GENERATE IMAGE

- Run the image generation script using the skill's own virtual environment to create `image.png` in the clip folder:
  `skills/jlpt-n5-listening-variation-creator/venv/bin/python3 skills/jlpt-n5-listening-variation-creator/scripts/generate_image.py <clip_folder_name>`
- The script reads `visual_prompts.image_prompt` from `derived-data.json`, calls `gemini-2.5-flash-image`, and saves `image.png` inside the clip folder.
- Requires `JLPT_IMAGE_GEMINI_API_KEY` to be set in the environment.
- Wait for confirmation that `image.png` was saved successfully.
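Two cheap checks around this step avoid wasted API calls: confirm the API key and the prompt exist before running the script, and confirm afterwards that `image.png` really is a PNG file. Both helpers are hypothetical, and the PNG check only inspects the standard 8-byte file signature, not the image content.

```python
import json
import os
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def preflight(clip_folder: Path) -> None:
    """Fail fast before spending API credits."""
    if not os.environ.get("JLPT_IMAGE_GEMINI_API_KEY"):
        raise RuntimeError("JLPT_IMAGE_GEMINI_API_KEY is not set")
    data = json.loads((clip_folder / "derived-data.json").read_text(encoding="utf-8"))
    if not data.get("visual_prompts", {}).get("image_prompt"):
        raise ValueError("visual_prompts.image_prompt is missing or empty")


def image_saved(clip_folder: Path) -> bool:
    """True if image.png exists and starts with the PNG signature."""
    image = clip_folder / "image.png"
    if not image.is_file():
        return False
    with image.open("rb") as fh:
        return fh.read(8) == PNG_SIGNATURE
```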
### Step 10 — SELF-VALIDATE IMAGE & STOP

- Read the generated `image.png` from the clip folder.
- Verify:
  - image_type match: the layout matches the declared type (`four_panel_grid` / `numbered_scene` / `map_diagram`)
  - Panel content: the correct panel shows the answer, and the distractors show the wrong alternatives
  - Style: colorful minimalist illustration (or monochrome if requested), flat colors, thick clean outlines, no shading, white background, no borders
- If any check fails: delete `image.png`, re-run `generate_image.py`, and re-check.
- Maximum 2 attempts (image generation costs API credits). If still failing after 2 attempts, stop and report the issue to the user.
- CRITICAL: STOP HERE.
  - Display the generated `image.png`, the Japanese dialogue, and the Turkish translation to the user.
  - WAIT for the user to give explicit approval (e.g., "Onaylıyorum").
  - DO NOT run Pass 3.5, Pass 4, Pass 5, or Pass 6 of the tester skill until approval is received.
  - If the user requests changes to the image or JSON, apply them and repeat Step 9/10 as needed (re-run Step 8 first if the JSON changed).
- Leave the folder in `tobeprocessed/`. Notify the user that the variation (JSON and image) is ready for manual review. Once the user approves, they will use the `jlpt-n5-listening-variation-tester` skill to finalize it.
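The approval gate is meant to be strict. A minimal sketch of that check, assuming approval must be the whole reply, so "Approved, but redo (2)" counts as a change request rather than an approval; the phrase set mirrors the two phrases named in the Core Directives, and `is_explicit_approval` is a hypothetical name.

```python
APPROVAL_PHRASES = {"onaylıyorum", "approved"}


def is_explicit_approval(reply: str) -> bool:
    """True only when the entire reply is one of the approval phrases."""
    normalized = reply.strip().rstrip(".!").casefold()
    return normalized in APPROVAL_PHRASES
```

Note that `str.casefold()` does not apply Turkish dotless-i rules, so an all-caps "ONAYLIYORUM" folds to "onayliyorum" and would not match; add that spelling to the set if users type in capitals.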
## Resources
| Resource | Purpose |
|---|---|
| `backend/vocabulary/data/n5_vocabulary.json` | N5 vocabulary source for entity selection |
| `references/imagen3-prompting-guide.md` | Imagen 3 prompt engineering rules |
| `references/derived-data-schema.md` | Output JSON schema |
| `references/n5-listening-patterns.md` | 6 logic patterns with trap design rules |
| `references/tts-guidelines.md` | TTS voice/break formatting rules |
| `references/n5-grammar-points.md` | N5 grammar reference |
| `scripts/generate_image.py` | Calls the Gemini API to generate `image.png` from `image_prompt` |
| `../jlpt-n5-listening-variation-tester/scripts/validate_derived_data.py` | Mechanical JSON validator (used in the self-validation loop) |