name: robotwin-create-benchmark
description: Create RoboTwin benchmark task envs from a task name, task description, or /robotwin cb ... request. Use this skill when a user wants to automate creation of envs/task_name.py, discover the RoboTwin project in the workspace, verify project health and render readiness, render preview images, and use vision review to catch unreasonable object placement before optional config generation or data collection.
metadata:
  short-description: Create and preview RoboTwin benchmark env tasks
RoboTwin Create Benchmark
Use this skill when the user wants to create or revise a RoboTwin task environment, especially when they describe it as a benchmark task or use a slash-like command such as /robotwin cb ....
What This Skill Owns
- Discover the RoboTwin project root inside the user's workspace.
- Verify the project is structurally complete enough to add a new env task.
- Verify the Python and render environment can actually import and initialize the RoboTwin stack.
- Create or update envs/<task_name>.py directly.
- Render initial-scene preview images and review them with the model's vision ability.
- Optionally write a safe task_config/<task_name>.yml stub.
What This Skill Does Not Own
- Do not optimize play_once for real data collection quality.
- play_once only needs to be structurally reasonable enough for the task file to exist and to express rough intent.
- Treat executable behavior refinement, motion-quality iteration, and code-generation-quality action logic as downstream work owned by RoboTwin's code_gen/ flow.
Trigger Patterns
Use this skill when the user asks for any of the following:
- Create a new RoboTwin benchmark env.
- Automate writing envs/<task_name>.py.
- Use /robotwin cb [task name] [task description] [other demands].
- Create a task class from a benchmark description.
- Validate object placement by rendering and visually inspecting the scene.
Inputs
Normalize the request into:
- task_name
- task_description
- other_demands
If the user gives a slash-like command, parse it first and then normalize with scripts/normalize_request.py.
For raw slash-style parsing, prefer quoted task_name and task_description, or use || as a separator between task_name, task_description, and other_demands.
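The two raw forms described above (quoted fields, or || as a separator) can be sketched as a small parser. This is only an illustration under stated assumptions: parse_cb_request is a hypothetical helper, not the contents of scripts/normalize_request.py, and the rule that every unquoted token after the first two fields belongs to other_demands is a guess.

```python
import shlex

PREFIX = "/robotwin cb"


def parse_cb_request(raw: str) -> dict:
    """Split a '/robotwin cb ...' request into the three normalized fields.

    Hypothetical helper for illustration; the real normalization lives in
    scripts/normalize_request.py and may behave differently.
    """
    text = raw.strip()
    if not text.startswith(PREFIX):
        raise ValueError(f"expected a request starting with {PREFIX!r}")
    body = text[len(PREFIX):].strip()

    if "||" in body:
        # Separator form: task_name || task_description || other_demands
        parts = [p.strip() for p in body.split("||", 2)]
    else:
        # Quoted form: shlex keeps each quoted string together as one token.
        tokens = shlex.split(body)
        parts = tokens[:2]
        if len(tokens) > 2:
            # Assumption: everything after the first two fields is other_demands.
            parts.append(" ".join(tokens[2:]))

    parts += [""] * (3 - len(parts))  # pad fields the user left out
    name, description, other = parts
    if not name:
        raise ValueError("task_name is required")
    return {
        "task_name": name,
        "task_description": description,
        "other_demands": other,
    }
```

For example, parse_cb_request('/robotwin cb stack_blocks || Stack the red block on the blue block') yields an empty other_demands field rather than failing.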
Required Workflow
- Normalize the request.
- Discover RoboTwin roots with scripts/discover_robotwin.py.
- Run scripts/check_robotwin_health.py on the selected root, including render checks.
- Stop and ask the user if there are multiple strong candidates, missing assets, or a failed render environment.
- Read references/task_contract.md.
- If you need examples, only glance at the fixed files listed in references/fixed_examples.md.
- Generate envs/<task_name>.py directly. Do not do a repo-wide similarity search for the closest task.
- Validate the task file with scripts/validate_task_env.py.
- Render initial-scene previews with scripts/render_preview.py.
- Build a preview montage with scripts/build_preview_montage.py when multiple images exist.
- Review the rendered image or montage with the model's vision ability using references/vision_review_checklist.md.
- If placement is unreasonable, patch load_actors or related setup logic and repeat the preview loop.
- Optionally write task_config/<task_name>.yml with scripts/write_task_config_stub.py.
- Before any asset download or real collection run, stop and ask the user.
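The discovery step and its "stop and ask" rule might look roughly like the sketch below. It is not the logic of scripts/discover_robotwin.py: the marker directories (envs and task_config), the depth limit, and both function names are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed markers of a RoboTwin checkout; the real script may use others.
REQUIRED_MARKERS = ("envs", "task_config")


def find_robotwin_candidates(workspace: Path, max_depth: int = 3) -> list[Path]:
    """Return directories under `workspace` that contain every assumed marker."""
    candidates = []
    frontier = [(workspace.resolve(), 0)]
    while frontier:
        current, depth = frontier.pop()
        if all((current / marker).is_dir() for marker in REQUIRED_MARKERS):
            candidates.append(current)
            continue  # do not descend into a directory that already matched
        if depth >= max_depth:
            continue
        try:
            children = [
                child for child in current.iterdir()
                if child.is_dir() and not child.name.startswith(".")
            ]
        except PermissionError:
            continue  # skip directories we are not allowed to list
        frontier.extend((child, depth + 1) for child in children)
    return sorted(candidates)


def select_root(candidates: list[Path]):
    """Return the only candidate, or None to signal 'pause and ask the user'."""
    return candidates[0] if len(candidates) == 1 else None
```

Returning None for both zero and multiple candidates keeps the caller honest: in either case the workflow pauses instead of guessing a root.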
Generation Rules
- The generated file must be envs/<task_name>.py.
- The class name must exactly match task_name.
- The class must inherit Base_Task.
- The file must implement setup_demo, load_actors, play_once, and check_success.
- setup_demo must call super()._init_task_env_(**kwags).
- Prioritize correctness of load_actors, initial placement, and check_success.
- play_once may be approximate and does not need to be data-collection-ready.
- Prefer a small number of stable references over repo-wide similarity search.
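The structural rules above can be checked statically without importing RoboTwin. The sketch below is a hypothetical stand-in, not the actual scripts/validate_task_env.py: check_task_source and the stack_blocks skeleton are illustrative names, and the skeleton's method bodies are placeholders rather than a working task.

```python
import ast

REQUIRED_METHODS = {"setup_demo", "load_actors", "play_once", "check_success"}


def check_task_source(source: str, task_name: str) -> list[str]:
    """Return rule violations for a task file; an empty list means it passed."""
    tree = ast.parse(source)
    classes = [
        node for node in tree.body
        if isinstance(node, ast.ClassDef) and node.name == task_name
    ]
    if not classes:
        return [f"no class named {task_name!r}"]
    cls = classes[0]
    problems = []

    # Only direct, unqualified bases such as `class X(Base_Task)` are recognized.
    base_names = {b.id for b in cls.bases if isinstance(b, ast.Name)}
    if "Base_Task" not in base_names:
        problems.append("class does not inherit Base_Task")

    methods = {n.name: n for n in cls.body if isinstance(n, ast.FunctionDef)}
    for name in sorted(REQUIRED_METHODS - methods.keys()):
        problems.append(f"missing method {name}")

    # Looks for any call to a method named _init_task_env_ inside setup_demo;
    # it does not verify that the receiver is super().
    setup = methods.get("setup_demo")
    if setup is not None and not any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr == "_init_task_env_"
        for n in ast.walk(setup)
    ):
        problems.append("setup_demo does not call _init_task_env_")
    return problems


# Placeholder skeleton that satisfies the structural rules (bodies are stubs).
SKELETON = '''
class stack_blocks(Base_Task):
    def setup_demo(self, **kwags):
        super()._init_task_env_(**kwags)

    def load_actors(self):
        pass

    def play_once(self):
        pass

    def check_success(self):
        return False
'''
```

Calling check_task_source(SKELETON, "stack_blocks") returns an empty list. A check like this only covers structure, so it does not replace the render and vision review.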
Render And Vision Review Rules
- Render review is mandatory before calling the task creation complete.
- Environment readiness is also mandatory. If SAPIEN or the renderer cannot initialize, do not claim the task is verified.
- The first required preview is the initial scene after setup_demo, not after a full successful play_once.
- Review the image for floating objects, penetration, off-table placement, unreachable positions, bad target poses, edge-risk placement, and embodiment mismatch.
- If the review result is warn or fail, keep iterating or stop to ask the user when the issue is preference-sensitive.
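The handling of a review result can be expressed as a small decision function. The schema here is assumed, not taken from the skill: a "status" field with the values pass, warn, or fail, and a boolean "preference_sensitive" field, are hypothetical, as are the returned action names.

```python
def next_action(review: dict) -> str:
    """Map a vision review result to the next workflow step.

    Assumed schema: {"status": "pass" | "warn" | "fail",
                     "preference_sensitive": bool}.
    """
    status = review.get("status")
    if status == "pass":
        return "done"
    if status in ("warn", "fail"):
        # Preference-sensitive issues go back to the user instead of being
        # patched silently.
        if review.get("preference_sensitive", False):
            return "ask_user"
        return "patch_and_rerender"
    raise ValueError(f"unknown review status: {status!r}")
```

Raising on an unknown status, rather than treating it as a pass, matches the rule that a task must not be called verified without a completed review.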
When To Pause And Ask The User
- More than one RoboTwin root looks plausible.
- No healthy RoboTwin root is found.
- Default assets appear missing and downloading may be needed.
- The render environment cannot initialize.
- The task description is too vague to determine placement or success criteria.
- The preview looks wrong but the fix depends on user preference.
- The user has not confirmed whether to continue into data collection.
Scripts
- scripts/normalize_request.py
- scripts/discover_robotwin.py
- scripts/check_robotwin_health.py
- scripts/generate_task_env.py
- scripts/validate_task_env.py
- scripts/render_preview.py
- scripts/build_preview_montage.py
- scripts/write_task_config_stub.py
References
- references/task_contract.md
- references/fixed_examples.md
- references/vision_review_checklist.md
Output Contract
The usual outputs are:
- envs/<task_name>.py
- task_config/<task_name>.yml when requested
- .codex/robotwin_cb/<task_name>/preview_seed_*.png
- .codex/robotwin_cb/<task_name>/preview_montage.png
- .codex/robotwin_cb/<task_name>/vision_review.json
If you cannot complete render verification, say so clearly and explain whether the blocker is project health, missing assets, or Python/render environment failure.
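A quick way to confirm which outputs exist before reporting completion is sketched below. Two things are assumptions: that the .codex directory sits under the RoboTwin project root (it may instead be relative to the workspace), and that the env file, at least one preview, and the review file are the outputs treated as required. Both function names are hypothetical.

```python
from pathlib import Path


def expected_outputs(root: Path, task_name: str) -> dict:
    """Collect the output paths named in the contract for one task."""
    # Assumption: .codex lives under the project root.
    work = root / ".codex" / "robotwin_cb" / task_name
    return {
        "env": root / "envs" / f"{task_name}.py",
        "config": root / "task_config" / f"{task_name}.yml",  # only when requested
        "previews": sorted(work.glob("preview_seed_*.png")),
        "montage": work / "preview_montage.png",  # only when multiple images exist
        "review": work / "vision_review.json",
    }


def missing_required(root: Path, task_name: str) -> list[str]:
    """Name the assumed-required outputs that are absent on disk."""
    out = expected_outputs(root, task_name)
    missing = []
    if not out["env"].is_file():
        missing.append("env")
    if not out["previews"]:
        missing.append("previews")
    if not out["review"].is_file():
        missing.append("review")
    return missing
```

If missing_required returns a non-empty list, render verification is incomplete and the blocker should be reported rather than glossed over.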