robotwin-create-benchmark - SKILL.md Agent Skill

name: robotwin-create-benchmark
description: Create RoboTwin benchmark task envs from a task name, task description, or `/robotwin cb ...` request. Use this skill when a user wants to automate creation of `envs/task_name.py`, discover the RoboTwin project in the workspace, verify project health and render readiness, render preview images, and use vision review to catch unreasonable object placement before optional config generation or data collection.
metadata:
  short-description: Create and preview RoboTwin benchmark env tasks

RoboTwin Create Benchmark

Use this skill when the user wants to create or revise a RoboTwin task environment, especially when they describe it as a benchmark task or use a slash-like command such as /robotwin cb ....

What This Skill Owns

Discover the RoboTwin project root inside the user's workspace.
Verify the project is structurally complete enough to add a new env task.
Verify the Python and render environment can actually import and initialize the RoboTwin stack.
Create or update envs/<task_name>.py directly.
Render initial-scene preview images and review them with the model's vision ability.
Optionally write a safe task_config/<task_name>.yml stub.

What This Skill Does Not Own

Do not optimize play_once for real data collection quality.
play_once only needs to be structurally reasonable enough for the task file to exist and to express rough intent.
Treat executable behavior refinement, motion-quality iteration, and code-generation-quality action logic as downstream work owned by RoboTwin's code_gen/ flow.

Trigger Patterns

Use this skill when the user asks for any of the following:

Create a new RoboTwin benchmark env.
Automate writing envs/<task_name>.py.
Use /robotwin cb [task name] [task description] [other demands].
Create a task class from a benchmark description.
Validate object placement by rendering and visually inspecting the scene.

Inputs

Normalize the request into:

task_name
task_description
other_demands

If the user gives a slash-like command, parse it first and then normalize with scripts/normalize_request.py. For raw slash-style parsing, prefer quoted task_name and task_description, or use || as a separator between task_name, task_description, and other_demands.
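
The two accepted raw forms (quoted fields, or `||` separators) can be illustrated with a minimal Python sketch. The function name `parse_robotwin_cb` and the returned dict layout are assumptions made for illustration; they are not taken from scripts/normalize_request.py, whose interface is not shown here.

```python
import shlex


def parse_robotwin_cb(command: str) -> dict:
    """Split a `/robotwin cb ...` request into the three normalized fields.

    Hypothetical helper: the real normalize_request.py may behave differently.
    """
    head = command.strip().split(None, 2)
    if head[:2] != ["/robotwin", "cb"]:
        raise ValueError("not a /robotwin cb request")
    rest = head[2] if len(head) > 2 else ""

    if "||" in rest:
        # Separator form: task_name || task_description || other_demands
        parts = [p.strip().strip("\"'") for p in rest.split("||", 2)]
    else:
        # Quoted form: shlex keeps each quoted string together as one token
        tokens = shlex.split(rest)
        parts = tokens[:2]
        if len(tokens) > 2:
            parts.append(" ".join(tokens[2:]))

    parts += [""] * (3 - len(parts))  # pad missing fields with empty strings
    task_name, task_description, other_demands = parts
    if not task_name:
        raise ValueError("task_name is required")
    return {
        "task_name": task_name,
        "task_description": task_description,
        "other_demands": other_demands,
    }
```

Quoting or `||` matters because an unquoted multi-word description would otherwise be split on whitespace, leaving only its first word as task_description.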

Required Workflow

Normalize the request.
Discover RoboTwin roots with scripts/discover_robotwin.py.
Run scripts/check_robotwin_health.py on the selected root, including render checks.
Stop and ask the user if there are multiple strong candidates, missing assets, or a failed render environment.
Read references/task_contract.md.
If you need examples, consult only the fixed files listed in references/fixed_examples.md.
Generate envs/<task_name>.py directly. Do not do a repo-wide similarity search for the closest task.
Validate the task file with scripts/validate_task_env.py.
Render initial-scene previews with scripts/render_preview.py.
Build a preview montage with scripts/build_preview_montage.py when multiple images exist.
Review the rendered image or montage with the model's vision ability using references/vision_review_checklist.md.
If placement is unreasonable, patch load_actors or related setup logic and repeat the preview loop.
Optionally write task_config/<task_name>.yml with scripts/write_task_config_stub.py.
Before any asset download or real collection run, stop and ask the user.
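
The discovery and pause steps of this workflow might look roughly like the sketch below. It is not the implementation of scripts/discover_robotwin.py. The marker directories are simply the paths this skill mentions (envs/, task_config/, code_gen/), and the scoring rule, the depth limit, and the function names are all assumptions.

```python
from pathlib import Path

# Marker directories taken from paths named in this skill; the real
# discovery script may check different or additional markers (assumption).
MARKERS = ("envs", "task_config", "code_gen")


def find_candidate_roots(workspace: Path, max_depth: int = 3) -> list:
    """Return (path, score) pairs for directories that look like a RoboTwin root."""
    candidates = []
    stack = [(workspace, 0)]
    while stack:
        current, depth = stack.pop()
        score = sum((current / marker).is_dir() for marker in MARKERS)
        if score >= 2:
            candidates.append((current, score))
            continue  # do not descend into a directory already accepted as a root
        if depth < max_depth:
            try:
                children = [c for c in current.iterdir()
                            if c.is_dir() and not c.name.startswith(".")]
            except OSError:
                continue  # unreadable directory: skip it
            stack.extend((child, depth + 1) for child in children)
    # Highest score first; path string as a stable tie-breaker
    return sorted(candidates, key=lambda c: (-c[1], str(c[0])))


def select_root(candidates: list):
    """Pick a root, or return None when the skill should pause and ask the user."""
    if not candidates:
        return None  # no healthy root found
    best_score = candidates[0][1]
    strong = [path for path, score in candidates if score == best_score]
    return strong[0] if len(strong) == 1 else None  # a tie means ask the user
```

Returning None for both "nothing found" and "several equally strong candidates" keeps the caller's decision simple: any None result means stop and ask rather than guess.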

Generation Rules

The generated file must be envs/<task_name>.py.
The class name must exactly match task_name.
The class must inherit Base_Task.
The file must implement setup_demo, load_actors, play_once, and check_success.
setup_demo must call super()._init_task_env_(**kwags).
Prioritize correctness of load_actors, initial placement, and check_success.
play_once may be approximate and does not need to be data-collection-ready.
Prefer a small number of stable references over repo-wide similarity search.

Render And Vision Review Rules

Render review is mandatory before calling the task creation complete.
Environment readiness is also mandatory. If SAPIEN or the renderer cannot initialize, do not claim the task is verified.
The first required preview is the initial scene after setup_demo, not after a full successful play_once.
Review the image for floating objects, penetration, off-table placement, unreachable positions, bad target poses, edge-risk placement, and embodiment mismatch.
If the review result is warn or fail, keep iterating on the placement. If the right fix depends on user preference, stop and ask the user instead.
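
One way to picture the render, review, and patch loop is the sketch below. The three callables stand in for scripts/render_preview.py, the model's vision review, and a manual edit of load_actors. Their signatures, the "pass" verdict value, the return strings, and the round limit are assumptions for illustration.

```python
from typing import Callable


def preview_loop(render: Callable[[], str],
                 review: Callable[[str], str],
                 patch: Callable[[], bool],
                 max_rounds: int = 3) -> str:
    """Render, review, and patch until the review passes or iteration stops.

    render() returns an image path, review(path) returns "pass", "warn", or
    "fail", and patch() returns False when the fix needs a user decision.
    """
    for _ in range(max_rounds):
        verdict = review(render())
        if verdict == "pass":
            return "verified"
        # warn and fail are both treated as "not done yet"
        if not patch():
            return "ask_user"  # preference-sensitive issue: stop and ask
    return "ask_user"  # round limit reached without a passing review
```

Treating warn the same as fail matches the rule that a task is not complete until the review is clean; the round limit only prevents the loop from patching indefinitely.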

When To Pause And Ask The User

More than one RoboTwin root looks plausible.
No healthy RoboTwin root is found.
Default assets appear missing and downloading may be needed.
The render environment cannot initialize.
The task description is too vague to determine placement or success criteria.
The preview looks wrong but the fix depends on user preference.
The user has not confirmed whether to continue into data collection.

Scripts

scripts/normalize_request.py
scripts/discover_robotwin.py
scripts/check_robotwin_health.py
scripts/generate_task_env.py
scripts/validate_task_env.py
scripts/render_preview.py
scripts/build_preview_montage.py
scripts/write_task_config_stub.py

References

references/task_contract.md
references/fixed_examples.md
references/vision_review_checklist.md

Output Contract

The usual outputs are:

envs/<task_name>.py
task_config/<task_name>.yml when requested
.codex/robotwin_cb/<task_name>/preview_seed_*.png
.codex/robotwin_cb/<task_name>/preview_montage.png
.codex/robotwin_cb/<task_name>/vision_review.json

If you cannot complete render verification, say so clearly and explain whether the blocker is project health, missing assets, or Python/render environment failure.
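
A small report of which outputs exist can make that final statement concrete. The sketch below assumes that the `.codex/` directory and the other output paths are relative to the RoboTwin project root, which the list above does not state explicitly; the function name and report keys are hypothetical.

```python
from pathlib import Path


def collect_outputs(root: Path, task_name: str) -> dict:
    """Report which of the usual outputs exist for a task.

    Assumes every output path is relative to `root` (an assumption).
    """
    out_dir = root / ".codex" / "robotwin_cb" / task_name
    previews = []
    if out_dir.is_dir():
        # Uses the documented pattern verbatim rather than guessing seed values
        previews = sorted(p.name for p in out_dir.glob("preview_seed_*.png"))
    return {
        "env": (root / "envs" / f"{task_name}.py").is_file(),
        "task_config": (root / "task_config" / f"{task_name}.yml").is_file(),
        "previews": previews,
        "montage": (out_dir / "preview_montage.png").is_file(),
        "vision_review": (out_dir / "vision_review.json").is_file(),
    }
```

An empty `previews` list or a missing `vision_review` entry is a signal that render verification did not finish and should be reported as such.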