name: vla-remote-train-eval description: End-to-end remote VLA-Arena operations on GPU servers: collect user-provided host/path/repo/eval parameters, clone required repos, configure fixed HF mirror endpoint (https://hf-mirror.com) and centralized caches (HF_HOME/UV_CACHE_DIR/etc.), install uv, download Hugging Face models, patch train/eval YAMLs, run per-model uv environments for training/evaluation, split jobs across GPUs, and debug logs/results. Use when user asks for remote setup, model download, benchmark execution, or troubleshooting dependency/runtime failures.
VLA Remote Train Eval
Overview
Execute VLA-Arena remote setup, training, and evaluation in a repeatable way. Do not assume host, mount path, suite, or trial count. Ask user first and execute with those values.
Workflow
1. Collect execution targets from user
Collect these required inputs before running commands:
remote_host: SSH target host alias or IProot_dir: remote working root (for repo/models/cache)repo_url: repository to clonetask_suite_name: eval suite name (allor specific suite)num_trials_per_task: eval trial count
Collect these optional inputs when needed:
task_levelgpu_plan(model-to-GPU mapping)log_dirmodels_dir
2. Bootstrap remote environment
From local machine, run:
ssh <remote_host> "bash -lc 'cd <root_dir> && /bin/bash -s -- <root_dir> <repo_url>'" \
< skills/vla-remote-train-eval/scripts/bootstrap_remote.sh
Or SSH first and run directly:
ssh <remote_host>
cd <root_dir>
bash /path/to/VLA-Arena-pub/skills/vla-remote-train-eval/scripts/bootstrap_remote.sh \
"<root_dir>" "<repo_url>"
This step must ensure:
- repo exists under
<root_dir> - cache roots are under
<root_dir>/cache HF_ENDPOINT=https://hf-mirror.comuvis installed and on PATH
3. Download models to user-provided model directory
Inside remote repo:
cd <repo_dir>
bash skills/vla-remote-train-eval/scripts/download_vla_arena_models.sh \
"<models_dir>" "VLA-Arena"
This workflow downloads all model repos under a user-provided HF author/org (for this project typically VLA-Arena) via the fixed mirror endpoint.
4. Patch evaluation configs with user-provided benchmark parameters
Inside remote repo:
cd <repo_dir>
bash skills/vla-remote-train-eval/scripts/apply_eval_defaults.sh \
. "<task_suite_name>" "<num_trials_per_task>"
Optional level override:
bash skills/vla-remote-train-eval/scripts/apply_eval_defaults.sh \
. "<task_suite_name>" "<num_trials_per_task>" "<task_level>"
5. Run train/eval with per-model uv projects
Never use envs/base for model train/eval dependencies.
Always use model-specific projects to avoid conflicts (for example OpenVLA vs OpenPI transformers versions).
Run template:
uv sync --project envs/<model_env>
uv run --project envs/<model_env> vla-arena eval --model <model_cli_name> --config vla_arena/configs/evaluation/<config>.yaml
Model mapping:
openvla: envenvs/openvla, modelopenvla, configopenvla.yamlopenvla_oft: envenvs/openvla_oft, modelopenvla_oft, configopenvla_oft.yamlunivla: envenvs/univla, modelunivla, configunivla.yamlsmolvla: envenvs/smolvla, modelsmolvla, configsmolvla.yamlopenpi: envenvs/openpi, modelopenpi, configopenpi.yamlopenpi_fast: envenvs/openpi, modelopenpi, configopenpi_fast.yaml
6. Split jobs across GPUs
Use CUDA_VISIBLE_DEVICES plus separate logs:
mkdir -p <log_dir>
CUDA_VISIBLE_DEVICES=<gpu_id> nohup uv run --project envs/openvla \
vla-arena eval --model openvla --config vla_arena/configs/evaluation/openvla.yaml \
> <log_dir>/openvla.log 2>&1 &
Do not run openpi and openpi_fast concurrently unless result/log paths are explicitly separated.
7. Report status in the user’s preferred style
Always provide:
- whether it was tested (
yes/no) - whether errors occurred (
yes/no, plus last key stack trace) - active process status (
ps -fp <pid>) - result files and timestamps (
ls -lt results/*.json) - short metric summary when requested
Guardrails
- Use
vla-arena eval/vla-arena train; do not usepython -m vla_arena.cli.main. - If repo is cloned from source, skip task downloading by default.
- Download tasks only when user explicitly asks to update task assets.
- Prefer setting unique
result_json_pathforopenpi_fast; default naming may collide withopenpi. - Keep cache dirs on the user-provided large-capacity path (
<root_dir>/cache). - Keep HF endpoint fixed at
https://hf-mirror.comunless user explicitly requests a different endpoint.
Troubleshooting Reference
For known failure patterns and fixes, read:
references/debug-playbook.md