vla-remote-train-eval

star 178

End-to-end remote VLA-Arena operations on GPU servers: collect user-provided host/path/repo/eval parameters, clone required repos, configure fixed HF mirror endpoint (https://hf-mirror.com) and centralized caches (HF_HOME/UV_CACHE_DIR/etc.), install uv, download Hugging Face models, patch train/eval YAMLs, run per-model uv environments for training/evaluation, split jobs across GPUs, and debug logs/results. Use when user asks for remote setup, model download, benchmark execution, or troubleshooting dependency/runtime failures.

PKU-Alignment By PKU-Alignment schedule Updated 3/7/2026

name: vla-remote-train-eval description: End-to-end remote VLA-Arena operations on GPU servers: collect user-provided host/path/repo/eval parameters, clone required repos, configure fixed HF mirror endpoint (https://hf-mirror.com) and centralized caches (HF_HOME/UV_CACHE_DIR/etc.), install uv, download Hugging Face models, patch train/eval YAMLs, run per-model uv environments for training/evaluation, split jobs across GPUs, and debug logs/results. Use when user asks for remote setup, model download, benchmark execution, or troubleshooting dependency/runtime failures.

VLA Remote Train Eval

Overview

Execute VLA-Arena remote setup, training, and evaluation in a repeatable way. Do not assume host, mount path, suite, or trial count. Ask user first and execute with those values.

Workflow

1. Collect execution targets from user

Collect these required inputs before running commands:

  • remote_host: SSH target host alias or IP
  • root_dir: remote working root (for repo/models/cache)
  • repo_url: repository to clone
  • task_suite_name: eval suite name (all or specific suite)
  • num_trials_per_task: eval trial count

Collect these optional inputs when needed:

  • task_level
  • gpu_plan (model-to-GPU mapping)
  • log_dir
  • models_dir

2. Bootstrap remote environment

From local machine, run:

ssh <remote_host> "bash -lc 'cd <root_dir> && /bin/bash -s -- <root_dir> <repo_url>'" \
  < skills/vla-remote-train-eval/scripts/bootstrap_remote.sh

Or SSH first and run directly:

ssh <remote_host>
cd <root_dir>
bash /path/to/VLA-Arena-pub/skills/vla-remote-train-eval/scripts/bootstrap_remote.sh \
  "<root_dir>" "<repo_url>"

This step must ensure:

  • repo exists under <root_dir>
  • cache roots are under <root_dir>/cache
  • HF_ENDPOINT=https://hf-mirror.com
  • uv is installed and on PATH

3. Download models to user-provided model directory

Inside remote repo:

cd <repo_dir>
bash skills/vla-remote-train-eval/scripts/download_vla_arena_models.sh \
  "<models_dir>" "VLA-Arena"

This workflow downloads all model repos under a user-provided HF author/org (for this project typically VLA-Arena) via the fixed mirror endpoint.

4. Patch evaluation configs with user-provided benchmark parameters

Inside remote repo:

cd <repo_dir>
bash skills/vla-remote-train-eval/scripts/apply_eval_defaults.sh \
  . "<task_suite_name>" "<num_trials_per_task>"

Optional level override:

bash skills/vla-remote-train-eval/scripts/apply_eval_defaults.sh \
  . "<task_suite_name>" "<num_trials_per_task>" "<task_level>"

5. Run train/eval with per-model uv projects

Never use envs/base for model train/eval dependencies. Always use model-specific projects to avoid conflicts (for example OpenVLA vs OpenPI transformers versions).

Run template:

uv sync --project envs/<model_env>
uv run --project envs/<model_env> vla-arena eval --model <model_cli_name> --config vla_arena/configs/evaluation/<config>.yaml

Model mapping:

  • openvla: env envs/openvla, model openvla, config openvla.yaml
  • openvla_oft: env envs/openvla_oft, model openvla_oft, config openvla_oft.yaml
  • univla: env envs/univla, model univla, config univla.yaml
  • smolvla: env envs/smolvla, model smolvla, config smolvla.yaml
  • openpi: env envs/openpi, model openpi, config openpi.yaml
  • openpi_fast: env envs/openpi, model openpi, config openpi_fast.yaml

6. Split jobs across GPUs

Use CUDA_VISIBLE_DEVICES plus separate logs:

mkdir -p <log_dir>
CUDA_VISIBLE_DEVICES=<gpu_id> nohup uv run --project envs/openvla \
  vla-arena eval --model openvla --config vla_arena/configs/evaluation/openvla.yaml \
  > <log_dir>/openvla.log 2>&1 &

Do not run openpi and openpi_fast concurrently unless result/log paths are explicitly separated.

7. Report status in the user’s preferred style

Always provide:

  • whether it was tested (yes/no)
  • whether errors occurred (yes/no, plus last key stack trace)
  • active process status (ps -fp <pid>)
  • result files and timestamps (ls -lt results/*.json)
  • short metric summary when requested

Guardrails

  • Use vla-arena eval / vla-arena train; do not use python -m vla_arena.cli.main.
  • If repo is cloned from source, skip task downloading by default.
  • Download tasks only when user explicitly asks to update task assets.
  • Prefer setting unique result_json_path for openpi_fast; default naming may collide with openpi.
  • Keep cache dirs on the user-provided large-capacity path (<root_dir>/cache).
  • Keep HF endpoint fixed at https://hf-mirror.com unless user explicitly requests a different endpoint.

Troubleshooting Reference

For known failure patterns and fixes, read:

  • references/debug-playbook.md
Install via CLI
npx skills add https://github.com/PKU-Alignment/VLA-Arena --skill vla-remote-train-eval
Repository Details
star Stars 178
call_split Forks 15
navigation Branch main
article Path SKILL.md
More from Creator
PKU-Alignment
PKU-Alignment Explore all skills →