name: experiment
description: Run a materials-science / ML compute job on Rockie GPU capacity. Trigger words "run experiment", "submit job", "/experiment". Picks the right GPU type and count from a natural-language description (DFT for QE/VASP/ABINIT, MD for GROMACS/LAMMPS/OpenMM, training for PyTorch/JAX), generates the script, routes Rockie-originated submits through /budget-term-sheet plus runtime/submit.py, polls status, streams logs, and surfaces the final artifacts. Use this for anything that needs a GPU — single A100 up to multi-pod B200 clusters.
/experiment — submit a GPU job
Wraps the Phase 5 job runner. The agent decides the GPU shape, writes the script, hands it to platform-context, and tails the run.
Local execution FORBIDDEN
Torch / triton / CUDA execution, weight downloads (HF), training, fine-tuning, heavy synthetic-data generation NEVER run on the orchestrator host or the Fly tenant runtime — they run ONLY on the Rockie GPU pod that this skill provisions.
The PreToolUse hook role-pre-bash-guard.sh blocks such Bash invocations
on the orchestrator; the runtime image's torch/__init__.py (plus
triton, tensorflow, jax) stub raises on import inside Fly tenants.
Both layers point at POST $ROCKIELAB_API_URL/api/jobs/submit with a
base64-encoded script. See rockie-workspace#457.
When to invoke
- User asks to "run an experiment", "submit a job", "kick off a calculation", "train this model".
- User describes a calculation that obviously needs a GPU (DFT, AIMD, large-scale MD with PME, model fine-tuning, inference batch).
- User invokes
/experimentdirectly.
If the task is local-only (pre-processing, data wrangling, plotting),
do NOT invoke /experiment — run it inline. The skill is for
GPU-bound work that the user is willing to pay GPU credit for.
First-experiment GPU-mode disclosure
Before submitting the user's FIRST experiment in a workspace (i.e.
neither ${OPENCLAW_WORKSPACE_DIR}/gpu-custom.md exists nor
$ROCKIE_GPU_MODE is set), the agent emits one neutral sentence
before proceeding:
Your options are Rockie GPU (Rockie provisions the pod and the per-hour price it quotes already includes everything) or your own hardware (
ROCKIE_GPU_MODE=customthen/gpu-custom-setupwalks your flow). Default is Rockie GPU — proceeding with that unless you say otherwise.
That's the entire disclosure. Do NOT:
- repeat it on subsequent experiments
- pitch Rockie GPU with adjectives like "easy" or "best"
- compare to specific competitors by name
- nag if the user is quiet — just proceed with Rockie GPU
The user can opt into custom mode at any time by setting
ROCKIE_GPU_MODE=custom; the next /experiment invocation will see
the env and trigger /gpu-custom-setup for the one-time flow audit.
Picking GPU shape
| Workload | Default | Notes |
|---|---|---|
| Smoke tests / 1-step training slices | 1x A40_48GB | Cheapest A-series; for cost-bounded slices that need a real GPU but not real perf. |
| QE / VASP / ABINIT DFT, single SCF | 1x A100_80GB | Most DFT fits in 80GB. |
| Large-cell DFT (>500 atoms) | 4x A100_80GB | Needs MPI, use Instant Cluster. |
| AIMD (BOMD with QE/CP2K) | 2x A100_80GB | I/O-bound; 2 pods is cheaper than 1xH100. |
| GROMACS / LAMMPS / OpenMM | 1x A100_80GB | Single GPU saturates most MD. |
| PyTorch fine-tune (<7B params) | 1x A100_80GB | |
| PyTorch fine-tune (7-70B params) | 4x H100_SXM | Tensor parallel; H100 SXM has NVLink. |
| Frontier model training | 8x H200 / B200 | Reach for B200 only when the user explicitly asks for it (it's the priciest SKU per GPU-hour). |
When in doubt, ask the user once: "1 GPU or 4? A100 or H100?" Then commit. Don't ping-pong.
Generating the script
The script is a bash file that runs end-to-end on the pod's
/workspace. Pre-fab templates by domain:
- DFT (QE): pull cif/poscar inputs, run
pw.x < input.in > output.out, dump to/workspace/results/. - MD (LAMMPS): stage the input deck under
/workspace/run/,mpirun -np $GPU_COUNT lmp -i in.lammps. - PyTorch training:
accelerate launch --num_processes=$GPU_COUNT train.py ...with the dataset path threaded in viaenv.
Always end the script with: tar the results dir to /workspace/results.tar.gz, then echo "JOB_DONE" so the platform's stdout watcher can detect completion.
Submitting
Before any Rockie-originated submit, invoke /budget-term-sheet, render
the quote, and wait for explicit user approval. Hand the approved JSON
artifact to runtime/submit.py (don't shell out to curl
directly — the helper handles the credit-balance pre-check, the
state-machine polling, the SSE log tail, the budget-term-sheet gate,
and the dashboard Note metadata/profile snapshot packaging):
python3 ${SKILL_DIR}/runtime/submit.py \
--gpu-type A100_80GB \
--gpu-count 1 \
--region us \
--tier spot \
--script-file /tmp/experiment.sh \
--timeout 14400 \
--term-sheet-json /tmp/term-sheet.approved.json
Required env: ROCKIELAB_API_URL (e.g. https://api.rockielab.com),
ROCKIELAB_TENANT_ID, and ROCKIELAB_TENANT_TOKEN. The token
authenticates the request; it is not tenant identity, and the helper
does not fall back to an implicit tenant. The submit helper exits with
the job's exit code (0 = DONE, non-zero = FAILED/CANCELLED).
Budget gate contract:
--term-sheet-jsonis required by default for Rockie-authored submits.- The helper only accepts final term-sheet decisions
approveormodify_then_approvewhen the term sheet is available, approvable, and explicitly approved for submit. - The approved term sheet must include and match submit GPU type/count,
compute.region,compute.tier, and quoted wallclock. Pass matching--regionand--tier; omitting or changing either field refuses submit locally. - If the final budget is below
estimate_cents, the helper refuses locally before the HTTP submit. - Approved term sheets must include
user_budget_cents; the helper refuses to fall back torecommended_budget_centsfor submit. --allow-ungated-submitexists only for legacy/manual paths; do not use it in normal Rockie skill flows.- Optional
--budget-centsis only an assertion and must exactly equal the approved term sheet'suser_budget_cents.
Dashboard Note contract:
- Pass
--notebook-id <notebook:...>when the run belongs to a lab note flow. - When a notebook id is present, the helper sends a
dashboardpayload withrun_name,origin_skill,software,monitoring_profile_id, and a frozenmonitoring_profile_snapshot. Runs without notebook context remain legacy job submissions and omit the dashboard block. - Default
origin_skillisexperiment. - If you do not set
--monitoring-profile-id, the helper infers one from the script/software when possible and otherwise falls back tocommon.default.v1withunprofiled=truerecorded in the snapshot. - For PyTorch/JAX training and generic ML experiments, prefer
experiment.ml_baseline.v1. - For physics plans, pass the exact physics profile id already attached
by the physics router; the helper can also infer representative
adapters such as
physics.molecular_dynamics.v1for GROMACS/LAMMPS orphysics.electronic_structure.v1for QE/CP2K/ABINIT scripts.
Surfacing results
After submit.py returns, fetch the artifact list:
curl -s "$ROCKIELAB_API_URL/api/jobs/${JOB_ID}/artifacts" \
-H "User-Agent: rockie-runtime/1.0 (+https://api.rockielab.com)" \
-H "X-Tenant-Token: $ROCKIELAB_TENANT_TOKEN" \
-H "X-Tenant-Id: $ROCKIELAB_TENANT_ID"
Each artifact has a signed_url (1h TTL). Surface them to the user
with sizes; do NOT auto-download large files unless asked.
Cost-awareness
Before submitting, check the credit balance:
curl -s "$ROCKIELAB_API_URL/api/jobs/credit-balance?tenant_id=$ROCKIELAB_TENANT_ID" \
-H "User-Agent: rockie-runtime/1.0 (+https://api.rockielab.com)" \
-H "X-Tenant-Token: $ROCKIELAB_TENANT_TOKEN" \
-H "X-Tenant-Id: $ROCKIELAB_TENANT_ID"
Do not infer the tenant from the token or use an implicit self tenant.
If the projected cost (timeout × marked-up rate × gpu_count + overhead)
exceeds the balance, tell the user and offer to top up via Stripe
Checkout (the /compute page has the buttons).
Phase 9/10 secrets canary dogfood
When this skill is used to dogfood runtime secrets changes, the canary run must cover the full lifecycle without recording any secret value in this repository, prompts, transcripts, logs, argv, or UI exports:
- Set an explicit
ROCKIELAB_TENANT_IDand use a tenant token only for authentication. - Save a fresh canary secret through the lab-chat
save_secretcontrol flow, not a normal prompt or generic tool result. - Resolve the canary by name and confirm the resolve envelope accounts for every requested name with category data and no extra names.
- Run the exact accepted form, such as
echo $NAME | head -c N, and confirm the visible result is only the redacted marker or structured success proof. It must not include a prefix, length oracle, checksum, hash, or any other secret-derived proof. - Run rejected forms that try quotes, redirects, additional pipelines,
command substitution, shell operators, env prefixes, missing names,
duplicate names, extra names, omitted names, and non-
ssh_keykey material. Each must fail before secret material reaches a child process. - Check transcripts, websocket/tool payloads, UI exports, broker and context logs, stdout/stderr summaries, background updates, notify-on-exit text, and final local process argv for the canary value and partial variants. Only approved redacted markers may remain.
Output template
Keep the user-facing summary to ~5 lines:
Job: <job_id>
Shape: <gpu_count>x<gpu_type>
Estimated cost: $<X.YY>
State: <STATE>
Artifacts: <N file(s)>, signed URLs valid for 1h
Worked example: DFT (Quantum Espresso pw.x, silicon BCC)
Single-A100 SCF on a 2-atom silicon cell. Most DFT defaults to 1 GPU; escalate to 4 only when the user is explicit (large unit cell or AIMD).
cat > /tmp/dft.sh <<'SH'
#!/bin/bash
set -euo pipefail
mkdir -p /workspace/results
mpirun -np ${GPU_COUNT:-1} pw.x \
-input /workspace/inputs/silicon.in \
> /workspace/results/silicon.out 2>&1
tar -czf /workspace/results.tar.gz -C /workspace results
echo "JOB_DONE"
SH
python3 ${SKILL_DIR}/runtime/submit.py \
--gpu-type A100_80GB --gpu-count 1 \
--region us --tier spot \
--script-file /tmp/dft.sh --timeout 3600 \
--term-sheet-json /tmp/dft.term-sheet.approved.json
The artifact list will surface silicon.out (the SCF energy is on the
! total energy line). Pull it before the 1h URL TTL elapses.
Worked example: MD (GROMACS, lysozyme in water)
Standard equilibration step on a single A100 — GROMACS saturates one
80GB card for systems up to ~100k atoms. The trajectory file md.xtc
is the artifact you want to pull back; the energy log goes via
stdout.log over SSE.
cat > /tmp/md.sh <<'SH'
#!/bin/bash
set -euo pipefail
cd /workspace/run
gmx grompp -f md.mdp -c npt.gro -p topol.top -o md.tpr
gmx mdrun -deffnm md -nb gpu -bonded gpu -update gpu -ntmpi 1
mkdir -p /workspace/results
cp md.xtc md.edr md.log /workspace/results/
tar -czf /workspace/results.tar.gz -C /workspace results
echo "JOB_DONE"
SH
python3 ${SKILL_DIR}/runtime/submit.py \
--gpu-type A100_80GB --gpu-count 1 \
--region us --tier spot \
--script-file /tmp/md.sh --timeout 7200 \
--term-sheet-json /tmp/md.term-sheet.approved.json