---
name: agentgym:run
description: >
  Launch an AgentGym RL training session on this node using the 4090 field
  runner config (pmoves/configs/agentgym/field-runner-4090.yaml). Agent model:
  Qwen 3.5 9B via TensorZero (localhost:3030). Publishes episode results to
  agentgym.episode.completed.v1 on NATS. Environments: BabyAI, TextCraft, Maze,
  Wordle (lightweight); ALFWorld, SQLGym, WebShop (moderate). Max 300s/15
  rounds, concurrency=1. Shift Crew style: NATS event publishing after each
  episode.
disable-model-invocation: true
---
# agentgym:run — AgentGym RL Field Runner (4090)
Launches an AgentGym reinforcement learning session using the 4090 node configuration.
## Pre-flight

```bash
# Verify TensorZero is running (required for Qwen 3.5 9B)
curl -sf http://localhost:3030/health && echo "TensorZero: OK" || echo "TensorZero: DOWN"

# Verify NATS is reachable (for episode event publishing)
curl -sf http://localhost:8222/healthz && echo "NATS: OK" || echo "NATS: unreachable"

# Check Ollama fallback available
ollama list | grep qwen3 || echo "Ollama qwen3 models not found"
```
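The checks above only print a status line, so a launch can still proceed with a service down. A minimal sketch that turns them into a fail-fast gate, assuming the same TensorZero and NATS monitoring endpoints; `TENSORZERO_URL`, `NATS_MON_URL`, `check_service` and `preflight` are hypothetical names for this sketch and are not read by the field runner. It treats NATS as fatal too, since episodes would otherwise run without publishing events; the Ollama fallback is left out because it is optional.

```bash
# Sketch: fail-fast pre-flight built from the same endpoints as the checks above.
# TENSORZERO_URL and NATS_MON_URL are hypothetical overrides for this sketch only.
TENSORZERO_URL="${TENSORZERO_URL:-http://localhost:3030/health}"
NATS_MON_URL="${NATS_MON_URL:-http://localhost:8222/healthz}"

# check_service NAME URL: print status; return non-zero if the URL is not healthy
check_service() {
  local name="$1" url="$2"
  # -f turns HTTP errors into a non-zero exit; -o /dev/null hides the response body
  if curl -sf -o /dev/null --max-time 5 "$url"; then
    echo "$name: OK"
  else
    echo "$name: DOWN ($url)"
    return 1
  fi
}

# preflight: run every check (no short-circuit) and fail if any of them failed
preflight() {
  local failed=0
  check_service "TensorZero" "$TENSORZERO_URL" || failed=1
  check_service "NATS" "$NATS_MON_URL" || failed=1
  return "$failed"
}

# Usage (in a launch script): preflight || exit 1
```

Running every check before returning means a single invocation reports all down services instead of stopping at the first one.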
## Run — Single Episode

The current field runner (`pmoves/tools/agentgym_field_runner.py`) runs ONE episode per invocation; to batch, loop in the shell. The CLI accepts `--run <env>`, `--config <profile>`, `--dry-run`, and `--list-envs`.
```bash
# Validate config + env connectivity without running an episode
python pmoves/tools/agentgym_field_runner.py --dry-run --run BabyAI

# BabyAI (fastest, < 1GB RAM, no external deps)
python pmoves/tools/agentgym_field_runner.py --run BabyAI

# Wordle (good for language model eval)
python pmoves/tools/agentgym_field_runner.py --run Wordle

# ALFWorld (embodied task planning, ~2GB RAM)
python pmoves/tools/agentgym_field_runner.py --run ALFWorld

# List configured environments with health status
python pmoves/tools/agentgym_field_runner.py --list-envs
```
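Since a real episode can take up to 300s, it may be worth gating each run on the runner's `--dry-run` check so a misconfigured environment fails in seconds. A minimal sketch using only the `--dry-run` and `--run` flags listed above; `RUNNER` and `run_checked` are hypothetical names for this sketch (`RUNNER` exists only so the runner command can be swapped out), not part of the field runner.

```bash
# Sketch: validate an env with --dry-run before spending a real episode on it.
# RUNNER is a hypothetical override hook; it defaults to the invocation used above.
RUNNER="${RUNNER:-python pmoves/tools/agentgym_field_runner.py}"

# run_checked ENV: dry-run first, then run one real episode only if that passed
run_checked() {
  local env="$1"
  # $RUNNER is intentionally unquoted so "python path/to/script" splits into words
  if ! $RUNNER --dry-run --run "$env"; then
    echo "dry-run failed for $env; skipping real episode" >&2
    return 1
  fi
  $RUNNER --run "$env"
}

# Usage: run_checked BabyAI
```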
## Run — Full Lightweight Battery

Until a `make agentgym-run-lightweight` wrapper lands, batch via the shell:
```bash
for env in BabyAI TextCraft Maze Wordle; do
  python pmoves/tools/agentgym_field_runner.py --run "$env"
done
```
TODO (follow-up PR): Wrap as `make -C pmoves agentgym-run ENV=X` + `make -C pmoves agentgym-run-lightweight` Make targets so this skill routes through the canonical `with-env.sh` chain. Also add a `--repeat N` flag to the runner for episode batching without shell looping.
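Until `--repeat N` exists, a nested loop can emulate it. A minimal sketch, assuming one runner call equals one episode (as stated above) and that a non-zero exit means the episode failed; `RUNNER`, `REPEAT`, `LIGHTWEIGHT_ENVS` and `run_battery` are hypothetical names for this sketch, not runner or Make options. Unlike the plain loop, it keeps going after a failure and prints a summary at the end.

```bash
# Sketch: emulate the pending --repeat N flag for the lightweight battery.
# RUNNER and REPEAT are hypothetical override hooks for this sketch only.
RUNNER="${RUNNER:-python pmoves/tools/agentgym_field_runner.py}"
REPEAT="${REPEAT:-1}"
LIGHTWEIGHT_ENVS=(BabyAI TextCraft Maze Wordle)

run_battery() {
  local env i ok=0
  local failed=()
  for env in "${LIGHTWEIGHT_ENVS[@]}"; do
    for ((i = 1; i <= REPEAT; i++)); do
      # Sequential on purpose: concurrency is 1 on this node
      if $RUNNER --run "$env"; then
        ok=$((ok + 1))
      else
        failed+=("$env#$i")   # remember which env/repeat failed, keep going
      fi
    done
  done
  echo "episodes ok: $ok, failed: ${#failed[@]}"
  if ((${#failed[@]})); then
    echo "failed: ${failed[*]}"
    return 1
  fi
}

# Usage: REPEAT=3 run_battery
```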
## NATS Events Published
| Subject | When | Payload |
|---|---|---|
| `agentgym.episode.completed.v1` | After each episode | `episode_id`, `env`, `reward`, `rounds`, `duration_s` |
| `agentgym.eval.batch.completed.v1` | After full batch | `batch_id`, `env`, `n_episodes`, `mean_reward` |
| `agentgym.field.status.v1` | On runner start/stop | `node`, `config`, `status` |
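To smoke-test a downstream subscriber without running an episode, a synthetic `agentgym.episode.completed.v1` event can be published by hand. A minimal sketch: the field names come from the table above, but the JSON types (string IDs, numeric `reward` and `duration_s`, integer `rounds`) are assumptions, and `episode_payload` is a hypothetical helper, not the runner's publishing code. The commented `nats sub` / `nats pub` lines use the NATS CLI. Values are not JSON-escaped, so keep them to plain identifiers and numbers.

```bash
# Sketch: build a synthetic episode.completed payload (field types are assumed).
# episode_payload EPISODE_ID ENV REWARD ROUNDS DURATION_S
episode_payload() {
  local episode_id="$1" env="$2" reward="$3" rounds="$4" duration_s="$5"
  # No escaping is done: values must not contain quotes or backslashes
  printf '{"episode_id":"%s","env":"%s","reward":%s,"rounds":%d,"duration_s":%s}\n' \
    "$episode_id" "$env" "$reward" "$rounds" "$duration_s"
}

# Watch every AgentGym subject in one terminal ('>' matches one or more tokens):
#   nats sub 'agentgym.>'
# Then publish a synthetic test event from another:
#   nats pub agentgym.episode.completed.v1 "$(episode_payload test-0001 BabyAI 1.0 7 42.5)"
```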
## Environment Reference
| Name | Category | RAM | External Deps | Port |
|---|---|---|---|---|
| BabyAI | lightweight | < 1GB | None | 36001 |
| TextCraft | lightweight | < 1GB | None | 36002 |
| Maze | lightweight | < 1GB | None | 36003 |
| Wordle | lightweight | < 1GB | None | 36004 |
| ALFWorld | moderate | 1-4GB | ALFWorld data | 36005 |
| SQLGym | moderate | 1-4GB | SQLite | 36006 |
| WebShop | moderate | 1-4GB | Web scraper | 36007 |
Skipped on the 4090 node: WebArena (needs a full browser), SciWorld (needs Java).
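A quick way to see which environment servers are listening is to probe the ports in the table. A minimal sketch, assuming the servers run on localhost and that bash was built with `/dev/tcp` redirection support (the default on most Linux distributions); `ENV_PORTS`, `port_open` and `probe_envs` are hypothetical names for this sketch. An open port only shows that something is listening, not that the environment is healthy; `--list-envs` gives the runner's own view.

```bash
# Sketch: probe each environment server's TCP port from the table above.
declare -A ENV_PORTS=(
  [BabyAI]=36001 [TextCraft]=36002 [Maze]=36003 [Wordle]=36004
  [ALFWorld]=36005 [SQLGym]=36006 [WebShop]=36007
)

# port_open HOST PORT: succeed if a TCP connection opens within 1 second
port_open() {
  timeout 1 bash -c ": </dev/tcp/$1/$2" 2>/dev/null
}

# probe_envs: print "<env>:<port> open|closed" in table order
probe_envs() {
  local env
  for env in BabyAI TextCraft Maze Wordle ALFWorld SQLGym WebShop; do
    if port_open 127.0.0.1 "${ENV_PORTS[$env]}"; then
      echo "$env:${ENV_PORTS[$env]} open"
    else
      echo "$env:${ENV_PORTS[$env]} closed"
    fi
  done
}
```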
## Notes

- Concurrency is 1: episodes run sequentially (16GB VRAM budget).
- Model: `qwen35_9b` (registered in `tensorzero.toml`) with `qwen3:8b` as the Ollama fallback.
- Max episode: 300s / 15 rounds (hard limits in `field-runner-4090.yaml`).
- Capture results for the AGNOTE4482 handoff: episode JSON is written to `pmoves/data/agentgym/runs/` (a `make -C pmoves agentgym-results` summary target is a pending follow-up).
- See `pmoves/configs/agentgym/field-runner-4090.yaml` for the full config.
- See `pmoves/services/agentgym-rl-coordinator/` for the runner service.
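Until the `agentgym-results` target lands, captured episodes can at least be counted and listed. A minimal sketch, assuming one `*.json` file per episode under the runs directory; the file naming and JSON schema are not documented here, so it does not parse contents. `RUNS_DIR`, `latest_runs` and `count_runs` are hypothetical names, and `find -printf` requires GNU findutils.

```bash
# Sketch: inspect captured episode JSON files (GNU find assumed for -printf).
RUNS_DIR="${RUNS_DIR:-pmoves/data/agentgym/runs}"

# latest_runs [N]: print the N most recently modified episode JSON files (default 5)
latest_runs() {
  local n="${1:-5}"
  # %T@ = modification time in epoch seconds, used as the sort key, then dropped
  find "$RUNS_DIR" -type f -name '*.json' -printf '%T@ %p\n' 2>/dev/null \
    | sort -rn | head -n "$n" | cut -d' ' -f2-
}

# count_runs: number of episode JSON files captured so far
count_runs() {
  find "$RUNS_DIR" -type f -name '*.json' 2>/dev/null | wc -l
}
```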