name: colab-remote description: > Orchestrate Google Colab Pro/Pro+ GPU instances as remote training backends via SSH. Compounds /agent-browser (to launch Colab sessions and install colab-ssh) with SSH (to operate the runtime remotely). Use when: (1) launching a Colab notebook for GPU training, (2) running training jobs on Colab from the local terminal, (3) transferring datasets/checkpoints to/from Colab, (4) monitoring GPU utilization on a Colab instance, (5) integrating Colab GPU compute with /autoany EGRI optimization loops, (6) reconnecting to a Colab session after timeout. Triggers on: "colab", "colab-remote", "colab ssh", "colab training", "remote GPU", "colab pro", "train on colab", "google colab".
Colab Remote — SSH-Operated GPU Training
Operate Google Colab Pro/Pro+ instances as headless GPU backends from the local terminal.
Architecture
Local Mac (Claude Code)
├── agent-browser → Chrome → colab.research.google.com
│ └── Opens notebook, runs colab-ssh setup cell
├── SSH tunnel → Colab runtime (via ngrok or cloudflared)
│ └── Run training, monitor GPU, transfer files
└── /autoany EGRI loop (local)
└── Proposes mutations → SSH executes on Colab → evaluates results
Phase 1: Launch Colab Session (Browser Automation)
Use /agent-browser to open Colab and set up SSH access.
Step 1: Open Colab and create notebook
agent-browser open "https://colab.research.google.com/#create=true"
agent-browser wait --load networkidle
agent-browser snapshot -i
If login is required, prompt the user to authenticate manually, then re-snapshot.
Step 2: Select GPU runtime
Navigate Runtime > Change runtime type, select GPU (T4/V100/A100 depending on plan), and save.
Step 3: Install colab-ssh and get connection details
Type the SSH setup code into a cell. Two methods supported:
Method A: ngrok (recommended)
!pip install colab-ssh --upgrade
from colab_ssh import launch_ssh
launch_ssh("YOUR_NGROK_TOKEN")
User must provide ngrok authtoken from https://ngrok.com.
Method B: cloudflared (no account needed)
!pip install colab-ssh --upgrade
from colab_ssh import launch_ssh_cloudflared
launch_ssh_cloudflared(password="your-password-here")
Step 4: Extract and save connection details
After the cell runs, snapshot output to extract hostname/port. Save for reuse:
mkdir -p ~/.colab-remote
cat > ~/.colab-remote/session.env << 'EOF'
COLAB_HOST=0.tcp.ngrok.io
COLAB_PORT=12345
COLAB_USER=root
COLAB_METHOD=ngrok
EOF
Load in subsequent commands: source ~/.colab-remote/session.env
Phase 2: SSH Operations
Connect
# ngrok
ssh -o StrictHostKeyChecking=no -p $COLAB_PORT root@$COLAB_HOST
# cloudflared
ssh -o StrictHostKeyChecking=no -o ProxyCommand="cloudflared access ssh --hostname %h" root@$COLAB_HOST
Verify GPU
ssh -p $COLAB_PORT root@$COLAB_HOST "nvidia-smi"
Transfer files
# Upload
scp -P $COLAB_PORT -r ./data root@$COLAB_HOST:/content/data
# Download
scp -P $COLAB_PORT -r root@$COLAB_HOST:/content/checkpoints ./checkpoints
Run training
# Foreground
ssh -p $COLAB_PORT root@$COLAB_HOST "cd /content && python train.py --epochs 10"
# Background (survives SSH disconnect)
ssh -p $COLAB_PORT root@$COLAB_HOST "cd /content && nohup python train.py > train.log 2>&1 &"
# Monitor
ssh -p $COLAB_PORT root@$COLAB_HOST "tail -f /content/train.log"
Monitor GPU
ssh -p $COLAB_PORT root@$COLAB_HOST "nvidia-smi --query-gpu=utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu --format=csv"
Install dependencies
ssh -p $COLAB_PORT root@$COLAB_HOST "pip install torch transformers peft bitsandbytes accelerate datasets"
Phase 3: EGRI Integration (/autoany)
Wire Colab as the execution backend for an EGRI optimization loop. See references/egri-colab.md for the full problem-spec template and harness patterns.
Execution loop (summary)
for each trial:
1. Upload mutated artifact → scp to Colab
2. Execute on Colab GPU → ssh python train.py
3. Evaluate results → ssh python evaluate.py
4. Download metrics → scp results.json
5. Score locally (immutable evaluator)
6. Promote or discard based on policy
Phase 4: Session Lifecycle
| Tier | Max runtime | Idle timeout | GPU |
|---|---|---|---|
| Free | 12h | 90min | T4, limited |
| Pro | 24h | 90min | T4, V100, priority |
| Pro+ | 24h | 90min | T4, V100, A100 |
Keep-alive
ssh -p $COLAB_PORT root@$COLAB_HOST "while true; do sleep 300; echo keepalive; done &"
Reconnect after timeout
- Check:
ssh -p $COLAB_PORT root@$COLAB_HOST "echo ok" 2>/dev/null && echo "UP" || echo "DOWN" - If dead, re-launch via Phase 1 (browser automation)
- Resume from last checkpoint
Google Drive persistence
Mount Drive to persist across sessions:
ssh -p $COLAB_PORT root@$COLAB_HOST "python -c 'from google.colab import drive; drive.mount(\"/content/drive\")'"
# Checkpoints survive in /content/drive/MyDrive/
Quick Reference
| Task | Command |
|---|---|
| Check GPU | ssh -p $COLAB_PORT root@$COLAB_HOST "nvidia-smi" |
| Upload | scp -P $COLAB_PORT ./file root@$COLAB_HOST:/content/ |
| Download | scp -P $COLAB_PORT root@$COLAB_HOST:/content/file ./ |
| Run script | ssh -p $COLAB_PORT root@$COLAB_HOST "python /content/script.py" |
| Background job | ssh -p $COLAB_PORT root@$COLAB_HOST "nohup python train.py > log 2>&1 &" |
| Tail log | ssh -p $COLAB_PORT root@$COLAB_HOST "tail -20 /content/log" |
| Disk space | ssh -p $COLAB_PORT root@$COLAB_HOST "df -h /content" |
| Kill job | ssh -p $COLAB_PORT root@$COLAB_HOST "pkill -f train.py" |
| Session alive? | ssh -p $COLAB_PORT root@$COLAB_HOST "echo ok" 2>/dev/null |
Prerequisites
- ngrok account (free): https://ngrok.com — or
cloudflared:brew install cloudflared - Colab Pro/Pro+ for GPU priority and longer runtimes
- agent-browser installed and working
- Google account signed into Chrome