name: vast-gpu description: "Cloud GPU access via Vast.ai for marker-pdf, embeddings, and ML workloads. Convenience scripts for instance management (start, stop, status, SSH) and PDF-to-markdown conversion. Use when the user says 'run on gpu', 'use gpu', 'marker-pdf', 'convert pdf', 'gpu status', 'start gpu', 'stop gpu', 'vast', or needs GPU acceleration from this CPU-only VPS."
Vast.ai Cloud GPU
On-demand cloud GPU from Vast.ai marketplace. Instances are ephemeral — created when needed, destroyed after use. No persistent instances, no idle storage costs.
gpu-marker is fully self-contained: if no GPU is running, it searches for one, creates an instance, installs marker-pdf, processes the PDF, and returns results. For batch work (multiple PDFs), call gpu-start first to avoid repeated setup, then gpu-stop when done.
Quick Reference
# Self-contained PDF conversion (handles everything automatically)
gpu-marker paper.pdf # Find/create GPU, install marker, convert, return results
gpu-marker paper.pdf ./output # Specify output directory
# Explicit instance management (for batch work)
gpu-start # Search marketplace, create instance, install marker
gpu-stop # DESTROY instance (default — no ongoing cost)
gpu-stop --pause # Stop only (preserves disk, pays storage)
gpu-status # Show current instance status
gpu-ssh # Interactive SSH session
gpu-ssh 'nvidia-smi' # Run single command
gpu-run 'pip install something' # Run arbitrary commands
All scripts live at ~/.agent/skills/vast-gpu/scripts/ and are symlinked to ~/.local/bin/.
Architecture
VPS (CPU-only) Vast.ai (GPU)
+-----------------+ SSH +----------------------+
| ubuntu@vps | --------> | root@sshN.vast.ai |
| gpu-marker | auto | RTX 3090+ (24 GB) |
| gpu-start/stop | resolved | PyTorch + CUDA |
| vastai CLI | | marker-pdf |
+-----------------+ | surya-ocr |
+----------------------+
Instance ID, SSH host, and port are resolved dynamically by _resolve-instance.sh.
Instance Management
Start
gpu-start
# Starts instance, waits for SSH (up to 3 min), prints status
Cold start takes 30-60 seconds. SSH becomes available 10-30 seconds after that.
Stop (destroy by default)
gpu-stop # Destroy instance — no ongoing cost
gpu-stop --pause # Stop only — preserves disk, pays ~$0.03/GB/mo storage
Default is destroy. Instances are ephemeral — gpu-start creates a fresh one in ~1-2 minutes. Use --pause only during multi-session batch work where you want to preserve cached models between runs.
Status
gpu-status
# Shows: status, GPU, uptime, hourly cost, SSH command
SSH
gpu-ssh # Interactive shell
gpu-ssh 'nvidia-smi' # Single command
gpu-ssh 'pip list | grep torch' # Pipe-friendly
Running marker-pdf
Convert any PDF to clean markdown with GPU-accelerated OCR:
gpu-marker document.pdf # Output to ./document/
gpu-marker document.pdf ./my-output # Custom output dir
What happens:
- Uploads PDF to GPU box via SCP
- Runs
marker_singlewith GPU acceleration - Downloads markdown + extracted images back to VPS
- Prints output location
First run after instance restart downloads OCR models (~2 GB, takes ~8 min). Subsequent runs process at ~2-3 pages/second.
Batch conversion
for pdf in *.pdf; do
gpu-marker "$pdf" ./converted/
done
Output structure
output-dir/
├── document.md # Markdown with inline image refs
├── document_meta.json # Metadata (page count, timing)
├── _page_0_Picture_1.jpeg # Extracted figures
└── _page_5_Figure_1.jpeg
Installing Additional Tools
SSH in and install anything:
gpu-ssh 'pip install sentence-transformers'
gpu-ssh 'pip install vllm'
gpu-ssh 'apt-get update && apt-get install -y ffmpeg'
Installations persist as long as the instance is not destroyed (only stopped). Stopping preserves disk; destroying deletes everything.
Running Embeddings
Example: run a HuggingFace TEI embedding server on the GPU box:
gpu-ssh 'pip install text-embeddings-inference'
# Or use the Docker image:
gpu-ssh 'docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id BAAI/bge-base-en-v1.5'
Then from VPS, forward the port:
ssh -L 8080:localhost:8080 -p 15400 root@ssh3.vast.ai
# Now localhost:8080 serves embeddings
Cost Management
| State | Cost | What persists |
|---|---|---|
| Running | ~$0.11-0.20/hr | Everything |
Stopped (--pause) |
~$0.03/GB/mo (50 GB = ~$1.50/mo) | Disk, installed packages, cached models |
| Destroyed (default) | $0 | Nothing — gpu-start creates fresh next time |
Rules:
- Default to destroy after every use (
gpu-stop) gpu-markeris self-contained — just call it, it handles the rest- For batch work:
gpu-start→ multiplegpu-markercalls →gpu-stop
Check spending
vastai show invoices
Changing Instance / Instance Gone
Scripts auto-resolve the best instance — no hardcoded IDs to update. If your instance was destroyed or you want a different GPU:
# Search for alternatives
vastai search offers 'gpu_name=RTX_3090 num_gpus=1 reliability>0.95 dph<0.20' -o 'dph' --limit 10
# Create new instance (scripts will auto-detect it)
vastai create instance <OFFER_ID> --image pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel --disk 50 --ssh --direct
# Verify scripts pick it up
gpu-status
No need to edit any scripts — _resolve-instance.sh finds the newest running or exited instance automatically.
After creating a new instance, re-install marker-pdf:
gpu-ssh 'pip install marker-pdf && pip install --upgrade torchvision'
Troubleshooting
"Connection refused" on SSH — Instance is stopped. Run gpu-start.
"Model downloading" on first marker-pdf run — Normal. Surya OCR models (~2 GB) download on first use after instance creation. Takes ~8 min. Cached afterward.
torchvision errors — If marker-pdf fails with torchvision::nms error, the base Docker image's torchvision is outdated:
gpu-ssh 'pip install --upgrade torchvision'
Instance destroyed / need to recreate — Scripts will report "No instances found." Search for a new offer, create one, and scripts auto-detect it:
vastai search offers 'gpu_name=RTX_3090 num_gpus=1 reliability>0.95 dph<0.20' -o 'dph' --limit 5
vastai create instance <OFFER_ID> --image pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel --disk 50 --ssh --direct
gpu-status # Should now find the new instance
gpu-ssh 'pip install marker-pdf && pip install --upgrade torchvision'
Slow network transfer — Use rsync with compression for large files:
rsync -avzP -e 'ssh -p 15400' large-file.pdf root@ssh3.vast.ai:/tmp/