skyvern - SKILL.md Agent Skill

name: skyvern description: > AI-powered browser automation using Skyvern (self-hosted only). Automate web workflows, fill forms, extract data, and navigate multi-step processes using Vision LLMs and Playwright. Skyvern uses a Planner→Agent→Validator multi-agent architecture with computer vision to interact with web pages semantically — no brittle CSS/XPath selectors. All LLM inference routes through LiteLLM; use OpenRouter, Ollama, any OpenAI-compatible endpoint, or your own proxy (ZenMux, Z.AI, Nebius, etc.). Never use the Skyvern cloud API. Always self-host. version: 2.0.0

Skyvern — AI Browser Automation (Self-Hosted Expert Reference)

Policy: We NEVER use Skyvern's hosted cloud API (api.skyvern.com). All deployments are self-hosted, routing LLM calls through our own providers: OpenRouter, ZenMux, Z.AI, Nebius Token Factory, Ollama, or any OpenAI-compatible endpoint.

Instructions

Step 1: Fetch Latest Docs (Always)

Before implementing, use web tools to pull current state from these canonical sources:

Source	URL	Purpose
llms.txt	`https://skyvern.com/docs/llms.txt`	Full documentation index — start here
GitHub	`https://github.com/Skyvern-AI/skyvern`	Source code, `.env.example`, `config_registry.py`
Docs (new)	`https://docs-new.skyvern.com`	Developer docs, self-hosted guides
Docs (legacy)	`https://docs.skyvern.com`	Cloud UI docs, workflow block reference
PyPI	`https://pypi.org/project/skyvern`	Latest version (currently v1.0.32+)

Step 2: Architecture Deep Dive

2.1 How Skyvern Works

Skyvern is inspired by BabyAGI/AutoGPT but adds real browser control via Playwright. It uses a swarm of agents to comprehend a website, plan actions, and execute them:

┌─────────────────────────────────────────────────────┐
│                  Skyvern 2.0 Engine                 │
│                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────────┐    │
│  │ Planner  │──→│  Agent   │──→│  Validator   │    │
│  │  Agent   │   │ (Actor)  │   │   Agent      │    │
│  └──────────┘   └──────────┘   └──────────────┘    │
│       │              │               │              │
│       └──────────────┴───────────────┘              │
│                      │                              │
│              ┌───────▼───────┐                      │
│              │   Playwright  │                      │
│              │   (Browser)   │                      │
│              └───────────────┘                      │
│                      │                              │
│              ┌───────▼───────┐                      │
│              │   LiteLLM     │ ◄── All LLM calls    │
│              │   (Gateway)   │     route through     │
│              └───────────────┘     this layer        │
└─────────────────────────────────────────────────────┘

Key advantages:

Zero-shot — operates on websites never seen before
Layout-resistant — no XPath/CSS selectors; vision-based element discovery
Multi-site reusable — one workflow runs across different websites
Computer Vision — screenshots → LLM reasoning → action plan → Playwright execution

2.2 Core Concepts

Concept	Description
Task	Single automation job: `prompt` (required) + `url` (optional) + optional `data_extraction_schema`
Workflow	Multi-step pipeline of chained blocks, built visually or via API
Block	Atomic unit inside a workflow (see Block Taxonomy below)
Run	Single execution of a task or workflow, with status tracking
Browser Session	Persistent Chromium instance; cookies/auth state survive across pages
Browser Profile	Saved browser state (cookies, storage) reusable across runs
Credential	Stored passwords, credit cards, TOTP seeds (via Bitwarden/1Password/custom)
Artifact	Output from a run: screenshots, recordings, downloaded files, logs
Engine	Skyvern 1.0 (legacy) vs 2.0 (Planner→Agent→Validator, recommended)

2.3 Workflow Block Taxonomy

Category	Blocks
Browser Automation	Browser Task, Browser Action, Extraction, Login, Go to URL, Print Page
Data & Extraction	Text Prompt (LLM-only, no browser), File Parser (PDF/CSV/Excel)
Control Flow	Loop (for-each), Conditional (if/else), AI Validation, Code (Python/Playwright), Wait
Files	File Download, Cloud Storage Upload
Communication	Send Email, HTTP Request, Human Interaction (pause for manual intervention)

Browser Task (recommended block): accepts natural-language prompt, autonomously navigates.

Skyvern 2.0 fields: URL, Prompt, Max Steps, Disable Cache
Skyvern 1.0 fields: URL, Prompt + Advanced (completion condition, action history, download triggers)

Browser Action: single granular action (click, type, select, upload) — no goal-seeking.

Extraction: turns current page content into structured JSON without navigation.

Step 3: Self-Hosted Deployment

3.1 Option A: pip install (Recommended for dev)

pip install skyvern  # or: uv pip install skyvern
skyvern quickstart   # interactive wizard: DB, LLM provider, browser mode, MCP

The quickstart wizard will:

Set up database (SQLite default since v1.0.31+, or PostgreSQL)
Configure your LLM provider via .env
Choose browser mode (headless, headful, connect to existing Chrome)
Generate local API credentials (SKYVERN_API_KEY)
Optionally configure MCP for Claude Code/Desktop/Cursor/Windsurf
Download Chromium browser

3.2 Option B: Docker Compose

git clone https://github.com/Skyvern-AI/skyvern.git && cd skyvern
cp .env.example .env
# Edit .env — configure LLM provider (see Step 4)
docker compose up -d
# UI at http://localhost:8080

3.3 Option C: Kubernetes

Helm charts and manifests in kubernetes-deployment/ directory. See docs for scaling config.

Step 4: LLM Configuration (BYOM — Bring Your Own Model)

Critical: All LLM routing goes through LiteLLM internally. The LLMConfigRegistry (at skyvern/forge/sdk/api/llm/config_registry.py, ~1946 lines) registers model configs at startup based on ENABLE_* environment flags.

4.1 Supported Provider Matrix

Provider	Enable Flag	Key Env Vars	`LLM_KEY` Value
OpenRouter ⭐	`ENABLE_OPENROUTER=true`	`OPENROUTER_API_KEY`, `OPENROUTER_MODEL`	`OPENROUTER`
OpenAI-Compatible ⭐	`ENABLE_OPENAI_COMPATIBLE=true`	`OPENAI_COMPATIBLE_API_KEY`, `OPENAI_COMPATIBLE_API_BASE`, `OPENAI_COMPATIBLE_MODEL_NAME`	`OPENAI_COMPATIBLE` (or custom via `OPENAI_COMPATIBLE_MODEL_KEY`)
Ollama	`ENABLE_OLLAMA=true`	`OLLAMA_SERVER_URL`, `OLLAMA_MODEL`	`OLLAMA`
Anthropic	`ENABLE_ANTHROPIC=true`	`ANTHROPIC_API_KEY`	`ANTHROPIC_CLAUDE4.6_SONNET`, etc.
OpenAI	`ENABLE_OPENAI=true`	`OPENAI_API_KEY`	`OPENAI_GPT5`, `OPENAI_GPT4_1`, etc.
Gemini	`ENABLE_GEMINI=true`	`GEMINI_API_KEY`	`GEMINI_3_1_PRO`, `GEMINI_3_FLASH`, etc.
Azure OpenAI	`ENABLE_AZURE=true`	`AZURE_API_KEY`, `AZURE_DEPLOYMENT`, `AZURE_API_BASE`, `AZURE_API_VERSION`	`AZURE_OPENAI`
AWS Bedrock	`ENABLE_BEDROCK=true`	AWS credentials	`BEDROCK_ANTHROPIC_CLAUDE4.5_SONNET_INFERENCE_PROFILE`
Groq	`ENABLE_GROQ=true`	`GROQ_API_KEY`, `GROQ_MODEL`	`GROQ`
Moonshot	`ENABLE_MOONSHOT=true`	`MOONSHOT_API_KEY`	`MOONSHOT_KIMI_K2`
Inception	`ENABLE_INCEPTION=true`	`INCEPTION_API_KEY`	`INCEPTION_MERCURY_2`
Volcengine	`ENABLE_VOLCENGINE=true`	`VOLCENGINE_API_KEY`	—

4.2 OpenRouter Configuration (Our Primary Path)

# .env
ENABLE_OPENROUTER=true
LLM_KEY=OPENROUTER
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=minimax/minimax-m2.5  # Any model on OpenRouter
# OPENROUTER_API_BASE=https://openrouter.ai/api/v1  # default

Under the hood — the config_registry.py registers it as:

LLMConfigRegistry.register_config(
    "OPENROUTER",
    LLMConfig(
        "openrouter/{model_name}",
        ["OPENROUTER_API_KEY", "OPENROUTER_MODEL"],
        supports_vision=settings.LLM_CONFIG_SUPPORT_VISION,
        max_completion_tokens=settings.LLM_CONFIG_MAX_TOKENS,
        litellm_params=LiteLLMParams(
            api_key=settings.OPENROUTER_API_KEY,
            api_base=settings.OPENROUTER_API_BASE,
            model_info={"model_name": f"openrouter/{model_name}"},
        ),
    ),
)

Dynamic OpenRouter model resolution: If LLM_KEY starts with openrouter/, the registry creates a config on-the-fly without needing explicit registration.

4.3 OpenAI-Compatible Endpoint (ZenMux, Z.AI, Nebius, LM Studio, vLLM, etc.)

# .env
ENABLE_OPENAI_COMPATIBLE=true
LLM_KEY=OPENAI_COMPATIBLE

# Required
OPENAI_COMPATIBLE_API_KEY=your-key-here
OPENAI_COMPATIBLE_API_BASE=https://your-proxy.example.com/v1
OPENAI_COMPATIBLE_MODEL_NAME=your-model-name

# Optional
OPENAI_COMPATIBLE_MODEL_KEY=OPENAI_COMPATIBLE    # Custom registry key
OPENAI_COMPATIBLE_SUPPORTS_VISION=true            # Enable vision support
OPENAI_COMPATIBLE_ADD_ASSISTANT_PREFIX=false
OPENAI_COMPATIBLE_MAX_TOKENS=128000
OPENAI_COMPATIBLE_TEMPERATURE=0.7
OPENAI_COMPATIBLE_REASONING_EFFORT=medium
OPENAI_COMPATIBLE_API_VERSION=                    # If needed

Under the hood — registers with openai/ prefix for LiteLLM routing:

LLMConfig(
    f"openai/{openai_compatible_model_name}",  # LiteLLM requires this prefix
    required_env_vars,
    supports_vision=settings.OPENAI_COMPATIBLE_SUPPORTS_VISION,
    litellm_params=LiteLLMParams(
        api_key=settings.OPENAI_COMPATIBLE_API_KEY,
        api_base=settings.OPENAI_COMPATIBLE_API_BASE,
        model_info={"model_name": f"openai/{openai_compatible_model_name}"},
    ),
)

4.4 Ollama (Fully Local)

# .env
ENABLE_OLLAMA=true
LLM_KEY=OLLAMA
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_MODEL=llava:latest
OLLAMA_SUPPORTS_VISION=true

4.5 Multi-Model Setup

Skyvern supports LLM_KEY for the primary model and SECONDARY_LLM_KEY for mini agents:

LLM_KEY=OPENROUTER                         # Primary (Planner + Actor)
SECONDARY_LLM_KEY=OLLAMA                   # Secondary (mini validation tasks)
LLM_CONFIG_MAX_TOKENS=128000               # Global max tokens override

4.6 General LLM Tuning Variables

Variable	Type	Description	Default
`LLM_KEY`	string	Primary model registry key	—
`SECONDARY_LLM_KEY`	string	Mini agent model	—
`LLM_CONFIG_MAX_TOKENS`	int	Max tokens override	`128000`
`LLM_CONFIG_TEMPERATURE`	float	Temperature	—
`LLM_CONFIG_SUPPORT_VISION`	bool	Vision support flag	—
`LLM_CONFIG_ADD_ASSISTANT_PREFIX`	bool	Add assistant prefix to messages	—

Step 5: SDK Usage (Playwright + AI Hybrid)

Skyvern's SDK is a Playwright extension that adds AI-powered commands.

5.1 Python SDK — Three Interaction Modes

from skyvern import Skyvern

# Self-hosted local mode (our standard)
skyvern = Skyvern.local()
# Or explicit:
skyvern = Skyvern(
    base_url="http://localhost:8000",
    api_key="YOUR_LOCAL_SKYVERN_API_KEY"
)

browser = await skyvern.launch_cloud_browser()
page = await browser.get_working_page()

# Mode 1: Traditional Playwright (CSS/XPath selectors)
await page.goto("https://example.com")
await page.click("#submit-button")

# Mode 2: AI-powered (natural language)
await page.click(prompt="Click the green Submit button")

# Mode 3: AI fallback (tries selector first, AI if it fails)
await page.click("#submit-btn", prompt="Click the Submit button")

5.2 Core AI Page Commands

Command	Description
`page.act(prompt)`	Perform multi-step actions via natural language
`page.extract(prompt, schema)`	Extract structured data with optional JSON schema
`page.validate(prompt)`	Validate page state, returns `bool`
`page.prompt(prompt, schema)`	Send arbitrary prompts to the LLM

5.3 Agent-Level Commands

Command	Description
`page.agent.run_task(prompt)`	Execute complex multi-step tasks
`page.agent.login(cred_type, cred_id)`	Authenticate with stored credentials
`page.agent.download_files(prompt)`	Navigate and download files
`page.agent.run_workflow(workflow_id)`	Execute pre-built workflows

5.4 Simple Task Execution

from skyvern import Skyvern

skyvern = Skyvern()
task = await skyvern.run_task(
    prompt="Find the top post on hackernews today",
    data_extraction_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "url": {"type": "string"},
            "points": {"type": "integer"}
        }
    }
)
print(task)

5.5 TypeScript SDK

import { Skyvern } from "@skyvern/client";

const skyvern = new Skyvern({
    baseUrl: "http://localhost:8000",
    apiKey: "YOUR_LOCAL_KEY"
});
const browser = await skyvern.launchCloudBrowser();
const page = await browser.getWorkingPage();

await page.goto("https://example.com");
await page.agent.runTask("Complete checkout with: John Snow, 12345");
await browser.close();

5.6 REST API (Self-Hosted)

# Create a task
curl -X POST "http://localhost:8000/api/v1/tasks" \
  -H "Authorization: Bearer $SKYVERN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "navigation_goal": "Find the pricing page and extract all plan details",
    "data_extraction_schema": {
      "type": "object",
      "properties": {
        "plans": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "price": {"type": "string"}
            }
          }
        }
      }
    }
  }'

# Get task result
curl "http://localhost:8000/api/v1/tasks/{task_id}" \
  -H "Authorization: Bearer $SKYVERN_API_KEY"

5.7 Task Parameters Reference

Parameter	Type	Description
`prompt`	string (required)	Natural language instructions
`url`	string	Starting URL
`engine`	string	`skyvern_2.0` (default) or `skyvern_1.0`
`data_extraction_schema`	object	JSON schema for structured output
`max_steps`	int	Maximum AI steps (default: 50)
`proxy_location`	string	Proxy/geolocation
`browser_session_id`	string	Reuse existing browser session
`totp_identifier` / `totp_url`	string	2FA configuration
`error_code_mapping`	object	Custom error codes to halt execution
`webhook_url`	string	Callback URL for results
`run_with`	string	`agent` or `code`
`model`	string	Override LLM model for this task
`browser_address`	string	CDP address for local browser

5.8 Run Status Values

queued → running → completed | failed | terminated | timed_out | canceled

Step 6: MCP Server

Skyvern exposes an MCP server for integration with Claude Code, Claude Desktop, Cursor, Windsurf, Codex, Hermes, and OpenClaw.

6.1 Configuration

{
  "mcpServers": {
    "Skyvern": {
      "command": "/path/to/python3",
      "args": ["-m", "skyvern", "run", "mcp"],
      "env": {
        "SKYVERN_BASE_URL": "http://localhost:8000",
        "SKYVERN_API_KEY": "YOUR_LOCAL_KEY"
      }
    }
  }
}

Client	Config Path
Claude Desktop (macOS)	`~/Library/Application Support/Claude/claude_desktop_config.json`
Claude Code (project)	`.mcp.json` in project root
Claude Code (global)	`~/.claude.json`
Codex (global)	`~/.codex/config.toml`
Cursor	`~/.cursor/mcp.json`
Windsurf	`~/.codeium/windsurf/mcp_config.json`

6.2 Quickstart MCP auto-setup

During skyvern quickstart, if you choose Claude Code, it will:

Write .mcp.json in project root
Pin the MCP command to the active Python interpreter
Install bundled skills into .claude/skills/

Step 7: Browser Configuration

# .env options
BROWSER_TYPE=chromium-headful          # or: chromium-headless, cdp-connect
BROWSER_STREAMING_MODE=vnc             # or: cdp
BROWSER_ACTION_TIMEOUT_MS=5000
MAX_STEPS_PER_RUN=50
VIDEO_PATH=./videos
MAX_SCRAPING_RETRIES=0

# Connect to existing Chrome (local dev)
BROWSER_TYPE=cdp-connect
BROWSER_REMOTE_DEBUGGING_URL=http://127.0.0.1:9222

Step 8: Credential Management

Skyvern supports stored credentials for automated login:

Type	Support
Bitwarden / Vaultwarden	Built-in integration
1Password	Via `OP_SERVICE_ACCOUNT_TOKEN`
Custom HTTP API	`CREDENTIAL_VAULT_TYPE=custom`, `CUSTOM_CREDENTIAL_API_BASE_URL`, `CUSTOM_CREDENTIAL_API_TOKEN`
TOTP / 2FA	QR-based, email-based, SMS-based

Step 9: CLI Commands

skyvern quickstart        # Interactive setup wizard
skyvern init              # Setup-only (no service start)
skyvern run all           # Start server + UI
skyvern run server        # Start API server only
skyvern run ui            # Start frontend only
skyvern status            # Check service status
skyvern stop all          # Stop everything
skyvern init browser      # Setup Chrome remote debugging
skyvern browser serve --tunnel  # Start Chrome + tunnel

Guardrail

Do NOT install Skyvern inside the ainish-coder codebase. This codebase is strictly a tool orchestrator. This skill is meant for target repositories where the agent is deployed.

Never use api.skyvern.com or any Skyvern cloud API key. Always self-host with your own LLM provider credentials.

Troubleshooting

1. LLM Configuration Issues

Issue	Cause	Fix
`LLM Provider NOT provided. You passed model=OPENAI_COMPATIBLE`	Docker image too old or model not registered	Pull latest image: `docker pull skyvern/skyvern:latest`. Ensure `ENABLE_OPENAI_COMPATIBLE=true` is set
`Using general model configuration for unknown LLM key`	`LLM_KEY` value doesn't match a registered config	Check `config_registry.py` for valid keys. For OpenAI-compatible, the `openai/` prefix is added automatically
OpenRouter model not found	Model slug doesn't match	Verify model exists at `openrouter.ai/models`. Use format: `vendor/model-name`
Vision not working	Provider doesn't support vision	Set `LLM_CONFIG_SUPPORT_VISION=true` or `OPENAI_COMPATIBLE_SUPPORTS_VISION=true`
Timeout during inference	Model too slow or rate-limited	Increase `MAX_STEPS_PER_RUN`, use faster model, check rate limits

2. Docker Issues

Issue	Fix
`table organizations already exists`	`rm ~/.skyvern/data.db` + upgrade to v1.0.32+
Port conflict	Check `docker ps` for conflicting Postgres. Change `PORT=8000` in `.env`
`.env` not loaded in compose	Ensure `env_file:` directive in `docker-compose.yml`

3. SDK / API Issues

Issue	Fix
`pip install skyvern` fails (ResolutionImpossible)	Use `uv pip install skyvern` or upgrade to v1.0.32+
SDK import error	Activate correct `.venv`; Python 3.11/3.12 only (not 3.13)
Task returns empty extraction	Verify `data_extraction_schema` matches page structure
Browser automation hangs	Check `BROWSER_ACTION_TIMEOUT_MS`; break complex tasks into smaller steps

4. Diagnostic Steps

Verify LLM connectivity: test your provider endpoint directly with curl
Check Skyvern logs: docker compose logs skyvern or server stdout
Verify .env is complete: compare against .env.example in the repo
Fetch latest docs: https://skyvern.com/docs/llms.txt
Search codebase: Grep("skyvern") to find deployment/config issues

Examples

"Automate filling out this insurance form using AI."
"Extract all product listings from this e-commerce site."
"Build a workflow to log into a portal, download a report, and email it."
"Set up Skyvern to monitor a competitor's pricing page daily."
"Replace our Selenium tests with Skyvern for more resilient automation."
"Create a workflow that navigates a multi-page signup process."
"Download all invoices newer than January 1st from a vendor portal."

Guidelines

Always self-host — never reference api.skyvern.com or cloud API keys
Prefer OpenRouter or OpenAI-compatible for maximum model flexibility
Use Skyvern 2.0 engine (Planner→Agent→Validator) for all new tasks
Use the Python SDK for programmatic integration; REST API for non-Python
Design workflows for reusability — parameterize inputs via workflow parameters
Provide clear, specific navigation goals — vague prompts = unreliable automation
Handle failures gracefully — check status, implement retries for transient errors
Use data_extraction_schema to structure output for downstream processing
Always fetch llms.txt before implementing to verify current API surface
When troubleshooting, check config_registry.py source for valid LLM_KEY values

Requirements

Self-Hosted (pip)

pip install skyvern  # or: uv pip install skyvern

Python 3.11.x or 3.12.x (NOT 3.13)
NodeJS & NPM (for UI)
LLM provider credentials (OpenRouter, Ollama, or any OpenAI-compatible endpoint)

Self-Hosted (Docker)

Docker and Docker Compose
LLM provider credentials in .env
See https://github.com/Skyvern-AI/skyvern for full setup

Resources

Resource	URL
llms.txt	https://skyvern.com/docs/llms.txt
GitHub	https://github.com/Skyvern-AI/skyvern
Docs (new)	https://docs-new.skyvern.com
Docs (legacy)	https://docs.skyvern.com
API Reference	https://docs.skyvern.com/api-reference
PyPI	https://pypi.org/project/skyvern
MCP Docs	https://www.skyvern.com/docs/getting-started/mcp
LLM Config	https://www.skyvern.com/docs/self-hosted/llm-configuration
OpenRouter Blog	https://www.skyvern.com/blog/surprise-launch-day-2-openrouter-support-is-live-in-skyvern/
config_registry.py	https://github.com/Skyvern-AI/skyvern/blob/main/skyvern/forge/sdk/api/llm/config_registry.py
.env.example	https://github.com/Skyvern-AI/skyvern/blob/main/.env.example