scrapling

name: "scrapling" description: "CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt)."

Scrapling Skill (VCO)

Scrapling is a Python-based web scraping / extraction toolkit that exposes:

a CLI (scrapling ...) for fetching + extracting content into files
an optional MCP server (scrapling mcp) so an agent can call structured scraping tools

This skill is CLI-first. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).

When to use

Use scrapling when you need:

Extract specific parts of a web page (CSS selector / XPath) into .txt / .md / .html
Run repeatable scraping jobs (batch URLs with a small wrapper script)
Reduce token usage by extracting only the relevant DOM region before passing to the LLM
Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)

Boundaries (vs Playwright / Search)

vs `playwright`

scrapling: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
playwright: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)

If you must navigate or click through a UI, use playwright. If you can directly fetch the target page and just need extraction, use scrapling.

vs search tools

Search tools are for discovering sources/URLs (query → result list → choose URLs).
scrapling is for acquisition + extraction once you already know the URL(s).

A common pipeline:

Search → find candidate URLs
Scrapling → extract focused content from chosen URLs
LLM → summarize / transform / analyze extracted outputs

Prerequisite check (required)

Python version (Scrapling requires Python >= 3.10):

python --version

Scrapling CLI availability:

scrapling --help

Installation (recommended)

Scrapling’s CLI and MCP features are enabled via extras.

Recommended (CLI + MCP + fetchers):

python -m pip install "scrapling[ai]"

If you only want CLI fetch/extract without MCP:

python -m pip install "scrapling[fetchers]"

If you use browser-based fetchers, you may need browser binaries:

# Option A: via Scrapling helper (after install)
scrapling install

# Option B: directly via Playwright
python -m playwright install

Wrapper script (Windows convenience)

This skill ships a thin PowerShell wrapper:

C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1

It checks whether scrapling exists and prints install hints if missing.

Common CLI patterns

1) Extract full page body (to Markdown)

scrapling extract get "https://example.com" out.md

2) Extract a specific element (CSS selector) to text

scrapling extract get "https://example.com" out.txt --css-selector "main article"

3) Extract HTML for downstream parsing

scrapling extract get "https://example.com" out.html --css-selector "#content"

4) Use browser-backed fetcher mode (when simple GET is blocked / dynamic)

scrapling extract fetch "https://example.com" out.md --css-selector "main"

Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.

MCP server relationship (optional)

Scrapling can run as an MCP server. This is useful when:

the agent needs tool-style scraping calls
you want scraping results to be structured and deterministic

Start MCP server (stdio transport by default):

scrapling mcp

Optional: run MCP server with HTTP transport:

scrapling mcp --http --host 127.0.0.1 --port 8765

Example MCP server config snippet

{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}

Safety & ops notes

Prefer selector-based extraction to minimize data volume.
Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
For aggressive bot protection, consider switching fetchers or using playwright.