scrapling

star 2.3k

CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).

foryourhealth111-pixel By foryourhealth111-pixel schedule Updated 3/5/2026

name: "scrapling" description: "CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt)."

Scrapling Skill (VCO)

Scrapling is a Python-based web scraping / extraction toolkit that exposes:

  • a CLI (scrapling ...) for fetching + extracting content into files
  • an optional MCP server (scrapling mcp) so an agent can call structured scraping tools

This skill is CLI-first. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).

When to use

Use scrapling when you need:

  • Extract specific parts of a web page (CSS selector / XPath) into .txt / .md / .html
  • Run repeatable scraping jobs (batch URLs with a small wrapper script)
  • Reduce token usage by extracting only the relevant DOM region before passing to the LLM
  • Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)

Boundaries (vs Playwright / Search)

vs playwright

  • scrapling: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
  • playwright: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)

If you must navigate or click through a UI, use playwright. If you can directly fetch the target page and just need extraction, use scrapling.

vs search tools

  • Search tools are for discovering sources/URLs (query → result list → choose URLs).
  • scrapling is for acquisition + extraction once you already know the URL(s).

A common pipeline:

  1. Search → find candidate URLs
  2. Scrapling → extract focused content from chosen URLs
  3. LLM → summarize / transform / analyze extracted outputs

Prerequisite check (required)

  1. Python version (Scrapling requires Python >= 3.10):
python --version
  1. Scrapling CLI availability:
scrapling --help

Installation (recommended)

Scrapling’s CLI and MCP features are enabled via extras.

Recommended (CLI + MCP + fetchers):

python -m pip install "scrapling[ai]"

If you only want CLI fetch/extract without MCP:

python -m pip install "scrapling[fetchers]"

If you use browser-based fetchers, you may need browser binaries:

# Option A: via Scrapling helper (after install)
scrapling install

# Option B: directly via Playwright
python -m playwright install

Wrapper script (Windows convenience)

This skill ships a thin PowerShell wrapper:

  • C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1

It checks whether scrapling exists and prints install hints if missing.

Common CLI patterns

1) Extract full page body (to Markdown)

scrapling extract get "https://example.com" out.md

2) Extract a specific element (CSS selector) to text

scrapling extract get "https://example.com" out.txt --css-selector "main article"

3) Extract HTML for downstream parsing

scrapling extract get "https://example.com" out.html --css-selector "#content"

4) Use browser-backed fetcher mode (when simple GET is blocked / dynamic)

scrapling extract fetch "https://example.com" out.md --css-selector "main"

Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.

MCP server relationship (optional)

Scrapling can run as an MCP server. This is useful when:

  • the agent needs tool-style scraping calls
  • you want scraping results to be structured and deterministic

Start MCP server (stdio transport by default):

scrapling mcp

Optional: run MCP server with HTTP transport:

scrapling mcp --http --host 127.0.0.1 --port 8765

Example MCP server config snippet

{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}

Safety & ops notes

  • Prefer selector-based extraction to minimize data volume.
  • Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
  • For aggressive bot protection, consider switching fetchers or using playwright.
Install via CLI
npx skills add https://github.com/foryourhealth111-pixel/Vibe-Skills --skill scrapling
Repository Details
star Stars 2,266
call_split Forks 168
navigation Branch main
article Path SKILL.md
Occupations
More from Creator
foryourhealth111-pixel
foryourhealth111-pixel Explore all skills →