web-scraper

star 0

Local firecrawl web scraping, crawling, site mapping, web search, structured extraction, llms.txt generation, and research synthesis. Activates when the user asks to scrape a URL, crawl a website, map site URLs, search the web, extract structured data from pages, generate llms.txt, batch scrape multiple URLs, cancel a crawl, list active crawls, convert a web page to markdown, fetch page content, or research a question using web search and scrape. Uses the self-hosted firecrawl instance on localhost:3002 — no API keys or subscriptions required.

f5xc-salesdemos By f5xc-salesdemos schedule Updated 6/2/2026

name: web-scraper description: >- Local firecrawl web scraping, crawling, site mapping, web search, structured extraction, llms.txt generation, and research synthesis. Activates when the user asks to scrape a URL, crawl a website, map site URLs, search the web, extract structured data from pages, generate llms.txt, batch scrape multiple URLs, cancel a crawl, list active crawls, convert a web page to markdown, fetch page content, or research a question using web search and scrape. Uses the self-hosted firecrawl instance on localhost:3002 — no API keys or subscriptions required. user-invocable: false

Web Scraper (Local Firecrawl)

This skill provides web scraping, crawling, URL mapping, web search, LLM-powered extraction, and llms.txt generation via the local self-hosted firecrawl API. All operations run against http://localhost:3002 with no authentication required.

Delegate to the firecrawl-operator agent to keep API payloads out of the main session context.

Capabilities

Operation What it does Endpoint Type
Scrape Extract content from a single URL POST /v1/scrape Sync
Batch Scrape Scrape multiple URLs at once POST /v1/batch/scrape Async
Crawl Crawl multiple pages from a starting URL POST /v1/crawl Async
Crawl Cancel Cancel a running crawl job DELETE /v1/crawl/:id Sync
Crawl Active List all active crawl jobs GET /v1/crawl/active Sync
Crawl Errors Get error details for a crawl GET /v1/crawl/:id/errors Sync
Map Discover all URLs on a site POST /v1/map Sync
Search Web search with optional scraping POST /v1/search Sync
Extract LLM-powered structured data extraction POST /v1/extract Async
llms.txt Generate llms.txt for a site POST /v1/llmstxt Async
Research Search + scrape + synthesize answer Multiple Sync

Delegation Protocol

When this skill activates, delegate immediately to the firecrawl-operator agent.

For scrape requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Scrape: [URL in 3 words]",
  prompt="PROTOCOL: SCRAPE\nURL: <the target URL>\nFORMATS: <requested formats or 'markdown'>\nOPTIONS: <any user-specified options like onlyMainContent, waitFor, etc.>\n\nScrape this URL and return the content."
)

For batch scrape requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Batch scrape: [count] URLs",
  prompt="PROTOCOL: BATCH_SCRAPE\nURLS: <comma-separated list of URLs>\nFORMATS: <requested formats or 'markdown'>\n\nBatch scrape these URLs and return results."
)

For crawl requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Crawl: [URL in 3 words]",
  prompt="PROTOCOL: CRAWL\nURL: <the target URL>\nLIMIT: <page limit, default 10>\nOPTIONS: <any user-specified options like maxDepth, includePaths, etc.>\n\nCrawl this site and return page summaries."
)

For crawl management requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Crawl mgmt: [action]",
  prompt="PROTOCOL: CRAWL_CANCEL|CRAWL_ACTIVE|CRAWL_ERRORS\nJOB_ID: <if applicable>\n\nExecute the requested crawl management operation."
)

For map requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Map: [URL in 3 words]",
  prompt="PROTOCOL: MAP\nURL: <the target URL>\nOPTIONS: <any user-specified options like search, includeSubdomains, etc.>\n\nMap all URLs on this site."
)

For search requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Search: [query in 3 words]",
  prompt="PROTOCOL: SEARCH\nQUERY: <the search query>\nLIMIT: <result limit, default 5>\nOPTIONS: <any options like lang, country, tbs, scrapeOptions>\n\nSearch the web and return results."
)

For extract requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="Extract: [what] from [URL]",
  prompt="PROTOCOL: EXTRACT\nURLS: <target URLs>\nPROMPT: <what to extract>\nSCHEMA: <JSON schema if user specified one>\n\nExtract structured data from these URLs."
)

For llms.txt requests

Agent(
  subagent_type="firecrawl:firecrawl-operator",
  description="llms.txt: [URL in 3 words]",
  prompt="PROTOCOL: LLMSTXT\nURL: <the target URL>\n\nGenerate an llms.txt file for this site."
)

For research requests

Agent(
  subagent_type="firecrawl:firecrawl-researcher",
  description="Research: [topic in 3 words]",
  prompt="QUESTION: <the user's natural language question>\nLIMIT: <number of sources, default 5>\nDOMAINS: <comma-separated domain scope, or 'none'>\n\nResearch this question using web search + scrape and return a structured report."
)

Wait for the agent's response and relay it directly to the user.

When to activate

  • "scrape this URL" / "scrape https://..."
  • "batch scrape" / "scrape these URLs" / "scrape multiple pages"
  • "crawl this site" / "crawl https://..."
  • "cancel the crawl" / "stop the crawl" / "list active crawls"
  • "map this site" / "find all URLs on"
  • "search for" / "search the web for" / "web search"
  • "extract data from" / "extract structured data" / "pull fields from"
  • "generate llms.txt" / "create llms.txt" / "make an llms.txt"
  • "convert this page to Markdown"
  • "fetch page content" / "get Markdown from"
  • Any request to read, extract, or search web content
  • "research this" / "find out if" / "does [product] support"
  • "is [feature] available in" / "answer this question"
  • "look up" / "investigate" / "what do we know about"
  • Any natural language question that needs web research to answer

What this does NOT do

  • No cloud API — uses local self-hosted instance only
  • No API keys — no FIRECRAWL_API_KEY needed
  • No browser sessions — cloud-only feature
  • No deep research — cloud-only feature
  • Extract requires LLM proxy — needs OPENAI_BASE_URL configured
  • Research requires working search — firecrawl SEARCH endpoint must be operational
Install via CLI
npx skills add https://github.com/f5xc-salesdemos/marketplace --skill web-scraper
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator
f5xc-salesdemos
f5xc-salesdemos Explore all skills →