name: firecrawl
description: "Use when scraping web pages, extracting content from JS-rendered sites or SPAs, running search-plus-scrape workflows, or mapping entire site URL trees. Produces clean LLM-friendly Markdown. Prefer over WebFetch when JavaScript execution is required. Keywords: web scraping, fetch URL, scrape website, search web, extract content, SPA, JS-rendered, site map, crawl, Firecrawl. Triggers on 'scrape website' or 'extract web content'."
Firecrawl CLI
When to Use
Load this skill when the user request matches the frontmatter description for Firecrawl CLI.
Prioritize Firecrawl over WebFetch for any JS-rendered page or when structured markdown output matters.
NEVER
- Never scrape serially when doing 6+ URLs: 10 sequential scrapes take 50+ seconds; parallel takes 5-8 seconds. No error signals the problem; it just runs slowly.
- Never read an entire .firecrawl/*.md output file into context without checking size first: scraped pages routinely exceed 5000 lines. Use wc -l, then grep/head to extract what you need.
- Never use Firecrawl for real-time data (stock prices, sports scores): scraping is 10+ seconds stale and costs credits per request; use direct APIs.
- Never use Firecrawl for sites with official SDKs/APIs (e.g., GitHub → use gh).
- Never omit the -o flag: without it, output goes to stdout only and isn't persisted. Credits wasted, re-scraping required.
- Never skip firecrawl --status before authenticated scraping: silent auth failures return empty output, not errors.
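The size check in the list above can be wrapped in a small helper. This is a minimal sketch, not part of the Firecrawl CLI: the function name peek_scrape and the 200-line default are assumptions chosen for illustration.

```shell
# peek_scrape FILE [MAX_LINES]  (hypothetical helper, not a Firecrawl command)
# Prints the whole file when it is small; otherwise prints a one-line notice
# with the line count, followed by only the first MAX_LINES lines.
# MAX_LINES defaults to 200, an assumed threshold.
peek_scrape() {
  file=$1
  max=${2:-200}
  # wc -l pads its output on some platforms, so strip spaces.
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -le "$max" ]; then
    cat "$file"
  else
    echo "[$file has $lines lines; showing first $max]"
    head -n "$max" "$file"
  fi
}
```

Calling peek_scrape .firecrawl/example.md before reading a scrape keeps an oversized page out of context; grep can then target the sections that matter.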
Tool Selection Decision Tree
Need web content?
│
├─ Single known URL
│ ├─ Static HTML → WebFetch (faster, free)
│ ├─ JS-rendered / SPA → Firecrawl --wait-for
│ ├─ Need structured markdown → Firecrawl
│ └─ Behind auth/paywall → Firecrawl (after firecrawl login)
│
├─ Search + scrape
│ ├─ Just URLs/titles → WebSearch (lighter, faster)
│ ├─ Top 5-10 results with content → firecrawl search --scrape
│ └─ Deep research (20+ sources) → parallel Firecrawl
│
├─ Discover all pages on a domain → firecrawl map
│
└─ Real-time data → Direct API only
Scale Decision
| Page count | Approach |
|---|---|
| 1–5 | Serial with -o flags |
| 6–50 | Parallel with & and wait |
| 50+ | xargs -P 10 with concurrency check first |
Always check capacity before bulk runs: firecrawl --status shows Concurrency: X/100.
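The table above can be expressed as a small selector. This is a sketch under one stated assumption: the table lists 50 in two rows, and here exactly 50 URLs is treated as "parallel". The function name choose_approach is hypothetical.

```shell
# choose_approach COUNT  (hypothetical helper)
# Echoes "serial", "parallel", or "xargs" following the scale table.
# Assumption: exactly 50 URLs still uses the "parallel" row.
choose_approach() {
  count=$1
  if [ "$count" -le 5 ]; then
    echo serial
  elif [ "$count" -le 50 ]; then
    echo parallel
  else
    echo xargs
  fi
}
```

For a URL list on disk, choose_approach "$(grep -c . urls.txt)" counts the non-empty lines and picks the matching row.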
Core Commands
# Search the web
firecrawl search "query" -o .firecrawl/search.json --json
firecrawl search "query" --scrape -o .firecrawl/results.json --json
firecrawl search "AI news" --tbs qdr:d -o .firecrawl/today.json --json # Past day
# Scrape single page
firecrawl scrape https://example.com -o .firecrawl/example.md
firecrawl scrape https://example.com --only-main-content -o .firecrawl/clean.md
firecrawl scrape https://spa.com --wait-for 3000 -o .firecrawl/spa.md
# Map a site
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt
Parallel Bulk Scraping
# Small batch — & with wait
firecrawl scrape site1.com -o .firecrawl/1.md &
firecrawl scrape site2.com -o .firecrawl/2.md &
firecrawl scrape site3.com -o .firecrawl/3.md &
wait
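# Large batch: xargs. The URL is passed as $1 so the inner shell never re-parses it.
# md5 ships with macOS/BSD; on Linux, swap in: md5sum | cut -c1-32
xargs -P 10 -I {} sh -c 'firecrawl scrape "$1" -o ".firecrawl/$(printf %s "$1" | md5).md"' _ {} < urls.txt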
# Post-scrape extraction
grep "^# " .firecrawl/*.md # All H1 headings
grep -l "keyword" .firecrawl/*.md # Files matching keyword
jq -r '.data.web[].title' .firecrawl/*.json # JSON title extraction
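A bare wait with no arguments returns 0 even when a background job failed, so a failed scrape in a small batch can go unnoticed. The sketch below waits on each job individually and counts failures. The function name scrape_batch and the SCRAPE_CMD override are hypothetical; SCRAPE_CMD exists only so the logic can be dry-run without the CLI installed.

```shell
# scrape_batch OUTDIR URL...  (hypothetical helper for sh/bash)
# Starts one background scrape per URL, writing OUTDIR/1.md, OUTDIR/2.md, ...
# Prints the number of failed jobs and returns non-zero if any failed.
# SCRAPE_CMD is an assumed override hook; it defaults to "firecrawl scrape".
scrape_batch() {
  outdir=$1
  shift
  mkdir -p "$outdir"
  pids=""
  i=0
  for url in "$@"; do
    i=$((i + 1))
    # Intentionally unquoted so "firecrawl scrape" splits into two words.
    ${SCRAPE_CMD:-firecrawl scrape} "$url" -o "$outdir/$i.md" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # wait PID returns that job's exit status, unlike a bare wait.
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
  [ "$failed" -eq 0 ]
}
```

For example, scrape_batch .firecrawl site1.com site2.com site3.com prints 0 when every scrape exited cleanly.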
Authentication
firecrawl --status # Check auth and credit status
firecrawl login --browser # Auto-opens browser — don't ask user to run manually
export FIRECRAWL_API_KEY=your_key # Fallback if browser auth fails
Error Quick Reference
| Error | First check | Fix |
|---|---|---|
| Not authenticated | firecrawl --status | firecrawl login --browser |
| Concurrency limit | firecrawl --status (shows X/100) | Wait for jobs, reduce -P value |
| Page failed to load | curl -I URL (basic connectivity) | Add --wait-for 5000; try --format html to inspect raw HTML |
| Output file empty | head -20 output.md | Add --only-main-content; try --include-tags article,main |
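For the concurrency-limit row, the free capacity can be read from the status text before choosing a -P value. This sketch assumes the status output contains a fragment of the form "Concurrency: X/Y", as described above; the exact layout of the real output is not confirmed here, and the function name free_slots is hypothetical.

```shell
# free_slots STATUS_TEXT  (hypothetical helper)
# Finds the first "Concurrency: X/Y" in the given text and echoes Y - X.
# Echoes nothing and returns 1 when the pattern is absent.
# Assumption: the status output uses exactly this "Concurrency: X/Y" wording.
free_slots() {
  pair=$(printf '%s\n' "$1" \
    | sed -n 's|.*Concurrency: *\([0-9][0-9]*\)/\([0-9][0-9]*\).*|\1 \2|p' \
    | head -n 1)
  [ -n "$pair" ] || return 1
  used=${pair% *}    # text before the space
  limit=${pair#* }   # text after the space
  echo $((limit - used))
}
```

A possible use is slots=$(free_slots "$(firecrawl --status)"), then passing the smaller of $slots and 10 to xargs -P.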
Output Organization
Always write to the .firecrawl/ directory (and add it to .gitignore):
.firecrawl/example.com.md
.firecrawl/search-ai-news.json
.firecrawl/docs-sitemap.txt
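File names like the ones above can be derived from the URL rather than typed by hand. This is one possible naming scheme, not something the CLI prescribes; the function name url_to_path is hypothetical.

```shell
# url_to_path URL EXT  (hypothetical helper)
# Builds a .firecrawl/ path from a URL: drops the scheme, any query string or
# fragment, and trailing slashes, then replaces each run of characters
# outside [A-Za-z0-9._-] with a single dash.
url_to_path() {
  slug=$(printf '%s\n' "$1" | sed \
    -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
    -e 's|[?#].*$||' \
    -e 's|/*$||' \
    -e 's|[^A-Za-z0-9._-][^A-Za-z0-9._-]*|-|g')
  echo ".firecrawl/$slug.$2"
}
```

With this scheme, firecrawl scrape "$url" -o "$(url_to_path "$url" md)" keeps output names readable. Because query strings are dropped, two URLs that differ only in their query map to the same file, so a hash-based name is the safer choice for such lists.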
Load Reference Files When
Load references/cli-options.md when: troubleshooting 3+ unknown flags, header injection, cookie handling, sitemap modes, or custom user-agents.
Load references/output-processing.md when: building 3+ step transformation pipelines, parsing nested JSON from search results, or combining/deduplicating 10+ scraped files.
Do NOT load references for basic search/scrape/map with standard flags.
Arguments
$ARGUMENTS: Search query, URL, or scraping objective. Empty = ask what to scrape.