salesforce-help-site-scraper


Scrape Salesforce Help articles into clean Markdown with consent handling and content cleanup. Use when you need an internal, readable snapshot of Help content for research or documentation support.

By taurgis. Updated 2/3/2026.

name: salesforce-help-site-scraper
description: 'Scrape Salesforce Help articles into clean Markdown with consent handling and content cleanup. Use when you need an internal, readable snapshot of Help content for research or documentation support.'
license: Forward Proprietary
compatibility: VS Code 1.x+, Node.js 18+

Salesforce Help Site Scraper

Use this skill to extract Salesforce Help article content into clean Markdown when pages render dynamically or are blocked by consent banners.

When to Use This Skill

  • You need a readable Markdown snapshot of a Help article for internal research.
  • OneTrust cookie banners block access to the main content.
  • You want to remove headers, footers, or navigation chrome before extraction.
  • NOT for: high-volume crawling, bypassing access controls, or republishing Salesforce content.

Prerequisites

  • Node.js 18+
  • Scraper script at skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js

How to Use

Basic Usage

node skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js \
  --url "https://help.salesforce.com/s/articleView?id=sf.flow.htm&type=5" \
  --out "./artifacts/online-research/help_flow_overview.md" \
  --consent-selector "#onetrust-accept-btn-handler" \
  --remove-selectors "header,footer,nav,aside" \
  --wait 2500
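
The source of scrape-help-to-markdown.js is not shown here, so the following is only a rough sketch of the flow the flags above imply: navigate, wait, click the consent button if it exists, strip the listed selectors, and return what is left. It assumes a Puppeteer- or Playwright-style `page` object (both expose `goto`, `$`, and `evaluate`); the function name `captureArticleHtml` and its option names are hypothetical, and the conversion from HTML to Markdown is left out.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Hypothetical helper, not the actual script. `page` is assumed to be a
// Puppeteer- or Playwright-style page exposing goto(), $(), and evaluate().
async function captureArticleHtml(page, options) {
  const { url, consentSelector = null, removeSelectors = [], wait = 0 } = options;

  await page.goto(url, { waitUntil: "domcontentloaded" });
  if (wait > 0) await sleep(wait); // give client-side rendering time to finish

  if (consentSelector) {
    // $() resolves to null when nothing matches, so a missing banner is not an error.
    const button = await page.$(consentSelector);
    if (button) {
      await button.click();
      if (wait > 0) await sleep(wait); // let the banner close
    }
  }

  // This callback runs in the browser: remove page chrome, return the rest.
  return page.evaluate((selectors) => {
    for (const selector of selectors) {
      document.querySelectorAll(selector).forEach((node) => node.remove());
    }
    return document.body.innerHTML;
  }, removeSelectors);
}
```

Waiting once after navigation and again after the consent click mirrors the description of `--wait`; whether the real script waits in both places is an assumption.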

Script Options

Option              Required  Description
--url               Yes       Target Help article URL.
--out               Yes       Output Markdown file path.
--consent-selector  No        Selector for cookie/consent accept button (OneTrust).
--remove-selectors  No        Comma-separated selectors to remove before extraction.
--wait              No        Milliseconds to wait after navigation or consent click.
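
To make the option semantics concrete, here is a minimal sketch of how these flags could be parsed and validated with Node's built-in `util.parseArgs` (available from Node.js 18.3). It is not taken from the script itself: the function name `parseScraperArgs`, the returned property names, and the default of 0 for `--wait` are all assumptions.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical parser for the options listed above; the real script may differ.
function parseScraperArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: "string" },
      out: { type: "string" },
      "consent-selector": { type: "string" },
      "remove-selectors": { type: "string" },
      wait: { type: "string" },
    },
    strict: true, // unknown flags throw instead of being silently ignored
  });

  if (!values.url || !values.out) {
    throw new Error("Both --url and --out are required.");
  }

  // Assumption: no --wait means no extra delay.
  const wait = values.wait === undefined ? 0 : Number(values.wait);
  if (!Number.isInteger(wait) || wait < 0) {
    throw new Error("--wait must be a non-negative integer (milliseconds).");
  }

  return {
    url: values.url,
    out: values.out,
    consentSelector: values["consent-selector"] ?? null,
    // Split on commas, trim whitespace, and drop empty entries.
    removeSelectors: (values["remove-selectors"] ?? "")
      .split(",")
      .map((selector) => selector.trim())
      .filter(Boolean),
    wait,
  };
}
```

Calling `parseScraperArgs(process.argv.slice(2))` would produce the options object for a run. Splitting on every comma is a simplification: a selector that itself contains a comma, such as `:is(nav, aside)`, would be broken apart.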

Compliance Notes

  • Prefer the Salesforce Knowledge APIs for structured, supported access where possible.
  • Check and respect robots.txt before scraping.
  • Do not republish or redistribute Salesforce Help content.
  • Attribute content to Salesforce when used internally.
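
The robots.txt check can be automated before a run. The sketch below is a deliberately simplified matcher, not a full Robots Exclusion Protocol implementation: it reads only `User-agent: *` groups, treats rule values as plain path prefixes (no `*` or `$` wildcards), and lets the longest matching rule win, with Allow winning ties. The function name `isPathAllowed` is hypothetical.

```javascript
// Simplified robots.txt check: wildcard user-agent groups and prefix rules only.
function isPathAllowed(robotsTxt, path) {
  const rules = [];
  let inWildcardGroup = false;
  let lastWasUserAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      inWildcardGroup = lastWasUserAgent ? inWildcardGroup || value === "*" : value === "*";
      lastWasUserAgent = true;
      continue;
    }
    lastWasUserAgent = false;

    // An empty Disallow value restricts nothing, so it is skipped.
    if (inWildcardGroup && (field === "allow" || field === "disallow") && value !== "") {
      rules.push({ allow: field === "allow", prefix: value });
    }
  }

  // Longest matching prefix wins; Allow wins ties; no match means allowed.
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    const longer = !best || rule.prefix.length > best.prefix.length;
    const tieForAllow = best && rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieForAllow) best = rule;
  }
  return best ? best.allow : true;
}
```

On Node.js 18+, the file contents could be fetched with the built-in `fetch` and passed in together with the pathname of the target URL. Because wildcard rules are ignored here, treat an "allowed" result as a first pass and read the file yourself when in doubt.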

Examples

Example: Capture a Flow Help article

node skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js \
  --url "https://help.salesforce.com/s/articleView?id=sf.flow_build.htm&type=5" \
  --out "./artifacts/online-research/help_flow_build.md" \
  --consent-selector "#onetrust-accept-btn-handler" \
  --remove-selectors "header,footer,nav,aside" \
  --wait 2500

Troubleshooting

Issue: Output is empty or too short

Solution: Increase --wait so dynamically rendered content has time to load, or narrow --remove-selectors so that none of the selectors match the main content container.

Issue: Consent banner blocks content

Solution: Provide --consent-selector for the OneTrust accept button.

References

Install via CLI
npx skills add https://github.com/taurgis/sfcc-dev-mcp --skill salesforce-help-site-scraper
Repository Details
Stars: 27
Forks: 9
Branch: main
Path: SKILL.md