name: salesforce-help-site-scraper description: 'Scrape Salesforce Help articles into clean Markdown with consent handling and content cleanup. Use when you need an internal, readable snapshot of Help content for research or documentation support.' license: Forward Proprietary compatibility: VS Code 1.x+, Node.js 18+
Salesforce Help Site Scraper
Use this skill to extract Salesforce Help article content into clean Markdown when pages render dynamically or are blocked by consent banners.
When to Use This Skill
- You need a readable Markdown snapshot of a Help article for internal research.
- OneTrust cookie banners block access to the main content.
- You want to remove headers, footers, or navigation chrome before extraction.
- NOT for: high-volume crawling, bypassing access controls, or republishing Salesforce content.
Prerequisites
- Node.js 18+
- Scraper script at
skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js
How to Use
Basic Usage
node skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js \
--url "https://help.salesforce.com/s/articleView?id=sf.flow.htm&type=5" \
--out "./artifacts/online-research/help_flow_overview.md" \
--consent-selector "#onetrust-accept-btn-handler" \
--remove-selectors "header,footer,nav,aside" \
--wait 2500
Script Options
| Option | Required | Description |
|---|---|---|
--url |
Yes | Target Help article URL. |
--out |
Yes | Output Markdown file path. |
--consent-selector |
No | Selector for cookie/consent accept button (OneTrust). |
--remove-selectors |
No | Comma-separated selectors to remove before extraction. |
--wait |
No | Milliseconds to wait after navigation or consent click. |
Compliance Notes
- Prefer the Salesforce Knowledge APIs for structured, supported access where possible.
- Check and respect
robots.txtbefore scraping. - Do not republish or redistribute Salesforce Help content.
- Attribute content to Salesforce when used internally.
Examples
Example: Capture a Flow Help article
node skills/salesforce-help-site-scraper/scripts/scrape-help-to-markdown.js \
--url "https://help.salesforce.com/s/articleView?id=sf.flow_build.htm&type=5" \
--out "./artifacts/online-research/help_flow_build.md" \
--consent-selector "#onetrust-accept-btn-handler" \
--remove-selectors "header,footer,nav,aside" \
--wait 2500
Troubleshooting
Issue: Output is empty or too short
Solution: Increase --wait or refine --remove-selectors to avoid removing the main content container.
Issue: Consent banner blocks content
Solution: Provide --consent-selector for the OneTrust accept button.
References
- Salesforce Help: https://help.salesforce.com/
- Salesforce Knowledge Developer Guide: https://developer.salesforce.com/docs/atlas.en-us.knowledge_dev.meta/knowledge_dev/
- Salesforce REST API - Knowledge Resources: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_knowledge.htm
- Salesforce Master Subscription Agreement: https://www.salesforce.com/company/legal/agreements/
- Salesforce Intellectual Property and Trademarks: https://www.salesforce.com/company/legal/intellectual/
- Robots Exclusion Protocol (RFC 9309): https://www.rfc-editor.org/rfc/rfc9309