create-provider - SKILL.md Agent Skill

name: create-provider
description: Automatically research and create LeanIX Provider fact sheets with verified data. Use this skill whenever the user mentions creating providers, adding providers, researching providers, or says phrases like "Create Provider for X", "Add Provider X", "Research provider X", or provides a provider name/URL to add to the catalog. Also trigger when the user mentions provider data collection, provider research, or vendor catalog management. This skill automates the entire workflow - always use it rather than doing ad-hoc research.

Provider Auto-Creation Skill

This skill automates the complete Provider fact sheet creation workflow using parallel research, cross-verification, and quality checks.

⚠️ CRITICAL: Read Before Proceeding ⚠️

This is a RIGID skill. Follow the workflow exactly. No ad-hoc queries. No improvisation.

Before doing ANYTHING, complete this checklist:

Pre-Flight Checklist

Read WORKFLOW_V2.md in this directory (complete workflow)
Read all 5 guidelines in guidelines/ directory:
- Provider_Classification_Definitions.md
- Provider_URL_Validation_Guidelines.md
- Provider_Headquarters_Address_Guidelines.md
- Provider_Alias_Discovery_Guidelines.md
- Provider_Description_Guidelines.md
Verify Perplexity MCP is available (mcp__perplexity__perplexity_search)
Verify WebFetch tool is available
Verify LeanIX MCP is available (mcp__LeanIX_MCP_Server_Remote__)
Read Python query templates: lib/provider_researcher.py, lib/parallel_researcher.py, lib/verification_agent.py (these show the correct query formats to use)

If you're about to skip reading WORKFLOW_V2.md or make ad-hoc Perplexity calls, STOP.

Ask yourself: "Why am I not following the documented workflow? What step did I skip?"

Then: Go back, read the documentation, start from Step 1.

When to Use This Skill

Trigger when the user says or does any of the following:

"Create Provider for [name/URL]"
"Add Provider [name]"
"Research provider [name]"
"Create a provider entry for [name]"
"I need to add [name] to the provider catalog"
Provides a provider URL and asks to catalog it

Examples:

"Create Provider for Slack"
"Add Provider Microsoft"
"Create Provider for https://teamsmart.ai/"

What This Skill Does

Automates Provider fact sheet creation through a 4-step workflow:

Parallel Research - Perplexity MCP + WebFetch run simultaneously (5-10 seconds)
Agent Verification - Cross-verify data, resolve conflicts, assign confidence scores
Quality Check - Validate confidence > 70%, all critical fields present
Create & Update - Use LeanIX MCP + Python CLI to create fact sheet

8 Fields Auto-Populated:

Homepage URL (validated from 2+ sources)
Provider Category (Individual/Enterprise/Community Based)
Aliases and former names
Headquarters address
Description (30-90 words, organization-focused)
Collection status, as-of date, deprecated flag

The Workflow (Read WORKFLOW_V2.md for Full Details)

Step 1: Parallel Research

IMPORTANT: Read WORKFLOW_V2.md for the complete parallel research protocol.

How to Use the Python Research Modules

The Python modules in lib/ are query templates that show you exactly what to ask. Read them first to understand the correct query structure, then make parallel API calls.

Step 1a: Read the Query Templates

Before making ANY queries, read these files to see the exact query format:

Read lib/provider_researcher.py - Shows the 5 Perplexity query methods
Read lib/parallel_researcher.py - Shows WebFetch extraction prompts

Step 1b: Execute Parallel Research (Use the Templates)

Now make 8 simultaneous tool calls using the query formats from the modules:

Perplexity Queries (5 calls):

URL Query - Use get_url_research_query() format from provider_researcher.py
Category Query - Use get_category_research_query() format
Aliases Query - Use get_aliases_research_query() format
Headquarters Query - Use get_headquarters_research_query() format
Description Query - Use get_description_research_query() format

WebFetch Queries (3 calls):

Homepage - Use _get_homepage_extraction_prompt() format from parallel_researcher.py
About Page - Use _get_about_extraction_prompt() format
Contact Page - Use _get_contact_extraction_prompt() format

Example Pattern:

# Read the template first
Read lib/provider_researcher.py

# Then make the call using that query format
mcp__perplexity__perplexity_search(
    query="<exact query from get_url_research_query() method>"
)

DO NOT improvise queries. DO NOT skip reading the modules. The templates ensure consistency with the guidelines.
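
A minimal sketch of the 8-call fan-out, assuming each tool call can be wrapped in a plain Python callable. `run_parallel`, `CALL_LABELS`, and the stub `fake_call` are hypothetical names for illustration only; in practice the calls are tool invocations and the query text comes from the template methods in lib/.

```python
from concurrent.futures import ThreadPoolExecutor

# Labels for the 8 calls in Step 1b: 5 Perplexity queries + 3 WebFetch extractions.
CALL_LABELS = [
    "perplexity:url", "perplexity:category", "perplexity:aliases",
    "perplexity:headquarters", "perplexity:description",
    "webfetch:homepage", "webfetch:about", "webfetch:contact",
]

def run_parallel(calls):
    """Run every callable in `calls` (label -> zero-arg function) concurrently.

    A failing call is recorded as None so that one bad source does not
    abort the whole batch.
    """
    def safe(fn):
        try:
            return fn()
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = {label: pool.submit(safe, fn) for label, fn in calls.items()}
        return {label: future.result() for label, future in futures.items()}

# Hypothetical stand-in for a real tool call.
def fake_call(label):
    return lambda: f"result for {label}"

results = run_parallel({label: fake_call(label) for label in CALL_LABELS})
```

Recording failures as None keeps the batch usable when only one source responds, which is what the error-handling scenarios later in this document rely on.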

Step 2: Agent Verification

IMPORTANT: Read WORKFLOW_V2.md for complete verification protocol.

Use lib/verification_agent.py to:

Compare Perplexity vs WebFetch for each field
Identify conflicts
Apply resolution rules:
- Both agree → Use agreed value (HIGH confidence)
- One has data → Use available data (MEDIUM confidence)
- Both differ → Run verification query, choose most reliable
- Both fail → Mark "Not Found", ask user
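
The four resolution rules can be sketched as a single function. This is an illustration, not the contents of lib/verification_agent.py: the function name, the `verify` callback, the case-insensitive comparison, and the "NONE"/"UNRESOLVED" labels are assumptions, and the confidence assigned after a verification query is not specified in this document.

```python
NOT_FOUND = "Not Found"

def resolve_field(perplexity_value, webfetch_value, verify=None):
    """Apply the resolution rules to one field and return (value, confidence).

    `verify` is a hypothetical callable that runs the verification query
    and returns the more reliable of the two conflicting values.
    """
    p = perplexity_value or None  # treat empty strings as missing
    w = webfetch_value or None
    if p is None and w is None:
        return NOT_FOUND, "NONE"  # both fail: ask the user
    if p is None or w is None:
        return (p if p is not None else w), "MEDIUM"  # one has data
    if p.strip().lower() == w.strip().lower():
        return p, "HIGH"  # both agree
    if verify is not None:
        return verify(p, w), "MEDIUM"  # both differ: MEDIUM here is an assumption
    return None, "UNRESOLVED"  # both differ and no verification available
```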

Marketing Language Filter: For ALL descriptions, check for buzzwords:

Detect: "seamless", "transform", "enhance", "streamline", "empower", "revolutionary", "cutting-edge", "enterprise-grade", "leading", "innovative", "powerful"
Rewrite: Factual statements only ("integrates with" not "seamless integration")
Read MARKETING_LANGUAGE_FILTER.md for complete list
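
A minimal detection sketch using only the buzzwords listed above. The function name and the stem-matching approach are assumptions; the complete list lives in MARKETING_LANGUAGE_FILTER.md, and the rewrite itself still has to be done by hand.

```python
import re

# Buzzwords named in this section; the full list is in MARKETING_LANGUAGE_FILTER.md.
BUZZWORDS = [
    "seamless", "transform", "enhance", "streamline", "empower",
    "revolutionary", "cutting-edge", "enterprise-grade", "leading",
    "innovative", "powerful",
]

# \w* after each stem also catches inflections such as "transforms" or "seamlessly".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in BUZZWORDS) + r")\w*",
    re.IGNORECASE,
)

def find_buzzwords(text):
    """Return the distinct buzzword stems found in `text`, in order of first appearance."""
    found = []
    for match in _PATTERN.finditer(text):
        stem = match.group(1).lower()
        if stem not in found:
            found.append(stem)
    return found
```

An empty result means the description passes this filter; any hit means the sentence containing it should be rewritten as a factual statement.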

Output: Verified data with confidence scores per field

Step 3: Quality Check

Before creating the fact sheet, verify:

✅ Overall confidence > 70%
✅ Critical fields present (URL, Category)
✅ No unresolved conflicts
✅ Description word count valid (30-90 words)

If any check fails, present options:

Save with partial data
Manual input for missing fields
Cancel operation
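
The four checks can be sketched as one function that returns the list of failures. The dict keys used here (`overall_confidence`, `url`, `category`, `unresolved_conflicts`, `description`) are hypothetical; the real data structure is defined by the modules in lib/.

```python
def quality_check(verified):
    """Return a list of failed checks; an empty list means the fact sheet may be created."""
    failures = []
    # Confidence is assumed to be a fraction between 0 and 1.
    if not verified.get("overall_confidence", 0) > 0.70:
        failures.append("overall confidence is not above 70%")
    for field in ("url", "category"):
        if verified.get(field) in (None, "", "Not Found"):
            failures.append(f"critical field missing: {field}")
    if verified.get("unresolved_conflicts"):
        failures.append("unresolved conflicts remain")
    words = len((verified.get("description") or "").split())
    if not 30 <= words <= 90:
        failures.append(f"description has {words} words, expected 30-90")
    return failures
```

If the list is non-empty, present the three options above instead of creating the fact sheet.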

Step 4: Create & Update

# 4.1 Create fact sheet via LeanIX MCP
fact_sheet_id = mcp__LeanIX_MCP_Server_Remote__create_fact_sheet(
    name=provider_name,
    type="Provider"
)

# 4.2 Update custom fields via Python CLI
# Run: python create-provider/main.py update --fact-sheet-id {id} --type Provider --fields '{...}'
from datetime import date  # needed for asOfDate below

update_fields = {
    "homePageUrl": verified_data.url,
    "providerCategory": verified_data.category,
    "aliases": verified_data.aliases,
    "headquartersAddress": verified_data.headquarters,
    "description": verified_data.description,
    "collectionStatus": "inReview",
    "asOfDate": date.today().isoformat(),  # today's date; ISO 8601 is an assumption, see FIELD_REFERENCE.md
    "deprecated": "no"
}

# 4.3 Return URL
return f"https://{subdomain}.leanix.net/{workspace}/factsheet/Provider/{fact_sheet_id}"

Guidelines (Must Read)

Before researching ANY field, read the corresponding guideline:

Provider Classification

File: guidelines/Provider_Classification_Definitions.md

3 categories with decision tree:

Individual: Single named person/personal portfolio
Enterprise: Commercial company with formal business operations
Community Based: Open-source projects, foundations, collaborative initiatives

Decision order matters: Check Individual first, then Enterprise, then Community.
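
A sketch of why the order matters: the first matching rule wins, so a provider that looks both personal and commercial is classified as Individual. The three boolean inputs are hypothetical stand-ins for the real criteria in the guideline file, and the "Not Found" fallback is an assumption.

```python
def classify_provider(is_single_person, is_commercial_company, is_community_project):
    """Return the category, checking Individual, then Enterprise, then Community Based."""
    if is_single_person:
        return "Individual"
    if is_commercial_company:
        return "Enterprise"
    if is_community_project:
        return "Community Based"
    return "Not Found"  # no rule matched: escalate rather than guess
```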

URL Validation

File: guidelines/Provider_URL_Validation_Guidelines.md

NEVER HALLUCINATE URLS. Core principles:

Find in 2+ authoritative sources
Validate HTTP status code (200 OK)
Check SSL certificate
Return "Not Found" if uncertain
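
A minimal sketch of these principles using only the Python standard library. `fetch_status` and `validate_url` are hypothetical names, and requiring an https URL is an assumption drawn from the SSL check; the authoritative rules are in the guideline file.

```python
import http.client
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return the HTTP status for `url`, or None on any network, HTTP, or TLS error.

    urllib verifies the TLS certificate by default for https URLs, so an
    invalid certificate surfaces as an error and yields None.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None

def validate_url(url, source_count, status_lookup=fetch_status):
    """Accept `url` only if 2+ sources cite it, it uses https, and it answers 200 OK."""
    if source_count < 2 or not url.lower().startswith("https://"):
        return "Not Found"
    return url if status_lookup(url) == 200 else "Not Found"
```

`status_lookup` is injectable so the decision logic can be exercised without network access.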

Headquarters Address

File: guidelines/Provider_Headquarters_Address_Guidelines.md

Priority order:

Official website (best)
Business registries (good)
News articles (acceptable)

Partial data is OK: "San Francisco, CA, United States" is better than guessing full street address.
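
The priority order can be sketched as a ranked lookup. The source-type keys and the function name are hypothetical; a partial address from a higher-ranked source still wins over a fuller one from a lower-ranked source.

```python
# Lower rank = more trusted, following the priority order above.
SOURCE_RANK = {"official_website": 0, "business_registry": 1, "news_article": 2}

def pick_headquarters(candidates):
    """Pick the address from the most trusted source.

    `candidates` is a list of (source_type, address) pairs; unknown source
    types and empty addresses are ignored.
    """
    usable = [(SOURCE_RANK[s], a) for s, a in candidates if s in SOURCE_RANK and a]
    if not usable:
        return "Not Found"
    return min(usable, key=lambda pair: pair[0])[1]
```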

Alias Discovery

File: guidelines/Provider_Alias_Discovery_Guidelines.md

7 alias types to check:

URL variations (with/without www)
Former company names (pre-rebrand)
Acquisition history (acquired companies)
Abbreviations (IBM for International Business Machines)
Legal entity names (Inc, Corp, Ltd)
DBA names (Doing Business As)
Stylistic variations (capitalization, punctuation)
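
Two of these alias types can be derived mechanically. The helpers below are a sketch with hypothetical names and a deliberately short suffix list; they only produce candidates, and each candidate still needs to be confirmed against a source before it is recorded.

```python
from urllib.parse import urlsplit

LEGAL_SUFFIXES = ("Inc", "Corp", "Ltd")  # the suffixes named above; not exhaustive

def url_variations(url):
    """Return the host without and with a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    bare = host[4:] if host.startswith("www.") else host
    return [bare, "www." + bare] if bare else []

def strip_legal_suffix(name):
    """Drop a trailing legal suffix such as 'Inc' or 'Ltd.' from a company name."""
    words = name.replace(",", " ").split()
    if words and words[-1].rstrip(".") in LEGAL_SUFFIXES:
        words = words[:-1]
    return " ".join(words)
```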

Description Writing

File: guidelines/Provider_Description_Guidelines.md

Requirements:

30-90 words
Organization-focused (not product-focused)
Factual and objective (no marketing language)
Extract from official website
Remove buzzwords (see MARKETING_LANGUAGE_FILTER.md)

Python Modules

lib/provider_researcher.py

Executes single-provider research via Perplexity with 5 specialized queries.

Use when: You need to research one provider using Perplexity MCP.

lib/parallel_researcher.py

Coordinates parallel research across Perplexity + WebFetch.

Use when: Starting Step 1 of the workflow.

lib/verification_agent.py

Compares results from multiple sources and resolves conflicts.

Use when: Starting Step 2 of the workflow (after parallel research completes).

main.py (Python CLI)

Updates custom fields via GraphQL API.

Use when: Step 4, after fact sheet is created via LeanIX MCP.

Command format:

cd create-provider
python main.py update \
  --fact-sheet-id "uuid-here" \
  --type Provider \
  --fields '{"homePageUrl": "...", "providerCategory": "...", ...}'
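
When assembling this command programmatically, encoding the fields with `json.dumps` avoids hand-written quoting mistakes. A minimal sketch; `build_update_command` is a hypothetical helper, and the flags are the ones shown in the command format above.

```python
import json
import shlex

def build_update_command(fact_sheet_id, fields):
    """Return the argv list for `python main.py update` with fields encoded as JSON."""
    return [
        "python", "main.py", "update",
        "--fact-sheet-id", fact_sheet_id,
        "--type", "Provider",
        "--fields", json.dumps(fields),
    ]

argv = build_update_command(
    "uuid-here",
    {"homePageUrl": "https://teamsmart.ai/", "deprecated": "no"},
)
print(shlex.join(argv))  # shell-quoted form of the same command
```

The list form can be passed directly to `subprocess.run(argv, cwd="create-provider", check=True)`, which bypasses the shell and its quoting rules entirely.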

Output Format

After completing the workflow, present results like this:

✓ Provider created successfully!

Provider: TeamSmart AI
Overall Confidence: 88% (HIGH)

Verified Data:
├─ Homepage URL: https://teamsmart.ai/ ✓✓ (both sources)
├─ Category: Enterprise ✓✓ (both sources)
├─ Aliases: TeamSmart.ai ✓ (perplexity verified)
├─ Headquarters: San Francisco, CA, US ✓✓ (both sources)
└─ Description: 44 words ✓✓ (official website, marketing language removed)

Conflicts Resolved: 2
- Aliases: Used Perplexity (WebFetch found none)
- Description: Rewrote to remove "seamless", "transforms", "enhance"

URL: https://demo-eu-10.leanix.net/ltlsCollectionTesting/factsheet/Provider/{id}

Error Handling

Scenario 1: Perplexity Fails, WebFetch Works

✓ WebFetch: Successfully scraped website
✗ Perplexity: MCP connection failed

Action: Use WebFetch data only
Confidence: Downgraded to MEDIUM (single source)
Proceed: YES (sufficient data from official website)

Scenario 2: WebFetch Fails, Perplexity Works

✗ WebFetch: Website blocking/timeout
✓ Perplexity: Got results with citations

Action: Use Perplexity data only
Confidence: MEDIUM to HIGH (depends on citation quality)
Proceed: YES (Perplexity provides authoritative sources)

Scenario 3: Both Fail

✗ Perplexity: MCP error
✗ WebFetch: Website not accessible

Action: Cannot proceed
User prompt: "Unable to research [Provider]. Please provide data manually or try again later."

Scenario 4: Sources Conflict

⚠ Perplexity: Category = "Individual"
⚠ WebFetch: Category = "Enterprise"

Action: Run verification query to Perplexity
If still unclear: Present evidence to user and ask them to choose
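
Scenarios 1 to 3 reduce to a decision on which sources responded; Scenario 4 applies only when both succeed and disagree on a field. A sketch, with a hypothetical function name and return shape:

```python
def plan_after_research(perplexity_ok, webfetch_ok):
    """Map source availability to (action, confidence, proceed) per Scenarios 1-3."""
    if perplexity_ok and webfetch_ok:
        return ("Cross-verify both sources", "per field", True)
    if webfetch_ok:
        return ("Use WebFetch data only", "MEDIUM", True)
    if perplexity_ok:
        # Final level depends on citation quality.
        return ("Use Perplexity data only", "MEDIUM to HIGH", True)
    return ("Cannot proceed", None, False)
```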

Anti-Patterns (What NOT to Do)

❌ Making ad-hoc Perplexity queries instead of using provider_researcher.py
❌ Skipping WORKFLOW_V2.md and improvising the workflow
❌ Not reading guidelines before researching fields
❌ Hallucinating URLs instead of finding authoritative sources
❌ Using marketing language in descriptions (always filter buzzwords)
❌ Creating fact sheet before quality check (validate confidence first)
❌ Bypassing verification step (always cross-check sources)

Why This Workflow Matters

Without this workflow, you get:

Hallucinated URLs
Inconsistent categorization
Marketing language in descriptions
No confidence scores
Single source (no validation)

With this workflow, you get:

Verified data from 2+ sources
Consistent classification using decision tree
Factual, objective descriptions
Confidence scores per field
Conflict resolution with transparency

Speed: 5-10 seconds total (parallelized)
Accuracy: 88%+ confidence typical
Reliability: Fallback if one source fails

Self-Check Questions

Before creating a Provider, ask yourself:

Did I read WORKFLOW_V2.md completely?
Did I read the relevant guidelines for each field?
Am I using the documented modules (provider_researcher.py, verification_agent.py)?
Am I running parallel research rather than making ad-hoc queries?
Did I verify data from 2+ sources?
Did I check for marketing language in descriptions?
Did I run quality checks before creating the fact sheet?

If the answer to ANY question is "no" → STOP and read the documentation first.

References

WORKFLOW_V2.md - Complete workflow with examples
guidelines/ - 5 field-specific guidelines
MARKETING_LANGUAGE_FILTER.md - Buzzword detection
FIELD_REFERENCE.md - LeanIX field specifications
README.md - Python CLI usage
lib/ - Python modules for research and verification

Getting Started

When user says "Create Provider for [name]":

Read WORKFLOW_V2.md (don't skip this)
Read guidelines for all 5 fields
Execute Step 1: Parallel research (use parallel_researcher.py)
Execute Step 2: Agent verification (use verification_agent.py)
Execute Step 3: Quality check (confidence > 70%?)
Execute Step 4: Create & update (LeanIX MCP + Python CLI)
Present results with confidence scores and verification details

Remember: This is a RIGID workflow. Follow it exactly. No shortcuts.