name: connector-doc-review
description: Review and fix OpenMetadata connector documentation against JSON schema and source code. Validates availableFeatures, permissions, yaml.mdx configuration, and overall completeness. Automatically fixes gaps.
user-invocable: true
argument-hint: " [--service-type=database|pipeline|dashboard|messaging|storage|search|mlmodel] [--version=v1.12.x|v1.13.x|v2.0.x-SNAPSHOT|all] [--dry-run]"
allowed-tools:
- Bash
- Read
- Glob
- Grep
- Edit
- Write
- Agent
Connector Documentation Review & Fix Skill
When to Activate
When a user asks to review, validate, audit, or fix connector documentation — checking it against the actual JSON schema and ingestion source code.
Arguments
- connector-name (required): Name of the connector (e.g.,
redshift,dynamodb,bigquery,airflow,looker,kafka) - --service-type (optional): One of
database,pipeline,dashboard,messaging,storage,search,mlmodel. Default: auto-detect from connector name. - --version (optional): Which version to review. Default:
all(reviews v1.11.x, v1.12.x, v1.13.x). - --dry-run (optional): Only report issues, don't fix them.
Directory References
DOCS_ROOT = . # The docs-om repo (current working directory)
OM_ROOT = ../OpenMetadata # Sibling directory to docs-om
SCHEMA_ROOT = ${OM_ROOT}/openmetadata-spec/src/main/resources/json/schema/entity/services
SOURCE_ROOT = ${OM_ROOT}/ingestion/src/metadata/ingestion/source
Review Process
Phase 1: Gather Ground Truth from Code
Step 1.1: Read the JSON Schema
Read the connector's JSON schema to extract the canonical list of capabilities and configuration fields.
Schema file: ${SCHEMA_ROOT}/connections/${service_type}/${connectorName}Connection.json
Extract:
- All properties — field names, types, descriptions, required fields, defaults
supports*boolean flags — these define what the connector can do- Filter pattern fields —
schemaFilterPattern,tableFilterPattern,storedProcedureFilterPattern, etc. sampleDataStorageConfig— presence indicates Sample Data support- Authentication references —
$refto auth schemas (basicAuth, iamAuthConfig, awsCredentials, gcpCredentials, etc.). Build a list of supported authentication types (e.g., "Basic Auth", "IAM Auth", "OAuth2", "API Key", "GCP Credentials") from theauthTypeproperty'soneOf/anyOfreferences. - SSL configuration —
sslMode,sslConfig,verifySSL - Required fields — the
requiredarray
Also read the parent service schema to verify the connector is registered:
Service schema: ${SCHEMA_ROOT}/${serviceType}Service.json
Step 1.2: Read the Source Code
Read these files for the connector:
${SOURCE_ROOT}/${service_type}/${connector_name}/metadata.py — Source class, capabilities
${SOURCE_ROOT}/${service_type}/${connector_name}/connection.py — Connection, test steps, permissions
${SOURCE_ROOT}/${service_type}/${connector_name}/service_spec.py — Service spec (lineage, usage, profiler classes)
Extract:
- Test connection steps — the
test_fndictionary intest_connection(). Each key represents a permission/capability the connector validates. - Service spec classes — which source classes are registered (lineage_source_class, usage_source_class, profiler_class). Their presence confirms feature support.
- Source class mixins/base classes — what the source extends (e.g.,
LifeCycleQueryMixin,MultiDBSource,CommonNoSQLSource) - Owner extraction — search for
yield_tagor owner-related methods in the source to determine if Owners/Tags are supported - Any permission-related comments or docstrings — hints about required IAM roles, database grants, etc.
Also check for:
${SOURCE_ROOT}/${service_type}/${connector_name}/queries.py — SQL queries (for permission requirements)
${SOURCE_ROOT}/${service_type}/${connector_name}/client.py — API client (for REST connectors)
Step 1.3: Build the Feature Truth Table
Using the schema and code, build a definitive feature map:
For Database Connectors:
| Schema Signal | Available Feature String | Unavailable Feature String |
|---|---|---|
supportsMetadataExtraction: true |
"Metadata" | — |
supportsUsageExtraction: true |
"Query Usage" | "Query Usage" if false |
supportsLineageExtraction: true |
"View Lineage" or "Lineage" | "Lineage" if false |
supportsViewLineageExtraction: true |
"View Column-level Lineage" or "Column-level Lineage" | "Column-level Lineage" if false |
supportsProfiler: true |
"Data Profiler" | "Data Profiler" if false |
supportsDBTExtraction: true |
"dbt" | "dbt" if false |
supportsDataDiff: true |
"Data Quality" | "Data Quality" if false |
storedProcedureFilterPattern present |
"Stored Procedures" | "Stored Procedures" if absent |
sampleDataStorageConfig present |
"Sample Data" | "Sample Data" if absent |
supportsProfiler: true (implicit) |
"Auto-Classification" | "Auto-Classification" if profiler false |
| Check source code for owner extraction | "Owners" if supported | "Owners" if not |
| Check source code for tag extraction | "Tags" if supported | "Tags" if not |
For Pipeline Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Pipelines" |
| Check code for status extraction | "Pipeline Status" |
supportsLineageExtraction: true or lineage source in spec |
"Lineage" |
| Check code for owner extraction | "Owners" |
| Check code for usage tracking | "Usage" |
| Check code for tag extraction | "Tags" |
For Dashboard Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Dashboards", "Charts" |
| Check code for datamodel extraction | "Datamodels" |
| Check code for project support | "Projects" |
supportsLineageExtraction: true or lineage source in spec |
"Lineage" |
| Check code for column lineage | "Column Lineage" |
| Check code for owner extraction | "Owners" |
| Check code for usage tracking | "Usage" |
| Check code for tag extraction | "Tags" |
For Messaging Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Topics" |
| Check for sample data support | "Sample Data" |
For Storage Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Metadata" |
| Check code for structured containers | "Structured Containers" |
| Check code for unstructured containers | "Unstructured Containers" |
For Search Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "Search Indexes" |
| Check for sample data support | "Sample Data" |
For ML Model Connectors:
| Schema Signal | Available Feature String |
|---|---|
| Always | "ML Features" |
| Check for hyperparameters | "Hyperparameters" |
| Check for ML store | "ML Store" |
Phase 2: Read Current Documentation
For each version being reviewed (v1.11.x, v1.12.x, v1.13.x):
Step 2.1: Read the Main Connector Page
${DOCS_ROOT}/${version}/connectors/${service_type}/${connector_name}.mdx
Extract:
availableFeaturesarray from<ConnectorDetailsHeader>unavailableFeaturesarray from<ConnectorDetailsHeader>stagevalue (PROD or BETA)- Requirements section — documented permissions, grants, IAM policies
- Connection details section — documented configuration fields
- Sections present — which optional sections exist (Query Usage, Lineage, Data Profiler, Data Quality, dbt, Troubleshooting)
- Authentication type callout — check if an
<Info>callout listing supported authentication types exists near the top of the page (after the intro text, before the table of contents)
Step 2.2: Read the YAML Page
${DOCS_ROOT}/${version}/connectors/${service_type}/${connector_name}/yaml.mdx
If it exists, extract:
- YAML example in the CodePanel — all configuration fields shown
- ContentSection definitions — field descriptions in the ContentPanel
- Optional sections — Query Usage, Lineage, Data Profiler, Auto-Classification, Data Quality
- ConnectorDetailsHeader — should match the main page
Step 2.3: Check Navigation Registration
${DOCS_ROOT}/docs.json
Verify:
- The main connector page is registered in navigation
- The yaml.mdx page is registered (if it exists)
- Pages are in the correct version section
Phase 3: Compare and Identify Issues
Run these validation checks and categorize findings:
Check 1: Available Features Accuracy
Compare availableFeatures in docs against the truth table from Phase 1.
- MISSING: Feature should be available (per schema/code) but is NOT in
availableFeatures - INCORRECT: Feature is in
availableFeaturesbut should NOT be (per schema/code) - WRONG_LIST: Feature is in
unavailableFeaturesbut should be inavailableFeatures, or vice versa
Severity: WARNING for each mismatch.
Check 2: Unavailable Features Completeness
Verify that features NOT supported are listed in unavailableFeatures. A feature that is neither available nor unavailable is confusing to users.
Severity: SUGGESTION for missing entries in unavailableFeatures.
Check 3: Permission Documentation
Compare documented permissions against:
- Test connection steps (from
connection.py) - SQL queries executed (from
queries.py) - API calls made (from
client.pyor source code)
Check for:
- MISSING_PERMISSION: A permission required by code but not documented
- EXTRA_PERMISSION: A permission documented but not actually needed
- UNCLEAR_PERMISSION: Permission listed without explanation of why it's needed
For database connectors, verify documented SQL grants match the queries. For cloud connectors (AWS/GCP/Azure), verify IAM policy actions match API calls.
Severity: WARNING for missing permissions, SUGGESTION for unclear ones.
Check 4: YAML Configuration Completeness
Compare the YAML example in yaml.mdx against the JSON schema properties:
- MISSING_FIELD: A schema property (especially required ones) not shown in YAML example
- EXTRA_FIELD: A field in YAML that doesn't exist in schema
- WRONG_DEFAULT: Default value in YAML doesn't match schema default
- MISSING_DESCRIPTION: A ContentSection is missing for a documented field
- OUTDATED_DESCRIPTION: ContentSection description doesn't match schema description
Severity: WARNING for missing required fields, SUGGESTION for optional ones.
Check 5: Section Completeness
Based on the feature truth table, verify the documentation has the right sections:
- If
supportsUsageExtraction: true→ Query Usage section should exist - If
supportsLineageExtraction: true→ Lineage section should exist - If
supportsProfiler: true→ Data Profiler section should exist - If
supportsDBTExtraction: true→ dbt section or link should exist - If
supportsDataDiff: true→ Data Quality section should exist
Severity: WARNING for missing sections.
Check 6: Cross-Version Consistency
If reviewing all versions, check that features and permissions are consistent across versions (unless a known version difference exists).
Severity: SUGGESTION for inconsistencies.
Check 7: Authentication Type Highlighting
Verify that the supported authentication types are highlighted at the top of the page using an <Info> callout. This callout should appear after the intro text and before the table of contents, listing each authentication method the connector supports (derived from the authType property in the JSON schema).
Expected format:
<Info>
**Supported Authentication Types:**
- **Basic Auth** — Username and password authentication
- **IAM Auth** — AWS IAM-based authentication with automatic temporary credential retrieval (supports both Provisioned Clusters and Serverless Workgroups)
</Info>
Common authentication type labels by schema reference:
basicAuth.json→ Basic Auth — Username and password authenticationiamAuthConfig.json→ IAM Auth — AWS IAM-based authentication with automatic temporary credential retrievalawsCredentials.json→ AWS Credentials — AWS access key, secret key, and optional session tokengcpCredentials.json→ GCP Credentials — Google Cloud service account authenticationazureCredentials.json→ Azure Credentials — Azure service principal or managed identity authentication- OAuth2 references → OAuth 2.0 — Token-based authentication
- API key/token references → API Key — API key or token authentication
Check for:
- MISSING_AUTH_CALLOUT: No
<Info>callout with authentication types exists at the top - INCOMPLETE_AUTH_CALLOUT: Callout exists but is missing authentication types that the schema supports
- STALE_AUTH_CALLOUT: Callout lists authentication types not supported by the schema
Severity: WARNING for missing callout, SUGGESTION for incomplete/stale.
This check applies to both the main page and the yaml.mdx page.
Phase 4: Report Findings
Present a structured report:
## Connector Documentation Review: {connector_name}
### Ground Truth (from schema + code)
**Service Type**: {service_type}
**Schema File**: {path}
**Source Files**: {paths}
**Supported Features**: [list]
**Unsupported Features**: [list]
**Required Permissions**: [list with explanations]
**Required Configuration Fields**: [list]
### Findings
#### {version}
| # | Check | Severity | Finding | Current | Expected |
|---|-------|----------|---------|---------|----------|
| 1 | Features | WARNING | Missing "Data Quality" in availableFeatures | [...] | [...] |
| 2 | Permissions | WARNING | Missing dynamodb:DescribeTable | - | Required for metadata extraction |
| ... | | | | | |
### Summary
- **Warnings**: {count} (should fix)
- **Suggestions**: {count} (nice to have)
Phase 5: Fix Issues (unless --dry-run)
After presenting the report, fix all findings automatically:
5a. Fix Available/Unavailable Features
Edit the <ConnectorDetailsHeader> component in both the main page and yaml.mdx to match the truth table. Ensure both pages use identical feature arrays.
5b. Fix Permissions Documentation
Add missing permissions with clear explanations. Format as:
- For AWS: IAM policy JSON with action descriptions
- For databases: SQL GRANT statements with explanations
- For REST APIs: Required API scopes/roles
Make permissions user-friendly:
- Group by capability (metadata extraction, profiling, lineage)
- Explain WHY each permission is needed
- Provide copy-pasteable policy/grant blocks
5c. Fix YAML Configuration
Update the YAML example to include all schema properties with correct defaults. Update ContentSection descriptions to match schema descriptions.
5d. Fix Missing Sections
Add missing documentation sections using the standard snippet pattern. Import the appropriate shared snippets.
5e. Fix Authentication Type Callout
If the <Info> callout for authentication types is missing or incomplete, add or update it in both the main page and yaml.mdx. Insert it after the intro sentence ("In this section, we provide guides and references to use the {connector} connector.") and before the table of contents. Derive the authentication types from the JSON schema's authType property. Use the label mapping from Check 7. For connectors with cloud-specific auth (IAM, GCP, Azure), include relevant details like supported deployment types.
5f. Ensure Cross-Version Consistency
Apply the same fixes across all versions being reviewed.
Phase 6: Verify Fixes
After applying fixes:
- Re-read the modified files to confirm changes look correct
- Verify ConnectorDetailsHeader has matching features in both main and yaml pages
- Verify all required fields are in the YAML example
- Present a before/after summary
Before: 5 warnings, 3 suggestions
After: 0 warnings, 0 suggestions
Fixed:
#1 WARNING Added "Data Quality" to availableFeatures
#2 WARNING Added dynamodb:DescribeTable to permissions
#3 WARNING Added missing hostPort field to YAML example
...
Feature String Reference
Database Connectors - All Possible Features
Available: "Metadata", "Query Usage", "Data Profiler", "Data Quality", "dbt",
"View Lineage" | "Lineage", "View Column-level Lineage" | "Column-level Lineage",
"Stored Procedures", "Sample Data", "Auto-Classification",
"Owners", "Tags"
Unavailable: same strings for features NOT supported
Pipeline Connectors
Available/Unavailable: "Pipelines", "Pipeline Status", "Lineage", "Owners", "Usage", "Tags"
Dashboard Connectors
Available/Unavailable: "Dashboards", "Charts", "Datamodels", "Projects",
"Lineage", "Column Lineage", "Owners", "Usage", "Tags"
Messaging Connectors
Available/Unavailable: "Topics", "Sample Data"
Storage Connectors
Available/Unavailable: "Metadata", "Structured Containers", "Unstructured Containers"
Search Connectors
Available/Unavailable: "Search Indexes", "Sample Data"
ML Model Connectors
Available/Unavailable: "ML Features", "Hyperparameters", "ML Store"
Schema Flag to Feature Mapping (Database)
| JSON Schema Flag | Default | Maps To |
|---|---|---|
supportsMetadataExtraction |
true |
"Metadata" |
supportsUsageExtraction |
true |
"Query Usage" |
supportsLineageExtraction |
true |
"View Lineage" or "Lineage" |
supportsViewLineageExtraction |
true |
"View Column-level Lineage" or "Column-level Lineage" |
supportsProfiler |
true |
"Data Profiler" |
supportsDBTExtraction |
true |
"dbt" |
supportsDataDiff |
true |
"Data Quality" |
supportsSystemProfile |
false |
(no direct feature, informational) |
supportsQueryComment |
true |
(no direct feature, informational) |
supportsDatabase |
true |
(no direct feature, structural) |
storedProcedureFilterPattern |
(present/absent) | "Stored Procedures" |
sampleDataStorageConfig |
(present/absent) | "Sample Data" |
supportsProfiler (implicit) |
same as profiler | "Auto-Classification" |
Permission Documentation Best Practices
When documenting permissions, follow these guidelines:
Database Connectors (SQL)
### Requirements
To extract metadata, the user needs the following permissions:
#### Metadata Ingestion
- `USAGE` on schemas — to list and access schemas
- `SELECT` on tables — to read table metadata and sample data
#### Profiler & Data Quality
- `SELECT` on tables — to run profiling queries
#### Usage & Lineage
- Access to query history views (e.g., `pg_stat_statements`, `stl_query`)
Cloud Connectors (AWS)
### Requirements
The IAM user/role needs the following permissions:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"service:ListAction", // Required for: discovering resources
"service:DescribeAction", // Required for: extracting metadata
"service:ReadAction" // Required for: profiling/sampling
],
"Resource": "*"
}
]
}
### Cloud Connectors (GCP)
```markdown
### Requirements
The service account needs the following roles:
- `roles/viewer` — for metadata extraction
- `roles/bigquery.dataViewer` — for profiling and sampling
YAML Documentation Best Practices
ContentSection Pattern
Each configuration field should have a ContentSection with:
- Bold field name matching the YAML key
- Description matching or expanding on the schema description
- Type info for non-obvious fields
- Link to relevant docs for complex fields (auth, SSL, etc.)
YAML Example Requirements
- All
requiredschema fields MUST appear in the YAML example - Optional fields with non-null defaults SHOULD appear
- Sensitive fields should use placeholder values:
"{password}","{access_key}" - Filter patterns should show the default include/exclude
- Comments should be minimal — only for non-obvious fields
Section Ordering in yaml.mdx
- Frontmatter
- Imports
- ConnectorDetailsHeader (must match main page)
- Table of Contents
- External Ingestion Deployment snippet
- Requirements (Python + connector-specific)
- Metadata Ingestion (CodePreview with YAML)
- Query Usage (if supported)
- Lineage (if supported)
- Data Profiler (if supported)
- Auto-Classification (if supported)
- Data Quality (if supported)
- dbt Integration (link)