connector-doc-review - SKILL.md Agent Skill

name: connector-doc-review
description: Review and fix OpenMetadata connector documentation against JSON schema and source code. Validates availableFeatures, permissions, yaml.mdx configuration, and overall completeness. Automatically fixes gaps.
user-invocable: true
argument-hint: " [--service-type=database|pipeline|dashboard|messaging|storage|search|mlmodel] [--version=v1.12.x|v1.13.x|v2.0.x-SNAPSHOT|all] [--dry-run]"
allowed-tools:
  - Bash
  - Read
  - Glob
  - Grep
  - Edit
  - Write
  - Agent

Connector Documentation Review & Fix Skill

When to Activate

Activate when a user asks to review, validate, audit, or fix connector documentation by checking it against the actual JSON schema and ingestion source code.

Arguments

connector-name (required): Name of the connector (e.g., redshift, dynamodb, bigquery, airflow, looker, kafka)
--service-type (optional): One of database, pipeline, dashboard, messaging, storage, search, mlmodel. Default: auto-detect from connector name.
--version (optional): Which version to review. Default: all (reviews v1.11.x, v1.12.x, v1.13.x).
--dry-run (optional): Only report issues, don't fix them.

Directory References

DOCS_ROOT     = .                          # The docs-om repo (current working directory)
OM_ROOT       = ../OpenMetadata            # Sibling directory to docs-om
SCHEMA_ROOT   = ${OM_ROOT}/openmetadata-spec/src/main/resources/json/schema/entity/services
SOURCE_ROOT   = ${OM_ROOT}/ingestion/src/metadata/ingestion/source

Review Process

Phase 1: Gather Ground Truth from Code

Step 1.1: Read the JSON Schema

Read the connector's JSON schema to extract the canonical list of capabilities and configuration fields.

Schema file: ${SCHEMA_ROOT}/connections/${service_type}/${connectorName}Connection.json

Extract:

All properties — field names, types, descriptions, required fields, defaults
supports* boolean flags — these define what the connector can do
Filter pattern fields — schemaFilterPattern, tableFilterPattern, storedProcedureFilterPattern, etc.
sampleDataStorageConfig — presence indicates Sample Data support
Authentication references — $ref to auth schemas (basicAuth, iamAuthConfig, awsCredentials, gcpCredentials, etc.). Build a list of supported authentication types (e.g., "Basic Auth", "IAM Auth", "OAuth2", "API Key", "GCP Credentials") from the authType property's oneOf/anyOf references.
SSL configuration — sslMode, sslConfig, verifySSL
Required fields — the required array

Also read the parent service schema to verify the connector is registered:

Service schema: ${SCHEMA_ROOT}/${service_type}Service.json
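The extraction in Step 1.1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `summarize_schema` and the inline `sample_schema` are hypothetical, and `$ref` targets are not resolved, so a `supports*` flag is reported by presence only.

```python
import json
from typing import Any

SSL_FIELDS = {"sslMode", "sslConfig", "verifySSL"}


def summarize_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Collect the Step 1.1 signals from an already-parsed connection schema."""
    props = schema.get("properties", {})
    return {
        # Presence only: a default may live behind a $ref, which is not followed here.
        "supports_flags": sorted(n for n in props if n.startswith("supports")),
        "filter_patterns": sorted(n for n in props if n.endswith("FilterPattern")),
        "has_sample_data": "sampleDataStorageConfig" in props,
        "ssl_fields": sorted(n for n in props if n in SSL_FIELDS),
        "required": list(schema.get("required", [])),
    }


# Hypothetical, heavily trimmed schema used only to exercise the function.
sample_schema = json.loads("""
{
  "properties": {
    "hostPort": {"type": "string"},
    "sslMode": {"type": "string"},
    "schemaFilterPattern": {},
    "supportsProfiler": {},
    "sampleDataStorageConfig": {}
  },
  "required": ["hostPort"]
}
""")

summary = summarize_schema(sample_schema)
print(summary)
```

For a real connector, the dictionary passed in would come from `json.load` on the schema file named above.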

Step 1.2: Read the Source Code

Read these files for the connector:

${SOURCE_ROOT}/${service_type}/${connector_name}/metadata.py    — Source class, capabilities
${SOURCE_ROOT}/${service_type}/${connector_name}/connection.py  — Connection, test steps, permissions
${SOURCE_ROOT}/${service_type}/${connector_name}/service_spec.py — Service spec (lineage, usage, profiler classes)

Extract:

Test connection steps — the test_fn dictionary in test_connection(). Each key represents a permission/capability the connector validates.
Service spec classes — which source classes are registered (lineage_source_class, usage_source_class, profiler_class). Their presence confirms feature support.
Source class mixins/base classes — what the source extends (e.g., LifeCycleQueryMixin, MultiDBSource, CommonNoSQLSource)
Owner extraction — search for yield_tag or owner-related methods in the source to determine if Owners/Tags are supported
Any permission-related comments or docstrings — hints about required IAM roles, database grants, etc.

Also check for:

${SOURCE_ROOT}/${service_type}/${connector_name}/queries.py     — SQL queries (for permission requirements)
${SOURCE_ROOT}/${service_type}/${connector_name}/client.py      — API client (for REST connectors)

Step 1.3: Build the Feature Truth Table

Using the schema and code, build a definitive feature map:

For Database Connectors:

| Schema Signal | Available Feature String | Unavailable Feature String |
|---|---|---|
| `supportsMetadataExtraction: true` | "Metadata" | - |
| `supportsUsageExtraction: true` | "Query Usage" | "Query Usage" if false |
| `supportsLineageExtraction: true` | "View Lineage" or "Lineage" | "Lineage" if false |
| `supportsViewLineageExtraction: true` | "View Column-level Lineage" or "Column-level Lineage" | "Column-level Lineage" if false |
| `supportsProfiler: true` | "Data Profiler" | "Data Profiler" if false |
| `supportsDBTExtraction: true` | "dbt" | "dbt" if false |
| `supportsDataDiff: true` | "Data Quality" | "Data Quality" if false |
| `storedProcedureFilterPattern` present | "Stored Procedures" | "Stored Procedures" if absent |
| `sampleDataStorageConfig` present | "Sample Data" | "Sample Data" if absent |
| `supportsProfiler: true` (implicit) | "Auto-Classification" | "Auto-Classification" if profiler false |
| Check source code for owner extraction | "Owners" if supported | "Owners" if not |
| Check source code for tag extraction | "Tags" if supported | "Tags" if not |
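The database mapping above can be expressed as a small function. This is a sketch under assumptions: a flag counts as supported when the property is present in the schema, the plain "Lineage" and "Column-level Lineage" strings are chosen over their "View ..." variants, and `database_features` is a hypothetical helper rather than part of OpenMetadata.

```python
# (schema property, feature string) pairs taken from the table above.
FLAG_FEATURES = [
    ("supportsMetadataExtraction", "Metadata"),
    ("supportsUsageExtraction", "Query Usage"),
    ("supportsLineageExtraction", "Lineage"),
    ("supportsViewLineageExtraction", "Column-level Lineage"),
    ("supportsProfiler", "Data Profiler"),
    ("supportsDBTExtraction", "dbt"),
    ("supportsDataDiff", "Data Quality"),
    ("storedProcedureFilterPattern", "Stored Procedures"),
    ("sampleDataStorageConfig", "Sample Data"),
]


def database_features(
    properties: set[str], owners: bool, tags: bool
) -> tuple[list[str], list[str]]:
    """Split feature strings into (available, unavailable) lists."""
    available: list[str] = []
    unavailable: list[str] = []

    def place(feature: str, supported: bool) -> None:
        (available if supported else unavailable).append(feature)

    for prop, feature in FLAG_FEATURES:
        supported = prop in properties
        if feature == "Metadata" and not supported:
            continue  # the table defines no unavailable string for Metadata
        place(feature, supported)
    # Auto-Classification follows the profiler flag; owners/tags come from code review.
    place("Auto-Classification", "supportsProfiler" in properties)
    place("Owners", owners)
    place("Tags", tags)
    return available, unavailable


# Hypothetical connector: metadata, profiler, and sample data only, with tag support.
available, unavailable = database_features(
    {"supportsMetadataExtraction", "supportsProfiler", "sampleDataStorageConfig"},
    owners=False,
    tags=True,
)
print(available)
print(unavailable)
```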

For Pipeline Connectors:

| Schema Signal | Available Feature String |
|---|---|
| Always | "Pipelines" |
| Check code for status extraction | "Pipeline Status" |
| `supportsLineageExtraction: true` or lineage source in spec | "Lineage" |
| Check code for owner extraction | "Owners" |
| Check code for usage tracking | "Usage" |
| Check code for tag extraction | "Tags" |

For Dashboard Connectors:

| Schema Signal | Available Feature String |
|---|---|
| Always | "Dashboards", "Charts" |
| Check code for datamodel extraction | "Datamodels" |
| Check code for project support | "Projects" |
| `supportsLineageExtraction: true` or lineage source in spec | "Lineage" |
| Check code for column lineage | "Column Lineage" |
| Check code for owner extraction | "Owners" |
| Check code for usage tracking | "Usage" |
| Check code for tag extraction | "Tags" |

For Messaging Connectors:

| Schema Signal | Available Feature String |
|---|---|
| Always | "Topics" |
| Check for sample data support | "Sample Data" |

For Storage Connectors:

| Schema Signal | Available Feature String |
|---|---|
| Always | "Metadata" |
| Check code for structured containers | "Structured Containers" |
| Check code for unstructured containers | "Unstructured Containers" |

For Search Connectors:

| Schema Signal | Available Feature String |
|---|---|
| Always | "Search Indexes" |
| Check for sample data support | "Sample Data" |

For ML Model Connectors:

| Schema Signal | Available Feature String |
|---|---|
| Always | "ML Features" |
| Check for hyperparameters | "Hyperparameters" |
| Check for ML store | "ML Store" |

Phase 2: Read Current Documentation

For each version being reviewed (v1.11.x, v1.12.x, v1.13.x):

Step 2.1: Read the Main Connector Page

${DOCS_ROOT}/${version}/connectors/${service_type}/${connector_name}.mdx

Extract:

availableFeatures array from <ConnectorDetailsHeader>
unavailableFeatures array from <ConnectorDetailsHeader>
stage value (PROD or BETA)
Requirements section — documented permissions, grants, IAM policies
Connection details section — documented configuration fields
Sections present — which optional sections exist (Query Usage, Lineage, Data Profiler, Data Quality, dbt, Troubleshooting)
Authentication type callout — check if an <Info> callout listing supported authentication types exists near the top of the page (after the intro text, before the table of contents)
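Pulling the feature arrays and stage out of the header can be done with a regular expression. The sketch below assumes the props are written as JSX array literals with double-quoted strings and no trailing comma (for example `availableFeatures={["Metadata"]}`); if a page deviates from that layout, the pattern would need adjusting. The function names and `sample_mdx` are hypothetical.

```python
import json
import re
from typing import Optional


def extract_feature_array(mdx_text: str, prop: str) -> list[str]:
    """Read a JSX array prop such as availableFeatures={["A", "B"]}."""
    # \b keeps "availableFeatures" from matching inside "unavailableFeatures".
    pattern = rf"\b{re.escape(prop)}=\{{(\[.*?\])\}}"
    match = re.search(pattern, mdx_text, flags=re.DOTALL)
    return json.loads(match.group(1)) if match else []


def extract_stage(mdx_text: str) -> Optional[str]:
    """Read the stage prop, expected to be PROD or BETA."""
    match = re.search(r'\bstage="([^"]+)"', mdx_text)
    return match.group(1) if match else None


# Hypothetical header; unavailableFeatures comes first to exercise the \b guard.
sample_mdx = '''
<ConnectorDetailsHeader
  name="Redshift"
  stage="PROD"
  unavailableFeatures={["Owners"]}
  availableFeatures={["Metadata", "Query Usage"]}
/>
'''

print(extract_feature_array(sample_mdx, "availableFeatures"))
print(extract_feature_array(sample_mdx, "unavailableFeatures"))
print(extract_stage(sample_mdx))
```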

Step 2.2: Read the YAML Page

${DOCS_ROOT}/${version}/connectors/${service_type}/${connector_name}/yaml.mdx

If it exists, extract:

YAML example in the CodePanel — all configuration fields shown
ContentSection definitions — field descriptions in the ContentPanel
Optional sections — Query Usage, Lineage, Data Profiler, Auto-Classification, Data Quality
ConnectorDetailsHeader — should match the main page

Step 2.3: Check Navigation Registration

${DOCS_ROOT}/docs.json

Verify:

The main connector page is registered in navigation
The yaml.mdx page is registered (if it exists)
Pages are in the correct version section
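A registration check can be approximated by walking docs.json and collecting every string value. The sketch assumes pages are referenced as path strings without the `.mdx` extension and that the nesting shown in `sample_config` is only illustrative; the real navigation layout may differ. It confirms that a path appears somewhere in the file, not that it sits under the correct version section.

```python
from typing import Any, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def is_registered(docs_config: Any, page_path: str) -> bool:
    """True if page_path appears as a string anywhere in the navigation config."""
    return page_path in set(iter_strings(docs_config))


# Hypothetical navigation structure, used only to exercise the walk.
sample_config = {
    "navigation": {
        "versions": [
            {
                "version": "v1.12.x",
                "groups": [
                    {
                        "group": "Connectors",
                        "pages": [
                            "v1.12.x/connectors/database/redshift",
                            {"group": "Redshift", "pages": ["v1.12.x/connectors/database/redshift/yaml"]},
                        ],
                    }
                ],
            }
        ]
    }
}

print(is_registered(sample_config, "v1.12.x/connectors/database/redshift/yaml"))
```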

Phase 3: Compare and Identify Issues

Run these validation checks and categorize findings:

Check 1: Available Features Accuracy

Compare availableFeatures in docs against the truth table from Phase 1.

MISSING: Feature should be available (per schema/code) but is NOT in availableFeatures
INCORRECT: Feature is in availableFeatures but should NOT be (per schema/code)
WRONG_LIST: Feature is in unavailableFeatures but should be in availableFeatures, or vice versa

Severity: WARNING for each mismatch.
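The three finding types can be derived by comparing the documented arrays with the truth table. In this sketch, `check_features` is a hypothetical helper, and one interpretation is assumed: a feature found in the opposite list is reported as WRONG_LIST rather than MISSING or INCORRECT.

```python
def check_features(
    doc_available: list[str],
    doc_unavailable: list[str],
    true_available: list[str],
    true_unavailable: list[str],
) -> list[tuple[str, str]]:
    """Return (finding_code, feature) pairs for Check 1."""
    findings: list[tuple[str, str]] = []
    for feature in true_available:
        if feature in doc_unavailable:
            findings.append(("WRONG_LIST", feature))  # supported, documented as unavailable
        elif feature not in doc_available:
            findings.append(("MISSING", feature))  # supported, not documented at all
    for feature in doc_available:
        if feature in true_unavailable:
            findings.append(("WRONG_LIST", feature))  # unsupported, documented as available
        elif feature not in true_available:
            findings.append(("INCORRECT", feature))  # not backed by schema or code
    return findings


# Hypothetical inputs chosen to trigger each finding type.
findings = check_features(
    doc_available=["Metadata", "dbt", "Owners"],
    doc_unavailable=["Data Profiler"],
    true_available=["Metadata", "Data Profiler", "Tags"],
    true_unavailable=["dbt"],
)
for code, feature in findings:
    print(f"WARNING {code}: {feature}")
```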

Check 2: Unavailable Features Completeness

Verify that features NOT supported are listed in unavailableFeatures. A feature that appears in neither array leaves users unsure whether it is supported.

Severity: SUGGESTION for missing entries in unavailableFeatures.

Check 3: Permission Documentation

Compare documented permissions against:

Test connection steps (from connection.py)
SQL queries executed (from queries.py)
API calls made (from client.py or source code)

Check for:

MISSING_PERMISSION: A permission required by code but not documented
EXTRA_PERMISSION: A permission documented but not actually needed
UNCLEAR_PERMISSION: Permission listed without explanation of why it's needed

For database connectors, verify documented SQL grants match the queries. For cloud connectors (AWS/GCP/Azure), verify IAM policy actions match API calls.

Severity: WARNING for missing permissions, SUGGESTION for unclear ones.

Check 4: YAML Configuration Completeness

Compare the YAML example in yaml.mdx against the JSON schema properties:

MISSING_FIELD: A schema property (especially required ones) not shown in YAML example
EXTRA_FIELD: A field in YAML that doesn't exist in schema
WRONG_DEFAULT: Default value in YAML doesn't match schema default
MISSING_DESCRIPTION: A ContentSection is missing for a documented field
OUTDATED_DESCRIPTION: ContentSection description doesn't match schema description

Severity: WARNING for missing required fields, SUGGESTION for optional ones.
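A rough version of the field comparison is sketched below. It works on already-parsed dictionaries (YAML parsing is left out), treats any value that differs from a schema default as WRONG_DEFAULT, and assigns SUGGESTION to WRONG_DEFAULT and EXTRA_FIELD as an assumption, since no severity is stated for them. `check_yaml_fields` and the sample data are hypothetical.

```python
from typing import Any


def check_yaml_fields(
    schema: dict[str, Any], yaml_config: dict[str, Any]
) -> list[tuple[str, str, str]]:
    """Return (severity, finding_code, field) tuples for Check 4."""
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    findings: list[tuple[str, str, str]] = []
    for name, spec in props.items():
        if name not in yaml_config:
            severity = "WARNING" if name in required else "SUGGESTION"
            findings.append((severity, "MISSING_FIELD", name))
        elif "default" in spec and yaml_config[name] != spec["default"]:
            # Assumption: severity for WRONG_DEFAULT is not specified in the text.
            findings.append(("SUGGESTION", "WRONG_DEFAULT", name))
    for name in yaml_config:
        if name not in props:
            # Assumption: severity for EXTRA_FIELD is not specified in the text.
            findings.append(("SUGGESTION", "EXTRA_FIELD", name))
    return findings


# Hypothetical schema and YAML example, trimmed to a few fields.
sample_schema = {
    "properties": {
        "type": {"default": "Redshift"},
        "hostPort": {"type": "string"},
        "username": {"type": "string"},
        "sslMode": {"default": "disable"},
    },
    "required": ["hostPort", "username"],
}
sample_yaml = {
    "type": "Redshift",
    "hostPort": "localhost:5439",
    "sslMode": "require",
    "warehouse": "x",
}

yaml_findings = check_yaml_fields(sample_schema, sample_yaml)
for row in yaml_findings:
    print(row)
```

In practice the `supports*` flags would probably be excluded from the missing-field pass, since they are not normally shown in YAML examples; that filter is left out here to keep the sketch short.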

Check 5: Section Completeness

Based on the feature truth table, verify the documentation has the right sections:

If supportsUsageExtraction: true → Query Usage section should exist
If supportsLineageExtraction: true → Lineage section should exist
If supportsProfiler: true → Data Profiler section should exist
If supportsDBTExtraction: true → dbt section or link should exist
If supportsDataDiff: true → Data Quality section should exist

Severity: WARNING for missing sections.

Check 6: Cross-Version Consistency

If reviewing all versions, check that features and permissions are consistent across versions (unless a known version difference exists).

Severity: SUGGESTION for inconsistencies.

Check 7: Authentication Type Highlighting

Verify that the supported authentication types are highlighted at the top of the page using an <Info> callout. This callout should appear after the intro text and before the table of contents, listing each authentication method the connector supports (derived from the authType property in the JSON schema).

Expected format:

<Info>
**Supported Authentication Types:**
- **Basic Auth** — Username and password authentication
- **IAM Auth** — AWS IAM-based authentication with automatic temporary credential retrieval (supports both Provisioned Clusters and Serverless Workgroups)
</Info>

Common authentication type labels by schema reference:

basicAuth.json → Basic Auth — Username and password authentication
iamAuthConfig.json → IAM Auth — AWS IAM-based authentication with automatic temporary credential retrieval
awsCredentials.json → AWS Credentials — AWS access key, secret key, and optional session token
gcpCredentials.json → GCP Credentials — Google Cloud service account authentication
azureCredentials.json → Azure Credentials — Azure service principal or managed identity authentication
OAuth2 references → OAuth 2.0 — Token-based authentication
API key/token references → API Key — API key or token authentication

Check for:

MISSING_AUTH_CALLOUT: No <Info> callout with authentication types exists at the top
INCOMPLETE_AUTH_CALLOUT: Callout exists but is missing authentication types that the schema supports
STALE_AUTH_CALLOUT: Callout lists authentication types not supported by the schema

Severity: WARNING for missing callout, SUGGESTION for incomplete/stale.

This check applies to both the main page and the yaml.mdx page.
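Deriving the expected labels and comparing them with a callout might look like the sketch below. It covers only the five file-based references from the label list; OAuth2 and API key references have no fixed file name here and are omitted. The `$ref` paths in `sample_schema`, and the helper names, are hypothetical.

```python
from typing import Any, Optional

AUTH_LABELS = {
    "basicAuth.json": "Basic Auth",
    "iamAuthConfig.json": "IAM Auth",
    "awsCredentials.json": "AWS Credentials",
    "gcpCredentials.json": "GCP Credentials",
    "azureCredentials.json": "Azure Credentials",
}


def schema_auth_labels(schema: dict[str, Any]) -> list[str]:
    """Map the authType oneOf/anyOf $ref targets to display labels."""
    auth = schema.get("properties", {}).get("authType", {})
    labels: list[str] = []
    for keyword in ("oneOf", "anyOf"):
        for option in auth.get(keyword, []):
            # Keep only the file name: drop any #fragment and leading directories.
            filename = option.get("$ref", "").split("#")[0].rsplit("/", 1)[-1]
            label = AUTH_LABELS.get(filename)
            if label and label not in labels:
                labels.append(label)
    return labels


def check_auth_callout(
    expected: list[str], documented: Optional[list[str]]
) -> list[tuple[str, str, str]]:
    """Return (severity, finding_code, label); documented=None means no callout."""
    if documented is None:
        return [("WARNING", "MISSING_AUTH_CALLOUT", "")]
    findings = [("SUGGESTION", "INCOMPLETE_AUTH_CALLOUT", l) for l in expected if l not in documented]
    findings += [("SUGGESTION", "STALE_AUTH_CALLOUT", l) for l in documented if l not in expected]
    return findings


# Hypothetical schema fragment with two auth options.
sample_schema = {
    "properties": {
        "authType": {
            "oneOf": [
                {"$ref": "./common/basicAuth.json"},
                {"$ref": "./common/iamAuthConfig.json"},
            ]
        }
    }
}

expected_labels = schema_auth_labels(sample_schema)
print(expected_labels)
print(check_auth_callout(expected_labels, ["Basic Auth", "API Key"]))
```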

Phase 4: Report Findings

Present a structured report:

## Connector Documentation Review: {connector_name}

### Ground Truth (from schema + code)

**Service Type**: {service_type}
**Schema File**: {path}
**Source Files**: {paths}

**Supported Features**: [list]
**Unsupported Features**: [list]
**Required Permissions**: [list with explanations]
**Required Configuration Fields**: [list]

### Findings

#### {version}

| # | Check | Severity | Finding | Current | Expected |
|---|-------|----------|---------|---------|----------|
| 1 | Features | WARNING | Missing "Data Quality" in availableFeatures | [...] | [...] |
| 2 | Permissions | WARNING | Missing dynamodb:DescribeTable | - | Required for metadata extraction |
| ... | | | | | |

### Summary

- **Warnings**: {count} (should fix)
- **Suggestions**: {count} (nice to have)
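The findings table and summary in the template above could be rendered from structured records along these lines. The `Finding` dataclass and `render_findings` are hypothetical names for illustration; only the column layout and summary wording come from the template.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    check: str
    severity: str  # "WARNING" or "SUGGESTION"
    finding: str
    current: str = "-"
    expected: str = "-"


def render_findings(findings: list[Finding]) -> str:
    """Render the per-version findings table followed by the summary counts."""
    lines = [
        "| # | Check | Severity | Finding | Current | Expected |",
        "|---|-------|----------|---------|---------|----------|",
    ]
    for number, f in enumerate(findings, start=1):
        lines.append(
            f"| {number} | {f.check} | {f.severity} | {f.finding} | {f.current} | {f.expected} |"
        )
    counts = Counter(f.severity for f in findings)
    lines += [
        "",
        "### Summary",
        "",
        f"- **Warnings**: {counts['WARNING']} (should fix)",
        f"- **Suggestions**: {counts['SUGGESTION']} (nice to have)",
    ]
    return "\n".join(lines)


# Hypothetical findings used only to show the output shape.
report = render_findings(
    [
        Finding("Features", "WARNING", 'Missing "Data Quality" in availableFeatures'),
        Finding("Permissions", "SUGGESTION", "Grant listed without explanation"),
    ]
)
print(report)
```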

Phase 5: Fix Issues (unless --dry-run)

After presenting the report, fix all findings automatically:

5a. Fix Available/Unavailable Features

Edit the <ConnectorDetailsHeader> component in both the main page and yaml.mdx to match the truth table. Ensure both pages use identical feature arrays.

5b. Fix Permissions Documentation

Add missing permissions with clear explanations. Format as:

For AWS: IAM policy JSON with action descriptions
For databases: SQL GRANT statements with explanations
For REST APIs: Required API scopes/roles

Make permissions user-friendly:

Group by capability (metadata extraction, profiling, lineage)
Explain WHY each permission is needed
Provide copy-pasteable policy/grant blocks

5c. Fix YAML Configuration

Update the YAML example to include all schema properties with correct defaults. Update ContentSection descriptions to match schema descriptions.

5d. Fix Missing Sections

Add missing documentation sections using the standard snippet pattern. Import the appropriate shared snippets.

5e. Fix Authentication Type Callout

If the <Info> callout for authentication types is missing or incomplete, add or update it in both the main page and yaml.mdx. Insert it after the intro sentence ("In this section, we provide guides and references to use the {connector} connector.") and before the table of contents. Derive the authentication types from the JSON schema's authType property. Use the label mapping from Check 7. For connectors with cloud-specific auth (IAM, GCP, Azure), include relevant details like supported deployment types.

5f. Ensure Cross-Version Consistency

Apply the same fixes across all versions being reviewed.

Phase 6: Verify Fixes

After applying fixes:

Re-read the modified files to confirm changes look correct
Verify ConnectorDetailsHeader has matching features in both main and yaml pages
Verify all required fields are in the YAML example
Present a before/after summary

Before: 5 warnings, 3 suggestions
After:  0 warnings, 0 suggestions

Fixed:
#1 WARNING  Added "Data Quality" to availableFeatures
#2 WARNING  Added dynamodb:DescribeTable to permissions
#3 WARNING  Added missing hostPort field to YAML example
...

Feature String Reference

Database Connectors - All Possible Features

Available: "Metadata", "Query Usage", "Data Profiler", "Data Quality", "dbt",
           "View Lineage" | "Lineage", "View Column-level Lineage" | "Column-level Lineage",
           "Stored Procedures", "Sample Data", "Auto-Classification",
           "Owners", "Tags"

Unavailable: same strings for features NOT supported

Pipeline Connectors

Available/Unavailable: "Pipelines", "Pipeline Status", "Lineage", "Owners", "Usage", "Tags"

Dashboard Connectors

Available/Unavailable: "Dashboards", "Charts", "Datamodels", "Projects",
                       "Lineage", "Column Lineage", "Owners", "Usage", "Tags"

Messaging Connectors

Available/Unavailable: "Topics", "Sample Data"

Storage Connectors

Available/Unavailable: "Metadata", "Structured Containers", "Unstructured Containers"

Search Connectors

Available/Unavailable: "Search Indexes", "Sample Data"

ML Model Connectors

Available/Unavailable: "ML Features", "Hyperparameters", "ML Store"

Schema Flag to Feature Mapping (Database)

| JSON Schema Flag | Default | Maps To |
|---|---|---|
| `supportsMetadataExtraction` | `true` | "Metadata" |
| `supportsUsageExtraction` | `true` | "Query Usage" |
| `supportsLineageExtraction` | `true` | "View Lineage" or "Lineage" |
| `supportsViewLineageExtraction` | `true` | "View Column-level Lineage" or "Column-level Lineage" |
| `supportsProfiler` | `true` | "Data Profiler" |
| `supportsDBTExtraction` | `true` | "dbt" |
| `supportsDataDiff` | `true` | "Data Quality" |
| `supportsSystemProfile` | `false` | (no direct feature, informational) |
| `supportsQueryComment` | `true` | (no direct feature, informational) |
| `supportsDatabase` | `true` | (no direct feature, structural) |
| `storedProcedureFilterPattern` | (present/absent) | "Stored Procedures" |
| `sampleDataStorageConfig` | (present/absent) | "Sample Data" |
| `supportsProfiler` (implicit) | same as profiler | "Auto-Classification" |

Permission Documentation Best Practices

When documenting permissions, follow these guidelines:

Database Connectors (SQL)

### Requirements

To extract metadata, the user needs the following permissions:

#### Metadata Ingestion
- `USAGE` on schemas — to list and access schemas
- `SELECT` on tables — to read table metadata and sample data

#### Profiler & Data Quality
- `SELECT` on tables — to run profiling queries

#### Usage & Lineage
- Access to query history views (e.g., `pg_stat_statements`, `stl_query`)

Cloud Connectors (AWS)

### Requirements

The IAM user/role needs the following permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "service:ListAction",
        "service:DescribeAction",
        "service:ReadAction"
      ],
      "Resource": "*"
    }
  ]
}
```

- `service:ListAction`: required for discovering resources
- `service:DescribeAction`: required for extracting metadata
- `service:ReadAction`: required for profiling/sampling

Cloud Connectors (GCP)

### Requirements

The service account needs the following roles:
- `roles/viewer` — for metadata extraction
- `roles/bigquery.dataViewer` — for profiling and sampling

YAML Documentation Best Practices

ContentSection Pattern

Each configuration field should have a ContentSection with:

Bold field name matching the YAML key
Description matching or expanding on the schema description
Type info for non-obvious fields
Link to relevant docs for complex fields (auth, SSL, etc.)

YAML Example Requirements

All required schema fields MUST appear in the YAML example
Optional fields with non-null defaults SHOULD appear
Sensitive fields should use placeholder values: "{password}", "{access_key}"
Filter patterns should show the default include/exclude
Comments should be minimal — only for non-obvious fields

Section Ordering in yaml.mdx

Frontmatter
Imports
ConnectorDetailsHeader (must match main page)
Table of Contents
External Ingestion Deployment snippet
Requirements (Python + connector-specific)
Metadata Ingestion (CodePreview with YAML)
Query Usage (if supported)
Lineage (if supported)
Data Profiler (if supported)
Auto-Classification (if supported)
Data Quality (if supported)
dbt Integration (link)