name: new-source-onboarding
description: "End-to-end workflow for onboarding a new data source from ingestion configuration through staging, testing, documentation, and lineage registration. Sequences: ingestion-strategy, staging-layer, data-quality-testing, data-catalog, data-lineage. Triggers: 'add a new source', 'new data source', 'onboard a connector', 'new connector', 'add Fivetran connector', 'add Airbyte source', 'ingest new data'."
triggers:
  - "add a new source"
  - "new data source"
  - "onboard a connector"
  - "new connector"
  - "add Fivetran connector"
  - "add Airbyte source"
  - "ingest new data"
reads_first:
  - data-stack-context
cli_tools:
  - schema-introspect.js
  - source-freshness.js
produces:
  - "ingestion connector configuration"
  - "sources.yml"
  - "stg_ model SQL"
  - "schema.yml with tests"
  - "catalog documentation"
validates_with:
  - "dbt source freshness"
  - "dbt compile"
  - "dbt test --select staging"
When to Use This Workflow
Use new-source-onboarding when adding a net-new data source to the warehouse. This covers picking an ingestion tool, writing staging models, adding tests, documenting the source, and registering lineage. Skip phases that don't apply (e.g., if ingestion is already configured by another team).
Before You Start
Read .claude/data-stack-context.md to confirm the ingestion tool, warehouse, and dbt version. Then check which raw schemas already exist using the schema introspection CLI, starting with its help output to see the available options:
node tools/clis/schema-introspect.js --help
Phase 1: Configure Ingestion
Skill: ingestion-strategy
Choose and configure the ingestion tool to land raw data in the warehouse.
What to do:
- Invoke the ingestion-strategy skill.
- Select the appropriate tool (Fivetran, Airbyte, or custom) based on the source type.
- Configure the connector and confirm the raw data lands in the expected schema.
Phase complete when: Raw data is visible in the warehouse and dbt source freshness can reach it.
Phase 2: Build the Staging Layer
Skill: staging-layer
Write a clean stg_<source>__<object>.sql model for each raw table being onboarded.
What to do:
- Run node tools/clis/schema-introspect.js to enumerate raw table columns.
- Invoke the staging-layer skill.
- Write one staging model per source table: rename columns, cast types, filter deleted records.
- Write sources.yml with freshness checks.
- Run dbt compile && dbt source freshness.
Phase complete when: All staging models compile and source freshness reports green.
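As a rough illustration of the sources.yml step above, a minimal sketch might look like the following. The source name shop, the database and schema names, the table names, and the freshness thresholds are all hypothetical placeholders. The _fivetran_synced column assumes a Fivetran-managed table; substitute whichever load-timestamp column the connector actually writes. Newer dbt releases may expect freshness settings under a config block, so check the layout against the dbt version recorded in .claude/data-stack-context.md.

```yaml
version: 2

sources:
  - name: shop                          # hypothetical source name
    database: raw                       # assumption: database where the connector lands data
    schema: shop_raw                    # assumption: raw schema written by the connector
    loaded_at_field: _fivetran_synced   # assumption: must be a real timestamp column in the raw tables
    freshness:
      warn_after: {count: 12, period: hour}    # illustrative threshold
      error_after: {count: 24, period: hour}   # illustrative threshold
    tables:
      - name: orders                    # hypothetical raw table
      - name: customers                 # hypothetical raw table
```

With a file like this in place, a staging model would select from source('shop', 'orders'), and dbt source freshness would evaluate both tables against the thresholds.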
Phase 3: Add Data Quality Tests
Skill: data-quality-testing
Add the minimum required tests to every staging model.
What to do:
- Invoke the data-quality-testing skill.
- Add unique + not_null on all primary keys.
- Add relationships tests for foreign keys where the upstream staging model exists.
- Add accepted_values for categorical columns.
- Run node tools/clis/manifest-coverage.js --manifest target/manifest.json to confirm coverage.
Phase complete when: dbt test --select staging passes with zero failures.
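The test types listed above could be declared in schema.yml roughly as follows. This is a sketch under assumptions: the model names stg_shop__orders and stg_shop__customers, the column names, and the accepted status values are hypothetical. Newer dbt versions also accept data_tests in place of tests, so use whichever key the project's dbt version expects.

```yaml
version: 2

models:
  - name: stg_shop__orders              # hypothetical staging model
    columns:
      - name: order_id                  # assumed primary key
        tests:
          - unique
          - not_null
      - name: customer_id               # assumed foreign key
        tests:
          - relationships:
              to: ref('stg_shop__customers')   # only valid if this staging model exists
              field: customer_id
      - name: status                    # assumed categorical column
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']   # illustrative values
```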
Phase 4: Document the Source
Skill: data-catalog
Add descriptions to every column and model so the source is discoverable.
What to do:
- Invoke the data-catalog skill.
- Add description fields to all columns in schema.yml.
- Add a model-level description explaining the source system and data grain.
- Run dbt docs generate to confirm documentation builds.
Phase complete when: dbt docs generate succeeds and the source is visible in dbt docs.
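For illustration, model-level and column-level descriptions might be written as in the sketch below. The model name, columns, and wording are hypothetical. The folded block scalar (>) and the quoted strings are one way to keep colons and other special characters inside a description from being misread as YAML syntax.

```yaml
version: 2

models:
  - name: stg_shop__orders              # hypothetical staging model
    description: >
      Orders from the (hypothetical) shop application database, landed by the
      ingestion connector. Grain: one row per order.
    columns:
      - name: order_id
        description: "Primary key. Identifier of the order in the source system."
      - name: status
        description: "Order status: one of placed, shipped, returned."
```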
Phase 5: Register Lineage
Skill: data-lineage
Ensure the new source is visible in the lineage graph.
What to do:
- Run node tools/clis/lineage-export.js --manifest target/manifest.json --format dot to confirm the staging models appear.
- Invoke the data-lineage skill if OpenLineage or a catalog integration needs to be updated.
- Verify no circular dependencies were introduced.
Phase complete when: Lineage export shows the new staging models as leaf nodes with no circular dependencies.
Final Verification
dbt source freshness
dbt compile
dbt test --select staging
dbt docs generate
node tools/clis/source-freshness.js --results target/sources.json
Verify Your Work
Do not present output from this skill as complete until every command below passes without error. If a command fails, consult "If Something Goes Wrong" before asking the user.
- Run dbt source freshness and confirm all newly onboarded sources report green (no warn or error freshness violations).
- Run dbt compile to confirm all staging models compile without missing column references or ref() errors.
- Run dbt test --select staging to confirm all primary key, not_null, and relationships tests pass with zero failures.
- Run dbt docs generate to confirm documentation builds successfully and the new source is visible in the docs site.
- Run node tools/clis/source-freshness.js --results target/sources.json to verify freshness results match expectations from the raw warehouse schema.
If Something Goes Wrong
- Source freshness fails: check that loaded_at_field in sources.yml matches an actual timestamp column in the raw table.
- Staging model compile error: usually a column reference that doesn't exist in the raw schema. Re-run schema-introspect.js to get the current column list.
- Test failures on primary key: the source may have duplicates upstream. Add a dbt_utils.unique_combination_of_columns test instead and document the issue.
- dbt docs generate fails: a column description may contain special characters. Check schema.yml for unescaped quotes or colons.