data-matching

star 2

Match an external or ad-hoc table (HubSpot export, lead list, P2V sample) to ICP flags or other reference data in Databricks. Use when you have a table of people or races and need to attach int__icp_offices flags (icp_office_win / icp_office_serve, voter_count, etc.) but the table only has emails, BallotReady race IDs, or position IDs, not the ICP join. Routes by which ID you have: email (two-hop via campaigns), race ID (two-hop via race staging), or position ID (direct join). General Databricks matching reference; not Win-pipeline specific.

thegoodparty By thegoodparty schedule Updated 6/8/2026

name: data-matching description: Match an external or ad-hoc table (HubSpot export, lead list, P2V sample) to ICP flags or other reference data in Databricks. Use when you have a table of people or races and need to attach int__icp_offices flags (icp_office_win / icp_office_serve, voter_count, etc.) but the table only has emails, BallotReady race IDs, or position IDs, not the ICP join. Routes by which ID you have: email (two-hop via campaigns), race ID (two-hop via race staging), or position ID (direct join). General Databricks matching reference; not Win-pipeline specific.

Data matching

Match an arbitrary table to ICP flags (and other reference data) in Databricks using established join paths. Route by which ID your source table carries:

  • It has emails but no BallotReady IDs → Path 1 (via campaigns).
  • It has a BallotReady race ID but no position ID → Path 2 (via the race staging struct).
  • It already has a position IDPath 3 (direct join to int__icp_offices).

Verify any named table/column against the live goodparty_data_catalog before relying on it; these recipes drift.

Key Reference Tables

Table Schema Join Key(s) What It Provides
int__icp_offices dbt br_database_position_id icp_office_win, icp_office_serve, voter_count, normalized_position_type, judicial, appointed
campaigns mart_civics user_email, ballotready_position_id, user_id, campaign_id Bridge from email/user to BallotReady position ID
stg_airbyte_source__ballotready_api_race dbt database_id (race ID), position.databaseId (struct) Bridge from race ID to position ID
candidacy mart_civics br_position_database_id, gp_candidacy_id Candidacy-level data with direct position ID

All tables live under goodparty_data_catalog.<schema>.


Join Paths to ICP Offices

Path 1: Via Email → Campaigns (no position/race ID available)

When to use: Your table has user emails but no BallotReady IDs.

email → campaigns.user_email → campaigns.ballotready_position_id → int__icp_offices.br_database_position_id
SELECT
    h.*,
    c.ballotready_position_id,
    c.campaign_id,
    icp.icp_office_win,
    icp.icp_office_serve,
    icp.voter_count,
    icp.normalized_position_type,
    icp.judicial,
    icp.appointed
FROM your_table h
LEFT JOIN goodparty_data_catalog.mart_civics.campaigns c
    ON LOWER(h.email) = LOWER(c.user_email)
LEFT JOIN goodparty_data_catalog.dbt.int__icp_offices icp
    ON c.ballotready_position_id = icp.br_database_position_id

Used in: win_icp_signups_p2v_w_icp (2026-02-13) — 90% match rate (55/61).

Path 2: Via BallotReady Race ID (no position ID available)

When to use: Your table has br_race_id but not br_position_database_id.

br_race_id → stg_airbyte_source__ballotready_api_race.database_id → position.databaseId → int__icp_offices.br_database_position_id
SELECT
    h.*,
    icp.icp_office_win,
    icp.icp_office_serve,
    icp.voter_count,
    icp.normalized_position_type,
    icp.judicial,
    icp.appointed
FROM your_table h
LEFT JOIN goodparty_data_catalog.dbt.stg_airbyte_source__ballotready_api_race r
    ON TRY_CAST(h.br_race_id AS INT) = r.database_id
LEFT JOIN goodparty_data_catalog.dbt.int__icp_offices icp
    ON r.position.databaseId = icp.br_database_position_id

Used in: hubspot_leads_jack_20260211_w_icp (2026-02-11) — 92.5% match rate (1164/1259).

Path 3: Direct Position ID (simplest)

When to use: Your table already has br_position_database_id or ballotready_position_id.

SELECT h.*, icp.*
FROM your_table h
LEFT JOIN goodparty_data_catalog.dbt.int__icp_offices icp
    ON h.br_position_database_id = icp.br_database_position_id

Used in: dbt models like leads_win_candidacy, users_win_candidacy.


Gotchas

  • Delta column names: CREATE OR REPLACE TABLE fails if source columns have spaces. Alias everything to snake_case.
  • TRY_CAST for race IDs: Some br_race_id values are non-numeric strings (e.g. "ts_found_race_net_new"). Use TRY_CAST(br_race_id AS INT) instead of CAST.
  • Struct access: The race staging table stores position as a struct — use r.position.databaseId (camelCase).
  • Email case: Always use LOWER() on both sides when joining on email.
  • One-to-many risk: A user email can match multiple campaigns. Check for duplicates with GROUP BY email HAVING COUNT(*) > 1 before saving.
Install via CLI
npx skills add https://github.com/thegoodparty/gp-data-platform --skill data-matching
Repository Details
star Stars 2
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator
thegoodparty
thegoodparty Explore all skills →