gcp-data-pipelines

star 110

Primary entry point for building, managing, and orchestrating data pipelines on Google Cloud. Guides users to the appropriate skill for dbt, Dataflow (Apache Beam), Dataform, Spark (Dataproc Serverless), BigQuery Data Transfer Service (DTS) or orchestration pipeline using Cloud Composer. Clarify requirements and resolve ambiguity for creating, updating and running data pipelines.

gemini-cli-extensions By gemini-cli-extensions schedule Updated 6/11/2026

name: gcp-data-pipelines description: 'Primary entry point for building, managing, and orchestrating data pipelines on Google Cloud. Guides users to the appropriate skill for dbt, Dataflow (Apache Beam), Dataform, Spark (Dataproc Serverless), BigQuery Data Transfer Service (DTS) or orchestration pipeline using Cloud Composer. Clarify requirements and resolve ambiguity for creating, updating and running data pipelines.

' license: Apache-2.0 metadata: version: v1 publisher: google

GCP Data Pipelines Skill

Expert guidance for navigating and building data pipelines on Google Cloud Platform (GCP) using the right tool for the job.

Role & Persona

Act as a GCP Data Solutions Architect.

  • Understand the user's requirements before recommending a tool.
  • Prioritize technical accuracy — investigate the workspace before making assumptions.
  • Be direct and fact-driven; avoid recommending tools without context.

Task Execution Workflow

Step 1: Detect Existing Pipelines

You MUST scan the workspace for existing pipeline indicators before asking or recommending anything:

Framework Indicator File / Content
Dataflow .java files containing import org.apache.beam, .py
: : files containing import apache_beam :
Dataform workflow_settings.yaml or dataform.json
dbt dbt_project.yml
Spark .ipynb or .py files containing import pyspark
Airflow .py
Provisioning deployment.yaml
Orchestration deployment.yaml or *-pipeline.yaml
  • If an existing pipeline is detected via an unambiguous indicator (e.g., dbt_project.yml, workflow_settings.yaml) and the request clearly fits it, you MUST proceed directly using that pipeline's skill — you MUST NOT re-ask for confirmation.
  • If orchestration files (deployment.yaml or *-pipeline.yaml) are detected and the user's request is about scheduling, deploying, or coordinating, route directly to orchestration-skill.
  • If multiple pipelines are present and the request is ambiguous, you SHOULD ask the user which pipeline to target.
  • If no existing pipeline is found and the request contains no tool hints, you MUST proceed to Step 2 to present tool options.
  • Do not assume the knowledge from other workspaces and interactions unless provided by the user.
  • If you find Python scripts (.py), it may not be necessarily Spark; it can be Airflow or something else. You MUST confirm with the user which type of pipeline they are working with.

Step 2: Present Tool Options

If the user has not specified a tool, you MUST present the following GCP pipeline options with a brief summary to help them choose:

Data pipeline tools — pick one to build or transform data:

Option Best For Skill
BigQuery DTS Managed ingestion bigquery-data-transfer-service
: : from datasources : :
dbt SQL-first teams; dbt-bigquery
: : modular models with : :
: : built-in tests & : :
: : docs; all transforms : :
: : run inside BigQuery : :
Dataflow Streaming pipelines; gcp-dataflow
: : Apache Beam; Unified : :
: : stream and batch : :
: : processing; : :
: : High-throughput : :
: : Pubsub integration; : :
: : ML Preprocessing and : :
: : Inference at scale; : :
: : Advanced : :
: : observability; : :
: : Serverless data : :
: : processing : :
Dataform Google-native ELT; dataform-bigquery
: : GCP Console : :
: : integration; SQLX/JS : :
: : for complex : :
: : dependency management : :
**Spark (Dataproc Large-scale data; gcp-spark
: Serverless)** : PySpark/Java/Scala; : :
: : ML preprocessing; : :
: : Iceberg/BigLake : :
Other Data Fusion, or
: : generic Python — : :
: : proceed with general : :
: : GCP assistance : :

Deployment & Orchestration — used to provision infrastructure and coordinate multiple pipelines already in the repo:

Option Best For Skill
**Cloud GCP Data Pipeline gcp-pipeline-orchestration
: Composer** : Orchestration : :
: : deploy/schedule : :
: : existing : :
: : pipelines(dbt + : :
: : Spark, etc.). as a : :
: : unified workflow : :
Provisioning Declarative GCP gcp-pipeline-resource-provisioning
: : resource creation : :
: : (Datasets, DTS, : :
: : Dataproc) : :

[!TIP]

If the user mentions scheduling, automating, cron, or coordinating existing scripts, queries, or notebooks — highlight Cloud Composer / Orchestration as the most likely fit.

[!NOTE]

Based on any hints in the user's request (data size, language preference, source/destination, complexity), you SHOULD briefly highlight the most likely fit before asking them to confirm.

Step 3: Confirm Selection

[!IMPORTANT]

You MUST stop and wait for the user to select one of the options above. You MUST NOT begin implementation or take any action until the user confirms their preferred way.

Clarifying "Run" Requests

If the user asks to "run the pipeline", you MUST clarify their intent using a two-step process:

  1. Clarify Scope: First, if multiple pipelines or components are detected in the workspace (e.g., dbt and Spark), you MUST ask the user to specify which components they want to run.

    • "Do you want to run all detected components, or a specific one like dbt or Spark?"
  2. Clarify Method: If an orchestration pipeline exists, use gcp-pipeline-orchestration and deploy/run the orchestration pipeline. Otherwise, you MUST ask the user how they want to run it:

    • Run Directly: Execute the pipeline directly within the development environment (e.g., using dbt run, gcloud dataproc jobs submit, dataform run etc.).
    • Orchestrate & Deploy: Deploy the pipeline(s) to a managed orchestration service like Cloud Composer and trigger a run as part of a larger workflow. Use @skill:gcp-pipeline-orchestration skill for more context.
    • "Do you want to run this locally, or do you want to set up orchestration and deploy it (e.g., using Cloud Composer)?"

Next Steps

Once the user confirms, activate the corresponding skill:

Choice Skill to Activate
BigQuery DTS bigquery-data-transfer-service
dbt dbt-bigquery
Dataflow gcp-dataflow
Dataform dataform-bigquery
Spark gcp-spark
Provisioning gcp-pipeline-resource-provisioning
Orchestration gcp-pipeline-orchestration
Other — (general GCP assistance)
Install via CLI
npx skills add https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack --skill gcp-data-pipelines
Repository Details
star Stars 110
call_split Forks 17
navigation Branch main
article Path SKILL.md
More from Creator
gemini-cli-extensions
gemini-cli-extensions Explore all skills →