databricks-ads-session


Conduct Azure Databricks Architecture Design Sessions (ADS). Orchestrate a structured, multi-turn conversation to gather solution requirements from users across any industry or starting point (on-prem migration, IoT, data warehousing, ML/AI, streaming, etc.), then generate a Databricks-centric architecture diagram as PNG. Use when the user wants to (1) design an Azure Databricks solution, (2) run an architecture design session, (3) scope a Databricks migration or greenfield project, (4) gather requirements for a data platform, or (5) generate an Azure Databricks architecture diagram from requirements.

By HaoZhang615, updated 2/27/2026

name: databricks-ads-session
description: Conduct Azure Databricks Architecture Design Sessions (ADS). Orchestrate a structured, multi-turn conversation to gather solution requirements from users across any industry or starting point (on-prem migration, IoT, data warehousing, ML/AI, streaming, etc.), then generate a Databricks-centric architecture diagram as PNG. Use when the user wants to (1) design an Azure Databricks solution, (2) run an architecture design session, (3) scope a Databricks migration or greenfield project, (4) gather requirements for a data platform, or (5) generate an Azure Databricks architecture diagram from requirements.
license: MIT
compatibility: Works with Claude Code, GitHub Copilot, VS Code, Cursor, and any Agent Skills compatible tool. PNG export requires Node.js (npx @mermaid-js/mermaid-cli). Fallback is Mermaid in Markdown preview.
metadata:
  author: community
  version: "3.0"
  domain: Azure Databricks

Azure Databricks ADS Session

This skill provides domain-specific knowledge for Azure Databricks to be used within an Architecture Design Session. The ADS methodology (persona, pacing, session structure, decision narration, trade-off framework, self-critique) is defined in the runtime system prompt — this skill supplies the Databricks-specific questions, patterns, components, and references that the methodology operates on.

Domain: Azure Databricks

This skill covers the Azure Databricks data platform including:

  • Data Engineering: LakeFlow Connect, LakeFlow Jobs, LakeFlow Spark Declarative Pipelines (formerly Delta Live Tables, DLT), Auto Loader, Structured Streaming, Apache Flink, Delta Lake
  • Data Warehousing: SQL Warehouse (Serverless/Pro/Classic), Lakehouse Federation, dbt integration, materialized views
  • AI/ML: Mosaic AI (Model Serving, Feature Store, AI Gateway), MLflow 3.0, serverless GPU compute, distributed training
  • GenAI: Mosaic AI Agent Framework, Agent Bricks, Vector Search, MCP Servers, AI Gateway with guardrails
  • Governance: Unity Catalog, Delta Sharing, ABAC, column-level masking, data lineage, Compatibility Mode
  • Infrastructure: Serverless Workspace, Classic VNet-injected workspace, ADLS Gen2, Azure Key Vault, Microsoft Entra ID

Phase-Specific Databricks Questions

Phase 1: Context Discovery

Ask about:

  • Business problem or opportunity driving this initiative
  • Industry and regulatory context
  • Greenfield project vs. migration from existing system
  • Key stakeholders and decision-makers
  • Timeline and budget constraints
  • Success criteria (what does "done" look like?)
  • KPIs, latency targets, and cost envelope

Adapt: If the user mentions a migration, read references/migration-patterns.md. If the user names a specific industry, read references/industry-templates.md for starter context.

Phase 2: Current Landscape

Ask about:

  • Data sources (databases, APIs, files, streams, SaaS platforms)
  • Current data platform (if migrating: Hadoop, Snowflake, on-prem SQL, etc.)
  • Data volumes and growth rate
  • Real-time vs. batch requirements
  • Data governance and cataloging needs (Unity Catalog considerations)
  • Sensitive data classification (PII, PHI, financial)
  • Unstructured data (documents, PDFs, images, audio) for AI processing

See references/probing-questions.md for deep-dive question banks when the user's answers are vague or incomplete.
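Answers about data volume and growth rate feed directly into sizing. A quick compound-growth projection shows the volume to plan for at the end of the planning horizon. This is a minimal sketch that assumes a constant annual growth rate (real growth is rarely that smooth); the function name project_volume_tb and the example figures are hypothetical.

```python
def project_volume_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Project data volume assuming constant annual compound growth.

    current_tb: today's volume in terabytes (hypothetical Phase 2 answer).
    annual_growth: growth rate as a fraction, e.g. 0.4 for 40% per year.
    years: planning horizon in years.
    """
    if current_tb < 0 or years < 0:
        raise ValueError("volume and horizon must be non-negative")
    # volume_n = volume_0 * (1 + g) ** n
    return current_tb * (1 + annual_growth) ** years


# Example: 50 TB today, growing 40% per year, over a 3-year horizon.
print(round(project_volume_tb(50, 0.4, 3), 1))  # 137.2
```

Treat the result as a starting point for the cost and storage conversation, not as a sizing commitment.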

Phase 3: Security & Networking

Ask about:

  • Network topology (VNet injection, private endpoints, hub-spoke)
  • Identity provider (Entra ID, federation, SCIM provisioning)
  • Data access control model (table-level, row-level, column-level, attribute-based access control / ABAC)
  • Regulatory compliance (HIPAA, SOC2, GDPR, FedRAMP, industry-specific)
  • Encryption requirements (at-rest, in-transit, customer-managed keys)
  • Secrets management approach

Phase 4: Operational Requirements

Ask about:

  • HA/DR requirements and RPO/RTO targets
  • Multi-region or single-region deployment
  • Cost optimization priorities (reserved capacity, spot instances, serverless compute, FinOps strategy)
  • Workspace deployment model (Serverless Workspace vs Classic with VNet injection)
  • Monitoring and alerting requirements
  • Environment strategy (dev/staging/prod workspace separation)
  • Tagging and cost allocation strategy
  • Operating model: who owns data products, who approves access, who triages incidents

Use references/trade-offs-and-failure-modes.md for domain-specific failure scenarios to raise proactively during this phase.

Readiness Gate

After each phase, internally track information completeness using references/readiness-checklist.md.

Decision Logic

IF all must-have and should-have items are gathered:
    → Proceed to diagram generation
IF all must-have items are gathered but some should-have items are missing:
    → State assumptions explicitly, ask user to confirm or correct
IF must-have items are missing:
    → Ask targeted follow-up questions (max 2 additional turns)
IF user says "just generate something" or expresses impatience:
    → Generate with sensible defaults, document all assumptions

Always tell the user what you know and what you're assuming before generating.
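The decision logic above can be sketched as a small function that checks the rules in a fixed order of precedence. This is a minimal sketch, not the skill's implementation. It assumes the checklist in references/readiness-checklist.md has already been reduced to lists of missing must-have and should-have items. The Readiness type, next_action, and the action names are hypothetical. The source does not say what happens when must-haves are still missing after two follow-up turns, so the sketch assumes the session falls back to defaults and documents them.

```python
from dataclasses import dataclass, field


@dataclass
class Readiness:
    """Hypothetical snapshot of what the session has gathered so far."""
    missing_must_have: list[str] = field(default_factory=list)
    missing_should_have: list[str] = field(default_factory=list)
    follow_up_turns_used: int = 0
    user_impatient: bool = False


MAX_FOLLOW_UP_TURNS = 2  # "max 2 additional turns" from the decision logic


def next_action(state: Readiness) -> tuple[str, list[str]]:
    """Return (action, assumptions_to_state) for the readiness gate."""
    gaps = state.missing_must_have + state.missing_should_have
    # Impatience overrides the other rules: generate now, document every gap.
    if state.user_impatient:
        return "generate_with_defaults", gaps
    # Must-have gaps: ask targeted follow-ups while the turn budget lasts.
    if state.missing_must_have and state.follow_up_turns_used < MAX_FOLLOW_UP_TURNS:
        return "ask_follow_up", []
    # Assumption: once the budget is spent, fall back to documented defaults.
    if state.missing_must_have:
        return "generate_with_defaults", gaps
    # Only should-have gaps: state assumptions and ask the user to confirm.
    if state.missing_should_have:
        return "confirm_assumptions", state.missing_should_have
    return "generate_diagram", []


print(next_action(Readiness(missing_should_have=["tagging strategy"])))
# ('confirm_assumptions', ['tagging strategy'])
```

Returning the assumption list alongside the action supports the rule of telling the user what is known and what is assumed before generating.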

Databricks Diagram Components

When generating diagrams, use these Databricks-specific node shapes, which extend the generic style guide from the architecture-diagramming skill:

| Component Type | Shape | Mermaid Syntax | Example |
| --- | --- | --- | --- |
| Databricks Workspace | Rectangle | [Name] | [Databricks Workspace] |
| Delta Tables / ADLS | Cylinder | [(Name)] | [(ADLS Gen2)], [(Delta Tables)] |
| Unity Catalog | Rectangle | [Name] | [Unity Catalog] |
| Security (Key Vault, Entra ID) | Rounded | (Name) | (Azure Key Vault), (Microsoft Entra ID) |
| Networking (VNet, ExpressRoute) | Hexagon / Stadium | {{Name}} / ([Name]) | {{Hub VNet}}, ([ExpressRoute]) |
| External / On-Prem Systems | Double-bordered | [[Name]] | [[On-Prem HDFS]] |
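The shape table above maps directly to a lookup that emits Mermaid node declarations. This is a minimal sketch. The SHAPES keys and mermaid_node are hypothetical, and splitting networking into a hexagon for VNets and a stadium for ExpressRoute is an assumption based on the table's examples. Labels are wrapped in double quotes, which Mermaid accepts inside each of these shapes, so names that contain parentheses or slashes still parse.

```python
# Hypothetical component kinds mapped to Mermaid open/close delimiters,
# following the node-shape table above.
SHAPES = {
    "workspace": ("[", "]"),        # rectangle: Databricks Workspace
    "storage": ("[(", ")]"),        # cylinder: ADLS Gen2, Delta tables
    "catalog": ("[", "]"),          # rectangle: Unity Catalog
    "security": ("(", ")"),         # rounded: Key Vault, Entra ID
    "network_hub": ("{{", "}}"),    # hexagon: VNets
    "network_link": ("([", "])"),   # stadium: ExpressRoute
    "external": ("[[", "]]"),       # double-bordered: on-prem systems
}


def mermaid_node(node_id: str, label: str, kind: str) -> str:
    """Render one node declaration, e.g. uc["Unity Catalog"]."""
    open_, close = SHAPES[kind]  # KeyError signals an unknown component kind
    # Quoting the label keeps characters like ( ) / from breaking the parser.
    return f'{node_id}{open_}"{label}"{close}'


print(mermaid_node("hdfs", "On-Prem HDFS", "external"))  # hdfs[["On-Prem HDFS"]]
```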

Pattern Selection

Match gathered requirements to a Databricks architecture pattern. Read references/databricks-patterns.md for the pattern catalog.

Common patterns:

| Use Case | Pattern |
| --- | --- |
| Data lakehouse (general) | Medallion architecture with Unity Catalog |
| On-prem migration | Lift-and-shift → modernize with Delta Lake |
| Real-time analytics | Structured Streaming + Apache Flink + LakeFlow Spark Declarative Pipelines |
| ML/AI platform | Feature Store + MLflow 3.0 + Mosaic AI + Model Serving + Serverless GPU Compute |
| GenAI / AI agents | Mosaic AI Agent Framework + Agent Bricks + Vector Search + AI Gateway + MCP Servers |
| Business analytics | Databricks One + AI/BI Genie + SQL Warehouse |
| Data warehouse replacement | SQL Warehouse + dbt + Lakehouse Federation |
| IoT data platform | Event Hubs → Databricks Streaming → Delta |
| Multi-team data mesh | Unity Catalog + workspace per domain + Delta Sharing |
| Hybrid batch + streaming | LakeFlow Connect + LakeFlow Jobs + Structured Streaming + Flink |
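The pattern table above can be used as a first-pass shortlist from the gathered requirements, with the medallion lakehouse as the default. This is a minimal keyword-matching sketch. The TRIGGERS word sets and suggest_patterns are hypothetical, and in a real session the conversation itself, not keywords, should drive the choice.

```python
import re

# Pattern catalog from the table above (use case -> pattern).
PATTERNS = {
    "Data lakehouse (general)": "Medallion architecture with Unity Catalog",
    "On-prem migration": "Lift-and-shift → modernize with Delta Lake",
    "Real-time analytics": "Structured Streaming + Apache Flink + LakeFlow Spark Declarative Pipelines",
    "ML/AI platform": "Feature Store + MLflow 3.0 + Mosaic AI + Model Serving + Serverless GPU Compute",
    "GenAI / AI agents": "Mosaic AI Agent Framework + Agent Bricks + Vector Search + AI Gateway + MCP Servers",
    "Business analytics": "Databricks One + AI/BI Genie + SQL Warehouse",
    "Data warehouse replacement": "SQL Warehouse + dbt + Lakehouse Federation",
    "IoT data platform": "Event Hubs → Databricks Streaming → Delta",
    "Multi-team data mesh": "Unity Catalog + workspace per domain + Delta Sharing",
    "Hybrid batch + streaming": "LakeFlow Connect + LakeFlow Jobs + Structured Streaming + Flink",
}

# Hypothetical trigger words per use case (illustrative, not exhaustive).
TRIGGERS = {
    "Data lakehouse (general)": {"lakehouse", "medallion"},
    "On-prem migration": {"migrate", "migration", "hadoop", "hdfs", "on-prem"},
    "Real-time analytics": {"real-time", "streaming", "latency"},
    "ML/AI platform": {"ml", "training", "mlops", "model"},
    "GenAI / AI agents": {"rag", "agent", "agents", "chatbot", "llm"},
    "Business analytics": {"dashboard", "dashboards", "bi", "genie"},
    "Data warehouse replacement": {"warehouse", "synapse", "snowflake", "teradata"},
    "IoT data platform": {"iot", "telemetry", "sensor", "sensors"},
    "Multi-team data mesh": {"mesh", "domain", "domains"},
    "Hybrid batch + streaming": {"hybrid"},
}
DEFAULT_USE_CASE = "Data lakehouse (general)"


def suggest_patterns(requirements: str) -> list[tuple[str, str]]:
    """Return (use case, pattern) pairs whose trigger words appear in the text."""
    words = set(re.findall(r"[a-z0-9/+-]+", requirements.lower()))
    hits = [use_case for use_case, trig in TRIGGERS.items() if words & trig]
    return [(uc, PATTERNS[uc]) for uc in (hits or [DEFAULT_USE_CASE])]


for use_case, pattern in suggest_patterns("Migrate our Hadoop cluster and add IoT telemetry"):
    print(f"{use_case}: {pattern}")
```

Several matches are normal. An on-prem Hadoop migration that also ingests IoT telemetry combines two patterns, which the session then reconciles in a single diagram.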

Generating the Diagram

Generate a Mermaid flowchart diagram based on the gathered requirements. Use the pattern templates in references/databricks-patterns.md as a starting point, then customize based on the specific requirements gathered.

Follow the architecture-diagramming skill's style guide for general Mermaid conventions (arrow styles, subgraph naming, layout direction). Apply the Databricks-specific node shapes listed above.

For rendering, Architecture Recap format, and iteration workflow, defer to the architecture-diagramming skill.
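The generation step can be sketched end to end: build the Mermaid source, save it, and then try a PNG export with mermaid-cli, keeping the Mermaid source as the fallback. This is a minimal sketch. The node names, the medallion layout, and the helpers medallion_flowchart and render_png are hypothetical and are not taken from references/databricks-patterns.md. The architecture-diagramming skill's conventions still take precedence.

```python
import shutil
import subprocess
from pathlib import Path


def medallion_flowchart() -> str:
    """Return Mermaid source for a minimal medallion lakehouse (illustrative only)."""
    lines = [
        "flowchart LR",
        '    src[["On-Prem SQL Server"]]',
        "    subgraph azure[Azure]",
        '        kv("Azure Key Vault")',
        '        entra("Microsoft Entra ID")',
        "        subgraph dbx[Databricks Workspace]",
        '            bronze[("Bronze Delta Tables")]',
        '            silver[("Silver Delta Tables")]',
        '            gold[("Gold Delta Tables")]',
        "        end",
        '        uc["Unity Catalog"]',
        '        sql["SQL Warehouse"]',
        "    end",
        "    src -->|LakeFlow Connect| bronze",
        "    bronze --> silver --> gold",
        "    gold --> sql",
        "    uc -.->|governs| dbx",
        "    kv -.->|secrets| dbx",
        "    entra -.->|identity| dbx",
    ]
    return "\n".join(lines) + "\n"


def render_png(mermaid_src: str, out_png: Path) -> bool:
    """Save the .mmd source and try to render a PNG with mermaid-cli (mmdc) via npx.

    Returns False when npx is missing or rendering fails; the caller then
    falls back to previewing the .mmd source as Mermaid in Markdown.
    """
    mmd = out_png.with_suffix(".mmd")
    mmd.write_text(mermaid_src, encoding="utf-8")
    npx = shutil.which("npx")
    if npx is None:
        return False
    result = subprocess.run(
        [npx, "--yes", "-p", "@mermaid-js/mermaid-cli", "mmdc",
         "-i", str(mmd), "-o", str(out_png)],
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(medallion_flowchart())
```

For example, render_png(medallion_flowchart(), Path("architecture.png")) writes architecture.mmd and, if mermaid-cli runs successfully, architecture.png next to it. The .mmd file is always kept, so the Markdown-preview fallback never depends on Node.js.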

Optional: Workload Profiling

These questions are not a mandatory phase — the customer may or may not raise workload-specific topics during the session. Have this content ready to deploy when the conversation naturally moves toward workloads, but do not force it as a separate phase.

If the customer discusses workloads, ask about:

  • ETL/ELT pipelines (complexity, frequency, SLAs)
  • ML/AI workloads (training, inference, MLOps maturity, Mosaic AI)
  • GenAI applications (RAG, chatbots, AI agents, document intelligence, Mosaic AI Agent Bricks)
  • BI/reporting and self-service analytics (tools, user count, concurrency, AI/BI Genie)
  • Streaming/real-time analytics requirements
  • SQL analytics workloads (ad-hoc queries, dashboards)
  • Data application hosting needs (Databricks Apps, custom UIs)
  • Notebook/interactive development needs
  • CI/CD and DevOps practices for data engineering (DABs, Azure DevOps, GitHub Actions)

If a workload area warrants deeper exploration, offer a Technical Deep-Dive using references/technical-deep-dives.md.

Databricks Expertise

You know Databricks inside-out: the trade-offs between serverless and provisioned compute, when LakeFlow Connect beats ADF, and why Databricks now recommends Liquid Clustering over partitioning for new tables. Connect technical decisions to business outcomes. "LakeFlow Spark Declarative Pipelines" isn't just a product name; it means less hand-written pipeline code to maintain and faster time-to-insight. Translate tech into value.

Reference Files

| File | Load When |
| --- | --- |
| references/conversation-framework.md | Understanding the full phase detail, signal detection, and transition logic |
| references/databricks-patterns.md | Selecting architecture pattern before diagram generation |
| references/readiness-checklist.md | Evaluating if enough info has been gathered |
| references/industry-templates.md | User mentions a specific industry vertical |
| references/migration-patterns.md | User is migrating from an existing platform |
| references/probing-questions.md | User gives vague answers, need to dig deeper |
| references/trade-offs-and-failure-modes.md | Trade-off analysis or failure mode walkthrough needed |
| references/technical-deep-dives.md | User accepts a technical deep-dive (spike) on a workload topic |
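Each reference file is loaded only when its condition holds, so the table works as a lookup from conversation signals to files. This is a minimal sketch. The signal names and files_to_load are hypothetical labels for the "Load When" column, and detecting those signals is left to the conversation itself.

```python
# "Load When" conditions from the table above, as hypothetical signal names.
REFERENCE_FILES = {
    "need_phase_detail": "references/conversation-framework.md",
    "selecting_pattern": "references/databricks-patterns.md",
    "checking_readiness": "references/readiness-checklist.md",
    "industry_mentioned": "references/industry-templates.md",
    "migration_mentioned": "references/migration-patterns.md",
    "vague_answer": "references/probing-questions.md",
    "trade_off_or_failure_mode": "references/trade-offs-and-failure-modes.md",
    "deep_dive_accepted": "references/technical-deep-dives.md",
}


def files_to_load(signals: set[str], already_loaded: set[str]) -> list[str]:
    """Return files for the active signals, skipping files already in context."""
    # Sorting gives a deterministic order; unknown signals are ignored.
    wanted = [REFERENCE_FILES[s] for s in sorted(signals) if s in REFERENCE_FILES]
    return [path for path in wanted if path not in already_loaded]


print(files_to_load({"migration_mentioned", "industry_mentioned"}, set()))
# ['references/industry-templates.md', 'references/migration-patterns.md']
```

Tracking already_loaded keeps the session from re-reading a file it has already pulled into context.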

Scripts

| Script | Purpose |
| --- | --- |
| scripts/generate_architecture.py | Generate Mermaid diagram code for a given Databricks architecture pattern |

Install via CLI:
npx skills add https://github.com/HaoZhang615/ads-copilot --skill databricks-ads-session