sourcegraph-explorer


Search and explore Databricks source code via Sourcegraph to answer questions about how the platform works, why things are implemented a certain way, what the limitations are, and how services interact. Use when the user asks about Databricks internals, platform behavior, service architecture, error causes, or feature limitations.

By sllynn. Updated 2/26/2026.

name: sourcegraph-explorer
description: "Search and explore Databricks source code via Sourcegraph to answer questions about how the platform works, why things are implemented a certain way, what the limitations are, and how services interact. Use when the user asks about Databricks internals, platform behavior, service architecture, error causes, or feature limitations."

Sourcegraph Code Explorer

Answer questions about the Databricks platform by searching and reading source code through Sourcegraph.

First-time setup check

Before doing anything, verify the tools are available:

which sg-search sg-read sg-cookie-refresh

If any are missing, tell the user:

The Sourcegraph tools aren't set up yet. Run /sourcegraph-setup first to install them.

Then stop — do not proceed until setup is complete.
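The same check can be scripted. This is a minimal Python sketch, assuming only that the three tools are expected on PATH; `missing_tools` is a hypothetical helper name and is not part of the Sourcegraph tooling.

```python
import shutil

REQUIRED_TOOLS = ("sg-search", "sg-read", "sg-cookie-refresh")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]
```

If `missing_tools()` returns a non-empty list, report the missing names and point the user at /sourcegraph-setup instead of continuing.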

Tools

sg-search — search code

sg-search "query string"

Returns JSON lines: {"repo": "...", "path": "...", "line": N, "text": "..."}. Summary stats (match count, duration) go to stderr.
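Because stdout carries one JSON object per line, it is straightforward to post-process. The sketch below assumes only the four fields shown above; `parse_results` and `group_by_file` are hypothetical helper names.

```python
import json
from collections import defaultdict


def parse_results(stdout):
    """Parse sg-search stdout (one JSON object per line) into a list of dicts."""
    results = []
    for raw in stdout.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        results.append(json.loads(raw))
    return results


def group_by_file(results):
    """Map (repo, path) to the sorted line numbers that matched in that file."""
    grouped = defaultdict(list)
    for result in results:
        grouped[(result["repo"], result["path"])].append(result["line"])
    return {key: sorted(lines) for key, lines in grouped.items()}
```

Grouping by file makes it easier to decide which files deserve a full read and which line ranges to request.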

sg-read — read a file

sg-read <repo> <filepath> [start_line] [end_line]

Returns file content with line numbers. Use line ranges for large files.

Examples:

sg-read databricks-eng/universe proto/clusters/v1/clusters.proto
sg-read databricks-eng/runtime sql/core/src/main/scala/SomeFile.scala 50 120

sg-cookie-refresh — refresh authentication

sg-cookie-refresh           # auto-detect browser (tries Arc, Chrome, Safari)
sg-cookie-refresh chrome    # force a specific browser

Extracts Sourcegraph session cookies directly from the browser's cookie database and writes them to /tmp/sg_cookie.txt. Validates they work before finishing.

Authentication

These scripts authenticate via browser session cookies (Sourcegraph access tokens are admin-disabled). The cookie file at /tmp/sg_cookie.txt is managed by sg-cookie-refresh.

If a search returns an error, an HTML redirect, or unexpectedly empty results, the cookie has probably expired. Refresh it:

sg-cookie-refresh

If that reports EXPIRED, the user needs to open https://sourcegraph.prod.databricks-corp.com in their browser and log in, then re-run the script.
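The expiry symptoms listed above can be turned into a rough heuristic. This sketch assumes that sg-search exits non-zero on errors and that a redirect shows up as an HTML page on stdout; neither is confirmed here, and `looks_expired` is a hypothetical helper. Empty output can also mean the query simply matched nothing, so treat a True result as a prompt to run sg-cookie-refresh once, not as proof.

```python
def looks_expired(returncode, stdout):
    """Heuristic: True when sg-search output suggests the cookie needs a refresh."""
    if returncode != 0:  # assumption: errors produce a non-zero exit code
        return True
    text = stdout.strip()
    if not text:  # empty output: possibly expired, possibly no matches
        return True
    head = text[:200].lower()
    # An HTML document instead of JSON lines suggests a login redirect.
    return head.startswith("<!doctype html") or head.startswith("<html")
```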

Repositories

| Repo | Contents | Filter |
|---|---|---|
| databricks-eng/universe | Services, APIs, frontend, infrastructure, proto definitions, feature flags, BUILD configs | repo:databricks-eng/universe |
| databricks-eng/runtime | Spark runtime (DBR), cluster components, execution engine | repo:databricks-eng/runtime |

Search both by default. Narrow to one when the question clearly belongs to a specific repo.

Universe structure (key directories)

  • proto/ — Protobuf service and message definitions (the API contracts)
  • feature-flag/ — Feature flag configurations (Jsonnet files)
  • webapp/web, accounts-ui/web — Frontend applications
  • common/ — Shared libraries
  • access-control*, auth* — Authorization and authentication
  • api-server/, api-client/ — API layer
  • cluster-* — Cluster management
  • billing* — Billing and metering
  • Service directories are generally named after the service (e.g., sql-endpoint/, model-serving/)

Search Strategy

Follow this iterative loop — the same grep-read-grep-read cycle that works locally, but via Sourcegraph.

Step 1: Find entry points

Start broad to locate where the relevant code lives.

sg-search "feature flag evaluation repo:databricks-eng/universe count:10"

For error messages or user-facing strings, search the exact text:

sg-search '"workspace limit exceeded" repo:databricks-eng/universe count:10'

For API endpoints, search the path:

sg-search '"/api/2.0/clusters/create" repo:databricks-eng/universe count:10'
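Exact-phrase searches nest double quotes inside the single query argument, which is easy to get wrong in a shell. When driving sg-search from a script, one option is to build the argument list directly and skip shell quoting altogether. A minimal sketch; `exact_phrase_argv` is a hypothetical helper, and phrases containing a double quote are rejected because no escaping rule is assumed.

```python
def exact_phrase_argv(phrase, repo, count=10):
    """Build the argv for an exact-phrase sg-search call (no shell involved)."""
    if '"' in phrase:
        raise ValueError("phrase must not contain a double quote")
    # The whole query is passed to sg-search as one argument.
    query = f'"{phrase}" repo:{repo} count:{count}'
    return ["sg-search", query]
```

The returned list can be passed to `subprocess.run(argv, capture_output=True, text=True)`.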

Step 2: Narrow with filters

Once you know the general area, add filters to cut noise:

  • Exclude tests: -file:test -file:Test -file:spec
  • Language: lang:scala, lang:python, lang:java, lang:go, lang:rust, lang:typescript
  • File path: file:\.scala$, file:proto/, file:src/main
  • Directory: file:sql-endpoint/ to scope to a service

Example progression:

sg-search "ClusterCreateRequest repo:databricks-eng/universe count:10"
sg-search "ClusterCreateRequest repo:databricks-eng/universe lang:scala -file:test count:10"
sg-search "ClusterCreateRequest repo:databricks-eng/universe file:proto/ count:10"
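The progression above follows a regular shape: a search term followed by optional filters. The sketch below composes such query strings; `build_query` and its parameter names are hypothetical and only combine the filters already listed in this step.

```python
def build_query(term, repo=None, lang=None, file=None, exclude_files=(), count=10):
    """Compose a Sourcegraph query string from a term and optional filters."""
    parts = [term]
    if repo:
        parts.append(f"repo:{repo}")
    if lang:
        parts.append(f"lang:{lang}")
    if file:
        parts.append(f"file:{file}")
    # Each excluded path pattern becomes its own -file: filter.
    parts.extend(f"-file:{pattern}" for pattern in exclude_files)
    parts.append(f"count:{count}")
    return " ".join(parts)
```

For example, `build_query("ClusterCreateRequest", repo="databricks-eng/universe", lang="scala", exclude_files=["test"])` reproduces the second query in the progression.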

Step 3: Read the code

When you find a relevant file, read it to understand context:

sg-read databricks-eng/universe path/to/File.scala

For large files, use line ranges (read ~100 lines around the match):

sg-read databricks-eng/universe path/to/File.scala 50 150
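The line range can be derived from the match's line number. A small sketch, assuming a symmetric window of 50 lines on each side; `read_window` and `sg_read_argv` are hypothetical helpers.

```python
def read_window(match_line, radius=50):
    """Return (start, end) covering roughly 2 * radius lines around a match."""
    start = max(1, match_line - radius)  # line numbers start at 1
    end = match_line + radius
    return start, end


def sg_read_argv(repo, path, match_line, radius=50):
    """Build the argv for reading the window around a match with sg-read."""
    start, end = read_window(match_line, radius)
    return ["sg-read", repo, path, str(start), str(end)]
```

A match on line 100 yields the range 50 to 150, which is the range used in the example above.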

Step 4: Follow the trail

From what you've read, identify the next thing to search for:

  • A function is called → search for its definition: "def functionName" lang:scala
  • A class is used → search for its definition and usages
  • A proto message is referenced → search file:proto/ for its definition
  • A feature flag is checked → search file:feature-flag/ for its configuration
  • A config constant is used → search for where it's defined: "val CONSTANT_NAME"
  • An error is thrown → trace back to what condition triggers it
  • A trait/interface is extended → search for "extends TraitName" or "with TraitName"
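Several of these follow-ups are mechanical enough to template. The sketch below maps a kind of finding to the next query fragment; the kind names and `next_query` are hypothetical, and the templates only restate the patterns in the list above (Scala-oriented, as in the list).

```python
FOLLOW_UPS = {
    "function": '"def {name}" lang:scala',
    "proto": "file:proto/ {name}",
    "flag": "file:feature-flag/ {name}",
    "constant": '"val {name}"',
    "trait": '"extends {name}" OR "with {name}"',
}


def next_query(kind, name):
    """Return the follow-up query fragment for a finding of the given kind."""
    if kind not in FOLLOW_UPS:
        raise KeyError(f"no follow-up template for kind: {kind}")
    return FOLLOW_UPS[kind].format(name=name)
```

Append repo: and count: filters to the returned fragment before passing it to sg-search.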

Step 5: Check related artifacts

Depending on the question, also search for:

  • Proto definitions: file:proto/ MessageName — the API contract
  • Feature flags: file:feature-flag/ flagName — whether a feature is gated
  • BUILD files: file:BUILD.bazel serviceName — what depends on what
  • Config/limits: search for constants, env vars, or config keys
  • Error messages: search the exact user-facing error string

Step 6: Synthesize

After 2-5 iterations of search-read-follow, present your answer with:

  1. Direct answer to the question
  2. Code references — cite specific files and line numbers (format: repo/path/to/file.ext:L123)
  3. The chain of reasoning — briefly explain how you traced through the code
  4. Caveats — note if you couldn't find definitive proof, if the behavior might be behind a feature flag, or if there are multiple code paths
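Citations in the format from item 2 can be generated straight from search results. A minimal sketch; `cite` and `cite_result` are hypothetical helpers, and the result dict is assumed to carry the repo, path, and line fields that sg-search emits.

```python
def cite(repo, path, line):
    """Format a code reference as repo/path/to/file.ext:L123."""
    return f"{repo}/{path.lstrip('/')}:L{line}"


def cite_result(result):
    """Format a citation from one parsed sg-search result."""
    return cite(result["repo"], result["path"], result["line"])
```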

Query Syntax Reference

| Syntax | Purpose | Example |
|---|---|---|
| repo:org/name | Filter to repository | repo:databricks-eng/universe |
| file:path | Filter to file path (regex) | file:\.scala$ |
| -file:path | Exclude file path | -file:test |
| lang:name | Filter by language | lang:scala |
| type:symbol | Search symbols only | type:symbol ClusterManager |
| type:diff | Search diffs/changes | type:diff removed feature |
| type:commit | Search commit messages | type:commit "fix cluster limit" |
| case:yes | Case-sensitive | case:yes MAX_NODES |
| "exact phrase" | Exact match | "permission denied" |
| /regex/ | Regular expression | /cluster.*limit.*\d+/ |
| OR | Boolean OR | ClusterManager OR ClusterService |
| NOT | Boolean NOT | ClusterCreate NOT test |
| count:N | Max results | count:20 |
| repo:org/name@branch | Specific branch | repo:databricks-eng/universe@main |

Common Investigation Patterns

"Why can't customers do X?"

  1. Search for the error message they see
  2. Find the validation/check that produces it
  3. Trace the condition — is it a hard limit? feature flag? permission check?
  4. Check if there's a feature flag that gates it
  5. Look for related config constants or limits

"How does service/feature X work?"

  1. Search for the service name in proto definitions (the API contract)
  2. Find the main handler/controller class
  3. Read the core logic, following key method calls
  4. Check what other services it calls (look for RPC/HTTP client usage)
  5. Identify the data flow: request → validation → processing → response

"What are the limits/constraints of X?"

  1. Search for constants: MAX_, LIMIT_, DEFAULT_
  2. Search for validation methods related to the feature
  3. Check config files and feature flags
  4. Look for error messages about limits being exceeded
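Step 1 of this pattern can be expanded into one query per constant prefix, scoped to the service directory. This is a sketch under the assumption that limits follow the MAX_/LIMIT_/DEFAULT_ naming mentioned above; `limit_queries` is a hypothetical helper, and the directory passed in is whatever the earlier searches identified.

```python
LIMIT_PREFIXES = ("MAX_", "LIMIT_", "DEFAULT_")


def limit_queries(directory, repo, count=10):
    """Return one case-sensitive query per constant prefix, scoped to a directory."""
    return [
        f"{prefix} case:yes file:{directory} repo:{repo} -file:test count:{count}"
        for prefix in LIMIT_PREFIXES
    ]
```

Case-sensitive matching keeps lowercase identifiers such as `max_` out of the results; drop `case:yes` if the codebase in question names constants differently.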

"What changed recently in X?"

  1. Use type:diff search to find recent changes
  2. Use type:commit to search commit messages
  3. Focus on the relevant directory/service
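The three steps translate into a pair of queries. A minimal sketch; `change_queries` is a hypothetical helper, and the directory filter is applied only to the diff search here as a conservative choice.

```python
def change_queries(term, repo, directory=None, count=10):
    """Return a diff query and a commit-message query for the same term."""
    scope = f" file:{directory}" if directory else ""
    diff_query = f"type:diff {term}{scope} repo:{repo} count:{count}"
    commit_query = f'type:commit "{term}" repo:{repo} count:{count}'
    return [diff_query, commit_query]
```

Run the diff query first to see what changed, then use the commit query to recover the stated reason for the change.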

"How do services X and Y interact?"

  1. Find proto definitions for both services
  2. Search for client/stub usage of one service within the other
  3. Look for shared proto messages or common dependencies
  4. Check BUILD.bazel deps to understand the dependency graph

Tips

  • Proto files (*.proto) are the best starting point for understanding any API — they define the contract.
  • Feature flags in feature-flag/ are Jsonnet files. Search there to understand what's gated.
  • If a search returns too many results, add -file:test -file:mock -file:fake to exclude test infrastructure.
  • Services in universe typically follow a pattern: proto/ defines the API, a top-level directory contains the implementation, and BUILD.bazel files show dependencies.
  • When tracing Scala code, look for extends ConsoleLogging and with clauses to understand mixins.
  • For Spark/runtime questions, start in the runtime repo. For everything else, start in universe.
  • Always include count:N in searches to control result volume. Start with count:10, increase if needed.
  • Use sg-read with line ranges when files are large — reading 100 lines at a time keeps context manageable.
Install via CLI
npx skills add https://github.com/sllynn/static --skill sourcegraph-explorer