name: sourcegraph-explorer
description: "Search and explore Databricks source code via Sourcegraph to answer questions about how the platform works, why things are implemented a certain way, what the limitations are, and how services interact. Use when the user asks about Databricks internals, platform behavior, service architecture, error causes, or feature limitations."
Sourcegraph Code Explorer
Answer questions about the Databricks platform by searching and reading source code through Sourcegraph.
First-time setup check
Before doing anything, verify the tools are available:
which sg-search sg-read sg-cookie-refresh
If any are missing, tell the user:
The Sourcegraph tools aren't set up yet. Run /sourcegraph-setup first to install them.
Then stop — do not proceed until setup is complete.
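The check above can also be written so that it names exactly which tools are missing, since `which` output for missing commands differs between platforms. This is a minimal sketch; `missing_tools` is a helper name invented here, not part of the installed tooling.

```shell
# Print the name of every listed tool that is not on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical usage for this skill:
#   missing=$(missing_tools sg-search sg-read sg-cookie-refresh)
#   [ -z "$missing" ] || echo "Missing tools: $missing"
```

If the function prints anything, give the user the setup message and stop.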
Tools
sg-search — search code
sg-search "query string"
Returns JSON lines: {"repo": "...", "path": "...", "line": N, "text": "..."}.
Summary stats (match count, duration) go to stderr.
sg-read — read a file
sg-read <repo> <filepath> [start_line] [end_line]
Returns file content with line numbers. Use line ranges for large files.
Examples:
sg-read databricks-eng/universe proto/clusters/v1/clusters.proto
sg-read databricks-eng/runtime sql/core/src/main/scala/SomeFile.scala 50 120
sg-cookie-refresh — refresh authentication
sg-cookie-refresh # auto-detect browser (tries Arc, Chrome, Safari)
sg-cookie-refresh chrome # force a specific browser
Extracts Sourcegraph session cookies directly from the browser's cookie database and writes them to /tmp/sg_cookie.txt. Validates they work before finishing.
Authentication
These scripts authenticate via browser session cookies (Sourcegraph access tokens are admin-disabled). The cookie file at /tmp/sg_cookie.txt is managed by sg-cookie-refresh.
If a search returns an error, an HTML redirect, or unexpectedly empty results, the cookie has probably expired. Refresh it:
sg-cookie-refresh
If that reports EXPIRED, the user needs to open https://sourcegraph.prod.databricks-corp.com in their browser and log in, then re-run the script.
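The refresh-and-retry flow could be wrapped roughly as follows. This is a sketch under assumptions: that an expired session shows up as a non-zero exit, empty output, or output starting with `<` (an HTML page instead of JSON lines), and that sg-cookie-refresh exits non-zero when it reports EXPIRED. Neither exit code is confirmed above, and `looks_like_html` and `search_with_refresh` are names invented for illustration.

```shell
# Heuristic (assumption): HTML output starts with "<", JSON lines start with "{".
looks_like_html() {
  case "$1" in
    '<'*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Run a search. If it fails, returns nothing, or returns HTML, refresh the
# cookie once and retry. It never loops, so an EXPIRED cookie still reaches
# the user, who then has to log in through the browser.
search_with_refresh() {
  local out
  if out=$(sg-search "$1" 2>/dev/null) && [ -n "$out" ] && ! looks_like_html "$out"; then
    printf '%s\n' "$out"
    return 0
  fi
  sg-cookie-refresh || return 1   # assumed to fail when the session is EXPIRED
  sg-search "$1"
}
```

A query that legitimately has no matches also triggers one refresh here. That costs a little time but does not change the result.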
Repositories
| Repo | Contents | Filter |
|---|---|---|
| databricks-eng/universe | Services, APIs, frontend, infrastructure, proto definitions, feature flags, BUILD configs | repo:databricks-eng/universe |
| databricks-eng/runtime | Spark runtime (DBR), cluster components, execution engine | repo:databricks-eng/runtime |
Search both by default. Narrow to one when the question clearly belongs to a specific repo.
Universe structure (key directories)
- proto/: Protobuf service and message definitions (the API contracts)
- feature-flag/: Feature flag configurations (Jsonnet files)
- webapp/web, accounts-ui/web: Frontend applications
- common/: Shared libraries
- access-control*, auth*: Authorization and authentication
- api-server/, api-client/: API layer
- cluster-*: Cluster management
- billing*: Billing and metering
- Service directories are generally named after the service (e.g., sql-endpoint/, model-serving/)
Search Strategy
Follow this iterative loop — the same grep-read-grep-read cycle that works locally, but via Sourcegraph.
Step 1: Find entry points
Start broad to locate where the relevant code lives.
sg-search "feature flag evaluation repo:databricks-eng/universe count:10"
For error messages or user-facing strings, search the exact text:
sg-search '"workspace limit exceeded" repo:databricks-eng/universe'
For API endpoints, search the path:
sg-search '"/api/2.0/clusters/create" repo:databricks-eng/universe'
Step 2: Narrow with filters
Once you know the general area, add filters to cut noise:
- Exclude tests: -file:test -file:Test -file:spec
- Language: lang:scala, lang:python, lang:java, lang:go, lang:rust, lang:typescript
- File path: file:\.scala$, file:proto/, file:src/main
- Directory: file:sql-endpoint/ to scope to a service
Example progression:
sg-search "ClusterCreateRequest repo:databricks-eng/universe count:10"
sg-search "ClusterCreateRequest repo:databricks-eng/universe lang:scala -file:test count:10"
sg-search "ClusterCreateRequest repo:databricks-eng/universe file:proto/ count:10"
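When the same term is searched repeatedly with different filters, assembling the query string in one place helps keep the count: cap from being forgotten. A minimal sketch follows; `build_query` is a hypothetical helper, and it only concatenates text without validating the filters.

```shell
# Join a search term with any number of filters and append a result cap.
# Usage: build_query <count> <term> [filter...]
build_query() {
  local count="$1" term="$2"
  shift 2
  local query="$term" f
  for f in "$@"; do
    query="$query $f"
  done
  printf '%s count:%s\n' "$query" "$count"
}

# Hypothetical usage, reproducing the second query of the progression above:
#   sg-search "$(build_query 10 ClusterCreateRequest repo:databricks-eng/universe lang:scala -file:test)"
```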
Step 3: Read the code
When you find a relevant file, read it to understand context:
sg-read databricks-eng/universe path/to/File.scala
For large files, use line ranges (read ~100 lines around the match):
sg-read databricks-eng/universe path/to/File.scala 50 150
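The line range can be derived from the line field of a search hit. The sketch below computes a window of about 100 lines centred on the match. `line_window` is a name invented here, and the default radius of 50 is only a reading of the "~100 lines" guidance above.

```shell
# Print "start end" for a window of about 2*radius lines around a match,
# clamped so that start never drops below line 1.
line_window() {
  local line="$1" radius="${2:-50}"
  local start=$(( line - radius ))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  printf '%s %s\n' "$start" "$(( line + radius ))"
}

# Hypothetical usage (unquoted on purpose so the two numbers split into
# separate arguments):
#   sg-read databricks-eng/universe path/to/File.scala $(line_window 100)
```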
Step 4: Follow the trail
From what you've read, identify the next thing to search for:
- A function is called → search for its definition: "def functionName" lang:scala
- A class is used → search for its definition and usages
- A proto message is referenced → search file:proto/ for its definition
- A feature flag is checked → search file:feature-flag/ for its configuration
- A config constant is used → search for where it's defined: "val CONSTANT_NAME"
- An error is thrown → trace back to what condition triggers it
- A trait/interface is extended → search for "extends TraitName" or "with TraitName"
Step 5: Check related artifacts
Depending on the question, also search for:
- Proto definitions: file:proto/ MessageName (the API contract)
- Feature flags: file:feature-flag/ flagName (whether a feature is gated)
- BUILD files: file:BUILD.bazel serviceName (what depends on what)
- Config/limits: search for constants, env vars, or config keys
- Error messages: search the exact user-facing error string
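The follow-up searches in Steps 4 and 5 are mechanical enough to express as a lookup from "what was just seen" to "what to search next". The sketch below does that. The kind labels (function, proto, flag, and so on) are invented for this example, and `createCluster` in the usage line is a made-up name.

```shell
# Map the kind of thing just encountered to the follow-up query text.
# Usage: follow_up <kind> <name>
follow_up() {
  local kind="$1" name="$2"
  case "$kind" in
    function) printf '"def %s" lang:scala\n' "$name" ;;
    constant) printf '"val %s"\n' "$name" ;;
    trait)    printf '"extends %s" OR "with %s"\n' "$name" "$name" ;;
    proto)    printf 'file:proto/ %s\n' "$name" ;;
    flag)     printf 'file:feature-flag/ %s\n' "$name" ;;
    build)    printf 'file:BUILD.bazel %s\n' "$name" ;;
    *)        printf '%s\n' "$name" ;;   # unknown kind: plain search
  esac
}

# Hypothetical usage:
#   sg-search "$(follow_up function createCluster) repo:databricks-eng/universe count:10"
```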
Step 6: Synthesize
After 2-5 iterations of search-read-follow, present your answer with:
- Direct answer to the question
- Code references: cite specific files and line numbers (format: repo/path/to/file.ext:L123)
- The chain of reasoning: briefly explain how you traced through the code
- Caveats: note if you couldn't find definitive proof, if the behavior might be behind a feature flag, or if there are multiple code paths
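Search hits already carry everything needed for the citation format, so the references can be generated rather than typed. The sketch below assumes jq is installed, which the setup above does not guarantee, and relies on the repo, path, and line fields described under Tools. `to_citations` is a hypothetical helper name.

```shell
# Read sg-search JSON lines on stdin and print one citation per hit
# in the form repo/path/to/file.ext:L123.
to_citations() {
  jq -r '"\(.repo)/\(.path):L\(.line)"'
}

# Hypothetical usage (summary stats go to stderr, so they are discarded here):
#   sg-search "ClusterCreateRequest repo:databricks-eng/universe count:10" 2>/dev/null | to_citations
```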
Query Syntax Reference
| Syntax | Purpose | Example |
|---|---|---|
| repo:org/name | Filter to repository | repo:databricks-eng/universe |
| file:path | Filter to file path (regex) | file:\.scala$ |
| -file:path | Exclude file path | -file:test |
| lang:name | Filter by language | lang:scala |
| type:symbol | Search symbols only | type:symbol ClusterManager |
| type:diff | Search diffs/changes | type:diff removed feature |
| type:commit | Search commit messages | type:commit "fix cluster limit" |
| case:yes | Case-sensitive | case:yes MAX_NODES |
| "exact phrase" | Exact match | "permission denied" |
| /regex/ | Regular expression | /cluster.*limit.*\d+/ |
| OR | Boolean OR | ClusterManager OR ClusterService |
| NOT | Boolean NOT | ClusterCreate NOT test |
| count:N | Max results | count:20 |
| repo:org/name@branch | Specific branch | repo:databricks-eng/universe@main |
Common Investigation Patterns
"Why can't customers do X?"
- Search for the error message they see
- Find the validation/check that produces it
- Trace the condition — is it a hard limit? feature flag? permission check?
- Check if there's a feature flag that gates it
- Look for related config constants or limits
"How does service/feature X work?"
- Search for the service name in proto definitions (the API contract)
- Find the main handler/controller class
- Read the core logic, following key method calls
- Check what other services it calls (look for RPC/HTTP client usage)
- Identify the data flow: request → validation → processing → response
"What are the limits/constraints of X?"
- Search for constants: MAX_, LIMIT_, DEFAULT_
- Search for validation methods related to the feature
- Check config files and feature flags
- Look for error messages about limits being exceeded
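The constant-prefix sweep can be generated for a given service directory. In the sketch below, `limit_queries` is a name invented for this example. Combining case:yes with -file:test is one plausible way to cut noise, not a rule from the tooling.

```shell
# Print one query per common constant prefix, scoped to a directory.
# Usage: limit_queries <repo> <directory>
limit_queries() {
  local repo="$1" dir="$2" prefix
  for prefix in MAX_ LIMIT_ DEFAULT_; do
    printf '%s repo:%s file:%s case:yes -file:test count:10\n' "$prefix" "$repo" "$dir"
  done
}

# Hypothetical usage: run each generated query in turn.
#   limit_queries databricks-eng/universe sql-endpoint/ | while IFS= read -r q; do
#     sg-search "$q"
#   done
```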
"What changed recently in X?"
- Use type:diff search to find recent changes
- Use type:commit to search commit messages
- Focus on the relevant directory/service
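These three steps might translate into a pair of queries like the ones produced below. `change_queries` is a hypothetical helper. Scoping the diff search with file: is an assumption about what is useful here, and the commit search is left unscoped because commit messages are not tied to a single path.

```shell
# Print a diff query scoped to a directory and a commit-message query.
# Usage: change_queries <repo> <directory> <term>
change_queries() {
  local repo="$1" dir="$2" term="$3"
  printf 'type:diff %s repo:%s file:%s count:10\n' "$term" "$repo" "$dir"
  printf 'type:commit %s repo:%s count:10\n' "$term" "$repo"
}

# Hypothetical usage:
#   change_queries databricks-eng/universe model-serving/ timeout
```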
"How do services X and Y interact?"
- Find proto definitions for both services
- Search for client/stub usage of one service within the other
- Look for shared proto messages or common dependencies
- Check BUILD.bazel deps to understand the dependency graph
Tips
- Proto files (*.proto) are the best starting point for understanding any API: they define the contract.
- Feature flags in feature-flag/ are Jsonnet files. Search there to understand what's gated.
- If a search returns too many results, add -file:test -file:mock -file:fake to exclude test infrastructure.
- Services in universe typically follow a pattern: proto/ defines the API, a top-level directory contains the implementation, and BUILD.bazel files show dependencies.
- When tracing Scala code, look for "extends ConsoleLogging" and "with" clauses to understand mixins.
- For Spark/runtime questions, start in the runtime repo. For everything else, start in universe.
- Always include count:N in searches to control result volume. Start with count:10, increase if needed.
- Use sg-read with line ranges when files are large: reading 100 lines at a time keeps context manageable.