---
name: databricks-python-imports
description: Patterns for sharing code between Databricks notebooks using pure Python files and standard imports. Covers Asset Bundle path setup (rsplit canonical pattern), notebook-to-module conversion, import patterns vs %run magic commands, job submission context vs notebook context, MLflow model packaging path requirements, and troubleshooting ModuleNotFoundError. Use when creating shared modules, deploying jobs that import local code, or packaging MLflow models with local dependencies.
metadata:
  author: prashanth subrahmanyam
  version: "1.1"
  domain: infrastructure
  role: shared
  used_by_stages: [1, 2, 3, 4, 5, 6, 7, 8, 9]
  last_verified: "2026-06-02"
  volatility: medium
  clients: [ide_cli, genie_code] # deploy via databricks-asset-bundles (the spine); Genie detail via genie-code-environment
  deploy_verb: "bundle deploy --target dev"
  deploy_note: "import/path patterns are client-agnostic; relative ../.. links are repo-relative skill cross-refs (RULE_6: portable, retained)"
  coverage: all_stages
  upstream_sources:
    - name: "ai-dev-kit"
      repo: "databricks-solutions/ai-dev-kit"
      paths:
        - "databricks-skills/databricks-config/SKILL.md"
      relationship: "reference"
      last_synced: "2026-02-19"
      sync_commit: "97a3637"
---
Databricks Python Imports and Code Sharing
CRITICAL: Canonical Path Resolution
The ONLY acceptable pattern for computing the Asset Bundle root is rsplit:
```python
_bundle_root = "/Workspace" + str(_notebook_path).rsplit('/src/', 1)[0]
```
NEVER use project-specific .replace() calls. This is the #1 failure mode observed in production.
| Pattern | Verdict |
|---|---|
| `rsplit('/src/', 1)[0]` | CORRECT -- works for any project |
| `.replace("/src/my_project", "")` | WRONG -- breaks when project name changes |
| `.replace("/src/booking_app_semantic", "")` | WRONG -- hardcoded to one project |
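You can see the difference by running both patterns against notebook paths from two different projects. This is a minimal sketch under a few assumptions. `notebookPath()` is assumed to return a workspace path without the `/Workspace` prefix, which is why the canonical pattern prepends it. The example paths are hypothetical. The `replace` variant is one plausible form of the anti-pattern: it strips a hardcoded folder from the notebook's directory. The exact form seen in production is not shown here.

```python
import posixpath

# Hypothetical notebookPath() values (no /Workspace prefix) from two projects.
MY_PROJECT_NB = "/Users/dev@example.com/.bundle/my_project/dev/files/src/my_project/ingest"
OTHER_APP_NB = "/Users/dev@example.com/.bundle/other_app/dev/files/src/other_app/ml/train"


def bundle_root_rsplit(notebook_path: str) -> str:
    """Canonical pattern: cut at the last '/src/' and prepend /Workspace."""
    return "/Workspace" + str(notebook_path).rsplit("/src/", 1)[0]


def bundle_root_replace(notebook_path: str) -> str:
    """Anti-pattern (illustrative): strips one hardcoded project folder."""
    folder = posixpath.dirname(str(notebook_path))
    return "/Workspace" + folder.replace("/src/my_project", "")


for nb in (MY_PROJECT_NB, OTHER_APP_NB):
    # Both agree for my_project; only rsplit is right for other_app.
    print("rsplit :", bundle_root_rsplit(nb))
    print("replace:", bundle_root_replace(nb))
```

The `replace` version only works by coincidence for the one project whose name it hardcodes. For any other project it leaves the path untouched, and the resulting bundle root is wrong.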
Shared Helper: _notebook_paths.py (PREFERRED)
New scripts must use the shared helper — do not re-implement bundle-root resolution inline.
- Source of truth: `assets/_notebook_paths.py`
- Exports: `resolve_bundle_root`, `ensure_bundle_root_on_path`, `bundle_path`, `fail_loud`
- Scaffold placement: copy to `src/common/_notebook_paths.py` in the generated project so every notebook can import it with `from src.common._notebook_paths import ...`.
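The real `assets/_notebook_paths.py` is not reproduced here. The sketch below is a hypothetical illustration of how its four exported names could fit together. Every signature and body is an assumption. One assumption in particular: outside an explicit `notebook_path` argument, `dbutils` is looked up as a global of the `__main__` module, which is where the notebook namespace is expected to live.

```python
"""Hypothetical sketch of src/common/_notebook_paths.py.

Only the four exported names come from assets/_notebook_paths.py; every
signature and body below is an assumption for illustration.
"""
import os
import sys
from typing import NoReturn, Optional


def fail_loud(message: str) -> NoReturn:
    """Raise so the job run is marked FAILED (never report failure via sys.exit(0))."""
    raise RuntimeError(message)


def _notebook_path_from_context() -> Optional[str]:
    """Best-effort notebook path lookup; returns None outside Databricks.

    Assumption: `dbutils` is visible as a global of the __main__ module.
    """
    try:
        import __main__

        dbutils = getattr(__main__, "dbutils")
        ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
        return str(ctx.notebookPath().get())
    except Exception:
        return None


def resolve_bundle_root(notebook_path: Optional[str] = None) -> Optional[str]:
    """Apply the canonical rsplit pattern; None if no '/src/' path is available."""
    path = notebook_path if notebook_path is not None else _notebook_path_from_context()
    if path is None or "/src/" not in path:
        return None
    return "/Workspace" + path.rsplit("/src/", 1)[0]


def ensure_bundle_root_on_path(
    notebook_path: Optional[str] = None, verbose: bool = False
) -> Optional[str]:
    """Prepend the bundle root to sys.path once and return it (None if unresolved)."""
    root = resolve_bundle_root(notebook_path)
    if root is None:
        if verbose:
            print("Bundle root not resolved (local execution?)")
        return None
    if root not in sys.path:
        sys.path.insert(0, root)
    if verbose:
        print(f"Bundle root on sys.path: {root}")
    return root


def bundle_path(*parts: str, notebook_path: Optional[str] = None) -> str:
    """Absolute path to a file inside the deployed bundle; fails loudly if unresolvable."""
    root = resolve_bundle_root(notebook_path)
    if root is None:
        fail_loud("Cannot resolve bundle root: notebook path has no '/src/' segment")
    return os.path.join(root, *parts)
```

In this sketch, `bundle_path` raises rather than returning a guessed path. That keeps asset lookups consistent with the fail-loud rule.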
Canonical notebook header (new scripts)
```python
# Databricks notebook source
import sys, os
from pathlib import Path

# Minimal bootstrap: prepend bundle root to sys.path so we can import the helper.
try:
    _nb = (
        dbutils.notebook.entry_point.getDbutils()
        .notebook().getContext().notebookPath().get()
    )
    _root = "/Workspace" + str(_nb).rsplit("/src/", 1)[0]
    if _root not in sys.path:
        sys.path.insert(0, _root)
except Exception:
    pass  # Local execution

from src.common._notebook_paths import ensure_bundle_root_on_path, bundle_path, fail_loud

BUNDLE_ROOT = ensure_bundle_root_on_path(verbose=True)
# ... the rest of your notebook follows ...
```
Fail-loud rule (CRITICAL)
Never use sys.exit(0) to report a failure. Exit code 0 is success — Databricks Jobs will mark the run as green and silently propagate broken deploys to downstream tasks.
| Pattern | Verdict |
|---|---|
| `raise RuntimeError("…")` / `fail_loud("…")` | CORRECT -- job marked FAILED, traceback visible |
| `sys.exit(0)` after printing an error | WRONG -- run shows green, failure hidden |
| `sys.exit(1)` | Acceptable for top-level CLI; prefer `raise` in notebooks |
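Here is a small sketch of the correct pattern in a post-load validation step. `validate_row_counts` is a hypothetical check. `fail_loud` is defined locally as a stand-in for the shared helper, assuming it simply raises.

```python
from typing import Dict, NoReturn


def fail_loud(message: str) -> NoReturn:
    """Stand-in for the shared helper: raising marks the job task FAILED."""
    raise RuntimeError(message)


def validate_row_counts(row_counts: Dict[str, int]) -> None:
    """Hypothetical post-load check: every target table must have rows."""
    empty = sorted(name for name, count in row_counts.items() if count == 0)
    if empty:
        # WRONG: print("Empty tables!"); sys.exit(0)  -> exit code 0, run shows green
        fail_loud(f"Tables loaded with zero rows: {', '.join(empty)}")


validate_row_counts({"silver.orders": 1200, "gold.daily_revenue": 31})  # passes silently
```

Because the failure is an exception, the traceback shows up in the run output. Downstream tasks that depend on this one are not started as if it had succeeded.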
__file__ is undefined in notebook cells
Do not use __file__ to locate asset files inside a Databricks notebook — it raises NameError once the code runs in the notebook context. Always use bundle_path("src", "semantic", "metric_views", "revenue.yaml") (or the raw rsplit pattern) instead.
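Where the shared helper is not available, the raw rsplit pattern resolves the same asset path. This is a minimal sketch. `asset_path` is a hypothetical name, and the notebook path is made up.

```python
import os


def asset_path(notebook_path: str, *parts: str) -> str:
    """Locate a bundled asset from the notebook path instead of __file__."""
    bundle_root = "/Workspace" + str(notebook_path).rsplit("/src/", 1)[0]
    return os.path.join(bundle_root, *parts)


# In a notebook, notebook_path would come from
# dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
nb = "/Users/dev@example.com/.bundle/proj/dev/files/src/semantic/build_views"
print(asset_path(nb, "src", "semantic", "metric_views", "revenue.yaml"))
```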
When NOT to Use This Skill
Skip this skill entirely when:
- Single-notebook scenarios with no cross-notebook imports
- Notebooks that don't share code with other notebooks or pure Python modules
- Pure SQL notebooks or `%run`-only workflows without `restartPython()`
Core Principle: Pure Python Files for Importable Code
Key Rule: To share code between Databricks notebooks using standard Python imports, the shared code must be a pure Python file (.py), not a Databricks notebook.
Reference: Share code between Databricks notebooks
⚠️ CRITICAL: Asset Bundle Path Setup
When deploying notebooks via Databricks Asset Bundles, you MUST add a sys.path setup block to enable imports from other folders. Without this, you'll get ModuleNotFoundError: No module named 'src'.
Required Path Setup Pattern
Add this block immediately after # Databricks notebook source:
```python
# Databricks notebook source
# ===========================================================================
# PATH SETUP FOR ASSET BUNDLE IMPORTS
# ===========================================================================
# This enables imports from src.ml.config and src.ml.utils when deployed
# via Databricks Asset Bundles. The bundle root is computed dynamically.
# Reference: https://docs.databricks.com/aws/en/notebooks/share-code
import sys
import os

try:
    # Get current notebook path and compute bundle root
    _notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
    _bundle_root = "/Workspace" + str(_notebook_path).rsplit('/src/', 1)[0]
    if _bundle_root not in sys.path:
        sys.path.insert(0, _bundle_root)
        print(f"✓ Added bundle root to sys.path: {_bundle_root}")
except Exception as e:
    print(f"⚠ Path setup skipped (local execution): {e}")
# ===========================================================================

"""
Your notebook docstring here...
"""

# COMMAND ----------

# Now imports work!
from src.ml.config.feature_registry import FeatureRegistry
from src.ml.utils.training_base import setup_training_environment
```
Why This Is Needed
- Asset Bundles deploy source files to the bundle's workspace `files/` folder (by default `/Workspace/Users/<user>/.bundle/<bundle-name>/<target>/files/`)
- The Python path doesn't include the bundle root by default
- This setup dynamically computes the bundle root from the notebook path
Script to Add Path Setup
Use scripts/add_path_setup_to_notebooks.py to batch-add this setup to all notebooks:
```bash
python3 scripts/add_path_setup_to_notebooks.py
```
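The contents of `scripts/add_path_setup_to_notebooks.py` are not shown in this skill. The following is a hypothetical sketch of what such a script could do, not the actual script. It inserts a simplified version of the path-setup block right after the notebook header and skips pure Python files. It also detects a marker comment, so it is idempotent.

```python
"""Hypothetical sketch of scripts/add_path_setup_to_notebooks.py (not the real script)."""
from pathlib import Path

HEADER = "# Databricks notebook source"
MARKER = "# PATH SETUP FOR ASSET BUNDLE IMPORTS"
# Simplified version of the path-setup block shown above.
PATH_SETUP_BLOCK = MARKER + """
import sys
try:
    _notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
    _bundle_root = "/Workspace" + str(_notebook_path).rsplit("/src/", 1)[0]
    if _bundle_root not in sys.path:
        sys.path.insert(0, _bundle_root)
except Exception as e:
    print(f"Path setup skipped (local execution): {e}")
"""


def add_path_setup(source: str) -> str:
    """Return source with the path-setup block inserted after the notebook header."""
    lines = source.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != HEADER:
        return source  # pure Python module: already importable, leave untouched
    if MARKER in source:
        return source  # already patched; keeps the script idempotent
    header = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
    return header + PATH_SETUP_BLOCK + "".join(lines[1:])


def main(root: str = "src") -> None:
    """Patch every notebook-format .py file under root in place."""
    for path in sorted(Path(root).rglob("*.py")):
        original = path.read_text(encoding="utf-8")
        updated = add_path_setup(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            print(f"Added path setup: {path}")


if __name__ == "__main__":
    main()
```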
File Type Identification
Pure Python File (✅ Importable)
```python
"""
Module documentation
This file can be imported using standard Python imports.
"""
from databricks.sdk import WorkspaceClient
import pyspark.sql.types as T

def get_configuration():
    """Shared function"""
    return {...}
```
Characteristics:
- ✅ No special Databricks headers
- ✅ Standard Python module structure
- ✅ Can be imported with `from module import function`
- ✅ Works after `dbutils.library.restartPython()`
Databricks Notebook (❌ Not Importable)
```python
# Databricks notebook source
"""
Module documentation
This file CANNOT be imported using standard Python imports.
"""
from databricks.sdk import WorkspaceClient
import pyspark.sql.types as T

def get_configuration():
    """Shared function"""
    return {...}
```
Characteristics:
- ❌ Has the `# Databricks notebook source` header
- ❌ Cannot be imported with a standard `import`, including after `restartPython()`
- ❌ Must use the `%run` magic command (its definitions don't persist after a restart)
- ✅ Can be executed as a job/task
Pattern Recognition
When You See Import Errors After restartPython()
```python
# Databricks notebook source
# Notebook with restartPython()
# MAGIC %pip install --upgrade "databricks-sdk>=0.28.0" --quiet

# COMMAND ----------

dbutils.library.restartPython()

# COMMAND ----------

from monitor_configs import get_all_monitor_configs  # ❌ ModuleNotFoundError
# This fails if monitor_configs.py is a Databricks notebook!
```
Checklist:
- ✅ Check whether the module file has a `# Databricks notebook source` header
- ✅ If present, remove it to convert the file to a pure Python file
- ✅ Test the import; it should work with a standard Python import
- ❌ Don't create complex workarounds (code duplication, extra `sys.path` manipulation beyond the bundle-root setup)
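You can automate the first checklist step before deploying. This is a minimal sketch with hypothetical helper names. It scans a source tree for `.py` files whose first line is the notebook header, because those files will not be importable once deployed.

```python
from pathlib import Path
from typing import List

NOTEBOOK_HEADER = "# Databricks notebook source"


def has_notebook_header(source: str) -> bool:
    """True if the first line is the notebook header, i.e. the file is NOT importable."""
    first_line = source.lstrip("\ufeff").split("\n", 1)[0]
    return first_line.strip() == NOTEBOOK_HEADER


def find_notebook_modules(root: str) -> List[str]:
    """Return .py files under root that still carry the notebook header."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        if has_notebook_header(path.read_text(encoding="utf-8")):
            hits.append(str(path))
    return hits


# Example: find_notebook_modules("src") lists candidates for conversion.
```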
Conversion Pattern
To convert a Databricks notebook to an importable pure Python file, remove the # Databricks notebook source header line. That is the only change required. See references/import-patterns-and-examples.md for a detailed before/after walkthrough.
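The conversion is mechanical enough to script. The sketch below is a hypothetical helper, not a tool shipped with this skill. It drops the header line and leaves everything else byte-for-byte. Cell separators (`# COMMAND ----------`) and `# MAGIC` lines become ordinary comments, so any magic commands inside them stop running.

```python
from pathlib import Path

NOTEBOOK_HEADER = "# Databricks notebook source"


def notebook_to_module(source: str) -> str:
    """Drop the notebook header line; the rest of the file is left unchanged.

    '# COMMAND ----------' separators and '# MAGIC' lines remain as ordinary
    comments, so any magic commands in them will no longer run.
    """
    lines = source.splitlines(keepends=True)
    if lines and lines[0].strip() == NOTEBOOK_HEADER:
        return "".join(lines[1:])
    return source  # already a pure Python file


def convert_file(path: str) -> bool:
    """Rewrite one file in place; returns True if it was changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    converted = notebook_to_module(original)
    if converted != original:
        p.write_text(converted, encoding="utf-8")
    return converted != original
```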
Import Patterns
✅ CORRECT: Standard Python Import
```python
# Databricks notebook source
# notebook.py (Databricks notebook)
# MAGIC %pip install --upgrade "databricks-sdk>=0.28.0" --quiet

# COMMAND ----------

dbutils.library.restartPython()

# COMMAND ----------

# ✅ Works if config_module.py is a pure Python file
from config_module import get_configuration
from databricks.sdk import WorkspaceClient
...

def main():
    config = get_configuration()  # ✅ Available
    ...
```
Requirements:
- `config_module.py` must be a pure Python file (no notebook header)
- Place the import after the `restartPython()` block
- Use standard Python import syntax
❌ WRONG: Complex Workarounds
```python
# ❌ DON'T: Use %run (doesn't work after restartPython() in Asset Bundles)
%run ./config_module

# ❌ DON'T: Manipulate sys.path with hardcoded paths
import sys
sys.path.insert(0, "/some/path")

# ❌ DON'T: Duplicate code
def get_configuration():  # Duplicated from another file
    return {...}

# ❌ DON'T: Use exec() or eval()
exec(open("config_module.py").read())
```
Why These Fail:
- `%run` doesn't persist after `restartPython()` in deployed .py files
- `sys.path` manipulation doesn't help if the file is a notebook
- Code duplication creates a maintenance burden
- `exec()` is a security risk and hard to debug
Job Submission Context vs Notebook Context
When code runs as a job submission (not interactive notebook), key differences apply:
- CWD is `/`, not the notebook directory -- relative file paths fail
- MLflow `log_model()` copies code to a temp directory -- relative file references (e.g., `model_config="agent-config.yaml"`) will fail unless the file is co-located or referenced with an absolute path
- `mlflow[databricks]` is required on Azure (not just `mlflow`) -- without it, `mlflow.register_model()` raises `ModuleNotFoundError: azure.core`
See references/job-context-guide.md for full patterns and examples.
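One way to avoid CWD-dependent paths is to anchor config files at the bundle root and check that they exist before logging the model. This is a hedged sketch. `absolute_config_path` is a hypothetical helper, and the agent config location is made up. The `mlflow.pyfunc.log_model(..., model_config=...)` call appears only in a comment and is assumed from the MLflow `model_config` usage mentioned above.

```python
import os


def absolute_config_path(bundle_root: str, *parts: str, must_exist: bool = True) -> str:
    """Build an absolute config path; raise early if the file is missing.

    A bare name like "agent-config.yaml" resolves against os.getcwd(), which is
    "/" in job submissions, so anchor it at the bundle root instead.
    """
    path = os.path.abspath(os.path.join(bundle_root, *parts))
    if must_exist and not os.path.isfile(path):
        raise FileNotFoundError(f"Config file not found: {path}")
    return path


# Usage inside a job (BUNDLE_ROOT from the path-setup block; paths are hypothetical):
# config_path = absolute_config_path(BUNDLE_ROOT, "src", "agent", "agent-config.yaml")
# mlflow.pyfunc.log_model(..., model_config=config_path)
```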
Use Cases and Examples
Common patterns: shared configuration modules (monitor configs, DQ rules), utility functions used across layers (Bronze/Silver/Gold), and helper functions (surrogate keys, transformations). See references/import-patterns-and-examples.md for detailed code examples and a decision table for when to use pure Python files vs notebooks vs %run.
Common Mistakes and Troubleshooting
See references/troubleshooting.md for diagnosis steps and fixes for:
- `ModuleNotFoundError` after `restartPython()` (notebook header issue)
- `NameError` after `%run` and `restartPython()` (use standard import)
- `ModuleNotFoundError: azure.core` on Azure (use `mlflow[databricks]`)
- `FileNotFoundError` for `model_config` YAML in job context (use absolute paths)
- Project-specific `.replace()` anti-pattern (use `rsplit`)
Validation Checklist
When creating shared code:
- File is pure Python (no `# Databricks notebook source` header)
- Has a proper docstring explaining its purpose
- Functions are well-documented
- Can be imported with standard `import` or `from ... import ...`
- Works after `restartPython()` if needed
- Used in at least 2 notebooks (if not, consider inlining)
When importing shared code:
- Import statement comes after the `restartPython()` block
- Uses standard Python import (not `%run`)
- Source file is a pure Python file
- No `sys.path` manipulation beyond the bundle-root setup
- No code duplication
When using Asset Bundle path setup:
- Uses `rsplit('/src/', 1)[0]` -- NEVER `.replace()` with project-specific strings
- Path setup block is immediately after `# Databricks notebook source`
- On Azure with MLflow: using `mlflow[databricks]`, not just `mlflow`
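Several items in the checklists above can be checked mechanically before `bundle deploy`. The following is a deliberately naive, line-based sketch. `lint_source` and its rule set are hypothetical, and comments that mention `__file__` would also be flagged.

```python
import re
from typing import List

NOTEBOOK_HEADER = "# Databricks notebook source"
REPLACE_HACK = re.compile(r"""\.replace\(\s*["']/src/""")
SYS_EXIT_ZERO = re.compile(r"\bsys\.exit\(\s*0\s*\)")


def lint_source(source: str) -> List[str]:
    """Return 'line: message' findings for a few checklist rules."""
    is_notebook = source.lstrip("\ufeff").startswith(NOTEBOOK_HEADER)
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if REPLACE_HACK.search(line):
            findings.append(f"{lineno}: project-specific .replace() on '/src/...'; use rsplit('/src/', 1)[0]")
        if SYS_EXIT_ZERO.search(line):
            findings.append(f"{lineno}: sys.exit(0) hides failures; raise or call fail_loud()")
        if is_notebook and "__file__" in line:
            # Pure Python modules do have __file__; only notebooks are flagged.
            findings.append(f"{lineno}: __file__ is undefined in notebook cells; use bundle_path()")
    return findings
```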
References
- Share code between Databricks notebooks - Official documentation
- Work with Python and R modules
- dbutils.library.restartPython()
Related Patterns
- Databricks Asset Bundles Configuration - Deployment patterns
- Lakehouse Monitoring Patterns - Monitor configuration sharing
- DLT Expectations Patterns - DQ rules sharing
Last Updated: April 16, 2026
Pattern Origin: Production issue resolution - update_monitors job
Key Lesson: Always use rsplit('/src/', 1)[0] for path resolution; always check if shared code is pure Python file vs. Databricks notebook