name: kedro-to-zenml-migration
description: >-
Migrate Kedro pipelines and projects to idiomatic ZenML pipelines. Handles
concept mapping (node->step, Pipeline->pipeline, Data Catalog->explicit
boundary steps plus artifacts, params:->typed parameters), catalog analysis,
code translation, hooks/runners/deployment mapping, and flags unsupported
patterns (transcoding, dataset lifecycle hooks, namespace remapping,
SharedMemoryDataset, slicing semantics) for human review. Use this skill
whenever the user mentions Kedro migration, converting a Kedro project to
ZenML, porting Kedro pipelines, replacing Kedro orchestration or deployment
plugins with ZenML, or asks how a Kedro concept maps to ZenML -- even if they
do not explicitly say "migrate". Also use when the user pastes catalog.yml,
parameters.yml, pipeline_registry.py, node code, hook code, or describes
a workflow using Kedro terminology such as node, pipeline, Data Catalog,
params:, namespace, modular pipeline, runner, MemoryDataset, or
transcoding in a ZenML context. If the user just asks a quick conceptual
question ("what is the ZenML equivalent of MemoryDataset?"), answer it
directly from the concept map -- no need to run the full migration workflow.
Migrate Kedro to ZenML
This skill translates Kedro projects into idiomatic ZenML pipelines. It handles the full migration workflow: analyze the Kedro project, classify each pattern as direct / approximate / absent, translate what maps cleanly, flag what needs redesign, and produce a working ZenML project plus a clear migration report.
How migration works at a high level
Kedro and ZenML are both Python-first workflow systems, so the business logic inside node functions often survives migration surprisingly well. The hard part is not the node code. The hard part is the contract around the node code.
Kedro uses the Data Catalog as the central registry of datasets, storage details, credentials, versioning, and sometimes representation tricks such as transcoding. ZenML does not have a DataCatalog equivalent. ZenML treats internal handoffs as typed artifacts and expects external reads and writes to be made explicit in step code, materializers, stack settings, and secrets.
So this migration is not a rename exercise. It is mostly a careful rewrite of:
- Boundary handling -- catalog-driven reads and writes become explicit loader/exporter steps
- Data flow -- string dataset names become typed artifact edges
- Configuration and auth -- `parameters.yml` and `credentials.yml` move into ZenML parameters, YAML config, secrets, and stack settings
- Operational semantics -- runners, slicing, hooks, namespaces, and deployment plugins must be checked for behavior drift
The three mapping types
Every Kedro concept falls into one of these categories:
| Type | Meaning | Action |
|---|---|---|
| Direct | Clean 1:1 mapping exists | Translate automatically |
| Approximate | Similar intent exists but semantics differ | Translate with caveats noted in the migration report |
| Absent | No ZenML equivalent exists | Flag for human review with redesign suggestions |
See references/concept-map.md for the full tables.
The Migration Workflow
Phase 1: Receive and Analyze the Kedro Project
Ask the user for the Kedro project files. Read them before writing any code. Prefer this order because it reveals the real contract of the system:
1. `conf/base/catalog.yml` and any environment-specific catalog files
2. `conf/base/parameters.yml` and related parameter files
3. `conf/local/credentials.yml` references if present
4. node files (`nodes.py` and related modules)
5. pipeline files (`pipeline.py`, modular pipeline factories)
6. `pipeline_registry.py`
7. `settings.py`
8. custom dataset classes
9. hook implementations
10. runner or deployment usage (`kedro run`, Airflow/Kubeflow/Vertex/Docker plugins, CI entrypoints)
For each project, inventory the following before deciding how to migrate it:
- Catalog structure -- Which datasets are external inputs, external outputs, intermediates, free runtime datasets, versioned datasets, credential-bound datasets, transcoded aliases, partitioned datasets, incremental datasets, or custom datasets?
- Node graph -- Which nodes are pure transforms and which nodes are really IO boundaries hidden behind catalog entries?
- Parameters -- Which inputs come from `params:` references? Are they simple scalars, nested config objects, or runtime overrides?
- Namespaces and modular pipelines -- Is reuse relying on automatic remapping or namespace-prefixed dataset names?
- Hooks -- Any `before_node_run`, `after_node_run`, `on_node_error`, dataset lifecycle hooks, command hooks, or catalog hooks?
- Execution habits -- Does the team rely on `--from-nodes`, `--to-nodes`, `--tags`, `--from-inputs`, `--to-outputs`, or `--only-missing-outputs`?
- Runners -- `SequentialRunner`, `ParallelRunner`, `ThreadRunner`, `SharedMemoryDataset`, or other memory-sensitive behavior?
- Deployment intent -- Is Kedro being exported to Airflow, Kubeflow, Vertex AI, Docker, or paired with `kedro-mlflow` / Kedro-Viz?
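The catalog part of this inventory can be sketched in plain Python. The function below is a minimal illustration, not part of Kedro or ZenML: `inventory_catalog`, the bucket names, and the sample catalog are all hypothetical, and it assumes the catalog has already been parsed into a dictionary and that node inputs and outputs have been collected as sets of dataset names.

```python
from typing import Dict, List, Set


def _base(name: str) -> str:
    """Strip a transcoding suffix: 'features@pandas' -> 'features'."""
    return name.split("@", 1)[0]


def inventory_catalog(
    catalog: Dict[str, dict],
    node_inputs: Set[str],
    node_outputs: Set[str],
) -> Dict[str, List[str]]:
    """Bucket dataset names by the role they play in the Kedro graph."""
    buckets: Dict[str, List[str]] = {
        key: []
        for key in (
            "external_input", "external_output", "intermediate",
            "free_memory", "versioned", "credential_bound", "transcoded",
        )
    }
    # Parameters are inventoried separately, so skip `params:` references.
    consumed = {_base(n) for n in node_inputs if not n.startswith("params:")}
    produced = {_base(n) for n in node_outputs}
    declared = {_base(n) for n in catalog}

    for name in sorted(consumed | produced):
        if name not in produced:
            buckets["external_input"].append(name)
        elif name not in consumed:
            buckets["external_output"].append(name)
        else:
            buckets["intermediate"].append(name)
        if name not in declared:
            # Not in catalog.yml, so Kedro would keep it in memory by default.
            buckets["free_memory"].append(name)

    for name, entry in sorted(catalog.items()):
        if "@" in name:
            buckets["transcoded"].append(name)
        if entry.get("versioned") is True:
            buckets["versioned"].append(name)
        if "credentials" in entry:
            buckets["credential_bound"].append(name)
    return buckets


# Hypothetical sample project used only for illustration.
catalog = {
    "raw_orders": {"type": "pandas.CSVDataset", "credentials": "s3_creds"},
    "features@pandas": {"type": "pandas.ParquetDataset"},
    "features@spark": {"type": "spark.SparkDataset"},
    "report": {"type": "pandas.ExcelDataset", "versioned": True},
}
node_inputs = {"raw_orders", "params:threshold", "clean_orders", "features@spark"}
node_outputs = {"clean_orders", "features@pandas", "report"}
inventory = inventory_catalog(catalog, node_inputs, node_outputs)
```

Datasets that land in `transcoded`, `credential_bound`, or `free_memory` are the ones most likely to need a note in the migration report.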
Phase 2: Classify and Plan
After Phase 1, classify everything as direct / approximate / absent using references/concept-map.md and references/gaps-and-flags.md.
Quick classification guide
Direct or nearly direct translations (usually safe to generate automatically):
- Plain node function -> `@step`
- Multiple inputs / outputs -> step args + tuple returns with `Annotated[...]`
- `Pipeline` -> `@pipeline`
- `parameters.yml` + `params:` -> typed pipeline / step parameters
- Registry-organized project -> normal Python module layout
Approximate translations (generate, but explain what changed):
- `pandas.CSVDataset`, `ParquetDataset`, `JSONDataset`, `ExcelDataset` -> explicit loader/exporter boundary steps
- `versioned: true` -> ZenML artifact versioning, explicit external version handling, or both
- custom datasets -> loader/exporter logic and sometimes custom materializers
- `ParallelRunner` -> orchestrator-managed parallelism with isolated steps
- `kedro-mlflow` -> ZenML experiment tracker
- `kedro-docker` -> `DockerSettings` and stack-driven containerization
- deployment plugins -> ZenML orchestrators and stack components
Absent / redesign required (must be flagged):
- `DataCatalog` as a global dataset registry
- transcoding (`dataset@pandas`, `dataset@spark`)
- automatic namespace/remapping semantics
- `SharedMemoryDataset` / `SharedMemoryDataCatalog`
- dataset lifecycle hooks
- slicing semantics (`--from-nodes`, `--to-nodes`, `--only-missing-outputs`, etc.)
- dataset factories and catch-all pattern resolution
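One way to keep the classification consistent across a project is a small lookup table. The sketch below is illustrative only: `MAPPING_TYPE`, `build_plan`, and the pattern labels are hypothetical names, the table covers just a subset of the guide above, and the `needs_review` bucket for unknown patterns is an added assumption so that nothing unrecognized is treated as safe.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical lookup distilled from the quick classification guide.
MAPPING_TYPE: Dict[str, str] = {
    "node": "direct",
    "Pipeline": "direct",
    "params:": "direct",
    "pandas.CSVDataset": "approximate",
    "versioned": "approximate",
    "ParallelRunner": "approximate",
    "kedro-mlflow": "approximate",
    "kedro-docker": "approximate",
    "DataCatalog": "absent",
    "transcoding": "absent",
    "SharedMemoryDataset": "absent",
    "dataset lifecycle hooks": "absent",
    "slicing": "absent",
    "dataset factories": "absent",
}


def build_plan(patterns: Iterable[str]) -> Dict[str, List[str]]:
    """Group the patterns found in a project by mapping type."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for pattern in patterns:
        # Unknown patterns are never assumed safe; route them to a human.
        plan[MAPPING_TYPE.get(pattern, "needs_review")].append(pattern)
    return dict(plan)


plan = build_plan(["node", "transcoding", "pandas.CSVDataset", "custom_thing"])
```

The resulting groups map directly onto the three lists in the migration plan summary that is presented to the user.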
Present the migration plan before generating code
Before writing code, summarize your findings for the user:
"Here's what I found in your Kedro project:
- Direct translations (will migrate cleanly): [list]
- Approximate translations (will work but with caveats): [list]
- Needs redesign (cannot be auto-migrated safely): [list]
The main migration theme is: [for example, 'catalog-driven IO becomes explicit boundary steps']. Shall I proceed with the migration?"
If there are HIGH-severity flags, explain each one concretely:
- what the Kedro project currently does
- why ZenML cannot preserve that behavior automatically
- what the safest redesign approach looks like
Phase 3: Generate ZenML Code
Translate the Kedro project into a ZenML project. Follow these conventions strictly.
Project structure
Every migrated project MUST use this layout:
```
migrated_pipeline/
├── steps/                  # One file per step
│   ├── load_customers.py
│   ├── transform_features.py
│   └── export_report.py
├── pipelines/
│   └── my_pipeline.py
├── materializers/          # Only when truly needed
├── configs/
│   ├── dev.yaml
│   └── prod.yaml
├── run.py                  # argparse, not click
├── README.md
└── pyproject.toml
```
Key rules:
- One step per file in `steps/`
- Keep the pipeline definition separate from execution
- `run.py` uses `argparse`
- `pyproject.toml` should use `requires-python = ">=3.12"` and `zenml>=0.94.1`
- Always generate `configs/dev.yaml` and `configs/prod.yaml`
- Always generate a `README.md`
- Run `zenml init` at the project root
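A `run.py` that follows these rules might look roughly like the sketch below. It assumes the placeholder module `pipelines/my_pipeline.py` from the layout exposes a pipeline named `my_pipeline`; the `--config` and `--no-cache` flags are illustrative choices, not a required interface.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the migrated ZenML pipeline.")
    parser.add_argument(
        "--config",
        choices=["dev", "prod"],
        default="dev",
        help="Which file under configs/ to load.",
    )
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="Disable step caching for this run.",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    config_path = Path("configs") / f"{args.config}.yaml"

    # Imported here so that `--help` works without importing ZenML.
    # `my_pipeline` is the placeholder name from the project layout.
    from pipelines.my_pipeline import my_pipeline

    my_pipeline.with_options(
        config_path=str(config_path),
        enable_cache=not args.no_cache,
    )()


# In the real run.py, call main() under the usual `if __name__ == "__main__":` guard.
```

Keeping the pipeline import inside `main` also keeps the pipeline definition separate from execution, which is the second rule in the list.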
Core translation rules
See references/code-patterns.md for the concrete side-by-side examples. Use these rules consistently:
Pure nodes become steps
- Move the node body into a `@step`
- Add explicit type hints to all inputs and outputs
- Use `Annotated[...]` when stable output names matter
Catalog-driven IO becomes boundary steps
- File reads, table reads, report exports, and service writes should be explicit
- Do not keep catalog-name indirection for internal edges
- Internal handoffs should usually be artifacts, not persisted files
`params:` becomes explicit typed parameters
- Convert `params:threshold` into a pipeline or step parameter
- Put defaults and environment-specific overrides in YAML config
- Do not recreate an implicit global parameter object if step signatures can stay explicit
Versioning must be decided, not assumed
- If Kedro used versioned files for reproducibility only, ZenML artifact versioning may be enough
- If downstream systems rely on concrete versioned paths, keep explicit exporter logic
- If both matter, implement both
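When downstream systems depend on concrete versioned paths, the exporter step needs to build them itself. The helper below is a sketch that assumes the `<filepath>/<version>/<filename>` layout and timestamp style used by Kedro versioned datasets; `versioned_export_path` is a hypothetical name, and the layout should be verified against the files the existing project actually writes.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def versioned_export_path(filepath: str, version: Optional[str] = None) -> Path:
    """Build `<filepath>/<version>/<filename>` for an explicit exporter step."""
    base = Path(filepath)
    if version is None:
        # Assumed timestamp style, e.g. 2024-01-31T09.15.00.000Z (UTC, milliseconds).
        now = datetime.now(timezone.utc)
        version = now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{now.microsecond // 1000:03d}Z"
    return base / version / base.name


pinned = versioned_export_path(
    "data/08_reporting/report.csv", "2024-01-31T09.15.00.000Z"
)
```

An exporter step could call this helper before writing, while ZenML artifact versioning continues to track the in-pipeline artifact independently.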
Hooks only map partially
- `@step(on_success=...)` and `@step(on_failure=...)` are only partial substitutes
- Dataset lifecycle hooks almost always need to be rebuilt at explicit boundaries
- Do not claim a missing before-hook exists when it does not
Namespaces and modular pipelines become explicit composition
- Reuse the same step graph via helper functions or wrapper pipelines
- Use explicit invocation IDs and parameters
- Do not silently mimic Kedro namespace behavior with simple string prefixes
Runners and deployment plugins become stack design
- Translate runner expectations into orchestrator choice, resource settings, and step operators
- Translate Airflow/Kubeflow/Vertex/Docker plugin intent into stack configuration
Comment style in generated code
Keep migration comments short and actionable:
- Use `# Migration note:` for brief inline caveats
- Use `# TODO(migration):` for required manual follow-up
Do not hide major semantic differences in comments alone. They must also appear in the migration report.
Handling approximate translations
When the migration is approximate, explain the difference right at the point of use:
```python
import pandas as pd
from zenml import step


@step
def load_orders(path: str) -> pd.DataFrame:
    # Migration note: Kedro previously loaded this dataset through catalog.yml
    # with credentials and versioning managed outside the node code. In ZenML
    # that boundary is explicit here and configured via parameters + secrets.
    return pd.read_csv(path)
```
Handling absent patterns
Never silently approximate patterns with no real ZenML equivalent. Instead:
- Add a clear `# TODO(migration):` comment in generated code
- Record it in `MIGRATION_REPORT.md`
- Offer a redesign approach
```python
# TODO(migration): UNSUPPORTED -- this Kedro project relied on transcoding
# (`features@pandas` and `features@spark`) for the same logical dataset.
# ZenML has no equivalent hidden representation switch. Pick one canonical
# artifact representation and make conversions explicit in dedicated steps.
```
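The redesign suggested in that comment can be illustrated with a deliberately small, library-free sketch. Here row dictionaries stand in for the canonical representation and a column dictionary stands in for the second one; in a real migration the conversion would sit in its own step between the pandas and Spark representations. `rows_to_columns` and both type aliases are hypothetical.

```python
from typing import Dict, List

Rows = List[Dict[str, float]]      # canonical representation (assumed)
Columns = Dict[str, List[float]]   # alternate representation (assumed)


def rows_to_columns(rows: Rows) -> Columns:
    """Explicit conversion that replaces the implicit `features@...` switch."""
    columns: Columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


converted = rows_to_columns([{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}])
```

Making the conversion a visible function (and later a visible step) means its cost and failure modes show up in the pipeline graph instead of hiding in catalog naming.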
Phase 4: Produce the Migration Report
After generating the ZenML project, produce a MIGRATION_REPORT.md in the project root:
```markdown
# Migration Report: [Kedro Project] -> [ZenML Pipeline]

## Summary
- **Source**: Kedro project `[project_name]`
- **Target**: ZenML pipeline `[pipeline_name]`
- **Nodes migrated**: X direct, Y approximate, Z flagged
- **Catalog entries analyzed**: N

## Direct Translations
| Kedro Concept | ZenML Equivalent | Notes |
|---|---|---|
| node(clean_orders) | steps/clean_orders.py | Pure transform |

## Approximate Translations
| Kedro Pattern | ZenML Equivalent | What Changed |
|---|---|---|
| pandas.CSVDataset | explicit loader/exporter steps | IO boundary is now explicit |
| versioned: true | artifact versioning + optional external export versioning | File-path semantics may differ |

## Flagged for Review
| Kedro Pattern | Severity | Issue | Suggested Redesign |
|---|---|---|---|
| transcoding | HIGH | No hidden representation switching in ZenML | Use explicit conversion steps |
| before_dataset_loaded hook | HIGH | No dataset lifecycle hook equivalent | Move logic into loader step |

## Catalog Translation Summary
| Catalog Entry | Original Type | Migration Target | Notes |
|---|---|---|---|
| raw_orders | pandas.CSVDataset | load_orders step | external input |
| model | pickle.PickleDataset | artifact + exporter step | check path/version semantics |

## Configuration Migration Summary
- `parameters.yml` keys moved to: [pipeline config path(s)]
- Runtime overrides previously passed through: [Kedro mechanism]
- New ZenML config entrypoints: [dev/prod yaml files]

## Credential / Auth Migration Summary
- `credentials.yml` entries moved to ZenML secrets / env vars / service connectors
- Any manual setup still required: [list]

## Namespace / Composition Summary
- Which modular pipelines were preserved
- Which namespaces/remappings required explicit wrapper logic

## Runner / Deployment Migration Summary
- Original runner(s): [SequentialRunner / ParallelRunner / ThreadRunner]
- Original plugins: [kedro-airflow / kedro-kubeflow / kedro-vertexai / kedro-docker / kedro-mlflow]
- New ZenML stack assumptions: [orchestrator / step operator / tracking setup]

## Limitations and Key Differences
[Put the most important behavior differences here BEFORE the benefits section]

## What's NOT Migrated
[List unsupported or intentionally deferred patterns]

## What You Get for Free After Migration
- Artifact versioning and lineage
- Step caching
- Stack abstraction
- Stronger typed artifact flow
- Optional dynamic pipelines and cross-pipeline artifact reuse
- Better alignment with experiment trackers, step operators, and deployment workflows

## Recommended Next Steps
1. Run `zenml-quick-wins`
2. Install the ZenML docs MCP server
3. Use `zenml-pipeline-authoring` for deeper Docker / materializer / YAML / deployment work
4. Follow up on every HIGH-severity flag before production use
```
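The tables in this report can be rendered from the findings collected in Phase 2 instead of being typed by hand. The sketch below renders only the "Flagged for Review" table. `render_flag_table` and the dictionary keys are hypothetical, and the `MEDIUM` and `LOW` severities are assumptions added for sorting; this skill itself only names `HIGH`.

```python
from typing import Dict, List

# Assumed severity ordering; only HIGH is named in the report template.
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def render_flag_table(flags: List[Dict[str, str]]) -> str:
    """Render flagged patterns as a Markdown table, most severe first."""
    lines = [
        "| Kedro Pattern | Severity | Issue | Suggested Redesign |",
        "|---|---|---|---|",
    ]
    ordered = sorted(
        flags,
        key=lambda flag: SEVERITY_ORDER.get(flag["severity"], len(SEVERITY_ORDER)),
    )
    for flag in ordered:
        lines.append(
            f"| {flag['pattern']} | {flag['severity']} "
            f"| {flag['issue']} | {flag['redesign']} |"
        )
    return "\n".join(lines)


table = render_flag_table(
    [
        {"pattern": "slicing", "severity": "MEDIUM",
         "issue": "No CLI slicing equivalent", "redesign": "Split into smaller pipelines"},
        {"pattern": "transcoding", "severity": "HIGH",
         "issue": "No hidden representation switching in ZenML",
         "redesign": "Use explicit conversion steps"},
    ]
)
```

Sorting by severity keeps the HIGH-severity items at the top of the table, where the follow-up in step 4 of the next steps will look for them.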
Phase 5: Suggest Next Steps
After migration is complete, always include a Recommended Next Steps section in the migration report and communicate it to the user directly.
1. Run the zenml-quick-wins skill
Always suggest this first:
"Now that the migration is done, I recommend running the `zenml-quick-wins` skill to add metadata logging, experiment tracking, alerters, and other production features."
2. Include documentation links for flagged patterns
For every flagged pattern, include relevant ZenML documentation links when they help:
- YAML configuration: https://docs.zenml.io/how-to/steps-pipelines/yaml-configuration
- Secrets: https://docs.zenml.io/how-to/secrets/secrets
- Dynamic pipelines: https://docs.zenml.io/how-to/steps-pipelines/dynamic-pipelines
- Containerization: https://docs.zenml.io/how-to/containerization/containerization
- Step operators: https://docs.zenml.io/stacks/stack-components/step-operators/custom
3. Suggest installing the ZenML docs MCP server
"For easier access to ZenML docs while you keep iterating, you can install the ZenML docs MCP server: `claude mcp add zenmldocs --transport http https://docs.zenml.io/~gitbook/mcp`"
4. Community support for unsupported patterns
When there are HIGH-severity flags, offer to help the user ask the ZenML community for guidance. When there are 2+ HIGH-severity flags, generate a ready-to-send Slack message for zenml.io/slack that includes:
- what Kedro project is being migrated
- the exact unsupported patterns
- what workarounds were suggested
- what behavior might change if the workaround is used
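The message itself can be drafted from the flagged findings. This is a sketch only: `draft_slack_message` and the dictionary keys are hypothetical, and the wording is a placeholder to edit before sending. It returns `None` below the two-flag threshold described above.

```python
from typing import Dict, List, Optional


def draft_slack_message(project: str, flags: List[Dict[str, str]]) -> Optional[str]:
    """Draft a community question when two or more HIGH-severity flags exist."""
    high = [flag for flag in flags if flag["severity"] == "HIGH"]
    if len(high) < 2:
        return None
    lines = [
        f"Hi! I'm migrating the Kedro project `{project}` to ZenML "
        f"and hit {len(high)} unsupported patterns:"
    ]
    for flag in high:
        lines.append(
            f"- {flag['pattern']}: suggested workaround is {flag['workaround']}; "
            f"possible behavior change: {flag['behavior_change']}"
        )
    lines.append("Has anyone handled these in a cleaner way?")
    return "\n".join(lines)


example_flags = [
    {"pattern": "transcoding", "severity": "HIGH",
     "workaround": "explicit conversion steps",
     "behavior_change": "conversions are persisted as artifacts"},
    {"pattern": "before_dataset_loaded hook", "severity": "HIGH",
     "workaround": "move logic into the loader step",
     "behavior_change": "logic runs once per step instead of once per dataset load"},
]
message = draft_slack_message("orders_project", example_flags)
```

The four fields in each line mirror the four bullets above, so the draft stays complete even when the list of flags grows.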
5. Open GitHub issues for genuine product gaps
When the migration reveals a real missing ZenML capability -- not merely a different design, but a genuine gap that multiple users would benefit from -- offer to open a GitHub issue on zenml-io/zenml.
6. Suggest /simplify
After migration, always suggest running /simplify on the generated code so migration comments, wrappers, and duplication can be cleaned up.
7. Further customization via zenml-pipeline-authoring
Recommend zenml-pipeline-authoring when the user needs deeper help with:
- `DockerSettings`
- YAML configuration
- custom materializers
- step operators
- deployments
- multi-environment architecture
Important Behavioral Differences to Communicate
Always mention the relevant behavior differences in the migration report.
Data Catalog != Artifact Store
Kedro uses a named dataset registry that hides storage details from node code. ZenML uses typed artifacts flowing between steps. That means the migration often adds explicit loader/exporter steps at the edges of the graph.
MemoryDataset != "stay in memory"
Kedro can keep intermediates ephemeral. ZenML artifacts are normally persisted and versioned. If ephemerality mattered for correctness or cost, flag it.
Namespaces != simple prefixes
Kedro namespaces and remapping change how datasets and parameters resolve. ZenML reuse is explicit Python composition. These are related ideas, not the same feature.
Slicing != caching
Kedro's CLI slicing changes which part of a graph runs. ZenML caching reuses outputs when inputs, code, and settings match. Do not present them as equivalents.
Runners != orchestrators
Kedro runners are not a direct code-level concept in ZenML. Translate them into stack design, step isolation assumptions, and resource settings.
Anti-Patterns to Avoid
- Do not recreate a fake `DataCatalog` abstraction on top of ZenML unless the user explicitly wants a transitional adapter and understands the tradeoff.
- Do not convert every Kedro dataset into a custom materializer. Many Kedro datasets are storage adapters, not artifact types.
- Do not silently flatten transcoding into one representation.
- Do not claim dataset lifecycle hooks or namespace remapping "basically work the same".
- Do not force a full repo layout rewrite before behavior is correct.
Additional References
- references/concept-map.md -- full Kedro -> ZenML concept map
- references/code-patterns.md -- side-by-side translation examples
- references/gaps-and-flags.md -- must-flag patterns, behavioral differences, and decision tree