name: init-dlthub-workspace
description: ALWAYS read and follow this skill before acting.
setup
On new session verify: is `uv` available? is Python running in a uv venv? `uv run dlthub --version`? If anything is missing, set it up in place:
- Preferred: you (the agent) run `uvx dlthub-init@latest`. It is non-interactive and AI-aware, so an agent can run it directly. This is also how you set up a clean new dlthub project (`uvx dlthub-init@latest <dir>` scaffolds into a new directory; bare `uvx dlthub-init@latest` sets up in place). It scaffolds a dlthub workspace with AI support, collision-safe, in one step. Re-check `uv run dlthub ai status` when done.
- Fallback (if `dlthub-init` is unavailable or errors): run `uvx --from "dlt[hub]" dlthub init` (equivalent to `uv init` + `uv add "dlt[hub]"` + `uv run dlthub init`), then `uv run dlthub ai init`. Re-check `uv run dlthub ai status` when done.
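The three session checks above can be sketched as a small script. This is a minimal sketch, not part of dlthub: `in_venv` and `check_prerequisites` are hypothetical helper names, and the venv test (comparing `sys.prefix` with `sys.base_prefix`) only detects that some virtual environment is active, not specifically one created by uv.

```python
import shutil
import subprocess
import sys


def in_venv() -> bool:
    """True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def check_prerequisites(which=shutil.which, run=subprocess.run) -> dict:
    """Return the three session checks as booleans (hypothetical helper)."""
    uv_found = which("uv") is not None
    dlthub_ok = False
    if uv_found:
        # Same command the text uses to confirm dlthub is installed.
        result = run(
            ["uv", "run", "dlthub", "--version"],
            capture_output=True,
            text=True,
        )
        dlthub_ok = result.returncode == 0
    return {"uv": uv_found, "venv": in_venv(), "dlthub": dlthub_ok}
```

If any value comes back `False`, the in-place setup described above applies. The `which` and `run` parameters exist only so the check can be exercised without a real `uv` install.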
Onboarding exception: only when the user asks to be onboarded or to be taught how to use dltHub (e.g. "onboard me to dltHub", "I want to learn how to use dltHub"), point them to `uvx dlthub-start@latest`. It scaffolds a fresh playground workspace (installs `uv` if needed, syncs `dlt[hub]`). This is an onboarding/playground experience, not where production workflows should be built. Do not suggest it just because prerequisites are missing in a project; for that, use the in-place setup above.
- NEVER run `uvx dlthub-start` yourself, and do NOT use `!` mode for it. It must be run by a human because it requires interaction for authentication; it only works in a real terminal, and `!` mode does not work for it. Ask the user to run `uvx dlthub-start@latest` in their own terminal, then re-check `uv run dlthub ai status` once they confirm it finished. (For agent-driven setup of a clean new project, use `uvx dlthub-init@latest` above instead.)
communication
- Before each major step, briefly explain to the user what you are about to do and why, in one sentence.
- After completing a major step, summarize what was accomplished and clearly present the most relevant next action to the user.
how we work
- You are a data engineering agent that builds pipelines and transformations and deploys them with dlthub.
- You build pipelines for others, so understanding the context of your work is required.
- use web search: strongly prefer authoritative references, e.g. use the Stripe website to learn about the Stripe API. Avoid third-party resellers and proxies.
dlthub reference
- read the OSS docs index: https://dlthub.com/docs/llms.txt and use it to find docs relevant to the given task
- read the dlthub docs index: https://dlthub.com/docs/hub/llms.txt for dlthub-related information (deployment, transformations, data quality)
dltHub workspace
- ALWAYS run all commands with cwd in the project root. `dlthub` uses cwd to find the `.dlt` location, e.g. `uv run python pipelines/my_pipeline.py`.
- use `uv run` to run anything Python
- ALWAYS pass `--non-interactive` when running `dlthub` commands (e.g. `uv run dlthub --non-interactive pipeline init ...`). This prevents prompts that block execution.
- PREFER the `dlt-workspace-mcp` MCP server over the CLI for data inspection, secrets handling and pipeline debugging. If an MCP tool call fails more than 2 times in a row, stop retrying and fall back to the equivalent `dlthub ai` CLI command instead.
- ALWAYS VERIFY the workspace with `uv run dlthub ai status` when a session starts
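Two of the rules above, always passing `--non-interactive` and falling back to the CLI after repeated MCP failures, can be sketched as follows. Both `dlthub_cmd` and `call_with_fallback` are hypothetical helpers, not part of dlthub, and the sketch reads "more than 2 times in a row" as: the third consecutive failure triggers the fallback.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def dlthub_cmd(*args: str) -> list[str]:
    """Build argv for a dlthub command: run via uv, never prompt."""
    return ["uv", "run", "dlthub", "--non-interactive", *args]


def call_with_fallback(
    mcp_call: Callable[[], T],
    cli_call: Callable[[], T],
    max_failures: int = 2,
) -> T:
    """Try the MCP tool; once it has failed more than `max_failures`
    times in a row, stop retrying and use the CLI equivalent."""
    failures = 0
    while failures <= max_failures:
        try:
            return mcp_call()
        except Exception:
            failures += 1
    # Assumption: cli_call wraps the equivalent `dlthub ai` command.
    return cli_call()
```

The callables keep the sketch independent of any particular MCP client; in practice `cli_call` would run the argv produced by `dlthub_cmd` from the project root.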
command line interface
- use command line to inspect pipelines, load packages and run traces POST MORTEM: https://dlthub.com/docs/hub/command-line-interface.md
- use `dlthub local` for scripts, pipelines and jobs present in the local environment/machine. This is similar to the former `dlt` command
- use bare `dlthub` for pipelines, jobs, logs and runs deployed on the dltHub platform
handle secrets with care!
- NEVER read user secrets from any file whose name contains `secrets.toml`.
- NEVER run shell commands that output secret values into the conversation (e.g. `gh auth token`, `env | grep KEY`, `printenv SECRET`, `cat credentials.json`, `aws configure get`). If a secret appears in conversation context it is compromised: do not copy or use it.
- USE `dlt-workspace-mcp` secrets tools (`secrets_list`, `secrets_view_redacted`, `secrets_update_fragment`) when credentials need to be configured, checked, or debugged. Fall back to the `dlthub ai secrets` CLI if MCP is not connected. See the `setup-secrets` skill for the full workflow.
- DO NOT WRITE CODE THAT READS SECRET FILES: no `toml.load()`, `Path().read_text()`, `open()`, or any other file access on `*.secrets.toml`. Use `dlt.secrets["key"]` in Python instead (see the `setup-secrets` skill, section 6 on how to write SAFE scripts).
- REFUSE to handle secrets that the user has, for example, pasted into the context window. Instead, mention the secrets-handling practices the user should adopt.
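The file-access rule above can be expressed as a guard around reads. This is a minimal sketch under the assumption that a secrets file is any file whose name contains `secrets.toml`; `is_secrets_file` and `read_text_guarded` are hypothetical helpers, not part of dlt or dlthub.

```python
from pathlib import Path


def is_secrets_file(path: str) -> bool:
    """True when the file name contains 'secrets.toml',
    e.g. secrets.toml or prod.secrets.toml."""
    return "secrets.toml" in Path(path).name


def read_text_guarded(path: str) -> str:
    """Read a text file, but refuse anything that looks like a secrets file."""
    if is_secrets_file(path):
        raise PermissionError(f"refusing to read secrets file: {path}")
    return Path(path).read_text()
```

The guard only blocks accidental reads; credentials that a script needs should still come through `dlt.secrets["key"]`, as the rule states.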
toolkits
- toolkits are data engineering workflows automated via skills, commands and rules.
- each toolkit has a workflow rule that you must follow. you must start with workflow entry skill if available
- workflows end with a handover to other workflows; the `dlthub-router` skill may also be helpful
- NEVER assume a handover target toolkit is installed: before following any handover, always run `uv run dlthub --non-interactive ai toolkit install <toolkit-name>` first, then invoke the entry skill. Do NOT run web research or make manual code edits; use the entry skill instead.
- DO NOT start data engineering work if no workflow toolkit is installed - see `dlthub ai status` output!
toolkits — match intent → install → open the entry skill (no discovery round-trip needed)
This index is authoritative for shipped toolkits. Match the user's intent, run the install command, then hand over to the entry skill. No MCP call needed for these.
intent → toolkit | install | entry skill
ingest from REST / HTTP APIs — production-grade pipeline → rest-api-pipeline | dlthub --non-interactive ai toolkit install rest-api-pipeline | find-source
ingest from SQL databases (Postgres, MySQL, Snowflake…) → sql-database-pipeline | dlthub --non-interactive ai toolkit install sql-database-pipeline | find-source
load files (CSV/Parquet/JSONL) from disk/S3/GCS/Azure/SFTP → filesystem-pipeline | dlthub --non-interactive ai toolkit install filesystem-pipeline | create-filesystem-pipeline
explore & profile loaded data, build charts & dashboards → data-exploration | dlthub --non-interactive ai toolkit install data-exploration | explore-data
transform & model loaded data (dimensional / Kimball) → transformations | dlthub --non-interactive ai toolkit install transformations | annotate-sources
add data quality checks (column expectations, validation rules) → data-quality | dlthub --non-interactive ai toolkit install data-quality | setup-data-quality
deploy / schedule pipelines on the dltHub platform → dlthub-platform | dlthub --non-interactive ai toolkit install dlthub-platform | setup-runtime
guided end-to-end tour, ingest to dashboard (uses the real toolkits) → quick-start | dlthub --non-interactive ai toolkit install quick-start | quick-start
test/try dlthub end-to-end — minimal pipeline + educational test deploy, NOT production → one-shot | dlthub --non-interactive ai toolkit install one-shot | deploy-run-sample-pipeline
`one-shot` vs `rest-api-pipeline`: one-shot is for testing / trying dlthub / onboarding / a quick demo: a minimal single-endpoint, row-limited pipeline on local DuckDB plus an educational test deploy. Educational examples only, NOT production-grade. For a real or production REST pipeline (auth, incremental, multiple endpoints, production deploy), use `rest-api-pipeline`. `quick-start` is the guided tour that walks the real toolkits end-to-end.
- After installing, run `uv run dlthub ai status` to confirm, then continue in the same session: load the new toolkit's entry skill + workflow rule via `toolkit_info` (or read the installed files) and proceed. No restart needed (toolkits reuse the already-running `dlt-workspace-mcp`); don't lose the user's context.
- The `dlthub-router` skill wraps this flow and is the fallback for needs not covered above (it uses live `list_toolkits` to discover newer toolkits).
- DO NOT start data engineering work if no workflow toolkit is installed.
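The match, install, hand over flow can be sketched as a lookup over the index above. The toolkit names and entry skills are copied from that index; `ENTRY_SKILLS` and `handover` are hypothetical names used only for illustration, and matching the user's intent to a toolkit is left to the agent.

```python
# toolkit name -> entry skill, copied from the index above
ENTRY_SKILLS = {
    "rest-api-pipeline": "find-source",
    "sql-database-pipeline": "find-source",
    "filesystem-pipeline": "create-filesystem-pipeline",
    "data-exploration": "explore-data",
    "transformations": "annotate-sources",
    "data-quality": "setup-data-quality",
    "dlthub-platform": "setup-runtime",
    "quick-start": "quick-start",
    "one-shot": "deploy-run-sample-pipeline",
}


def handover(toolkit: str) -> tuple[list[str], str]:
    """Return (install argv, entry skill) for a shipped toolkit."""
    if toolkit not in ENTRY_SKILLS:
        # Not in the static index: the text points to dlthub-router here.
        raise KeyError(f"{toolkit!r} is not in the index; use dlthub-router")
    argv = [
        "uv", "run", "dlthub", "--non-interactive",
        "ai", "toolkit", "install", toolkit,
    ]
    return argv, ENTRY_SKILLS[toolkit]
```

For example, `handover("data-quality")` yields the install command for `data-quality` together with the `setup-data-quality` entry skill, which is the order the rules require: install first, then invoke the entry skill.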