name: create-rest-api-pipeline description: Create a dlt REST API pipeline. Use for the rest_api core source, or any generic REST/HTTP API source. Not for sql_database or filesystem sources. argument-hint: "[dlthub-pipeline-init-command]"
Create a rest api dlt pipeline
Create the simplest working dlt pipeline — single endpoint, no pagination or incremental loading — to get data flowing fast.
Requires a dlthub pipeline init command as the argument (e.g. dlthub pipeline init shopify_store duckdb).
If you don't have one yet, run find-source first to identify the right source.
The argument is the full dlthub pipeline init command to run (e.g. dlthub pipeline init shopify_store duckdb or dlthub pipeline init sql_database postgres).
Steps
1. Snapshot current folder
Run ls -la to see the current state before scaffolding.
2. Run dlthub pipeline init
dlthub pipeline init can be run multiple times in the same project — each run adds new files without overwriting existing pipeline scripts. It will update shared files (.dlt/secrets.toml, .dlt/config.toml, requirements.txt, .gitignore).
Run the provided dlthub pipeline init command with the global --non-interactive flag in the active venv (e.g. uv run dlthub --non-interactive pipeline init rest_api duckdb).
If the command fails with invalid choice: 'pipeline', the dlthub workspace is not initialized. Run uv run dlthub init and follow its instructions — most importantly run uv sync to pull required dependencies — then retry.
Depending on the source type, this creates:
Core source (dlthub pipeline init rest_api duckdb):
rest_api_pipeline.py(or similar) — full working example with RESTAPIConfig, pagination, incremental loading
Generic fallback (dlthub pipeline init <unknown_name> duckdb):
<name>_pipeline.py— basic intro template (less useful, prefer core sources)
Shared files (created on first init, updated on subsequent runs):
.dlt/secrets.toml— credentials template.dlt/config.toml— pipeline configrequirements.txt— Python dependencies.gitignore
Run ls -la again to confirm what was created.
3. Read generated files
Read the following files to understand the scaffold:
<source>_pipeline.py— the pipeline code template<source>-docs.yaml— API endpoint scaffold with auth, endpoints, params, data selectors (if present).dlt/config.toml— source/destination config ie.api_url
Do NOT read the .md file or any secrets.toml file.
4. Research before writing code
Do these in parallel:
Read essential dlt docs upfront:
- REST API source (config, auth, pagination, processing_steps):
https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api/basic.md - Source & resource decorators, parameters:
https://dlthub.com/docs/general-usage/source.mdandhttps://dlthub.com/docs/general-usage/resource.md
Web search the data source:
- Confirm the scaffold is accurate, learn about auth method, available endpoints
- How does the user get API keys/tokens for this service
Read additional docs as needed in later steps:
- How dlt works (extract → normalize → load):
https://dlthub.com/docs/reference/explainers/how-dlt-works.md - CLI reference (trace, load-package, schema):
https://dlthub.com/docs/hub/command-line-interface.md - File formats:
https://dlthub.com/docs/dlt-ecosystem/file-formats/ - Full docs index:
https://dlthub.com/docs/llms.txt
5. Present your findings
Present your findings so user can pick ONE of the endpoints that you will implement. Answer questions, do more research if needed.
6. Create pipeline with single endpoint
Edit <source>_pipeline.py using information from the scaffold, API research, and dlt docs:
- Focus on a single endpoint, ignore pagination and incremental loading for now
- Configure
base_urlandauth - Add resources with
endpoint.path,data_selector,params,primary_key - Use
dev_mode=Trueon the pipeline (fresh dataset on every run during debugging) - Use
.add_limit(1)on the source when callingpipeline.run()(load one page only) - Use
replacewrite disposition to start - Remove
refresh="drop_sources"if present —dev_modehandles the clean slate
Optionally: parameterize the source function
@dlt.source and @dlt.resource are regular Python function decorators — expose useful parameters:
- Credentials (
dlt.secrets.value): auto-loaded from secrets.toml, user can also pass explicitly - Config (
dlt.config.value): auto-loaded from config.toml, user can also pass explicitly - Runtime params (plain defaults): date ranges, filters, granularity — give sensible defaults so the pipeline works out of the box
Users will call the source both ways:
pipeline.run(my_source()) # auto-inject from TOML
pipeline.run(my_source(starting_at="2025-01-01T00:00:00Z", bucket_width="1h")) # explicit
Add a docstring documenting parameters and example calls.
Example
@dlt.source
def my_source(
access_token: str = dlt.secrets.value,
starting_at: str = None,
):
"""Load data from My API.
Args:
access_token: API token. Auto-loaded from secrets.toml.
starting_at: Start of range (ISO8601). Defaults to 7 days ago.
"""
if starting_at is None:
starting_at = pendulum.now("UTC").subtract(days=7).start_of("day").to_iso8601_string()
config: RESTAPIConfig = {
"client": {"base_url": "https://api.example.com/v1/", ...},
"resources": [...],
}
yield from rest_api_resources(config)
6b. Set up config and secrets TOMLs
Essential Reading Credentials & config resolution: https://dlthub.com/docs/general-usage/credentials/setup.md https://dlthub.com/docs/general-usage/credentials/advanced
Config (non-secret values like base_url, api_version): edit .dlt/config.toml directly.
# .dlt/config.toml
[sources.<name>]
base_url = "https://api.example.com/v1/"
Secrets (API keys, tokens, passwords): never read or write secrets.toml directly. Never run commands that output secret values (e.g. gh auth token, env | grep KEY).
Use secrets_view_redacted, secrets_list, and secrets_update_fragment MCP tools (or equivalent dlthub ai secrets CLI commands) — see setup-secrets skill for details.
Use secrets_list to pick the target file, then secrets_update_fragment with the TOML fragment:
[sources.<name>]
access_token = "ak-*******-cae"
<name>=name=arg on@dlt.sourceif set; otherwise the function name- Use meaningful placeholders that hint at format (not generic
<configure me>)
For more complex credential setup (research where to get keys, multiple providers), use setup-secrets skill.
Rate limits: dlt handles HTTP 429 and Retry-After automatically — no custom retry code needed. For strict rate-limited APIs, tune retry settings in adjust-endpoint.
ALWAYS Get Feedback before you run the pipeline for a first time. Show summary of files that you changed or generated.
7. Debug pipeline - first run
When user requests to run pipeline ALWAYS use debug-pipeline to diagnose and guide credential setup
NEVER add more endpoints before that - keep it simple