name: setup-guide
description: "Use when the user wants to set up synthetic data generation for the first time, or when sdg_hub is not yet installed/configured in the current environment."
allowed-tools: ["Bash(${CLAUDE_PLUGIN_ROOT}/scripts/sdg_detect.sh:)", "Bash(${CLAUDE_PLUGIN_ROOT}/scripts/sdg_flows.sh:)"]
sdg_hub Setup Guide
You are helping the user set up synthetic data generation.
Step 1: Detect Environment
"${CLAUDE_PLUGIN_ROOT}/scripts/sdg_detect.sh"
Step 2: Install if Needed
If library=missing:
- Explain: "sdg_hub is a framework for synthetic data generation — it uses composable blocks and YAML-defined flows to build LLM training datasets from seed data."
- Ask permission: "I can install it for you. Want me to proceed?"
- If yes and installer=uv: run uv pip install sdg_hub
- If yes and installer=pip: run pip install sdg_hub
- If installer=none: tell the user they need Python and pip/uv installed first
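The installer branching above can be sketched as a small shell function. This is only a sketch: `install_cmd_for` is a hypothetical helper, not part of sdg_hub or the plugin scripts, and it assumes the `installer` value reported in Step 1 is passed in as its only argument.

```shell
# Hypothetical helper: map the detected installer to the install command.
# Assumes the value after "installer=" from Step 1 is passed as $1.
install_cmd_for() {
  case "$1" in
    uv)  echo "uv pip install sdg_hub" ;;
    pip) echo "pip install sdg_hub" ;;
    *)   # installer=none or anything unexpected: nothing can be installed yet
         echo "Python and pip or uv must be installed first" >&2
         return 1 ;;
  esac
}
```

The function only prints the command. Run it after the user has given permission, for example with `cmd=$(install_cmd_for uv) && $cmd`.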
Step 3: Quick Setup or Custom
Also check for API keys in the environment:
echo "openai_key=${OPENAI_API_KEY:+found}" "anthropic_key=${ANTHROPIC_API_KEY:+found}"
If an API key was detected, offer a one-question fast path:
"I detected your OpenAI API key. I can set up with these defaults:
- Model: openai/gpt-4o-mini
- Temperature: 0.7
- Concurrency: 5
Accept these defaults, or would you like to customize?"
If the user accepts, skip to Step 5 using the detected key and defaults.
For Anthropic keys, default to anthropic/claude-sonnet-4-20250514.
If no API key was detected, or the user wants to customize, proceed to Step 4.
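The fast-path model choice could look roughly like the function below. `default_model` is a hypothetical name, and checking the OpenAI key before the Anthropic key is an assumption, since this guide does not say which one wins when both are set.

```shell
# Hypothetical helper: pick the fast-path default model from the environment.
# Assumption: OpenAI is checked first when both keys are present.
default_model() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai/gpt-4o-mini"
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic/claude-sonnet-4-20250514"
  else
    return 1  # no key detected: continue with Step 4 instead
  fi
}
```

A non-zero exit status signals that no key was found, so a caller can write `model=$(default_model) || echo "proceed to Step 4"`.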
Step 4: Collect Configuration
Ask these questions one at a time:
- Model: "Which LLM model do you want to use for generation?" (e.g., openai/gpt-4o-mini, meta-llama/Llama-3.3-70B-Instruct, anthropic/claude-sonnet-4-20250514)
- API endpoint: "What's your model endpoint URL?" (e.g., http://localhost:8000/v1 for vLLM, or leave empty for cloud provider defaults)
- Temperature: "What temperature for generation?" (default: 0.7)
- Max concurrency: "How many parallel LLM requests?" (default: 5; higher is faster but may hit rate limits)
- Checkpoint directory: "Where should generation checkpoints be saved?" (default: ./checkpoints; allows resuming interrupted runs)
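One way to apply the defaults listed above when the user leaves an answer empty is the shell's `${value:-default}` expansion. The helper name `answer_or_default` and the sample answers are hypothetical and serve only as illustration.

```shell
# Hypothetical helper: print the user's answer, or the documented default
# when the answer is empty.
# $1 = user's answer (may be empty), $2 = default value
answer_or_default() {
  echo "${1:-$2}"
}

# Sample answers for illustration only.
temperature=$(answer_or_default "" "0.7")               # empty answer: default
max_concurrency=$(answer_or_default "10" "5")           # explicit answer kept
checkpoint_dir=$(answer_or_default "" "./checkpoints")  # empty answer: default
```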
Step 5: Ensure API Key
API keys are read from environment variables — never store them in the config file. LiteLLM (used by sdg_hub) reads standard env vars automatically.
If no API key was detected in Step 3, tell the user to set the appropriate environment variable:
"Set your API key as an environment variable before running generation:
export OPENAI_API_KEY="sk-..." # OpenAI models
export ANTHROPIC_API_KEY="sk-ant-..." # Anthropic models
For local endpoints (vLLM, Ollama) that don't require authentication, no API key is needed. LiteLLM picks up these env vars automatically, so no extra configuration is required."
Step 6: Save Config
Write the config to .sdg-hub/config.json:
{
"model": "<model>",
"api_base": "<endpoint>",
"temperature": 0.7,
"max_concurrency": 5,
"checkpoint_dir": "./checkpoints"
}
Add .sdg-hub/ to .gitignore if not already present.
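Writing the file and updating .gitignore can be sketched as follows. `save_config` is a hypothetical helper, and it assumes the values contain no characters that need JSON escaping (quotes or backslashes); a proper JSON tool is the safer choice if they might.

```shell
# Hypothetical helper: write .sdg-hub/config.json and ignore the directory.
# Assumes values need no JSON escaping (no quotes or backslashes).
# $1=model $2=api_base $3=temperature $4=max_concurrency $5=checkpoint_dir
save_config() {
  mkdir -p .sdg-hub || return 1
  cat > .sdg-hub/config.json <<EOF
{
  "model": "$1",
  "api_base": "$2",
  "temperature": $3,
  "max_concurrency": $4,
  "checkpoint_dir": "$5"
}
EOF
  # Append the ignore rule only if an identical line is not already present.
  touch .gitignore
  if ! grep -qxF '.sdg-hub/' .gitignore; then
    # Make sure the last existing line is newline-terminated before appending.
    [ -z "$(tail -c 1 .gitignore)" ] || echo >> .gitignore
    echo '.sdg-hub/' >> .gitignore
  fi
}
```

For example, `save_config "openai/gpt-4o-mini" "" 0.7 5 "./checkpoints"` writes the fast-path defaults with an empty endpoint. Calling it again overwrites the config but does not duplicate the .gitignore entry.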
Confirm the config file was written, then report success:
"Setup complete! To run generation, use the data-generation skill, or the flow-browser skill to browse available flows.
API keys are read from environment variables, not the config file. Make sure the appropriate variable is set in your shell:
export OPENAI_API_KEY="sk-..." # OpenAI models
export ANTHROPIC_API_KEY="sk-ant-..." # Anthropic models
Local endpoints (vLLM, Ollama) don't need an API key."
Step 7: Verify
List available flows to confirm the installation works:
"${CLAUDE_PLUGIN_ROOT}/scripts/sdg_flows.sh" list
Report success and remind the user they can now use the data-generation skill to run generation, or the flow-browser skill to browse available flows.
Updating Config
If this skill is invoked again and a config already exists, ask: "You already have a configuration. Do you want to update it or start fresh?"