setup-guide - SKILL.md Agent Skill

name: setup-guide
description: "Use when the user wants to set up synthetic data generation for the first time, or when sdg_hub is not yet installed/configured in the current environment."
allowed-tools: ["Bash(${CLAUDE_PLUGIN_ROOT}/scripts/sdg_detect.sh:)", "Bash(${CLAUDE_PLUGIN_ROOT}/scripts/sdg_flows.sh:)"]

sdg_hub Setup Guide

You are helping the user set up synthetic data generation.

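Step 1: Detect Environment

Run the detection script and note the library and installer values it reports, which the next step uses:

"${CLAUDE_PLUGIN_ROOT}/scripts/sdg_detect.sh"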

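The later steps branch on the values this script reports. As a minimal sketch, assuming the script prints one key=value pair per line (the library and installer names come from Step 2; the exact output format, the sample values, and the get_value helper are assumptions made for illustration), the values could be extracted like this:

```shell
# Hypothetical sample of the detection output; the real script's format may differ.
sample_output='library=missing
installer=uv'

# Print the value for a given key from key=value lines read on stdin.
get_value() {
  key="$1"
  while IFS='=' read -r name value; do
    if [ "$name" = "$key" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

library=$(printf '%s\n' "$sample_output" | get_value library)
installer=$(printf '%s\n' "$sample_output" | get_value installer)
echo "library=$library installer=$installer"
```
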
Step 2: Install if Needed

If library=missing:

Explain: "sdg_hub is a framework for synthetic data generation — it uses composable blocks and YAML-defined flows to build LLM training datasets from seed data."
Ask permission: "I can install it for you. Want me to proceed?"
If yes and installer=uv: run uv pip install sdg_hub
If yes and installer=pip: run pip install sdg_hub
If installer=none: tell the user they need Python and pip/uv installed first
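The branching above can be sketched as a small shell function. This is only an illustration: install_command is a hypothetical helper, and the installer value is hard-coded here where in practice it would come from the detection output.

```shell
# Print the install command for a given installer value, or fail if none applies.
install_command() {
  case "$1" in
    uv)  echo "uv pip install sdg_hub" ;;
    pip) echo "pip install sdg_hub" ;;
    *)   return 1 ;;
  esac
}

installer="uv"   # in practice, taken from the detection output
if cmd=$(install_command "$installer"); then
  echo "Would run: $cmd"   # run it only after the user has agreed
else
  echo "Python and pip or uv must be installed first."
fi
```

The sketch only prints the command, so the permission question still comes before anything is installed.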

Step 3: Quick Setup or Custom

Check for API keys in the environment:

echo "openai_key=${OPENAI_API_KEY:+found}" "anthropic_key=${ANTHROPIC_API_KEY:+found}"

If an API key was detected, offer a one-question fast path:

"I detected your OpenAI API key. I can set up with these defaults:

Model: openai/gpt-4o-mini
Temperature: 0.7
Concurrency: 5

Accept these defaults, or would you like to customize?"

For an Anthropic key, make the same offer with anthropic/claude-sonnet-4-20250514 as the default model.

If the user accepts, skip to Step 5 using the detected key and defaults.

If no API key was detected, or the user wants to customize, proceed to Step 4.
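The choice of default model in this step can be sketched as follows. The default_model helper is hypothetical, and checking the OpenAI key first when both keys are present is an assumption of this sketch, since the text above does not say which provider wins in that case.

```shell
# Print the default model for whichever key is present; fail if neither is set.
# Checking OpenAI first when both keys exist is an assumption of this sketch.
default_model() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai/gpt-4o-mini"
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic/claude-sonnet-4-20250514"
  else
    return 1
  fi
}

if model=$(default_model); then
  echo "Fast path available. Model: $model, Temperature: 0.7, Concurrency: 5"
else
  echo "No API key detected; continue with Step 4."
fi
```
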

Step 4: Collect Configuration

Ask these questions one at a time:

Model: "Which LLM model do you want to use for generation?" — e.g., openai/gpt-4o-mini, meta-llama/Llama-3.3-70B-Instruct, anthropic/claude-sonnet-4-20250514
API endpoint: "What's your model endpoint URL?" — e.g., http://localhost:8000/v1 for vLLM, or leave empty for cloud provider defaults
Temperature: "What temperature for generation?" (default: 0.7)
Max concurrency: "How many parallel LLM requests?" (default: 5) — higher is faster but may hit rate limits
Checkpoint directory: "Where should generation checkpoints be saved?" (default: ./checkpoints) — allows resuming interrupted runs
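Before saving the answers, it may help to sanity-check the numeric ones. The following is an optional sketch, not something sdg_hub requires: is_positive_int and is_decimal are hypothetical helpers, and the accepted formats are assumptions.

```shell
# Succeed if the argument is a positive integer (for max concurrency).
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Succeed if the argument looks like a non-negative decimal such as 0.7 or 1.
is_decimal() {
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;
  esac
  return 0
}

temperature="0.7"       # example answer
max_concurrency="5"     # example answer
if is_decimal "$temperature" && is_positive_int "$max_concurrency"; then
  echo "answers look valid"
else
  echo "ask again"
fi
```
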

Step 5: Ensure API Key

API keys are read from environment variables — never store them in the config file. LiteLLM (used by sdg_hub) reads standard env vars automatically.

If no API key was detected in Step 3, tell the user to set the appropriate environment variable:

"Set your API key as an environment variable before running generation:
export OPENAI_API_KEY="sk-..."        # OpenAI models
export ANTHROPIC_API_KEY="sk-ant-..."  # Anthropic models
For local endpoints (vLLM, Ollama) that don't require authentication, no API key is needed. LiteLLM picks up these env vars automatically — no extra configuration required."
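To decide which variable to mention, the model's provider prefix can be mapped to a variable name. This is a rough sketch: key_var_for_model is a hypothetical helper, and treating every other prefix as needing no key is a simplification that fits local endpoints but may not fit other hosted providers.

```shell
# Print the environment variable a model needs, based on its provider prefix.
# Prints an empty line for other prefixes (for example, models on a local endpoint).
key_var_for_model() {
  case "$1" in
    openai/*)    echo "OPENAI_API_KEY" ;;
    anthropic/*) echo "ANTHROPIC_API_KEY" ;;
    *)           echo "" ;;
  esac
}

model="openai/gpt-4o-mini"   # example value
var=$(key_var_for_model "$model")
if [ -z "$var" ]; then
  echo "No provider key expected for $model."
elif eval "[ -n \"\${$var:-}\" ]"; then
  echo "$var is set."          # report presence only; never print the key itself
else
  echo "Please export $var before running generation."
fi
```
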

Step 6: Save Config

Write the config to .sdg-hub/config.json:

{
  "model": "<model>",
  "api_base": "<endpoint>",
  "temperature": 0.7,
  "max_concurrency": 5,
  "checkpoint_dir": "./checkpoints"
}

Add .sdg-hub/ to .gitignore if not already present.
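Both actions in this step can be sketched in shell. The write_config helper is hypothetical, the values passed to it are examples, and the sketch assumes the answers contain no characters that need JSON escaping. It runs in a scratch directory so that it does not touch a real project.

```shell
# Write the config and make sure .sdg-hub/ is ignored by git.
# Arguments: model, api_base, temperature, max_concurrency, checkpoint_dir.
write_config() {
  mkdir -p .sdg-hub
  cat > .sdg-hub/config.json <<EOF
{
  "model": "$1",
  "api_base": "$2",
  "temperature": $3,
  "max_concurrency": $4,
  "checkpoint_dir": "$5"
}
EOF
  # Append the ignore entry only if an identical line is not already there.
  touch .gitignore
  grep -qxF '.sdg-hub/' .gitignore || echo '.sdg-hub/' >> .gitignore
}

cd "$(mktemp -d)"   # scratch directory for this demonstration
write_config "openai/gpt-4o-mini" "" 0.7 5 "./checkpoints"
write_config "openai/gpt-4o-mini" "" 0.7 5 "./checkpoints"   # second run adds no duplicate entry
cat .sdg-hub/config.json
```

The grep check keeps the .gitignore update idempotent, so invoking the skill again does not pile up duplicate entries.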

Confirm the config file was written, then report success:

"Setup complete! To run generation, use the data-generation skill, or the flow-browser skill to browse available flows.

API keys are read from environment variables, not the config file. Make sure the appropriate variable is set in your shell:
export OPENAI_API_KEY="sk-..."        # OpenAI models
export ANTHROPIC_API_KEY="sk-ant-..."  # Anthropic models
Local endpoints (vLLM, Ollama) don't need an API key."

Step 7: Verify

List available flows to confirm the installation works:

"${CLAUDE_PLUGIN_ROOT}/scripts/sdg_flows.sh" list

If the flows are listed, tell the user the installation works and remind them that they can now use the data-generation skill to run generation, or the flow-browser skill to browse available flows.

Updating Config

If this skill is invoked again and .sdg-hub/config.json already exists, ask: "You already have a configuration. Do you want to update it or start fresh?"