dspy-langwatch - SKILL.md Agent Skill

name: dspy-langwatch
description: Use LangWatch for DSPy auto-tracing and real-time optimizer progress. Use when you want to set up LangWatch, langwatch.dspy.init, auto-tracing DSPy, real-time optimization dashboard, optimizer progress tracking, app.langwatch.ai, or DSPy optimizer dashboard. Also used for langwatch setup, pip install langwatch, langwatch trace, optimizer progress, real-time optimization, watch optimizer run, LangWatch self-hosted, langwatch docker, langwatch vs langtrace, langwatch autotrack_dspy.

LangWatch — Auto-Tracing + Real-Time Optimizer Progress for DSPy

Guide the user through setting up LangWatch for automatic DSPy tracing and live optimizer progress tracking.

What is LangWatch

LangWatch is an open-source LLMOps platform with two distinct DSPy integrations:

Auto-tracing (inference): automatically captures module inputs/outputs, LM calls, and retrieval queries
Optimizer progress tracking (unique feature): streams live step-by-step scores, predictor states, and cost as optimizers run

Of the observability tools compared in this skill (Langtrace, Phoenix, Weave, MLflow), only LangWatch patches DSPy optimizers to stream live progress.

Cloud: Managed at app.langwatch.ai (free tier available)
Self-hosted: Docker Compose, Helm chart, enterprise on-prem
Open source: github.com/langwatch/langwatch

When to use LangWatch

Use LangWatch when:

You run long optimization passes and want to see progress in real time
You want auto-tracing of DSPy inference with no manual decorators
You want a dashboard showing optimizer scores, cost, and predictor state as they happen
You need both inference tracing AND optimizer monitoring in one tool

Do NOT use LangWatch when:

You only need tracing and want the simplest one-line setup — see /dspy-langtrace
You want a local trace viewer with built-in evals — see /dspy-phoenix
Your team already uses W&B for experiment tracking — see /dspy-weave
You need a model registry and full ML lifecycle — see /dspy-mlflow

Setup

Install

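pip install langwatch
# Or install with the DSPy extra to pin a compatible DSPy version
# (quoted so shells such as zsh do not treat the brackets as a glob):
pip install "langwatch[dspy]"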

Cloud setup (quickest)

Sign up at app.langwatch.ai
Create a project and copy your API key
Set the environment variable:

export LANGWATCH_API_KEY="your-key"

Self-hosted setup

Docker Compose

git clone https://github.com/langwatch/langwatch.git
cd langwatch
docker compose up -d

Then point your SDK at your local instance:

export LANGWATCH_ENDPOINT="http://localhost:5560"
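
Before running a traced program, it can help to confirm that these environment variables are visible to Python. The sketch below is a plain standard-library check, not part of the LangWatch SDK. The helper name resolve_langwatch_config is hypothetical, and it assumes that an API key is required in both cloud and self-hosted modes and that the cloud URL is the fallback when LANGWATCH_ENDPOINT is unset.

```python
import os

# Assumption: the managed cloud URL from this guide is the fallback endpoint.
DEFAULT_ENDPOINT = "https://app.langwatch.ai"


def resolve_langwatch_config(env=None):
    """Return (endpoint, api_key) from environment variables, failing fast on a missing key."""
    env = os.environ if env is None else env
    # Strip a trailing slash so the endpoint compares cleanly against expected values.
    endpoint = env.get("LANGWATCH_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    api_key = env.get("LANGWATCH_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("LANGWATCH_API_KEY is not set; export it before running")
    return endpoint, api_key
```

Calling resolve_langwatch_config() at the top of a script surfaces a missing key immediately instead of after a long run.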

Helm chart (Kubernetes)

LangWatch provides a Helm chart for production Kubernetes deployments. See the LangWatch docs for Helm values and configuration.

Integration 1: Auto-Tracing (Inference)

Use @langwatch.trace() and autotrack_dspy() to automatically capture all DSPy calls during inference.

What gets traced

Component	Details captured
Module calls	Inputs/outputs per `dspy.Module.forward()`
LM calls	Model name, messages, response, token counts
Retrievals	Queries, retrieved passages
Nested spans	Full call tree with parent-child relationships
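
To make the "nested spans" row concrete, here is a small illustrative model of a call tree in plain Python. It is only a sketch of the parent-child idea. The Span class, its fields, and the render helper are hypothetical and do not reflect LangWatch's actual span schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for one traced step; not LangWatch's real schema."""
    name: str
    kind: str  # e.g. "module", "retrieval", "llm"
    children: List["Span"] = field(default_factory=list)


def render(span, depth=0):
    """Return the call tree as indented lines, parents before children."""
    lines = ["  " * depth + f"{span.kind}: {span.name}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# A RAG-style call: one module that calls a retriever and a ChainOfThought,
# which in turn makes one LM call.
tree = Span("RAGPipeline", "module", [
    Span("Retrieve", "retrieval"),
    Span("ChainOfThought", "module", [Span("openai/gpt-4o-mini", "llm")]),
])
print("\n".join(render(tree)))
```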

Basic auto-tracing

import langwatch
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

@langwatch.trace()
def answer_question(question):
    langwatch.get_current_trace().autotrack_dspy()

    program = dspy.ChainOfThought("question -> answer")
    return program(question=question)

result = answer_question("What is DSPy?")
# View traces at app.langwatch.ai (or your self-hosted URL)

Tracing a full pipeline

import langwatch
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

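class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module before adding sub-modules
        # dspy.Retrieve needs a retrieval model configured, e.g. dspy.configure(rm=...)
        self.retrieve = dspy.Retrieve(k=3)
        self.answer = dspy.ChainOfThought("context, question -> answer")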

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.answer(context=context, question=question)

pipeline = RAGPipeline()

@langwatch.trace()
def handle_query(question):
    langwatch.get_current_trace().autotrack_dspy()
    return pipeline(question=question)

result = handle_query("How do refunds work?")
# LangWatch captures:
#   - The RAGPipeline call
#   - The Retrieve call (query, passages)
#   - The ChainOfThought LM call (prompt, response, tokens)

Adding metadata to traces

# Assumes `import langwatch` and the `pipeline` instance from the previous example
@langwatch.trace()
def handle_query(user_id, question):
    trace = langwatch.get_current_trace()
    trace.autotrack_dspy()
    trace.update(metadata={"user_id": user_id, "environment": "production"})
    return pipeline(question=question)

Integration 2: Optimizer Progress Tracking (Unique Feature)

LangWatch patches supported DSPy optimizer classes so that each step streams to the dashboard while the run is still in progress. This is the main reason to pick LangWatch over the other tools covered here.

What the optimizer dashboard shows

Live scores: see each trial's score as it completes
Predictor states: which instructions and demos the optimizer is testing
LM calls: every call the optimizer makes during search
Cost tracking: running cost total as the optimizer runs
Progress bar: how far through the optimization you are

Supported optimizers

Optimizer	Supported
`dspy.BootstrapFewShot`	Yes
`dspy.BootstrapFewShotWithRandomSearch`	Yes
`dspy.COPRO`	Yes
`dspy.MIPROv2`	Yes
Others	Raises `ValueError`
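
Because unsupported optimizers raise ValueError, a pre-check can fail early with a clearer message before any setup work is done. The sketch below is an assumption-laden illustration. The ensure_supported helper is hypothetical, the class names are copied from the table above, and the real check inside LangWatch may differ (for example, in how it treats subclasses).

```python
# Class names copied from the table above; the real check lives inside LangWatch.
SUPPORTED_OPTIMIZERS = {
    "BootstrapFewShot",
    "BootstrapFewShotWithRandomSearch",
    "COPRO",
    "MIPROv2",
}


def ensure_supported(optimizer):
    """Raise ValueError early if the optimizer's class name is not in the table."""
    name = type(optimizer).__name__  # exact class name; subclasses are not matched
    if name not in SUPPORTED_OPTIMIZERS:
        raise ValueError(
            f"{name} is not listed as supported: {sorted(SUPPORTED_OPTIMIZERS)}"
        )
    return optimizer
```

Matching on the class name keeps the check free of DSPy imports, at the cost of rejecting subclasses that LangWatch might in fact accept.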

Setup optimizer tracking

import langwatch.dspy
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

trainset = [...]  # your training examples

def metric(example, prediction, trace=None):
    return prediction.answer.strip().lower() == example.answer.strip().lower()

program = dspy.ChainOfThought("question -> answer")
optimizer = dspy.MIPROv2(metric=metric, auto="medium")

# Initialize LangWatch optimizer tracking
langwatch.dspy.init(
    experiment="mipro-medium-run1",
    optimizer=optimizer,
)

# Run optimization — progress streams to the LangWatch dashboard
optimized = optimizer.compile(program, trainset=trainset)
# Watch live progress at app.langwatch.ai

Tracking BootstrapFewShot

import langwatch.dspy
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

# Reuses `metric` and `trainset` from the MIPROv2 example above
program = dspy.ChainOfThought("question -> answer")
optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)

langwatch.dspy.init(
    experiment="bootstrap-4demos",
    optimizer=optimizer,
)

optimized = optimizer.compile(program, trainset=trainset)

Comparing multiple optimizer runs

Run multiple experiments with different names — they appear side-by-side in the LangWatch dashboard:

import langwatch.dspy
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

# Reuses `metric` and `trainset` from the examples above
experiments = [
    ("bootstrap-4", dspy.BootstrapFewShot, {"metric": metric, "max_bootstrapped_demos": 4}),
    ("bootstrap-8", dspy.BootstrapFewShot, {"metric": metric, "max_bootstrapped_demos": 8}),
    ("mipro-light", dspy.MIPROv2, {"metric": metric, "auto": "light"}),
    ("mipro-medium", dspy.MIPROv2, {"metric": metric, "auto": "medium"}),
]

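results = {}  # optimized program per experiment name, so earlier runs are not overwritten
for name, opt_class, kwargs in experiments:
    program = dspy.ChainOfThought("question -> answer")  # fresh program for each run
    optimizer = opt_class(**kwargs)
    langwatch.dspy.init(experiment=name, optimizer=optimizer)  # must come before compile()
    results[name] = optimizer.compile(program, trainset=trainset)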
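
Since runs are told apart by experiment name, re-running a script with hard-coded names can collide with earlier runs. One possible approach, sketched below with the standard library only, is to derive each name from a base label plus a timestamp and a short random suffix. The helper unique_experiment_name and its naming scheme are hypothetical; LangWatch does not require this format.

```python
import uuid
from datetime import datetime, timezone


def unique_experiment_name(base, now=None):
    """Append a UTC timestamp and a short random suffix to a base experiment name."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    suffix = uuid.uuid4().hex[:6]  # guards against two runs starting in the same second
    return f"{base}-{stamp}-{suffix}"


# Usage (hypothetical): langwatch.dspy.init(
#     experiment=unique_experiment_name("mipro-medium"), optimizer=optimizer)
```

Keeping the base label at the front means related runs still sort together in a list.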

LangWatch vs Langtrace vs Phoenix vs Weave vs MLflow

Feature	LangWatch	Langtrace	Phoenix	Weave	MLflow
DSPy auto-tracing	Yes	Yes (built-in)	Yes (plugin)	No (manual)	Yes (`autolog`)
Optimizer progress	Yes (unique)	No	No	No	No
Live scores dashboard	Yes	No	No	No	No
Setup effort	2-3 lines	One line	Two lines + launch	Manual decorators	One line
Self-hosted	Yes (Docker, Helm)	Yes (Docker)	Yes	No (cloud only)	Yes
Cloud option	Yes (app.langwatch.ai)	Yes (app.langtrace.ai)	Yes (Arize)	Yes (wandb.ai)	Yes (Databricks)
Model registry	No	No	No	No	Yes
Built-in evals	Basic	Basic	Yes	Basic	Basic

Decision guide

What do you need?
|
+- Watch optimizer progress live? -> LangWatch (this skill)
+- Easiest auto-tracing setup? -> Langtrace (/dspy-langtrace)
+- Tracing + evals (local)? -> Phoenix (/dspy-phoenix)
+- Tracing + experiment tracking (cloud)? -> Weave (/dspy-weave)
+- Full ML lifecycle + model registry? -> MLflow (/dspy-mlflow)

Gotchas

Claude forgets to call autotrack_dspy() inside the traced function. The @langwatch.trace() decorator creates the trace context, but DSPy auto-tracking only activates when you call langwatch.get_current_trace().autotrack_dspy() inside the function body. Without it, you get an empty trace with no DSPy spans.
Claude puts autotrack_dspy() outside the @langwatch.trace() function. The autotrack_dspy() call must be inside the decorated function where a trace context exists. Calling it at module level or before the trace starts raises an error because there is no current trace.
Claude calls langwatch.dspy.init() after optimizer.compile(). The init() call must come before compile() — it patches the optimizer to stream progress. If called after, no progress data is captured. Always: create optimizer, call langwatch.dspy.init(experiment=..., optimizer=...), then call optimizer.compile().
Claude reuses the same experiment name across runs. Each langwatch.dspy.init(experiment=...) call should use a unique experiment name so that runs appear as separate entries in the dashboard. Reusing a name can overwrite or merge data from earlier runs, which makes side-by-side comparison unreliable.
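
The init-before-compile ordering described in these gotchas can be enforced structurally by routing both calls through one function. The sketch below is a minimal illustration. The compile_with_tracking wrapper is hypothetical, and it takes the init function as a parameter so it can be exercised without LangWatch installed; in real use you would pass langwatch.dspy.init, whose experiment and optimizer keywords are the ones shown earlier in this skill.

```python
def compile_with_tracking(optimizer, program, trainset, experiment, init):
    """Call init(...) first, then optimizer.compile(...), so the order cannot be swapped."""
    # Step 1: patch the optimizer so progress is streamed during compile().
    init(experiment=experiment, optimizer=optimizer)
    # Step 2: run the optimization and hand back the optimized program.
    return optimizer.compile(program, trainset=trainset)


# Usage (assumes `import langwatch.dspy` and an optimizer, program, and trainset in scope):
# optimized = compile_with_tracking(
#     optimizer, program, trainset, "mipro-medium-run1", init=langwatch.dspy.init)
```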

Additional resources

Cross-references

Install any skill: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

Langtrace (auto-instrumentation, easiest one-line setup) — /dspy-langtrace
Arize Phoenix (open-source with evals) — /dspy-phoenix
W&B Weave (team dashboards, experiment tracking) — /dspy-weave
MLflow (full ML lifecycle, model registry) — /dspy-mlflow
Lightweight experiment tracking (JSONL-based, no extra tools) — /ai-tracking-experiments
Production monitoring — /ai-monitoring
For worked examples, see examples.md
Install /ai-do if you do not have it — it routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do