name: mlflow-traces description: "Use when working with MLflow traces: debugging via MCP tools, analyzing performance, logging feedback, writing custom scorers/evaluations, or cleaning up trace data"
MLflow Trace Management — video-research-mcp
Overview
Query, tag, evaluate, and manage MLflow traces captured from video-research-mcp Gemini API calls. Uses mcp__mlflow-mcp__* MCP tools — no code writing needed for most operations.
Core principle: Search first, then act. Always verify before destructive operations.
Quick Reference
| Task | Tool | Key Params |
|---|---|---|
| Find traces | search_traces |
experiment_id, filter_string, extract_fields |
| Get details | get_trace |
trace_id, extract_fields |
| Tag trace | set_trace_tag |
trace_id, key, value |
| Log score | log_feedback |
trace_id, name, value, rationale |
| Run scorers | evaluate_traces |
experiment_id, trace_ids, scorers |
| List scorers | list_scorers |
— |
Canonical Field Paths
CRITICAL — only use fields that actually exist:
| Path | Content | Common mistake |
|---|---|---|
info.trace_id |
Trace identifier | — |
info.state |
Status: OK, ERROR | NOT info.status |
info.request_time |
Timestamp | NOT info.timestamp_ms |
info.execution_duration_ms |
Duration in ms | NOT info.execution_duration |
info.request_preview |
First ~100 chars of request | — |
info.response_preview |
First ~100 chars of response | — |
info.tags |
All tags as object | Use info.tags.* for all |
data.spans.*.name |
Span names | Must include data. prefix |
data.spans.*.status_code |
Span status | NOT data.spans.*.status |
data.spans.*.inputs |
Span inputs | Moderate size |
data.spans.*.outputs |
Span outputs | Moderate size |
extract_fields Discipline
Always use extract_fields. Video-research-mcp traces contain video URIs, cached content references, full Gemini prompts/responses. A single get_trace without extract_fields can flood your context window.
// BAD - pulls everything
get_trace({ trace_id: "tr-..." })
search_traces({ experiment_id: "2" })
// GOOD - selective fields
get_trace({ trace_id: "tr-...",
extract_fields: "info.*,data.spans.*.name,data.spans.*.status_code" })
search_traces({ experiment_id: "2", max_results: 10,
extract_fields: "info.trace_id,info.state,info.execution_duration_ms" })
Never request data.spans.*.attributes unqualified — it silently drops dotted keys and can contain massive payloads.
Filter String vs Extract Fields — DIFFERENT NAMING!
CRITICAL: filter_string and extract_fields use DIFFERENT field names:
| Data | filter_string syntax |
extract_fields syntax |
|---|---|---|
| Status | status = 'ERROR' |
info.state |
| Timestamp | timestamp_ms > 170000... |
info.request_time |
| Duration | execution_time_ms > 5000 |
info.execution_duration_ms |
| Tags | tags.reviewed = 'true' |
info.tags.* |
Common Workflows
Debug failed traces
search_traces({ experiment_id: "<id>", filter_string: "status='ERROR'", max_results: 20,
extract_fields: "info.trace_id,info.state,info.execution_duration_ms,info.request_preview" })
get_trace({ trace_id: "tr-abc123",
extract_fields: "info.*,data.spans.*.name,data.spans.*.status_code" })
set_trace_tag({ trace_id: "tr-abc123", key: "needs_investigation", value: "true" })
Find slow traces
search_traces({ experiment_id: "<id>", filter_string: "execution_time_ms > 5000",
max_results: 20, extract_fields: "info.trace_id,info.execution_duration_ms,data.spans.*.name" })
Log human feedback
log_feedback({ trace_id: "tr-abc123", name: "response_quality", value: 4.5,
source_type: "human", rationale: "Accurate analysis, good structure" })
Run built-in scorers
// List available scorers first
list_scorers()
// Run evaluation
evaluate_traces({ experiment_id: "<id>", trace_ids: "tr-abc,tr-def",
scorers: "Correctness,RelevanceToQuery" })
Search before delete
// Step 1: Preview
search_traces({ experiment_id: "<id>", filter_string: "timestamp < 1704067200000",
max_results: 10, extract_fields: "info.trace_id,info.request_time" })
// Step 2: Verify count and IDs, then delete
delete_traces({ experiment_id: "<id>", max_timestamp_millis: 1704067200000 })
Field Selection Recipes
"info.trace_id,info.state" // Minimal overview
"info.trace_id,info.execution_duration_ms,data.spans.*.name" // Performance
"info.*,data.spans.*.name,data.spans.*.status_code" // Full context (safe)
"info.trace_id,info.tags.*" // Tags only
"info.trace_id,info.assessments.*.feedback.value" // Feedback scores
video-research-mcp Context
| Setting | Value |
|---|---|
| Tracking server | http://127.0.0.1:5001 (default) |
| Experiment name | video-research-mcp |
| Env var | MLFLOW_TRACKING_URI |
| Autolog captures | All GeminiClient generate/generate_structured calls |
| Trace spans | Gemini API calls with model, thinking level, tokens, cost |
Traces are captured automatically when MLFLOW_TRACKING_URI is set. No code changes needed — mlflow.gemini.autolog() hooks into the google-genai SDK.
Troubleshooting
MCP tools not available / connection refused
The MLflow tracking server must be running:
MLFLOW_TRACKING_URI=http://127.0.0.1:5001 mlflow server --port 5001
Then restart Claude Code to reconnect.
No traces found
- Check
MLFLOW_TRACKING_URIis set in the server environment - Verify the experiment name: search with
max_results: 1across experiment IDs - Confirm traces are being captured: run a tool call, then search again
Wrong experiment
The default experiment is video-research-mcp. If traces land in Default (experiment 0), the MLFLOW_EXPERIMENT_NAME env var is not set.