mlflow-traces

Use when working with MLflow traces: debugging via MCP tools, analyzing performance, logging feedback, writing custom scorers/evaluations, or cleaning up trace data

By Galbaz1. Updated 2/28/2026

MLflow Trace Management — video-research-mcp

Overview

Query, tag, evaluate, and manage MLflow traces captured from video-research-mcp Gemini API calls. Uses mcp__mlflow-mcp__* MCP tools — no code writing needed for most operations.

Core principle: Search first, then act. Always verify before destructive operations.

Quick Reference

| Task | Tool | Key Params |
|---|---|---|
| Find traces | search_traces | experiment_id, filter_string, extract_fields |
| Get details | get_trace | trace_id, extract_fields |
| Tag trace | set_trace_tag | trace_id, key, value |
| Log score | log_feedback | trace_id, name, value, rationale |
| Run scorers | evaluate_traces | experiment_id, trace_ids, scorers |
| List scorers | list_scorers | |

Canonical Field Paths

CRITICAL — only use fields that actually exist:

| Path | Content | Common mistake |
|---|---|---|
| info.trace_id | Trace identifier | |
| info.state | Status: OK, ERROR | NOT info.status |
| info.request_time | Timestamp | NOT info.timestamp_ms |
| info.execution_duration_ms | Duration in ms | NOT info.execution_duration |
| info.request_preview | First ~100 chars of request | |
| info.response_preview | First ~100 chars of response | |
| info.tags | All tags as object | Use info.tags.* for all |
| data.spans.*.name | Span names | Must include data. prefix |
| data.spans.*.status_code | Span status | NOT data.spans.*.status |
| data.spans.*.inputs | Span inputs | Moderate size |
| data.spans.*.outputs | Span outputs | Moderate size |
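
The common mistakes in the table above can be caught before a tool call is sent. The following is a minimal sketch, not part of the MCP server: `COMMON_MISTAKES` and `check_extract_fields` are hypothetical names, and the mapping contains only the corrections listed in the table.

```python
# Hypothetical pre-flight check for an extract_fields string.
# The mapping mirrors the "Common mistake" column of the table above.
COMMON_MISTAKES = {
    "info.status": "info.state",
    "info.timestamp_ms": "info.request_time",
    "info.execution_duration": "info.execution_duration_ms",
    "data.spans.*.status": "data.spans.*.status_code",
    "spans.*.name": "data.spans.*.name",  # missing the data. prefix
}


def check_extract_fields(extract_fields: str) -> list[str]:
    """Return one warning per path that matches a known mistake."""
    warnings = []
    for path in extract_fields.split(","):
        path = path.strip()
        if path in COMMON_MISTAKES:
            warnings.append(f"{path} -> use {COMMON_MISTAKES[path]}")
    return warnings


if __name__ == "__main__":
    for warning in check_extract_fields("info.trace_id,info.status"):
        print(warning)
```

Running the check on a draft string before calling search_traces or get_trace is cheaper than discovering an empty column in the response.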

extract_fields Discipline

Always use extract_fields. Traces from video-research-mcp contain video URIs, cached content references, and full Gemini prompts and responses, so a single get_trace without extract_fields can flood your context window.

// BAD - pulls everything
get_trace({ trace_id: "tr-..." })
search_traces({ experiment_id: "2" })

// GOOD - selective fields
get_trace({ trace_id: "tr-...",
  extract_fields: "info.*,data.spans.*.name,data.spans.*.status_code" })
search_traces({ experiment_id: "2", max_results: 10,
  extract_fields: "info.trace_id,info.state,info.execution_duration_ms" })

Never request data.spans.*.attributes unqualified — it silently drops dotted keys and can contain massive payloads.
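
To make the effect of a field path concrete, here is a rough sketch of wildcard selection over a trace-shaped dictionary. It is an illustration only: the `select` helper and `SAMPLE_TRACE` are hypothetical, the trace shape is assumed, and the MCP tools' actual matching logic may differ.

```python
# Illustrative only: how a dotted path with "*" narrows a trace-shaped dict.
# SAMPLE_TRACE is made-up data in an assumed shape, not real tool output.
SAMPLE_TRACE = {
    "info": {"trace_id": "tr-1", "state": "OK"},
    "data": {
        "spans": [
            {"name": "generate", "status_code": "OK", "inputs": {"prompt": "x" * 1000}},
            {"name": "generate_structured", "status_code": "ERROR", "inputs": {}},
        ]
    },
}


def select(obj, path):
    """Yield values matching a dotted path; "*" matches every dict value
    or list element at that level. Unknown keys yield nothing."""
    head, _, rest = path.partition(".")
    if head == "*":
        if isinstance(obj, dict):
            children = list(obj.values())
        elif isinstance(obj, list):
            children = obj
        else:
            children = []
    elif isinstance(obj, dict) and head in obj:
        children = [obj[head]]
    else:
        children = []
    for child in children:
        if rest:
            yield from select(child, rest)
        else:
            yield child


if __name__ == "__main__":
    print(list(select(SAMPLE_TRACE, "data.spans.*.name")))
    print(list(select(SAMPLE_TRACE, "info.status")))  # wrong field name: empty
```

In this sketch a misspelled path returns nothing rather than raising, which is why a wrong field name can go unnoticed.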

Filter String vs Extract Fields — DIFFERENT NAMING!

CRITICAL: filter_string and extract_fields use DIFFERENT field names:

| Data | filter_string syntax | extract_fields syntax |
|---|---|---|
| Status | status = 'ERROR' | info.state |
| Timestamp | timestamp_ms > 170000... | info.request_time |
| Duration | execution_time_ms > 5000 | info.execution_duration_ms |
| Tags | tags.reviewed = 'true' | info.tags.* |
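
One way to keep the two naming schemes straight is to derive the extract_fields path from the filter key. The sketch below assumes only the four pairs in the table above; `FILTER_TO_EXTRACT`, `extract_path_for`, and `search_args` are hypothetical helper names, not part of any MLflow or MCP API.

```python
# Hypothetical helper: map a filter_string key to its extract_fields path.
# Only the pairs from the table above are covered.
FILTER_TO_EXTRACT = {
    "status": "info.state",
    "timestamp_ms": "info.request_time",
    "execution_time_ms": "info.execution_duration_ms",
}


def extract_path_for(filter_key: str) -> str:
    """Return the extract_fields path that shows the value a filter tests."""
    if filter_key.startswith("tags."):
        return "info.tags.*"
    return FILTER_TO_EXTRACT[filter_key]


def search_args(experiment_id: str, filter_key: str, condition: str) -> dict:
    """Build search_traces arguments that filter on a key and return
    the trace ID plus the matching field."""
    return {
        "experiment_id": experiment_id,
        "filter_string": f"{filter_key} {condition}",
        "extract_fields": f"info.trace_id,{extract_path_for(filter_key)}",
    }


if __name__ == "__main__":
    print(search_args("2", "execution_time_ms", "> 5000"))
```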

Common Workflows

Debug failed traces

search_traces({ experiment_id: "<id>", filter_string: "status='ERROR'", max_results: 20,
  extract_fields: "info.trace_id,info.state,info.execution_duration_ms,info.request_preview" })

get_trace({ trace_id: "tr-abc123",
  extract_fields: "info.*,data.spans.*.name,data.spans.*.status_code" })

set_trace_tag({ trace_id: "tr-abc123", key: "needs_investigation", value: "true" })

Find slow traces

search_traces({ experiment_id: "<id>", filter_string: "execution_time_ms > 5000",
  max_results: 20, extract_fields: "info.trace_id,info.execution_duration_ms,data.spans.*.name" })
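
Once the search returns, the results can be ranked offline. This is a small sketch that assumes each result is a dictionary with an `info` object holding `trace_id` and `execution_duration_ms`; the actual response shape of search_traces is not shown in this document, so treat the structure and the `slowest` helper as assumptions.

```python
# Assumed result shape: [{"info": {"trace_id": ..., "execution_duration_ms": ...}}, ...]
# The sample rows below are made up for illustration.
def slowest(traces, threshold_ms=5000, top=5):
    """Return (duration_ms, trace_id) pairs above the threshold, slowest first."""
    rows = [
        (t["info"]["execution_duration_ms"], t["info"]["trace_id"])
        for t in traces
        if t["info"]["execution_duration_ms"] > threshold_ms
    ]
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    sample = [
        {"info": {"trace_id": "tr-a", "execution_duration_ms": 7200}},
        {"info": {"trace_id": "tr-b", "execution_duration_ms": 1200}},
        {"info": {"trace_id": "tr-c", "execution_duration_ms": 15300}},
    ]
    for duration_ms, trace_id in slowest(sample):
        print(f"{trace_id}: {duration_ms} ms")
```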

Log human feedback

log_feedback({ trace_id: "tr-abc123", name: "response_quality", value: 4.5,
  source_type: "human", rationale: "Accurate analysis, good structure" })

Run built-in scorers

// List available scorers first
list_scorers()

// Run evaluation
evaluate_traces({ experiment_id: "<id>", trace_ids: "tr-abc,tr-def",
  scorers: "Correctness,RelevanceToQuery" })

Search before delete

// Step 1: Preview
search_traces({ experiment_id: "<id>", filter_string: "timestamp_ms < 1704067200000",
  max_results: 10, extract_fields: "info.trace_id,info.request_time" })

// Step 2: Verify count and IDs, then delete
delete_traces({ experiment_id: "<id>", max_timestamp_millis: 1704067200000 })
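
The cutoff in the example above is a Unix timestamp in milliseconds. A minimal sketch for computing it and reading it back before a destructive call follows; `cutoff_millis` and `describe_millis` are hypothetical helper names, and UTC is assumed.

```python
from datetime import datetime, timezone


def cutoff_millis(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date, as epoch milliseconds."""
    dt = datetime(year, month, day, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def describe_millis(ms: int) -> str:
    """Convert epoch milliseconds back to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    cutoff = cutoff_millis(2024, 1, 1)
    print(cutoff)                   # 1704067200000
    print(describe_millis(cutoff))  # 2024-01-01T00:00:00+00:00
```

Converting the value back to a readable date is a cheap sanity check that the same cutoff is used in the preview and in the delete.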

Field Selection Recipes

"info.trace_id,info.state"                                    // Minimal overview
"info.trace_id,info.execution_duration_ms,data.spans.*.name"  // Performance
"info.*,data.spans.*.name,data.spans.*.status_code"           // Full context (safe)
"info.trace_id,info.tags.*"                                   // Tags only
"info.trace_id,info.assessments.*.feedback.value"             // Feedback scores

video-research-mcp Context

| Setting | Value |
|---|---|
| Tracking server | http://127.0.0.1:5001 (default) |
| Experiment name | video-research-mcp |
| Env var | MLFLOW_TRACKING_URI |
| Autolog captures | All GeminiClient generate/generate_structured calls |
| Trace spans | Gemini API calls with model, thinking level, tokens, cost |

Traces are captured automatically when MLFLOW_TRACKING_URI is set. No code changes needed — mlflow.gemini.autolog() hooks into the google-genai SDK.

Troubleshooting

MCP tools not available / connection refused

The MLflow tracking server must be running:

MLFLOW_TRACKING_URI=http://127.0.0.1:5001 mlflow server --port 5001

Then restart Claude Code to reconnect.
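
To tell a stopped server apart from a misconfigured client, a plain TCP check against the tracking URI is usually enough. The sketch below uses only the standard library; `server_reachable` is a hypothetical helper, and it only confirms that something is listening on the port, not that it is MLflow.

```python
import socket
from urllib.parse import urlparse


def server_reachable(tracking_uri: str = "http://127.0.0.1:5001", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(tracking_uri)
    host = parsed.hostname or "127.0.0.1"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("reachable" if server_reachable() else "not reachable")
```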

No traces found

  1. Check MLFLOW_TRACKING_URI is set in the server environment
  2. Verify the experiment name: search with max_results: 1 across experiment IDs
  3. Confirm traces are being captured: run a tool call, then search again

Wrong experiment

The default experiment is video-research-mcp. If traces land in Default (experiment 0), the MLFLOW_EXPERIMENT_NAME env var is not set.
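
Both environment checks from this troubleshooting section can be scripted. The following is a sketch under the assumptions stated above (tracing is enabled by MLFLOW_TRACKING_URI, and the experiment is chosen by MLFLOW_EXPERIMENT_NAME); `diagnose` is a hypothetical helper and should be run in the same environment as the MCP server process.

```python
import os


def diagnose(env=None) -> list[str]:
    """Return a list of configuration problems; an empty list means both
    variables are set."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("MLFLOW_TRACKING_URI"):
        problems.append("MLFLOW_TRACKING_URI is not set: no traces are captured")
    if not env.get("MLFLOW_EXPERIMENT_NAME"):
        problems.append(
            "MLFLOW_EXPERIMENT_NAME is not set: traces land in Default (experiment 0)"
        )
    return problems


if __name__ == "__main__":
    for problem in diagnose() or ["environment looks OK"]:
        print(problem)
```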

Resources

Install via CLI
npx skills add https://github.com/Galbaz1/video-research-mcp --skill mlflow-traces