mlflow - SKILL.md Agent Skill

name: mlflow description: MLflow for ML lifecycle management - experiment tracking, LLM/GenAI tracing, model registry, and deployment with GenAI and MCP support triggers: - mlflow - experiment tracking - model registry - llm tracking - genai tracing - mlflow autolog - mlflow tracking - log metrics - track experiment allowed_tools: - Read - Write - Edit - Bash - mcpmlflowsearch_traces - mcpmlflowget_trace - mcpmlflowdelete_traces - mcpmlflowset_trace_tag - mcpmlflowdelete_trace_tag - mcpmlflowlog_feedback - mcpmlflowlog_expectation - mcpmlflowget_assessment - mcpmlflowupdate_assessment - mcpmlflowdelete_assessment - mcpmlflowevaluate_traces - mcpmlflowlist_scorers - mcpmlflowregister_llm_judge - mcp__plugin_context7_context7resolve-library-id - mcpplugin_context7_context7__query-docs

MLflow Development Skill

Expert guidance for ML lifecycle management with MLflow, including GenAI/LLM tracking and MCP integration.

Core Concepts

MLflow is an open-source platform for managing the ML lifecycle with four main components:

Tracking: Log parameters, metrics, and artifacts
Models: Package and deploy models
Registry: Centralize model storage and versioning
Projects: Package reproducible runs

Quick Start

Configuration

import mlflow

# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")

# Set experiment
mlflow.set_experiment("my-experiment")

Autologging

MLflow provides automatic logging for major frameworks:

ML Frameworks

import mlflow

# Scikit-learn
mlflow.sklearn.autolog()

# PyTorch
mlflow.pytorch.autolog()

# TensorFlow/Keras
mlflow.tensorflow.autolog()

GenAI/LLM Providers

import mlflow

# OpenAI
mlflow.openai.autolog()

# Anthropic
mlflow.anthropic.autolog()

# LangChain
mlflow.langchain.autolog()

What gets logged automatically:

Model parameters and hyperparameters
Training metrics
Model artifacts and dependencies
Tokens, latency, and cost (for LLMs)
Tool calls and function invocations

Manual Tracking

import mlflow

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_params({"batch_size": 32, "epochs": 100})

    # Log metrics
    mlflow.log_metric("train_loss", 0.5)

    # Log metrics with steps
    for epoch in range(num_epochs):
        train_loss = train_model()
        mlflow.log_metric("train_loss", train_loss, step=epoch)

    # Log model
    mlflow.sklearn.log_model(model, name="model")

GenAI Tracing

Basic Tracing

import mlflow

@mlflow.trace
def my_llm_app(question: str) -> str:
    """Traced LLM application"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

# Trace is automatically logged
result = my_llm_app("What is MLflow?")

# Access trace
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=trace_id)

Production Context

# Add production context to traces
mlflow.update_current_trace(
    tags={
        "mlflow.trace.session": session_id,
        "mlflow.trace.user": user_id,
        "environment": "production"
    }
)

Model Registry

Register and Deploy Models

import mlflow
from mlflow import MlflowClient

client = MlflowClient()

# Register during training
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        name="model",
        registered_model_name="MyModel"
    )

# Set alias for deployment
client.set_registered_model_alias(
    name="MyModel",
    alias="champion",
    version=1
)

# Load model by alias
model = mlflow.pyfunc.load_model("models:/MyModel@champion")

Model Versioning

# Load specific version
model = mlflow.pyfunc.load_model("models:/MyModel/2")

# Load by stage
model = mlflow.pyfunc.load_model("models:/MyModel/Production")

# Transition model stage
client.transition_model_version_stage(
    name="MyModel",
    version=2,
    stage="Production"
)

MCP Integration

MLflow has MCP support for trace operations:

Installation

# Install as UV tool (recommended)
uv tool install "mlflow[genai,mcp]>=2.19.0"

# Or add to project
uv add "mlflow[genai,mcp]>=2.19.0"

Starting MLflow MCP Server

# Run directly (after uv tool install)
mlflow mcp run

# With custom tracking URI
MLFLOW_TRACKING_URI=sqlite:///mlruns.db mlflow mcp run

Claude Code Configuration

.mcp.json (project configuration):

{
  "mcpServers": {
    "mlflow": {
      "command": "mlflow",
      "args": ["mcp", "run"],
      "env": {
        "MLFLOW_TRACKING_URI": "sqlite:///mlruns.db"
      }
    }
  }
}

Environment Variables

Variable	Required	Description
`MLFLOW_TRACKING_URI`	Yes	MLflow tracking server URL or sqlite path
`MLFLOW_EXPERIMENT_ID`	No	Default experiment ID
`DATABRICKS_HOST`	For Databricks	Workspace URL
`DATABRICKS_TOKEN`	For Databricks	Personal access token

MCP Tools Available

The MLflow MCP server exposes these tools:

Tool	Purpose
`search_traces`	Search traces with filters (experiment_id, tags, timestamps)
`get_trace`	Get detailed trace info including spans, inputs, outputs
`log_feedback`	Log feedback scores (accuracy, quality, custom)
`log_expectation`	Log expected values for trace evaluation
`evaluate_traces`	Run automated evaluation with scorers
`list_scorers`	List available evaluation scorers
`register_llm_judge`	Create custom LLM-based scorer
`set_trace_tag` / `delete_trace_tag`	Manage trace metadata
`delete_traces`	Clean up traces by criteria

Workflow Example:

1. search_traces → Find traces to evaluate
2. evaluate_traces → Run built-in scorers (Correctness, Safety, etc.)
3. log_feedback → Add human feedback
4. get_trace → Inspect detailed results

Best Practices

Organize Experiments: Use hierarchical naming and tags
Version Everything: Use Git versioning for GenAI apps
Add Production Context: Always include session, user, and environment info
Monitor Costs: Track token usage and estimated costs
Regular Evaluation: Run evaluations with multiple scorers

Cross-Skill Integration

Marimo Dashboard for Experiments

Build interactive experiment dashboards:

import marimo as mo
import mlflow

# Experiment selector
experiments = mlflow.search_experiments()
exp_select = mo.ui.dropdown(
    options={e.name: e.experiment_id for e in experiments},
    label="Select Experiment"
)

# Display runs with filtering
runs_df = mlflow.search_runs(experiment_ids=[exp_select.value])
mo.ui.table(runs_df, selection="single", label="Experiment Runs")

PINA Training Tracking

Track physics-informed neural network training:

import mlflow
from pina import Trainer
from pina.callbacks import MetricTracker

mlflow.set_experiment("pina-experiments")

with mlflow.start_run():
    mlflow.log_params({"layers": [64, 64], "activation": "Tanh"})

    trainer = Trainer(solver, max_epochs=1000, callbacks=[MetricTracker()])
    trainer.train()

    # Log PINA metrics
    for key, value in trainer.callback_metrics.items():
        mlflow.log_metric(key, value)

    mlflow.pytorch.log_model(solver.model, "pinn")

Using context7 for Documentation

Query up-to-date MLflow documentation directly:

# context7 Library IDs (no resolve needed):
# - /mlflow/mlflow (official docs, 9559 snippets)
# - /websites/mlflow (website docs, 36205 snippets)

# Example: query-docs("/mlflow/mlflow", "mlflow.trace decorator usage")

When to Use This Skill

✅ Use MLflow when:

Tracking ML experiments and hyperparameters
Building and deploying LLM/GenAI applications
Managing model lifecycle and versioning
Comparing model performance across iterations
Need production observability for GenAI apps
Collaborating on ML projects

❌ Don't use MLflow when:

Simple one-off scripts without iteration
No need for experiment tracking or model versioning
Using platform-specific tools (e.g., SageMaker Experiments)

Reference Documentation

For detailed guides, see the references folder:

GenAI Tracking: Complete LLM tracking guide with all providers, Git versioning, token tracking, and trace querying
Framework Integrations: LangChain, LlamaIndex, DSPy, CrewAI, and ML framework integration examples
Evaluation: GenAI evaluation, built-in scorers, custom scorers, and model comparison
Production Deployment: FastAPI integration, async logging, Kubernetes deployment, and monitoring
Model Registry: Complete model registry guide with versioning, aliases, stages, and deployment patterns

Example Templates

Ready-to-use templates in the examples folder:

langchain_tracking.py: LangChain with MLflow autologging - chains, agents, and production patterns
fastapi_tracing.py: FastAPI production tracing with streaming, batch processing, and health checks
hyperparameter_tuning.py: Grid search, random search, Optuna integration, and nested runs

Resources

Documentation: https://mlflow.org/docs/latest/
GitHub: https://github.com/mlflow/mlflow
GenAI Guide: https://mlflow.org/docs/latest/genai/
Tracking API: https://mlflow.org/docs/latest/tracking.html
Model Registry: https://mlflow.org/docs/latest/model-registry.html