name: mlflow description: MLflow 3.13 (tracking + AI Gateway) on this stack. Load when logging experiments from Spark/Python jobs, registering models, or wiring AI Gateway routes to Anthropic / Ollama for LLM-in-pipeline patterns.
MLflow 3.13
Tracking server: http://localhost:5000. AI Gateway: http://localhost:5001. Both in docker-compose-mlflow.yml. Backend: PostgreSQL (mlflow database). Artifact store: SeaweedFS (s3://mlflow/). Image base is ghcr.io/mlflow/mlflow:v3.13.0-full (the -full variant bundles psycopg2-binary + boto3, so Postgres and S3 work without a custom pip layer).
Tracking client
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my-experiment")
with mlflow.start_run() as run:
mlflow.log_param("lr", 0.01)
mlflow.log_metric("accuracy", 0.94)
mlflow.log_artifact("plot.png")
print(run.info.run_id)
Set MLFLOW_TRACKING_URI=http://localhost:5000 in .env and clients pick it up without explicit set_tracking_uri.
Spark autologging
import mlflow.spark
mlflow.spark.autolog() # logs Spark ML pipelines, datasets, params
# or specifically for sklearn-style estimators
import mlflow.sklearn
mlflow.sklearn.autolog()
Autolog requires the run to be active (with mlflow.start_run():).
Model registry
mlflow.register_model(
model_uri=f"runs:/{run_id}/model",
name="orders-classifier",
)
Stage transitions (Staging → Production) are managed via the UI or mlflow.MlflowClient().transition_model_version_stage(...).
AI Gateway (port 5001)
The Gateway is an LLM proxy with route-based config in config/mlflow/gateway-config.yml. Pre-configured routes (verify the file in this repo, but typically):
chat-anthropic— Anthropic Claude (requiresANTHROPIC_API_KEYin.env)chat-ollama— local Ollama models (requires Ollama running on the host)
Call from Python:
from mlflow.deployments import get_deploy_client
client = get_deploy_client("http://localhost:5001")
response = client.predict(
endpoint="chat-anthropic",
inputs={"messages": [{"role": "user", "content": "Summarize this Iceberg snapshot."}]},
)
Use cases on this stack:
- LLM-as-evaluator inside a Spark UDF
- Generating documentation strings for new tables in a Iceberg maintenance DAG
- Routing demo queries to a local Ollama model to avoid API costs
Common pitfalls
Connection refused: localhost:5000— MLflow container isn't up../lakehouse start mlflow.Artifact upload failed: S3 access denied— Mlflow's S3 creds (indocker-compose-mlflow.ymlenv) don't match the SeaweedFS keys in.env. They must match.- Autolog logs to the wrong tracking server —
MLFLOW_TRACKING_URIenv var beatsset_tracking_uri()call order. Set the env var explicitly. - Spark autologging duplicate runs —
mlflow.spark.autolog()+ manualstart_run()can create nested runs. Choose one.
Performance / cleanup
MLflow artifacts in SeaweedFS accumulate quickly. Cleanup:
# Delete runs older than 30 days via MLflow API
mlflow gc --backend-store-uri postgresql://... --older-than 30
For demos, just docker compose -f docker-compose-mlflow.yml down -v to wipe everything.