add-tracing


Add OpenTelemetry tracing spans to Clojure code following Metabase tracing conventions. Use when instrumenting backend code with trace coverage.

By metabase. Updated 3/26/2026.


Add Tracing Spans to Clojure Code

This skill helps you add OpenTelemetry (OTel) tracing spans to the Metabase backend codebase using the custom tracing/with-span macro.

Reference Files

  • src/metabase/tracing/core.clj - with-span macro, group registry, SDK lifecycle, best-effort-sanitize-sql, Pyroscope integration
  • src/metabase/task/impl.clj - defjob macro that wraps Quartz jobs with root spans
  • .clj-kondo/config/modules/config.edn - Module boundary configuration

Module Architecture

The tracing module has a deliberately minimal API surface. Only two namespaces are public (those listed under :api in the module config):

| Namespace | Role | Status |
|---|---|---|
| tracing.core | Primary API: with-span, groups, SDK lifecycle, Pyroscope, MDC, best-effort-sanitize-sql | Public API |
| tracing.init | Side-effect loader for quartz and settings | Public API (init convention) |
| tracing.attributes | best-effort-sanitize-sql implementation (re-exported via tracing.core) | Internal |
| tracing.settings | Setting definitions (MB_TRACING_* env vars) | Internal |
| tracing.quartz | Quartz JDBC proxy + JobListener | Internal |

Rules:

  • Only require [metabase.tracing.core :as tracing] from outside the module. tracing/best-effort-sanitize-sql and all other public functions are available from this single namespace.
  • Do not add new API namespaces. Add new public functions to tracing.core instead.
  • Do not require internal namespaces (tracing.attributes, tracing.settings, tracing.quartz) from outside the module.
  • :uses :any on the core module does NOT bypass the target module's :api check — internal namespaces are still enforced.

Cyclic Dependency Avoidance

tracing/core.clj is required by many modules across the codebase. It must not require tracing.settings at compile time, because that creates transitive cyclic load dependencies (e.g., settings/core -> tracing/settings -> tracing/core -> events/impl -> events/core).

Instead, tracing/core.clj uses requiring-resolve for settings access:

;; CORRECT — lazy runtime resolution, no compile-time dependency
((requiring-resolve 'metabase.tracing.settings/tracing-enabled))

;; WRONG — creates cyclic load dependency
(require '[metabase.tracing.settings :as settings])
(settings/tracing-enabled)

External library namespaces (clj-otel API, SDK, exporters) are safe to require normally — they don't participate in Metabase namespace cycles.

Important: requiring-resolve must use literal quoted symbols. Kondo hooks validate that required-namespaces are all simple symbols, so dynamic construction fails:

;; CORRECT — literal quoted symbol
(requiring-resolve 'metabase.tracing.settings/tracing-endpoint)

;; WRONG — kondo hook rejects this: "Assert failed: (every? simple-symbol? required-namespaces)"
(requiring-resolve (symbol "metabase.tracing.settings" "tracing-endpoint"))
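Putting both rules together, a lazily resolved settings check might look like the sketch below. The helper name tracing-enabled? and the fallback to false when the settings namespace cannot be loaded (for example in a bare REPL) are assumptions for illustration, not behavior taken from tracing/core.clj.

```clojure
(ns example.lazy-settings)

(defn tracing-enabled?
  "Hypothetical helper: resolves the setting at call time, not at namespace load time.
  Returns false if the settings namespace is not on the classpath (an assumption of this sketch)."
  []
  (try
    ;; Literal quoted symbol, so the kondo hook can validate the namespace.
    (if-let [setting (requiring-resolve 'metabase.tracing.settings/tracing-enabled)]
      (boolean (setting))
      ;; Namespace loaded but var missing: treat tracing as off.
      false)
    (catch java.io.FileNotFoundException _
      ;; Namespace could not be located: treat tracing as off.
      false)))
```

Because the symbol is only resolved when the function runs, loading this namespace never loads metabase.tracing.settings, which is what keeps it out of the load-order cycle.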

Quick Checklist

When adding tracing spans:

  • Module has tracing in its :uses set in .clj-kondo/config/modules/config.edn
  • Added [metabase.tracing.core :as tracing] to ns requires (alphabetically sorted)
  • Span wraps a meaningful I/O boundary (not pure computation)
  • Group matches the domain (check src/metabase/tracing/core.clj for registered groups; add a new one if none fit)
  • Span name follows dot-notation convention ("domain.subsystem.operation")
  • Attributes use namespaced keywords (:search/query-length, :db/id)
  • No sensitive data in attributes (use best-effort-sanitize-sql for HoneySQL, never raw SQL)
  • No new tracing namespaces created (add to tracing.core instead)
  • No DO_NOT_ADD_NEW_FILES_HERE.txt violations in the target directory
  • Run clj-kondo --lint <files> to verify 0 errors, 0 warnings
  • Add or update tests in the corresponding test/ path (see Testing section below)
  • Run tests: clojure -X:dev:test :only <test-ns>

The with-span Macro

(tracing/with-span group span-name attrs & body)
  • group - A keyword selecting which trace group this span belongs to (e.g., :tasks, :sync)
  • span-name - A string identifying the span in traces (e.g., "search.execute")
  • attrs - A map of span attributes (e.g., {:db/id 42})
  • body - The code to execute inside the span

When the span's group is disabled, the overhead is effectively zero: a single atom deref and a boolean check, after which body runs directly. When the group is enabled, with-span creates an OTel span and injects trace_id/span_id into the Log4j2 MDC for log-to-trace correlation.
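To make the enabled/disabled split concrete, here is a self-contained toy with the same argument shape as with-span. Everything in it (enabled-groups, recorded-spans, run-in-span, with-toy-span) is hypothetical: it records span maps in an atom instead of creating OTel spans or touching the MDC, and it is not the implementation in tracing/core.clj.

```clojure
(ns example.toy-span)

;; Hypothetical stand-ins for the real group registry and span exporter.
(defonce enabled-groups (atom #{}))
(defonce recorded-spans (atom []))

(defn run-in-span
  "Records the span, then runs the body thunk. The real macro would open an
  OTel span and set trace_id/span_id in the MDC at this point instead."
  [group span-name attrs thunk]
  (swap! recorded-spans conj {:group group :name span-name :attrs attrs})
  (thunk))

(defmacro with-toy-span
  "Toy with the with-span argument shape. A disabled group costs one deref and one set lookup."
  [group span-name attrs & body]
  `(let [g# ~group]
     (if (contains? @enabled-groups g#)
       (run-in-span g# ~span-name ~attrs (fn [] ~@body))
       (do ~@body))))
```

In this toy the attrs map is evaluated only in the enabled branch; check the real macro before relying on that property in your own code.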

Trace Groups

Groups are registered in src/metabase/tracing/core.clj. Check that file for the current list. The general rule: match the group to the domain, not the call site. If code runs inside a Quartz job but is logically search work, use :search, not :tasks.

To add a new group:

;; In src/metabase/tracing/core.clj
(register-group! :my-domain "Description of what this covers")

Users enable groups via MB_TRACING_GROUPS=tasks,search,sync (comma-separated, or "all").
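A sketch of how a group registry and an MB_TRACING_GROUPS value might fit together. The registry atom, the group descriptions, and the parsing rules (case-insensitive, blank or unset means "all", unknown names ignored) are assumptions for illustration; the real registry and parsing live in src/metabase/tracing/core.clj.

```clojure
(ns example.groups
  (:require [clojure.string :as str]))

;; Hypothetical registry: group keyword -> description.
(defonce registry (atom {}))

(defn register-group! [k description]
  (swap! registry assoc k description)
  k)

(defn parse-groups
  "Turns an MB_TRACING_GROUPS-style string into a set of registered group keywords."
  [s]
  (let [known  (set (keys @registry))
        tokens (->> (str/split (or s "") #",")
                    (map (comp str/lower-case str/trim))
                    (remove str/blank?))]
    (if (or (empty? tokens) (some #{"all"} tokens))
      known
      ;; A set used as a predicate keeps only registered groups.
      (into #{} (comp (map keyword) (filter known)) tokens))))

;; Hypothetical registrations for the example.
(register-group! :tasks "Quartz jobs")
(register-group! :search "Search indexing and queries")
(register-group! :sync "Database sync")
```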

Naming Conventions

Span Names

Use dot-separated hierarchical names: "domain.subsystem.operation". The domain prefix should match the group name:

search.execute              -- `:search` group
sync.fingerprint.table      -- `:sync` group
task.session-cleanup.delete -- `:tasks` group
db-app.collection-items     -- `:db-app` group
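The naming rule can be checked mechanically, for example in a test. Both helpers below are hypothetical. The prefix check is deliberately loose (the first segment only has to be a prefix of the group name) so that "task.*" names pass for the :tasks group while "search.execute" under :tasks is flagged.

```clojure
(ns example.span-names
  (:require [clojure.string :as str]))

;; Two or more dot-separated segments of letters, digits, or hyphens.
(def ^:private span-name-re #"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")

(defn valid-span-name?
  "Hypothetical lint helper for the \"domain.subsystem.operation\" convention."
  [s]
  (boolean (and (string? s) (re-matches span-name-re s))))

(defn domain-matches-group?
  "Loose check: the first segment of the span name is a prefix of the group name."
  [group span-name]
  (let [domain (first (str/split span-name #"\."))]
    (str/starts-with? (name group) domain)))
```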

Attributes

Use namespaced keywords. The namespace groups related attributes:

:db/id              -- Database ID (integer)
:db/engine          -- Database engine name (string)
:db/statement       -- Sanitized SQL (string, via best-effort-sanitize-sql)
:search/engine      -- Search engine name (string)
:search/query-length -- Query string length (integer)
:sync/table         -- Table name (string)
:sync/step          -- Sync step name (string)
:task/name          -- Task name (string)
:http/method        -- HTTP method (string)
:http/url           -- Request URL (string)

Invent new namespaced attributes as needed (e.g., :pulse/id, :transform/count). Keep values as primitives (strings, numbers, booleans) -- no maps or collections.
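A hypothetical test helper can flag attribute maps that break these conventions: keys must be namespaced keywords, and values must be strings, numbers, or booleans. The name attr-problems is illustrative only.

```clojure
(ns example.span-attrs)

(defn- primitive? [v]
  (or (string? v) (number? v) (boolean? v)))

(defn attr-problems
  "Returns the set of keys whose key is not a namespaced keyword or whose
  value is not a string, number, or boolean. Empty set means the map is fine."
  [attrs]
  (into #{}
        (keep (fn [[k v]]
                (when-not (and (qualified-keyword? k) (primitive? v))
                  k)))
        attrs))
```

Note that nil values are flagged too, which catches attributes computed from missing data.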

Step-by-Step: Adding a Span

1. Check module boundaries

Look up the module for your namespace in .clj-kondo/config/modules/config.edn. If tracing is not in the module's :uses set, add it (keep alphabetically sorted):

my-module
{:team "MyTeam"
 :uses #{analytics config tracing util}}

2. Add the require

(ns metabase.my-module.thing
  (:require
   [metabase.tracing.core :as tracing]
   [metabase.util :as u]))

best-effort-sanitize-sql is available from tracing.core — no additional require needed.

3. Identify the I/O boundary

Only wrap code at meaningful I/O boundaries:

DO trace:

  • External API calls (embedding APIs, metabot, webhooks)
  • Database queries (both app DB and user DB)
  • Network requests (HTTP calls to external services)
  • Heavy batch processing (batch indexing, batch embedding)
  • Top-level orchestration functions that coordinate multiple sub-operations

DO NOT trace:

  • Pure computation (sorting, filtering, mapping)
  • Simple single-row lookups (t2/select-one :model/Setting :key k)
  • Every function in a call chain (only boundaries matter)
  • Trivial operations (string formatting, hash calculations)

4. Wrap with with-span

;; Simple span (no attributes needed)
(tracing/with-span :search "search.init-index" {}
  (do-expensive-thing))

;; Span with static attributes
(tracing/with-span :sync "sync.fingerprint.table"
                   {:db/id (:db_id table)
                    :sync/table (:name table)}
  (fingerprint-fields! table fields))

;; Span with computed attributes
(tracing/with-span :search "search.execute"
                   {:search/engine       (name (:search-engine ctx))
                    :search/query-length (count (:search-string ctx))}
  (search.engine/results ctx))

;; Span with sanitized SQL (for dynamic HoneySQL queries)
(let [hsql {:delete-from [(t2/table-name :model/Session)]
            :where [:< :created_at oldest-allowed]}]
  (tracing/with-span :tasks "task.session-cleanup.delete"
                     {:db/statement (tracing/best-effort-sanitize-sql hsql)}
    (t2/query-one hsql)))

;; Sub-spans breaking a function into I/O phases
(let [embedding (tracing/with-span :search "search.semantic.embedding"
                                   {:search.semantic/provider (:provider model)}
                  (get-embedding model search-string))
      results   (tracing/with-span :search "search.semantic.db-query" {}
                  (into [] xform reducible))]
  (process results))

;; Per-item iteration spans
(doseq [e (search.engine/active-engines)]
  (tracing/with-span :search "search.ingestion.update" {:search/engine (name e)}
    (search.engine/update! e batch)))

5. Add tests

Create or update tests in the corresponding test/ path. Follow the patterns in existing tracing tests:

  • Reference tests: test/metabase/tracing/quartz_test.clj, test/metabase/server/middleware/trace_test.clj
  • Use tracing/init-enabled-groups! / tracing/shutdown-groups! with try/finally to manage group lifecycle
  • Test both enabled and disabled paths (verify zero overhead when group is off)
  • Use reify mocks for Java interfaces (Connection, PreparedStatement, JobListener, etc.)
  • Add (set! *warn-on-reflection* true) and type-hint proxy/reify calls to avoid reflection warnings
(deftest my-span-enabled-test
  (testing "when group is enabled, span is created"
    (try
      (tracing/init-enabled-groups! "my-group" "INFO")
      ;; ... test that span behavior occurs ...
      (finally
        (tracing/shutdown-groups!)))))

(deftest my-span-disabled-test
  (testing "when group is disabled, code runs without tracing"
    (tracing/shutdown-groups!)
    ;; ... test that code still works, no wrapping applied ...
    ))
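The reify and type-hint bullets can be illustrated without any Metabase code. The sketch below mocks java.sql.Connection with only the two methods the test calls; mock-connection and the test name are hypothetical. Invoking an abstract Connection method that the reify does not implement throws AbstractMethodError, which tends to surface missing mock coverage quickly.

```clojure
(ns example.reify-mock-test
  (:require [clojure.test :refer [deftest is testing]])
  (:import (java.sql Connection)))

(set! *warn-on-reflection* true)

(defn mock-connection
  "Hypothetical helper: a Connection that only answers isClosed and close."
  ^Connection [closed-atom]
  (reify Connection
    (isClosed [_] @closed-atom)
    (close [_] (reset! closed-atom true))))

(deftest mock-connection-test
  (testing "reify mock tracks close without reflection"
    (let [closed?         (atom false)
          ;; The ^Connection hint lets the compiler resolve .isClosed/.close statically.
          ^Connection conn (mock-connection closed?)]
      (is (false? (.isClosed conn)))
      (.close conn)
      (is (true? (.isClosed conn))))))
```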

6. Lint and run tests

# Lint modified source and test files — expect 0 errors, 0 warnings
clj-kondo --lint path/to/modified/file.clj path/to/test/file.clj

# Run tests (requires Java 21+)
clojure -X:dev:test :only my-ns.test-ns

Expect: all tests pass, 0 failures, 0 errors, no reflection warnings from your files.

Sanitizing SQL for Attributes

When including SQL in span attributes, always use tracing/best-effort-sanitize-sql. This converts HoneySQL maps to parameterized SQL strings where values become ? placeholders -- no data leaks.

(let [hsql {:delete-from [:core_session]
            :where [:< :created_at some-timestamp]}]
  (tracing/with-span :tasks "task.cleanup.delete"
                     {:db/statement (tracing/best-effort-sanitize-sql hsql)}
    (t2/query-one hsql)))
;; Trace attribute: db/statement = "DELETE FROM core_session WHERE created_at < ?"

Rules:

  • Never put raw SQL strings or user-provided values in attributes
  • Use best-effort-sanitize-sql only for app DB (HoneySQL) queries
  • For external/user DB queries, trace only timing and counts, not SQL content
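To see why only placeholders end up in db/statement, here is a deliberately tiny, hypothetical renderer for the single HoneySQL shape used above. It is not best-effort-sanitize-sql, which handles general HoneySQL; it only demonstrates the invariant worth asserting in tests: the rendered statement contains table and column names but never the value.

```clojure
(ns example.toy-sanitize
  (:require [clojure.string :as str]))

(defn- sql-ident [k]
  ;; Keyword -> snake_case identifier.
  (str/replace (name k) "-" "_"))

(defn toy-sanitize
  "Hypothetical illustration only. Handles just {:delete-from [table] :where [op column value]};
  the value is never rendered, only a ? placeholder."
  [{:keys [delete-from where]}]
  (let [[op column _value] where]
    (str "DELETE FROM " (sql-ident (first delete-from))
         " WHERE " (sql-ident column) " " (name op) " ?")))
```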

Defjob and Root Spans

The defjob macro in metabase.task.impl automatically wraps every Quartz job with a :tasks root span:

(task/defjob ^{org.quartz.DisallowConcurrentExecution true} SessionCleanup [_]
  (cleanup-sessions!))
;; Automatically creates span: "task.SessionCleanup" {:task/name "SessionCleanup"}

You do NOT need a root span inside defjob bodies. Add child spans for I/O inside the job.

For code on plain Threads (not Quartz), add the root span manually:

(defn init! []
  (tracing/with-span :search "search.task.init" {}
    (search/init-index!)))
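How a macro like defjob can guarantee a root span is easiest to see with a stand-in. defjob-sketch below is hypothetical: it defines a plain function rather than a Quartz job class and records spans in an atom, but it uses the same naming described above ("task.<JobName>" with :task/name).

```clojure
(ns example.toy-defjob)

;; Hypothetical span recorder standing in for tracing/with-span.
(defonce spans (atom []))

(defn record-span [span-name attrs thunk]
  (swap! spans conj [span-name attrs])
  (thunk))

(defmacro defjob-sketch
  "Defines a function whose body always runs inside a \"task.<JobName>\" root span."
  [job-name args & body]
  (let [n (name job-name)]
    `(defn ~job-name ~args
       (record-span ~(str "task." n) {:task/name ~n} (fn [] ~@body)))))

(defjob-sketch SessionCleanup [_ctx]
  ;; Child spans for I/O would go here; no extra root span is needed.
  :cleaned)
```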

What NOT to Do

Span Usage Mistakes

;; WRONG - pure computation, no I/O
(tracing/with-span :search "search.format-results" {}
  (map format-result results))

;; WRONG - trivial single-row lookup
(tracing/with-span :db-app "db-app.get-setting" {}
  (t2/select-one :model/Setting :key "my-setting"))

;; WRONG - raw SQL in attributes (data leak)
(tracing/with-span :tasks "task.cleanup" {:db/statement raw-sql-string}
  (execute! raw-sql-string))

;; WRONG - wrong group (search work should use :search, not :tasks)
(tracing/with-span :tasks "search.execute" {} ...)

;; WRONG - redundant nesting (do-search already has a span)
(tracing/with-span :search "search.process" {}
  (let [results (do-search ctx)]
    (tracing/with-span :search "search.format" {}
      (format-results results))))

Architecture Mistakes

;; WRONG - creating a new tracing namespace
(ns metabase.tracing.my-feature ...)

;; WRONG - requiring internal tracing namespaces from outside the module
(ns metabase.my-module.thing
  (:require [metabase.tracing.attributes :as trace-attrs]      ;; internal!
            [metabase.tracing.settings :as tracing.settings]   ;; internal!
            [metabase.tracing.quartz :as tracing.quartz]))     ;; internal!

;; WRONG - adding compile-time requires to tracing/core.clj for settings or SDK
;; This creates cyclic load dependencies because tracing/core is widely required
(ns metabase.tracing.core
  (:require [metabase.tracing.settings :as settings]))  ;; causes cycle!

;; WRONG - dynamic symbol construction with requiring-resolve (kondo rejects it)
(requiring-resolve (symbol "metabase.tracing.settings" "tracing-enabled"))

Configuration

All settings are env-var-only (defined in src/metabase/tracing/settings.clj):

# Core
MB_TRACING_ENABLED=true              # Enable tracing (default: false)
MB_TRACING_ENDPOINT=host:4317        # OTLP collector endpoint (default: http://localhost:4317)
MB_TRACING_GROUPS=tasks,search,sync  # Comma-separated groups or "all" (default: all)
MB_TRACING_SERVICE_NAME=metabase     # Service name in traces (default: hostname)
MB_TRACING_LOG_LEVEL=DEBUG           # Log threshold for traced threads: TRACE/DEBUG/INFO (default: INFO)

# Batch span processor tuning
MB_TRACING_MAX_QUEUE_SIZE=2048       # Max spans queued for export; drops when full (default: 2048)
MB_TRACING_EXPORT_TIMEOUT_MS=10000   # Max wait for batch export to complete (default: 10000)
MB_TRACING_SCHEDULE_DELAY_MS=5000    # Delay between consecutive batch exports (default: 5000)
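
For reference, a standalone sketch that reads these variables from an environment map and applies the defaults listed above. tracing-config and env-long are hypothetical; the real settings are defined in src/metabase/tracing/settings.clj. This sketch leaves :service-name nil when unset instead of looking up the hostname.

```clojure
(ns example.tracing-config
  (:require [clojure.string :as str]))

(defn- env-long [env k default]
  (if-let [s (get env k)]
    (Long/parseLong (str/trim s))
    default))

(defn tracing-config
  "Reads MB_TRACING_* values from env (a map, or (System/getenv)) with the documented defaults."
  [env]
  {:enabled?          (= "true" (some-> (get env "MB_TRACING_ENABLED") str/trim str/lower-case))
   :endpoint          (get env "MB_TRACING_ENDPOINT" "http://localhost:4317")
   :groups            (get env "MB_TRACING_GROUPS" "all")
   :service-name      (get env "MB_TRACING_SERVICE_NAME") ; documented default: hostname
   :log-level         (str/upper-case (get env "MB_TRACING_LOG_LEVEL" "INFO"))
   :max-queue-size    (env-long env "MB_TRACING_MAX_QUEUE_SIZE" 2048)
   :export-timeout-ms (env-long env "MB_TRACING_EXPORT_TIMEOUT_MS" 10000)
   :schedule-delay-ms (env-long env "MB_TRACING_SCHEDULE_DELAY_MS" 5000)})
```

Taking a plain map keeps the function easy to test; pass (System/getenv) to read the real environment.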
Install via CLI
npx skills add https://github.com/metabase/metabase --skill add-tracing
Repository details: branch main, path SKILL.md (47,704 stars, 6,551 forks).