name: archestra-dev-observability
description: Use when changing Archestra tracing, metrics, OpenTelemetry, Tempo, Grafana, Prometheus, LLM/MCP spans, observability labels, or local observability setup.
Archestra Observability
Use this skill before changing tracing, metrics, span naming, metric labels, or local observability setup.
Run commands from platform/ unless specifically instructed otherwise.
Naming new attributes and metrics
Before introducing any new span attribute or metric name, look it up — do not coin a name from intuition.
- Span attributes: search the OTEL semantic-convention registry and use the existing attribute verbatim if one fits. Registry: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/ (wider set: https://opentelemetry.io/docs/specs/semconv/registry/attributes/). Example: prompt-cache tokens are `gen_ai.usage.cache_read.input_tokens` and `gen_ai.usage.cache_creation.input_tokens`, not a custom `archestra.usage.*`. Only use an `archestra.*` name when nothing in the registry fits, and say why in a comment.
- "Not yet stable" is not a reason to avoid a standard name. The whole `gen_ai.*` namespace is Development-stability, including the `gen_ai.usage.*` attributes already emitted here. Match that bar; don't custom-namespace to dodge it.
- Metrics: match the existing `llm_*` / prom-client family and label names in `metrics/`; don't introduce a new metric style. Add a label value to an existing metric only if it won't change what current aggregates mean. Otherwise add a dedicated metric (cache tokens use a separate `llm_cache_tokens_total`, not new `type` values on `llm_tokens_total`).
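The span-attribute rule can be sketched as a small helper that emits prompt-cache usage under the registry names quoted above. The `CacheUsage` shape and the `cacheUsageAttributes` function are hypothetical and only illustrate the naming rule; they are not part of the Archestra codebase.

```typescript
// Registry attribute names for prompt-cache token usage, as quoted in the text above.
const CACHE_READ_TOKENS = "gen_ai.usage.cache_read.input_tokens";
const CACHE_CREATION_TOKENS = "gen_ai.usage.cache_creation.input_tokens";

// Hypothetical shape for provider-reported cache usage; not an Archestra type.
interface CacheUsage {
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
}

// Build span attributes, omitting any value the provider did not report.
function cacheUsageAttributes(usage: CacheUsage): Record<string, number> {
  const attributes: Record<string, number> = {};
  if (usage.cacheReadTokens !== undefined) {
    attributes[CACHE_READ_TOKENS] = usage.cacheReadTokens;
  }
  if (usage.cacheCreationTokens !== undefined) {
    attributes[CACHE_CREATION_TOKENS] = usage.cacheCreationTokens;
  }
  return attributes;
}
```

Keeping the names in constants makes it easy to grep for every place a registry attribute is emitted, and leaves no room for a custom `archestra.usage.*` variant to slip in.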
Local setup
tilt trigger observability
docker compose -f dev/docker-compose.observability.yml up -d
tilt trigger observability starts the full observability stack: Tempo, OTEL Collector, Prometheus, and Grafana.
The docker-compose command is an alternative local setup with pre-configured datasources.
Local URLs
- Tempo API: http://localhost:3200/
- Grafana: http://localhost:3002/
- Prometheus: http://localhost:9090/
- Backend metrics: http://localhost:9050/metrics
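A quick way to confirm the stack is up is to probe each service. The sketch below uses the hosts and ports listed above. The paths `/ready` (Tempo), `/api/health` (Grafana), and `/-/ready` (Prometheus) are those services' standard health endpoints, and assuming they are exposed in this setup is a guess. `checkLocalStack` and `Fetcher` are hypothetical names.

```typescript
// Probe targets for the local stack. Hosts and ports come from the URL list above;
// the health paths (other than /metrics) are an assumption about this setup.
const LOCAL_TARGETS: Record<string, string> = {
  tempo: "http://localhost:3200/ready",
  grafana: "http://localhost:3002/api/health",
  prometheus: "http://localhost:9090/-/ready",
  backendMetrics: "http://localhost:9050/metrics",
};

// Minimal fetch-like signature so the check can be exercised without a network.
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

// Returns one boolean per service: true when the probe answered with a 2xx status.
async function checkLocalStack(fetcher: Fetcher): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const [name, url] of Object.entries(LOCAL_TARGETS)) {
    try {
      results[name] = (await fetcher(url)).ok;
    } catch {
      results[name] = false; // connection refused, DNS failure, and similar
    }
  }
  return results;
}
```

On Node 18 or later, the built-in `fetch` can be passed directly as the `fetcher` argument.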
Tracing
- Follow OTEL GenAI Semantic Conventions (see "Naming new attributes and metrics" — check the registry before adding any attribute): https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/.
- LLM spans use `gen_ai.agent.id`, `gen_ai.agent.name`, `gen_ai.provider.name`, `gen_ai.request.model`, `gen_ai.operation.name`, and `archestra.agent.label.<key>` for dynamic agent labels.
- MCP spans use `gen_ai.tool.name` and `mcp.server.name`.
- Team metadata uses the custom `archestra.<scope>.team.*` namespace (no OTEL registry equivalent), where scope is the principal the teams belong to: `agent` (the executing agent's teams) or `user` (the requesting user's teams). `archestra.<scope>.team.ids` and `archestra.<scope>.team.names` are array-valued (a principal can belong to multiple teams), and `archestra.<scope>.team.label.<key>` carries team labels merged per key across the principal's teams. Set via `setTeamAttributes(span, teams, scope)` in `observability/tracing/attributes.ts`; agent teams come from `AgentTeamModel.getTeamLabelInfoForAgent` and user teams from `TeamModel.getTeamLabelInfoForUser`, resolved once per request.
- Session tracking uses `gen_ai.conversation.id` from the `X-Archestra-Session-Id` header.
- Span names are `chat {model}`, `generate_content {model}`, and `execute_tool {tool_name}`.
- Agent label keys are fetched from the database on startup and included as resource attributes.
- Traces are stored in Grafana Tempo.
- User identity is tracked with `archestra.user.id`, `archestra.user.email`, and `archestra.user.name` when available.
- LLM spans include `archestra.cost` in USD and `gen_ai.usage.total_tokens`.
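To make the span-name and team-attribute conventions concrete, here is a minimal sketch of how the keys could be built. It is not the `setTeamAttributes` implementation in `observability/tracing/attributes.ts`. The `TeamInfo` shape is hypothetical, and the reading of "merged per key" as collecting the distinct values of each label key into an array is an assumption.

```typescript
type TeamScope = "agent" | "user";

// Hypothetical team shape; the real one comes from the team models.
interface TeamInfo {
  id: string;
  name: string;
  labels: Record<string, string>;
}

// Span names follow "{operation} {target}", where target is the model or tool name.
function spanName(
  operation: "chat" | "generate_content" | "execute_tool",
  target: string,
): string {
  return `${operation} ${target}`;
}

// Build archestra.<scope>.team.* attributes for one principal.
// Assumption: "merged per key" means the distinct values of each label key are
// collected into an array, in first-seen order.
function teamAttributes(
  teams: TeamInfo[],
  scope: TeamScope,
): Record<string, string[]> {
  const prefix = `archestra.${scope}.team`;
  const attributes: Record<string, string[]> = {
    [`${prefix}.ids`]: teams.map((team) => team.id),
    [`${prefix}.names`]: teams.map((team) => team.name),
  };
  for (const team of teams) {
    for (const [key, value] of Object.entries(team.labels)) {
      const attributeKey = `${prefix}.label.${key}`;
      const values = attributes[attributeKey] ?? [];
      if (!values.includes(value)) values.push(value);
      attributes[attributeKey] = values;
    }
  }
  return attributes;
}
```

Returning a plain record keeps the key construction testable on its own; a caller would then copy the entries onto the span.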
Metrics
- Prometheus metrics `llm_request_duration_seconds` and `llm_tokens_total` include `agent_id`, `agent_name`, `agent_type`, `external_agent_id`, and dynamic agent labels as dimensions.
- `agent_id` is internal. `external_agent_id` comes from the client-provided header and is used for agent execution metrics.
- MCP metrics include `agent_id`, `agent_name`, and `agent_type`.
- Metrics are reinitialized on startup with current label keys from the database.
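prom-client fixes a metric's label names when the metric is created, which is why the label set has to be rebuilt from the current keys at startup. The sketch below shows one way to derive that label set and fill in values. The function names are hypothetical, and the sanitizing of dynamic keys is an assumption; the real code in `metrics/` may handle keys differently.

```typescript
// Fixed label names from the list above.
const BASE_LABELS = ["agent_id", "agent_name", "agent_type", "external_agent_id"];

// The classic Prometheus label-name pattern is [a-zA-Z_][a-zA-Z0-9_]*.
// Assumption: dynamic agent label keys are sanitized to fit that pattern.
function toLabelName(key: string): string {
  const cleaned = key.replace(/[^a-zA-Z0-9_]/g, "_");
  return /^[a-zA-Z_]/.test(cleaned) ? cleaned : `_${cleaned}`;
}

// Label names to register a metric with: base labels plus the current dynamic keys.
function metricLabelNames(dynamicKeys: string[]): string[] {
  const names = new Set(BASE_LABELS);
  for (const key of dynamicKeys) names.add(toLabelName(key));
  return Array.from(names);
}

// Every registered label gets a value; a dynamic label the agent lacks becomes "".
function labelValues(
  labelNames: string[],
  base: Record<string, string>,
  agentLabels: Record<string, string>,
): Record<string, string> {
  const sanitized: Record<string, string> = {};
  for (const [key, value] of Object.entries(agentLabels)) {
    sanitized[toLabelName(key)] = value;
  }
  const values: Record<string, string> = {};
  for (const name of labelNames) {
    values[name] = base[name] ?? sanitized[name] ?? "";
  }
  return values;
}
```

Filling absent labels with an empty string keeps every observation on the same label set, so series for agents with and without a given label can still be aggregated together.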