name: exploring-apm-traces description: > Investigates distributed application performance using PostHog APM (OpenTelemetry span) data via MCP. Use when the user asks about service traces, slow HTTP/database spans, error spans, trace IDs, or span attributes — not AI observability traces or product logs. Uses posthog:query-apm-spans, posthog:apm-trace-get, posthog:apm-services-list, posthog:apm-attributes-list, and posthog:apm-attribute-values-list.
Exploring APM traces (OpenTelemetry spans)
PostHog captures distributed traces from OpenTelemetry. Each trace is a tree of spans representing a request's path through services.
Disambiguation: This skill is for APM / OpenTelemetry traces. Do not confuse with AI observability traces (agent/model $ai_* events) or logs (posthog:query-logs, posthog:logs-*).
Available tools
| Tool | Purpose |
|---|---|
posthog:query-apm-spans |
Search and filter spans (compact list view) |
posthog:apm-trace-get |
Get the full span list for one hex trace_id |
posthog:apm-spans-aggregate |
Per-operation aggregates (count, p50/p95, errors) |
posthog:apm-spans-tree |
Call-tree aggregates per (parent, child) edge |
posthog:apm-services-list |
List distinct service names |
posthog:apm-attributes-list |
List span or resource attribute keys |
posthog:apm-attribute-values-list |
List values for a specific attribute key |
See references/spans-and-fields.md for the response schema and the kind/status_code enums.
Workflow: debug a trace from a URL
Step 1 — Fetch the trace
posthog:apm-trace-get
{
"trace_id": "<hex_trace_id>"
}
The response is { results: [span, span, …] } — a flat list of every span in the trace.
The list can be very large for fan-out request flows; when it exceeds the inline limit, Claude Code auto-persists it to a file.
From the result you get:
- Every span with
name,service_name,kind,status_code,parent_span_id,duration_nano,is_root_span - The
_posthogUrl— always include this in your response so the user can click through to the UI
Step 2 — Parse large results with scripts
When the result is persisted to a file (traces with hundreds of spans across services), use the parsing scripts to explore it.
Start with the summary to get the full picture, then drill into specifics:
# 1. Overview: services, span count, slowest spans, errors
python3 scripts/print_summary.py /path/to/persisted-file.json
# 2. Indented chronological tree (DFS by parent_span_id)
python3 scripts/print_timeline.py /path/to/persisted-file.json
# 3. Drill into a specific span by name
SPAN="HTTP GET /api/users" python3 scripts/extract_span.py /path/to/persisted-file.json
# 4. Search for a keyword across span names, services, IDs
SEARCH="keyword" python3 scripts/search_spans.py /path/to/persisted-file.json
# 5. When the JSON shape looks unfamiliar
python3 scripts/show_structure.py /path/to/persisted-file.json
All scripts support MAX_LEN=N env var to control truncation (0 = unlimited).
Tree reconstruction (parent_span_id → span_id)
The flat span list is a tree. Each span carries:
trace_id— same on every span in the tracespan_id— this span's unique hex IDparent_span_id— points to the parent'sspan_id(zero-padded hex000…000for the root)is_root_span— convenience flag for the trace entry
To rebuild the tree:
- Spans where
is_root_spanis true (orparent_span_id == "00000000…") are root spans. - Every other span is a child of the span whose
span_idmatches itsparent_span_id. - Group by
parent_span_id, walk from each root downward.
scripts/print_timeline.py does this for you and prints a DFS-indented tree.
Investigation patterns
"Where is time going?"
- Run
print_summary.py— it surfaces the top-5 slowest spans byduration_nano. - For a noisy trace, run
print_timeline.pyand scan the indented durations — you can see whether time is dominated by one child span or fan-out across many. - To dig into one slow span,
SPAN="<name>" python3 scripts/extract_span.py FILE.
"Where did the error happen?"
print_summary.pylists every span withstatus_code == 2(Error). Each entry shows service, span name, and parent context.- Walk up the tree from an error span via
parent_span_idto see what request path led there. - Error detail lives in each span's
attributesmap (e.g.exception.message,exception.type), which is returned in the trace payload — read it directly off the error span.apm-attribute-values-listis for discovering values across spans, not a prerequisite for reading one span's attributes.
"Did the request hit service X?"
- Run
print_summary.py— it prints the set of services involved in the trace. - If service X is missing, the request never reached it (or instrumentation is missing — check
apm-services-listto confirm X has emitted spans recently at all).
"Did the fan-out look right?"
print_timeline.pyshows the indentation — wide trees mean parallel calls, deep trees mean sequential dependencies.- Look for spans of kind
Client(3) followed by matchingServer(2) spans on the called service — that's a synchronous downstream call.
Searching by attribute (e.g. http.method=POST)
Each span carries an attributes map (span-level OTel attributes like http.method, db.statement) in the payload — so for a span you already have, just read it. Resource attributes (k8s labels, service.version) are not in the payload. To filter the whole dataset by an attribute:
- Use
apm-attributes-list/apm-attribute-values-listto discover keys and values (resource attributes especially). - Re-issue
query-apm-spanswith afilterGroupentry of typespan_attributeorspan_resource_attribute.
Constructing UI links
apm-trace-get and query-apm-spans return _posthogUrl — always surface this to the user so they can verify in the PostHog UI.
When presenting findings, include the relevant PostHog URL.
Finding traces
Use posthog:query-apm-spans to search and filter spans. Note this returns spans, not a tree — pass query.traceId or grab a trace_id from the results and feed it to apm-trace-get for the tree.
Discover before filtering
Before constructing filters, discover what's actually in the project:
- Confirm services exist — call
apm-services-listto see which services have emitted spans. - Find filterable attributes — call
apm-attributes-listwithattribute_type: "span"or"resource". - Get actual values — call
apm-attribute-values-listwith a key to see the real values in use.
Only then construct query-apm-spans filters. Custom attributes vary per project and cannot be guessed.
By filters
posthog:query-apm-spans
{
"query": {
"serviceNames": ["api-gateway"],
"dateRange": {"date_from": "-1h"},
"filterGroup": [
{"key": "http.status_code", "operator": "gt", "type": "span_attribute", "value": "499"}
]
}
}
By trace ID (when known)
posthog:apm-trace-get
{
"trace_id": "0123456789abcdef0123456789abcdef"
}
Common gotchas
- Durations are nanoseconds. 1 second =
1_000_000_000. Filter values inquery-apm-spansfordurationare also nanoseconds. status_code == 2is Error.0is Unset,1is OK. UseOKto match{0, 1}in the UI filter.kindis an integer 0–5: 0 Unspecified, 1 Internal, 2 Server, 3 Client, 4 Producer, 5 Consumer.parent_span_idof a root span is"0000000000000000"(16 zero hex chars, matching the 8-byte span ID width — not the 16-byte trace ID width), not null.
Parsing large trace results
Trace tool results are JSON. When too large to read inline, Claude Code persists them to a file.
Persisted file format
[{ "type": "text", "text": "{\"results\": [...], \"_posthogUrl\": \"...\"}" }]
Every script in scripts/ unwraps this envelope before parsing.
Trace JSON structure
results (array of span dicts)
└── each span:
├── uuid, trace_id, span_id, parent_span_id (hex strings)
├── name, kind (int 0–5), service_name
├── status_code (int 0–2), is_root_span (bool)
├── timestamp, end_time (ISO 8601)
├── duration_nano (int, nanoseconds)
├── attributes (map of span-level OTel attributes, e.g. db.statement, http.url)
└── matched_filter (0/1 — 1 if this span matched the query-apm-spans filter, 0 if it
only shares a trace with a match; always present, only meaningful from query-apm-spans)
Available scripts
| Script | Purpose | Usage |
|---|---|---|
print_summary.py |
Trace metadata, services, slowest spans, errors | python3 scripts/print_summary.py FILE |
print_timeline.py |
DFS-indented tree from parent_span_id walk |
python3 scripts/print_timeline.py FILE |
extract_span.py |
Full row + parent/children for spans matching a name | SPAN="name" python3 scripts/extract_span.py FILE |
search_spans.py |
Find a keyword across name, service_name, IDs | SEARCH="kw" python3 scripts/search_spans.py FILE |
show_structure.py |
Show JSON keys and types without values | python3 scripts/show_structure.py FILE |
Tips
- Always set
dateRangeonquery-apm-spans— queries without a time range are slow. Default is-1h; widen only when needed. - Always include the
_posthogUrlin your response so the user can click through. - Span-level attributes are in the
apm-trace-get/query-apm-spanspayload (each span'sattributesmap). Resource attributes are not — useapm-attributes-list(typeresource) andapm-attribute-values-listfor those. is_root_spanis the cheap way to find the trace entry — don't string-match00000000….- For aggregates (p95 by operation, slowest children of a span), use
apm-spans-aggregatefor a flat view orapm-spans-treefor parent→child edges — don't reach for SQL.