debug-missing-spans

star 6

Troubleshoot when expected OpenTelemetry spans don't reach the backend. Walks the chain top-to-bottom — code → SDK init → processor → exporter → network → backend ingest — with concrete tests at each step. Covers head sampling, ctx.waitUntil drops on Cloudflare, init-order races, runtime detection failures, propagation breaks, exporter auth errors, and silent ratelimits.

jagreehal By jagreehal schedule Updated 5/5/2026

name: debug-missing-spans description: > Troubleshoot when expected OpenTelemetry spans don't reach the backend. Walks the chain top-to-bottom — code → SDK init → processor → exporter → network → backend ingest — with concrete tests at each step. Covers head sampling, ctx.waitUntil drops on Cloudflare, init-order races, runtime detection failures, propagation breaks, exporter auth errors, and silent ratelimits. license: MIT

Debug missing spans

When a span you expect isn't in the backend, the cause is somewhere in this chain:

code → SDK init → head sampler → processor → exporter → network → backend ingest → backend index

This skill walks each link in order with a quick check you can run. Don't skip steps — the cause is rarely where you'd guess.

Step 0: Reproduce locally with the pretty exporter

Before chasing remote backends, confirm the span exists at all:

init({
  service: 'my-app',
  debug: 'pretty', // hierarchical colourised output to stdout
});

If you see the span in stdout, the SDK + sampler are fine — skip to "exporter / network". If you don't, keep reading.

Step 1: Is the SDK actually initialised?

Common failure: init() runs after the first request because of import-order.

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('autotel-debug');
console.log(
  '[autotel-debug] tracer is no-op:',
  tracer.constructor.name === 'NoopTracer',
);

If true, init() ran too late. Move it to the very top of the entry file (or to instrumentation.ts for Next.js).

Step 2: Head sampler

Print the effective head rate:

import { getActiveConfig } from 'autotel-edge';
console.log('[autotel-debug] sampling:', getActiveConfig()?.sampling);

Common gotchas:

  • sampling.rates: { server: 5 } — 5 % means 95 % of spans never start.
  • Inheriting OTEL_TRACES_SAMPLER_ARG=0.01 from the environment via the OTel default sampler.
  • Your test happens to hit the unsampled branch — instrument with sampling: { rates: { server: 100 } } while reproducing.

To force sampling for one request, send a traceparent with the sampled flag set:

traceparent: 00-<traceid>-<spanid>-01

(-01 at the end = sampled.) autotel's parent-based sampler will respect it.

Step 3: Cloudflare Workers — ctx.waitUntil

The single biggest cause of missing spans on the edge: the response returned before the exporter flushed.

If you're using addEventListener('fetch', …) or a hand-rolled fetch in a module worker without wiring ctx.waitUntil(…) to the export call, async drains drop silently.

Fix — switch to defineWorkerFetch or wrapModule, both of which wire waitUntil automatically:

import { defineWorkerFetch } from 'autotel-cloudflare';

export default defineWorkerFetch(
  { service: { name: 'edge' } },
  async (request, env, ctx, log) => {
    // log.set / spans here all flush via ctx.waitUntil before response returns
    return new Response('ok');
  },
);

Step 4: Processor pipeline

Print what's wired:

import { trace } from '@opentelemetry/api';
const provider = trace.getTracerProvider();
console.log('[autotel-debug] provider:', provider.constructor.name);
console.log(
  '[autotel-debug] processors:',
  (provider as any)._registeredSpanProcessors?.map(
    (p: any) => p.constructor.name,
  ),
);

Common issues:

  • A FilteringSpanProcessor excludes your span. Check the include / exclude predicates.
  • A TailSamplingProcessor dropped the trace (no error, no slow root, no debug header).
  • A composePostProcessors step returns [] for your span.

To bisect, temporarily strip post-processors:

init({
  service: 'my-app',
  exporter: { url: process.env.OTLP_ENDPOINT! },
  // no postProcessor, no tail sampler, no filter
});

If the span shows up now, add back the processors one at a time.

Step 5: Exporter

Tail the SDK's diagnostic log:

import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);

Look for:

@opentelemetry/api: ... OTLPExporter: failed to send 4 traces, status: 401, error: ...

Common exporter errors:

Status Meaning Fix
401 Bad / missing auth header Check OTLP_HEADERS / vendor token name
403 Token has no write scope Issue a token with the right scope
404 Wrong endpoint URL Check region (api.honeycomb.io vs api.eu1.honeycomb.io)
413 Batch too big Lower BatchSpanProcessor maxExportBatchSize
429 Rate-limited Reduce head/tail rates; honour retry-after
502/503/504 Upstream unhealthy Often transient; add retries; check backend status
Network error DNS / firewall curl -v <url> from the same network

Step 6: Network / TLS

For self-hosted Collectors:

curl -v -X POST $OTLP_ENDPOINT \
  -H 'content-type: application/json' \
  -H "$AUTH_HEADER" \
  -d '{"resourceSpans":[]}'

Should return 200. If it doesn't, the problem is between you and the Collector — not autotel.

For Cloudflare Workers, run wrangler tail and look for OTLPExporter errors.

Step 7: Backend ingest — silent rejection

Some backends accept the request with a 200 but drop the events:

  • Honeycomb: dataset must exist and the API key must have write access to it. Mismatched key/dataset → silent drop.
  • Datadog: check service is set (resource attribute service.name) — they ignore spans without it.
  • Sentry: SDK version mismatch on envelope → 200 but events disappear.
  • Grafana Cloud Tempo: spans without service.name go to a fallback service called unknown_service.

For each backend, the dataset / index / project where you'd expect the span:

Backend Where the span lands
Honeycomb dataset = service.name (auto-created)
Datadog service:<name> filter
Grafana Tempo search by traceId
Jaeger service dropdown = service.name
Sentry project linked to the DSN

Step 8: Backend index lag

After a 200, expect ingestion lag of:

Backend Typical lag
Honeycomb < 5 s
Datadog 30–60 s
Grafana Tempo 10–30 s
Sentry 30–120 s
Self-hosted Jaeger < 1 s

Don't conclude the span is missing until you've waited > 2× the expected lag.

Step-by-step checklist

[ ] Span shows in `debug: 'pretty'` stdout
[ ] `tracer.constructor.name !== 'NoopTracer'` (SDK initialised)
[ ] Head rate is high enough to allow the request
[ ] Workers handler uses defineWorkerFetch / wrapModule
[ ] No post-processor / tail sampler / filter strips it
[ ] Exporter logs no 4xx/5xx
[ ] Curl to OTLP endpoint returns 200
[ ] Backend has the right service.name / dataset / project
[ ] Waited 2× expected ingest lag

When the trace partially shows up

Some spans land, some don't:

  • Trace context broken between services — outbound HTTP calls aren't propagating traceparent. Confirm autotel's global fetch instrumentation is on (instrumentation.instrumentGlobalFetch: true, default).
  • Async boundary loses context — a setTimeout / queue callback ran outside the AsyncLocalStorage scope. Wrap with trace() or use context.with().
  • Cross-runtime call — Node service → Workers → browser; verify traceparent arrives at each leg via response headers / network panel.

When the SDK itself crashes

TypeError: Cannot read properties of undefined (reading 'startActiveSpan')

Usually means the API version (@opentelemetry/api) and SDK version (@opentelemetry/sdk-trace-base) drifted. Run:

pnpm why @opentelemetry/api

There should be exactly one resolved version. If there are two, dedup via pnpm.overrides.

Anti-patterns to fix as you debug

Anti-pattern Why it loses spans
init() after the first import that uses tracing Spans before init() are no-ops
addEventListener('fetch', …) on Workers Pre-module-worker style; no ctx.waitUntil to wire
Single OTLP_ENDPOINT env var with ? chars URL-encoded Auth gets parsed as part of the path
Importing both @sentry/tracing and autotel Double-instrumentation eats spans
process.exit(0) immediately after the work The exporter never flushed; call await provider.shutdown() first
Install via CLI
npx skills add https://github.com/jagreehal/autotel --skill debug-missing-spans
Repository Details
star Stars 6
call_split Forks 2
navigation Branch main
article Path SKILL.md
More from Creator