stack-health

Check observability stack component health, verify data ingestion, and troubleshoot common issues.

By opensearch-project. Updated 3/22/2026.

name: stack-health
description: Check observability stack component health, verify data ingestion, and troubleshoot common issues.
allowed-tools:
  - Bash
  - curl

Stack Health and Troubleshooting

Overview

This skill provides health check commands, data verification queries, and troubleshooting guidance for the observability stack. Use it to verify that OpenSearch, Prometheus, the OTel Collector, and Data Prepper are running correctly, and to diagnose data flow problems.

Credentials are read from the .env file (default: admin / My_password_123!@#). All OpenSearch curl commands use HTTPS with -k to skip TLS certificate verification for local development.

Connection Defaults

| Variable | Default | Description |
|---|---|---|
| OPENSEARCH_ENDPOINT | https://localhost:9200 | OpenSearch base URL |
| OPENSEARCH_USER | admin | OpenSearch username |
| OPENSEARCH_PASSWORD | My_password_123!@# | OpenSearch password |
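
If the variables are not already exported, one possible way to pull a single value out of the .env file is sketched below. The helper name env_get is hypothetical, and the sketch assumes plain, unquoted KEY=value lines; it does not handle quoting or `export` prefixes.

```shell
# env_get KEY [FILE]
# Prints the value of KEY from FILE (default: ./.env).
# Assumes unquoted KEY=value lines; if KEY appears more than once, the last one wins.
env_get() {
  sed -n "s/^$1=//p" "${2:-.env}" | tail -n 1
}

# Example usage (commented out so that the sketch has no side effects):
# OPENSEARCH_USER=$(env_get OPENSEARCH_USER)
# OPENSEARCH_PASSWORD=$(env_get OPENSEARCH_PASSWORD)
```

Reading the value this way, instead of sourcing the file, avoids executing anything else the file may contain.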

Health Checks

OpenSearch Cluster Health

Check the overall cluster status (green, yellow, or red):

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cluster/health?pretty"

A healthy cluster returns "status": "green" or "status": "yellow" (yellow is normal for single-node development clusters).
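
For scripted checks, the status can be extracted from the response without extra tooling. The following is a minimal sketch; the function names cluster_status and cluster_usable are hypothetical, and it assumes the response contains a single top-level "status" field, as the default cluster health response does.

```shell
# Reads a _cluster/health JSON response on stdin and prints the status value.
cluster_status() {
  tr -d ' \n\t' | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Succeeds (exit 0) for green or yellow, fails for red or unparseable input.
cluster_usable() {
  case "$(cluster_status)" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (commented out; requires a running cluster):
# curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#   "$OPENSEARCH_ENDPOINT/_cluster/health" | cluster_usable && echo "cluster usable"
```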

Prometheus Health

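Verify Prometheus is running and healthy. PROMETHEUS_ENDPOINT is not among the Connection Defaults above, so set it to the Prometheus base URL first (Prometheus listens on port 9090; see Port Reference):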

curl -s "$PROMETHEUS_ENDPOINT/-/healthy"

When Prometheus is operational, the endpoint returns the text "Prometheus Server is Healthy."

OTel Collector Metrics

Check the OpenTelemetry Collector's internal metrics to verify it is receiving and exporting telemetry:

curl -s http://localhost:8888/metrics

Look for otelcol_receiver_accepted_spans_total, otelcol_exporter_sent_spans_total, and otelcol_exporter_send_failed_spans_total in the output to confirm data flow. (OTel Collector metrics use the _total suffix for counters.)
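
Each of these counters can appear on several lines, one per label set, so a total is often easier to read. The sketch below sums one counter from the metrics text on stdin. The function name metric_sum is hypothetical, and it assumes the sample value is the last field on each line (no trailing timestamp).

```shell
# metric_sum NAME
# Sums every sample of the counter NAME in Prometheus text format read from stdin.
metric_sum() {
  awk -v name="$1" '
    # Match only lines that start with the exact metric name, followed by
    # a label block or a space, so longer names and "# HELP" lines are skipped.
    index($0, name) == 1 {
      next_char = substr($0, length(name) + 1, 1)
      if (next_char == "{" || next_char == " ") sum += $NF
    }
    END { printf "%d\n", sum }
  '
}

# Example usage (commented out; requires a running collector):
# curl -s http://localhost:8888/metrics | metric_sum otelcol_receiver_accepted_spans_total
```

A total of 0 for the receiver counter suggests that no spans have reached the collector yet.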

OpenSearch Index Listing

List all indices to verify data ingestion has created the expected trace, log, and service map indices:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cat/indices?v"

You should see indices matching otel-v1-apm-span-*, logs-otel-v1-*, and otel-v2-apm-service-map if data is flowing.
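
The comparison against the expected names can be automated. The following is a rough sketch, assuming the default column order of `_cat/indices?v` (index name in the third column, one header line); the function name missing_indices is hypothetical.

```shell
# Reads `_cat/indices?v` output on stdin and prints each expected index
# prefix that has no matching index. Prints nothing when all are present.
missing_indices() {
  names=$(awk 'NR > 1 { print $3 }')   # skip the header row, keep the index column
  for prefix in otel-v1-apm-span- logs-otel-v1- otel-v2-apm-service-map; do
    found=no
    for n in $names; do
      case "$n" in
        "$prefix"*) found=yes ;;
      esac
    done
    [ "$found" = yes ] || echo "$prefix"
  done
}

# Example usage (commented out; requires a running cluster):
# curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#   "$OPENSEARCH_ENDPOINT/_cat/indices?v" | missing_indices
```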

Data Verification

Trace Document Count

Verify trace data exists by counting documents in the trace index:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
  -H 'Content-Type: application/json' \
  -d '{"query": "source=otel-v1-apm-span-* | stats count()"}'

Log Document Count

Verify log data exists by counting documents in the log index:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
  -H 'Content-Type: application/json' \
  -d '{"query": "source=logs-otel-v1-* | stats count()"}'

A count of 0 in either query indicates no data has been ingested for that signal. See the Troubleshooting section below.
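
To use the count in a script, the number has to be extracted from the PPL response. The sketch below assumes the response carries the count as the single value inside "datarows", which is how `stats count()` results are normally returned; the function name ppl_count is hypothetical.

```shell
# Reads a PPL `stats count()` response on stdin and prints the count.
# Whitespace is stripped first so pretty-printed responses also match.
ppl_count() {
  tr -d ' \n\t' | sed -n 's/.*"datarows":\[\[\([0-9][0-9]*\)\]\].*/\1/p'
}

# Example usage (commented out; requires a running cluster):
# count=$(curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#   -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
#   -H 'Content-Type: application/json' \
#   -d '{"query": "source=otel-v1-apm-span-* | stats count()"}' | ppl_count)
# [ "${count:-0}" -gt 0 ] || echo "no trace documents found"
```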

Docker Compose Diagnostics

Check Container Status

View the status of all stack containers:

docker compose ps

All services should show Up or Up (healthy). If a service is restarting or exited, check its logs.

View Service Logs

View logs for a specific service:

docker compose logs <service-name>

Data Prepper Logs

Check Data Prepper for pipeline errors or OpenSearch connection issues:

docker compose logs data-prepper

OTel Collector Logs

Check the OTel Collector for receiver, processor, or exporter errors:

docker compose logs otel-collector

Troubleshooting Common Failures

OpenSearch Unreachable

Symptoms: Connection refused on port 9200; curl commands time out or fail.

Diagnostic steps:

  1. Check if the OpenSearch container is running:
    docker compose ps opensearch
    
  2. Verify port 9200 is exposed and listening:
    docker compose ps | grep 9200
    
  3. Check the OpenSearch health endpoint directly:
    curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cluster/health?pretty"
    
  4. Check OpenSearch container logs for startup errors:
    docker compose logs opensearch
    
  5. If the container is restarting, check for memory issues — OpenSearch requires at least 512MB heap. Verify OPENSEARCH_JAVA_OPTS in docker-compose.yml.
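
For step 5, one way to read the configured maximum heap out of an OPENSEARCH_JAVA_OPTS value is sketched below. The function name xmx_mb is hypothetical, and the sketch assumes the -Xmx flag carries a k, m, or g suffix; a value without a suffix is not handled.

```shell
# xmx_mb "JVM OPTIONS"
# Prints the -Xmx heap size in MB, or fails if no -Xmx flag with a unit is found.
xmx_mb() {
  val=$(printf '%s\n' "$1" | sed -n 's/.*-Xmx\([0-9][0-9]*\)\([kKmMgG]\).*/\1 \2/p')
  [ -n "$val" ] || return 1
  set -- $val                      # $1 = number, $2 = unit
  case "$2" in
    k|K) echo $(( $1 / 1024 )) ;;
    m|M) echo "$1" ;;
    g|G) echo $(( $1 * 1024 )) ;;
  esac
}

# Example usage:
# [ "$(xmx_mb "$OPENSEARCH_JAVA_OPTS")" -ge 512 ] || echo "heap below 512 MB"
```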

No Data in Indices

Symptoms: Index listing shows no otel-v1-apm-* indices, or document counts are 0.

Diagnostic steps:

  1. Verify the OTel Collector is receiving data — check its metrics:
    curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans_total
    
  2. Check the Data Prepper pipeline for errors:
    docker compose logs data-prepper | grep -i error
    
  3. Verify the OTLP endpoint is reachable from your application. The OTel Collector listens on:
    • gRPC: localhost:4317
    • HTTP: localhost:4318
  4. Send test telemetry from your application, then list the indices to verify that it appears:
    curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cat/indices?v"
    
  5. Check that Data Prepper can connect to OpenSearch — look for authentication or TLS errors in Data Prepper logs.

Data Prepper Pipeline Errors

Symptoms: Data reaches the OTel Collector but does not appear in OpenSearch indices.

Diagnostic steps:

  1. Check Data Prepper logs for pipeline processing errors:
    docker compose logs data-prepper
    
  2. Look for OpenSearch connection failures, authentication errors, or index creation failures in the logs.
  3. Verify Data Prepper is receiving data from the OTel Collector on port 21890.
  4. Restart Data Prepper if configuration was changed:
    docker compose restart data-prepper
    

OTel Collector Export Failures

Symptoms: Applications send telemetry but data does not reach Data Prepper or Prometheus.

Diagnostic steps:

  1. Check the OTel Collector's internal metrics for export failures:
    curl -s http://localhost:8888/metrics | grep otelcol_exporter_send_failed
    
  2. Check OTel Collector logs for exporter errors:
    docker compose logs otel-collector
    
  3. Verify the collector can reach Data Prepper (data-prepper:21890) and Prometheus (prometheus:9090) on the Docker network.
  4. Check for batch processor backpressure or memory limiter drops in the collector metrics.
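
The troubleshooting sections above can be read as a rough decision order: is the collector receiving data, is it exporting without failures, and did documents arrive in OpenSearch. The sketch below encodes that order. The function name triage and its output labels are hypothetical, and the three inputs are assumed to have been gathered with the commands shown earlier.

```shell
# triage ACCEPTED_SPANS FAILED_SPANS DOC_COUNT
# Prints a label for the first stage that looks broken, or "ok".
triage() {
  accepted=$1 failed=$2 docs=$3
  if [ "$accepted" -eq 0 ]; then
    echo "collector-not-receiving"      # check OTLP ports 4317/4318 and the application
  elif [ "$failed" -gt 0 ]; then
    echo "collector-export-failing"     # check otel-collector logs and network reachability
  elif [ "$docs" -eq 0 ]; then
    echo "data-prepper-or-opensearch"   # check data-prepper logs for pipeline or auth errors
  else
    echo "ok"
  fi
}
```

For example, `triage 120 0 0` prints data-prepper-or-opensearch, which points at the Data Prepper Pipeline Errors steps.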

Port Reference

| Component | Port | Protocol |
|---|---|---|
| OpenSearch | 9200 | HTTPS |
| OTel Collector (gRPC) | 4317 | gRPC |
| OTel Collector (HTTP) | 4318 | HTTP |
| Data Prepper | 21890 | HTTP |
| Prometheus | 9090 | HTTP |
| OpenSearch Dashboards | 5601 | HTTP |

PPL Diagnostic Commands

Describe Index Mappings

Use the PPL describe command to inspect the field mappings and types of an index. This is useful for verifying which fields are available for querying:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
  -H 'Content-Type: application/json' \
  -d '{"query": "describe otel-v1-apm-span-*"}'

Explain Query Execution Plan

Use the PPL _explain endpoint to debug query execution plans. This shows how OpenSearch will execute a PPL query without actually running it:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl/_explain" \
  -H 'Content-Type: application/json' \
  -d '{"query": "source=otel-v1-apm-span-* | head 10"}'

This is useful for diagnosing slow queries, understanding how filters are applied, and verifying that field names resolve correctly.

Dynamic Index Discovery

List All Observability Indices

Discover which observability indices exist and their sizes:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  "$OPENSEARCH_ENDPOINT/_cat/indices/otel-*,logs-otel-*?format=json&h=index,health,docs.count,store.size&s=index"

Get Index Field Mappings

Discover available fields in each index dynamically instead of relying on hardcoded field names:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  "$OPENSEARCH_ENDPOINT/otel-v1-apm-span-*/_mapping?pretty"
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  "$OPENSEARCH_ENDPOINT/logs-otel-v1-*/_mapping?pretty"

PPL Describe for Field Discovery

Use PPL describe to list all fields and types in an index:

curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
  -H 'Content-Type: application/json' \
  -d '{"query": "describe otel-v1-apm-span-000001"}'
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
  -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
  -H 'Content-Type: application/json' \
  -d '{"query": "describe logs-otel-v1-000001"}'

References

  • PPL Language Reference — Official PPL syntax documentation. Fetch this if queries fail due to OpenSearch version differences or new syntax.

AWS Managed Variants

Amazon OpenSearch Service Health Check

Replace the local endpoint and authentication with AWS SigV4:

curl -s --aws-sigv4 "aws:amz:REGION:es" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  "https://DOMAIN-ID.REGION.es.amazonaws.com/_cluster/health?pretty"

Index listing on AWS managed OpenSearch:

curl -s --aws-sigv4 "aws:amz:REGION:es" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  "https://DOMAIN-ID.REGION.es.amazonaws.com/_cat/indices?v"

  • Endpoint format: https://DOMAIN-ID.REGION.es.amazonaws.com
  • Auth: --aws-sigv4 "aws:amz:REGION:es" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"
  • No -k flag needed: AWS managed endpoints use valid TLS certificates

Amazon Managed Service for Prometheus Health

Check Prometheus health on Amazon Managed Service for Prometheus (AMP):

curl -s --aws-sigv4 "aws:amz:REGION:aps" \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query \
  --data-urlencode 'query=up'
  • Endpoint format: https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query
  • Auth: --aws-sigv4 "aws:amz:REGION:aps" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"
  • PromQL query syntax is identical to local Prometheus; only the endpoint and authentication differ
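
To avoid hand-editing the placeholders, the endpoint and signing strings can be assembled from a few inputs. The helpers below are a hypothetical sketch that only fills in the templates listed above; the function names and the example domain, region, and workspace values are illustrative and not part of the stack.

```shell
# aos_endpoint DOMAIN_ID REGION -> Amazon OpenSearch Service base URL
aos_endpoint() {
  printf 'https://%s.%s.es.amazonaws.com\n' "$1" "$2"
}

# amp_query_url REGION WORKSPACE_ID -> AMP instant-query URL
amp_query_url() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/query\n' "$1" "$2"
}

# sigv4_provider REGION SERVICE -> value for curl's --aws-sigv4 (SERVICE is es or aps)
sigv4_provider() {
  printf 'aws:amz:%s:%s\n' "$1" "$2"
}

# Example usage (commented out; requires AWS credentials and a real domain):
# curl -s --aws-sigv4 "$(sigv4_provider us-east-1 es)" \
#   --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
#   "$(aos_endpoint my-domain-id us-east-1)/_cluster/health?pretty"
```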
Install via CLI
npx skills add https://github.com/opensearch-project/observability-stack --skill stack-health