opensearch

name: opensearch
description: Guide for efficiently querying OpenSearch logs via MCP tools with minimal context consumption

OpenSearch MCP Usage Guide

You have access to an OpenSearch MCP server that queries container logs via OpenSearch Dashboards. The cluster has billions of documents. Always include time ranges to avoid timeouts.

Available Tools

Tool	Purpose
`opensearch_search`	Search logs with Lucene/KQL query syntax (primary tool)
`opensearch_search_raw`	Raw Query DSL for advanced queries
`opensearch_aggregate`	Aggregations (counts, terms, histograms)
`opensearch_get_indices`	List indices with doc counts
`opensearch_get_mappings`	Get field names/types from a sample doc
`opensearch_cluster_health`	Basic cluster health
`opensearch_switch_cluster`	Switch to a different cluster on-the-fly (no restart needed)
`opensearch_get_active_cluster`	Show currently active cluster name, URL, and cookie age

Context Optimization Strategy (CRITICAL)

The MCP server can shrink responses in several ways: summary mode, field filtering, pruning, and truncation. Always optimize for minimal context usage.

Step 1: Start with summary_only to get counts

opensearch_search(index="container-logs-*", query_string="...", summary_only=true)

This returns only total_hits and time_range — costs ~100 tokens.

Step 2: If you need actual logs, use field filtering

opensearch_search(
  index="container-logs-*",
  query_string="...",
  fields=["@timestamp", "log", "kubernetes.namespace_name", "kubernetes.pod_name"],
  size=10
)

Only returns specified fields — saves 70-80% context vs full documents.

Step 3: For high-volume analysis, use aggregations instead of fetching docs

opensearch_aggregate(
  index="container-logs-*",
  aggs={"by_namespace": {"terms": {"field": "kubernetes.namespace_name.keyword", "size": 20}}},
  query={"bool": {"must": [...], "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]}}
)
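Every query in this guide pairs a content clause with an @timestamp range. The Python sketch below shows that pattern for building the `query` argument of opensearch_aggregate or the `query` of an opensearch_search_raw body. `time_bounded` is a hypothetical helper, not an MCP tool. Its default window mirrors the 15-minute default of `time_from`.

```python
from typing import Any


def time_bounded(clauses: list[dict[str, Any]], gte: str = "now-15m", lte: str = "now") -> dict[str, Any]:
    """Wrap content clauses in a bool query with an @timestamp range filter.

    Hypothetical helper: the range goes in `filter` (no scoring, cacheable),
    and the content clauses go in `must`.
    """
    return {
        "bool": {
            "must": list(clauses),
            "filter": [{"range": {"@timestamp": {"gte": gte, "lte": lte}}}],
        }
    }


# Same shape as the Step 3 `query` argument, limited to the last hour.
query = time_bounded([{"match": {"stream": "stderr"}}], gte="now-1h")
aggs = {"by_namespace": {"terms": {"field": "kubernetes.namespace_name.keyword", "size": 20}}}
```

Because the range always goes in `filter`, the time bound does not affect relevance scores, and OpenSearch can cache it.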

Key Parameters for opensearch_search

Parameter	Default	Purpose
`summary_only`	false	Set true to get only hit count, no documents
`auto_prune`	true	Strips kubernetes.labels and kubernetes.annotations automatically
`fields`	null	Array of specific fields to return (e.g., `["log", "@timestamp"]`)
`max_chars_per_hit`	2000	Truncates individual hits exceeding this size
`size`	100	Number of docs to return (max 1000)
`time_from`	now-15m	Start time (ISO 8601 or relative like `now-1h`)
`time_to`	now	End time

Reading _meta Flags

Every response includes a _meta.applied_operations array showing what the server did:

Flag	Meaning
`summary_only`	Only counts returned, no documents
`field_filter:field1,field2`	Only these fields were returned
`auto_prune:kubernetes.labels,kubernetes.annotations`	Verbose k8s fields were removed
`hits_truncated:N/M`	N out of M hits exceeded max_chars_per_hit and were truncated
`partial_results:100_of_50000`	Only 100 of 50000 total hits returned
`response_truncated_at_15KB`	Entire response exceeded 15KB and was cut off

When you see `response_truncated_at_15KB`:

Reduce size (e.g., size=5)
Use fields to select only needed fields
Use summary_only=true if you only need counts
Use opensearch_aggregate for analysis instead

When you see `partial_results`:

The query matched more documents than returned. If the user needs broader analysis, use aggregations.

When you see `hits_truncated`:

Individual log entries were too large. Use fields to pick only the fields you need, or increase max_chars_per_hit.
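The flag handling above can be sketched as a small lookup. This Python sketch assumes the `_meta.applied_operations` entries are plain strings formatted exactly as in the flag table. Informational flags such as `summary_only`, `field_filter:...`, and `auto_prune:...` are ignored. `suggest_next_step` is hypothetical.

```python
import re


def suggest_next_step(applied_operations: list[str]) -> list[str]:
    """Map _meta.applied_operations flags to the follow-up actions in this guide."""
    suggestions: list[str] = []
    for op in applied_operations:
        if op == "response_truncated_at_15KB":
            suggestions.append("reduce size, pass fields, use summary_only=true, or aggregate")
        elif m := re.fullmatch(r"partial_results:(\d+)_of_(\d+)", op):
            returned, total = int(m.group(1)), int(m.group(2))
            suggestions.append(f"only {returned} of {total} hits returned; aggregate for broader analysis")
        elif m := re.fullmatch(r"hits_truncated:(\d+)/(\d+)", op):
            suggestions.append(
                f"{m.group(1)} of {m.group(2)} hits truncated; narrow fields or raise max_chars_per_hit"
            )
        # summary_only, field_filter:*, auto_prune:* only describe what was done.
    return suggestions
```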

CRITICAL: Searching the `log` Field (Cluster-Specific Strategy)

The log field search strategy depends on the cluster type:

OnPrem Clusters (`dev-onprem-*`, `stg-onprem-*`, `prod-onprem-*`)

For onprem clusters, use query_string with analyze_wildcard: true and quoted wildcard patterns:

Search errors in logs (OnPrem)

opensearch_search_raw(
  index="container-logs-*",
  body={
    "query": {"bool": {"must": [
      {"query_string": {
        "query": "log:\"*level*error*\"",
        "analyze_wildcard": true,
        "time_zone": "Asia/Colombo"
      }}
    ], "filter": [
      {"range": {"@timestamp": {"gte": "now-5m", "lte": "now"}}}
    ]}},
    "size": 20,
    "_source": ["@timestamp", "log", "kubernetes.namespace_name", "kubernetes.pod_name"]
  }
)

Search by trace/request ID (OnPrem)

opensearch_search_raw(
  index="container-logs-*",
  body={
    "query": {"bool": {"must": [
      {"query_string": {
        "query": "log:\"*1d4867ac-65cb-4de8-8d46-aaef62f6b5fb*\"",
        "analyze_wildcard": true,
        "time_zone": "Asia/Colombo"
      }}
    ], "filter": [
      {"range": {"@timestamp": {"gte": "now-1h", "lte": "now"}}}
    ]}},
    "size": 100,
    "sort": [{"@timestamp": "asc"}],
    "_source": ["@timestamp", "log", "kubernetes.namespace_name", "kubernetes.pod_name", "kubernetes.container_name"]
  }
)

Key points for OnPrem:

Use query_string with analyze_wildcard: true
Wrap the pattern in double quotes: `"*pattern*"`, not `*pattern*`
Include time_zone: "Asia/Colombo" for consistency with dashboard
This approach matches the query the OpenSearch Dashboards UI generates

Cloud Clusters (AWS/Azure: `dev-aws-*`, `prod-azure-*`, `stg-azure-*`)

For cloud clusters (AWS/Azure), the log field is mapped as keyword (not analyzed text). Use wildcard queries:

Search errors in logs (Cloud)

opensearch_search_raw(
  index="container-logs-*",
  body={
    "query": {"bool": {"must": [
      {"range": {"@timestamp": {"gte": "now-5m", "lte": "now"}}},
      {"wildcard": {"log": "*level*error*"}}
    ]}},
    "size": 20,
    "_source": ["@timestamp", "log", "kubernetes.namespace_name", "kubernetes.pod_name"]
  }
)

Search by trace/request ID (Cloud)

opensearch_search_raw(
  index="container-logs-*",
  body={
    "query": {"bool": {"must": [
      {"range": {"@timestamp": {"gte": "now-1h", "lte": "now"}}},
      {"wildcard": {"log": "*77e71a17-2e52-404a-86d2-eed997fd2a57*"}}
    ]}},
    "size": 20,
    "_source": ["@timestamp", "log", "kubernetes.namespace_name", "kubernetes.pod_name"]
  }
)

Key points for Cloud:

Use wildcard query (NOT query_string)
No quotes needed around the pattern
query_string with log:*pattern* returns 0 hits on these clusters

Aggregate error logs by namespace (Works on both)

opensearch_aggregate(
  index="container-logs-*",
  query={"bool": {"must": [
    {"range": {"@timestamp": {"gte": "now-5m", "lte": "now"}}},
    {"wildcard": {"log": "*level*error*"}}
  ]}},
  aggs={"namespaces": {"terms": {"field": "kubernetes.namespace_name", "size": 10}}}
)

How to Determine Cluster Type

Check the active cluster name using opensearch_get_active_cluster:

If cluster name contains onprem → use query_string with analyze_wildcard: true
Otherwise (AWS/Azure) → use wildcard queries

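Note: query_string and opensearch_search work on all clusters for fields other than log, such as kubernetes.namespace_name, kubernetes.pod_name, and stream. The cluster's mapping determines whether a terms aggregation needs the .keyword suffix (kubernetes.namespace_name.keyword vs kubernetes.namespace_name). If an aggregation errors or returns empty buckets, check the field with opensearch_get_mappings.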
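The cluster-type rule can be sketched as a clause builder. This Python sketch applies the "name contains onprem" check from this guide and emits the OnPrem or Cloud form of the log search. It places the time range in `filter`, as the OnPrem examples do. The Cloud examples put the range in `must`, and both forms restrict by time. `log_contains` and `search_body` are hypothetical helpers.

```python
from typing import Any


def log_contains(cluster_name: str, needle: str) -> dict[str, Any]:
    """Build a clause matching `needle` anywhere in the `log` field.

    OnPrem: quoted wildcard through query_string with analyze_wildcard.
    Cloud (AWS/Azure): `log` is a keyword field, so use a wildcard query.
    """
    if "onprem" in cluster_name:
        # Escape characters that would break the quoted query_string pattern.
        escaped = needle.replace("\\", "\\\\").replace('"', '\\"')
        return {
            "query_string": {
                "query": f'log:"*{escaped}*"',
                "analyze_wildcard": True,
                "time_zone": "Asia/Colombo",
            }
        }
    return {"wildcard": {"log": f"*{needle}*"}}


def search_body(cluster_name: str, needle: str, gte: str = "now-1h", size: int = 20) -> dict[str, Any]:
    """opensearch_search_raw body: log match plus an @timestamp range filter."""
    return {
        "query": {"bool": {
            "must": [log_contains(cluster_name, needle)],
            "filter": [{"range": {"@timestamp": {"gte": gte, "lte": "now"}}}],
        }},
        "size": size,
        "_source": ["@timestamp", "log", "kubernetes.namespace_name", "kubernetes.pod_name"],
    }
```

For example, `search_body("dev-onprem-cp", "level*error", gte="now-5m")` reproduces the OnPrem error search. On a Cloud cluster name, the same call yields `{"wildcard": {"log": "*level*error*"}}` instead.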

Common Query Patterns (non-log fields — use opensearch_search)

Search by namespace

query_string: 'kubernetes.namespace_name:"my-namespace"'

Search by pod name

query_string: 'kubernetes.pod_name:"my-pod-abc123"'

Search stderr logs

query_string: 'stream:stderr'

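Combine a namespace filter with a log content search (use opensearch_search_raw). This example uses the Cloud wildcard syntax. On OnPrem clusters, replace the wildcard clause with the query_string clause from the OnPrem examples.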

opensearch_search_raw(
  index="container-logs-*",
  body={
    "query": {"bool": {"must": [
      {"range": {"@timestamp": {"gte": "now-5m", "lte": "now"}}},
      {"term": {"kubernetes.namespace_name": "my-namespace"}},
      {"wildcard": {"log": "*timeout*"}}
    ]}},
    "size": 20,
    "_source": ["@timestamp", "log", "kubernetes.pod_name"]
  }
)

Useful Fields for `fields` Parameter

Field	Description
`@timestamp`	Log timestamp
`log`	The actual log message
`stream`	stdout or stderr
`kubernetes.namespace_name`	K8s namespace
`kubernetes.pod_name`	K8s pod name
`kubernetes.container_name`	Container name
`kubernetes.host`	Node name
`kubernetes.pod_ip`	Pod IP address
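`kubernetes.labels.organization_id`	Org ID (stripped by auto_prune; set auto_prune=false to return it, or use it in aggregations)
`kubernetes.labels.env_name`	Environment name (same auto_prune caveat)
`kubernetes.labels.component_name`	Component name (same auto_prune caveat)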

Time Handling

The cluster stores timestamps in UTC
The user is in IST (UTC+5:30) — convert accordingly
IST 10:45 AM = UTC 05:15 AM
Use relative times when possible: now-5m, now-1h, now-24h
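When the user gives an absolute IST time, the conversion can be done with a fixed offset, because IST is UTC+5:30 with no daylight saving. This Python sketch formats the result as ISO 8601 UTC for `time_from`/`time_to`. The date in the usage note is only a placeholder. `ist_to_utc_iso` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset (no DST), so a fixed-offset tzinfo is enough.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def ist_to_utc_iso(year: int, month: int, day: int, hour: int, minute: int) -> str:
    """Convert a wall-clock IST time to an ISO 8601 UTC string."""
    local = datetime(year, month, day, hour, minute, tzinfo=IST)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `ist_to_utc_iso(2024, 5, 20, 10, 45)` returns `"2024-05-20T05:15:00Z"`, matching the 10:45 IST = 05:15 UTC conversion above. Early-morning IST times roll back to the previous UTC day.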

Cluster Map

IMPORTANT: When the user mentions a cluster, first read clusters.py to get the available clusters and their short names. The cluster registry is built from the .env file (URLs) and the DESCRIPTIONS dict in clusters.py.

Example Cluster Format

Users configure their cluster URLs in .env (copied from .env.example). Example entries:

CLUSTER_DEV_AWS_EU_CP=https://opensearch-dashboard.dev.example.com
CLUSTER_PROD_AWS_EU_CDP=https://opensearch-dashboard.prod.example.com

At runtime, clusters.py loads these into the CLUSTERS dict as (url, description) tuples.
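A hypothetical sketch of that loading step follows. It is not the actual clusters.py. It assumes short names are derived by stripping the `CLUSTER_` prefix, lowercasing, and replacing underscores with hyphens (so CLUSTER_DEV_AWS_EU_CP becomes dev-aws-eu-cp). The empty DESCRIPTIONS dict is a stand-in for the real one.

```python
import os
from collections.abc import Mapping

PREFIX = "CLUSTER_"

# Stand-in for the real DESCRIPTIONS dict in clusters.py.
DESCRIPTIONS: dict[str, str] = {}


def load_clusters(env: Mapping[str, str], descriptions: Mapping[str, str]) -> dict[str, tuple[str, str]]:
    """Build {short_name: (url, description)} from CLUSTER_* variables.

    Assumed naming rule: CLUSTER_DEV_AWS_EU_CP -> dev-aws-eu-cp.
    """
    clusters: dict[str, tuple[str, str]] = {}
    for key, url in env.items():
        if not key.startswith(PREFIX) or not url:
            continue  # skip unrelated or empty variables
        short_name = key[len(PREFIX):].lower().replace("_", "-")
        clusters[short_name] = (url, descriptions.get(short_name, ""))
    return clusters


CLUSTERS = load_clusters(os.environ, DESCRIPTIONS)
```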

Common aliases

When the user says any of these, map to the corresponding cluster:

"prod" / "production" → prod-azure-us-cdp
"dev" / "development" → dev-azure-us-cdp
"stg" / "staging" → stg-us-cdp
"dev onprem" → dev-onprem-cp or dev-onprem-dp (ask which)
"prod eu" → prod-eu-cdp
"dev eu" → dev-eu-cdp
"tenant-a" → prod-tenant-a-userprod
"tenant-c" / "tc" → prod-tenant-c
"tenant-d" → prod-tenant-d
"tenant-b" → prod-tenant-b
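The alias table can be sketched as a lookup that returns candidate clusters. More than one candidate means asking the user which one they mean, and an empty result means looking the name up in clusters.py. The mapping is copied from the list above. `resolve_cluster` is a hypothetical helper.

```python
ALIASES: dict[str, list[str]] = {
    "prod": ["prod-azure-us-cdp"],
    "production": ["prod-azure-us-cdp"],
    "dev": ["dev-azure-us-cdp"],
    "development": ["dev-azure-us-cdp"],
    "stg": ["stg-us-cdp"],
    "staging": ["stg-us-cdp"],
    "dev onprem": ["dev-onprem-cp", "dev-onprem-dp"],  # ambiguous: ask which
    "prod eu": ["prod-eu-cdp"],
    "dev eu": ["dev-eu-cdp"],
    "tenant-a": ["prod-tenant-a-userprod"],
    "tenant-c": ["prod-tenant-c"],
    "tc": ["prod-tenant-c"],
    "tenant-d": ["prod-tenant-d"],
    "tenant-b": ["prod-tenant-b"],
}


def resolve_cluster(user_text: str) -> list[str]:
    """Return candidate clusters for an alias (case- and whitespace-insensitive).

    []      -> unknown alias; look it up in clusters.py
    [x]     -> switch to x
    [x, y]  -> ask the user which one
    """
    key = " ".join(user_text.lower().split())
    return ALIASES.get(key, [])
```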

If the user asks to query a cluster whose description says "No OpenSearch", tell them it uses Azure Log Analytics Workspace instead and cannot be queried through this MCP server.

Switching Clusters

When the user wants to query a different cluster, use the opensearch_switch_cluster tool:

opensearch_switch_cluster(cluster="prod-azure-eu-cdp")

This automatically fetches cookies via headless SSO and switches all subsequent queries to the new cluster. No restart needed.

If the tool returns an error because the SSO session has expired, instruct the user to run:

cd /path/to/opensearch-agent/opensearch-mcp
./get-cookies.py <cluster-short-name>

This opens a browser for manual login. After login, retry — no restart needed.

Use opensearch_get_active_cluster to check which cluster is currently active before switching.

Cookie Management and 401 Handling

The MCP server has automatic cookie refresh. Here's how it works:

Auto-refresh (transparent to you)

When a request gets 401, the server automatically launches a headless Playwright browser
It uses the cached Azure AD SSO session to get fresh cookies
Saves them to cookies.json and retries the request
You (Claude) never see the 401 — it's handled internally
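The refresh-and-retry behavior above can be sketched as follows. This is a simplified Python illustration, not the server's code. The HTTP transport, the cookies.json store, and the headless Playwright refresh are replaced by injected callables, and `request_with_refresh` and `AuthExpired` are hypothetical names.

```python
from typing import Callable


class AuthExpired(Exception):
    """Raised when the headless SSO refresh cannot obtain new cookies."""


def request_with_refresh(
    send: Callable[[dict[str, str]], int],
    load_cookies: Callable[[], dict[str, str]],
    refresh_cookies: Callable[[], bool],
) -> int:
    """Send once; on 401, refresh cookies and retry exactly once."""
    status = send(load_cookies())
    if status != 401:
        return status
    # Stand-in for the headless browser reusing the cached SSO session.
    if not refresh_cookies():
        raise AuthExpired("SSO session expired; run ./get-cookies.py <cluster-short-name>")
    # Cookies are re-read from storage, so freshly saved cookies are picked up.
    return send(load_cookies())
```

Retrying only once keeps an expired SSO session from looping. A second failure surfaces as the structured error described below.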

Cluster switching also refreshes cookies

When you call opensearch_switch_cluster, it fetches fresh cookies for the target cluster via headless SSO. If that fails, the tool returns an error with manual instructions.

When auto-refresh fails

If the SSO session itself has expired (user hasn't logged in via browser recently), auto-refresh fails. The server returns a structured error with action_required and a command to run.

When you see this error, instruct the user:

The OpenSearch cookies have expired and automatic refresh failed (SSO session expired).
To fix, run:

  cd /path/to/opensearch-agent/opensearch-mcp
  ./get-cookies.py <cluster-short-name>

This opens a browser for you to log in. After login completes, cookies are
saved automatically. **No Claude Code restart needed** — just retry the query.

Available clusters: ./get-cookies.py --list

Important: After the user runs the script, you CAN retry the query immediately — no restart needed. The server reads cookies.json fresh on every request.

Cost-Conscious Query Plan

For any user request, follow this order:

Count first — summary_only=true to understand volume
Sample if large — size=5, fields=[...] to understand shape
Aggregate if analytical — use opensearch_aggregate for breakdowns
Full fetch only when needed — small result sets with field filtering
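The first three steps of this plan can be sketched as an ordered list of tool calls. This Python sketch uses only the parameters documented in this guide. It assumes the query_string targets non-log fields, since log-content searches need the cluster-specific syntax. It omits the final full fetch because that call's shape depends on what the earlier calls reveal. `query_plan` is hypothetical.

```python
from typing import Any


def query_plan(query_string: str, time_from: str = "now-1h") -> list[tuple[str, dict[str, Any]]]:
    """Ordered tool calls for the count -> sample -> aggregate steps."""
    base = {"index": "container-logs-*", "query_string": query_string, "time_from": time_from}
    return [
        # 1. Count first: hit total only (~100 tokens).
        ("opensearch_search", {**base, "summary_only": True}),
        # 2. Sample: a few docs with only the fields needed to see their shape.
        ("opensearch_search", {**base, "size": 5, "fields": ["@timestamp", "log", "kubernetes.namespace_name"]}),
        # 3. Aggregate: breakdown by namespace over the same window.
        ("opensearch_aggregate", {
            "index": "container-logs-*",
            "query": {"bool": {
                "must": [{"query_string": {"query": query_string}}],
                "filter": [{"range": {"@timestamp": {"gte": time_from, "lte": "now"}}}],
            }},
            "aggs": {"by_namespace": {"terms": {"field": "kubernetes.namespace_name.keyword", "size": 20}}},
        }),
    ]
```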