elasticsearch-esql - SKILL.md Agent Skill

name: elasticsearch-esql description: > Execute ES|QL (Elasticsearch Query Language) queries, use when the user wants to query Elasticsearch data, analyze logs, aggregate metrics, explore data, or create charts and dashboards from ES|QL results. metadata: author: elastic version: 0.1.1

Elasticsearch ES|QL

Execute ES|QL queries against Elasticsearch.

What is ES|QL?

ES|QL (Elasticsearch Query Language) is a piped query language for Elasticsearch. It is NOT the same as:

Elasticsearch Query DSL (JSON-based)
SQL
EQL (Event Query Language)

Prerequisite: ES|QL requires _source to be enabled on queried indices. Indices with _source disabled (e.g., "_source": { "enabled": false }) will cause ES|QL queries to fail.

Version Compatibility: ES|QL was introduced in 8.11 (tech preview) and became GA in 8.14. Features like LOOKUP JOIN (8.18+), MATCH (8.17+), and INLINE STATS (9.2+) were added in later versions. On pre-8.18 clusters, use ENRICH as a fallback for LOOKUP JOIN (see generation tips). INLINE STATS and counter-field RATE() have no fallback before 9.2. Check references/esql-version-history.md for feature availability by version.

Cluster Detection: Use the GET / response to determine the cluster type and version:

build_flavor: "serverless" — Elastic Cloud Serverless. Reports version 8.11.0 but supports all latest ES|QL features. Ignore the version number and assume all GA and preview features are available.

build_flavor: "default" — Self-managed or Elastic Cloud Hosted. Use version.number for feature availability.

Snapshot builds have version.number like 9.4.0-SNAPSHOT. Strip the -SNAPSHOT suffix and use the major.minor for version checks. Snapshot builds include all features from that version plus potentially unreleased features from development — if a query fails with an unknown function/command, it may simply not have landed yet. Elastic employees commonly use snapshot builds for testing.

Environment Configuration

Elasticsearch connection is configured via environment variables. Run node scripts/esql.js test to verify the connection. If the test fails, suggest these setup options to the user, then stop. Do not try to explore further until a successful connection test.

Option 1: Elastic Cloud (recommended for production)

export ELASTICSEARCH_CLOUD_ID="deployment-name:base64encodedcloudid"
export ELASTICSEARCH_API_KEY="base64encodedapikey"

Option 2: Direct URL with API Key

export ELASTICSEARCH_URL="https://elasticsearch:9200"
export ELASTICSEARCH_API_KEY="base64encodedapikey"

Option 3: Basic Authentication

export ELASTICSEARCH_URL="https://elasticsearch:9200"
export ELASTICSEARCH_USERNAME="elastic"
export ELASTICSEARCH_PASSWORD="changeme"

Option 4: Local Development with start-local

For local development and testing, use start-local to quickly spin up Elasticsearch and Kibana using Docker or Podman:

curl -fsSL https://elastic.co/start-local | sh

After installation completes, Elasticsearch runs at http://localhost:9200 and Kibana at http://localhost:5601. The script generates a random password for the elastic user and an API key, both stored in the .env file inside the created elastic-start-local folder.

To configure the environment variables for this skill, source the .env file and export the connection settings:

source elastic-start-local/.env
export ELASTICSEARCH_URL="$ES_LOCAL_URL"
export ELASTICSEARCH_API_KEY="$ES_LOCAL_API_KEY"

Then run node scripts/esql.js test to verify the connection.

Optional: Skip TLS verification (development only)

export ELASTICSEARCH_INSECURE="true"

Usage

Get Index Information (for schema discovery)

node scripts/esql.js indices                    # List all indices
node scripts/esql.js indices "logs-*"           # List matching indices
node scripts/esql.js schema "logs-2024.01.01"   # Get field mappings for an index
node scripts/esql.js sample "filebeat-*"        # Show recent docs with populated fields
node scripts/esql.js sample "filebeat-*" 'kubernetes.namespace == "production"'  # With filter

Execute Raw ES|QL

node scripts/esql.js raw "FROM logs-* | STATS count = COUNT(*) BY host.name | SORT count DESC | LIMIT 5"

Execute with TSV Output

node scripts/esql.js raw "FROM logs-* | STATS count = COUNT(*) BY component | SORT count DESC" --tsv

TSV Output Options:

--tsv or -t: Output as tab-separated values (clean, no decorations)
--no-header: Omit the header row

Test Connection

node scripts/esql.js test

Guidelines

Detect deployment type: Always run node scripts/esql.js test first. This detects whether the deployment is a Serverless project (all features available) or a versioned cluster (features depend on version). The build_flavor field from GET / is the authoritative signal — if it equals "serverless", ignore the reported version number and use all ES|QL features freely.
Ask about environment: If the user asks about logs for a specific service but doesn't specify the environment (e.g., production, staging), ask which environment they mean before querying. Different environments are typically separated by kubernetes.namespace (e.g., production, staging).
Discover schema (required — never guess index or field names):
```
node scripts/esql.js indices "pattern*"
node scripts/esql.js schema "index-name"
```
Always run schema discovery before generating queries. Index names and field names vary across deployments and cannot be reliably guessed. Even common-sounding data (e.g., "logs") may live in indices named logs-test, logs-app-*, or application_logs. Field names may use ECS dotted notation (source.ip, service.name) or flat custom names — the only way to know is to check.
Discover populated fields with sample: Different applications store log content in different fields. The message field may be null if logs are structured JSON — in that case the content is in json.message, json.level, json.logger_name, json.stack_trace, etc. Always run sample first to see which fields have data:
```
node scripts/esql.js sample "filebeat-*" 'kubernetes.namespace == "production" AND kubernetes.pod.name LIKE "myapp*"'
```
This shows recent docs with only the populated fields, so you immediately know which fields to use in queries.

Common Kubernetes log fields:
- @timestamp — log timestamp
- message — log message body (may be null for structured JSON logs)
- json.message — log message in structured JSON logs (Spring, Logback, etc.)
- json.level — structured log level (e.g., INFO, ERROR, WARN)
- json.logger_name — logger class name
- json.stack_trace — exception stack trace
- kubernetes.namespace — k8s namespace (e.g., production, staging)
- kubernetes.pod.name — pod name (contains service identity, e.g., pandora-spring-xxx)
- kubernetes.container.name — container name (generic, e.g., spring, celery, redis)
- kubernetes.deployment.name — deployment name
- kubernetes.labels.app — app label
Important: The service identity is usually in kubernetes.pod.name (e.g., pandora-spring-*), not kubernetes.container.name (which is generic like spring). Use pod name with LIKE for service-specific queries.

Always filter by time first: Unfiltered queries on large indices (hundreds of millions of docs) will time out, especially with LIKE or RLIKE on the message field. Always add a @timestamp filter as the first WHERE condition to narrow the scan range before applying text filters.

// GOOD: time filter first, then text filter
FROM filebeat-* | WHERE @timestamp > NOW() - 2 hours AND kubernetes.namespace == "production" AND message LIKE "*ERROR*"

// BAD: text filter on entire index — will time out
FROM filebeat-* | WHERE message LIKE "*ERROR*"

Choose the right ES|QL feature for the task: Before writing queries, match the user's intent to the most appropriate ES|QL feature. Prefer a single advanced query over multiple basic ones.
- "find patterns," "categorize," "group similar messages" → CATEGORIZE(field)
- "spike," "dip," "anomaly," "when did X change" → CHANGE_POINT value ON key
- "trend over time," "time series" → STATS ... BY BUCKET(@timestamp, interval) or TS for TSDB
- "search," "find documents matching" → MATCH, QSTR, KQL
- "count," "average," "breakdown" → STATS with aggregation functions
Read the references before generating queries:
- Generation Tips - key patterns (TS/TBUCKET/RATE, per-agg WHERE, LOOKUP JOIN, CIDR_MATCH), common templates, and ambiguity handling
- Time Series Queries - read before any TS query: inner/outer aggregation model, TBUCKET syntax, RATE constraints
- ES|QL Complete Reference - full syntax for all commands and functions
- ES|QL Search Reference — for full-text search queries (MATCH, QSTR, KQL, scoring)
Generate the query following ES|QL syntax:
- Start with FROM index-pattern (or TS index-pattern for time series indices)
- Add WHERE for filtering (use TRANGE for time ranges on 9.3+)
- Use EVAL for computed fields
- Use STATS ... BY for aggregations
- For time series metrics: TS with SUM(RATE(...)) for counters, AVG(...) for gauges, and TBUCKET(interval) for time bucketing — see the TS section in Generation Tips for the three critical syntax rules
- For detecting spikes, dips, or anomalies, use CHANGE_POINT after time-bucketed aggregation
- Add SORT and LIMIT as needed

Execute with TSV flag:

node scripts/esql.js raw "FROM index | STATS count = COUNT(*) BY field" --tsv

ES|QL Quick Reference

Version availability: This section omits version annotations for readability. Check ES|QL Version History for feature availability by Elasticsearch version.

Basic Structure

FROM index-pattern
| WHERE condition
| EVAL new_field = expression
| STATS aggregation BY grouping
| SORT field DESC
| LIMIT n

Common Patterns

Filter and limit:

FROM logs-*
| WHERE @timestamp > NOW() - 24 hours AND level == "error"
| SORT @timestamp DESC
| LIMIT 100

Aggregate by time:

FROM metrics-*
| WHERE @timestamp > NOW() - 7 days
| STATS avg_cpu = AVG(cpu.percent) BY bucket = DATE_TRUNC(1 hour, @timestamp)
| SORT bucket DESC

Top N with count:

FROM web-logs
| STATS count = COUNT(*) BY response.status_code
| SORT count DESC
| LIMIT 10

Text search: Use MATCH, QSTR, or KQL for full-text search instead of LIKE/RLIKE — they are significantly faster and support relevance scoring. See ES|QL Search Reference for the full guide.

FROM documents METADATA _score
| WHERE MATCH(content, "search terms")
| SORT _score DESC
| LIMIT 20

Log categorization (Platinum license): Use CATEGORIZE to auto-cluster log messages into pattern groups. Prefer this over running multiple STATS ... BY field queries when exploring or finding patterns in unstructured text.

FROM logs-*
| WHERE @timestamp > NOW() - 24 hours
| STATS count = COUNT(*) BY category = CATEGORIZE(message)
| SORT count DESC
| LIMIT 20

Change point detection (Platinum license): Use CHANGE_POINT to detect spikes, dips, and trend shifts in a metric series. Prefer this over manual inspection of time-bucketed counts.

FROM logs-*
| STATS c = COUNT(*) BY t = BUCKET(@timestamp, 30 seconds)
| SORT t
| CHANGE_POINT c ON t
| WHERE type IS NOT NULL

Time series metrics:

// Counter metric: SUM(RATE(...)) with TBUCKET(duration)
TS metrics-tsds
| WHERE TRANGE(1 hour)
| STATS SUM(RATE(requests)) BY TBUCKET(1 hour), host

// Gauge metric: AVG(...) — no RATE needed
TS metrics-tsds
| STATS avg_cpu = AVG(cpu) BY service.name, bucket = TBUCKET(5 minutes)
| SORT bucket

Data enrichment with LOOKUP JOIN: Use RENAME when the join key has a different name in the source vs the lookup index. Use multiple fields in ON when the lookup table has a composite key.

// Field name mismatch — RENAME before joining
FROM support_tickets
| RENAME product AS product_name
| LOOKUP JOIN knowledge_base ON product_name

// Multi-field join (9.2+)
FROM application_logs
| LOOKUP JOIN service_registry ON service_name, environment
| KEEP service_name, environment, owner_team

Multivalue field filtering: Use MV_CONTAINS to check if a multivalue field contains a specific value. Use MV_COUNT to count values.

// Filter by multivalue membership
FROM employees
| WHERE MV_CONTAINS(languages, "Python")

// Find entries matching multiple values
FROM employees
| WHERE MV_CONTAINS(languages, "Java") AND MV_CONTAINS(languages, "Python")

// Count multivalue entries
FROM employees
| EVAL num_languages = MV_COUNT(languages)
| SORT num_languages DESC

Change point detection (alternate example): Use when the user asks about spikes, dips, or anomalies. Requires time-bucketed aggregation, SORT, then CHANGE_POINT.

FROM logs-*
| STATS error_count = COUNT(*) BY bucket = DATE_TRUNC(1 hour, @timestamp)
| SORT bucket
| CHANGE_POINT error_count ON bucket AS type, pvalue

Full Reference

For complete ES|QL syntax including all commands, functions, and operators, read:

ES|QL Complete Reference
ES|QL Search Reference - Full-text search: MATCH, QSTR, KQL, MATCH_PHRASE, scoring, semantic search
ES|QL Version History - Feature availability by Elasticsearch version
Query Patterns - Natural language to ES|QL translation
Generation Tips - Best practices for query generation
Time Series Queries - TS command, time series aggregation functions, TBUCKET
DSL to ES|QL Migration - Convert Query DSL to ES|QL

Error Handling

When query execution fails, the script returns:

The generated ES|QL query
The error message from Elasticsearch
Suggestions for common issues

Common issues:

Field doesn't exist → Always use get_schema and list_indices before writing a query. Never guess field or index names — they vary across deployments.
Type mismatch → Use type conversion functions (TO_STRING, TO_INTEGER, etc.)
Syntax error → Review ES|QL reference for correct syntax. Always use double quotes for strings, never single quotes.
No results → Check time range and filter conditions
Wrong function name → ES|QL uses underscored names: STD_DEV() not STDDEV(), MEDIAN_ABSOLUTE_DEVIATION() not MAD(). Use CONCAT() for strings, not +. Use CASE(cond, val, ...) not CASE WHEN...THEN...END.
Wrong date part → DATE_EXTRACT uses ES|QL part names: "hour_of_day" not "hour", "day_of_month" not "day", "month_of_year" not "month". Use DATE_DIFF("day", start, end) for date arithmetic, not subtraction.

Examples

# Schema discovery
node scripts/esql.js test
node scripts/esql.js indices "logs-*"
node scripts/esql.js schema "logs-2024.01.01"
node scripts/esql.js sample "filebeat-*" 'kubernetes.namespace == "production"'

# Execute queries
node scripts/esql.js raw "FROM logs-* | STATS count = COUNT(*) BY host.name | LIMIT 10"
node scripts/esql.js raw "FROM metrics-* | STATS avg = AVG(cpu.percent) BY hour = DATE_TRUNC(1 hour, @timestamp)" --tsv