name: cx-data-pipeline
description: >
  Use this skill when the user asks to "set up parsing", "create parsing rule", "extract fields from logs", "regex extraction", "log parsing", "enrich logs", "add context to logs", "custom enrichment table", "lookup table", "geo enrichment", "create metric from logs", "events to metrics", "convert logs to metrics", "generate metrics from events", "recording rule", "precomputed metrics", "PromQL recording", "configure data pipeline", "transform log data", "data processing rules", "rule group", "enrichment settings", "E2M definition", "labels cardinality", "bulk delete rules", "enrichment limits", "search enrichment table", "what should I convert to metrics", "E2M not producing metrics", "E2M no series", "reduce log cost with E2M", "logs to metrics aggregation", "spans to metrics", or wants to configure how Coralogix processes, enriches, or transforms ingested data.
metadata:
  version: "0.1.0"
Data Pipeline Skill
Use this skill when configuring how Coralogix processes, enriches, and transforms data. It covers parsing rules (extract structured fields from raw logs), enrichments (add context from lookup tables), Events2Metrics (derive metrics from log/span events), and recording rules (precompute PromQL expressions).
CLI Commands
| Command | Subcommands | Purpose |
|---|---|---|
| `cx parsing-rules` | `list`, `get`, `create`, `update`, `delete`, `bulk-delete`, `usage-limits` | Manage log parsing rules |
| `cx enrichments` | `list`, `add`, `remove`, `overwrite`, `limit`, `settings` | Manage enrichment rules |
| `cx enrichments custom` | `list`, `get`, `create`, `update`, `delete`, `search` | Manage custom enrichment tables |
| `cx e2m` | `list`, `get`, `create`, `update`, `delete`, `labels-cardinality`, `limits` | Manage Events2Metrics definitions |
| `cx recording-rules` | `list`, `get`, `create`, `update`, `delete` | Manage Prometheus recording rule groups |
Key flags:
- All create/update operations use `--from-file <path>` (or `-` for stdin)
- All commands support `-o json` for structured output and `-p <profile>` for profile selection
- `cx parsing-rules update` and `cx recording-rules update` require both `--from-file` and the rule group ID
- `cx enrichments custom search` requires `--id <table-id>` and `--query <text>`
- `cx parsing-rules bulk-delete` requires `--ids <id1> <id2> ...`
Working with JSON Payloads
These commands use complex JSON structures. Always template from an existing resource to avoid format errors:
# 1. Get an existing resource as a template
cx parsing-rules get <rule-group-id> -o json > template.json
# 2. Modify the template (change fields, remove the ID for create operations)
# 3. Create or update
cx parsing-rules create --from-file template.json
cx parsing-rules update --from-file template.json <rule-group-id>
This pattern applies to every create/update operation in the command table above. It prevents payload format errors, which are the most common cause of failed attempts.
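Step 2 of the pattern can be scripted with `jq`. The sketch below is a minimal illustration that runs against a stand-in file rather than a real `cx parsing-rules get` response. It assumes the ID lives in a top-level `id` key, as in the list summary used later in this skill. The sample values and the name `nginx-access-copy` are hypothetical.

```shell
workdir="$(mktemp -d)"

# Stand-in for the output of `cx parsing-rules get <rule-group-id> -o json`.
# Real payloads carry more fields; these four mirror the list summary.
cat > "$workdir/template.json" <<'EOF'
{"id": "example-id", "name": "nginx-access", "enabled": true, "rules": []}
EOF

# Remove the ID (assumed to be the top-level "id" key) and rename the group.
jq 'del(.id) | .name = "nginx-access-copy"' \
  "$workdir/template.json" > "$workdir/new-rule-group.json"

# Confirm no ID is left before handing the file to `create`.
if jq -e 'has("id") | not' "$workdir/new-rule-group.json" > /dev/null; then
  new_name="$(jq -r '.name' "$workdir/new-rule-group.json")"
  echo "ready to create: $new_name"
fi
```

The resulting file is what would be passed to `cx parsing-rules create --from-file`.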
Parsing Rules Workflow
1. List Existing Rules
cx parsing-rules list -o json
cx parsing-rules list -o json | jq '[.[] | {id, name, enabled, rule_count: (.rules | length)}]'
2. Get a Template
cx parsing-rules get <existing-rule-group-id> -o json > rule-template.json
3. Create New Rule Group
Edit the template for your new service, then:
cx parsing-rules create --from-file rule-template.json
4. Verify Parsing
Query recent logs to confirm fields are extracted (load cx-telemetry-querying for log querying):
cx logs 'source logs | filter $d.subsystem == "my-service" | limit 10' -o json
5. Check Usage Limits
cx parsing-rules usage-limits -o json
Enrichment Workflow
1. List Enrichment Rules
cx enrichments list -o json
cx enrichments settings -o json
cx enrichments limit -o json
2. Create Custom Enrichment Table (if needed)
cx enrichments custom list -o json
cx enrichments custom create --from-file table-definition.json
table-definition.json must use the v5 JSON shape (inline file content, not multipart file=@...):
{
"name": "IP Lookup",
"description": "Maps IPs to locations",
"file": {
"textual": "ip,city\n1.2.3.4,London",
"extension": "csv",
"name": "lookup.csv",
"size": 24
}
}
For updates, include customEnrichmentId (number) plus the same fields.
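Hand-escaping CSV content into `textual` is error-prone, so one option is to let `jq` build the payload from a file. This is a sketch under assumptions, not a confirmed recipe. It treats `size` as the byte length of `textual`, which this skill does not state, so compare against a table returned by `cx enrichments custom get` before relying on it. It needs jq 1.6 or newer for `--rawfile` and `utf8bytelength`.

```shell
workdir="$(mktemp -d)"

# Example lookup file; replace with your own CSV.
printf 'ip,city\n1.2.3.4,London\n' > "$workdir/lookup.csv"

# --rawfile loads the CSV as a single string, so newlines are escaped by jq.
# ASSUMPTION: "size" is the byte length of "textual".
jq -n --rawfile csv "$workdir/lookup.csv" '{
  name: "IP Lookup",
  description: "Maps IPs to locations",
  file: {
    textual: $csv,
    extension: "csv",
    name: "lookup.csv",
    size: ($csv | utf8bytelength)
  }
}' > "$workdir/table-definition.json"

table_size="$(jq '.file.size' "$workdir/table-definition.json")"
echo "size: $table_size"
```

The generated file can then be passed to `cx enrichments custom create --from-file`.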
3. Add Enrichment Rules
cx enrichments add --from-file enrichment-rules.json
enrichment-rules.json must use requestEnrichments (not enrichments from list output). Each enrichmentType is an object, not a string:
{
"requestEnrichments": [
{
"fieldName": "sourceIPs",
"enrichmentType": { "geoIp": { "withAsn": true } }
}
]
}
Other types: {"aws": {"resourceType": "ec2"}}, {"suspiciousIp": {}}, {"customEnrichment": {"id": 1}}.
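The two rules above (top-level `requestEnrichments`, object-valued `enrichmentType`) are easy to check before calling `cx enrichments add`. The function below is a hypothetical local helper, not part of the `cx` CLI. It only checks the shape described here and does not validate the enrichment type names.

```shell
# Exit status 0 when the file has a non-empty requestEnrichments array whose
# entries all carry an object-valued enrichmentType; non-zero otherwise.
check_enrichment_payload() {
  jq -e '
    .requestEnrichments
    | type == "array" and length > 0
      and all(.[]; .enrichmentType | type == "object")
  ' "$1" > /dev/null 2>&1
}

workdir="$(mktemp -d)"

# Matches the shape shown above.
cat > "$workdir/good.json" <<'EOF'
{"requestEnrichments": [{"fieldName": "sourceIPs", "enrichmentType": {"geoIp": {"withAsn": true}}}]}
EOF

# Wrong top-level key and a string enrichmentType.
cat > "$workdir/bad.json" <<'EOF'
{"enrichments": [{"fieldName": "sourceIPs", "enrichmentType": "geoIp"}]}
EOF

check_enrichment_payload "$workdir/good.json" && echo "good.json: ok"
check_enrichment_payload "$workdir/bad.json" || echo "bad.json: rejected"
```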
4. Search Custom Table Data
cx enrichments custom search --id <table-id> --query "search term"
5. Verify Enriched Fields
Query logs on hot storage (FrequentSearch tier) to confirm enriched fields appear. Avoid querying archive for verification - ingestion delays can cause false negatives.
cx logs 'source logs | filter $d.enriched_field != null | limit 5' -o json
Events2Metrics Workflow
E2M derives Prometheus metrics from log/span events. See references/e2m-schemas.md for the full JSON wire format, enums, and cardinality rules.
How E2M is computed (read this first)
E2M aggregates events as they stream through the real-time ingestion pipeline into metric series (~1-min resolution). It is forward-only — metrics start from the moment the E2M is created; there is no backfill.
All ingested data flows through the pipeline; a TCO policy routes each stream into a tier, and the tier decides what's possible:
| TCO tier | Storage | E2M / alerts / dashboards |
|---|---|---|
| High | Frequent Search (hot, OpenSearch) | ✅ available |
| Medium | S3 archive (not hot storage) | ✅ available — still processed by the pipeline |
| Low | Compliance only | ❌ no aggregation features |
| Blocked | dropped | ❌ |
The axis is tier / processing level — NOT "Frequent Search vs archive" (Medium is archive and E2M works on it). Do not tell users to "point E2M at archive instead of Frequent Search" — that is incorrect.
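The tier table can be read as a simple lookup. The sketch below encodes it as a shell function for use in scripts. The name `e2m_supported` is hypothetical and not part of the `cx` CLI, and the tier names are taken from the table rather than from any API enum.

```shell
# Exit status 0 when the tier is still processed for E2M, 1 when it is not,
# 2 for an unrecognized tier name.
e2m_supported() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    high|medium) return 0 ;;
    low|blocked) return 1 ;;
    *) echo "unknown tier: $1" >&2; return 2 ;;
  esac
}

for tier in High Medium Low Blocked; do
  if e2m_supported "$tier"; then
    echo "$tier: E2M available"
  else
    echo "$tier: E2M not available"
  fi
done
```

Note that Medium returns success even though its data sits in the archive, which is the point the table makes.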
1. Design the metric
Choose logs2metrics vs spans2metrics, the source field(s) + aggregations, and labels (with cardinality in mind — see references/e2m-schemas.md). To scope the E2M to a dataset, set the optional dataSource field to "<dataspace>/<dataset>"; this requires the account feature e2m_dataset_source_enabled (otherwise the API rejects it with "dataSource is not enabled for this company"). Omit it for the standard logs/spans stream.
2. Size it: check limits & cardinality
cx e2m limits -o json # account E2M count limit + used
cx e2m labels-cardinality -o json # see caveat below
The labels-cardinality endpoint is a draft forecast — given proposed labels + query it returns the per-day distinct-permutation count over the last 7 days, so you can size a design before creating it. But cx e2m labels-cardinality currently takes no arguments, so it sends no draft and returns an empty list (a CLI gap — it can't forecast yet). Until that's wired up, forecast via the UI or estimate permutations manually (product of distinct label values) and set permutationsLimit. Never use high-cardinality fields (IDs, raw URLs, IPs) as labels. Note the forecast only sees Frequent-Search (High-tier) data.
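The manual estimate is plain multiplication. The sketch below shows the arithmetic with a hypothetical helper, `estimate_permutations`, and made-up label counts. The product is an upper bound, since not every combination of label values necessarily occurs in real data.

```shell
# Multiply the distinct-value counts of each proposed label.
estimate_permutations() {
  local total=1 count
  for count in "$@"; do
    total=$(( total * count ))
  done
  echo "$total"
}

# Hypothetical design: 5 environments x 12 services x 3 severities.
estimate_permutations 5 12 3   # prints 180
```

Adding a single label with 10,000 distinct values to that design would multiply the result to 1,800,000, which is why IDs, raw URLs, and IPs are poor label choices.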
3. Template from an existing definition
Only cx e2m get returns the full payload ({"e2m": {...}}); list prints a summary. Extract .e2m and drop read-only fields:
cx e2m get <existing-e2m-id> -o json | jq '.e2m | del(.id, .permutations, .createTime, .updateTime, .metricName)' > e2m.json
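To see what the filter leaves behind without touching the API, it can be dry-run on a stand-in payload. Everything in the sample below is hypothetical except the five deleted key names, which come from the command above. A real `cx e2m get` response carries more fields.

```shell
# Stand-in for `cx e2m get <existing-e2m-id> -o json`; values are placeholders.
payload='{"e2m": {"id": "example-id", "name": "checkout-errors", "permutationsLimit": 1000, "permutations": {}, "createTime": "2024-01-01T00:00:00Z", "updateTime": "2024-01-02T00:00:00Z", "metricName": "example"}}'

# Same filter as above, then list the keys that survive (jq sorts them).
remaining_keys="$(printf '%s' "$payload" \
  | jq -c '.e2m | del(.id, .permutations, .createTime, .updateTime, .metricName) | keys')"
echo "$remaining_keys"
```

Running the same `keys` check on a real template is a quick way to confirm that no read-only field is left before editing it.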
4. Create the E2M
cx e2m create --from-file e2m.json
5. Verify the metric
Confirm series are being produced (load cx-telemetry-querying for metrics querying):
cx metrics search --name "<targetBaseMetricName>"
cx metrics query "<target_metric_name>" --time now
Troubleshooting: E2M produces no metric series
- Check the source data's TCO tier: if it's routed to Low/compliance (or blocked), E2M cannot run. Fix with a TCO change (`cx tco list` / `cx-cost-optimization`), not an E2M change.
- Verify the query matches streaming data: run the E2M's `lucene` filter as a live `cx logs` / `cx spans` query and confirm it returns recent results. Note `cx logs` queries Frequent-Search (High-tier) by default; for a Medium-tier (archive) source add `--tier archive`, since the data won't appear in a default Frequent-Search query even though E2M still produces series.
- Remember it's forward-only: no series exist for data ingested before the E2M was created.
Cost optimization: convert High-tier logs to metrics
When the aggregated/metric view is what the customer most cares about, convert High-tier logs → metrics, then downgrade the raw logs High → Medium. Medium still supports E2M/alerts/dashboards and costs less (S3 archive, no hot storage) — you keep cheap, detailed metrics while dropping expensive Frequent-Search retention.
- Find high-volume High-tier sources: `cx usage summary` / `cx tco list` (see `cx-cost-optimization`).
- Confirm which fields drive dashboards/alerts (see `cx-telemetry-querying`).
- Build + verify the E2M first (steps above).
- Then change the TCO policy to move the raw logs High → Medium. Keep data on High or Medium (both support E2M); do not drop it to Low/compliance if metrics or alerts are still needed.
Recording Rules Workflow
1. List Existing Recording Rules
cx recording-rules list -o json
cx recording-rules list -o json | jq '[.[] | {id, name, rules: [.rules[]?.record]}]'
2. Get a Template
cx recording-rules get <existing-id> -o json > recording-rule-template.json
3. Create Recording Rule Group
cx recording-rules create --from-file recording-rule-group.json
4. Verify with PromQL
Confirm the precomputed metric is available (load cx-telemetry-querying for metrics querying):
cx metrics query "new_precomputed_metric" --time now
Key Principles
- Always template from existing - `cx <command> get <id> -o json > template.json` before any create
- Verify after create - query logs/metrics to confirm the pipeline change took effect
- Use `-o json` - all payload inspection and creation should use JSON output
- Check limits first - `cx parsing-rules usage-limits` and `cx e2m limits` before creating to avoid hitting caps
- Bulk operations - use `cx parsing-rules bulk-delete --ids` for cleanup, not individual deletes
Additional Resources
Reference Files
- `references/e2m-schemas.md` - Complete Events2Metrics JSON wire format: `type`/`aggType` enum values, `logsQuery`/`spansQuery` filters, metric labels & fields, the TCO-tier compute model, cardinality/permutations sizing, and gotchas
Related Skills
- `cx-telemetry-querying` - discover what data is available before configuring pipeline, and verify parsing results, enriched fields, and E2M metric series via log/metrics queries
- `cx-cost-optimization` - find high-volume High-tier sources worth converting to metrics, and move the raw logs High→Medium (TCO) after the E2M is verified