name: logql-observability description: "Queries workload logs with LogQL on Control Plane. Use to troubleshoot a workload from its logs, or for log search, access or egress logs, cron run logs, missing or truncated logs, retention, or logs in Grafana."
LogQL & Log Observability
Tool availability: some MCP tools named here live in the
fulltoolset profile — if one is not advertised on this connection, tell the user to reconnect the MCP server with?toolsets=full(or use thecplnCLI fallback). Reads and deletes work on every profile via the genericlist_resources/get_resource/delete_resourcetools.
Control Plane stores workload stdout/stderr in Loki and queries it with LogQL. The org is the Loki tenant — it comes from the endpoint path, so org is never a query label and queries cannot cross orgs. Reading logs requires the org-level readLogs permission, and the in-pod CPLN_TOKEN cannot authenticate to the logs endpoint — use a user or service-account token (see the workload skill). The recurring agent failure is passing a raw query to the MCP tool alongside structured params: a raw query replaces them entirely (the tool rejects the combination), so a raw query must embed every label itself.
Two ways to query
- MCP (primary for agents):
mcp__cpln__get_workload_logs— structured paramsgvc(required),workload,container,location,filter(literal substring, LogQL|=, not regex); windowsince(default1h) orfrom/to(ISO 8601,frominclusive,toexclusive);limit(default 30, max 999, single request —truncated: truemeans narrow the window or filter harder);order(oldest_firstdefault, ornewest_first). For regex, parsers, or thereplica/stream/versionlabels, pass a rawquery(max 500 chars) instead of the structured selectors. - CLI:
cpln logs '<LOGQL>'— interactive debugging (live--tail), scripts, CI.
# Defaults: --since 1h, --limit 30, --direction forward
cpln logs '{gvc="GVC", workload="WORKLOAD"}' --org ORG
cpln logs '{gvc="GVC", workload="WORKLOAD"} |= "error"' --limit 100
cpln logs '{gvc="GVC", workload="WORKLOAD", container="main"}' --since 7d
cpln logs '{gvc="GVC", workload="WORKLOAD"}' --tail # live follow; the server ends a tail session after 30m
cpln logs '{gvc="GVC", workload="WORKLOAD"}' \
--from 2026-06-01T00:00:00Z --to 2026-06-02T00:00:00Z # ISO 8601 only; from inclusive, to exclusive
cpln logs '{gvc="GVC", workload="WORKLOAD"} |= "error"' --since 24h --limit 0 # 0 = unlimited, auto-paginates
cpln logs '{gvc="GVC", workload="WORKLOAD"}' -o jsonl # one JSON object per line; -o raw = bare lines
--sincetakes relative durations (ms s m h d w mo y, compound like1h30m).--from/--totake ISO 8601 timestamps only —--from now-2dand--from 7dfail withCannot parse ... into a valid date.cpln workload eventlog WORKLOAD(aliascpln workload log) is resource event history, not container output — for application logs always usecpln logs.
Labels
| Label | Value |
|---|---|
gvc |
GVC name |
workload |
Workload name |
container |
Container name, or a built-in stream below |
location |
Deployment location, e.g. aws-us-east-1 |
provider |
Cloud provider |
replica |
Replica (pod) name — unique per cron execution |
stream |
stdout or stderr |
version |
Workload deployment version that wrote the line |
At least one non-empty matcher is required; regex matchers work — {gvc=~".+"} spans every GVC in the org.
Filters and LogQL features
| Operator | Meaning | Example |
|---|---|---|
|= "text" |
contains | |= "error" |
!= "text" |
does not contain | != "health" |
|~ "regex" |
matches regex | |~ "timeout|crash" |
!~ "regex" |
does not match | !~ "debug|trace" |
Loki is current (3.x), so full LogQL works: parsers (| json, | logfmt, | pattern, | regexp), post-parse label filters (| latency > 100), line_format, and metric queries (count_over_time, rate, sum ... by). The CLI and MCP tool print log lines only — run metric queries in Grafana: the Explore on Grafana link on the console Logs page opens it with the query prefilled (the org grafanaAdmin permission grants the Grafana Admin role; everyone else is Viewer).
{gvc="GVC", workload="WORKLOAD"} |= "error" != "health" # errors minus noise
{gvc="GVC", workload="WORKLOAD"} |~ "panic|fatal|exception" # crashes and stack traces
{gvc="GVC", workload="WORKLOAD", container="_accesslog"} |= "\" 50" # HTTP 5xx in access logs
sum(count_over_time({gvc="GVC", container="_accesslog"} |= "\" 50"[1m])) by (workload) # 5xx rate (Grafana)
Built-in log streams
| Selector | Contents |
|---|---|
container="_accesslog" |
Inbound requests (Envoy access-log format) on the workload's ports |
container="_requestlog" |
The workload's outbound (egress) requests through the sidecar, same format |
workload="_loadbalancer" |
Access logs of the GVC's dedicated load balancer |
container="_alerts" |
Threat-detection (Falco) alerts, with extra labels rule, priority, source |
Platform health probes and unroutable-request noise are filtered out of _accesslog by design, so probe traffic never shows up there. Sidecar and system containers (istio-init, istio-validation, cpln-*, debugger-*) are never collected.
Why logs go missing (pipeline limits)
- Lines over 16 KiB are cut at 16 KiB with a
... [truncated N bytes]suffix; empty lines are dropped. - Per-replica rate limit: each replica+container pair is capped (roughly 10,000 lines/s on managed clusters; effectively unlimited on BYOK). Excess lines are dropped, and when collection resumes one marker line appears:
#### Replica logs were rate-limited: N lines in the last Xs were not collected ####. - Retention: org spec
observability.logsRetentionDays, default 30 (0 turns log collection off entirely; theorg-managementskill owns org spec edits). Queries beyond retention return nothing, not an error. - Server caps: one query may span at most 31 days and times out after 2 minutes; tail sessions end after 30 minutes.
Cron workloads: logs for one execution
{gvc=, workload=} on a cron workload interleaves every past run. Each execution runs in its own replica, so scope to one run with the replica label plus the execution's time window.
1. List executions. mcp__cpln__list_deployments with location returns the full deployment JSON; status.jobExecutions[] holds the per-execution metadata (without location the tool returns only a readiness summary). CLI:
cpln workload get-deployments WORKLOAD --gvc GVC -o json | jq '
[(.items // .)[] as $d | $d.status.jobExecutions[]? | . + {location: $d.name}]
| sort_by(.startTime)[] | {location, name, status, startTime, completionTime, replica}'
Schema facts (nodelibs cronjob.ts) that bite scripts:
statusis one ofsuccessful | failed | active | pending | invalid | removed(defaultpending). Running meansstatus == "active"AND nocompletionTime; a missingcompletionTimealone proves nothing (the schema warns it is not an indication of success).completionTime: nullis stripped server-side andstartTimemay be absent for never-started runs — each field is present-with-value or absent, so guard flags with${VAR:+--from "$VAR"}and jq// empty.replicais optional: when missing, the run never got a pod and there are no logs — diagnose with the execution'scontainersmap andmessagefield (aggregated pod events) andmcp__cpln__get_workload_events.
2. Query that replica, time-bounded. MCP: raw query embedding ALL labels (structured params must be omitted) plus from/to. CLI:
cpln logs '{gvc="GVC", workload="WORKLOAD", location="LOCATION", replica="REPLICA-ID"}' \
--org ORG --from START_TIME --to COMPLETION_TIME
jobExecutions timestamps are ISO 8601 and pass straight through. Pad the window a minute or two on each side, and widen it before concluding logs do not exist. For a live run (active, no completionTime), drop --to and add --tail.
Quick reference
| Tool | Use |
|---|---|
mcp__cpln__get_workload_logs |
LogQL queries — structured params or raw query |
mcp__cpln__list_deployments |
Deployment health; cron status.jobExecutions (pass location) |
mcp__cpln__get_workload_events |
Probe failures, scheduling, restarts — events, not app logs |
CI/CD and headless use: set CPLN_TOKEN and run cpln logs directly (no profile needed); the principal must hold org readLogs.
Troubleshooting
| Symptom | Cause and fix |
|---|---|
403 requires permission "readLogs" in org |
Grant readLogs on the org via policy (it implies view) |
Cannot parse now-2d into a valid date |
--from/--to are ISO 8601 only; relative lookback is --since |
| Empty result for known activity | Window outside retention (default 30d), span over 31 days, or filters too narrow; app may not write to stdout/stderr |
#### Replica logs were rate-limited ... #### marker |
Per-replica cap was hit; lines in that interval are gone — reduce log volume |
Line ends with ... [truncated N bytes] |
16 KiB per-line cap — emit smaller lines |
Health checks absent from _accesslog |
Filtered by design; probe failures surface in mcp__cpln__get_workload_events |
MCP rejects raw query combined with workload etc. |
A raw query replaces the structured params — embed all labels in the query itself |
cpln workload log shows no app output |
That is the eventlog alias; use cpln logs |
Related skills
| Skill | Owns |
|---|---|
workload |
Deploy and diagnose flow, injected CPLN_* env vars, canonical URLs |
metrics-observability |
PromQL, default metrics, Grafana alert rules, Prometheus federation |
external-logging |
Shipping logs to S3, Datadog, Coralogix, and other providers |
mk8s-byok |
mk8s cluster logs add-on (cluster_name / namespace labels) |