name: dynatrace-dashboard
description: Create and update Dynatrace dashboards for DSOA telemetry
license: MIT
compatibility: opencode
metadata:
  audience: developers
Skill: Dynatrace Dashboard Creation and Deployment
Use this skill to create, update, convert, and deploy Dynatrace dashboards for DSOA telemetry visualisation.
File Locations
| Artefact | Path |
|---|---|
| Dashboard YAML source | docs/dashboards/<dashboard-name>/<dashboard-name>.yml |
| Dashboard readme | docs/dashboards/<dashboard-name>/readme.md |
| Screenshot placeholder | docs/dashboards/<dashboard-name>/img/.gitkeep |
| Dashboards index | docs/dashboards/README.md |
| Workflow YAML source | docs/workflows/<workflow-name>/<workflow-name>.yml |
| Workflow readme | docs/workflows/<workflow-name>/readme.md |
Dashboard names use a descriptive slug, not necessarily the plugin name,
since dashboards may span multiple plugins (e.g. snowpipes-monitoring,
tasks-pipelines, budgets-finops).
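The layout in the table above can be captured in a small helper, for example to check that a new dashboard folder is complete. This is a minimal sketch: `dashboard_paths` is a hypothetical name and is not part of the repository's tooling.

```python
from pathlib import PurePosixPath


def dashboard_paths(name: str) -> dict[str, str]:
    """Return the expected repository paths for a dashboard slug (hypothetical helper)."""
    base = PurePosixPath("docs/dashboards") / name
    return {
        "yaml": str(base / f"{name}.yml"),
        "readme": str(base / "readme.md"),
        "screenshots": str(base / "img" / ".gitkeep"),
        "index": "docs/dashboards/README.md",
    }


print(dashboard_paths("snowpipes-monitoring")["yaml"])
```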
Metric / Attribute Reference
Before writing any DQL query, consult the plugin's semantic dictionary:
src/dtagent/plugins/<plugin-name>.config/instruments-def.yml
This is the authoritative source for:
- Metric keys (e.g. snowflake.pipe.files.pending)
- Dimensions (e.g. snowflake.pipe.name, db.namespace)
- Log/event attributes
- Telemetry types (metrics, logs, events, bizevents, spans)
All DSOA telemetry carries these standard dimensions on every record:
- db.system == "snowflake"
- deployment.environment: Snowflake account identifier
- dsoa.run.plugin: plugin name (e.g. "snowpipes")
- dsoa.run.context: context name (e.g. "snowpipes_copy_history")
DQL Rules (Lessons Learned)
These rules come from real debugging sessions — follow them strictly:
Determine the actual telemetry type before writing any DQL. DSOA plugins can emit logs, events, metrics, or bizevents, and the plugin source is the only authoritative answer. Before writing a tile query, check the plugin's _log_entries() call in src/dtagent/plugins/<plugin>.py:
- No report_timestamp_events=True and no event_payload_prepare → logs only → use fetch logs
- report_timestamp_events=True → timestamp events in addition to logs → use fetch events for event tiles
- report_all_as_events=True → all rows as events → use fetch events
- Metrics in instruments-def.yml with a metric key → use timeseries
Known emission types (verified):
- tasks plugin (task_history, task_versions, serverless_tasks contexts): logs + metrics
- dynamic_tables plugin (dynamic_tables, dynamic_table_refresh_history, dynamic_table_graph_history): logs + metrics
- snowpipes plugin: logs + events (timestamp events via report_timestamp_events=True)
- shares plugin (inbound_shares, outbound_shares contexts): logs only, so use fetch logs; the shares context uses report_timestamp_events=True so those summary events go to fetch events, but per-share/per-grant detail rows are logs
- users plugin (users, users_all_roles, users_all_privileges, users_direct_roles, users_removed_direct_roles contexts): logs only in practice. Although instruments-def.yml has event_timestamps, the EVENT_TIMESTAMPS contain stale dates (e.g. last_altered from 2021), causing Dynatrace to silently drop events. Always use fetch logs. See rules 15-17.
- resource_monitors plugin: logs + events + metrics (timeseries for credits metrics)
- query_history plugin: logs + metrics (fetch logs for detail rows, timeseries for execution time metrics)
- active_queries plugin: logs only; it reads INFORMATION_SCHEMA in real time
- data_volume plugin: metrics + timestamp events (timeseries for storage/row metrics; timestamp events for table update/DDL dates). Use timeseries-based variables (rule 22); fetch events works for DDL timestamp events once events is in the telemetry config
- Default assumption for any new plugin context: logs only, unless instruments-def.yml or the _log_entries() call explicitly shows report_timestamp_events=True or report_all_as_events=True
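The flag checks listed above can be written down as a small decision function. This is an illustrative sketch only: `dql_sources` and its parameter names are hypothetical, it is not DSOA code, and it deliberately does not cover plugins that set event_payload_prepare, since the text above does not say which command those require.

```python
def dql_sources(report_timestamp_events: bool = False,
                report_all_as_events: bool = False,
                has_metric_keys: bool = False) -> list[str]:
    """Map _log_entries() flags to the DQL entry points a tile may use."""
    sources = []
    if report_all_as_events:
        # All rows are sent as events
        sources.append("fetch events")
    else:
        sources.append("fetch logs")
        if report_timestamp_events:
            # Timestamp events are emitted in addition to logs
            sources.append("fetch events")
    if has_metric_keys:
        # Metric keys defined in instruments-def.yml
        sources.append("timeseries")
    return sources


print(dql_sources(report_timestamp_events=True))
```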
No fetch metrics for DSOA data. DSOA does not use the standard Dynatrace metric ingestion pipeline. Use timeseries with a metric key from instruments-def.yml, or fetch logs for log-based aggregations.
Prefer filter: {} over post-pipe | filter for timeseries dimension filtering. Inline filters inside filter: {} are applied before data is split by by:, so you avoid creating unnecessary series for dimensions you then immediately discard. Only dimensions used for display grouping should appear in by:. Post-pipe | filter is still required for dimensions that are in by: but not filterable inline (e.g. computed fields), but avoid it for raw dimension variables.
For variable-driven dimension filters inside filter: {}, use in() with array($Var). This works inside the filter: block as of DQL 1.38:
timeseries v = sum(metric), by: { snowflake.task.name, deployment.environment }
, filter: {
db.system == "snowflake" and
in(deployment.environment, array($Accounts)) and
in(db.namespace, array($Database))
}
Null-or-match pattern for optional dimensions: Some records legitimately have
NULL for a dimension (e.g. Snowflake-internal serverless tasks have no
db.namespace). If you filter strictly with in(), those records are silently
dropped and cannot be seen even with the wildcard default. Use:
(isNull(db.namespace) or in(db.namespace, array($Database))) and
(isNull(snowflake.schema.name) or in(snowflake.schema.name, array($Schema)))
This preserves unattributed records when the variable is set to wildcard (*),
while still allowing the user to filter to a specific database/schema.
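The difference between the strict and the null-or-match filter can be illustrated outside DQL. The following is a plain-Python simulation of the two predicates, not Dynatrace code; the record values and function names are made up for the example.

```python
def strict_match(value, selected):
    """Mimics in(dim, array($Var)): a missing dimension never matches."""
    return value is not None and value in selected


def null_or_match(value, selected):
    """Mimics (isNull(dim) or in(dim, array($Var)))."""
    return value is None or value in selected


records = [
    {"task": "LOAD_ORDERS", "db.namespace": "SALES"},
    {"task": "SERVERLESS_MAINT", "db.namespace": None},  # no database attribution
]
selected = {"SALES"}

strict = [r["task"] for r in records if strict_match(r["db.namespace"], selected)]
relaxed = [r["task"] for r in records if null_or_match(r["db.namespace"], selected)]
print(strict)   # only the attributed record survives
print(relaxed)  # the unattributed record is kept as well
```

Note that under this pattern the unattributed record also passes when a specific database is selected, because the isNull() branch is true regardless of the selection.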
percentile() does not support iterative expressions from timeseries. Instead of:
timeseries v = avg(metric), by: { dim }
| summarize { p95 = percentile(v[], 95) }
Use fetch logs with percentile() directly, or use summarize with
avg() / max() which do support array aggregation.
Honeycomb tiles need scalar values, not timeseries arrays. Use fetch logs | summarize ... by: { dim }, not timeseries.
Variable filters after timeseries must use the array() wrapper:
timeseries v = sum(metric), by: { snowflake.pipe.name, deployment.environment }
| filter in(deployment.environment, array($Accounts))
| filter in(snowflake.pipe.name, array($Pipe))
$Variable in threshold expressions needs toDouble():
| filter value > toDouble($Threshold_Latency_Warning)
Pipe/task/table status is a string dimension, not a numeric metric. Query it from logs via fetch logs | summarize, not from a metric series.
dtctl auth login can be run by the AI agent: it opens a browser tab for OAuth. Run it whenever dtctl apply returns a token/auth error.
All dashboard tiles must apply the same global variable filters consistently. If a dashboard has $Accounts, $Database, $Schema (or similar) variables, every data tile must filter by all of them, not just the ones that are "obviously relevant". Inconsistent filtering makes the dashboard feel broken (user selects a database and some tiles ignore it). If a telemetry context does not populate a dimension (e.g. db.namespace is empty for some records), still apply the filter: real user data will be populated and the wildcard default (*) will pass all records through anyway. For timeseries tiles, add the dimension to by: and then apply | filter in(dim, array($Var)) after the timeseries step (rule 3 above). Document any known empty-field cases in the tile description or readme rather than silently dropping the filter.
Always add unitsOverrides for every byte (data-size) metric field. Dynatrace does not auto-detect byte units from metric keys: if you omit a unitsOverrides entry, values are rendered as raw numbers (e.g. 947121664) instead of human-readable storage (e.g. 903.0 MiB). The correct unitCategory for storage metrics is "data" (not "data-information"). Apply this to every output field that carries bytes, including intermediate computed fields like v that come from a timeseries step and are displayed in a bar chart or table:
unitsOverrides:
  - identifier: total_bytes   # the DQL field name, not the metric key
    unitCategory: data
    baseUnit: byte
    displayUnit: null
    decimals: 2
    suffix: ""
    delimiter: false
    added: 1                  # unique integer, use 1/2/3/... per tile
For timeseries tiles that expose both a summarised field and the raw series array (v), add an override for each:
unitsOverrides:
  - identifier: size          # summarised / computed field
    unitCategory: data
    baseUnit: byte
    displayUnit: null
    decimals: 2
    suffix: ""
    delimiter: false
    added: 1
  - identifier: v             # raw timeseries array shown in chart hover
    unitCategory: data
    baseUnit: byte
    displayUnit: null
    decimals: null
    suffix: ""
    delimiter: false
    added: 2
Tiles that MUST have byte unitsOverrides in a data-volume dashboard:
- Any singleValue tile summing snowflake.data.size
- Any lineChart/barChart showing snowflake.data.size or its aliases
- Any table tile with a column derived from a byte metric
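Because every byte field needs the same override shape with a unique added counter, the entries can be generated rather than typed by hand. A minimal sketch, assuming the field layout shown in the unitsOverrides examples above; `byte_units_overrides` is a hypothetical helper, not part of the project scripts.

```python
def byte_units_overrides(fields, decimals=2):
    """Build one byte-unit override per DQL field name, numbering 'added' from 1."""
    return [
        {
            "identifier": field,       # the DQL field name, not the metric key
            "unitCategory": "data",    # not "data-information"
            "baseUnit": "byte",
            "displayUnit": None,
            "decimals": decimals,
            "suffix": "",
            "delimiter": False,
            "added": index,            # unique integer per tile
        }
        for index, field in enumerate(fields, start=1)
    ]


overrides = byte_units_overrides(["size", "v"])
print([entry["identifier"] for entry in overrides])
```

The resulting list can be dumped to YAML or JSON and placed under the tile's unitsOverrides key.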
davis.componentState must NOT appear on data tiles, only on markdown tiles. The davis block shape differs by tile type:
# ✅ CORRECT: markdown tile
type: markdown
davis:
  componentState:
    inputData: null
# ✅ CORRECT: data tile
type: data
davis:
  enabled: false
  davisVisualization:
    isAvailable: true
# ❌ WRONG: data tile with componentState causes "unable to load" crash
type: data
davis:
  enabled: false
  davisVisualization:
    isAvailable: true
  componentState:          # ← DELETE THIS from all data tiles
    inputData: null
A dashboard with componentState on any data tile shows "Something went wrong / We were unable to load this dashboard", even if the JSON structure and queries are otherwise valid. Always verify after writing tiles: data tiles have exactly enabled + davisVisualization; markdown tiles have exactly componentState.
honeycomb dataMappings is an object, not an array. Colouring goes in coloring.colorRules.
# ✅ CORRECT
visualizationSettings:
  honeycomb:
    shape: square
    legend:
      position: right
    dataMappings:
      value: state_code        # object with single key "value"
    displayedFields:
      - snowflake.task.name
      - state
    labels:
      showLabels: true
  coloring:
    colorRules:
      - color: "var(--dt-colors-charts-apdex-excellent-default, #2a7453)"
        colorMode: single-color
        comparator: "="
        field: state_code
        type: long             # "long" for numeric, "string" for text
        value: 1
# ❌ WRONG: array dataMappings, thresholds at wrong level
visualizationSettings:
  honeycomb:
    dataMappings:
      - valueField: state_code   # ← wrong: array with valueField/labelField/colorField
        labelField: name
        colorField: status
    thresholds:                  # ← wrong level: thresholds here crashes the dashboard
      - field: status
        rules: [...]
categoricalBarChart axis fields are strings, not arrays.
# ✅ CORRECT
visualizationSettings:
  chartSettings:
    truncationMode: middle
    legend:
      hidden: true
    categoryOverrides: {}
    categoricalBarChartSettings:
      categoryAxis: snowflake.pipe.name   # string
      categoryAxisLabel: Pipe
      valueAxis: count                    # string
      valueAxisLabel: Count
  thresholds: []
# ❌ WRONG: arrays crash the dashboard
categoricalBarChartSettings:
  categoryAxis:
    - snowflake.pipe.name   # ← must be a plain string
  valueAxis:
    - count
Use toBoolean() for boolean attribute comparisons: it handles both native booleans and strings. DSOA attributes like snowflake.user.is_disabled, snowflake.user.has_mfa, snowflake.user.has_rsa, snowflake.user.has_pat may arrive as native booleans or as strings depending on the plugin and OpenPipeline processing. Using toBoolean() is the universal pattern that works for both types.
# ✅ CORRECT: toBoolean() works for both native booleans and string "true"/"false"
| fieldsAdd status = if(toBoolean(snowflake.user.is_disabled), "Disabled", else: "Active")
| filter toBoolean(snowflake.user.has_mfa)
| filter NOT toBoolean(snowflake.user.has_rsa)
# ❌ WRONG: == "true" fails silently for native boolean attributes
| fieldsAdd status = if(snowflake.user.is_disabled == "true", "Disabled", else: "Active")
# ⚠️ FRAGILE: == true fails for string-typed boolean attributes
| filter snowflake.user.has_mfa == true
Users plugin: all contexts share dsoa.run.context == "users"; distinguish them by attribute presence. The users plugin passes a single context_name="users" for ALL its views (V_USERS_INSTRUMENTED, V_USERS_ALL_ROLES_INSTRUMENTED, V_USERS_ALL_PRIVILEGES_INSTRUMENTED, V_USERS_DIRECT_ROLES_INSTRUMENTED, V_USERS_REMOVED_DIRECT_ROLES_INSTRUMENTED). The context names users_all_roles, users_all_privileges, users_removed_direct_roles from instruments-def.yml do not appear as dsoa.run.context values in Dynatrace.
To filter for specific user data subsets, use attribute presence:
# All roles data
| filter dsoa.run.context == "users" and isNotNull(snowflake.user.roles.all)
# All privileges data
| filter dsoa.run.context == "users" and isNotNull(snowflake.user.privilege)
# Removed direct roles
| filter dsoa.run.context == "users" and isNotNull(snowflake.user.roles.direct.removed)
# Base user info (login status, MFA, RSA, type)
| filter dsoa.run.context == "users" and isNotNull(snowflake.user.is_disabled)
Events with stale timestamps are silently dropped by Dynatrace; prefer fetch logs. Dynatrace's OpenPipeline Events API silently rejects events whose timestamps fall outside the ingestion window (typically ±24h). Plugins that use event_timestamps referencing historical dates (e.g. last_altered, created_on) will show non-zero send counts in the agent logs, but the events will not appear in fetch events. The same data is always available via fetch logs (which uses the current timestamp).
Diagnostic pattern: If a tile using fetch events returns 0 rows but the agent reports sending events successfully, switch to fetch logs, because the data is there.
Known affected plugins: users (all contexts; EVENT_TIMESTAMPS contain last_altered dates from months/years ago).
Never use legacy coalesce(dsoa.run.context, snowagent.run.context, service.namespace) fallbacks. Early DSOA versions used snowagent.run.context and service.namespace as attribute names before standardising on dsoa.run.context. The coalesce pattern was a migration shim. All current agents emit dsoa.run.context exclusively. New dashboards and dashboard updates must use dsoa.run.context directly: no coalesce, no or fallback.
Dashboard variables must not depend on a single plugin context. Variables like $Environment and $Account populate dropdown filters used by every tile on the dashboard. If the variable query is restricted to one context (e.g. dsoa.run.context == "login_history"), and that context has no data in the selected timeframe, the variable returns empty. An empty variable causes in(deployment.environment, array($Environment)) to evaluate to NULL/false, blanking all tiles, even those whose contexts do have data.
Always use a broad filter that matches any DSOA data:
# ✅ CORRECT: works as long as ANY Snowflake data exists in the timeframe
fetch logs
| filter db.system == "snowflake"
| filter isNotNull(deployment.environment)
| fields deployment.environment
| dedup deployment.environment
| sort deployment.environment asc
# ❌ WRONG: fails when login_history has no data, blanking entire dashboard
fetch logs
| filter dsoa.run.context == "login_history"
| fields deployment.environment
| dedup deployment.environment
The same principle applies to $Account and any other global filter variable.
Always use unitsOverrides for time fields and drop (ms) postfixes from field names. When a DQL field represents a time duration in milliseconds, do not append (ms) to the field alias. Instead, use a clean name (e.g. Compilation, Execution, Fastest, Slowest, Avg) and add a unitsOverrides entry for each field:
unitsOverrides:
  - identifier: Compilation
    unitCategory: time
    baseUnit: millisecond
    displayUnit: null
    decimals: null
    suffix: ""
    delimiter: false
    added: 1
Dynatrace renders the appropriate unit automatically in tables, charts, and tooltips. Keeping (ms) in the field name leads to redundant display like 30 s (ms).
Multi-select variables: use multiple: true, no defaultValue, and in() in queries. For query-type variables that should allow selecting multiple values:
- Set multiple: true on the variable definition.
- Do not set defaultValue: "*"; Dynatrace automatically adds a "select all" option.
- Use dedup + sort instead of collectDistinct + array("*", ...).
- In tile queries, use in(field, array($Variable)) instead of $Var == "*" or $Var == field.
# ✅ CORRECT: multi-select variable definition
- key: Account
  type: query
  visible: true
  editable: true
  multiple: true
  input: |-
    fetch logs
    | filter db.system == "snowflake"
    | fieldsAdd snow_account = deployment.environment
    | filter isNotNull(snow_account)
    | fields snow_account
    | dedup snow_account
    | sort snow_account
# ✅ CORRECT: tile query using in()
| filter in(deployment.environment, array($Account))
# ❌ WRONG: old single-select pattern
| filter $Account == "*" or deployment.environment == $Account
When a downstream variable depends on an upstream multi-select variable, use in() in its query as well:
# Warehouse depends on Account
- key: Warehouse
  type: query
  multiple: true
  input: |-
    fetch logs
    | filter db.system == "snowflake"
    | filter in(deployment.environment, array($Account))
    | fields snowflake.warehouse.name
    | dedup snowflake.warehouse.name
    | sort snowflake.warehouse.name
For metrics-only plugins, derive variable values from timeseries, not fetch logs/events. When a plugin's primary telemetry is metrics (no logs, no events, or events are stale), variable queries that use fetch logs or fetch events will return empty, causing all dashboard tiles to blank out. Use timeseries to populate variable dropdowns instead:
# ✅ CORRECT: works for any metrics-emitting plugin
timeseries avg(snowflake.data.rows), by: { deployment.environment }
, filter: { db.system == "snowflake" and dsoa.run.context == "data_volume" }
| fields deployment.environment
| dedup deployment.environment
| sort deployment.environment asc
The timeseries query returns one row per dimension value seen in the metric series; piping to | fields + | dedup + | sort produces a clean list suitable for a multi-select variable (multiple: true).
When to use timeseries-based variable queries:
- Plugin config telemetry list contains metrics but NOT logs
- Plugin config telemetry list contains metrics but NOT events, and events are only timestamp-based (from EVENT_TIMESTAMPS); those may be stale and get silently rejected
- Dashboard uses timeseries tiles as the primary visualisation
When to use fetch logs-based variable queries:
- Plugin emits logs as primary telemetry (logs in telemetry config)
- Dashboard has a mix of log tiles and metric tiles; prefer fetch logs for breadth
Confirmed metrics-only plugins that require timeseries-based variables:
- data_volume: emits metrics + timestamp events (table update/DDL dates); primary visualisation is timeseries; use timeseries avg(snowflake.data.rows) for variables
The timeseries inline filter is the single source of truth; do NOT repeat it as a post-pipe | filter. When dimensions are declared in by: and used as filters, put them in the , filter: {} block. A post-pipe | filter on the same dimension is redundant, wastes processing, and should be deleted. The only valid use of a post-pipe | filter is for computed fields that do not exist as metric dimensions (e.g. | fieldsAdd results):
# ✅ CORRECT: filter once, inline
timeseries total_bytes = sum(snowflake.data.size)
, by: { db.namespace, deployment.environment }
, filter: {
    db.system == "snowflake" and
    dsoa.run.context == "data_volume" and
    in(deployment.environment, array($Accounts)) and
    (isNull(db.namespace) or in(db.namespace, array($Database)))
  }
# ❌ WRONG: filter duplicated as post-pipe
timeseries total_bytes = sum(snowflake.data.size)
, by: { db.namespace, deployment.environment }
, filter: {
    db.system == "snowflake" and
    dsoa.run.context == "data_volume" and
    ($Accounts == "*" or deployment.environment == $Accounts) and
    ($Database == "*" or isNull(db.namespace) or db.namespace == $Database)
  }
| filter $Accounts == "*" or deployment.environment == $Accounts                          # ← DELETE
| filter $Database == "*" or isNull(db.namespace) or db.namespace == $Database   # ← DELETE
Drop legacy coalesce backwards-compatibility fallbacks for standard attributes. Stop using coalesce(deployment.environment, service.name); use deployment.environment directly. The service.name fallback was needed during early DSOA versions before deployment.environment was standardised. The same applies to:
- coalesce(db.name, db.namespace): keep only where both genuinely appear
- coalesce(db.collection.name, db.sql.table): keep only where both genuinely appear
- coalesce(db.statement, db.query.text): keep only where both genuinely appear
For deployment.environment specifically, always use it directly; never wrap it in coalesce.
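Several of the rules above name concrete anti-patterns that can be caught mechanically before deployment. The sketch below scans a tile query string for a few of them. It is illustrative only: `lint_query` and the regular expressions are assumptions made for this example, not an existing project tool, and a text match is no substitute for reviewing the query.

```python
import re

# Anti-patterns taken from the rules above; the regexes are illustrative only.
ANTI_PATTERNS = {
    "fetch metrics is not used for DSOA data": re.compile(r"\bfetch\s+metrics\b"),
    "legacy coalesce on dsoa.run.context": re.compile(r"coalesce\(\s*dsoa\.run\.context"),
    "legacy coalesce on deployment.environment": re.compile(r"coalesce\(\s*deployment\.environment"),
    'single-select wildcard comparison ($Var == "*")': re.compile(r'\$\w+\s*==\s*"\*"'),
    'string comparison on a boolean (== "true")': re.compile(r'==\s*"(?:true|false)"'),
}


def lint_query(query: str) -> list[str]:
    """Return a message for every anti-pattern found in a DQL query string."""
    return [message for message, pattern in ANTI_PATTERNS.items() if pattern.search(query)]


good = 'fetch logs | filter in(deployment.environment, array($Account))'
bad = 'fetch logs | filter $Account == "*" or deployment.environment == $Account'
print(lint_query(good))
print(lint_query(bad))
```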
version Field — Server-Managed, Never Touch
CRITICAL: Never increment the version field in dashboard YAML files.
The version field is the Dynatrace server's optimistic locking token — a server-assigned
value that changes on every write. It is NOT a schema version, file revision, or anything you control.
- The value in the YAML reflects the last exported/deployed state from the platform.
- When you dtctl apply, the platform ignores your submitted version and assigns its own counter.
- The outer API envelope has a separate top-level version (e.g. 101); that is also server-managed.
- The content.version (your YAML version: field) tracks what the platform stored inside content.
Rule: When exporting a dashboard from the platform and saving to YAML, preserve the version as-is. Do NOT bump it in PRs, commits, or as part of change tracking. Treat it as read-only metadata.
# ✅ CORRECT — preserve version exactly as exported
version: 26 # server-assigned; do not change
# ❌ WRONG — never manually increment
version: 27 # incrementing this accomplishes nothing and creates confusion
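A review or pre-commit step could guard this rule by comparing the version in the exported dashboard with the edited one. A minimal sketch, assuming both files have already been parsed into dictionaries; `version_unchanged` is a hypothetical helper.

```python
def version_unchanged(exported: dict, edited: dict) -> bool:
    """True when the edited dashboard keeps the server-assigned version as exported."""
    return exported.get("version") == edited.get("version")


exported = {"version": 26, "tiles": {}}
edited_ok = {"version": 26, "tiles": {"0": {"type": "markdown"}}}
edited_bad = {"version": 27, "tiles": {}}

print(version_unchanged(exported, edited_ok))
print(version_unchanged(exported, edited_bad))
```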
YAML Dashboard Format
# DASHBOARD: <Human-readable title>
# DESCRIPTION: <One-line description>
# OWNER: DSOA Team
# PLUGINS: <comma-separated plugin names>
# TAGS: snowflake, dsoa, <domain>
id: <uuid> # assigned after first deploy; omit on initial creation
name: <Human-readable title> # REQUIRED — must match dashboard display name
version: 15 # server-assigned optimistic lock token — do not change
variables:
  - key: Accounts
    type: query
    multiple: true
    # DO NOT set defaultValue: "*". Dynatrace automatically adds "*" (select all)
    # as the first option for multi-select variables. Explicitly setting it creates
    # a duplicate and causes the literal string "*" to appear as a selected value,
    # which is NOT translated as "all" in DQL filters.
    # DO NOT add array("*", <var>) in the query for the same reason: DT handles it.
    input: |
      fetch logs
      | filter db.system == "snowflake"
      | filter dsoa.run.plugin == "<plugin>"
      | fields deployment.environment
      | dedup deployment.environment
      | sort deployment.environment asc
  # ... additional variables
tiles:
  "0":
    title: ""
    type: markdown
    content: |
      ## Section Title
      Description of this section.
  "1":
    title: Tile Title
    type: data
    query: |
      fetch logs
      | filter db.system == "snowflake"
      | ...
    visualization: singleValue  # singleValue | lineChart | barChart | honeycomb | table
    visualizationSettings:
      singleValue:
        label: "Label"
        # colorThresholds: [{value: 0, color: "red"}, {value: 1, color: "green"}]
    querySettings:
      timeframe: now-2h
    davis:
      enabled: false
      davisVisualization:
        isAvailable: true
layouts:
  "0": {x: 0, y: 0, w: 24, h: 2}
  "1": {x: 0, y: 2, w: 6, h: 4}
  # ... grid uses 24 columns
settings:
  autoRefresh:
    enabled: true
    interval: 300
annotations: {}
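Two structural points from this document lend themselves to an automated check on the parsed dashboard: data tiles must not carry davis.componentState, and layouts sit on a 24-column grid. The sketch below checks both. It assumes the dashboard has already been loaded into a dictionary; `check_dashboard` is a hypothetical helper, and the expectations that every tile has a layout entry and that x + w must not exceed 24 are assumptions inferred from the template rather than documented platform rules.

```python
GRID_COLUMNS = 24  # per the layout comment in the template


def check_dashboard(dashboard: dict) -> list[str]:
    """Return human-readable problems found in a parsed dashboard definition."""
    problems = []
    tiles = dashboard.get("tiles", {})
    layouts = dashboard.get("layouts", {})
    for key, tile in tiles.items():
        davis = tile.get("davis") or {}
        if tile.get("type") == "data" and "componentState" in davis:
            problems.append(f"tile {key}: componentState on a data tile")
        box = layouts.get(key)
        if box is None:
            problems.append(f"tile {key}: no layout entry")
        elif box["x"] + box["w"] > GRID_COLUMNS:
            problems.append(f"tile {key}: exceeds the {GRID_COLUMNS}-column grid")
    return problems


dashboard = {
    "tiles": {
        "0": {"type": "markdown", "davis": {"componentState": {"inputData": None}}},
        "1": {"type": "data", "davis": {"enabled": False, "componentState": {}}},
    },
    "layouts": {"0": {"x": 0, "y": 0, "w": 24, "h": 2}, "1": {"x": 20, "y": 2, "w": 6, "h": 4}},
}
for problem in check_dashboard(dashboard):
    print(problem)
```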
YAML → JSON Conversion
Always convert before uploading. The project provides a conversion script:
./scripts/tools/yaml-to-json.sh docs/dashboards/<name>/<name>.yml > /tmp/<name>.json
Validate the JSON before uploading:
jq . /tmp/<name>.json > /dev/null && echo "JSON valid" || echo "JSON INVALID"
For workflows, the same script applies:
./scripts/tools/yaml-to-json.sh docs/workflows/<name>/<name>.yml > /tmp/<name>.json
Deploying with deploy_dt_assets.sh (Recommended)
Always use scripts/deploy/deploy_dt_assets.sh to deploy dashboards and workflows.
This script handles YAML → JSON conversion, envelope building, dtctl apply, URL printing,
and automatic ID write-back — all in one step.
# Deploy all dashboards and workflows
./scripts/deploy/deploy_dt_assets.sh
# Deploy only dashboards
./scripts/deploy/deploy_dt_assets.sh --scope=dashboards
# Deploy a single dashboard (recommended during iterative development)
./scripts/deploy/deploy_dt_assets.sh --scope=dashboards --name=<dashboard-name>
# Deploy only workflows
./scripts/deploy/deploy_dt_assets.sh --scope=workflows
# Preview without applying
./scripts/deploy/deploy_dt_assets.sh --dry-run
# Add environment label to log output
./scripts/deploy/deploy_dt_assets.sh --env=test-qa
On success the script prints a clickable [URL] line for each deployed asset:
[OK] Updated: Data Volume & Storage
[URL] https://mytenant.apps.dynatracelabs.com/ui/apps/dynatrace.dashboards/dashboard/fdd7c1db-ffc0-4c75-adea-f60cadc120ad
ID write-back: For new dashboards (no id: in YAML), the script automatically
inserts the assigned ID into the YAML file after deployment. This ensures future
runs update the same dashboard rather than creating a duplicate.
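To make the write-back behaviour concrete, the following sketch shows one way an ID could be inserted into the YAML text without disturbing the comment header. It is not the implementation used by deploy_dt_assets.sh; `write_back_id` is a hypothetical function written only to illustrate the idea.

```python
def write_back_id(yaml_text: str, dashboard_id: str) -> str:
    """Insert a top-level 'id:' line before the first non-comment line, if none exists."""
    lines = yaml_text.splitlines(keepends=True)
    if any(line.startswith("id:") for line in lines):
        return yaml_text  # already has an ID; leave the file untouched
    new_line = f"id: {dashboard_id}\n"
    for index, line in enumerate(lines):
        if line.strip() and not line.startswith("#"):
            lines.insert(index, new_line)
            break
    else:
        # Only comments or blank lines: append at the end
        if lines and not lines[-1].endswith("\n"):
            lines[-1] += "\n"
        lines.append(new_line)
    return "".join(lines)


source = "# DASHBOARD: Demo\nname: Demo\ntiles: {}\n"
print(write_back_id(source, "00000000-0000-0000-0000-000000000000"))
```

Running the function a second time on its own output leaves the text unchanged, which mirrors the goal of updating the same dashboard on later runs rather than creating a duplicate.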
YAML requirements for the script to work correctly:
- # DASHBOARD: <Human-readable name> comment at the top → used as display name
- id: <uuid> top-level field → present after first deploy (written back automatically)
- name: <Human-readable name> top-level field → required for dtctl round-trips (also written back by dtctl get exports). If absent, the script uses the comment.
Via deploy.sh (opt-in, never part of default all):
./scripts/deploy/deploy.sh <env> --scope=dt_assets
Deploying with dtctl Directly (Manual / Fallback)
Use this approach only when deploy_dt_assets.sh is unavailable or you need
fine-grained control (e.g. deploying a single dashboard by hand).
dtctl apply expects the same envelope structure that dtctl get returns —
not a flat JSON file. The correct shape is:
{
"id": "<uuid>",
"name": "<Dashboard Name>",
"type": "dashboard",
"content": { ...full dashboard JSON from YAML conversion... }
}
CRITICAL envelope rules
- id and name must be popped from the inner content before wrapping: they must appear at envelope level only. Leaving them inside content causes the dashboard to fail to load: "We were unable to load this dashboard."
- Do NOT pass the flat converted JSON directly to dtctl apply: that causes dtctl to double-wrap the content, producing tiles: 0.
To produce the envelope correctly:
./scripts/tools/yaml-to-json.sh docs/dashboards/<name>/<name>.yml > /tmp/inner.json
python3 -c "
import json
with open('/tmp/inner.json') as f:
    inner = json.load(f)
# CRITICAL: pop id/name OUT of content; they belong only at envelope level.
dashboard_id = inner.pop('id', None)
dashboard_name = inner.pop('name')
envelope = {'name': dashboard_name, 'type': 'dashboard', 'content': inner}
if dashboard_id:
    envelope['id'] = dashboard_id
with open('/tmp/<name>-apply.json', 'w') as f:
    json.dump(envelope, f, indent=2)
"
dtctl apply -f /tmp/<name>-apply.json
Verify after apply that tiles count is correct (not 0):
dtctl get dashboard <id> -o json | python3 -c "
import sys, json; d=json.load(sys.stdin)
print('tiles:', len(d.get('content',{}).get('tiles',{})))
"
If tiles count is 0 after apply, the envelope was wrong. Rebuild and reapply.
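The envelope rules can also be checked locally before calling dtctl apply, which catches a wrong shape earlier than the post-apply tile count. A minimal sketch, assuming the envelope file has been loaded with json.load; `envelope_problems` is a hypothetical helper and only encodes the rules stated in this section.

```python
def envelope_problems(envelope: dict) -> list[str]:
    """List violations of the envelope rules described above."""
    content = envelope.get("content")
    if not isinstance(content, dict):
        return ["envelope has no content object (flat JSON was passed?)"]
    problems = []
    if envelope.get("type") != "dashboard":
        problems.append("type must be 'dashboard'")
    if "name" not in envelope:
        problems.append("name missing at envelope level")
    for key in ("id", "name"):
        if key in content:
            problems.append(f"{key} must not appear inside content")
    if not content.get("tiles"):
        problems.append("content has no tiles")
    return problems


flat = {"name": "Demo", "tiles": {"0": {"type": "markdown"}}}
wrapped = {"name": "Demo", "type": "dashboard", "content": {"tiles": {"0": {"type": "markdown"}}}}
print(envelope_problems(flat))
print(envelope_problems(wrapped))
```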
For new dashboards (no ID yet):
Omit id from the envelope. dtctl apply assigns one. Record it and add it to
the YAML as id: <uuid> so future runs update rather than create a duplicate.
(deploy_dt_assets.sh does this automatically.)
Preview / diff / round-trip:
dtctl apply --dry-run -f /tmp/<name>-apply.json
dtctl apply --show-diff -f /tmp/<name>-apply.json
dtctl get dashboard <id> -o yaml > /tmp/<name>-current.yaml
Full Deployment Sequence
CRITICAL — MANDATORY GATES: Steps 0 and A are hard blocking gates. You MUST complete them before writing a single line of YAML. Skipping them produces a dashboard that cannot be validated and is not done. These gates have been violated before — do not repeat the mistake.
=== GATE 0: Ask the user whether DSOA is deployed ===
!! THIS IS THE VERY FIRST THING TO DO — before reading any files, before writing
any YAML, before doing anything else.
Ask the user this question (use the question tool):
"Is DSOA already deployed and running on the target environment (e.g. test-qa)?
If yes, is the <plugin> plugin already enabled?"
Possible outcomes:
A) "Yes, deployed and plugin enabled" → skip to GATE A step 2
B) "Yes, deployed but plugin not enabled" → proceed to GATE A step 4
C) "No, not deployed" → STOP. Tell the user:
"DSOA base installation must be done by a human first (privileged scopes).
Please run:
./scripts/deploy/deploy.sh <env> --scope=all --options=skip_confirm
Then come back and I will continue."
Do NOT proceed until the user confirms deployment is done.
!! IMPORTANT: NEVER run --scope=all, init, admin, or apikey yourself.
These scopes are HUMAN-ONLY. The AI agent may only run --scope=plugins,config
(and agents when Python code changes).
=== GATE A: Synthetic Data Setup ===
!! THIS GATE IS MANDATORY. Do NOT write dashboard YAML until it is complete.
A dashboard built against an empty dataset cannot be validated. It is not done.
1. Read instruments-def.yml for all required plugins.
Identify every metric, dimension, and attribute used by dashboard tiles.
Identify which dsoa.run.context values correspond to each data area.
2. Write test/tools/setup_test_<plugin>.sql (using the snowflake-synthetic skill).
The script must cover EVERY tile's data requirements — every attribute,
every metric, every edge case referenced in the dashboard YAML.
Apply it:
snow sql --connection snow_agent_<env> -f test/tools/setup_test_<plugin>.sql
3. Verify synthetic objects exist and grants are correct:
snow sql --connection snow_agent_<env> -q "SHOW <OBJECTS> IN SCHEMA DSOA_TEST_DB.<PLUGIN>;"
snow sql --connection snow_agent_<env> -q "SHOW GRANTS TO ROLE DTAGENT_QA_VIEWER;" | grep DSOA_TEST_DB
4. If plugin is not yet enabled: update conf/config-<env>.yml (is_enabled: true).
Build and redeploy:
./scripts/dev/build.sh
./scripts/deploy/deploy.sh <env> --scope=plugins,config --options=skip_confirm
5. Trigger a manual DSOA run to get telemetry immediately (bypasses the task
scheduler — no need to wait for the next scheduled cycle):
IMPORTANT: DSOA connection profiles intentionally leave role/database/warehouse
blank (they may not exist yet at deploy time). You MUST pass them explicitly:
snow sql --connection snow_agent_<env> \
--role DTAGENT_<TAG>_VIEWER \
--database DTAGENT_<TAG>_DB \
--warehouse DTAGENT_<TAG>_WH \
-q "CALL APP.DTAGENT(ARRAY_CONSTRUCT('<plugin_name>'))"
Where <TAG> matches the environment tag (e.g. QA for test-qa, DEV for dev-094).
You can pass multiple plugins: ARRAY_CONSTRUCT('plugin_a', 'plugin_b')
or omit the argument entirely to run ALL enabled plugins.
IMPORTANT — plugin-specific latency caveats:
- Most plugins: telemetry arrives in Dynatrace within ~1-2 min after the
CALL returns.
- query_history: ACCOUNT_USAGE.QUERY_HISTORY has a ~45 min ingestion lag
in Snowflake. Even after CALL DTAGENT returns successfully, the log/span
records for queries run by the simulation script will NOT be visible in
Dynatrace until that lag clears. Plan ~45-60 min of wait time before
verifying Section 4 (operator stats) tiles.
Metrics and biz_events derived from ACCOUNT_USAGE share the same lag.
- active_queries: reads INFORMATION_SCHEMA (real-time) — no lag.
After the CALL returns, run a spot-check DQL to confirm records are flowing:
fetch logs
| filter db.system == "snowflake"
| filter dsoa.run.plugin == "<plugin>"
| filter deployment.environment == "<ENV>"
| limit 10
Do NOT proceed until records are returned.
=== PHASE B: Dashboard Authoring and Deployment ===
6. Write dashboard YAML in docs/dashboards/<name>/<name>.yml
7. Convert: ./scripts/tools/yaml-to-json.sh ... > /tmp/<name>.json
8. Validate: jq . /tmp/<name>.json
9. Deploy (single dashboard — recommended):
./scripts/deploy/deploy_dt_assets.sh --scope=dashboards --name=<name>
Or deploy all dashboards:
./scripts/deploy/deploy_dt_assets.sh --scope=dashboards
10. Record the returned ID — embed it in the YAML as `id: <uuid>`
11. Re-convert, wrap in the dtctl envelope with python3, re-deploy to update in place:
python3 -c "
import json
with open('/tmp/<name>.json') as f:
    content = json.load(f)
# id/name belong at envelope level only, never inside content
content.pop('id', None)
content.pop('name', None)
envelope = {'id': '<uuid>', 'name': '<Human-readable title>', 'type': 'dashboard', 'content': content}
with open('/tmp/<name>-apply.json', 'w') as f:
    json.dump(envelope, f, indent=2)
"
dtctl apply -A -f /tmp/<name>-apply.json
12. Verify tiles: dtctl get dashboard <id> -o json | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('tiles:',len(d.get('content',{}).get('tiles',{})))"
Expected: tiles == 14 (or however many your dashboard has). If 0, envelope was wrong.
13. Verify every tile renders real data in the Dynatrace UI
=== PHASE C: Documentation ===
14. Write docs/dashboards/<name>/readme.md (see dashboard-docs skill)
15. Update docs/dashboards/README.md index
16. Request screenshots (see dashboard-docs skill)
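Step 5 of Gate A requires the role, database and warehouse to be passed explicitly, which is easy to get wrong when typed by hand. The sketch below assembles the argument list from the environment name and tag. `manual_run_command` is a hypothetical helper; it only builds the command shown in step 5 and does not execute anything. The no-argument form that runs all enabled plugins is left out on purpose.

```python
import shlex


def manual_run_command(env: str, tag: str, plugins: list[str]) -> list[str]:
    """Build the snow sql argument list for a manual DSOA run of the given plugins."""
    if not plugins:
        raise ValueError("pass at least one plugin name")
    plugin_args = ", ".join(f"'{plugin}'" for plugin in plugins)
    return [
        "snow", "sql",
        "--connection", f"snow_agent_{env}",
        "--role", f"DTAGENT_{tag}_VIEWER",
        "--database", f"DTAGENT_{tag}_DB",
        "--warehouse", f"DTAGENT_{tag}_WH",
        "-q", f"CALL APP.DTAGENT(ARRAY_CONSTRUCT({plugin_args}))",
    ]


command = manual_run_command("test-qa", "QA", ["snowpipes"])
print(shlex.join(command))
```

Keeping the result as a list rather than a single string avoids shell-quoting mistakes if it is later passed to subprocess.run.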
Dynatrace MCP Server
The dt-oss-aym-mcp MCP server can be used as a reference to:
- Inspect existing dashboards and workflows
- Run DQL queries to validate metric availability before writing tiles
- Check what data is actually present for a given deployment.environment
Prefer dtctl for create/update operations (it is faster and more scriptable).
Use the MCP server for read/query/exploration.