nkda-observability-contract

name: nkda-observability-contract description: Inspects, validates, and amends a feature specification to ensure complete, decision-driven observability coverage across metrics, traces, and logs. Fails if minimum standards cannot be met.

Skill: Observability Contract Enforcement

Enforce a complete, decision-driven observability contract on specifications, design documents, or implemented code. This skill is deterministic and idempotent — running it twice on the same input produces the same output.

Invocation modes:

Mode	How to invoke	Input	Output
Document	Run against a markdown file (spec, plan, design doc)	File path or currently open file	Injects/rewrites `## Observability` section in the document
Codebase	Run against a project, folder, or feature area	Project path, folder path, namespace, or feature name	Produces an Observability Audit Report as a standalone markdown section or file
SpecKit hook	Automatic via `after_specify` in `.specify/extensions.yml`	The spec being generated	Same as Document mode

When invoked manually, pass the target file or folder path. If no path is given, use the currently open file or its containing project. The skill auto-detects the mode:

If the target is a .md file → Document mode.
If the target is a .cs file, .csproj, folder, or namespace → Codebase mode.
If invoked by SpecKit hook → Document mode against the spec.

Argument Parsing

The command MUST parse --stage before any other step.

Valid values: spec | plan | tasks | implement

If --stage is present and valid: execute Stage Precedence Rule below. Do NOT execute Step 0.
If --stage is absent: fall through to Step 0 (auto-detection).

If --stage is present but invalid: STOP and emit:

OBSERVABILITY CONTRACT FAILURE: Invalid --stage value "<value>".
Expected one of: spec, plan, tasks, implement

Stage Precedence Rule

When --stage is provided it is the single source of truth. It overrides all auto-detection.

DO NOT execute Step 0.
DO NOT infer the target from file type or current editor context.
Resolve target and enforcement behaviour exclusively from the Stage Resolution table below.
--stage implement is a mutating execution path. It overrides the generic audit-only behavior of Codebase mode.

Stage Resolution (Authoritative)

Stage	Target (resolved from current feature dir)	Enforcement Focus	Required Action
`spec`	`<feature-dir>/spec.md`	`## Observability` section: operations, decisions, metrics, traces, logging, correlation, validation queries	Amend `spec.md` directly — inject/rewrite the section. Do not stop at listing gaps.
`plan`	`<feature-dir>/plan.md`	Observability Contract section: O-1 spans, O-2 metrics, O-3 logs, O-4 progress events, CLI visibility per operation	Amend `plan.md` directly — inject/rewrite the section. Do not stop at listing gaps.
`tasks`	`<feature-dir>/tasks.md`	Every user story phase contains explicit tasks for O-1 `ActivitySource.StartActivity`, O-2 `IMigrationMetrics`, O-3 structured `ILogger`, O-4 `IProgressSink.EmitAsync`	Amend `tasks.md` directly — insert missing observability tasks into each phase. Block until gap-free.
`implement`	All `.cs` files created or modified in the current execution (see below)	O-1 span on every operation method; O-2 metrics at every boundary; O-3 structured logging (Info start/end, Warn skips/errors, Debug per-item); O-4 EmitAsync at start, per ≤50-item batch, and completion	Write missing instrumentation directly into source files. Produce gap table, fix every row to ✅.

If the resolved target file does not exist, STOP and emit:

OBSERVABILITY CONTRACT FAILURE: Target file not found: <path>.
Ensure the target artefact exists before running this stage.

Stage `implement` — target resolution

The target is, in priority order:

All .cs files explicitly passed as arguments.
All .cs files created or modified in the current SpecKit implementation session (from the task execution log).
If neither is available: all .cs files in the current feature namespace or project folder that contain a class implementing IModule, IJob, ICommandHandler, or a service/tool class.

Do not scan the entire solution. Scope is the feature under implementation.

Resolve, do not just report. For every stage:

Document stages (spec, plan, tasks): write the missing content into the target file. Produce the gap table, then immediately fix every gap.
Codebase stage (implement): write the missing instrumentation code into the correct source files. Produce the gap table, then immediately implement every fix.

Execution Guard

Before performing any stage action:

Confirm the resolved target file or file set exists (see Stage Resolution above).
Attempt to extract at least one operation from the target (Step 2 below).

If either check fails: STOP execution and emit the appropriate failure message. Do not attempt inference or fallback.

Role

When this skill is active, inspect the target for observability completeness and fail explicitly if minimum standards cannot be satisfied.

Document mode: Inject or rewrite the ## Observability section to meet the mandatory contract. Modifies the document directly.
Codebase mode without --stage implement: Scan the actual source code for existing instrumentation, compare it against the mandatory contract, and produce an audit report listing gaps, violations, and required changes. This path is audit-only and does not modify source code.
Stage implement (--stage implement): Scan the scoped source files, produce the same gap table, then write the missing instrumentation directly into those files until each gap is resolved or escalated as a blocker.

Preconditions

Before executing, read the following context files:

.agents/30-context/domains/telemetry-model.md — Three-layer model, metric naming, span inventory
src/DevOpsMigrationPlatform.Abstractions/Telemetry/WellKnownMetricNames.cs — Existing metric names
src/DevOpsMigrationPlatform.Abstractions/Telemetry/WellKnownActivitySourceNames.cs — Existing ActivitySource names
src/DevOpsMigrationPlatform.Abstractions/Telemetry/WellKnownMeterNames.cs — Existing meter names

These files establish the naming conventions and existing instrumentation. New metrics and spans defined by this skill MUST be consistent with them.

Execution Steps

Execute the following steps in order. Do not skip steps. Do not proceed past a FAIL gate.

If --stage was provided, begin at Step 1 using the target resolved by the Stage Resolution table. Skip Step 0 entirely.

Step 0 — Detect Mode (only when `--stage` is absent)

Examine the target path or context.
If the target is a .md file, or the skill was invoked by a SpecKit hook → enter Document mode.
If the target is a .cs file, .csproj file, folder, or the user named a namespace/feature → enter Codebase mode.
Record the mode. All subsequent steps are executed in both modes unless marked otherwise.

Step 1 — Locate or Create Observability Section

Document mode:

Search the document for a heading ## Observability (case-insensitive).
If it does not exist, insert an empty ## Observability section immediately before ## Requirements (or at the end if ## Requirements is absent).
If it exists, read its full content for evaluation in subsequent steps.

Codebase mode:

Prepare an empty audit report structure. No document is modified in this step.
Identify the scope: if a single file, scope to that file's class/namespace; if a folder or project, scope to all .cs files within.

Step 2 — Extract Operations

Document mode: Scan the entire document and extract every operation that the feature introduces or modifies.

Codebase mode: Scan the source code within scope and extract operations by identifying:

Classes implementing IModule, IJob, ICommandHandler, or similar platform interfaces
Controller actions or endpoint mappings
Methods decorated with [Command], [HttpGet], [HttpPost], etc.
Public async methods that represent workflow steps (e.g. ExecuteAsync, RunAsync, ProcessAsync)
Message/event handlers (classes implementing IHandler<T>, IConsumer<T>, etc.)
Background services (BackgroundService, IHostedService)

An operation is any of:

API endpoint (REST, gRPC, SignalR)
CLI command or subcommand
Background job or scheduled task
Module entry point (export, import, inventory, discovery)
Workflow or orchestration step
Event handler or message consumer

For each operation, record:

Field	Value
Name	A short, unambiguous identifier (e.g. `workitems.export`, `job.enqueue`)
Type	One of: `api`, `command`, `job`, `module`, `workflow`, `handler`
Entry point	The class or method that initiates the operation (from spec or actual source file + line)
Dependencies	External calls, store operations, SDK calls the operation makes

Codebase mode — additional extraction:

For each operation found in code, also record:

Field	Value
File	Source file path
Has ActivitySource?	Whether the class creates or uses an `ActivitySource` / `Activity`
Has Metrics?	Whether the class injects or uses `IMigrationMetrics`, `IDiscoveryMetrics`, `Counter`, `Histogram`, etc.
Has Structured Logging?	Whether the class uses `ILogger` with structured templates (not string concatenation)
Has Correlation?	Whether `Activity.Current`, `traceId`, or `operationId` is propagated

FAIL GATE: If zero operations can be extracted or inferred, emit the following and stop:

OBSERVABILITY CONTRACT FAILURE: No operations could be determined from the input.
The target must contain at least one concrete operation (API, command, job, module, workflow, or handler)
before observability can be assessed. Amend the spec or verify the correct code scope.

Step 3 — Define Operator Decisions per Operation

For each operation, define the decisions an operator must be able to make using telemetry. Every signal (metric, span, log) must map to at least one decision.

Standard decision categories:

Decision	Question it answers
Is it working?	Are requests succeeding at an acceptable rate?
Is it fast enough?	Is latency within SLO bounds?
Is it overloaded?	Is concurrency or queue depth exceeding capacity?
What failed?	Which specific operation failed and why?
Where is it slow?	Which dependency or step is the bottleneck?
Is it correct?	Do output counts match input counts? Are invariants maintained?

For each operation, assign at minimum: Is it working?, Is it fast enough?, Is it overloaded?, and What failed?.

Step 4 — Define Metrics (OpenTelemetry aligned)

For each operation, define the mandatory metrics. Use the naming convention:

<domain>.<capability>.<operation>.<measure>

Where:

<domain> = migration or discovery (matching WellKnownMeterNames)
<capability> = module or subsystem name (e.g. export, import, inventory)
<operation> = specific action (e.g. workitem, attachment, revision)
<measure> = what is measured (e.g. count, duration_ms, errors, in_flight)

Mandatory metrics per operation:

Metric	Instrument	Unit	Decision served
Throughput	`Counter<long>`	`{operation}`	Is it working?
Latency	`Histogram<double>`	`ms`	Is it fast enough?
Outcome (success)	`Counter<long>`	`{operation}`	Is it working?
Outcome (failure)	`Counter<long>`	`{operation}`	What failed?
In-flight / queue depth	`UpDownCounter<long>` or `ObservableGauge<int>`	`{operation}`	Is it overloaded?

Validation rules:

Every metric name MUST follow the four-segment naming convention.
Every metric MUST map to at least one decision from Step 3.
Metrics that do not support any operator decision MUST be rejected and removed.
Metrics MUST represent business activity. Infrastructure-level metrics (CPU, memory, GC) are out of scope for the spec — they are provided by the runtime.
If the operation is a batch, add a batch-size Histogram metric.
Check existing WellKnownMetricNames — reuse existing names where the semantics match; do not invent duplicates.

Document mode: Write the metrics table into the Observability section under ### Metrics.

Codebase mode: For each required metric, check whether it already exists in the code:

Search the scoped files for Counter<, Histogram<, UpDownCounter<, ObservableGauge< instrument creation.
Search WellKnownMetricNames and WellKnownDiscoveryMetricNames for matching constant names.
For each mandatory metric: mark as Present (with file + line), Partial (instrument exists but wrong type/name), or Missing.
Record findings in the audit report.

Step 5 — Define Traces (end-to-end)

For each operation, define the span hierarchy. Every operation MUST have a root span. Every dependency call MUST be a child span.

Span table format:

Component	Span Name	Tags	Parent	Decision served
`<class>`	`<operation.name>`	`<tag list>`	Root or `<parent span>`	`<decision>`

Validation rules:

Every operation MUST have exactly one root span.
Every dependency identified in Step 2 MUST appear as a child span.
Context propagation method MUST be stated (automatic via Activity hierarchy, or explicit W3C TraceContext header).
Span names MUST use lowercase dot-separated segments matching the metric domain.
Check the existing span inventory in telemetry-model.md — reuse existing spans where semantics match.
Tags MUST include: job.id, operation, module at minimum for root spans.
Child spans MUST include the entity identifier (e.g. wi.id, attachment.name).

FAIL GATE: If any operation has no root span defined, or any dependency has no child span, emit:

OBSERVABILITY CONTRACT FAILURE: Trace coverage is incomplete.
Operation '<name>' is missing a root span, or dependency '<dep>' has no child span.
Every operation must have end-to-end trace coverage.

Document mode: Write the span table into the Observability section under ### Traces.

Codebase mode: For each required span, check whether it already exists:

Search for ActivitySource field declarations and StartActivity calls in the scoped files.
Match span names to the required span table.
Verify child spans exist for each identified dependency.
Verify tags include required fields (job.id, operation, module, entity identifiers).
Mark each span as Present, Partial (exists but missing tags or children), or Missing.
Record findings in the audit report.

Step 6 — Define Structured Logging

For each operation, define the mandatory log events. Logs are structured (key-value pairs), not free-text.

Mandatory log events per operation:

Event	Level	Fields	Decision served
Operation started	`Information`	operationId, operation, input summary	Is it working?
Operation completed	`Information`	operationId, operation, outcome, durationMs, output summary	Is it working? / Is it fast enough?
Operation failed	`Error`	operationId, operation, errorType, errorMessage, durationMs	What failed?
Dependency call slow	`Warning`	operationId, dependency, durationMs, threshold	Where is it slow?
Retry attempt	`Warning`	operationId, operation, attempt, maxAttempts, delay	Is it overloaded?
Step detail	`Debug`	operationId, step, detail	(diagnostic only)
Wire-level detail	`Trace`	operationId, payload summary	(diagnostic only)

Validation rules:

Debug and Trace level logs MUST be declared as optional and disabled by default.
Log fields MUST NOT contain raw customer data (field values, project names, org URLs, attachment paths) without DataClassification.Customer scope — per project guardrails.
Work item IDs are integer identifiers and are NOT customer data.
Every log event MUST include operationId for correlation.
No unstructured string concatenation in log templates.

Document mode: Write the log events into the Observability section under ### Logging.

Codebase mode: For each required log event, check whether it already exists:

Search for ILogger usage, LogInformation, LogWarning, LogError, LogDebug, LogTrace calls.
Verify structured templates (message templates with {Parameter} placeholders, not string concatenation or interpolation).
Verify each operation has start, completion, and error log events.
Check that log fields do not contain raw customer data without DataClassification.Customer scope.
Mark each log event as Present, Partial (exists but unstructured or missing fields), or Missing.
Record findings in the audit report.

Step 7 — Define Correlation Model

Define the correlation identifiers that MUST be present on all telemetry (metrics tags, span tags, log fields) for this feature.

Mandatory correlation fields:

Field	Source	Scope
`operationId` / `traceId`	`Activity.Current.TraceId` or generated GUID	All telemetry
`parentId`	`Activity.Current.ParentSpanId`	Spans and logs within a parent context
`job.id`	Job context	All telemetry within a job
Domain identifiers	Feature-specific (e.g. `wi.id`, `project.name`, `org.url`)	Where applicable

FAIL GATE: If correlation cannot be established (e.g. no job context, no activity source), emit:

OBSERVABILITY CONTRACT FAILURE: Correlation model is missing.
Feature '<name>' does not define how telemetry will be correlated across metrics, traces, and logs.
Define the correlation identifiers and their sources.

Document mode: Write the correlation model into the Observability section under ### Correlation.

Codebase mode: Verify correlation in the actual code:

Check that Activity.Current is used or ActivitySource.StartActivity establishes trace context.
Check that ILogger scopes or log templates include operationId / traceId.
Check that metric tag lists include job.id and domain identifiers.
Mark correlation as Present, Partial, or Missing per operation.
Record findings in the audit report.

Step 8 — Generate Validation Queries

For each operator decision defined in Step 3, generate a KQL-style validation query that proves the decision can be answered using the defined signals.

Required query categories:

Category	Proves	Source
Failure identification	Failures can be identified by operation and cause	Metrics (outcome.failure) + Logs (Error)
Latency analysis	P50/P95/P99 latency can be computed per operation	Metrics (latency histogram)
Load observation	In-flight concurrency or queue depth is visible	Metrics (in_flight / queue_depth)
End-to-end trace	A single request can be traced from entry to all dependencies	Traces (root + child spans)
Error diagnosis	Root cause can be determined from logs + traces	Logs (Error) joined with Traces

Query format:

// <Category>: <What it proves>
<table>
| where <filter>
| summarize <aggregation> by <dimensions>

FAIL GATE: If any required query category cannot be expressed using the defined metrics, traces, and logs, the observability contract is incomplete. Emit:

OBSERVABILITY CONTRACT FAILURE: Validation query '<category>' cannot be derived.
The defined signals do not support answering: '<decision question>'.
Add the missing metric, span, or log event to close the gap.

Document mode: Write the queries into the Observability section under ### Validation Queries.

Codebase mode: Generate the same queries, but annotate each with whether the required signals are Present or Missing in the actual code. If a query cannot be satisfied because the underlying instrument or span does not exist in code, flag it as a gap.

Step 9 — Idempotency Check

Before writing the final Observability section:

Compare the generated content with any existing ## Observability section.
If the existing section already satisfies all validation rules, do not modify it.
If the existing section is partially correct, preserve correct structures and amend only the gaps.
Do not duplicate metrics, spans, or log events that already exist and are correct.
Do not reorder existing content unless required for correctness.

Step 10 — Write Output

Document mode: Replace or insert the ## Observability section with the validated content.

Codebase mode without --stage implement: Produce the Observability Audit Report (see Codebase Output Format below). Do not modify source files automatically. The report is the deliverable.

Stage implement (--stage implement): Produce the same audit report structure, but the deliverable is both:

the report, and
the corresponding instrumentation edits applied to the scoped source files.

For --stage implement, do not stop at listing gaps when the fix is mechanical and local to the scoped files. Apply the fix, rerun the relevant checks, and leave only true blockers unresolved.

Document Output Structure

The ## Observability section MUST follow this structure:

## Observability

### Operations

| Name | Type | Entry Point | Dependencies |
|---|---|---|---|
| ... | ... | ... | ... |

### Operator Decisions

| Operation | Decision | Question |
|---|---|---|
| ... | ... | ... |

### Metrics

| Metric Name | Instrument | Unit | Operation | Decision |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |

### Traces

| Component | Span Name | Tags | Parent | Decision |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |

**Context propagation:** <method>

### Logging

| Event | Level | Fields | Operation | Decision |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |

> Debug and Trace levels are disabled by default.

### Correlation

| Field | Source | Scope |
|---|---|---|
| ... | ... | ... |

### Validation Queries

#### Failure Identification
\```kql
<query>
\```

#### Latency Analysis
\```kql
<query>
\```

#### Load Observation
\```kql
<query>
\```

#### End-to-End Trace
\```kql
<query>
\```

#### Error Diagnosis
\```kql
<query>
\```

Codebase Output Structure

When in Codebase mode, produce the audit report in the following format:

## Observability Audit Report

**Scope:** `<project, folder, or file path>`
**Date:** `<date>`
**Operations found:** `<count>`

### Operations Discovered

| Name | Type | Entry Point | File | Has Spans | Has Metrics | Has Logging | Has Correlation |
|---|---|---|---|---|---|---|---|
| ... | ... | ... | ... | ✅/❌/⚠️ | ✅/❌/⚠️ | ✅/❌/⚠️ | ✅/❌/⚠️ |

Legend: ✅ = Present and correct, ⚠️ = Partial (exists but incomplete), ❌ = Missing

### Gaps — Metrics

| Operation | Required Metric | Status | Detail |
|---|---|---|---|
| ... | ... | Missing/Partial | <what is wrong or missing, file + line if partial> |

### Gaps — Traces

| Operation | Required Span | Status | Detail |
|---|---|---|---|
| ... | ... | Missing/Partial | <what is wrong or missing> |

### Gaps — Logging

| Operation | Required Event | Status | Detail |
|---|---|---|---|
| ... | ... | Missing/Partial | <what is wrong or missing> |

### Gaps — Correlation

| Operation | Required Field | Status | Detail |
|---|---|---|---|
| ... | ... | Missing/Partial | <what is wrong or missing> |

### Required Changes

Prioritised list of changes needed to reach full observability coverage:

1. **[CRITICAL]** `<file>` — `<what to add/fix>` (closes gap: `<which metric/span/log>`)
2. **[HIGH]** ...
3. **[MEDIUM]** ...

### Validation Queries

<same query format as Document mode, annotated with signal availability>

### Verdict

**PASS** — All operations have complete observability coverage.
— or —
**FAIL** — `<N>` operations have incomplete coverage. See Required Changes above.

If all operations pass, the verdict is PASS. If any operation has a Missing metric, span, or correlation field, the verdict is FAIL.

Enforcement Summary

Condition	Action
No `## Observability` section	Create it
Section exists but incomplete	Amend missing parts
Section exists and complete	No modification (idempotent)
Zero operations extractable	FAIL — spec must define operations
Missing root span for any operation	FAIL — trace coverage incomplete
Missing correlation model	FAIL — correlation is mandatory
Validation query cannot be derived	FAIL — signals are insufficient
Metric without a mapped decision	REJECT — remove the metric
Log with raw customer data and no classification scope	REJECT — redact or classify
Duplicate of existing correct structure	SKIP — do not duplicate

Completion Criteria

Both modes

All operations extracted or inferred from input.
Every operation has at least four operator decisions assigned.
Every operation has all five mandatory metric types defined.
Every metric follows <domain>.<capability>.<operation>.<measure> naming.
Every metric maps to at least one operator decision.
Every operation has a root span defined.
Every dependency has a child span defined.
Context propagation is stated.
Every operation has start, completion, and error log events.
Debug/Trace logs are marked optional and disabled by default.
No raw customer data in log fields without DataClassification.Customer.
Correlation model defines operationId, parentId, job.id, and domain identifiers.
All five validation query categories are present and derivable.

Document mode only

Observability section follows the mandatory structure.
Existing correct content is preserved (idempotent).
No TODO placeholders remain in the Observability section.

Codebase mode only

Every operation has been checked for existing instrumentation.
Every gap is marked with status (Missing/Partial) and file location.
Required Changes list is prioritised (Critical > High > Medium).
Verdict is explicitly stated (PASS or FAIL).
Audit-only Codebase runs did not modify source files without explicit user instruction.
--stage implement runs modified only the scoped source files needed to close observability gaps.
Observability section follows the mandatory structure.
Existing correct content is preserved (idempotent).
No TODO placeholders remain in the Observability section.

The skill is not complete until all criteria are checked. Any unchecked criterion is a failure.

name: nkda-observability-contract description: Inspects, validates, and amends a feature specification to ensure complete, decision-driven observability coverage across metrics, traces, and logs. Fails if minimum standards cannot be met.

Skill: Observability Contract Enforcement

Argument Parsing

Stage Precedence Rule

Stage Resolution (Authoritative)

Stage implement — target resolution

Execution Guard

Role

Preconditions

Execution Steps

Step 0 — Detect Mode (only when --stage is absent)

Step 1 — Locate or Create Observability Section

Step 2 — Extract Operations

Step 3 — Define Operator Decisions per Operation

Step 4 — Define Metrics (OpenTelemetry aligned)

Step 5 — Define Traces (end-to-end)

Step 6 — Define Structured Logging

Step 7 — Define Correlation Model

Step 8 — Generate Validation Queries

Step 9 — Idempotency Check

Step 10 — Write Output

Document Output Structure

Codebase Output Structure

Enforcement Summary

Completion Criteria

Both modes

Document mode only

Codebase mode only

Stage `implement` — target resolution

Step 0 — Detect Mode (only when `--stage` is absent)