dogfood-target - SKILL.md Agent Skill

name: dogfood-target description: >- Technical behavior test harness for Processminer's Target / Transformation track. Takes a process that already has an As-Is and drives it through the to-be skills via the running app's UI — transformation-agent, council-review, area-summary — exercising every interaction path (Y / E / R, Accept / Reject / Reopen), asserting that responses reflect correctly in the UI and the process JSON, and benchmarking speed per turn. The DTP Enhancer skills (dtp-regenerate, dtp-compare, dtp-summary) are covered by the separate /dogfood-dtp harness. Run ONLY when the user explicitly invokes /dogfood-target in the CLI. Never auto-route to it from the app's chat.

Dogfood Target

You are an automated skill behavior test harness for the Processminer Target / Transformation track — the As-Is → To-Be development arc. You are the sibling of dogfood-run (which covers As-Is authoring); this harness covers everything downstream of a documented As-Is. You care about three things:

Interaction paths — every target skill's branching paths (Y / E / R on drafts, Accept / Reject / Reopen on council items) must be explicitly exercised and produce the correct app behavior.
UI + state reflection — skill responses must produce observable, correct changes in the UI and the backing state: target element arrays (to-be-design, transformation-decisions, gap-resolution, …) and their provenance/approval in wiki/processes/<slug>.json; and targetReview / summaries on the process JSON.
Execution speed — wall-clock per skill turn is recorded. Slow turns are findings; runaway turns are failures.

You are not a domain expert and you do not score content quality. The target you develop is a throwaway test artifact.

This skill is a test harness, invoked only by the user typing /dogfood-target. Never run it against a real process and never let the app's free chat route to it.

Two Claude instances — keep them straight

When you run this skill you are the outer Claude, running in the CLI with browser tools. The app's chat spawns its own claude worker (see SKILLS.md §2) that runs the wiki skills. You never call the target skills yourself — you type into the app's chat or click its CTAs and the app's worker runs them. You only ever drive the browser and write the report.

The precondition that makes this track testable

The Target track develops a To-Be from an existing As-Is. This harness does not author an As-Is — it needs one already on disk. On invocation:

With a slug argument (/dogfood-target funds-release-dogfood-v1) — use that process.
Without an argument — pick the most recently-modified process whose As-Is arrays are populated (process-steps ≥ 5 and at least one of pain-points / process-gaps / compliance-gaps present — the problems the target must cover). Good candidates are processes a prior dogfood-run built.

If no process has a usable As-Is, stop and tell the user to run /dogfood-run first (or name a slug). A target with nothing to transform is not a valid test.

Tools

Use the preview_* MCP tools for everything browser-related — never Bash for browser work, never the Chrome MCP. Use Bash only for: reading the LAN IP, git-status style checks, and reading wiki / runtime JSON files the run produced.

Available preview_* tools: preview_start, preview_list, preview_stop, preview_logs, preview_console_logs, preview_network, preview_screenshot, preview_click, preview_fill, preview_inspect, preview_eval. There is no preview_snapshot. Use preview_eval to query DOM state. Use preview_screenshot for your own live verification; evidence in the report is textual.

Screenshots cannot be written to disk. Record per-stage evidence as DOM snapshots from preview_eval, assertion logs, and console/network captures.

Preconditions and setup

Dev server. Start with preview_start bound to 0.0.0.0 (command: npm run dev -- -H 0.0.0.0). If one is already running on localhost only, stop it and restart.
LAN IP. ipconfig getifaddr en0 (fall back to en1). The report URL is http://<ip>:<port>/target-report.html.
Run id. YYYY-MM-DD-HHMM. Keys everything in this run.
Run folder. Create public/test-report-assets/target-<run-id>/ — holds state.json and any DOM snapshot text files.

Resumability

A full run takes time. Make it resumable.

After each stage, write public/test-report-assets/target-<run-id>/state.json: run id, target process slug, completed stage numbers, per-stage results (pass/fail, assertions, notes, timings).
On invocation with a run id argument (/dogfood-target target-2026-06-08-1234), load that state.json and skip completed stages.
With no run-id argument, start fresh. Never delete a previous run's work.

Timing

Record wall-clock time for every skill turn:

Start the timer just before preview_fill + submit (or the CTA click).
Stop it when the textarea placeholder returns to Message the assistant….
Log the elapsed seconds in the stage notes.

Speed thresholds:

≤ 60 s — fast (PASS)
61–180 s — acceptable (PASS, note it)
181–300 s — slow (PASS with WARNING)
300 s — runaway (FAIL; stop waiting after 600 s and mark the turn TIMEOUT)

Reading a chat turn

The chat textarea placeholder is Working… while a turn runs and Message the assistant… when idle. To run one turn:

preview_fill the textarea (or preview_click the CTA).
Submit — preview_click the send control (button.chat-send; while a turn runs it is button.chat-stop). Find selectors with preview_eval.
Poll preview_eval until placeholder is Message the assistant…. Stop polling after 600 s; if not done, mark the turn TIMEOUT.
Read the assistant's reply via preview_eval of the chat message list.
Run preview_console_logs and preview_network; record errors as findings.

Interaction matrix — target track

Path	Where	What to do	Expected behavior
Y — accept	transformation-agent draft	`[Y]`	Element stays `status: draft`; provenance updated
E — edit	transformation-agent draft	`[E]` + correction	Edit incorporated; edited heading provenance resets to `proposed`; not approved
R — reject	transformation-agent draft	`[R]`/`[N]` + reason	Skill redrafts (differs); element stays `in-progress`/`draft`; not approved
Accept	council item	click `Accept`	`targetReview.items[i].triage = "accepted"`
Reject	council item	click `Reject`	`triage = "rejected"`
Reopen	council item	click `Reopen`	`triage = "pending"`

UI + state assertions — what to check after each interaction

After each interaction, read from the DOM and the backing state:

Target elements — do new target-state (TS-…), transformation-decision (TD-…), gap (VG-…), requirement, dependency, assumption elements appear in their arrays in wiki/processes/<slug>.json?
Draft, not approved — everything the target skills write is meta.status: "draft" / meta.approval unset. Nothing should be approved (the SME approves later in the app).
Provenance — for any edited heading, is meta.provenance.<heading>.source proposed after an [E]?
Council review — process.targetReview has ran: [...] and items: [{ specialist, title, detail, targets, triage }]; triage clicks flip triage.
Coverage — CoverageRollup shows "{covered} / {total} open problems covered" and reflects the transformation-decisions' resolves links.
Summaries — process.summaries.<area> (area-summary) gets the four-section memo.

Fail the assertion if the expected state is not present. Record the raw DOM text or JSON field value as evidence.

The run — stages

Each stage is a checkpoint. Complete it, assert every item, capture DOM snapshots as text, write state.json, move on. A stage with any FAIL assertion is itself FAIL, but the run always continues to the next stage.

Stage 0 — Preflight & As-Is precondition

Open the app and select the target process (per "the precondition" above). Assert:

App loads without JS errors (console clean)
The chosen process exists and is open (URL ?p=<slug>)
Its As-Is is usable: wiki/processes/<slug>.json has process-steps ≥ 5 and at least one problem array (pain-points / process-gaps / compliance-gaps) with ≥ 1 element

Record the slug into state.json. If the precondition fails, mark Stage 0 FAIL, write the report, and stop (the rest of the run cannot be meaningful).

Stage 1 — transformation-agent skill

transformation-agent is chat-triggered (no CTA). Trigger it: Let's design the target state for this process.

Note on SME identity. The session scope preamble (which hands the SME's name/role to the skill) is only injected on the first turn of a fresh chat session. If you trigger transformation-agent in a chat that already has history, no identity is supplied and the skill will correctly ask for it (per its Phase 0 spec) — that prompt is expected, not a defect. To avoid it, start this stage from a fresh chat (clear the conversation) so the preamble lands; otherwise just answer the identity question and continue.

Paths to exercise: Y, E, R.

Walk the first few elements the skill drafts, cycling the paths:

Turn	Path	What to send
1	Y	`[Y]`
2	E	`[E]` + a concrete correction (e.g. "tighten the rationale to name the pain-point it closes")
3	Y	`[Y]` after the edit is incorporated
4	R	`[R] This decision doesn't say what it replaces — redraft with the As-Is steps it supersedes`
5	Y	`[Y]` after the redraft

After each path, assert:

Y: the element is written to its target array (to-be-design / transformation-decisions / gap-resolution / …) as status: draft; provenance present; not approved.
E: the edit is incorporated in the re-presented draft; the edited heading's provenance is proposed; the element is not approved.
R: the next turn's draft differs from the rejected one; the element stays in-progress / unapproved.

Then drive the skill through its later optional phases so the full target data model is exercised — these do not run unless you ask for them (the skill offers [N] close out after target-states + decisions, and stops there if you take it). Send, accepting each draft with [Y]:

Now capture the gap(s) for this target. → at least one gap (VG-…) in gap-resolution.
Now capture the key requirements the target state implies. → at least one requirement in requirements.
Now capture the dependencies and assumptions behind this target. → at least one dependency in dependencies and one assumption in assumptions.

Then Wrap up the target design for this test.

Assert: at least one element exists in each of to-be-design, transformation-decisions, gap-resolution, requirements, dependencies and assumptions; all status: draft, none approved. (A phase the skill genuinely cannot populate — e.g. no dependencies apply — is a documented skip, not a fail: record which array stayed empty and the skill's stated reason.)

Speed: per-turn timing. Flag any turn > 180 s.

Stage 2 — Coverage + TO-BE synthesis (UI reflection)

No skill — assert the UI reflects Stage 1's writes.

Open the TO-BE Design section (nav key to-be-design). Assert the TargetSynthesis view renders one row per As-Is process-step, with target themes (or "Unchanged") mapped to each.
Open the Validation section (nav key validation). Assert the CoverageRollup renders "{covered} / {total} open problems covered by the target", and that covered reflects the resolves links on the transformation-decision elements written in Stage 1 (cross-check the count against the JSON).

Record the coverage line as evidence. (No timing — this is a render check.)

Stage 3 — council-review skill

In the Validation section, find the council CTA.

Paths to exercise: full council, then per-item Accept / Reject / Reopen.

Click ✦ Run full council (button.canvas-act). Wait for completion. Assert: process.targetReview.ran lists the five perspective specialists and targetReview.items[] is non-empty.
In the Council Review panel (TargetReviewPanel), on three different items: click Accept on the first, Reject on the second, Reopen on the third (button.act / button.act.ai). Assert each item's triage in process.targetReview becomes accepted / rejected / pending respectively.
Click an element-ref chip (button.link-chip-nav) on one item; assert the app navigates to that element (? URL or the element card opens).

Speed: time the full-council run (web/LLM heavy; 180–300 s acceptable; flag

300 s). Triage clicks are instant UI writes — assert, don't time.

Stage 4 — area-summary skill (Target area)

Open the Target area view (nav key __area:target). Click ✦ Generate executive summary (button.section-summary-btn.primary → generateAreaSummary("target")).

Paths to exercise: none interactive (the four parts are editable post-hoc; optionally exercise one Edit → Save on a part).

Assert:

process.summaries.target is written with the four required section headings (## Introduction, ## Current state, ## What stands out, ## Recommendation)
The SummaryPanel renders the four parts, each with an Edit affordance
(Optional) editing one part and saving persists the change to summaries.target

Speed: time from click to summary rendering.

Stage 5 — Report

Assemble the QA report and write it to:

public/target-report.html (latest)
public/target-report-<run-id>.html (archived)

The report

A self-contained pass/fail QA report — single HTML file, inline CSS, no external assets. Match DESIGN.md for type, colour, spacing.

Contents, in order:

Verdict header — run id, date, target process slug, overall PASS/FAIL, health score (% of assertions passed), total duration.
Stage table — every stage: name, PASS/FAIL, duration, assertion count passed/total, one-line note.
Per-stage detail — every assertion with PASS/FAIL and reason; DOM snapshot text as evidence; console / network errors; per-turn timing table (turn, path, elapsed s, speed rating).
Interaction path coverage — a matrix: every target path (Y / E / R, Accept / Reject / Reopen) × every skill exercised, EXERCISED / NOT EXERCISED. Every "NOT EXERCISED" is a finding.
Speed summary — per-skill average turn time, ranked slowest to fastest; slow (> 180 s) and runaway (> 300 s) highlighted.
State reflection findings — every case where a skill action did NOT produce the expected UI or state change (target element, provenance, targetReview triage, summary). Behavior bugs, not content issues.
Target artifact — slug, counts per target array (to-be-design, transformation-decisions, gap-resolution, …), targetReview item count + triage breakdown, which summaries exist.
Errors and findings — every console error, network failure, app exception, or unexpected behavior, with the stage it occurred in.

Close-out

Report to the user: overall verdict, health score, report URL (with real LAN IP), target process slug, interaction-path coverage (how many paths exercised / total), speed summary (any runaway turns?), and count of state-reflection failures.

If the run was stopped early, say which stages still need to run and how to resume (/dogfood-target target-<run-id>).