name: agent-product-harness description: Bootstrap and run a software product end-to-end using the Agent Product Harness — disciplined methodology for building Next.js 16 + Tailwind v4 + TypeScript products with agentic IDEs (Antigravity, Cursor, Codex). Externalizes Memory, Skills, Protocols and Harness per Zhou et al. 2026. Provides templates for discovery, PRD, tech spec, ADRs, sprints, execution, testing, deploy. TRIGGER when starting a new product, scaffolding docs/ structure, asking how to plan, build or ship a Next.js product with agentic dev, or advancing between phases (discovery → PRD → spec → sprint → execution → testing → deploy). SKIP for isolated bug fixes, one-off scripts, or refactors that do not warrant phase ceremony.
Agent Product Harness
You are an agent following the Agent Product Harness methodology — a disciplined workflow for building software products with LLM agents in the loop, derived from Zhou et al. (2026) "Externalization in LLM Agents" and consolidated for Next.js 16 Tailwind v4 + TypeScript stacks.
The methodology rests on five non-negotiable principles:
- Document before code. Each phase produces a versioned artifact in
docs/. - Minimum viable context. No agent receives more than it needs.
- Artifacts, not prose. Reviews happen on diffs/screenshots, not chat logs.
- Human in the loop by default.
agent-assistedmode unless explicitly upgraded. - Sandbox always on. Allowlisted commands; restricted workspace.
Full architectural rationale: references/00-architecture-and-flow.md.
Mechanical conventions (frontmatter, paths, datetime, sed): references/00-conventions.md.
Critical analysis vs. the source paper: references/00-paper-analysis.md.
Quick Reference
Route by user signal first. Cite only the section needed; don't load everything.
| User says | Path | Pre-flight |
|---|---|---|
| "começar produto", "scaffold", "init", "bootstrap" | §A — Bootstrap | cwd is target project root, no populated docs/ |
| "avançar fase", "próxima fase" | §B — Advance phase | _summary.md of previous phase complete |
| Methodology question ("why X?", "when to delegate?") | §C — Q&A | — |
| "implementar story X" | §D — Story execution | story file at docs/sprints/<n>/<id>.md; sprint-plan exists |
| "corrigir bug em story X", "bug em #N" | §E — Bug execution | story <id> exists; new bug lives in same sprint folder |
| "qual o status?", "o que falta?", "valida o projeto" | §F — Tracking via scripts | docs/ exists |
| "spec está errada / contradiz código" durante execução | §H — Spec Drift | story is in execution |
| "sincronizar com GitHub", "epic em issues" | §G — GitHub sync (optional) | gh authed, validate.sh green |
If the signal is ambiguous, ask one question to disambiguate. Do not guess.
§A — Bootstrap a new product
Pre-flight:
- Confirm
cwdis the target project root (not the skill folder itself). If the user invoked you from inside the skill repo, refuse and ask for the target path. The repository safety check inreferences/scripts/_safety.shformalizes this. - Confirm
cwddoes not already havedocs/populated. If it does, route to §B instead.
Discovery questions (ask the human; do not invent answers):
- What is the product name?
- Who is the sponsor (decision-maker)?
- One sentence: what problem does it solve, and for whom?
- Is this greenfield (
pnpm create next-app) or adding harness to an existing repo? - Which LLM model will operate this harness? (Claude Sonnet 4.6+, GPT-5, Gemini 2.5 Pro recommended as minimum; warn if below — see AGENTS.md §0.)
Scaffolding actions (after answers):
- Copy
templates/AGENTS.mdto<cwd>/AGENTS.md, substituting placeholders. Fill theModelo usado neste produto:line in §0 with the answer from the pre-flight question. - Create
<cwd>/docs/tree (full structure documented inreferences/00-conventions.md§1):docs/discovery/docs/prd/— including01-glossary.mdplaceholder (template atreferences/02-prd/01-glossary-template.md).docs/spec/+docs/spec/adr/.gitkeep(mandatory).docs/sprints/01/sprint-plan.md(placeholder fromreferences/04-sprints/00-sprint-plan.md) — not just an empty folder.docs/memory/{discovery,prd,design,spec,sprints,execution,testing,deploys}/.gitkeep.docs/memory/telemetry.jsonl(empty file — deep telemetry; seereferences/05-execution/11-telemetry-protocol.md).docs/memory/approvals.jsonl(empty file — HITL approvals ledger; seereferences/05-execution/13-approvals-ledger.md).docs/memory/codemap/— copy fromtemplates/docs/memory/codemap/(README + emptymodules/+ emptygraph.json); seereferences/05-execution/10-codemem-protocol.md.docs/runbooks/.
- Create
<cwd>/skills/README.md(empty product-internal skill registry). - Create
<cwd>/mcp/registry.json(empty MCP server registry). - Pre-fill
<cwd>/docs/discovery/00-discovery-brief.mdwith answers to the pre-flight questions. Other sections as TODO placeholders. Referencereferences/01-discovery/02-elicitation-guide.mdat the top of the brief as the interview script. - If templates/ is empty (early-stage skill) — instead of copying, write
each file using the corresponding
references/<phase>/<file>.mdas your guide, stripping meta-text and replacing example content with<TODO>placeholders that point the human at what to fill in. - Run
bash <skill>/references/scripts/validate.shafter scaffolding. It should print all-green. If not, fix before handing off. - Bootstrap Contract — gate de runnabilidade (greenfield). O bootstrap
não fecha só com
docs/válido; o esqueleto de código tem que bootar e ter ≥1 teste passando antes da primeira story. Estabeleça o esqueleto (pnpm create next-app+ um teste-canário) e rodebash <skill>/references/scripts/bootstrap-check.sh. Deve sair 0 — ou registrar INFO de "bootstrap documental" se ainda não hápackage.json(produto que será codado depois). Full rationale:references/00-bootstrap-contract.md.
Output to user:
- Tree of files created (10–18 files typical with the new mandatory paths).
- Single-line "next step": "Preencha
docs/discovery/00-discovery-brief.md— hipótese, métrica de sucesso, sponsor. Quando aprovado, invoque novamente para avançar à fase PRD." - Status do Bootstrap Contract:
bootstrap-check.shverde (esqueleto rodável) ou INFO documental — diga qual. - Do not run
git addorgit commit. Human reviews first.
§B — Advance to next phase
Macro flow (each arrow is a gate):
Discovery → PRD → Design Foundations → Spec → Sprint → Execução → Testes → Deploy
Design Foundations (phase 02.5) is mandatory between PRD and Spec whenever
the product has UI. Skipping this phase produces ad-hoc tokens in globals.css
and Shadcn components rebuilt every story. See
references/02b-design/00-design-foundations.md.
Procedure:
- Run
bash <skill>/references/scripts/phase-status.shto see what's done and what's next. The script readsdocs/memory/<phase>/_summary.mdfiles. Template atreferences/05-execution/05-phase-summary-template.md. - Apply the phase-transition ritual from
references/05-execution/00-context-protocol.md: confirm_summary.mdexists, decisions are logged, ADRs filed, Knowledge Base updated. - Load the next phase's reference (
references/0X-<phase>/) and template. - Scaffold the next phase's document(s) in
docs/. - Hand control back to the human with the bootstrap of the new doc plus a pointer to the relevant reference section.
Refuse to advance if any of the items below fail — cite the exact item that is missing. The harness's value is the gate, not the speed.
docs/memory/<previous-phase>/_summary.mdexists and is complete (all template sections filled, including "decisions", "ADRs created" and "warnings for next agent").- Phase-specific exit criteria match (e.g., Discovery requires ≥ 3 evidences
- 1 anti-evidence + metric with baseline — see
references/01-discovery/02-elicitation-guide.md§"Critério de pronto para PRD").
- 1 anti-evidence + metric with baseline — see
- PRD → Spec specifically: Ubiquitous Language reconciled in
docs/prd/01-glossary.md(every domain term has a code-reflex column filled, or marked🟡 a definir na Spec). If the product has a non-trivial domain (≥ 2 entities with rules, ≥ 2 business areas, or ≥ 1 invariant), Spec must includedocs/spec/02-domain-model.mdwith at least 1 Bounded Context defined and tactical blocks for the Core subdomain. Pure-CRUD products may skip the file with an explicit declaration in00-tech-spec.md§4. Seereferences/03-spec/02-domain-model.md. - If the previous phase produced ADRs (Spec, Design Foundations) and none
exist in
docs/spec/adr/, require explicit justification in_summary.md. bash <skill>/references/scripts/validate.shexits 0.
§C — Methodology Q&A
Answer using the references, in this priority order:
references/00-architecture-and-flow.md— overall topology, principles, phases.references/00-conventions.md— frontmatter schemas, paths, datetime, naming.references/05-execution/00-context-protocol.md— memory layers, cleanup ritual.references/05-execution/05-phase-summary-template.md—_summary.mdper phase (transition gate).references/05-execution/06-plan-artifact-template.md— Plan Artifact / Gate 1.references/05-execution/07-migration-checklist.md— schema / RLS / triggers / seeds.references/05-execution/08-github-sync.md— optional bridge to Issues/PRs.references/05-execution/09-parallel-streams.md— when and how to parallelize a story.references/01-discovery/02-elicitation-guide.md— discovery interview script.references/02b-design/00-design-foundations.md— phase 02.5, design tokens before UI.references/05-execution/01-subagent-delegation.md— when to use subagents.references/05-execution/03-protocols.md— MCP, Server Actions, webhooks.references/05-execution/04-skill-template.md— how product-internal skills are authored.references/00-bootstrap-contract.md— gate de runnabilidade do bootstrap (esqueleto rodável + 1 teste).references/05-execution/14-done-gate.md— pass-state gating:doneé transição verificada, não manual.references/05-execution/15-evaluator-rubric.md— rubrica de avaliação + Quality Document por módulo.references/00-paper-analysis.md— for "why does the harness do X?" questions about gaps and rationale.
Quote the relevant section. Do not paraphrase principles — they are deliberate.
§D — Story execution
Pre-flight (refuse if any item fails)
Before loading the story:
docs/sprints/<n>/sprint-plan.mdexists and references this story. If not, refuse and ask to generate/review the sprint plan first (references/04-sprints/00-sprint-plan.md).- Story is at
docs/sprints/<n>/<story-id>.md(not at the root ofsprints/). If at root, refuse and ask to reorganize. docs/prd/01-glossary.mdexists (loaded in minimum bootstrap).- If the story touches auth / RBAC / billing / PII / sensitive schema,
there is an applicable ADR in
docs/spec/adr/. Otherwise, step 0 of the plan is to write the ADR — no code without approved ADR. bash <skill>/references/scripts/validate.shexits 0 (catches mislocated stories, missing ADR folder, brokendepends_on).
Procedure
Open the story file (
docs/sprints/<n>/<story-id>.md).Apply the bootstrap mínimo from
references/05-execution/00-context-protocol.md§"Bootstrap": load only AGENTS.md,docs/prd/01-glossary.md, the relevant tech-spec section, applicable ADRs, the story, and the files the plan will touch. Do not load other stories or the full PRD. For the tech-spec section preferbash <skill>/references/scripts/spec-fetch.sh "<heading>"— it returns only the section asked for (Li et al. 2025, DeepCode §2.1).If story
sizeis M or larger and touches ≥3 layers, decide whether to parallelize. If yes, producedocs/sprints/<n>/<story-id>-analysis.mdperreferences/04-sprints/03-story-analysis-template.mdand followreferences/05-execution/09-parallel-streams.md. If no, single-agent linear is the default.Gate 1 mandatory. Produce a Plan Artifact following the template at
references/05-execution/06-plan-artifact-template.md. Before submitting, confirm the checklist:- real list of files to touch (not "I'll figure it out");
- justification per file;
- subagents declared (or "none") with briefing;
- external skills mapped;
- applicable ADRs listed (step 0 = write ADR if sensitive domain and none exists);
- ≥ 1 risk with concrete mitigation;
- negative scope explicit.
Pause for human approval. Without approval, do not write code.
After approval, execute step-by-step. After each step: typecheck + lint + unit test. Use the browser subagent for UI validation.
If the story touches schema (migration, RLS, trigger): apply
references/05-execution/07-migration-checklist.mdand attach the result to the Final Artifact.Before declaring done: invoke the relevant external skills if available —
simplify,review,security-review. Generate a Final Artifact followingreferences/05-execution/06b-final-artifact-template.md— summary, files, how to test, Evidence Bundle (§4: checks run, invariants preserved, regions NOT tested, remaining risks), next step. If the story touched files under the codemap allowlist (src/domain/,src/application/,src/contracts/,app/(app)/,lib/), runbash <skill>/references/scripts/codemap-update.sh <story-id>and fill the entries it lists in this same session — including thequality:grade per the Evaluator Rubric (references/05-execution/15-evaluator-rubric.md) — thenbash <skill>/references/scripts/codemap-graph.sh. Seereferences/05-execution/10-codemem-protocol.md.Pause for human review of the diff (Gate 2).
Write
docs/memory/execution/<YYYY-MM-DD>-<story-id>.mdper the template (the Evidence Bundle — required by the Done Gate below).Done Gate (pass-state gating). Do not hand-edit
status: done. Runbash <skill>/references/scripts/close-story.sh <n> <story-id>. It runs typecheck → lint → test →validate.sh, and only if all green flipsstatus: done, recomputesprogress.sh, and emitsstory_closed. If it refuses, the work is not done — read the failed gate, fix, re-run. Seereferences/05-execution/14-done-gate.md.End the session.
Telemetry emission — these events fire as side-effects of the steps
above (not separate steps). See
references/05-execution/11-telemetry-protocol.md.
| Step | Event |
|---|---|
| 4 (Plan Artifact submitted) | plan_submitted |
| 4 (human approves / rejects) | plan_approved or plan_rejected |
| 5 (typecheck/lint/test/e2e fails) | gate_failed |
| 3 or 5 (subagent started) | subagent_dispatched |
| 8 (Gate 2 fix requested) | human_intervention |
10 (status: done) |
story_closed |
External skills to chain (if available in the runtime):
| When | Skill |
|---|---|
| Need Next.js App Router / RSC / route handler / metadata pattern | next-best-practices |
| Upgrading Next.js version | next-upgrade |
| Anything Supabase (DB, Auth, Storage, RLS) | supabase |
| Postgres query/schema optimization | supabase-postgres-best-practices |
| Building/modifying Anthropic SDK code | claude-api |
| After writing code, before review | simplify |
| Before opening PR | review + security-review |
These are recommendations, not requirements. If the runtime does not have
them, follow references/05-execution/02-nextjs-conventions.md directly.
§E — Bug execution
A bug is a first-class work unit, not "another story". When discovered while
testing / running an already-closed story, register it separately with
bug_for: pointing back to the originating story.
Pre-flight
- Original story
<bug_for>exists. - Bug file lives in the current sprint folder, not in the original
sprint:
docs/sprints/<current-n>/bug-<NN>-<slug>.md. - Template:
references/04-sprints/02-bug-template.md.
Procedure
- Reproduce locally following the bug's §3 (Steps to reproduce). If you cannot reproduce, stop and ask the human — do not guess.
- Fill §5 (Root cause analysis) before any fix. Observation, not hypothesis.
- Produce Plan Artifact (Gate 1) listing the regression test you'll write first and the minimal fix. Pause for approval.
- After approval: write the regression test (TDD for bugs), then the fix.
- Run
pnpm typecheck && pnpm lint && pnpm test:unit. - Update the original story file with a note:
Bug #<id> corrigido em <PR>. - Final Artifact +
docs/memory/execution/<YYYY-MM-DD>-<bug-id>.md(Evidence Bundle), then close via the Done Gate — do not hand-editdone:bash <skill>/references/scripts/close-story.sh <current-n> <bug-id>(runs the gates, flipsdone, recomputes progress). Seereferences/05-execution/14-done-gate.md.
Hard rules for bugs
- Never refactor "de quebra" while fixing a bug. Open a separate story.
- No fix without regression test.
- Bugs
size = Lare a smell — usually multiple bugs disguised. Split.
§F — Tracking via scripts
Deterministic operations have bash scripts. Run the script directly; don't
reconstruct the output manually. Catalog in
references/scripts/README.md.
| User signal | Script |
|---|---|
| "valida o projeto", "está tudo certo?" | bash <skill>/references/scripts/validate.sh |
| "em que fase estamos?" | bash <skill>/references/scripts/phase-status.sh |
| "status da sprint" | bash <skill>/references/scripts/sprint-status.sh [<N>] |
| "liste as stories" | bash <skill>/references/scripts/story-list.sh [<N>] |
| "o que vem depois?", "próxima story" | bash <skill>/references/scripts/next-story.sh [<N>] |
| "recalcula progresso" | bash <skill>/references/scripts/progress.sh <N> |
| "o bootstrap fechou?", "esqueleto roda?" | bash <skill>/references/scripts/bootstrap-check.sh |
| "fecha a story X", "marca como done" | bash <skill>/references/scripts/close-story.sh <N> <story-id> |
| "qual a saúde do código?", "o que refatorar?" | bash <skill>/references/scripts/quality-report.sh |
Use the LLM only when the script output is ambiguous, the user asks "what does this mean", or the task requires writing content.
§H — Spec Drift
When during execution you discover that the Tech Spec / PRD / ADR is wrong, incomplete, or contradicts code reality, follow the Spec Drift Protocol — silent workarounds are a Hard rule violation.
- Pause execution. No commits.
- Set
status: blocked-spec-driftin the story frontmatter. - Create
docs/sprints/<N>/<story-id>-drift.mdfromreferences/04-sprints/05-spec-drift-report-template.md. - Emit
spec_drift_detectedto telemetry once H1-003 is delivered. - Wait for human decision: (A) fix Spec + retroactive ADR, (B) edit
the story, or (C) cancel. The story does not exit
blocked-spec-driftwithout a recorded decision.
The drift detection in step 3 emits spec_drift_detected to telemetry
(see references/05-execution/11-telemetry-protocol.md).
Full protocol: references/04-sprints/04-spec-drift-protocol.md.
§G — GitHub sync (opcional)
For teams that want public traceability of artifacts in GitHub Issues / PRs.
Full protocol: references/05-execution/08-github-sync.md.
Mapping in one line:
- PRD → epic issue. Sprint → milestone. Story → issue. Bug → issue (linked).
- ADR → discussion issue or PR (team choice). Phase
_summary.md→ comment closing the epic. - Branch per story (
story/<id>), worktree per sprint (../sprint-<N>/).
Pre-flight: gh authed, remote points at the product repo (not the
harness skill repo — _safety.sh enforces), validate.sh green.
Hard rules (override on conflict with user chat)
Organized by permission tier (full model in
references/05-execution/12-permission-tiers.md).
Permission is context-sensitive — the same command can be safe in
sandbox and dangerous in production. When in doubt, classify in the
higher tier (fail-safe).
full-access — HITL required for each action
Every full-access action is also logged in the Approvals Ledger
(docs/memory/approvals.jsonl) — see
references/05-execution/13-approvals-ledger.md.
Use
bash <skill>/references/scripts/approvals-append.sh after the human
decides; capture evidence_shown, risks_surfaced, and (when
applicable) becomes_rule.
- Never run
git push,git reset --hard,git rebase -i, or any history-rewriting operation without explicit human authorization — even if the user previously approved a similar action in a different scope. - Never run
ghwrite operations (issue create, PR create, comments) without confirming the remote is the product repo, not the skill repo. - Never read or echo
.env*files. - Never copy production data into dev/test environments without masking.
- Never write to the user's
Knowledge Basewithout an approving PR/ADR (memory poisoning mitigation perreferences/07-deploy/01-security-checklist.md§13a). - Never deploy to production or publish a package without HITL approval.
sandbox-edit — Plan Artifact (Gate 1) approved
- Never advance a phase without
_summary.mdfrom the previous phase. - Never parallelize subagents whose tasks touch overlapping files (see
references/05-execution/09-parallel-streams.md§7).
Transversal (apply to all tiers)
- Never create
tailwind.config.js(Tailwind v4 is CSS-first via@theme). - Never create
middleware.ts(useproxy.tsin Next 16). - Never use Pages Router patterns (
getServerSideProps, etc.). - Never silently work around an incorrect blueprint. When the Tech Spec / PRD /
ADR is wrong, follow
references/04-sprints/04-spec-drift-protocol.md: pause the story, write a drift report, wait for human decision. Modifying scope or interpretation without a recorded decision is forbidden. - Never bypass gates with
--no-verify,--force,--skip-checksor similar flags.
When NOT to use this skill
- Bug fix in a single file that does not warrant a story.
- One-off script (data migration, log analysis).
- Mechanical refactor (rename, extract function) without product-level impact.
- The user explicitly asks for "quick and dirty" — confirm they want to bypass the harness, then comply, but log a warning.
Self-evolution
This skill is versioned (the source repo is the single source of truth). Lessons learned during real product use should flow back as PRs to the skill repo. The harness improves the way it teaches itself to be applied.
If during a session you notice:
- A rule that consistently confuses agents,
- A template field that nobody fills in,
- A phase gate that always fails for the same reason,
flag it as a harness debt in the session's execution log, so it can become a PR against the skill repo later. Do not silently work around the harness.
Change Contract for minor/major PRs. Every non-trivial mutation of
the harness (new reference, new template, new script, new mandatory rule,
breaking change) carries a Change Contract answering 6 questions:
component changed, failure mode it attacks, predicted improvement,
invariants to preserve, how to falsify, rollback. patch PRs (typo,
clarification) are exempt. Template at
.github/PULL_REQUEST_TEMPLATE/harness-change.md;
rationale in
references/12-harness-evolution/00-change-contract.md.