codd-evolve - SKILL.md Agent Skill

name: codd-evolve
description: |
  Conversationally evolve an existing CoDD project. Use when the user describes a functional change in natural language ("add logout button", "change course model to master + delivery target", "remove daily log step") and you need to update requirements, design docs, lexicon, source code, and tests together while maintaining CoDD coherence. Brownfield modification, NOT greenfield generation, NOT pure bug fix.

CoDD Evolve — Conversational Brownfield Evolution

Take a Lord-style natural-language change request and automatically determine which design docs, lexicon entries, source files, and tests must move together to preserve CoDD coherence. The user expresses intent; this skill figures out scope.

When to Use

The user describes a functional change to an existing system, not a bug
Examples of trigger phrases:
- "Add a logout button to the admin nav"
- "Change course management — courses should be a shared master with separate delivery targets"
- "Restructure the learner list to drill down: facility → learner → course → progress"
- "Remove the daily log append step from the karo completion flow"
- "受講者管理を施設フィルター起点に変更" ("Change learner management to start from the facility filter")
The user does NOT want to think about which design docs, lexicon entries, or source files to touch
The project is already CoDD-initialized (has codd/codd.yaml and at least the design doc layout)

Do NOT use this for:

Greenfield generation from scratch — use /codd-init then /codd-generate
Pure bug fix where requirements and design are correct — use codd fix or codd fix [PHENOMENON]
Reverse-engineering an undocumented codebase — use codd extract (or /codd-restore)
Single-doc impact analysis only — use /codd-impact
Code-only refactoring with no behavioral change — use codd propagate

What This Does — Role Separation

This skill makes a single contract explicit:

| Layer | Who decides | What |
| --- | --- | --- |
| Intent | User | "I want X" in natural language |
| Strategic constraints | User | North star, hard prohibitions, breaking-change tolerance |
| Impact scoping | This skill | Which design docs, lexicon, source files, tests are affected |
| Doc updates | This skill + CoDD | Update requirements + every affected design doc in coherent order |
| Lexicon updates | This skill | Detect new terms, ask user before adding, then update lexicon |
| Implementation | CoDD CLI | `codd implement` from updated design |
| Verification | CoDD CLI | `codd verify`, which must reach red 0 |
| Coherence finalization | CoDD CLI | `codd propagate` for cross-doc consistency |
| Failure judgment | You (orchestrator) | Decide retry vs ask user vs abort |
| Final approval | User | Review the PR / diff post-hoc |

The user must never have to choose which file to touch.

Workflow

Step 1 — Confirm prerequisites

Before starting, verify:

Current directory is a CoDD-initialized project (codd/codd.yaml exists)
Working tree is clean OR uncommitted changes are intentional (warn the user otherwise)
codd verify currently passes (red 0) — if not, the user should fix existing red first, per codd fix [PHENOMENON] prerequisite
If codd verify returns exit 0 with silent stdout (a known mode where build/test phases run but emit no text), fall back to codd dag verify for an explicit red-count readout. Use both signals when the baseline is uncertain.
If runtime verification_test nodes are unsuitable for the local environment, prefer an explicit config or skip over silent omission: set verify.verification_timeout.total_seconds / per_node_seconds, or run codd verify --runtime --runtime-skip verification-test and preserve the reported SKIP evidence.

If red exists, STOP and surface it. Do not attempt to layer new changes on a red baseline.
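
The baseline check above can be sketched as a small helper. This is a minimal sketch, not part of CoDD: the `Runner` type and the function names are hypothetical, and only the `codd verify` and `codd dag verify` invocations come from the text. It deliberately does not parse a red count, because the output format is not specified here.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Hypothetical runner signature: takes an argv list, returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_cli(argv: Sequence[str]) -> Tuple[int, str]:
    """Default runner: execute the command and capture its stdout."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout


def check_baseline(runner: Runner = run_cli) -> dict:
    """Run `codd verify`; fall back to `codd dag verify` when it is silent."""
    code, out = runner(["codd", "verify"])
    result = {
        "verify_exit": code,
        "verify_silent": not out.strip(),
        "dag_verify_exit": None,
        "dag_verify_output": None,
    }
    if code == 0 and result["verify_silent"]:
        # Exit 0 with empty stdout is the silent mode described above,
        # so request an explicit readout as a second signal.
        dag_code, dag_out = runner(["codd", "dag", "verify"])
        result["dag_verify_exit"] = dag_code
        result["dag_verify_output"] = dag_out
    return result
```

Passing a fake `runner` keeps the helper testable without a CoDD installation; a non-zero `verify_exit` means the baseline is red and the evolution should stop.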

Step 2 — Parse intent and classify

Classify the user's request into one of:

| Type | Marker | Likely affected docs |
| --- | --- | --- |
| `add_feature` | "add X", "新規追加" (new addition) | requirements + at least one design doc + lexicon (new terms) + new source + new tests |
| `change_behavior` | "change X to Y", "〜に変更" (change to ...) | requirements + affected design docs + lexicon (term-meaning shift) + modified source + updated tests |
| `change_data_model` | "data model", "schema", "table", "entity" | database_design + api_design + lexicon + migrations + source + tests |
| `change_ux` | "UI", "screen", "navigation", "画面" (screen) | ux_design + frontend source + frontend tests |
| `remove_feature` | "remove X", "削除" (delete) | requirements (mark removed) + design docs (remove sections) + source (remove) + tests (remove or update) + lexicon (deprecate term) |
| `cross_cutting` | Touches auth/permissions/i18n/tenancy | auth_design + every callsite + tests for every role |

If classification is ambiguous, ask one clarifying question (see Step 3).
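
As a rough first pass, the marker table can be approximated with keyword matching. The sketch below is illustrative only: the regular expressions are assumptions loosely transcribed from the markers, and `classify` / `needs_clarification` are hypothetical names. Real classification needs judgment; this only shows how zero or multiple matches map to the "ask one clarifying question" rule.

```python
import re
from typing import List

# Patterns loosely transcribed from the marker column; the regexes are assumptions.
MARKERS = {
    "add_feature": [r"\badd\b", "新規追加"],
    "change_behavior": [r"\bchange\b.*\bto\b", "に変更"],
    "change_data_model": [r"\bdata model\b", r"\bschema\b", r"\btable\b", r"\bentity\b"],
    "change_ux": [r"\bui\b", r"\bscreen\b", r"\bnavigation\b", "画面"],
    "remove_feature": [r"\bremove\b", "削除"],
    "cross_cutting": [r"\bauth\b", r"\bpermissions?\b", r"\bi18n\b", r"\btenancy\b"],
}


def classify(request: str) -> List[str]:
    """Return every change type whose markers appear in the request."""
    text = request.lower()
    return [
        kind
        for kind, patterns in MARKERS.items()
        if any(re.search(pattern, text) for pattern in patterns)
    ]


def needs_clarification(request: str) -> bool:
    """Zero matches or several matches both count as ambiguous."""
    return len(classify(request)) != 1
```

A request such as "add delivery_target table" matches both `add_feature` and `change_data_model`, so it would be routed to a single clarifying question rather than guessed.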

Drift detection against existing design

After classification, scan the affected design docs for pre-existing references to the proposed change. Examples:

The intent says "add logout button to admin nav," but ux_design.md already documents a logout entry in the tenant_admin / learner sidebars (impl never caught up).
The intent says "add delivery_target table," but database_design.md has an Open Question (OQ-DB-NN) that proposes the same structure under a different name.

When such drift is found, classify it:

Drift A — broader-than-intent: design covers more roles/cases than the intent. Either (a) flag to the user and ask whether to keep the narrower intent or align to the design, or (b) treat as a Step 3 gate-5 "ambiguous scope" trigger.
Drift B — design proposed, impl absent: an Open Question or TODO matches the new intent. Reuse the existing terminology and mark the OQ as resolved.
Drift C — contradiction: design says X, intent says not-X. Halt as a Step 3 gate-3 "structural impossibility."

Recording drift in the report prevents silent vocabulary divergence on subsequent evolutions.
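
The three drift classes and their follow-up actions can be written down as a small lookup. This is a sketch under stated assumptions: the inputs (`design_scope`, `intent_scope`, and the two boolean flags) are hypothetical simplifications of what the scan would produce, and the precedence (contradiction first, then open question, then broader scope) is an assumption rather than something the text prescribes.

```python
from enum import Enum
from typing import Optional, Set


class Drift(Enum):
    BROADER_THAN_INTENT = "A"
    DESIGN_PROPOSED_IMPL_ABSENT = "B"
    CONTRADICTION = "C"


# Follow-up actions paraphrased from the drift descriptions above.
NEXT_ACTION = {
    Drift.BROADER_THAN_INTENT: "ask the user, or treat as gate 5 (ambiguous scope)",
    Drift.DESIGN_PROPOSED_IMPL_ABSENT: "reuse existing terminology and mark the OQ resolved",
    Drift.CONTRADICTION: "halt as gate 3 (structural impossibility)",
}


def classify_drift(
    design_scope: Set[str],
    intent_scope: Set[str],
    matches_open_item: bool = False,
    contradicts: bool = False,
) -> Optional[Drift]:
    """Return the drift class, or None when design and intent agree."""
    if contradicts:
        return Drift.CONTRADICTION
    if matches_open_item:
        return Drift.DESIGN_PROPOSED_IMPL_ABSENT
    if design_scope > intent_scope:  # strict superset: design covers more
        return Drift.BROADER_THAN_INTENT
    return None
```

For the logout example, a design covering `central_admin`, `tenant_admin`, and `learner` against an intent naming only `central_admin` yields Drift A.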

Step 3 — Stop-and-ask gates

Stop and ask the user only when one of these triggers fires:

1. New lexicon term required. The change introduces a vocabulary not in project_lexicon.yaml. Ask: "I'll add <term> to the lexicon meaning <definition>. OK?"
2. Breaking change to existing behavior. Existing users / callers will see different output. Ask: "This changes how <X> behaves for existing users. Is breaking change acceptable?"
3. Coherence is structurally impossible. Requirements would contradict an existing invariant. Surface the contradiction; do not proceed silently.
4. Cross-cutting scope explosion. The change touches more design docs than the user likely realized (rule of thumb: >4 docs). Confirm scope before charging ahead.
5. Ambiguous role/scope. "Add logout": for which role, or for all roles? Ask once.
6. 1:N/N:N data model change → UI page topology. When Step 2 classification == change_data_model and the change introduces a 1:N or N:N relation that does not yet have an operation_flow.ui_pattern declared for the parent/child pair, ask which UI topology should be used: (a) single screen with everything inline, (b) master-detail on the parent's detail page, (c) drilldown to a dedicated child page, or (d) defer to LLM auto-decision, which is discouraged and may trigger a ui_coherence warning. Record the answer to requirements/*.md operation_flow as a new Operation entry.

Pre-approved branch

If the orchestrator (multi-agent system, task YAML, prior conversation, etc.) already records explicit approval for one or more of these gates, treat them as immediately confirmed without re-asking. Examples:

A task YAML that states "lexicon delivery_target is pre-approved" → skip gate 1 prompt for that term.
A task YAML that states "breaking change accepted by stakeholder" → skip gate 2 prompt.
A handoff that names the role explicitly ("update only the central_admin nav") → skip gate 5 prompt; if drift detection (Step 2) finds the design covers more roles, surface the drift in the report instead of blocking.
A task YAML that states "ui_pattern for <child> is master_detail" → skip gate 6 prompt and record the pre-approved topology source in the report.

Always record in the report which gates were short-circuited and the source of the prior approval. Pre-approval never applies to gate 3 (structural impossibility) — that always halts.
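
The pre-approval rule can be sketched as a pure function over gate numbers. The function name and return shape are hypothetical; the gate numbering follows the order of the triggers in Step 3, and the only hard rule encoded is that gate 3 is never short-circuited.

```python
from typing import Dict, Iterable, List, Tuple

GATES = {
    1: "new lexicon term",
    2: "breaking change",
    3: "structural impossibility",
    4: "cross-cutting scope explosion",
    5: "ambiguous role/scope",
    6: "UI topology for a 1:N or N:N relation",
}
NEVER_PREAPPROVED = {3}


def resolve_gates(
    fired: Iterable[int], preapproved: Dict[int, str]
) -> Tuple[List[int], List[Tuple[int, str]], bool]:
    """Split fired gates into (to_ask, short_circuited, halt).

    `preapproved` maps a gate number to the source of the prior approval,
    so the report can state where each short-circuit came from.
    """
    to_ask: List[int] = []
    short_circuited: List[Tuple[int, str]] = []
    halt = False
    for gate in sorted(set(fired)):
        if gate not in GATES:
            raise ValueError(f"unknown gate: {gate}")
        if gate in NEVER_PREAPPROVED:
            halt = True  # any recorded approval for gate 3 is ignored
        elif gate in preapproved:
            short_circuited.append((gate, preapproved[gate]))
        else:
            to_ask.append(gate)
    return to_ask, short_circuited, halt
```

The `short_circuited` list is what the report should echo back, paired with the approval source.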

Do not ask the user:

Which file to touch
Which doc to update
What order to do things in
What to name a commit
Which version to bump

Step 4 — Execute the coherence chain

Once intent is confirmed, execute in this order (each step's output feeds the next):

1. Update requirements/*.md
   - Append new requirement / modify existing / mark deprecated
   - Preserve frontmatter, traceability IDs, and Bloom levels

2. Update affected design/*.md docs (in dependency order)
   - Determine order via codd's existing CEG (depends_on graph)
   - Update each doc body to reflect the new requirement
   - Preserve frontmatter exactly

3. Update project_lexicon.yaml if needed
   - Only after user confirmed in Step 3
   - Keep alphabetical / grouped order if existing convention

4. Run codd implement to (re)generate source from updated design
   - For incremental change, codd implement updates only affected modules
   - For pure data model changes, also generate migration files if applicable
   - **Generated-code impact check**: if the project keeps codd-generated output under `src/generated/**` (or equivalent), classify whether the change requires regenerating those modules or whether it stays in the hand-edited area (handlers, UI, tests). Record the decision in the report so future evolutions know whether `src/generated/**` was intentionally untouched.
   - **Prerequisite (cmd_345 K-3)**: if any design doc declares `operation_flow`, ensure `codd.yaml` has `ai_commands.impl_step_derive` set. Without it, `operation_flow_hint()` is silently skipped and declared UI patterns will not influence generation. Verify with `grep impl_step_derive codd/codd.yaml`; CoDD also emits a `WARNING` on stderr when this gap is detected.

5. Update tests
   - Tests for new requirements MUST be added (no silent skip)
   - Tests for removed requirements MUST be removed
   - Tests for changed behavior MUST be updated

6. Run codd verify
   - MUST reach red 0
   - If red persists, see Step 5 (failure handling)

7. Run codd propagate
   - Final cross-doc consistency pass
   - Catches any drift between source-as-implemented and design-as-written

8. Runtime smoke verification (MANDATORY — not optional)
   - Run `codd verify --runtime` from the project root and paste or link the generated runtime smoke report.
   - `codd verify --runtime` automatically checks:
     a. Local DB up via `codd.yaml runtime_smoke.db_check.command`
     b. Dev server up via `runtime_smoke.dev_server.url`
     c. Smoke connectivity via `runtime_smoke.smoke_connectivity[]`
     d. Real-browser E2E via `runtime_smoke.e2e.command`
     e. Opt-in CRUD flow reflection via `runtime.crud_flow_targets[]`
   - For every visible or `operation_flow` command/control/action that mutates
     state or emits a business result, add or reuse a CRUD flow target, an
     `action-outcome` target, or an equivalent E2E proving: trigger → server
     acceptance/mutation → re-fetch or observable outcome → visible reflection,
     persistence, emitted event, expected output, or absence as appropriate. A
     green GET smoke alone is not enough for mutating actions.
   - All results are written with raw logs to `reports/runtime_smoke_<timestamp>.md` unless the project config overrides the path.
   - Self-reported runtime smoke is not acceptable evidence. If `--runtime-skip <category>` is used, including `--runtime-skip verification-test` or `--runtime-skip crud-flow`, the report must show the skipped category explicitly and it must never be described as passed.
   - If `codd verify --runtime` fails: the change is NOT done. Either fix forward or revert. Reporting done with the server down is a critical violation of CoDD coherence.

Never reorder these steps. Doc updates always precede source updates; that is the CoDD coherence invariant. Sub-step 8 of this chain is the actual completion gate: sub-steps 1-7 produce coherent artifacts, and sub-step 8 proves the user can actually use them.
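
The ordering invariant can be checked mechanically against a log of what was executed. This is a minimal sketch: the stage names are shorthand for the eight chain steps and `first_order_violation` is a hypothetical helper. Stages may be skipped (for example, no lexicon change), but none may run before an earlier one.

```python
from typing import List, Optional

# Shorthand labels for the eight chain steps, in their required order.
CHAIN = [
    "requirements",
    "design",
    "lexicon",
    "source",
    "tests",
    "verify",
    "propagate",
    "runtime_smoke",
]


def first_order_violation(executed: List[str]) -> Optional[str]:
    """Return the first stage that ran after a later stage, or None."""
    last_index = -1
    for stage in executed:
        index = CHAIN.index(stage)  # raises ValueError for unknown stages
        if index < last_index:
            return stage
        last_index = index
    return None
```

A log of `["source", "design"]` reports `"design"` as the violation, which is exactly the case the invariant forbids: source touched before its design doc.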

Step 5 — Failure handling

If codd verify red persists after Step 4:

First retry: run codd fix once to let CoDD self-repair common issues
Second attempt: surface the failing test output, classify the cause
- Test outdated → update test
- Design contradicts requirement → ask user
- Implementation cannot match design → ask user whether design is wrong or impl approach is wrong
Do not loop more than 3 times. After 3 failed attempts, STOP and report to user with concrete diagnostics
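
The retry cap can be expressed as a bounded loop. In this sketch, `repair` and `verify` are hypothetical callables supplied by the orchestrator: `repair(1)` would typically run `codd fix`, later calls would apply the classified fix, and `verify()` returns True once `codd verify` reaches red 0. The function assumes it is entered after the first `codd verify` in Step 4 has already come back red.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 3


def retry_until_green(
    repair: Callable[[int], None], verify: Callable[[], bool]
) -> Tuple[str, int]:
    """Repair then re-verify, at most MAX_ATTEMPTS times."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        repair(attempt)
        if verify():
            return ("green", attempt)
    # Three failed attempts: stop and hand concrete diagnostics to the user.
    return ("stop_and_report", MAX_ATTEMPTS)
```

Returning a status instead of raising keeps the decision (ask the user, or abort) with the orchestrator, which matches the role separation described earlier.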

Local database unavailable

change_data_model work often needs a migration command (e.g. prisma migrate dev) that requires a running local database. If the DB cannot be reached:

Do not apply the migration to any non-local target (e.g. staging, production VPS).
Author the migration by hand under the project's migrations directory (Prisma example: prisma/migrations/<timestamp>_<slug>/migration.sql). Mirror the conventions of existing files in that directory (column order, index naming, foreign-key style).
Validate the schema declaration alone — prisma validate (or the framework's equivalent) — to confirm the model file parses and matches expectations.
Note in the report that the migration is authored but unapplied, and call out what the user must run locally once the DB is back (prisma migrate deploy for hand-authored migrations).
Treat this as an acceptable verification path only when codd dag verify and codd propagate also pass; it is not a substitute for full codd verify when build/test phases are reachable.
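
For the hand-authored path, the target file location can be computed rather than typed. This sketch assumes the 14-digit `YYYYMMDDHHMMSS` prefix commonly seen on Prisma migration directories; check the existing directory names in the project and mirror them if they differ. `migration_path` is a hypothetical helper, not a Prisma or CoDD API.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def migration_path(
    when: datetime, description: str, root: str = "prisma/migrations"
) -> PurePosixPath:
    """Build <root>/<timestamp>_<slug>/migration.sql for a hand-written migration."""
    # Lowercase, and collapse anything that is not a-z or 0-9 into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    if not slug:
        raise ValueError("description must contain letters or digits")
    stamp = when.strftime("%Y%m%d%H%M%S")  # assumed timestamp format
    return PurePosixPath(root) / f"{stamp}_{slug}" / "migration.sql"
```

The function only computes a path; writing the SQL and validating the schema remain manual steps as described above.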

Step 6 — Report

Generate a concise summary for the user:

Updated:
- requirements/foo.md (added: logout feature)
- design/auth_design.md (added: NextAuth signOut handler)
- design/ux_design.md (added: central admin nav Logout button)
- src/components/AdminNav.tsx (added)
- src/app/api/auth/signout/route.ts (new)
- tests/e2e/logout.spec.ts (new)

Lexicon: no changes
Verify: red 0 ✅
Propagate: 0 drift ✅
Runtime smoke (Step 8):
  - `codd verify --runtime`: ✅
  - report: reports/runtime_smoke_20260517_210000.md
Done: ✅ (user can open the app and use the new feature)

If any Step 8 check is ❌, the change is NOT done. Either fix forward or revert; never report done with the runtime broken.
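
One way to keep skipped checks from being reported as passed is to summarize the runtime results explicitly. This is a sketch under assumptions: the input is a hypothetical mapping from check category to `"pass"`, `"fail"`, or `"skip"`, and the category labels in the usage note are illustrative, since the actual report format of `codd verify --runtime` is not described here. Any unrecognized status is treated conservatively as a failure.

```python
from typing import Dict


def summarize_runtime(results: Dict[str, str]) -> dict:
    """Summarize runtime smoke results without ever counting a skip as a pass."""
    passed = sorted(name for name, status in results.items() if status == "pass")
    skipped = sorted(name for name, status in results.items() if status == "skip")
    # Anything that is neither an explicit pass nor an explicit skip is a failure.
    failed = sorted(name for name in results if name not in passed and name not in skipped)
    return {
        # Done requires at least one real pass and zero failures.
        "done": bool(passed) and not failed,
        "passed": passed,
        "skipped": skipped,
        "failed": failed,
    }
```

For example, `{"db_check": "pass", "e2e": "fail", "crud-flow": "skip"}` yields `done` as False, with `crud-flow` listed under `skipped` rather than `passed`.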

Suggest a commit message and offer to commit. Do not auto-commit unless the user confirms.

CoDD Commands This Skill Orchestrates

| Command | When invoked | Why |
| --- | --- | --- |
| `codd verify` (entry guard) | Step 1 | Confirm clean baseline |
| `codd impact` | Step 2 | Determine which design docs are downstream of the proposed change |
| `codd implement` | Step 4 (sub-step 4) | Generate source from updated design |
| `codd verify` | Step 4 (sub-step 6) | Confirm coherence after change |
| `codd fix` | Step 5 (retry) | Self-heal common verify failures |
| `codd propagate` | Step 4 (sub-step 7) | Catch final source-design drift |

Examples

Example 1 — Feature addition (clear scope)

User: "Add a logout button to the admin nav."

Skill:

Classify → add_feature, scope = auth + ux (single role: admin)
No new lexicon, no breaking change → no Step 3 questions
Execute chain → requirements + auth_design + ux_design + impl + tests + verify + propagate + runtime smoke
Report → "Added logout for central_admin role. 6 files touched, verify ✅"

Example 2 — Data model change (asks once)

User: "Course management should be a shared master with separate delivery targets."

Skill:

Classify → change_data_model, scope = database + api + lexicon + migrations + ux
Step 3 trigger: new term delivery_target (配信先) not in lexicon → ASK
- "I'll add delivery_target to the lexicon as 'a tenant/facility to which a course is distributed; many-to-one with course'. OK? Also, this changes the existing 1-course-1-tenant structure, so a migration is required. Is a breaking change for existing course records acceptable?"
After user confirms → execute chain including migration generation
Report

Example 3 — Reject as bug-fix territory

User: "The login page sometimes times out on slow networks."

Skill:

This is a bug, not a functional change. Decline politely:
- "This sounds like a bug. Use `codd fix "login times out on slow networks"` instead. codd-evolve is for intentional design changes."

Example 4 — Reject as greenfield

User: "Start a new SaaS project for restaurant reservations."

Skill:

This is greenfield. Decline:
- "For new projects, use codd init then codd plan followed by codd generate. codd-evolve is for evolving existing CoDD projects."

Absolute Constraints

These are non-negotiable. Violating any of them defeats the purpose of CoDD:

Never edit source without a corresponding design doc update. If the change requires source modification, requirements and design must already reflect it.
Never silently introduce a new lexicon term. Always ask the user first.
Never proceed past a red codd verify. Either retry (max 3) or stop and ask.
Never reorder the chain. Requirements → design → lexicon → source → tests → verify → propagate → runtime smoke. No shortcuts.
Never bypass user approval for breaking changes. "Breaking" means: existing API contract changes, existing data semantics change, existing user-visible behavior changes.
Never skip tests for new requirements. A new functional requirement without a corresponding new test is incoherent.
Never commit without user approval. Stage and propose, but do not commit autonomously.
Never declare done without runtime smoke verification (Step 8). codd verify green is necessary but not sufficient. Run codd verify --runtime and keep the generated raw-log report. Reporting done while DB/dev server is down — or while a regression like migration conflict blocks startup — is a critical violation. Either bring the runtime up and prove it, or do not declare done.

Guardrails

Use the codd command, not python -m codd.cli
Run from the project root (where codd/codd.yaml lives)
Each invocation should handle one logical change. If the user bundles multiple unrelated changes ("add logout and also restructure the course list"), split into separate runs
Preserve all frontmatter exactly — only modify doc bodies and append/remove sections as needed
When updating docs, do not gratuitously reformat unchanged sections — minimal diff is a feature
If the user is on a project where codd verify has never passed, do not start by attempting to evolve; recommend codd extract + codd fix to establish a green baseline first

Troubleshooting

"I don't know which design doc is affected"
- Read every doc under docs/design/ and docs/requirements/ and classify by frontmatter modules / topic
- Use codd impact to compute downstream effects from any candidate doc
- If still uncertain, ask the user one targeted question (not a list of 5)
"Lexicon term is borderline new vs existing"
- Treat as new. Always ask. The cost of asking is low; the cost of silent vocabulary drift is high
"Verify keeps failing after retries"
- Stop. Report which test, which file, which line. Let the user decide whether the design or the impl is wrong
"User keeps adding requirements mid-execution"
- Politely defer: "I'll finish this change first (estimated N minutes), then handle the next one"

Output Format

When reporting back to the user, always include:

Intent classification — what kind of change you understood
Files touched — grouped by docs / source / tests
Lexicon delta — new / changed / deprecated terms (or "no changes")
Verify status — red 0 ✅ or red >0 with concrete failures
Suggested commit message — single line, conventional commits format
Next action — what you recommend the user do (review, commit, request more changes)
Scope decisions — sub-scopes you intentionally excluded and the reason (e.g. "did not touch Module.tenant_id because it would require redesigning RLS isolation policies, out of scope for this evolution"). Required for change_data_model and cross_cutting types; optional but recommended for others. Pre-approval short-circuits from Step 3 should be listed here as well.
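
The required report fields can be captured in a small record type so that omissions are caught before the report is sent. This is a sketch: `EvolveReport` and its field names are hypothetical, and the two checks encode only rules stated above (scope decisions required for `change_data_model` and `cross_cutting`, and a single-line commit message).

```python
from dataclasses import dataclass, field
from typing import List

# Change types for which scope decisions are mandatory.
SCOPE_REQUIRED = {"change_data_model", "cross_cutting"}


@dataclass
class EvolveReport:
    classification: str
    files_touched: List[str]
    lexicon_delta: str
    verify_status: str
    commit_message: str
    next_action: str
    scope_decisions: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List rule violations; an empty list means the report is complete."""
        issues: List[str] = []
        if self.classification in SCOPE_REQUIRED and not self.scope_decisions:
            issues.append("scope decisions are required for this change type")
        if "\n" in self.commit_message.strip():
            issues.append("commit message must be a single line")
        if not self.lexicon_delta:
            issues.append('lexicon delta must be stated, even if "no changes"')
        return issues
```

Calling `problems()` just before Step 6 gives a cheap completeness check without constraining how the report is rendered.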

Why This Skill Exists

CoDD's value is coherence: requirements, design, lexicon, source, and tests move together so no document lies about the system. The CLI form (codd plan, codd implement, codd verify) makes this explicit and reproducible — ideal for greenfield projects and CI automation.

But the CLI form has a cost in brownfield modification: the user must remember which command to run, in which order, with what arguments. Each codd fix "PHENOMENON" invocation is a context switch.

codd-evolve removes that cost by accepting natural language ("add logout button") and orchestrating the CLI chain underneath. The user expresses intent; coherence is preserved automatically. The CLI remains the engine; this skill is the conversational front.

This is not a replacement for the CLI. Both are first-class:

CLI for Greenfield, CI, automation, education, and third-party orchestrators
Skill for Brownfield, conversational modification, daily evolution within Claude Code

Use the right tool for the right phase.