name: writing-odd-state-files description: | Hardens Outcome-Driven Development (Aden ODD) execution by enforcing the Goal → Success Criteria → Constraints → Tests → Outcomes → Judgment loop and the state file layout that persists it. Use when authoring or updating an ODD state file, when the user says "ODD", "outcome-driven", "Aden ODD", "goals + success criteria", "create state file", "lifecycle DRAFT/READY/ ACTIVE/COMPLETED", or when work needs measurable acceptance (weighted threshold ≥0.90) instead of vibes-based done. Also triggers when the user asks to "loop on success criteria until ACCEPT", needs a HARD GATE before implementation, wants Tests-as-Hypotheses (one per criterion), or wants a Process Tree to bind workflow control flow. Do NOT use for plan files (use write-plan), traces (use writing-traces), domain models (use writing-ddd-domain-models), or simple TODOs.
Writing ODD State Files
%% Last Modified: 05/03/26 11:13:39 %%
State files are the persistent memory of an Aden ODD run. Without [w] writes after every transition, restarts wipe the evidence chain and the loop has to re-author Goal/Criteria/Constraints from scratch. This skill enforces the layout, the lifecycle, and the process tree that bind any ODD-conformant workflow.
Expert Vocabulary Payload
%% Last Modified: 05/03/26 11:13:39 %%
Aden ODD Spec (must-follow): Goal (intent + direction, no outcomes), SuccessCriterion (id + description + metric + target + weight ∈ [0,1]), Constraint (type ∈ {hard, soft} × category ∈ {time, cost, safety, scope, quality}), Outcome struct (success, result, error, state_changes, tokens_used, latency_ms, summary), feedback loop judgment (ACCEPT, RETRY, REPLAN, ESCALATE), lifecycle (DRAFT → READY → ACTIVE → COMPLETED | FAILED | SUSPENDED), HARD GATE [e]
Metrics: output_contains, output_equals, llm_judge, custom, weighted score, ACCEPT threshold ≥0.90, weight sum = 1.0, hypothesis ("if system correct → outcome X under condition Y")
Process Tree (van der Aalst §3.2.8): sequential (→), exclusive choice (×), parallel (∧), redo loop (↻), silent activity (τ), do-part, redo-part, canonical inline form, visual tree form, activity legend, conformant trace, re-entry rule
State Persistence: session provenance, [r] read recovery, [w] write transition, change log, lifecycle re-entry, scope-out via τ, NBA (Next Best Actions), session learnings (lost on restart)
Constraint Taxonomy: hard-scope, hard-quality, hard-safety, hard-time, hard-cost, soft-quality, soft-time, fallback behavior, no-crash guarantee
Anti-Pattern Watchlist
%% Last Modified: 05/03/26 11:13:39 %%
Scan every state file you author or modify against these. Fix violations before [w].
1. silent-gate-bypass
%% Last Modified: 05/03/26 11:13:39 %%
Detection: Lifecycle advances DRAFT → READY (or READY → ACTIVE) without an explicit user confirmation message in the transcript. The HARD GATE [e] node fires without "user said yes." Resolution: STOP. Mark lifecycle SUSPENDED with reason "awaiting [e] HARD GATE." Surface DRAFT artifacts (Goal, T4, T5, T6) inline + state file path. Do NOT proceed until user types explicit approval.
2. goal-mutation-without-escalate
%% Last Modified: 05/03/26 11:13:39 %%
Detection: The Goal text changes between writes without an [l] ESCALATE redo entry in the Change Log. Framework rewrote intent autonomously. Resolution: Revert Goal to last user-validated text. If the loop genuinely needs Goal change, surface as Type I decision: "Tests reveal Goal assumption flawed. ESCALATE [l] needed — please confirm new Goal."
3. vague-success-criteria
%% Last Modified: 05/03/26 11:13:39 %%
Detection: A criterion lacks any of id, description, metric, target, weight, OR weights do not sum to 1.0, OR metric is freeform prose ("works well", "is fast") instead of output_contains | output_equals | llm_judge | custom.
Resolution: Rewrite each row as a 7-column table: ID | Description | Metric | Target | Weight | Test (hypothesis). Verify sum(weights) == 1.0 before [w].
4. untagged-constraints
%% Last Modified: 05/03/26 11:13:39 %%
Detection: Constraints written as bullets without [category] tag (e.g. - HC1: must be fast). Soft/hard split missing.
Resolution: Reformat as HC<n> [<category>]: <constraint> with category ∈ {time, cost, safety, scope, quality}. Group under **Hard:** and **Soft:** headings.
5. outcome-struct-elision
%% Last Modified: 05/03/26 11:13:39 %%
Detection: Test execution at [g] reports only pass/fail or only result. Missing fields from {success, result, error, state_changes, tokens_used, latency_ms, summary}.
Resolution: Each test execution must emit ALL 7 fields. If a field is N/A (e.g., state_changes for pure functions), record null explicitly — never omit.
6. implementation-before-tests
%% Last Modified: 05/03/26 11:13:39 %%
Detection: Code written for the system before [f] generates failing tests. Tasks table shows [j] entries with no preceding [f]/[g]. Resolution: Block [j]. Write tests first (one hypothesis per SuccessCriterion). Run [g] and confirm 100% fail (TypeError, AssertionError) BEFORE any production code.
7. re-entry-re-authoring
%% Last Modified: 05/03/26 11:13:39 %%
Detection: A new session re-runs [a] [b] [c] [d] when lifecycle ∈ {ACTIVE, COMPLETED, FAILED}. State file shows duplicate Goal/Criteria sections or rewritten T3-T6 without justification. Resolution: Always run [r] FIRST. If lifecycle ≥ ACTIVE → take τ-skip on the DRAFT+gate block. Resume at the test loop or operational phase per recovered position.
8. silent-write-elision
%% Last Modified: 05/03/26 11:13:39 %%
Detection: Lifecycle transition (DRAFT→READY, RETRY judgment, ACCEPT) occurred but no [w] row in the Tasks table or no entry in the Change Log.
Resolution: Append [w] row + Change Log entry retroactively. Persist before next action. Continuous [w] is non-negotiable.
9. weight-arithmetic-failure
%% Last Modified: 05/03/26 11:13:39 %%
Detection: Weighted score at [h] computed wrong (sum > 1.0, or arithmetic error, or missing criterion contribution).
Resolution: Show the explicit math in the judgment row: C1(0.20) + C2(0.25) + ... = 1.00. ACCEPT only if computed value ≥ 0.90. Below threshold → choose RETRY/REPLAN/ESCALATE with named reason.
10. mixed-goal-and-outcomes
%% Last Modified: 05/03/26 11:13:39 %%
Detection: The Goal section contains measurable targets ("achieve 95% accuracy") or test descriptions. Goal has been polluted with what should be in T4/T5. Resolution: Strip Goal back to intent + direction only. Move targets to Success Criteria (T4). Move guardrails to Constraints (T5).
Behavioral Instructions
%% Last Modified: 05/03/26 11:13:39 %%
Phase 0: Anti-Pattern Scan
%% Last Modified: 05/03/26 11:13:39 %%
- Before any write, scan the existing state file (if any) against the Anti-Pattern Watchlist above. Fix detected violations as the first action.
Phase 1: [r] Recover Lifecycle (MANDATORY first)
%% Last Modified: 05/03/26 11:13:39 %%
- Read the state file at the canonical path (or ask user where it lives). Locate the Lifecycle section.
- Classify recovered lifecycle:
(none)orDRAFTorREADY-pre-gate→ take authoring branch (Phase 2).ACTIVE,COMPLETED,FAILED, orSUSPENDED-post-gate→ τ-skip authoring. Resume at the test-outcomes loop (Phase 3) or terminal node.
- Write the recovered position into a one-line status block before proceeding. WHY: makes re-entry decision auditable.
Phase 2: DRAFT Authoring (only if lifecycle requires)
%% Last Modified: 05/03/26 11:13:39 %%
- Author Goal [a] — one paragraph of intent + direction. NO outcomes, NO measurable targets, NO test descriptions. If you find yourself writing "achieve X%" you are in T4 territory.
- Author Success Criteria [b] — table with columns
ID | Description | Metric | Target | Weight | Test (hypothesis). One row per criterion. Verifysum(weights) == 1.0. Each metric ∈ {output_contains, output_equals, llm_judge, custom}. - Author Constraints [c] — split into
**Hard:**and**Soft:**sections. Each entry:<H|S>C<n> [<category>]: <constraint>. Categories: time | cost | safety | scope | quality. Hard constraints define failure; soft constraints prefer behavior. - Author Context [d] — durable background that shapes agent behavior. File paths, existing patterns, schema, references, links. NOT decisions or rationale (those go in Change Log).
- Run [w]: persist DRAFT artifacts + initial Change Log entry.
Phase 3: HARD GATE [e]
%% Last Modified: 05/03/26 11:13:39 %%
- Surface DRAFT artifacts to user inline + state file abs path. State explicitly: "DRAFT ready for review. HARD GATE — please confirm Goal + C1..Cn + constraints, or call out revisions."
- STOP. Do not advance lifecycle. Wait for explicit user approval. IF user unavailable → write lifecycle =
SUSPENDEDwith reason; halt. - Once user confirms → lifecycle DRAFT → READY → ACTIVE. Run [w].
Phase 4: Test-Outcomes Loop ↻
%% Last Modified: 05/03/26 11:13:39 %%
- [f] Generate tests — one test per SuccessCriterion.id. Each test framed as a hypothesis: "if system correct → outcome X under condition Y."
- [g] Run tests — emit FULL Outcome struct per execution:
{success, result, error, state_changes, tokens_used, latency_ms, summary}. Usenullfor N/A fields, never omit. - [h] Aggregate — compute weighted score:
sum(criterion.weight × pass_indicator). Show the arithmetic explicitly. Choose judgment:- score ≥ 0.90 → ACCEPT (exit loop, advance to terminal)
- score < 0.90, system code is the issue → RETRY [j]
- score < 0.90, criteria thresholds or system structure off → REPLAN [k]
- score < 0.90, Goal assumption flawed → ESCALATE [l] (Type I — surface to user)
- Run [w] after each judgment. Loop back to [f] if not ACCEPT.
Phase 5: Optional Operational + Real-World Phases
%% Last Modified: 05/03/26 11:13:39 %%
- IF scope includes staging/pilot deploy → run operational phase [m]→↻[n]→[w]. Otherwise → mark
scope-out [τ]in Tasks table with reason. - IF scope includes production deploy → run real-world phase [o]→↻[p]→[w]. Otherwise → mark
scope-out [τ]in Tasks table with reason.
Phase 6: Terminal
%% Last Modified: 05/03/26 11:13:39 %%
- [q] Emit COMPLETED — update Lifecycle section, mark all Tasks ✅, append final Change Log entry summarizing T0→Tn.
- Run final [w].
Output Format — State File Layout
%% Last Modified: 05/03/26 11:13:39 %%
!echo /Users/wesleyfrederick/.claude/skills/caveman/SKILL.md
ALWAYS Use this file EXACTLY for output layout: !
jact extract file ${CLAUDE_SKILL_DIR}/references/state-file-template.md, BECAUSE it contains the GOLD STANDARD for state filesWHEN writing output, ALWAYS use
/caveman ultraskill, BECAUSE it saves on token usage and prevents LLM vomit that does not add value
Examples
%% Last Modified: 05/03/26 11:13:39 %%
Example 1: BAD vs GOOD Goal
%% Last Modified: 05/03/26 11:13:39 %%
BAD:
## Goal
Build a learnings cache that achieves 90% hit rate, runs in <50ms, supports
session isolation, and passes all 6 tests including TTL sweep within 7 days.
Mixes intent with measurable targets and test descriptions. Pollutes Goal with T4/T5 content. Triggers mixed-goal-and-outcomes anti-pattern.
GOOD:
## T3 — Goal
Cut repeat full-body learning injection per session. Cache by session_id +
content-hash (jact pattern) → mid-session edit auto-invalidate. Markers live
per-repo in `.learnings/` (mirror `.jact/` convention). Cold start = baseline.
**Files:** [paths]
**Ref pattern:** [path] — `{sid}_{md5(file)}`
Pure intent + direction. Targets live in T4. Constraints live in T5. Goal stays stable across the whole loop unless [l] ESCALATE.
Example 2: BAD vs GOOD Success Criteria
%% Last Modified: 05/03/26 11:13:39 %%
BAD:
- Should be fast
- Should not regress baseline
- Should support multiple sessions
- Tests should pass
No IDs. No metrics. No targets. No weights. No hypothesis. Triggers vague-success-criteria anti-pattern.
GOOD:
| ID | Description | Metric | Target | Weight | Test (hypothesis) |
|---|---|---|---|---|---|
| C1 | Cold cache emits full body per ID | output_contains | Each ID's `### {title}\n{body}` present | 0.20 | Empty cache run → diff vs baseline |
| C2 | Warm cache emits title-only on repeat | custom | 2nd-fire output: `### {title}` line + zero body lines | 0.25 | 2x run same sid → 2nd has titles only |
| C3 | Sessions isolated | custom | sid=B output identical to cold-cache regardless of sid=A history | 0.15 | sid=A then sid=B → B full bodies |
| C4 | Cold byte-identical to baseline | output_equals | Cold output == pre-change baseline byte-for-byte | 0.20 | Snapshot test |
| C5 | Mid-session edit invalidates cache | custom | After file md5 changes, next inject = full body | 0.15 | Run sid=A → edit → run sid=A → full |
| C6 | TTL sweep removes >7d markers | custom | After invocation, no marker w/ mtime older than 7d | 0.05 | touch marker mtime=-8d → run → marker gone |
**Weight sum:** 1.00. **Threshold:** ≥0.90 → ACCEPT.
Every criterion: id + description + metric + target + weight + hypothesis. Weights sum = 1.0. Each row maps directly to a [f]/[g] test execution.
Example 3: BAD vs GOOD Outcome at [g]
%% Last Modified: 05/03/26 11:13:39 %%
BAD:
- Test C1: passed
- Test C2: failed (cache not hit)
- Test C3: passed
Missing success / result / error / state_changes / tokens_used / latency_ms / summary. Triggers outcome-struct-elision.
GOOD:
- C1 Outcome: {success: true, result: "full body emitted for 3/3 IDs",
error: null, state_changes: ["3 markers written"], tokens_used: 1240,
latency_ms: 87, summary: "cold cache baseline-equivalent"}
- C2 Outcome: {success: false, result: "full body emitted (expected title-only)",
error: "cache miss on warm key {sid}_{md5}", state_changes: [],
tokens_used: 1240, latency_ms: 91, summary: "key mismatch — investigate _cache_key"}
Full struct per execution. The error field on C2 directly informs RETRY [j] target (fix _cache_key).
Questions This Skill Answers
%% Last Modified: 05/03/26 11:13:39 %%
- "Create a state file for this work using ODD"
- "How do I structure goals and success criteria for Aden ODD?"
- "Build me an outcome-driven plan with HARD GATE"
- "I want to loop on success criteria until ACCEPT"
- "What goes in T3/T4/T5/T6?"
- "Set up the lifecycle DRAFT → READY → ACTIVE → COMPLETED for this task"
- "Add a process tree to my plan"
- "Make this plan measurable instead of vibes"
- "Why did my agent skip the HARD GATE?"
- "How do I write tests as hypotheses for ODD?"
- "What's the right shape for Outcome struct?"
- "My weights don't sum to 1.0 — fix the criteria table"
- "Persist state — I keep losing context on restart"
- "Resume an ODD run from where it left off"
- "Goal vs Success Criteria vs Constraints — which goes where?"