recipe - SKILL.md Agent Skill

name: recipe
description: Draft an AI Cookbook recipe for handley-lab.github.io. Offers a short path (interview-driven YAML card only) and a long-form path (forensic JSONL analysis with the 5-minute-cap pair-timing methodology and a Socratic verification loop). Hands off to the website skill for preview.
allowed-tools: []

Recipe skill

Use this skill to add a recipe to the AI Cookbook at /cookbook/. The reference long-form companion page is _recipes/jaxwavelets.md, produced by the long-form path below.

When to use

The user wants to draft a short recipe card for the AI Cookbook (use the short path below — interview-driven, no JSONL forensics).
The user wants a fuller forensic write-up of a past Claude Code session (use the long-form path — Phases 0–6 below).
A workshop attendee is sitting next to a freshly-cloned handley-lab.github.io and wants Claude to produce a draft from one of their past sessions.

Two paths

The AI Cookbook is a single content channel; recipes vary in depth.

Short recipes — interview the user, draft a YAML card. No ~/.claude/projects/ search, no JSONL parsing, no active-time statistics. Most workshop submissions are this shape.
Long-form recipes — full audit trails with timing analysis, prompt classification, and stall reconstruction. Phases 0–6 below. Use when the session itself is the subject of scrutiny.

If unsure, ask the user. Short recipes are the default for "this is a workflow pattern"; long-form is the default for "this is a session that warrants forensic depth."

Short path — interview to YAML card

This path deliberately skips Phases 0–3. Do not search ~/.claude/projects/, do not parse JSONL, and do not compute active-time statistics unless the user explicitly asks. A short recipe is a reusable pattern, not an audited session report; forcing it through Phases 0–6 is a category error.

Workflow

Pick an <id> (kebab-case).
Interview the user with these prompts:
- What were you trying to do?
- What context did you give the model?
- What did the model produce?
- What became newly possible — not just faster?
- What evidence could we show publicly?
- What should another researcher copy?
Draft the YAML card.

Keep the draft short. Avoid "look how efficient we are" framing; prefer "this interface made a new kind of scientific work possible." Use plain descriptive titles, not clever ones.

Output schema

- id: <id>
  title: "<plain descriptive title>"
  category: "<e.g. Context Engineering, Research Communication, Ambient AI>"
  timebox: "<e.g. 10 minutes, one afternoon, multi-session>"
  status: "Draft"                   # Stub | Draft | In progress | Complete
  proposer: "<full name>"
  summary: "<1–2 sentences. Lead with what became possible, not what was efficient.>"
  outcome: "<concrete artefact or capability. Required, non-empty.>"
  evidence: "<how a sceptic could verify it. Required, non-empty.>"
  demo_url:
  repo_url:
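A card can be checked mechanically before it is posted. The sketch below uses a hypothetical helper, card_problems, which is not part of the skill. It assumes that every field other than demo_url and repo_url must be non-empty, although the schema only states this explicitly for outcome and evidence.

```python
import re

# Field names mirror the output schema above; the validation rules are assumptions.
REQUIRED = ("id", "title", "category", "timebox", "status",
            "proposer", "summary", "outcome", "evidence")
OPTIONAL = ("demo_url", "repo_url")
STATUSES = {"Stub", "Draft", "In progress", "Complete"}
KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def card_problems(card):
    """Return human-readable problems with one card dict; [] means it passes."""
    problems = []
    for field in REQUIRED:
        value = card.get(field)
        if value is None or not str(value).strip():
            problems.append(f"{field} is missing or empty")
    # Anything outside the schema is reported rather than silently kept.
    for field in sorted(set(card) - set(REQUIRED) - set(OPTIONAL)):
        problems.append(f"unexpected field: {field}")
    card_id = card.get("id")
    if card_id and not KEBAB.match(str(card_id)):
        problems.append("id is not kebab-case")
    status = card.get("status")
    if status and status not in STATUSES:
        problems.append("status must be one of: Stub, Draft, In progress, Complete")
    return problems
```

To use it, load the card with any YAML parser (for example PyYAML's yaml.safe_load, which returns a list of dicts for this layout) and call card_problems on each entry.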

Publishing during a workshop

Post the YAML to the coordination issue (e.g. issue #3 for the 5 May 2026 workshop) for the orchestrator to batch-merge into _data/recipes.yml. Multiple parallel direct edits to that file create avoidable merge conflicts.

Outside a workshop context, hand off to the website skill to commit the card on a feature branch and open a PR.

Long-form path (Phases 0–6)

What this skill is and is not

It is documentation that orchestrates a phased workflow with Socratic verification at each step. It instructs Claude to write and run analysis scripts under /tmp/recipe-<id>/ and to pause for user confirmation between phases.
It is not committed code. Each invocation generates its own analysis scripts in /tmp so the user's clone is never mutated during analysis.

Two companion files

methodology.md — the JSONL parsing rules, the 5-minute-cap active-time analysis, the Socratic verification rules. Read it before running any analysis — these rules are the hard-won output of the original jaxwavelets recipe.
template.md — the section scaffold to fill at Phase 4.

Phases

The phases are run in order. Each one produces output and pauses for user confirmation before proceeding. Do not run end-to-end without the human in the loop — the methodology depends on it.

Phase 0 — Session discovery (no UUID required)

The publication <id> has not been chosen yet at Phase 0. Use the scratch directory /tmp/recipe-discovery/ until the user picks an <id> in Phase 1, then move the scratch files into /tmp/recipe-<id>/.

Use the cross-platform Python discovery script in methodology.md. Save it to /tmp/recipe-discovery/discover.py and run with python3 /tmp/recipe-discovery/discover.py 30 (the argument is the recency window in days). The script avoids GNU-only flags and Python ≥3.10-only syntax, so it runs on macOS and on Python 3.8+.

The script lists Claude Code session JSONLs across ~/.claude/projects/*/ modified in the configured window, skipping /subagents/ paths, and prints a numbered table with index, modification date, cwd, line count, and a short excerpt of the first and last real user prompts (system reminders stripped).
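The following is a minimal sketch of the listing mode of such a script, for orientation only; the actual script lives in methodology.md. It makes several assumptions:

- Each JSONL line is a JSON object with type, cwd and message.content fields.
- String content in a user record is a typed prompt.
- System reminders appear as <system-reminder> blocks that a regex can strip.

It sticks to Python 3.8-compatible syntax and does not implement --index or --uuid.

```python
import json
import re
import sys
import time
from pathlib import Path

REMINDER = re.compile(r"<system-reminder>.*?</system-reminder>", re.S)


def prompt_text(rec):
    """Typed prompt text of a user record, or None (assumed record layout)."""
    if rec.get("type") != "user":
        return None
    content = rec.get("message", {}).get("content")
    if not isinstance(content, str):
        return None  # list content is assumed to hold tool_result blocks
    return REMINDER.sub("", content).strip() or None


def summarise(path):
    """Return (cwd, line count, first prompt excerpt, last prompt excerpt)."""
    cwd, prompts, n_lines = "?", [], 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            n_lines += 1
            try:
                rec = json.loads(line)
            except ValueError:
                continue
            if not isinstance(rec, dict):
                continue
            cwd = rec.get("cwd", cwd)
            text = prompt_text(rec)
            if text:
                prompts.append(text)
    first = prompts[0][:40] if prompts else ""
    last = prompts[-1][:40] if prompts else ""
    return cwd, n_lines, first, last


def candidates(days, root=Path.home() / ".claude" / "projects"):
    """Session JSONLs modified in the last `days` days, newest first, no subagents."""
    cutoff = time.time() - days * 86400
    paths = [p for p in Path(root).rglob("*.jsonl")
             if "subagents" not in p.parts and p.stat().st_mtime >= cutoff]
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)


def main(argv):
    days = int(argv[1]) if len(argv) > 1 and argv[1].isdigit() else 30
    for i, path in enumerate(candidates(days), 1):
        day = time.strftime("%Y-%m-%d", time.localtime(path.stat().st_mtime))
        cwd, n_lines, first, last = summarise(path)
        print(f"{i:3d}  {day}  {n_lines:6d}  {cwd}  first={first!r}  last={last!r}")


if __name__ == "__main__":
    main(sys.argv)
```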

The user can:

pick a number from the listing. The default run writes /tmp/recipe-discovery/candidates.json; resolve the picked index with python3 /tmp/recipe-discovery/discover.py --index N, which prints the JSONL path.
supply a UUID/session id. Run python3 /tmp/recipe-discovery/discover.py --uuid <uuid> — the script first looks for an exact filename match, then scans for sessionId == <uuid>.
supply an absolute JSONL path directly.
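The two resolution modes might be sketched as follows. Several details are assumptions rather than documented behaviour:

- The function names are hypothetical.
- candidates.json is taken to be a JSON list of path strings in listing order.
- Subagent files are skipped during the sessionId scan, on the assumption that they carry their parent's sessionId.

```python
import json
from pathlib import Path


def resolve_index(candidates_json, n):
    """Map a 1-based listing index to a JSONL path (assumed candidates.json layout)."""
    paths = json.loads(Path(candidates_json).read_text(encoding="utf-8"))
    if not 1 <= n <= len(paths):
        raise IndexError(f"index {n} outside 1..{len(paths)}")
    return Path(paths[n - 1])


def resolve_uuid(uuid, root):
    """Exact filename match first, then a scan for sessionId == uuid; None if absent."""
    root = Path(root)
    for path in root.rglob(f"{uuid}.jsonl"):
        if "subagents" not in path.parts:
            return path
    for path in sorted(root.rglob("*.jsonl")):
        if "subagents" in path.parts:
            continue  # subagent logs would otherwise shadow the parent session
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    rec = json.loads(line)
                except ValueError:
                    continue
                if isinstance(rec, dict) and rec.get("sessionId") == uuid:
                    return path
    return None
```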

Phase 1 — Inventory

Once the user has chosen a session, ask them for an <id> (kebab-case, will become the URL slug /cookbook/<id>/). Move scratch from /tmp/recipe-discovery/ to /tmp/recipe-<id>/. From here on, all scratch files live under /tmp/recipe-<id>/.
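Moving the scratch directory is a single rename. Here is a sketch using promote_scratch, a hypothetical helper that also rejects ids that are not kebab-case and refuses to overwrite an existing directory:

```python
import re
import shutil
from pathlib import Path

KEBAB = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def promote_scratch(recipe_id, tmp_root="/tmp"):
    """Move <tmp_root>/recipe-discovery/ to <tmp_root>/recipe-<id>/."""
    if not KEBAB.match(recipe_id):
        raise ValueError(f"{recipe_id!r} is not kebab-case")
    src = Path(tmp_root) / "recipe-discovery"
    dst = Path(tmp_root) / f"recipe-{recipe_id}"
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    shutil.move(str(src), str(dst))
    return dst
```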

Read the chosen JSONL using the parsing rules in methodology.md, and save the following to /tmp/recipe-<id>/inventory.md:

Total messages by top-level type.
Breakdown of assistant content blocks (text / tool_use / thinking).
Wall-clock span (first → last timestamp).
Tool-call counts by tool name.
mcp__llm__review verdict counts: APPROVED / FIXES REQUIRED (literal matches), plus a colloquial-fallback bucket (not approved, not ready, rejected) and an unknown bucket. Include a short excerpt from each unknown review so the user can adjudicate (see "Review verdict extraction" in methodology.md).
Prompt-count audit: the number of real user(prompt) entries, plus a breakdown of excluded entries by reason ([Request interrupted by user], <task-notification>, <command-message>, system-reminder-only, tool_result). See "Prompt-count audit" in methodology.md.
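The counting steps above could be sketched as follows. The sketch assumes each JSONL line is an object whose message.content is either a string or a list of typed blocks, and that review output arrives as a tool_result whose tool_use_id matches an mcp__llm__review tool_use. The verdict regexes are illustrative stand-ins for the rules in methodology.md. The lookbehind stops NOT APPROVED from being counted as APPROVED.

```python
import json
import re
from collections import Counter

LITERAL = re.compile(r"(?<!NOT )\b(APPROVED|FIXES REQUIRED)\b")
COLLOQUIAL = re.compile(r"\b(not approved|not ready|rejected)\b", re.I)


def verdict(text):
    """Bucket one review text: literal verdict, colloquial fallback, or unknown."""
    m = LITERAL.search(text)
    if m:
        return m.group(1)
    if COLLOQUIAL.search(text):
        return "colloquial-fallback"
    return "unknown"


def result_text(content):
    """Flatten tool_result content (string or list of text blocks) to a string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(b.get("text", "") for b in content if isinstance(b, dict))
    return ""


def inventory(lines):
    """Count record types, assistant block types, tool names and review verdicts."""
    types, blocks, tools, verdicts = Counter(), Counter(), Counter(), Counter()
    review_ids, unknown_excerpts = set(), []
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue
        types[rec.get("type", "?")] += 1
        content = rec.get("message", {}).get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            kind = block.get("type")
            if rec.get("type") == "assistant":
                blocks[kind] += 1
                if kind == "tool_use":
                    tools[block.get("name")] += 1
                    if block.get("name") == "mcp__llm__review":
                        review_ids.add(block.get("id"))
            elif kind == "tool_result" and block.get("tool_use_id") in review_ids:
                text = result_text(block.get("content"))
                v = verdict(text)
                verdicts[v] += 1
                if v == "unknown":
                    unknown_excerpts.append(text[:200])  # shown to the user
    return types, blocks, tools, verdicts, unknown_excerpts
```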

Pause. Ask the user to confirm this is the session they meant.
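Part of that confirmation is checking the prompt-count audit from the inventory. Below is a minimal sketch of the exclusion logic. It assumes the reason strings listed in the inventory appear at the start of the user message text. The reason labels and function names are hypothetical.

```python
import json
import re
from collections import Counter

REMINDER = re.compile(r"<system-reminder>.*?</system-reminder>", re.S)


def classify_user_entry(rec):
    """Return "prompt" for a real human prompt, otherwise an exclusion reason."""
    content = rec.get("message", {}).get("content")
    if isinstance(content, list):
        blocks = [b for b in content if isinstance(b, dict)]
        if any(b.get("type") == "tool_result" for b in blocks):
            return "tool_result"
        content = "\n".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    text = content.strip() if isinstance(content, str) else ""
    if not text:
        return "empty"
    if text.startswith("[Request interrupted by user"):
        return "interrupted"
    if text.startswith("<task-notification>"):
        return "task-notification"
    if text.startswith("<command-message>"):
        return "command-message"
    if not REMINDER.sub("", text).strip():
        return "system-reminder-only"
    return "prompt"


def prompt_audit(lines):
    """Tally every user entry by classification."""
    reasons = Counter()
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue
        if isinstance(rec, dict) and rec.get("type") == "user":
            reasons[classify_user_entry(rec)] += 1
    return reasons
```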

Phase 2 — Active-time analysis

Apply the 5-minute-cap pair-timing methodology (see methodology.md). Headline outputs:

Active computation — sum of three retained-pair categories (≤ 5 min):
- tool execution (assistant(tool_use) → user(tool_result))
- AI thinking → text (user(tool_result) → assistant(text))
- AI thinking → tool (user(tool_result) → assistant(tool_use))
Reported alongside, NOT in active total:
- AI continuation after human prompt (user(prompt) → assistant(*))
- human response gap (assistant(text) → user(prompt))
Wall-clock span and the active / wall-clock ratio.
Percentile distribution of human-response gaps, plus a 30-second-cap "attention" estimate reported as a separate, lower-bound figure.
Long-tool-execution exceptions: any assistant(tool_use) → user(tool_result) gap > 5 min, with the tool_use input and the tool_result content. Watch for the subtle case: an instant-success tool_result after a multi-hour gap means the session was suspended at the tool boundary, not that a tool ran for hours. See methodology.md.
Candidate stalls/breaks: any gap > 30 min, with start/end, Δ, and the messages immediately either side. Quote them verbatim. Long-gap entries must NOT be labelled with retained-pair category names (e.g. don't call a 5h gap "tool execution"; label it "candidate stall, requires user adjudication").
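The core of the 5-minute-cap method can be sketched as a walk over consecutive timestamped entries. This illustrates the idea under stated assumptions; it is not the timing.py the skill generates. The assumptions are:

- Each record is reduced to one kind; an assistant entry with any tool_use block counts as assistant(tool_use).
- Non-prompt user entries have already been filtered out.
- Gaps over the cap are dropped from every sum rather than truncated.

```python
from datetime import datetime

CAP = 300  # seconds: the 5-minute cap
ACTIVE = {
    ("assistant(tool_use)", "user(tool_result)"): "tool execution",
    ("user(tool_result)", "assistant(text)"): "AI thinking -> text",
    ("user(tool_result)", "assistant(tool_use)"): "AI thinking -> tool",
}


def parse_ts(value):
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def kind(rec):
    """Reduce one record to a single event kind (assumed layout)."""
    content = rec.get("message", {}).get("content")
    types = ({b.get("type") for b in content if isinstance(b, dict)}
             if isinstance(content, list) else set())
    if rec.get("type") == "assistant":
        if "tool_use" in types:
            return "assistant(tool_use)"
        if "text" in types:
            return "assistant(text)"
        return "assistant(thinking)"
    if rec.get("type") == "user":
        return "user(tool_result)" if "tool_result" in types else "user(prompt)"
    return None


def pair_timing(records):
    """Return (active seconds, wall-clock seconds, per-category totals)."""
    events = sorted((parse_ts(r["timestamp"]), kind(r))
                    for r in records if r.get("timestamp") and kind(r))
    totals = {label: 0.0 for label in ACTIVE.values()}
    totals["AI continuation after human prompt"] = 0.0
    totals["human response gap"] = 0.0
    for (t0, k0), (t1, k1) in zip(events, events[1:]):
        dt = (t1 - t0).total_seconds()
        if dt > CAP:
            continue  # long gaps go to the exception/stall lists, never the sums
        if (k0, k1) in ACTIVE:
            label = ACTIVE[(k0, k1)]
        elif k0 == "user(prompt)" and k1.startswith("assistant("):
            label = "AI continuation after human prompt"
        elif (k0, k1) == ("assistant(text)", "user(prompt)"):
            label = "human response gap"
        else:
            continue
        totals[label] += dt
    active = sum(totals[label] for label in ACTIVE.values())
    wall = (events[-1][0] - events[0][0]).total_seconds() if events else 0.0
    return active, wall, totals
```

The continuation and human-gap totals are computed so they can be reported alongside the active figure, but they are deliberately left out of it.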

Pause for each long gap. Quote the entry immediately before and the entry immediately after, identify the boundary pair type, and ask the user to classify it (real long-running tool, suspended/stalled session, or human break/sleep). See methodology.md for the pair-type taxonomy and why this Socratic pause is required, not optional.
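Listing the gaps for adjudication is mechanical; classifying them is not. The sketch below covers only the listing step. It assumes the events are time-sorted (timestamp, kind, excerpt) tuples. Every entry is labelled a candidate stall, and the tool-boundary case gets an extra note rather than a verdict.

```python
STALL = 30 * 60  # seconds


def candidate_stalls(events):
    """One dict per gap over 30 minutes, with both neighbours quoted verbatim."""
    out = []
    for (t0, k0, x0), (t1, k1, x1) in zip(events, events[1:]):
        gap = (t1 - t0).total_seconds()
        if gap <= STALL:
            continue
        item = {
            "start": t0.isoformat(), "end": t1.isoformat(),
            "delta_h": round(gap / 3600, 2),
            "boundary": f"{k0} -> {k1}",
            "before": x0, "after": x1,
            "label": "candidate stall, requires user adjudication",
        }
        if (k0, k1) == ("assistant(tool_use)", "user(tool_result)"):
            item["note"] = ("tool boundary: inspect the tool_result; a quick success "
                            "after hours suggests a suspended session, not a long tool")
        out.append(item)
    return out
```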

Save the analysis script to /tmp/recipe-<id>/timing.py and the report to /tmp/recipe-<id>/timing.md.

Phase 3 — Tool & prompt classification

Count tool calls by name, and classify human prompts into rough categories: strategic, review-trigger, design-decision, quality-challenge, operational, continuation. The classification is heuristic (keywords plus prompt length). Write the results as a frequency table to /tmp/recipe-<id>/classification.md; the user can override any classification.
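A keyword-plus-length heuristic of this kind might look like the following. The keyword lists and the 25-character continuation threshold are placeholders to tune per session. Rules are checked in order and the first match wins. Anything unmatched lands in an unclassified bucket for the user to adjudicate.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the order matters because the first match wins.
KEYWORDS = [
    ("review-trigger", ["review", "second opinion"]),
    ("quality-challenge", ["are you sure", "is that right", "how did you", "take me through"]),
    ("design-decision", ["should we", "instead of", "prefer", "interface", "api"]),
    ("strategic", ["plan", "roadmap", "goal", "overall"]),
    ("operational", ["commit", "push", "run", "install", "merge", "branch"]),
]
RULES = [(label, re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.I))
         for label, words in KEYWORDS]
SHORT = 25  # characters: unmatched prompts this short count as continuations


def classify_prompt(text):
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    if len(text.strip()) <= SHORT:
        return "continuation"
    return "unclassified"  # left for the user to adjudicate


def frequency_table(prompts):
    return Counter(classify_prompt(p) for p in prompts)
```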

Phase 4 — Narrative scaffold

Copy template.md to /tmp/recipe-<id>/draft.md and fill it with the computed numbers and lifted excerpts. Save the proposed card YAML separately to /tmp/recipe-<id>/card.yml.

Do not edit _recipes/ or _data/recipes.yml until Phase 6. Phase 4 is non-mutating: all output stays under /tmp/recipe-<id>/.

Sections (matching the jaxwavelets exemplar; see also the 5 May 2026 workshop critique that long-form recipes should aim for a 3–4 minute read, with forensic detail moved to the Methods note):

Abstract
Motivation (the embodied intelligence idea, or the project-specific intellectual hook)
N Principles (project-specific — let the user name them)
What Required Human Intervention
The Workflow
What Happened (compact session overview)
Reproducibility (with links — repo, PyPI, philosophy doc, transcript availability)
Limitations
Methods note (the timing / percentile detail moved out of the main flow)

The Principles section belongs to the author: produce a stub with prompts for the user to fill in, not invented content. The same applies to the abstract: produce a draft, flag it as a draft, and ask the user to challenge or rewrite it.

Phase 5 — External review (optional, opt-in, privacy-aware)

Before sending anything externally:

List the artefacts that could be sent: the draft markdown, the full JSONL, and a redacted JSONL.
Warn explicitly that JSONLs may contain private repo paths, emails, unpublished code, accidentally-pasted credentials, or supervision-private material.
Offer four modes:
- (a) draft only;
- (b) draft + redacted excerpts (auto-redact emails / token-shaped strings / absolute home paths);
- (c) draft + full JSONL after user confirms;
- (d) skip external review.
If mcp__llm__review is unavailable, fall back to a manual prompt the user can paste into ChatGPT/Gemini.
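Mode (b) needs an automatic redaction pass. The sketch below does it with three regexes; the 32-character token threshold and the macOS/Linux home-path pattern are assumptions. It errs toward over-redaction: UUIDs and long hyphenated directory names will be caught as token-shaped strings. It does not replace a human read of the excerpts before they leave the machine.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN = re.compile(r"\b[A-Za-z0-9_\-]{32,}\b")  # long key-like runs
HOME = re.compile(r"(/Users|/home)/[^/\s\"']+")  # macOS and Linux home dirs


def redact(text):
    """Replace emails, token-shaped strings and home-directory prefixes."""
    text = EMAIL.sub("<email>", text)
    text = TOKEN.sub("<token>", text)
    return HOME.sub("~", text)
```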

Default: draft only. Don't escalate without explicit user opt-in.

Phase 6 — Publish

Hand off to the website skill. Inputs from Phase 4: /tmp/recipe-<id>/draft.md and /tmp/recipe-<id>/card.yml.

The website skill is responsible for:

Running its scripted duplicate-ID check (yml + filename + frontmatter).
Creating the recipe/<id> feature branch.
Writing _recipes/<id>.md from the Phase 4 draft.
Appending the card to _data/recipes.yml from the Phase 4 card.yml using the full schema (id, title, category, timebox, status, proposer, summary, outcome, evidence, and the optional demo_url and repo_url).
Building, leak-checking, previewing at /cookbook/<id>/.
Committing and opening a PR.

Until Phase 6 is invoked, nothing under the website's tracked source has been mutated — the recipe skill alone never edits _recipes/ or _data/recipes.yml.

Edge cases

Sessions without timestamps (older Claude Code versions): fall back to wall-clock-only analysis with a banner. No 5-min cap is possible.
Compacted sessions: the first user message may be a context summary, not the original prompt. Detect <command-name> or Caveat: markers and warn.
Subagent JSONLs: these live under <session>/subagents/agent-*.jsonl and are excluded from the Phase 0 listing.
PII / credentials: see Phase 5. Default opt-out for JSONL escalation.
Non-handley-lab sessions: ask the user whether the recipe should still publish to handley-lab.github.io or be kept private/local.
Long-running tool execution mistaken for stall: methodology distinguishes tool_use→tool_result gaps from text→prompt gaps. Inspect tool input/output before classifying.
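The first two edge cases can be detected before any analysis runs. The sketch below assumes each JSONL line is an object with type, timestamp and message.content fields. The marker strings are the ones named above, and session_warnings is a hypothetical helper.

```python
import json

COMPACTION_MARKERS = ("<command-name>", "Caveat:")


def session_warnings(lines):
    """Return banner strings for the missing-timestamp and compaction cases."""
    records = []
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue
        if isinstance(rec, dict):
            records.append(rec)
    warnings = []
    if records and not any(r.get("timestamp") for r in records):
        warnings.append("No timestamps: wall-clock-only analysis; no 5-min cap possible.")
    first_user = next((r for r in records if r.get("type") == "user"), None)
    if first_user is not None:
        content = first_user.get("message", {}).get("content")
        text = content if isinstance(content, str) else json.dumps(content)
        if any(marker in text for marker in COMPACTION_MARKERS):
            warnings.append("First user message looks like a compaction or command "
                            "marker, not the original prompt.")
    return warnings
```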

Cautionary tale

The original jaxwavelets long-form went through several Socratic corrections before reaching the published numbers. From session 180860e2 (the analysis session that produced _recipes/jaxwavelets.md):

"When you say 46 hours — there were plenty of pauses there. Is there a way to determine how long it actually took?" — forced the introduction of the 5-min-cap method. Wall-clock 46h → active 4.7h.
"are you sure about the overnight thing? I remember waking up and sighing because all it had done was written a plan." — corrected an over-confident "productive overnight" narrative to a 5h22m stall + 18m burst.
"are you sure about 6.2h?" — pushed a tighter recomputation that landed on 4.7h.
"Take me through how you've come to that?" — required the methodology be explained, not just the result.

This is the discipline. All headline numbers are provisional until the user confirms them. See methodology.md for the full rules.