name: checkpoint-summary description: Summarize artifacts produced by liza agents for human checkpoint review mode: pairing
Purpose
After agents complete a planning or writing phase (epic planning, story writing, spec generation), summarize their output so a human can efficiently review what was decided, what remains open, and where their attention is needed.
This skill answers: "What did the agents produce, what did they decide, and what do I need to weigh in on?"
The agents already did the work — planning, reviewing, approving. This skill reads their outputs and distills them into a checkpoint summary that respects the human's time.
Distinct from spec-review: spec-review audits spec quality. This skill summarizes what was already reviewed and approved, surfacing only what needs human judgment.
Trigger
Use this skill when:
- Epic planning completes and the human needs a checkpoint summary
- Story writing completes across multiple agents
- Any multi-agent phase produces artifacts the human hasn't read
- User asks "what did the agents produce?", "what needs my attention?", or "summarize the plans"
- Orchestrator requests a human checkpoint (Liza mode)
Inputs
The single entry point is .liza/state.yaml — the source of truth for all Liza state.
From state.yaml, the skill reads:
goal.spec_ref: the upstream source document the agents worked fromtasks[]: each task with its scope, status, output capabilities, approvals, and history- Artifact refs (read all that exist, in priority order):
tasks[].plan_ref/tasks[].arch_ref— task-level planning and architecture artifactstasks[].output[].plan_ref/tasks[].output[].arch_ref— output-entry artifactstasks[].spec_ref— task-level spec (may differ fromgoal.spec_ref)
tasks[].approvals[]: review verdicts with provider/diversity context (canonical); fall back totasks[].approved_byifapprovals[]is absenttasks[].history[]: full event timeline (claimed, checkpoint, submitted, approved, merged)sprint.statusandsprint.checkpoint_trigger: why the checkpoint was triggered
No other discovery is needed. Read every artifact file referenced by the ref fields above.
Read the upstream source (goal.spec_ref). Everything else is in the state file itself.
Protocol
Phase 1: Inventory
Read
.liza/state.yamlto understand the full pipeline state: goal, tasks, agents, sprint status, and checkpoint trigger.Read the upstream source (
goal.spec_ref) to understand what the agents were working from — entities, decisions, constraints, interactions, scope boundaries.Read all produced artifacts referenced by task-level refs (
plan_ref,arch_ref) and output-entry refs (output[].plan_ref,output[].arch_ref). Skip entries where no ref field points to a file. For each artifact read:- What was produced (title, scope, capabilities/stories count)
- What verdict the reviewers gave (
tasks[].approvals[]; fall back toapproved_by) - Key events from
tasks[].history[](rejections, re-reviews, anomalies)
Phase 2: Extract
From the artifacts and agent outputs, extract three categories:
Decisions Made
Choices the agents made that the upstream source left open. For each:
- What was decided: the specific choice, in one sentence
- Where: which artifact and section (e.g., EP-002 CAP-003, or story-007 AC-2)
- Confidence: if the agent flagged it (HIGH/MEDIUM/LOW), report that; if they didn't flag it at all, mark it as unflagged
- Reversible?: can this be changed later, or does downstream work lock it in?
- Departures from upstream: decisions that contradict or reinterpret the upstream text get flagged separately — even if defensible, the human should know
Open Points
Items that remain unresolved after the agent work:
- Open questions: items agents explicitly flagged as needing human input (OQ-tagged)
- Gaps: things no artifact addresses that the upstream source expects
- Cross-artifact inconsistencies: places where sibling artifacts disagree
- Underspecified areas: concepts mentioned across multiple artifacts but never defined in any of them (e.g., "clusters and tags" referenced in filtering, listing, and CLI but never given a creation mechanism)
Risks
Implementation risks the artifacts create or carry forward:
- Reuse assumptions: references to existing components without compatibility assessment
- Atomicity/complexity: operations described as atomic that may be difficult to implement
- Handoff gaps: one artifact's output is another's input, but the interface isn't defined
- Irreversible operations: without undo or rollback being in scope
Phase 3: Prioritize
Not everything needs human attention. Classify each item:
| Priority | Meaning | Action needed |
|---|---|---|
| Decide | Human must make a choice before next phase starts | Present the decision with options |
| Confirm | Agents made a reasonable choice — human should validate | Present the decision, default is accept |
| Note | Worth knowing, no action needed | Include in summary, don't interrupt for it |
Prioritization heuristics:
- Unflagged decisions that depart from upstream text → Decide
- LOW-confidence assumptions → Decide
- MEDIUM-confidence assumptions on irreversible operations → Decide
- MEDIUM-confidence assumptions on reversible operations → Confirm
- HIGH-confidence assumptions → Note (unless they depart from upstream)
- Gaps that block the next phase → Decide
- Gaps that affect only implementation details → Note
- Risks with no mitigation path → Confirm
- Risks with natural mitigation during implementation → Note
Phase 4: Report
Present in this format. Decide items first, then Confirm, then Notes.
# Checkpoint Summary: [Phase Name]
## Status
| Artifact | Scope | Verdict |
|----------|-------|---------|
| [name] | [one-line scope] | [Approved / Rejected / Conditional] |
> N decisions needing input · M items to confirm · K notes
## Decisions Needing Human Input
Items where the human must choose before the next phase starts.
### [Decision Title]
- **Context:** [What the upstream says or doesn't say]
- **Agent decision:** [What the agents chose]
- **Why it needs you:** [What makes this non-obvious — departure from upstream,
irreversible, low confidence, or gap]
- **Options:** [If applicable — confirm agent choice, override with X, or defer]
## Decisions to Confirm
Agents made reasonable choices. Confirm or override.
### [Decision Title]
- **Agent decision:** [What was chosen]
- **Rationale:** [Why it's reasonable]
- **Override if:** [When the human might want something different]
## Open Points
Unresolved items carried forward.
### [Item Title]
- **What's open:** [Description]
- **Impact:** [What's affected if unresolved]
- **Where it surfaces:** [Which artifacts reference this]
## Risks
### [Risk Title]
- **What could go wrong:** [Description]
- **Which artifacts:** [Where this risk lives]
- **Mitigation:** [If any exists in the plans, or "none specified"]
## Notes
Brief items worth knowing but not requiring action.
- [Item]: [One-line description]
Constraints
- DO NOT modify source artifacts — only the generated checkpoint summary report may be written
- DO NOT re-review or second-guess the reviewer's verdict — summarize it
- DO NOT bury decisions in long prose — one item, one heading, one clear question
- DO surface unflagged decisions — the highest-value findings are choices agents made without marking them as assumptions
- DO state the specific question the human needs to answer, not just "this needs review"
- DO keep the summary scannable — a human should grasp the state in under 2 minutes
- DO report the total count of decisions/open points/risks in the Status section so the human can gauge the review effort before diving in
Anti-Patterns
- Re-reviewing approved work: The agents already reviewed. Don't re-evaluate the architecture, question design choices that were explicitly approved, or propose alternatives. Summarize what was decided.
- Exhaustive listing: Dumping every assumption from every artifact. The human doesn't need to see 40 HIGH-confidence assumptions that are obviously correct. Filter to what matters.
- Missing the silent decisions: Focusing on what agents flagged (assumptions, open questions) and ignoring what they didn't. The most important items are often decisions baked into the plan without being called out.
- Vague attention items: "The CLI design needs review" is not actionable. "The CLI plan doesn't expose project deletion, but the API supports it — should users be able to delete projects in v1?" is actionable.
- Severity inflation: Marking everything as "Decide". Most items are "Confirm" or "Note". Reserve "Decide" for items where the human genuinely has a choice to make that changes the downstream work.
Integration
| Skill | Relationship |
|---|---|
| epic-writing | Upstream producer. Summarize epic plans at the planning checkpoint. |
| user-story-writing | Upstream producer. Summarize stories at the story-writing checkpoint. |
| spec-review | Complementary. spec-review finds spec defects; this summarizes agent decisions. Different purposes, can run on the same artifacts. |
Mode-Specific Behavior
Pairing mode: Present the summary interactively. Walk through "Decide" items one at a time, collecting the human's decision before moving to the next. For "Confirm" items, present as a batch — the human can scan and override selectively. End with a count of decisions made and items still open.
Liza mode: Checkpoint Summary operates autonomously within task scope. Write the
report to the worktree (e.g. docs/checkpoint-summary.md) and submit for review.
If any "Decide" items exist, mark BLOCKED with blocked_reason summarizing the
decisions needed and blocked_questions listing each one — the human must resolve
them before the next phase starts. If only "Confirm" and "Note" items exist, submit
normally.
| Pairing Prompt | Liza Behavior |
|---|---|
| "N decisions need your input — walk through them?" | Mark BLOCKED; report in worktree |
| "Agents made N decisions — all look reasonable. Confirm?" | Submit for review; report in worktree |
| "No open points — ready for next phase" | Submit for review; report in worktree |