---
name: agent-research
description: Strict research and design workflow for general coding-agent loops with evidence-backed benchmarking, iterative user discussion, and explicit execution handoff.
version: 1.1.0
author: Pulse Coder Team
---
Agent Research & Design Skill (Strict, English)
Goal
This skill is for systematically researching and designing a general Agent Loop / Coding Agent capability model, ensuring:
- Atomic capabilities are clearly defined and implementable.
- Benchmarking against mainstream implementations (pi-mono / OpenCode / Claude Code) is evidence-backed.
- The proposed design is discussion-ready and iteration-friendly.
- No implementation starts before explicit user confirmation.
Non-Negotiable Rules
Research first, design second, execution last.
Do not move into implementation before research and design have converged.
All key claims must be source-traceable.
Every important claim must include URL citations; otherwise, mark it as an assumption or speculation.
Separate facts from interpretation.
Explicitly label output as Fact, Inference, or Proposal.
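The citation and labeling rules above can be checked mechanically before output is published. A minimal sketch, assuming each claim is held as a small record with a label and its citation URLs; `Claim`, `Label`, and `enforce_citation` are hypothetical names, and only uncited Facts are downgraded here:

```python
from dataclasses import dataclass, replace
from enum import Enum


class Label(Enum):
    FACT = "Fact"
    INFERENCE = "Inference"
    PROPOSAL = "Proposal"
    ASSUMPTION = "Assumption"  # an important claim that lacks a citation


@dataclass(frozen=True)
class Claim:
    text: str
    label: Label
    urls: tuple[str, ...] = ()  # citation URLs backing the claim


def enforce_citation(claim: Claim) -> Claim:
    """Downgrade an uncited Fact to an Assumption; other claims pass through unchanged."""
    if claim.label is Label.FACT and not claim.urls:
        return replace(claim, label=Label.ASSUMPTION)
    return claim


# Hypothetical claims; the URL is a placeholder, not a real source.
claims = [
    Claim("System A exposes a plugin API", Label.FACT, ("https://example.com/docs",)),
    Claim("System B compacts context automatically", Label.FACT),
    Claim("Adopt a plan/act split for the MVP", Label.PROPOSAL),
]
checked = [enforce_citation(c) for c in claims]
```

Returning a new record instead of mutating keeps the original claim available for the evidence ledger.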
Minimum research depth is mandatory.
- At least 8 rounds of search, at most 12 rounds.
- At least 12 unique source URLs.
- Must cover 3 required targets: pi-mono, OpenCode, Claude Code.
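The minimum-depth rule above reduces to three countable checks. A minimal sketch, assuming research rounds are logged as records; `ResearchRound` and `depth_violations` are hypothetical names, and target coverage is tracked by exact string match:

```python
from dataclasses import dataclass

MANDATORY_TARGETS = frozenset({"pi-mono", "OpenCode", "Claude Code"})


@dataclass(frozen=True)
class ResearchRound:
    objective: str
    queries: tuple[str, ...]
    source_urls: tuple[str, ...]
    targets_covered: frozenset[str]


def depth_violations(rounds: list[ResearchRound]) -> list[str]:
    """Return every way a research log falls short of the minimum depth rule."""
    problems = []
    if not 8 <= len(rounds) <= 12:
        problems.append(f"expected 8-12 rounds, got {len(rounds)}")
    unique_urls = {url for r in rounds for url in r.source_urls}
    if len(unique_urls) < 12:
        problems.append(f"expected at least 12 unique URLs, got {len(unique_urls)}")
    covered = set().union(*(r.targets_covered for r in rounds))
    missing = sorted(MANDATORY_TARGETS - covered)
    if missing:
        problems.append("missing mandatory targets: " + ", ".join(missing))
    return problems


# Eight placeholder rounds, two URLs each, that never touch pi-mono.
rounds = [
    ResearchRound(
        objective=f"round {i}",
        queries=(f"query {i}",),
        source_urls=(f"https://example.com/{i}/a", f"https://example.com/{i}/b"),
        targets_covered=frozenset({"OpenCode"} if i < 4 else {"Claude Code"}),
    )
    for i in range(8)
]
print(depth_violations(rounds))  # ['missing mandatory targets: pi-mono']
```

Reporting all violations at once, rather than failing on the first, tells the agent exactly which additional rounds to run.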
Atomic capabilities must follow a standard schema.
Each capability must include: definition, input, output, trigger, failure mode, recovery strategy, and success metrics.
No execution without an explicit user Yes.
Always ask for final execution confirmation before taking any implementation action.
Required Execution Flow
Phase 0: Scope Alignment
Before research starts, clarify:
- Target context (CLI agent / IDE agent / server agent)
- Constraints (time, budget, model limits, tool permissions)
- Deliverable depth (concept paper / technical design / executable task plan)
If any of this information is missing, ask questions before starting research.
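Scope alignment can be treated as filling in three fields, with one clarifying question per unanswered field. A minimal sketch under that assumption; `Scope`, `QUESTIONS`, and `open_questions` are hypothetical, and the question wording is illustrative:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TargetContext(Enum):
    CLI = "CLI agent"
    IDE = "IDE agent"
    SERVER = "server agent"


class DeliverableDepth(Enum):
    CONCEPT_PAPER = "concept paper"
    TECHNICAL_DESIGN = "technical design"
    TASK_PLAN = "executable task plan"


@dataclass
class Scope:
    context: Optional[TargetContext] = None
    constraints: Optional[dict[str, str]] = None  # e.g. {"time": "2 weeks", "tools": "read-only"}
    depth: Optional[DeliverableDepth] = None


# One clarifying question per unanswered scope field.
QUESTIONS = {
    "context": "Is the target a CLI agent, an IDE agent, or a server agent?",
    "constraints": "What are the time, budget, model, and tool-permission constraints?",
    "depth": "Should the deliverable be a concept paper, a technical design, or an executable task plan?",
}


def open_questions(scope: Scope) -> list[str]:
    """Questions to ask before research starts; an empty list means scope is aligned."""
    return [q for field_name, q in QUESTIONS.items() if getattr(scope, field_name) is None]


questions = open_questions(Scope(context=TargetContext.CLI))
```

Here `questions` holds the constraints and deliverable-depth questions, so research would not start yet.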
Phase 1: Atomic Capability Design
Draft the capability map first, including at least:
- Task Understanding
- Planning
- Tool Use
- Context Management
- Code Execution Loop
- Reflection / Critique
- Safety & Guardrails
- Memory
- Human-in-the-loop
- Evaluation & Telemetry
For each capability, provide:
- Name
- Definition
- Input
- Output
- Trigger
- Tool/Context dependencies
- Failure mode
- Recovery strategy
- Success metrics (prefer measurable metrics)
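The per-capability fields and the required capability map above can be encoded as one record type plus two completeness checks. A minimal sketch; `CapabilitySpec`, `incomplete_fields`, and `missing_capabilities` are hypothetical names, and the Planning entry is an illustrative draft, not a recommended definition:

```python
from dataclasses import dataclass, fields

REQUIRED_CAPABILITIES = (
    "Task Understanding", "Planning", "Tool Use", "Context Management",
    "Code Execution Loop", "Reflection / Critique", "Safety & Guardrails",
    "Memory", "Human-in-the-loop", "Evaluation & Telemetry",
)


@dataclass(frozen=True)
class CapabilitySpec:
    name: str
    definition: str
    input: str
    output: str
    trigger: str
    dependencies: str  # tool/context dependencies
    failure_mode: str
    recovery: str
    success_metrics: str  # prefer measurable metrics


def incomplete_fields(spec: CapabilitySpec) -> list[str]:
    """Names of schema fields left empty or whitespace-only."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


def missing_capabilities(specs: list[CapabilitySpec]) -> list[str]:
    """Required capabilities that have no spec in the map yet."""
    present = {s.name for s in specs}
    return [c for c in REQUIRED_CAPABILITIES if c not in present]


planning = CapabilitySpec(
    name="Planning",
    definition="Turn a task into ordered, checkable steps",
    input="Task statement and repository context",
    output="Step list with completion checks",
    trigger="New task, or a step that failed verification",
    dependencies="Context Management",
    failure_mode="Plan drifts from the actual codebase",
    recovery="Re-plan from the last verified step",
    success_metrics="",  # still to be defined
)
```

`incomplete_fields(planning)` flags only `success_metrics`, and `missing_capabilities([planning])` lists the other nine required capabilities.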
Phase 2: External Benchmarking
Mandatory Targets
- pi-mono
- OpenCode
- Claude Code
Optional Targets
- Cursor Agent
- Aider
- Copilot CLI
Research Constraints
- 8-12 progressive search rounds (overview → architecture → loop mechanics → trade-offs → validation)
- For each round, report: objective, queries, new findings, open questions
- At least 12 unique URLs
- Source priority: official docs > source repo/code > technical blogs > secondary summaries
- Cross-validate key claims with at least 2 sources (otherwise mark as low confidence)
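The source-priority and cross-validation constraints above can be combined into a per-claim confidence rating. A minimal sketch: the "Low" rule comes directly from the constraint, while the High/Med split is an assumption made for illustration; `SOURCE_PRIORITY` and `confidence` are hypothetical names and the URLs are placeholders:

```python
# Lower rank means a stronger source, following the priority order above.
SOURCE_PRIORITY = {"official_docs": 0, "source_repo": 1, "tech_blog": 2, "secondary": 3}


def confidence(sources: list[tuple[str, str]]) -> str:
    """Rate a key claim from its (url, source_kind) pairs.

    Fewer than two distinct URLs gives "Low", per the cross-validation rule.
    The High/Med split is an assumption: "High" if any source is official docs
    or source code, otherwise "Med".
    """
    if len({url for url, _ in sources}) < 2:
        return "Low"
    best = min(SOURCE_PRIORITY[kind] for _, kind in sources)
    return "High" if best <= SOURCE_PRIORITY["source_repo"] else "Med"


single = confidence([("https://a.example/docs", "official_docs")])
mixed = confidence([("https://a.example/docs", "official_docs"), ("https://b.example/post", "tech_blog")])
weak = confidence([("https://b.example/post", "tech_blog"), ("https://c.example/summary", "secondary")])
```

Here `single` is "Low", `mixed` is "High", and `weak` is "Med"; citing the same URL twice still counts as one source.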
Phase 3: Synthesis & Proposal
Output must include:
- Atomic capability catalog (standardized definitions)
- Benchmark matrix (implementation comparison)
- Recommended architecture (MVP / v1 / v2)
- Core trade-offs (complexity, cost, robustness, extensibility)
- Risk register with mitigation strategies
- Open decisions list (5-10 items)
Also explicitly mark:
- what is fact,
- what is inference,
- what is proposal.
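Before moving to discussion, a draft proposal can be checked against the required outputs above. A minimal sketch, assuming the draft is held as a mapping from section title to entries and that architecture entries start with their tier name (for example `"MVP: ..."`); `synthesis_gaps` and the draft contents are hypothetical:

```python
REQUIRED_SECTIONS = (
    "Atomic capability catalog",
    "Benchmark matrix",
    "Recommended architecture",
    "Core trade-offs",
    "Risk register",
    "Open decisions",
)
ARCHITECTURE_TIERS = ("MVP", "v1", "v2")


def synthesis_gaps(doc: dict[str, list[str]]) -> list[str]:
    """List what a Phase 3 proposal still lacks (doc maps section title -> entries)."""
    gaps = [f"missing section: {s}" for s in REQUIRED_SECTIONS if not doc.get(s)]
    # Tier name is whatever precedes the first colon in each architecture entry.
    tiers = {entry.split(":", 1)[0].strip() for entry in doc.get("Recommended architecture", [])}
    gaps += [f"missing architecture tier: {t}" for t in ARCHITECTURE_TIERS if t not in tiers]
    decisions = doc.get("Open decisions", [])
    if decisions and not 5 <= len(decisions) <= 10:
        gaps.append(f"expected 5-10 open decisions, got {len(decisions)}")
    return gaps


draft = {
    "Atomic capability catalog": ["Planning", "Tool Use"],
    "Benchmark matrix": ["pi-mono row", "OpenCode row", "Claude Code row"],
    "Recommended architecture": ["MVP: single agent loop", "v1: add persistent memory"],
    "Core trade-offs": ["complexity vs robustness"],
    "Risk register": ["runaway tool calls, mitigated by a step budget"],
    "Open decisions": ["sandbox choice", "memory store", "eval harness"],
}
gaps = synthesis_gaps(draft)
```

For this draft, `gaps` reports the missing v2 tier and that only three open decisions are listed.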
Phase 4: Discussion Loop with User
In each discussion iteration:
- Provide the current recommendation plus 3-5 key questions.
- Maintain three status buckets:
  - Confirmed
  - Pending
  - Changed
- Publish a decision snapshot at the end of each iteration.
Move to final confirmation when two or fewer items remain Pending, or when the user says the design has converged.
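The status buckets and the exit condition above fit in a small log object. A minimal sketch; `DecisionLog` and `ready_for_final_confirmation` are hypothetical names, and sending a changed decision back to Pending is an assumed policy, not one the workflow prescribes:

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    confirmed: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)

    def confirm(self, item: str) -> None:
        """Move a pending item to Confirmed (raises ValueError if it was not pending)."""
        self.pending.remove(item)
        self.confirmed.append(item)

    def change(self, old: str, new: str) -> None:
        """Record a revised decision; the revision goes back to Pending (assumed policy)."""
        for bucket in (self.confirmed, self.pending):
            if old in bucket:
                bucket.remove(old)
        self.changed.append(f"{old} -> {new}")
        self.pending.append(new)

    def snapshot(self) -> dict[str, list[str]]:
        """Copy of all three buckets, published at the end of each iteration."""
        return {
            "Confirmed": list(self.confirmed),
            "Pending": list(self.pending),
            "Changed": list(self.changed),
        }


def ready_for_final_confirmation(log: DecisionLog, user_says_converged: bool = False) -> bool:
    return user_says_converged or len(log.pending) <= 2


# Hypothetical decisions for one design discussion.
log = DecisionLog(pending=["loop pattern", "memory store", "sandbox", "eval harness"])
log.confirm("loop pattern")
log.change("memory store", "memory store: SQLite")
before = ready_for_final_confirmation(log)  # three items still pending
log.confirm("sandbox")
after = ready_for_final_confirmation(log)  # two items pending
```

Returning copies from `snapshot()` keeps each published snapshot stable even as later iterations mutate the log.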
Phase 5: Execution Confirmation
Use an explicit confirmation question:
“The design is now largely converged. Do you want me to enter execution mode? Options:
- detailed technical design,
- task breakdown & timeline,
- code skeleton,
- direct implementation.”
- If the user confirms, re-confirm the scope boundaries, then execute.
- If the user declines, deliver only the finalized design artifacts and next-step recommendations.
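The confirmation step above acts as a gate: only an explicit Yes, a chosen option, and re-confirmed scope together unlock execution. A minimal sketch; `ExecutionOption` and `next_step` are hypothetical, and treating ambiguous replies as a reason to ask again is an assumption consistent with the "explicit Yes" rule:

```python
from enum import Enum
from typing import Optional


class ExecutionOption(Enum):
    TECHNICAL_DESIGN = "detailed technical design"
    TASK_BREAKDOWN = "task breakdown & timeline"
    CODE_SKELETON = "code skeleton"
    IMPLEMENTATION = "direct implementation"


def next_step(reply: str, option: Optional[ExecutionOption], scope_reconfirmed: bool) -> str:
    """Decide what the agent may do after asking the execution confirmation question."""
    answer = reply.strip().lower()
    if answer == "no":
        return "deliver finalized design artifacts and next-step recommendations"
    if answer != "yes":
        # Only an explicit Yes unlocks execution; anything ambiguous is asked again.
        return "ask again for an explicit yes or no"
    if option is None:
        return "ask which execution option to use"
    if not scope_reconfirmed:
        return "re-confirm scope boundaries"
    return f"execute: {option.value}"
```

For example, `next_step("Yes", ExecutionOption.CODE_SKELETON, False)` asks to re-confirm scope first, while a reply like "sure, go ahead" is not treated as consent.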
Required Output Artifacts
A. Capability Spec Table
| Capability | Definition | Input | Output | Trigger | Failure Mode | Recovery | KPI |
|---|---|---|---|---|---|---|---|
B. Benchmark Matrix
| System | Loop Pattern | Tool Strategy | Context Strategy | Safety Mechanism | Strengths | Limitations |
|---|---|---|---|---|---|---|
C. Evidence Ledger
| Claim | Type (Fact/Inference) | Source URL | Confidence (High/Med/Low) | Notes |
|---|---|---|---|---|
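The artifact tables above only render correctly in Markdown when the delimiter row has one cell per column. A minimal sketch of a renderer that guarantees this and escapes pipes inside cells; `markdown_table` is a hypothetical helper and the ledger row is a placeholder:

```python
def markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render a Markdown pipe table with one delimiter cell per column."""
    def cell(value: str) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the row.
        return value.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "|" + "---|" * len(headers),
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}")
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


LEDGER_HEADERS = ["Claim", "Type (Fact/Inference)", "Source URL", "Confidence (High/Med/Low)", "Notes"]
table = markdown_table(
    LEDGER_HEADERS,
    [["Example claim", "Fact", "https://example.com/docs", "Low", "single source"]],
)
```

Rejecting rows with the wrong cell count catches ledger entries that would otherwise silently shift columns.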
D. Roadmap
- MVP (2-4 weeks)
- v1 (1-2 months)
- v2 (continuous optimization)
E. Decision Log
- Confirmed
- Pending
- Changed
Completion Criteria
This skill is complete only when all items are satisfied:
- 8+ research rounds completed
- 12+ unique sources provided
- Atomic capabilities standardized
- Benchmark matrix + evidence ledger delivered
- At least one user discussion iteration completed
- Explicit execution confirmation asked and recorded
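The completion criteria above can be evaluated as a checklist over a run summary. A minimal sketch, assuming the agent tracks these counters and flags as it works; `SkillRun` and `unmet_criteria` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRun:
    research_rounds: int
    unique_sources: int
    capabilities_standardized: bool
    benchmark_matrix_delivered: bool
    evidence_ledger_delivered: bool
    discussion_iterations: int
    execution_confirmation_recorded: bool


def unmet_criteria(run: SkillRun) -> list[str]:
    """Criteria not yet satisfied; the skill is complete only when this is empty."""
    checks = [
        (run.research_rounds >= 8, "8+ research rounds completed"),
        (run.unique_sources >= 12, "12+ unique sources provided"),
        (run.capabilities_standardized, "Atomic capabilities standardized"),
        (run.benchmark_matrix_delivered and run.evidence_ledger_delivered,
         "Benchmark matrix + evidence ledger delivered"),
        (run.discussion_iterations >= 1, "At least one user discussion iteration completed"),
        (run.execution_confirmation_recorded, "Explicit execution confirmation asked and recorded"),
    ]
    return [label for ok, label in checks if not ok]


run = SkillRun(
    research_rounds=9,
    unique_sources=14,
    capabilities_standardized=True,
    benchmark_matrix_delivered=True,
    evidence_ledger_delivered=False,
    discussion_iterations=1,
    execution_confirmation_recorded=False,
)
remaining = unmet_criteria(run)
```

For this run, `remaining` lists the missing evidence ledger and the unrecorded execution confirmation.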