ai-agent-design

name: ai-agent-design description: Design autonomous AI agents that reason, plan, and execute tasks tier: standard applyTo: '**/agent,**/ai-agent,**/orchestrat,**/multi-agent'

AI Agent Design Skill

Patterns for designing AI agents—autonomous systems that use LLMs to reason, plan, and execute multi-step tasks.

Agent vs Chatbot vs Workflow

Aspect	Chatbot	Workflow	Agent
Autonomy	Low	None	High
Planning	None	Predefined	Dynamic
Tool Use	Limited	Fixed	Flexible
Memory	Session	None	Persistent
Error Recovery	Retry	Fail	Reason & adapt

Core Patterns

ReAct (Reasoning + Acting)

1. Thought: Reason about the task
2. Action: Choose and execute a tool
3. Observation: Process tool output
4. Repeat until complete

Example:

Thought: Need Seattle weather to answer umbrella question
Action: weather_api(location="Seattle")
Observation: {"temp": 52, "condition": "rain", "precipitation": 80%}
Thought: Raining with 80% precipitation. Recommend umbrella.

Plan-and-Execute

For complex multi-step tasks:

Planner: Create high-level plan
Executor: Execute each step
Replanner: Adjust based on results

Use when order matters and partial failures need recovery.

Reflexion

Self-improvement through reflection:

Attempt task
Evaluate outcome
Generate reflection on failures
Store reflection in memory
Retry with reflection context

Multi-Agent Patterns

Supervisor

Central coordinator delegates to specialists:

       Supervisor
      /    |    \
Research Writer Reviewer

Hierarchical Teams

Nested supervisors for complex organizations:

      Top Supervisor
       /         \
Research Lead  Writing Lead
   /    \         /    \
Web   Paper   Draft   Edit

Debate/Adversarial

Multiple agents argue to reduce hallucination:

Agent A (Pro) <--argue--> Agent B (Con)
              \    |    /
               Judge

Tool Design

{
  "name": "search_database",
  "description": "Search products. Use for availability/pricing queries.",
  "parameters": {
    "query": { "type": "string", "description": "Search terms" },
    "max_results": { "type": "integer", "default": 10 }
  }
}

Principles:

Clear names (verb + noun)
Rich descriptions with when/what
Sensible defaults
Structured error returns

Tool Selection by Scale

Tools	Strategy
< 10	Direct selection
10-50	Categorize first
50+	Embed and retrieve

Memory Architecture

Working Memory    → Current context (in prompt)
Short-Term Memory → Session state (key-value)
Long-Term Memory  → Facts, history (vector DB + graph)

Memory Types

Type	Storage	Use Case
Episodic	Vector DB	Past conversations
Semantic	Graph DB	Facts, relationships
Procedural	Code/prompts	How to do tasks
Working	Prompt	Current task

Memory Management

Summarization: Compress old conversations
Forgetting: Score by recency × importance × access
Consolidation: Merge similar memories

Error Recovery Ladder

Retry: Same action with backoff
Rephrase: Different query, same goal
Alternative: Different tool, same goal
Partial: Return partial results
Escalate: Ask human
Abort: Cannot complete, explain why

Loop Detection

def detect_loop(history, window=5, threshold=0.8):
    recent = history[-window:]
    previous = history[-window*2:-window]
    return similarity(recent, previous) > threshold

Recovery: reflection prompt, force tool change, replan, escalate.

Human-in-the-Loop

Require approval for high-risk actions:

Financial transactions
Data deletion
External communications
Permission changes
Irreversible operations

Production Considerations

Observability

Log: LLM calls, tool calls, state transitions, errors, recovery attempts.

Cost Control

Strategy	Implementation
Token budgets	Max tokens per task
Step limits	Max N actions
Tiered models	GPT-4 plan, 3.5 execute
Caching	Cache tool/LLM results
Early termination	Stop when good enough

Safety Guardrails

Input: Injection detection, PII filtering, rate limiting
Action: Parameter sanitization, permission checks
Output: Policy compliance, hallucination detection

Framework Comparison

Framework	Best For
LangChain	Rapid prototyping
LangGraph	Complex multi-agent
AutoGen	Research, code gen
CrewAI	Business workflows
Semantic Kernel	Microsoft stack

Anti-Patterns

Over-autonomous: No approval checkpoints
Unbounded loops: No termination conditions
Tool explosion: Too many tools confuse agent
Memory bloat: No pruning strategy
Monolithic: One agent does everything

Checklist

Clear agent persona and capabilities
Minimal, well-described tool set
Appropriate memory architecture
Human-in-the-loop for high-risk
Observability (logging, tracing)
Safety guardrails
Adversarial input testing
Cost control and scaling plan

When to Use

✅ Good: Open-ended research, multi-step workflows, tool orchestration ❌ Poor: Simple Q&A (use RAG), deterministic flows (use code), no human oversight

name: ai-agent-design description: Design autonomous AI agents that reason, plan, and execute tasks tier: standard applyTo: '/agent,/ai-agent,/orchestrat,/multi-agent'