reflexion - SKILL.md Agent Skill

name: reflexion
description: Implement reflexion loops for self-critique, learning from failures, and continuous agent improvement
license: MIT
compatibility: opencode
metadata:
  audience: ai-researchers
  workflow: learning

What I Do

I am the Reflexion System, a self-improvement mechanism inspired by the Reflexion paper (Shinn et al.). I enable agents to learn from their failures by reflecting on each failed attempt and carrying the resulting lessons into the next one.

Core Responsibilities

Failure Capture
- Record task attempts
- Capture error messages
- Document approach taken
- Store test results
- Track performance metrics

Reflection Generation
- Root cause analysis
- Identify incorrect assumptions
- Propose alternative approaches
- Extract generalizable learnings
- Generate improved strategy

Memory Storage
- Link reflections to tasks
- Store in episodic memory
- Extract patterns for reuse
- Update agent knowledge
- Share learnings across agents

Retry with Knowledge
- Apply lessons learned
- Use improved strategy
- Add validation steps
- Monitor progress carefully
- Measure improvement

When to Use Me

Use me when:

An agent fails a task
Tests don't pass
Security issues found
Performance targets not met
Code review rejected
Any failure occurs

Reflexion Pattern

When Triggered

Test failures
Code review rejections
Security vulnerabilities found
Performance targets not met
User acceptance criteria not met
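
The trigger list above could be checked mechanically before starting a reflexion cycle. The sketch below is one possible shape for that check. The `Trigger` enum, `detect_triggers`, and every key of the `outcome` dict are hypothetical names chosen for illustration; they are not part of this skill's interface.

```python
from enum import Enum


class Trigger(Enum):
    """Failure categories taken from the trigger list above."""
    TEST_FAILURE = "test_failure"
    REVIEW_REJECTION = "code_review_rejection"
    SECURITY_VULNERABILITY = "security_vulnerability"
    PERFORMANCE_MISS = "performance_target_not_met"
    ACCEPTANCE_MISS = "acceptance_criteria_not_met"


def detect_triggers(outcome: dict) -> list[Trigger]:
    """Return every trigger raised by a (hypothetical) task-outcome dict.

    Missing keys are treated as "no evidence of failure", so an empty
    outcome raises no triggers.
    """
    checks = [
        (Trigger.TEST_FAILURE, outcome.get("tests_failed", 0) > 0),
        (Trigger.REVIEW_REJECTION, outcome.get("review_status") == "rejected"),
        (Trigger.SECURITY_VULNERABILITY, bool(outcome.get("vulnerabilities"))),
        (Trigger.PERFORMANCE_MISS, outcome.get("meets_performance_target") is False),
        (Trigger.ACCEPTANCE_MISS, outcome.get("meets_acceptance_criteria") is False),
    ]
    return [trigger for trigger, fired in checks if fired]
```

A reflexion cycle would start only when `detect_triggers` returns a non-empty list.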

Process

1. Capture Failure:

information_gathered:
  - Original task description
  - Agent's approach
  - Code/output produced
  - Test results
  - Error messages
  - Stack traces
  - Performance metrics
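
As a rough illustration, the gathered information could be held in a single record. The sketch below assumes a `FailureRecord` dataclass and a `capture_failure` helper; both names are hypothetical. The field names mirror the placeholders used in the reflection prompt, and the stack trace comes from the standard-library `traceback` module.

```python
import traceback
from dataclasses import dataclass, field


@dataclass
class FailureRecord:
    """One failed attempt, holding the items listed under information_gathered."""
    task_description: str
    approach_taken: str
    code: str
    test_results: str = ""
    errors: list[str] = field(default_factory=list)
    stack_traces: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)
    attempt_number: int = 1


def capture_failure(task_description: str, approach_taken: str, code: str,
                    exc: BaseException, attempt_number: int = 1) -> FailureRecord:
    """Build a FailureRecord from an exception raised during an attempt."""
    trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return FailureRecord(
        task_description=task_description,
        approach_taken=approach_taken,
        code=code,
        errors=[f"{type(exc).__name__}: {exc}"],
        stack_traces=[trace],
        attempt_number=attempt_number,
    )
```

Test results and performance metrics are left at their defaults here; a real harness would fill them in from its own test runner and profiler.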

2. Generate Reflection:

LLM Prompt:

You attempted to complete this task:
{task_description}

Your approach was:
{approach_taken}

The code you wrote:
{code}

Test results:
{test_results}

Errors encountered:
{errors}

Performance metrics:
{metrics}

This was attempt #{attempt_number}.

Provide a detailed reflection:

1. ROOT CAUSE ANALYSIS
- What exactly went wrong?
- Why did it happen?
- What was the fundamental error in reasoning?

2. INCORRECT ASSUMPTIONS
- What did you assume that was wrong?
- What did you overlook?
- What edge cases did you miss?

3. ALTERNATIVE APPROACHES
- What should you try differently?
- What patterns or techniques would work better?
- What additional validation is needed?

4. GENERALIZABLE LEARNINGS
- What lesson applies to similar tasks?
- What pattern should you remember?
- What should you check for next time?

Be specific and actionable. Focus on what to change, not just what went wrong.
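
A minimal sketch of how the placeholders in this prompt might be filled in, assuming the template is kept as a Python string and rendered with `str.format`. `REQUIRED_FIELDS`, `TEMPLATE_EXCERPT`, and `render_reflection_prompt` are hypothetical names. The excerpt is shortened for the example and stands in for the full prompt above.

```python
# Placeholder names used by the reflection prompt above.
REQUIRED_FIELDS = (
    "task_description", "approach_taken", "code",
    "test_results", "errors", "metrics", "attempt_number",
)

# Shortened stand-in for the full prompt text.
TEMPLATE_EXCERPT = (
    "You attempted to complete this task:\n"
    "{task_description}\n\n"
    "The code you wrote:\n"
    "{code}\n\n"
    "This was attempt #{attempt_number}."
)


def render_reflection_prompt(template: str, **fields: object) -> str:
    """Fill the template, failing loudly if any expected field is missing."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing prompt fields: {', '.join(missing)}")
    # str.format parses only the template, so braces inside field values
    # (for example in the submitted code) are inserted verbatim.
    return template.format(**fields)
```

Checking for missing fields before rendering gives a clearer error than the `KeyError` that `str.format` would raise on its own.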

Output Structure:

root_cause:
  technical: str
  reasoning: str

incorrect_assumptions:
  - assumption: str
    why_wrong: str
    correct_approach: str

improved_strategy:
  approach: str
  implementation_steps: [str]
  validation_plan: str

lessons_learned:
  - lesson: str
    applicability: str
    pattern_name: str
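
The output structure above maps naturally onto typed records. The sketch below assumes the LLM response has already been decoded into a Python dict (for example from YAML or JSON). The dataclasses mirror the fields above, while `parse_reflection` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class RootCause:
    technical: str
    reasoning: str


@dataclass
class IncorrectAssumption:
    assumption: str
    why_wrong: str
    correct_approach: str


@dataclass
class ImprovedStrategy:
    approach: str
    implementation_steps: list[str]
    validation_plan: str


@dataclass
class Lesson:
    lesson: str
    applicability: str
    pattern_name: str


@dataclass
class Reflection:
    root_cause: RootCause
    incorrect_assumptions: list[IncorrectAssumption]
    improved_strategy: ImprovedStrategy
    lessons_learned: list[Lesson]


def parse_reflection(data: dict) -> Reflection:
    """Build a Reflection from a decoded mapping.

    Raises KeyError for a missing top-level section and TypeError for a
    missing or unexpected field inside a section.
    """
    return Reflection(
        root_cause=RootCause(**data["root_cause"]),
        incorrect_assumptions=[IncorrectAssumption(**a) for a in data["incorrect_assumptions"]],
        improved_strategy=ImprovedStrategy(**data["improved_strategy"]),
        lessons_learned=[Lesson(**item) for item in data["lessons_learned"]],
    )
```

Rejecting malformed reflections at this point keeps vague or incomplete output from reaching memory.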

3. Store Reflection:

Episodic Memory:

Link to original task
Full reflection
Attempt number
Timestamp
Agent ID

Pattern Library:

If generalizable, extract pattern
Add to shared knowledge
Make available to all agents
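
One possible in-memory shape for these two stores is sketched below. `Episode` and `ReflectionStore` are hypothetical names. The sketch also assumes that a lesson counts as generalizable when it carries a non-empty `pattern_name`; the skill itself does not define that test, and a real deployment would use persistent, shared storage rather than dictionaries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Episode:
    """One stored reflection, linked to its task, agent, and attempt."""
    task_id: str
    agent_id: str
    attempt_number: int
    reflection: dict
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ReflectionStore:
    """Episodic memory keyed by task, plus a pattern library keyed by name."""

    def __init__(self) -> None:
        self._episodes: dict[str, list[Episode]] = {}
        self._patterns: dict[str, dict] = {}

    def store(self, episode: Episode) -> None:
        self._episodes.setdefault(episode.task_id, []).append(episode)
        # Assumption: a lesson is generalizable if it names a pattern.
        for lesson in episode.reflection.get("lessons_learned", []):
            name = lesson.get("pattern_name")
            if name:
                self._patterns[name] = lesson

    def episodes_for(self, task_id: str) -> list[Episode]:
        """Return all stored reflections for a task, oldest first."""
        return list(self._episodes.get(task_id, []))

    def pattern(self, name: str) -> Optional[dict]:
        """Look up a shared pattern by name, or None if it is unknown."""
        return self._patterns.get(name)
```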

4. Retry with Knowledge:

Enhanced Context:

Original task
Previous attempts summary
Reflections from all attempts
Relevant patterns from memory
Similar successful tasks

Retry with Improvements:

Apply lessons learned
Use improved strategy
Add suggested validation
Monitor progress more carefully
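
Taken together, the four steps form a loop: attempt, capture, reflect, retry with accumulated reflections. The sketch below shows that control flow only. `reflexion_loop`, the `attempt` and `reflect` callables, and the `max_attempts` cap are assumptions made for illustration; the process above does not specify a retry limit.

```python
from typing import Callable

# attempt(task, reflections) -> (succeeded, feedback)
AttemptFn = Callable[[str, list[str]], tuple[bool, str]]
# reflect(task, feedback, attempt_number) -> reflection text
ReflectFn = Callable[[str, str, int], str]


def reflexion_loop(task: str, attempt: AttemptFn, reflect: ReflectFn,
                   max_attempts: int = 3) -> tuple[bool, list[str]]:
    """Retry a task, feeding every earlier reflection into the next attempt.

    Returns (succeeded, reflections). The reflections list holds one entry
    per failed attempt, in order.
    """
    reflections: list[str] = []
    for attempt_number in range(1, max_attempts + 1):
        # Pass a copy so the attempt cannot mutate the loop's history.
        succeeded, feedback = attempt(task, list(reflections))
        if succeeded:
            return True, reflections
        reflections.append(reflect(task, feedback, attempt_number))
    return False, reflections
```

In practice `attempt` would wrap the agent run plus its tests, and `reflect` would wrap the LLM call that produces the reflection.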

Example Reflexion

Task: Implement user authentication

Attempt 1:

Approach: Store passwords in plain text
Error: Security audit flagged critical vulnerability

Reflexion:

root_cause:
  technical: Passwords stored without hashing
  reasoning: Didn't consider security best practices

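incorrect_assumptions:
  - assumption: Simple storage is acceptable
    why_wrong: Plain-text passwords are exposed to anyone who can read the data store
    correct_approach: Hash passwords before storing them
  - assumption: Application-level security is sufficient
    why_wrong: It does not protect stored credentials once the data store is compromised
    correct_approach: Store only password hashes, never the passwords themselves

improved_strategy:
  approach: Use bcrypt for password hashing
  implementation_steps:
    - Hash password before storing
    - Use a high cost factor (12+)
    - Rely on the salt that bcrypt generates automatically
    - Never retrieve or log passwords
  validation_plan: Security audit, penetration testing, and a check against OWASP guidelines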

lessons_learned:
  - Always hash passwords (bcrypt, Argon2)
  - Never store sensitive data in plain text
  - Security audit before deployment
  - Follow OWASP authentication guidelines

Attempt 2:

Approach: bcrypt hashing with cost factor 12
Result: All tests pass, security audit clean
Status: SUCCESS

Pattern Stored:

name: secure_password_storage
description: Hash passwords with bcrypt
implementation: |
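  import bcrypt

  def hash_password(password: str) -> str:
      # gensalt() produces a random salt and embeds the cost factor (12)
      salt = bcrypt.gensalt(rounds=12)
      return bcrypt.hashpw(password.encode("utf-8"), salt).decode("utf-8")

  def verify_password(password: str, hashed: str) -> bool:
      # checkpw() re-hashes with the salt stored in `hashed` and compares
      return bcrypt.checkpw(password.encode("utf-8"), hashed.encode("utf-8"))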
applies_to:
  - User authentication
  - Password reset
  - Any credential storage

Best Practices

When working with me:

Accept failures - They're learning opportunities
Be specific - Vague reflections aren't actionable
Extract patterns - Generalize learnings for reuse
Document everything - Future agents will benefit
Measure improvement - Track reflexion effectiveness
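
One simple way to track reflexion effectiveness is to compare success rates across attempt numbers. If later attempts succeed more often than first attempts, the reflections are plausibly helping. The sketch below assumes outcomes are logged as `(attempt_number, succeeded)` pairs; `success_rate_by_attempt` is a hypothetical helper, not part of this skill.

```python
from collections import defaultdict


def success_rate_by_attempt(outcomes: list[tuple[int, bool]]) -> dict[int, float]:
    """Return the fraction of successful runs for each attempt number."""
    totals: dict[int, int] = defaultdict(int)
    wins: dict[int, int] = defaultdict(int)
    for attempt_number, succeeded in outcomes:
        totals[attempt_number] += 1
        wins[attempt_number] += int(succeeded)
    # Sorted keys make the per-attempt trend easy to read.
    return {n: wins[n] / totals[n] for n in sorted(totals)}
```

This metric alone does not prove causation, since tasks that reach a second attempt are the harder ones, so it is best read alongside the stored reflections.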

What I Learn

I store in memory:

Root cause patterns
Common mistakes
Effective solutions
Best practices
Anti-patterns to avoid

This enables continuous improvement across all agents.