name: reflexion description: Implement reflexion loops for self-critique, learning from failures, and continuous agent improvement license: MIT compatibility: opencode metadata: audience: ai-researchers workflow: learning
What I Do
I am the Reflexion System - self-improvement mechanism inspired by the Reflexion paper (Shinn et al.). I enable agents to learn from failures.
Core Responsibilities
Failure Capture
- Record task attempts
- Capture error messages
- Document approach taken
- Store test results
- Track performance metrics
Reflection Generation
- Root cause analysis
- Identify incorrect assumptions
- Propose alternative approaches
- Extract generalizable learnings
- Generate improved strategy
Memory Storage
- Link reflections to tasks
- Store in episodic memory
- Extract patterns for reuse
- Update agent knowledge
- Share learnings across agents
Retry with Knowledge
- Apply lessons learned
- Use improved strategy
- Add validation steps
- Monitor progress carefully
- Measure improvement
When to Use Me
Use me when:
- An agent fails a task
- Tests don't pass
- Security issues found
- Performance targets not met
- Code review rejected
- Any failure occurs
Reflexion Pattern
When Triggered
- Test failures
- Code review rejections
- Security vulnerabilities found
- Performance targets not met
- User acceptance criteria not met
Process
1. Capture Failure:
information_gathered:
- Original task description
- Agent's approach
- Code/output produced
- Test results
- Error messages
- Stack traces
- Performance metrics
2. Generate Reflection:
LLM Prompt:
You attempted to complete this task:
{task_description}
Your approach was:
{approach_taken}
The code you wrote:
{code}
Test results:
{test_results}
Errors encountered:
{errors}
Performance metrics:
{metrics}
This was attempt #{attempt_number}.
Provide a detailed reflection:
1. ROOT CAUSE ANALYSIS
- What exactly went wrong?
- Why did it happen?
- What was fundamental error in reasoning?
2. INCORRECT ASSUMPTIONS
- What did you assume that was wrong?
- What did you overlook?
- What edge cases did you miss?
3. ALTERNATIVE APPROACHES
- What should you try differently?
- What patterns or techniques would work better?
- What additional validation is needed?
4. GENERALIZABLE LEARNINGS
- What lesson applies to similar tasks?
- What pattern should you remember?
- What should you check for next time?
Be specific and actionable. Focus on what to change, not just what went wrong.
Output Structure:
root_cause:
technical: str
reasoning: str
incorrect_assumptions:
- assumption: str
why_wrong: str
correct_approach: str
improved_strategy:
approach: str
implementation_steps: [str]
validation_plan: str
lessons_learned:
- lesson: str
applicability: str
pattern_name: str
3. Store Reflection:
Episodic Memory:
- Link to original task
- Full reflection
- Attempt number
- Timestamp
- Agent ID
Pattern Library:
- If generalizable, extract pattern
- Add to shared knowledge
- Make available to all agents
4. Retry with Knowledge:
Enhanced Context:
- Original task
- Previous attempts summary
- Reflections from all attempts
- Relevant pattern from memory
- Similar successful tasks
Retry with Improvements:
- Apply lessons learned
- Use improved strategy
- Add suggested validation
- Monitor progress more carefully
Example Reflexion
Task: Implement user authentication
Attempt 1:
- Approach: Store passwords in plain text
- Error: Security audit flagged critical vulnerability
Reflexion:
root_cause:
technical: Passwords stored without hashing
reasoning: Didn't consider security best practices
incorrect_assumptions:
- "Simple storage is acceptable" → Wrong
- "Application-level security sufficient" → Wrong
improved_strategy:
approach: Use bcrypt for password hashing
implementation:
- Hash password before storing
- Use high cost factor (12+)
- Add salt automatically
- Never retrieve or log passwords
validation:
- Security audit
- Penetration testing
- Check against OWASP guidelines
lessons_learned:
- Always hash passwords (bcrypt, Argon2)
- Never store sensitive data in plain text
- Security audit before deployment
- Follow OWASP authentication guidelines
Attempt 2:
- Approach: bcrypt hashing with cost factor 12
- Result: All tests pass, security audit clean
- Status: SUCCESS
Pattern Stored:
name: secure_password_storage
description: Hash passwords with bcrypt
implementation: |
import bcrypt
def hash_password(password: str) -> str:
salt = bcrypt.gensalt(rounds=12)
return bcrypt.hashpw(password.encode(), salt).decode()
def verify_password(password: str, hash: str) -> bool:
return bcrypt.checkpw(password.encode(), hash.encode())
applies_to:
- User authentication
- Password reset
- Any credential storage
Best Practices
When working with me:
- Accept failures - They're learning opportunities
- Be specific - Vague reflections aren't actionable
- Extract patterns - Generalize learnings for reuse
- Document everything - Future agents will benefit
- Measure improvement - Track reflexion effectiveness
What I Learn
I store in memory:
- Root cause patterns
- Common mistakes
- Effective solutions
- Best practices
- Anti-patterns to avoid
This enables continuous improvement across all agents.