research-reflection - SKILL.md Agent Skill

name: research-reflection description: "Reflect on completed research tasks to improve future performance. Use when: a research task has just been completed and the agent should evaluate its own process, store lessons learned, or retrieve past reflections before starting new work. NOT for: active research execution or data analysis." metadata: { "openclaw": { "emoji": "🪞" } }

Research Reflection (Reflexion Pattern)

Self-evaluate completed research tasks, store lessons learned, and retrieve relevant past experience to improve future research quality. Based on the Reflexion framework (verbal reinforcement learning without weight updates).

When to Use

After completing any substantial research task (literature review, data analysis, systematic review)
Before starting a new task in a previously explored domain
When a research approach failed and you need to diagnose why
"What did I learn from the last CRISPR review?"
"Before starting, check if we've done similar work before"

When NOT to Use

During active research execution (focus on the task)
For simple factual lookups
For non-research tasks

Reflexion Cycle

The cycle runs after each substantial research task:

Execute Task → Evaluate Output → Generate Reflection → Store in Memory → Retrieve Next Time

Phase 1: Self-Evaluation

After completing a research task, evaluate against these dimensions:

## Post-Task Evaluation

### Completeness (1-5)
- Did I search enough databases? Which ones did I miss?
- Did I follow all 6 Research Depth Phases from SCIENCE.md?
- Are there obvious gaps in my coverage?

### Accuracy (1-5)
- Did every citation come from a tool result?
- Did I cross-verify claims against primary databases?
- Were statistical results correctly reported?

### Efficiency (1-5)
- Did I use the right tools for each subtask?
- Were there redundant searches I could have avoided?
- Did I hit rate limits that slowed me down?

### Depth (1-5)
- Did I go beyond surface-level results?
- Did I trace citation chains?
- Did I identify contradictions and open questions?

### Actionability (1-5)
- Is the output directly useful to the user?
- Are next steps clearly identified?
- Is the methodology reproducible?

Phase 2: Generate Reflection

Produce a structured reflection:

## Reflection: [Task Title]
Date: [ISO date]
Domain: [e.g., molecular biology, econometrics]
Task type: [literature review / data analysis / systematic review / ...]

### What went well
- [Specific successful strategies]

### What went poorly
- [Specific failures or inefficiencies]

### Key lessons
- [Actionable insights for future tasks]

### Tools that worked best
- [Tool name]: [Why it was effective for this task]

### Tools that underperformed
- [Tool name]: [Why, and what to use instead]

### Domain-specific insights
- [API endpoints that were useful]
- [Search strategies that worked]
- [Data sources that were reliable/unreliable]

### Score: [total/25]

Phase 3: Store Reflection

Store reflections using the science-memory extension:

# Store reflection with domain tags
curl -X POST http://localhost:18789/memory/store \
  -H "Content-Type: application/json" \
  -d '{
    "key": "reflection-[task-id]",
    "content": "[reflection markdown]",
    "tags": ["reflection", "domain-tag", "task-type"],
    "metadata": {"score": 18, "date": "2026-03-11"}
  }'

Phase 4: Retrieve Before New Task

Before starting a new research task, retrieve relevant past reflections:

# Search for reflections in the same domain
curl http://localhost:18789/memory/search?q=domain-tag+task-type&tags=reflection

Use retrieved reflections to:

Avoid repeating past mistakes
Reuse successful search strategies
Skip known-unreliable data sources
Apply domain-specific insights

Reflection Templates by Task Type

Literature Review Reflection

Databases searched and result counts
Most effective search terms
Key papers found vs. expected coverage
Citation chains that revealed important papers
Gaps: papers known to exist but not found

Data Analysis Reflection

Data sources used and data quality issues
Analysis methods applied and their appropriateness
Unexpected findings that required investigation
Reproducibility: can the analysis be rerun from the report?

Systematic Review Reflection

PRISMA compliance level
Screening efficiency (false positive/negative rate)
Data extraction completeness
Meta-analysis method appropriateness

Continuous Improvement

Pattern Recognition

Over multiple reflections, identify recurring patterns:

Which databases consistently perform best for your domain?
Which search strategies yield the highest-quality results?
What's your typical bottleneck (rate limits, data quality, synthesis)?

Strategy Evolution

Based on accumulated reflections:

Update personal search templates
Maintain a "preferred tools by domain" list
Build domain-specific query libraries
Track API reliability over time

Auto-Trigger: Post-Task Reflection

This skill should be triggered automatically after any substantial research task. The agent should:

After task completion: Generate a reflection without being asked
Save to disk: Store as ~/clawd/reflections/{domain}/{date}-{topic}.md
Before new tasks: Check ~/clawd/reflections/ for relevant past reflections in the same domain

File-Based Reflection Storage (No External Service Required)

# Create reflection directory structure
mkdir -p ~/clawd/reflections/{biology,medicine,economics,physics,chemistry,cs,social-science,general}

# Save a reflection after task completion
cat > ~/clawd/reflections/biology/2026-03-11-crispr-review.md << 'EOF'
## Reflection: CRISPR Base Editing Literature Review
Date: 2026-03-11
Domain: biology
Task type: literature review
Score: 20/25

### What went well
- Semantic Scholar returned high-quality results for "CRISPR base editing"
- Citation chain from Komor et al. 2016 revealed all major developments
- Cross-verification with UniProt confirmed protein structures

### What went poorly
- PubMed API was slow (>5s per query), should have used Europe PMC
- Missed Anzalone 2019 prime editing paper initially

### Key lessons
- For CRISPR topics, always include David Liu's lab papers as seed
- UniProt cross-references to PDB save a separate structure search
- Europe PMC is faster than PubMed for programmatic access

### Tools that worked best
- Semantic Scholar: excellent for CRISPR literature, good TLDR summaries
- Jina Reader: successfully extracted full text from Nature papers

### Tools that underperformed
- PubMed: slow API, consider Europe PMC as primary for biomedical
EOF

Retrieving Past Reflections Before New Tasks

# Search for relevant reflections before starting a new task
# Simple grep-based search (no vector DB needed)
grep -rl "CRISPR\|base editing\|gene editing" ~/clawd/reflections/ 2>/dev/null | head -5

# Read the most recent reflection in a domain
ls -t ~/clawd/reflections/biology/*.md 2>/dev/null | head -3 | xargs head -50

Accumulated Reflection Analysis

After 5+ reflections in a domain, generate a meta-analysis:

## Domain Meta-Analysis: Biology (N=8 reflections)

### Best-performing tools
1. Semantic Scholar (avg score 4.2/5 across tasks)
2. UniProt (4.0/5 for protein-related queries)
3. Jina Reader (3.8/5 for full-text extraction)

### Common failure patterns
1. PubMed API timeouts (3/8 tasks affected)
2. Missing recent preprints not yet indexed
3. Rate limits on Semantic Scholar for >50 queries

### Recommended strategy updates
1. Use Europe PMC instead of PubMed for programmatic access
2. Add bioRxiv/medRxiv search for recent preprints
3. Batch Semantic Scholar queries to avoid rate limits

Best Practices

Reflect immediately after task completion while details are fresh
Be honest about failures -- the goal is improvement, not a perfect score
Keep reflections concise (200-400 words)
Tag reflections with both domain and task type for easy retrieval
Review accumulated reflections quarterly to identify meta-patterns
Share useful reflections across the skill-evolution system
Always save reflections to disk -- memory is ephemeral, files persist
Keep only the last 3 reflections per domain in active context (window management)
After 5+ reflections in a domain, generate a meta-analysis to compress accumulated knowledge