
name: self-evolving-memory
description: >
  Self-evolving memory architecture for AI agents that learn across interactions without training.
  Use when building agents that need to: (1) accumulate experience across cases, (2) correct
  recurrent reasoning mistakes via reflection, (3) adapt tool-use behavior dynamically. Applicable
  to medical diagnosis, code debugging, customer support, research analysis, and any domain where
  agents solve repeated similar tasks. Trigger: self-evolving memory, inter-case learning, agent
  memory, episodic memory, adaptive heuristics, tool reliability tracking, agent reflection.

Self-Evolving Memory Architecture

Pattern from Evo-MedAgent (arXiv:2604.14475) — a test-time memory module that equips any frozen LLM agent with inter-case learning capacity, requiring no training.

Core Architecture

Three complementary memory stores:

1. Retrospective Clinical Episodes (RCE)

Store problem-solving experiences from past cases as structured episodes
Each episode: {problem_description, solution_path, outcome, key_insights}
At test time: retrieve top-k similar episodes for the current case
Similarity: semantic embedding matching on problem description
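A minimal Python sketch of the episode store follows. It assumes an in-memory list and uses bag-of-words cosine similarity as a stand-in for a real semantic embedding model; `Episode`, `EpisodeStore`, `embed`, and `top_k` are hypothetical names, not part of Evo-MedAgent.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Episode:
    problem_description: str
    solution_path: list
    outcome: str                      # "correct" | "incorrect" | "partial"
    key_insights: list = field(default_factory=list)


def embed(text: str) -> Counter:
    # Stand-in for a semantic embedding model (assumption): bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class EpisodeStore:
    """Retrospective episode store with top-k similarity retrieval."""

    def __init__(self) -> None:
        self._items = []  # (embedding, Episode) pairs; a vector DB at scale

    def add(self, episode: Episode) -> None:
        self._items.append((embed(episode.problem_description), episode))

    def top_k(self, query: str, k: int = 3):
        # Score every stored episode against the query, best matches first.
        q = embed(query)
        scored = [(cosine(q, vec), ep) for vec, ep in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

In a real deployment `embed` would call a sentence-embedding model and `_items` would live in a vector database; the retrieval contract (query text in, top-k scored episodes out) stays the same.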

2. Adaptive Procedural Heuristics (APH)

Priority-tagged diagnostic/decision rules that evolve via reflection
Format: {rule, priority, confidence, last_updated, source_case}
After each case: reflect on what worked/didn't, update heuristics
Rules are sorted by priority, with higher-confidence rules given higher priority
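The heuristic store can be sketched as below. This assumes reflection nudges `confidence` by a fixed step and that `priority` is derived from confidence in three coarse buckets; the thresholds, step size, and the names `Heuristic`, `HeuristicBook`, and `priority_from` are illustrative assumptions, not values from the paper.

```python
import time
from dataclasses import dataclass


def priority_from(confidence: float) -> int:
    # Assumed bucketing: higher confidence maps to higher priority.
    return 3 if confidence >= 0.75 else 2 if confidence >= 0.5 else 1


@dataclass
class Heuristic:
    rule: str
    priority: int
    confidence: float    # in [0, 1]
    last_updated: float  # Unix timestamp
    source_case: str     # ID of the case whose reflection produced the rule


class HeuristicBook:
    def __init__(self) -> None:
        self.rules = []

    def add(self, rule: str, source_case: str, confidence: float = 0.5) -> Heuristic:
        h = Heuristic(rule, priority_from(confidence), confidence, time.time(), source_case)
        self.rules.append(h)
        return h

    def adjust(self, h: Heuristic, worked: bool, step: float = 0.1) -> None:
        # Reflection outcome moves confidence up or down, clamped to [0, 1].
        h.confidence = min(1.0, max(0.0, h.confidence + (step if worked else -step)))
        h.priority = priority_from(h.confidence)
        h.last_updated = time.time()

    def ranked(self) -> list:
        # Highest priority first; confidence orders rules within a bucket.
        return sorted(self.rules, key=lambda h: (h.priority, h.confidence), reverse=True)
```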

3. Tool Reliability Controller (TRC)

Track per-tool trustworthiness scores over time
Format: {tool_name, success_rate, avg_quality, last_used}
Dynamically adjust tool selection based on reliability scores
Decay old scores to adapt to tool updates/changes
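One way to realize the controller is an exponential moving average per tool plus time-based decay toward a neutral prior. The sketch below assumes that design; the EMA weight `alpha`, the one-week half-life, the 50/50 blend of success rate and quality, and the class names are all hypothetical choices, not taken from the paper.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolStats:
    tool_name: str
    success_rate: float = 0.5  # neutral prior for a tool with no history
    avg_quality: float = 0.5   # quality in [0, 1], e.g. from a grader
    last_used: float = 0.0     # Unix timestamp


class ToolReliability:
    def __init__(self, alpha: float = 0.2, half_life_s: float = 7 * 24 * 3600) -> None:
        self.alpha = alpha              # weight of the newest observation (EMA)
        self.half_life_s = half_life_s  # idle time after which a score is halfway back to neutral
        self.stats = {}

    def record(self, tool: str, success: bool, quality: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        s = self.stats.setdefault(tool, ToolStats(tool, last_used=now))
        s.success_rate += self.alpha * (float(success) - s.success_rate)
        s.avg_quality += self.alpha * (quality - s.avg_quality)
        s.last_used = now

    def score(self, tool: str, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        s = self.stats.get(tool)
        if s is None:
            return 0.5
        raw = 0.5 * s.success_rate + 0.5 * s.avg_quality
        # Decay toward the neutral prior the longer the tool sits unused.
        weight = 0.5 ** ((now - s.last_used) / self.half_life_s)
        return 0.5 + weight * (raw - 0.5)

    def choose(self, candidates: list, now: Optional[float] = None) -> str:
        # Pick the most reliable tool among those applicable to the current step.
        return max(candidates, key=lambda t: self.score(t, now))
```

Decaying toward 0.5 rather than toward zero means a tool that has been idle for a long time is treated as unknown rather than as broken, so it gets re-tried after an update.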

Workflow

Step 1: Case Processing

Input: new problem/case
1. Query RCE for similar past episodes (embedding similarity)
2. Query APH for relevant heuristics (keyword/category match)
3. Query TRC for tool reliability scores
4. Combine retrieved context into augmented prompt
5. Execute agent reasoning with augmented context
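Step 4 (combining retrieved context) can be sketched as a single function. It assumes steps 1 to 3 have already returned plain dicts and a tool-name-to-score map; `build_augmented_prompt`, the dict keys, and the section headings inside the prompt are hypothetical.

```python
def build_augmented_prompt(case, episodes, heuristics, tool_scores, max_heuristics=5):
    """Combine retrieved RCE/APH/TRC context with the new case into one prompt."""
    lines = ["## Similar past cases"]
    for ep in episodes:
        insights = "; ".join(ep.get("key_insights", []))
        lines.append(f"- {ep['problem_description']} -> {ep['outcome']}. Insights: {insights}")

    lines.append("## Heuristics (highest priority first)")
    ranked = sorted(heuristics, key=lambda h: (h["priority"], h["confidence"]), reverse=True)
    for h in ranked[:max_heuristics]:  # cap to keep the prompt short
        lines.append(f"- [priority {h['priority']}, confidence {h['confidence']:.2f}] {h['rule']}")

    lines.append("## Tool reliability (prefer higher scores)")
    for name, score in sorted(tool_scores.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"- {name}: {score:.2f}")

    lines.append("## Current case")
    lines.append(case)
    return "\n".join(lines)
```

The returned string is then passed to the frozen model for step 5 through whatever client the agent already uses; no model weights change.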

Step 2: Post-Case Reflection

After case completion:
1. Evaluate outcome (correct/incorrect/partial)
2. If incorrect: analyze failure mode, create new heuristic
3. If correct: reinforce existing heuristics, store episode in RCE
4. Update TRC scores for tools used
5. Prune low-confidence heuristics periodically
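Items 1 to 4 of the reflection step might look like the sketch below; periodic pruning (item 5) is left out. `propose_rule` and `validate_rule` are hypothetical callbacks standing in for the reflection LLM call and the safety validation mentioned under Implementation Notes. The source does not say how "partial" outcomes are handled, so this sketch leaves heuristics and episodes unchanged for them (an assumption).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Memory:
    episodes: list = field(default_factory=list)
    heuristics: list = field(default_factory=list)   # {"rule", "confidence", "source_case"}
    tool_stats: dict = field(default_factory=dict)   # tool name -> [successes, uses]


def post_case_reflection(
    mem: Memory,
    case_id: str,
    episode: dict,
    outcome: str,                                   # "correct" | "incorrect" | "partial"
    used_rules: list,
    tool_results: dict,                             # tool name -> call succeeded?
    propose_rule: Callable[[dict], Optional[str]],  # hypothetical reflection LLM call
    validate_rule: Callable[[str], bool],           # hypothetical safety check
    step: float = 0.1,
) -> None:
    if outcome == "incorrect":
        # Analyze the failure mode and keep the new rule only if it validates.
        rule = propose_rule(episode)
        if rule and validate_rule(rule):
            mem.heuristics.append({"rule": rule, "confidence": 0.5, "source_case": case_id})
    elif outcome == "correct":
        # Reinforce the heuristics that were applied, then store the episode.
        for h in mem.heuristics:
            if h["rule"] in used_rules:
                h["confidence"] = min(1.0, h["confidence"] + step)
        mem.episodes.append(episode)
    # Update per-tool counts regardless of outcome.
    for tool, ok in tool_results.items():
        successes, uses = mem.tool_stats.get(tool, [0, 0])
        mem.tool_stats[tool] = [successes + int(ok), uses + 1]
```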

Step 3: Memory Maintenance

Periodically (every N cases):
1. Cluster similar RCE episodes, merge duplicates
2. Recalculate APH priorities based on aggregate outcomes
3. Decay TRC scores for unused tools
4. Remove heuristics with confidence < threshold
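A single maintenance pass is sketched below under several assumptions: "clustering" is greedy Jaccard token overlap rather than embedding clustering, heuristics carry hypothetical `wins`/`uses` counters so priority can be recomputed as a success ratio, and tool staleness is measured in cases since last use. All thresholds and names are illustrative.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def maintain(episodes, heuristics, tool_scores, last_used, case_no,
             sim_threshold=0.8, conf_threshold=0.2, decay=0.9, stale_after=50):
    """One maintenance pass; returns new (episodes, heuristics, tool_scores)."""
    # 1. Greedy clustering: merge near-duplicate episodes into one representative.
    kept = []
    for ep in episodes:
        toks = tokens(ep["problem_description"])
        rep = next((k for k in kept
                    if jaccard(toks, tokens(k["problem_description"])) >= sim_threshold), None)
        if rep is None:
            kept.append({**ep, "key_insights": list(ep["key_insights"])})
        else:
            rep["key_insights"] += [i for i in ep["key_insights"] if i not in rep["key_insights"]]
    # 2. Recompute priority as the aggregate success ratio (wins / uses).
    for h in heuristics:
        h["priority"] = h["wins"] / h["uses"] if h["uses"] else 0.0
    # 3. Decay scores of tools not used within the last `stale_after` cases.
    scores = {t: (s * decay if case_no - last_used.get(t, 0) > stale_after else s)
              for t, s in tool_scores.items()}
    # 4. Remove heuristics below the confidence threshold, then re-rank.
    kept_rules = sorted((h for h in heuristics if h["confidence"] >= conf_threshold),
                        key=lambda h: h["priority"], reverse=True)
    return kept, kept_rules, scores
```

At scale, step 1 would reuse the RCE embeddings and vector index instead of token sets; the greedy pass is only meant to show the merge semantics.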

Implementation Notes

No training required: works on top of any frozen model
Per-case overhead: bounded by one retrieval pass + one reflection call
Scalability: use vector database for RCE retrieval at scale
Safety: reflection should validate new heuristics before adding them to APH

Performance Gains (from paper)

Base Model	Without Memory (MCQ acc.)	With Memory (MCQ acc.)	Improvement
GPT-5-mini	0.68	0.79	+11 points
Gemini-3 Flash	0.76	0.87	+11 points
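The Improvement column is an absolute difference in accuracy, reported in percentage points. Expressed relative to each baseline, the same table values work out as follows.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= \text{acc}_{\text{mem}} - \text{acc}_{\text{base}},
  \qquad \Delta_{\text{rel}} = \Delta_{\text{abs}} / \text{acc}_{\text{base}} \\
\text{GPT-5-mini:}\quad \Delta_{\text{abs}} &= 0.79 - 0.68 = 0.11,
  \qquad \Delta_{\text{rel}} = 0.11 / 0.68 \approx 16.2\% \\
\text{Gemini-3 Flash:}\quad \Delta_{\text{abs}} &= 0.87 - 0.76 = 0.11,
  \qquad \Delta_{\text{rel}} = 0.11 / 0.76 \approx 14.5\%
\end{aligned}
```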
