---
name: "fademem-biologically-inspired-forgetting-agent"
description: >
  Implement biologically-inspired forgetting mechanisms for LLM agent memory systems.
  Build dual-layer memory hierarchies with adaptive exponential decay, semantic relevance
  scoring, and LLM-guided conflict resolution to keep agent context lean and high-quality.
  Use when: "add forgetting to my agent memory", "implement memory decay for my chatbot",
  "build an agent memory system with selective retention", "reduce memory bloat in my AI agent",
  "implement FadeMem-style memory management", "add adaptive memory consolidation to my agent".
---
FadeMem: Biologically-Inspired Forgetting for Agent Memory
This skill enables Claude to design and implement agent memory systems that actively forget irrelevant information using biologically-inspired decay mechanisms. Based on the FadeMem architecture, you build dual-layer memory hierarchies (working memory + long-term memory) where each memory entry decays at a rate governed by semantic relevance, access frequency, and temporal recency. Rather than storing everything or dropping everything at a context boundary, this approach continuously prunes low-value memories while consolidating important ones. The FadeMem paper reports roughly 45% storage reduction alongside improved multi-hop reasoning and retrieval quality.
When to Use
- When the user is building a conversational agent that needs to remember information across many sessions without unbounded memory growth
- When the user asks to add forgetting, decay, or memory management to an existing agent or chatbot system
- When the user wants to reduce token usage or storage costs for an agent that accumulates too much context
- When the user needs to resolve contradictions between old and new information in agent memory
- When the user is implementing a RAG system and wants smarter retention policies than simple FIFO or fixed-window truncation
- When the user asks for a memory architecture that prioritizes important memories and lets irrelevant ones fade
Key Technique
Dual-Layer Memory with Differential Decay. FadeMem divides agent memory into two tiers: a working memory that holds recent, high-activation entries (analogous to a conversation buffer), and a long-term memory that stores consolidated, durable knowledge. Each memory entry carries a retention score computed by an adaptive exponential decay function:
retention(m) = base_relevance(m) * exp(-lambda(m) * time_since_last_access(m))
The decay rate lambda is not fixed; it adapts per entry based on three modulators. (1) Semantic relevance to the agent's current task or recent queries slows decay for on-topic memories. (2) Access frequency makes frequently retrieved memories decay more slowly (a "use it or lose it" principle). (3) Temporal pattern treats memories with bursty, clustered access patterns as more important than one-off mentions. Entries whose retention score drops below a threshold are either consolidated (merged with related entries via summarization) or permanently forgotten. The `compute_retention` sketch in the workflow below models only the first two modulators.
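Converting lambda into a half-life (ln 2 / lambda) makes it easier to see what the modulators buy you. The sketch below assumes the weights used in the workflow's `compute_retention`: a base rate of 0.1 per hour, 0.4 for access frequency, and 0.3 for semantic relevance. `adaptive_lambda` and `half_life_hours` are illustrative helper names, not part of the paper.

```python
import math

BASE_LAMBDA = 0.1  # per hour; the starting value suggested in the workflow


def adaptive_lambda(access_count: int, semantic_sim: float) -> float:
    """Decay rate that shrinks as access frequency and semantic relevance grow."""
    freq_boost = min(math.log1p(access_count) / 5.0, 1.0)
    return BASE_LAMBDA * (1.0 - 0.4 * freq_boost - 0.3 * semantic_sim)


def half_life_hours(lam: float) -> float:
    """Hours for retention to fall to half of base_relevance: ln(2) / lambda."""
    return math.log(2) / lam


for label, count, sim in [("idle", 0, 0.0), ("busy", 20, 0.9)]:
    lam = adaptive_lambda(count, sim)
    print(f"{label}: lambda={lam:.3f}/h, half-life={half_life_hours(lam):.1f} h")
```

With these weights, an idle, off-topic entry has a half-life of ln 2 / 0.1 ≈ 6.9 hours. Even with both modulators at their maximum, lambda only falls to 0.03 per hour, a half-life of about 23 hours. That works within one long conversation, but agents whose sessions are days apart will likely need a much smaller base rate.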
LLM-Guided Conflict Resolution and Fusion. When the system detects semantically overlapping or contradictory memories (e.g., a user changed their address, or two sessions give different preferences), it invokes the LLM to evaluate which memory is more current, more contextually grounded, or more consistent with the broader memory store. The winner is kept or a fused summary is generated; the loser decays faster. This prevents stale or contradictory information from polluting retrieval results. Memory fusion also compresses verbose multi-turn exchanges into concise factual entries, reducing storage without losing core information.
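Applying the resolver's verdict is mostly bookkeeping. The sketch below makes three assumptions:

- The LLM's answer has already been parsed into `"a"`, `"b"`, or `None` (undecided).
- An undecided verdict falls back to recency, as the Error Handling section recommends.
- "The loser decays faster" is modeled with a hypothetical per-entry `decay_multiplier` field, which is not in the Step 1 schema.

`retention_with_multiplier` is the query-free form of `compute_retention`, with that factor applied to lambda.

```python
import math
from typing import Optional


def retention_with_multiplier(entry: dict, now: float, base_lambda: float = 0.1) -> float:
    """Query-free compute_retention, with lambda scaled by a per-entry multiplier."""
    hours = (now - entry["last_accessed_at"]) / 3600
    freq_boost = min(math.log1p(entry["access_count"]) / 5.0, 1.0)
    lam = base_lambda * (1.0 - 0.4 * freq_boost - 0.3 * entry["base_relevance"])
    lam *= entry.get("decay_multiplier", 1.0)  # hypothetical field; 1.0 = normal decay
    return entry["base_relevance"] * math.exp(-lam * hours)


def apply_conflict_verdict(a: dict, b: dict, verdict: Optional[str]) -> tuple[dict, dict]:
    """Return (winner, loser). verdict is "a", "b", or None; None falls back to recency."""
    if verdict is None:
        verdict = "a" if a["created_at"] >= b["created_at"] else "b"
    winner, loser = (a, b) if verdict == "a" else (b, a)
    # Accelerate the loser's decay 3x; repeated losses compound.
    loser["decay_multiplier"] = loser.get("decay_multiplier", 1.0) * 3.0
    return winner, loser
```

Take the email-address case in Example 1. `apply_conflict_verdict(old_email, new_email, None)` keeps the newer address. The old one then decays three times faster until a sweep drops it. The merge outcome (replacing both with a fused entry) is not shown.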
Step-by-Step Workflow
1. Define the memory entry schema. Each entry needs `id`, `content` (text), `embedding` (vector), `created_at` (timestamp), `last_accessed_at` (timestamp), `access_count` (integer), `base_relevance` (float, 0-1), `retention_score` (float), `layer` (enum: `working` or `long_term`), and `tags` (list of topic strings).
2. Implement the working memory buffer. Create a fixed-capacity buffer (e.g., the last 20-50 entries) that holds recent interactions. New entries always land here first. When the buffer is full, trigger a consolidation cycle rather than simply dropping the oldest entry.
3. Implement the adaptive decay function. For each memory entry, compute:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compute_retention(entry, current_time, query_embedding=None):
    time_delta = (current_time - entry["last_accessed_at"]) / 3600  # hours since last access
    # Access-frequency modulator: grows with log(1 + count), capped at 1.0
    freq_boost = min(math.log1p(entry["access_count"]) / 5.0, 1.0)
    # Semantic modulator: similarity to the current query, else the ingestion-time relevance
    if query_embedding is not None:
        semantic_sim = cosine_similarity(query_embedding, entry["embedding"])
    else:
        semantic_sim = entry["base_relevance"]
    # Base rate 0.1/hour, reduced by up to 40% (frequency) and 30% (relevance)
    lambda_decay = 0.1 * (1.0 - 0.4 * freq_boost - 0.3 * semantic_sim)
    retention = entry["base_relevance"] * math.exp(-lambda_decay * time_delta)
    return max(retention, 0.0)
```

The key insight is that `lambda_decay` shrinks for frequently accessed, semantically relevant entries, so they decay much more slowly.

4. Run periodic decay sweeps. On every N-th interaction (e.g., every 5 turns or at each new session), recompute `retention_score` for all entries. Entries below a `forget_threshold` (e.g., 0.15) are candidates for removal. Entries between `forget_threshold` and a `consolidate_threshold` (e.g., 0.35) are candidates for fusion.
5. Implement memory fusion via LLM summarization. Group candidate entries by semantic similarity (cluster embeddings with a cosine threshold of 0.75 or higher). For each cluster, prompt the LLM:

```text
Summarize the following related memory entries into a single concise factual statement.
Preserve key facts, names, dates, and preferences. Drop conversational filler.

Entries:
{entries}
```

Replace the cluster with the fused entry, inheriting the highest `base_relevance` and the summed `access_count` from the group.

6. Implement LLM-guided conflict resolution. When fusion detects contradictory entries (e.g., cosine similarity > 0.8 but divergent semantic content), prompt the LLM:

```text
These two memory entries appear to conflict:
A (created {date_a}): {content_a}
B (created {date_b}): {content_b}
Which is more likely to be current/correct? Return the resolved fact or indicate which to keep.
```

Apply the resolution. Either keep the winner and accelerate decay on the loser (e.g., multiply its `lambda` by 3), or replace both with a merged entry.

7. Promote and demote between layers. After a decay sweep, promote working memory entries with `retention_score > 0.7` and `access_count >= 3` to long-term memory. Demote long-term entries with `retention_score < consolidate_threshold` back to working memory for re-evaluation or fusion. Delete any entry with `retention_score < forget_threshold`.
8. Implement retrieval with decay-aware ranking. When the agent needs context, score candidate memories by semantic similarity, then re-rank them with a weighted combination of similarity and retention (similarity * 0.6 + retention * 0.4). This naturally down-ranks stale memories even when they are semantically close. Refresh the access metadata only on the entries actually returned:

```python
import time


def retrieve(query_embedding, memories, top_k=10):
    now = time.time()
    scored = []
    for m in memories:
        sim = cosine_similarity(query_embedding, m["embedding"])
        retention = compute_retention(m, now, query_embedding)
        # Hybrid score: 60% semantic similarity, 40% retention
        score = sim * 0.6 + retention * 0.4
        scored.append((m, score))
    scored.sort(key=lambda x: x[1], reverse=True)
    top = [m for m, _ in scored[:top_k]]
    # Refresh only the returned entries ("use it or lose it");
    # touching every scanned entry would reset decay for the whole store.
    for m in top:
        m["last_accessed_at"] = now
        m["access_count"] += 1
    return top
```

9. Wire into the agent loop. Insert memory operations at three points:
   - after each user turn, encode and store new entries in working memory;
   - before each LLM call, retrieve the top-k memories and inject them as context;
   - after every N turns, run the decay sweep and consolidation cycle.
10. Tune thresholds empirically. Start with `forget_threshold=0.15`, `consolidate_threshold=0.35`, `promotion_threshold=0.7`, and a base `lambda=0.1`. Monitor memory store size and retrieval quality. If the agent forgets too aggressively, lower `lambda` or lower the thresholds. If memory bloats, do the opposite.
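Steps 4 and 7 fit naturally into a single pass. The sketch below makes a few assumptions:

- Entries are plain dicts with the Step 1 fields.
- It reuses the `compute_retention` formula without a query, so `base_relevance` stands in for semantic similarity.
- It returns fusion candidates rather than calling an LLM.

The constant and function names are illustrative.

```python
import math
import time
from typing import Optional

FORGET_THRESHOLD = 0.15
CONSOLIDATE_THRESHOLD = 0.35
PROMOTION_THRESHOLD = 0.7
MIN_PROMOTION_ACCESSES = 3


def retention(entry: dict, now: float, base_lambda: float = 0.1) -> float:
    """Same decay form as compute_retention, with base_relevance as the semantic term."""
    hours = (now - entry["last_accessed_at"]) / 3600
    freq_boost = min(math.log1p(entry["access_count"]) / 5.0, 1.0)
    lam = base_lambda * (1.0 - 0.4 * freq_boost - 0.3 * entry["base_relevance"])
    return entry["base_relevance"] * math.exp(-lam * hours)


def decay_sweep(memories: list[dict], now: Optional[float] = None):
    """Rescore every entry, move it between layers, and split the store.

    Returns (kept, fusion_candidates, forgotten); fusion_candidates is a subset of kept.
    """
    now = time.time() if now is None else now
    kept, fusion_candidates, forgotten = [], [], []
    for m in memories:
        m["retention_score"] = retention(m, now)
        if m["retention_score"] < FORGET_THRESHOLD:
            forgotten.append(m)  # delete (or archive) these
            continue
        if m["retention_score"] < CONSOLIDATE_THRESHOLD:
            m["layer"] = "working"  # demote for re-evaluation or fusion
            fusion_candidates.append(m)
        elif (m["layer"] == "working"
              and m["retention_score"] > PROMOTION_THRESHOLD
              and m["access_count"] >= MIN_PROMOTION_ACCESSES):
            m["layer"] = "long_term"  # promote durable, frequently used entries
        kept.append(m)
    return kept, fusion_candidates, forgotten
```

The fusion candidates would then go through clustering and LLM summarization (Step 5). The `forgotten` list is what you would write to the audit log recommended under Best Practices.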
Concrete Examples
Example 1: Multi-session customer support agent
User: "Build a memory system for my support chatbot that remembers customer preferences across sessions but doesn't bloat over time."
Approach:
- Define memory entries with the schema from Step 1, stored in a SQLite database with a vector column (via `sqlite-vec` or similar).
- On each customer message, extract key facts (name, product, issue, preferences) and store them as individual memory entries. Have the LLM score `base_relevance` for each: 0.3 for small talk, 0.7 for product preferences, 0.9 for active issues.
- On each new session start, run a decay sweep. Assume `lambda` is tuned for multi-session timescales. A customer's old shipping address from 6 months ago with `access_count=1` then decays below `forget_threshold` and gets forgotten. Their product preference, retrieved 12 times, decays far more slowly and remains.
- When two sessions mention different email addresses, conflict resolution fires. The LLM determines that the newer one is an update, keeps it, and accelerates decay on the old one.
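The Step 1 schema could map onto a SQLite table like the one below, which is one possible body for the `_init_tables` method in the output structure. Embeddings and tags are stored as JSON text to keep the sketch dependency-free; that is an assumption. A `sqlite-vec` virtual table or another vector index would replace the JSON column for real similarity search. `init_tables` and `add_entry` are hypothetical standalone versions of the store's methods. A new entry starts in working memory with `retention_score` equal to its `base_relevance`, since no time has elapsed.

```python
import json
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id               TEXT PRIMARY KEY,
    content          TEXT NOT NULL,
    embedding        TEXT NOT NULL,  -- JSON-encoded list of floats (assumption)
    created_at       REAL NOT NULL,
    last_accessed_at REAL NOT NULL,
    access_count     INTEGER NOT NULL DEFAULT 0,
    base_relevance   REAL NOT NULL CHECK (base_relevance BETWEEN 0 AND 1),
    retention_score  REAL NOT NULL,
    layer            TEXT NOT NULL CHECK (layer IN ('working', 'long_term')),
    tags             TEXT NOT NULL DEFAULT '[]'  -- JSON-encoded list of strings
)
"""


def init_tables(db: sqlite3.Connection) -> None:
    db.execute(SCHEMA)
    db.commit()


def add_entry(db: sqlite3.Connection, content: str, embedding: list[float],
              relevance: float = 0.5, tags: tuple = ()) -> str:
    """Insert a new working-memory entry; retention starts at base_relevance."""
    now = time.time()
    entry_id = uuid.uuid4().hex
    db.execute(
        "INSERT INTO memories VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (entry_id, content, json.dumps(embedding), now, now, 0,
         relevance, relevance, "working", json.dumps(list(tags))),
    )
    db.commit()
    return entry_id
```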
Output structure:
```python
# memory_store.py
import sqlite3


class FadeMemStore:
    def __init__(self, db_path, forget_threshold=0.15, consolidate_threshold=0.35):
        self.db = sqlite3.connect(db_path)
        self.forget_threshold = forget_threshold
        self.consolidate_threshold = consolidate_threshold
        self._init_tables()

    def _init_tables(self):
        """Create the memory table (columns follow the Step 1 schema)."""

    def add(self, content, embedding, relevance=0.5):
        """Add new entry to working memory."""

    def retrieve(self, query_embedding, top_k=10):
        """Semantic search with decay-aware re-ranking."""

    def decay_sweep(self, current_query_embedding=None):
        """Recompute retention scores, consolidate or forget entries."""

    def resolve_conflicts(self, entries):
        """LLM-guided resolution for contradictory memory pairs."""

    def fuse(self, cluster):
        """Summarize a cluster of related entries into one."""
```
Example 2: Research assistant agent with long-running context
User: "My research agent accumulates thousands of paper summaries and notes. Help me add FadeMem-style decay so it keeps the most relevant ones."
Approach:
- Wrap the existing note store with decay metadata (`last_accessed_at`, `access_count`, `retention_score`).
- Set `base_relevance` using the LLM to score each note's relevance to the agent's declared research topics (passed as a topic vector).
- Every 50 interactions, run a decay sweep. Notes about papers the user cited 10 times stay strong. Notes about tangentially browsed papers from weeks ago decay below threshold and get consolidated into a single "background context" summary per topic cluster.
- Retrieval uses the hybrid scoring (similarity * 0.6 + retention * 0.4), ensuring the agent surfaces actively-used references over stale ones.
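Consolidating weak notes into one summary per topic cluster comes down to the bookkeeping from Step 5:

- A greedy pass assigns each entry to the first cluster whose seed entry clears the cosine threshold.
- `fuse_cluster` builds the replacement entry, inheriting the highest `base_relevance` and the summed `access_count`.

The `summarize` and `embed` callables are hypothetical stand-ins for the LLM fusion prompt and your embedding model. Seed-based greedy clustering is one simple choice among many.

```python
import math
from typing import Callable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(entries: list[dict], threshold: float = 0.75) -> list[list[dict]]:
    """Greedy clustering: each entry joins the first cluster whose seed it resembles."""
    clusters: list[list[dict]] = []
    for entry in entries:
        for cluster in clusters:
            if cosine_similarity(cluster[0]["embedding"], entry["embedding"]) >= threshold:
                cluster.append(entry)
                break
        else:  # no similar cluster found: start a new one
            clusters.append([entry])
    return clusters


def fuse_cluster(
    cluster: list[dict],
    summarize: Callable[[list[str]], str],  # stand-in for the LLM fusion prompt
    embed: Callable[[str], list[float]],    # stand-in for the embedding model
) -> dict:
    """Replace a cluster with one entry that keeps its strongest retention signals."""
    content = summarize([e["content"] for e in cluster])
    return {
        "id": cluster[0]["id"],  # hypothetical choice: reuse the seed entry's id
        "content": content,
        "embedding": embed(content),
        "created_at": min(e["created_at"] for e in cluster),
        "last_accessed_at": max(e["last_accessed_at"] for e in cluster),
        "access_count": sum(e["access_count"] for e in cluster),
        "base_relevance": max(e["base_relevance"] for e in cluster),
        "retention_score": max(e["retention_score"] for e in cluster),  # recomputed next sweep
        "layer": cluster[0]["layer"],
        "tags": sorted({t for e in cluster for t in e["tags"]}),
    }
```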
Output: a wrapper module that adds decay metadata around the existing store without modifying it:
```python
# fademem_wrapper.py
class FadeMemWrapper:
    def __init__(self, base_store, llm_client, embedding_fn):
        self.store = base_store
        self.llm = llm_client
        self.embed = embedding_fn

    def ingest(self, text, topic_relevance=None): ...

    def query(self, question, top_k=15): ...

    def maintenance_cycle(self): ...  # decay + fuse + conflict resolve
```
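Per-note LLM scoring may be too slow for thousands of notes. A cheaper way to produce the `topic_relevance` passed to `ingest` is to score each note by its best cosine match against the declared topic vectors. This is a swapped-in technique, not the LLM scoring described above, and `topic_relevance` as a function is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_relevance(note_embedding: list[float], topic_embeddings: list[list[float]]) -> float:
    """base_relevance = best cosine match against any declared topic, clipped to [0, 1]."""
    best = max(cosine_similarity(note_embedding, t) for t in topic_embeddings)
    return min(1.0, max(0.0, best))
```

Retention never exceeds `base_relevance`. So any note scored below `forget_threshold` is dropped at its first sweep, and this score effectively doubles as an admission filter.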
Example 3: Adding forgetting to a LangChain or LlamaIndex agent
User: "I'm using LangChain's ConversationBufferMemory but it grows too large. Add FadeMem-style forgetting."
Approach:
- Subclass `ConversationBufferMemory` or wrap it with a `FadeMemMemory` class.
- Override `save_context` to assign decay metadata to each new memory entry.
- Override `load_memory_variables` to run a lightweight decay check that returns only entries above `forget_threshold`. It skips the full sweep and just filters by the precomputed `retention_score`.
- Add a `maintenance()` method, called every N turns, that runs the full sweep with consolidation.
- Contradictions between early and late conversation context get resolved via the chain's LLM.
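```python
# Assumes a LangChain release whose memory classes are Pydantic v2 models (e.g. 0.3.x)
from typing import Any, Dict

from langchain.memory import ConversationBufferMemory
from pydantic import PrivateAttr


class FadeMemMemory(ConversationBufferMemory):
    # LangChain memory classes are Pydantic models, so attributes must be
    # declared as fields or private attributes rather than set in __init__.
    forget_threshold: float = 0.15
    _decay_metadata: Dict[int, Dict[str, Any]] = PrivateAttr(default_factory=dict)
    _turn_count: int = PrivateAttr(default=0)

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        super().save_context(inputs, outputs)
        self._turn_count += 1
        # Attach decay metadata to newest entry
        ...

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Filter by retention_score before returning
        ...
```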
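The lightweight check in `load_memory_variables` can be as small as a filter over precomputed scores. This sketch is independent of LangChain and makes two assumptions:

- Decay metadata is keyed by message index, a hypothetical layout for `_decay_metadata`.
- Messages that have not been scored yet are kept.

Inside the subclass you would apply it to `self.chat_memory.messages` before formatting the buffer.

```python
def filter_by_retention(messages: list, decay_metadata: dict[int, dict],
                        forget_threshold: float = 0.15) -> list:
    """Keep messages whose precomputed retention_score clears the threshold; no rescoring."""
    return [
        msg
        for i, msg in enumerate(messages)
        # Unscored messages (no metadata yet) are kept by default
        if decay_metadata.get(i, {}).get("retention_score", 1.0) >= forget_threshold
    ]
```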
Best Practices
- Do: Score `base_relevance` at ingestion time using the LLM or a classifier. A well-calibrated initial score is the single biggest lever on retention quality. Factual user preferences should score 0.7-0.9; small talk should score 0.1-0.3.
- Do: Always refresh `last_accessed_at` and increment `access_count` on retrieval hits. This is the "use it or lose it" signal that keeps important memories alive.
- Do: Batch decay sweeps rather than running them on every turn: every 5-10 turns for chatbots, every session boundary for multi-session agents. Sweeps are O(n) over the memory store.
- Do: Log what gets forgotten. Maintain a lightweight "forgotten entries" audit log (just IDs and timestamps) so you can debug recall failures during development.
- Avoid: Setting `forget_threshold` too aggressively at first. Start conservative (0.10-0.15) and tighten once you confirm the agent isn't losing important information.
- Avoid: Running conflict resolution on every pair of similar memories. Only trigger it when fusion detects a contradiction (high embedding similarity but divergent factual content). LLM calls for resolution are expensive.
- Avoid: Using uniform decay rates. The entire point of FadeMem is differential decay -- frequently accessed, semantically relevant memories must decay slower than idle ones.
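A minimal forgotten-entries audit log, assuming a JSON Lines file and dict entries (`log_forgotten` is an illustrative name). It records IDs and timestamps, plus the final score as an optional extra for threshold debugging, and deliberately omits content.

```python
import json
import time
from pathlib import Path


def log_forgotten(entries: list[dict], log_path: str) -> int:
    """Append one JSON line per forgotten entry and return how many were logged."""
    forgotten_at = time.time()
    with Path(log_path).open("a", encoding="utf-8") as f:
        for e in entries:
            record = {
                "id": e["id"],
                "forgotten_at": forgotten_at,
                "last_accessed_at": e.get("last_accessed_at"),
                "retention_score": e.get("retention_score"),  # optional extra
            }
            f.write(json.dumps(record) + "\n")
    return len(entries)
```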
Error Handling
- Premature forgetting of critical facts: If the agent forgets something it shouldn't, check whether `base_relevance` was scored too low at ingestion or whether the base `lambda` rate is too high. Add a `pinned` flag for critical memories (e.g., the user's name, core preferences) that exempts them from decay.
- Memory fusion produces lossy summaries: When the LLM's fused summary drops key details, improve the fusion prompt to explicitly enumerate what must be preserved. Alternatively, keep the original entries alongside the summary until the originals decay naturally.
- Conflict resolution picks the wrong entry: Add a recency bias to the conflict resolution prompt -- in most agent scenarios, newer information should win unless the user explicitly corrects back.
- Embedding quality is poor: Decay-aware retrieval depends on good embeddings. If semantic similarity scores are unreliable, fall back to keyword overlap as a secondary signal in the retrieval ranking.
- Memory store grows despite decay: Check that decay sweeps are actually executing. A common bug is forgetting to call `maintenance_cycle()` in the agent loop. Add a turn counter that triggers sweeps automatically.
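Two of the fixes above are small enough to sketch together: the `pinned` exemption and a turn counter that triggers sweeps automatically. `SweepScheduler` and `is_forgettable` are hypothetical names. The `maintenance_cycle` callable is whatever your store exposes, for example `FadeMemWrapper.maintenance_cycle`. This version exempts pinned entries from deletion, which is the part of decay that actually loses information.

```python
from typing import Callable


def is_forgettable(entry: dict, forget_threshold: float = 0.15) -> bool:
    """Pinned entries (user's name, core preferences) are never deleted by a sweep."""
    return not entry.get("pinned", False) and entry["retention_score"] < forget_threshold


class SweepScheduler:
    """Counts agent turns and runs maintenance every `every_n_turns` turns."""

    def __init__(self, maintenance_cycle: Callable[[], None], every_n_turns: int = 5):
        self.maintenance_cycle = maintenance_cycle
        self.every_n_turns = every_n_turns
        self.turns = 0

    def on_turn(self) -> bool:
        """Call once per agent turn; returns True when a sweep ran."""
        self.turns += 1
        if self.turns % self.every_n_turns == 0:
            self.maintenance_cycle()
            return True
        return False
```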
Limitations
- Not suited for perfect-recall requirements. If the application must never lose any information (legal, medical records), do not use active forgetting. Use archival storage with a FadeMem-style retrieval ranking instead, without actual deletion.
- LLM cost overhead. Conflict resolution and fusion require LLM calls. For agents with thousands of memories, batch processing and rate-limiting are necessary to keep costs manageable.
- Cold start problem. A new agent has no access history, so decay modulators default to base rates. Early memories may be forgotten too quickly before the system has enough signal. Mitigate by setting a higher initial `base_relevance` for the first N entries.
- Decay parameters are domain-specific. The thresholds that work for a customer support bot won't work for a research assistant. Expect to tune `lambda`, `forget_threshold`, and `consolidate_threshold` per use case.
- Does not replace vector database indexing. FadeMem is a retention-policy layer, not a replacement for efficient similarity search. It sits on top of your existing embedding store or vector DB.
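For perfect-recall applications, the forget step can move entries to an archive instead of deleting them. Retrieval can still rank the archive with the same hybrid score, so stale entries surface only when nothing fresher matches. The sketch assumes dict entries with a precomputed `retention_score` and a caller-supplied similarity function. The `archived` layer value is hypothetical and not part of the Step 1 schema.

```python
from typing import Callable


def archive_instead_of_delete(active: list[dict], archive: list[dict],
                              forget_threshold: float = 0.15) -> list[dict]:
    """Move below-threshold entries into `archive`; return the still-active entries."""
    still_active = []
    for m in active:
        if m["retention_score"] < forget_threshold:
            m["layer"] = "archived"  # hypothetical third layer value
            archive.append(m)
        else:
            still_active.append(m)
    return still_active


def rank_all(query: list[float], active: list[dict], archive: list[dict],
             similarity: Callable[[list[float], list[float]], float],
             top_k: int = 10) -> list[dict]:
    """Rank active and archived entries together with the 0.6/0.4 hybrid score."""
    pool = active + archive
    pool.sort(key=lambda m: 0.6 * similarity(query, m["embedding"]) + 0.4 * m["retention_score"],
              reverse=True)
    return pool[:top_k]
```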
Reference
Paper: FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory -- Wei et al., 2026. Focus on Section 3 (the adaptive decay formulation and dual-layer architecture) and Section 5 (ablation studies showing the contribution of each decay modulator).