name: simplemem description: > Build persistent LLM agent memory with SimpleMem - semantic lossless compression with 30x token reduction. MCP server (cloud/self-hosted), Python API, cross-session memory. Use when: adding long-term memory to agents, cross-session context recall, semantic dialogue search, memory consolidation.
SimpleMem
Efficient lifelong memory for LLM agents via semantic lossless compression.
SimpleMem converts unstructured dialogue into compact, atomic memory units with coreference resolution and temporal anchoring. It achieves 43.24% F1 on the LoCoMo benchmark with 30x fewer tokens than full-context methods.
Integration Paths
Choose the path that fits your use case:
| Path | Best For | Setup |
|---|---|---|
| MCP Server (cloud) | Quickest start, multi-platform | Config only |
| MCP Server (self-hosted) | Privacy, custom deployment | Docker or Python |
Python API (pip install simplemem) |
Programmatic integration | pip/uv |
| SimpleMem-Cross | Cross-session agent memory | Python + orchestrator |
MCP Server Setup (Recommended)
Cloud Service
The fastest path. Uses the hosted service at mcp.simplemem.cloud.
- Visit
https://mcp.simplemem.cloud - Enter your OpenRouter API key to get an auth token
- Add to your MCP client config:
Claude Desktop / Claude Code (~/.claude.json):
{
"mcpServers": {
"simplemem": {
"url": "https://mcp.simplemem.cloud/mcp",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}
Cursor (.cursor/mcp.json):
{
"mcpServers": {
"simplemem": {
"url": "https://mcp.simplemem.cloud/mcp",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}
Self-Hosted (Docker)
For privacy-sensitive deployments or custom infrastructure:
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem/MCP
# Configure environment
export JWT_SECRET_KEY="your-secure-random-secret"
export ENCRYPTION_KEY="your-32-byte-encryption-key!!"
# Run with Docker
docker compose up -d
The server exposes:
- Web UI:
http://localhost:8000/ - REST API:
http://localhost:8000/api/ - MCP endpoint:
http://localhost:8000/mcp
See references/mcp-setup.md for full self-hosting guide.
MCP Tools Available
| Tool | Purpose |
|---|---|
memory_add |
Store a dialogue (auto-compresses to atomic facts) |
memory_add_batch |
Batch store multiple dialogues |
memory_query |
Semantic Q&A over stored memories |
memory_retrieve |
Raw memory entry retrieval |
memory_delete |
Delete entries by entry_id or ref_id |
memory_stats |
Storage statistics |
memory_clear |
Delete all memories (irreversible) |
Python API
Install via pip or uv:
pip install simplemem
# or
uv add simplemem
Basic Usage
from simplemem import SimpleMemSystem
# Initialize (uses config.py for API settings)
system = SimpleMemSystem(clear_db=True)
# Stage 1: Add dialogues (semantic structured compression)
system.add_dialogue("Alice", "Let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "I'll bring the market analysis report", "2025-11-15T14:31:00")
# Finalize atomic encoding
system.finalize()
# Stage 3: Query with intent-aware retrieval
answer = system.ask("When and where will Alice and Bob meet?")
# "16 November 2025 at 2:00 PM at Starbucks"
Parallel Processing
For large dialogue datasets:
system = SimpleMemSystem(
clear_db=True,
enable_parallel_processing=True,
max_parallel_workers=8,
enable_parallel_retrieval=True,
max_retrieval_workers=4
)
Memory Deletion
Delete entries by ID:
# Delete by entry_id
system.vector_store.delete_by_id(entry_id="uuid-here")
# Delete by ref_id (application-level reference)
system.vector_store.delete_by_ref_id(ref_id="my-ref")
Configuration
SimpleMem supports any OpenAI-compatible API provider:
# config.py
OPENAI_API_KEY = "your-api-key" # OpenAI, OpenRouter, or compatible
OPENAI_BASE_URL = None # Set for non-OpenAI providers
# Models
LLM_MODEL = "gpt-4.1-mini" # Or any OpenAI-compatible model
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B" # State-of-the-art retrieval
# Via OpenRouter (multi-provider gateway)
OPENROUTER_API_KEY = "sk-or-..."
LLM_MODEL = "openai/gpt-4.1-mini" # OpenRouter model path
EMBEDDING_MODEL = "qwen/qwen3-embedding-8b"
EMBEDDING_DIMENSION = 4096
Provider options:
- OpenAI direct - Set
OPENAI_API_KEY - OpenRouter - Set
OPENROUTER_API_KEY, prefix model names with provider - LiteLLM - Multi-provider abstraction, configure via
LITELLM_*settings - Azure OpenAI - Set
OPENAI_BASE_URLto your Azure endpoint - Qwen / local models - Set
OPENAI_BASE_URLto your endpoint
See references/api-reference.md for full configuration.
Cross-Session Memory (SimpleMem-Cross)
SimpleMem-Cross enables persistent memory across conversations. Agents recall context, decisions, and learnings from previous sessions automatically.
Performance: 64% improvement over Claude-Mem on LoCoMo benchmark (score 48 vs 29.3).
Quick Example
from cross.orchestrator import create_orchestrator
async def main():
orch = create_orchestrator(project="my-project")
# Start session - previous context injected automatically
result = await orch.start_session(
content_session_id="session-001",
user_prompt="Continue building the REST API",
)
print(result["context"]) # Relevant context from past sessions
# Record events during the session
await orch.record_message(result["memory_session_id"], "User asked about JWT")
await orch.record_tool_use(
result["memory_session_id"],
tool_name="read_file",
tool_input="auth/jwt.py",
tool_output="class JWTHandler: ...",
)
# Finalize - extracts observations, generates summary, stores memories
report = await orch.stop_session(result["memory_session_id"])
print(f"Stored {report.entries_stored} memory entries")
await orch.end_session(result["memory_session_id"])
orch.close()
Key Features
| Feature | Description |
|---|---|
| Session lifecycle | start -> record -> stop -> end with full event tracking |
| Automatic context injection | Token-budgeted context from previous sessions at start |
| Event collection | Messages, tool uses, file changes with 3-tier redaction |
| Observation extraction | Heuristic extraction of decisions, discoveries, learnings |
| Provenance tracking | Every memory links back to source evidence |
| Consolidation | Automatic decay, merge, and prune of old memories |
See references/cross-session.md for full API and architecture.
Three-Stage Pipeline
SimpleMem's architecture is based on semantic lossless compression:
Stage 1: Semantic Structured Compression
Raw dialogue is converted to atomic, self-contained memory units:
- Coreference resolution: "He" becomes "Bob"
- Temporal anchoring: "tomorrow" becomes "2025-11-16T14:00:00"
- Density gating: Filters low-information content
Input: "He'll meet Bob tomorrow at 2pm"
Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00"
Stage 2: Online Semantic Synthesis
Related fragments are consolidated during writes:
- Eliminates redundant information
- Merges related facts into unified representations
- Maintains compact memory topology
Stage 3: Intent-Aware Retrieval Planning
Queries trigger parallel multi-view retrieval:
- Semantic layer: Dense vector similarity (1024-d embeddings)
- Lexical layer: BM25 keyword matching
- Symbolic layer: Metadata filtering (persons, locations, timestamps, entities)
- Dynamic retrieval depth based on query complexity
See references/architecture.md for detailed architecture.
Data Model
Each memory entry contains:
| Field | Type | Description |
|---|---|---|
entry_id |
UUID | Unique identifier |
ref_id |
String | Application-level reference (for deletion) |
lossless_restatement |
String | Self-contained atomic fact |
keywords |
List[str] | Core keywords for BM25 search |
timestamp |
ISO 8601 | When the event occurred |
location |
String | Where (natural language) |
persons |
List[str] | People mentioned |
entities |
List[str] | Companies, products, organizations |
topic |
String | Topic phrase |
agents |
List[str] | Agent sources |
source |
String | Origin identifier |
Troubleshooting
Embedding dimension mismatch:
RuntimeError: lance error: Invalid: ListType can only be casted to FixedSizeListType
The EMBEDDING_DIMENSION in config doesn't match the model's output. Update the value and clear the database: rm -rf data/lancedb/*
API key not detected:
- Verify key is set in
config.py(notconfig.py.example) - For OpenRouter keys, ensure they start with
sk-or- - Restart Python after updating the key
Model not found:
- Use full provider path for OpenRouter:
openai/gpt-4.1-mininotgpt-4.1-mini - Check available models at
openrouter.ai/models