memory-system

star 57

Operate and extend the joelclaw agent memory system — observation pipeline, write gates, vector store, retrieval, reflection, and nightly maintenance. Use when working on memory functions, debugging recall, tuning observation quality, or evolving the memory architecture.

joelhooks By joelhooks schedule Updated 6/6/2026

name: memory-system displayName: Memory System description: "Operate and extend the joelclaw agent memory system — observation pipeline, write gates, vector store, retrieval, reflection, and nightly maintenance. Use when working on memory functions, debugging recall, tuning observation quality, or evolving the memory architecture." version: 0.1.0 author: joel tags: - memory - system-bus - typesense - recall - reflection

Memory System

Operational reference for the joelclaw memory pipeline (session signal → durable recall → curated MEMORY.md).

When to use

Use this skill when working on memory capture, write gates, Typesense-backed recall, reflection/promote flows, or nightly maintenance.

Canonical flow

sessions → observe → write-gate → store → decay/rank → retrieve → inject
    ↑                                                              ↓
    └──── nightly maintenance (dedup + stale pruning + stats) ─────┘
                       ↓
               observe → reflect → propose → triage → MEMORY.md

Guiding filter: “will this fact still be true and useful next month?”

1) Write gate states (allow / hold / discard)

State Persist Default retrieval Rules
allow yes yes Durable, reusable facts (constraints, architecture truths, operational fixes, explicit user rules).
hold yes no (unless explicitly requested) Ambiguous/contextual signal worth keeping but not auto-injecting.
discard no no Noise, tool traces, instruction artifacts, ephemeral chatter.

Rules to enforce:

  • Very short/low-information observations (e.g. <12 chars) → discard.
  • Instruction-edit artifacts/raw tool traces (<toolCall>, command dumps, “replace X with Y”) → discard.
  • Facts with ADR IDs, concrete file paths, or explicit runnable commands bias toward allow.
  • If gate annotation parsing fails, mark write_gate_fallback=true and track drift (high fallback rate is a health smell).

2) Vector store schema (Typesense only)

Backend: memory_observations in Typesense. Qdrant is retired (per slog, 2026-02-28).

Minimum operational fields:

  • Identity/source: id, session_id, source, timestamp, updated_at
  • Content: observation, observation_type, embedding
  • Write gate: write_verdict, write_confidence, write_reason, write_gate_version, write_gate_fallback
  • Taxonomy: category_id, category_confidence, category_source, taxonomy_version
  • Ranking/lifecycle: merged_count, recall_count, retrieval_priority, stale, stale_tagged_at, last_used_at, superseded_by, supersedes

3) Category taxonomy (7 categories)

Use taxonomy v1 IDs:

  1. jc:preferences
  2. jc:rules-conventions
  3. jc:system-architecture
  4. jc:operations
  5. jc:memory-system
  6. jc:projects
  7. jc:people-relationships

4) Retrieval pipeline

  1. Query rewrite (fast model, hard timeout, fail-open to original query).
  2. Hybrid search (keyword + vector over memory_observations).
  3. Time decay ranking: final_score = raw_score × exp(-0.01 × days_since_created)
    • stale memories get extra downweight.
  4. Cap results before injection (protect context budget).
  5. Budget profile:
    • lean: 2–3 hits, no rewrite, low-latency checks
    • balanced: 5–7 hits, default interactive mode
    • deep: 10–15 hits, complex debugging/research
    • auto: choose profile from query complexity/context

5) Reflection cycle

observe → reflect → propose → triage (3 tiers) → promote to MEMORY.md

Triage tiers:

  • Tier 1 auto-action: auto-promote / auto-reject / auto-merge using deterministic rules.
  • Tier 2 LLM batch review: batch adjudication for undecided proposals.
  • Tier 3 human review: only ambiguous/risky proposals; then promote/edit/reject.

Goal: keep MEMORY.md small, durable, and high-signal.

6) Nightly maintenance

Run idempotent maintenance to keep recall quality high:

  1. Dedup sweep (semantic similarity merge; maintain supersession chain).
  2. Stale pruning (mark old never-recalled observations stale; prune very old stale records conservatively).
  3. Stats emission (observation count, merges, stale volume, category distribution).

7) ADR map (read before changing architecture)

  • ADR-0021 — memory system foundation.
  • ADR-0068 — auto-triage pipeline.
  • ADR-0077 — next phase (reflection + maintenance).
  • ADR-0082 — Typesense as memory backend (Qdrant replaced).
  • ADR-0094 → ADR-0100 — proposed evolution (write gate governance, taxonomy/budgets, forward triggers, graph/dual-search roadmap).

8) Writing observations (mandatory at session end)

Every session that produces a durable pattern, operational fix, or architectural insight MUST write observations before closing.

joelclaw send "memory/observation.submitted" -d '{
  "observation": "<what was learned — concrete, reusable, future-tense useful>",
  "category": "jc:operations",
  "source": "pi-session",
  "tags": ["stripe", "payout"]
}'

Use one send call per distinct observation. Batch is fine — fire them in a loop.

Category cheatsheet

Category Use for
jc:operations How things work, API quirks, CLI patterns, operational fixes
jc:rules-conventions Conventions, SOPs, team/project rules
jc:system-architecture Topology, wiring, how components connect
jc:projects Per-project facts, payout rates, product catalogs
jc:preferences Joel's explicit preferences
jc:people-relationships People, contacts, roles
jc:memory-system Memory system itself

What makes a good observation

  • Concrete: "Stripe Report Run requires explicit payment_metadata[product] column in columns param or it returns blank" — not "Stripe has metadata"
  • Reusable: will this still be true next month?
  • Actionable: an agent reading this cold should know what to do differently
  • Not a transcript: no "the user asked me to", no raw tool output, no "I discovered that"

What to skip

  • Instruction artifacts, tool traces, ephemeral command outputs
  • Facts already in skills or ADRs (skills are the durable home; observations are for recall/search)
  • Anything under 12 chars — the write gate discards it

9) Operations commands (slog + recall)

# Record memory-system changes
slog write --action configure --tool memory-system --detail "<what changed>" --reason "<why>"

# Inspect recent operational history
slog tail --count 50

# Default recall
joelclaw recall "<query>" --budget balanced --limit 7

# Deep recall for hard debugging
joelclaw recall "<query>" --budget deep --limit 10

# Inspect memory-system category specifically
joelclaw recall "<query>" --category jc:memory-system --limit 10

# Include held memories when needed
joelclaw recall "<query>" --include-hold --raw

9a) joelclaw inngest memory-health interpretation

memory-health uses recent/current-window observations when any were written in the lookback. Do not fail the system on a tiny current-window category-confidence sample: fewer than 25 known-confidence observations is reported as evidence but treated as too small to gate. Stale RUNNING memory runs that started before the current worker are stale SDK ghosts, not live backlog.

9b) Runs archive derived-index recovery (ADR-0243)

Raw Pi/Claude/Codex Run capture and searchable Typesense indexes are separate layers:

  • source of truth: ~/.joelclaw/runs-dev/<user>/<yyyy-mm>/<run-id>.jsonl plus .metadata.json
  • derived indexes: runs_dev, run_chunks_dev
  • Inngest function: memory/run.captured

When Typesense is stale but raw blobs still arrive:

  1. Verify raw capture and latest Typesense timestamp separately.
  2. Check Inngest queue health. joelclaw runs --count 5 --hours 1 --compact should return quickly.
  3. If Inngest GQL hangs and logs show stale queue leases, log first, restart inngest-0, wait Ready, then force worker sync:
kubectl -n joelclaw delete pod inngest-0 --grace-period=30
kubectl -n joelclaw wait --for=condition=Ready pod/inngest-0 --timeout=180s
curl -fsS -X PUT http://127.0.0.1:3111/api/inngest
  1. Backfill missing raw blobs into Typesense with the controlled script, not by flooding Inngest with thousands of replay events:
TYPESENSE_API_KEY=$(secrets lease typesense_api_key) \
  bun scripts/backfill-run-typesense.ts \
  --since <iso-or-ms> \
  --machine dark-wizard \
  --runtime pi \
  --limit 0 \
  --sleep-ms 250

9) Canonical code paths

  • Observe: packages/system-bus/src/inngest/functions/observe.ts
  • Write gate: packages/system-bus/src/memory/write-gate.ts
  • Taxonomy: packages/system-bus/src/memory/taxonomy-v1.ts
  • Recall adapter: packages/cli/src/capabilities/adapters/typesense-recall.ts
  • Reflect/propose: packages/system-bus/src/inngest/functions/reflect.ts, promote.ts
  • Nightly maintenance: packages/system-bus/src/inngest/functions/memory/nightly-maintenance.ts

10) Non-negotiables

  • Memory stores patterns, not transcript noise.
  • No silent failure paths: emit telemetry on every transition.
  • Keep retrieval bounded; never flood context windows.
  • Keep MEMORY.md curated; do not auto-append raw observations.
Install via CLI
npx skills add https://github.com/joelhooks/joelclaw --skill memory-system
Repository Details
star Stars 57
call_split Forks 3
navigation Branch main
article Path SKILL.md
More from Creator