cass


Mine past agent sessions for working prompts, decisions, and patterns. Use when "what did I ask?", "find that prompt", session archaeology, or agent history.

By boshu2 · Updated 6/7/2026

name: cass
description: Mine past agent sessions for working prompts, decisions, and patterns. Use when "what did I ask?", "find that prompt", session archaeology, or agent history.
practices:
  - pragmatic-programmer
skill_api_version: 1
user-invocable: false
hexagonal_role: supporting
metadata:
  tier: execution
  external_dependencies:
    - "cass binary (>=0.3.6 recommended; some commands require HEAD \u2014 see Version Pinning)"
    - jq (required for parsing --json output)
    - GNU coreutils 'timeout' (recommended; cass index can hang under contention)
    - ssh + rsync (optional; only for cross-machine cass sources workflows)
    - fastembed model bundle ~90MB (optional; only for --mode semantic / hybrid; install via cass models install)

cass Session Search


Core Insight: Your repeated prompts are your best prompts. If you typed it 10+ times, it works. Mine your history.

The Goldmine Principle

Your conversation history contains:

  • Refined prompts — Every rephrase that worked better was captured
  • Working rituals — Prompts repeated 10+ times ARE your methodology
  • Scope decisions — "When did we decide NOT to do X?"
  • Recovery moments — What you searched for after context loss = what mattered

The insight: Mining your past beats inventing new approaches.


THE EXACT PROMPT — Discovery Workflow

1. Bootstrap: Check health, refresh index, get project overview
   cass status --json && timeout 600 cass index --json
   cass search "*" --workspace /data/projects/PROJECT --aggregate agent,date --limit 1 --json

2. Find prompts: Search for keywords, filter to user prompts (lines 1-3)
   cass search "KEYWORD" --workspace /data/projects/PROJECT --json --fields minimal --limit 50 \
     | jq '[.hits[] | select(.line_number <= 3)]'

3. Follow hits: View the actual content
   cass view /path/from/source_path.jsonl -n LINE -C 20

4. Expand context: See the full conversation flow
   cass expand /path/from/source_path.jsonl --line LINE --context 3

5. Discover related: Find the whole work cluster
   cass context /path/from/source_path.jsonl --json

Why This Workflow Works

  • Aggregations first — Know the terrain before diving in
  • --fields minimal — 5x smaller output, preserves context window
  • line_number <= 3 — User prompts live at the top of sessions
  • Context clustering — Work happens in clusters; one good hit → many related sessions

Version Pinning Caveat

cass evolves quickly. The skill describes HEAD behavior (latest source in /dp/coding_agent_session_search). The released v0.3.6 binary lacks several features added since:

  • cass sources agents {list,exclude,include} — added 2026-04-20 (commit 82d8d70e)
  • Tail-end writer-race tolerance for full rebuilds — fixed 2026-04-22 (commit e06342f2, bead zz8ni)
  • Lexical generation manifests for federated installs (commits 2b7b86a1, 683ccd03, cf76fe15)
  • Rebuild producer stall telemetry (commit 73a86604)

When a flag/subcommand returns "unrecognized" or behaves differently than documented, run cass --version and check git log -- src/lib.rs for the relevant commit. Each affected section calls out which commit/version it depends on.

Probe what your installed binary actually supports:

cass capabilities --json | jq '{version: .crate_version, features, connectors}'
cass introspect --json   | jq '.commands[].name'
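To gate a feature on the installed version, a dotted-version comparison is enough. The sketch below is a minimal helper, not part of cass: the name version_ge is hypothetical, it relies on GNU sort -V, and it assumes crate_version is a plain dotted version such as 0.3.6.

```shell
# version_ge A B: succeed when dotted version A is >= dotted version B.
# Hypothetical helper; relies on `sort -V` (GNU coreutils version sort).
version_ge() {
  # The smaller of the two versions sorts first; A >= B iff that one is B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage sketch (not executed here). Assumes .crate_version is a plain
# dotted version string:
#   installed="$(cass capabilities --json | jq -r '.crate_version')"
#   version_ge "$installed" 0.3.6 || echo "cass is older than 0.3.6" >&2
```

Comparing against the capabilities output rather than parsing free-form version text keeps the check tied to a field this skill already probes.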

Two-Step Bootstrap (Replaces "ALWAYS first")

Three states matter — never conflate them.

| State | What it means | What to do |
|---|---|---|
| cass health exit 0 | Sub-50ms preflight passed | Search immediately |
| cass health exit 1 + index.stale=true | Index is usable but old | Search now, refresh in background with a wall-clock cap: ( timeout 600 cass index --json &>/tmp/cass-bg.log </dev/null & ) (NEVER bare &; cass index can hang) |
| cass status returns database.exists=false OR documents=0 | Truly broken/uninitialized | Run cass doctor --fix --json, then cass index --full --json |

The trap: Treating a stale index as broken triggers an unneeded full rebuild (8–25s cost) when an incremental refresh (1–3s) or even a stale-but-correct query would have worked.
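The three states can be written as a small pure decision function. This is a minimal sketch, not cass's own logic: the name bootstrap_action and its arguments are hypothetical, and the caller is assumed to have already extracted database.exists, the document count, and index.stale from cass status --json.

```shell
# bootstrap_action DB_EXISTS DOCUMENTS STALE
# Prints one of: repair | search_and_refresh | search
# Hypothetical helper; arguments are values already read from `cass status --json`.
bootstrap_action() {
  local db_exists="$1" documents="$2" stale="$3"
  # Broken/uninitialized: no database, or a missing/zero document count.
  if [ "$db_exists" != "true" ]; then
    echo repair
    return 0
  fi
  case "$documents" in
    ''|0|null) echo repair; return 0 ;;
  esac
  # Stale but usable: search now, refresh in the background (capped).
  if [ "$stale" = "true" ]; then
    echo search_and_refresh
  else
    echo search
  fi
}
```

Keeping "repair" as its own outcome is what prevents the trap: a stale index never reaches the full-rebuild branch.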

# Robust two-step bootstrap that never blocks the user.
# IMPORTANT: every cass index call gets a wall-clock cap. cass index has been
# observed to hang indefinitely under contention — without `timeout`, the
# bootstrap itself becomes the symptom.
cass status --json | jq '{healthy, fresh: .index.fresh, stale: .index.stale, db: .database.exists, sem: .semantic.available}'

# Refresh policy: stale → bg refresh (capped); never block search
if [ "$(cass status --json | jq -r '.index.stale')" = "true" ]; then
  ( timeout 600 cass index --json >"/tmp/cass-index.$$.log" 2>&1 </dev/null & ) 2>/dev/null
fi
# Search even with stale index — results are still useful
cass search "KEYWORD" --workspace /path --json --fields minimal --limit 10

For the production-quality version of this logic (cap-on-every-call, broken-state escalation, exit-code semantics for hooks), use scripts/recover.sh — it implements the full decision tree with timeouts and per-PID logs.

cass health returning exit 1 on stale is a deliberate preflight signal for cron/CI. In an interactive agent loop, prefer cass status --json and decide.


Stuck-Index & Recovery Decision Tree

Real-world bugs we've hit (all observed in mined sessions). Walk top-down — first match wins.

| Symptom | Likely cause | Autonomous fix (no user prompt needed) |
|---|---|---|
| cass index --json streams phase: indexing, current: 0, total: N for >2 min | OPEN issue #196 in v0.3.6: incremental hangs after prior --force-rebuild | Kill the run, then cass index --full --force-rebuild --json (25s typical) |
| cass status shows index.rebuilding=true and pid is stale (>1h) | Crashed indexer left lock | cass doctor --fix --json (removes stale .index.lock automatically) |
| cass search returns 0 hits but the file plainly contains the term | Term lives in tool stdout/stderr (skipped at index time) | Fall back to rg -n "TERM" /path/to/session.jsonl |
| cass search --workspace /X returns 0; same query without --workspace works | Workspace string mismatch | cass search "KEYWORD" --aggregate workspace --limit 1 --json to discover the canonical path, then re-run |
| vtable constructor failed: fts_messages (older bug, fixed in 0.3.0+) | DB↔FTS schema drift after upgrade | cass doctor --fix --json rebuilds the Tantivy side from SQLite |
| --limit 0 panic | Earlier cass versions panicked on limit=0 | Always pass --limit 1 (or --limit 5) for aggregations |
| Massive core.NNNNN files in cass project dir | Past indexer crash recorded a coredump | They're SAFE to leave; they don't affect search. Only delete with explicit user permission. |
| cass models install fails with WSAENOTCONN on Windows (closed #193) | Network blip during huggingface download | Retry once; if it persists, use --mirror <URL> to point at a different HF mirror, or --from-file <DIR> if you have the model cached locally. Then cass models verify. |
| cass index says "Index rebuild is already in progress" but nothing visible | Concurrent agent triggered a rebuild | Don't fight it: `cass status --json |
| cass index --full --force-rebuild --json runs for 200s+ then exits with "index failed: updating last_indexed_at after index run ... database is busy" | Concurrent-rebuild race in persist_final_index_run_metadata. Fixed at HEAD (commit e06342f2, bead zz8ni); affects v0.3.6 and earlier. | Verify the index data is good: `cass search "X" --limit 1 --json |
| cass status shows unhealthy indefinitely after a successful-looking rebuild | Same as above: timestamp marker never landed | Same fix: confirm searches return hits, then a single quiet cass index --json writes the marker. Upgrade past v0.3.6 for the proper fix |

Never paper over a failure by skipping cass and falling back to manual find/grep. The one sanctioned fallback is rg for terms that live in tool output, which cass does not index. The autonomous fixes above resolve every real failure without user intervention. Asking the user to "rebuild your index" is the worst outcome: they have agents waiting.
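Several fixes in this section depend on telling a hang apart from an ordinary failure. A minimal sketch of a capped runner follows. It assumes GNU coreutils timeout, which exits with status 124 when the cap is hit. The name run_capped is hypothetical and not part of cass or scripts/recover.sh.

```shell
# run_capped SECONDS CMD [ARGS...]
# Runs CMD under a wall-clock cap and passes its exit status through.
# Exit status 124 means the cap was hit (GNU coreutils `timeout` convention).
run_capped() {
  local cap="$1" rc=0
  shift
  timeout "$cap" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "run_capped: timed out after ${cap}s: $*" >&2
  fi
  return "$rc"
}

# Escalation sketch for the issue #196 hang (not executed here):
#   run_capped 120 cass index --json; rc=$?
#   [ "$rc" -eq 124 ] && run_capped 600 cass index --full --force-rebuild --json
```

Escalating only on status 124 keeps ordinary index errors from triggering a forced rebuild.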


Quick Reference

# Health first; refresh with a wall-clock cap (see Two-Step Bootstrap)
cass status --json && timeout 600 cass index --json

# Project overview: who did what, when?
cass search "*" --workspace /path --aggregate agent,date --limit 1 --json

# Find keyword, minimal output
cass search "KEYWORD" --workspace /path --json --fields minimal --limit 50

# Follow a hit
cass view /path.jsonl -n LINE -C 20        # Line-oriented
cass expand /path.jsonl --line LINE --context 3  # Message-oriented

# Find related sessions
cass context /path.jsonl --json

# Export for parsing
cass export /path.jsonl --format json --include-tools -o /tmp/out.json

# Inspect or change persistent agent-harness exclusions
cass sources agents list --json
cass sources agents exclude openclaw
cass sources agents exclude openclaw --keep-indexed-data
cass sources agents include openclaw

When to Use What

| You Want | Use | Why |
|---|---|---|
| Project overview | --aggregate agent,date --limit 1 | Counts only, no content |
| Find prompts | --fields minimal + jq select(.line_number <= 3) | User prompts are lines 1-3 |
| Ritual detection | Count matches: >10 = ritual | Repeated = working |
| Full conversation | cass expand --context 3 | Message boundaries preserved |
| Raw JSON parsing | cass export --include-tools -o file.json | Never pipe exports |
| Content not found | rg "string" /path.jsonl | cass skips tool outputs |
| Noisy harness flooding index | cass sources agents exclude <agent> | Persistently disable future indexing |

Critical Rules

| Rule | Why | Consequence |
|---|---|---|
| --limit 1 minimum | --limit 0 panics | Use 1 for aggregations |
| --fields minimal | Token efficiency | 5x smaller output |
| Export to file | Piping causes broken pipe panic | -o /tmp/out.json always |
| Exact workspace paths | Case-sensitive matching | Use --aggregate workspace to discover |
| --include-tools | Tool calls hidden by default | Required for full export |

Agent Harness Exclusion

When a user tells you one agent harness is producing garbage, loops, or too much disk usage, handle that directly in cass instead of telling them it cannot be excluded.

# See current state
cass sources agents list --json

# Persistently stop indexing this harness in future scans/syncs/watch mode
cass sources agents exclude openclaw

# Keep already indexed data but block future indexing
cass sources agents exclude openclaw --keep-indexed-data

# Re-enable later
cass sources agents include openclaw

What exclude actually does

  • Writes the preference to sources.toml, so the setting survives future runs
  • Prevents future indexing even if the source files still exist on disk
  • By default, purges already archived local data for that harness and rebuilds lexical search so the exclusion also reclaims space

When to use it

  • A harness is spamming looped or low-value output
  • A user wants cass to remember "ignore this source going forward"
  • You need a reversible, agent-friendly way to reduce archive bloat without manually deleting source files

Search Modes

| Mode | When | Example |
|---|---|---|
| lexical (default) | Exact strings, filenames | "AGENTS.md", "--workspace" |
| semantic | Conceptual, unknown wording | "scope reduction discussions" |
| hybrid | Broad exploration | "architecture decisions" |

Default to lexical. Only use semantic when you don't know exact wording.

Enabling Semantic / Hybrid (one-time)

cass models status --json    # state: not_installed | installed | partial
cass models install          # downloads ~90MB MiniLM bundle from HuggingFace
cass index --semantic --build-hnsw --json   # builds vector + HNSW
cass search "QUERY" --mode hybrid --json    # then queries fall back to lexical if semantic missing

If the model is not_installed, --mode hybrid and --mode semantic silently fall back to lexical: no panic and no error, but the results are lexical-only. See SEMANTIC_AND_HYBRID.md.


Cross-Machine Search (Multi-Workstation Corpus)

When the user has agents running on css, csd, ts1, ts2, etc., the cass corpus on each machine is disjoint. Three ways to reach across:

# Option A: One-shot remote query (no setup, slow per call)
ssh css 'cass search "KEYWORD" --json --fields minimal --limit 20' | jq '.hits'

# Option B: Configured sources (preferred — caches the remote sessions locally)
cass sources setup                                      # interactive wizard, auto-discovers from ~/.ssh/config
cass sources add ssh://user@css --name css --preset linux-defaults
cass sources sync --source css --json                   # rsyncs new sessions, then re-indexes
cass search "KEYWORD" --json                            # results now span all configured sources
cass sources list --json                                # see what's wired up

# Option C: Parallel fan-out (when speed matters more than dedup)
for h in css csd ts1 ts2; do
  ssh "$h" 'cass search "KEYWORD" --json --fields minimal --limit 10' > "/tmp/cass-$h.json" &
done
wait
jq -s '[.[] | .hits[]] | unique_by(.source_path + ":" + (.line_number|tostring))' /tmp/cass-*.json

cass sources doctor diagnoses connectivity. Configured-source results carry origin_host in their hit metadata — preserve it when reporting back to the user. Full reference: REMOTE_SOURCES.md.


Anti-Patterns (Don't Do These)

| Anti-pattern | Why it's wrong | Do instead |
|---|---|---|
| Asking the user "should I rebuild the index?" | They have agents waiting; rebuild is safe and idempotent | Just run cass doctor --fix --json (preserves source data) |
| Running cass index --full whenever status says unhealthy | A 25s rebuild for a 30-min stale index is wasteful | Check index.stale separately from database.exists; prefer incremental |
| Running bare cass to "see what's there" | Launches blocking TUI in the agent's session | Always --json or --robot; never bare |
| Piping cass export into head/jq | Broken-pipe panic on large sessions | cass export ... -o /tmp/x.json first, then operate on the file |
| Treating subagent files as the same as parent sessions | Subagents are separate conversation logs with their own line-2 prompt | Filter by select(.source_path \| contains("subagent")) |
| Using --limit 0 for "no limit" | Earlier cass panics; modern cass caps to RAM ceiling but rarely what you want | Use a real limit (--limit 50) or pagination via --cursor |
| Searching with --workspace /X and trusting 0 hits | Workspace strings are case-sensitive and trailing-slash-sensitive | When 0 hits but you expected some, re-run with --aggregate workspace --limit 1 to discover the canonical key |
| Skipping --fields minimal on wide scans | Default full returns ~3KB per hit × 100 hits = 300KB context burn | Always pass --fields minimal for wide passes; upgrade to summary/full for the few you keep |
| Reading session file with cat to extract a prompt | Loads the full conversation into context | cass view PATH -n LINE -C 5 (window) or cass expand PATH --line LINE --context 3 (message-aware) |
| Re-indexing on every cass search | Wasteful; index is shared across processes | Only refresh when cass status says stale or recommended_action says so |
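Because workspace strings are case-sensitive and trailing-slash-sensitive, a loose match against the known keys can recover the canonical spelling. This is a minimal sketch under stated assumptions: the name canonical_workspace is hypothetical, and the keys are assumed to arrive one per line on stdin, for example from the --aggregate workspace output.

```shell
# canonical_workspace WANTED
# Reads known workspace keys on stdin (one per line) and prints the first key
# that equals WANTED when case and trailing slashes are ignored.
# Returns 1 when nothing matches. Hypothetical helper, not part of cass.
canonical_workspace() {
  local wanted key norm
  wanted="$(printf '%s' "$1" | sed 's:/*$::' | tr '[:upper:]' '[:lower:]')"
  while IFS= read -r key; do
    norm="$(printf '%s' "$key" | sed 's:/*$::' | tr '[:upper:]' '[:lower:]')"
    if [ "$norm" = "$wanted" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  done
  return 1
}

# Usage sketch (not executed here). The jq path and the "key" field name are
# assumptions about the aggregation output:
#   cass search "KEYWORD" --aggregate workspace --limit 1 --json \
#     | jq -r '.aggregations.workspace.buckets[].key' \
#     | canonical_workspace /data/projects/project/
```

The function prints the key exactly as stored, so the follow-up --workspace query uses the spelling the index expects.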

Resume a Past Session in Its Native Harness

cass resume resolves a session path into the exact command its native CLI uses to continue the conversation — Claude Code, Codex, Gemini, OpenCode, pi_agent.

# Find a relevant past session
cass search "KEYWORD" --json --fields minimal --limit 5 \
  | jq -r '.hits[0].source_path' > /tmp/sess.path

# Print the resume command without executing
cass resume "$(cat /tmp/sess.path)" --shell

# Or replace the current process with the resumed agent
cass resume "$(cat /tmp/sess.path)" --exec

Pitfall: Subagent files (subagents/agent-*.jsonl) are not resumable by design — they're orchestrated by a parent. You'll get session_id_not_found with a hint to pass --agent claude. Resolve to the parent session via cass context <path> --json first. See RESUME.md.
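A path check before calling cass resume avoids the session_id_not_found round trip. This is a minimal sketch that assumes the subagents/agent-*.jsonl naming described above. The function name is_subagent_path is hypothetical.

```shell
# is_subagent_path PATH: succeed when PATH looks like a subagent session file
# (a file named agent-*.jsonl directly inside a subagents/ directory).
# Hypothetical helper; the naming pattern is taken from the pitfall above.
is_subagent_path() {
  case "$1" in
    */subagents/agent-*.jsonl|subagents/agent-*.jsonl) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (not executed here):
#   if is_subagent_path "$path"; then
#     cass context "$path" --json   # resolve the parent session first
#   else
#     cass resume "$path" --shell
#   fi
```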


The Heuristics

| Signal | Meaning | Action |
|---|---|---|
| line_number 1-3 | User prompts | Filter: select(.line_number <= 3) |
| /subagents/ line 2 | THE extraction prompt | Copy-paste ready |
| total_matches > 10 | Ritual pattern | Document it, reuse it |
| 0 results + content exists | Workspace path mismatch | Use --aggregate workspace |

jq Essentials

# User prompts only
| jq '[.hits[] | select(.line_number <= 3)]'

# Source paths for follow-up
| jq '.hits[].source_path' -r

# Aggregation buckets
| jq '.aggregations.agent.buckets'

# Count matches
| jq '.total_matches'

# Find repeated prompts (ritual detection)
| jq '[.hits[] | select(.line_number <= 3) | .title[0:80]] | group_by(.) | map({prompt: .[0], count: length}) | sort_by(-.count) | .[0:20]'
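The same group-and-count idea works on a plain list of prompt titles, which is handy once the hits are saved to a file. This is a minimal sketch using only coreutils and awk. The name rituals is hypothetical, and the default threshold of 10 follows the "10+ times" rule of thumb.

```shell
# rituals [MIN_COUNT]
# Reads one prompt per line on stdin, truncates each to 80 characters, and
# prints "COUNT<TAB>PROMPT" for prompts seen at least MIN_COUNT times
# (default 10), most frequent first. Hypothetical helper, not part of cass.
rituals() {
  local min="${1:-10}"
  cut -c1-80 | sort | uniq -c | sort -rn |
    awk -v min="$min" '$1 >= min {
      n = $1
      sub(/^ *[0-9]+ /, "")   # strip the count column that uniq -c prepends
      printf "%d\t%s\n", n, $0
    }'
}

# Usage sketch (not executed here):
#   jq -r '.hits[] | select(.line_number <= 3) | .title' /tmp/hits.json | rituals
```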

Hidden Power: Capabilities the Old Skill Missed

| Command | What it gives you | When |
|---|---|---|
| cass health | <50ms exit-code-only preflight | Cron / hook gating |
| cass index --watch --json | Filesystem-watcher keeps index live; one cycle = --watch-once /path | Long-running orchestrator hosts |
| cass index --idempotency-key K --json | Cached identical-key responses for 24h | Retried CI runs |
| cass index --semantic --build-hnsw | O(log n) approximate vector search | After cass models install |
| cass doctor --fix --json | Auto-rebuilds index from DB; backs up corrupt DB to .corrupt.<ts> | Any time status.healthy=false |
| cass resume PATH --shell | Cross-harness resume command emitter | Continuing a past Codex/Claude/Gemini session |
| cass sources setup | Interactive ssh-config-aware multi-machine wizard | First time wiring a fleet |
| cass sources sync --source NAME --json | rsync remote sessions, then re-index | Periodic fleet refresh |
| cass sources doctor --json | Connectivity + path probe | Before relying on cross-machine results |
| cass sources mappings ... | Rewrite source paths to local equivalents | After moving a workspace |
| cass sources agents {list,exclude,include} | Persistent harness exclusion (writes disabled_agents in ~/.config/cass/sources.toml) | When openclaw / a noisy connector floods the index |
| cass models install / status / verify / remove | Manage the MiniLM bundle (~90MB) | Enabling semantic search |
| cass analytics tokens \| tools \| models | Per-day/per-tool/per-model usage stats from indexed sessions | Cost reports, regression checks |
| cass analytics rebuild --json | Backfill rollup tables when coverage_pct is low | After bulk import or sources sync |
| cass analytics validate --json | Detect drift between raw rows and rollups | Sanity check before reporting numbers |
| cass import chatgpt PATH | Bring conversations.json exports from ChatGPT web into the corpus | Migrating off ChatGPT.com |
| cass export-html PATH --password ... | Encrypted, self-contained HTML conversation viewer | Sharing one session with a teammate |
| cass pages encrypt ARCHIVE --with-recovery | Encrypted searchable archive for static hosting | Publishing a redacted corpus |
| cass introspect --json | Full schema dump of every command + response | Programmatic discovery |
| cass capabilities --json | Feature/connector/limits for self-configuration | Version-aware fallbacks |
| cass robot-docs guide \| commands \| examples \| schemas \| contracts | Built-in machine-targeted docs | Inside an agent loop, when the skill isn't enough |
| cass --robot-format jsonl \| compact \| toon \| sessions | Streaming / token-optimized / shell-pipeable output | Large result sets, chained searches |
| cass --trace-file FILE | Span timing JSONL for debugging slow runs | Triage perf issues |

If a flag listed here errors as "unrecognized" the installed cass binary is older than the source. Check cass --version against git log -- src/lib.rs upstream — most missing commands were added in v0.3.x.


Token & Cost Analytics (Bonus Use Case)

# Daily token usage across your whole agent fleet
cass analytics tokens --days 30 --group-by day --json | jq '.data.buckets[].api_tokens'

# Per-agent tool-use intensity (rows are keyed by agent slug, not tool name)
cass analytics tools --days 30 --json | jq '.data.rows[0:10]'

# Top models by API tokens
cass analytics models --json | jq '.data.by_api_tokens.rows[0:10]'

# After import / sync, refresh rollups
cass analytics rebuild --json
cass analytics validate --json   # confirm no drift between raw and rollup

cass analytics status --json shows coverage_pct: anything below ~90% means you should analytics rebuild to backfill. See ANALYTICS.md.
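The ~90% rule of thumb can be turned into a small gate before reporting numbers. This is a minimal sketch. The name needs_rollup_rebuild is hypothetical, and the JSON path used to read coverage_pct in the usage comment is an assumption.

```shell
# needs_rollup_rebuild COVERAGE_PCT [THRESHOLD]
# Succeeds (exit 0) when coverage is below the threshold (default 90),
# meaning a rollup rebuild is advisable. awk handles fractional values.
# Hypothetical helper, not part of cass.
needs_rollup_rebuild() {
  local pct="$1" threshold="${2:-90}"
  awk -v p="$pct" -v t="$threshold" 'BEGIN { exit !(p + 0 < t + 0) }'
}

# Usage sketch (not executed here). The jq path to coverage_pct is an assumption:
#   pct="$(cass analytics status --json | jq -r '.coverage_pct')"
#   needs_rollup_rebuild "$pct" && cass analytics rebuild --json
```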


Recovery Cheat Sheet (No-Permission Moves)

You are pre-authorized to take any of these actions to keep the corpus searchable. None destroy source session data; they only rebuild derived index data.

cass doctor --json --verbose         # Diagnose without changing anything
cass doctor --fix --json             # Apply safe rebuilds; backs up bad DB to .corrupt.<ts>
cass doctor --fix --force-rebuild --json   # Same, but rebuild even when healthy
cass index --full --force-rebuild --json   # Workaround for OPEN issue #196 (incremental hang)
cass sources doctor --json           # Probe remote sources
cass sources sync --source NAME --json     # Re-fetch and re-index a single source
cass models install                  # Restore missing semantic model
cass models verify                   # Validate model file checksums

What you must NOT do without explicit permission: delete core.NNNNN files, delete .beads/, git reset --hard, edit anything under the user's .config/cass/sources.toml by hand. The CLI commands above already do everything safely.

Full disaster recovery for encrypted Pages archives: RECOVERY.md.


Reference Index

| Need | Reference |
|---|---|
| Full command reference | COMMANDS.md |
| Workflow recipes | RECIPES.md |
| jq patterns | PATTERNS.md |
| Pitfalls & fixes | PITFALLS.md |
| Session file formats | SESSION_FORMATS.md |
| Remote sources, multi-machine | REMOTE_SOURCES.md |
| Semantic / hybrid / models | SEMANTIC_AND_HYBRID.md |
| Token / tool / model analytics | ANALYTICS.md |
| Cross-harness session resume | RESUME.md |
| Doctor + autonomous recovery | RECOVERY.md |
| Mined gold-standard prompts | PROMPTS.md |
| Anti-patterns (long form) | ANTI_PATTERNS.md |
| Health vs status vs index nuance | OBSERVABILITY.md |
| Pages encrypted archive + HTML export | PAGES_AND_EXPORT.md |
| Harness exclusion (disabled_agents) | HARNESS_EXCLUSION.md |
| Schema introspection contracts | INTROSPECTION.md |

Quick Search (Grep Recipes for References)

When the right reference isn't obvious from titles, grep the references directory directly — cheaper than loading whole files into context.

REFS=.claude/skills/cass/references

# Find any anti-pattern by symptom keyword
grep -ni "limit 0\|broken pipe\|workspace path\|stale" "$REFS"/ANTI_PATTERNS.md "$REFS"/PITFALLS.md

# Find the recipe / jq for a task
grep -niE "ritual|user prompt|aggregate|subagent|cluster|timeline" "$REFS"/RECIPES.md "$REFS"/PATTERNS.md

# Find a recovery recipe (issue numbers, error strings, fix names)
grep -niE "doctor|--force-rebuild|issue #196|last_indexed_at|database is busy|core\." "$REFS"/RECOVERY.md "$REFS"/OBSERVABILITY.md

# Find a flag, command, or response field
grep -niE "robot-format|--mode|--cursor|_meta|fallback_mode|hits_clamped" "$REFS"/COMMANDS.md "$REFS"/INTROSPECTION.md

# Find a real "what did I ask" prompt template you've used before
grep -ni "use cass\|find that\|session history\|what worked" "$REFS"/PROMPTS.md

# Find harness/connector slugs and exclusion behavior
grep -niE "openclaw|disabled_agents|sources agents" "$REFS"/HARNESS_EXCLUSION.md "$REFS"/REMOTE_SOURCES.md

# Find resume / cross-harness behavior
grep -niE "resume|--shell|--exec|subagent.*not resumable" "$REFS"/RESUME.md

These grep commands scan the references directory in <50ms and surface a filename and line number that you can then open with the Read tool, which is far cheaper than reading the whole reference.


Scripts

Scripts live under scripts/. They contribute zero context tokens — they execute, never load. None of them mutate state without explicit confirmation.

| Script | Usage |
|---|---|
| ./scripts/quick_analysis.sh /path | One-command project overview (status → aggregate agent/date → top prompts) |
| ./scripts/prompt_miner.py --workspace /path | Find repeated prompts (ritual detection) |
| ./scripts/validate.sh | Validate cass install + skill structure |
| ./scripts/recover.sh | Autonomous recovery decision tree (READY → STALE_BUT_USABLE → BROKEN); safe by default. Use as a PreToolUse hook before cass search. Wraps every cass index call in timeout to dodge issue #196 hangs |
| ./scripts/multi_machine_search.sh "QUERY" [host…] | Parallel fan-out across the fleet (defaults to css/csd/ts1/ts2); merges hits, dedups by source_path:line, sorts by score. Per-host timeout cap; safe with shell-special query characters |

Validation

# Quick health check
cass status --json | jq '.index.fresh'

# Should return: true

If false, run: timeout 600 cass index --json

Install via CLI
npx skills add https://github.com/boshu2/agentops --skill cass
Repository Details
Stars: 392
Forks: 40
Branch: main
Path: SKILL.md