cass

name: cass
description: Mine past agent sessions for working prompts, decisions, and patterns. Use when "what did I ask?", "find that prompt", session archaeology, or agent history.
practices:
  - pragmatic-programmer
skill_api_version: 1
user-invocable: false
hexagonal_role: supporting
metadata:
  tier: execution
  external_dependencies:
    - "cass binary (>=0.3.6 recommended; some commands require HEAD \u2014 see Version Pinning)"
    - jq (required for parsing --json output)
    - GNU coreutils 'timeout' (recommended; cass index can hang under contention)
    - ssh + rsync (optional; only for cross-machine cass sources workflows)
    - fastembed model bundle ~90MB (optional; only for --mode semantic / hybrid; install via cass models install)

cass Session Search

The Goldmine Principle
THE EXACT PROMPT — Discovery Workflow
Version Pinning Caveat
Two-Step Bootstrap (Replaces "ALWAYS first")
Stuck-Index & Recovery Decision Tree
Quick Reference
When to Use What
Critical Rules
Agent Harness Exclusion
Search Modes
Cross-Machine Search (Multi-Workstation Corpus)
Anti-Patterns (Don't Do These)
Resume a Past Session in Its Native Harness
The Heuristics
jq Essentials
Hidden Power: Capabilities the Old Skill Missed
Token & Cost Analytics (Bonus Use Case)
Recovery Cheat Sheet (No-Permission Moves)
Reference Index
Quick Search (Grep Recipes for References)
Scripts
Validation

Core Insight: Your repeated prompts are your best prompts. If you typed it 10+ times, it works. Mine your history.

The Goldmine Principle

Your conversation history contains:

Refined prompts — Every rephrase that worked better was captured
Working rituals — Prompts repeated 10+ times ARE your methodology
Scope decisions — "When did we decide NOT to do X?"
Recovery moments — What you searched for after context loss = what mattered

The insight: Mining your past beats inventing new approaches.

THE EXACT PROMPT — Discovery Workflow

1. Bootstrap: Check health, refresh index, get project overview
   cass status --json && timeout 600 cass index --json
   cass search "*" --workspace /data/projects/PROJECT --aggregate agent,date --limit 1 --json

2. Find prompts: Search for keywords, filter to user prompts (lines 1-3)
   cass search "KEYWORD" --workspace /data/projects/PROJECT --json --fields minimal --limit 50 \
     | jq '[.hits[] | select(.line_number <= 3)]'

3. Follow hits: View the actual content
   cass view /path/from/source_path.jsonl -n LINE -C 20

4. Expand context: See the full conversation flow
   cass expand /path/from/source_path.jsonl --line LINE --context 3

5. Discover related: Find the whole work cluster
   cass context /path/from/source_path.jsonl --json

Why This Workflow Works

Aggregations first — Know the terrain before diving in
--fields minimal — 5x smaller output, preserves context window
line_number <= 3 — User prompts live at the top of sessions
Context clustering — Work happens in clusters; one good hit → many related sessions

Version Pinning Caveat

cass evolves quickly. The skill describes HEAD behavior (latest source in /dp/coding_agent_session_search). The released v0.3.6 binary lacks several features added since:

cass sources agents {list,exclude,include} — added 2026-04-20 (commit 82d8d70e)
Tail-end writer-race tolerance for full rebuilds — fixed 2026-04-22 (commit e06342f2, bead zz8ni)
Lexical generation manifests for federated installs (commits 2b7b86a1, 683ccd03, cf76fe15)
Rebuild producer stall telemetry (commit 73a86604)

When a flag/subcommand returns "unrecognized" or behaves differently than documented, run cass --version and check git log -- src/lib.rs for the relevant commit. Each affected section calls out which commit/version it depends on.

Probe what your installed binary actually supports:

cass capabilities --json | jq '{version: .crate_version, features, connectors}'
cass introspect --json   | jq '.commands[].name'

Two-Step Bootstrap (Replaces "ALWAYS first")

Three states matter — never conflate them.

State	What it means	What to do
`cass health` exit 0	Sub-50ms preflight passed	Search immediately
`cass health` exit 1 + `index.stale=true`	Index is usable but old	Search now, refresh in background with a wall-clock cap: `( timeout 600 cass index --json &>/tmp/cass-bg.log </dev/null & )` (NEVER bare `&` — cass index can hang)
`cass status` returns `database.exists=false` OR `documents=0`	Truly broken/uninitialized	Run `cass doctor --fix --json`, then `cass index --full --json`

The trap: Treating a stale index as broken triggers an unneeded full rebuild (8–25s cost) when an incremental refresh (1–3s) or even a stale-but-correct query would have worked.

# Robust two-step bootstrap that never blocks the user.
# IMPORTANT: every cass index call gets a wall-clock cap. cass index has been
# observed to hang indefinitely under contention — without `timeout`, the
# bootstrap itself becomes the symptom.
cass status --json | jq '{healthy, fresh: .index.fresh, stale: .index.stale, db: .database.exists, sem: .semantic.available}'

# Refresh policy: stale → bg refresh (capped); never block search
if [ "$(cass status --json | jq -r '.index.stale')" = "true" ]; then
  ( timeout 600 cass index --json >"/tmp/cass-index.$$.log" 2>&1 </dev/null & ) 2>/dev/null
fi
# Search even with stale index — results are still useful
cass search "KEYWORD" --workspace /path --json --fields minimal --limit 10

For the production-quality version of this logic (cap-on-every-call, broken-state escalation, exit-code semantics for hooks), use scripts/recover.sh — it implements the full decision tree with timeouts and per-PID logs.

cass health returning exit 1 on stale is a deliberate preflight signal for cron/CI. In an interactive agent loop, prefer cass status --json and decide.
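
The three states in the table above can be sketched as a small classifier. This is a minimal sketch, not the logic inside scripts/recover.sh: `classify_cass_state` is a hypothetical helper, the state names are borrowed from the recover.sh description in the Scripts section, and the caller is assumed to have already pulled the three values out of `cass status --json` with jq.

```shell
# Hypothetical helper (not part of cass): map status fields to one of three states.
#   $1 = database.exists  ("true" or "false")
#   $2 = document count   (integer; where it lives in the status JSON is an assumption)
#   $3 = index.stale      ("true" or "false")
classify_cass_state() {
  local db_exists="$1" documents="$2" stale="$3"

  # Treat a missing or non-numeric count (for example "null") as zero documents.
  case "$documents" in
    ''|*[!0-9]*) documents=0 ;;
  esac

  # Broken takes priority over stale: without a populated database, search cannot work.
  if [ "$db_exists" != "true" ] || [ "$documents" -eq 0 ]; then
    echo "BROKEN"
  elif [ "$stale" = "true" ]; then
    echo "STALE_BUT_USABLE"
  else
    echo "READY"
  fi
}
```

Anything other than BROKEN can be searched immediately; only STALE_BUT_USABLE warrants a capped background refresh, and only BROKEN warrants `cass doctor --fix --json`.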

Stuck-Index & Recovery Decision Tree

Real-world bugs we've hit (all observed in mined sessions). Walk top-down — first match wins.

Symptom	Likely cause	Autonomous fix (no user prompt needed)
`cass index --json` streams `phase: indexing, current: 0, total: N` for >2 min	OPEN issue #196 in v0.3.6: incremental hangs after prior `--force-rebuild`	Kill the run, then `cass index --full --force-rebuild --json` (25s typical)
`cass status` shows `index.rebuilding=true` and `pid` is stale (>1h)	Crashed indexer left lock	`cass doctor --fix --json` (removes stale `.index.lock` automatically)
`cass search` returns 0 hits but the file plainly contains the term	Term lives in tool stdout/stderr (skipped at index time)	Fall back to `rg -n "TERM" /path/to/session.jsonl`
`cass search --workspace /X` returns 0; same query without `--workspace` works	Workspace string mismatch	`cass search "KEYWORD" --aggregate workspace --limit 1 --json` to discover the canonical path, then re-run
`vtable constructor failed: fts_messages` (older bug, fixed in 0.3.0+)	DB↔FTS schema drift after upgrade	`cass doctor --fix --json` rebuilds the Tantivy side from SQLite
`--limit 0` panic	Earlier cass versions panicked on limit=0	Always pass `--limit 1` (or `--limit 5`) for aggregations
Massive `core.NNNNN` files in cass project dir	Past indexer crash recorded a coredump	They're SAFE to leave; they don't affect search. Only delete with explicit user permission.
`cass models install` fails with WSAENOTCONN on Windows (closed #193)	Network blip during huggingface download	Retry once; if it persists, use `--mirror <URL>` to point at a different HF mirror, or `--from-file <DIR>` if you have the model cached locally. Then `cass models verify`.
`cass index` says "Index rebuild is already in progress" but nothing visible	Concurrent agent triggered a rebuild	Don't fight it — `cass status --json
`cass index --full --force-rebuild --json` runs for 200s+ then exits with `"index failed: updating last_indexed_at after index run ... database is busy"`	Concurrent-rebuild race in `persist_final_index_run_metadata`. Fixed at HEAD (commit e06342f2, bead zz8ni); affects v0.3.6 and earlier.	Verify the index data is good: `cass search "X" --limit 1 --json
`cass status` shows `unhealthy` indefinitely after a successful-looking rebuild	Same as above — timestamp marker never landed	Same fix: confirm searches return hits, then a single quiet `cass index --json` writes the marker. Upgrade past v0.3.6 for the proper fix

Never paper over an index failure by skipping cass and falling back to manual find/grep; the `rg` fallback in the table applies only to content cass does not index (tool stdout/stderr). The autonomous fixes above resolve every failure listed without user intervention. Asking the user to "rebuild your index" is the worst outcome: they have agents waiting.
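
Several rows above depend on telling a hung run apart from a failed one. The sketch below shows one way to make that distinction, assuming GNU coreutils `timeout`, which exits with status 124 when the cap is hit. `run_capped` is a hypothetical wrapper, not a cass command, and the log path in the usage comment is only an example.

```shell
# Hypothetical wrapper (not part of cass): run a command under a wall-clock cap,
# send its output to a log file, and print a one-word verdict on stdout.
#   $1 = cap in seconds, $2 = log file, remaining args = command to run
run_capped() {
  local seconds="$1" log="$2"
  shift 2

  local rc=0
  timeout "$seconds" "$@" >"$log" 2>&1 </dev/null || rc=$?

  case "$rc" in
    0)   echo "ok" ;;
    124) echo "timed_out" ;;   # GNU timeout's exit status when the cap is reached
    *)   echo "failed:$rc" ;;
  esac
  return "$rc"
}

# Usage sketch:
#   verdict="$(run_capped 600 "/tmp/cass-index.$$.log" cass index --json)"
#   [ "$verdict" = "timed_out" ] && echo "index hung; escalate per the table above"
```

Keeping the verdict separate from the command's own output means a caller can branch on "timed_out" without parsing the indexer's JSON stream.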

Quick Reference

# Health + refresh (first step; cap every index call)
cass status --json && timeout 600 cass index --json

# Project overview: who did what, when?
cass search "*" --workspace /path --aggregate agent,date --limit 1 --json

# Find keyword, minimal output
cass search "KEYWORD" --workspace /path --json --fields minimal --limit 50

# Follow a hit
cass view /path.jsonl -n LINE -C 20        # Line-oriented
cass expand /path.jsonl --line LINE --context 3  # Message-oriented

# Find related sessions
cass context /path.jsonl --json

# Export for parsing
cass export /path.jsonl --format json --include-tools -o /tmp/out.json

# Inspect or change persistent agent-harness exclusions
cass sources agents list --json
cass sources agents exclude openclaw
cass sources agents exclude openclaw --keep-indexed-data
cass sources agents include openclaw

When to Use What

You Want	Use	Why
Project overview	`--aggregate agent,date --limit 1`	Counts only, no content
Find prompts	`--fields minimal` + `jq select(.line_number <= 3)`	User prompts are lines 1-3
Ritual detection	Count matches: >10 = ritual	Repeated = working
Full conversation	`cass expand --context 3`	Message boundaries preserved
Raw JSON parsing	`cass export --include-tools -o file.json`	Never pipe exports
Content not found	`rg "string" /path.jsonl`	cass skips tool outputs
Noisy harness flooding index	`cass sources agents exclude <agent>`	Persistently disable future indexing

Critical Rules

Rule	Why	Consequence
`--limit 1` minimum	`--limit 0` panics	Use 1 for aggregations
`--fields minimal`	Token efficiency	5x smaller output
Export to file	Piping causes broken pipe panic	`-o /tmp/out.json` always
Exact workspace paths	Case-sensitive matching	Use `--aggregate workspace` to discover
`--include-tools`	Tool calls hidden by default	Required for full export
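
Because workspace matching is case-sensitive and trailing-slash-sensitive, it can help to resolve a guessed path against the keys cass actually knows. The following is a minimal sketch: `canonical_workspace` is a hypothetical helper, and it expects the candidate keys on stdin, one per line. How those keys are extracted from the `--aggregate workspace` response is left to the caller; the jq path in the usage comment is an assumption by analogy with `.aggregations.agent.buckets`.

```shell
# Hypothetical helper (not part of cass): print the first workspace key from stdin
# that equals $1 once letter case and one trailing slash are ignored.
canonical_workspace() {
  local want key norm
  want="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  want="${want%/}"

  while IFS= read -r key; do
    norm="$(printf '%s' "$key" | tr '[:upper:]' '[:lower:]')"
    norm="${norm%/}"
    if [ "$norm" = "$want" ]; then
      printf '%s\n' "$key"   # emit the key exactly as cass stores it
      return 0
    fi
  done
  return 1                   # no candidate matched
}

# Usage sketch (the .key field name is an assumption):
#   cass search "KEYWORD" --aggregate workspace --limit 1 --json \
#     | jq -r '.aggregations.workspace.buckets[].key' \
#     | canonical_workspace /data/projects/project
```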

Agent Harness Exclusion

When a user tells you one agent harness is producing garbage, loops, or too much disk usage, handle that directly in cass instead of telling them it cannot be excluded.

# See current state
cass sources agents list --json

# Persistently stop indexing this harness in future scans/syncs/watch mode
cass sources agents exclude openclaw

# Keep already indexed data but block future indexing
cass sources agents exclude openclaw --keep-indexed-data

# Re-enable later
cass sources agents include openclaw

What `exclude` actually does

Writes the preference to sources.toml, so the setting survives future runs
Prevents future indexing even if the source files still exist on disk
By default, purges already archived local data for that harness and rebuilds lexical search so the exclusion also reclaims space

When to use it

A harness is spamming looped or low-value output
A user wants cass to remember "ignore this source going forward"
You need a reversible, agent-friendly way to reduce archive bloat without manually deleting source files

Search Modes

Mode	When	Example
`lexical` (default)	Exact strings, filenames	`"AGENTS.md"`, `"--workspace"`
`semantic`	Conceptual, unknown wording	`"scope reduction discussions"`
`hybrid`	Broad exploration	`"architecture decisions"`

Default to lexical. Only use semantic when you don't know exact wording.

Enabling Semantic / Hybrid (one-time)

cass models status --json    # state: not_installed | installed | partial
cass models install          # downloads ~90MB MiniLM bundle from HuggingFace
cass index --semantic --build-hnsw --json   # builds vector + HNSW
cass search "QUERY" --mode hybrid --json    # falls back to lexical if the semantic index is missing

If the model is not_installed, --mode hybrid and --mode semantic silently fall back to lexical: no panic and no error, but the results are lexical-only. See SEMANTIC_AND_HYBRID.md.
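
Since the fallback is silent, an agent may want to know up front which mode its results will really reflect. A minimal sketch under stated assumptions: `effective_mode` is a hypothetical helper, the state strings are the ones listed for `cass models status --json` above, and treating `partial` the same as `not_installed` is a guess rather than documented behavior.

```shell
# Hypothetical helper (not part of cass): predict the mode a search will really use.
#   $1 = requested mode (lexical | semantic | hybrid)
#   $2 = model state    (not_installed | installed | partial)
effective_mode() {
  local requested="$1" state="$2"
  case "$requested" in
    semantic|hybrid)
      # Assumption: only a fully installed model enables vector search.
      if [ "$state" = "installed" ]; then
        echo "$requested"
      else
        echo "lexical"
      fi
      ;;
    *)
      echo "lexical"
      ;;
  esac
}
```

Reporting the effective mode alongside results avoids presenting a lexical match list as if it were a semantic one.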

Cross-Machine Search (Multi-Workstation Corpus)

When the user has agents running on css, csd, ts1, ts2, etc., the cass corpus on each machine is disjoint. Three ways to reach across:

# Option A: One-shot remote query (no setup, slow per call)
ssh css 'cass search "KEYWORD" --json --fields minimal --limit 20' | jq '.hits'

# Option B: Configured sources (preferred — caches the remote sessions locally)
cass sources setup                                      # interactive wizard, auto-discovers from ~/.ssh/config
cass sources add ssh://user@css --name css --preset linux-defaults
cass sources sync --source css --json                   # rsyncs new sessions, then re-indexes
cass search "KEYWORD" --json                            # results now span all configured sources
cass sources list --json                                # see what's wired up

# Option C: Parallel fan-out (when speed matters more than dedup)
for h in css csd ts1 ts2; do
  ssh "$h" 'cass search "KEYWORD" --json --fields minimal --limit 10' > "/tmp/cass-$h.json" &
done
wait
jq -s '[.[] | .hits[]] | unique_by(.source_path + (.line_number|tostring))' /tmp/cass-*.json

cass sources doctor diagnoses connectivity. Configured-source results carry origin_host in their hit metadata — preserve it when reporting back to the user. Full reference: REMOTE_SOURCES.md.

Anti-Patterns (Don't Do These)

Anti-pattern	Why it's wrong	Do instead
Asking the user "should I rebuild the index?"	They have agents waiting; rebuild is safe and idempotent	Just run `cass doctor --fix --json` (preserves source data)
Running `cass index --full` whenever `status` says unhealthy	A 25s rebuild for a 30-min stale index is wasteful	Check `index.stale` separately from `database.exists`; prefer incremental
Running bare `cass` to "see what's there"	Launches blocking TUI in the agent's session	Always `--json` or `--robot`; never bare
Piping `cass export` into `head`/`jq`	Broken-pipe panic on large sessions	`cass export ... -o /tmp/x.json` first, then operate on the file
Treating subagent files as the same as parent sessions	Subagents are separate conversation logs with their own line-2 prompt	Filter by `select(.source_path | contains("subagent"))`
Using `--limit 0` for "no limit"	Earlier cass versions panic; modern cass caps results at a RAM ceiling, which is rarely what you want	Use a real limit (`--limit 50`) or pagination via `--cursor`
Searching with `--workspace /X` and trusting 0 hits	Workspace strings are case-sensitive and trailing-slash-sensitive	When 0 hits but you expected some, re-run with `--aggregate workspace --limit 1` to discover the canonical key
Skipping `--fields minimal` on wide scans	Default `full` returns ~3KB per hit × 100 hits = 300KB context burn	Always pass `--fields minimal` for wide passes; upgrade to `summary`/`full` for the few you keep
Reading session file with `cat` to extract a prompt	Loads the full conversation into context	`cass view PATH -n LINE -C 5` (window) or `cass expand PATH --line LINE --context 3` (message-aware)
Re-indexing on every `cass search`	Wasteful; the index is shared across processes	Only refresh when `cass status` says `stale` or `recommended_action` says so

Resume a Past Session in Its Native Harness

cass resume resolves a session path into the exact command its native CLI uses to continue the conversation — Claude Code, Codex, Gemini, OpenCode, pi_agent.

# Find a relevant past session
cass search "KEYWORD" --json --fields minimal --limit 5 \
  | jq -r '.hits[0].source_path' > /tmp/sess.path

# Print the resume command without executing
cass resume "$(cat /tmp/sess.path)" --shell

# Or replace the current process with the resumed agent
cass resume "$(cat /tmp/sess.path)" --exec

Pitfall: Subagent files (subagents/agent-*.jsonl) are not resumable by design — they're orchestrated by a parent. You'll get session_id_not_found with a hint to pass --agent claude. Resolve to the parent session via cass context <path> --json first. See RESUME.md.
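
The pitfall can be caught before calling `cass resume` with a simple path check. This is a minimal sketch: `is_subagent_session` is a hypothetical helper that relies only on the `subagents/agent-*.jsonl` naming pattern mentioned above, and the paths in the usage comment are made up.

```shell
# Hypothetical helper (not part of cass): succeed when the path looks like a
# subagent log (.../subagents/agent-*.jsonl), which is not resumable directly.
is_subagent_session() {
  case "$1" in
    */subagents/agent-*.jsonl) return 0 ;;
    *)                         return 1 ;;
  esac
}

# Usage sketch:
#   if is_subagent_session "$path"; then
#     cass context "$path" --json    # resolve to the parent session first
#   else
#     cass resume "$path" --shell
#   fi
```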

The Heuristics

Signal	Meaning	Action
`line_number` 1-3	User prompts	Filter: `select(.line_number <= 3)`
`/subagents/` line 2	THE extraction prompt	Copy-paste ready
`total_matches` > 10	Ritual pattern	Document it, reuse it
0 results + content exists	Workspace path mismatch	Use `--aggregate workspace`

jq Essentials

# User prompts only
| jq '[.hits[] | select(.line_number <= 3)]'

# Source paths for follow-up
| jq '.hits[].source_path' -r

# Aggregation buckets
| jq '.aggregations.agent.buckets'

# Count matches
| jq '.total_matches'

# Find repeated prompts (ritual detection)
| jq '[.hits[] | select(.line_number <= 3) | .title[0:80]] | group_by(.) | map({prompt: .[0], count: length}) | sort_by(-.count) | .[0:20]'
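
Once prompts have been extracted to plain text, one per line, ritual detection can also be done with standard Unix tools. The sketch below assumes that extraction has already happened; `ritual_prompts` is a hypothetical helper, and its default threshold of 10 mirrors the ">10 = ritual" heuristic used in this document.

```shell
# Hypothetical helper (not part of cass): read prompts on stdin, one per line,
# and print "COUNT<TAB>PROMPT" for every prompt seen more than $1 times
# (default 10), most frequent first.
ritual_prompts() {
  local threshold="${1:-10}"
  sort | uniq -c | sort -rn | awk -v t="$threshold" '
    $1 > t {
      n = $1
      sub(/^[ \t]*[0-9]+[ \t]/, "")   # strip the count column added by uniq -c
      printf "%d\t%s\n", n, $0
    }'
}

# Usage sketch:
#   ritual_prompts 10 < /tmp/prompts.txt
```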

Hidden Power: Capabilities the Old Skill Missed

Command	What it gives you	When
`cass health`	<50ms exit-code-only preflight	Cron / hook gating
`cass index --watch --json`	Filesystem-watcher keeps index live; one cycle = `--watch-once /path`	Long-running orchestrator hosts
`cass index --idempotency-key K --json`	Cached identical-key responses for 24h	Retried CI runs
`cass index --semantic --build-hnsw`	O(log n) approximate vector search	After `cass models install`
`cass doctor --fix --json`	Auto-rebuilds index from DB; backs up corrupt DB to `.corrupt.<ts>`	Any time `status.healthy=false`
`cass resume PATH --shell`	Cross-harness resume command emitter	Continuing a past Codex/Claude/Gemini session
`cass sources setup`	Interactive ssh-config-aware multi-machine wizard	First time wiring a fleet
`cass sources sync --source NAME --json`	rsync remote sessions, then re-index	Periodic fleet refresh
`cass sources doctor --json`	Connectivity + path probe	Before relying on cross-machine results
`cass sources mappings ...`	Rewrite source paths to local equivalents	After moving a workspace
`cass sources agents {list,exclude,include}`	Persistent harness exclusion (writes `disabled_agents` in `~/.config/cass/sources.toml`)	When openclaw / a noisy connector floods the index
`cass models install / status / verify / remove`	Manage the MiniLM bundle (~90MB)	Enabling semantic search
`cass analytics tokens | tools | models`	Per-day/per-tool/per-model usage stats from indexed sessions	Cost reports, regression checks
`cass analytics rebuild --json`	Backfill rollup tables when coverage_pct is low	After bulk `import` or `sources sync`
`cass analytics validate --json`	Detect drift between raw rows and rollups	Sanity check before reporting numbers
`cass import chatgpt PATH`	Bring `conversations.json` exports from ChatGPT web into the corpus	Migrating off ChatGPT.com
`cass export-html PATH --password ...`	Encrypted, self-contained HTML conversation viewer	Sharing one session with a teammate
`cass pages encrypt ARCHIVE --with-recovery`	Encrypted searchable archive for static hosting	Publishing a redacted corpus
`cass introspect --json`	Full schema dump of every command + response	Programmatic discovery
`cass capabilities --json`	Feature/connector/limits for self-configuration	Version-aware fallbacks
`cass robot-docs guide | commands | examples | schemas | contracts`	Built-in machine-targeted docs	Inside an agent loop, when the skill isn't enough
`cass --robot-format jsonl | compact | toon | sessions`	Streaming / token-optimized / shell-pipeable output	Large result sets, chained searches
`cass --trace-file FILE`	Span timing JSONL for debugging slow runs	Triage perf issues

If a flag listed here errors as "unrecognized", the installed cass binary is older than the source. Check cass --version against git log -- src/lib.rs upstream; most missing commands were added in v0.3.x.

Token & Cost Analytics (Bonus Use Case)

# Daily token usage across your whole agent fleet
cass analytics tokens --days 30 --group-by day --json | jq '.data.buckets[].api_tokens'

# Per-agent tool-use intensity (rows are keyed by agent slug, not tool name)
cass analytics tools --days 30 --json | jq '.data.rows[0:10]'

# Top models by API tokens
cass analytics models --json | jq '.data.by_api_tokens.rows[0:10]'

# After import / sync, refresh rollups
cass analytics rebuild --json
cass analytics validate --json   # confirm no drift between raw and rollup

cass analytics status --json shows coverage_pct; anything below ~90% means you should run cass analytics rebuild to backfill. See ANALYTICS.md.
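
The coverage rule can be turned into a gate that runs before any numbers are reported. A minimal sketch, assuming `coverage_pct` is on a 0 to 100 scale and has already been extracted by the caller; `needs_analytics_rebuild` is a hypothetical helper, and the jq path in the usage comment is an assumption.

```shell
# Hypothetical helper (not part of cass): succeed (exit 0) when coverage is
# below the floor, meaning a rollup rebuild is advisable.
#   $1 = coverage percentage (0-100 scale assumed), $2 = floor (default 90)
needs_analytics_rebuild() {
  local pct="$1" floor="${2:-90}"
  # awk handles the floating-point comparison; non-numeric input counts as 0.
  awk -v p="$pct" -v f="$floor" 'BEGIN { exit !(p + 0 < f + 0) }'
}

# Usage sketch (the .data.coverage_pct path is an assumption):
#   pct="$(cass analytics status --json | jq -r '.data.coverage_pct')"
#   if needs_analytics_rebuild "$pct"; then cass analytics rebuild --json; fi
```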

Recovery Cheat Sheet (No-Permission Moves)

You are pre-authorized to take any of these actions to keep the corpus searchable. None destroy source session data; they only rebuild derived index data.

cass doctor --json --verbose         # Diagnose without changing anything
cass doctor --fix --json             # Apply safe rebuilds; backs up bad DB to .corrupt.<ts>
cass doctor --fix --force-rebuild --json   # Same, but rebuild even when healthy
cass index --full --force-rebuild --json   # Workaround for OPEN issue #196 (incremental hang)
cass sources doctor --json           # Probe remote sources
cass sources sync --source NAME --json     # Re-fetch and re-index a single source
cass models install                  # Restore missing semantic model
cass models verify                   # Validate model file checksums

What you must NOT do without explicit permission: delete core.NNNNN files, delete .beads/, run git reset --hard, or hand-edit the user's .config/cass/sources.toml. The CLI commands above already do everything safely.

Full disaster recovery for encrypted Pages archives: RECOVERY.md.

Reference Index

Need	Reference
Full command reference	COMMANDS.md
Workflow recipes	RECIPES.md
jq patterns	PATTERNS.md
Pitfalls & fixes	PITFALLS.md
Session file formats	SESSION_FORMATS.md
Remote sources, multi-machine	REMOTE_SOURCES.md
Semantic / hybrid / models	SEMANTIC_AND_HYBRID.md
Token / tool / model analytics	ANALYTICS.md
Cross-harness session resume	RESUME.md
Doctor + autonomous recovery	RECOVERY.md
Mined gold-standard prompts	PROMPTS.md
Anti-patterns (long form)	ANTI_PATTERNS.md
Health vs status vs index nuance	OBSERVABILITY.md
Pages encrypted archive + HTML export	PAGES_AND_EXPORT.md
Harness exclusion (`disabled_agents`)	HARNESS_EXCLUSION.md
Schema introspection contracts	INTROSPECTION.md

Quick Search (Grep Recipes for References)

When the right reference isn't obvious from titles, grep the references directory directly — cheaper than loading whole files into context.

REFS=.claude/skills/cass/references

# Find any anti-pattern by symptom keyword
grep -ni "limit 0\|broken pipe\|workspace path\|stale" "$REFS"/ANTI_PATTERNS.md "$REFS"/PITFALLS.md

# Find the recipe / jq for a task
grep -niE "ritual|user prompt|aggregate|subagent|cluster|timeline" "$REFS"/RECIPES.md "$REFS"/PATTERNS.md

# Find a recovery recipe (issue numbers, error strings, fix names)
grep -niE "doctor|--force-rebuild|issue #196|last_indexed_at|database is busy|core\." "$REFS"/RECOVERY.md "$REFS"/OBSERVABILITY.md

# Find a flag, command, or response field
grep -niE "robot-format|--mode|--cursor|_meta|fallback_mode|hits_clamped" "$REFS"/COMMANDS.md "$REFS"/INTROSPECTION.md

# Find a real "what did I ask" prompt template you've used before
grep -ni "use cass\|find that\|session history\|what worked" "$REFS"/PROMPTS.md

# Find harness/connector slugs and exclusion behavior
grep -niE "openclaw|disabled_agents|sources agents" "$REFS"/HARNESS_EXCLUSION.md "$REFS"/REMOTE_SOURCES.md

# Find resume / cross-harness behavior
grep -niE "resume|--shell|--exec|subagent.*not resumable" "$REFS"/RESUME.md

Each of these greps targets only the named reference files, returns in <50ms, and surfaces a line number (plus the filename when several files are searched) that you can then open with the Read tool, which is far cheaper than reading the whole reference.

Scripts

Scripts live under scripts/. They contribute zero context tokens — they execute, never load. None of them mutate state without explicit confirmation.

Script	Usage
`./scripts/quick_analysis.sh /path`	One-command project overview (status → aggregate agent/date → top prompts)
`./scripts/prompt_miner.py --workspace /path`	Find repeated prompts (ritual detection)
`./scripts/validate.sh`	Validate cass install + skill structure
`./scripts/recover.sh`	Autonomous recovery decision tree (READY → STALE_BUT_USABLE → BROKEN); safe by default. Use as a `PreToolUse` hook before `cass search`. Wraps every `cass index` call in `timeout` to dodge issue #196 hangs
`./scripts/multi_machine_search.sh "QUERY" [host…]`	Parallel fan-out across the fleet (defaults to css/csd/ts1/ts2); merges hits, dedups by source_path:line, sorts by score. Per-host `timeout` cap; safe re: shell-special query chars

Validation

# Quick health check
cass status --json | jq '.index.fresh'

# Should return: true

If false, run: timeout 600 cass index --json