name: vectorize-context
description: Manual vector DB for semantic search over .opencode/context/, triggered via /harvest-context search
level: 2
Vectorize Context
Indexes .opencode/context/ markdown files into a local sqlite-vec vector database for semantic search. Manual trigger only — invoked via /harvest-context search.
How It Works
The system uses lazy freshness: every query automatically stats all context files, re-indexes only what changed, then searches. This covers every write path:
| Trigger | Behavior |
|---|---|
| Hub writes a new context/decision/pattern file | Next query picks it up automatically |
| /harvest-context saves research docs | Indexed on next query |
| /orchestrate completes and saves patterns | Indexed on next query |
| /ideation finalizes a plan to context | Indexed on next query |
| Direct .md file edit in .opencode/context/ | Indexed on next query |
| Full re-index needed | Delete .opencode/.vector/ — auto-rebuilds |
The ML model is only loaded when there's actual work to do (files changed). If everything is up to date, ensureIndexed() returns instantly with no model load.
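The freshness check described above can be sketched as a pure comparison of modification times. This is a minimal illustration, not the code in veclib.mjs: the function name `diffMtimes`, the path-to-mtime map shape, and the handling of deleted files are all assumptions made for the example.

```javascript
// Hypothetical helper: decide which files need re-indexing by comparing mtimes.
// `current` maps file path -> mtime (ms) as seen on disk right now.
// `stored` maps file path -> mtime (ms) recorded at the last index run.
function diffMtimes(current, stored) {
  const changed = [];
  const removed = [];
  for (const [file, mtime] of Object.entries(current)) {
    // A file is stale if it is new or its mtime differs from the stored one.
    if (stored[file] !== mtime) changed.push(file);
  }
  for (const file of Object.keys(stored)) {
    // Assumption: files that disappeared should have their chunks dropped.
    if (!(file in current)) removed.push(file);
  }
  const upToDate = changed.length === 0 && removed.length === 0;
  return { changed, removed, upToDate };
}

// Mock data for illustration only.
const plan = diffMtimes(
  {
    'patterns/error-handling.md': 1700000000000,
    'frameworks/architecture.md': 1700000500000,
  },
  {
    'patterns/error-handling.md': 1700000000000,
    'frameworks/architecture.md': 1700000100000,
    'old/notes.md': 1690000000000,
  }
);
console.log(plan);
```

When `upToDate` is true, a caller can return before loading the embedding model, which is the behavior the paragraph above describes.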
Scripts
| Script | Purpose |
|---|---|
| scripts/veclib.mjs | Shared library: exported programmatic API for all integrations |
| scripts/vectorize.mjs | CLI: manual re-index (for debugging/forced refresh) |
| scripts/query.mjs | CLI: semantic search over context |
Programmatic API (veclib.mjs)
Used by hub subcommands and agents:
// Lazy re-index: stats files, only loads model if something changed
import { ensureIndexed } from './veclib.mjs';
await ensureIndexed(); // current project
await ensureIndexed('/path/to/app'); // specific project
// Semantic search (auto-refreshes index first)
import { queryChunks } from './veclib.mjs';
const results = await queryChunks(undefined, 'auth patterns', 10);
// results: [{ source, heading, content, file_path, distance }]
// Index stats
import { getIndexStats } from './veclib.mjs';
const stats = await getIndexStats();
// { exists, totalChunks, totalFiles, files: [...] }
CLI Usage
# Manual re-index (useful for debugging)
node {skill_dir}/scripts/vectorize.mjs
# Semantic search (auto-refreshes on every call)
node {skill_dir}/scripts/query.mjs "how does error handling work"
QUERY="auth patterns" node {skill_dir}/scripts/query.mjs
Output Format
Query results
=== Search Results ===
1. patterns/error-handling.md — Error Patterns (score: 0.71)
Error handling follows a centralized approach with...
[file: .opencode/context/patterns/error-handling.md]
2. frameworks/architecture.md — System Design (score: 0.64)
The project follows a layered architecture...
[file: .opencode/context/frameworks/architecture.md]
Dependencies
npm install better-sqlite3 sqlite-vec @xenova/transformers
Requires Node.js 18+ and ~200MB RAM for the embedding model (loaded lazily only when indexing).
Integration Points
Hub Lifecycle Integration (manual trigger)
After any hub subcommand writes to .opencode/context/:
- Write the file (existing behavior)
- Manual vectorize: run /harvest-context search to index and query (no automatic indexing)
Agent Integration
Agents can use queryChunks() to retrieve relevant context during execution:
const ctx = await queryChunks(process.cwd() + '/.opencode', query, 5);
// Inject results into agent context as supporting evidence
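One way an agent might inject those results is to flatten them into a text block. The sketch below assumes the result fields listed in the Programmatic API section (source, heading, content, file_path, distance); the helper name `formatContext`, the `maxChars` parameter, and the mock result are hypothetical and not part of veclib.mjs.

```javascript
// Hypothetical helper: turn queryChunks()-style results into a prompt-ready block.
function formatContext(results, maxChars = 300) {
  return results
    .map((r, i) => {
      // Truncate long chunks so the injected context stays small.
      const snippet =
        r.content.length > maxChars ? r.content.slice(0, maxChars) + '...' : r.content;
      return (
        `${i + 1}. ${r.source} / ${r.heading} (distance: ${r.distance.toFixed(2)})\n` +
        `${snippet}\n` +
        `[file: ${r.file_path}]`
      );
    })
    .join('\n\n');
}

// Mock result for illustration only; real results come from queryChunks().
const mockResults = [
  {
    source: 'patterns/error-handling.md',
    heading: 'Error Patterns',
    content: 'Error handling follows a centralized approach.',
    file_path: '.opencode/context/patterns/error-handling.md',
    distance: 0.58,
  },
];
const contextBlock = formatContext(mockResults);
console.log(contextBlock);
```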
Forced Re-index
rm -rf .opencode/.vector/ # Delete the DB
# Next query or ensureIndexed() call auto-rebuilds it
How It Works Internally
- ensureIndexed() scans .opencode/context/ for *.md files
- Compares current file mtimes against stored mtimes in the DB
- If no files changed → returns immediately, no model loaded
- If files changed → loads model, chunks new/changed files by ##/### headers
- Generates 384-dim embeddings via Xenova/all-MiniLM-L6-v2
- Deletes old chunks for changed files, inserts new ones
- Query always runs against the freshly-updated index
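The header-based chunking step in the list above could look roughly like the following. This is a simplified sketch under stated assumptions: the function name `chunkByHeaders` and the chunk shape are hypothetical, and it does not special-case headings that appear inside fenced code blocks, which the real implementation may handle differently.

```javascript
// Hypothetical chunker: split markdown into sections at "##" and "###" headings.
function chunkByHeaders(markdown) {
  const chunks = [];
  let current = { heading: '', lines: [] };

  // Push the section collected so far, skipping sections with no body text.
  const flush = () => {
    const content = current.lines.join('\n').trim();
    if (content) chunks.push({ heading: current.heading, content });
  };

  for (const line of markdown.split('\n')) {
    // Matches exactly two or three '#' followed by whitespace and a title.
    const match = /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      current = { heading: match[2].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return chunks;
}

// Sample document for illustration only.
const sample = [
  '# Error Patterns',
  'Intro text.',
  '## Centralized handler',
  'All errors pass through one handler.',
  '### Retries',
  'Retry transient failures.',
].join('\n');

const chunks = chunkByHeaders(sample);
console.log(chunks.map((c) => c.heading));
```

Text before the first ## heading, including a top-level # title, ends up in a chunk with an empty heading in this sketch.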
The vec0 virtual table uses L2 distance. Since embeddings are normalized (unit vectors), L2 distance sorts equivalently to cosine similarity — nearest neighbors are the most semantically similar chunks.
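The equivalence follows from the identity ‖a − b‖² = 2 − 2·cos(a, b) for unit vectors a and b: a smaller L2 distance always means a larger cosine similarity. The snippet below checks this numerically on small made-up vectors; it is an illustration in plain JavaScript and does not use sqlite-vec or the embedding model.

```javascript
// Scale a vector to unit length.
function normalize(v) {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

// Dot product; equals cosine similarity when both inputs are unit vectors.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Euclidean (L2) distance.
function l2(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Made-up 3-dim vectors standing in for 384-dim embeddings.
const queryVec = normalize([1, 2, 3]);
const docVecs = [[3, 2, 1], [1, 2, 2.5], [-1, 0, 1]].map(normalize);
const indices = docVecs.map((_, i) => i);

// Rank by ascending L2 distance and by descending cosine similarity.
const byL2 = [...indices].sort((i, j) => l2(queryVec, docVecs[i]) - l2(queryVec, docVecs[j]));
const byCosine = [...indices].sort((i, j) => dot(queryVec, docVecs[j]) - dot(queryVec, docVecs[i]));

// Check the identity ||a - b||^2 = 2 - 2 * cos(a, b) on one pair.
const squaredL2 = l2(queryVec, docVecs[0]) ** 2;
const fromCosine = 2 - 2 * dot(queryVec, docVecs[0]);

console.log({ byL2, byCosine, squaredL2, fromCosine });
```

Both rankings come out in the same order, and the two computed values agree up to floating-point error.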