name: turbovec description: Local vector search with turbovec (Google TurboQuant, Rust+Python). Use for embeddings indexing, semantic search, local RAG, or filtered vector retrieval without external services like Pinecone.
turbovec — local vector index
Rust vector index on Google's TurboQuant quantizer. 10M float32 docs (31GB) fit in ~4GB. Faster than FAISS. No train step: add vectors, they are indexed. Fully local — nothing leaves the machine.
Installation (v0.7.0)
pip install turbovec==0.7.0
Create a dedicated venv and install there; do not pip-install into a system Python.
API (v0.7.0 — differs from upstream README)
import numpy as np
from turbovec import TurboQuantIndex
idx = TurboQuantIndex(dim=1536, bit_width=4)
idx.add(vecs) # 2D float32, C-contiguous: np.ascontiguousarray(x.astype(np.float32))
scores, ids = idx.search(q, k=10) # q must be 2D (batch): vecs[7:8], not vecs[7]
# returns 2D arrays (one row per query)
mask = np.zeros(len(idx), dtype=bool) # filtered search: bool mask over slots
mask[allowed_slots] = True
scores, ids = idx.search(q, k=10, mask=mask)
idx.write("index.tq") # persistence round-trips exactly
idx = TurboQuantIndex.load("index.tq")
Gotchas:
- Queries MUST be 2D; a 1D vector raises
TypeError: 'ndarray' object cannot be cast as 'ndarray'. maskis a bool array of lengthlen(idx)(slot-based), not an id allowlist. The README'sIdMapIndex.add_with_ids/allowlist=API is newer than the 0.7.0 wheel.bit_width=4is the standard memory/recall tradeoff; pair with any local embedding model for an air-gapped RAG stack.
When to use
- Semantic search over local corpora (memory, knowledge base, transcripts)
- RAG retrieval where privacy/latency matters (no Pinecone round-trip)
- Hybrid retrieval: narrow candidates with SQL/BM25 first, then dense rerank via
mask