name: ragmail description: Query ragmail LanceDB workspaces for email search, counts, date/topic filters, and amount extraction. Use for answering questions about who/what/when from email data. metadata: short-description: Query ragmail email databases
ragmail skill
Use this skill when the user asks questions about email content in ragmail workspaces (who said what, counts by sender/date, topic summaries, costs mentioned, etc.). Default to running small Python scripts that query LanceDB directly; use the bundled CLI script only as a convenience or for quick reference.
Quick start
- Identify the target workspace (e.g.,
2026) or the full LanceDB path. - Ensure Python runs from the repo
.venv(created with eitheruvorpython -m venv). - Prefer a short ad‑hoc Python script that connects to LanceDB and runs the query.
- If the question uses relative dates ("last summer", "yesterday"), convert to absolute dates before querying.
Python environment (required)
If .venv already exists, reuse it:
source .venv/bin/activate
./.venv/bin/python -c "import lancedb; print('ok')"
If .venv is missing, set it up with either:
# Option A: uv-managed venv (recommended)
uv venv
source .venv/bin/activate
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv sync --project python
# Option B: stdlib venv module
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e python
Locate the database
- Workspace root:
workspaces/<name>/ - Database:
workspaces/<name>/db/email_search.lancedb - If
workspaces/<name>/workspace.jsonexists, preferpaths.db.
Direct query recipe (preferred)
Use Python + lancedb to query the tables directly.
import lancedb
from pathlib import Path
db_path = Path("workspaces/2026/db/email_search.lancedb")
db = lancedb.connect(str(db_path))
emails = db.open_table("emails")
# FTS search + filters
rows = (
emails.search("Arkin teacher", query_type="fts")
.where("year = 2026 AND month = 1", prefilter=True)
.limit(50)
.to_list()
)
For deeper body matches, use email_chunks and search chunk_text.
Direct Python Queries (recommended)
Use UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python - <<'PY' to run a short script inline. This enables more complex filtering and joins than the CLI wrapper.
Example: FTS + filter + custom projection
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python - <<'PY'
import lancedb
db = lancedb.connect("workspaces/2026/db/email_search.lancedb")
emails = db.open_table("emails")
rows = (
emails.search("school trip", query_type="fts")
.where("year = 2025 AND month = 9", prefilter=True)
.limit(20)
.to_list()
)
for r in rows:
print(r["date"], r["from_address"], r["subject"])
PY
Example: Pull full emails after chunk hits
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python - <<'PY'
import lancedb
db = lancedb.connect("workspaces/2026/db/email_search.lancedb")
chunks = db.open_table("email_chunks")
emails = db.open_table("emails")
hits = (
chunks.search("refund", query_type="fts")
.where("year = 2024", prefilter=True)
.limit(5)
.to_list()
)
email_ids = {h["email_id"] for h in hits}
full = (
emails.search()
.where("email_id IN (" + ",".join([f\"'{i}'\" for i in email_ids]) + ")", prefilter=True)
.to_list()
)
for r in full:
print(r["subject"])
print(r["body_plain"][:500])
print("----")
PY
Example: Aggregate counts by sender
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python - <<'PY'
import lancedb
from collections import Counter
db = lancedb.connect("workspaces/2026/db/email_search.lancedb")
emails = db.open_table("emails")
rows = (
emails.search("invoice", query_type="fts")
.where("year = 2024", prefilter=True)
.limit(2000)
.to_list()
)
counts = Counter(r["from_address"] for r in rows)
for addr, cnt in counts.most_common(10):
print(cnt, addr)
PY
Optional CLI Script (reference / quick use)
Use this when you need fast counts, snippets, or amount extraction without writing a Python snippet.
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py query --workspace 2026 --query "Arkin teacher"
Commands
Run from repo root:
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py query --workspace 2026 --query "Arkin teacher" --limit 50
Query
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py query \
--workspace 2026 \
--query "Arkin teacher" \
--limit 50 \
--fields date,from_name,from_address,subject,email_id,snippet
Count
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py count \
--workspace 2026 \
--from-like "bob" \
--year 2026 \
--month 2
Sum amounts (costs)
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py sum \
--workspace 2026 \
--query "house painting" \
--start 2025-06-01 \
--end 2025-08-31
Filters (all commands)
--year/--monthfor quick month scoping--start/--endfor date ranges (YYYY-MM-DD or ISO datetime)--from-addressfor exact sender match--from-like,--to-like,--subject-like,--labels-likefor string filters--tablecan beemails(default) oremail_chunks
Output notes
queryemits key fields and a snippet.countreturnscount=and warns if it hits--max-scan.sumextracts currency-like amounts from matching emails and reports totals plus examples.
Find And Display Full Email
Use a two-step flow: locate the matching chunk to get email_id, then fetch the full email body from the emails table.
- Find the matching chunk and capture
email_id:
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py query \
--workspace 2026 \
--table email_chunks \
--query "Born 24 March 2010" \
--limit 5 \
--fields date,from_name,from_address,subject,email_id,chunk_text
- Fetch the full email by
email_id(direct LanceDB query):
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python - <<'PY'
import lancedb
from pathlib import Path
db = lancedb.connect("workspaces/2026/db/email_search.lancedb")
emails = db.open_table("emails")
email_id = "bf509cfe2e3ca574"
rows = (
emails.search()
.where(f"email_id = '{email_id}'", prefilter=True)
.limit(1)
.to_list()
)
print(rows[0]["body_plain"] if rows else "not found")
PY
Alternate: Use ragmail_query.py with --email-id to fetch the full body directly.
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_query.py query \
--workspace 2026 \
--email-id bf509cfe2e3ca574 \
--limit 1 \
--fields date,from_name,from_address,subject,email_id,body_plain
Tip: If the query uniquely identifies the email, you can also ask ragmail_query.py to emit body_plain directly by adding it to --fields.
Schema + indexing reference
See references/db.md for the current schema, FTS columns, workspace layout, and ingestion notes.
Raw data + pipeline locations
When you need deeper context or to verify a record, dig in:
- Raw mbox:
private/gmail-*.mbox - Workspace outputs:
workspaces/<name>/clean,workspaces/<name>/spam,workspaces/<name>/split,workspaces/<name>/reports,workspaces/<name>/logs
When to go deeper
If search results are thin or you need more body context:
- Try
--table email_chunkswith the same query. - Increase
--limitor--max-scan. - Use narrower date ranges.
Attachments (opt-in, index-backed and usually fast)
Only fetch attachments if the user explicitly asks. Attachment extraction now uses split/mbox_index.jsonl (mbox_file + offset + length) to seek directly to one message, so it is usually fast and does not require full MBOX scans.
Preferred flow:
- Check metadata in LanceDB (
has_attachment,attachment_names,attachment_types) to see if attachments exist. - If the user explicitly asks, use the attachment extractor to pull from the indexed message bytes.
- Ensure
workspaces/<name>/split/mbox_index.jsonlexists (created bypreprocess).
Fast metadata check (no MBOX scan)
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python - <<'PY'
import lancedb
db = lancedb.connect("workspaces/2026/db/email_search.lancedb")
emails = db.open_table("emails")
email_id = "bf509cfe2e3ca574"
row = (
emails.search()
.where(f"email_id = '{email_id}'", prefilter=True)
.limit(1)
.to_list()
)
print(row[0]["has_attachment"], row[0]["attachment_names"], row[0]["attachment_types"])
PY
Extract an attachment by Message-ID (optimized)
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_attachments.py \
--workspace 2026 \
--message-id "<abc123@example.com>" \
--out-dir /tmp/attachments
Extract by email_id
UV_PROJECT_ENVIRONMENT=$PWD/.venv uv run --project python python .agents/skills/ragmail/scripts/ragmail_attachments.py \
--workspace 2026 \
--email-id bf509cfe2e3ca574 \
--out-dir /tmp/attachments
Index location (required)
The pipeline creates workspaces/<name>/split/mbox_index.jsonl (during the preprocess stage). Attachment extraction requires this index.
If it’s missing, ask the user to run ragmail pipeline --stages preprocess --workspace <name>.
Keep updated
If schemas, FTS columns, or workspace layouts change, update:
references/db.mdscripts/ragmail_query.py