name: clausa
description: >-
  Decodes any insurance policy or quote (auto, home, renters, health, life, or
  business) from a PDF, image, or pasted text into a plain-English breakdown:
  what is covered, what is excluded, coverage gaps, whether the user is over-
  or under-insured, claim-filing gotchas, and the exact questions to ask their
  agent. Can compare two quotes or policies side-by-side. Use whenever a user
  uploads, pastes, or asks about an insurance policy, declarations page, quote,
  Evidence of Coverage, Summary of Benefits, or wants help understanding,
  comparing, or finding gaps in their coverage. Educational analysis only, not
  insurance, legal, or financial advice.
license: MIT
allowed-tools: Read, Write, Bash(python:), Bash(python3:), Glob
Clausa — insurance policy decoder
Turn a dense, jargon-filled insurance document into a clear, expert-grade breakdown a normal person can act on. The value of this skill is the encoded underwriting know-how — coverage taxonomies, exclusion patterns, gap heuristics, and claim pitfalls — not generic summarization. Follow the pipeline exactly; it is what separates this from "paste a PDF into a chatbot."
Golden rules (read first)
- Educational, not advice. Never tell the user what to buy, drop, or sign. You explain, surface gaps, and give them questions to ask a licensed agent. Every report ends with the disclaimer in references/workflow.md.
- Protect their data: local-first, de-identified. Insurance docs are full of PII (SSN, DOB, policy numbers, VIN, addresses, bank details). The analysis runs on a de-identified copy: coverage limits, deductibles, and exclusions are not PII, so masking loses nothing. Always start with scripts/decode.py, which extracts and masks in one step, so the model only ever sees masked text, whether it runs locally or remotely. NEVER echo full identifiers back. Do not retain the user's document; persist only the de-identified profile if they want reminders. See references/privacy-and-security.md.
- Never invent coverage facts. If a limit, deductible, or exclusion is not in the document, say "Not found in the document"; do not assume defaults. Flag low-confidence extractions explicitly.
- Anchor every claim to the document. When you state a limit or exclusion, cite where it came from (e.g., "Declarations p.1, Coverage A" or "Exclusions §B.4"). This is how the user verifies you, and how you stay honest.
- Heuristics are rules of thumb, not rules. Recommended limits, replacement ratios, and "need" formulas vary by state, insurer, and personal situation. Always label them as general benchmarks and tell the user to confirm.
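
The de-identification rule above can be illustrated with a minimal sketch. This is not the actual scripts/redact_pii.py: the two patterns, the placeholder labels, and the `mask_pii` name are assumptions chosen for illustration, and a real policy document would need much broader coverage (addresses, bank details, policy numbers).

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; the real redact_pii.py may differ.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "VIN": re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b"),
}


def mask_pii(text: str) -> tuple[str, Counter]:
    """Replace each match with a typed placeholder and count the redactions."""
    counts: Counter = Counter()
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] += n
    return text, counts


sample = "Insured SSN 123-45-6789, VIN 1HGCM82633A004352. Coverage A limit: $350,000."
masked, counts = mask_pii(sample)
print(masked)        # identifiers are masked, the coverage limit is untouched
print(dict(counts))  # per-type redaction counts
```

The point of the sketch is that limits and deductibles pass through unchanged, so the analysis loses nothing when it runs on the masked copy.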
What this skill handles
| Policy type | Document names you'll see |
|---|---|
| Auto | Declarations page, Auto Policy, Coverage Selections |
| Homeowners (HO-3/5/6/8) | Dec page, Homeowners Policy, HO-3 booklet |
| Renters (HO-4) | Renters Policy, Tenant Dec page |
| Health | Summary of Benefits and Coverage (SBC), Evidence of Coverage (EOC), plan summary |
| Life | Term/Whole/Universal Life Policy, illustration, annual statement |
| Business | BOP, General Liability, Commercial Property, Professional Liability/E&O, Workers' Comp, Cyber |
| Umbrella | Personal/Commercial Umbrella / Excess Liability |
If the document is a type not listed (e.g., pet, travel, disability, long-term care), still apply the generic pipeline: extract limits/exclusions/conditions, then reason from first principles and clearly note it's outside the core library.
The pipeline (always follow in order)
The full step-by-step is in references/workflow.md. In brief:
- Intake & safety: identify the file, run extraction + PII redaction.
- Classify: policy type, form, ACV vs RCV, in/out-of-network, etc.
- Extract: pull every coverage, limit, sublimit, deductible, premium, condition, and exclusion into the structure in assets/schema.json.
- Decode: translate each line into plain English using references/coverage-frameworks.md and references/exclusions-glossary.md.
- Gap analysis: compare limits to benchmarks in references/gap-analysis.md; flag under- and over-insurance.
- Risk flags: surface predatory or unusual clauses via references/red-flags.md and claim traps via references/claim-gotchas.md.
- Questions to ask: generate a tailored list from references/questions-to-ask.md.
- Deliver: render the report (Markdown by default; offer the polished HTML in assets/report_template.html).
For two-document comparisons, use references/comparison.md.
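
The extract and gap-analysis steps, together with the "never invent" and "anchor every claim" rules, might look roughly like the sketch below. The `ExtractedLimit` fields are illustrative and do not reflect the real assets/schema.json, and the benchmark passed in is a placeholder number, not a value from references/gap-analysis.md.

```python
from dataclasses import dataclass
from typing import Optional

NOT_FOUND = "Not found in the document"


@dataclass
class ExtractedLimit:
    """One extracted coverage line; field names are hypothetical."""
    coverage: str
    limit: Optional[int]      # dollars, or None when the document is silent
    citation: Optional[str]   # e.g. "Declarations p.1, Coverage A"


def describe(item: ExtractedLimit) -> str:
    """Render a limit with its source, never substituting a default for a missing value."""
    if item.limit is None:
        return f"{item.coverage}: {NOT_FOUND}"
    return f"{item.coverage}: ${item.limit:,} ({item.citation})"


def gap_flag(item: ExtractedLimit, benchmark: int) -> Optional[str]:
    """Compare a limit to a caller-supplied general benchmark; skip missing limits."""
    if item.limit is None or item.limit >= benchmark:
        return None
    shortfall = benchmark - item.limit
    return f"{item.coverage} is ${shortfall:,} below the general benchmark; confirm with your agent."


dwelling = ExtractedLimit("Dwelling", 350_000, "Declarations p.1, Coverage A")
flood = ExtractedLimit("Flood", None, None)
print(describe(dwelling))
print(describe(flood))
print(gap_flag(dwelling, benchmark=400_000))  # 400_000 is a placeholder, not a real benchmark
```

Keeping the citation on the same record as the limit makes it hard to emit a figure without saying where it came from.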
Reference library
Load only what the policy type requires — don't read all of these every time.
- references/workflow.md: the authoritative step-by-step + disclaimer text.
- references/coverage-frameworks.md: coverage taxonomy per policy type.
- references/exclusions-glossary.md: common exclusions in plain English.
- references/gap-analysis.md: under/over-insurance benchmarks & need formulas.
- references/claim-gotchas.md: claim-filing pitfalls per policy type.
- references/questions-to-ask.md: agent-question banks per policy type.
- references/red-flags.md: predatory / unusual clauses to flag.
- references/comparison.md: side-by-side methodology for 2+ documents.
- references/privacy-and-security.md: local-first guarantee, de-identification, least-privilege tools, data lifecycle, stability guards.
- references/memory-and-followups.md: persistent profile, year-over-year diff, calendar/tasks/email automation.
Scripts
Run with the project's Python. redact_pii.py is pure stdlib; the others use
PyMuPDF (pip install pymupdf); OCR is optional (tesseract). All are
network-free by design.
- scripts/decode.py <file.pdf> [--workdir DIR] [--password PW] [--ocr]: Start here. One-command, privacy-first front door: extract → de-identify → emit safe text + a manifest.json (integrity hash, redaction counts). Its stdout is the masked text you feed to the model. Use --keep-full only if the user explicitly wants the raw text on disk.
- scripts/followups.py <decoded.json> [--outdir DIR] [--reminder-days N]: After analysis, emits renewal.ics (Calendar), tasks.json (Notes & Tasks), agent_email.txt (draft, never auto-sent), and a de-identified profile.json for local memory.
- scripts/extract_policy.py / scripts/redact_pii.py: the lower-level steps decode.py composes; call directly only when you need them standalone.
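
A manifest carrying an integrity hash and redaction counts could be assembled along these lines. This is a sketch only: the key names, the `build_manifest` helper, and the choice of SHA-256 are assumptions, since the actual manifest.json layout written by decode.py is not specified here.

```python
import hashlib
import json


def build_manifest(masked_text: str, redaction_counts: dict[str, int]) -> dict:
    """Return a manifest with a SHA-256 of the masked text and per-type redaction counts."""
    digest = hashlib.sha256(masked_text.encode("utf-8")).hexdigest()
    return {
        "sha256": digest,                      # hypothetical key name
        "redactions": dict(redaction_counts),  # hypothetical key name
        "total_redactions": sum(redaction_counts.values()),
    }


manifest = build_manifest("Insured SSN [SSN]. Coverage A limit: $350,000.", {"SSN": 1})
print(json.dumps(manifest, indent=2))
```

Hashing the masked text rather than the raw document means the manifest itself can be kept without retaining anything derived from unmasked PII.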
End-to-end flow (local-first)
- decode.py policy.pdf --workdir run/ → de-identified text + manifest.
- Fill assets/schema.json from the masked text → run/decoded.json.
- Produce the report (Markdown or assets/report_template.html).
- followups.py run/decoded.json --outdir run/followups → calendar + tasks + email draft + memory profile.
- If a prior profile.json exists in memory, load it first and frame the new analysis as a year-over-year comparison (see memory-and-followups.md).
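
The year-over-year comparison in the last step can be sketched as a simple diff of two de-identified profiles. The flat "coverage name to dollar limit" shape used here is an assumption for illustration; the real profile.json structure is described in memory-and-followups.md and may be richer.

```python
def diff_profiles(prior: dict[str, int], current: dict[str, int]) -> list[str]:
    """List coverages whose limits were added, removed, or changed between two terms."""
    changes = []
    for name in sorted(set(prior) | set(current)):
        old, new = prior.get(name), current.get(name)
        if old == new:
            continue
        if old is None:
            changes.append(f"{name}: added at ${new:,}")
        elif new is None:
            changes.append(f"{name}: removed (was ${old:,})")
        else:
            changes.append(f"{name}: ${old:,} -> ${new:,}")
    return changes


# Hypothetical sample profiles, not taken from any real policy.
last_year = {"Dwelling": 300_000, "Jewelry rider": 5_000}
this_year = {"Dwelling": 350_000, "Water backup": 10_000}
for line in diff_profiles(last_year, this_year):
    print(line)
```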
Config defaults live in config.example.json (privacy, model mode, OCR,
follow-up channels). Honor it if a config.json is present.
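
One plausible way to honor config.json is to start from the example defaults and overlay the user's file when it exists. The sketch below assumes a shallow merge and uses made-up keys (`ocr`, `reminder_days`); neither the merge strategy nor the key names are confirmed by the package.

```python
import json
import tempfile
from pathlib import Path


def load_config(root: Path) -> dict:
    """Start from config.example.json defaults and overlay config.json when present."""
    config = json.loads((root / "config.example.json").read_text(encoding="utf-8"))
    override = root / "config.json"
    if override.exists():
        # Shallow merge (an assumption): user keys replace default keys one by one.
        config.update(json.loads(override.read_text(encoding="utf-8")))
    return config


# Demo in a temporary directory with hypothetical keys.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config.example.json").write_text(json.dumps({"ocr": False, "reminder_days": 30}))
    (root / "config.json").write_text(json.dumps({"ocr": True}))
    print(load_config(root))
```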
Optional self-hosted UI (rich frontend)
Most users drive this skill through their assistant's chat. For users who prefer a rich frontend, the package ships a self-hosted web UI:
- python scripts/serve.py → http://127.0.0.1:8765 (localhost-only).
- ui/index.html: drag-drop a PDF, see the de-identified preview + masking count, analyze against a local OpenAI-compatible model (Ollama / vLLM / llama.cpp), and view the interactive report (covered / not-covered / gaps / gotchas / questions) with one-click downloads for the calendar .ics, email draft, task list, and memory profile.
- Local-first: bound to localhost; the only outbound call is to the local model endpoint the user configures. If no model is reachable, the UI still shows the de-identified text to copy into the workspace chat.
The UI uses a compact built-in prompt; the deepest analysis still comes from
running the skill inside an assistant that loads the full references/ library.
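
The localhost-only guarantee comes down to which interface the server binds to. The sketch below is not scripts/serve.py; it only shows, using the standard library, how binding to 127.0.0.1 keeps a server unreachable from other machines. The `make_server` helper is a hypothetical name.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

HOST, PORT = "127.0.0.1", 8765  # loopback address and port named in the text


def make_server(port: int = PORT) -> HTTPServer:
    """Create a server bound to the loopback interface only."""
    return HTTPServer((HOST, port), SimpleHTTPRequestHandler)


# Demo: port 0 lets the OS pick a free port, so this does not collide with a running UI.
server = make_server(port=0)
print(server.server_address[0])  # the bound interface
server.server_close()
# To actually serve, call make_server().serve_forever() instead of closing.
```

Binding to "0.0.0.0" instead would expose the UI, and the de-identified previews it shows, to the whole network, which is what the loopback address avoids.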
Output contract
Default deliverable, in this order:
- Snapshot — policy type, insurer, term dates, total premium (masked IDs).
- What you're covered for — each coverage, the limit, in one plain sentence.
- What's NOT covered — material exclusions that would surprise a layperson.
- Coverage gaps ⚠️ — ranked, with the benchmark and why it matters.
- Over-insured / wasted spend 💸 — overlaps, redundant riders, high limits.
- Claim gotchas — conditions that can get a claim denied or reduced.
- Questions for your agent — 5–10 specific, document-aware questions.
- Disclaimer: verbatim from references/workflow.md.
Keep it skimmable: tables for coverages, bold the dollar figures, lead each gap with a one-line "why this matters."
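
As a rough illustration of the formatting guidance, the coverage section could be rendered as a Markdown table with bold dollar figures. The `coverage_table` helper, the column headings, and the sample rows are assumptions; the actual report layout is defined by references/workflow.md and assets/report_template.html.

```python
def coverage_table(rows: list[tuple[str, int, str]]) -> str:
    """Render (coverage, limit, plain-English note) rows as a Markdown table with bold limits."""
    lines = ["| Coverage | Limit | What it means |", "|---|---|---|"]
    for name, limit, note in rows:
        lines.append(f"| {name} | **${limit:,}** | {note} |")
    return "\n".join(lines)


# Hypothetical sample rows for illustration.
print(coverage_table([
    ("Dwelling (Coverage A)", 350_000, "Pays to rebuild the house after a covered loss."),
    ("Personal liability", 100_000, "Pays if you are sued over an injury or damage."),
]))
```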