clausa - SKILL.md Agent Skill

name: clausa description: >- Decodes any insurance policy or quote (auto, home, renters, health, life, or business) from a PDF, image, or pasted text into a plain-English breakdown: what is covered, what is excluded, coverage gaps, whether the user is over- or under-insured, claim-filing gotchas, and the exact questions to ask their agent. Can compare two quotes or policies side-by-side. Use whenever a user uploads, pastes, or asks about an insurance policy, declarations page, quote, Evidence of Coverage, Summary of Benefits, or wants help understanding, comparing, or finding gaps in their coverage. Educational analysis only — not insurance, legal, or financial advice. license: MIT allowed-tools: Read, Write, Bash(python:), Bash(python3:), Glob

Clausa — insurance policy decoder

Turn a dense, jargon-filled insurance document into a clear, expert-grade breakdown a normal person can act on. The value of this skill is the encoded underwriting know-how — coverage taxonomies, exclusion patterns, gap heuristics, and claim pitfalls — not generic summarization. Follow the pipeline exactly; it is what separates this from "paste a PDF into a chatbot."

Golden rules (read first)

Educational, not advice. Never tell the user what to buy, drop, or sign. You explain, surface gaps, and give them questions to ask a licensed agent. Every report ends with the disclaimer in references/workflow.md.
Protect their data — local-first, de-identified. Insurance docs are full of PII (SSN, DOB, policy numbers, VIN, addresses, bank details). The analysis runs on a de-identified copy: coverage limits/deductibles/exclusions are not PII, so masking loses nothing. Always start with scripts/decode.py, which extracts and masks in one step — the model only ever sees masked text, local or remote. NEVER echo full identifiers back. Do not retain the user's document; persist only the de-identified profile if they want reminders. See references/privacy-and-security.md.
Never invent coverage facts. If a limit, deductible, or exclusion is not in the document, say "Not found in the document" — do not assume defaults. Flag low-confidence extractions explicitly.
Anchor every claim to the document. When you state a limit or exclusion, cite where it came from (e.g., "Declarations p.1, Coverage A" or "Exclusions §B.4"). This is how the user verifies you, and how you stay honest.
Heuristics are rules of thumb, not rules. Recommended limits, replacement ratios, and "need" formulas vary by state, insurer, and personal situation. Always label them as general benchmarks and tell the user to confirm.

What this skill handles

Policy type	Document names you'll see
Auto	Declarations page, Auto Policy, Coverage Selections
Homeowners (HO-3/5/6/8)	Dec page, Homeowners Policy, HO-3 booklet
Renters (HO-4)	Renters Policy, Tenant Dec page
Health	Summary of Benefits and Coverage (SBC), Evidence of Coverage (EOC), plan summary
Life	Term/Whole/Universal Life Policy, illustration, annual statement
Business	BOP, General Liability, Commercial Property, Professional Liability/E&O, Workers' Comp, Cyber
Umbrella	Personal/Commercial Umbrella / Excess Liability

If the document is a type not listed (e.g., pet, travel, disability, long-term care), still apply the generic pipeline: extract limits/exclusions/conditions, then reason from first principles and clearly note it's outside the core library.

The pipeline (always follow in order)

The full step-by-step is in references/workflow.md. In brief:

Intake & safety — identify the file, run extraction + PII redaction.
Classify — policy type, form, ACV vs RCV, in/out-of-network, etc.
Extract — pull every coverage, limit, sublimit, deductible, premium, condition, and exclusion into the structure in assets/schema.json.
Decode — translate each line into plain English using references/coverage-frameworks.md and references/exclusions-glossary.md.
Gap analysis — compare limits to benchmarks in references/gap-analysis.md; flag under- and over-insurance.
Risk flags — surface predatory or unusual clauses via references/red-flags.md and claim traps via references/claim-gotchas.md.
Questions to ask — generate a tailored list from references/questions-to-ask.md.
Deliver — render the report (Markdown by default; offer the polished HTML in assets/report_template.html).

For two-document comparisons, use references/comparison.md.

Reference library

Load only what the policy type requires — don't read all of these every time.

references/workflow.md — the authoritative step-by-step + disclaimer text.
references/coverage-frameworks.md — coverage taxonomy per policy type.
references/exclusions-glossary.md — common exclusions in plain English.
references/gap-analysis.md — under/over-insurance benchmarks & need formulas.
references/claim-gotchas.md — claim-filing pitfalls per policy type.
references/questions-to-ask.md — agent-question banks per policy type.
references/red-flags.md — predatory / unusual clauses to flag.
references/comparison.md — side-by-side methodology for 2+ documents.
references/privacy-and-security.md — local-first guarantee, de-identification, least-privilege tools, data lifecycle, stability guards.
references/memory-and-followups.md — persistent profile, year-over-year diff, calendar/tasks/email automation.

Scripts

Run with the project's Python. redact_pii.py is pure stdlib; the others use PyMuPDF (pip install pymupdf); OCR is optional (tesseract). All are network-free by design.

scripts/decode.py <file.pdf> [--workdir DIR] [--password PW] [--ocr] Start here. One-command, privacy-first front door: extract → de-identify → emit safe text + a manifest.json (integrity hash, redaction counts). Its stdout is the masked text you feed to the model. Use --keep-full only if the user explicitly wants the raw text on disk.
scripts/followups.py <decoded.json> [--outdir DIR] [--reminder-days N] After analysis, emits renewal.ics (Calendar), tasks.json (Notes & Tasks), agent_email.txt (draft, never auto-sent), and a de-identified profile.json for local memory.
scripts/extract_policy.py / scripts/redact_pii.py — the lower-level steps decode.py composes; call directly only when you need them standalone.

End-to-end flow (local-first)

decode.py policy.pdf --workdir run/ → de-identified text + manifest.
Fill assets/schema.json from the masked text → run/decoded.json.
Produce the report (Markdown or assets/report_template.html).
followups.py run/decoded.json --outdir run/followups → calendar + tasks + email draft + memory profile.
If a prior profile.json exists in memory, load it first and frame the new analysis as a year-over-year comparison (see memory-and-followups.md).

Config defaults live in config.example.json (privacy, model mode, OCR, follow-up channels). Honor it if a config.json is present.

Optional self-hosted UI (rich frontend)

Most users drive this skill through their assistant's chat. For users who prefer a rich frontend, the package ships a self-hosted web UI:

python scripts/serve.py → http://127.0.0.1:8765 (localhost-only).
ui/index.html — drag-drop a PDF, see the de-identified preview + masking count, analyze against a local OpenAI-compatible model (Ollama / vLLM / llama.cpp), and view the interactive report (covered / not-covered / gaps / gotchas / questions) with one-click downloads for the calendar .ics, email draft, task list, and memory profile.
Local-first: bound to localhost; the only outbound call is to the local model endpoint the user configures. If no model is reachable, the UI still shows the de-identified text to copy into the workspace chat.

The UI uses a compact built-in prompt; the deepest analysis still comes from running the skill inside an assistant that loads the full references/ library.

Output contract

Default deliverable, in this order:

Snapshot — policy type, insurer, term dates, total premium (masked IDs).
What you're covered for — each coverage, the limit, in one plain sentence.
What's NOT covered — material exclusions that would surprise a layperson.
Coverage gaps ⚠️ — ranked, with the benchmark and why it matters.
Over-insured / wasted spend 💸 — overlaps, redundant riders, high limits.
Claim gotchas — conditions that can get a claim denied or reduced.
Questions for your agent — 5–10 specific, document-aware questions.
Disclaimer — verbatim from references/workflow.md.

Keep it skimmable: tables for coverages, bold the dollar figures, lead each gap with a one-line "why this matters."