name: anon
description: De-identify a session transcript (file or folder) by redacting PII LOCALLY before any sharing or cloud use. Produces a redacted GREEN copy with unique reserved-sentinel placeholders ([CONFIDE_PERSON_0001], [CONFIDE_EMAIL_0001], [CONFIDE_DATE_0002]...) plus a counts-only stats summary, and a local secret .map.json (0600, gitignored) that enables confide:rehydrate to restore real values after a cloud analysis. Use when the user says "anonymize this transcript", "redact PII", "de-identify session", "make safe to share", "strip personal data", "anonymize notes before sending to an LLM", or points at a transcript/folder that should be scrubbed. Local-only by default — raw text never leaves the machine; the map is the only artifact with originals and stays local; nothing printed is PII; human review is still required before sharing.
confide:anon — local PII redaction
Redact personally identifying information from a transcript (or a whole folder) using the
layered local stack in shared/confide_core.py: regex (emails / URLs / phones / IDs / dates)
→ Natasha (RU named entities) → local LLM (quasi-identifiers). Spans are interval-merged and
replaced with placeholders. The result is a GREEN copy safe to review.
By default anon emits a reversible map: unique reserved-sentinel placeholders (the same
EXACT value always becomes the same [CONFIDE_<TYPE>_<NNNN>], e.g. [CONFIDE_PERSON_0001],
[CONFIDE_EMAIL_0001], [CONFIDE_DATE_0002]) plus a sibling <name>.map.json (structured:
schema_version, doc_id, green_sha256, created, entries[]) mapping each placeholder to
its original. The CONFIDE_ sentinel is reserved — a real transcript essentially never
contains it, so there is no collision risk and rehydrate never touches ordinary prose like
"Person 1". This is exact-value matching, not entity coreference: inflected forms (e.g. RU
"Марина" vs "Марины") are SEPARATE placeholders (no lemmatized merge). That map is the
secret — the ONLY artifact with originals; it stays local and enables
confide:rehydrate to put real values back into a cloud analysis of the GREEN text
(round-trip: redact → analyze the green → rehydrate locally). Use --no-map for the legacy
non-reversible [TYPE] style (no map written).
Privacy invariants (do not violate)
- Local-only. No cloud APIs. Raw text never leaves the machine.
- By design, original PII is written ONLY to the local, 0600, gitignored
<name>.map.json, which never leaves the machine. It is never printed, and never written to the GREEN copy (the GREEN holds placeholders only; the original file is read, never rewritten). The map is the SECRET — the one artifact with originals. A.gitignorecovering*.map.json,*.view.html, and*.restored.mdis written/updated in the output dir so these local-only artifacts can never be committed. If the output dir looks cloud-synced (iCloud / Dropbox / OneDrive / Google Drive), anon prints a WARNING that the secret map would be uploaded. - Counts only. stdout and the
*.stats.jsonfiles carry counts (by type, by layer, redaction rate) — never PII values or redacted text dumps. - Human review still required. Redaction is a floor, not a guarantee. A human must read the GREEN copy before sharing. Pair with confide:red to check residual re-identification risk.
Run it
Run the script on a single file or a directory (processes every .md/.txt):
python3 skills/anon/scripts/anon.py PATH
For each input it writes, next to the file (or into --out DIR):
<name>.green.md— the redacted text (the only thing safe to look at / share after review)<name>.stats.json— counts only<name>.map.json— the reversible map (secret; 0600; gitignored; local only). Skipped with--no-map. A.gitignorewith*.map.jsonis also written/updated in the output dir.
Options:
--layers regex,natasha,llm— override which detection layers run (default from config). Use--layers regexfor a fully offline, deterministic pass (no models/network).--out DIR— write outputs to DIR instead of next to each input.--dry-run— compute and print stats only; write no files.--no-map— disable the reversible map; emit non-unique[TYPE]placeholders, no map.json.
Already-emitted *.green.md / *.stats.json / *.map.json are skipped, so a folder can be
re-run safely.
After running
- Report the counts summary (types, layers, redaction rate) — never paste PII.
- Tell the user the GREEN copy still needs human review before sharing.
- Remind them the
<name>.map.jsonis the secret (originals) — it stays local, never committed/shipped — and that confide:rehydrate uses it to restore real values into a cloud analysis of the GREEN text. - Offer confide:red to probe what an attacker could still infer/link.
Setup
Layer availability (Natasha, local LLM via Ollama) and defaults come from config — run
confide:setup first if Natasha/Ollama aren't installed. --layers regex always works offline.