dlp-test-files

star 3

Generate DLP test files for scanner-efficacy testing — clean carrier files plus "dirty" copies with synthetic sensitive data embedded via multiple hiding techniques. Use when asked to create DLP/data-loss test corpora, embed synthetic sensitive data in PDF/PNG/JPEG/SVG/DOCX, produce clean/dirty test pairs, or build files to exercise a content scanner's DLP detection. Drives the `airs runtime dlp generate` CLI command.

cdot65 By cdot65 schedule Updated 5/28/2026

name: dlp-test-files description: Generate DLP test files for scanner-efficacy testing — clean carrier files plus "dirty" copies with synthetic sensitive data embedded via multiple hiding techniques. Use when asked to create DLP/data-loss test corpora, embed synthetic sensitive data in PDF/PNG/JPEG/SVG/DOCX, produce clean/dirty test pairs, or build files to exercise a content scanner's DLP detection. Drives the airs runtime dlp generate CLI command.

DLP Test-File Generation

Generate paired clean and dirty files to measure how well a content scanner (e.g. Prisma AIRS) detects sensitive data hidden across file formats and embedding channels. All embedded data is synthetic (reserved/test ranges only — never real PII).

This skill drives the airs runtime dlp generate command. Do not hand-roll generators; use the CLI.

When to use

  • "Generate DLP test files / a DLP test corpus"
  • "Embed (synthetic) sensitive data in a PDF / PNG / JPEG / SVG / DOCX"
  • "Create clean and dirty test files for the scanner"
  • Building a batch to score a DLP/guardrail scanner's detection rate

Command

airs runtime dlp generate [options]
Option Default Meaning
--types <list> all Comma list of pdf,png,jpeg,svg,docx (or all)
--count <n> 1 Clean files to generate per type
--out <dir> ./temp Base output directory
--techniques <list> all all or comma list of technique ids (below)
--seed <n> random Seed the synthetic-payload RNG for reproducible runs
--output <fmt> pretty pretty or json summary

Examples

# Full corpus, all formats + techniques, into ./temp
airs runtime dlp generate

# Only images, 3 each, reproducible
airs runtime dlp generate --types png,jpeg,svg --count 3 --seed 42

# Just the PNG steganography variant, JSON summary
airs runtime dlp generate --types png --techniques stego-lsb --output json

Output layout

<out>/
  clean/<type>/<base>.<ext>                 # benign carriers (controls)
  dirty/<type>/<base>__<technique>.<ext>    # one per (clean file × technique)
  manifest.json                             # dirty file -> technique + embedded values

manifest.json records, for every dirty file, the technique used and the exact synthetic values embedded — use it to score scanner hits/misses.

Techniques per format

Format Technique ids
PDF meta, hidden-text, trailer, visible, visible-samecolor
PNG text-chunks, trailer, stego-lsb, visible
JPEG exif, com, trailer, visible
SVG meta, hidden-text, comment, visible
DOCX core-props, hidden-run, visible, visible-samecolor

visible renders text with foreground ≠ background (OCR-able). visible-samecolor (PDF & DOCX) draws body text in the same color as its background — extractable but camouflaged.

Scoring against a scanner

After generating, scan each dirty file's content and compare against the manifest. Example loop with airs runtime scan (text channels; images may need base64 submission per your gateway):

airs runtime dlp generate --types svg --out ./temp --seed 1 --output json
# for each dirty file, submit its content and check whether DLP fired,
# then reconcile with manifest.json entries[].values

Clean files in clean/ are true-negative controls: a correctly-configured DLP profile should allow them and block the matching dirty files.

Install via CLI
npx skills add https://github.com/cdot65/prisma-airs-cli --skill dlp-test-files
Repository Details
star Stars 3
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator