name: pdf-form-filler description: Use this skill whenever the user wants to fill out a PDF form, analyze PDF form fields, troubleshoot form field mapping, verify field identification accuracy, or export a filled PDF. This includes tax forms (IRS 1040, W-2, W-4, W-9, 1099), government applications, contracts, legal documents, insurance forms, medical forms, and any PDF — with or without AcroForm fields. Also triggers when the user reports wrong field mapping ("it put 1f into 1d"), vision analysis failures, wants to improve field extraction accuracy, or needs to fill a flat/non-fillable PDF via annotations. If the user mentions a fillable PDF, asks to populate form fields, or wants to write text onto a flat PDF form, use this skill.
PDF Form Filler Skill
You are an expert PDF form filling system. You fill any PDF with perfect field accuracy using two distinct modes:
- AcroForm mode — PDFs with fillable fields: radial spatial identity, scanline context, Gemini vision, server-side validation
- Flat PDF mode — PDFs with NO fillable fields: content stream geometry scanning, floor/ceiling detection, FreeText annotations with appearance streams
Quick Reference
CRITICAL FILES (read these first when debugging):
src/hooks/usePdfFormFields.js— Field extraction + spatial labelssrc/utils/scanlineAccumulator.js— Radial identity + bbox scoringsrc/utils/formFieldsCsv.js— CSV generation (13 columns)server/src/routes/ai.ts— Server: analyze-form, fill validation, prompt buildersrc/validator.ts— Server-side validation engine (5 rules)src/form-state.ts— Form state simulator (buildFormState, applyFields)src/eval-runner.ts— Eval assertion runner + intent parser
SCRIPTS (run from PDF-Filler-Skill/scripts/):
verify-field-map.ts— Validate field mapping against ground truthdiagnose-field.ts— Debug why a specific field maps wrongdump-field-context.ts— Dump all spatial context for every fieldeval-fill-accuracy.ts— Run automated fill accuracy evallive-fill.ts— End-to-end pipeline: prompt → intent → validate → fill JSONflat-pdf-fill.py— Flat PDF scanner + annotation filler (no AcroForm required)
TESTS (115 tests, 686 assertions):
__tests__/ground-truth.test.ts— 27 tests: field map integrity__tests__/validator.test.ts— 25 tests: all 5 validation rules__tests__/form-state.test.ts— 17 tests: state management__tests__/e2e-scenarios.test.ts— 46 tests: all 6 eval scenarios + filing status matrix
For detailed architecture, see architecture.md.
For field extraction internals, see field-extraction-guide.md.
For validation rules, see validation-rules.md.
For system prompt construction, see system-prompt-reference.md.
For flat PDF filling technique, see docs/flat-pdf-filling-guide.md.
How Form Filling Works
Pipeline Overview
PDF opened in StickyApp
|
v
[1. EXTRACT] pdfjs-dist: getFieldObjects() + getTextContent()
| -> AcroForm fields (name, type, rect, options, maxLen, tooltip)
| -> Text layout (every character with x,y coordinates)
|
v
[2. ENRICH] Three-layer spatial label matching:
| Layer 1: Heuristic directional (findLabelLeft/Above/Below/Right)
| Layer 2: Radial identity (ray cast + fragment clustering + line joining)
| Layer 3: Scanline accumulator (reading-order context + section detection)
| Cross-validation: bboxAdjacencyScore() confirms or overrides matches
|
v
[3. ANALYZE] Gemini vision (optional, auto-triggers if no cached template):
| -> Inject FIELD_IDs into PDF via pdf-lib
| -> Send modified PDF to Gemini page-by-page
| -> Parse: description, section, lineNumber, semanticType, format,
| calculation, dependencies, confidence
| -> Multi-pass: follow-up for missed fields
| -> Cache results as template in IndexedDB
|
v
[4. PROMPT] Build AI system prompt:
| IF vision template: 13-column CSV (compact, semantic)
| ELSE: XML + Field Lookup Index + Text Search Index
|
v
[5. FILL] AI calls fill_form_fields({ fields: {name: value}, reason })
| -> Server validates: unknown names, read-only, checkbox bool,
| dropdown options
| -> Valid fields emitted via SSE to client
| -> Validation errors returned to AI for retry
|
v
[6. EXPORT] pdf-lib: write values -> flatten form -> overlay annotations -> download
Mode 2: Flat PDF Pipeline (No AcroForm Fields)
For PDFs like the IRS W-9 that have NO fillable fields — just static drawings:
PDF with no AcroForm fields
|
v
[1. SCAN] Parse content stream for geometric primitives:
| -> `re` operator: rectangles (checkboxes, digit boxes, borders)
| -> `q 1 0 0 1 TX TY cm ... S Q`: line segments with transforms
|
v
[2. CLASSIFY] Rectangle size → element type:
| -> 8×8 pt = checkbox
| -> 14.4×24 pt = digit input box
| -> Gaps between digit boxes = dash separators
| -> Pattern 3-2-4 = SSN (XXX-XX-XXXX)
| -> Pattern 2-7 = EIN (XX-XXXXXXX)
|
v
[3. DETECT] Floor/ceiling detection:
| -> Extract text label positions via visitor pattern
| -> For each label, scan DOWN through h_lines to find floor (underline)
| -> For each label, scan UP to find ceiling (line above)
| -> Text sits ON the floor line, not below it
|
v
[4. PLACE] FreeText annotations with appearance streams:
| -> Build /AP dictionary with PDF content stream (BT/Tf/Td/Tj/ET)
| -> Register /Helv font in /Resources
| -> Set /F=4 (print flag), /BS /W=0 (no border)
| -> Without /AP, annotations are INVISIBLE in most viewers
|
v
[5. WRITE] pypdf writes annotations to output PDF
Content Stream Scanning
PDF pages are sequences of drawing commands. The re operator draws rectangles:
417.6 396.0 14.4 24.0 re → digit box at (417.6, 396.0), 14.4w × 24h
73.0 603.7 8.0 8.0 re → checkbox at (73.0, 603.7), 8w × 8h
Line segments use transformation matrices inside save/restore pairs:
q 1 0 0 1 543.35 587.97 cm → translate to (543.35, 587.97)
0 0 m → moveto (0,0) in local coords
32.9 0 l → lineto (32.9, 0) — horizontal line
S → stroke
Q → restore state
Real position: horizontal line at y=587.97 from x=543.35 to x=576.25.
Floor Detection Algorithm
From a label position, scan outward then down to find the underline:
- Start at label (x, y) — e.g., "Exempt payee code (if any)" at (457.6, 592.5)
- Scan down through all horizontal lines where:
line.y < label.y(below the label)line.y > label.y - 40(within 40pt)- Line overlaps horizontally with the label's x ± tolerance
- Pick nearest (highest y = closest floor below)
- Place text with annotation rect bottom AT the floor y-coordinate
Result: text sits precisely on the underline, exactly where a human would write.
Appearance Streams (Critical)
pypdf's FreeText() helper creates annotations WITHOUT /AP (appearance dictionary). Most PDF viewers ignore annotations that lack /AP. You MUST build it manually:
# The appearance stream is a mini PDF content stream
ap_content = "BT\n/Helv 10 Tf\n0 g\n2 1.0 Td\n(5) Tj\nET"
ap_stream["/Type"] = "/XObject"
ap_stream["/Subtype"] = "/Form"
ap_stream["/BBox"] = [0, 0, width, height]
ap_stream["/Resources"] = {"/Font": {"/Helv": <Helvetica font dict>}}
annot["/AP"] = {"/N": ap_stream} # /N = Normal appearance
annot["/DA"] = "/Helv 10 Tf 0 g" # Default appearance fallback
annot["/F"] = 4 # Print flag
annot["/BS"] = {"/W": 0} # No border
The Three Enrichment Layers
Layer 1 — Heuristic Directional (usePdfFormFields.js):
- For each field, search LEFT/ABOVE/BELOW/RIGHT for nearest text
- Scoring:
distance * bandOverlapBonus(bbox Y-band overlap breaks ties) - Fast but fails on dense forms where widget Y-offset (0.008) exceeds half the row spacing
Layer 2 — Radial Identity (scanlineAccumulator.js):
- Each field independently casts rays in 4 cardinal directions
- Collects ALL text fragments within ray band, clusters by Y (tolerance: 0.004)
- Joins fragments into full lines: "Your " + "first " + "name" -> "Your first name"
- LEFT ray: Y band = 0.025, picks line with Y >= field center (accounts for widget offset)
- Cross-validated with
bboxAdjacencyScore(field, text, direction):- Score = bandOverlapRatio * proximityScore (exponential decay)
- If bbox finds different label with score > 0.5, bbox wins
- Why radial works: Each field finds its OWN labels, immune to global sort corruption
Layer 3 — Scanline Accumulator (scanlineAccumulator.js):
- Walk all elements (text + fields) in reading order (top-to-bottom, left-to-right)
- Maintain sliding window of last 10 text items ->
accumulatedContext - Detect section headers via regex (
/^(step|part|section|income|deductions|...)/i) - Detect grid regions: 3+ fields per Y-band, 2+ bands -> grid with column headers
- Detect dense-math regions: right column (x > 0.55), clustered calculation fields
- Output: currentSection, regionType, columnHeader, accumulatedContext
Override Priority: scanline lineNumber ALWAYS overrides heuristic lineNumber. Radial identity is immune to the widget Y-offset bug that corrupts heuristic matching on dense forms.
Reading the Form Field Map
Vision-Analyzed Forms (CSV)
When Gemini has analyzed the form, the AI sees a 13-column CSV:
field_name,type,page,line,description,section,semantic_type,format,calculation,dependencies,value,options,status
field_name— AcroForm name. Use EXACTLY this in fill_form_fields.line— Form line label ("1a", "25d"). Reference in conversation.description— What the field is for. AI-verified by Gemini vision.semantic_type— Data category: currency, ssn, ein, date, name, address, city, state, zip, phone, email, percent, integer, decimal, boolean, enum, textformat— Value pattern: "$#,###.##", "###-##-####", "9 digits no dashes", "MM/DD/YYYY"calculation— Formula: "Line 11 = Line 9 - Line 10", "Sum of Lines 1a through 1z"dependencies— Pipe-delimited field names this depends onoptions— Pipe-delimited values for dropdowns/radiosstatus— filled, empty, empty-required
Heuristic Forms (XML)
Without vision analysis, the AI sees verbose XML with navigation indices:
<field name="f1_14" type="text" status="empty" page="1" line="Your first name">
<primary-label>Your first name and middle initial</primary-label>
<label-left>Your first name</label-left>
<nearby-text>Your first name [left, closest] | Last name [right, near]</nearby-text>
<field-right>f1_15</field-right>
<max-length>25</max-length>
</field>
Plus a Text Search Index (visible text -> nearby fields) and Field Lookup Index (line/label -> field_name).
WARNING: On dense forms (IRS 1040 income section), heuristic line attributes may be offset by 1-2 positions. ALWAYS verify with:
<primary-label>and<label-left>text- Text Search Index (source of truth)
- Neighbor tags (
<field-above>,<field-below>)
Filling Strategy
Step 1: Read the Field Map
- Scan ALL fields before filling anything
- Identify: required fields, calculated fields, checkbox groups, dependencies
- Note format constraints (maxLen, semantic_type, format column)
Step 2: Gather User Information
- Ask upfront for all needed data when possible
- Reference fields by line number + description (not cryptic field_name)
- For calculated fields, note which inputs must be filled first
Step 3: Fill Fields
fill_form_fields({
fields: {
"f1_14": "John",
"f1_15": "Doe",
"f1_16": "123456789", // SSN: 9 digits, no dashes (maxLen=9)
"c1_1": true, // Checkbox: boolean only
},
reason: "Filling identification section with user-provided data"
})
Rules:
- Fill ALL determinable fields in ONE tool call for efficiency
- Checkboxes:
trueorfalse(never "Yes", "On", "X") - Dropdowns: exact value from options column
- SSN: check maxLen — if 9, use "123456789" not "123-45-6789"
- Currency: "$1,234.00" unless maxLen forces shorter
- Calculated fields: compute from inputs, don't leave blank
Step 4: Handle Validation Errors
- Valid fields ARE applied even when some fail
- Parse error messages:
- "Unknown field name" -> check field_name column for exact spelling
- "Checkbox - use true or false" -> send boolean, not string
- "Not in allowed options" -> check options column for exact values
- "Read-only" -> skip this field
- Retry ONLY failed fields (don't re-send successful ones)
Debugging Field Mapping Issues
When a field maps to the wrong location (e.g., "1f content appears in 1d"):
Diagnosis Steps
Check the template: Is
visionAnalyzed: true? If not, the AI is using heuristic labels which are unreliable on dense forms.Run the diagnostic script:
bun run PDF-Filler-Skill/scripts/diagnose-field.ts --field="f1_XX" --pdf=path/to/form.pdfThis shows: heuristic label, radial label, scanline context, bbox scores, and what Gemini thinks.
Check scanline override: In
usePdfFormFields.js, scanline lineNumber should ALWAYS override heuristic:const scanLineNum = extractScanlineLineNumber(scanCtx); if (scanLineNum) { pf.scanlineLineNumber = scanLineNum; pf.lineNumber = scanLineNum; // Always override }Check radial identity: The LEFT ray should find the correct label. If not:
- Widget Y-offset may exceed RADIAL_Y_BAND (0.025)
- Row spacing may be < 0.015 (extremely dense)
- Text fragments may not cluster properly
Force re-analysis: Delete the cached template to trigger fresh Gemini analysis.
Common Failure Modes
| Symptom | Root Cause | Fix |
|---|---|---|
| Line number off by 1-2 | Widget Y-offset on dense forms | Radial identity overrides heuristic |
| Wrong section label | Section header regex missed | Add pattern to section detection |
| Missing field in vision | Gemini truncated response | Multi-pass follow-up catches it |
| Checkbox in wrong group | Spatial proximity ambiguity | identifyCheckboxGroups() adds hints |
| Stale template descriptions | Old vision analysis cached | Auto-invalidation clears bad templates |
| Empty labels on scanned PDF | No text layer in PDF | OCR needed (out of scope) |
Known Form Patterns
IRS 1040 (199 fields, 3 pages)
- Page 1: Filing Status (checkboxes), Name/SSN, Address, Standard Deduction, Dependents
- Page 2: Income (Lines 1-9), Adjustments (10-11), Deductions (12-15)
- Page 3: Tax/Credits (16-24), Payments (25-37), Signature
Field naming: f1_XX (page 1 text), f2_XX (page 2 text), c1_XX (page 1 checkboxes), c2_XX (page 2 checkboxes)
SSN fields: maxLen=9, no dashes. Address fields: standard US format.
IRS W-2, W-4, W-9, 1099
See FORM_PAGE_DESCRIPTIONS in server/src/routes/ai.ts for page-level priors.
Eval Framework
Running Evals
# Verify field map against ground truth
bun run PDF-Filler-Skill/scripts/verify-field-map.ts --form=1040
# Test fill accuracy (fills form, checks results)
bun run PDF-Filler-Skill/scripts/eval-fill-accuracy.ts --form=1040 --scenario=basic
# Dump all field context for debugging
bun run PDF-Filler-Skill/scripts/dump-field-context.ts --pdf=path/to/form.pdf
Adding New Form Support
- Create ground truth file:
evals/ground-truth-{form-name}.json - Add page descriptions to
FORM_PAGE_DESCRIPTIONSinai.ts - Run
verify-field-map.tsto check extraction accuracy - Run
eval-fill-accuracy.tsto verify fill correctness - If accuracy < 95%, tune radial constants or add form-specific heuristics