name: pdf-generation-verification description: Generate professional PDF documents from markdown and systematically verify formatting quality. Use when creating business proposals, reports, or formal documents requiring PDF output with proper tables, hyperlinks, and typography. Includes verification workflow to catch font warnings, table wrapping, bare URLs, and formatting issues before presenting to users.
PDF Generation & Verification
Overview
Generate high-quality PDF documents from markdown using pandoc with XeLaTeX, followed by systematic verification to ensure professional formatting. This skill addresses common PDF generation pitfalls: font warnings, awkward table wrapping, visible bare URLs, and undetected formatting issues.
Key principle: Always verify PDF output systematically before presenting to users. Reactive debugging (user finds issues → fix → repeat) wastes time and erodes confidence.
Workflow: Generate-Verify-Fix Loop
Follow this workflow for all PDF generation tasks:
1. Pre-Generation: Prepare Markdown
Before generating the PDF, ensure the source markdown follows best practices:
Check for emoji characters:
# Search for emoji in markdown
grep -P "[\x{1F300}-\x{1F9FF}]|[\x{2600}-\x{26FF}]|[\x{2700}-\x{27BF}]" INPUT.md
Action if found: Remove emoji, replace with text bullets or standard characters (- instead of ✅)
Check for manual section numbering:
- ❌ Manual numbers in headings (
## 1. Introduction) - ✅ Use semantic levels only (
## Introduction) +--number-sectionsflag
Check for bare URLs:
# Find unlinked URLs
grep -E "^https?://" INPUT.md
Action if found: Convert to markdown links: [Descriptive Text](https://url)
Review table structure:
- Avoid 3+ column tables with long text (causes wrapping)
- Consider hierarchical bullet lists for citations instead of tables
- Use 2-column tables when possible
2. Generate PDF
Use the standard pandoc command for business documents:
pandoc INPUT.md -o OUTPUT.pdf \
--pdf-engine=xelatex \
-V geometry:a4paper \
-V geometry:margin=1in \
-V fontsize=11pt \
-V mainfont="DejaVu Sans" \
-V colorlinks=true \
-V linkcolor=blue \
-V urlcolor=blue \
--toc \
--toc-depth=2 \
2>&1 | tee /tmp/pandoc-warnings.txt
Monitor font warnings in real-time:
- Zero warnings = proceed to verification
- Font warnings present = fix before verification (see Troubleshooting)
3. Systematic Verification
CRITICAL: Do not present PDF to user without running verification.
Run the verification script:
scripts/verify-pdf.sh OUTPUT.pdf
The script checks:
- PDF metadata (pages, file size, producer)
- Bare URLs in rendered text (should be 0)
- Table formatting and alignment
- Text wrapping issues (excessive hyphenation)
- Orphaned bullets or list markers
- Document structure (sections, TOC)
Additional quick check for bullet rendering:
# Check for inline dashes (expect 0 matches)
pdftotext OUTPUT.pdf - | grep -E '^\w.*: -'
Review the verification report:
- ✓ marks indicate passing checks
- ⚠ marks indicate warnings requiring review
- ✗ marks indicate failures requiring fixes
4. Visual Confirmation
Open the PDF in a viewer to manually verify:
# macOS
open -a Skim OUTPUT.pdf
# Linux
evince OUTPUT.pdf
Check these critical elements:
- Bullet lists render as bullets (•), not inline dashes (CRITICAL)
- Tables render without text overflow
- All hyperlinks are clickable
- No visible bare URLs
- Font rendering is consistent
- Page breaks are appropriate
- Table of contents links work
5. Fix Issues and Re-verify
If verification fails or visual inspection reveals issues:
- Identify the root cause (font, table structure, URL formatting)
- Fix the source markdown
- Regenerate PDF (step 2)
- Run verification again (step 3)
- Visually confirm (step 4)
Do not present to user until all checks pass.
Font Selection
Recommended fonts for business documents:
DejaVu Sans (default choice)
- Excellent Unicode coverage
- Professional weight
- Zero warnings for standard characters (→, ×, –)
Helvetica Neue (macOS built-in)
- Classic professional appearance
- Limited Unicode support
Georgia (serif option)
- Formal documents
- Good readability
Font warnings to avoid:
[WARNING] Missing character: There is no ✅ (U+2705)
Solution: Remove emoji from source markdown before generation.
Table Formatting Guidelines
When Tables Work Well
Two-column tables with short text:
| Metric | Value |
|--------|-------|
| Time saved | 3000 hours/year |
| Cost | TBD |
Alignment-focused tables:
| Phase | Timeline | Effort |
|-------|----------|--------|
| 1 | 6 months | 540h |
| 2 | 12 months| 750h |
When to Avoid Tables
Three+ column tables with long text:
- Causes awkward mid-phrase wrapping
- Date columns waste horizontal space
- Description text breaks incorrectly
Better alternative - Hierarchical bullet lists:
**October 2025 Reforms:**
- ["How to Prepare for the 2025 NSW Strata Law Changes"](https://...) (Oct 27, 2025)
- ["NSW Strata Law Changes 2025 – What Owners Need to Know"](https://...) (Oct 15, 2025)
**Insurance Crisis:**
- ["Preparing for 2025 Insurance Renewals"](https://...) (Oct 30, 2024) – *Budget 15-20% minimum*
Advantages:
- No text wrapping issues
- Clean visual hierarchy
- Dates in natural position
- Key insights in italics
Hyperlinks & Citations
Best Practices
Embed all URLs as markdown links:
[Descriptive Title](https://example.com/long-url)
Never use bare URLs:
❌ https://example.com/long-url
Chronological citations format:
- ["Document Title"](https://url) (Oct 27, 2025) – *Key insight or context*
Verification
# Check for bare URLs in generated PDF
pdftotext OUTPUT.pdf - | grep -c "https://"
Expected result: 0 (all URLs should be embedded as clickable links)
Troubleshooting
Bullet Lists Rendering as Inline Text (CRITICAL)
Symptom: Bullet lists appear as inline text with dashes instead of proper bullets (•)
Multi-layer validation frameworks: - HTTP/API layer validation - Schema validation...
Root Cause: LaTeX's default justified text alignment breaks Pandoc-generated bullet lists.
Solutions:
- Always use
\raggedrightin LaTeX preamble (see references/pdf-best-practices.md) - Use canonical build script:
~/.claude/skills/pandoc-pdf-generation/assets/build-pdf.sh - Never create ad-hoc Pandoc commands without LaTeX preamble
Verification:
# Check for inline dashes (expect 0 matches)
pdftotext OUTPUT.pdf - | grep -E '^\w.*: -'
Why This Happens:
- Justified text tries to make lines equal width
- LaTeX may reflow text, breaking list structures
- Lists following paragraphs with colons are especially vulnerable
\raggedrightdisables justification, preserving formatting
Reference: See ~/.claude/skills/pandoc-pdf-generation/references/troubleshooting-pandoc.md → "Bullet Lists Rendering as Inline Text" section for detailed analysis.
Font Warnings During Generation
Symptom:
[WARNING] Missing character: There is no → (U+2192) in font Open Sans
Solutions:
- Switch to DejaVu Sans:
-V mainfont="DejaVu Sans" - Remove problematic characters from source markdown
- Use ASCII alternatives (-> instead of →)
Ugly Table Wrapping in PDF
Symptom: Date column breaks mid-line, topic text has awkward breaks
Solutions:
- Convert table to hierarchical bullet list (recommended)
- Reduce to 2-column table
- Shorten column content
- Use landscape orientation:
-V geometry:landscape
Bare URLs Visible in PDF
Symptom: https://example.com/path appears as plain text
Solutions:
- Convert to markdown link:
[Title](https://example.com/path) - Verify fix:
pdftotext OUTPUT.pdf - | grep "https://" - Regenerate PDF
Large File Size
Symptom: PDF >1MB for text-only document
Solutions:
- Check for accidentally embedded images
- Compress with Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=compressed.pdf OUTPUT.pdf
Resources
scripts/verify-pdf.sh
Comprehensive PDF verification script that checks:
- PDF metadata (pages, size, producer)
- Bare URL presence
- Table formatting issues
- Text wrapping problems
- Orphaned bullets
- Document structure
Usage:
scripts/verify-pdf.sh OUTPUT.pdf
references/pdf-best-practices.md
Detailed reference documentation covering:
- Font selection criteria and recommendations
- Table formatting patterns and anti-patterns
- Hyperlink best practices
- Pandoc command templates and options
- Common issues with detailed solutions
- Tool installation instructions
Load when:
- Troubleshooting complex formatting issues
- Selecting fonts for specific requirements
- Optimizing pandoc command for specific document types
Quick Reference
Standard Business Document Command
pandoc INPUT.md -o OUTPUT.pdf \
--pdf-engine=xelatex \
-V geometry:a4paper \
-V geometry:margin=1in \
-V fontsize=11pt \
-V mainfont="DejaVu Sans" \
-V colorlinks=true \
-V linkcolor=blue \
-V urlcolor=blue \
--toc \
--toc-depth=2
Verification Command
scripts/verify-pdf.sh OUTPUT.pdf
Quick Checks
# Font warnings during generation
pandoc INPUT.md -o OUTPUT.pdf ... 2>&1 | grep WARNING
# Bare URLs in PDF
pdftotext OUTPUT.pdf - | grep -c "https://"
# PDF metadata
pdfinfo OUTPUT.pdf | grep -E "Pages|File size"