name: pandoc
description: >
  Universal document converter: format conversion (Markdown, HTML, LaTeX, docx, PDF, EPUB, presentations), Lua filters, templates, citations, YAML metadata, and document pipelines. Use when converting between document formats, generating PDFs, extracting content, or programmatically transforming documents.
Pandoc Skill
Pandoc is a universal document converter. It reads many input formats, transforms them through an AST, and writes to many output formats. Agents use it primarily for: format conversion, PDF generation, content extraction, and document pipelines with Lua filters.
Version: targets pandoc 3.x (tested on 3.6). Features like --sandbox,
--variable-json, and the modern Table AST require pandoc ≥ 3.0.
Official docs: https://pandoc.org/MANUAL.html
Decision Guide
| I want to... | Go to | Key command |
|---|---|---|
| Convert Markdown to HTML | §3 HTML Output | pandoc doc.md -s -o doc.html |
| Convert Markdown to PDF | §2 PDF Generation | pandoc doc.md -o doc.pdf |
| Convert Word docx to Markdown | §4 Docx Output | pandoc lua scripts/pandoc-docx-to-md.lua file.docx |
| Convert HTML to Markdown | §1 Format Matrix | pandoc -f html -t markdown page.html -o page.md |
| Convert Markdown to Word | §4 Docx Output | pandoc doc.md -o doc.docx |
| Merge chapters into a book | §8 Multi-File & Jupyter | pandoc ch*.md metadata.yaml -s --toc -o book.html |
| Convert Jupyter notebook | §8 Multi-File & Jupyter | pandoc analysis.ipynb -s -o report.html |
| Generate PDF without LaTeX | §9 Typst Output | pandoc doc.md --pdf-engine=typst -o doc.pdf |
| Create a presentation | §5 Presentations | pandoc slides.md -t revealjs -s -o slides.html |
| Create an EPUB | §7 EPUB Generation | pandoc ch1.md ch2.md -o book.epub |
| Write a Lua filter | §6 Lua Filters | pandoc --lua-filter=my.lua doc.md -o out.html |
| Add citations/bibliography | §12 Citations | pandoc doc.md --citeproc --bibliography=refs.bib -o out.html |
| Process untrusted input | §19 Sandbox | pandoc --sandbox input.md -o output.html |
| Customize PDF/docx styles | §11 Templates | --reference-doc=template.docx or --template=custom.tex |
| Extract content from docx | §13 Extraction | pandoc input.docx --extract-media=./media -t markdown |
| Batch convert many files | §15 Pipelines | pandoc lua scripts/pandoc-batch-convert.lua md html -s |
| Use reusable defaults | §14 Defaults | pandoc -d references/defaults/pdf-report.yaml doc.md |
| Debug a conversion problem | §20 Troubleshooting | Check exit codes, inspect AST, verify filters |
| Diagnose pandoc install | §18 Helper Scripts | pandoc lua scripts/pandoc-doctor.lua |
| Run pandoc Lua one-liner | below | pandoc lua -e 'print(PANDOC_VERSION)' |
pandoc lua One-Liners
# Print pandoc version
pandoc lua -e 'print(tostring(PANDOC_VERSION))'
# Parse markdown to JSON AST
pandoc lua -e 'io.stdout:write(pandoc.write(pandoc.read(io.stdin:read("*a")), "json"))'
# List available output formats
pandoc lua -e 'for f in pairs(pandoc.writers) do print(f) end'
# Convert a string between formats
pandoc lua -e 'local d=pandoc.read("**hello**","markdown"); print(pandoc.write(d,"html"))'
# Get current date for metadata
pandoc lua -e 'print(os.date("%Y-%m-%d"))'
Quick Reference: Essential Commands
# Markdown → HTML (standalone page)
pandoc input.md -s -o output.html
# Markdown → PDF (requires LaTeX engine)
pandoc input.md -o output.pdf
# Markdown → Word docx
pandoc input.md -o output.docx
# Markdown → LaTeX
pandoc input.md -o output.tex
# HTML → Markdown
pandoc -f html -t markdown input.html -o output.md
# URL → Markdown (fetch and convert)
pandoc -f html -t markdown https://example.com -o page.md
# With table of contents and numbered sections
pandoc input.md -s --toc -N -o output.html
# Multiple input files → single output
pandoc ch1.md ch2.md ch3.md -s -o book.html
# Pipe: read input from stdin
echo "# Hello" | pandoc -f markdown -t html
# Check supported formats
pandoc --list-input-formats
pandoc --list-output-formats
1. Format Conversion Matrix
Input Formats (most useful for agents)
| Format | Flag | Notes |
|---|---|---|
| Markdown (pandoc) | markdown | Default input; richest extensions |
| GFM | gfm | GitHub-Flavored Markdown |
| CommonMark | commonmark | Strict spec |
| HTML | html | Parses web pages |
| LaTeX | latex | Requires valid LaTeX |
| Docx | docx | MS Word |
| ODT | odt | LibreOffice |
| EPUB | epub | E-books |
| Org | org | Emacs Org mode |
| RST | rst | reStructuredText |
| Jupyter | ipynb | Notebooks |
| CSV/TSV | csv / tsv | Tabular data |
| Typst | typst | Modern typesetting |
| Djot | djot | Lightweight markup |
Output Formats (most useful for agents)
| Format | Flag | Notes |
|---|---|---|
| HTML5 | html | Default output |
| PDF | pdf | Via LaTeX/Typst/HTML engine |
| LaTeX | latex | Intermediate for PDF |
| Docx | docx | MS Word |
| PPTX | pptx | PowerPoint presentations |
| EPUB3 | epub | E-books |
| Markdown | markdown | Pandoc's extended MD |
| GFM | gfm | For GitHub |
| Beamer | beamer | LaTeX slide shows |
| reveal.js | revealjs | HTML slide shows |
| Typst | typst | Modern typesetting |
| Plain text | plain | Stripped formatting |
| ANSI | ansi | Terminal output with colors |
| Man page | man | Unix manual pages |
Extension Syntax
Enable/disable per-format extensions with +EXTENSION / -EXTENSION:
# Markdown with emoji support, without footnotes
pandoc -f markdown+emoji-footnotes input.md
# GFM with raw HTML
pandoc -f gfm+raw_html input.md
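When an agent assembles the format string programmatically, the +EXTENSION / -EXTENSION rule can be sketched as a small helper. The name `format_spec` is hypothetical and not part of pandoc; it only concatenates strings and does not check that an extension exists for the given format.

```python
def format_spec(base, enable=(), disable=()):
    """Build a pandoc format string such as 'markdown+emoji-footnotes'."""
    parts = [base]
    parts += ["+" + ext for ext in enable]    # extensions to switch on
    parts += ["-" + ext for ext in disable]   # extensions to switch off
    return "".join(parts)


print(format_spec("markdown", enable=["emoji"], disable=["footnotes"]))
print(format_spec("gfm", enable=["raw_html"]))
```

The result can be passed as the value of -f or -t in an argv list.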
2. PDF Generation
Engines
| Engine | Flag | When to Use |
|---|---|---|
| pdflatex | --pdf-engine=pdflatex | Default; fastest, Latin scripts |
| xelatex | --pdf-engine=xelatex | CJK, Unicode fonts, system fonts |
| lualatex | --pdf-engine=lualatex | Lua scripting, font fallbacks |
| typst | --pdf-engine=typst | Fast, modern, no TeX needed |
| weasyprint | --pdf-engine=weasyprint | HTML/CSS → PDF, no TeX |
| context | --pdf-engine=context | ConTeXt engine |
Common PDF Recipes
# Basic PDF
pandoc doc.md -o doc.pdf
# CJK / Unicode (use xelatex or lualatex)
pandoc doc.md --pdf-engine=xelatex -V mainfont="Noto Sans CJK SC" -o doc.pdf
# Custom margins
pandoc doc.md -V geometry:margin=1in -o doc.pdf
# Landscape, two-column
pandoc doc.md -V geometry:margin=1in -V classoption=twocolumn -V classoption=landscape -o doc.pdf
# Numbered sections + TOC
pandoc doc.md -N --toc --toc-depth=3 -o doc.pdf
# Via Typst (no LaTeX needed)
pandoc doc.md --pdf-engine=typst -o doc.pdf
# Via HTML (good for web content)
pandoc doc.md -t html --css=style.css --pdf-engine=weasyprint -o doc.pdf
# Debug: output intermediate LaTeX
pandoc doc.md -s -o debug.tex
Key LaTeX Variables
| Variable | Example | Effect |
|---|---|---|
| geometry | margin=1in | Page margins |
| papersize | a4, letter | Paper size |
| fontsize | 12pt | Body font size |
| documentclass | article, book, report | LaTeX class |
| mainfont | "Noto Sans" | Body font (xelatex/lualatex) |
| monofont | "JetBrains Mono" | Code font |
| linestretch | 1.5 | Line spacing |
| colorlinks | true | Colored links instead of boxes |
| linkcolor | blue | Link color |
| toc | (flag) | Include TOC |
| toc-depth | 3 | TOC heading depth |
| numbersections | (flag) | Number sections |
3. HTML Output
# Standalone HTML with default styling
pandoc doc.md -s -o doc.html
# Self-contained HTML (inline CSS/JS/images)
pandoc doc.md -s --embed-resources -o doc.html
# With custom CSS
pandoc doc.md -s --css=style.css -o doc.html
# With syntax highlighting style
pandoc doc.md -s --highlight-style=kate -o doc.html
# Section divs (wrap in <section> tags)
pandoc doc.md -s --section-divs -o doc.html
# Math with KaTeX
pandoc doc.md -s --katex -o doc.html
# Math with MathJax
pandoc doc.md -s --mathjax -o doc.html
# Number sections
pandoc doc.md -s -N -o doc.html
# TOC with custom depth
pandoc doc.md -s --toc --toc-depth=2 -o doc.html
HTML Variables
| Variable | Effect |
|---|---|
| mainfont | CSS font-family on html |
| fontsize | Base font-size |
| monofont | Code font-family |
| monobackgroundcolor | Code background |
| linkcolor | Link color |
| maxwidth | Max content width (default 36em) |
| margin-left/right/top/bottom | Body padding |
4. Docx (Word) Output
# Basic conversion
pandoc doc.md -o doc.docx
# With reference doc (styles, margins, fonts)
pandoc doc.md --reference-doc=template.docx -o doc.docx
# Generate default reference doc for customization
pandoc -o custom-reference.docx --print-default-data-file reference.docx
# Extract media (images) from docx input
pandoc input.docx --extract-media=./images -t markdown -o output.md
# Track changes: accept, reject, or keep all
pandoc input.docx --track-changes=accept -o output.md
Reference Doc Styles
When customizing reference.docx, modify these styles:
Normal, Body Text, Title, Subtitle, Author, Date, Heading 1–9,
Block Text, Source Code, Footnote Text, Caption, Table, Hyperlink.
5. Presentations
reveal.js (HTML Slides)
pandoc slides.md -t revealjs -s -o slides.html
# With theme and custom options
pandoc slides.md -t revealjs -s -V revealjs-url=https://unpkg.com/reveal.js@^5 \
-V theme=moon -o slides.html
# Self-contained (offline use)
pandoc slides.md -t revealjs -s --embed-resources -o slides.html
# Incremental lists
pandoc slides.md -t revealjs -s -i -o slides.html
Beamer (PDF Slides)
pandoc slides.md -t beamer -o slides.pdf
# With theme
pandoc slides.md -t beamer -V theme=metropolis -o slides.pdf
# Aspect ratio (default 4:3)
pandoc slides.md -t beamer -V aspectratio=169 -o slides.pdf
Beamer Themes
| Theme | Style |
|---|---|
| default | Clean, minimal |
| metropolis | Modern, flat (requires metropolis package) |
| Madrid | Classic academic |
| CambridgeUS | Formal with colors |
| Warsaw | Sidebar navigation |
| Boadilla | Compact footline |
| Singapore | Clean with mini frames |
Set with -V theme=NAME. Use -V colortheme=NAME for color schemes (crane, dolphin, orchid, whale, etc.).
Slide Syntax
Slides are separated by headings. The --slide-level option controls which heading level creates slides:
# Section Title <!-- creates a section divider -->
## Slide Title <!-- creates a slide (if slide-level=2) -->
Content here.
## Next Slide
More content.
Use horizontal rules --- to force slide breaks when needed.
PowerPoint (PPTX)
pandoc slides.md -o slides.pptx
# With reference template
pandoc slides.md --reference-doc=template.pptx -o slides.pptx
6. Lua Filters
Lua filters modify the pandoc AST between parsing and writing. No external deps — Lua 5.4 is built into pandoc.
Basic Filter Structure
-- Save as myfilter.lua, use with: pandoc --lua-filter=myfilter.lua
return {
Str = function(elem)
-- Transform inline text elements
return elem
end,
Para = function(elem)
-- Transform paragraph blocks
return elem
end,
}
Return Values
- nil → element unchanged
- New element → replaces the original (must be the same kind: an inline for an inline, a block for a block)
- List of elements → spliced in place (empty list = delete)
Global Variables Available
| Variable | Description |
|---|---|
| FORMAT | Output format name (html, latex, etc.) |
| PANDOC_VERSION | Pandoc version as table {major, minor, patch} |
| PANDOC_READER_OPTIONS | Reader options |
| PANDOC_WRITER_OPTIONS | Writer options |
| PANDOC_SCRIPT_FILE | Path to current filter file |
Common Filter Patterns
-- 1. Format-conditional logic
if FORMAT:match 'html' then
function Image(elem)
elem.attributes.style = 'max-width:100%;'
return elem
end
end
-- 2. Replace text with inline elements
return {
Str = function(elem)
if elem.text == "{{date}}" then
return pandoc.Str(os.date("%Y-%m-%d"))
end
end,
}
-- 3. Remove all links, keep text
return {
Link = function(el)
return el.content -- unwrap link, keep inline content
end,
}
-- 4. Add class to all code blocks
return {
CodeBlock = function(el)
table.insert(el.classes, "highlight")
return el
end,
}
-- 5. Extract all headings (metadata → document)
function Pandoc(doc)
local headings = {}
for _, el in ipairs(doc.blocks) do
if el.t == "Header" then
table.insert(headings, pandoc.Para(el.content))
end
end
return pandoc.Pandoc(headings, doc.meta)
end
-- 6. Word count (body only; metadata is skipped)
local words = 0
-- Kept out of the returned filter so each Str is counted exactly once
local count = {
  Str = function(el)
    -- skip tokens that are entirely punctuation
    if el.text:match("%P") then words = words + 1 end
  end,
}
return {
  Pandoc = function(doc)
    doc.blocks:walk(count)
    print(words .. " words in body")
    os.exit(0)
  end,
}
-- 7. Top-down traversal (process parents before children)
local filter = {
traverse = 'topdown',
Div = function(el)
if el.classes[1] == 'skip' then
return el, false -- skip children
end
end,
}
return filter
Traversal Order
Default is typewise: all Inlines → all Blocks → Meta → Pandoc.
Set traverse = 'topdown' for depth-first parent-before-children.
Useful Lua Modules (built-in)
| Module | Use |
|---|---|
| pandoc.utils | stringify, type, sha1, to_simple_table |
| pandoc.text | Unicode-aware upper, lower, len |
| pandoc.mediabag | insert, fetch, list: manage binary assets |
| pandoc.system | with_temporary_directory, with_working_directory |
| pandoc.layout | Pretty-printing, text wrapping |
| pandoc.path | Path manipulation (directory, filename, join) |
| lpeg | Parsing Expression Grammars |
| re | Regex-like pattern matching via lpeg |
Running External Commands in Lua
local output = pandoc.pipe("sed", {"-e", "s/foo/bar/"}, input_string)
Mediabag Pattern (Generated Images)
Use pandoc.pipe + pandoc.mediabag to generate images from code blocks
and embed them in the output document:
function CodeBlock(el)
if el.classes[1] ~= "diagram" then return el end
-- Run external tool to generate image
local img = pandoc.pipe("dot", {"-Tsvg"}, el.text)
-- Store in mediabag (binary assets attached to document)
local fname = pandoc.utils.sha1(img) .. ".svg"
pandoc.mediabag.insert(fname, "image/svg+xml", img)
-- Replace code block with image reference
return pandoc.Para({pandoc.Image({}, fname)})
end
For textual output formats, use --extract-media=DIR to write mediabag files to disk.
For HTML, use --embed-resources to inline them.
Multiple Filters
Filters run in command-line order. Lua filters (--lua-filter) are faster than JSON filters (--filter):
pandoc doc.md --lua-filter=filter1.lua --lua-filter=filter2.lua -o out.html
Programmatic Lua API (No CLI needed)
Use pandoc.read() and pandoc.write() to convert entirely in Lua — ideal for agents
that need document processing without shell commands:
-- Convert markdown to HTML programmatically
local doc = pandoc.read("# Hello\n\nSome **bold** text.", "markdown")
local html = pandoc.write(doc, "html")
print(html)
-- Output: <h1 id="hello">Hello</h1>
-- <p>Some <strong>bold</strong> text.</p>
Build Documents from Scratch
local doc = pandoc.Pandoc({
pandoc.Header(1, {pandoc.Str "Title"}),
pandoc.Para({
pandoc.Str "Hello ",
pandoc.Strong {pandoc.Str "world"},
pandoc.Str "!",
}),
})
print(pandoc.write(doc, "html"))
-- <h1>Title</h1>
-- <p>Hello <strong>world</strong>!</p>
Read File → Transform → Write
-- Read a file, apply a filter, write output
local f = io.open("input.md", "r")
local doc = pandoc.read(f:read("*a"), "markdown")
f:close()
-- Apply a filter inline
doc = doc:walk({
Strong = function(el) return pandoc.SmallCaps(el.content) end,
})
-- Write to HTML
local out = io.open("output.html", "w")
out:write(pandoc.write(doc, "html"))
out:close()
Convert Between Any Formats
-- HTML → Markdown
local html_doc = pandoc.read("<h1>Title</h1><p>Text</p>", "html")
print(pandoc.write(html_doc, "markdown"))
-- Docx → Plain text (via pandoc.pipe)
local docx_content = pandoc.pipe("pandoc", {"-t", "json", "input.docx"}, "")
local doc = pandoc.read(docx_content, "json")
print(pandoc.write(doc, "plain"))
Access and Modify Metadata
local doc = pandoc.read("---\ntitle: My Doc\n---\n\nContent.", "markdown")
-- Read metadata
print(pandoc.utils.stringify(doc.meta.title)) -- "My Doc"
-- Set metadata
doc.meta.date = os.date("%Y-%m-%d")
doc.meta.author = pandoc.Inlines "Agent"
7. EPUB Generation
# Basic EPUB
pandoc book.md -o book.epub
# Multiple chapters
pandoc ch1.md ch2.md ch3.md metadata.yaml -o book.epub
# With cover image
pandoc book.md --epub-cover-image=cover.png -o book.epub
# With custom CSS
pandoc book.md --css=epub.css -o book.epub
# Embed fonts
pandoc book.md --epub-embed-font=DejaVuSans.ttf -o book.epub
# Split at specific heading level (default: 1)
pandoc book.md --split-level=2 -o book.epub
# With embedded metadata
pandoc book.md --epub-metadata=meta.xml -o book.epub
EPUB YAML Metadata
---
title: "My Book"
author: "Author Name"
rights: "Creative Commons"
lang: en
cover-image: cover.png
css: epub.css
---
EPUB Metadata XML
<dc:rights>Creative Commons</dc:rights>
<dc:language>en</dc:language>
8. Multi-File Documents & Jupyter
Merging Chapters into a Book
Combine multiple Markdown files with a shared metadata file:
# Basic: concatenate and convert
pandoc ch1.md ch2.md ch3.md -s -o book.html
# With metadata file for title, author, etc.
pandoc ch1.md ch2.md ch3.md metadata.yaml -s --toc -N -o book.pdf
# With --file-scope (isolates footnotes/links per chapter)
pandoc ch??-*.md --file-scope -s --toc -o book.epub
metadata.yaml for Books
---
title: "My Book"
author: "Author Name"
date: "2024-01-15"
rights: "© 2024 Author"
abstract: |
A book about important things.
keywords: [pandoc, markdown, books]
---
Directory-Based Conversion
# Convert all chapters in order (shell globbing)
pandoc chapters/*.md -s --toc -o book.html
# Using a Lua script for sorted file discovery
pandoc lua -e '
local sys = require "pandoc.system"
local files = {}
for _, f in ipairs(sys.list_directory(".")) do
if f:match("^%d+.*%.md$") then files[#files+1] = f end
end
table.sort(files)
for _, f in ipairs(files) do print(f) end
'
Jupyter Notebook Conversion
# Notebook → HTML (self-contained with outputs)
pandoc analysis.ipynb -s --embed-resources -o report.html
# Notebook → Markdown (extract images)
pandoc analysis.ipynb -t markdown --wrap=none --extract-media=./figs -o analysis.md
# Notebook → PDF (requires LaTeX)
pandoc analysis.ipynb -o analysis.pdf
# Control cell output inclusion
pandoc analysis.ipynb --ipynb-output=all -o full.html # keep all outputs
pandoc analysis.ipynb --ipynb-output=none -t markdown -o clean.md # strip outputs
pandoc analysis.ipynb --ipynb-output=best -o report.html # default: pick best format
CSV/TSV Table Input
# CSV → Markdown table
pandoc data.csv -t markdown
# CSV → HTML table
pandoc data.csv -s -o table.html
# TSV (tab-separated)
pandoc data.tsv -t markdown
9. Typst Output
Typst is a modern, fast alternative to LaTeX for PDF generation. No TeX installation needed.
# Markdown → Typst
pandoc doc.md -t typst -o doc.typ
# Markdown → PDF via Typst
pandoc doc.md --pdf-engine=typst -o doc.pdf
Typst Variables
| Variable | Example | Effect |
|---|---|---|
| papersize | a4, us-letter | Paper size |
| margin | {x: 2cm, y: 3cm} | Page margins (dict) |
| mainfont | "Noto Sans" | Body font |
| fontsize | 12pt | Body font size |
| codefont | "Fira Code" | Code font |
| mathfont | "New Computer Modern" | Math font |
| columns | 2 | Number of columns |
| linestretch | 1.5 | Line spacing |
| section-numbering | 1.A.1 | Section numbering schema |
| page-numbering | 1 or i | Page numbering schema |
Typst is significantly faster than LaTeX for large documents and has better error messages.
10. YAML Metadata
---
title: "Document Title"
author: "Author Name"
date: "2024-01-15"
abstract: |
This is the abstract.
It can span multiple lines.
keywords: [pandoc, markdown, conversion]
subtitle: "A Subtitle"
lang: en
geometry: margin=1in
fontsize: 12pt
toc: true
toc-depth: 3
numbersections: true
colorlinks: true
linkcolor: blue
header-includes: |
\usepackage{cleveref}
---
# Introduction
Content starts here.
Setting Metadata from CLI
# String value
pandoc doc.md -M title="My Title" -o out.html
# Boolean
pandoc doc.md -M toc=true -o out.html
# From file
pandoc doc.md --metadata-file=meta.yaml -o out.html
Template Variables vs Metadata
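- -V/--variable → template variables only (literal strings, not parsed as Markdown)
- -M/--metadata → document metadata (affects the document, accessible in filters); command-line values are read as YAML booleans or literal strings and escaped in templates. Metadata from YAML blocks or --metadata-file is parsed as Markdown.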
11. Templates
Templates control standalone output structure. Variables are interpolated with $var$ syntax.
# View default template for a format
pandoc -D html
pandoc -D latex
pandoc -D revealjs
# Use custom template
pandoc doc.md --template=mytemplate.html -o out.html
Template Syntax
$if(title)$
<h1>$title$</h1>
$endif$
$for(author)$
<p>$author$</p>
$sep$,
$endfor$
${body} <!-- variable interpolation -->
$if(foo)$ <!-- conditional -->
$else$
$endif$
$for(items)$ <!-- loop -->
$sep$, $endfor$ <!-- separator -->
Custom Templates Location
Place in $HOME/.local/share/pandoc/templates/default.FORMAT to override defaults.
12. Citations & Bibliography
# Process citations with citeproc
pandoc doc.md --citeproc -o out.html
# Specify bibliography file
pandoc doc.md --citeproc --bibliography=refs.bib -o out.html
# Specify CSL style
pandoc doc.md --citeproc --csl=ieee.csl -o out.html
Citation Syntax (in Markdown)
See @smith2020 for details.
Multiple: [@smith2020; @jones2019].
Suppress author: [-@smith2020].
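Before running --citeproc, it can help to list the keys a document cites so they can be checked against the bibliography. The sketch below is a simplified approximation: the regular expression and the name `citation_keys` are assumptions for illustration, and pandoc's real citation grammar accepts more forms (for example keys wrapped in braces).

```python
import re

# Simplified key pattern: runs of alphanumerics/underscores, optionally joined
# by single internal punctuation characters. Not pandoc's full grammar.
CITE_KEY = re.compile(
    r"(?<![A-Za-z0-9])@([A-Za-z0-9_]+(?:[:.#$%&+?<>~/-][A-Za-z0-9_]+)*)"
)


def citation_keys(markdown_text):
    """Return cited keys in order of first appearance, without duplicates."""
    seen = []
    for key in CITE_KEY.findall(markdown_text):
        if key not in seen:
            seen.append(key)
    return seen


sample = (
    "See @smith2020 for details.\n"
    "Multiple: [@smith2020; @jones2019].\n"
    "Suppress author: [-@smith2020].\n"
)
print(citation_keys(sample))
```

The lookbehind keeps e-mail addresses such as user@example.com from being reported as citations.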
YAML for Bibliography
---
bibliography: refs.bib
csl: https://www.zotero.org/styles/ieee
nocite: "@*" # include all entries in bibliography
---
13. Content Extraction & Processing
# Extract images/media from binary formats
pandoc input.docx --extract-media=./media -t markdown -o output.md
# Convert only specific parts via Lua filter
pandoc doc.md --lua-filter=extract-headings.lua -t markdown
# Read URL, convert to Markdown
pandoc -f html -t markdown https://example.com -o page.md
# JSON AST (for piping to external filters)
pandoc doc.md -t json | ./my-filter.py | pandoc -f json -t html
# Get word count via Lua filter
pandoc doc.md --lua-filter=wordcount.lua
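The JSON pipeline above names an external filter, ./my-filter.py, without showing one. A minimal sketch of what such a filter could look like follows, assuming the usual JSON AST shape in which every element is an object with a "t" tag and an optional "c" payload. The function names are hypothetical, and the "pandoc-api-version" in the sample is only illustrative.

```python
import json


def walk(node, fn):
    """Recursively apply fn to every AST element (a dict with a "t" key)."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        node = {key: walk(value, fn) for key, value in node.items()}
        if "t" in node:
            return fn(node)
    return node


def upper_str(elem):
    """Uppercase the text of Str inlines; leave every other element alone."""
    if elem["t"] == "Str":
        return {"t": "Str", "c": elem["c"].upper()}
    return elem


def run_filter(json_text):
    """Take a JSON AST as text, transform the body, and return JSON text."""
    doc = json.loads(json_text)
    doc["blocks"] = walk(doc["blocks"], upper_str)
    return json.dumps(doc)


# Hand-written sample AST so the sketch runs without pandoc installed.
sample = json.dumps({
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [{"t": "Para", "c": [
        {"t": "Str", "c": "hello"}, {"t": "Space"}, {"t": "Str", "c": "world"},
    ]}],
})
print(run_filter(sample))
```

To use it in the pipeline, import sys, drop the sample, and end the script with `sys.stdout.write(run_filter(sys.stdin.read()))`.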
14. Defaults Files
Reusable YAML config files instead of long CLI commands:
# defaults/pdf.yaml
# Note: defaults files do NOT expand globs like chapters/*.md.
# List files explicitly, or pass globs on the command line.
input-files:
- chapters/01-intro.md
- chapters/02-usage.md
from: markdown
to: pdf
output-file: book.pdf
standalone: true
toc: true
toc-depth: 3
number-sections: true
variables:
geometry: margin=1in
fontsize: 12pt
documentclass: book
pdf-engine: xelatex
# By path
pandoc --defaults=defaults/pdf.yaml
# Or, after installing in $HOME/.local/share/pandoc/defaults/pdf.yaml
pandoc -d pdf
# If you need shell glob expansion, pass files on the command line instead
pandoc chapters/*.md -d defaults/pdf.yaml -o book.pdf
Place in $HOME/.local/share/pandoc/defaults/ for global access by name.
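Because defaults files do not expand globs, a caller without a shell can expand the pattern itself and pass the files as arguments. The sketch below shows one way to do that. The helper name `pandoc_argv` is hypothetical, and it assumes the defaults file does not also list input-files.

```python
import glob
import os
import tempfile


def pandoc_argv(pattern, defaults_file, output):
    """Expand a glob in Python and return an argv list for pandoc."""
    files = sorted(glob.glob(pattern))   # sorted() keeps chapter order stable
    if not files:
        raise FileNotFoundError(f"no input files match {pattern!r}")
    return ["pandoc", *files, "-d", defaults_file, "-o", output]


# Demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("02-usage.md", "01-intro.md"):
        open(os.path.join(tmp, name), "w").close()
    argv = pandoc_argv(os.path.join(tmp, "*.md"), "defaults/pdf.yaml", "book.pdf")
    print([os.path.basename(a) if a.endswith(".md") else a for a in argv])
```

The list can be handed to `subprocess.run(argv, check=True)`, which avoids shell quoting entirely.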
15. Common Agent Pipelines
Web Page → Clean Markdown
pandoc -f html -t markdown --wrap=none \
https://example.com -o page.md
# Or use the helper script
pandoc lua scripts/pandoc-web-to-md.lua https://example.com page.md
Markdown → Professional PDF Report
# With this skill's defaults file (from the skill directory)
pandoc -d references/defaults/pdf-report.yaml report.md
# Or install it globally and use by name
pandoc -d pdf-report report.md
# Or fully specified
pandoc report.md -o report.pdf \
--pdf-engine=xelatex \
-V geometry:margin=1in \
-V fontsize=11pt \
-V colorlinks=true \
--toc --toc-depth=3 \
-N \
--highlight-style=kate
# Or use the helper script (auto-detects a working PDF engine)
pandoc lua scripts/pandoc-md-to-pdf.lua report.md
Docx → Markdown (with images)
# One-liner
pandoc input.docx --extract-media=./images -t markdown -o output.md
# Or use the helper script (auto-extracts to ./media)
pandoc lua scripts/pandoc-docx-to-md.lua input.docx
Docx → PDF (re-format a Word doc)
pandoc input.docx -o output.pdf
# Styling via LaTeX variables (--reference-doc only affects docx/odt/pptx output, not PDF)
pandoc input.docx -V geometry:margin=1in -V fontsize=11pt -o output.pdf
Notebook → HTML Report
pandoc analysis.ipynb \
-s --embed-resources \
--highlight-style=tango \
-o report.html
# Strip outputs, get clean Markdown
pandoc analysis.ipynb --ipynb-output=none -t markdown -o analysis.md
Batch Convert Directory (cross-platform)
# Using the Lua helper script (works on Windows too)
pandoc lua scripts/pandoc-batch-convert.lua md html -s -M title="Doc"
pandoc lua scripts/pandoc-batch-convert.lua md pdf --pdf-engine=xelatex
Combine Chapters into Book
pandoc ch1.md ch2.md ch3.md metadata.yaml \
-s --toc -N \
--pdf-engine=xelatex \
-o book.pdf
# Or as EPUB
pandoc ch1.md ch2.md ch3.md metadata.yaml \
--epub-cover-image=cover.png \
-o book.epub
Markdown → Standalone HTML Report
# With this skill's defaults file (from the skill directory)
pandoc -d references/defaults/html-report.yaml doc.md
# Or install it globally and use by name
pandoc -d html-report doc.md
AST Inspection
# JSON AST for debugging
pandoc doc.md -t json | jq '.blocks[0]'
# Native Haskell AST (human-readable)
pandoc doc.md -t native
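When jq is not available, the same JSON AST can be summarized in a few lines of Python. The sketch below counts top-level block types; the name `block_summary` is hypothetical, and the sample AST is hand-written so the code runs without pandoc (the "pandoc-api-version" value is illustrative).

```python
import json
from collections import Counter


def block_summary(json_text):
    """Count top-level block types in a pandoc JSON AST."""
    doc = json.loads(json_text)
    return Counter(block["t"] for block in doc["blocks"])


sample = json.dumps({
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
        {"t": "Para", "c": [{"t": "Str", "c": "Text"}]},
        {"t": "Para", "c": [{"t": "Str", "c": "More"}]},
    ],
})
print(dict(block_summary(sample)))  # {'Header': 1, 'Para': 2}
```

In practice the input would be the text produced by `pandoc doc.md -t json`.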
16. Key CLI Options Reference
| Option | Effect |
|---|---|
| -f / --from | Input format |
| -t / --to | Output format |
| -o / --output | Output file |
| -s / --standalone | Full document (header + footer) |
| --toc | Table of contents |
| --toc-depth=N | TOC depth (default 3) |
| -N / --number-sections | Number sections |
| --lua-filter=FILE | Apply Lua filter |
| --filter=PROG | Apply JSON filter (external) |
| -M KEY=VAL | Set metadata |
| -V KEY=VAL | Set template variable |
| --metadata-file=FILE | Load metadata from YAML |
| --template=FILE | Custom template |
| --css=URL | Link CSS (HTML/EPUB) |
| --embed-resources | Inline all resources (HTML) |
| --extract-media=DIR | Extract media to directory |
| --wrap=auto\|none\|preserve | Text wrapping |
| --columns=N | Line width (default 72) |
| --shift-heading-level-by=N | Adjust heading levels |
| --reference-doc=FILE | Style reference (docx/odt/pptx) |
| --pdf-engine=PROG | PDF engine |
| --citeproc | Process citations |
| --bibliography=FILE | Bibliography file |
| -d / --defaults=FILE | Defaults file |
| --sandbox | Restrict file access |
| --highlight-style=STYLE | Highlighting style or none |
| -H / --include-in-header | Include raw in header |
| -B / --include-before-body | Include before body |
| -A / --include-after-body | Include after body |
| --resource-path=PATH | Search path for resources |
| --file-scope | Parse files individually |
| --fail-if-warnings | Exit on warnings |
17. Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | IO error |
| 3 | Fail on warnings |
| 5 | Template error |
| 6 | Option error |
| 21 | Unknown reader |
| 22 | Unknown writer |
| 43 | PDF error |
| 47 | PDF engine not found |
| 83 | Filter error |
| 84 | Lua error |
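An agent that shells out to pandoc can turn the return code into a readable message. The sketch below copies only the codes listed in the table above; `describe_exit` and `is_pdf_failure` are hypothetical helper names, and the retry policy is an assumption rather than anything pandoc prescribes.

```python
# Codes and labels copied from the table above; others are reported as unlisted.
EXIT_CODES = {
    0: "Success",
    1: "IO error",
    3: "Fail on warnings",
    5: "Template error",
    6: "Option error",
    21: "Unknown reader",
    22: "Unknown writer",
    43: "PDF error",
    47: "PDF engine not found",
    83: "Filter error",
    84: "Lua error",
}


def describe_exit(code):
    """Map a pandoc exit code to the label used in this document."""
    return EXIT_CODES.get(code, f"Unlisted exit code {code}")


def is_pdf_failure(code):
    """Assumed policy: only these codes justify retrying with another engine."""
    return code in (43, 47)


print(describe_exit(47))
print(describe_exit(99))
```

A typical call site would be `describe_exit(subprocess.run(argv).returncode)`.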
18. Helper Scripts (Cross-Platform)
All scripts are written in Lua and run via pandoc lua — works on Linux, macOS, and Windows. They call pandoc with argv lists via pandoc.pipe (no shell quoting or os.execute).
# Batch convert all .md files in current directory to .html
pandoc lua scripts/pandoc-batch-convert.lua md html -s --toc
# Markdown to polished PDF (auto-detects a working PDF engine)
pandoc lua scripts/pandoc-md-to-pdf.lua report.md
pandoc lua scripts/pandoc-md-to-pdf.lua report.md draft.pdf --draft
# Word docx to Markdown (with image extraction)
pandoc lua scripts/pandoc-docx-to-md.lua input.docx
pandoc lua scripts/pandoc-docx-to-md.lua input.docx output.md
# Fetch URL and convert to Markdown
pandoc lua scripts/pandoc-web-to-md.lua https://example.com page.md
# Word count
pandoc --lua-filter=scripts/pandoc-wordcount.lua doc.md
# Environment diagnostics (versions, formats, PDF engine smoke tests)
pandoc lua scripts/pandoc-doctor.lua
Lua Filter Library
See references/filters/ directory:
- strong-to-smallcaps.lua: Convert bold to small caps
- center-images.lua: Center images in HTML and image-only LaTeX/Beamer paragraphs
- meta-vars.lua: Replace %var% placeholders with metadata values
- extract-headings.lua: Extract nested heading outline as real links/list items
- add-meta-date.lua: Auto-set date metadata
- remove-empty-paras.lua: Strip empty paragraphs
- admonitions.lua: Render :::note / :::warning / :::tip / :::danger div blocks (styled HTML, portable labels elsewhere)
- autolink.lua: Convert bare URLs into clickable links
- diagram-filter.lua: Generate Graphviz images from dot code blocks via pandoc.pipe + mediabag
Reusable Defaults Files
See references/defaults/ directory:
- pdf-report.yaml: Polished PDF report with TOC, numbered sections, XeLaTeX
- html-report.yaml: Standalone HTML report with TOC, section divs, styled
Use by path from the skill directory:
pandoc -d references/defaults/html-report.yaml doc.md
Or copy to Pandoc's user defaults dir for global use by name:
$HOME/.local/share/pandoc/defaults/html-report.yaml
$HOME/.local/share/pandoc/defaults/pdf-report.yaml
19. Sandbox Mode
When processing untrusted input, use --sandbox to restrict file access:
# Safe: no file access beyond specified inputs
pandoc --sandbox untrusted.md -o safe.html
# Warning: sandbox does NOT restrict filters or PDF engines
pandoc --sandbox --lua-filter=my-filter.lua untrusted.md -o out.html # filter not sandboxed
Limitations:
- Binary formats (docx, odt) may fail in sandbox mode unless pandoc was compiled with embed_data_files
- Filters and PDF engines run outside sandbox
- Best for: converting untrusted Markdown/HTML to safe output
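Since filters and PDF engines are not covered by the sandbox, a wrapper for untrusted input can refuse them up front. The following is a minimal sketch under that assumption; `sandboxed_argv` and `escapes_sandbox` are hypothetical names, and the check only recognizes the option spellings listed in the code.

```python
def escapes_sandbox(opt):
    """True for options that run code outside pandoc's sandbox."""
    if opt[:2] in ("-L", "-F"):          # short forms of --lua-filter / --filter
        return True
    return opt.split("=", 1)[0] in ("--lua-filter", "--filter", "--pdf-engine")


def sandboxed_argv(src, dst, extra=()):
    """Return a pandoc argv with --sandbox, rejecting unsandboxed features."""
    if dst.lower().endswith(".pdf"):
        raise ValueError("PDF output runs an external engine outside the sandbox")
    for opt in extra:
        if escapes_sandbox(opt):
            raise ValueError(f"{opt} runs outside the sandbox")
    return ["pandoc", "--sandbox", *extra, src, "-o", dst]


print(sandboxed_argv("untrusted.md", "safe.html", ["--toc"]))
```

Returning a list rather than a command string keeps untrusted file names away from shell interpretation.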
20. Troubleshooting
--shift-heading-level-by=-1 produces no headings
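With shift -1, a level-1 heading (# H1) at the very start of the document replaces the metadata title (level 0) instead of staying a heading; any other level-1 heading is demoted to a regular paragraph.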
Use --shift-heading-level-by=0 or start your document at ## H2.
# Wrong: H1 disappears into title
pandoc doc.md --shift-heading-level-by=-1 -t markdown
# Right: use H2+ in source, or don't shift
pandoc doc.md -t markdown
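The rule can be modelled in a few lines. This is a sketch of the behaviour described in the pandoc manual for --shift-heading-level-by, not pandoc's own code: `shifted_heading` is a hypothetical name, and it does not model any upper limit on heading levels.

```python
def shifted_heading(level, shift, at_document_start=False):
    """Model the documented outcome of --shift-heading-level-by for one heading."""
    new_level = level + shift
    if new_level >= 1:
        return ("heading", new_level)
    if new_level == 0 and at_document_start:
        return ("title", None)       # replaces the metadata title
    return ("paragraph", None)       # demoted to ordinary text


print(shifted_heading(1, -1, at_document_start=True))  # ('title', None)
print(shifted_heading(1, -1))                          # ('paragraph', None)
print(shifted_heading(2, -1))                          # ('heading', 1)
```

This is why a source that starts at ## H2 survives a shift of -1 while one that uses # H1 throughout loses its headings.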
Blank PDF from CJK / non-Latin text
Default pdflatex can't handle CJK. Use xelatex or lualatex with a font that has the glyphs:
pandoc doc.md --pdf-engine=xelatex -V mainfont='Noto Sans CJK SC' -o doc.pdf
Images missing in output
- For HTML: add --embed-resources or --resource-path=.
- For docx/odt input: use --extract-media=DIR to write embedded images to disk
- Check image paths are relative to the working directory, not the source file
WARNING: This document format requires a nonempty <title> element
Add a title via YAML metadata or -M title="My Title":
pandoc doc.md -M title="Report" -s -o doc.html
Lua filter doesn't modify elements
Filters must return the modified element. This does nothing:
function Str(el)
el.text = string.upper(el.text) -- modifies but doesn't return
end
Correct:
function Str(el)
el.text = string.upper(el.text)
return el -- must return
end
string.upper doesn't work on Unicode
Lua's string library is byte-based. Use pandoc.text for Unicode-aware operations:
local text = require 'pandoc.text'
result = text.upper("café") -- works correctly
PDF engine not found
Install a PDF engine or specify the full path:
pandoc doc.md --pdf-engine=/usr/local/bin/xelatex -o doc.pdf
Or use Typst (no LaTeX needed): pandoc doc.md --pdf-engine=typst -o doc.pdf
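A caller can check which engines are installed before invoking pandoc. The sketch below only looks for executables on PATH and does not verify that an engine actually works; the preference order and the name `pick_pdf_engine` are assumptions for illustration.

```python
import shutil

# Preference order is an assumption for this sketch, not a pandoc default.
CANDIDATES = ("xelatex", "lualatex", "pdflatex", "typst", "weasyprint")


def pick_pdf_engine(candidates=CANDIDATES, which=shutil.which):
    """Return the first engine found on PATH, or None if none is installed."""
    for engine in candidates:
        if which(engine):
            return engine
    return None


engine = pick_pdf_engine()
print(engine or "no PDF engine found; install one or pass a full path")
```

The `which` parameter exists so the lookup can be replaced in tests; the returned name goes into `--pdf-engine=NAME`.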
Related Skills
- mdbook: Building documentation books from Markdown
- presenterm: Terminal-based presentations
- d2: Diagram syntax used with pandoc preprocessing
- jq: Processing pandoc JSON AST output