pandoc

Universal document converter — format conversion (Markdown, HTML, LaTeX, docx, PDF, EPUB, presentations), Lua filters, templates, citations, YAML metadata, and document pipelines. Use when converting between document formats, generating PDFs, extracting content, or programmatically transforming documents.

By bugabinga. Updated 6/9/2026

Pandoc Skill

Pandoc is a universal document converter. It reads many input formats, transforms them through an AST, and writes to many output formats. Agents use it primarily for: format conversion, PDF generation, content extraction, and document pipelines with Lua filters.

Version: targets pandoc 3.x (tested on 3.6). Features like --sandbox, --variable-json, and the modern Table AST require pandoc ≥ 3.0.

Official docs: https://pandoc.org/MANUAL.html


Decision Guide

| I want to... | Go to | Key command |
| --- | --- | --- |
| Convert Markdown to HTML | §3 HTML Output | pandoc doc.md -s -o doc.html |
| Convert Markdown to PDF | §2 PDF Generation | pandoc doc.md -o doc.pdf |
| Convert Word docx to Markdown | §4 Docx Output | pandoc lua scripts/pandoc-docx-to-md.lua file.docx |
| Convert HTML to Markdown | §1 Format Matrix | pandoc -f html -t markdown page.html -o page.md |
| Convert Markdown to Word | §4 Docx Output | pandoc doc.md -o doc.docx |
| Merge chapters into a book | §8 Multi-File & Jupyter | pandoc ch*.md metadata.yaml -s --toc -o book.html |
| Convert Jupyter notebook | §8 Multi-File & Jupyter | pandoc analysis.ipynb -s -o report.html |
| Generate PDF without LaTeX | §9 Typst Output | pandoc doc.md --pdf-engine=typst -o doc.pdf |
| Create a presentation | §5 Presentations | pandoc slides.md -t revealjs -s -o slides.html |
| Create an EPUB | §7 EPUB Generation | pandoc ch1.md ch2.md -o book.epub |
| Write a Lua filter | §6 Lua Filters | pandoc --lua-filter=my.lua doc.md -o out.html |
| Add citations/bibliography | §12 Citations | pandoc doc.md --citeproc --bibliography=refs.bib -o out.html |
| Process untrusted input | §19 Sandbox | pandoc --sandbox input.md -o output.html |
| Customize PDF/docx styles | §11 Templates | --reference-doc=template.docx or --template=custom.tex |
| Extract content from docx | §13 Extraction | pandoc input.docx --extract-media=./media -t markdown |
| Batch convert many files | §15 Pipelines | pandoc lua scripts/pandoc-batch-convert.lua md html -s |
| Use reusable defaults | §14 Defaults | pandoc -d references/defaults/pdf-report.yaml doc.md |
| Debug a conversion problem | §20 Troubleshooting | Check exit codes, inspect AST, verify filters |
| Diagnose pandoc install | §18 Helper Scripts | pandoc lua scripts/pandoc-doctor.lua |
| Run pandoc Lua one-liner | below | pandoc lua -e 'print(PANDOC_VERSION)' |

pandoc lua One-Liners

# Print pandoc version
pandoc lua -e 'print(tostring(PANDOC_VERSION))'

# Parse markdown to JSON AST
pandoc lua -e 'io.stdout:write(pandoc.write(pandoc.read(io.stdin:read("*a")), "json"))'

# List available output formats
pandoc lua -e 'for f in pairs(pandoc.writers) do print(f) end'

# Convert a string between formats
pandoc lua -e 'local d=pandoc.read("**hello**","markdown"); print(pandoc.write(d,"html"))'

# Get current date for metadata
pandoc lua -e 'print(os.date("%Y-%m-%d"))'

Quick Reference: Essential Commands

# Markdown → HTML (standalone page)
pandoc input.md -s -o output.html

# Markdown → PDF (requires LaTeX engine)
pandoc input.md -o output.pdf

# Markdown → Word docx
pandoc input.md -o output.docx

# Markdown → LaTeX
pandoc input.md -o output.tex

# HTML → Markdown
pandoc -f html -t markdown input.html -o output.md

# URL → Markdown (fetch and convert)
pandoc -f html -t markdown https://example.com -o page.md

# With table of contents and numbered sections
pandoc input.md -s --toc -N -o output.html

# Multiple input files → single output
pandoc ch1.md ch2.md ch3.md -s -o book.html

# Pipe: read Markdown from stdin
echo "# Hello" | pandoc -f markdown -t html

# Check supported formats
pandoc --list-input-formats
pandoc --list-output-formats

1. Format Conversion Matrix

Input Formats (most useful for agents)

| Format | Flag | Notes |
| --- | --- | --- |
| Markdown (pandoc) | markdown | Default input; richest extensions |
| GFM | gfm | GitHub-Flavored Markdown |
| CommonMark | commonmark | Strict spec |
| HTML | html | Parses web pages |
| LaTeX | latex | Requires valid LaTeX |
| Docx | docx | MS Word |
| ODT | odt | LibreOffice |
| EPUB | epub | E-books |
| Org | org | Emacs Org mode |
| RST | rst | reStructuredText |
| Jupyter | ipynb | Notebooks |
| CSV/TSV | csv / tsv | Tabular data |
| Typst | typst | Modern typesetting |
| Djot | djot | Lightweight markup |

Output Formats (most useful for agents)

| Format | Flag | Notes |
| --- | --- | --- |
| HTML5 | html | Default output |
| PDF | pdf | Via LaTeX/Typst/HTML engine |
| LaTeX | latex | Intermediate for PDF |
| Docx | docx | MS Word |
| PPTX | pptx | PowerPoint presentations |
| EPUB3 | epub | E-books |
| Markdown | markdown | Pandoc's extended MD |
| GFM | gfm | For GitHub |
| Beamer | beamer | LaTeX slide shows |
| reveal.js | revealjs | HTML slide shows |
| Typst | typst | Modern typesetting |
| Plain text | plain | Stripped formatting |
| ANSI | ansi | Terminal output with colors |
| Man page | man | Unix manual pages |

Extension Syntax

Enable/disable per-format extensions with +EXTENSION / -EXTENSION:

# Markdown with emoji support, without footnotes
pandoc -f markdown+emoji-footnotes input.md

# GFM with raw HTML
pandoc -f gfm+raw_html input.md
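
Before toggling an extension it helps to know whether the format enables it by default. The sketch below assumes pandoc is on the PATH; show_emoji_extension is a hypothetical helper name and the input string is made up for illustration.

```shell
# show_emoji_extension: hypothetical helper that contrasts a disabled and an enabled extension.
show_emoji_extension() {
  # Each listed extension is prefixed with + (on by default) or - (off by default).
  pandoc --list-extensions=markdown | grep -e emoji -e footnotes

  # emoji is off by default in pandoc Markdown, so the shortcode passes through as text.
  printf ':smile:\n' | pandoc -f markdown -t html

  # With +emoji the shortcode is replaced by the emoji character.
  printf ':smile:\n' | pandoc -f markdown+emoji -t html
}
```

Call show_emoji_extension to compare the two HTML results side by side.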

2. PDF Generation

Engines

| Engine | Flag | When to Use |
| --- | --- | --- |
| pdflatex | --pdf-engine=pdflatex | Default; fastest, Latin scripts |
| xelatex | --pdf-engine=xelatex | CJK, Unicode fonts, system fonts |
| lualatex | --pdf-engine=lualatex | Lua scripting, font fallbacks |
| typst | --pdf-engine=typst | Fast, modern, no TeX needed |
| weasyprint | --pdf-engine=weasyprint | HTML/CSS → PDF, no TeX |
| context | --pdf-engine=context | ConTeXt engine |

Common PDF Recipes

# Basic PDF
pandoc doc.md -o doc.pdf

# CJK / Unicode (use xelatex or lualatex)
pandoc doc.md --pdf-engine=xelatex -V mainfont="Noto Sans CJK SC" -o doc.pdf

# Custom margins
pandoc doc.md -V geometry:margin=1in -o doc.pdf

# Landscape, two-column
pandoc doc.md -V geometry:margin=1in -V classoption=twocolumn -V classoption=landscape -o doc.pdf

# Numbered sections + TOC
pandoc doc.md -N --toc --toc-depth=3 -o doc.pdf

# Via Typst (no LaTeX needed)
pandoc doc.md --pdf-engine=typst -o doc.pdf

# Via HTML (good for web content)
pandoc doc.md -t html --css=style.css --pdf-engine=weasyprint -o doc.pdf

# Debug: output intermediate LaTeX
pandoc doc.md -s -o debug.tex

Key LaTeX Variables

| Variable | Example | Effect |
| --- | --- | --- |
| geometry | margin=1in | Page margins |
| papersize | a4, letter | Paper size |
| fontsize | 12pt | Body font size |
| documentclass | article, book, report | LaTeX class |
| mainfont | "Noto Sans" | Body font (xelatex/lualatex) |
| monofont | "JetBrains Mono" | Code font |
| linestretch | 1.5 | Line spacing |
| colorlinks | true | Colored links instead of boxes |
| linkcolor | blue | Link color |
| toc | (flag) | Include TOC |
| toc-depth | 3 | TOC heading depth |
| numbersections | (flag) | Number sections |

3. HTML Output

# Standalone HTML with default styling
pandoc doc.md -s -o doc.html

# Self-contained HTML (inline CSS/JS/images)
pandoc doc.md -s --embed-resources -o doc.html

# With custom CSS
pandoc doc.md -s --css=style.css -o doc.html

# With syntax highlighting style
pandoc doc.md -s --highlight-style=kate -o doc.html

# Section divs (wrap in <section> tags)
pandoc doc.md -s --section-divs -o doc.html

# Math with KaTeX
pandoc doc.md -s --katex -o doc.html

# Math with MathJax
pandoc doc.md -s --mathjax -o doc.html

# Number sections
pandoc doc.md -s -N -o doc.html

# TOC with custom depth
pandoc doc.md -s --toc --toc-depth=2 -o doc.html

HTML Variables

| Variable | Effect |
| --- | --- |
| mainfont | CSS font-family on html |
| fontsize | Base font-size |
| monofont | Code font-family |
| monobackgroundcolor | Code background |
| linkcolor | Link color |
| maxwidth | Max content width (default 36em) |
| margin-left/right/top/bottom | Body padding |

4. Docx (Word) Output

# Basic conversion
pandoc doc.md -o doc.docx

# With reference doc (styles, margins, fonts)
pandoc doc.md --reference-doc=template.docx -o doc.docx

# Generate default reference doc for customization
pandoc -o custom-reference.docx --print-default-data-file reference.docx

# Extract media (images) from docx input
pandoc input.docx --extract-media=./images -t markdown -o output.md

# Track changes: accept, reject, or keep all
pandoc input.docx --track-changes=accept -o output.md

Reference Doc Styles

When customizing reference.docx, modify these styles: Normal, Body Text, Title, Subtitle, Author, Date, Heading 1–9, Block Text, Source Code, Footnote Text, Caption, Table, Hyperlink.


5. Presentations

reveal.js (HTML Slides)

pandoc slides.md -t revealjs -s -o slides.html

# With theme and custom options
pandoc slides.md -t revealjs -s -V revealjs-url=https://unpkg.com/reveal.js@^5 \
  -V theme=moon -o slides.html

# Self-contained (offline use)
pandoc slides.md -t revealjs -s --embed-resources -o slides.html

# Incremental lists
pandoc slides.md -t revealjs -s -i -o slides.html

Beamer (PDF Slides)

pandoc slides.md -t beamer -o slides.pdf

# With theme
pandoc slides.md -t beamer -V theme=metropolis -o slides.pdf

# Aspect ratio (default 4:3)
pandoc slides.md -t beamer -V aspectratio=169 -o slides.pdf

Beamer Themes

| Theme | Style |
| --- | --- |
| default | Clean, minimal |
| metropolis | Modern, flat (requires metropolis package) |
| Madrid | Classic academic |
| CambridgeUS | Formal with colors |
| Warsaw | Sidebar navigation |
| Boadilla | Compact footline |
| Singapore | Clean with mini frames |

Set with -V theme=NAME. Use -V colortheme=NAME for color schemes (crane, dolphin, orchid, whale, etc.).

Slide Syntax

Slides are separated by headings. The --slide-level option controls which heading level creates slides:

# Section Title <!-- creates a section divider -->

## Slide Title <!-- creates a slide (if slide-level=2) -->

Content here.

## Next Slide

More content.

Use horizontal rules --- to force slide breaks when needed.

PowerPoint (PPTX)

pandoc slides.md -o slides.pptx

# With reference template
pandoc slides.md --reference-doc=template.pptx -o slides.pptx

6. Lua Filters

Lua filters modify the pandoc AST between parsing and writing. No external deps — Lua 5.4 is built into pandoc.

Basic Filter Structure

-- Save as myfilter.lua, use with: pandoc --lua-filter=myfilter.lua
return {
  Str = function(elem)
    -- Transform inline text elements
    return elem
  end,
  Para = function(elem)
    -- Transform paragraph blocks
    return elem
  end,
}

Return Values

  • nil → element unchanged
  • New element → replaces original (must be the same kind: an inline for an inline, a block for a block)
  • List of elements → spliced in place (empty list = delete)
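
The three return values can be seen side by side in one throwaway filter. This is a minimal sketch, assuming pandoc is on the PATH; demo_return_values and returns.lua are illustrative names, and the filter uses top-level handler functions rather than a returned table.

```shell
# demo_return_values: hypothetical helper that runs a filter using all three return kinds.
demo_return_values() {
  dir=$(mktemp -d)
  cat > "$dir/returns.lua" <<'EOF'
-- nil: the emphasis element is left unchanged
function Emph(el) return nil end
-- new element: bold is replaced by small caps (an inline for an inline)
function Strong(el) return pandoc.SmallCaps(el.content) end
-- empty list: every horizontal rule is deleted
function HorizontalRule(el) return {} end
EOF
  # Input has emphasis, bold text, and a horizontal rule between two paragraphs.
  printf '*kept* **changed**\n\n* * *\n\nlast\n' |
    pandoc -f markdown -t html --lua-filter="$dir/returns.lua"
  rm -r "$dir"
}
```

Running demo_return_values should show the emphasis intact, the bold text wrapped in a smallcaps span, and no hr element.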

Global Variables Available

| Variable | Description |
| --- | --- |
| FORMAT | Output format name (html, latex, etc.) |
| PANDOC_VERSION | Pandoc version as table {major, minor, patch} |
| PANDOC_READER_OPTIONS | Reader options |
| PANDOC_WRITER_OPTIONS | Writer options |
| PANDOC_SCRIPT_FILE | Path to current filter file |

Common Filter Patterns

-- 1. Format-conditional logic
if FORMAT:match 'html' then
  function Image(elem)
    elem.attributes.style = 'max-width:100%;'
    return elem
  end
end

-- 2. Replace text with inline elements
return {
  Str = function(elem)
    if elem.text == "{{date}}" then
      return pandoc.Str(os.date("%Y-%m-%d"))
    end
  end,
}

-- 3. Remove all links, keep text
return {
  Link = function(el)
    return el.content  -- unwrap link, keep inline content
  end,
}

-- 4. Add class to all code blocks
return {
  CodeBlock = function(el)
    table.insert(el.classes, "highlight")
    return el
  end,
}

-- 5. Keep only the top-level headings, emitted as paragraphs (metadata is preserved)
function Pandoc(doc)
  local headings = {}
  for _, el in ipairs(doc.blocks) do
    if el.t == "Header" then
      table.insert(headings, pandoc.Para(el.content))
    end
  end
  return pandoc.Pandoc(headings, doc.meta)
end

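-- 6. Word count (body only): prints the total, then exits without writing output
local words = 0
local counter = {
  Str = function(el)
    -- count strings that contain at least one non-punctuation character
    if el.text:match("%P") then words = words + 1 end
  end,
}
-- Only the Pandoc handler is exported, so each string is counted exactly once
function Pandoc(doc)
  doc.blocks:walk(counter)
  print(words .. " words in body")
  os.exit(0)
end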

-- 7. Top-down traversal (process parents before children)
local filter = {
  traverse = 'topdown',
  Div = function(el)
    if el.classes[1] == 'skip' then
      return el, false  -- skip children
    end
  end,
}
return filter

Traversal Order

Default is typewise: all Inlines → all Blocks → Meta → Pandoc. Set traverse = 'topdown' for depth-first parent-before-children.
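
The difference between the two orders is easiest to see by logging which handler fires first. The sketch below assumes pandoc 2.17 or later on the PATH; trace_order and trace.lua are hypothetical names, and the filter is returned as a one-element list of filters.

```shell
# trace_order MODE: hypothetical helper; MODE is typewise or topdown.
# Prints the order in which the Para and Str handlers of a throwaway filter fire.
trace_order() {
  dir=$(mktemp -d)
  cat > "$dir/trace.lua" <<EOF
local function log(name) io.stderr:write(name .. "\n") end
return {{
  traverse = '$1',
  Para = function() log("Para") end,
  Str  = function() log("Str") end,
}}
EOF
  # The filter logs to stderr; keep that stream and discard the converted document.
  printf 'one two\n' | pandoc -f markdown -t plain --lua-filter="$dir/trace.lua" 2>&1 >/dev/null
  rm -r "$dir"
}
```

With the one-paragraph, two-word input, trace_order typewise should list both Str calls before Para, while trace_order topdown should list Para first.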

Useful Lua Modules (built-in)

| Module | Use |
| --- | --- |
| pandoc.utils | stringify, type, sha1, to_simple_table |
| pandoc.text | Unicode-aware upper, lower, len |
| pandoc.mediabag | insert, fetch, list (manage binary assets) |
| pandoc.system | with_temporary_directory, with_working_directory |
| pandoc.layout | Pretty-printing, text wrapping |
| pandoc.path | Path manipulation (directory, filename, join) |
| lpeg | Parsing Expression Grammars |
| re | Regex-like pattern matching via lpeg |

Running External Commands in Lua

local output = pandoc.pipe("sed", {"-e", "s/foo/bar/"}, input_string)

Mediabag Pattern (Generated Images)

Use pandoc.pipe + pandoc.mediabag to generate images from code blocks and embed them in the output document:

function CodeBlock(el)
  if el.classes[1] ~= "diagram" then return el end
  -- Run external tool to generate image
  local img = pandoc.pipe("dot", {"-Tsvg"}, el.text)
  -- Store in mediabag (binary assets attached to document)
  local fname = pandoc.utils.sha1(img) .. ".svg"
  pandoc.mediabag.insert(fname, "image/svg+xml", img)
  -- Replace code block with image reference
  return pandoc.Para({pandoc.Image({}, fname)})
end

For textual output formats, use --extract-media=DIR to write mediabag files to disk. For HTML, use --embed-resources to inline them.

Multiple Filters

Filters run in command-line order. Lua filters (--lua-filter) are faster than JSON filters (--filter):

pandoc doc.md --lua-filter=filter1.lua --lua-filter=filter2.lua -o out.html
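
Because each filter sees the output of the one before it, swapping two filters can change the result. A minimal sketch, assuming pandoc is on the PATH; demo_filter_order, upper.lua, and swap.lua are illustrative names.

```shell
# demo_filter_order: hypothetical helper that runs the same two filters in both orders.
demo_filter_order() {
  dir=$(mktemp -d)
  # Filter A: upper-case every string.
  cat > "$dir/upper.lua" <<'EOF'
local text = require 'pandoc.text'
function Str(el) return pandoc.Str(text.upper(el.text)) end
EOF
  # Filter B: replace the exact word "hello".
  cat > "$dir/swap.lua" <<'EOF'
function Str(el)
  if el.text == "hello" then return pandoc.Str("goodbye") end
end
EOF
  # A then B: B never sees "hello" because A already produced "HELLO".
  printf 'hello\n' |
    pandoc -f markdown -t plain --lua-filter="$dir/upper.lua" --lua-filter="$dir/swap.lua"
  # B then A: the word is swapped first, then upper-cased.
  printf 'hello\n' |
    pandoc -f markdown -t plain --lua-filter="$dir/swap.lua" --lua-filter="$dir/upper.lua"
  rm -r "$dir"
}
```

Calling demo_filter_order should print HELLO for the first order and GOODBYE for the second.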

Programmatic Lua API (No CLI needed)

Use pandoc.read() and pandoc.write() to convert entirely in Lua. Run such a script with pandoc lua script.lua; this suits agents that need document processing without assembling shell pipelines:

-- Convert markdown to HTML programmatically
local doc = pandoc.read("# Hello\n\nSome **bold** text.", "markdown")
local html = pandoc.write(doc, "html")
print(html)
-- Output: <h1 id="hello">Hello</h1>
--         <p>Some <strong>bold</strong> text.</p>

Build Documents from Scratch

local doc = pandoc.Pandoc({
  pandoc.Header(1, {pandoc.Str "Title"}),
  pandoc.Para({
    pandoc.Str "Hello ",
    pandoc.Strong {pandoc.Str "world"},
    pandoc.Str "!",
  }),
})
print(pandoc.write(doc, "html"))
-- <h1>Title</h1>
-- <p>Hello <strong>world</strong>!</p>

Read File → Transform → Write

-- Read a file, apply a filter, write output
local f = io.open("input.md", "r")
local doc = pandoc.read(f:read("*a"), "markdown")
f:close()

-- Apply a filter inline
doc = doc:walk({
  Strong = function(el) return pandoc.SmallCaps(el.content) end,
})

-- Write to HTML
local out = io.open("output.html", "w")
out:write(pandoc.write(doc, "html"))
out:close()

Convert Between Any Formats

-- HTML → Markdown
local html_doc = pandoc.read("<h1>Title</h1><p>Text</p>", "html")
print(pandoc.write(html_doc, "markdown"))

-- Docx → Plain text (via pandoc.pipe)
local docx_content = pandoc.pipe("pandoc", {"-t", "json", "input.docx"}, "")
local doc = pandoc.read(docx_content, "json")
print(pandoc.write(doc, "plain"))

Access and Modify Metadata

local doc = pandoc.read("---\ntitle: My Doc\n---\n\nContent.", "markdown")

-- Read metadata
print(pandoc.utils.stringify(doc.meta.title))  -- "My Doc"

-- Set metadata
doc.meta.date = os.date("%Y-%m-%d")
doc.meta.author = pandoc.Inlines "Agent"

7. EPUB Generation

# Basic EPUB
pandoc book.md -o book.epub

# Multiple chapters
pandoc ch1.md ch2.md ch3.md metadata.yaml -o book.epub

# With cover image
pandoc book.md --epub-cover-image=cover.png -o book.epub

# With custom CSS
pandoc book.md --css=epub.css -o book.epub

# Embed fonts
pandoc book.md --epub-embed-font=DejaVuSans.ttf -o book.epub

# Split at specific heading level (default: 1)
pandoc book.md --split-level=2 -o book.epub

# With embedded metadata
pandoc book.md --epub-metadata=meta.xml -o book.epub

EPUB YAML Metadata

---
title: "My Book"
author: "Author Name"
rights: "Creative Commons"
language: en
cover-image: cover.png
stylesheet: epub.css
---

EPUB Metadata XML

<dc:rights>Creative Commons</dc:rights>
<dc:language>en</dc:language>

8. Multi-File Documents & Jupyter

Merging Chapters into a Book

Combine multiple Markdown files with a shared metadata file:

# Basic: concatenate and convert
pandoc ch1.md ch2.md ch3.md -s -o book.html

# With metadata file for title, author, etc.
pandoc ch1.md ch2.md ch3.md metadata.yaml -s --toc -N -o book.pdf

# With --file-scope (isolates footnotes/links per chapter)
pandoc ch??-*.md --file-scope -s --toc -o book.epub

metadata.yaml for Books

---
title: "My Book"
author: "Author Name"
date: "2024-01-15"
rights: "© 2024 Author"
abstract: |
  A book about important things.
keywords: [pandoc, markdown, books]
---

Directory-Based Conversion

# Convert all chapters in order (shell globbing)
pandoc chapters/*.md -s --toc -o book.html

# Using a Lua script for sorted file discovery
pandoc lua -e '
  local sys = require "pandoc.system"
  local files = {}
  for _, f in ipairs(sys.list_directory(".")) do
    if f:match("^%d+.*%.md$") then files[#files+1] = f end
  end
  table.sort(files)
  for _, f in ipairs(files) do print(f) end
'

Jupyter Notebook Conversion

# Notebook → HTML (self-contained with outputs)
pandoc analysis.ipynb -s --embed-resources -o report.html

# Notebook → Markdown (extract images)
pandoc analysis.ipynb -t markdown --wrap=none --extract-media=./figs -o analysis.md

# Notebook → PDF (requires LaTeX)
pandoc analysis.ipynb -o analysis.pdf

# Control cell output inclusion
pandoc analysis.ipynb --ipynb-output=all -o full.html     # keep all outputs
pandoc analysis.ipynb --ipynb-output=none -t markdown -o clean.md  # strip outputs
pandoc analysis.ipynb --ipynb-output=best -o report.html  # default: pick best format

CSV/TSV Table Input

# CSV → Markdown table
pandoc data.csv -t markdown

# CSV → HTML table
pandoc data.csv -s -o table.html

# TSV (tab-separated)
pandoc data.tsv -t markdown

9. Typst Output

Typst is a modern, fast alternative to LaTeX for PDF generation. No TeX installation needed.

# Markdown → Typst
pandoc doc.md -t typst -o doc.typ

# Markdown → PDF via Typst
pandoc doc.md --pdf-engine=typst -o doc.pdf

Typst Variables

| Variable | Example | Effect |
| --- | --- | --- |
| papersize | a4, us-letter | Paper size |
| margin | {x: 2cm, y: 3cm} | Page margins (dict) |
| mainfont | "Noto Sans" | Body font |
| fontsize | 12pt | Body font size |
| codefont | "Fira Code" | Code font |
| mathfont | "New Computer Modern" | Math font |
| columns | 2 | Number of columns |
| linestretch | 1.5 | Line spacing |
| section-numbering | 1.A.1 | Section numbering schema |
| page-numbering | 1 or i | Page numbering schema |

Typst is significantly faster than LaTeX for large documents and has better error messages.


10. YAML Metadata

---
title: "Document Title"
author: "Author Name"
date: "2024-01-15"
abstract: |
  This is the abstract.
  It can span multiple lines.
keywords: [pandoc, markdown, conversion]
subtitle: "A Subtitle"
lang: en
geometry: margin=1in
fontsize: 12pt
toc: true
toc-depth: 3
numbersections: true
colorlinks: true
linkcolor: blue
header-includes: |
  \usepackage{cleveref}
---

# Introduction

Content starts here.

Setting Metadata from CLI

# String value
pandoc doc.md -M title="My Title" -o out.html

# Boolean
pandoc doc.md -M toc=true -o out.html

# From file
pandoc doc.md --metadata-file=meta.yaml -o out.html

Template Variables vs Metadata

  • -V / --variable → template variables only (plain strings, inserted into the template without escaping)
  • -M / --metadata → document metadata (accessible in filters; values are read as YAML booleans or plain strings and are escaped when inserted into the template)
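
One way to see the difference is to inspect the JSON AST, which carries metadata but not template variables. The sketch below assumes pandoc is on the PATH; has_title_in_meta and report_flag are hypothetical helper names.

```shell
# has_title_in_meta FLAG: sets title=Demo through FLAG (-M or -V) and succeeds
# only if the JSON AST metadata then contains a title field.
has_title_in_meta() {
  printf 'Body\n' | pandoc -f markdown "$1" title=Demo -t json | grep -q '"title"'
}

# report_flag FLAG: print where the value ended up for the given flag.
report_flag() {
  if has_title_in_meta "$1"; then
    printf '%s: title is part of the document metadata\n' "$1"
  else
    printf '%s: title is not in the document metadata\n' "$1"
  fi
}
```

Calling report_flag -M and then report_flag -V should report the title as metadata only for -M, which is why filters can read values set with -M but not those set with -V.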

11. Templates

Templates control standalone output structure. Variables are interpolated with $var$ syntax.

# View default template for a format
pandoc -D html
pandoc -D latex
pandoc -D revealjs

# Use custom template
pandoc doc.md --template=mytemplate.html -o out.html

Template Syntax

$if(title)$
<h1>$title$</h1>
$endif$

$for(author)$
<p>$author$</p>
$sep$, 
$endfor$

${body}          <!-- variable interpolation -->
$if(foo)$        <!-- conditional -->
$else$
$endif$
$for(items)$     <!-- loop -->
$sep$, $endfor$  <!-- separator -->
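
The constructs above can be tried out with a very small custom template. This is a sketch under stated assumptions: pandoc is on the PATH, and demo_template and mini.txt are illustrative names; repeating -M author is used here to build a list for the loop.

```shell
# demo_template: hypothetical helper that renders a three-line custom template.
demo_template() {
  dir=$(mktemp -d)
  # The quoted heredoc keeps the template's dollar signs away from the shell.
  cat > "$dir/mini.txt" <<'EOF'
$if(title)$Title: $title$$endif$
Authors: $for(author)$$author$$sep$, $endfor$
$body$
EOF
  printf 'Body text.\n' |
    pandoc -f markdown -t plain --template="$dir/mini.txt" \
      -M title=Demo -M author=Ann -M author=Bob
  rm -r "$dir"
}
```

Calling demo_template should render the two authors joined by the separator given after $sep$.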

Custom Templates Location

Place in $HOME/.local/share/pandoc/templates/default.FORMAT to override defaults.


12. Citations & Bibliography

# Process citations with citeproc
pandoc doc.md --citeproc -o out.html

# Specify bibliography file
pandoc doc.md --citeproc --bibliography=refs.bib -o out.html

# Specify CSL style
pandoc doc.md --citeproc --csl=ieee.csl -o out.html

Citation Syntax (in Markdown)

See @smith2020 for details.
Multiple: [@smith2020; @jones2019].
Suppress author: [-@smith2020].

YAML for Bibliography

---
bibliography: refs.bib
csl: https://www.zotero.org/styles/ieee
nocite: "@*"  # include all entries in bibliography
---

13. Content Extraction & Processing

# Extract images/media from binary formats
pandoc input.docx --extract-media=./media -t markdown -o output.md

# Convert only specific parts via Lua filter
pandoc doc.md --lua-filter=extract-headings.lua -t markdown

# Read URL, convert to Markdown
pandoc -f html -t markdown https://example.com -o page.md

# JSON AST (for piping to external filters)
pandoc doc.md -t json | ./my-filter.py | pandoc -f json -t html

# Get word count via Lua filter
pandoc doc.md --lua-filter=wordcount.lua

14. Defaults Files

Reusable YAML config files instead of long CLI commands:

# defaults/pdf.yaml
# Note: defaults files do NOT expand globs like chapters/*.md.
# List files explicitly, or pass globs on the command line.
input-files:
  - chapters/01-intro.md
  - chapters/02-usage.md
from: markdown
to: pdf
standalone: true
toc: true
toc-depth: 3
number-sections: true
variables:
  geometry: margin=1in
  fontsize: 12pt
  documentclass: book
pdf-engine: xelatex

# By path (the file above sets no output-file, so name one with -o)
pandoc --defaults=defaults/pdf.yaml -o book.pdf

# Or, after installing in $HOME/.local/share/pandoc/defaults/pdf.yaml
pandoc -d pdf -o book.pdf

# If you need shell glob expansion, pass files on the command line instead
pandoc chapters/*.md -d defaults/pdf.yaml -o book.pdf

Place in $HOME/.local/share/pandoc/defaults/ for global access by name.
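
Options given on the command line after -d override the corresponding settings in the defaults file. A minimal sketch, assuming pandoc is on the PATH; demo_defaults and plain.yaml are illustrative names.

```shell
# demo_defaults: hypothetical helper showing a defaults file and a command-line override.
demo_defaults() {
  dir=$(mktemp -d)
  # Minimal defaults file: Markdown in, plain text out.
  printf 'from: markdown\nto: plain\n' > "$dir/plain.yaml"
  # Uses the writer named in the defaults file.
  printf 'hi\n' | pandoc -d "$dir/plain.yaml"
  # A later -t on the command line replaces the writer from the defaults file.
  printf 'hi\n' | pandoc -d "$dir/plain.yaml" -t html
  rm -r "$dir"
}
```

Calling demo_defaults should print the plain-text result first and the HTML paragraph second.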


15. Common Agent Pipelines

Web Page → Clean Markdown

pandoc -f html -t markdown --wrap=none \
  https://example.com -o page.md

# Or use the helper script
pandoc lua scripts/pandoc-web-to-md.lua https://example.com page.md

Markdown → Professional PDF Report

# With this skill's defaults file (from the skill directory)
pandoc -d references/defaults/pdf-report.yaml report.md

# Or install it globally and use by name
pandoc -d pdf-report report.md

# Or fully specified
pandoc report.md -o report.pdf \
  --pdf-engine=xelatex \
  -V geometry:margin=1in \
  -V fontsize=11pt \
  -V colorlinks=true \
  --toc --toc-depth=3 \
  -N \
  --highlight-style=kate

# Or use the helper script (auto-detects a working PDF engine)
pandoc lua scripts/pandoc-md-to-pdf.lua report.md

Docx → Markdown (with images)

# One-liner
pandoc input.docx --extract-media=./images -t markdown -o output.md

# Or use the helper script (auto-extracts to ./media)
pandoc lua scripts/pandoc-docx-to-md.lua input.docx

Docx → PDF (re-format a Word doc)

pandoc input.docx -o output.pdf

# Style the PDF with template variables (--reference-doc only affects docx/odt/pptx output)
pandoc input.docx -V geometry:margin=1in -V fontsize=11pt -o output.pdf

Notebook → HTML Report

pandoc analysis.ipynb \
  -s --embed-resources \
  --highlight-style=tango \
  -o report.html

# Strip outputs, get clean Markdown
pandoc analysis.ipynb --ipynb-output=none -t markdown -o analysis.md

Batch Convert Directory (cross-platform)

# Using the Lua helper script (works on Windows too)
pandoc lua scripts/pandoc-batch-convert.lua md html -s -M title="Doc"
pandoc lua scripts/pandoc-batch-convert.lua md pdf --pdf-engine=xelatex

Combine Chapters into Book

pandoc ch1.md ch2.md ch3.md metadata.yaml \
  -s --toc -N \
  --pdf-engine=xelatex \
  -o book.pdf

# Or as EPUB
pandoc ch1.md ch2.md ch3.md metadata.yaml \
  --epub-cover-image=cover.png \
  -o book.epub

Markdown → Standalone HTML Report

# With this skill's defaults file (from the skill directory)
pandoc -d references/defaults/html-report.yaml doc.md

# Or install it globally and use by name
pandoc -d html-report doc.md

AST Inspection

# JSON AST for debugging
pandoc doc.md -t json | jq '.blocks[0]'

# Native Haskell AST (human-readable)
pandoc doc.md -t native

16. Key CLI Options Reference

| Option | Effect |
| --- | --- |
| -f / --from | Input format |
| -t / --to | Output format |
| -o / --output | Output file |
| -s / --standalone | Full document (header + footer) |
| --toc | Table of contents |
| --toc-depth=N | TOC depth (default 3) |
| -N / --number-sections | Number sections |
| --lua-filter=FILE | Apply Lua filter |
| --filter=PROG | Apply JSON filter (external) |
| -M KEY=VAL | Set metadata |
| -V KEY=VAL | Set template variable |
| --metadata-file=FILE | Load metadata from YAML |
| --template=FILE | Custom template |
| --css=URL | Link CSS (HTML/EPUB) |
| --embed-resources | Inline all resources (HTML) |
| --extract-media=DIR | Extract media to directory |
| --wrap=MODE | Text wrapping: auto, none, or preserve |
| --columns=N | Line width (default 72) |
| --shift-heading-level-by=N | Adjust heading levels |
| --reference-doc=FILE | Style reference (docx/odt/pptx) |
| --pdf-engine=PROG | PDF engine |
| --citeproc | Process citations |
| --bibliography=FILE | Bibliography file |
| -d / --defaults=FILE | Defaults file |
| --sandbox | Restrict file access |
| --highlight-style=STYLE | Highlighting style or none |
| -H / --include-in-header | Include raw in header |
| -B / --include-before-body | Include before body |
| -A / --include-after-body | Include after body |
| --resource-path=PATH | Search path for resources |
| --file-scope | Parse files individually |
| --fail-if-warnings | Exit on warnings |

17. Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | IO error |
| 3 | Fail on warnings |
| 5 | Template error |
| 6 | Option error |
| 21 | Unknown reader |
| 22 | Unknown writer |
| 43 | PDF error |
| 47 | PDF engine not found |
| 83 | Filter error |
| 84 | Lua error |
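
In a pipeline these codes are most useful when turned into readable messages. The sketch below covers only the codes listed above; explain_pandoc_exit and run_pandoc are hypothetical helper names, not part of pandoc.

```shell
# explain_pandoc_exit CODE: map a pandoc exit status to the meaning listed above.
explain_pandoc_exit() {
  case "$1" in
    0)  echo "success" ;;
    1)  echo "IO error" ;;
    3)  echo "failed on warnings" ;;
    5)  echo "template error" ;;
    6)  echo "option error" ;;
    21) echo "unknown reader" ;;
    22) echo "unknown writer" ;;
    43) echo "PDF error" ;;
    47) echo "PDF engine not found" ;;
    83) echo "filter error" ;;
    84) echo "Lua error" ;;
    *)  echo "unlisted exit code $1" ;;
  esac
}

# run_pandoc ARGS...: run pandoc and describe the failure on stderr, keeping the status.
run_pandoc() {
  pandoc "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "pandoc failed: $(explain_pandoc_exit "$status")" >&2
  fi
  return "$status"
}
```

For example, run_pandoc doc.md -o doc.pdf on a machine without a TeX installation would be expected to report the PDF engine as missing; the wrapper returns pandoc's own status so callers can still branch on it.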

18. Helper Scripts (Cross-Platform)

All scripts are written in Lua and run via pandoc lua — works on Linux, macOS, and Windows. They call pandoc with argv lists via pandoc.pipe (no shell quoting or os.execute).

# Batch convert all .md files in current directory to .html
pandoc lua scripts/pandoc-batch-convert.lua md html -s --toc

# Markdown to polished PDF (auto-detects a working PDF engine)
pandoc lua scripts/pandoc-md-to-pdf.lua report.md
pandoc lua scripts/pandoc-md-to-pdf.lua report.md draft.pdf --draft

# Word docx to Markdown (with image extraction)
pandoc lua scripts/pandoc-docx-to-md.lua input.docx
pandoc lua scripts/pandoc-docx-to-md.lua input.docx output.md

# Fetch URL and convert to Markdown
pandoc lua scripts/pandoc-web-to-md.lua https://example.com page.md

# Word count
pandoc --lua-filter=scripts/pandoc-wordcount.lua doc.md

# Environment diagnostics (versions, formats, PDF engine smoke tests)
pandoc lua scripts/pandoc-doctor.lua

Lua Filter Library

See references/filters/ directory:

  • strong-to-smallcaps.lua — Convert bold to SMALL CAPS
  • center-images.lua — Center images in HTML and image-only LaTeX/Beamer paragraphs
  • meta-vars.lua — Replace %var% placeholders with metadata values
  • extract-headings.lua — Extract nested heading outline as real links/list items
  • add-meta-date.lua — Auto-set date metadata
  • remove-empty-paras.lua — Strip empty paragraphs
  • admonitions.lua — Render :::note/:::warning/:::tip/:::danger div blocks (styled HTML, portable labels elsewhere)
  • autolink.lua — Convert bare URLs into clickable links
  • diagram-filter.lua — Generate Graphviz images from dot code blocks via pandoc.pipe + mediabag

Reusable Defaults Files

See references/defaults/ directory:

  • pdf-report.yaml — Polished PDF report with TOC, numbered sections, XeLaTeX
  • html-report.yaml — Standalone HTML report with TOC, section divs, styled

Use by path from the skill directory:

pandoc -d references/defaults/html-report.yaml doc.md

Or copy to Pandoc's user defaults dir for global use by name:

$HOME/.local/share/pandoc/defaults/html-report.yaml
$HOME/.local/share/pandoc/defaults/pdf-report.yaml

19. Sandbox Mode

When processing untrusted input, use --sandbox to restrict file access:

# Safe: no file access beyond specified inputs
pandoc --sandbox untrusted.md -o safe.html

# Warning: sandbox does NOT restrict filters or PDF engines
pandoc --sandbox --lua-filter=my-filter.lua untrusted.md -o out.html  # filter not sandboxed

Limitations:

  • Binary formats (docx, odt) may fail in sandbox mode unless pandoc was compiled with embed_data_files
  • Filters and PDF engines run outside sandbox
  • Best for: converting untrusted Markdown/HTML to safe output

20. Troubleshooting

--shift-heading-level-by=-1 produces no headings

With a shift of -1, a level-1 heading (# H1) at the start of the document replaces the metadata title instead of remaining a heading, and any other level-1 heading becomes a regular paragraph. Either omit the option or start your document at ## H2.

# Wrong: H1 disappears into title
pandoc doc.md --shift-heading-level-by=-1 -t markdown

# Right: use H2+ in source, or don't shift
pandoc doc.md -t markdown
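
The effect is visible once the output is standalone, because the promoted heading then shows up in the metadata block. A minimal sketch, assuming pandoc is on the PATH; demo_shift is a hypothetical helper name and the input is made up.

```shell
# demo_shift: hypothetical helper that shifts headings up by one level.
demo_shift() {
  # -s makes the Markdown writer emit the metadata block, where the former H1 lands.
  printf '# Title\n\n## Section\n\nText.\n' |
    pandoc -f markdown -t markdown -s --shift-heading-level-by=-1
}
```

In the output of demo_shift, the original H1 should appear as the title field of the YAML block and the original H2 as a level-1 heading.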

Blank PDF from CJK / non-Latin text

Default pdflatex can't handle CJK. Use xelatex or lualatex with a font that has the glyphs:

pandoc doc.md --pdf-engine=xelatex -V mainfont='Noto Sans CJK SC' -o doc.pdf

Images missing in output

  • For HTML: add --embed-resources or --resource-path=.
  • For docx/odt input: use --extract-media=DIR to write embedded images to disk
  • Check image paths are relative to the working directory, not the source file

WARNING: This document format requires a nonempty <title> element

Add a title via YAML metadata or -M title="My Title":

pandoc doc.md -M title="Report" -s -o doc.html

Lua filter doesn't modify elements

Filters must return the modified element. This does nothing:

function Str(el)
  el.text = string.upper(el.text)  -- modifies but doesn't return
end

Correct:

function Str(el)
  el.text = string.upper(el.text)
  return el  -- must return
end

string.upper doesn't work on Unicode

Lua's string library is byte-based. Use pandoc.text for Unicode-aware operations:

local text = require 'pandoc.text'
result = text.upper("café")  -- works correctly
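
The byte-versus-character distinction can be checked directly from the shell. This sketch assumes pandoc 3.x (for the pandoc lua subcommand) and a UTF-8 terminal; demo_unicode_text is a hypothetical helper name.

```shell
# demo_unicode_text: hypothetical helper comparing pandoc.text with Lua's byte-based length.
demo_unicode_text() {
  pandoc lua -e 'local text = require "pandoc.text"
print(text.upper("café"))   -- Unicode-aware upper-casing
print(text.len("café"))     -- number of characters
print(#"café")              -- number of bytes in the UTF-8 encoding'
}
```

Calling demo_unicode_text should show a fully upper-cased word and a character count that is one lower than the byte count, since é takes two bytes in UTF-8.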

PDF engine not found

Install a PDF engine or specify the full path:

pandoc doc.md --pdf-engine=/usr/local/bin/xelatex -o doc.pdf

Or use Typst (no LaTeX needed): pandoc doc.md --pdf-engine=typst -o doc.pdf


Related Skills

  • mdbook: Building documentation books from Markdown
  • presenterm: Terminal-based presentations
  • d2: Diagram syntax used with pandoc preprocessing
  • jq: Processing pandoc JSON AST output
Install via CLI
npx skills add https://github.com/bugabinga/pi-ext --skill pandoc