name: pandoc-markdown
description: >
Use when authoring/editing academic Pandoc Markdown (theses, articles, beamer decks,
standalone HTML) with YAML frontmatter, --citeproc+.bib citations, LaTeX math,
cross-refs, or Mermaid/TikZ/PlantUML diagrams via XeLaTeX/Beamer. Triggers: pandoc,
refs.bib, xelatex, beamer, ABNT/CSL, Lua filters, compile debugging.
Loads a self-update trigger when: "results are not good", "the PDF is
wrong", "the skill missed something", "fix the skill".
argument-hint: "[target format: pdf | beamer | html] [source .md file] | self-update [advise-only]"
user-invocable: true
Pandoc Markdown for Academic Writing
Pandoc is the authoring target for this repo's thesis, articles, and slide decks. This skill encodes the non-obvious conventions and failure modes that make academic Pandoc output actually render correctly — correct math, citations, cross-references, diagrams, and beamer frames — rather than the common cases already in the agent's training data.
Primary outputs covered: PDF (XeLaTeX), Beamer slides (PDF), standalone HTML.
Citation stack: --citeproc + BibTeX .bib + ABNT CSL. Diagrams: Mermaid, TikZ,
PlantUML via Lua filters.
All reference files under references/ are the official Pandoc manual sections — treat
them as authoritative. Load one only when the current task needs its content (see the
References table).
When to use this skill vs. plain markdown
- Load this skill the moment a `.md` file has a YAML frontmatter with `bibliography:`, `title:` + `author:`, `header-includes:`, `theme:` (beamer), or `reference-section-title:`. Those fields only make sense under Pandoc.
- Load it when the user mentions compiling a `.md`: pandoc's CLI flags, engines, and filters are the most error-prone surface.
- Do not load it for plain GitHub-flavored Markdown (READMEs, issues, PR descriptions) unless the user explicitly asks to target Pandoc. For those, prefer `markdown-formatter`.
Core rules for academic Pandoc markdown
- Every academic document starts with a YAML frontmatter. Minimum fields: `title`, `author`, `date`, `lang`, and (for citations) `bibliography`. Templates live in assets/.
- Math uses `$…$` and `$$…$$`. This is the `tex_math_dollars` extension, enabled by default in `markdown`. Inline: the opening `$` must have a non-space character immediately to its right, and the closing `$` must have a non-space character immediately to its left and must NOT be followed by a digit. That rule is what keeps `$20,000 and $30,000` from being parsed as math. Escape literal dollars with `\$`. For display math, the `$$` delimiters may be separated from the formula by whitespace, but there can be no blank lines between the opening and closing `$$`. Source: references/pandocs-markdown.md.
- Citations use `[@key]` syntax with `--citeproc`: `[@goldberg1989]`, `[@smith2020, p. 42]`, `[-@smith2020]` (suppress author), `[@a; @b]` (multiple). The `.bib` file is declared in frontmatter (`bibliography: ./refs.bib`); never hardcode the path in the compile command when the frontmatter already has it. See references/citations.md.
- Cross-references use the same `@` prefix only with the `pandoc-crossref` filter. Without that filter, use explicit `\ref{label}` inside raw LaTeX blocks for PDF output, or plain Markdown links (`[Figure 1](#fig:foo)`) for HTML. Do not promise cross-refs "just work"; they require a filter.
- Raw LaTeX is allowed and preserved when the output is `latex`/`pdf`/`beamer`. Wrap in `\begin{...}\end{...}` or use inline backslash commands. It will be dropped silently for HTML/DOCX output, so do not mix raw LaTeX into a document that needs multi-format output without a fallback.
- Heading levels drive structure. For `beamer`, the slide level is auto-detected as the highest heading level (smallest number) that is followed immediately by content rather than by another heading; override with `--slide-level=N`. For a thesis, level-1 headings are chapters only if the template expects it (ABNT-like templates do). See references/slide-shows.md.
- Language matters for hyphenation, quotes, and citeproc locale. Always set `lang:` in frontmatter (`lang: pt-BR`, `lang: en-US`). XeLaTeX uses it to pick hyphenation patterns; citeproc uses it to localize "and"/"et al." and date formats.
- Always use forward slashes in paths inside the Markdown (`assets/figure.png`), even on Windows hosts; backslashes break on Unix builders.
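The inline math delimiter rule above can be approximated with a regular expression. This is a minimal sketch for linting source text, not pandoc's actual parser; the name `inline_math_spans` is hypothetical, and edge cases such as nested or multi-line math are not handled.

```python
import re

# Approximation of the inline rule: the opening "$" is followed by a non-space
# character; the closing "$" is preceded by a non-space character and is not
# followed by a digit. Backslash-escaped dollars are skipped.
INLINE_MATH = re.compile(r"(?<!\\)\$(?=\S)(.+?)(?<=\S)(?<!\\)\$(?!\d)")


def inline_math_spans(text):
    """Return the contents of spans that this approximation treats as inline math."""
    return [match.group(1) for match in INLINE_MATH.finditer(text)]


print(inline_math_spans("Energy is $E = mc^2$ here."))
print(inline_math_spans("Between $20,000 and $30,000 per year."))
print(inline_math_spans("It costs \\$5 or \\$6."))
```

Only the first sentence yields a math span; the currency examples do not, because the candidate closing `$` is preceded by a space or the dollars are escaped.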
YAML frontmatter by document type
Pick the matching template from assets/ and adapt the placeholders. Do not invent fields —
unknown YAML keys are silently ignored by pandoc, which means typos produce wrong output
without errors.
| Document type | Template |
|---|---|
| Thesis / monograph (XeLaTeX PDF) | assets/thesis-frontmatter.md |
| Research article / paper | assets/article-frontmatter.md |
| Beamer slide deck (PDF) | assets/beamer-frontmatter.md |
| Pandoc defaults file (reusable compile config) | assets/defaults-academic.yaml |
- `bibliography:` must be a string or a YAML list of strings, never bare YAML anchors. Multiple files: `bibliography: [./refs.bib, ./extra.bib]`.
- `reference-section-title:` creates the bibliography heading automatically. Without it, the refs list appears with no header in PDF output.
- `link-citations: true` makes in-text citations hyperlink to the bibliography; required for readable PDFs. Off by default.
- `header-includes:` is raw passthrough to the LaTeX preamble. A malformed entry breaks the whole build with a cryptic LaTeX error, not a YAML error.
- Beamer-specific fields (`theme`, `colortheme`, `fonttheme`, `aspectratio`) only apply when `-t beamer` is used. They are ignored for other outputs, so they are safe to keep in a shared source file.
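Because unknown frontmatter keys are ignored silently, a small pre-compile check can catch typos. The sketch below is an assumption-laden illustration: `KNOWN_KEYS` lists only the keys named in this skill, the function names are hypothetical, and the frontmatter is read line by line rather than with a real YAML parser.

```python
import difflib
import re

# Keys mentioned in this skill; a real project would extend this list (assumption).
KNOWN_KEYS = [
    "title", "author", "date", "lang", "bibliography",
    "reference-section-title", "link-citations", "header-includes",
    "theme", "colortheme", "fonttheme", "aspectratio", "toc", "otherlangs",
]


def frontmatter_keys(markdown_text):
    """Return top-level keys of the leading '---' block (line-based, no YAML parser)."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() in ("---", "..."):
            break
        match = re.match(r"^([A-Za-z][\w-]*):", line)  # unindented "key:" only
        if match:
            keys.append(match.group(1))
    return keys


def suspicious_keys(markdown_text):
    """Map each unknown key to its closest known key, or None if nothing is close."""
    report = {}
    for key in frontmatter_keys(markdown_text):
        if key not in KNOWN_KEYS:
            close = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=0.8)
            report[key] = close[0] if close else None
    return report


doc = """---
title: Example
bibliograpy: ./refs.bib
lang: pt-BR
---
Body text.
"""
print(suspicious_keys(doc))
```

A key reported with `None` is not necessarily wrong; it may simply be a valid pandoc variable that is missing from the illustrative list.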
Compile commands
Academic PDF (thesis/article) with citations and Lua filters:
pandoc input.md \
--from markdown \
--to pdf \
--pdf-engine=xelatex \
--citeproc \
--csl=path/to/abnt.csl \
--lua-filter=scripts/diagram.lua \
--resource-path=.:assets \
-o output.pdf
Beamer slide deck (this repo's thesis presentation pattern):
pandoc slides.md \
-t beamer \
--pdf-engine=xelatex \
--citeproc \
--lua-filter=scripts/beamer-table-fix.lua \
-o slides.pdf
Standalone HTML with math via KaTeX:
pandoc input.md \
-s --katex \
--citeproc \
--toc \
-o output.html
- `--citeproc` is required if the document cites anything. Without it, `[@key]` is rendered as literal text in the output.
- `--pdf-engine=xelatex` is required for any document with non-ASCII text (Portuguese, emoji, small-caps, etc.). `pdflatex` will fail on Unicode. `lualatex` also works.
- `--resource-path` is a colon-separated (Unix) or semicolon-separated (Windows) list of directories pandoc searches for images and `\input{}`-ed files. When the source file is not in the repo root, always set it.
- A defaults file (assets/defaults-academic.yaml) replaces all those flags with `pandoc -d defaults-academic.yaml input.md`. Use it when the same command is run repeatedly. See references/defaults-files.md.
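When the build is driven from a script, the platform-dependent separator in `--resource-path` is easy to get wrong. The following sketch assembles an argument list for the XeLaTeX PDF pattern using `os.pathsep`; the helper name `build_pdf_command` and its parameters are hypothetical, and nothing is executed here.

```python
import os


def build_pdf_command(source, output, resource_dirs, csl=None, lua_filters=()):
    """Assemble a pandoc argv list for a XeLaTeX PDF build (illustrative only)."""
    command = [
        "pandoc", source,
        "--from", "markdown",
        "--pdf-engine=xelatex",
        "--citeproc",
        # os.pathsep is ":" on Unix and ";" on Windows.
        "--resource-path=" + os.pathsep.join(resource_dirs),
    ]
    if csl:
        command.append("--csl=" + csl)
    for lua_filter in lua_filters:
        command.append("--lua-filter=" + lua_filter)
    command += ["-o", output]
    return command


cmd = build_pdf_command(
    "input.md", "output.pdf", [".", "assets"],
    csl="path/to/abnt.csl", lua_filters=["scripts/diagram.lua"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with paths that contain spaces.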
Diagrams that render in the PDF
The user selected three diagram stacks: Mermaid, TikZ, PlantUML. Each has a different integration path. Use the decision table below.
| Stack | Invocation | Filter / engine | Best for |
|---|---|---|---|
| Mermaid | fenced ```mermaid block | pandoc-ext/diagram Lua filter + mmdc CLI | Flowcharts, sequence, class, ER |
| TikZ | raw \begin{tikzpicture}…\end{tikzpicture} | Native XeLaTeX; no filter needed for PDF output | Precise technical figures, math |
| PlantUML | fenced ```plantuml block | Same pandoc-ext/diagram filter + plantuml CLI | UML, component diagrams |
See references/diagrams-and-filters.md for installation, version constraints, and the full filter invocation pattern.
- Mermaid and PlantUML are pre-rendered to images by the filter (PNG/SVG) and embedded; the raw source is not in the PDF. This means: (1) the `mmdc`/`plantuml` CLI must be on `PATH` at build time, (2) builds fail offline if Mermaid needs to download Chromium.
- TikZ does NOT work in Beamer frames that use `\begin{frame}[fragile]` without extra care: `[fragile]` is required for verbatim/code, but TikZ compilation inside `[fragile]` frames silently produces wrong output. If a slide has both code and TikZ, split into two frames.
- Mermaid diagrams in Beamer are known to break when the filter strips the block but doesn't render (version mismatch). Pre-render to a PNG and embed with `{width=80%}`; this repo's thesis presentation does exactly this. Documented in `scripts/beamer-table-fix.lua` comments.
- TikZ output in HTML requires a rasterization pipeline (`dvisvgm` or `tikz-to-svg`). Easier path: conditionally embed a pre-rendered SVG using raw HTML blocks with the `html` output format.
Known failure modes (and fixes)
- `[WARNING] Could not convert TeX math ... , rendering as TeX` when converting HTML → markdown via `pandoc -f html -t gfm`. `gfm` has no native math; pandoc falls back to literal TeX. Fix: use `-t markdown` (pandoc-flavored) instead of `-t gfm`, and the math stays as `$…$` blocks. This was the failure in the user's `article-example.md` case.
- HTML-to-markdown conversion produces raw HTML `<div>`/`<span>` noise. Add `--wrap=none --strip-comments` and post-process. For arXiv HTML specifically, the cleanest route is `pandoc -f html -t markdown+smart --strip-comments input.html -o output.md`, then manually add a YAML frontmatter. `gfm` was the wrong target for academic content.
- Longtable breaks inside a Beamer frame. Pandoc 3.x generates `longtable` for multi-row markdown tables; beamer frames can't hold a longtable. Fix: use the scripts/beamer-table-fix.lua filter that converts `longtable` → `tabular`. Pass `--lua-filter=scripts/beamer-table-fix.lua`.
- `## Referências` loses `{.allowframebreaks}` when `reference-section-title` is set. Remove `reference-section-title` from the YAML and write the heading manually with the attribute: `## Referências {.allowframebreaks}`.
- `mermaid-filter` + Pandoc 3+ hides diagrams from `\listoffigures`. `mermaid-filter` emits the older Pandoc 2.x AST (`Para[Image]`) instead of a proper `Figure` node, and no `\begin{figure}` environment is generated for a plain paragraph. Fix: use a custom Lua filter to convert them: `if #elem.content == 1 and elem.content[1].t == 'Image' then ... return pandoc.Figure(...)`. Reference the `scripts/fix-mermaid-figure.lua` script if available. Alternatively, migrate to `pandoc-ext/diagram`, which bridges the AST correctly.
- `mermaid-filter` diagrams squished (vertically crushed) in LaTeX. LaTeX's default `\maxheight` constraint for images is `0.4\textheight`. Diagram images with tall aspect ratios get scaled down dramatically to fit. Fix: override it in the markdown header: `\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}`. For output that scales without pixelation, render the blocks with `format=pdf` or `format=svg`.
- `toc: true` + `\AtBeginSection` produces duplicate section divider slides. Choose one; for beamer, prefer explicit `## Roteiro` slides over auto-generated ones.
- Raw LaTeX in the document body is silently dropped in DOCX output. If the document is multi-format, replace `\textbf{foo}` with `**foo**`, `\emph{foo}` with `*foo*`, and move genuinely LaTeX-only constructs into `header-includes:` macros used via Pandoc spans.
Writing conventions for academic Markdown
- Paragraphs are separated by a blank line. A lone newline inside a paragraph is just a space; do not rely on visual line breaks in the source.
- Hard breaks: end a line with two spaces or a backslash + newline (`escaped_line_breaks`). The backslash form is needed inside multiline/grid table cells, where trailing spaces are stripped.
- Headings: always leave a blank line before a heading (`blank_before_header` extension, on by default in `markdown`). Use ATX style (`##`); setext style breaks on long titles.
- Heading IDs are auto-generated from the text (lowercased, non-alphanumerics stripped, spaces → hyphens). To pin a stable ID, use `## Section {#sec:intro}`. Required for any heading you want to cross-reference.
- Figures with captions: append `{#fig:label width=80%}` to the image. The `#fig:label` requires `pandoc-crossref`; without it, just write `\label{fig:label}` in raw LaTeX immediately after the image.
- Tables: pipe tables (`| a | b |`) for simple cases; grid tables for cell content with newlines. Always include a header row; pandoc cannot infer alignment otherwise.
- Callouts / admonitions: GitHub-style `> [!NOTE]` blocks are NOT pandoc-native. For PDF, use raw LaTeX environments (e.g., `\begin{tcolorbox}...\end{tcolorbox}`) defined in `header-includes:`. For HTML, use a `<div class="note">...</div>` + CSS.
- Footnotes: `[^1]` inline and `[^1]: text` on its own line. Unlike citations, these work without any filter.
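To predict which ID a heading will receive, the auto-identifier rule can be sketched in a few lines. This follows the steps described in the Pandoc manual for `auto_identifiers` as far as plain text is concerned; it assumes formatting, links, and footnotes have already been removed, and the function name is hypothetical.

```python
import re


def pandoc_auto_identifier(heading_text):
    """Approximate pandoc's automatic heading identifier for plain heading text."""
    # Keep alphanumerics (Unicode-aware), underscores, hyphens, periods, and whitespace.
    kept = "".join(
        ch for ch in heading_text
        if ch.isalnum() or ch in "_-." or ch.isspace()
    )
    # Spaces and newlines become hyphens, then everything is lowercased.
    lowered = re.sub(r"\s+", "-", kept.strip()).lower()
    # Drop everything before the first letter; fall back to "section" if none is left.
    first_letter = re.search(r"[^\W\d_]", lowered)
    return lowered[first_letter.start():] if first_letter else "section"


for title in ["Heading identifiers in HTML", "Maître d'hôtel", "3. Applications", "33"]:
    print(repr(title), "->", pandoc_auto_identifier(title))
```

Because the generated ID changes whenever the heading text changes, an explicit `{#sec:...}` attribute remains the safer choice for anything that is cross-referenced.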
Workflow: starting a new academic document
- Pick the document type (thesis, article, or beamer) and copy the matching YAML from `assets/`.
- Set the language and bibliography in the frontmatter before writing any body. Forgetting this turns every citation into literal `[@key]` in the output.
- Write the body following the rules above. Cite with `[@key]`, math with `$…$`, figures with `{#fig:label}`.
- Add diagrams per the decision table. If using Mermaid/PlantUML, verify `mmdc` or `plantuml` is on `PATH`: `which mmdc && which plantuml`.
- Compile with one of the commands above, or with a defaults file.
- Read the first warning, not the last. Pandoc emits warnings top-down, and an early YAML error cascades into dozens of downstream warnings.
After each compile, verify:
- No `[WARNING] Could not convert TeX math` lines in stderr
- No `[WARNING] Citeproc: citation X not found` lines
- Bibliography appears in the output with the expected heading
- All figure numbers resolve (no `??` in PDF)
- Table of contents (if enabled) reflects the actual heading hierarchy
References
Load only the file(s) matching the current task. Most are official Pandoc manual sections verbatim (authoritative but long); files described below as synthesized or skill-specific are not.
| File | Load when |
|---|---|
| references/pandocs-markdown.md | Any question about syntax, extensions, math, tables, inline formatting |
| references/citations.md | Configuring .bib, CSL, citation syntax edge cases, citeproc locale |
| references/extensions.md | Enabling/disabling a specific pandoc extension (+smart, -raw_tex, etc.) |
| references/templates.md | Customizing the LaTeX/HTML template, defining variables, overriding default.latex |
| references/slide-shows.md | Beamer or reveal.js slide structure, slide levels, incremental lists |
| references/defaults-files.md | Writing or debugging a pandoc -d defaults.yaml file |
| references/options.md | CLI flag reference (--from, --to, --pdf-engine, --lua-filter, …) |
| references/syntax-highlighting.md | Code block highlighting, highlight-style, custom KDE XML themes |
| references/diagrams-and-filters.md | Installing/configuring mermaid/PlantUML/TikZ pipelines (synthesized, not official doc) |
| references/accessible-pdfs-and-pdf-archiving-standards.md | PDF/A, tagged PDF, accessibility |
| references/epubs.md | EPUB output (not primary but supported) |
| references/jupyter-notebooks.md | Converting to/from .ipynb |
| references/reproducible-builds.md | Deterministic output for CI |
| references/format-compatibility.md | Which features survive which output formats (PDF/HTML/DOCX/EPUB cross-matrix) |
| references/quality-rubric.md | Check-mode scoring sheet — items, PASS/FAIL criteria, severity |
| references/self-update-examples.md | Worked transcripts of the self-update protocol (missing bib, fatal LaTeX, dry-run) |
| references/check-mode.md | Full check-mode workflow + symptom→fix auto-fix catalogue |
| (all other files in references/) | Rare edge cases: authors, custom readers, lua interpreter mode, web server, vimdoc |
Scripts bundled with this skill
All filters are opt-in — none load automatically. Pass them via --lua-filter=… on the
command line or list them under lua-filters: in a defaults file. Each file has a 4-part
header comment (purpose / invocation / limits / filter order) — read it before wiring the
filter into a pipeline.
| Script | Trigger / Condition | Mechanism / Purpose |
|---|---|---|
| scripts/beamer-table-fix.lua | --lua-filter beamer-table-fix.lua | Beamer specific. Pandoc 3 multi-row tables generate longtable environments, which Beamer frames cannot natively support without crashing. Converts them to standard tabular so tables fit seamlessly in slide frames. |
| scripts/fix-mermaid-figure.lua | --lua-filter fix-mermaid-figure.lua with --filter mermaid-filter | XeLaTeX diagram fix. mermaid-filter emits older Pandoc 2.x Para[Image] nodes. This script intercepts these and upgrades them into block-level Figure nodes, so that diagrams appear in \listoffigures and support \ref. |
| scripts/abstract-to-latex.lua | Top-level ::: {.abstract} divs | LaTeX specific. Wraps content inside \begin{abstract}…\end{abstract}. Useful for academic papers where Pandoc's default markdown abstract handles placement poorly. |
| scripts/figure-numbering.lua | When --filter pandoc-crossref is unavailable | Zero-dependency numbering. Scans unlabeled images and assigns them sequential fig:N tags so you can do \ref{fig:N} without installing the heavy pandoc-crossref binary. |
| scripts/strip-raw-latex-for-html.lua | When compiling to --to html* or epub* | HTML fallback. LaTeX authors often use pure TeX commands like \textbf{}. This filter converts them into pandoc semantic Markdown equivalents so styles aren't lost in HTML targets. |
| scripts/lang-span.lua | Markdown spans defined as [word]{lang=xx} or ::: {lang=xx} | Polyglossia integration. Converts the span into the LaTeX command \foreignlanguage{xx}{word}, ensuring correct hyphenation for mixed language terms in a document. |
- Beamer + `strip-raw-latex-for-html.lua` = no-op. The filter only activates for `html*`/`epub*` targets; safe to leave in a shared defaults file.
- `figure-numbering.lua` vs `pandoc-crossref`. Do not load both; crossref already assigns `fig:N` identifiers. Use `figure-numbering.lua` only as a zero-dependency fallback.
- `lang-span.lua` requires `otherlangs:` in the YAML. The filter emits `\foreignlanguage{english}{…}`; that command fails at compile time if `english` was not registered via `otherlangs: [en-US]` (polyglossia) in the frontmatter.
Diagram filters (pandoc-ext/diagram, raghur/mermaid-filter) are installed separately —
see references/diagrams-and-filters.md.
Templates
Copy a template from templates/ into the document root and adapt the placeholders. Each
file is a complete, compilable starting point — frontmatter + a few body examples — not a
fragment.
| Template | Purpose |
|---|---|
| templates/minimal-thesis.md | ABNT-style thesis skeleton (XeLaTeX PDF, citeproc, chapters) |
| templates/minimal-article.md | Single-file research article (XeLaTeX PDF, abstract, refs) |
| templates/minimal-beamer.md | Beamer slide deck skeleton matching this repo's thesis-presentation pattern |
| templates/standalone-html.md | Self-contained HTML with KaTeX math + citeproc |
| templates/tikz-figure.md | Inline TikZ figure block ready to drop into a XeLaTeX document |
| templates/circuitikz-figure.md | CircuiTikZ schematic block (electrical/electronic diagrams) |
| templates/mermaid-workaround.md | Pre-render Mermaid → PNG and embed (beamer-safe; avoids filter failure) |
| templates/multi-language.md | Polyglossia + lang-span.lua setup for mixed-language documents |
Self-update protocol
When a pandoc/LaTeX compile fails in a way that this skill's current guidance did not prevent, the skill should offer to update itself (new rule, new gotcha, new trigger, or new TODO row). The protocol below is mandatory: the agent must classify the failure, ask the user before proposing changes, and ask again before applying them. Silent edits to SKILL.md are forbidden.
Major failure — auto-propose self-update. Any one of the following, observed while this skill was loaded, counts as a major failure and triggers the classify→ask flow:
- pandoc process exit code ≠ 0
- `[ERROR]` (any case) present in pandoc stderr
- LaTeX log contains `Fatal error occurred, no output PDF file produced!`
- Citeproc reports one or more missing bib entries for keys cited in the source
- Output PDF exists but has zero pages (`pdfinfo` → `Pages: 0`)
Minor failure — do NOT auto-trigger. Record if the user asks, otherwise stay silent:
- `[WARNING]` lines without any `[ERROR]`
- `Underfull \hbox` / `Overfull \hbox` / `Underfull \vbox` messages
- Non-empty output PDF (`pdfinfo` → `Pages: ≥ 1`) even if it contains `??` placeholders
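The major/minor split above amounts to an ordered list of checks, which the following sketch encodes. It is an illustration under stated assumptions: the function name, parameter names, and category strings are hypothetical, and the caller is expected to supply the page count from `pdfinfo` and the list of missing citation keys.

```python
import re

BOX_WARNING = re.compile(r"Underfull \\hbox|Overfull \\hbox|Underfull \\vbox")


def classify_failure(exit_code, stderr, latex_log="", missing_bib_keys=(),
                     pdf_pages=None, has_unresolved_refs=False):
    """Return (severity, category), with severity "major", "minor", or "none".

    pdf_pages is the page count reported by pdfinfo, or None if no PDF exists.
    """
    # Major conditions are checked first, so any one of them wins.
    if exit_code != 0:
        return ("major", "pandoc exit code != 0")
    if re.search(r"\[ERROR\]", stderr, flags=re.IGNORECASE):
        return ("major", "[ERROR] in pandoc stderr")
    if "Fatal error occurred, no output PDF file produced!" in latex_log:
        return ("major", "LaTeX fatal error")
    if missing_bib_keys:
        return ("major", "missing bib entries")
    if pdf_pages == 0:
        return ("major", "zero-page PDF")
    # Minor conditions never auto-trigger the self-update flow.
    if "[WARNING]" in stderr:
        return ("minor", "warnings only")
    if BOX_WARNING.search(latex_log):
        return ("minor", "box warnings")
    if pdf_pages and has_unresolved_refs:
        return ("minor", "unresolved ?? placeholders")
    return ("none", None)


print(classify_failure(0, "[WARNING] something odd"))
print(classify_failure(0, "", missing_bib_keys=["smith2020"]))
```

Checking the major conditions before the minor ones means a run with both a warning and a missing bib entry is still classified as major.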
1. Classify. On a major failure, quote the triggering stderr snippet and state the category from the major-failure list.
2. First ask. Call `vscode_askQuestions` with exactly these options: `Yes — propose patch` / `No — log to fix/TODO.md` / `No — ignore` / `Custom (freeform)`. One question, freeform input allowed.
3. On `Yes`: draft a minimal unified diff (new rule / new gotcha / new reference row / new TODO entry) and call `vscode_askQuestions` a second time with options `Apply` / `Revise` / `Cancel`. Only on `Apply` do you write to SKILL.md.
4. On `No — log to fix/TODO.md`: append a row to `fix/TODO.md` with today's date, the trigger snippet, the affected file, `declined` (or `deferred-for-review`), and a one-sentence rationale. Do not edit SKILL.md.
5. On `No — ignore`: take no action; do not log.
6. On `Custom`: treat the freeform text as the user's instruction. If it implies a patch, go to step 3; otherwise log as `custom` in `fix/TODO.md`.
7. Advise-only mode. When another phase/agent invokes this protocol with `mode=advise-only`, classify and return `{classification, category, suggested_fix, todo_row_draft}` as JSON. Do not edit SKILL.md or `fix/TODO.md` yourself.
Four worked transcripts (missing bib key, LaTeX fatal, overfull-hbox-only, real orchestrator dry-run) are kept in references/self-update-examples.md to keep this file under its 500-line budget.
User phrases that force a self-update evaluation even without a failing compile. When the
user says any of these verbatim (or very close), run the <workflow> starting at step 1
with the most recent compile output (or N/A if none) as the trigger snippet:
- "results are not good"
- "the PDF is wrong"
- "the skill missed something"
- "fix the skill"
Mode selection
The skill has two opt-in maintenance/QA modes. They never run automatically together; pick based on what the user is asking to fix.
| Mode | Activate when |
|---|---|
| Self-update | A compile failed in a way the skill should have prevented, OR the user uses a self-update trigger phrase ("fix the skill", "the skill missed something"). Edits SKILL.md / fix/TODO.md. |
| Check mode | The compile succeeded but the user wants a quality review of the output PDF. Triggered only by check-mode phrases ("check the PDF", "review the output", /pandoc-check). Edits the document, not the skill. |
| Neither | Default. Normal authoring/editing tasks; both modes stay dormant. |
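The mode table can be read as a small dispatch function. The sketch below uses the trigger phrases listed in this file and plain substring matching, which is a simplification of "verbatim or very close"; `select_mode` and its parameters are hypothetical names.

```python
# Trigger phrases as listed in this skill, lowercased for matching.
SELF_UPDATE_PHRASES = (
    "results are not good",
    "the pdf is wrong",
    "the skill missed something",
    "fix the skill",
)
CHECK_PHRASES = (
    "check the pdf",
    "review the output",
    "did it render right",
    "run the quality check",
    "/pandoc-check",
)


def select_mode(user_text, major_failure=False):
    """Return "self-update", "check", or "neither" for a user message."""
    text = user_text.lower()
    # A major compile failure or a self-update phrase selects self-update.
    if major_failure or any(phrase in text for phrase in SELF_UPDATE_PHRASES):
        return "self-update"
    # Check mode runs only on an explicit check-mode phrase.
    if any(phrase in text for phrase in CHECK_PHRASES):
        return "check"
    return "neither"


print(select_mode("Please fix the skill"))
print(select_mode("Can you check the PDF?"))
print(select_mode("Add a methods section"))
```

Returning a single value keeps the two modes mutually exclusive, in line with the statement that they never run together.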
Check mode
Check mode is an opt-in PDF quality review. The agent only runs it when the user
explicitly asks (see <triggers> below) — never automatically after a compile, never as
part of the self-update protocol. Check mode and self-update are complementary:
self-update edits the skill; check mode edits the current document. The full
scoring sheet lives in references/quality-rubric.md.
Run check mode only when the user says one of these (verbatim or close paraphrase):
- "check the PDF"
- "review the output"
- "did it render right"
- "run the quality check"
- explicit `/pandoc-check`
Seven steps (locate PDF → re-run with --verbose → load pdf skill + vision rubric →
classify each item PASS/FAIL/UNKNOWN → per-FAIL vscode_askQuestions → on Fix: yes
propose patch + second confirm → on Fix: no log to fix/TODO.md as check-declined).
Full workflow + the symptom→fix auto-fix catalogue live in
references/check-mode.md. There is no preflight wrapper: run
`pandoc … --verbose` directly.