name: math-parser-formatter description: Incrementally build Panache's math parser and formatter — a lossless structural TeX CST for inline/display math, then content-aware reformatting behind an experimental gate — one bounded phase at a time.
Use this skill when asked to advance Panache's math parsing/formatting, fix a math-CST or math-formatting issue, or pick the next phase of this effort.
This is a long-horizon, multi-session effort. Each session moves one phase
or sub-task forward; do not attempt sweeping rewrites in one go. The standing
design plan is ~/.claude/plans/i-want-to-plan-foamy-pascal.md.
Scope boundaries
- Parser:
crates/panache-parser/src/parser/math.rs(the TeX content parser), embedded viacrates/panache-parser/src/parser/inlines/math.rs, with AST accessors incrates/panache-parser/src/syntax/{math,inlines}.rs. - Formatter (future phases): a new
crates/panache-formatter/src/formatter/math/mirroringformatter/yaml/, gated behind an experimental option. - Goal: parse math content into a lossless structural CST (Phase 1 done;
operator atoms now tokenized) and reformat it semantics-safely. In scope:
align
&columns, indent environment bodies, normalize\\, collapse spaces (done); operator-precedence-aware spacing (a+b→a + b, class-based); and semantic line-breaking + indenting of long display math (wrap at the lowest-precedence operators, indent continuations). Out of scope: macro rewriting,\frac/\dfraccanonicalization, and anything that needs macro expansion. - There is no pandoc oracle for math formatting — pandoc passes math
content through untouched. Use an external dev-only oracle (latexindent /
KaTeX parser) for cross-validation, à la
pretty_yamlfor YAML.
Locked-in design decisions (do not relitigate)
- Parser is unconditional + lossless; the experimental gate lives on the
formatter side (default off = emit math verbatim, today's behavior). The
gate is a formatter-config option, NOT a Pandoc
Extensionsflag. - texlab, not KaTeX, is the parser model (lossless, error-tolerant vs. lossy, throwing).
- Diagnostics ride a side-channel (
MathParseReport), to be surfaced via linter + LSP — not the CST. - Bookdown equation labels (
(\#eq:label)) are parsed into aMATH_EQUATION_LABELtoken, gated onbookdown_equation_references. MATH_SPACE/MATH_NEWLINEstay distinct from hostWHITESPACE/NEWLINEsomath_content_text()can strip container prefixes the block machinery interleaves intoMATH_CONTENT(blockquote>etc.). See.claude/rules/math-parser.md.- Operators are tokenized but never classified in the CST.
+ - * = < >emit a neutralMATH_OPERATORtoken (one per char); bin/rel/precedence is interpretation (contextual unary minus,\mathbin, macro-dependent) — the analog of YAML scalar cooking. It lives in a shared formatter/LSP module keyed on operator text + command name (class + break-priority), neverMATH_BIN_OP/MATH_REL_OPkinds. That module is the gateway to both precedence-aware spacing and semantic line-breaking.
Related rules to read first
.claude/rules/math-parser.md— the math-parser invariants..claude/rules/parser.md— single-pass + lossless CST..claude/rules/formatter.md— idempotency; idempotency drift is often a parser-shape bug, not a formatter bug.
Phased plan (status)
- Phase 0 — scaffolding. SyntaxKinds, this skill + rule, corpus. Skill/rule DONE; representative TeX corpus still TODO.
- Phase 1 — TeX tokenizer + structural CST (parser). DONE. Lossless
MATH_CONTENTCST, diagnostics side-channel, bookdown labels, accessors/projector/indexers. - Phase 1b — operator atoms (parser). DONE (
feat(parser): tokenize math operators into MATH_OPERATOR). NeutralMATH_OPERATORtoken, no class. - Phase 2 — formatter experimental gate + inline math. DONE. Gate is
[experimental] format-math(default false), mirrored ontoConfig::experimental_format_math, schema regenerated. Off → verbatim; on → inline spacing normalization. - Phase 3 — display math + environments. DONE.
&-column alignment, environment-body indentation,\\normalization; honorshas_unescaped_single_dollar_in_content(). - Phase 4 — dev-oracle cross-validation + idempotency corpus. DONE.
Tier-1 corpus props + Tier-2
pulldown-latexMathML invariance oracle. - Phase 5 — operator interpretation module + precedence-aware spacing.
DONE.
formatter/math/operators.rs(cooking.rsanalog,pubfor LSP): classify char operators + curated command table → class; TeX Bin→Ord coercion; gap-based re-spacer (a+b→a + b, unary-x/f(-x)tight,x=-y→x = -y). Char operators only; command-operator spacing + Tier 3 → Phase 5b; break-priority column → Phase 6. - Phase 5b — command-operator spacing + Tier 3. DONE. Re-spaced
\leq/\cdot(command-terminating space handled, neverTightOp); landed the dev-only vendored symbol→atom-class fixture (tests/fixtures/math_symbol_classes/) cross-checked againstpulldown-latexEvents.\lim/\asympdivergences recorded, not corrected. - Phase 6 — semantic line-breaking + indenting. Wrap long display math at
lowest-precedence operators, indent continuations (uses Phase 5 priorities).
- Commit 1 DONE: parser tokenizes delimiters/punctuation (
( [→MATH_OPEN,) ]→MATH_CLOSE,, ;→MATH_PUNCT;| . /stay text); formatter'stext_tail_classreplaced by kind-keyedoperators::delimiter_class. No behavior change. - Commit 2 DONE (
9d7c2e5b):operators::break_priority(Rel > Bin > 0) + newformatter/math/linebreak.rs. Over-width display free rows break at depth-0 relations (≥2), continuations align under the first relation; depth tracked via open/close counter ((/[/\leftvs)/]/\right), brace groups opaque.line_widththreaded ontoMathFormatOptions. Idempotency:render.rs::split_logical_rowsjoins soft newlines into one logical row (only\\splits) — except a%-comment-terminating newline (significant, or the next line is absorbed into the comment). - Commit 3 DONE: nested binary breaking inside an over-width relation
segment — each
+ termnests one indent step deeper (under the relation RHS).linebreak.rsnow usesspaced_operator_breaks(depth-0, coerced, so unary signs excluded) +break_binary_segment;render_inline_seeded(_, Some(Close))keeps a leading-+continuation binary (not unary) in isolation. Scope: binary breaking only WITHIN a relation chain (≥2 rels); standalone binary chains / single-relation / no-relation rows stay one line. Remaining: binary breaking outside a relation chain, environment-body breaking, min-breaks-to-fit.
- Commit 1 DONE: parser tokenizes delimiters/punctuation (
- Phase 7 — docs + stabilization (
docs/guide/formatting.qmd,configuration.qmd); consider flipping the gate per flavor (separate decision). - Surface math diagnostics via linter/LSP — DONE (promoted Warning→Error).
- Optional structural cooking (parser, orthogonal to operators): script attachment, known-command argument grouping — legit future CST work if a formatting phase needs the structure.
Session workflow
- Read
RECAP.md(status, traps, next sub-targets) and the rules above. - Pick one bounded phase/sub-task.
- TDD: add the failing test first (parser golden / formatter golden / unit).
- Validate before landing:
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt -- --check- For parser CST snapshot changes: review each diff (byte ranges must still reconstruct the input losslessly).
- Flag-off regression: existing formatter goldens stay byte-identical.
- Rewrite
RECAP.md's Latest-session entry; add a one-line Earlier-sessions note.
Traps
- A background process (suspected pre-commit
git stash) reverted tracked edits once mid-session; untracked files survived. If source edits vanish, re-apply. - Don't read raw math content via
MATH_CONTENT.text()— usesyntax::math::math_content_text()(strips host container prefixes).