name: arxiv-prep
user_invocable: true
description: Prepare an arXiv submission package from a LaTeX paper. Cleans the source, builds a tarball, and extracts metadata for the submission form. Use before uploading to arXiv.
arxiv-prep — arXiv Submission Preparation
Automate the tedious steps of preparing a LaTeX paper for arXiv upload. Creates a clean copy, removes cruft, verifies compilation, and produces a ready-to-upload tarball.
When to Use
- "prep for arxiv", "make an arxiv package", "I need to upload this to arxiv"
- After the paper is finalized and ready for submission
- Complements presubmit-checks (content quality); this skill handles packaging
Workflow
Phase 1: Pre-clean (judgment calls)
1. Find the paper
Same search logic as presubmit-checks:
- User-provided path
- paper/current/main.tex
- paper/main.tex
- main.tex
- Glob for **/*.tex files containing \begin{document}
Identify the paper directory (parent of the main .tex file). All subsequent operations are relative to this directory.
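The search order above can be sketched as a small shell function. This is a minimal sketch, not part of presubmit-checks: the name find_main_tex is hypothetical, a user-provided path is assumed to be handled by the caller, and when several files contain \begin{document} the first one in sorted order is chosen arbitrarily.

```shell
# find_main_tex is a hypothetical helper name, not part of any tool.
# Usage: main=$(find_main_tex /path/to/repo) && paper_dir=$(dirname "$main")
find_main_tex() {
  root="${1:-.}"
  # Try the conventional locations first, in priority order.
  for candidate in paper/current/main.tex paper/main.tex main.tex; do
    if [ -f "$root/$candidate" ]; then
      printf '%s\n' "$root/$candidate"
      return 0
    fi
  done
  # Fall back to any .tex file that contains \begin{document}.
  match=$(find "$root" -type f -name '*.tex' \
            -exec grep -l -F '\begin{document}' {} + 2>/dev/null | sort | head -n 1)
  [ -n "$match" ] || return 1
  printf '%s\n' "$match"
}
```

The paper directory is then the parent of the returned path, for example via dirname.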
2. Optionally run presubmit-checks
Ask the user: "Want me to run the presubmit-checks first?" Skip if they say they already did.
3. Check for supplements
Scan for separate supplement/appendix .tex files (e.g., supp.tex, appendix.tex, si.tex). If found, ask the user whether to:
- Merge into the main file via \appendix (arXiv prefers single-PDF submissions)
- Keep separate (will be included as ancillary files)
4. Review style files
Grep for journal-specific language that should be removed for arXiv:
- "Submitted to", "Under review at", "Accepted by"
- Journal-specific class options (e.g., review, preprint mode toggles)
- Copyright/license statements that conflict with arXiv's license
Flag these and ask before changing.
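A rough way to surface these candidates is a recursive grep. This is a sketch under assumptions: flag_journal_language is a hypothetical helper name, the patterns are illustrative rather than exhaustive, and the --include option assumes GNU or BSD grep. It only reports matches; nothing is changed.

```shell
# flag_journal_language is a hypothetical helper; it reports matches as
# file:line:text and exits non-zero when nothing is found.
flag_journal_language() {
  dir="${1:-.}"
  grep -rnE --include='*.tex' --include='*.sty' --include='*.cls' \
    -e 'Submitted to|Under review at|Accepted by' \
    -e '\\documentclass\[[^]]*(review|preprint)' \
    -e '\\copyright|[Ll]icen[sc]e' \
    "$dir"
}
```

Each hit still needs a human decision, since a word like "license" can also appear in legitimate body text.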
Phase 2: Automated cleaning
5. Optimize bibliography
Run bib_optimizer to remove unused citations and reorder entries to match their order of appearance in the text:
uvx bib_optimizer bibopt <main.tex> <references.bib> <references_cleaned.bib>
- If the paper has a supplement with its own .bib (e.g., si.tex/supp.tex using si_references.bib), run bibopt separately for each:
  uvx bib_optimizer bibopt <si.tex> <si_references.bib> <si_references_cleaned.bib>
- If the supplement shares the main .bib, run bibopt on each .tex file separately against the same .bib, then merge the two outputs (concatenate and deduplicate entries by cite key):
  uvx bib_optimizer bibopt <main.tex> <references.bib> <references_main.bib>
  uvx bib_optimizer bibopt <si.tex> <references.bib> <references_si.bib>
  Then merge with bibtool (deduplicates by cite key):
  bibtool -d references_main.bib references_si.bib -o references_cleaned.bib
  If bibtool is not available, concatenate both files and manually remove any duplicate @type{key, entries (keep the first occurrence).
- Update the \bibliography{...} command in each .tex file to point to the cleaned file.
- The original .bib is never modified.
If bib_optimizer is not available, skip this step — the .bib will still be handled in step 7.
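For the case where bibtool is missing, the "concatenate and keep the first occurrence" merge described above can be approximated with awk. This is a minimal sketch, not a BibTeX parser: merge_bib is a hypothetical helper name, and it assumes every entry starts at the beginning of a line with @type{key, and that no field value contains a line starting with @. Keys are compared case-sensitively.

```shell
# merge_bib is a hypothetical helper: it concatenates .bib files and keeps
# only the first entry seen for each cite key.
# Usage: merge_bib references_main.bib references_si.bib > references_cleaned.bib
merge_bib() {
  awk '
    BEGIN { keep = 1 }
    /^@/ {
      key = $0
      sub(/^@[A-Za-z]+[{(][ \t]*/, "", key)   # strip the leading "@type{"
      sub(/[ \t]*,.*$/, "", key)               # strip the comma and what follows
      keep = !(key in seen)                    # skip the entry if the key repeats
      seen[key] = 1
    }
    keep { print }
  ' "$@"
}
```

Because the output goes to a new file, the original .bib files stay untouched.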
6. Run arxiv-latex-cleaner
uvx arxiv-latex-cleaner <paper-dir> --resize_images --im_size 500 --compress_pdf
This creates a <paper-dir>_arXiv/ directory with a cleaned copy. The original is untouched. The --compress_pdf flag reduces embedded PDF figure sizes, helping stay under arXiv's 50 MB limit.
If arxiv-latex-cleaner is not available, fall back to manual cleaning:
- Copy the paper directory
- Remove .git/, __pycache__/, .DS_Store
- Remove commented-out text blocks (lines starting with % that aren't TeX directives)
- Remove unused .tex files not referenced by \input or \include
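The first three fallback steps could look roughly like the sketch below. The function name manual_clean_copy is hypothetical. It assumes that "TeX directives" means magic comments of the form % !TEX ..., and it will also strip comment-looking lines inside verbatim environments, so the result should be reviewed. Removing unreferenced .tex files is left out because it needs a judgment call.

```shell
# manual_clean_copy is a hypothetical helper name, not part of any tool.
# Usage: manual_clean_copy path/to/paper   (creates path/to/paper_arXiv)
manual_clean_copy() {
  src="${1%/}"
  dest="${src}_arXiv"
  [ -d "$src" ] || return 1
  [ ! -e "$dest" ] || return 1          # refuse to overwrite an existing copy
  cp -R "$src" "$dest" || return 1

  # Remove version-control and OS cruft from the copy only.
  rm -rf "$dest/.git"
  find "$dest" -type d -name '__pycache__' -prune -exec rm -rf {} +
  find "$dest" -type f -name '.DS_Store' -exec rm -f {} +

  # Drop full-line comments, but keep "% !TEX ..." style directives.
  find "$dest" -type f -name '*.tex' | while IFS= read -r f; do
    awk '!/^[ \t]*%/ || /^[ \t]*%[ \t]*!/' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

Inline comments at the end of a code line are deliberately left alone, since stripping them safely requires handling escaped percent signs.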
7. Post-cleaner fixes in _arXiv/
Apply these fixes to the cleaned copy:
- 4-pass trick: Add \typeout{get arXiv to do 4 passes: Label(s) may have changed. Rerun} on the line before \end{document} (LaTeX stops reading at \end{document}, so anything placed after it is never executed). The rerun message in the log prompts arXiv to run enough LaTeX passes to resolve all references
- Ensure .bbl exists: Check if a .bbl file exists. If not, compile with pdflatex + bibtex/biber to generate it. arXiv needs the .bbl, not the .bib
- Ask before deleting .bib: If .bbl exists, ask the user whether to remove .bib files (reduces package size; arXiv uses .bbl directly)
- Clean aux files: Remove .aux, .log, .out, .blg, .fls, .fdb_latexmk, .synctex.gz, .toc, .lof, .lot, .nav, .snm, .vrb. Do not use latexmk -CA here; it also removes .bbl, which arXiv needs
- Remove .git/ if it exists in the copy
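The aux-file cleanup from the list above can be done with an explicit extension list, which leaves .bbl in place. A minimal sketch; clean_aux_files is a hypothetical helper name, and it should be pointed at the _arXiv/ copy, never the original directory.

```shell
# clean_aux_files is a hypothetical helper name, not part of any tool.
# Usage: clean_aux_files path/to/paper_arXiv
clean_aux_files() {
  dir="${1:-.}"
  # Explicit list of build by-products; .bbl is intentionally not included.
  for ext in aux log out blg fls fdb_latexmk synctex.gz toc lof lot nav snm vrb; do
    find "$dir" -type f -name "*.$ext" -exec rm -f {} +
  done
  rm -rf "$dir/.git"
}
```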
Phase 3: Verify & package
8. Test compilation
Run in the _arXiv/ directory:
pdflatex -interaction=nonstopmode main.tex
pdflatex -interaction=nonstopmode main.tex
pdflatex -interaction=nonstopmode main.tex
(Three passes to resolve references.) Report:
- Errors: any log lines starting with ! (these are blockers)
- Warnings: undefined references, missing citations, overfull hboxes
- If compilation fails, report the error but continue to the next steps
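The error and warning counts can be pulled from the .log file after the last pass. This is a sketch with assumptions: summarize_log is a hypothetical helper name, the patterns match the usual pdflatex and natbib message wording, and long warnings that the log wraps across lines may be missed.

```shell
# summarize_log is a hypothetical helper name, not part of any tool.
# Prints one summary line; exits 0 only if the log has no "!" error lines.
summarize_log() {
  log="${1:-main.log}"
  [ -f "$log" ] || return 2
  errors=$(grep -c '^!' "$log")
  undefined=$(grep -c -E 'Warning: (Reference|Citation) .* undefined' "$log")
  overfull=$(grep -c '^Overfull \\hbox' "$log")
  printf 'errors=%s undefined=%s overfull=%s\n' "$errors" "$undefined" "$overfull"
  [ "$errors" -eq 0 ]
}
```

A non-zero exit status here is a signal to report, not to abort, so the remaining steps can still run.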
9. Extract clean metadata
Parse the .tex file to extract metadata for copy-paste into the arXiv submission form:
- Title: from \title{...}; strip LaTeX commands, math mode, line breaks
- Abstract: from \begin{abstract}...\end{abstract}; strip LaTeX commands
- Authors: from \author{...}; extract names, strip affiliations/footnotes
- Comments: suggest standard format. If the paper has supplementary material, use: "Main text: X pages, X figures. Supplementary Information: X pages, X figures". Otherwise: "X pages, X figures"
Present this in a clean, copy-pasteable format.
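Two of these fields lend themselves to a quick shell pass. The sketch below uses hypothetical helper names (extract_abstract, comments_line) and makes several assumptions: \begin{abstract} and \end{abstract} sit on their own lines, figures are counted only in the file given (not in \input files, and commented-out figures are counted too), and the page count is read from the "Output written on" line of the pdflatex log. LaTeX commands inside the abstract are not stripped here and still need manual cleanup.

```shell
# extract_abstract: print the abstract body as a single line.
extract_abstract() {
  awk '
    index($0, "\\end{abstract}")   { inside = 0 }
    inside                         { text = text (text == "" ? "" : " ") $0 }
    index($0, "\\begin{abstract}") { inside = 1 }
    END                            { print text }
  ' "$1"
}

# comments_line: build the "X pages, X figures" string from a .tex and its .log.
comments_line() {
  tex="$1"
  log="$2"
  pages=$(sed -n 's/^Output written on .* (\([0-9][0-9]*\) pages*, .*/\1/p' "$log" | tail -n 1)
  figures=$(grep -c '\\begin{figure' "$tex")
  printf '%s pages, %s figures\n' "${pages:-?}" "$figures"
}
```

Title and author extraction are harder to do reliably with line-oriented tools, because \title and \author arguments often span lines and contain nested braces.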
10. Create tarball
cd <paper-dir>_arXiv && tar -cvf ../arxiv-submission.tar *
- Warn if the tarball exceeds 50 MB (arXiv's limit is ~50 MB for source)
- Report the file count and size
- Note: arXiv also accepts .tar.gz; use gzip if close to the limit
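The size and file-count report could be produced along these lines. A minimal sketch: check_tarball_size is a hypothetical helper name, and the 50 MB default simply mirrors the limit quoted above.

```shell
# check_tarball_size is a hypothetical helper name, not part of any tool.
# Usage: check_tarball_size arxiv-submission.tar [limit_in_mb]
# Prints a one-line report; exits non-zero if the tarball is over the limit.
check_tarball_size() {
  tarball="$1"
  limit_mb="${2:-50}"
  [ -f "$tarball" ] || return 2
  bytes=$(wc -c < "$tarball" | tr -d ' ')
  # Count archive members, skipping directory entries (they end in "/").
  files=$(tar -tf "$tarball" | grep -c -v '/$')
  printf '%s: %s files, %s bytes\n' "$tarball" "$files" "$bytes"
  [ "$bytes" -le $((limit_mb * 1024 * 1024)) ]
}
```

If the check fails, recreating the archive with tar -czf produces the .tar.gz variant.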
11. Final summary
Present:
- Package location and size
- Clean metadata (title, abstract, authors, comments)
- Compilation status (clean / warnings / errors)
- Remaining manual steps:
  - Upload arxiv-submission.tar at https://arxiv.org/submit
  - Select primary subject area and cross-list categories
  - Paste metadata into the form
  - Set license (usually CC BY 4.0 or arXiv perpetual non-exclusive)
  - Share the submission password with co-authors
Rules
- Never modify the original paper directory; all changes happen in the _arXiv/ copy
- Ask before destructive operations (deleting .bib, merging supplements)
- If any step fails, continue with the remaining steps and report all issues at the end
- Keep the final summary concise and actionable