to-markdown

Convert files to Markdown via markitdown — PDFs, Word, Excel, PowerPoint, images, HTML, and other formats. Triggers on 'convert to markdown', 'extract text from', 'import this document', 'turn this PDF into markdown'.

By steveclarke. Updated 5/12/2026

Convert files to Markdown using markitdown — Microsoft's utility that extracts and structures content from many file formats.

Supported Formats

| Format | Notes |
| --- | --- |
| PDF | Text extracted; table structure may be approximate |
| Word (.docx) | Clean conversion including tables |
| Excel (.xlsx) | Sheets as markdown tables |
| PowerPoint (.pptx) | Slide text and structure |
| HTML | Cleaned readable content |
| Images | OCR (requires LLM vision for best results) |
| Audio | Transcription via SpeechRecognition |
| CSV / JSON / XML | Structured text |
| YouTube URLs | Transcript extraction |
| EPub | Chapter text |

Usage

# Convert to stdout
markitdown input.pdf 2>/dev/null

# Save to file
markitdown input.pdf -o output.md 2>/dev/null

# Or redirect
markitdown input.docx 2>/dev/null > output.md

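Always use 2>/dev/null to suppress noisy font/parser warnings that don't affect output quality. Because this also hides genuine error messages, rerun without the redirect when a conversion fails or produces empty output.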
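
One way to keep output quiet without losing diagnostics is to send stderr to a temporary log and print it only on failure. The sketch below assumes markitdown exits non-zero when a conversion fails; convert_quiet is a hypothetical helper name, not part of markitdown.

```shell
# Hypothetical helper (not part of markitdown): convert one file quietly,
# but show the captured stderr if the conversion fails.
# Assumption: markitdown exits with a non-zero status on failure.
convert_quiet() {
  input=$1
  output=$2
  log=$(mktemp) || return 1

  # stdout goes to the target file, stderr to the temporary log.
  markitdown "$input" > "$output" 2> "$log"
  rc=$?

  if [ "$rc" -ne 0 ]; then
    echo "markitdown failed on $input (exit $rc):" >&2
    cat "$log" >&2
    rm -f "$output"   # do not leave an empty or partial file behind
  fi

  rm -f "$log"
  return "$rc"
}

# Usage:
#   convert_quiet input.pdf output.md
```

On success this behaves like the redirect form above; on failure the warnings and errors that 2>/dev/null would have discarded are printed once.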

Workflow

  1. Check if markitdown is installed:

    command -v markitdown || echo "not installed"
    
  2. Install if missing (with all format support):

    pip install 'markitdown[all]'
    
  3. Run the conversion with stderr suppressed:

    markitdown "$INPUT_FILE" 2>/dev/null
    
  4. Handle the output based on user intent:

    • Saving to the knowledge base → write to appropriate .md file
    • Quick review → show in conversation
    • Multiple files → loop and convert each
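
The four steps above can be combined into a single shell function. This is a minimal sketch, not the skill's actual implementation: to_markdown is a hypothetical name, and the optional second argument (an output path) is an assumption made for illustration.

```shell
# Hypothetical wrapper around the workflow: check, install if missing,
# convert, then write to a file or print to stdout.
to_markdown() {
  input=$1
  output=${2:-}   # optional; empty means print to stdout

  # Steps 1-2: install markitdown only when it is not already available.
  if ! command -v markitdown >/dev/null 2>&1; then
    pip install 'markitdown[all]' || return 1
  fi

  # Steps 3-4: convert with stderr suppressed.
  if [ -n "$output" ]; then
    # Saving: make sure the target directory exists first.
    mkdir -p "$(dirname "$output")" || return 1
    markitdown "$input" 2>/dev/null > "$output"
  else
    # Quick review: print the Markdown to stdout.
    markitdown "$input" 2>/dev/null
  fi
}

# Usage:
#   to_markdown report.pdf                   # quick review
#   to_markdown report.pdf notes/report.md   # save to a file
#   for f in a.pdf b.docx; do to_markdown "$f" "${f%.*}.md"; done
```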

Output Quality Notes

  • PDFs: Text is extracted faithfully, but table cells may land on separate lines because the PDF format doesn't encode table structure. If the user needs clean tables from a PDF, tell them about this limitation.
  • Word/Excel: Usually clean output with proper table formatting.
  • Complex layouts: Multi-column PDFs or heavily formatted documents may have scrambled reading order.
  • Scanned PDFs: Image-only PDFs produce no text without OCR/LLM vision integration.
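
Since an image-only PDF yields no text, it can help to check for empty output before saving anything. A minimal sketch follows; has_text is a hypothetical helper name, and the check simply looks for any non-whitespace character in the converted output.

```shell
# Hypothetical check: succeeds if markitdown produced at least one
# non-whitespace character for the given file, fails otherwise.
has_text() {
  markitdown "$1" 2>/dev/null | grep -q '[^[:space:]]'
}

# Usage:
#   if ! has_text scan.pdf; then
#     echo "scan.pdf: no extractable text (possibly a scanned PDF)" >&2
#   fi
```

An empty result can also mean the conversion itself failed, so rerunning without 2>/dev/null is a reasonable next step.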

Common Use Cases

Import a document into knowledge base:

markitdown report.pdf 2>/dev/null > knowledge/competitive/raw/report.md

Convert a Word doc someone sent you:

markitdown meeting-notes.docx 2>/dev/null > notes.md

Batch convert a directory:

for f in docs/*.pdf; do
  [ -e "$f" ] || continue  # skip the literal pattern when no PDFs match
  markitdown "$f" 2>/dev/null > "${f%.pdf}.md"
done

Check what a PDF contains before deciding what to do with it:

markitdown document.pdf 2>/dev/null | head -50

Install via CLI

npx skills add https://github.com/steveclarke/dotfiles --skill to-markdown