name: to-markdown
description: "Convert files to Markdown via markitdown — PDFs, Word, Excel, PowerPoint, images, HTML, and other formats. Triggers on 'convert to markdown', 'extract text from', 'import this document', 'turn this PDF into markdown'."
Convert files to Markdown using markitdown — Microsoft's utility that extracts and structures content from many file formats.
Supported Formats
| Format | Notes |
|---|---|
| PDF | Text extracted; table structure may be approximate |
| Word (.docx) | Clean conversion including tables |
| Excel (.xlsx) | Sheets as markdown tables |
| PowerPoint (.pptx) | Slide text and structure |
| HTML | Cleaned readable content |
| Images | OCR (requires LLM vision for best results) |
| Audio | Transcription via SpeechRecognition |
| CSV / JSON / XML | Structured text |
| YouTube URLs | Transcript extraction |
| EPub | Chapter text |
Usage
# Convert to stdout
markitdown input.pdf 2>/dev/null
# Save to file
markitdown input.pdf -o output.md 2>/dev/null
# Or redirect
markitdown input.docx 2>/dev/null > output.md
Always use 2>/dev/null to suppress noisy font/parser warnings that don't affect output quality.
Workflow
Check if markitdown is installed:
which markitdown || echo "not installed"
Install if missing (with all format support):
pip install 'markitdown[all]'
Run the conversion with stderr suppressed:
markitdown "$INPUT_FILE" 2>/dev/null
Handle the output based on user intent:
- Saving to the knowledge base → write to appropriate .md file
- Quick review → show in conversation
- Multiple files → loop and convert each
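The workflow steps above can be sketched as one shell function. This is a minimal sketch, not part of markitdown itself: the function name convert_to_markdown and its optional second argument are hypothetical, chosen only for illustration. It assumes that a missing output argument means "quick review" (print to stdout) and that a given one means "save to file".

```bash
#!/usr/bin/env bash
# convert_to_markdown INPUT [OUTPUT]
# Hypothetical helper: checks for the CLI, validates the input,
# then converts with stderr suppressed and routes the result.
convert_to_markdown() {
  local input="$1"
  local output="${2:-}"   # empty means print to stdout for quick review

  # Check that the markitdown command is available.
  if ! command -v markitdown >/dev/null 2>&1; then
    echo "markitdown not installed; try: pip install 'markitdown[all]'" >&2
    return 127
  fi

  # Refuse a missing input rather than writing an empty output file.
  if [ ! -f "$input" ]; then
    echo "no such file: $input" >&2
    return 1
  fi

  # Convert, sending the Markdown to a file or to stdout.
  if [ -n "$output" ]; then
    mkdir -p "$(dirname "$output")"
    markitdown "$input" 2>/dev/null > "$output"
  else
    markitdown "$input" 2>/dev/null
  fi
}
```

Calling convert_to_markdown report.pdf prints the Markdown for review, while convert_to_markdown report.pdf knowledge/report.md saves it and creates the target directory if needed.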
Output Quality Notes
- PDFs: Text is extracted faithfully but table cells may land on separate lines (PDF doesn't encode table structure). If the user needs clean tables from a PDF, note this limitation.
- Word/Excel: Usually clean output with proper table formatting.
- Complex layouts: Multi-column PDFs or heavily formatted documents may have scrambled reading order.
- Scanned PDFs: Image-only PDFs produce no text without OCR/LLM vision integration.
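Because a scanned PDF can convert without error yet yield no text, it may help to check whether the output is blank before saving it. The helper below is a hypothetical sketch (is_blank_markdown is not a markitdown feature); it assumes that "no text" means the output contains only whitespace.

```bash
# is_blank_markdown: hypothetical helper.
# Reads Markdown on stdin and succeeds (exit 0) when it contains
# no non-whitespace characters, i.e. nothing useful was extracted.
is_blank_markdown() {
  ! grep -q '[^[:space:]]'
}

# Example use (commented out so that sourcing this file runs nothing):
# if markitdown scan.pdf 2>/dev/null | is_blank_markdown; then
#   echo "No extractable text; the PDF is probably image-only." >&2
# fi
```

A blank result is only a hint that OCR or LLM vision is needed; a PDF with a few stray characters of text would still pass this check.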
Common Use Cases
Import a document into knowledge base:
markitdown report.pdf 2>/dev/null > knowledge/competitive/raw/report.md
Convert a Word doc someone sent you:
markitdown meeting-notes.docx 2>/dev/null > notes.md
Batch convert a directory:
for f in docs/*.pdf; do
  [ -e "$f" ] || continue  # skip the unexpanded pattern when no PDFs match
  markitdown "$f" 2>/dev/null > "${f%.pdf}.md"
done
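With stderr suppressed, a failed conversion in a batch is easy to miss. The following is one possible variant of the loop that writes into a separate output directory and reports inputs that failed or produced an empty file. The name batch_convert and its two arguments are hypothetical, and treating an empty output as a failure is an assumption of this sketch.

```bash
# batch_convert SRC_DIR OUT_DIR
# Hypothetical helper: converts every PDF in SRC_DIR into OUT_DIR and
# returns non-zero if any conversion failed or produced an empty file.
batch_convert() {
  local src_dir="$1" out_dir="$2" f out failed=0
  mkdir -p "$out_dir"
  for f in "$src_dir"/*.pdf; do
    [ -e "$f" ] || continue                      # no PDFs matched the pattern
    out="$out_dir/$(basename "${f%.pdf}").md"    # docs/a.pdf -> OUT_DIR/a.md
    if ! markitdown "$f" 2>/dev/null > "$out" || [ ! -s "$out" ]; then
      echo "failed or empty: $f" >&2
      rm -f "$out"                               # do not leave empty files behind
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, batch_convert docs knowledge/raw converts docs/*.pdf and lists any problem files on stderr.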
Check what a PDF contains before deciding what to do with it:
markitdown document.pdf 2>/dev/null | head -50