name: docling description: Expert guidance on document processing with Docling and audio transcription with Whisper.
Docling & Ingestion Expert Skill
This skill provides patterns for processing diverse document formats and audio files.
๐ Document Processing
- Multi-format support: Use
doclingfor converting PDF, DOCX, PPTX, XLSX, HTML, and Markdown to a clean Markdown representation. - HybridChunker: Always use Docling's
HybridChunkerto preserve document structure (headings, lists, tables). - Token Awareness: Configure chunker with
max_tokens=512to align with embedding models (e.g.,text-embedding-3-small).
๐ Audio Transcription
- Whisper ASR: Use
docling's integration with Whisper Turbo for transcribing audio files (.mp3,.wav,.m4a,.flac). - Path Handling: Pass
pathlib.Pathobjects toDocumentConverterfor audio files.
๐๏ธ Pipeline Architecture
- Ingestion Pipeline: Centralize logic in
src/ingestion/ingest.py. - Metadata Extraction: Extract YAML frontmatter and standard document properties.
- Batching: Use batch embedding generation and
insert_manyfor chunks to optimize performance.