docx-content-cleaner - SKILL.md Agent Skill

name: docx-content-cleaner
description: Analyzes and cleans up markdown artifacts (like bold, italic, links) inside existing .docx files, converting them into proper Word formatting (runs with styles).

Use this skill when:
- A .docx file contains literal markdown symbols (e.g., **text**, ### Header).
- You need to "sanitize" or "beautify" a document that was generated from markdown but whose formatting was not parsed correctly.
- You want to ensure that all "pseudo-formatting" in the text is converted to native Word styles.

Extraction: Uses python-docx to iterate through all paragraphs and runs.
Analysis: Uses regex to find markdown patterns inside the text of each run.
Transformation:
- Splits runs to isolate the marked-up text.
- Applies the corresponding Word formatting (Bold, Italic, Style) to the isolated text.
- Removes the markdown symbols.
Repack: Saves the cleaned .docx to the output path.
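
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of resources/markdown_to_docx_fixer.py: the function names (split_markdown, fix_paragraph, fix_document) are hypothetical, only **bold** and *italic* are handled, and for simplicity each affected paragraph is rebuilt from its plain text, which discards any run-level formatting it already had. Headers and links would need additional patterns.

```python
import re

# Bold (**text**) is listed before italic (*text*) so that "**" is not
# mistaken for two adjacent italic markers.
INLINE_MD = re.compile(r"\*\*(?P<bold>.+?)\*\*|\*(?P<italic>.+?)\*")


def split_markdown(text):
    """Analysis: split text into (segment, bold, italic) tuples, markers removed."""
    segments = []
    pos = 0
    for match in INLINE_MD.finditer(text):
        if match.start() > pos:
            # Plain text between the previous match and this one.
            segments.append((text[pos:match.start()], False, False))
        if match.group("bold") is not None:
            segments.append((match.group("bold"), True, False))
        else:
            segments.append((match.group("italic"), False, True))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], False, False))
    return segments


def fix_paragraph(paragraph):
    """Transformation: rebuild one python-docx paragraph as formatted runs."""
    segments = split_markdown(paragraph.text)
    if not any(bold or italic for _, bold, italic in segments):
        return  # no markdown found; leave the existing runs untouched
    paragraph.clear()  # assumption: dropping prior run formatting is acceptable
    for segment, bold, italic in segments:
        run = paragraph.add_run(segment)
        run.bold = bold or None      # None means "inherit from the style"
        run.italic = italic or None


def fix_document(src, dst):
    """Extraction and repack: open src, fix body paragraphs, save to dst."""
    # python-docx is imported here so split_markdown can be used without it.
    from docx import Document

    doc = Document(src)
    # Note: doc.paragraphs covers body paragraphs only, not tables or headers.
    for paragraph in doc.paragraphs:
        fix_paragraph(paragraph)
    doc.save(dst)
```

For example, split_markdown("A **bold** word") yields a plain segment "A ", a bold segment "bold", and a plain segment " word". Keeping the regex step separate from the python-docx step makes the pattern matching easy to check without a Word file.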

Run the provided script, resources/markdown_to_docx_fixer.py, with the input and output paths:

python resources/markdown_to_docx_fixer.py <input.docx> <output.docx>