name: docx description: "Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of "Word doc", "word document", ".docx", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a "report", "memo", "letter", "template", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, or general coding tasks."
DOCX creation, editing, and analysis
Overview
A .docx file is a ZIP archive containing XML files.
Quick Reference
| Task | Approach |
|---|---|
| Read/analyze content | pandoc or unpack for raw XML |
| Create new document | Use docx-js (Node.js) - see Creating New Documents below |
| Edit existing document | Unpack ZIP → edit XML → repack - see Editing Existing Documents below |
Converting .doc to .docx
Legacy .doc files must be converted before editing:
libreoffice --headless --convert-to docx document.doc
Reading Content
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md
# Raw XML access - unzip the docx
mkdir -p unpacked && unzip -o document.docx -d unpacked/
# Main content is in unpacked/word/document.xml
Converting to Images
libreoffice --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
Accepting Tracked Changes
To produce a clean document with all tracked changes accepted (requires LibreOffice):
# Use a LibreOffice macro to accept all changes
libreoffice --headless --invisible --norestore \
"macro:///Standard.Module1.AcceptAllChanges" input.docx
Or use pandoc to extract clean text (without change tracking markup):
pandoc --track-changes=accept document.docx -o clean.md
Creating New Documents
Generate .docx files with JavaScript using the docx npm package, then validate. Install: npm install -g docx
Setup
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
VerticalAlign, PageNumber, PageBreak } = require('docx');
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
Page Size
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
properties: {
page: {
size: {
width: 12240, // 8.5 inches in DXA
height: 15840 // 11 inches in DXA
},
margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
}
},
children: [/* content */]
}]
Common page sizes (DXA units, 1440 DXA = 1 inch):
| Paper | Width | Height | Content Width (1" margins) |
|---|---|---|---|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |
Landscape orientation: docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
size: {
width: 12240, // Pass SHORT edge as width
height: 15840, // Pass LONG edge as height
orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML
},
Styles (Override Built-in Headings)
Use Arial as the default font (universally supported). Keep titles black for readability.
const doc = new Document({
styles: {
default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
paragraphStyles: [
// IMPORTANT: Use exact IDs to override built-in styles
{ id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 32, bold: true, font: "Arial" },
paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } },
{ id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 28, bold: true, font: "Arial" },
paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
]
},
sections: [{
children: [
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
]
}]
});
Lists (NEVER use unicode bullets)
// WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("* Item")] }) // BAD
// CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
numbering: {
config: [
{ reference: "bullets",
levels: [{ level: 0, format: LevelFormat.BULLET, text: "\u2022", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
{ reference: "numbers",
levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
]
},
sections: [{
children: [
new Paragraph({ numbering: { reference: "bullets", level: 0 },
children: [new TextRun("Bullet item")] }),
new Paragraph({ numbering: { reference: "numbers", level: 0 },
children: [new TextRun("Numbered item")] }),
]
}]
});
Tables
CRITICAL: Tables need dual widths - set both columnWidths on the table AND width on each cell.
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };
new Table({
width: { size: 9360, type: WidthType.DXA },
columnWidths: [4680, 4680],
rows: [
new TableRow({
children: [
new TableCell({
borders,
width: { size: 4680, type: WidthType.DXA },
shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
margins: { top: 80, bottom: 80, left: 120, right: 120 },
children: [new Paragraph({ children: [new TextRun("Cell")] })]
})
]
})
]
})
Images
// CRITICAL: type parameter is REQUIRED
new Paragraph({
children: [new ImageRun({
type: "png",
data: fs.readFileSync("image.png"),
transformation: { width: 200, height: 150 },
altText: { title: "Title", description: "Desc", name: "Name" }
})]
})
Page Breaks
new Paragraph({ children: [new PageBreak()] })
// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
Table of Contents
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
Headers/Footers
sections: [{
properties: {
page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } }
},
headers: {
default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
},
footers: {
default: new Footer({ children: [new Paragraph({
children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
})] })
},
children: [/* content */]
}]
Critical Rules for docx-js
- Set page size explicitly - defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- Never use
\n- use separate Paragraph elements - Never use unicode bullets - use
LevelFormat.BULLETwith numbering config - PageBreak must be in Paragraph - standalone creates invalid XML
- ImageRun requires
type- always specify png/jpg/etc - Always set table
widthwith DXA - never useWidthType.PERCENTAGE(breaks in Google Docs) - Tables need dual widths -
columnWidthsarray AND cellwidth, both must match - Use
ShadingType.CLEAR- never SOLID for table shading - TOC requires HeadingLevel only - no custom styles on heading paragraphs
- Override built-in styles - use exact IDs: "Heading1", "Heading2", etc.
- Include
outlineLevel- required for TOC (0 for H1, 1 for H2, etc.)
Editing Existing Documents
Follow all 3 steps in order.
Step 1: Unpack
Since .docx is a ZIP archive, unpack it to access the XML:
mkdir -p unpacked
unzip -o document.docx -d unpacked/
# Pretty-print the main document XML for easier editing
python3 -c "
import xml.dom.minidom, sys
doc = xml.dom.minidom.parse('unpacked/word/document.xml')
with open('unpacked/word/document.xml', 'w') as f:
f.write(doc.toprettyxml(indent=' '))
"
Step 2: Edit XML
Edit files in unpacked/word/. See XML Reference below for patterns.
Use "Claude" as the author for tracked changes and comments, unless the user explicitly requests a different name.
Use the edit_file tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The edit_file tool shows exactly what is being replaced.
CRITICAL: Use smart quotes for new content. When adding text with apostrophes or quotes, use XML entities:
<w:t>Here’s a quote: “Hello”</w:t>
| Entity | Character |
|---|---|
‘ |
left single quote |
’ |
right single quote / apostrophe |
“ |
left double quote |
” |
right double quote |
Adding comments: Edit the XML directly in unpacked/word/comments.xml to add comment entries, then add markers to document.xml (see Comments in XML Reference below).
Step 3: Repack
Repack the directory into a .docx file:
cd unpacked && zip -r ../output.docx . -x ".*" && cd ..
Optionally validate with LibreOffice by opening the result:
libreoffice --calc output.docx
Common Pitfalls
- Replace entire
<w:r>elements: When adding tracked changes, replace the whole<w:r>...</w:r>block with<w:del>...<w:ins>...as siblings - Preserve
<w:rPr>formatting: Copy the original run's<w:rPr>block into your tracked change runs
XML Reference
Schema Compliance
- Element order in
<w:pPr>:<w:pStyle>,<w:numPr>,<w:spacing>,<w:ind>,<w:jc>,<w:rPr>last - Whitespace: Add
xml:space="preserve"to<w:t>with leading/trailing spaces - RSIDs: Must be 8-digit hex (e.g.,
00AB1234)
Tracked Changes
Insertion:
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:t>inserted text</w:t></w:r>
</w:ins>
Deletion:
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
Inside <w:del>: Use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.
Minimal edits - only mark what changes:
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
<w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
<w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
Deleting entire paragraphs - also mark the paragraph mark as deleted:
<w:p>
<w:pPr>
<w:rPr>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
</w:rPr>
</w:pPr>
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
</w:del>
</w:p>
Rejecting another author's insertion:
<w:ins w:author="Jane" w:id="5">
<w:del w:author="Claude" w:id="10">
<w:r><w:delText>their inserted text</w:delText></w:r>
</w:del>
</w:ins>
Restoring another author's deletion:
<w:del w:author="Jane" w:id="5">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
<w:r><w:t>deleted text</w:t></w:r>
</w:ins>
Comments
Add comment entries to unpacked/word/comments.xml:
<w:comment w:id="0" w:author="Claude" w:date="2025-01-01T00:00:00Z" w:initials="C">
<w:p>
<w:r><w:t>Comment text here</w:t></w:r>
</w:p>
</w:comment>
Then add markers to document.xml:
<w:commentRangeStart w:id="0"/>
<w:r><w:t>commented text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
CRITICAL: <w:commentRangeStart> and <w:commentRangeEnd> are siblings of <w:r>, never inside <w:r>.
Images
- Add image file to
unpacked/word/media/ - Add relationship to
unpacked/word/_rels/document.xml.rels:
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
- Add content type to
unpacked/[Content_Types].xml:
<Default Extension="png" ContentType="image/png"/>
- Reference in document.xml:
<w:drawing>
<wp:inline>
<wp:extent cx="914400" cy="914400"/> <!-- EMUs: 914400 = 1 inch -->
<a:graphic>
<a:graphicData uri=".../picture">
<pic:pic>
<pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
Dependencies
Install as needed:
- pandoc - text extraction and format conversion
- docx npm package -
npm install -g docx(creating new documents) - LibreOffice - PDF conversion, .doc to .docx conversion (
brew install --cask libreoffice) - Poppler -
pdftoppmfor converting to images (brew install poppler)