hwp


HWP/HWPX create, read, edit, review, template-fill, QA. Triggers: 한글, .hwp, .hwpx, HWP, HWPX, Korean documents, 한컴오피스, OWPML.

By lidge-jun. Updated 6/11/2026.

name: hwp
description: "HWP/HWPX create, read, edit, review, template-fill, QA. Triggers: 한글, .hwp, .hwpx, HWP, HWPX, Korean documents, 한컴오피스, OWPML."

HWP/HWPX Document Skill

Scope: HWP/HWPX only. Do NOT reference or load skills for other formats (DOCX, PPTX, XLSX, PDF). If the task involves a non-HWP format, stop and tell the user.

OfficeCLI routing and consent rule:

  • First check whether officecli is available with command -v officecli.
  • If installed, recommend OfficeCLI first for HWPX work and experimental rhwp-backed binary HWP read/edit.
  • If missing, do not auto-install. Present choices before proceeding:
    1. Install forked OfficeCLI from https://github.com/lidge-jun/OfficeCLI via bash "$(npm root -g)/cli-jaw/scripts/install-officecli.sh".
    2. Continue only with lightweight HWP/HWPX fallback scripts for the current task, with limitations stated.
    3. Stop or cancel.
  • Before taking a lightweight fallback path, ask the user again and state what HWP/HWPX features may be lost.
  • If the user chooses lightweight mode, save that preference to memory for future Office work.
  • Use upstream/vanilla iOfficeAI/OfficeCLI only when the user explicitly asks for upstream behavior.
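The routing rule above can be sketched as a small decision helper. This is only an illustration: `officecli_route` is a hypothetical name, not part of OfficeCLI or cli-jaw, and it only detects the binary and lists the three choices; it never installs anything.

```python
import shutil

def officecli_route(which=shutil.which):
    """Return ("officecli", path) when installed, else ("ask_user", choices).

    Hypothetical helper: mirrors the consent rule, never auto-installs.
    """
    path = which("officecli")
    if path:
        return ("officecli", path)
    choices = [
        "install forked OfficeCLI (only after user approval)",
        "continue with lightweight HWP/HWPX fallback scripts (state limitations)",
        "stop or cancel",
    ]
    return ("ask_user", choices)
```

The `which` parameter exists only so the lookup can be replaced in a dry run; by default it behaves like `command -v officecli`.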

OfficeCLI is the recommended advanced backend for HWPX work and for experimental rhwp-backed binary HWP read/edit. The lightweight fallback is the set of Python OOXML/OWPML scripts (scripts/*.py). Use them for what OfficeCLI cannot cover, or when the user chooses lightweight mode: HWP→HWPX conversion, template assembly, pattern-match editing, and direct XML edits. See §3.

OfficeCLI install contract inside cli-jaw: after user approval only, install or refresh OfficeCLI through bash "$(npm root -g)/cli-jaw/scripts/install-officecli.sh". This installs the supported fork from https://github.com/lidge-jun/OfficeCLI used for CJK/rhwp workflows. Do not use direct upstream install snippets unless the task explicitly asks for vanilla upstream OfficeCLI behavior.

OfficeCLI discovery rule: run officecli hwp doctor --json and officecli capabilities --json before claiming binary .hwp support. Use officecli hwp --json for current rhwp recipes/policies and officecli help <format> ... --json for DOCX/XLSX/PPTX-style schema help.

Same-file execution rule: run OfficeCLI commands against the same .hwpx or .hwp sequentially. Do not run officecli view, officecli validate, officecli query, or officecli get in parallel against one package. If a file lock occurs, stop and report the exact command and path before making a copy or retrying.


1. Quick Decision

Task | OK? | Command
Format like existing .hwpx | Yes | cp source.hwpx target.hwpx && officecli open target.hwpx (inherits styles; see §2)
Template-based create | Yes | python3 scripts/build_hwpx.py --template {base|gonmun|minutes|proposal|report} --output out.hwpx (see §4)
Create new .hwpx | Yes | officecli create file.hwpx
Create from Markdown | Yes | officecli create file.hwpx --from-markdown input.md
Read / analyze .hwpx | Yes | view text, annotated, outline, stats, html, markdown, tables, forms, objects
Edit existing .hwpx | Yes | set, add, remove, move, swap
Label-based fill | Yes | set /table/fill --prop 'fill:label=val'
Form recognize | Yes | view forms --auto (label-value auto-detect)
Table map | Yes | view tables (2D grid + labels)
Markdown export | Yes | view markdown
Equation (수식) | Yes | add --type equation --prop 'script={1 over 2}'
Object finder | Yes | view objects (picture/field/bookmark/equation)
Query (expanded) | Yes | query 'tc[text~=홍길동]', :has(), > combinator
Template merge | Yes | merge template.hwpx out.hwpx --data '{"key":"val"}'
Swap elements | Yes | swap file.hwpx '/p[1]' '/p[2]'
Column break | Yes | add --type columnbreak --prop cols=2
Image anchor / floating | Yes | add --type picture --prop anchor=page --prop halign=center
Field types | Yes | add --type author|title|lastsaveby|filename
Compare documents | Yes | compare a.hwpx b.hwpx (LCS diff + table compare)
Security validation | Yes | ZIP bomb, path traversal, symlink, XXE defense
Broken ZIP recovery | Yes | corrupted HWPX auto-recovery via Local File Header scan
HTML preview | Yes | view html --browser
Watch live preview | Yes | watch file.hwpx
Validate .hwpx | Yes | validate (9-level check)
Raw XML | Yes | raw, raw-set
Watermark (image) | Yes | add --type watermark --prop src=img.png (opaque RGB preferred)
Pattern-match editing | Python (L4) | scripts/hwpx_cli.py open → pattern edit XML → save (see §16)
Visual QA | Python (L3) | scripts/contact_sheet.py + subagent review with reference/visual_qa_prompt.md
New form field creation | Blocked | source prototype exists; Hancom verification not closed
Diagnose binary .hwp support | Yes, experimental | officecli hwp doctor --json then inspect officecli capabilities --json
Create new .hwp (binary) | Yes, experimental | officecli create file.hwp --json; requires packaged rhwp-field-bridge or OFFICECLI_RHWP_API_BIN
Read/render/export .hwp (binary) | Yes, experimental | officecli view file.hwp text --json; svg, png, pdf, markdown, thumbnail, info, diagnostics, dump, pages are capability-gated
Edit simple .hwp text/fields | Yes, experimental | Use officecli hwp --json recipes; prefer --prop output=out.hwp
Insert new .hwp body text | Yes, experimental | officecli add file.hwp /text --type paragraph --prop value=본문 --prop output=out.hwp --json; requires insert_text.ready=true
Edit .hwp/.hwpx table cell by rhwp coordinates | Yes, experimental | officecli set file.hwp /table/cell ...; .hwpx also routes through rhwp when mutation runtime is ready
Use native rhwp API not yet modeled as a high-level command | Yes, experimental | officecli view file.hwp native --op get-style-list --json; officecli set file.hwp /native-op --prop op=split-paragraph --prop output=out.hwp --json
Export HWPX to .hwp | Yes, experimental | officecli set input.hwpx /save-as-hwp --prop output=out.hwp --json
Safe in-place .hwp text replace | Yes, experimental | Only when capabilities.formats.hwp.operations.replace_text.safeInPlace.ready=true; use --in-place --backup --verify
Convert .hwp to .hwpx | Fallback | scripts/hwp_convert.py IN.hwp OUT.hwpx only when the requested native rhwp operation is not ready or not yet wired

2. Reference-Based Editing (Edit > Create from Scratch)

When the user says "format like X.hwpx", "공문 양식처럼", "기존 보고서 스타일", or provides a source file — start from the source. Don't rebuild from scratch.

Workflow

  1. Copy the source: cp source.hwpx target.hwpx — inherits header.xml (styles), section0.xml (structure), META-INF
  2. Open with officecli open target.hwpx — daemon returns immediately (do NOT run as run_in_background)
  3. Remove body paragraphs only — keep header.xml (charPr/paraPr/borderFill), META-INF, settings
  4. Add new content using existing styleidref values — they auto-apply

Why This Matters

HWPX header.xml holds all style definitions (charPr, paraPr, borderFill, listItems). Rebuilding these from scratch:

  • Breaks styleidref cross-references in section0.xml
  • Loses consistent 공문/보고서 visual conventions
  • Breaks validation (officecli validate fails)
  • Takes 10× longer than modifying the copy

Template Sources (priority order)

  1. User-provided source file — first-class template
  2. tests/fixtures/agentic/*.hwpx — realistic samples (gonmun with headings, report, minutes with tables)
  3. templates/{base,gonmun,minutes,proposal,report}/: Template Assembly system (see §4)
  4. officecli create blank — only when nothing else applies

Example — Official Letter (공문) Reuse

# Method A: direct copy
cp SampleGonmun.hwpx MyGonmun.hwpx
officecli open MyGonmun.hwpx
# Use /table/fill to replace label-value cells
officecli set MyGonmun.hwpx /table/fill --prop '문서번호=2026-123'
officecli set MyGonmun.hwpx /table/fill --prop '수신=관계 부서장'
officecli close MyGonmun.hwpx

# Method B: template assembly (see §4)
python3 scripts/build_hwpx.py --template gonmun --output MyGonmun.hwpx
# Then edit with officecli as above

3. Reference Materials & Script Map

officecli covers most HWPX operations. For template assembly, direct XML editing, HWP conversion, and pattern matching, use these references + Python scripts.

References (reference/ — singular)

File | Read when | Contains
reference/hwpx-format.md | Before any direct XML edit | OWPML ZIP structure, namespaces, file layout, mimetype
reference/header-xml-guide.md | Adding/modifying charPr/paraPr/borderFill/listItems styles | How to add new styles to header.xml; required reading for style customization
reference/section0-xml-guide.md | Paragraph/table/mixed-formatting direct XML | XML template for section0.xml bodies
reference/style_id_maps.md | Style ID lookup for template overlay | Complete style ID index for base/gonmun/minutes/proposal/report templates
reference/dependencies.md | First-time setup / environment check | Python/system packages needed (pyhwp, lxml, soffice, JAVA_HOME)
reference/visual_qa_prompt.md | Visual QA via subagent | Ready-to-use prompt for PDF-image inspection
reference/table_templates/*.xml | Inserting pre-built tables | 2x6, 3x3, 4x4, 5x4 grid XML fragments

Scripts (scripts/) — Python OWPML Toolkit

Script | Run when | Command
scripts/hwpx_cli.py | Unified Python CLI (14+ commands): unpack, save, text, search, replace, batch-replace, tables, fill-table, validate, page-guard, toc, chunk, search-chunks, repair, content-check, insert-table, structure | python3 scripts/hwpx_cli.py {command} ...
scripts/build_hwpx.py | Template-based creation (§4) | python3 scripts/build_hwpx.py --template {type} --output X.hwpx
scripts/analyze_template.py | Inspect template structure before overlay | python3 scripts/analyze_template.py work/
scripts/create_document.py | Create empty or custom HWPX | python3 scripts/create_document.py OUT.hwpx
scripts/table_builder.py | Build table XML from Python objects | Used internally by insert-table command
scripts/page_guard.py | Detect paragraph/table/text drift vs reference doc | python3 scripts/page_guard.py -r ref.hwpx -o out.hwpx
scripts/contact_sheet.py | QA contact sheet (page grid image) | python3 scripts/contact_sheet.py INPUT.pdf sheet.png
scripts/validate.py | 9-level structural validation (Python fallback) | python3 scripts/validate.py INPUT.hwpx
scripts/hwp_reader.py | Read HWP 5.0 binary (OLE2, read-only) | python3 scripts/hwp_reader.py INPUT.hwp
scripts/hwp_convert.py | HWP → HWPX conversion (H2Orestart-based) | python3 scripts/hwp_convert.py IN.hwp OUT.hwpx
scripts/text_extract.py | Extract plain text from HWPX | python3 scripts/text_extract.py INPUT.hwpx
scripts/ooxml/pack.py / unpack.py | ZIP atomicity helpers | Used internally by other scripts

Fixed-Layout Exam Visual QA

KICE-style exam sheets (국어 영역, 수학 영역, two-column <hp:colPr type="NEWSPAPER" colCount="2">, dense question numbering) are fixed-layout documents. Treat visual fidelity as stricter than generic text mutation:

  1. Do not insert QA/proof markers into the visible body. Strings such as [CU TEMPLATE EDIT ...], VISUAL QA, or edited via Hancom Office HWP UI inside the question body are visual hard failures.
  2. Capture before/after screenshots from Hancom Office HWP at the same zoom and page position. Store proof in screenshot paths, logs, or sidecar evidence, not in the document's first column, answer choices, header tables, or floating title/page-number objects.
  3. If the requested edit changes exam content, replace only the intended anchor text/run. Preserve column breaks, question numbering, answer-choice tab layout, equations, and floating objects.
  4. For direct XML edits, strip all <hp:linesegarray> layout cache entries before repacking so Hancom recalculates layout on open.
  5. Pass criteria: same page count, same two-column structure, no unexpected body marker, no visible drift except the requested content change.
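Step 4 above (stripping the layout cache before repacking) can be sketched as a plain string transform. This is a minimal sketch, not the logic used by scripts/hwpx_cli.py: `strip_lineseg` is a hypothetical name, and it assumes the section XML uses the literal `hp:` prefix for the element.

```python
import re

# Matches both <hp:linesegarray .../> and <hp:linesegarray ...>...</hp:linesegarray>.
# Assumption: the element is written with the "hp:" prefix in the section XML.
LINESEG_RE = re.compile(
    r"<hp:linesegarray\b[^>]*?(?:/>|>.*?</hp:linesegarray>)",
    re.DOTALL,
)

def strip_lineseg(xml_text: str) -> str:
    """Remove every cached line-segment array so layout is recalculated on open."""
    return LINESEG_RE.sub("", xml_text)
```

A regex is used instead of a parser so the rest of the XML stays byte-for-byte unchanged (prefixes, attribute order, whitespace), which matters for fixed-layout documents.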

Editing Escalation Ladder

When officecli can't do the job, escalate in this order:

Level | When | Tool
L1 officecli high-level | Typical add/set/remove, label-fill, view modes | officecli add/set/remove/query/view/merge
L2 officecli raw / raw-set | Direct section0.xml / header.xml tweaks | officecli raw FILE /Contents/section0.xml or raw-set
L3 Python script | Bulk find/replace, batch-replace, template assembly, pattern-match | python3 scripts/hwpx_cli.py ... or scripts/build_hwpx.py
L4 Unpack → edit XML → repack (with lineseg strip) | KICE exams, regulations, anything requiring multi-file XML edit | scripts/hwpx_cli.py open → edit work/Contents/*.xml → strip lineseg → save

Escalation signals:

  • officecli cannot add custom style → L2 (raw-set header.xml) + read reference/header-xml-guide.md
  • Custom template overlay → L3 (scripts/build_hwpx.py) + read reference/style_id_maps.md
  • HWP binary input → check officecli hwp doctor --json first; use native rhwp-backed officecli operations when ready, otherwise L3 (scripts/hwp_convert.py, then edit HWPX)
  • Multi-file pattern match (exam questions, regulations) → L4 (see §16)
  • Style ID lookup → Read reference/style_id_maps.md FIRST

4. Template Assembly (HWP-only)

HWP has a unique base + overlay template system. Most HWPX creation for 공문/보고서/회의록/제안서 should use this instead of officecli create blank.

Available Templates

Template | Purpose
templates/base/ | Empty HWPX skeleton (mimetype, META-INF, empty header/section)
templates/gonmun/ | Official letter (공문) styles: 문서번호, 수신, 참조, 제목, 본문
templates/minutes/ | Meeting minutes (회의록) styles: 일시, 장소, 참석자, 안건, 결정사항
templates/proposal/ | Proposal (제안서) styles: 제안개요, 배경, 내용, 기대효과
templates/report/ | Report (보고서) styles: 요약, 현황, 분석, 제언

Method 1: build_hwpx.py (recommended)

python3 scripts/build_hwpx.py --template report --output Q4Report.hwpx
# Then edit content with officecli
officecli open Q4Report.hwpx
officecli set Q4Report.hwpx /table/fill --prop '제목=2026 Q4 보고'
# ... continue editing ...

Method 2: Manual Overlay (for customization)

# 1. Copy base skeleton
cp -r templates/base/ work/

# 2. Overlay domain-specific styles
cp -r templates/gonmun/* work/Contents/

# 3. Edit header.xml and section0.xml as needed
#    Reference: reference/header-xml-guide.md, reference/section0-xml-guide.md
#    Style IDs: reference/style_id_maps.md

# 4. Repack as HWPX (ZIP with strip + minify)
python3 scripts/ooxml/pack.py work/ out.hwpx

# 5. Validate
officecli validate out.hwpx

Style ID Reference

Every template defines charPr/paraPr/borderFill style IDs. To customize without breaking cross-references:

  1. Read reference/style_id_maps.md for the template's complete ID index
  2. Use scripts/analyze_template.py work/ to inspect current structure
  3. Add new styles via reference/header-xml-guide.md patterns — preserve existing IDs

5. Subskill & Resource Map

HWP does not have bonus subskill folders (unlike docx's officecli-academic-paper/ or pptx's morph-ppt/). All auxiliary resources live directly inside this skill:

Resource | Location | Purpose
Reference docs | ./reference/*.md | XML guides, style ID maps, QA prompts (see §3)
Python scripts | ./scripts/*.py | OOXML toolkit, HWP conversion, template assembly (see §3)
Templates (base+overlay) | ./templates/{base,gonmun,minutes,proposal,report}/ | Template assembly system (see §4)
Test fixtures | ./tests/fixtures/ | Pre-built .hwpx samples usable as cp sources (see §2)

If future bonus subskills are added (e.g., ./creating.md, ./editing.md, ./officecli-kice-exam/), read them only for the specific task at hand.


6. Design Principles for Korean Documents

Korean Government Form Aesthetics (한국 공공양식 미감)

Korean official documents follow strict visual conventions. Respect them:

  • Tables are the backbone: Korean forms are table-driven. Every label-value pair lives in a precisely merged cell grid. Preserve the grid structure exactly.
  • Heading hierarchy: 제1조 (Articles) > 제1항 (Clauses) > 제1호 (Items). Use styleidref for outline levels. Never flatten the hierarchy.
  • Fixed margins: Government forms use standard A4 margins (top/bottom ~15mm, left/right ~20mm). Do not alter margins on existing documents.
  • Alignment: Body text is JUSTIFY (양쪽 정렬) by default in Korean documents. Headings may be CENTER. Never use LEFT for body text in formal documents.

Uniform Spacing Detection (균등분할)

Korean forms often use uniform character spacing for names in cells: "홍 길 동" (spaces between each character). This is a display convention, not data.

  • When reading: strip uniform spaces to get the actual value ("홍길동").
  • When writing: if the template cell uses uniform spacing, insert spaces to match (e.g., 2-char name "이 준", 3-char "홍 길 동", 4-char "남궁민수").
  • Detection regex: ^(\S)\s(\S)\s(\S)$ etc. (single-char groups separated by 1 space).
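The read/write convention above can be sketched in a few lines. This is a minimal sketch under stated assumptions: both function names are hypothetical, the regex generalizes the 3-character pattern to any run of single characters separated by one space, and the rule that names of 4 or more characters stay unspaced is inferred only from the "남궁민수" example.

```python
import re

# One non-space character, then one or more " <char>" groups (single spaces only).
UNIFORM_RE = re.compile(r"\S(?: \S)+")

def collapse_uniform_spacing(cell_text: str) -> str:
    """Reading: turn '홍 길 동' into '홍길동'; leave ordinary text untouched."""
    text = cell_text.strip()
    if UNIFORM_RE.fullmatch(text):
        return text.replace(" ", "")
    return cell_text

def apply_uniform_spacing(value: str) -> str:
    """Writing: space out 2- and 3-character values; assumed unchanged for 4+."""
    if len(value) in (2, 3):
        return " ".join(value)
    return value
```

Note that a value such as "A B" also matches the detection pattern, so in practice the collapse is probably best limited to cells whose label suggests a name field.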

Document Type Classification

Type | Key Signals | Example
exam | equation 10+, rect objects | KICE 수능/모의고사 시험지
form | table 3+, checkboxes (□/■) | 대학 신청서, 정부 양식
regulation | ○ bullets 10+, 별첨/조항 refs, table 10+ | 운영지침, 내규, 시행세칙
report | long text, few tables | 보고서, 논문
mixed | none of above | 사업계획서

7. Mandatory Verification (NEVER SKIP)

Treat verification as a gate, not a confirmation. Any failure = REJECT, do not deliver. Fix → re-run the checks → repeat until a pass finds zero new issues (one fix often surfaces another). After 3 rounds without convergence, STOP and report the likely root cause (template / engine-limit / ambiguous) for the user to decide.

After ANY HWPX edit operation, ALWAYS execute these in order:

# 1. Structural validation (MUST pass)
officecli validate output.hwpx

# 2. PDF visual verification (MUST check)
soffice --headless --convert-to pdf --outdir /tmp output.hwpx
# Verify: table positions, guide text removed, checkboxes correct,
#         merged cell text in correct row, numbers not corrupted

# 3. Visual QA via subagent (use reference/visual_qa_prompt.md)
#    python3 scripts/contact_sheet.py /tmp/output.pdf sheet.png
#    Then: dispatch subagent with reference/visual_qa_prompt.md

# 4. If Hancom Office available, also open .hwpx directly

Skipping PDF verification means the output is unverified. Always tell the user when soffice is unavailable.
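The gate described at the top of this section (fix, re-run, stop after 3 rounds without convergence) can be sketched as a loop. This is an illustration only: `verification_gate`, `run_checks`, and `apply_fixes` are hypothetical names; in practice `run_checks` would wrap officecli validate plus the PDF and visual checks and return a list of issue descriptions.

```python
from typing import Callable, List

def verification_gate(run_checks: Callable[[], List[str]],
                      apply_fixes: Callable[[List[str]], None],
                      max_rounds: int = 3) -> str:
    """Return "PASS" once a check run finds zero issues, else "STOP".

    "STOP" means: do not deliver; report the likely root cause to the user.
    """
    for _ in range(max_rounds):
        issues = run_checks()
        if not issues:
            return "PASS"
        apply_fixes(issues)
    # One last run after the final fix round decides the outcome.
    return "PASS" if not run_checks() else "STOP"
```

The important property is that a pass is only declared by a full re-run of the checks, never by the fix step itself.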


8. Prerequisite Check

# OfficeCLI + rhwp: check only; do not auto-install from a skill.
if ! command -v officecli >/dev/null 2>&1; then
  echo "ASK USER: install forked OfficeCLI from https://github.com/lidge-jun/OfficeCLI, continue lightweight with HWP/HWPX limits, or stop."
  echo "Install command after approval: bash \"\$(npm root -g)/cli-jaw/scripts/install-officecli.sh\""
fi

# LibreOffice: check only; ask before installing when PDF conversion is needed.
command -v soffice >/dev/null 2>&1 || echo "ASK USER: LibreOffice is not installed; install it for PDF conversion or skip PDF output."

# The pyhwp package is imported as hwp5.
python3 -c "import lxml; import hwp5" 2>/dev/null || echo "OPTIONAL: pip install lxml pyhwp (for Python fallbacks)"
echo "JAVA_HOME=$JAVA_HOME (required for H2Orestart HWP→HWPX conversion)"

9. Tool Discovery

Always confirm syntax from help before guessing:

officecli --help
officecli hwp --json
officecli hwp doctor --json
officecli capabilities --json
officecli view --help
officecli set --help
python3 scripts/hwpx_cli.py --help
python3 scripts/build_hwpx.py --help

10. Core Workflows

Create & Import & Merge

officecli create doc.hwpx                                    # empty doc
officecli create doc.hwpx --from-markdown input.md           # MD->HWPX (JUSTIFY default)
officecli create doc.hwpx --from-markdown input.md --align left  # left-aligned
officecli create doc.hwp --json                              # binary HWP via rhwp-field-bridge
officecli merge template.hwpx out.hwpx --data '{"이름":"홍길동"}'  # template {{key}} replace
officecli merge template.hwpx out.hwpx --data data.json           # JSON file data

# Template assembly (see §4)
python3 scripts/build_hwpx.py --template gonmun --output gonmun.hwpx

View Modes

officecli view doc.hwpx text                    # line-numbered text
officecli view doc.hwpx annotated               # path + style detail
officecli view doc.hwpx outline                 # headings only
officecli view doc.hwpx stats                   # document statistics
officecli view doc.hwpx html --browser          # A4 HTML preview
officecli view doc.hwpx markdown                # GFM markdown export
officecli view doc.hwpx tables                  # table 2D grid + label map
officecli view doc.hwpx forms --auto            # CLICK_HERE + label-value auto-detect
officecli view doc.hwpx forms --auto --json     # JSON for AI pipeline
officecli view doc.hwpx objects                 # picture/field/bookmark/equation list
officecli view doc.hwpx objects --object-type field  # filter by type
officecli view doc.hwpx styles                  # charPr/paraPr styles
officecli view doc.hwpx issues                  # 9-level validation issues

Edit

officecli add doc.hwpx '/section[1]' --type paragraph --prop text="content" --prop fontsize=11
officecli add doc.hwpx '/section[1]' --type table --prop rows=3 --prop cols=4
officecli set doc.hwpx '/section[1]/p[1]' --prop bold=true --prop align=CENTER
officecli set doc.hwpx / --prop find="old" --prop replace="new"
officecli remove doc.hwpx '/section[1]/p[3]'
officecli swap doc.hwpx '/p[1]' '/p[2]'

Label Fill (table auto-fill)

officecli set doc.hwpx / --prop 'fill:대표자=홍길동' --prop 'fill:연락처=010-1234'
officecli set doc.hwpx / --prop 'fill:주소>down=서울시'   # direction: right(default), down, left, up
officecli set doc.hwpx /table/fill --prop '이름=김서준'    # fill: prefix optional

Query (extended syntax)

officecli query doc.hwpx 'p'                          # all paragraphs
officecli query doc.hwpx 'tc[text~=홍길동]'           # cell text search
officecli query doc.hwpx 'run[bold=true]'              # bold runs
officecli query doc.hwpx 'p:has(tbl)'                  # paragraphs containing tables
officecli query doc.hwpx 'tbl > tr > tc[colSpan!=1]'   # merged cells
officecli query doc.hwpx 'run[fontsize>=20]'           # 20pt+ font
officecli query doc.hwpx 'p[heading=1]'                # heading 1

Operators: =, !=, ~= (contains), >=, <=
Pseudo: :empty, :contains(text), :has(child), :first, :last
Virtual attrs: text, bold, italic, fontsize, colSpan, rowSpan, heading

Resident Mode (live connection)

officecli open doc.hwpx          # returns IMMEDIATELY; daemon in bg
officecli view text               # view without re-opening
officecli set '/p[1]' --prop bold=true
officecli close                   # close session

Do NOT run officecli open as a background shell job. It returns immediately and the daemon lives in the background automatically. Running it as a monitored shell creates zombies and file locks.

Batch Mode (multiple commands)

officecli batch doc.hwpx <<'EOF'
view text
view stats
view forms --auto
EOF

Error decoding: if a batch fails with 'X' is an invalid start of a value, shell syntax has leaked into the JSON-style batch input. Pass the commands through a heredoc with a single-quoted delimiter (<<'EOF').

Compare

officecli compare a.hwpx b.hwpx                    # text diff (default)
officecli compare a.hwpx b.hwpx --mode outline      # heading diff
officecli compare a.hwpx b.hwpx --mode table --json  # table diff JSON

Alignment uses LCS dynamic programming and falls back to greedy matching when the DP table would exceed 10M cells. Table similarity is weighted: dimension 0.3 + content 0.7. To restrict the comparison to a page range, pass --pages "1-3,5".
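The two numeric conventions above can be illustrated briefly. This is a sketch, not OfficeCLI's implementation: `parse_pages` and `table_similarity` are hypothetical names, and how the dimension and content scores are computed is not stated in this document, so both are taken as inputs in the range 0 to 1.

```python
from typing import List

def parse_pages(spec: str) -> List[int]:
    """Expand a page spec such as "1-3,5" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)

def table_similarity(dimension_score: float, content_score: float) -> float:
    """Weighted sum from the text: 0.3 for dimensions, 0.7 for content."""
    return 0.3 * dimension_score + 0.7 * content_score
```

With these weights, two tables with identical shape but half-matching content score 0.3 + 0.35 = 0.65.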

Validate

officecli validate doc.hwpx

9-level: ZIP integrity, package (mimetype/rootfile/version), XML, IDRef, table structure, namespace, BinData orphan, field pairs, section count.

Image & Watermark

# Inline image
officecli add doc.hwpx '/section[1]' --type picture --prop path=/path/to/image.png

# Page-centered floating image
officecli add doc.hwpx '/section[1]' --type picture \
  --prop path=/path/to/image.png \
  --prop anchor=page --prop halign=center --prop valign=middle

# Watermark (opaque RGB PNG recommended; avoid transparent PNGs)
officecli add doc.hwpx '/section[1]' --type watermark \
  --prop src=/path/to/watermark.png --prop bright=0 --prop contrast=0

# Adjust position after creation
officecli set doc.hwpx '/section[1]/p[2]/run[1]/pic[1]' \
  --prop x=1111 --prop y=2222 --prop lock=1 --prop wrap=topbottom

Watch & HTML Preview

officecli watch doc.hwpx           # auto-refresh HTML on file change
officecli unwatch doc.hwpx         # stop
officecli view doc.hwpx html --browser  # one-shot A4 preview

Pre-Delivery Checklist

  • officecli validate passes (0 errors)
  • soffice --headless --convert-to pdf → visual check
  • Table cells in correct positions (cellAddr mapping)
  • Guide text (※, 예시) fully removed
  • Checkboxes □/■ in intended cells only
  • Merged cell text in correct row
  • If Hancom available, open .hwpx directly

11. Common Pitfalls

Pitfall | Correct Approach
--props text=Hello | --prop text=Hello (always the singular --prop)
/body/p[1] path | HWPX uses /section[1]/p[1]: paths are section-based, not body-based
.hwp (binary) open | Run officecli hwp doctor --json; if ready, use native .hwp read/edit/create recipes. Convert to .hwpx only when the requested native rhwp operation is not ready or not yet wired
Unquoted [N] in shell | "/section[1]/p[1]" (always quote paths)
fontsize omitted | Always pass --prop fontsize=11; this prevents charPr 0 pollution
officecli view file.hwpx (no mode) | Error. Must specify: text, markdown, tables, etc.
Manual table mapping | view tables replaces manual inspection
HWP->HWPX text replace | Text sits in whole-paragraph <t> nodes, so use raw string replace; p[0] may contain page-number fragments
Recreating header.xml styles that exist in template | cp source.hwpx target.hwpx first. Read reference/style_id_maps.md before custom styling. See §2
officecli open as background shell | Run in the foreground: it returns immediately and the daemon runs in the background automatically. Spawning it as a background shell creates zombies
Direct XML edit without lineseg strip | Stale <hp:linesegarray> cache causes text overlap. Use scripts/hwpx_cli.py (strips automatically) or apply lineseg strip manually (see §16)
Custom style work without reading reference/ | reference/header-xml-guide.md + reference/style_id_maps.md are mandatory reading. See §3

Essential Rules

  • Auto-normalization: PUA removal and uniform spacing collapse are applied automatically.
  • Transport parity: CLI, Resident, and MCP all support the same view modes (tables, markdown, objects, forms).

12. Form Recognition & Fill

4-Strategy Recognition

  1. Adjacent cell label-value -- table label->value detection (default)
  2. Header+data rows -- column-header recognition
  3. In-cell patterns -- checkbox, keyword( ) paren-blank, (label: ) annotation
  4. KV table detection -- 16 Korean keywords trigger auto-detection

3-Phase Fill Pipeline

  1. In-cell patterns -- checkbox toggle (□ -> ■), paren-blank fill, annotation fill
  2. Table label-value -- exact + prefix 60% matching, 4-directional (right/down/left/up)
  3. Inline paragraph -- regex lookbehind for "label: value" outside tables

AI Form Fill Workflow

officecli view form.hwpx forms --auto --json > fields.json  # Step 1: recognize
# Step 2: AI maps label->value
officecli set form.hwpx /table/fill --prop '성 명=홍길동'   # Step 3: fill

Confidence & Feedback

  • Fill returns unmatched labels (labels without matching cells reported)
  • Font-size heading detection: H1 >= 1.5x, H2 >= 1.3x, H3 >= 1.15x base
  • Multi-<hp:t> in-cell replacement handles fragmented text nodes
  • Form confidence score included in recognition output
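The font-size heading thresholds listed above map directly onto a small function. This is a minimal sketch: `heading_level` is a hypothetical name, and how the base size is chosen (for example, the most common body font size) is an assumption not stated in this document.

```python
from typing import Optional

def heading_level(font_size: float, base_size: float) -> Optional[int]:
    """Classify a run by its size relative to the body text base size."""
    ratio = font_size / base_size
    if ratio >= 1.5:
        return 1
    if ratio >= 1.3:
        return 2
    if ratio >= 1.15:
        return 3
    return None  # body text
```

For a 10pt base, a 16pt run is treated as H1, 13.5pt as H2, 12pt as H3, and 11pt as body text.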

Regulation-Specific Patterns

  • Checkbox hierarchy: (section) -> (item) -> - (detail) -> * (footnote)
  • Appendix references: [별첨 제N호], [별지 N] -- linked to form templates
  • Digit-concatenated headings: "3지원금 집행기준" (no space between number and title)
  • Uniform footer: repeated identical footers -> org extraction

Exam XML Structure Patterns (condensed)

Pattern | Description | Detection
Page/Column breaks | pageBreak="1" / columnBreak="1" on <hp:p> | Page boundary = question group boundary
p[0] Monster | secPr + colPr + title tbl + question 1 text merged | Everything in first paragraph
Equation interleaving | <t><equation> alternating pattern | Skip equations during text extraction
Answer choices | + 5 <equation> (5-choice) | Auto-detect answer paragraphs
Text fragmentation | 1-2 char <t> splits (HWP conversion) | Concatenate all text then match
2-column layout | <hp:colPr type="NEWSPAPER" colCount="2"> | Exam-specific layout
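The "Text fragmentation" row above (concatenate all text, then match) can be sketched with the standard library. This is an illustration under assumptions: the function names are hypothetical, and the `hp` namespace URI below is assumed to be the OWPML paragraph namespace; check reference/hwpx-format.md before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumption: OWPML paragraph namespace bound to the "hp" prefix.
HP_NS = "http://www.hancom.co.kr/hwpml/2011/paragraph"

def paragraph_text(p: ET.Element) -> str:
    """Join every <hp:t> fragment under a paragraph, in document order."""
    return "".join("".join(t.itertext()) for t in p.iter(f"{{{HP_NS}}}t"))

def find_paragraphs(section_xml: str, needle: str) -> list:
    """Return indexes of <hp:p> elements whose joined text contains needle."""
    root = ET.fromstring(section_xml)
    paragraphs = root.iter(f"{{{HP_NS}}}p")
    return [i for i, p in enumerate(paragraphs) if needle in paragraph_text(p)]
```

Because nested paragraphs (for example inside table cells) are also visited, an outer paragraph that contains a table will match on its cells' text as well; equations are skipped simply because they are not `<hp:t>` elements.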

13. Security

Check | Limits
ZIP bomb | 1000 entries, 200 MB, 100:1 ratio
Path traversal | null byte, .., absolute path, drive letter, symlink
XXE | DtdProcessing.Prohibit
Table size | 200 cols x 10000 rows
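The ZIP bomb and path traversal rows above can be approximated with Python's zipfile module. This is a sketch, not OfficeCLI's implementation: all names are hypothetical, "200 MB" is read as total uncompressed size, and the 100:1 ratio is applied per entry; both readings are assumptions.

```python
import io
import zipfile

MAX_ENTRIES = 1000
MAX_TOTAL_BYTES = 200 * 1024 * 1024  # assumption: total uncompressed size
MAX_RATIO = 100                      # assumption: checked per entry

def is_unsafe_name(name: str) -> bool:
    """Null byte, absolute path, drive letter, or a '..' path segment."""
    if "\x00" in name or name.startswith(("/", "\\")):
        return True
    if len(name) >= 2 and name[0].isalpha() and name[1] == ":":
        return True
    return ".." in name.replace("\\", "/").split("/")

def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Unix mode bits live in the high 16 bits of external_attr."""
    return (info.external_attr >> 16) & 0o170000 == 0o120000

def check_package(data: bytes) -> list:
    """Return a list of problems found in the ZIP directory (empty = clean)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    if len(infos) > MAX_ENTRIES:
        problems.append("too many entries")
    if sum(i.file_size for i in infos) > MAX_TOTAL_BYTES:
        problems.append("uncompressed size too large")
    for info in infos:
        if is_unsafe_name(info.filename):
            problems.append(f"unsafe path: {info.filename}")
        if is_symlink(info):
            problems.append(f"symlink: {info.filename}")
        if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
            problems.append(f"compression ratio: {info.filename}")
    return problems
```

Only the central directory is inspected, so nothing is decompressed before the limits are checked.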

14. HWP->HWPX Conversion

Format Detection

file doc.hwpx   # "Zip archive" -> HWPX (ZIP + OWPML XML)
file doc.hwp    # "HWP Document" -> HWP 5.0 binary
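When the file utility is not available, the same distinction can be approximated from the first bytes of the file. This is a minimal sketch with a hypothetical function name; the signatures are the generic ZIP local-file-header and OLE2 compound-file magic numbers, so a match only says "ZIP container" or "OLE2 container", not that the content is valid HWPX or HWP.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file

def sniff_format(head: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if head.startswith(ZIP_MAGIC):
        return "hwpx"   # candidate only: still needs mimetype/package checks
    if head.startswith(OLE2_MAGIC):
        return "hwp"    # candidate only: OLE2 is shared with other formats
    return "unknown"
```

Typical use would be `sniff_format(open(path, "rb").read(8))`, followed by officecli validate or officecli hwp doctor --json for the real check.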

Native HWP via rhwp Bridge (experimental)

Before operating on binary .hwp, check runtime readiness:

officecli hwp doctor --json
officecli capabilities --json
officecli hwp --json

Supported claims are capability-gated. Current safe recipes include:

officecli create file.hwp --json
officecli get file.hwp /provider/rhwp --json
officecli view file.hwp text --json
officecli view file.hwp svg --page 1 --json
officecli view file.hwp png --page 1 --out /tmp/hwp-png --json
officecli view file.hwp pdf --page 1 --out out.pdf --json
officecli view file.hwp markdown --json
officecli view file.hwp thumbnail --out thumb.png --json
officecli view file.hwp info --json
officecli view file.hwp diagnostics --json
officecli view file.hwp dump --json
officecli view file.hwp pages --page 1 --json
officecli view file.hwp fields --json
officecli view file.hwp field --field-name 회사명 --json
officecli view file.hwp table-cell --section 0 --parent-para 3 --control 0 --cell 0 --cell-para 0 --json
officecli view file.hwp tables --section 0 --json
officecli view file.hwp native --op get-style-list --json
officecli set file.hwp /field --prop name=회사명 --prop value=리지 --prop output=out.hwp --json
officecli add file.hwp /text --type paragraph --prop value=새본문 --prop output=out.hwp --json
officecli set file.hwp /text --find 마케팅 --replace 브릿지 --prop output=out.hwp --json
officecli set file.hwp /text --find 마케팅 --replace 브릿지 --in-place --backup --verify --json
officecli set file.hwp /table/cell --prop section=0 --prop parent-para=3 --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=out.hwp --json
officecli set file.hwp /convert-to-editable --prop output=editable.hwp --json
officecli set file.hwp /native-op --prop op=split-paragraph --prop paragraph=0 --prop offset=5 --prop output=out.hwp --json
officecli set input.hwpx /save-as-hwp --prop output=out.hwp --json

Policy:

  • Blank .hwp creation is native when capabilities.formats.hwp.operations.create_blank.ready=true.
  • get /provider/rhwp is metadata-only for binary .hwp; use it to inspect the packaged sidecar and provider capabilities, not as a generic DOM query surface.
  • Body text insertion is native when capabilities.formats.hwp.operations.insert_text.ready=true; use a separate output path and inspect the returned safe-save transaction.
  • PDF/PNG/markdown/thumbnail/info/diagnostics/dump/page exports are native when the matching export_pdf, render_png, export_markdown, thumbnail, document_info, diagnostics, dump_controls, or dump_pages capability is ready.
  • view file.hwp native --op <name> is the read-only rhwp escape hatch for pinned native primitives such as styles and full-text search (--op search-all-text --native-arg text=...). Mutating native primitives still use set /native-op with explicit operation props and output-first mutation. (The rhwp engine is pinned at v0.7.12; the five operations in the v0.7.12 native recipes below were wired 2026-06-20.)
  • HWPX → HWP export is native when capabilities.formats.hwp.operations.save_as_hwp.ready=true; always verify with readback/Hancom before production use.
  • Prefer output mode for binary .hwp mutation: --prop output=out.hwp.
  • Use in-place mode only for /text replacement, only when explicitly requested, and only after capabilities.formats.hwp.operations.replace_text.safeInPlace.ready=true.
  • In-place mode must include --in-place --backup --verify and must not include --prop output=....
  • Treat table-cell mutation as coordinate-based and experimental.
  • If pinned/latest rhwp exposes a native HWP primitive that OfficeCLI has not wired yet, report it as an OfficeCLI capability/wiring gap, not as proof that native HWP cannot do it.
  • Verify edited output with view text, view svg, and Hancom open/render evidence before relying on it.
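
The readiness gates in the policy list can be checked mechanically before any mutation. A minimal Python sketch, assuming the JSON printed by officecli capabilities --json nests its keys exactly as the dotted paths above suggest; the helper name is_ready and the sample payload are illustrative and not part of OfficeCLI:

```python
import json


def is_ready(capabilities: dict, dotted_path: str) -> bool:
    """Return True only if every key on the dotted path exists and the leaf is exactly True."""
    node = capabilities
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False  # a missing key is treated as "not ready"
        node = node[key]
    return node is True


# Hypothetical payload shaped like the dotted paths used in the policy list.
sample = json.loads("""
{"capabilities": {"formats": {"hwp": {"operations": {
    "insert_text": {"ready": true},
    "replace_text": {"safeInPlace": {"ready": false}}
}}}}}
""")

base = "capabilities.formats.hwp.operations."
for path in ("insert_text.ready", "replace_text.safeInPlace.ready", "save_as_hwp.ready"):
    print(path, is_ready(sample, base + path))
```

Treating a missing key as not ready keeps the gate conservative: an operation that the installed build does not report is never attempted.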

v0.7.12 native recipes (wired 2026-06-20; gate on officecli hwp doctor --json first). Read-only native operations exposed through view file.hwp native --op <name> do not require an output path; use that route for search-all-text. Native operations that are still only exposed through set /native-op require --prop op=<name> + --prop output=<path> because OfficeCLI writes a working copy. Verified against officecli 1.0.115+.

# full-text search (read)
officecli view file.hwp native --op search-all-text --native-arg text="검색어" --native-arg case-sensitive=false --native-arg include-cells=true --json
# page overlay images (read)
officecli set file.hwp /native-op --prop op=get-page-overlay-images --prop page-num=0 --prop output=out.hwp --json
# header/footer picture properties (read)
officecli set file.hwp /native-op --prop op=get-hf-picture-properties --prop section=0 --prop outer-para=0 --prop outer-control=0 --prop inner-para=0 --prop inner-control=0 --prop output=out.hwp --json
# insert a numbered-list start (mutate)
officecli set file.hwp /native-op --prop op=insert-new-number --prop section=0 --prop paragraph=0 --prop offset=0 --prop start-num=1 --prop output=out.hwp --json
# set header/footer picture properties (mutate)
officecli set file.hwp /native-op --prop op=set-hf-picture-properties --prop section=0 --prop outer-para=0 --prop outer-control=0 --prop inner-para=0 --prop inner-control=0 --prop props-json='{...}' --prop output=out.hwp --json
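
The set /native-op recipes above share one shape: an op name, operation props, an output path, and --json. A small Python sketch that assembles that argument list; native_op_argv is a hypothetical helper, and it uses only the flags shown in the recipes:

```python
def native_op_argv(src: str, op: str, output: str, **props: object) -> list[str]:
    """Build the argv for `officecli set <src> /native-op` from keyword props."""
    argv = ["officecli", "set", src, "/native-op", "--prop", f"op={op}"]
    for name, value in props.items():
        # Python identifiers cannot contain "-", so start_num becomes start-num.
        argv += ["--prop", f"{name.replace('_', '-')}={value}"]
    argv += ["--prop", f"output={output}", "--json"]
    return argv


argv = native_op_argv(
    "file.hwp", "insert-new-number", "out.hwp",
    section=0, paragraph=0, offset=0, start_num=1,
)
print(" ".join(argv))
```

Passing the list to subprocess.run(argv, check=True) rather than a shell string avoids quoting problems with values such as props-json.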

Binary .hwp Structured Creation Pattern

Use this pattern when the user asks for a complex native .hwp from scratch. Do not build the body by repeatedly inserting long text into the same paragraph/offset. Long Korean text can be reflowed by the rhwp readback provider, which can make safe-save semantic-delta checks fail even when a temp output was written. Repeated same-position insertion also produces one crowded run instead of a structured document.

Create structure first, then fill it:

  1. Run officecli hwp doctor --json and officecli capabilities --json; continue only when the required native .hwp operations are ready.
  2. Create the blank .hwp.
  3. Add enough paragraphs with native-op insert-paragraph.
  4. Insert short text chunks into explicit paragraph indexes with add /text --type paragraph.
  5. Create tables at known paragraph positions with native-op create-table.
  6. Discover table coordinates with view tables --include-empty.
  7. Fill cells with /table/cell using the discovered section, parent-para, control, cell, and cell-para.
  8. Apply formatting with native-op apply-char-format / apply-para-format.
  9. Verify with view text, view tables, and a PDF/app-open render.

Example:

officecli create complex.hwp --json

officecli set complex.hwp /native-op \
  --prop op=insert-paragraph \
  --prop paragraph=0 \
  --prop output=01_para.hwp \
  --json

officecli add 01_para.hwp /text \
  --type paragraph \
  --prop paragraph=0 \
  --prop offset=0 \
  --prop value="복합 HWP 생성 보고서" \
  --prop output=02_title.hwp \
  --json

officecli set 02_title.hwp /native-op \
  --prop op=create-table \
  --prop paragraph=11 \
  --prop offset=0 \
  --prop rows=5 \
  --prop cols=4 \
  --prop output=03_table.hwp \
  --json

officecli view 03_table.hwp tables \
  --max-parent-para 20 \
  --max-control 4 \
  --max-cell 20 \
  --include-empty \
  --json

officecli set 03_table.hwp /table/cell \
  --prop section=0 \
  --prop parent-para=12 \
  --prop control=0 \
  --prop cell=0 \
  --prop cell-para=0 \
  --prop value="항목" \
  --prop output=04_cell.hwp \
  --json

officecli set 04_cell.hwp /native-op \
  --prop op=apply-char-format \
  --prop paragraph=0 \
  --prop start=0 \
  --prop end=12 \
  --prop props-json='{"bold":true,"fontSize":2200}' \
  --prop output=05_style.hwp \
  --json

officecli view 05_style.hwp text --json
officecli view 05_style.hwp tables --include-empty --json
officecli view 05_style.hwp pdf --out preview.pdf --json

Debug rule: if an insert fails with semantic-delta after a long single string, do not loosen safe-save and do not switch to HWPX. Split the body into shorter visible chunks, write each chunk to an explicit paragraph, then verify anchors and table cells from the output copy.
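
The chunking step in the debug rule can be sketched in Python. This is an illustration only: split_into_chunks is a hypothetical helper, and the 40-character default is an arbitrary assumption rather than a limit documented by OfficeCLI or rhwp.

```python
def split_into_chunks(text: str, max_chars: int = 40) -> list[str]:
    """Split text at whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole rather than cut mid-word.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


body = "복합 HWP 생성 보고서의 본문은 짧은 단위로 나누어 각 문단에 넣는다"
# Pair each chunk with an explicit paragraph index (index 0 is left for the title).
for paragraph_index, chunk in enumerate(split_into_chunks(body, max_chars=20), start=1):
    print(paragraph_index, chunk)
```

Each (paragraph index, chunk) pair then maps to one add /text --type paragraph call with its own output file, as in the example above.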

Conversion

# Python fallback (H2Orestart-based)
python3 scripts/hwp_convert.py input.hwp output.hwpx

# Legacy read-only HWP inspection (no conversion)
python3 scripts/hwp_reader.py input.hwp

Structural Differences

| Aspect | Native HWPX | HWP->HWPX Converted |
|---|---|---|
| Text unit | Short <t> per run | Entire paragraph in one <t> |
| Title p[0] | secPr + tbl + content | Page number fragments <t>20</t> + <t>1</t> mixed in |
| Editing | Run-level precise replacement | Raw string replace or whole-paragraph swap needed |

Editing Strategies for Converted Files

  1. Title: run-aware replacement -- set_run_text(p0, 'old', 'new') (skip page-number runs)
  2. Body: raw string replace on serialized XML -- sec0.replace(old, new)
  3. Multi-<t> cells: use ReplaceTextInCell() -- concatenate all <t> -> match -> redistribute
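
Strategy 3 (concatenate, match, redistribute) can be sketched as follows. This is not the implementation of ReplaceTextInCell(); replace_across_text_nodes is a hypothetical helper, and the sample uses un-namespaced <t> elements for brevity, whereas real section XML carries a namespace prefix.

```python
import xml.etree.ElementTree as ET


def replace_across_text_nodes(nodes, old: str, new: str) -> bool:
    """Replace the first occurrence of `old` in the concatenated text of `nodes`.

    Text outside the match stays in its original node; the replacement is
    written into the first node that the match touches.
    """
    texts = [node.text or "" for node in nodes]
    start = "".join(texts).find(old)
    if not old or start < 0:
        return False
    end = start + len(old)
    pos = 0
    placed = False
    for node, text in zip(nodes, texts):
        node_start, node_end = pos, pos + len(text)
        pos = node_end
        if node_end <= start or node_start >= end:
            continue  # this node lies entirely outside the match
        before = text[: max(start - node_start, 0)]
        after = text[end - node_start:]
        node.text = before + ("" if placed else new) + after
        placed = True
    return True


cell = ET.fromstring("<cell><t>성명: 홍</t><t>길</t><t>동 귀하</t></cell>")
replace_across_text_nodes(cell.findall("t"), "홍길동", "김철수")
print([t.text for t in cell.findall("t")])
```

Keeping the untouched text in its original node preserves whatever run-level formatting is attached to that node; only the nodes the match spans are rewritten.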

15. Equation Handling (수식)

HWPX equations use Hancom's proprietary script language. NOT MathML, NOT LaTeX, NOT OMML.

| Script | Result |
|---|---|
| {1 over 2} | 1/2 (fraction) |
| sqrt{x} | square root of x |
| x^2, x_i | superscript, subscript |
| int _0 ^1 f(x)dx | definite integral |
| sum _{i=1} ^n | sigma summation |
| lim _{x->0} | limit |
| matrix{a&b # c&d} | 2x2 matrix |

# Create equation
officecli add doc.hwpx /section --type equation --prop 'script=x^2 + y^2 = r^2'
# View all equations
officecli view doc.hwpx objects --object-type equation
# Edit: modify <hp:script> text nodes via Python XML editing

Equation scripts are opaque -- edit only <hp:script> text, not the binary payload. Math exam docs (KICE) require <hp:equation> for every expression. Never use plain text.
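
Editing only the <hp:script> text might look like the sketch below. The namespace URI bound to hp: is an assumption here and should be confirmed against the actual section XML; replace_in_equation_scripts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the hp: prefix; confirm it against your section XML.
HP_NS = "http://www.hancom.co.kr/hwpml/2011/paragraph"


def replace_in_equation_scripts(root: ET.Element, old: str, new: str) -> int:
    """Rewrite `old` to `new` inside every <hp:script> text node; return how many changed."""
    changed = 0
    for script in root.iter(f"{{{HP_NS}}}script"):
        if script.text and old in script.text:
            script.text = script.text.replace(old, new)
            changed += 1
    return changed


sample = (
    f'<hp:p xmlns:hp="{HP_NS}">'
    "<hp:equation><hp:script>x^2 + y^2 = r^2</hp:script></hp:equation>"
    "</hp:p>"
)
root = ET.fromstring(sample)
changed = replace_in_equation_scripts(root, "r^2", "{1 over 4}")
print(changed, root.find(f".//{{{HP_NS}}}script").text)
```

When serializing with ElementTree, call ET.register_namespace("hp", HP_NS) first so the hp: prefix is kept instead of being rewritten to ns0.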

Verified: KICE CSAT (수능) math template (/private/tmp/kice-full-edit-v2.hwpx, 836 equations) -- text/equation edit + lineseg strip -> Hancom OK.


16. Pattern-Match Editing (Python L4 Fallback)

For complex form editing beyond officecli set/find-replace (KICE exams, multi-section regulations, fragmented text nodes):

Core flow: unpack HWPX → strip lineseg → pattern-match edit XML → repack ZIP. Hancom recalculates layout on open.
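
The core flow can be sketched with the standard library alone. This is a minimal illustration, not the behavior of scripts/hwpx_cli.py or scripts/ooxml/pack.py. It assumes section XML lives at Contents/section<N>.xml and that the package expects its entries, mimetype first, to keep their original order and compression; rewrite_hwpx and the stand-in package are hypothetical.

```python
import os
import re
import tempfile
import zipfile
from typing import Callable

# Paired and self-closing linesegarray elements, with or without the hp: prefix.
LINESEG = re.compile(
    r"<(?:hp:)?linesegarray\b[^>]*(?<!/)>.*?</(?:hp:)?linesegarray>"
    r"|<(?:hp:)?linesegarray\b[^>]*/>",
    re.DOTALL,
)


def rewrite_hwpx(src: str, dst: str, edit: Callable[[str], str]) -> None:
    """Copy src to dst, passing each section XML through lineseg strip + edit."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if re.fullmatch(r"Contents/section\d+\.xml", info.filename):
                xml = LINESEG.sub("", data.decode("utf-8"))
                data = edit(xml).encode("utf-8")
            # Preserve entry order and each entry's original compression method.
            zout.writestr(info, data, compress_type=info.compress_type)


with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "in.hwpx"), os.path.join(tmp, "out.hwpx")
    with zipfile.ZipFile(src, "w") as z:  # minimal stand-in, not a valid HWPX
        z.writestr("mimetype", "application/hwp+zip")
        z.writestr(
            "Contents/section0.xml",
            "<hp:p><hp:t>old</hp:t><hp:linesegarray><hp:lineseg/></hp:linesegarray></hp:p>",
        )
    rewrite_hwpx(src, dst, lambda xml: xml.replace("old", "new"))
    with zipfile.ZipFile(dst) as z:
        print(z.read("Contents/section0.xml").decode("utf-8"))
```

Writing to a separate destination path keeps the source package intact, so a failed edit can be retried from the original.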

Tools: scripts/hwpx_cli.py (unpack/search/replace/batch-replace/fill-table — strips lineseg automatically), scripts/ooxml/pack.py.

Key Patterns (non-exhaustive)

| Pattern | Description | Where |
|---|---|---|
| Lineseg strip | Remove stale <hp:linesegarray> cache | Apply on every direct XML write (see below) |
| Checkbox substitution | Checkbox glyph substitution, with multi-<t> node handling | hwpx_cli.py replace or regex |
| Label→value detection | Label cell adjacent to value cell (right/down/left/up) | officecli set /table/fill handles most cases |
| Uniform-space normalization | "홍 길 동" → "홍길동" conversion | Automatic in officecli; manual in direct XML |
| Checkbox hierarchy | (section) → (item) → - (detail) → * (footnote) | Regulation-specific |
| Appendix references | [별첨 제N호], [별지 N] linked to form templates | Regulation-specific |
| p[0] Monster | secPr + tbl + question 1 text merged in first paragraph (HWP-converted files) | scripts/hwp_convert.py output requires paragraph-level replace |
| Equation interleaving | <t> and <equation> nodes alternating | Skip equations during text extraction |

For repository-internal pattern catalog references (Plan 99.8 / 99.9) see the devlog; the scripts above implement the concrete operations.
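
For the manual case of uniform-space normalization in direct XML, a minimal Python sketch follows. The heuristic (collapse only when every token is a single character separated by single spaces) is an assumption for illustration and not OfficeCLI's actual rule; normalize_uniform_spacing is a hypothetical helper.

```python
def normalize_uniform_spacing(value: str) -> str:
    """Collapse "홍 길 동" style spacing: single characters separated by single spaces."""
    tokens = value.strip().split(" ")
    if len(tokens) >= 2 and all(len(token) == 1 for token in tokens):
        return "".join(tokens)
    return value  # ordinary text, including multi-character words, is left untouched


for raw in ("홍 길 동", "홍길동 귀하", "성 명"):
    print(repr(raw), "->", repr(normalize_uniform_spacing(raw)))
```

The heuristic also collapses short Latin strings such as "A B", so it is best applied only to cells already identified as name or label fields.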

Lineseg Strip (critical for direct XML editing)

When editing HWPX XML directly (NOT via officecli or scripts/hwpx_cli.py), you MUST strip ALL <hp:linesegarray> elements. Stale layout cache causes characters to overlap into a single line.

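import re
# Paired form. The (?<!/) lookbehind stops a self-closing tag from opening a match that would swallow later content.
xml = re.sub(r'<(?:hp:)?linesegarray\b[^>]*(?<!/)>.*?</(?:hp:)?linesegarray>', '', xml, flags=re.DOTALL)
# Self-closing form.
xml = re.sub(r'<(?:hp:)?linesegarray\b[^>]*/>', '', xml)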

officecli and scripts/hwpx_cli.py handle this automatically. This rule applies only to raw Python XML editing.


17. Anti-Patterns (MUST AVOID)

  1. No equations in math exams = broken output -- KICE docs require <hp:equation> elements
  2. No unguarded HWP binary overwrite -- binary .hwp editing is experimental via rhwp; prefer --prop output=...; only use safe in-place /text replacement when safeInPlace.ready=true and the command includes --in-place --backup --verify
  3. No fake HWPX fallback when rhwp has a native HWP primitive -- if OfficeCLI lacks the route, say the route is not wired/capability-gated yet and stop for approval.
  4. No XML editing without lineseg strip -- stale cache causes overlapping text. Use scripts/hwpx_cli.py (auto-strips) or apply the regex in §16
  5. No visible QA markers in fixed-layout exams -- KICE-style documents fail visual QA if proof text is inserted into the question body; use screenshots or sidecar evidence.
  6. No cross-format skill loading -- this skill is .hwp/.hwpx only
  7. Rebuilding styles that exist in template — when user provides a source .hwpx, cp first and read reference/style_id_maps.md. See §2
  8. Ignoring reference materials -- reference/header-xml-guide.md, reference/section0-xml-guide.md, and reference/style_id_maps.md are mandatory reading for custom XML work. See §3
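
Anti-pattern 2 lends itself to a pre-flight check on the command line before it runs. A minimal Python sketch, using only the flags named in this document; check_hwp_mutation_argv is a hypothetical helper, and the capability gate (safeInPlace.ready) still has to be checked separately:

```python
def check_hwp_mutation_argv(argv: list[str]) -> list[str]:
    """Return a list of policy violations for a binary .hwp mutation command line."""
    problems: list[str] = []
    has_output = any(arg.startswith("output=") for arg in argv)
    if "--in-place" in argv:
        for flag in ("--backup", "--verify"):
            if flag not in argv:
                problems.append(f"in-place mode requires {flag}")
        if has_output:
            problems.append("in-place mode must not include --prop output=...")
    elif not has_output:
        problems.append("no --prop output=... given; prefer output mode")
    return problems


# Replacement props are omitted here; only the mode flags matter for the check.
unsafe = ["officecli", "set", "file.hwp", "/text", "--in-place", "--backup"]
safe = ["officecli", "set", "file.hwp", "/native-op", "--prop", "output=out.hwp", "--json"]
print(check_hwp_mutation_argv(unsafe))
print(check_hwp_mutation_argv(safe))
```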

18. Dependencies

| Tool | Purpose | Required? |
|---|---|---|
| officecli (global) | Recommended advanced HWPX CLI + experimental rhwp-backed HWP bridge; fork source is https://github.com/lidge-jun/OfficeCLI | Optional; ask before install |
| python3 | Fallback scripts (scripts/*.py) | Required for L3/L4 |
| lxml | XML processing for scripts/* | Required for L3/L4 (pip install lxml) |
| pyhwp | Legacy HWP 5.0 binary reading/conversion fallback | Required for HWP→HWPX fallback |
| soffice (LibreOffice) | PDF conversion + visual verification | Recommended |
| Java (JAVA_HOME) | H2Orestart HWP conversion engine | For HWP→HWPX only |
| dotnet | Build officecli from source | For builds only |

Install via CLI
npx skills add https://github.com/lidge-jun/cli-jaw-skills --skill hwp