name: lean-parsing-completeness description: Use when proving parsing completeness theorems — that a parsing function returns .ok when given well-formed input, or that spec-level success implies native success. Also use when proving position bounds, eliminator lemmas, or chaining monadic parsing steps in Except/Option. allowed-tools: Read, Edit, Write, Bash, Glob, Grep
Parsing Completeness Proof Patterns
Two flavours:
- Spec ⟹ native: spec returns
some (result, rest), prove native returns.ok (result', state')with result/state correspondence. - Native ⟹ properties (eliminator lemmas): native returns
.ok, prove bounds/invariants about the result.
Both share the same core technique. Note: tightening native rejection (rejecting inputs the spec accepts) makes flavour-1 *_complete theorems false — fix by tightening the spec, not the native decoder.
Core technique: destructuring h : f data pos = .ok (r, p)
- Unfold ONE function:
simp only [parseFunction, bind, Except.bind] at h. - Eliminate guard error branches:
by_cases h1 : guard_condition
· rw [if_pos h1] at h; ...
· rw [if_neg h1] at h; exact nomatch h -- error branch impossible
- Case-split on intermediate
.ok/.errorresults:
cases hstep : intermediateCall with
| error e => simp only [hstep] at h; exact nomatch h
| ok v => rw [hstep] at h; dsimp only [Bind.bind, Except.bind] at h
- Extract from the final pure:
simp only [pure, Pure.pure, Except.pure] at h; obtain ⟨rfl, rfl⟩ := h. - Close with
rfl,exact ⟨rfl, rfl⟩, or chain bounds withomega.
Always use explicit case splits (not <;>): different branches need different closers (rfl vs simp_all vs exact nomatch h).
Sentinel/guard elimination: nomatch not simp
exact nomatch h closes an impossible error branch (h : throw _ = .ok _). Prefer it over simp — faster, robust, clear intent.
After rw [hstep] at h, always dsimp only
rw leaves match (.ok v) with ... unreduced. Follow with dsimp only [Bind.bind, Except.bind] at h to collapse it.
Unfold one function at a time
If parseA calls parseB calls parseC, unfold only parseA, then case-split on the parseB call. simp only [parseA, parseB, parseC] explodes the term.
Common sub-patterns
PUnit.unitartifacts (side-effect-free binds):simp only [pure, Pure.pure, Except.pure] at h.getElem!with bounds:simp only [getElem!_def] at h; split at h— one branch has the element, the other contradicts the guard.BEq→Prop:simp only [beq_iff_eq] at h(orbne_iff_nefor the negative).- Format discriminant dispatch (e.g. 2-bit sizeFormat): nested
by_cases h : fmt == 0 / == 1 / == 2, each withrw [if_pos]/rw [if_neg]; the final case (== 3) closes viaomega.
Chaining position bounds
Compose _le_size and _pos_ge lemmas from each step, then omega:
have h1_le := step1_le_size hstep1 -- pos₁ ≤ data.size
have h2_le := step2_le_size hstep2
have h1_ge := step1_pos_ge hstep1 -- pos₁ ≥ pos + k₁
have h2_ge := step2_pos_ge hstep2
omega
WF-recursive parsers
Use f.induct for structural induction:
induction fuel using parseLoop.induct generalizing acc
Visibility
Completeness proofs in Zip/Spec/ reference Zip/Native/ functions. Those must NOT be private — use no modifier (public) or protected. Check early: grep -n 'private def targetFunction'.
Eliminator Lemma Pattern
Factor a duplicated case-analysis chain into one reusable eliminator that extracts all useful facts from parse success; downstream theorems consume it instead of re-analyzing:
theorem parseX_ok_elim (h : parseX data pos = .ok (result, pos')) :
pos' ≥ pos + minSize ∧ pos' ≤ data.size ∧ result.field ∈ validRange := by ...
theorem parseY_complete (h : parseY data pos = .ok ...) : ... := by
have ⟨hge, hle, hvalid⟩ := parseX_ok_elim hX
...
Field characterization via struct projection
Prove a parsed struct's fields equal expressions over input bytes. After unfold/unfold_except and split at h:
- Success branches give
h : {field1 := ..., ...} = result ∧ pos+k = afterPos;obtain ⟨rfl, rfl⟩ := h. - Simple projection (
hdr.blockSize = raw >>> 3): close withrfl. - Conjunction over match branches (
(typeVal=0 → .raw) ∧ ...): usesimp_all(it needs theheq✝ : matchDiscrim = Nhypothesis to kill impossible implications).
theorem parseBlockHeader_blockType_eq ... := by
unfold Zip.Native.parseBlockHeader at h
unfold_except
split at h
· exact nomatch h -- guard failure
· split at h
· obtain ⟨rfl, rfl⟩ := h; simp_all -- typeVal = 0
· obtain ⟨rfl, rfl⟩ := h; simp_all -- typeVal = 1
· obtain ⟨rfl, rfl⟩ := h; simp_all -- typeVal = 2
· exact nomatch h -- reserved type
Existential goals: match hresult not simp only
When the goal has existentials (∃ x y z, f = .ok (x,y,z)), do NOT unfold f in the goal — simp only [f, bind, ...] explodes distributing under ∃. Match on the result instead:
match hresult : parseFunction data pos with
| .ok (a, b, c) => exact ⟨a, b, c, rfl⟩
| .error _ =>
exfalso
simp only [parseFunction, bind, Except.bind, ...] at hresult
Helper definitions: avoid let
let bindings in a helper become opaque have terms after unfold, blocking split from finding the if. Inline conditions directly:
-- Good: split sees the if after `unfold rawSize at hsize; split at hsize`
def rawSize (data : ByteArray) (pos : Nat) : Nat :=
if ((data[pos]! >>> 2 &&& 3).toNat == 0) then 1 + ... else ...
Use contradiction to drop branches conflicting with an outer split context. When split at h meets if cond and the context already has h✝ : cond = true, it auto-resolves — no second split needed.
Goal-side vs hypothesis-side guard elimination
Constructing an existential .ok works on the goal; the split at h patterns above do not transfer. Goal-side:
simp only [hguard_false, ↓reduceIte] -- eliminate known-false guard
split -- case-split if/match in goal
· rename_i hbad; exact absurd hbad (by omega) -- error case
· split
· rename_i hbad; exact absurd hbad (by ...)
· apply ih; ...
Differences: use split (not split at h); omit dsimp only [letFun] (no-op on goal); close impossible branches with absurd + contradiction (not nomatch).
Composed completeness (single block): 6-step recipe
Prove a high-level op succeeds given only raw-byte preconditions. Preconditions are data[off]! expressions (usable without running the parser); field characterization theorems bridge raw bytes ↔ struct fields.
theorem composed_completeness (data : ByteArray) (off : Nat)
(hsize : data.size ≥ off + minBytes) (htypeVal : rawByteExpr = expectedValue)
(hlastBit : ...) (hpayload : data.size ≥ off + headerSize + payloadSize) :
∃ result pos', topLevelFunction data off ... = .ok (result, pos') := by
-- 1. Guard satisfiable, e.g. typeVal ≠ 3 (reserved)
have htypeNe3 : rawTypeExpr ≠ 3 := by rw [htypeVal]; decide
-- 2. Sub-parser completeness
obtain ⟨hdr, afterHdr, hparse⟩ := parseFunction_succeeds data off hsize htypeNe3
-- 3. Field characterizations
have htype := (parseFunction_blockType_eq ... hparse).1 htypeVal
have hpos_eq := parseFunction_pos_eq ... hparse
-- 4. High-level constraints from raw-byte hyps (decide closes typeVal implications)
have hlast : hdr.lastBlock = true := by rw [hlast_eq, hlastBit]; decide
-- 5. Sub-operation completeness (payload size flows through hpos_eq)
have hpayload' : data.size ≥ afterHdr + neededBytes := by rw [hpos_eq]; omega
obtain ⟨block, afterBlock, hraw⟩ := subOperation_succeeds data afterHdr ... hpayload'
-- 6. Close via composition lemma
exact ⟨_, _, composition_lemma ... hparse hbs htype hraw hlast⟩
Prerequisites: parseFunction_succeeds, parseFunction_fieldName_eq (each field), subOperation_succeeds, composition_lemma. Concrete: Zip/Spec/Zstd.lean decompressBlocksWF_succeeds_single_raw/_single_rle.
Two-block composed completeness
Extends the single-block recipe to two consecutive blocks (first non-last, second last). Key addition: position threading — the second block's offset depends on the first's output position. Introduce explicit off2 with hoff2 : off2 = off + 3 + blockSize1.toNat, then subst:
obtain ⟨hdr1, afterHdr1, hparse1⟩ := parseBlockHeader_succeeds ...
have hpos1_eq := parseBlockHeader_pos_eq ... hparse1
obtain ⟨block1, afterBlock1, hraw1⟩ := decompressRawBlock_succeeds ...
have hAfterBlock1_eq := decompressRawBlock_pos_eq ... hraw1
have hoff2_eq : off2 = afterBlock1 := by rw [hoff2, hpos1_eq]; omega
subst hoff2_eq -- rewrites every off2 hyp to afterBlock1; eliminates the later var
-- block 2 via single-block recipe, then:
exact ⟨_, _, raw_step ... hparse1 hraw1 hlast1_false (single_raw ... hparse2 ...)⟩
Position formulas: off + 3 + blockSize (raw), off + 4 (RLE), variable (compressed). For N blocks, use induction on the WF recursion instead (see lean-content-preservation).
Multi-level composition chain
Each level wraps the previous:
decompressBlocksWF_succeeds_* → decompressFrame_succeeds_* → decompressZstd_succeeds_*
- Frame level: wraps block completeness with frame-header parsing + dict/checksum/size checks. Block hypotheses are universally quantified over
(hdr, afterHdr)fromparseFrameHeader, so byte-level conditions can depend on the (format-varying) header size. - API level: the "frame fills data" condition is a quantified hypothesis
hterm : ∀ content pos', decompressFrame ... = .ok (content, pos') → pos' ≥ data.size, cleaner than computing the exact end position.
Cross-references
- lean-monad-proofs: general
Except/Optionbind unfolding - lean-zstd-patterns / lean-zstd-spec-pattern: Zstd parsing impl + spec structure
- lean-wf-recursion / lean-fuel-induction: recursive parser unfolding