name: lean-roundtrip-proofs description: Use when proving encode/decode roundtrip theorems, suffix invariance (_append lemmas), goR (decode-with-remaining) patterns, padding extraction, or composing per-level roundtrip proofs into a unified theorem. allowed-tools: Read, Bash, Grep
Lean 4 Roundtrip Proof Patterns
Suffix Invariance (_append Lemmas)
Appending extra bits to decoder input doesn't affect the result — the decoder processes only its content and leaves trailing bits untouched.
Lemma Shape
Every _append lemma follows this template:
theorem f_append (bits suffix : List Bool) (result : T) (rest : List Bool)
(h : f bits = some (result, rest)) :
f (bits ++ suffix) = some (result, rest ++ suffix)
Hypothesis: f succeeds on bits, consuming them down to rest.
Goal: f succeeds on bits ++ suffix, leaving rest ++ suffix.
Building the Chain
Suffix invariance composes bottom-up. Each layer's proof uses the layer below:
- Bit-level:
readBitsLSB_append— induction on bit count, case-split on list - Decoder-level:
decodeStored_append,decodeLitLen_append,decodeSymbols_append— usereadBitsLSB_append+ Huffman decode invariance - Table-level:
decodeDynamicTables_append— chains multiple decoder-level appends - Block-level:
decode_go_suffix— case-splits on block type, applies per-type append
Proof Pattern
theorem f_append (bits suffix : List Bool) ... :
f (bits ++ suffix) = some (result, rest ++ suffix) := by
-- Induction on fuel or structure
induction n generalizing bits result rest with
| zero => simp_all [f]
| succ k ih =>
-- Unfold one layer
unfold f at h ⊢
-- Case-split on sub-operations
cases hx : subOp bits with
| none => simp [hx] at h
| some p =>
obtain ⟨val, rem⟩ := p
-- Apply lower-level _append to transform goal
rw [subOp_append bits suffix val rem hx]
-- Reduce monadic bind
dsimp only [bind, Except.bind] -- or Option.bind
simp only [hx] at h
-- Recurse or close
exact ih rem suffix ...
Key Tactics
dsimp only [bind, Except.bind]after rewriting with_append— reducesmatch .ok x with ...without looping (seelean-monad-proofsskill)by_cases hcond : conditionforifbranches — applyrw [if_pos/if_neg]at bothhand⊢- Do NOT use
simp [hx] at h ⊢simultaneously in suffix proofs —hhasf bitswhile⊢hasf (bits ++ suffix), so the same simp won't match both
goR: Decode-With-Remaining Pattern
goR is a variant of the decoder's recursive go that returns both the decoded
result AND the remaining (unconsumed) bits. Use it to prove properties about what
the decoder leaves behind: padding extraction (bits remaining), framing proofs
(gzip/zlib need where content ends), byte-alignment (remaining bits < 8).
Definition Pattern
def decode.goR (bits : List Bool) (acc : List UInt8) :
Nat → Option (List UInt8 × List Bool)
| 0 => none
| fuel + 1 => do
let (bfinal, bits) ← readBitsLSB 1 bits
let (btype, bits) ← readBitsLSB 2 bits
match btype with
| 0 =>
let (bytes, bits) ← decodeStored bits
let acc := acc ++ bytes
if bfinal == 1 then return (acc, bits) else decode.goR bits acc fuel
| 1 => ... -- same structure, returns (acc, remaining_bits)
| _ => none
Connection Theorems
Two key theorems connect goR to the original go:
- First projection:
decode_goR_fst—(goR bits acc fuel).map Prod.fst = go bits acc fuel - Existence:
decode_goR_exists— ifgosucceeds,goRreturns some remaining bits
Proof Pattern for goR Theorems
theorem decode_goR_fst ... :
(goR bits acc fuel).map Prod.fst = go bits acc fuel := by
induction fuel generalizing bits acc with
| zero => simp [goR, go]
| succ n ih =>
unfold goR go
-- Case-split on each monadic operation
cases h1 : readBitsLSB 1 bits with
| none => simp [h1]
| some p =>
simp only [h1, bind, Option.bind]
-- ... continue matching structure of go ...
-- Recursive call: exact ih ...
Padding Extraction Pattern
The encoder's output must be byte-aligned: it decomposes as contentBits ++ padding
with padding.length < 8.
Structure
theorem deflateRaw_pad (data : ByteArray) (level : UInt8) :
∃ (contentBits padding : List Bool),
bytesToBits (deflateRaw data level) = contentBits ++ padding ∧
padding.length < 8 := by
unfold deflateRaw
split
· -- Level 0 (stored): byte-aligned, padding = []
exact ⟨..., [], ..., by simp⟩
· split
· -- Level 1 (fixed Huffman): extract from deflateFixed_spec
obtain ⟨bits, _, hbytes⟩ := deflateFixed_spec data
exact ⟨bits, List.replicate ((8 - bits.length % 8) % 8) false,
hbytes, by simp [List.length_replicate]; omega⟩
· ... -- similar for other levels
Key Elements
- Per-level spec theorems (e.g.,
deflateFixed_spec) provide thecontentBitsand provebytesToBits output = contentBits ++ padding - Padding formula:
List.replicate ((8 - bits.length % 8) % 8) false - Length bound:
simp [List.length_replicate]; omegacloses the< 8goal - Stored blocks: padding is
[](empty) since stored blocks are byte-aligned
Composing Per-Level Roundtrips
The top-level roundtrip composes per-level proofs: unfold the encoder, split on
level/strategy, exact to each per-level theorem (passing size bounds via by omega).
theorem inflate_deflateRaw (data : ByteArray) (level : UInt8)
(hsize : data.size < maxSize) :
inflate (deflateRaw data level) = .ok data := by
unfold deflateRaw
split
· exact inflate_deflateStoredPure data (by omega)
· split
· exact inflate_deflateFixedIter data (by omega)
· split
· exact inflate_deflateLazy data hsize
· exact inflate_deflateDynamic data (by omega)
Per-Level Roundtrip Structure
Each per-level roundtrip follows the same pyramid:
encode_spec: encoder produces specific bit structure
↓
decode_go_suffix: decoder preserves trailing bits (suffix invariance)
↓
goR connection: decoder leaves exactly the padding bits
↓
inflate_complete: bit-to-byte alignment → original data recovered
Iterative/Recursive Equivalence (Accumulator Pattern)
To prove an optimized accumulator-based iterative version equals the recursive one:
Lemma Shape
theorem iterFunc_eq_recFunc (acc : Array T) (args...) :
iterFunc args acc = acc ++ (recFunc args).toArray
Proof Pattern
induction h : measure using Nat.strongRecOn generalizing pos acc with
| _ n ih =>
unfold iterFunc recFunc
-- Normalize shared helpers (non-recursive = definitionally equal)
simp only [show @iterHelper = @recHelper from rfl]
split
· -- Recursive case: rewrite with IH, then array manipulation
rw [ih _ (by omega) _ _ rfl]
rw [List.toArray_cons, ← Array.append_assoc, Array.push_eq_append]
· -- Base case
simp
Key Tactics
show @iterHelper = @recHelper from rfl— when helper functions are definitionally equal (non-recursive), use this as asimplemma to normalizeArray.push_eq_append+Array.append_assoc— for accumulator manipulationList.toArray_cons— converts(x :: xs).toArrayto#[x] ++ xs.toArray- Strong induction on decreasing measure (e.g.,
data.size - pos)
Simp Loop: push_eq_append + toArray_cons
simp only [Array.push_eq_append, List.toArray_cons, Array.append_assoc] loops.
These two lemmas create a rewriting cycle. Use explicit rw in this order instead:
-- Single push: acc.push x ++ rest.toArray = acc ++ (x :: rest).toArray
rw [Array.push_eq_append, Array.append_assoc, ← List.toArray_cons]
-- Double push: (acc.push x).push y ++ rest.toArray = acc ++ (x :: y :: rest).toArray
rw [Array.push_eq_append, Array.push_eq_append,
Array.append_assoc, Array.append_assoc,
← List.toArray_cons, ← List.toArray_cons]
Order matters: normalize pushes to appends first, reassociate, then fold back.
Composition
Build bottom-up: helper equivalences first, then compose in the main theorem:
theorem lz77GreedyIter_eq_lz77Greedy (data : ByteArray) (ws : Nat) :
lz77GreedyIter data ws = lz77Greedy data ws := by
unfold lz77GreedyIter lz77Greedy
split
· rw [trailing_eq]; simp
· rw [mainLoop_eq]; simp