name: Deobfuscation description: >- Systematic binary deobfuscation — string decryption, control flow flattening (CFF) removal, opaque predicate elimination, mixed boolean-arithmetic (MBA) simplification, bogus control flow, instruction substitution reversal, dead code removal, and anti-disassembly fixes. Trigger: deobfuscate, unobfuscate, deobfuscation, CFF, flatten, opaque predicate, MBA, obfuscated, OLLVM, Tigress, VMProtect, string decryption, junk code, bogus control flow, instruction substitution, anti-disassembly tags: - deobfuscation - obfuscation - cff - mba - opaque-predicates - hexrays - microcode - ollvm - tigress - string-decryption - bogus-control-flow - instruction-substitution mode: plan
Deobfuscation Mode
Deobfuscate fully first. Analyze afterward. Never draw conclusions from obfuscated code — it misleads.
Host-specific tools, API details, and full algorithm implementations are in the auto-loaded references:
- IDA:
references/ida/tools.md(available tools),guide.md(workflow & technique rules),microcode-guide.md(reading/writing microcode),algorithm-reference.md(recognition & methodology) - Binary Ninja:
references/binja/tools.md(available tools),guide.md(workflow & technique rules),il-guide.md(reading/writing BNIL),algorithm-reference.md(recognition & methodology)
Read the host-specific tools reference first to know what primitives are available.
Order of Operations
1. String Decryption (always first — unlocks context)
Skip only if the user asked for a specific deobfuscation type.
Strings are the fastest path to understanding a binary. Encrypted strings signal intentional obfuscation.
- Check for readable strings — very few in a large binary means strings are encrypted.
- Find the decode stub: a small function called frequently, often before string use.
- Decompile it and identify the algorithm (XOR, RC4, custom).
- Use xrefs on the decode function to locate all call sites.
- At each call site, trace arguments to extract encrypted data and key.
- Reimplement the decode logic to compute plaintext.
- Annotate decrypted strings at each call site (C2 addresses, file paths, registry keys, API names).
- Rename the decode function (e.g.,
decrypt_string).
2. Structural Deobfuscation
You are the deobfuscator. Read the IL/pseudocode, understand the obfuscation, then use write primitives to undo it. Read, understand, modify, verify.
Work through these techniques in order — each one may reveal patterns hidden by the previous layer.
Control Flow Flattening (CFF)
A state machine dispatcher replaces linear control flow.
What it looks like:
- A loop with a back edge to a header block.
- The header compares a variable (state variable) against many constants.
- Each case body assigns a new constant to the state variable.
- At low optimization: state var may appear as memory stores (
[rbp-0x10].d = 0x25), not register vars.
How to remove:
- Identify the state variable — the variable compared most frequently against constants. No entropy filtering — state values may not look random.
- Assume multiple dispatcher blocks — any block comparing the state variable is a dispatcher.
- Map state to handler: the non-dispatcher jump target for each state value.
- Rewire each handler to jump directly to its successor handler (bypass dispatcher).
- NOP all state variable assignments (dispatcher bookkeeping).
- Remove dead dispatcher blocks (the decompiler handles this automatically after rewiring).
- Redecompile — the function should read as linear code.
Opaque Predicates
Conditions that always evaluate the same way.
Forms:
- Algebraic:
x * (x-1) % 2 == 0(always true),x^2 + xis always even. - Constant-returning functions hidden behind indirection.
- Environment checks that are constant at analysis time.
Check the decompiler output first — it may already resolve some to if (true)/if (false).
How to remove:
- Always-true: force the branch to the taken target (convert to unconditional goto).
- Always-false: NOP the branch instruction entirely.
- The dead branch becomes unreachable — the decompiler prunes it automatically.
- For non-trivially-constant predicates, use Z3 via scripting to prove the predicate, then install an optimizer/transform to force it.
Bogus Control Flow (BCF)
Conditional jumps with opaque predicates branching to junk/cloned blocks. Common in OLLVM.
What it looks like:
- A conditional branch where one successor contains junk code (no real data flow contribution) or a clone of the original block with added noise.
- The opaque predicate ensures the junk path is never taken.
How to remove:
- Resolve the opaque predicate (same technique as above) — force the branch to the real successor.
- Dead block elimination handles the junk blocks automatically.
Mixed Boolean-Arithmetic (MBA)
Simple operations disguised as complex boolean-arithmetic expressions.
Common identities:
(x ^ y) + 2*(x & y)=x + y(x | y) - (x & ~y)=y~(~x & ~y)=x | y(x & 0xFF) | (x & ~0xFF)=x(x | y) + (x & y)=x + y(alternative form)
How to remove:
- When both operands are constants: compute the result, replace with a mov/constant.
- When operands are symbolic: pattern-match known identities and replace with the simplified form.
- Work bottom-up (simplify inner expressions first).
- Apply
(1 << (size * 8)) - 1bitmask when constant-folding to handle unsigned overflow. - Multiple passes may be needed — MBA expressions can be nested.
Instruction Substitution
Simple operations replaced with equivalent complex sequences. Common in OLLVM.
Common patterns (match and reverse):
a + breplaced by:a - (-b),(a ^ b) + 2*(a & b),(a | b) + (a & b)a - breplaced by:a + (-b),(a ^ b) - 2*(~a & b)a ^ breplaced by:(a | b) - (a & b),(~a & b) | (a & ~b)a & breplaced by:(a | b) - (a ^ b),~(~a | ~b)a | breplaced by:(a & b) + (a ^ b),~(~a & ~b)
How to remove: Pattern-match the complex sequence and replace with the original simple operation. Multiple passes may be needed (substitutions can be chained).
Junk Code / Dead Stores
Instructions computing values never used.
- Identify via def-use chains: if a variable is defined but has no uses before its next definition, the definition is dead.
- A
movto a variable that is immediately overwritten is a dead store — NOP it. - After CFF removal, many state variable assignments become dead — clean them up.
Anti-Disassembly
Junk bytes inserted to confuse linear disassembly.
Common patterns:
- Junk bytes after unconditional jumps:
jmp +2; db 0xE8(fake call prefix). - Overlapping instructions exploiting x86 variable-length encoding.
How to fix: NOP the junk bytes at the byte level, or redefine function boundaries. Check disassembly first — the disassembler may already handle some patterns.
3. VM Boundary (if detected)
Not reversible with IL-level tools.
- Identify: VM entry point, bytecode buffer, handler table address, handler count.
- Document all addresses and the dispatch mechanism.
- Focus deobfuscation on non-virtualized code paths.
- Recommend specialized VM lifting tooling (e.g., Miasm, Triton).
4. Post-Deobfuscation
- Redecompile all modified functions.
- Code should now be readable. If not, check for more obfuscation layers.
- Apply all annotations (renamed functions, decrypted string comments, recovered types).
- Proceed with normal analysis.
Critical Rules
- Read the IL before acting. One read call beats 20 guesses.
- String decryption ALWAYS comes first (unless user specifies otherwise).
- After each modification, redecompile to verify improvement.
- Iterative approach: complex obfuscation needs multiple passes — install optimizer/transform, redecompile, check, refine.
- If IL-level approach fails, fall back to byte-level patching.
- Never draw conclusions from obfuscated code.
- Check the host-specific reference files for tool-specific rules (maturity guards, operand comparison, dirty marking).