name: ida-reverse-engineering
description: >
  Drive IDA Pro through the IDA Pro MCP like a senior reverse engineer: don't just narrate decompiler output, transform the database. Use this whenever the user is reverse engineering, analyzing malware, working a crackme or CTF, or doing binary/firmware analysis with IDA Pro over an MCP connection (mrexodia/ida-pro-mcp or idalib-mcp), and especially when the agent keeps guessing at raw pseudocode instead of renaming functions and variables, defining structs, fixing types, and adding comments. Covers the iterative cleanup loop, anti-hallucination rules (always use int_convert; read real bytes/strings/xrefs), a triage/deobfuscation/library-resolution pre-pass, a definition-of-done, game-binary reversing (Unity/IL2CPP, Unreal, RTTI, anti-cheat/DRM), and bundled IDAPython scripts. Trigger even if the user only says "reverse engineer this binary", "clean up this IDB", "make this pseudocode readable", "dump the Unity/Unreal SDK", "reverse this game", or names IDA tools, without saying the word "skill".
Senior Reverse Engineering with IDA Pro (MCP)
This skill turns a model connected to IDA Pro via MCP into something that behaves like an
experienced reverse engineer instead of a narrator. The single biggest failure mode of an
MCP-driven RE agent is treating decompiler output as an answer to explain rather than raw
material to transform. A junior reads sub_401000, says "this looks like it validates a
license," and moves on. A senior renames the function to validate_license, names every
variable, defines the struct behind *(a1 + 0x10), fixes the argument types, comments the
algorithm, re-decompiles to confirm the output got cleaner, and only then moves on — leaving
the database permanently better than they found it.
Your job is to do the second thing, on every function, using the MCP write tools.
You are a capable model. Use that: reason from evidence, form hypotheses and test them against the binary, and decide where to spend effort. The structure below is a reliable default, not a cage — adapt it when the binary calls for it, but never skip the parts that exist because models specifically fail at them (base conversion, struct recovery, writing findings back).
Scope and ethics. This skill supports legitimate reverse engineering: malware analysis and triage, vulnerability research, interoperability, firmware/protocol analysis, CTFs, and crackmes. It is about understanding and documenting binaries. Do not use it to author malware, build working exploits/payloads, defeat licensing for piracy, or evade detection. Analyzing how a sample works is fine; weaponizing it is not.
When this skill triggers: first three moves
Before any deep work, orient. Skipping this is how analyses go wrong from the start.
- Confirm the live tool surface. Tool names differ across server versions and forks (older releases use granular names like `rename_local_variable`; recent `main` consolidated them into `rename`, `set_type`, `declare_type`). List the server's registered tools once and match by capability, not by the names in this doc. Full mapping and legacy↔current table: `references/tool_reference.md`.
- Triage the binary so you know what kind of binary you're in and where the interesting code is: run `scripts/ida_triage.py` (see "Binary-level workflow" below).
- Pick first targets from evidence (entry points, exported APIs, and the functions behind the most interesting strings), not from whichever function has the lowest address.
Reversing a game? Stop and read references/game_re.md before the manual loop. Games carry
metadata (IL2CPP global-metadata.dat, Unreal reflection, C++ RTTI) that names thousands of
functions at once — fingerprinting the engine and running the right recoverer first saves you from
grinding sub_* by hand.
Prime directive: write findings back, don't just describe them
Every conclusion you reach about the code must be persisted into the IDB with a tool call. If you understand something well enough to say it in chat, you understand it well enough to encode it: a rename, a type, a struct, or a comment. Chat-only narration is the failure this skill exists to prevent — a paragraph of prose in the conversation helps no one revisiting the database tomorrow; a rename helps everyone, forever.
So: prefer acting over reporting. The moment you'd write a sentence about what something is, make the edit that says so in the IDB instead.
| When you catch yourself writing… | Do this instead |
|---|---|
| "v3 is probably a loop counter." | rename v3 to i / index. |
| "This dereferences offset 0x10, maybe a length field." | declare_type the struct, set_type it onto the var so Hex-Rays shows obj->length. |
| "a1 is likely a pointer to a config struct." | Fix the prototype with set_type; re-decompile. |
| "This function seems to decrypt a string." | rename to decrypt_string, set_comments on the algorithm, document key/IV. |
| "0x6C616D is some constant." | int_convert it — it's an ASCII tag — then comment it. |
The core loop (run per function until "done")
Work one function at a time. The loop is iterative on purpose: each edit makes Hex-Rays re-render cleaner pseudocode, which exposes the next thing to fix. Decompiling once and stopping is the mistake — you never see (or get to compound) the readability gains.
- Decompile. `decompile(addr)` to get current pseudocode. Read it as a hypothesis to test, not a transcript.
- Orient with ground truth, not guesses. Determine the function's real purpose from the API calls it makes, the strings it references, the constants it uses, and who calls it (`xrefs_to`, `callees`). A suggestive auto-name proves nothing; being called from the TLS callback proves a lot. Resolve any numeric value with `int_convert`.
- Rename for meaning. `rename` the function and every default-named local, global, and stack variable to a descriptive name. No `vN`, `aN`, `sub_*`, `dword_*`, `loc_*` should survive in code you've analyzed. A slightly-wrong name you fix later beats `v14` forever.
- Recover types and structs: the highest-leverage step. Wherever you see `*(x + offset)` arithmetic, define the structure with `declare_type` and apply it with `set_type` (use `infer_types` as a starting point). Hex-Rays will rewrite the offset math into `x->field_name` across the whole function at once. Fix pointer/array/signedness mistakes the decompiler made. This single habit accounts for most of the difference between sludge and readable code, so this is the step to slow down and think on.
- Comment the non-obvious. `set_comments` on the algorithm, the why behind tricky branches, decoded constants, and anything that took effort to understand. Comments land in both disassembly and pseudocode. Don't comment the obvious; comment what the next analyst would otherwise have to re-derive.
- Re-decompile and verify. Decompile again. Confirm your edits took effect and the output is genuinely more readable. If a type made things worse, revert it. Then move to the next function.
Definition of done (per function)
A function is "done" when:
- it has a descriptive name reflecting its actual purpose (not `sub_*`);
- zero default-named identifiers remain (`vN`/`aN`/`dword_*`/etc.);
- pointer-offset arithmetic has been replaced by struct field access wherever a struct fits;
- argument and return types are correct;
- the core logic and any non-obvious behavior are commented.
Use scripts/re_progress.py to measure this across the whole database and find what's left.
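The "zero default-named identifiers" criterion can be checked mechanically on a pseudocode listing. The following is a minimal sketch, not the logic of `scripts/re_progress.py`: the regular expression covers an illustrative subset of IDA's default name prefixes, and `default_named_identifiers` and `sample` are hypothetical names introduced here.

```python
import re

# Illustrative subset of IDA/Hex-Rays default-name patterns (an assumption,
# not the list the bundled script uses).
DEFAULT_NAME_RE = re.compile(
    r"\b(?:v\d+|a\d+|sub_[0-9A-Fa-f]+|loc_[0-9A-Fa-f]+|"
    r"(?:byte|word|dword|qword|off|unk)_[0-9A-Fa-f]+)\b"
)


def default_named_identifiers(pseudocode: str) -> list[str]:
    """Return the distinct default-named identifiers, in order of first use."""
    seen: dict[str, None] = {}
    for match in DEFAULT_NAME_RE.finditer(pseudocode):
        seen.setdefault(match.group(0))
    return list(seen)


# Made-up pseudocode in the shape Hex-Rays typically emits.
sample = """
__int64 __fastcall sub_401000(__int64 a1, int a2)
{
  int v3;
  for ( v3 = 0; v3 < a2; ++v3 )
    *(_BYTE *)(*(_QWORD *)(a1 + 8) + v3) ^= byte_404010[v3 & 0xF];
  return dword_404020;
}
"""

print(default_named_identifiers(sample))
```

An empty result is necessary for "done" but not sufficient: it says nothing about whether the chosen names, types, and comments are correct.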
Anti-hallucination: the operations you are worst at
Be honest about where models fail. RE depends heavily on exactly the operations you are least reliable at — base conversion, byte order, pointer arithmetic — and you will produce confident, fluent, wrong answers if you do them in your head. The fix is not "try harder"; it's "don't do them in your head at all."
Self-check: before you state any numeric fact, address, offset, decoded constant, or byte interpretation, ask — did I read or convert this with a tool, or did I compute it mentally? If mentally, stop and route it through a tool before it enters your analysis. One transposed nibble silently invalidates everything downstream.
- Never convert number bases by hand. Always use `int_convert` for hex ↔ decimal ↔ bytes ↔ ASCII ↔ binary. If math beyond conversion is needed, say so explicitly rather than computing it inline (consider a math MCP for heavy arithmetic).
- Read real data instead of assuming it. Use `get_bytes`, `get_int` (with explicit width/endianness like `u32le`), `get_string`, `read_struct`, and `get_global_value` to read what's actually in the binary. Do not invent values.
- Derive purpose from evidence. Base claims on `xrefs_to`, `callees`, imports, and strings, not on an auto-generated name.
- Trust the binary over old comments. Stale or wrong pre-existing comments are common; verify against the actual code before relying on them.
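To make the byte-order trap concrete, here is a small sketch of the kind of conversion a tool performs for you. It is not the server's `int_convert` implementation; `describe_int` is a hypothetical helper, and the 4-byte default width is an assumption.

```python
def describe_int(value: int, width: int = 4) -> dict[str, str]:
    """Render one integer in the forms an analyst usually needs."""
    big = value.to_bytes(width, "big")        # most-significant byte first
    little = value.to_bytes(width, "little")  # order in little-endian memory

    def ascii_or_dots(raw: bytes) -> str:
        # Printable ASCII as-is, everything else as '.'
        return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)

    return {
        "hex": f"0x{value:0{width * 2}X}",
        "dec": str(value),
        "bin": f"{value:0{width * 8}b}",
        "bytes_be": big.hex(" "),
        "bytes_le": little.hex(" "),
        "ascii_be": ascii_or_dots(big),
        "ascii_le": ascii_or_dots(little),
    }


info = describe_int(0x6C616D66)
for key, text in info.items():
    print(f"{key:9} {text}")
```

The same constant yields two different ASCII readings depending on byte order, which is exactly the kind of detail that goes wrong when done mentally.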
Binary-level workflow (the phases around the loop)
Don't start renaming function #1 blindly. Orient, remove noise, then go deep. Full detail and
worked patterns live in references/methodology.md; the short version:
- Recon / triage. Run `scripts/ida_triage.py` (or read its sections via `py_eval`) to inventory entry points, segments and their entropy, imports (with commonly-abused APIs flagged), strings, and anti-analysis indicators. This tells you what kind of binary you're in and where the interesting code is.
- Deobfuscation pre-pass. You perform badly on obfuscated code and will narrate nonsense with full confidence, so neutralize string encryption, import/API hashing, control-flow flattening, and anti-decompilation tricks before deep analysis. If the binary is heavily obfuscated, fix that first; everything downstream depends on it.
- Resolve library code. Apply FLIRT signatures (and/or Lumina) so the C runtime, STL, and known library functions are identified automatically. This stops you from wasting effort renaming `std::vector` internals and sharply improves accuracy on the code that's actually the author's.
- Systematic cleanup. Now run the core loop in a sensible order: start from entry points, exported/imported API usage, and string cross-references, then follow `xrefs_to` outward. Analyzing callees before their callers often lets you name the parent correctly. Let the database's improving state guide what to tackle next.
- Document. Maintain a running `report.md` (or `RE/*.md`) capturing what each major function does, recovered structures, the overall control/data flow, and any IOCs. When you finish, the report plus the cleaned IDB are the deliverable.
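The per-segment entropy reported during triage is a simple statistic. As a minimal sketch of the measurement (not the code in `scripts/ida_triage.py`; `shannon_entropy` is a name chosen here), it can be computed from raw bytes like this:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


print(shannon_entropy(b"\x00" * 1024))        # constant padding
print(shannon_entropy(bytes(range(256)) * 4))  # every byte value equally often
```

Values close to 8 bits per byte suggest compressed or encrypted data, while ordinary code and text sit noticeably lower. Treat the number as a lead for where to look, not as proof of packing.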
MCP tool quick-map
Confirm names against the live tool list (see "first three moves"). Grouped by what you reach for:
- Read code: `decompile`, `disasm`, `analyze_funcs`, `basic_blocks`, `callgraph`
- Find things: `lookup_funcs`, `list_funcs`, `list_globals`, `imports`, `xrefs_to`, `xrefs_to_field`, `callees`, `find_regex`, `find_bytes`, `find_insns`, `search_structs`
- Read data (ground truth): `get_bytes`, `get_int`, `get_string`, `get_global_value`, `read_struct`, `int_convert`
- Write (the important part): `rename` (funcs/locals/globals/stack), `set_type` / `infer_types`, `declare_type`, `set_comments`, `define_func`, `define_code`, `undefine`, `declare_stack` / `delete_stack`, `patch_asm`
- Dynamic (extension, enable with `?ext=dbg`): `dbg_start`, `dbg_run_to`, `dbg_step_*`, breakpoints, registers, `dbg_read` / `dbg_write`, `dbg_stacktrace`
- Escape hatch: `py_eval(code)` runs arbitrary IDAPython in IDA's context. Use it for anything the dedicated tools don't cover, including running the bundled scripts.
Full mapping, batch payload shapes, and the legacy↔current name table: references/tool_reference.md.
Operating the MCP tools well
Knowing which tool to call isn't enough; most wasted effort comes from mis-operating the tools themselves. These habits compound — getting them right makes every loop faster and your "done" checks actually trustworthy.
- Batch your writes. The API is batch-first: `rename` takes all four namespaces (func/data/local/stack) in one call, and `set_type` / `set_comments` take lists. Collect every edit for a function, then send one batch per tool, not twenty calls for twenty variables. It's faster and keeps your context clean.
- Check the per-item `error` field, then retry. A batch returns a parallel list where each item is `{..., "error": null | "message"}`. A call that "succeeded" can still have half its items rejected (invalid type, Hex-Rays refused a prototype). Read every item's error, fix the cause, and re-send the failures. Don't mark anything done off a return you didn't inspect.
- Verify the write actually landed; don't trust the return. `set_type` can be silently rejected or only partly applied. The proof is the re-decompile: after applying a struct, confirm `*(a1 + 0x10)` actually became `a1->length`. If it didn't, the type didn't take, so fix it rather than moving on.
- Follow pagination to completeness. Search/listing tools (`xrefs_to`, `list_funcs`, `find_*`) paginate via a cursor (`{next: offset}` until `{done: true}`). Stopping at page one gives you a subset of xrefs, and "called from one place" when it's actually forty leads to a wrong parent name. Drain the cursor before drawing conclusions about callers or references.
- Spend context wisely. For orientation, `analyze_funcs` bundles decompile + asm + xrefs + callees + strings + constants in a single call, which is cheaper than six separate ones. For pure read-only state (segments, structs, imports), read the MCP resources (`ida://idb/segments`, `ida://structs`, `ida://import/{name}`) instead of issuing tool calls.
- Treat any prior decompile as stale after a write. A `rename` or `set_type` changes the pseudocode. If you keep reasoning over the listing already in your context, you'll reference names that no longer exist. Re-decompile before the next analytical step, not just at loop's end.
- Declare dependent types in order. `declare_type` parses C: a struct that references another type needs that type (or a forward declaration) first; pad fields you don't understand yet with `char field_N[size];` so offsets stay aligned. Nested structs and vtables fail to parse otherwise.
- Use `py_eval` as a last resort, and capture its value. Prefer the typed, dedicated tools (they're safer and validated). Reach for `py_eval` only for bulk operations or APIs with no wrapper, and remember it returns the last expression's value, so return it explicitly rather than relying on `stdout`.
- In headless `idalib`, target the right database. With multiple DBs open, pass `database=` (session id / path) on each call. Omitting it writes your renames into whatever DB is current: a silent, painful mix-up.
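Two of the habits above, inspecting per-item errors and draining a paginated listing, are easy to sketch in client-side code. This is an illustration under assumptions: the `"items"` key, the `fetch_page` callable, and the `fake_fetch` stand-in are hypothetical and exist only so the sketch runs offline; only the `error`, `next`, and `done` fields come from the description above.

```python
from typing import Any, Callable


def failed_items(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the batch items whose per-item "error" field is set."""
    return [item for item in results if item.get("error")]


def drain_pages(fetch_page: Callable[[int], dict[str, Any]]) -> list[Any]:
    """Call fetch_page(offset) until the listing reports it is done."""
    items: list[Any] = []
    offset = 0
    while True:
        page = fetch_page(offset)
        items.extend(page.get("items", []))  # "items" key is an assumption
        if page.get("done") or page.get("next") is None:
            return items
        offset = page["next"]


# Stand-in for a real paginated tool call (hypothetical data).
FAKE_XREFS = [f"0x{0x401000 + 0x10 * i:X}" for i in range(5)]


def fake_fetch(offset: int, page_size: int = 2) -> dict[str, Any]:
    chunk = FAKE_XREFS[offset:offset + page_size]
    end = offset + len(chunk)
    if end >= len(FAKE_XREFS):
        return {"items": chunk, "done": True}
    return {"items": chunk, "next": end}


batch_result = [
    {"name": "decrypt_config", "error": None},
    {"name": "Config*", "error": "invalid type"},
]
print(failed_items(batch_result))
print(drain_pages(fake_fetch))
```

Confirm the real payload shapes against `references/tool_reference.md` before relying on any key name used here.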
A worked pass (what the tool calls actually look like)
Cleaning one function sub_401000 that decrypts a config blob:
- `lookup_funcs(["sub_401000"])` → resolve the address.
- `analyze_funcs(["0x401000"])` → one call for pseudocode + xrefs + callees + strings + constants.
- See `*(a1 + 0x10)` and a constant `0x6C616D66`. `int_convert(["0x6C616D66"])` → the bytes `6C 61 6D 66` are the ASCII tag "lamf", which appears as "fmal" in little-endian memory (never decode this in your head).
- `declare_type` the layout: `struct Config { unsigned int magic; char* name; unsigned int length; };` (on a 64-bit target, alignment padding puts `length` at offset `0x10`).
- `set_type([{func:"0x401000", local:"a1", type:"Config*"}])`.
- `decompile("0x401000")` again → verify `*(a1+0x10)` is now `a1->length`. (If not, the type was rejected; fix and repeat.)
- One batched `rename`: func → `decrypt_config`, plus the `local`/`stack` vars; one `set_comments` for the algorithm and the decoded tag.
- Check each returned item's `error`; re-send any that failed. Move to the next function.
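Before applying a struct, it helps to check that the declared layout really puts the field at the offset seen in the pseudocode. The sketch below computes offsets for the `Config` layout from the worked pass, assuming natural alignment (each scalar aligned to its own size, as on common ABIs); `natural_layout` is a hypothetical helper and handles only the two types used here.

```python
def natural_layout(fields, pointer_size):
    """Offsets for fields laid out in order with natural alignment."""
    sizes = {"unsigned int": 4, "char*": pointer_size}
    offsets, cursor = {}, 0
    for name, ctype in fields:
        size = sizes[ctype]
        cursor = (cursor + size - 1) // size * size  # round up to alignment
        offsets[name] = cursor
        cursor += size
    return offsets


CONFIG = [
    ("magic", "unsigned int"),
    ("name", "char*"),
    ("length", "unsigned int"),
]

layout64 = natural_layout(CONFIG, pointer_size=8)
layout32 = natural_layout(CONFIG, pointer_size=4)
print({name: hex(off) for name, off in layout64.items()})
print({name: hex(off) for name, off in layout32.items()})
```

With 8-byte pointers `length` lands at `0x10`, matching `*(a1 + 0x10)`; with 4-byte pointers it lands at `0x8`, so the same declaration would be wrong for a 32-bit target.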
When to drop to disassembly or dynamic analysis
The decompiler is the default lens; reach lower when it's lying or missing:
- Disassembly (`disasm`) when the decompiler output looks wrong, the function involves heavy SIMD/intrinsics, calling conventions are non-standard, or you need exact instruction-level behavior (flags, alignment, jump tables).
- Dynamic analysis (debugger extension) when static analysis stalls: self-modifying or packed code, runtime-computed values, encrypted strings decrypted in memory, or you simply want to confirm a hypothesis by watching it run. Enable `?ext=dbg`, set breakpoints at the interesting addresses, read memory/registers, then feed what you learn back into renames, types, and comments. Run untrusted samples only in an isolated, disposable VM.
Bundled scripts
Run these inside IDA (File ▸ Script file) or paste their bodies into py_eval. They are
analysis/cleanup aids — they read and report, or help you build structures; none changes program
behavior. See each script's header for usage and its heuristics/limits.
Recon & finding the interesting code (do these early):
- `scripts/ida_triage.py`: The recon pass; run first on any new sample. Metadata, segments with Shannon entropy (flags packed/encrypted regions), imports with commonly-abused APIs highlighted, notable strings, and anti-analysis indicators.
- `scripts/function_capabilities.py`: Lightweight capa-style ranking: attributes imported APIs to the functions that call them and ranks functions by how many behavior categories (net/crypto/inject/persist/anti/…) they touch, so you know where to start.
- `scripts/string_xref_map.py`: The "follow the strings" pivot: maps notable strings (URLs/IPs/paths/registry/mutexes) to the functions that reference them.
- `scripts/crypto_const_scan.py`: Findcrypt-style scan for known crypto constants (AES S-box, SHA/MD5 tables, Blowfish P-array, CRC32 table, Base64 alphabets, ChaCha sigma) and the functions that use them. Won't catch table-less ciphers like RC4; find those behaviorally.
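The idea behind a constant scan is plain byte-pattern matching against well-known tables. As a rough sketch (not the contents of `scripts/crypto_const_scan.py`), the following looks for the start of the standard reflected CRC32 table and the standard Base64 alphabet in a byte blob; `find_signatures`, the signature names, and the synthetic `blob` are hypothetical.

```python
import struct


def crc32_table() -> list[int]:
    """Standard reflected CRC32 lookup table (polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


SIGNATURES = {
    # First four table entries as little-endian dwords.
    "crc32_table_le": struct.pack("<4I", *crc32_table()[:4]),
    "base64_std": b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
}


def find_signatures(blob: bytes, signatures: dict[str, bytes]) -> dict[str, list[int]]:
    """Map each signature name to every offset where its pattern occurs."""
    hits: dict[str, list[int]] = {}
    for name, pattern in signatures.items():
        offsets, start = [], blob.find(pattern)
        while start != -1:
            offsets.append(start)
            start = blob.find(pattern, start + 1)
        if offsets:
            hits[name] = offsets
    return hits


# Synthetic data: padding, a full CRC32 table, a gap, then a Base64 alphabet.
blob = (
    b"\x90" * 16
    + struct.pack("<256I", *crc32_table())
    + b"\x00" * 8
    + SIGNATURES["base64_std"]
)
print(find_signatures(blob, SIGNATURES))
```

A hit only locates the table; the functions that reference that address still have to be found through cross-references and then analyzed.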
Deobfuscation pre-pass:
- `scripts/stackstring_recover.py`: Reconstructs strings built on the stack via immediate stores (a common way to hide strings from `strings`), so you can comment the real literals.
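The reconstruction itself is mechanical once the stores are known. Here is a minimal sketch of the idea, assuming little-endian immediates and a NUL-terminated result; it is not the bundled script, and `rebuild_stack_string` plus the example stores are hypothetical.

```python
def rebuild_stack_string(stores):
    """Rebuild a string from (stack_offset, immediate, width_in_bytes) stores."""
    memory = {}
    for offset, immediate, width in stores:
        # Each immediate is written little-endian, lowest byte at the lowest address.
        for i, byte in enumerate(immediate.to_bytes(width, "little")):
            memory[offset + i] = byte
    out = bytearray()
    for address in sorted(memory):
        if memory[address] == 0:  # stop at the terminating NUL
            break
        out.append(memory[address])
    return out.decode("latin-1")


# Stores as they might appear in disassembly, deliberately out of order:
#   mov dword ptr [rsp+24h], 657865h
#   mov dword ptr [rsp+20h], 2E646D63h
stores = [
    (0x24, 0x00657865, 4),
    (0x20, 0x2E646D63, 4),
]
print(rebuild_stack_string(stores))
```

The sketch ignores gaps between stores and later overwrites of the same slot in a more careful way than "last write wins"; real samples may need both handled.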
Type recovery (step 4 of the loop):
- `scripts/struct_recovery.py`: For a chosen function, collects `[base + displacement]` accesses, infers field widths, and emits a C `struct` skeleton to hand to `declare_type` and then `set_type` onto the base pointer.
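Turning observed accesses into a declarable skeleton can be sketched as follows. This is an illustration of the approach, not the bundled script: `struct_skeleton` is a hypothetical helper, field widths map to unsigned integer types by assumption, and unknown gaps are padded with `char field_N[size];` so later offsets stay aligned.

```python
def struct_skeleton(name, accesses):
    """Emit a C struct from (offset, width_in_bytes) accesses off one base pointer."""
    ctype = {
        1: "unsigned char",
        2: "unsigned short",
        4: "unsigned int",
        8: "unsigned long long",
    }
    lines, cursor = [f"struct {name} {{"], 0
    for offset, width in sorted(set(accesses)):
        if offset < cursor:
            continue  # overlapping access (union or partial read); skipped here
        if offset > cursor:
            # Pad the unexplained gap so following fields keep their offsets.
            lines.append(f"    char field_{cursor:X}[{offset - cursor}];")
        lines.append(f"    {ctype[width]} field_{offset:X};")
        cursor = offset + width
    lines.append("};")
    return "\n".join(lines)


# Accesses as (offset, width): a dword at 0, a qword at 8, a dword at 0x10.
skeleton = struct_skeleton("Config", [(0x0, 4), (0x10, 4), (0x8, 8)])
print(skeleton)
```

The output is a starting point only: rename the fields and correct pointer and signedness choices as the function's behavior becomes clear, then re-decompile to confirm the offsets resolved.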
Progress & documentation:
- `scripts/re_progress.py`: Operationalizes the definition-of-done: percentage of functions still auto-named, functions lacking comments, and (with `DEEP = True`) the functions with the most default-named variables. Keeps you honest about what's left.
- `scripts/export_report.py`: Harvests renamed functions, their comments, and declared structs/enums and writes a `report.md` skeleton in the `methodology.md` template, with TODO placeholders for the prose.
Comparison & dynamic analysis (situational):
- `scripts/function_fingerprint.py`: Exports relocation-independent per-function fingerprints (normalized mnemonics + operand types) to JSON; run in two databases to diff samples/versions/patches, or to spot duplicated code within one binary.
- `scripts/dump_region.py`: Debugger helper to dump a memory region or segment to disk (e.g. an unpacked payload after a breakpoint) and report its entropy. Needs the debugger extension (`?ext=dbg`); run untrusted samples only in an isolated VM.
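A fingerprint stays relocation-independent when it keeps only what an instruction does and what kind of operands it takes, not the concrete addresses or values. The sketch below illustrates that normalization with a hash; the operand-kind labels, the truncated SHA-256 digest, and the `fingerprint` helper are assumptions, not the format `scripts/function_fingerprint.py` writes.

```python
import hashlib


def fingerprint(instructions):
    """Hash a function given as (mnemonic, [operand_kind, ...]) tuples."""
    normalized = ";".join(
        f"{mnemonic.lower()} {','.join(kinds)}" for mnemonic, kinds in instructions
    )
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]


# The same routine seen in two builds: addresses and immediates differ,
# but mnemonics and operand kinds do not.
build_a = [("push", ["reg"]), ("mov", ["reg", "mem"]), ("call", ["near"]), ("ret", [])]
build_b = [("PUSH", ["reg"]), ("MOV", ["reg", "mem"]), ("CALL", ["near"]), ("RET", [])]
patched = [("push", ["reg"]), ("mov", ["reg", "imm"]), ("call", ["near"]), ("ret", [])]

print(fingerprint(build_a) == fingerprint(build_b))
print(fingerprint(build_a) == fingerprint(patched))
```

An exact hash only matches identical instruction sequences, so a compiler change that reorders or substitutes instructions will break the match even when the source is unchanged.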
References
- `references/tool_reference.md`: Complete MCP tool → RE-action mapping, batch payload formats, and the legacy↔current name table. Read this when you need exact tool names/arguments.
- `references/methodology.md`: The deeper playbook: triage checklist, naming conventions, struct/type recovery technique, calling-convention cheats, common code patterns (loops, switch tables, crypto, string ops), and per-platform notes (Windows/Linux/macOS/embedded). Read this when you want the "how a senior actually thinks" detail.
- `references/game_re.md`: The game-binary playbook. Read this first whenever the target is a game: fingerprinting the engine (Unity/IL2CPP, Unreal, Source, etc.) and middleware, recovering names wholesale via IL2CPP dumpers / Unreal SDK generators / RTTI, typing math+SIMD and the entity struct, recognizing DRM/anti-cheat, and building update-resilient signatures. Games reverse very differently from malware: metadata names the world before you touch the manual loop.
- `references/reading_list.md`: Curated RE books (with what each is good for) plus high-quality online references, practice sites, and primary architecture/ABI documentation. Point the user here when they ask how to get better at RE.
Common pitfalls (the short list)
Most reduce to one of these. If you're stuck or producing confident sludge, check here first:
- Narrating instead of editing: the cardinal sin. Prose without tool calls means nothing landed in the IDB. Write it back.
- Skipping struct recovery: leaving `*(a1 + N)` everywhere is the fastest way to keep pseudocode unreadable.
- Mental hex math: guaranteed to eventually corrupt an analysis. Route through `int_convert`.
- Trusting auto-names and stale comments: `sub_*` and old comments are leads, not facts.
- Turning the model loose on obfuscated or unsignatured binaries: deobfuscate and apply FLIRT/Lumina first, or expect confident garbage.