Internals of Poylmorphic Engines | Version | v2 | |
---|---|---|---|
Updated | |||
Author | Joshua Finley | License | DBE |
Table of Contents | |
---|---|
1 | Introduction |
2 | Defining Polymorphic Engines |
3 | Encryption Primitives |
4 | Polymorphism Techniques |
5 | Conclusion |
This file documents the workings of polymorphic engines in malware, primarily focusing on historical examples implemented in x86 assembly. While modern malware favors detection reduction techniques in memory, viruses from the 1990s and early 2000s had only one main oponent – static signatures. The result of this evolutionary pressure was the development of polymorphic and metamorphic techniques, which are used to mutate the virus’s byte sequence, breaking any static signatures. Despite the proliferation of in-memory detection techniques, polymorphic malware retains its edge as a useful evasion even now.
1. Introduction
To understand the concept of poylmorphic code (not to be confused with the programming concept of polymorhpism), we must understand the demands and constraints of computer viruses.
The goal of viral code is simple - propagate. This code mimics an organism in that it uses resources available in the environment to reproduce. An extremely simple virus might just outright replace other programs on the host machine with itself, causing destructive damage to infected machines. More commonly, viral code would exploit methods to run the original program it infected like normal, before executing the viral payload.
Antivirus tools cropped up to contend with this threat. Given the technical and technological constraints of the time, these tools would simply scan files for byte patterns of known malware, and if a match was found, delete the infected file.
Virus developers worked around this threat by first developing code to encrypt their payloads at rest and decrypt them at runtime. They would swap out these routines with new ones as variants became detected. In this way, the precious viral infection code did not need to be replaced as signatures were released for antivirus tools. Once decryptors for these viruses started being detected, the author could just swap the decryptor with a new one an release the virus again. Some viruses did this automatically, choosing between one of several available, pre-written decryption routines. These viruses were dubbed oligomorphic [1].
Shortly after, or perhaps concurrently with the first oligomorphic viruses, virus writers were already improving their trade craft. Since the decryption stubs would quickly become detected, the next solution would be to generate new stubs automatically. This could be done for every release of the virus (creation time polymorhpishm) or on each infection and execution (execution time polymorphism). Creative programming techniques could be used to generate a new variant of the encryption and execution code rather than require manual development. Authors of these viruses and virus development toolchains would leverage understanding of the target machine instruction set to permute the machine code of the virus decryption and loading routines.
A few took this to the extreme, permutating the entire virus body payload on each infection, requiring no encryption (called metamorphic viruses).
This article looks at runtime polymorphism. In the simplest terms, a polymorphic virus of this sort is one that will create a new descryption routine for each infection. The control flow will resemble the following:
- Infected host is executed
- Hijacked control flow runs decryption routine implanted by main virus code
- Decryption routine decrypts main virus payload and passes it control
- Main virus payload searches for new victims
- Main virus payload generates new decryption routine
- Main virus payload writes encrypted copy of itself and decryption routine to new victims
- Main virus routine overwrites control flow in victims to launch viral code.
2. Defining Polymorphic Engines
Reviewing different works on polymorphic viruses will reveal a myriad of strategies virus authors would use to fool defense products. These authors were truly creative in their exploration of ways to write their viruses and make them survive in hardened environments.
According to one 1998 work from The Black Baron, a polymorphic engine can be distilled into three components [2]:
- Random number generator
- Junk code generator
- Decryptor generator
Focusing on the last of the three first – the decryptor generator – According to [2] the algorithm is as follows:
- Select a random set of registers
- Choose a compressed pre-written decryptor
- Enter a loop where junk code is added to the real decryptor (which has been argued as unnecessary at worst and unhelpful at best, unless done intelligently [3], [4])
Compared to this sort of hacker definition, academic formalizations of polymorphic engines also exist [5]. These formalizations attempt to understand polymorphic malware in a scientific context.
Commercial definitions for polymorphic viruses in general have also been submitted over the years, like this Example from Symantec:
"
In retaliation, virus authors developed the polymorphic virus. Like an encrypted virus, a polymorphic virus includes a scrambled virus body and a decryption routine that first gains control of the computer, then decrypts the virus body.
However, a polymorphic virus adds to these two components a third — a mutation engine that generates randomized decryption routines that change each time a virus infects a new program. In a polymorphic virus, the mutation engine and virus body are both encrypted. When a user runs a program infected with a polymorphic virus, the decryption routine first gains control of the computer, then decrypts both the virus body and the mutation engine. Next, the decryption routine transfers control of the computer to the virus, which locates a new program to infect
"
From these sources can arrive at a very general description of a polymorphic virus:
An infectious program which spreads in an encrypted state and is also able to generate new permutations of its decryption code, as a technique to evade antivirus.
A polymorphic virus of this sort would thus appear as the following components, when attached to a host:
- Host file, control flow hijacked to execute decryptor
- Decryptor
- (Encrypted) Infection code
- (Encrypted) Decryptor generation code
+-----------------------------------------------------+
| Host File |
| +-----------------------------------------------+ |
| | Control Flow Hijack --> [Decryptor] | |
| +-----------------------------------------------+ |
| | | |
| | +-----------------------------------------+ | |
| | | Decryptor | | |
| | +-----------------------------------------+ | |
| | | | | |
| | | +---------------------------------+ | | |
| | | | (Encrypted) Infection Code | | | |
| | | +---------------------------------+ | | |
| | | | | |
| | | +---------------------------------+ | | |
| | | | (Encrypted) Decryptor Generation| | | |
| | | | Code | | | |
| | | +---------------------------------+ | | |
| | +-----------------------------------------+ | |
| +-----------------------------------------------+ |
+-----------------------------------------------------+
If we accept this as a concise definition of a polymorphic virus, then the polymorphic engine (i.e., code generator, mutator, etc.) itself is the code responsible for generating unique decryptors. Then, the engine itself is a code generation mechanism, specially crafted to generate decryptors which are valid for the encrypted viral payload, and are capable of executing it.
3. Encryption Primitives
Before the poly engine comes the encryption primitives.
In [7],[8], and [9], the author ‘MidNyte’ discusses the fundamentals of encryption/enciphering in the context of the virus or poly engine. In Part I, four x86 assembly techniques are presented. In Part II, they then present four methods of ‘armouring’ the encryption. These articles seem to have everything necessary to understand the basics of ciphers as applied to polymorphic malware.
The four ciphers presented in [7] are the following: - substitution - sliding key - long key - transposition
In [8], the following armoring techniques are presented: - variable length transposition - boundary scrambling - integrity-dependent decryption - date-dependent decryption
While they’re important, I’m going to avoid discussing the latter four techniques for the time being.
4. Polymorphism Techniques
The code polymorphism mechanisms used in polymorphic engines are fourfold:
- Permutation of the bytes of individual instruction opcodes
- Permutation of the positions of instructions
- Permutation of used registers
- Insertion of garbage instructions
Different viruses may choose any combination of these techniques. For example, many polymorphic viruses opt not to include garbage instruction generation 1 .
The first three techniques are more than enough to generate semantically equivalent but signature unique ciphers. The following sections dive into how each works.
4.1. Permutating Instruction Bytes
The mechanisms of a polymorphic engine rely on understanding of instruction encoding in order to generate valid decryptor code. By leveraging different encodings for functionally equivalent operations, code which serves the same semantic purpose may have a completely different byte sequence.
4.1.1. Opcodes
For example, most instructions in x86_64 have different encodings for different addressing modes. Take the XOR instruction for example [11]:
hex bin mode operand 1 mode operand 2
0x30 110000 r/m8 r8
0x31 110001 r/m16/32/64 r16/32/64
0x32 110010 r8 r/m8
0x33 110011 r16/32/64 r/m16/32/64
0x34 110100 AL imm8
0x35 110101 rAX imm16/32
The commonality between all six instructions is that they begin with the first two bits set and the third unset. What differs between them is the last three bits, each indicating a different addressing mode. Suppose an example of a virus decryptor in the form of a XOR cipher. By leveraging different encodings, for one single instruction we have six different bytes. If antivirus is dependent on a specific byte being present in that position and we choose a different encoding, it will fail to detect the virus.
However, we need find permutations for most or all instructions in the decryptor. In [2], the author mentions building a ‘skeleton instruction table’. What is meant by this is that most instructions will have relatively close values, and thus similar bit sequences. For example, if we examine an opcode table for x86_64,{< citation id=“x86asm” />}} we can see that the instructions ADD, OR, ADC, SBB, AND, SUB, XOR, and CMP share close opcode values for each addressing mode. Thus a skeleton instruction table would simply hold the first variation, which we can OR as needed. To make things easier, these groups of different opcodes for each operation are aligned nicely (0x00-0x08, 0x1-=0x0d). Other opcodes aren’t as neatly organized (e.g. PUSH, POP). Regardless, just by using [11] we can simply pick out a subset of the instruction set which applies to most simple ciphers:
hex opcode permutation method permutations
0x00 ADD Increment 6
0x08 OR Increment 6
0x10 ADC Increment 6
0x18 SBB Increment 6
0x20 AND Increment 6
0x28 SUB Increment 6
0x30 XOR Increment 6
0x38 CMP Increment 6
0xe9 JMP Increment 2
0x0f80 JO Increment 32
These 10 base instructions can thus produce 82 permutations only based on the opcode. In implementation the code signature would be even more changed due to semantic differences in conditional jump instructions and adjustments that may need to made based on the mode for each computation operation.
4.1.2. Modifiers
If we look at the encoding of real instructions from an assembled encryption loop, we can quickly see that there is much more going on than the just addressing mode variations between each instruction:
a) 48 C7 C3 9A 02 00 00 // mov rbx, 29Ah
b) 48 29 D9 // sub rcx, rbx
Why do both instructions begin with 0x48
? What does the third byte do, and how does the processor know how many bytes should be moved in a
?
If we take the following instruction as an example, we can decode the meanings of the bytes bit positions:
__REX.W
/ ____MOV r/m16/32/64
/ / ______________Mod-Reg-R/M Byte
/ / / ,__________,_________________Immediate
a) 48 C7 C3 9A 02 00 00
b) 48 29 d9
instruction
a) mov rbx, 29Ah
b) sub rcx, rbx
binary
______0x29a_____
a) / \
01001000 11000111 11000011 [10011010 00000010] 00000000 000000000
\_____/ \_____/ \/ \__________________________________/
| | | |
rex prefix MOV register quad word
b)
01001000 01010110 11001011
\__/wrxb \_____/ \/ \/
fixed___/ \_/ | | \
| sub immediate rcx rbx
64-bit operand ('W' bit set)
This means that in addition to the various instructions themselves, we have other opcodes which give information to the processor about the interpretation. Under x86_64, we see a REX byte prefixing some 64 bit mode operations. For instructions with variable operand encodings, we have the MOD RM byte, which holds data about operand interpretation. This means that every instruction opcode will have even more possible encodings for these surrounding bytes, offering fertile ground for even more permutations 2 .
4.2. Permutating the Instruction Positions
In addition to opcodes and modifiers themselves, we may also add a more advanced layer of polymorphism by introducing the permutation of the positions of individual instructions. In any given cipher, some instructions can be freely before or after others, within certain bounds. Others can be moved if we introduce new code to handle the changed position, introducing even more complexity.
In one post “Lord Julus” [12] provides a notation and description for reasoning about instruction position permutation. The following is an adaptation of the described algorithms and the permutation rules provided:
4.2.1. Cipher Algorithm Operations
decrypt proc near
1) mov length_register, length get the code length
2) mov pointer_register, startcode load pointer register
3) mov destination_register, startcode load the destination reg.
4) mov key_register, key get the key
main_loop
5) mov code_register, pointer_register take an encrypted word
6) call unscramble decrypt it (*)
7) mov destination_register, code_reg. write the decrypted word
8) add key_register, key_increment increment the key
9) dec length_register decrement length
10) inc pointer_register increment pointer (x2)
11) jnz main_loop loop until length=0
12) ret return pc
decrypt endp
4.2.2. Cipher Algorithm Order of Operations
- permutable, can be placed anywhere
- permutable, can be placed anywhere
- permutable, can be placed anywhere
- permutable, can be placed anywhere
- not permutable
- not permutable
- permutable, can be placed anywhere after [6]
- permutable, can be placed anywhere after [6]
- permutable, can be placed anywhere after [6]
- permutable, can be placed anywhere after [6]
- not permutable
- not permutable
Just as in [12], this general description of the algorithm with permutation rules can be used to “make a matrix of bytes”. By adding notation to each semantic operation, can now describe permutations of the same algorithm that are logically equivalent but differ in signature, for example:
matrix a)
permutation: [4, 1, 2, 3, 5, 6, 8, 9, 7, 10, 11, 12]
place: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
matrix b)
permutation: [3, 1, 2, 4, 5, 6, 7, 10, 8, 9, 11, 12]
place: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
4.3. Permuting Instruction Registers
Now that we know what logical operations must be included and have an idea of how what bit-level characteristics each instruction might have (fig 3.), the next fundamental characteristic of the polymorphic engine to understand is the random selection of registers.
Under x86_64, we have many registers to work with, but most commonly we will use the following 16:
- RAX - Accumulator register
- RBX - Base register
- RCX - Counter register
- RDX - Data register
- RSI - Source index register
- RDI - Destination index register
- RBP - Base pointer register
- RSP - Stack pointer register
- R8 - General-purpose register
- R9 - General-purpose register
- R10 - General-purpose register
- R11 - General-purpose register
- R12 - General-purpose register
- R13 - General-purpose register
- R14 - General-purpose register
- R15 - General-purpose register
To get a random register from this list, one need only to generate a random number in the range and use it as an index. From there, it must be understood how to encode instructions to use the designated register for whichever operation. In this sense, a polymorphic engine which uses register permutation will virtualize the registers it uses, providing a layer of abstraction between which registers are actually used and which are necessary.
5. Conclusion
In summary, a polymorphic engine in malware is a complex mechanism designed to evade detection by traditional security measures. At its core, this engine generates unique decryption routines and cipher codes each time the malware propagates, ensuring that the malware’s signature remains elusive to static signature-based scanning. By constantly permuting the decryption routine and generating new random keys for each iteration, the engine effectively obscures the true nature of the viral payload. This process not only masks the malicious code but also minimizes the likelihood that the decryptor itself will be recognized as harmful. The polymorphic engine achieves this through a dynamic reshuffling of instruction encodings and ordering, producing a virtually endless array of unique cipher variations. As a result, the malware becomes significantly harder to detect and analyze, presenting a formidable challenge for traditional antivirus solutions.
Bibliography
- Wikipedia contributors. (n.d.). Oligomorphic code. Wikipedia. ↩
- The Black Baron. (1998). A general description of the methods behind a polymorph engine. VX Underground. Archived version.
URL: https://web.archive.org/web/20210227235004/https://vx-underground.org/archive/VxHeaven/lib/vbb01.html ↩ - The Slammer. (n.d.). Polymorphic Engines. VX Underground
URL: https://vx-underground.org/archive/VxHeaven/lib/vts01.html ↩ - Pr0mix. 2011. Smart trash: building of logic. VxHeaven.
URL: https://web.archive.org/web/20130819080018/http://vxheaven.org/lib/vpo01.html ↩ - Jacob, Fillion, and Debar. (2008). Functional polymorphic engines: Formalisation, implementation and use cases. Journal of Computer Virology and Hacking Techniques. ↩
- Symantec. Understanding and Managing Polymorphic Viruses. VxHeaven archived version. ↩
- MidNyte. (n.d.). Polymorphic Encryption Algorithms Part I. VX Underground (Archived version).
URL: https://vx-underground.org/archive/VxHeaven/lib/vmn04.html ↩ - MidNyte. (n.d.). Polymorphic Encryption Algorithms Part II. VX Underground (Archived version).
URL: https://vx-underground.org/archive/VxHeaven/lib/vmn05.html ↩ - MidNyte. (n.d.). Polymorphic Encryption Algorithms Part III. VX Underground (Archived version).
URL: https://vx-underground.org/archive/VxHeaven/lib/vmn06.html ↩ - SolarWinds Breach Analysis.
URL: https://vxug.fakedoma.in/samples/Exotic/UNC2452/SolarWinds%20Breach/ ↩ - x86 Opcode and Instruction Reference.
URL: http://ref.x86asm.net/coder64.html ↩ - Lord Julus. (n.d.). Permutation Decryptor. VX Underground (Archived version).
URL: https://vx-underground.org/archive/VxHeaven/lib/vlj04.html ↩
Footnotes
Garbage code generation is still code generation. At that, the quality of the garbage matters [3]. It is even suggested that it is more worthwhile to use stronger and more complicated encryption than to add junk code at all [3].
One contesting idea regarding this however is the benefits of the sheer simplicity of XOR, ADD, and SUB ciphers. For example, in [10] one of the most advanced cyber attacks in history, a sliding XOR cipher was employed to success.
↩A note on visualizing instruction encodings with Python:
↩# One off ... data = "48 C7 C3 9A 02 00 00" ... bytes = bytearray.fromhex((data)) ... [print(bin(bytes[i]), end=" ") for i in range(len(bytes))] ... # As a function ... def bin_from_bytes(data): ... bytes = bytearray.fromhex(data) ... [print(bin(bytes[i]), end=" ") for i in range(len(bytes))]