,, MMP""MM""YMM `7MM P' MM `7 MM MM MMpMMMb. .gP"Ya MM MM MM ,M' Yb MM MM MM 8M"""""" MM MM MM YM. , .JMML. .JMML JMML.`Mbmmd' `7MMF' `7MF' `7MMF' `7MMF' `MA ,V MM MM VM: ,V `7M' `MF' MM MM .gP"Ya ,6"Yb.`7M' `MF'.gP"Ya `7MMpMMMb. MM. M' `VA ,V' MMmmmmmmMM ,M' Yb 8) MM VA ,V ,M' Yb MM MM `MM A' XMX MM MM 8M"""""" ,pm9MM VA ,V 8M"""""" MM MM :MM; ,V' VA. MM MM YM. , 8M MM VVV YM. , MM MM VF .AM. .MA..JMML. .JMML.`Mbmmd' `Moo9^Yo. W `Mbmmd'.JMML JMML. ,, ,, ,, .g8"""bgd `7MM `7MM mm db .dP' `M MM MM MM dM' ` ,pW"Wq. MM MM .gP"Ya ,p6"bo mmMMmm `7MM ,pW"Wq.`7MMpMMMb. MM 6W' `Wb MM MM ,M' Yb 6M' OO MM MM 6W' `Wb MM MM MM. 8M M8 MM MM 8M"""""" 8M MM MM 8M M8 MM MM `Mb. ,'YA. ,A9 MM MM YM. , YM. , MM MM YA. ,A9 MM MM `"bmmmd' `Ybmd9'.JMML..JMML.`Mbmmd' YMbmd' `Mbmo.JMML.`Ybmd9'.JMML JMML. -- Contact -- https://twitter.com/vxunderground email@example.com
It's absolutely amazing, people how quickly the programming envolves. But what is really amazing is that things that we do now are actually available from the very beginning. It was just us who didn't see them from the start, or it was "them" that didn't let us get a grip on the information with fear about what we could do with it... The darn thing is that if you already have the information, well... there's just no more fun to play with so I guess none of the situations is ok. But for the moment here we are at a peak in Win32 programming. The tools have developed, the styles were defined and more and more tips do appear everyday. That's why I decided that something new is supposed to come from me too. But, then again, nothing is new on this planet. Everything was already invented and we simply find easier, better, faster, simpler ways of doing the same thing. We reinvent sometimes the wheel in order to make little by little reach perfection. Or sometimes we simply don't know that something was proven to be impossible and we manage to solve it (Einstein's theory on inventions). But, still, a wheel is still a wheel. What you use it for is more important...
This time the object of my study is metamorphism. I think this is the next step after polymorphism, a step that will reach coding up at a new level: the highest peak of self mutating, the biggest step toward perfect stealth, the best highway to the assembly heaven... If there exists something like that... Personally I think there's only a programmer's hell, because I'm sure that Windows is not allowed in Heaven...
I hope you will enjoy this article. If so, please drop me a note at my e-mail address: firstname.lastname@example.org. I am always ready to hear new infos and theories.
Literaly, the term "metamorphism" is wrongly used in relation with code. By the definition given in the Webster, "metamorphism" means the following: "change in the mineralogical, structural, or textural composition of rocks under pressure, heat, chemical action, etc., which turns limestone into marble, granite into gneiss, etc."
Metamorphose, however might be a better term: "to change in form or nature; transform; subject to or undergo metamorphosis or metamorphism SYN. transform."
Anyway, as the term metamorphism was widely accepted and as long as it sounds much better, well, I will use this term to define the concept of self mutating code.
So, the metamorphism is a very interesting concept that basically means the following: it means that the code modifies itself from one carrier to another. The difference between this feature and the polymorphism is obvious. Polymorphism means creating different looking decryptors that assure the lack of signatures in the body, while the metamorphism means modifying the code itself, the whole code. Of course, as one might expect, it is almost impossible to create a fully metamorphic code. I will not enter the details here, but if you don't believe me, try it out. A big part of this demo will have as purpose to show where the metamorphism should be used in order to be really useful.
The main reason metamorphism is used is to protect the code against automatic attackers. As I said many times, it is less likely to fool a human being, especially a good programmer, but fooling the machine is much more easier. Also, making it very difficult for the human to create an automatic attacker against your code is also one goal reached by the use of metamorphism.
This is, let's say, closest to a paradigm. A paradigm means a change so important as it changes the concepts, the methods of working with the concept and the theory about the concept. When an anti-virus software is written and it succesfuly detects by the use of a concept, method and theory, it will be able to detect all future viruses that are based on the same concepts. Easy to understand. But once you modify all three aspects, the av software must be fully redesigned. Doing this might lead in lowering the eficiency in the approach regarding the old concepts also, because any new method has the ability of making old methods work worse than before. I will not continue this disertation here, but think about it...
Let's think about the general disinfector. What does it need? It needs to know a few important values and that is all that it takes to disinfect the file:
We don't care about the decryption algorithm here so we will only check out the rest of the stuff.
Basically all the above values are stored inside your code at a certain address. Things like these are most common:
The human will look in your code and when he finally locates and understands the above lines he will program his automatic code to look at the address of OldEip and get the value from there. There's no need for human interference when scanning for such a simple thing. Now the software has located the original eip of the infected program and can safely remove your hook just by restoring it. This is just a very simple way of disinfecting.
How can we prevent such a thing, or how can we at least make it harder? This is explained by some lite metamorphism methods.
Let's imagine that you replace the above codings with this:
Now, our metamorphic engine has to do the following thing: decide randomly which address to use, fill it with the right value, fill the other with random values and go at address ebp+metamorph1 and fill in the address of the needed value.
Where does it lead? Everytime the virus propagates the place where the old entrypoint is stored will be different... And also, the instruction that accesses it will differ from generation to generation. I don't know if you realise the strength of this thing. Of course it is easily beatable by locating the access instruction itself and getting the address from there. But, think of this:
Now, the two instructions both look like this when debugged:
Starting to get my point? Imagine you have 10 values you metamorph around the code, each having 10 possible places and each being accessed around 3 times, needed, and other 7 times just as junk... Do you realise how many generation should one person generate to understand what is the actual meaning of the code, and how hard would it be to locate the values needed for disinfection?
You will say, no problem... if one can locate the metamorphic engine he can decipher your code... Think so? Read furthure on the implementation of the engine. Now let's check some other ways of using metamorphism.
This is a little bit tricky and you have to learn a little about instruction lengths. It's not very hard, but you will have to create it by testing it many times under a debugger. Remember that here you are not generating a polymorphic decryptor (where you have an empty buffer and you can fill it downwards), but you are working in compiled code that has a definitive size and links all over. The idea is to modify a certain instruction so that it cannot be located easily.
First step: instruction relocation
For this you will need to save some space in different parts of your code and they should look somehow like a subroutine:
You can have, let's say, around 10 places for each part of metamorphic code. Whenever this instruction is to be called you must rearrange the call to it. Imagine for the above:
Now, if your random generator decides that the code should be metamorphized into another place (let's say place2) all that it'll need to do is move the instruction there and modify also the call to read "call place2" (check later for insights).
This is the first idea: your instruction can roam around the code. Think that you can have let's say 15 places like that and 10 or more instruction to metamorph. Your random number generator will choose a place for each one and still you will have some left to fill with junk.
Second step: actual code mutation
Here you need to take care of the instruction length. As you noticed I choosed randomly the size for a place to 20 bytes (btw: you may have different place sizes). This means that you cannot put an instruction or group of instructions there longer than 20 bytes, because otherwise they will overwrite the code that follows.
Let's return to our instruction here:
i1) mov [ebp+Oldeip], eax ret
Let me be imaginative and create other groups of instruction that do the same thing:
i2) push [ebp+Oldeip] pop eax ret i3) push edx lea edx, Oldeip add edx, ebp mov [edx], eax pop edx ret i3) push eax lea eax, [ebp+Oldeip-1] inc eax pop [eax] ret
Now, your random number generator will choose one of the above instruction and will simply fill in it's place. What does this bring? It makes it even harder for the automatic scanner (provided that it can look up all the places) to know which address are you addressing (oldeip1, 2, etc...).
For the moment let's take a break and see what all the above can generate:
┌─────────────┐ │ call placeX │ └──────┬──────┘ ┌───────┬───────┬───────┬───────┼───────┬───────┬───────┬───────┐ │ │ │ │ │ │ │ │ │ ┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐ │place1││place2││place3││place4││place5││place6││place7││place8││place9│ └───┬──┘└───┬──┘└───┬──┘└───┬──┘└───┬──┘└───┬──┘└───┬──┘└───┬──┘└───┬──┘ └───────┴───────┴───────┴───────┼───────┴───────┴───────┴───────┘ ┌────────────────┬─────────┴───────┬─────────────────┐ ┌──────┴────────┐┌──────┴────────┐┌───────┴───────┐┌────────┴──────┐ │ i1 ││ i2 ││ i3 ││ i4 │ └──────┬────────┘└──────┬────────┘└───────┬───────┘└────────┬──────┘ └────────────────┴─────────┬───────┴─────────────────┘ ┌───────────┬────────────┼───────────┬───────────┐ ┌────┴─────┐┌────┴─────┐┌─────┴────┐┌─────┴────┐┌─────┴────┐ │ Address1 ││ Address2 ││ Address3 ││ Address4 ││ Address5 │ └──────────┘└──────────┘└──────────┘└──────────┘└──────────┘
Basically any route that goes downward can be generated by the metamorphic process (for example call to place5, with instruction set i1 that accesses address Address5). Almost all places and Addresses should be used, each one for a different instruction. The Instruction set should be wider because for different instruction we must metamorph the specific code. But the places and the addresses can be common to all instructions.
Of course, I don't have to say that the address of the places and of the addresses should be as mangled as possible inside the real code.
Now let's move to a deeper thing. Imagine that there exists a really, really masochistic person who realised the way your code behaves and he wants to find all the addresses where your code stores the EIP (in order to properly disinfect the victims). He could generate for example 500 samples of your code and have 10 people analyze them. It wouldn't be very hard, all they would need would be a table to be filled in with the offsets for the places, addresses and where to look-up the address inside the instruction. Do you think that all the situations would be met in such many generations? Sure, if you do not use a smart slow metamorphism. This kind of slow metamorphism would mean this: each of the three variables (place, address and instruction set) should be changed at different moments, once a counter passed a value of 20. So, every 20 generations the place would change. Every 20 generations the address will change, etc. This assures us that at least 20 generations something wouldn't change. This means that to get all the 10 possibilities for the place at least 200 generations should be created and everytime the random number should generate a different number... which is almost impossible. 200+ 200+ 200, that means 600 generations and with the assumption that the randomizer generates exactly what you want. I think in 6000 generations the conditions should be hardly met. To analyze 6000 generations is... well, at least suicidal...
To add even more complexness to this one might use an invention I call "Madness Jump Table".
Let's assume that you made your code metamorphize the instruction:
into a "call place", with all the links presented above. And let's imagine that this instruction will appear 5 times in your code (maybe a few times only as decoy). It wouldn't be very nice to encode it everytime by a call to place. The use of a Madness Jump Table would solve this.
instruction1: call treeEntry1 instruction2: call treeEntry2 instruction3: call treeEntry3 instruction4: call treeEntry4 instruction5: call treeEntry5 treeEntry1: jmp subEntry11 treeEntry2: jmp subEntry12 treeEntry3: jmp subEntry13 treeEntry4: jmp subEntry14 - these are equal treeEntry5: jmp subEntry14 / subEntry11: jmp subEntry21 subEntry12: jmp subEntry22 subEntry13: jmp subEntry23 - these are equal subEntry14: jmp subEntry23 / subEntry21: jmp subEntry31 subEntry22: jmp subEntry32 - these are equal subEntry23: jmp subEntry32 / subEntry31: jmp subEntry41 subEntry32: jmp subEntry41 subEntry41: jmp place
Ok, let's trace the instruction3:
treeEntry3-> subEntry13-> subEntry23-> subEntry32-> subEntry41-> place
No matter what instruction you start with, you wind up in the same adress: place (note that the call place was replaced by a jmp place, because the call is already done from the beginning and we don't want two addresses on the stack).
Now, please look carefully at the above table. Imagine that in each tree block you mangle the left side (the jumps) between them completely random. Does anything happen? No, because anyway, the trace will still lead to the same place. But you will have 5 instructions that will jump each through 6 everytime different jump places, everytime reaching a different place, where a diferent set of instructions is applied in order to use a value which is stored in a different place, which is absolutely necessary for the run of the program... Did you compile what I just said?
Will this decrease the speed of your code? Not at all... Will it increase it's size. Sure, a little but not so much. 20 jumps and call in total means 100 bytes, plus 20 bytes per instruction set (provided we have 10 instruction sets), gives another 200. So, a total of 300 bytes added to your code as functional side. Plus the additional place took by the address storage and instruction sets storage.
Of course, as I said, the metamorphism should only be used in places where you really, really need to make the data hard to be understood and reached, because too much metamorphism could lead to huge executables and no actual substance, not to mention your additional useless work.
Where to apply?
Let me give you a few hints on where I think metamorphism should be applied. First of all, I assume you work on a self relocating code (see an example in Win32.Thunderpick and Win32.Rammstein); in these type a part of the original code is moved someplace at the end of the file encrypted, as well as the rest of original code. The virus inserts in the freed place and when it finishes the job it decrypts the code and puts it back. This is needed because otherwise some smart AV's could find this way : load the infected sample as a debugee process, locate your seh handler (if any) and find a way to force it to return to host. Then monitor the address of the return inside the code section and so the original entrypoint is disclosed. By placing yourself over the same area where the original entrypoint was (as Rammstein do) the av software cannot make the assumption that the eip will somehow "escape" that area and by jumping some very far away it means that that is the original entrypoint. To get the original entrypoint it should either trace the entire execution, which is dangerous for it and almost impossible, or to scan the code for values. And here comes our metamorphic instructions.
So, let's see where should the metamorphic paradigm apply (I just looove this kind of diryt talking... ;-):
If you are smart enough to create such a metamorphic engine to hide the above things and the instructions that access them, this is it!! You do not need to metamorph things like ordinary math instructions and so on. You have to phocus on the important instructions and metamorphize them!
Additional stealthing tips
As I said you might want to create a dedicated set of addresses for each of the metamorphized concepts (e.g. oldEip, code hunk address, etc.). However, in the light of the above techniques, only one of the multiple addresses will be actually used while the rest should be only for decoy. To make it perfect you should not leave under any circumstances those values 0. This would be a fatal mistake. If the av locates all the addresses all the rest in our algorithm is useless, because it will consider the one that is not equal to 0. Also, you should not at all put there random values. Why? An inteligent av software could locate the actual eip from a set of many values just by checking which is bigger then the RVA of the code section and smaller than the RVA+the raw size. To solve this, simply make your random number generator generate small positive numbers, negate them if you want (another random assumption) and add those randoms to the original eip. In this way all the addresses will have very similar values going around the original eip rva. Renders all assumptions to null...
To optimize the things a little bit: do not store the instruction sets someplace and just move them to the place at metamorphization. Just create the places with the instructions already there and when you want to modify just exchange two of them between them. Or everytime you want to mutate just exchange all of them between them randomly.
How to place all these infos and not get lost into your own code? This implies that you know exactly what you start from and you put everything on paper. Then, the Madness Jump Table offers a very good place for data hidding. Design the table and then put the addresses between the jumps. You might even insert some decoy there (like 0FFh prefixes before the jumps to make the compiled code look horible ;-).
Encrypt very well the core of the metamorphic engine. For this I suggest a non-linear algorithm with multiple passes (like an endless loop). Inside the metamorphic engine use address decoy. I will not enter in details with this technique, I will only present it briefly:
Instead of saying:
In this way, by disassembling your code it would be much harder for the analyzer to understand what you thought there. The last instruction might appear many times inside the code.
The components of the metamorphic engine
This is the part that moves the data from one address to another. It requires a table like this:
AddressTable: Ahunk1: size1 = x _addr11 dd offset address11 _addr12 dd offset address12 ... _addr1x dd offset address1x Ahunk2: size2 = y _addr21 dd offset address21 _addr22 dd offset address22 ... _addr2y dd offset address2y ... AhunkN ...
,where each hunk is used for a specific value (like oldEip or code address), and each addressAB represents possible places inside the data area where the actual value can be stored.
The engine will parse then each hunk, given it's size, go at each address (aligned with the delta handle of course) and fill it with either a random value, or the real value, as it decides. Just when the address for the real value is decided, the instruction filler should be called directly to prevent future passes over the tables. The instruction filler tells the instruction to address on the specific address where the actual data is placed.
This one also needs a table, like this:
InstructionTable: Ihunk1: __size = a _instr11 dd offset instruction11 _byteoffset11 = 3 ... ...
,where each hunk is correspondant to the hunks above. Each instructionAB represents the address of the instruction that wants to use a value (a virtual mov [ebp+oldEip], eax for example), and byteoffset represents at what offset should the addres s of the data be placed. For example in this case:
instruction11: push edx mov edx, [ebp+oldEip] mov [edx], eax pop edx
the first instruction is 1 byte long and the second is 6 bytes long, and the address of oldEip is stored on the fourth byte starting from the instruction11 address. You can simply compute these values by entering TurboDebugger, typing the instructions and instead of oldEip put 8888888h and see on what byte does it start.
This part of the engine receives the address of the data from the address generator. It will then go at each instruction's offset and fill in at the proper byte offset the address it received. Then it will choose one of the instructions and pass it's number to the place filler.
This part doesn't need another table. It will simply mangle the instruction sets between them as held in the InstructionTable table, and for the instruction to be executed (as received from the instruction filler) it will pass this value to the jump table filler.
The jump table filler simply mangles between them the jumps in each jump block (look above) and then replaces the 'jmp place' instruction with the proper jump to the address it received from the place filler (the instruction to be executed). Then, for each caller it will choose a random entry into the jump table tree and fill it in using this table:
FinalTable: Fhunk1: ____size = 5 _call11 db offset _caller11 ...
All this been set up, your code will have somewhere inside it this instruction:
The store eip jump table tree will guide the call through the random tree. It will finally reach at a proc which will hold one of the many instruction sets you prepared to put a value in [ebp+oldEip], where the oldEip address will be one of the many places you have to store this value.
As you can see, it is very easy to understand how it works, as it builds up the metamorphic code, but it is very difficult to understand how if you only have the disassembly and a bunch of (encrypted) tables. Also note that using the above way, all data can still be metamorphized again and again.
I wrote a very simplistic metamorphic demo to help you out. It simply applies the above things on 3 instructions 5 times. After the metamorphic engine acts once it calls the instructions so you can trace through the code and see how it changes bu t still do the same thing!!
Well, at the end of another tutorial I feel I learned more already. Actually, this tutorial was thought as it was written. I do hope you enjoyed it too and I am very eager to hear more and more ideas on this issue so that we can improve it more and more.