Ehrm... Super should be doing this instead of me, but as I'm his pupil, I'm going to write down what I have learnt during my time in the Win32 coding world. This tutorial focuses on local optimization rather than structural optimization, because the latter is up to you and your style (for example, personally I'm *VERY* paranoid about the stack and delta offset calculations, as you can see in my code, especially in Win95.Garaipena). This article is full of my own ideas and of advice that Super gave me at the Valencian meetings. He's probably the best optimizer in the VX world ever. No lie. I won't discuss how to optimize to the max as he does. No. I only want to show you the most obvious optimizations that can be done when coding for Win32. I won't comment on the very obvious optimization tricks already explained in my Virus Writing Guide for MS-DOS.
I'm sick of always seeing the same thing, especially from Win32 coders, and it is killing me slowly and very painfully. No, no, my mind can't assimilate the idea of a CMP EAX,0, for example. OK, let's see why:
Heh, I know life's a shit, and you are wasting a lot of code on shitty comparisons. OK, let's see how to solve this, with code that does the same thing in fewer bytes.
And there is an even more optimized way to do this, as long as it doesn't matter where the content of EAX ends up (after the code I am going to put here, the content of EAX will end up in ECX). Here you have it:
Do you see? No excuses about "I don't optimize because I lose stability", because with these tips you will optimize without losing anything besides bytes of code ;) Heh, we went from an 8-byte routine to 3 bytes... Heh? What do you say about that? Hahahaha.
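A rough sketch of the three alternatives discussed above, for 32-bit code. The label is_zero is hypothetical, and the byte counts assume the 32-bit immediate form of CMP and short jumps (some assemblers pick a shorter 3-byte encoding for CMP EAX,0 on their own):

```asm
; Variant 1: the wasteful comparison
        cmp     eax, 00000000h  ; 5 bytes (3D 00 00 00 00)
        jz      is_zero         ; 2 bytes (short jump)

; Variant 2: same zero test, EAX left unchanged
        test    eax, eax        ; 2 bytes (85 C0); OR EAX,EAX is the same size
        jz      is_zero         ; 2 bytes (short jump)

; Variant 3: shortest, but EAX and ECX swap contents
        xchg    eax, ecx        ; 1 byte (91)
        jecxz   is_zero         ; 2 bytes, short range only
```

Variant 3 only makes sense when the old content of ECX is not needed, since the value being tested ends up there.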
As many Ring-3 APIs return a value of -1 (0FFFFFFFFh) if the function failed, and you should check whether it failed, you must compare against that value. But here is the same problem as before: many, many people do it with CMP EAX,0FFFFFFFFh, and it could be done in a more optimized way...
Let's do it in a more optimized way:
And another thingy could be this:
Heh, maybe it takes more lines, but it takes fewer bytes (4 bytes against 8).
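One way to get such a 4-byte check, sketched for 32-bit code (the label failed is hypothetical; the one-byte INC/DEC encodings exist only outside 64-bit mode):

```asm
; Unoptimized
        cmp     eax, 0FFFFFFFFh ; 5 bytes with a 32-bit immediate
        jz      failed          ; 2 bytes (short jump)

; Optimized: -1 + 1 wraps to 0, so ZF is set only if EAX was -1
        inc     eax             ; 1 byte (40)
        jz      failed          ; 2 bytes (short jump)
        dec     eax             ; 1 byte (48), restores the original value
```

Keep in mind that on the failed path EAX holds 0 instead of -1, because the DEC is skipped.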
The clearest example is what all viruses do when loading the number of sections of a PE file into AX (as this value occupies one word in the PE header). Well, let's see what the majority of VXers do:
I'm still wondering why all VXers use this "old" formula, especially when there is a 386+ instruction that saves us from zeroing the register before putting the word in AX. This instruction is MOVZX.
Heh, we avoided 1 instruction of 2 bytes. Cool, huh?
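A small sketch of the difference, assuming ESI points at the PE signature so that the 16-bit section count sits at offset 6 (that register choice is my assumption, not something fixed by the text):

```asm
; Unoptimized: clear the register, then load the 16-bit value
        xor     eax, eax                ; 2 bytes
        mov     ax, word ptr [esi+6]    ; 4 bytes (needs the 66h prefix)

; Optimized: zero-extend while loading
        movzx   eax, word ptr [esi+6]   ; 4 bytes (0F B7 46 06)
```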
Heh, this is another thing that some VXers do, and it makes me go crazy and scream. Let me remind you of it:
We can call the address directly, guys... It saves bytes and doesn't use up a register that could be useful for other things.
Once again, we are saving a useless, unneeded instruction that takes 2 bytes, and we are doing exactly the same thing.
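Sketched out, assuming the address lives in a hypothetical variable FuncPtr reached through EBP with a 32-bit displacement:

```asm
; Unoptimized: load the pointer, then call through the register
        mov     eax, dword ptr [ebp+FuncPtr]    ; 6 bytes, and EAX is clobbered
        call    eax                             ; 2 bytes

; Optimized: call through the memory operand
        call    dword ptr [ebp+FuncPtr]         ; 6 bytes
```

With an 8-bit displacement both memory forms shrink to 3 bytes, so the saving is still the 2 bytes of CALL EAX.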
Almost the same as above, but with PUSH. Let's see what not to do and what to do:
We can do the same with 1 byte less. See:
Cool, huh? ;) Well, if we need to push the same value many times, it is more optimized to put it in a register and push the register (if the value is big, this pays off when you push it 2+ times; if the value is small, it pays off when you push it 3+ times). For example, if we need to push zero 3 times, it is more optimized to XOR a register with itself and then push the register. Let's see:
And let's see how to optimize that:
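A sketch of both PUSH cases from this section. SomeVar is a hypothetical variable with a 32-bit displacement, and the 2-byte PUSH 0 assumes the assembler emits the 8-bit immediate form:

```asm
; Pushing a variable through a register
        mov     eax, dword ptr [ebp+SomeVar]    ; 6 bytes
        push    eax                             ; 1 byte
; versus pushing it straight from memory
        push    dword ptr [ebp+SomeVar]         ; 6 bytes

; Pushing zero three times with immediates
        push    0                               ; 2 bytes (6A 00)
        push    0                               ; 2 bytes
        push    0                               ; 2 bytes
; versus zeroing a register once and pushing it
        xor     eax, eax                        ; 2 bytes
        push    eax                             ; 1 byte
        push    eax                             ; 1 byte
        push    eax                             ; 1 byte
```

That is 6 bytes against 5 for the zero case, and the gap grows with every extra push.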
Another case comes up when using SEH, as we need to push fs: and suchlike. Let's see how to optimize that:
Instead of that, we should do this:
Heh, it seems a silly thing, but we have 7 bytes less! Whoa!!!
This is very useful, especially in our API search engines. And of course, it can be done in a more optimized way than the typical one found in all viruses. Let's see:
This same code can be much reduced if you code it this way:
Hehehe. Useful, short and good looking. What else do you need? ;)
For example, looking at the code used to get the last section, the most common version includes this (EAX holds the number of sections - 1):
And this leaves the result in EAX, right? Well, we have a much better way to do this, with only one instruction:
IMUL stores the result in the first register given; that result is the second register multiplied by the third operand, which in this case is an immediate. Heh, we saved 4 bytes by replacing just 2 instructions of code!
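A sketch of the two forms, assuming the factor is 28h (the size of one section header, which is my assumption about the constant used here) and that ECX is free in the unoptimized version:

```asm
; Unoptimized: MUL needs the factor in a register and also writes EDX
        mov     ecx, 28h        ; 5 bytes
        mul     ecx             ; 2 bytes, result in EDX:EAX

; Optimized: three-operand IMUL with an 8-bit immediate
        imul    eax, eax, 28h   ; 3 bytes (6B C0 28), EDX untouched
```

Besides the size, the IMUL form leaves both ECX and EDX alone, which can save a PUSH/POP pair elsewhere.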
It should work, although I'm not sure, because it doesn't on my computer. Pff, maybe an Intel bug, or my system is crazy or something. Not sure, but try it anyway, as it is very interesting. Look at the unoptimized version:
Optimized, it should look like this (I already said that it SHOULD work, but it doesn't on my PC):
Pfff, a really good optimization, 16 bytes reduced to 11 bytes ;)
There is a lot to do here. Especially for Ring-0 viruses, there is a VxD service to do this. First I'm going to explain the optimization based on the use of this service, and finally I'll show Super's method, which saves TONS of bytes. Let's see the typical code (assuming EBP as a pointer to the ioreq structure and EDI pointing to the file name):
Well, only 1 particular improvement can be made to that code: substitute the third line with this:
Heh, but I said that Super improved this to the max. I haven't copied his code for getting the pointer to the Unicode name of the file, because it is almost impossible to understand, but I caught the concept. The assumptions are EBP as a pointer to the ioreq structure and buffer as a 100h-byte buffer. Here goes some code:
Heh, the first routine (without local optimization) is 26 bytes, the same routine with that local optimization is 23 bytes, and the last routine, the structural optimization, is 17 bytes. Whoaaaa!!!
This title is an excuse to show you another strange opcode, very useful for VirtualSize calculations, as we have to add a value to it and get the value that was there before our addition. Of course, the opcode I am talking about is XADD. OK, OK, let's see the unoptimized VirtualSize calculation (I assume ESI is a pointer to the last section header):
And let's see how it should be with XADD:
With XADD we saved 3 bytes ;) Btw, XADD is a 486+ instruction.
Another Ring-0 thingy. Let's see it unoptimized:
And if we optimize...
Charming, isn't it? ;)
Here I will put unclassifiable optimization tricks, or those I assumed you already knew while writing this article ;)
I hope you understood at least the first optimizations in this article, because they are the ones that drive me mad. I know I am not the best at optimization, nor even one of the best. For me, size doesn't matter. Anyway, the obvious optimizations must be done, at least to demonstrate that you know how to do something in your life. Fewer useless bytes means a better virus, believe me. And don't come to me with the same words that QuantumG used in his Next Step virus. The optimizations I showed here WON'T make your virus lose stability. Just try to use them, OK? It's very logical, guyz.