,, MMP""MM""YMM `7MM P' MM `7 MM MM MMpMMMb. .gP"Ya MM MM MM ,M' Yb MM MM MM 8M"""""" MM MM MM YM. , .JMML. .JMML JMML.`Mbmmd' `7MMF' `7MF' `7MMF' `7MMF' `MA ,V MM MM VM: ,V `7M' `MF' MM MM .gP"Ya ,6"Yb.`7M' `MF'.gP"Ya `7MMpMMMb. MM. M' `VA ,V' MMmmmmmmMM ,M' Yb 8) MM VA ,V ,M' Yb MM MM `MM A' XMX MM MM 8M"""""" ,pm9MM VA ,V 8M"""""" MM MM :MM; ,V' VA. MM MM YM. , 8M MM VVV YM. , MM MM VF .AM. .MA..JMML. .JMML.`Mbmmd' `Moo9^Yo. W `Mbmmd'.JMML JMML. ,, ,, ,, .g8"""bgd `7MM `7MM mm db .dP' `M MM MM MM dM' ` ,pW"Wq. MM MM .gP"Ya ,p6"bo mmMMmm `7MM ,pW"Wq.`7MMpMMMb. MM 6W' `Wb MM MM ,M' Yb 6M' OO MM MM 6W' `Wb MM MM MM. 8M M8 MM MM 8M"""""" 8M MM MM 8M M8 MM MM `Mb. ,'YA. ,A9 MM MM YM. , YM. , MM MM YA. ,A9 MM MM `"bmmmd' `Ybmd9'.JMML..JMML.`Mbmmd' YMbmd' `Mbmo.JMML.`Ybmd9'.JMML JMML. -- Contact -- https://twitter.com/vxunderground vxug@null.net

Tracing here considered as a part of the win32 debug features. As i think, some stuff described here is used in the turbodebugger.

Now, what do we need it for.

Lets consider PE format infecting.

  1. We can directly replace address of the PE entrypoint with viral one. Such thing may be detected even without heuristic analysis; the condition is that entrypoint points to the end of file.
  2. We can insert jmp to the virus body into program startup address; so, emulation will easy skip such jmps. If we will replace jmps with something more complex, then avers will just write special subprogram.
  3. We can insert jmp to the viral body into random program place. This method is many times better than first ones. But, we can not know if selected address is code or data, and, moreover, if this address is first instruction byte. In case we use PUSH EBP/MOV EBP,ESP and alike as a signature for our patch, then it may be easy found. Anyway, avers may scan file at all addresses pointed by jmps and calls. And, the main disadvantage is that in this method we can not know if (and when) our instructions will be executed.
  4. Code analysis without execution. The best implementation of this method may be performed using IDA and some brains. Alike, but greatly simplified algorithms i've been used in the ZCME/AZCME(dos) and RPME/CODEPERVERTOR(win32). These things works fast and can correctly distinguish code and data. But, here we have big disadvantage too. In the case if some part of code is never executed, but is linked, for example with conditional jmp (jxx), such code will be marked as "normal" one.
  5. Tracing. This method allows us to mark only executable (not all!) code, and even to find order of instructions and subprograms execution.

But, what the fuck do we need this order to? Here is hidden the amazing thing, whose mission is to completely fuckup antiviral asshole. Such attempts were made under dos, but as it seems, without a result.

So, badly encrypted viral body may be anywhere in file. And this body should be called not by single jmp -- because such thing is easy detectable -- but with some instructions inserted into different program places, but with defined execution order.

But this text describes only tracing, and not a bit more.

So, lets begin.

For process which is not executing now, debugging begins with CreateProcessA function with DEBUG_PROCESS+DEBUG_ONLY_THIS_PROCESS flags. If process is opened, then DebugActiveProcess may be used.

After called function returned success, we may run main cycle. In the beginning of this cycle WaitForDebugEvent is called, and each debug_event passed to our program is analyzed.

de                      label   byte
de_code                 dd      ?
de_pid                  dd      ?
de_tid                  dd      ?
de_data                 db      1024 dup (?) ; no matter that real length is

After this structure is processed, ContinueDebugEvent is called, and we can execute WaitForDebugEvent again.

Now, about event structure. de_code member holds event's id. de_pid and de_tid members are identifiers of the current debugging process and thread, this event generated for.

Now, events which may be passed to our debugger.

seems it is first event passed to us; de_data contains such things as file, process and thread handles, main thread start address and other shit.
such events are generated when each new DLL is loaded into debugging context.
such events tells us that we should break
main cycle
should be handled, if we're going to debug all threads
use it when tracking information for each thread
we dont need it
mostly interesting event, means INT1, INT3 and all the possible exceptions.

Here are some interesting things.

  1. Before calling debugging program, system calls kernel's DebugBreak function, which performs INT3.
  2. By default, TF (trace) flag is cleared, and INT1 events are not generated. As it seems, system clears this flag each time it is possible, so debugger must set it on each INT1/INT3 event.

To work with debugging process memory ReadProcessMemory and WriteProcessMemory functions are used.

To work with debugging thread registers GetThreadContext and SetThreadContext functions are used.

So, structure of the tracer/debugger is the following:

        if (int1 or int3)

Such tracer will have big disadvantage: it will trace not only debugging process (i mean .exe file), but all the DLLs used. If such DLLs has own entrypoints, they will be traced BEFORE main entrypoint, which is sux. Moreover, when debugging process calls one of these DLLs, the calling subroutine is traced too. Also, such simple algorithm doesn't support threads.

To skip DLL's entrypoints in the beginning of the tracing:

  1. skip kernel's INT3 (in the DebugBreak), and do not set TF.
  2. write two subroutines: to insert INT3 into the program and to restore original byte, by address.
  3. on CREATE_PROCESS_DEBUG_EVENT event insert INT3 into main thread's lpStartAddress

To skip DLL's subroutines, check current address, and if it is out the of main program's range, then

  1. take DWORD from the stack, it is the return address; insert INT3 there
  2. clear TF

To support multiple threads:

  1. insert INT 3 into lpStartAddress when each new thread is created
  2. keep information about all the threads, i.e. ThreadId<-->ThreadHandle
  3. on int1/int3 exceptions, convert given TheadId into ThreadHandle
  4. update information about id's<-->handle's when threads are created and deleted.

In addition. When exception is generated and handled by SEH, SEH handlers receives control with TF cleared. So this thing may be used to fuck tracer.

How to solve this problem? This is hard.

  1. on exception (not INT1/INT3), take debugging thread's FS
  2. knowing FS, use GetThreadSelectorEntry to find VA of the FS:[0]
  3. analyze SEH's structure and find out SEH handler address
  4. insert INT3 there

Seems, this is all. All the things described here are shown in the tracer32 program.

Now, how to use this stuff in real life.

  1. select a program
  2. trace it 2-3 seconds
  3. use found address to insert JMP to

Download tracer32.zip (28K)