,, MMP""MM""YMM `7MM P' MM `7 MM MM MMpMMMb. .gP"Ya MM MM MM ,M' Yb MM MM MM 8M"""""" MM MM MM YM. , .JMML. .JMML JMML.`Mbmmd' `7MMF' `7MF' `7MMF' `7MMF' `MA ,V MM MM VM: ,V `7M' `MF' MM MM .gP"Ya ,6"Yb.`7M' `MF'.gP"Ya `7MMpMMMb. MM. M' `VA ,V' MMmmmmmmMM ,M' Yb 8) MM VA ,V ,M' Yb MM MM `MM A' XMX MM MM 8M"""""" ,pm9MM VA ,V 8M"""""" MM MM :MM; ,V' VA. MM MM YM. , 8M MM VVV YM. , MM MM VF .AM. .MA..JMML. .JMML.`Mbmmd' `Moo9^Yo. W `Mbmmd'.JMML JMML. ,, ,, ,, .g8"""bgd `7MM `7MM mm db .dP' `M MM MM MM dM' ` ,pW"Wq. MM MM .gP"Ya ,p6"bo mmMMmm `7MM ,pW"Wq.`7MMpMMMb. MM 6W' `Wb MM MM ,M' Yb 6M' OO MM MM 6W' `Wb MM MM MM. 8M M8 MM MM 8M"""""" 8M MM MM 8M M8 MM MM `Mb. ,'YA. ,A9 MM MM YM. , YM. , MM MM YA. ,A9 MM MM `"bmmmd' `Ybmd9'.JMML..JMML.`Mbmmd' YMbmd' `Mbmo.JMML.`Ybmd9'.JMML JMML. -- Contact -- https://twitter.com/vxunderground email@example.com
NOTE: This article was made using kernel version 2.0.34, where the segment distribution is different from the actual kernel versions like 2.2.XX.
Hi, and welcome to the worlds' first ever Virus Writing Guide alike for the LiNUX system. This tutorial is NOT written by me, and my only intention is to translate it to all the viral community in general. If you want to take a look to the original version (that is in spanish) you can find it on my own website (http://beautifulpeople.cjb.net).
The author, who wants to remain anonymous, has shown impressive LiNUX skills and aswell a good assembler level (rare for a LiNUX coder), but he has a problem with his lack of optimization ;)
My conclussions after reading this article are various: LiNUX kicks ass, LiNUX kicks Windoze's ass... You can take a look to its heavy and very intelligent protection: It's almost impossible to achieve Ring-0 (at least not being root), it's impossible to make a Ring-3 global residence (with the impressive mechanism of copy-on-write), and all those details that makes the LiNUX system to be the best choice actually in matter of operative systems. Really.
My english is not very good, but seems that i'm the only one that wants to take the trouble to translate this. Besides, i think that leave this jewel only in spanish is an unforgivable sin. So, here i am, translating it 4 u ;)
© 1999 Mr Anonymous [ Original Article ]
© 1999 Billy Belcebu/iKX [ Translation ]
PS: I have the author permission for this. Don't blame me with copyrightz...
The neverending question, Why aren't viruses for linux?. It seems that the viral community, accustomed to Real Mode systems (DOS), find that is hard to adapt themselves to protected mode systems. Even for Win95/98, systems with important dessign problems, there exists moreless 30 viruses where the great majority are non-resident viruses or VxD infectors (Ring-0 devices).
It seems that the answer is in the important memory protection implemented by Linux.
Systems like Win95/NT use a memory dessign with a limited use of segments. In this systems with user and kernel selectors, we can directionate all the virtual space, i.e. from 0x00000000 to 0xFFFFFFFF (That doesn't means that you can write to all the memory, because the memory pages also have some protection attributes).
However in Linux the dessign is very different, there are two different zones very differenced by segmentation, one dedicated to user processes, that go from 0x00000000 to 0xC0000000 and other for the kernel, that go from 0xC0000000 to 0xFFFFFFFF.
Let's see a dump of registers with GDB, taken from the beginning of the execution of a command like GZIP.
(gdb)info registers eax 0x0 0 ecx 0x1 1 edx 0x0 0 ebx 0x0 0 ebp 0xbffffd8c 0xbffffd8c esi 0xbffffd9c 0xbffffd9c edi 0x4000623c 1073766972 eip 0x8048b10 0x8048b10 eflags 0x296 662 cs 0x23 35 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x2b 43 gs 0x2b 43
We can see that Linux uses the selector 0x23 for code, and the selector 0x2B for the data. Intel uses 16-bit selectors, the two less significan bits store the RPL (information about the privilege level of that selector, Intel implements 4 protection rings, but the actual operative systems like Win95/NT or Linux use only 2, Ring-0 for the kernel (maximum privilege level) and Ring-3 for the user processes)). The next bit shows where is the descriptor of the segment that contains information about the segment, 0 for the GDT (GLOBAL DESCRIPTOR TABLE) or 1 for the LDT (LOCAL DESCRIPTOR TABLE). The other bits are simply an index of a segment descriptor that will be in the LDT or the GDT according to the information of below.
Selector [ 14 bits, Index to descriptor ] [ 1 bit, GDT/LDT ] [ 2 bits, RPL ]
Then, if we pass to binary 0x23 we got
[ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ] [ 0 ] [ 1 1 ]
So we know that it is a Ring-3 selector (it's used by a process) and also we know that tge information of such segment lies in the GDT, at 4th entry. If we analyze the next descriptor (0x2B) we'll obtain a similar information, but the descriptor will be at 5th entry.
If we take a look to the kernel's code, more concretly in the file called /usr/src/linux/arch/i386/kernel/head.S (painfully in assembler :)) we can appreciate the segment initialization in linux.
As you can see, Linux initializes 4 segments: 2 for kernel and 2 for user, depending if they are of code or data. In each entry is stored information like the base address of the segment and its limit, if it's resident in memory or not, the kind of segment, if its code is in 16 or 32 bits. Meanwhile there are an user selector in the DS segment, we can't ever handle an address over 0xC0000000 because we would be out of the memory that can be accessed by the segment, we would receive a SIGSEGV signal and our process would be finished painfully.
I know i can directionate from 0x00000000 to 0xC0000000 but, what can i modify?. Here begins the real protection mechanism. The memory is divided in pages of 4Kb each one in the case of Intel, and each page has its own attributes: if they are read/write, if it's in memory (it can be at disk temporally), if it's of kernel, etc.
All the information about pages in memory is located in a page table that contains descriptors for each mapped page in memory. There is one page table for each process in memory, this makes that each process has its own virtual space and besides, that any other process could access to another one.
This makes possible to load programs in the same memory address, and really it's what it does. Windows 95/98 and Linux do it. In Linux the usual load address is 0x08040000 while in Windows it is 0x00400000.
This page table is pointed by a control register of the processor (the CR3) so it changes with each change of context modifying also the virtual space of the process.
But then, if a process can only handle directionate the perprocess memory, how is it able to execute system calls that reside over 0xC0000000? Intel brings us mechanisms for jump to Ring-0 in a safe way when we need to make system calls. Intel uses two methods: the TRAP GATES and the CALL GATES. Usually are used the TRAP GATES (WinNT/98/95, Linux); even i believe that some other unix systems use the CALL GATES for make the Ring jump.
The Trap Gates occupy one entry in the IDT (INTERRUPT DESCRIPTOR TABLE), and allow the jump to Ring-0 with the generation of one interrupt. For that, the jump address defined in the IDT must have a Ring-0 selector and the DPL (Descriptor Privilege Level) must be 3, allowing an user to execute it. In Linux the interrupt used for the jump is the 0x80, while Win95 uses the int 0x30, for example.
Let's see the disassembly of the getpid function of the LIBC library. For that we create a C file like this:
After compile it, we debug the binary file with GDB:
(gdb)disass 0x8048480 <main>: pushl %ebp 0x8048481 <main+1>: movl %esp,%ebp 0x8048483 <main+3>: call 0x8048378 <getpid> 0x8048488 <main+8>: movl %ebp,%esp 0x804848a <main+13>: popl %ebp 0x804848b <main+11>: ret
As you can see the call to getpid is dessigned in Linux (and in other systems) as a CALL to a special section inside the binary file (0x8048378). There we could find a jump to the desired library function. This jumps are built in memory by the OS for choose the dynamic links with the libraries. With this, any file could execute exported functions of others, if it's pointed in this way by the information in the ELF header. Let's continue debugging:
(gdb)disass getpid 0x40073000 <__getpid>: pushl %ebp 0x40073001 <__getpid+1>: movl %esp,%ebp 0x40073003 <__getpid+3>: pushl %ebx 0x40073004 <__getpid+4>: movl $0x14,%eax 0x40073009 <__getpid+9>: int $0x80
These are the first instructions of the getpid library call. Its work is simple: we are only preparing a jump to Ring-0. If the function would have some parameters, it would have prepared the registers for that parameters before doing the jump to Ring-0. It would have put in EAX the number of function, and it would have called to the int 0x80. As you can see, the code of the libraries is in the PerProcess memory, below 0xC0000000, so it's Ring-3 code and it lacks of privileges for access ports,to privileged memory areas, etc. That's the reason because the libraries are really intermediary between the calls made by the processes and the calls generated via int 0x80
All the system calls that need to jump to Ring-0 will use the int 0x80, and the int 0x80 has only a descriptor, we'll jump always to the same memory address. That makes us to need to put in EAX register the number of the function we want to call to. In Ring-0, the kernel evaluates the value of EAX for know what function if has to satisfy, and according to its value, it would jump to one function or to another using an internal table of pointers to function called sys_call_table. The list of function accedped with the int 0x80 is in the file /usr/include/sys/syscall.h
With the execution of an
int 0x80 the processor will change the selector of code active. It'll change from the selector 0x23 to 0x10, so we'll pass from directionate from 0x00000000-0xC0000000 to 0xC0000000-0xFFFFFFFF.
The next method of jump, rarely used, is based in an entry in the GDT or excepcionally in the LDT. There we'll define what's denominated a CALL GATE, that allows jumps to Rings of more privilege via the instruction
CALL FAR or
JUMP FAR of assembler.
In Linux there are two formats of executables: a.out and ELF; however every executable and library of Linux nowadays use the second format. The ELF format is very powerful, and contains information for handle applications under different processors. It contains information about the processor where the executable was compiled, or if it has to use little endian or big endian. As it is a format of processors in extended mode, besides the information about the physical sections that are in the file, there is some information about how the OS has to map the file in memory.
The ELF file has one first part that occupies the first 0x24 bytes of the executable, and contains, among other things, a mark 'ELF' for show us that it is an executable file with ELF format; the kind of processor, the base address (that is the virtual address of the first instruction that will be executed in the file) and after, 2 pointers to 2 tables.
The first table pointed is the Program Header (located physically after the ELF header) that contains entries with information about how will be mapped in memoy the file. Each entry will contain the size of each segment in the memory and in the file, also the address of the init of the segment.
The next table is the Section Header, and it's just at the end of the file. It'll contain information about each logical section, it'll also contain protection attributes, but this information won't be used for map the code of the file in memory.
With the GDB command 'maintenance info sections' we can see the section structure with all the protection attributes of each section. If you take a look at it, you'll realize that all the readonly sections are situated the first ones, and the read/write sections, altogether at the end. This is necessary because the code sections are mapped altogether in memory in consecutive pages by means of an entry in the program header. That's why all the section that share the same protection attributes will be able to share memory pages, meanwhile the sections with different attributes won't be able to do so. With this we avoid the internal fragmentation in the executables, because if every section would have to map separately, the last page of every section would be empty, and many space would be wasted.
Also look to the last readonly page doesn't share a page with the first one with readwrite attributes. The dump of this instruction with a command like gzip would be the following:
(gdb)maintenance info sections Exec file: '/bin/gzip', file type elf32-i386. 0x080480d4->0x080480e7 at 0x000000d4: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS 0x080480e8->0x08048308 at 0x000000e8: .has ALLOC LOAD READONLY DATA HAS_CONTENTS 0x08048308->0x08048738 at 0x00000308: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS 0x08048738->0x08048956 at 0x00000738: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS 0x08048998->0x08048b08 at 0x00000958: .rel.bss ALLOC LOAD READONLY DATA HAS_CONTENTS 0x08048b10->0x08048b18 at 0x00000b10: .init ALLOC LOAD READONLY CODE HAS_CONTENTS 0x08048b18->0x08048e08 at 0x00000b18: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS 0x08048e10->0x08050dac at 0x00000e10: .text ALLOC LOAD READONLY CODE HAS_CONTENTS 0x08050db0->0x08050db8 at 0x00008db0: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS 0x08050db8->0x08051f25 at 0x00008db8: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS 0x08052f28->0x08053960 at 0x00009f28: .data ALLOC LOAD DATA HAS_CONTENTS 0x08053960->0x08053968 at 0x0000a960: .ctors ALLOC LOAD DATA HAS_CONTENTS 0x08053968->0x08053968 at 0x0000a968: .dtors ALLOC LOAD DATA HAS_CONTENTS 0x08053970->0x08053a34 at 0x0000a970: .got ALLOC LOAD DATA HAS_CONTENTS 0x08053a34->0x08053abc at 0x0000aa34: .dynamic ALLOC LOAD DATA HAS_CONTENTS 0x08053abc->0x080a4078 at 0x0000aabc: .bss ALLOC 0x00000000->0x00000178 at 0x0000aabc: .comment READONLY HAS_CONTENTS 0x00000178->0x000002b8 at 0x0000ac34: .note READONLY HAS_CONTENTS
Take a look to that curious jump between .rodata and .data sections caused by all that i exposed before. This command allows you to visialize how will be in memory the program, but its information in not important for its load. We won't even need to modify the section header for insert more executable code in the file. The Program Header is the true informer of the load process. It contains 5 entries, but it's possible to insert more.
So one solution for insert more executable code could be the expanding of the data segment. This is problematic, because if we copy all the viric code to the end of the executable, i.e. just after the section header, and we expand the entry of the Program Header that corresponds with the data segment, the viral code would overwrite one logical section of the archive, the .bss section. As we had seen with the gdb dump, the .bss section is the last one that is part of the space of the process, and contains the ALLOC attribute, however it doesn't contains the LOAD attribute, so it doesn't load data from the file. This is caused by the fact that the .bss section contains uninitialized data (still) by the host code. If the viric code is mapped over that section is not very problematic, because the virus will be executed before the infected host, so after the virus execution, the host wouldn't care about it. This section, at load time, if filled of zeroes, so a bad programming, like suppose an uninitialized variable set to 0, would show the presence of the virus. In any case,the virus can avoid this copying itself to any other memory address, and filling its old position in .bss with zeroes.
Another possibility could be to create another entry in the program header, but we would have to shift almost all the archive, and this would take too much infection time.
If we execute this in a directory where is the gzip file, we will obtain the following message in the screen:
If we execut the gzip, we will obtain this:
HELLO WORLDgzip: compressed data not written to a terminal. Use -f to force compression. For help, type:gzip -h
As you can see,the viral code is executed before the host, and after that it returns to it the control without any kind of dificulty.
However there are other methods that allow the infection without expanding any section of the Program Header. The Staog virus and the Elves virus use alternative methods.
Staog, for example, overwrites the entrypoint of the host with the code of the virus, and the overwritten code is copied to the end of the host. The virus, when receives the control at the execution moment, opens the file (for know the name it takes a look in the stack),takes the code of the virus and make a temporal file in the /tmp directory. After doing that, it calls to fork and while an execution thread is executing the viral code of the temporal archive by meand of execve, other execution thread copies that code to the stack of the program and give the control to that code, that will rebuild the code of the host, and return the control to the original entrypoint.
Elves, however, made by Super of the group 29A, uses a method much more advanced that makes perprocess residency, and avoids that the infected files grow up in size (cavity infection).
NOTE: For more information about perprocess residecy and the structure and use of the PLT, take a look to the article of perprocess residency.
The method consists in introduce the viral code in the PLT. The PLT is a necessary structure of the executable that allows the dynamic link of the functions. For that it doesn't move the PLT to other part of the executable or anything similar, the viral code overwrites it, but it continues working perfectly.
As i will explain in the article about PerProcess residency, there're 2 ways to make a call to a library: by means of the dynamic linker (when we don't know what's the address of the function), or directly with a specific entry for that function in the PLT (when we've already obtained in the GOT the address). After Elves infection, the second method is disabled, and all the calls are made by means of the dynamic linker. The virus overwrites from the second entry, leaving untouched the first one (the one that makes the jump to the dynamic linker).
As we can see in the article about PerProcess residency, an entry in the PLT has the following form:
As you can see, it's not a very optimized code, the first jump would occupy 5 bytes, the push other 5 bytes, and the next jump another 5 bytes, so the entry would have 15 bytes. So the virus is divided in blocks of 15 bytes, and this allows a sequential execution of the code in a normal way, but in the case that we try to make a jump to the beginning of a PLT entry, it would found a jmp previous_PLT_entry codified only with 2 bytes, with the opcodex 0xEB, 0xEE.
Let's see an example:
By means of that, when a jump to any PLT entry is done, the execution thread would find miraculously 0xEB opcodes, that will go making little jumps until the virus_start label. From here, the virus will be execute sequentially garbage opcodes like sub ebx,-22 that really are hiding a jmp PLT_entry, and after trying to infect the first call to each system call, it makes a jump to the first PLT entry, so it jumps to the dynamic linker.
I received the source code of this virus for test it, and painfully, in my Linux version it is not functional (Debian 2.0.34). This is because Super, with his needs of optimizing in space the virus, makes the following code for push the reloc entry and avoid to put a push each entry (that would have make him to break the virus in fragments even smaller):
The dynamic linker need entries in the .reloc.plt section for know what address it needs to resolve. For that, it supposes that the consecutive entries of the PLT will have consecutive entries in the .reloc.plt section, and if fact, that's true. If we take a look to any PLT, the compiler puts in the first PLT a
PUSH 0x00, in the second PLT a
PUSH 0x08, in the third a
PUSH 0x10, and so on. This is not really a problem, the real problem is to suppose that all the calls to the PLT are done with a
CALL immediate (being the immediate a 4 bytes value). When we do a
CALL in assembler, the processor pushes the return address on the stack (i.e. the address of the next instruction of the call). The virus, as we can see, reads from the stack that value, substracts to it a 4 (the size of the immediate) and reads the value pointed by that address (the next code after the call). To that value, it substracts the PLT address, so we obtain the difference of bytes of the PLT entry we've called to, and the beginning of the PLT, and with that value, it obtains the entry value in the reloc section with a simple rotation opcode. This method is okay if we only make calls with the opcode
CALL immediate. This might be true, for example in the newest Linux versions,but for example my Linux version makes jumps to the PLT of the host only with the opcode
CALL *EBP also this instruction is not codifies in host's code, it's done by the dynamic linker even before the host takes the control (i still don't know why).
Anyway this method is very interesting and useful.
The resident viruses in Ring-0 are those that achieve maximum privileges in the processor, and already in Ring-0 they hook the system calls made by all the processes of the system.
For achieve Ring-0 an user process should try to make various things: it could try to modify the IDT for generate a TRAP GATE, modify the GDT or the LDT for generate a CALL GATE, or even patch code in Ring-0, so as our code would receive the execution thread already in Ring-0. Wihtout any doubt, it seems a hard work, because all those structures are or should be protected by the OS.
But in systems like Windows 95, where code like this (used by the CIH virus) allows us to jump to Ring-0 without difficulty:
What makes possible that this code works in Windows? The answer is simple, firstly Windows can directionate with user selectors the kernel memory, also (and besidess it seems incredible) lacks of protection by pagination in addresses superior to 0xC0000000, that lies, as linux, the code executed in Ring-0.
So if we can directionate the IDT memory, and also we can write there, the jump to Ring-0 is easy. In this example we have chosen the int 0x05 because it is already a TRAP GATE in Windows,that's why we only modify the IDT entry and instead jump to the memory address assigned by windows, it would jump to our label ring0code inside the perprocess memory of our process.
However, in Linux we can't directionate the user memory with Ring-0 selectors so we couldn't do the jump in case that we could directionate the kernel memory and the pagination protection would be deactivated, the modification of the IDT wouldn't be enough. If we modify the int 0x5 entry of the IDT for generate a TRAP GATE, we wouldn't be able to use the Ring-0 selector of Linux (0x10). In the IDT we would find the address 0x10:ring0code for make the jump, but that address doesn't point to the PerProcess memory; in fact the base address of the 0x10 segment is 0xC0000000, really we would be jumping to the address 0xC0000000+ring0code.
Let's see where lies the IDT in Linux. Compile the next code with NASM:
Executing this step by step, and reading the value stores in 'data_', we get the following memory dumps. (0x80495ED = address of 'data_' variable):
Dump after SIDT (gdb)x/2 0x80495ED 0x80495ed <data_>: 0x501007FF 0x0807C180 Dump after SGDT (gdb)x/2 0x80495ED 0x80495ed <data_>: 0x6880203F 0x0807C010 Dump after SLDT (gdb)x/2 0x80495ED 0x80495ed <data_>: 0x688002Af 0x0807C010
The first and the second assembler opcodes return in the first 16 bits of 'data_' the IDT and the GDT limits respectively, and in the next 32 bits the lineal address of that structures. Meanwhile, the SLDT only returns a selector that points to its descriptor inside the GDT (each LDT must have defined a descriptor in the GDT).
So we know that the IDT has as base address 0xC1805010 and its limit is 0x7FF bytes. The GDT will have as base address 0xC0106880 and will have a size of 0x203F bytes. And of the LDT we know that its descriptor is 0x2AF. As we were expecting, the addresses are all above 0xC0000000, so they are well protected from the user-processes.
Another way for access the kernel memory could be to map kernel pages below 0xC0000000, but painfully, that is not possible because the page table is mapped above the 0xC0000000 address, so it can't be modified by Ring-3 processes. Linux maps all the physical memory of your machine parting from the linear address 0xC0000000, or, with another words, the virtual address 0x0 using the kernel segment 0x10. We can build a module for read the CR3 reg, that contains the physical address of the page table, and with that info, visualize the mapped pages. The program would be the following one:
After the execution of this program we can get the mapped pages in that moment, and the protection attributes of each page.
The first page we would see would be the read-only pages of the process being executed in Ring-3 on the address 8040000 with read only attributes and the user bit, the next ones would be the read/write pages of the executable, with user attributes too. After, in the 40000000 address we would have the library libc mapped in memory in a similar way: first r/w code, and after, some read only pages. When we arrive to the linear address 0xC0000000 we enter the marvelous world of the core, where is mapped all the physical memory of your PC. If it's Pentium or higher, it will use 4M pages. So, if you have 16 megs of RAM, from the 0xC0000000 address, Linux would use 4 entries in the directory table for map those 16 megs, if it would have 32 it would use 8, etc.
This system guides us to make ourselves some questions, like, for example, what would happen if we got 1G of physical memory? In these pages lies the code of the core, aswell as the page table, and surprisingly it lacks of protection via pagination, uses r/w attributes and the user bit for mark the page, so the bad-coded modules that try to overwrite the code of the core would achieve such goal without making any protection fault :)
But that's not all, after map all the physical memory of the machine. It maps some 4Kb pages, all with system attributes, all except one, used for store the IDT (interrupt table) that is the only one with read-only attributes and the S bit, so any bad-coded module that could try to overwrite it, wouldn't achieve that, and would die by a protection fault, and the system would remain stable.
The fact that any Ring-0 process is not able to modify a read-only page is handled by the WP bit of the control register CR4. If that bit is set to 1, then all the Ring-0 processes won't be able to write in read only pages, neither user, neither kernel. If that bit is set to 0, the memory protection works like a 386 and a Ring-0 process can do whatever it wants to, being able to modify all mapped pages, no matter of their protection attributes. So, if a Linux module wants to modify the IDT, will firstly have to deactivate the WP bit of the CR4 reg for be able to write, or modify the page attributes of that page in the page table.
Because all the said, the real mechanism of protection in Linux is the segmentation, and not the pagination as it occurs in Windows NT. If we would have 4G segments, as in NT, and the pagination would be as ids, we would have free access to kernel memory, but this is not the case.
NOTE: Actual versions as 2.2.XX of the core use a protection similar to NT with 4G segments, painfully i haven't been able to look at the page table of that version,but it's a fool thing to think it remains stable
Another possibility of achieve Ring-0 in Linux consist is the call to the system call modify_ldt for generate a CALL GATE. That system call was created for make WINE to be able to emulate windows' memory system, where the user segment descriptors lies at the LDT and not in the GDT, and where it's possible to directionate all the memory with those segments. Generate a CALL GATE with modify_ldt could be possible if we were able to write to every fields of each generated entry, but that's not possible. Firstly, modify_ldt doesn't accepts as an entry an INTEL segment descriptor, it uses this pseudo structure that will be later translated to a descriptor with INTEL format inside the call:
If we see the code of the call in /usr/src/linux/arch/i386/kernel/ldt.c, this code shouws us the transformation of that structure to an INTEL descriptor:
ldt_info is the structure we have passed as a parameter,and *lp is a pointer inside the LDT where resides the segment entry we want to modify. Seeing the structure of an INTEL entry we can see the transformation:
63-54 55 54 53 52 51-48 47 46-45 44 43-40 39-16 15-0 base G D R U limit P DPL S type base limit 31-24 19-16 23-0 15-0
With the *lp we fill the 32 first bits of the entry, corresponding to the 16 first bits of the limit and the 16 first bits of the base address, and with *(lp+1) we fill the rest of the information. But after make all the operations with ldt_info, there is an OR operation with the 0x7000 constant. Passing this constant to binary we got 0111000000000000, so we know that always the generated descriptors will have the bits 44, 45 and 46 actives. Those bits correspond with the DPL and the S bit. So we could only create user segments. That doesn't matter, because the segment must be of user for allow its execution by an user, But the next bit, the S bit, has a lot of importance. The bit S is 1 when is a normal segment, and is 0 when a segment is of system like the TSS or the CALL GATES, so the generation of CALL GATES is impossible with the modify_ldt function. Modify_ldt also limits the creation of segments of limit over 0xC0000000, thing that would allow to directionate kernel's space. Modify_ldt checks the limit of the segment we want to create with the limits_OK function, and returns a boolean value as it can ve seen in this instruction. Last would be the last accessible byte by the segment, and first the first one, and the constant TASK_SIZE takes the value 0xC0000000.
If we can't write in the IDT, the GDT, the LDT, or the page table for jump to Ring-0, and the call modify_ldt is limited for the generation of CALL GATES, another possibility is to use virtual files for access kernel memory. This has a very important problem, and it's that files as /dev/mem and /dev/kmem are only accessed, by default, by the root. However, it's one of the choices more interesting for the creation of global residents under Linux. Staog is one of the few viruses for Linux that uses this method, also it doesn't wait the root to execute it, as it uses 3 different exploits for access /dev/kmem, but the exploit usages limits it's functionality to few kernel versions. The /dev/kmem allows the access of kernel memory, the first byte of that segment is the same of the first byte of kernel's segment or, what it's the same, the linear address 0xC0000000.
This code is very important, because we call to mprotect for unprotect the memory pages used by the virus. This is done for avoid the modification of the ELF file, and put the data of the virus in a data section and the code in one of code. In this way, we can put all the data of the virus in the same page, and it doesn't matter if the virus is in a code section, at the execution time, it unprotects it.
NOTE: It is only possible to execute mprotect inside the PerProcess memory.
The first it's going to try is to reserve some kernel memory for copy the virus code there, and after will modify the sys_call_table entry that corresponds to the execve for put instead it a pointer to the hooker routine of such function. For reserve memory inside the kernel, it's only possible with kernel internal calls like kmalloc. For be able to execute it, the virus overwrites the system call uname using /dev/kmem, and makes a call to uname with the int 0x80 when it before returning from the interrupt, and it would have already executed the code we used to reserve memory with kmalloc. But before all that, it needs to know uname address. For that, the virus uses the system call get_kernel_syms, with it, it can obtain a list with all the internal Linux functions, and also pointers to structures as the said sys_call_table, that is an array in memory with pointers to the accesible functions with int 0x80, like uname function.
At this point the virus knows the memory position of the sys_call_table
The base of this method of residency consists in the hook of routines in Ring-3 and that are executed by all the processes.
The code of Ring-3 that can be executed by all the processes are the libraries, in windows are the DLLs.
Windows, for example, distributes its space in 4 arenas, each arena has a different utility and has differend code and data. There is one arena dedicated to DOS that goes from the virtual address 0 to 40000000, another one dedicated to the PerProcess memory, that goes from 40000000 to 80000000, another that handles the shared memory by all the processes that goes from 80000000 to C0000000, and another dedicated to VXD, i.e. kernel's code, that is executed in Ring-0 and goes from C0000000 to FFFFFFFF.
The most important library in windows is the KERNEL32.DLL, and there are the functions of file creation, memory handling, etc. (in linux the equivalent could be the library libc).
The files, instead of execute directly TRAP GATES for make the calls to Ring-0 code, use a dynamic link mechanism for jump to library's code (Ring-3 code) that do the jump to Ring-0 for obtain the desired kernel service. Windows 95 commited a great dessign fail, and it is the fact that it loads the majority of libraries in the shared memory arena (KERNEL32 library is load at BFF70000 address). To locate the most important libraries into a shared memory arena has the advantage that the system doesn't have to load the library with each file that imports calls to that library, because it's in the process memory. This fact also makes possible the hook of system calls without the need of jump to Ring-0. Viruses like Win95.HPS and Win95.K32 use this fact for achieve global residence without jumping to Ring-0. However this is not as easy as it gets, because even if the kernel doesn't have protection by pagination, the files have protection by pagination in the code sections (for handle the try of write into the code sections). However, this could be unprotected easily using VXD calls like _pagemodifypermissions or library calls like memoryprotect.
In Linux we could try to hook functions like execve of the libc library, located from the virtual address 0x40000000. Any try of a program of write to protected pages will mean protection faults, because there is pagination prtoection in the code sections, as in the code sections of the normal executables. But the function mprotect also works with library's code, because these are located below 0xC0000000, in the PerProcess memory. Code as the one that follows allows you to unprotect pages of libraries like libc. As we saw in the introduction, the address of the getpid function of libc its loaded in the address 0x40073000 in my Linux version, so we know that it's a code section, so it would be protected againist write attempts.
Note that this program without using mprotect would generate a general protection fault. Now try to execute simultaneously 2 copies of the program. The first page would unprotect a libc page and modify the first bytes of the call to getpid putting them to 0; the second copy is stopped by gdb in the main position for test what value is in the 0x40073000 address. The value won't be 0, it would be the original value.
This is because Linux doesn't load its libraries in shared arenas, it loads them in the PerProcess memory. But if the PerProcess memory is different for each process, do the libraries get loaded with each executable, occupying unnecessary memory? The answer is NO, the solution is in the copy-on-write mechanism that allows the sharing of read/write memory pages between different processes, when these pages are in the memory of the process. When the program is load in memory, in the 0x40073000 address will be the memory page of the parent program, and if we try to write in it, the system will verify if it's a read/write or read only page. If it's read-only, the system will generate a page fault, and if it's read/write, the OS will generate a copy of that page for the child process, so when the program writes on it, it's really writing to an own page, not to the parent page. This method allows the share of libraries in memory, preserving the security, avoiding undesired attempts of global residency. Linux implements shared memory, but it's only for inteprocess communication mechanisms (IPC).
As i explained in the chapter of ELF infection, the ELF format is a very potent format, and between its important funcitonalities resides the dynamic link of functions.
The Linux executables don't usually use the int 0x80, they leave that job to libraries like libc. With the usage of libraries we earn disk space, because that code is not inserted inside the executable each time. But these libraries can be loaded in any address of the PerProcess memory. This makes necessary the existence of one mechanism that allow the call to functions in files or different libraries, this mechanism is the dynamic link.
There are 2 main sections that are there for make the dynamic link of functions. The section .plt (Procedure Linkage Table) and the section .got (Global Offset Table).
Linux's dinamic link system had advantages among all the other systems. The PE format of Windows, for example, has specific sections for the linkage such as the Import Table, in it there are as many entryes as functions imported from libraries, and that references are resolved at load-time. In Linux, however, doesn't resolve them in load-time, it waits for the first execution of a system call for resolve the reference of that function. With the first execution, the program gives the control to the dynamic linker, that is a function inside the library we want to call, then the linker resolves the refernce and puts the absolute address of the system call in a table in memory called .got, so the next functions will jump directly to the function without needing to call previously to the dynamic linker. With that,we make better the system productivity avoiding to have to resolve that memory reference that maybe the executable won't execute. If we disassemble the next executable...
We obtain the following assembler code
0x8048480 <main>: pushl %ebp 0x8048481 <main+1>: movl %esp,%ebp 0x8048483 <main+3>: call 0x8048378 <getpid> 0x8048488 <main+8>: call 0x8048378 <getpid> 0x804848d <main+13>: movl %ebp,%esp 0x804848f <main+15>: pop %ebp 0x8048490 <main+16>: ret
The calls to GETPID will be built as a jump to an entry in then .plt section, as we can see with the command "info file", the section .plt is mapped between 0x08048368 and 0x080483C8. If we continue tracing inside the .plt code we will see the following code:
0x8048378 <getpid>: jmp *0x80494e8 0x804837e <getpid+6>: push $0x0 0x8048383 <getpid+11>: jmp 0x8048368 <_init+8>
This will be the basic structure of a .plt entry. The first jmp will be a jump to the address contained in the address 0x80494E8. This address is part of the .got table, and in the load-time will have the value 0x804837E.
(gdb)x 0x80494e8 0x80494e8 <__DTOR_END__+16>: 0x0804837e
As it's the first time we call to GETPID in the executavle, this will have to make a jump to the dynamic linker for obtain the address of the function in the library. For that it makes a push 0x0, where 0x0 is the pointer inside the reloc area that specifies to the dynamic linker what's the .got entry it has to modify. After, it makes a jmp 0x8048368, where 0x8048368 is the address of the first entry of the .plt section. The first entry of the .plt is special, because it's only used for call to the dynamic linker. If we contine debugging, we'll see the structure of the first .plt entry.
0x8048368 <_init+8>: pushl 0x80494e0 0x804836e <_init+14>: jmp *0x80494e4
Firstly, it puts on stack the value 0x80494E0, that corresponds with the 2nd entry in the .got table, and after it makes a jump to the address contained in 0x80494E4 (the third entry of the .got). The 3 first entries of the .got doesn't contain pointers to the .plt at load-time, they are special entries. The first one contains a pointer to the .dynamic section, and the third one is filled with a pointer to the position of the dynamic linker.
(gdb)x 0x80494e4 0x80494e4 <__DTOR_END__+12>: 0x40004180
So if we continue tracing, we'll see the code of the dynamic linker, already in the memory space of the library. When the program returns from the system call, in the .got section corresponding to GETPID, the linker will have put the absolute address of the function. If we continue tracing, in the second call to GETPID, we could see the new value in the .got section.
(gdb)x 0x80494e8 0x80494e8 <__DTOR_END__+16>: 0x40073000
so, with the instruction jmp *0x80494E0 we will jump directly to the function without calling to the dynamic linker.
This mechanism allows the hook of system calls inside the memory of the own process, it's the denominated PerProcess residency. A virus with this mechanism can hook, for example, the execve call, modifying the .plt entry that corresponds with that call, exchanging the jmp *address_in_got by a jmp *virus_address. However, the virus, being executed in Ring-3, will have the eternal limitations in the file access, and will be only able to infect the files the user can have access to. Another limitation is that it only hooks system call in contaminated files. Clean files being executed won't have their calls hooked by the virus.
However, the possibilities of this method are really impressive, if a command interpret like bask or sh is infected, then, because they are commands executed by all users, the hook of execve in a PerProcess way could be as effective as a global residency.