How an Exploit Works
Take any exploit downloaded from the internet that promises you an easy root shell on a remote machine, and examine its source code. Find the most unintelligible piece of the code; it will be there, for sure. Most probably, you will find a several lines of strange and unrelated symbols; something like this:
char shellcode =
This is shellcode, also sometimes referred to as “bytecode.” Its content is not a magic word or random symbols. This is a set of low-level machine commands, the same as are in an executable file. This example shellcode opens port 4444 on a local Linux box and ties a Bourne shell to it with root privileges. With a shellcode, you can also reboot a system, send a file to an email, etc. The main task for an exploit program is therefore to make this shellcode work.
Take, for example, a widely known error-buffer overflow. Developers often check data that has been received as input for functions. A simple example: the developer creates a dynamic array, allocates for it 100 bytes, and does not control the real number of elements. All elements that are out of the bounds of this array will be put into a stack, and a so-called buffer overflow will occur. An exploit’s task is to overflow a buffer and, after that, change the return address of system execution to the address of the shellcode. If a shellcode can get control, it will be executed. It’s pretty simple.
As I already said, this article is not a guide for writing exploits. There are many repositories with existing shellcodes (shellcode.org, Metasploit); however, it is not always enough. A shellcode is a low-level sequence of machine commands closely tied to a dedicated processor architecture and operating system. This is why understanding how it works can help prevent intrusions into your environment.
What Is It For?
To follow along, I expect you to have at least minimal assembly knowledge. As a platform for experiments, I chose Linux with a 32-bit x86 processor. Most exploits are intended for Unix services; therefore, they are of most interest. You need several additional tools: Netwide Assembler (nasm), ndisasm, and hexdump. Most Linux distributions include these by default.
The Process of Building
Shellcode stubs are usually written in assembler; however, it is easier to explain how one works by building it in C and then rewriting the same code in assembly. This is C code for appending a user into /etc/passwd:
char *filename = “/etc/passwd”;
char *line = “hacker:x:0:0::/:/bin/sh\n”;
f_open = open(filename,O_WRONLY|O_APPEND);
write(f_open, line, strlen(line));
All of the code is pretty simple, except maybe the open() function. The constant O_WRONLY|O_APPEND given as a parameter opens the file fact for writing and appends the new data to the end of the file.
Here is a more usable example: executing a Bourne shell:
name = “/bin/sh”;
name = NULL;
The setreuid(0,0) call attempts to obtain root privileges (if it is possible). execve(const char filename,const char argv, const char envp) is a main system call that executes any binary file or script. It has three parameters: filename is a full path to an executable file, argv is an array of arguments, and envp is an array of strings in the format key=value. Both arrays must end with a NULL element.
Now consider how to rewrite the C code given in the first example in assembly. x86 assembly executes system calls with help of a special system interrupt that reads the number of the function from the EAX register and then executes the corresponding function. The function codes are in the file /usr/include/asm/unistd.h. For example, a line in this file, #define __NR_ open 5, means that the function open() has the identification number 5. In a similar way, you can find all other function codes: exit() is 1, close() is 6, setreuid() is 70, and execve() is 11. This knowledge is enough to write a simple working application. The /etc/passwd amendment application code in assembly is:
filename db ‘/etc/passwd’, 0
line db ‘hacker:x:0:0::/:/bin/sh’,0x0a
mov eax, 5
mov ebx, filename
mov ecx, 1025
mov ebx, eax
; write(f_open, line, 24)
mov eax, 4
mov ecx, line
mov edx, 24
mov eax, 6
mov eax, 1
mov ebx, 0
It’s a well-known fact that an assembly program consists of three segments: the data segment, which contains variables; the code segment containing code instructions; and a stack segment, which provides a special memory area for storing data. This example uses only data and code segments. The operators section .data and section .text mark their beginnings. A data segment contains the declaration of two char variables: name and line, consisting of a set of bytes (see the db mark in the definition).
The code segment starts from a declaration of an entry point, global _start. This tells the system that the application code starts at the _start label.
The next steps are easy; to call open(), set the EAX register to the appropriate function code: 5. After that, pass parameters for the function. The most simple way of passing parameters is to use the registers EBX, ECX, and EDX. EBX gets the first function parameter, the address of the beginning of the filename string variable, which contains a full path to a file and a finishing zero char (most system functions operating with strings demand a trailing null). The ECX register gets the second parameter, giving information about file open mode (a constant O_WRONLY|O_APPEND in a numeric format). With all of the parameters set, the code calls interrupt 0x80. It will read the function code from EAX and calls an appropriate function. After completing the call, the application will continue, calling write(), close(), and exit() in exactly the same way.