Notes

Don't read the code; patch the binary.

Copying an instruction area and executing it on the fly

Calling a function means pointing one register, the instruction pointer, at the address where the function's instructions reside.

Even if you call it many times, the memory those instructions are fetched from never changes, in contrast to the memory that holds the footprint of the computation (usually called the stack or the heap).

We often pay most of our attention to the memory where the actual computation happens, because it is so dynamic: you have to keep track of everything that has been done, is being done, and is about to be done, especially when you allocate memory outside the stack.

But these dynamic operations (jumps, allocation, and deallocation) are governed by another special area, often called the code area, which is mapped somewhere in virtual memory in the same way as the other areas. The roles of the code area and the execution area seem distinct by nature: one statically tells the CPU what to do next, and the other carries it out. Of course, the execution depends on a running environment that varies constantly, such as an external file to be opened, the runtime arguments, or the environment variables, so the pattern in which the code area is accessed receives feedback from it; we call this branching. Nevertheless, the split remains: one side is read statically by the CPU, and the other follows it dynamically.

Self-modifying code departs from the picture above. Code can be constructed at run time and executed on the spot, which makes computation look far more colourful and flexible than we usually think.

Today I will introduce a very concise piece of code that is copied on the fly from the text area to wherever you like and then executed.

When you write a function,

int f1(){
   return 1;
}

it is translated to binary and placed in a chunk of memory in a pretty obedient manner. You do not choose where that binary sits; it is mapped by the kernel and tracked by the dynamic loader as part of a chain of loaded objects that stays hidden from your program. What you can learn is just where it starts:

printf("%p\n", (void*)&f1);

Then you will see the starting address.

If you want to know which bytes were emitted for this function, define another function right after it (how functions are laid out depends on the compiler) and examine the data in between.

int f1(){
   return 1;
}

void f2(){}

unsigned char* start = (unsigned char*)&f1;
unsigned char* end = (unsigned char*)&f2;
for(;start != end;start++)
     printf("0x%x ",*start);
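
To make the dump easier to compare against a disassembly, the bytes can be formatted into a string first. The following is a minimal sketch; hex_dump is a hypothetical helper name, not something from libc, and the sample bytes in the note below are the ones listed for f1 later in this post.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes len bytes starting at p into out as
   space-separated two-digit hex. Returns the number of characters
   written, or -1 if out is too small. */
int hex_dump(const void *p, size_t len, char *out, size_t out_size) {
    const unsigned char *bytes = (const unsigned char *)p;
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* conservative check: room for "xx", a separator and the final NUL */
        if (used + 4 > out_size) return -1;
        used += (size_t)snprintf(out + used, out_size - used,
                                 i + 1 < len ? "%02x " : "%02x", bytes[i]);
    }
    return (int)used;
}
```

Given the eleven bytes 0x55,0x48,0x89,0xe5,0xb8,0x01,0x00,0x00,0x00,0x5d,0xc3, it fills the buffer with "55 48 89 e5 b8 01 00 00 00 5d c3". To dump a real function you would pass (void*)&f1 and the distance to the next function, under the same layout assumption as above.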

When the kernel maps these instructions, you are usually not allowed to write arbitrary data to that memory, so a store like this one typically crashes:

*start = 1;
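
You can check the protection yourself before trying the write. This is a sketch that assumes Linux, where /proc/self/maps lists every mapping of the current process together with its permissions; mapping_perms is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, Linux only: finds the mapping that contains addr
   in /proc/self/maps and copies its permission string (e.g. "r-xp")
   into perms. Returns 0 on success, -1 if nothing was found. */
int mapping_perms(const void *addr, char perms[5]) {
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    int found = -1;
    if (!fp) return -1;
    while (fgets(line, sizeof line, fp)) {
        unsigned long lo, hi;
        char p[5];
        /* each line starts with "start-end perms", addresses in hex */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, p) != 3) continue;
        if ((uintptr_t)addr >= lo && (uintptr_t)addr < hi) {
            memcpy(perms, p, sizeof p);
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}

/* A function whose own mapping we can inspect. */
int f1(void) { return 1; }
```

Calling mapping_perms((const void*)&f1, perms) on a typical build reports an executable mapping without the write bit, which is why the store above faults.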

Instead, you can create an executable mapping, paste the instructions into it, and run them from there.

char* ret = (char*) mmap( (void*)0, 4096*1, PROT_READ|PROT_WRITE|PROT_EXEC,
                                              MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);

memcpy(ret, ptr, len_f1);
int (*_f1)() = (int (*)())(ret);
printf("%d\n",_f1());

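I asked for executable memory at the very beginning of the virtual address space, since programs are usually loaded at higher addresses. Many current systems refuse to map address 0 (on Linux this is controlled by vm.mmap_min_addr); in that case, pass NULL without MAP_FIXED and let the kernel choose the address.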

Of course, you can modify the binary as you like.

This is only a trivial example of self-modifying code, but I hope it blows your mind a little; that was my intention in writing this.

The entire code is here.


#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>

// binary (instructions) emitted for f1 on x86-64 without optimization;
// the exact bytes depend on your compiler and flags.
// 0x55 (push %rbp)
// 0x48,0x89,0xe5 (mov %rsp,%rbp)
// 0xb8,0x01,0x00,0x00,0x00 (mov $1,%eax: immediate assignment to %eax)
// 0x5d (pop %rbp)
// 0xc3 (ret)
// these bytes are mapped into memory when the program is loaded.
// if you modify any of them, the behavior changes.
int f1() {
  return 1;
}

// binary (instructions) emitted for f2.
// 0x55 (push %rbp)
// 0x48,0x89,0xe5 (mov %rsp,%rbp)
// 0x89,0x7d,0xfc (mov %edi,-0x4(%rbp): first argument handling)
// 0x89,0x75,0xf8 (mov %esi,-0x8(%rbp): second argument handling)
// 0x8b,0x55,0xfc (mov -0x4(%rbp),%edx)
// 0x8b,0x45,0xf8 (mov -0x8(%rbp),%eax)
// 0x01,0xd0 (add %edx,%eax)
// 0x5d (pop %rbp)
// 0xc3 (ret)
int f2(int a, int b) {
  return a+b;
}

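// main compiles to a lot of binary.
int main(int argc, char **argv, char **envp) {

  // stop when the marker set near the end of main is found.
  // compare contents with strcmp; == would only compare the pointers.
  for (char **e = envp; *e; e++) {
    if (strcmp(*e, "DONE") == 0)
      exit(1);
  }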

  char* ptr;
  // grab the head address of the area where the instruction code is packed.
  ptr = (char*)&f1;
  
  // allocate memory that is executable and not yet mapped.
  // MAP_FIXED with address 0 asks for the lowest page; many systems refuse that
  // (on Linux, see vm.mmap_min_addr). Dropping MAP_FIXED lets the kernel pick.
  char* ret = (char*) mmap( (void*)0, 4096*1, PROT_READ|PROT_WRITE|PROT_EXEC,
                MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  if (ret == MAP_FAILED) {
    perror("mmap");
    return 1;
  }

  // get the length of f1's code from the difference between the next
  // function's address and its own (assumes the compiler places f2 right after f1).
  const size_t len_f1 = (size_t)((char*)&f2 - (char*)&f1);

  // copy the instructions of f1 from the area mapped by the loader
  // to the executable memory.
  memcpy(ret, ptr, len_f1);

  // execute the copy of f1 and print what it returns.
  int (*_f1)() = (int (*)())(ret);
  printf("%d\n",_f1());

  // advance the pointer to the next function.
  ptr += len_f1;

  // main follows f2, so its address marks the end of f2.
  const size_t len_f2 = (size_t)((char*)&main - (char*)&f2);
  memcpy(ret, ptr, len_f2);

  // declare a function pointer to the copy of f2.
  int (*_f2)(int a1, int a2) = (int (*)(int,int))(ret);

  // dump the bytes of f2.
  for (size_t i = 0; i < len_f2; i++) {
    printf("0x%x\n", (unsigned char)ptr[i]);
  }

  // call the copy of f2 and print the result.
  printf("%d\n",_f2(1,2));

  // step pointer up again.
  ptr += len_f2;

  // this time we use strcpy, because no function follows main and so the
  // end of its binary is not obvious (there are ways to find the end of the
  // program segment, e.g. by asking the dynamic loader).
  // note: strcpy stops at the first 0x00 byte, so only the leading bytes
  // of main are actually copied here.
  strcpy(ret, (char*)&main);
  // ptr now equals &main, so this pointer refers to the original main, not the copy.
  // a copy of main would generally not run from another address anyway, because
  // its calls to printf and exit are typically encoded relative to where the code sits.
  int (*_main)(int argc, char **argv, char **envp) = (int (*)(int ,char** ,char**))(ptr);

  // prevent the program from looping endlessly by setting a fake environment variable.
  envp[0] = "DONE";
  // execute main again; it finds "DONE" and exits.
  printf("%d\n",_main(0,argv,envp));
  
}