Notes

Do not read code; patch the binary.

DLL injection with just ntdll

If you want to step back to the days before high-level languages prevailed, yet still write something practical in this era, a piece of injection code is a good target.

Windows lets you read or write the memory of another process (ReadProcessMemory/WriteProcessMemory) and create a thread that runs code inside another process (CreateRemoteThread).

If you want to bring a remote process under your control, injecting your own DLL into it is one of the easiest ways. LoadLibrary and GetProcAddress in kernel32/kernelbase handle this nicely.

If you can assume that kernel32.dll is mapped in the remote process, and at the same virtual address as in your own process, calling them is a piece of cake. But if neither condition holds, how do you deal with it?

The answer is that you can call LdrLoadDll (and, optionally, LdrGetProcedureAddress) in ntdll instead, inside the remote process. But how do you call them from scratch once your thread is running in that process?
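Unlike LoadLibraryA, LdrLoadDll takes the DLL name as a UNICODE_STRING, so injected code has to build one by hand. Below is a minimal, portable sketch of that step. It assumes the usual UNICODE_STRING layout (two 16-bit byte counts followed by a pointer to UTF-16 code units) and an ASCII-only name; `UnicodeStr` and `make_unicode_str` are hypothetical names, and the LdrLoadDll prototype in the comment is the commonly cited one for this undocumented export, not an official signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed UNICODE_STRING layout: Length and MaximumLength count BYTES,
   not characters, and Length excludes the terminating NUL. */
typedef struct {
    uint16_t  Length;
    uint16_t  MaximumLength;
    uint16_t *Buffer;   /* UTF-16 code units (WCHAR on Windows) */
} UnicodeStr;           /* hypothetical stand-in for UNICODE_STRING */

/* Commonly cited (undocumented) prototype, for context only:
   NTSTATUS LdrLoadDll(PWSTR DllPath, PULONG DllCharacteristics,
                       PUNICODE_STRING DllName, PVOID *DllHandle);   */

/* Widen an ASCII name into buf (capacity in code units) and describe it.
   Returns 0 on success, -1 if the name plus its NUL does not fit. */
int make_unicode_str(UnicodeStr *us, uint16_t *buf, size_t cap, const char *ascii)
{
    size_t n = 0;
    assert(us != NULL && buf != NULL && ascii != NULL);
    if (cap == 0) return -1;
    while (ascii[n] != '\0') {
        if (n + 1 >= cap) return -1;          /* keep room for the NUL */
        buf[n] = (uint16_t)(unsigned char)ascii[n];
        n++;
    }
    buf[n] = 0;
    us->Length        = (uint16_t)(n * sizeof(uint16_t));
    us->MaximumLength = (uint16_t)((n + 1) * sizeof(uint16_t));
    us->Buffer        = buf;
    return 0;
}
```

Both length fields are in bytes, so an 11-character name such as "payload.dll" gives Length 22 and MaximumLength 24.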

To get this done, you need to keep the following two things in mind.

1. Search for a module handle and a function address in it
2. Call a function in a pre-built DLL properly

If you don't want to read any further and just want the code, take a look here (https://github.com/Hiroshi123/random).

1. Search for a module handle and a function address in it

When I investigated a famous post-exploitation tool named mimikatz (https://github.com/gentilkiwi/mimikatz), I learned that its core functionality for LSA dumping (sekurlsa) is built on reading the memory of a remote process (ReadProcessMemory).

Roughly, the dumping is done as follows.

  1. Read the PEB (process environment block)

    • OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION | PROCESS_VM_READ, ...)
    • NtQueryInformationProcess(ProcessBasicInformation)
      gives you the base address of the PEB.
  2. Read PEB_LDR_DATA

  3. Enumerate the loaded DLL modules

  4. Reach each image base and read the image information

lsass.exe loads lots of DLLs, and some of them are security support providers that keep credential information in memory. After reaching the mapped image sections, mimikatz extracts credentials using structure layouts it has examined beforehand.

Part of this scheme is useful for finding a loaded module handle and a function address from an ANSI-string query. Namely, you first access the PEB, whose address you get from NtQueryInformationProcess, then go to the image base of each mapped DLL (in this case ntdll), and then go to its image export directory to find the function address.

Note that unless you are specifically unable to write memory and create a thread in the remote process, searching through ReadProcessMemory is not preferred, because it performs worse. The calls pile up: every time you dereference another pointer in the remote process, you have to issue another call.

Creating one or more remote threads gets things done faster.
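To make that cost concrete, here is a toy model, not real Windows code. A byte array stands in for the remote address space, and a hypothetical `remote_read` plays the role of ReadProcessMemory while counting its calls. Walking a circular Flink list of N entries costs N + 1 calls, because the next address is only known after the previous read returns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy "remote" address space: addresses are offsets into this array. */
static uint8_t remote_mem[4096];
static unsigned read_calls;   /* number of ReadProcessMemory-like calls issued */

/* Hypothetical stand-in for ReadProcessMemory: copy n bytes out of the
   "remote" space. In real life each call is a separate system call. */
static int remote_read(uint64_t addr, void *out, size_t n)
{
    read_calls++;
    if (addr > sizeof remote_mem || n > sizeof remote_mem - addr) return 0;
    memcpy(out, remote_mem + addr, n);
    return 1;
}

/* Setup helper: store a 64-bit "remote pointer" at addr. */
static void remote_put64(uint64_t addr, uint64_t v)
{
    assert(addr + 8 <= sizeof remote_mem);
    memcpy(remote_mem + addr, &v, 8);
}

/* Count the entries of a circular list whose head is at `head`.
   Every hop needs its own remote_read. */
static unsigned count_entries(uint64_t head)
{
    uint64_t cur;
    unsigned n = 0;
    if (!remote_read(head, &cur, 8)) return 0;   /* head->Flink */
    while (cur != head) {
        n++;
        if (!remote_read(cur, &cur, 8)) break;   /* entry->Flink */
    }
    return n;
}
```

A thread running inside the target would do the same walk with plain loads and zero system calls.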

Below is the function _get_ntdll_faddr_1, which serves this purpose: given a string query that should match a function name, it searches ntdll for the address of that function.

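extern uint8_t* _get_ntdll_faddr_1(const char* name); // assumed prototype for the assembly routine
uint8_t* v = _get_ntdll_faddr_1("LdrLoadDll");
printf("function address : %p\n", (void*)v);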

You pass the ASCII name of a function in ntdll as the query, and it returns the function's address if it exists. You might think this is almost identical to GetProcAddress. Indeed it is; the difference is that you can run it from a thread in a remote process where you don't know where GetProcAddress is.

Let's dig a little deeper into its implementation.

It is written in assembly and never touches static data, so that it still works when executed by a thread in a remote process.

Here is the function call tree.


That is,

  1. Grab the module handle of ntdll (_get_ntdll_handle)
  2. Get the function address that corresponds to the given query (_get_faddr_from_module_handle)
    • Reach the image export directory from the module handle (_get_export_entry_from_handle)
    • Get the function address by name, combining the tables of the image export directory (_get_faddr_by_name)
      • Find the matching function name in the export name table and get its index (_get_findex_by_name)
      • Get the function address through the name ordinal table and the address table (_get_faddr_by_index)
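
_get_ntdll_handle:
    ;; mov rax, [fs:0x30] for 32bit (the other offsets differ on 32bit too)
    ;;; 1
    mov rax,[gs:0x60]       ;; PEB
    ;;; 2
    mov rax,[rax+0x18]      ;; PEB->Ldr (PEB_LDR_DATA*)
    ;;; 3
    mov rax,[rax+0x10]      ;; Ldr->InLoadOrderModuleList.Flink: entry of the exe
    ;;; 4
    mov rax,[rax+0x10]      ;; exe entry->InMemoryOrderLinks.Flink: next entry + 0x10
    ;;; 5
    mov rax,[rax+0x20]      ;; (entry + 0x10) + 0x20 = entry + 0x30 = DllBase
    ret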

Note that this code is for 64-bit only.

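  1. Read gs:[0x60] to get the PEB.

  2. Read PEB+0x18 to get the pointer to PEB_LDR_DATA.

  3. Read PEB_LDR_DATA+0x10 (InLoadOrderModuleList.Flink) to get the first loaded-module entry, which is the exe image itself.

  4. Skip the exe: read its entry's +0x10 (InMemoryOrderLinks.Flink), which points 0x10 bytes into the next module's entry.

  5. Read +0x20 from there, i.e. entry+0x30 (DllBase), to grab the image base of that DLL. The module that comes right after the exe is always ntdll.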

Fairly simple! It uses only one register and under 30 bytes of code, with no extra heap allocation.
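The same pointer chase can be checked in C over a fake in-process layout. This is only a sketch for verifying the offsets on a 64-bit host: `get_ntdll_base`, `rd64` and `wr64` are hypothetical helpers, and the test fills plain byte buffers at the x64 offsets the assembly relies on (Ldr at PEB+0x18, InLoadOrderModuleList at +0x10, InMemoryOrderLinks at entry+0x10, DllBase at entry+0x30). On real Windows the starting value would come from gs:[0x60] instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(void *) == 8, "64-bit host assumed");

/* Read a pointer-sized value at base+off (alignment-safe). */
static uint64_t rd64(uint64_t base, unsigned off)
{
    uint64_t v;
    memcpy(&v, (const void *)(uintptr_t)(base + off), sizeof v);
    return v;
}

/* Setup helper: write a pointer-sized value at base+off. */
static void wr64(void *base, unsigned off, uint64_t v)
{
    memcpy((uint8_t *)base + off, &v, sizeof v);
}

/* Mirrors _get_ntdll_handle step by step, starting from a PEB address. */
static uint64_t get_ntdll_base(uint64_t peb)
{
    assert(peb != 0);
    uint64_t p = rd64(peb, 0x18);  /* 2: PEB->Ldr */
    p = rd64(p, 0x10);             /* 3: InLoadOrderModuleList.Flink = exe entry */
    p = rd64(p, 0x10);             /* 4: exe InMemoryOrderLinks.Flink = next entry + 0x10 */
    return rd64(p, 0x20);          /* 5: next entry + 0x30 = DllBase */
}
```

Step 4 is the subtle one: the pointer lands in the middle of the next entry, which is why step 5 adds only 0x20 to reach DllBase.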

Once you have the ntdll image base, there is no longer any risk of missing the function address.

Go to the image export directory,

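_get_export_entry_from_handle:
    ;; in: rcx = module base, out: rax = IMAGE_EXPORT_DIRECTORY*
    ;; uses rdx (volatile) instead of rbx, which callers expect preserved
    mov rax,rcx
    mov edx,[rax+0x3c]      ;; e_lfanew (32-bit load zero-extends into rdx)
    add edx,0x88            ;; 0x18 (OptionalHeader) + 0x70 (DataDirectory[0], PE32+)
    mov edx,[rax+rdx]       ;; export directory RVA
    add rax,rdx             ;; RVA -> VA
    ret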

and get the function address for the given function name by iterating through the export name table -> name ordinal table -> address table.
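For reference, the whole lookup can be sketched in C, working on any mapped PE32+ image through RVAs. This is not the repository's code: `export_dir_rva` and `find_export_rva` are hypothetical names, the sketch assumes the standard IMAGE_EXPORT_DIRECTORY offsets (NumberOfNames at +0x18, AddressOfFunctions at +0x1C, AddressOfNames at +0x20, AddressOfNameOrdinals at +0x24), and forwarded exports are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *img, uint32_t off) { uint32_t v; memcpy(&v, img + off, 4); return v; }
static uint16_t rd16(const uint8_t *img, uint32_t off) { uint16_t v; memcpy(&v, img + off, 2); return v; }

/* Same arithmetic as _get_export_entry_from_handle: the export RVA
   sits at e_lfanew + 0x18 + 0x70 in a PE32+ image. */
static uint32_t export_dir_rva(const uint8_t *img)
{
    uint32_t nt = rd32(img, 0x3c);
    return rd32(img, nt + 0x88);
}

/* Name table -> ordinal table -> function table.
   Returns the function RVA, or 0 if the name is not exported. */
static uint32_t find_export_rva(const uint8_t *img, const char *name)
{
    assert(img != NULL && name != NULL);
    uint32_t ed     = export_dir_rva(img);
    uint32_t nnames = rd32(img, ed + 0x18);
    uint32_t funcs  = rd32(img, ed + 0x1c);
    uint32_t names  = rd32(img, ed + 0x20);
    uint32_t ords   = rd32(img, ed + 0x24);

    for (uint32_t i = 0; i < nnames; i++) {
        const char *n = (const char *)img + rd32(img, names + 4 * i);
        if (strcmp(n, name) == 0) {
            uint16_t idx = rd16(img, ords + 2 * i);  /* index into funcs */
            return rd32(img, funcs + 4 * idx);
        }
    }
    return 0;
}
```

Adding the returned RVA to the module base gives the callable address. The export name table is sorted, so a binary search would also work; a linear scan just keeps the sketch short.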

2. Call a function in a pre-built DLL properly

Your assembly can by no means stand on its own, because the pre-built code it calls was generated by a compiler that follows a set of rules called a calling convention. Since most of that code is built with Microsoft's toolchain, you need to understand the Microsoft x64 calling convention (even when you invoke a system call directly).

When you call a function in a pre-built DLL or .sys, what the caller has to prepare depends on how many arguments the function takes.

  • No args
    With no arguments, things are easy. Just put the callee's address in a register and call it.
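call_no_args:
        push rbp        ;; realign: the call into us left rsp 8 bytes off a 16-byte boundary
        sub rsp,0x20    ;; shadow space, required even for a zero-argument callee
        call rax        ;; rax holds the address of the callee
        add rsp,0x20
        pop rbp
        ret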

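You don't strictly need to push rbp, but rsp must be 16-byte aligned at every call instruction. The call that entered this function pushed an 8-byte return address, leaving rsp 8 bytes off a 16-byte boundary, and the push makes up the remaining 8 bytes. Strictly speaking, the Microsoft x64 convention also requires the caller to reserve the 0x20-byte shadow space even when the callee takes no arguments.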

  • Up to 4 args
    With up to 4 arguments, it is a bit more complicated.

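Put the 1st argument in rcx, the 2nd in rdx, the 3rd in r8, and the 4th in r9 (this applies to integer and pointer arguments; floating-point ones go in xmm0-xmm3).
Keep in mind that you still need to allocate 0x20 bytes on the stack even though all the arguments are passed in registers. These 0x20 bytes are called shadow space (or home space), and they belong to the callee.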

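_call64_less_than4_arg:
    push rbp            ;; align rsp to 16 bytes
    mov r9,[_arg4]      ;; 4th arg
    mov r8,[_arg3]      ;; 3rd arg
    mov rdx,[_arg2]     ;; 2nd arg
    mov rcx,[_arg1]     ;; 1st arg
    sub rsp,0x20        ;; shadow space for the callee
    call [_f_addr]
    mov [_ret],rax      ;; the return value comes back in rax
    add rsp,0x20
    pop rbp
    ret
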
  • 5 or more args
    By convention, the 5th and later arguments are passed not in registers but on the stack.

They must be stored before the shadow space is allocated, so that they end up directly above it. Treat the shadow space as belonging to the callee, which may overwrite it freely. The argument slots above it are set up by the caller, but don't assume their contents survive the call either, since a compiled callee may reuse them like any other parameter.

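In terms of stack allocation, if there are 5 args treat it as 6, and if 7, as 8, so that rsp stays 16-byte aligned.
That means an odd number of stack arguments always needs an extra blank 8-byte slot. This padding goes above the arguments (it is stored first), because the 5th argument must sit directly above the shadow space, at [rsp+0x20] when the call is made.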
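The padding rule fits in a tiny helper. This is a sketch assuming, as in the routines here, that rsp is 16-byte aligned right after `push rbp` and before anything else is reserved. `call_frame_bytes` and `stack_arg_offset` are hypothetical names: the first returns how many bytes to reserve for stack arguments, padding and shadow space; the second gives where argument k must sit relative to rsp at the call instruction.

```c
#include <assert.h>

/* Bytes to reserve after `push rbp` for a call with nargs integer args:
   one 8-byte slot per stack arg (args 5..n), rounded up to an even count
   so rsp stays 16-byte aligned, plus 0x20 bytes of shadow space. */
static unsigned call_frame_bytes(unsigned nargs)
{
    unsigned stack_args = nargs > 4 ? nargs - 4 : 0;
    unsigned slots = (stack_args + 1) & ~1u;   /* 1 -> 2, 3 -> 4 */
    return slots * 8 + 0x20;
}

/* Offset from rsp (at the call) of argument k, 1-based, k >= 5.
   Args 1-4 travel in rcx, rdx, r8, r9; their home slots are rsp+0..0x18. */
static unsigned stack_arg_offset(unsigned k)
{
    assert(k >= 5);
    return 0x20 + (k - 5) * 8;
}
```

For six arguments this gives 0x30: the 0x20 shadow space plus two argument slots, all of which must be released after the call.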

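The 6-argument case:

_call64_6_arg:
    push rbp                ;; rsp is now 16-byte aligned
    sub rsp,0x8
    mov rax,[_arg6]
    mov [rsp],rax           ;; 6th arg goes to the higher address
    sub rsp,0x8
    mov rax,[_arg5]
    mov [rsp],rax           ;; 5th arg directly above the shadow space
    mov r9,[_arg4]
    mov r8,[_arg3]
    mov rdx,[_arg2]
    mov rcx,[_arg1]
    sub rsp,0x20            ;; shadow space: arg5 at [rsp+0x20], arg6 at [rsp+0x28]
    call [_f_addr]
    mov [_ret],rax
    add rsp,0x30            ;; release shadow space + the two argument slots
    pop rbp
    ret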

The number of arguments can grow as long as the remaining stack lasts. To generalise this, the stack allocation for the extra arguments should be done in a loop. Last but not least, use only volatile registers (rax, rcx, rdx, r8-r11) in such a stub, because callers assume that the non-volatile ones (rbx, rbp, rdi, rsi, rsp, r12-r15) keep the values they had before the call.
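Before writing that loop in assembly, it can be modelled in C on a simulated stack. This is a hypothetical model, not the code in call64.asm: `SimStack` is an array standing in for stack memory with a `rsp` field, and `setup_call` performs what a general calling stub would do, in order. It pads if the number of stack arguments is odd, pushes arguments n..5 in reverse in a loop, and reserves the shadow space. Registers would receive args 1-4, so they are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 64
#define STACK_TOP   0x10000u                      /* aligned rsp right after push rbp */
#define STACK_LOW   (STACK_TOP - 8u * STACK_SLOTS)

typedef struct {
    uint64_t rsp;
    uint64_t mem[STACK_SLOTS];                    /* mem[i] = qword at STACK_LOW + 8*i */
} SimStack;                                       /* hypothetical model of the stack */

static void sim_push(SimStack *s, uint64_t v)
{
    s->rsp -= 8;
    assert(s->rsp >= STACK_LOW);
    s->mem[(s->rsp - STACK_LOW) / 8] = v;
}

/* Qword at [rsp+off]. */
static uint64_t sim_at(const SimStack *s, uint64_t off)
{
    return s->mem[(s->rsp + off - STACK_LOW) / 8];
}

/* Lay out args[0..n-1] on the simulated stack the way a general stub would. */
static void setup_call(SimStack *s, const uint64_t *args, unsigned n)
{
    s->rsp = STACK_TOP;
    unsigned stack_args = n > 4 ? n - 4 : 0;
    if (stack_args & 1)
        sim_push(s, 0);                           /* padding goes ABOVE the args */
    for (unsigned i = n; i > 4; i--)
        sim_push(s, args[i - 1]);                 /* arg n first, arg 5 last */
    s->rsp -= 0x20;                               /* shadow space for rcx/rdx/r8/r9 */
    assert(s->rsp >= STACK_LOW);
}
```

With seven arguments, the 5th, 6th and 7th end up at [rsp+0x20], [rsp+0x28] and [rsp+0x30], the padding at [rsp+0x38], and rsp is 16-byte aligned when the call would be made.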

The code is here (https://github.com/Hiroshi123/random/blob/master/call64.asm).