If you want to step back to the days before high-level languages prevailed, yet still want to write something practical in this era, a piece of injection code is a good target.
Windows lets you read and write the memory of another process (ReadProcessMemory / WriteProcessMemory) and start a thread that runs code in another process (CreateRemoteThread).
If you want to bring a remote process under your control, injecting your own DLL into it is one of the easiest ways.
LoadLibrary and GetProcAddress in kernel32/kernelbase handle this nicely.
If you can assume that kernel32.dll is mapped in the remote process, and at the same virtual address as in your own process, calling them is a piece of cake. But if those assumptions do not hold for a process, how do you deal with it?
The answer is that you can call LdrLoadDll (and, optionally, LdrGetProcedureAddress) in ntdll in the remote process instead. But how can you call them entirely from scratch after starting a thread in that process?
To get the work done, you need to keep the following two things in mind.
1. Search a module handle & a function address on it
2. Call a function on pre-built dll properly.
If you do not want to read any further and just want the code, take a look here (https://github.com/Hiroshi123/random).
1. Search a module handle & a function address on it
When I investigated a famous post-exploitation tool named mimikatz (https://github.com/gentilkiwi/mimikatz), I learned that its essential functionality for LSA dumping (sekurlsa) is built on reading the memory of a remote process (ReadProcessMemory).
Roughly, the dumping is done as follows.
- Read the PEB (process environment block)
  - OpenProcess(query_limited,,)
  - NtQueryInformationProcess(ProcessBasicInformation), which gets you the base address of the PEB
- Read PEB_LDR_DATA
- Enumerate the loaded DLL modules
- Reach each image base and read the image information
lsass.exe loads a lot of DLLs, and some of them are security support providers that keep hidden credential information in memory. After reaching the mapped image section, mimikatz extracts the credentials based on information examined in advance.
This scheme is partially useful for finding a loaded module handle and a function address given an ANSI-string query.
Namely, you first access the PEB, whose address you obtain from NtQueryInformationProcess, then go to the image base of each mapped DLL (in this case, ntdll), and then go to the image export directory to find the function address.
Note that unless you are in a situation where you are specifically not allowed to write to the remote process and start a thread in it, searching through ReadProcessMemory is not favoured.
This is because performance is worse this way. The calls pile up: every time you come across another pointer dereference in the remote process, you have to issue another call for it.
Running one or more remote threads gets things done faster.
Below is the function _get_ntdll_faddr_1, which serves this purpose: it looks up the address of a function in ntdll given a string query that might match the function name.
uint8_t* v = _get_ntdll_faddr_1("LdrLoadDll");
printf("function address : %p\n", v);
You provide the ASCII name of a function in ntdll as a query, and it returns the address if the function exists.
You might think this is almost identical to GetProcAddress. Indeed it is; the difference is that you can execute it in a remote process where you do not know where GetProcAddress is.
Let's dig a little into its implementation.
It is written in assembly, and it is not allowed to access any static memory area, so that it tolerates being executed by a thread in a remote process.
Here is a function tree.
That is,
- Grab the module handle of ntdll (_get_ntdll_handle)
- Get a function address which corresponds to a given query (_get_faddr_from_module_handle)
  - Reach an image export directory from a module handle on an image (_get_export_entry_from_handle)
  - Get a function address by name, combining the tables in an image export directory (_get_faddr_by_name)
    - Find the matching function name in the export name table and get its index (_get_findex_by_name)
    - Get the function address by accessing both the name ordinal table and the address table (_get_faddr_by_index)
_get_ntdll_handle:
    ;; on 32-bit the PEB is at [fs:0x30] instead (mov eax,[fs:0x30])
    ;;; 1
    mov rax,[gs:0x60]
    ;;; 2
    mov rax,[rax+0x18]
    ;;; 3
    mov rax,[rax+0x10]
    ;;; 4
    mov rax,[rax+0x10]
    ;;; 5
    mov rax,[rax+0x20]
    ret
Note that this code is for 64-bit only. The numbered steps are:
1. Read gs:[0x60] to get the PEB.
2. Get PEB_LDR_DATA.
3. Get the head of the load-order module list.
4. Go to the next entry, as the first one is the image of the exe itself.
5. Grab the image base of the DLL (the one that comes right after the exe image is always ntdll).
Fairly simple! It uses only one register, takes roughly 30 bytes of code, and needs no heap allocation.
Once you have the ntdll image base, you no longer need to worry about missing the function address.
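To make the five loads easier to follow, here is a minimal C sketch that replays the same offsets against a fake process built in local memory. The struct names (peb64, peb_ldr_data64, ldr_entry64) and the helpers second_module_base and demo_walk are hypothetical, and the structures are cut down to the fields the walk touches, assuming the commonly documented 64-bit layouts. One thing the sketch makes visible: with those layouts, the load in step 4 at offset 0x10 follows the memory-order link of the first entry, so it lands 0x10 bytes into the next entry, which is why step 5 uses 0x20 rather than 0x30 to reach DllBase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the 64-bit loader structures, cut down to the
   fields the walk touches. Pointers are kept as uint64_t so that the
   offsets are the same on any host. */
typedef struct { uint64_t Flink, Blink; } list_entry64;

typedef struct {
    uint8_t  reserved[0x18];
    uint64_t Ldr;                              /* 0x18 */
} peb64;

typedef struct {
    uint8_t      reserved[0x10];
    list_entry64 InLoadOrderModuleList;        /* 0x10 */
} peb_ldr_data64;

typedef struct {
    list_entry64 InLoadOrderLinks;             /* 0x00 */
    list_entry64 InMemoryOrderLinks;           /* 0x10 */
    list_entry64 InInitializationOrderLinks;   /* 0x20 */
    uint64_t     DllBase;                      /* 0x30 */
} ldr_entry64;

_Static_assert(offsetof(peb64, Ldr) == 0x18, "PEB.Ldr offset");
_Static_assert(offsetof(peb_ldr_data64, InLoadOrderModuleList) == 0x10, "list head offset");
_Static_assert(offsetof(ldr_entry64, DllBase) == 0x30, "DllBase offset");

static uint64_t as_u64(const void *p) { return (uint64_t)(uintptr_t)p; }

/* Equivalent of `mov rax,[rax+offset]`. */
static uint64_t load64(uint64_t addr, uint64_t offset) {
    uint64_t v;
    memcpy(&v, (const void *)(uintptr_t)(addr + offset), sizeof v);
    return v;
}

/* Steps 2 to 5 of the assembly; step 1 (reading gs:[0x60]) is replaced by
   passing the PEB address in. */
static uint64_t second_module_base(uint64_t peb) {
    assert(peb != 0);
    uint64_t p = load64(peb, 0x18);  /* 2: PEB.Ldr                               */
    p = load64(p, 0x10);             /* 3: first entry in load order (the exe)   */
    p = load64(p, 0x10);             /* 4: its InMemoryOrderLinks.Flink, which   */
                                     /*    points 0x10 bytes into the next entry */
    return load64(p, 0x20);          /* 5: 0x10 + 0x20 = 0x30, i.e. DllBase      */
}

/* Builds a fake process with two modules and returns what the walk finds. */
static uint64_t demo_walk(uint64_t exe_base, uint64_t dll_base) {
    peb64 peb;
    peb_ldr_data64 ldr;
    ldr_entry64 exe, dll;
    memset(&peb, 0, sizeof peb);
    memset(&ldr, 0, sizeof ldr);
    memset(&exe, 0, sizeof exe);
    memset(&dll, 0, sizeof dll);

    peb.Ldr = as_u64(&ldr);
    ldr.InLoadOrderModuleList.Flink = as_u64(&exe);
    exe.InLoadOrderLinks.Flink = as_u64(&dll);
    exe.InMemoryOrderLinks.Flink = as_u64(&dll.InMemoryOrderLinks);
    exe.DllBase = exe_base;
    dll.DllBase = dll_base;
    return second_module_base(as_u64(&peb));
}
```

Calling demo_walk with two distinct base values returns the second one, which corresponds to the ntdll slot in a real process.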
Next, go to the image export directory,
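_get_export_entry_from_handle:
    mov rax,rcx            ;; rcx = module handle (image base)
    mov rbx,0              ;; note: rbx is non-volatile in the Microsoft x64 convention
    mov ebx,[rax+0x3c]     ;; e_lfanew: offset of the NT headers
    add ebx,0x88           ;; 0x18 (optional header) + 0x70 (export entry of the data directory, PE32+)
    mov ebx,[rax+rbx]      ;; RVA of the export directory
    add rax,rbx            ;; image base + RVA
    ret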
and get the function address for the provided function name, iterating through the image export name table -> ordinal table -> function table.
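The name -> ordinal -> function chain can be sketched in C against a tiny synthetic image, so that it runs without a real DLL. The struct export_dir follows the field order of IMAGE_EXPORT_DIRECTORY; the helpers (export_dir_rva, find_export_rva, build_demo_image) and the RVAs used in the fake image are hypothetical, chosen only for illustration. A linear strcmp scan is used here for brevity; this is an assumption, not necessarily how the assembly in the repository searches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {                 /* same field order as IMAGE_EXPORT_DIRECTORY */
    uint32_t Characteristics, TimeDateStamp;
    uint16_t MajorVersion, MinorVersion;
    uint32_t Name, Base, NumberOfFunctions, NumberOfNames;
    uint32_t AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals;
} export_dir;

static uint32_t rd32(const uint8_t *image, uint32_t rva) {
    uint32_t v; memcpy(&v, image + rva, sizeof v); return v;
}
static uint16_t rd16(const uint8_t *image, uint32_t rva) {
    uint16_t v; memcpy(&v, image + rva, sizeof v); return v;
}
static void wr32(uint8_t *image, uint32_t rva, uint32_t v) { memcpy(image + rva, &v, sizeof v); }
static void wr16(uint8_t *image, uint32_t rva, uint16_t v) { memcpy(image + rva, &v, sizeof v); }

/* Mirrors _get_export_entry_from_handle: e_lfanew at 0x3c, then +0x88 (PE32+). */
static uint32_t export_dir_rva(const uint8_t *image) {
    return rd32(image, rd32(image, 0x3c) + 0x88);
}

/* Returns the RVA of the named export, or 0 if the name is not found. */
static uint32_t find_export_rva(const uint8_t *image, const char *name) {
    export_dir d;
    memcpy(&d, image + export_dir_rva(image), sizeof d);
    for (uint32_t i = 0; i < d.NumberOfNames; i++) {
        uint32_t name_rva = rd32(image, d.AddressOfNames + 4 * i);
        if (strcmp((const char *)image + name_rva, name) == 0) {
            /* The name index selects an entry of the ordinal table ... */
            uint16_t ordinal = rd16(image, d.AddressOfNameOrdinals + 2 * i);
            /* ... and that ordinal indexes the function address table. */
            return rd32(image, d.AddressOfFunctions + 4 * ordinal);
        }
    }
    return 0;
}

/* Fills a 0x200-byte buffer with a fake image exporting two names.
   All RVAs below are made up for the demo. */
static void build_demo_image(uint8_t *image) {
    export_dir d;
    assert(sizeof d == 40);
    memset(image, 0, 0x200);
    memset(&d, 0, sizeof d);
    wr32(image, 0x3c, 0x40);             /* e_lfanew                      */
    wr32(image, 0x40 + 0x88, 0x100);     /* export directory RVA          */
    d.NumberOfFunctions = 2;
    d.NumberOfNames = 2;
    d.AddressOfNames = 0x140;
    d.AddressOfNameOrdinals = 0x150;
    d.AddressOfFunctions = 0x160;
    memcpy(image + 0x100, &d, sizeof d);
    strcpy((char *)image + 0x180, "LdrGetProcedureAddress");
    strcpy((char *)image + 0x1a0, "LdrLoadDll");
    wr32(image, 0x140, 0x180);           /* name 0                        */
    wr32(image, 0x144, 0x1a0);           /* name 1                        */
    wr16(image, 0x150, 1);               /* name 0 -> function index 1    */
    wr16(image, 0x152, 0);               /* name 1 -> function index 0    */
    wr32(image, 0x160, 0x2000);          /* function 0: LdrLoadDll        */
    wr32(image, 0x164, 0x1000);          /* function 1: LdrGetProcedureAddress */
}
```

The demo deliberately stores the ordinals in a different order from the names, so a lookup that skipped the ordinal table would return the wrong entry. In a real process the returned RVA still has to be added to the image base to obtain the function address.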
2. Call function on pre-built dll properly.
Your assembly can by no means be self-contained, because the pre-built code it calls was generated by a compiler that follows a set of rules called a calling convention. Since most of that code is assumed to be built with Microsoft's toolchain, you need to understand the Microsoft calling convention (even when you invoke a system call directly).
When you call a function in a pre-built DLL or .sys, what the caller has to prepare depends on how many arguments are required.
- No args
With no arguments things are easy. Just supply the function address at the point where you want to jump.
call_no_args:
    push rbp
    sub rsp,0x20   ;; shadow space is required even when the callee takes no arguments
    call rax       ;; where rax is the address of the callee
    add rsp,0x20
    pop rbp
    ret
Saving rbp is not required in itself, but rsp must always be 0x10-aligned at the moment of the call.
The call that entered this function already moved rsp down by 8 bytes for the return address, and the push compensates for the remaining 8 bytes.
- Up to 4 args
With up to 4 arguments, things are a bit more complicated.
Put the 1st in rcx, the 2nd in rdx, the 3rd in r8, and the 4th in r9.
Do keep in mind that you still need to allocate 0x20 bytes on the stack even though all of the arguments are passed via registers.
These 0x20 bytes are called shadow space, and the callee is free to use them.
_call64_less_than4_arg:
    push rbp
    mov r9,[_arg4]
    mov r8,[_arg3]
    mov rdx,[_arg2]
    mov rcx,[_arg1]
    sub rsp,0x20
    call [_f_addr]
    mov [_ret],rax
    add rsp,0x20
    pop rbp
    ret
- 5 or more args
By convention, the remaining arguments are passed not via registers but on the stack.
They must be placed on the stack before the shadow space is allocated, so that the 5th argument sits directly above it. Treat the shadow space as belonging to the callee and the stack argument slots as belonging to the caller: the callee may overwrite values in the shadow space, but it is not expected to change the argument values (the memory they point to may change, though).
In terms of stack allocation, if the number of args is 5, treat it as 6; if it is 7, treat it as 8.
That means an odd number of args always carries an additional blank 8-byte slot, placed above the last argument so that rsp stays 0x10-aligned.
The 6-argument case:
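_call64_6_arg:
    push rbp
    sub rsp,0x8
    mov rax,[_arg6]
    mov [rsp],rax      ;; store the 6th argument on the stack
    sub rsp,0x8
    mov rax,[_arg5]
    mov [rsp],rax      ;; store the 5th argument on the stack
    mov r9,[_arg4]
    mov r8,[_arg3]
    mov rdx,[_arg2]
    mov rcx,[_arg1]
    sub rsp,0x20
    call [_f_addr]
    mov [_ret],rax
    add rsp,0x30       ;; release shadow space (0x20) and the two stack arguments (0x10)
    pop rbp
    ret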
The argument list can be as long as the remaining stack allows. To generalise, the stack allocation for the additional arguments should be done in a loop. Last but not least, use only volatile registers, because other functions assume that the non-volatile ones still hold the values they had before the call.
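The stack arithmetic for an arbitrary number of arguments can be written down as a short C sketch. The helper names (call_area_bytes, stack_arg_offset, place_stack_args) are hypothetical, and the sketch assumes that rsp is already 0x10-aligned before the area is reserved, as it is after the push rbp in the routines above. It only models the layout in an ordinary array; it does not perform a call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_ARGS = 4, SLOT = 8, SHADOW_SPACE = 0x20 };

/* Bytes to subtract from an already 0x10-aligned rsp before `call`:
   shadow space plus one slot per stack argument, rounded up to 0x10.
   The rounding is the "treat 5 as 6, 7 as 8" rule. */
static size_t call_area_bytes(size_t nargs) {
    size_t stack_args = nargs > REG_ARGS ? nargs - REG_ARGS : 0;
    size_t bytes = SHADOW_SPACE + stack_args * SLOT;
    return (bytes + 15) & ~(size_t)15;
}

/* Offset from rsp, at the moment of the call, of the n-th argument
   (1-based, n >= 5). */
static size_t stack_arg_offset(size_t n) {
    assert(n > REG_ARGS);
    return SHADOW_SPACE + (n - REG_ARGS - 1) * SLOT;
}

/* The loop mentioned above: `area` stands for the memory starting at rsp,
   viewed as 8-byte slots. Slots 0..3 are the shadow space and are left
   alone; argument i (0-based) lands in slot i. */
static void place_stack_args(uint64_t *area, const uint64_t *args, size_t nargs) {
    for (size_t i = REG_ARGS; i < nargs; i++)
        area[i] = args[i];
}
```

For 6 arguments this gives 0x30 bytes, matching the 0x20 of shadow space plus the two 8-byte slots in the 6-argument routine; for 5 arguments it gives 0x30 as well, with the top slot left blank.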
The code is here (https://github.com/Hiroshi123/random/blob/master/call64.asm).