I am writing a minimal kernel and want to implement the clone syscall. The man page declares clone (the glibc wrapper) like this:
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
          /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );
As you can see, it takes a function pointer. Reading the man page more closely, however, the raw syscall implemented by the kernel does not take a function pointer:
long clone(unsigned long flags, void *stack,
           int *parent_tid, int *child_tid,
           unsigned long tls);
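To make the difference concrete, the raw syscall can be issued directly, without any function pointer. The following is a minimal sketch, assuming x86-64 Linux with glibc's syscall() helper; raw_clone_demo is a hypothetical name introduced only for illustration. With flags set to just SIGCHLD and a NULL stack, the call behaves much like fork(): the child resumes at the instruction after the syscall with a return value of 0.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): issue the raw clone
 * syscall with no new stack and no function pointer, then report the
 * child's exit status. Returns -1 on any failure. */
int raw_clone_demo(void)
{
    /* Raw x86-64 argument order: flags, stack, parent_tid, child_tid, tls.
     * A NULL stack means the child keeps using a copy of the parent's stack. */
    long ret = syscall(SYS_clone, (unsigned long)SIGCHLD, NULL, NULL, NULL, 0UL);
    if (ret < 0)
        return -1;

    if (ret == 0) {
        /* Child: execution continued right after the syscall, in the same
         * code as the parent, distinguished only by the return value 0. */
        _exit(7);
    }

    /* Parent: ret is the child's PID. */
    int status = 0;
    if (waitpid((pid_t)ret, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling raw_clone_demo() from main should return 7, the exit status chosen by the child.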
So my question is: who sets the RIP register so that the new thread starts executing fn after it is created? Is it libc?
I found this code in glibc: https://elixir.bootlin.com/glibc/latest/source/sysdeps/unix/sysv/linux/x86_64/clone.S but I am not sure at what point the function is actually called.
Extra information:
Looking at the clone.S source, you can see that it jumps to a thread_start label after the syscall. On that branch, which only the child takes, it pops the function address and its argument from the stack. Who actually pushed the function address and the argument onto the stack? My guess is that it has to happen somewhere in the kernel, because at the point of the syscall instruction they were not on the stack.
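A minimal sketch of a program that exercises this path through the glibc wrapper, assuming x86-64 Linux. The names wrapper_clone_demo, seen_len and CHILD_STACK_SIZE are illustrative and not taken from the original test program, and the flags are simplified to CLONE_VM | SIGCHLD rather than the 0x3d0f00 seen in the trace.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* illustrative size */

/* Written by the child; visible to the parent because of CLONE_VM. */
static volatile int seen_len;

/* Entry point handed to the glibc wrapper; runs only in the child. */
static int do_something(void *arg)
{
    seen_len = (int)strlen((const char *)arg);
    return 42;   /* the wrapper uses this as the child's exit status */
}

/* Hypothetical helper: clone a child onto a fresh stack, wait for it,
 * and return its exit status (or -1 on failure). */
int wrapper_clone_demo(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on x86-64, so pass the top of the buffer. */
    pid_t pid = clone(do_something, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, (void *)"test");
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);   /* safe: the child has exited or waitpid failed */
    if (waited < 0 || !WIFEXITED(status))
        return -1;

    /* The child saw the 4-character argument "test". */
    if (seen_len != 4)
        return -1;
    return WEXITSTATUS(status);
}
```

wrapper_clone_demo() should return 42, the value returned by do_something, since the man page states that the integer returned by fn becomes the child's exit status. Breaking on the syscall instruction inside clone in gdb while running such a program gives a view like the one below.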
Here is some gdb output:
Right before the syscall:
[-------------------------------------code-------------------------------------]
0x7ffff7d8af22 <clone+34>: mov r8,r9
0x7ffff7d8af25 <clone+37>: mov r10,QWORD PTR [rsp+0x8]
0x7ffff7d8af2a <clone+42>: mov eax,0x38
=> 0x7ffff7d8af2f <clone+47>: syscall
0x7ffff7d8af31 <clone+49>: test rax,rax
0x7ffff7d8af34 <clone+52>: jl 0x7ffff7d8af49 <clone+73>
0x7ffff7d8af36 <clone+54>: je 0x7ffff7d8af39 <clone+57>
0x7ffff7d8af38 <clone+56>: ret
Guessed arguments:
arg[0]: 0x3d0f00
arg[1]: 0x7ffff8020b60 --> 0x7ffff7d3fb30 (<do_something>: push rbx)
arg[2]: 0x7fffffffda90 --> 0x0
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda78 --> 0x7ffff7d3f52c (<main+172>: pop rsi)
0008| 0x7fffffffda80 --> 0x7fffffffda94 --> 0x73658b0000000000
0016| 0x7fffffffda88 --> 0x7fffffffda94 --> 0x73658b0000000000
0024| 0x7fffffffda90 --> 0x0
0032| 0x7fffffffda98 --> 0x492e085573658b00
0040| 0x7fffffffdaa0 --> 0x7ffff7d3f0d0 (<_init>: sub rsp,0x8)
0048| 0x7fffffffdaa8 --> 0x7ffff7d40830 (<__libc_csu_init>: push r15)
0056| 0x7fffffffdab0 --> 0x7ffff7d408d0 (<__libc_csu_fini>: push rbp)
[------------------------------------------------------------------------------]
Right after the syscall instruction in the child thread (note the top of the stack; the parent thread's stack does not look like this):
[-------------------------------------code-------------------------------------]
0x7ffff7d8af25 <clone+37>: mov r10,QWORD PTR [rsp+0x8]
0x7ffff7d8af2a <clone+42>: mov eax,0x38
0x7ffff7d8af2f <clone+47>: syscall
=> 0x7ffff7d8af31 <clone+49>: test rax,rax
0x7ffff7d8af34 <clone+52>: jl 0x7ffff7d8af49 <clone+73>
0x7ffff7d8af36 <clone+54>: je 0x7ffff7d8af39 <clone+57>
0x7ffff7d8af38 <clone+56>: ret
0x7ffff7d8af39 <clone+57>: xor ebp,ebp
[------------------------------------stack-------------------------------------]
0000| 0x7ffff8020b60 --> 0x7ffff7d3fb30 (<do_something>: push rbx)
0008| 0x7ffff8020b68 --> 0x7ffff7dd5add --> 0x4c414d0074736574 ('test')
0016| 0x7ffff8020b70 --> 0x0
0024| 0x7ffff8020b78 --> 0x411
0032| 0x7ffff8020b80 ("Parameters: 0x7ffff7d3fb30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94\n")
0040| 0x7ffff8020b88 ("rs: 0x7ffff7d3fb30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94\n")
0048| 0x7ffff8020b90 ("fff7d3fb30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94\n")
0056| 0x7ffff8020b98 ("30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94\n")
[------------------------------------------------------------------------------]