I am familiar with two basic strategies for structuring the function prologue/epilogue:
1. "Regular" functions: Move the stack pointer to the end of the stack frame (`sub rsp, n`), do the actual work, then move the stack pointer back (`add rsp, n`) and `ret`. (If the body uses many registers, there may additionally be some pushing and popping here.)
2. "Leaf" functions: Same as (1), but don't move the stack pointer, saving two instructions.
With strategy 2, you can't call functions inside the body unless you first move the stack pointer to where it is supposed to be, which defeats the savings. That is why it's usually only used for leaf functions.
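To make the difference concrete, here is a minimal sketch of both strategies. It assumes NASM syntax and the System V AMD64 ABI, where a leaf function may use the 128-byte red zone below `rsp` without adjusting it (this does not hold on Windows x64 or in kernel code). The names `regular_fn` and `leaf_fn` are made up for illustration, and both functions return `edi + 1` through a stack slot purely to show the frame handling.

```nasm
section .text
global regular_fn
global leaf_fn

; Strategy 1: reserve a frame, use it, release it.
regular_fn:
    sub     rsp, 24             ; reserve the frame; 24 leaves rsp 16-byte aligned
                                ; in case the body wants to call something
    mov     [rsp + 8], edi      ; spill the argument into the frame
    mov     eax, [rsp + 8]      ; reload it
    add     eax, 1
    add     rsp, 24             ; release the frame
    ret

; Strategy 2: same work, but rsp is never moved.
leaf_fn:
    mov     [rsp - 8], edi      ; scratch slot in the red zone below rsp
    mov     eax, [rsp - 8]
    add     eax, 1
    ret                         ; no sub/add pair: two instructions saved
```

If `leaf_fn` executed a `call`, the pushed return address would land in the memory just below `rsp`, which is exactly where its scratch slot lives.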
But it occurs to me that there is a third strategy one could use:
3. "Stackless" functions: Use `mov rsi, AFTER; jmp FUNCTION; AFTER:` for the call sequence, and in the function just `jmp rsi` at the end.
In this method, we ignore the stack pointer completely, so we have no stack space, but for a small function that might be doable. It also requires a custom calling convention, but compilers can do that if they want to for internal functions.
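Here is roughly what I have in mind, as a small standalone sketch for x86-64 Linux in NASM syntax. The function name `add_one` and the register assignment (argument in `edi`, result in `eax`, return address in `rsi`) are my own made-up convention, and I use the RIP-relative `lea` form of `mov rsi, AFTER` so that it also works in position-independent code.

```nasm
; Build (assumed toolchain):
;   nasm -f elf64 stackless.asm && ld -o stackless stackless.o
section .text
global _start

; Hypothetical custom convention: argument in edi, result in eax,
; return address in rsi. The body never touches rsp or memory.
add_one:
    lea     eax, [rdi + 1]      ; eax = edi + 1
    jmp     rsi                 ; "return": indirect jump to the saved address

_start:
    mov     edi, 41
    lea     rsi, [rel .after]   ; RIP-relative form of `mov rsi, AFTER`
    jmp     add_one             ; "call": plain direct jump, nothing is pushed
.after:
    mov     edi, eax            ; use the result (42) as the exit status
    mov     eax, 60             ; Linux x86-64 exit syscall number
    syscall
```

Running the binary and then `echo $?` should print 42, and at no point is anything written to or read from the stack.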
Since it pairs a jump with a jump, it doesn't touch the return-address stack, so the branch predictor should not be thrown off (although the indirect jump at the end might be slower than a return), and there is no overhead for the memory write incurred by `call`. Additionally, stackless functions can call other stackless functions (although not too deeply, since you eventually run out of registers in which to store the return addresses, and there is a global optimization problem in ensuring that if A calls B, they use different return registers).
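A sketch of the nested case, under the same made-up convention as before (NASM syntax; `outer_fn` and `inner_fn` are hypothetical names). The point is that `outer_fn` returns via `rsi` and therefore has to hand `inner_fn` a different return register, here `rdx`, which `inner_fn` in turn must not clobber.

```nasm
section .text

; Returns via rdx. Computes eax = 2 * eax.
inner_fn:
    add     eax, eax
    jmp     rdx

; Returns via rsi, so rsi must stay live across the inner "call".
; Computes eax = 2 * edi + 1.
outer_fn:
    mov     eax, edi
    lea     rdx, [rel .back]    ; return address for inner_fn goes in rdx
    jmp     inner_fn
.back:
    add     eax, 1
    jmp     rsi                 ; back to whoever loaded rsi before jumping here
```

Each extra level of nesting ties up one more register for the whole duration of the inner calls, which is where the register-allocation problem comes from.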
My question is: why don't compilers use method (3) more? AFAICT it doesn't ever show up when looking at functions compiled by gcc or clang. Is there a reason this calling convention is not usable?