
If I have a non-inline function and the C++ compiler knows that this function modifies some registers, then the compiler will save all necessary registers before the function CALL.

At least, I expect the compiler to do this saving, insofar as it knows which registers will be modified inside the called function.

Now imagine that my function modifies ALL possible CPU registers (general-purpose, SIMD, FPU, etc.). How can I force the compiler to save everything it needs before any CALL to this function? As a reminder, my function is non-inline, i.e. it is called through a CALL instruction.

Of course, using asm I can push all possible registers onto the stack at the start of my function and pop them back before it returns.

Although I can save ALL possible registers, I would prefer that the compiler save only the necessary registers, i.e. those actually used by the function's caller, for performance (speed) and memory-usage reasons.

Inside my function I don't know in advance who will call it, so I have to save every possible register. But at the call site the compiler knows exactly which registers the caller is using, so it may need to save far fewer of them, because certainly not all registers will be in use.

Hence I want to mark my function as "modifying all registers", so that the C++ compiler pushes onto the stack only the registers it needs before calling my function.

Is there any way to do this? Is there a GCC/Clang/MSVC function attribute for it? Or maybe listing all registers in the clobber section of an asm statement?

The main point is that I don't want to save registers myself inside this function (for a specific reason). Instead, I want every caller to save whatever registers it needs before calling my function, with all callers aware that my function modifies everything possible.

I'm looking for some imaginary "modifies-all" attribute, like:

__attribute__((modifies_all_registers)) void f();

I did the following experiment:


__attribute__((noinline)) int modify(int i) {
    asm volatile(
        ""
        : "+m" (i) ::
        "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
        "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
        "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
        "ymm0", "ymm1", "ymm2", "ymm3", "ymm4", "ymm5", "ymm6","ymm7",
        "ymm8", "ymm9", "ymm10", "ymm11", "ymm12", "ymm13", "ymm14", "ymm15",
        "zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5", "zmm6", "zmm7",
        "zmm8", "zmm9", "zmm10", "zmm11", "zmm12", "zmm13", "zmm14", "zmm15"
    );
    return i + 1;
}

int main(int argc, char ** argv) {
    auto volatile x = modify(argc);
}

In other words, I asm-clobbered almost all possible registers, and the compiler generated the following push sequence inside modify() (and the matching pop sequence at the end):

        push    rbp
        mov     rbp, rsp
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx

Nothing else was pushed, so apparently the compiler (Clang) didn't care about any registers other than rbx, rbp and r12-r15. Does it mean that there is some C++ calling convention that says that I can modify any other registers besides these few, without restoring them on function return?

Peter Cordes
Arty
  • It is compiler specific. Recent GCC can accept your [GCC plugin](https://gcc.gnu.org/onlinedocs/gccint/Plugins.html). Look at how `longjmp` is implemented.... – Basile Starynkevitch Dec 02 '21 at 17:50
  • Don't trust the compiler. Save the registers before you modify them, and restore before your function returns. – Thomas Matthews Dec 02 '21 at 17:57
  • @ThomasMatthews That was the main idea: to force the compiler somehow to do the saving outside my function, instead of me saving 30-40 registers. It is obvious that only 5-7 registers are used in the caller's code, so it is much better, both performance- and memory-wise, to save those 5-7 instead of 30-40. I just want to force the compiler to do this saving. The main point of my question is that I can't afford to save these 30-40 registers; I will do so only if no other solution is possible. – Arty Dec 02 '21 at 17:59
  • @Arty: The compiler will only save registers, before the function call, that it knows will be modified. Usually these registers are for the passed parameters. If you look at the assembly language for functions, the compiler will also save registers before the first statement is executed. Thus the compiler is saving some registers before the invocation and some registers before the first statement. So your code should save the registers that you are modifying in your function. The compiler, when calling your function, has no idea what registers will be used inside your function. – Thomas Matthews Dec 02 '21 at 18:05
  • The compiler has a calling convention regarding registers. Some registers are assumed to be general purpose and won't be saved. This may also include FPU and SIMD registers. You should research the compiler's calling convention. In assembly language, rules of safety say to save registers you will modify and restore them before function ends. – Thomas Matthews Dec 02 '21 at 18:08
  • @ThomasMatthews You're saying `the compiler has no idea what registers will be used`; finding out how to tell the compiler that I modify ALL registers, so that it does have this idea, was the main purpose of my question. For example, there could exist some imaginary function attribute like `__attribute__((modifies_all_registers)) void f();`. To remind you, I can't afford to save 40-50 registers in my code, so I need a workaround that places this task on the caller. – Arty Dec 02 '21 at 18:12
  • You may want to ask yourself if you *really* need to save all those registers at once. Since you have limitations about saving registers, maybe only use a couple of registers. So, for example, the ARM has 16 registers. Instead of using all 16 at once, only use 4 at a time. I can use 4 of the other registers as "save" registers. This is all compiler specific and you'll need to check out the compiler's documentation *as well as the processor's instruction set*, to see if you really need to save all those registers at once. – Thomas Matthews Dec 02 '21 at 18:18
  • Lastly, if your code doesn't have room to save all these registers, what makes you think the compiler does? – Thomas Matthews Dec 02 '21 at 18:21
  • @ThomasMatthews I don't have room for all possible 40-50 registers, but I'm sure that the callers of my function will use, on average, only 5-7 of these 40-50 registers. So they could save just 5-7 instead of me saving 40-50. By "on average" I mean that I need good memory savings here on average. I understand that some functions may use all 50 registers, but it is enough for me if on average only 5-7 are saved to the stack. And actually I need both memory savings and speed: pushing fewer registers will definitely save not only memory but also time. – Arty Dec 02 '21 at 18:27
  • @ThomasMatthews Take a look at the end of my question: I just updated it with some experimental code, where I use an asm clobber list to figure out which registers (according to the C++ convention) really need to be saved and which do not. – Arty Dec 02 '21 at 18:30
  • The FP and SIMD registers already are call-clobbered in x86-64 System V. (Windows x64 has some call-preserved XMM regs, but not the high halves of any YMM or ZMM.) MXCSR / x87 control word are normally assumed to still have the same rounding mode and exception masks, and EFLAGS is assumed to have DF=0 on call / ret, but other than that they're also call-clobbered. – Peter Cordes Dec 03 '21 at 06:49
  • And BTW, modern compilers won't literally save/restore regs around calls; it will spill if it couldn't find a call-preserved reg for a value but usually not reload into the same reg. That's why [terms like "call clobbered" are much clearer than confusing nonsense like "caller-saved"](https://stackoverflow.com/questions/9268586/what-are-callee-and-caller-saved-registers/56178078#56178078) – Peter Cordes Dec 03 '21 at 06:51
  • @PeterCordes Maybe there exists some standard C/C++ library function that saves on the stack everything that needs to be preserved? There are dozens of different CPUs (Intel, ARM, MIPS...), and all of them have their own registers. So it would be great to have a standard function like `save_all_regs_on_stack_needed_by_current_cpu_convention()` and a matching function for restoring from the stack. In other words, a ready-made function that takes into account the current C++ conventions and the current CPU model, and saves only the registers (and flags) they require, and nothing else. – Arty Dec 03 '21 at 09:27
  • @PeterCordes One small off-topic question: can you tell me how to convert (expand) an 8/16-bit run-time mask like `0b.....101` to bytes like `.... 11111111 00000000 11111111`? I need SIMD solutions for SSE/SSE2/AVX/AVX2, for both 1 bit -> 8 bits and 1 bit -> 16 bits. In other words, expand each bit of the mask to either 8 or 16 equal bits; both expansions are needed. If possible, the mask should be provided at run time; if not, a compile-time mask value is alright. – Arty Dec 03 '21 at 09:33
  • Huh? A standard library function like that makes no sense. It's a function so it would itself follow the calling convention. It's not something that would make sense to call in C++ source code. I think some compilers e.g. for microcontrollers might have an option to call a helper block of code (not a real function) to save or restore all the call-preserved regs (instead of emitting multiple instructions to save them in each function that needs some/all to save a little bit of code size), but most calling conventions for most ISAs don't have a huge number of call-preserved regs. – Peter Cordes Dec 03 '21 at 10:06
  • Re: bit-expansion: [is there an inverse instruction to the movemask instruction in intel avx2?](https://stackoverflow.com/q/36488675) has various links to different versions of element size. – Peter Cordes Dec 03 '21 at 10:07
  • @PeterCordes Am I correct that for current C++ and Intel x64 CPUs inside my NON-inline function I have to preserve only `RBX, RBP, R12-R15` and nothing else? Not any single XMM/YMM/ZMM or FPU registers should be saved? Also no CPU flags should be saved? If I'm not fully correct then can you tell what else should be saved besides those mentioned registers, according to current C++ and Intel x64 CPU conventions? – Arty Dec 03 '21 at 10:24
  • That's correct for x86-64 System V (non-Windows). Windows x64 also has call-preserved RDI and RSI, and some XMM regs. Note that neither the C++ committee nor Intel or AMD has anything to do with defining those calling conventions; they're *software* conventions. Microsoft designed the Windows x64 convention; GCC devs designed x86-64 SysV. [Why does Windows64 use a different calling convention from all other OSes on x86-64?](https://stackoverflow.com/a/35619528) (Many other ISAs only have a single widely-used calling convention, often published by the vendor. Not so for x86.) – Peter Cordes Dec 03 '21 at 10:58
  • @PeterCordes With this question I wanted to find a special attribute like `__attribute__((modifies_all_registers)) void f();` that automatically makes the C++ compiler apply whatever calling convention is specific to the current OS/CPU/Compiler_version/C++_standard, so that I don't have to figure out dozens of possible convention combinations manually. This attribute would tell the compiler "automatically do whatever is correct for your current implementation on this OS/CPU, so that I don't have to care about this myself". – Arty Dec 03 '21 at 11:47
  • AFAIK, there isn't such an option; there are options like `-fcall-used-rbx` that modify the ABI, but I don't know if you can apply them on a per-function basis with anything like `__attribute__((target("-fcall-used-rbx")))`. Probably not. You can use GCC's x86 attribute `__attribute__((sysv_abi))` to always use that calling convention for a function even if you're on Windows. Hand-written asm is always specific to an ISA. – Peter Cordes Dec 03 '21 at 11:53

1 Answer


Does it mean that there is some C++ calling convention that says that I can modify any other registers besides these few, without restoring them on function return?

Yes. Among other things, the ABI specification used on a given platform defines the function calling conventions. A calling convention defines a set of registers that a function is allowed to clobber and a set of registers that a function is required to preserve. If registers of the former set hold data the caller still needs, the caller is expected to save that data before the call. If the called function needs to use registers from the latter set, it must save them and restore them before returning.
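As an illustration, here is a small C++ sketch that encodes the two x86-64 conventions discussed in the comments as lookup tables: x86-64 System V (Linux, macOS and other non-Windows systems) and Windows x64. The names `Abi`, `kSysVPreserved`, `kWin64Preserved` and `is_call_preserved` are hypothetical, made up for this example, and only general-purpose and XMM registers are modeled; x87/MXCSR control state and flags are left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical helper types for illustration only.
enum class Abi { SysV, Win64 };

// x86-64 System V: a function that touches any of these must restore it
// before returning. Every other GPR and all XMM/YMM/ZMM registers are
// call-clobbered, so a caller must not expect them to survive a call.
constexpr std::array<std::string_view, 7> kSysVPreserved = {
    "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"};

// Windows x64 additionally preserves rdi, rsi and the low 128 bits of
// xmm6-xmm15 (the upper halves of the ymm/zmm registers are still clobbered).
constexpr std::array<std::string_view, 19> kWin64Preserved = {
    "rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15",
    "xmm6", "xmm7", "xmm8", "xmm9", "xmm10",
    "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"};

// Returns true if the callee must preserve `reg` under the given ABI.
bool is_call_preserved(Abi abi, std::string_view reg) {
    assert(!reg.empty());
    if (abi == Abi::SysV)
        return std::find(kSysVPreserved.begin(), kSysVPreserved.end(), reg) !=
               kSysVPreserved.end();
    return std::find(kWin64Preserved.begin(), kWin64Preserved.end(), reg) !=
           kWin64Preserved.end();
}
```

Under System V this matches the experiment in the question: modify() pushed exactly rbx, rbp and r12-r15, the call-preserved registers that its asm statement claimed to clobber.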

There are also conventions regarding which registers, and in what order, are used to pass arguments to the function and to receive its return value. You can consider those registers clobbered, since the caller must initialize them with the argument values (and thus save any useful data that was in them before the call) and the callee is allowed to modify them.
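A minimal sketch of the argument-passing part for x86-64 System V, assuming integer or pointer arguments only (floating-point arguments go in xmm0-xmm7 instead). `kSysVIntArgs`, `kSysVIntReturn` and `sysv_int_arg_register` are hypothetical names for illustration. All six argument registers, and rax, which carries an integer return value, are call-clobbered under this ABI.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// x86-64 System V: the first six integer/pointer arguments, in order.
constexpr std::array<std::string_view, 6> kSysVIntArgs = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Integer/pointer return values come back in rax.
constexpr std::string_view kSysVIntReturn = "rax";

// Register carrying integer argument number `index` (0-based).
// The seventh and later integer arguments are passed on the stack instead.
std::string_view sysv_int_arg_register(std::size_t index) {
    assert(index < kSysVIntArgs.size() && "7th+ integer args go on the stack");
    return kSysVIntArgs[index];
}
```

In the question's experiment, `modify(int i)` therefore receives `i` in edi (the low half of rdi) and returns its result in eax.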

In your case, the asm statement marks all registers as clobbered, and the compiler only saves and restores the registers that modify() is required to preserve across a call. Note that, by default, a caller must assume that a call modifies every register in the clobbered set, whether the function actually modifies it or not, so any value the caller still needs after the call is either spilled to memory or kept in a call-preserved register. In some cases, the optimizer may avoid this for registers that are not actually modified: for example, if the call is inlined or the compiler is able to analyze the function body (e.g. with LTO). However, if the function body is not known at compile time, the compiler must assume the worst and adhere to the ABI specification.
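The following sketch, assuming GCC or Clang on x86-64 (GNU extended asm syntax), isolates the callee side of this rule: the asm statement clobbers only rbx and r12, which are call-preserved under both System V and Windows x64, so the compiler has to save and restore them inside `bump_after_clobber` itself, and the caller's live variable `sum` survives every call with no special marking. Both function names are hypothetical, and on other targets the asm is simply compiled out.

```cpp
#include <cassert>

// Assumes GCC/Clang. On x86-64 the asm zeroes rbx and r12 and declares them
// clobbered, so the compiler emits save/restore code for them in this
// function's prologue/epilogue; callers need to do nothing special.
__attribute__((noinline)) int bump_after_clobber(int i) {
#if defined(__x86_64__)
    asm volatile(
        "xorl %%ebx, %%ebx\n\t"
        "xorl %%r12d, %%r12d"
        : "+r"(i)      // i is kept in some register other than rbx/r12
        :
        : "rbx", "r12");
#endif
    return i + 1;
}

// `sum` is live across every call; it stays intact because the callee
// restores the call-preserved registers it clobbered.
int sum_of_bumps(int n) {
    int sum = 0;
    for (int k = 0; k < n; ++k) {
        int r = bump_after_clobber(k);
        assert(r == k + 1);
        sum += r;
    }
    return sum;
}
```

Compiling this with `-O2 -S` should show push/pop of rbx and r12 inside `bump_after_clobber` and no extra saving around the call in `sum_of_bumps`, which is the same pattern the question's experiment produced.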

So, in general, you do not need to mark the function in any special way: the ABI rules already ensure that registers are saved and restored as needed. And, as you witnessed yourself, even with asm statements the compiler is able to tell which registers a function uses. If you still want to save specific registers, or all of them, for some reason, your only option is to write that code in assembler. Or, if you're implementing some sort of context switching, use specialized instructions like XSAVE/XRSTOR or APIs like ucontext.
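For the context-switching case, here is a minimal sketch with the ucontext API, assuming Linux with glibc (POSIX.1-2008 removed these functions from the standard, but glibc still provides them). `swapcontext` saves the current register state into one `ucontext_t` and loads another, so the task below runs on its own stack in two slices. The names `task` and `run_two_slices` and the 64 KiB stack size are illustrative choices, not anything from the original post.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

namespace {
ucontext_t main_ctx;  // saved state of the code that drives the task
ucontext_t task_ctx;  // saved state of the task
int steps = 0;        // how many slices of the task have run

void task() {
    ++steps;                            // first slice, on the task's own stack
    swapcontext(&task_ctx, &main_ctx);  // save task registers, resume caller
    ++steps;                            // second slice
    // Returning here continues at uc_link, i.e. main_ctx.
}
}  // namespace

int run_two_slices() {
    std::vector<char> stack(64 * 1024);
    int rc = getcontext(&task_ctx);
    assert(rc == 0);
    (void)rc;
    task_ctx.uc_stack.ss_sp = stack.data();
    task_ctx.uc_stack.ss_size = stack.size();
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);  // run the first slice
    swapcontext(&main_ctx, &task_ctx);  // run the second slice; task returns
    return steps;
}
```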

Andrey Semashev