Why don't callees use caller-saved registers first?

Question

We know that under the x86-64 calling convention, registers %rbx, %rbp, and %r12–%r15 are classified as callee-saved, while %r10 and %r11 are caller-saved. But when I compile C code in which, e.g., function P calls Q, in most cases I see the following assembly for function Q:

Q:
   pushq %rbx          # save the caller's %rbx
   movq %rdx, %rbx     # keep the third argument in %rbx
   ...
   popq %rbx           # restore the caller's %rbx
   ret

Since %rbx is a callee-saved register, Q must save it on the stack and restore it for the caller P before returning.

But wouldn't it be more concise, and avoid the stack operations, to use a caller-saved register such as %r10 instead:

Q:
   movq %rdx, %r10
   ...
   ret

That way the callee doesn't need to worry about saving and restoring the register for the caller, because the caller has already pushed it to the stack before calling the callee?

it's not the architecture that dictates who and what registers are saved. It's the calling convention. — bolov, Jul 30 '20 at 01:53
please provide a complete example including the high level language and the compiled output since the code that is generated is related to the whole function. — old_timer, Jul 30 '20 at 02:25

Peter Cordes · Accepted Answer · 2020-07-30T03:00:50.747


You seem to be mixed up about what "caller-saved" means. I think this bad choice of terminology has fooled you into thinking that compilers actually will save those registers in the caller around function calls. That would usually be slower (see: Why do compilers insist on using a callee-saved register here?), especially in a function that makes more than one call, or that calls in a loop.

Better terminology is call-clobbered vs. call-preserved, which reflects how compilers actually use them, and how humans should think about them: registers that die on a function call, or that don't. Compilers don't push/pop a call-clobbered (aka caller-saved) register around each call.

But if you were going to push/pop a value around a single function call, you'd just do that with %rdx directly. Copying it to R10 first would just be a waste of instructions. So the mov to %r10 is pointless: with a later push it's merely inefficient, and without one it's incorrect, because the value wouldn't survive the call.

The reason for copying to a call-preserved register is so the function arg will survive a function call that the function makes later. Obviously you have to use a call-preserved register for that; call-clobbered registers don't survive function calls.
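As a minimal sketch of that situation, here is a hypothetical C source that could produce code shaped like the question's Q. The names `twice` and the body of `Q` are assumptions for illustration, not the asker's actual code, and `__attribute__((noinline))` is a GCC/Clang extension used only to keep the call from being inlined away.

```c
/* Hypothetical callee standing in for whatever the elided "..." calls.
   noinline (a GCC/Clang extension) keeps a real call in the generated code. */
__attribute__((noinline)) long twice(long x)
{
    return x * 2;
}

/* c is the third integer argument, so it arrives in %rdx (x86-64 System V). */
long Q(long a, long b, long c)
{
    (void)b;              /* unused; present only so that c is the third argument */
    long r = twice(a);    /* this call is allowed to clobber %rdx, %r10, %r11, ... */
    return r + c;         /* c is still needed here, after the call */
}
```

Because `c` is live across the call, a compiler would typically save the caller's %rbx and keep `c` there (or spill `c` to the stack). Keeping it in %r10 would not work, since `twice` is free to overwrite %r10. Exact output depends on the compiler and options, and a compiler that can see the callee's definition may optimize further, so putting `twice` in a separate source file gives the cleanest demonstration.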

When a call-preserved register isn't needed, yes, compilers do pick call-clobbered registers.

If you expand your example to an actual MCVE instead of just showing the asm without source, this should be clearer. If you write a leaf function that needs a mov to evaluate an expression, or a non-leaf that doesn't need any of its args after the first function-call, you won't see it wasting instructions saving and using a call-preserved reg. e.g.

int foo(int a) {
    return (a>>2) + (a>>3) + (a>>4);
}

https://godbolt.org/z/ceM4dP with GCC and clang -O3:

# gcc10.2
foo(int):
        mov     eax, edi
        mov     edx, edi      # using EDX, a call-clobbered register
        sar     edi, 4
        sar     eax, 2
        sar     edx, 3
        add     eax, edx
        add     eax, edi
        ret

Right shift can't be done with LEA to copy-and-operate, and shifting the same input 3 different ways convinces GCC to use mov to copy the input. (Instead of doing a chain of right-shifts: compilers love to minimize latency at the expense of more instructions because that's often best for wide OoO exec.)
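The other case mentioned above, a non-leaf function that doesn't need any of its args after its first call, can be sketched the same way. The names `next` and `bar` are hypothetical, and `__attribute__((noinline))` is again a GCC/Clang extension used only to keep a real call in the output.

```c
/* Hypothetical callee; noinline (GCC/Clang extension) keeps the call visible. */
__attribute__((noinline)) int next(int x)
{
    return x + 1;
}

int bar(int a, int b)
{
    int t = a * b;        /* both args are fully consumed before the call */
    return next(t) + 1;   /* nothing computed before the call is needed after it */
}
```

Since no value has to survive the call to `next`, there is nothing to keep in a call-preserved register, so one would not expect a push/pop of %rbx here. Compiling with `-O2` or `-O3` and reading the asm (for example on godbolt.org) is the way to confirm this for a particular compiler.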

edited Jul 30 '20 at 03:00

answered Jul 30 '20 at 01:56

Peter Cordes


So why doesn't the compiler pick `%r10` in my case? – Jul 30 '20 at 02:12

@amjad: Because your `...` contains a function call and the whole point is to preserve the value across that function call. – R.. GitHub STOP HELPING ICE Jul 30 '20 at 02:16
@amjad: Like I said, post an actual [mcve] if you want a specific explanation of it. And think about how you could implement that specific C function by hand / how the compiler chose to do so. – Peter Cordes Jul 30 '20 at 02:37
@R..GitHubSTOPHELPINGICE Even if my `...` contains a function call, we can still use `%r10` and push it to the stack before calling the function. It would be the same as using `%rbx`, wouldn't it? – Jul 30 '20 at 02:43
@amjad: Look at the order the compiler's instructions are in. It doesn't put the function arg on the stack; if you wanted to do that you'd just `push %rdx`, not waste an instruction copying it to R10 first! The compiler is storing/reloading the caller's register, keeping its own arg variable in registers. (This is sometimes less efficient, especially if you expect that the function you call will do the same thing. [Why do compilers insist on using a callee-saved register here?](https://stackoverflow.com/q/61375336)) – Peter Cordes Jul 30 '20 at 02:47
@amjad: I think I figured out that the term "caller-saved" misled you into thinking the caller actually *would* save it. That terminology can be misleading (like it was for you), as well as hard to think about. Don't use it. I updated my answer. – Peter Cordes Jul 30 '20 at 03:02
@amjad: For a single call, it might be the same; it's certainly not better though. In the general case where there may be multiple calls, you'd have to restore from the stack after each one rather than just using the register, so compilers just do what works for the general case with no additional cost. – R.. GitHub STOP HELPING ICE Jul 30 '20 at 03:02

I agree with Peter that "caller-saved" is really bad terminology. The correct terms are *call-saved* (aka callee-saved) and *call-clobbered*, and these names simply describe the ABI contract around calls rather than operations the caller or callee must do. – R.. GitHub STOP HELPING ICE Jul 30 '20 at 03:03
