GCC Extended assembly pin local variable to any register except r12

Question

Basically, I am looking for a way to pin a temporary to any register except r12.

I know I can "hint" the compiler to pin to a single register with:

// Toy example. Obviously an unbalanced `pop` in 
// extended assembly will cause serious problems.

register long tmp asm("rdi"); // or just clobber rdi and use it directly.
asm volatile("pop %[tmp]\n"   // using pop hence don't want r12
             : [tmp] "=&r" (tmp)
             :
             :);

This will generally work, in that it avoids r12, but it might mess up the compiler's register allocation elsewhere.
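For experimenting with the register-variable approach without touching the stack, here is a minimal sketch, assuming x86-64 with GCC or a compatible compiler. A harmless `add` stands in for `pop`, and the function name `add_one_in_rdi` is hypothetical. It uses the form GCC documents for local register variables: the variable is guaranteed to be in `rdi` only where it appears as an operand of the asm statement.

```c
/* Sketch (x86-64, GCC-compatible compiler assumed): a local register
 * variable bound to rdi and used as an inline-asm operand. The binding is
 * only guaranteed for the asm operand itself; elsewhere the compiler may
 * keep the value in any location. `add` replaces the question's `pop`. */
long add_one_in_rdi(long x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    register long tmp __asm__("rdi") = x;  /* operand must be rdi */
    __asm__("add $1, %[tmp]"
            : [tmp] "+r" (tmp));           /* read-modify-write operand */
    return tmp;
#else
    return x + 1;                          /* portable fallback */
#endif
}
```

For example, `add_one_in_rdi(41)` returns 42.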

Is it possible to do this without forcing the compiler to use a single register?

Is your `asm` just an example, or what you really want? Popping the stack inside inline asm is going to royally screw up the generated code. — Nate Eldredge, Apr 27 '21 at 22:04
Oh, are you trying to avoid r12 because of https://stackoverflow.com/questions/64791977/why-is-pop-slow-when-using-register-r12? — Nate Eldredge, Apr 27 '21 at 22:05
It's just an example. I'm trying to make a small snippet to disable the LSD in tight loops. Realistically, I will save `rsp` before the loop and restore it with `movq` after the pop, i.e. `#define NO_LSD_RD(tmp, r) "pop " #tmp "\nmovq " #r ", %%rsp\n"` and `#define NO_LSD_WR(tmp, r) "push " #tmp "\nmovq " #r ", %%rsp\n"`, where `tmp` must not be `r12`. — Noah, Apr 27 '21 at 22:07
But won't everything break if the compiler decides to access some local variable with rsp-relative addressing inside your loop? — Nate Eldredge, Apr 27 '21 at 22:11
Anyhow, one partial solution would be to use `"=&abcdSD"` as the constraint, so the compiler can pick from any of `rax,rbx,rcx,rdx,rsi,rdi`; that should give the allocator a lot more flexibility. Another would be to add `r12` to your clobber list; that will ensure the compiler doesn't use it for operands, though if it was going to use r12 for something else it will have to spill. — Nate Eldredge, Apr 27 '21 at 22:13
@NateEldredge It's obviously dangerous. In my case `r` will restore `rsp`, so as long as I set `r` at a stable place it will be fine, because the compiler won't inject any code between the `pop` and the restore. It's not really meant for production code, but as a tool for benchmarking. — Noah, Apr 27 '21 at 22:14
Okay, fair. You do also know about the red zone, right, and that a random `push` may overwrite data? — Nate Eldredge, Apr 27 '21 at 22:17
@NateEldredge I hadn't taken that into account; thanks for mentioning it. Assuming the loop has no stack ops, would it be safe to just save/restore with `movq (%rsp), r0; ; movq r0, (%rsp)`? But generally I expect to use the `pop` version more, unless I'm benchmarking something explicitly bottlenecked on p23. Maybe I need a separate question for the best way to disable the LSD. — Noah, Apr 27 '21 at 22:27
See my answer: https://stackoverflow.com/questions/35630949/keep-target-address-of-load-in-register-until-instruction-is-retired/35694557#35694557 for how to mangle the assembler to preserve regs transparently. Also, to reserve a particular register, `gcc` has `-ffixed-reg` — Craig Estey, Apr 27 '21 at 23:15
@Noah: Yeah, probably worth a separate question for the X of your XY problem. I'm not familiar with the mechanism you seem to have in mind. — Nate Eldredge, Apr 28 '21 at 01:18
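To see what the `NO_LSD_*` macros from the comments actually produce, here is a small sketch that only builds the template strings and never executes them (popping inside inline asm is unsafe, as discussed). It assumes the macros are meant to be passed asm operand references; the operand names `%[tmp]` and `%[saved]` and the variables `rd_template`/`wr_template` are hypothetical. The `#` operator stringifies each argument, and adjacent string literals are concatenated.

```c
#include <string.h> /* strcmp, for checking the expansion */

/* The macros from the comment thread (NO_LSD_WR with its closing quote). */
#define NO_LSD_RD(tmp, r) "pop " #tmp "\nmovq " #r ", %%rsp\n"
#define NO_LSD_WR(tmp, r) "push " #tmp "\nmovq " #r ", %%rsp\n"

/* Hypothetical operand names; these are only strings, never assembled. */
const char rd_template[] = NO_LSD_RD(%[tmp], %[saved]);
const char wr_template[] = NO_LSD_WR(%[tmp], %[saved]);
```

For example, `strcmp(rd_template, "pop %[tmp]\nmovq %[saved], %%rsp\n")` returns 0, so the expansion is an ordinary extended-asm template in which `%[tmp]` is the operand that must not land in r12.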

score 3 · Answer 1 · answered Apr 28 '21 at 01:17

Note that register asm doesn't truly "pin" a variable to a register; it only ensures that uses of that variable as an operand in inline asm will use that register. In principle the variable may be stored elsewhere in between. See https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Local-Register-Variables.html#Local-Register-Variables. But it sounds like all you really need is to ensure that your pop instruction doesn't use r12 as its operand, possibly because of the issue described in "Why is POP slow when using register R12?". I'm not aware of any way to do precisely this, but here are some options that may help.

The registers rax, rbx, rcx, rdx, rsi, rdi each have their own constraint letters, a,b,c,d,S,D respectively (the other registers don't). So you can get about halfway there by doing

long tmp;
asm volatile("pop %[tmp]\n"
             : [tmp] "=&abcdSD" (tmp)
             :
             :);

This way the compiler has the option to choose any of those six registers, which should give the register allocator a lot more flexibility.
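A minimal runnable sketch of the constraint-union idea follows, assuming x86-64 GCC. A `mov` replaces `pop` so it is safe to run, and the function name `load_via_legacy_reg` is hypothetical. The guard excludes clang only to be conservative, since the answer is about GCC.

```c
/* Sketch (x86-64 GCC assumed): the output constraint "=&abcdSD" lets GCC
 * pick any one of rax, rbx, rcx, rdx, rsi, rdi for tmp. None of these is
 * r12. `mov` stands in for `pop`, so the stack is not touched. */
long load_via_legacy_reg(long value)
{
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
    long tmp;
    __asm__ volatile("mov %[in], %[tmp]"
                     : [tmp] "=&abcdSD" (tmp)  /* early-clobber output */
                     : [in] "r" (value));
    return tmp;
#else
    return value;                              /* portable fallback */
#endif
}
```

In the generated assembly, the destination of the `mov` should be one of the six named registers; the early-clobber `&` also keeps it distinct from the input register.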

Another option is to declare that your asm clobbers r12, which will prevent the compiler from allocating operands there:

long tmp;
asm volatile("pop %[tmp]\n"
             : [tmp] "=&r" (tmp)
             :
             : "r12");

The tradeoff is that it will also not use r12 to cache local variables across the asm, since it assumes that it may be modified. Hopefully it will be smart enough to just avoid using r12 in that part of the code at all, but if it can't, it may emit extra register moves or spill to the stack around your asm. Still, it's less brutal than -ffixed-r12 which would prevent the compiler from using r12 anywhere in the entire source file.
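A corresponding sketch of the clobber approach, under the same assumptions (x86-64 GCC-compatible compiler, `mov` in place of `pop`, hypothetical function name `copy_avoiding_r12`):

```c
/* Sketch (x86-64 assumed): listing "r12" as a clobber means no operand,
 * input or output, can be assigned to r12. Because r12 is callee-saved,
 * the compiler also has to preserve the caller's r12, typically by saving
 * it in the prologue and restoring it in the epilogue. */
long copy_avoiding_r12(long value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    long tmp;
    __asm__ volatile("mov %[in], %[tmp]"
                     : [tmp] "=&r" (tmp)   /* any general register but r12 */
                     : [in] "r" (value)
                     : "r12");             /* r12 is considered modified */
    return tmp;
#else
    return value;                          /* portable fallback */
#endif
}
```

Comparing its assembly with a version that drops the clobber is a quick way to see what save/restore or spill code the clobber costs.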

Future readers should note that in general it is unsafe to modify the stack pointer inside inline asm on x86-64. The compiler assumes that rsp isn't changed by inline asm, and it may access stack variables via effective addresses with constant offsets relative to rsp, at any time. Moreover, x86-64 uses a red zone, so even a push/pop pair is unsafe, because there may be important data stored below rsp. (And an unexpected pop may mean that other important data is no longer in the red zone and thus subject to overwriting by signal handlers.) So, you shouldn't do this unless you're willing to carefully read the generated assembly after every recompilation to make sure the compiler hasn't decided to do any of these things. (And before you ask, you cannot fix this by declaring a clobber of rsp; that's not supported.)

In some cases, GCC does go beyond the current docs and use extra instructions to keep a `register ... asm` local var in the specified register. (It used to be documented that way, but the docs were changed so that this use is no longer supported.) But https://godbolt.org/z/h1cv44G7q shows that even older GCC does *not* copy a value derived from an arg to the specified register, in a function with no `asm` statement. I had thought it would; I think I've seen unsupported `register ... asm()` locals have an effect in some cases, e.g. for *reading* an incoming register with an uninitialized register-asm local. — Peter Cordes, Apr 28 '21 at 01:35
An `"R"` constraint gives GCC the choice of any of the 8 "legacy" i386 registers (not R8-R15). `"U"` is any call-clobbered integer register. So `"RU"` could give it the choice of RAX-RDI (including RSP and RBP if GCC wanted to choose them) plus R8-R11. https://godbolt.org/z/eaPY1ocsj. (clang12 doesn't support `"U"`) — Peter Cordes, Apr 28 '21 at 01:38
@PeterCordes so would `"abcdSDU"` be `rax, rbx, rcx, rdx, rsi, rdi, r8 - r11`? — Noah, Apr 28 '21 at 03:39
@Noah: Yes, it would. GCC's never actually going to pick RSP, though, so no real harm in using "RU", unless you want to exclude RBP for some reason. — Peter Cordes, Apr 28 '21 at 03:41
Regarding the safety of stack management in assembly: do you have any resources on how to safely use the stack in a function written in assembly in a .S file that is linked with / called from C? Or is there no concern there, and the normal rules apply? — Noah, Jun 12 '21 at 16:39
@Noah: If it's a separate function written in pure assembly, normal rules apply. The red zone is only available between function calls, so whatever function called yours won't be using one. (In fact most compilers only use the red zone in leaf functions that make no calls at all.) You can push what you like, and even use a red zone of your own if you are not making further calls. — Nate Eldredge, Jun 12 '21 at 16:53
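Following Peter Cordes's comment about `"RU"`, here is a sketch of that constraint, assuming x86-64 GCC (clang 12 rejects `"U"`, hence the guard). As before, `mov` replaces `pop` and the function name `copy_via_RU` is hypothetical.

```c
/* Sketch (x86-64 GCC assumed): "R" is any of the 8 legacy registers
 * (rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp) and "U" is any call-clobbered
 * integer register, which adds r8-r11. Neither class contains r12. */
long copy_via_RU(long value)
{
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
    long tmp;
    __asm__ volatile("mov %[in], %[tmp]"
                     : [tmp] "=&RU" (tmp)
                     : [in] "r" (value));
    return tmp;
#else
    return value;                          /* fallback, e.g. for clang */
#endif
}
```

Compared with `"=&abcdSD"`, this also allows r8-r11 (and, in principle, rbp), giving the allocator even more room while still excluding r12.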

