
I've written a small C function that contains a short inline assembly statement.
Inside the inline assembly statement I need two temporary registers to load and compare some memory values.
To let the compiler choose the best temporary registers, I would like to avoid hard-coding them (and putting them into the clobber list). Instead, I created two local variables in the surrounding C function just for this purpose. I added them to the output operands of the inline asm statement with "=r" and then used them for the load and compare.
These local variables are not used anywhere else in the C function and (maybe because of this) the compiler assigned the same register to both output operands, which makes my code unusable (the comparison is always true).

Is the compiler allowed to assign the same register to different output operands, or is this a compiler bug (I tend to think it is a bug)?
I only found information about early clobbers, which prevent inputs and outputs from sharing a register, but no statement about output operands alone.

A workaround is to initialize my temporary variables and to use "+r" instead of "=r" for them in the output operand list. But in this case the compiler emits initialization instructions, which I would like to avoid.
Is there a clean way to let the compiler choose registers that do not overlap each other, purely for use inside the inline assembly?

Thank you very much!

P.S.: I am coding for an "exotic" target using a non-GNU compiler that supports "GNU inline assembly".
P.P.S.: I also don't understand why, in the example below, the compiler doesn't generate code for "int eq=0;" (e.g. 'mov d2, 0'). Maybe I have misunderstood the "=" constraint modifier?

A deliberately trivial example, just to illustrate the problem:

int foo(const int *s1, const int *s2)
{
    int eq = 0;
#ifdef WORKAROUND
    int t1=0, t2=1;
#else
    int t1, t2;
#endif

    __asm__ volatile(
        "ld.w  %[t1], [%[s1]]   \n\t"
        "ld.w  %[t2], [%[s2]]   \n\t"
        "jne   %[t1], %[t2], 1f \n\t"
        "mov   %[eq], 1         \n\t" 
        "1:"
        : [eq] "=d" (eq),
          [s1] "+a" (s1), [s2] "+a" (s2),
#ifdef WORKAROUND
          [t1] "+d" (t1), [t2] "+d" (t2)
#else
          [t1] "=d" (t1), [t2] "=d" (t2)
#endif
    );

    return eq;
}

In the generated assembly the compiler used register 'd8' for both operands 't1' and 't2':

foo:
    ; 'mov d2, 0' is missing
    ld.w  d8, [a4]  ; 'd8' allocated for 't1'
    ld.w  d8, [a5]  ; 'd8' allocated for 't2' too!
    jne   d8, d8, 1f 
    mov   d2, 1         
1:
    ret16

Compiled with '-DWORKAROUND':

foo:
    ; 'mov d2, 0' is missing
    mov16 d9,1
    mov16 d8,0

    ld.w  d9, [a5]   
    jne   d8, d9, 1f 
    mov   d2, 1         
1:
    ret16

EABI for this machine:

  • return register (non-pointer/pointer): d2, a2
  • non-pointer args: d4..d7
  • pointer args: a4..a7
quicmic
  • Please edit your question and post the function with your inline asm in a code block here – Craig Estey Apr 11 '22 at 17:39
  • Which actual compiler? I think it's unlikely that this is a problem, especially with early clobbers, but I could imagine a compiler choosing overlapping outputs if one or both of the C vars are unused outside of the asm. (I don't think GCC or clang does that, and I'd *hope* other compilers wouldn't, but it seems barely plausible.) – Peter Cordes Apr 11 '22 at 17:53
  • Have you experimented with [early clobbers](https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html)? Something like `=&d` for t1 & t2. – David Wohlferd Apr 11 '22 at 19:00
  • Inline assembly is an extension to C. The implementation (that is, GCC in this case) determines all the semantics, including what is and is not allowed. – John Bollinger Apr 11 '22 at 19:05
  • Yes I also tried early clobbers w/o any effect. In my understanding early clobbers just prevent overlapping of input and output operands if some outputs are written before all inputs are read. – quicmic Apr 11 '22 at 19:07
  • Re “I also don't understand in the example below why the compiler doesn't generate code for "int eq=0;"”: There is no need for it. `eq` is never used before being provided to the `asm` for output. That is, the value `eq` is initialized to is never used. So there is no need to store that value or even put it into a register; the observable behavior of the program is the same whether `eq` is initialized to zero or not. So the optimizer in the compiler removed the initialization. – Eric Postpischil Apr 11 '22 at 19:13
  • @JohnBollinger: Yes, we are all aware inline assembly is an extension to the C standard, and that is the context we are discussing it in. Nobody here is discussing how the rules of the C standard apply to it. – Eric Postpischil Apr 11 '22 at 19:14
  • Perhaps my view here is too simplistic, but are you sure this is a problem worth solving? By wrapping your assembly in a C function, you are submitting to the system's ABI for function calls. That defines which registers are caller-saved and which callee-saved. Since your function does nothing but wrap the inline assembly, it doesn't seem like there would be anything to gain by letting the compiler choose registers, versus just selecting two caller-saved registers. – John Bollinger Apr 11 '22 at 19:15
  • Are we, @EricPostpischil? The OP asks "Is the compiler allowed ...?". The answer is "yes", because the compiler *defines* what is allowed. – John Bollinger Apr 11 '22 at 19:22
  • @JohnBollinger: No, the compiler does not define what is allowed. The compiler’s specification defines what is allowed. GCC has documentation, as well as discussion among its developers about how it should behave, and OP’s compiler has its own specification as well (which may involve deferring to GCC documentation in part). – Eric Postpischil Apr 11 '22 at 19:24
  • @EricPostpischil: d2 is the return register for this machine. It is "observable". – quicmic Apr 11 '22 at 19:24
  • @John Bollinger: This is just a totally stupid function only for illustration purposes. – quicmic Apr 11 '22 at 19:26
  • @quicmic: That is not what the C standard means by “observable behavior.” For the purposes of optimization, the program does not behave any differently if d2 is set to zero before the inline assembly code is executed: It does not change access to volatile objects by the program, it does not change data written to files by the program, and it does not change the input/output dynamics of interactive devices. Those are the observable behaviors of a program per C 2018 5.1.2.3 6. – Eric Postpischil Apr 11 '22 at 19:26
  • @Eric Postpischil: Thank you. What I wanted to do here is some kind of "conditional" (over)write of 'eq'. I never read 'eq' in the inline asm, but I think I have to use '+' instead of '=' to make this work!? – quicmic Apr 11 '22 at 19:29
  • You also forgot to tell the compiler that the pointed-to memory is an input to the asm statement, so changes to it can't be optimized away, it has to be in sync with the C abstract machine. [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) You could just use `"m"` inputs and let the compiler pick an addressing mode. (Or of course `"r"` inputs and get it to load, but you're artificially not doing that so you need tmp regs to create an example.) – Peter Cordes Apr 12 '22 at 03:19
  • I'd suggest fixing your example to remove those bugs unrelated to making sure `[t1] "=d" (t1), [t2] "=d" (t2)` pick different registers (from each other and everything else). e.g. if your ISA has any kind of conditional-set instruction like MIPS `slt` or AArch64 `cset`, or x86 `setcc`, use that instead of leaving the `%[eq]` register unmodified in one path through the function. Anyway, that stuff is covered by existing Q&As which should be linked from https://stackoverflow.com/tags/inline-assembly/info – Peter Cordes Apr 12 '22 at 03:23
  • You still haven't said what actual compiler you're using. We can answer about what GCC documents and what GCC/clang actually do, but without knowing what compiler you use, you won't know if maybe it handles GNU C inline asm syntax with different quirks. – Peter Cordes Apr 12 '22 at 03:24
  • @Peter Cordes: Thank you! I don't change memory in my inline asm, so I think there is no need to use 'm' or a "memory" clobber!? We are using a commercial compiler that has support for "GNU inline" assembly, so I always read the GNU documentation. I would like to know how to "fix my example". What exactly is wrong with it, and where are the mistakes? The real code is much more complex. I created this small (and very stupid) example just to focus on the problem. – quicmic Apr 12 '22 at 04:28
  • @Peter Cordes: My main problem is that I can't find an in-depth specification for "GNU inline asm". There is a description of early clobbers, which can handle overlapping of inputs and outputs in some cases. But I didn't find any statement on whether overlapping of outputs is "legal" or not. – quicmic Apr 12 '22 at 04:28
  • It sounds like your exact case is mentioned in the docs "Rather than allocating fixed registers via clobbers ...": https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers – o11c Apr 12 '22 at 04:44
  • @o11c: Good find, that definitely implies it's safe in GCC and anything that's *actually* compatible with it. You could post that as an answer. I don't think there's a more explicit statement; it's already necessary for outputs to be non-overlapping (regardless of early-clobber) for surrounding code to actually be able to read them separately. So I think the GCC manual authors thought this was so obvious they forgot to mention it. – Peter Cordes Apr 12 '22 at 04:49
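
The suggestions in the comments above could be combined roughly as follows: `eq` becomes a read-write (`"+r"`) operand so that its zero initialization is not optimized away, the scratch outputs are marked early-clobber, and the pointed-to memory is listed as an input. This is only a sketch under assumptions: the question's target and compiler are not named, so x86-64 AT&T syntax for GCC/clang stands in for the original mnemonics, with a plain C fallback on other targets. The function name `words_equal` is made up for illustration.

```c
/* Hypothetical stand-in for the question's foo(): returns 1 if *s1 == *s2.
 * Assumption: x86-64 GCC/clang; any other target takes the plain C path. */
int words_equal(const int *s1, const int *s2)
{
    int eq = 0;   /* "+r" below makes this initial value an input to the asm */
#if defined(__GNUC__) && defined(__x86_64__)
    int t1, t2;   /* scratch values, never read by the C code */
    __asm__ volatile(
        "movl  (%[s1]), %[t1] \n\t"   /* t1 = *s1 */
        "movl  (%[s2]), %[t2] \n\t"   /* t2 = *s2 */
        "cmpl  %[t2], %[t1]   \n\t"
        "jne   1f             \n\t"
        "movl  $1, %[eq]      \n\t"   /* conditional write: eq keeps 0 otherwise */
        "1:"
        : [eq] "+r" (eq),
          [t1] "=&r" (t1), [t2] "=&r" (t2)  /* '&': written before s2 is read */
        : [s1] "r" (s1), [s2] "r" (s2),
          "m" (*s1), "m" (*s2)              /* tell the compiler the memory is read */
        : "cc");
#else
    eq = (*s1 == *s2);
#endif
    return eq;
}
```

Here the pointers are plain inputs rather than `"+a"` outputs as in the question, which is why the early-clobber modifier matters: without it a compiler following GCC's rules may place `t1` in the same register as `s2`.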

1 Answer


I think this is a bug in your compiler.

If it says it supports "GNU inline assembly" then one would expect it to follow GCC, whose manual is the closest thing there is to a formal specification. Now the GCC manual doesn't seem to explicitly say "output operands will not share registers with each other", but as o11c mentions, they do suggest using output operands for scratch registers, and that wouldn't work if they could share registers.

A workaround that might be more efficient than yours would be to follow your inline asm with a second dummy asm statement that "uses" both the outputs. Hopefully this will convince the compiler that they are potentially different values and therefore need separate registers:

    int t1, t2;
    __asm__ volatile(" ... code ..."
          : [t1] "=d" (t1), [t2] "=d" (t2) : ...);
    __asm__ volatile("" // no code
          : : "r" (t1), "r" (t2));

With luck this will avoid any extra code being generated for unnecessary initialization, etc.
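
As a fuller illustration of the dummy-statement idea, here is a minimal sketch. It assumes x86-64 GCC/clang in AT&T syntax as a stand-in for the asker's unnamed target, and falls back to plain C elsewhere; the name `words_equal_dummy_use` and the mnemonics are illustrative, not taken from the question. GCC itself should not need the second statement, so its only purpose is to nudge a compiler that would otherwise merge the two unused outputs.

```c
/* Hypothetical example: returns 1 if *s1 == *s2, else 0.
 * Assumption: x86-64 GCC/clang; other targets use the C fallback. */
int words_equal_dummy_use(const int *s1, const int *s2)
{
    int eq = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    int t1, t2;   /* uninitialized on purpose: no setup code is wanted */
    __asm__ volatile(
        "movl  %[m1], %[t1] \n\t"
        "movl  %[m2], %[t2] \n\t"
        "cmpl  %[t2], %[t1] \n\t"
        "jne   1f           \n\t"
        "movl  $1, %[eq]    \n\t"
        "1:"
        : [eq] "+r" (eq), [t1] "=&r" (t1), [t2] "=&r" (t2)
        : [m1] "m" (*s1), [m2] "m" (*s2)
        : "cc");
    /* Empty statement that "reads" both scratch values, so they are live
     * after the first asm and must occupy distinct registers. */
    __asm__ volatile("" : : "r" (t1), "r" (t2));
#else
    eq = (*s1 == *s2);
#endif
    return eq;
}
```

The second statement emits no instructions; whether it actually changes the register assignment depends on the compiler in use.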

Another possibility would be to hardcode specific scratch registers and declare them as clobbered. It leaves less flexibility for the register allocator, but depending on the surrounding code and how smart the compiler is, it may not make a lot of difference.
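
A sketch of that clobber-based alternative, again assuming x86-64 GCC/clang as a stand-in target (the register choice `eax`/`ecx` and the name `words_equal_fixed_scratch` are arbitrary picks for illustration). On the asker's target the equivalent would be naming two data registers in the clobber list.

```c
/* Hypothetical example: returns 1 if *s1 == *s2, else 0.
 * Assumption: x86-64 GCC/clang; other targets use the C fallback. */
int words_equal_fixed_scratch(const int *s1, const int *s2)
{
    int eq = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "movl  %[m1], %%eax  \n\t"   /* fixed scratch register 1 */
        "movl  %[m2], %%ecx  \n\t"   /* fixed scratch register 2 */
        "cmpl  %%ecx, %%eax  \n\t"
        "jne   1f            \n\t"
        "movl  $1, %[eq]     \n\t"
        "1:"
        : [eq] "+r" (eq)
        : [m1] "m" (*s1), [m2] "m" (*s2)
        : "eax", "ecx", "cc");       /* compiler keeps operands out of these */
#else
    eq = (*s1 == *s2);
#endif
    return eq;
}
```

No C variables are needed for the temporaries, so the question of overlapping outputs does not arise, at the cost of taking the register choice away from the allocator.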

Nate Eldredge