
In 32-bit ARM assembly, several instructions are available for atomically loading and storing a pair of registers:

  • ldaexd and stlexd (for 32-bit ARMv8, with acquire-release memory ordering) [https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDAEX-and-STLEX ]
  • ldrexd and strexd (for ARMv7, without built-in barriers) [https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDREX-and-STREX ]
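For comparison, the same pair-wide semantics are available without inline assembly through GCC's `__atomic` builtins on a `uint64_t`, which the compiler lowers to these exclusive-pair instructions on 32-bit ARM. A minimal sketch (the function names are mine, and the exact instructions emitted depend on target and compiler version):

```cpp
#include <cassert>
#include <cstdint>

// Acquire-load of a 64-bit pair; on 32-bit ARM, GCC typically emits
// an exclusive or acquire pair load for this.
static uint64_t load_pair_acquire(const uint64_t* p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

// Full-pair compare-and-swap; on ARMv7 this typically becomes an
// ldrexd/strexd retry loop.
static bool cas_pair(uint64_t* p, uint64_t expected, uint64_t desired)
{
    return __atomic_compare_exchange_n(p, &expected, desired,
                                       /*weak=*/false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

Unlike raw ldaexd/stlexd, the builtins do not expose the exclusive-monitor status, which is exactly the control the question below is after.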

These 32-bit instructions impose requirements on the choice of the transfer register pair (Rt and Rt2):

  • "Rt must be an even numbered register, and not LR"
  • "Rt2 must be R(t+1)"

I have included some example GCC inline assembly code (for C/C++; the problem described below is the same for all four instructions). This code does not fulfill the required register numbering.

inline static void atomic_exclusive_load_pair_acquire(uint32_t atomic[2], uint32_t target[2])
{
    asm volatile("ldaexd %0, %1, [%2]"  // load-acquire exclusive register pair
                 : "=r"(target[0]),     // first transfer register
                   "=r"(target[1])      // second transfer register
                 : "r"(&atomic[0])      // atomic base register
                 : "memory");           // "memory" acts as compiler r/w barrier
}

I would expect that GCC ARM inline assembly constraints might somehow be able to describe dependent register pairs for automatic register allocation, if a single instruction requires this.

My question is: how can the requirements for the two transfer registers be described as GCC inline assembly constraints so that the correct register numbers are chosen automatically? Is this possible at all? Might "multiple alternative constraints" be a possible solution ([https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html ])?

Solution:

As amonakov and others wrote, the solution is to use uint64_t as the transfer type, which occupies a register pair on 32-bit ARM. Provided Thumb is disabled, the register pair will be an even/odd pair. There are also more or less undocumented inline assembly operand modifiers for accessing the individual registers of the pair.

inline static void atomic_exclusive_load_pair_acquire(uint32_t atomic[2], uint32_t transfer[2])
{
    uint64_t pair;
    asm volatile("ldaexd %Q[pair], %R[pair], [%[addr]]"  // load-acquire exclusive register pair
                 : [pair] "=r"(pair)       // transfer register pair
                 : [addr] "r"(&atomic[0])  // atomic base register
                 :        "memory");       // "memory" acts as compiler r/w barrier

    transfer[0] = static_cast<uint32_t>(pair);
    transfer[1] = static_cast<uint32_t>(pair >> 32);
}

Please see a full solution with assembly code on godbolt.
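The snippet above covers only the load side. Below is a hedged sketch of the matching store-release; the function name, the retry loop, and the non-ARM fallback are my additions, not from the original post. stlexd only succeeds while the exclusive monitor is held, so it is paired with an ldrexd inside the loop. As noted in the comments, the uint32_t array should be 8-byte aligned for the pair access.

```cpp
#include <cassert>
#include <cstdint>

inline static void atomic_exclusive_store_pair_release(uint32_t atomic[2],
                                                       const uint32_t transfer[2])
{
    // Pack the two 32-bit halves into the 64-bit pair type
    // (low word first, matching the little-endian %Q/%R mapping).
    uint64_t pair = (static_cast<uint64_t>(transfer[1]) << 32) | transfer[0];
#if defined(__arm__)
    uint64_t scratch;   // discarded; ldrexd only claims the exclusive monitor
    uint32_t status;    // stlexd writes 0 on success, 1 if exclusivity was lost
    do {
        asm volatile("ldrexd %Q[s], %R[s], [%[addr]]\n\t"    // claim monitor
                     "stlexd %[st], %Q[p], %R[p], [%[addr]]" // store-release pair
                     : [st] "=&r"(status), [s] "=&r"(scratch)
                     : [p] "r"(pair), [addr] "r"(&atomic[0])
                     : "memory");
    } while (status != 0);
#else
    // Illustrative stand-in so the sketch compiles off-target: a plain atomic
    // 64-bit release store (strict-aliasing and alignment caveats apply).
    __atomic_store_n(reinterpret_cast<uint64_t*>(atomic), pair, __ATOMIC_RELEASE);
#endif
}
```

The early-clobber constraints (`=&r`) are needed because both outputs are written before the input pair is read by stlexd.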

artless noise
  • I don't know the exact answer, but I expect the intended way to do this is one `uint64_t` operand, with a constraint that requires it to be in a suitable pair. And modifiers like `%A0` / `%B0` to expand to halves of that register-pair. Maybe not actually `A` and `B` though. I think GCC has features for that on at least some ISAs, probably including ARM. – Peter Cordes May 01 '23 at 13:34
  • Please re-read the documentation with respect to the purpose of exclusive access. – old_timer May 01 '23 at 14:26
  • Yes, `LDRD` is not the exact same as `LDREXD`, but the concept of getting gcc inline specifiers to find a register set that conforms to the required ordering is the same. – artless noise May 01 '23 at 17:11
  • You are right, the basic problem is the same as in the duplicate question. However, as I have remarked already below, it seems that both constraint variants (%0, %H0 and %Q0, %R0) do not fulfil the requirement that the first register must be even numbered. Please see line 7 in the Compiler Explorer output: https://godbolt.org/z/nK46WhroT (and I think this problem basically also exists for LDRD in the duplicate question). – S. Gleissner May 02 '23 at 16:56
  • Instead of a `"memory"` clobber, you can just use a dummy memory input to tell it that 2 elements of memory pointed-to by `atomic` are a memory-input to the asm: `"m" (*(const uint32_t (*)[2]) atomic )` See *[How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259)*. Oh, but you *want* to block compile-time reordering wrt. other operations since you want acquire semantics, so yes, `"memory"` clobber. – Peter Cordes May 03 '23 at 17:35
  • If you didn't need this to be the read side of an LL/SC, you could just be using `__atomic_load_n(atomic, __ATOMIC_ACQUIRE)`, as in https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html – Peter Cordes May 03 '23 at 17:36
  • Exactly, the "memory" clobber shall act as a compiler barrier, which protects against reordering of instructions. The acquire behaviour of ldaexd does the same for CPU-internal reordering (pipeline, out-of-order execution, cache behaviour, etc.). I want this for acquire-release semantics with full control of everything between load and store. So the builtin atomics are not sufficient, as they do not give full control over the status return of stlexd. – S. Gleissner May 04 '23 at 19:56
  • Both answers are using `uint64_t`, whereas you are using `uint32_t array[2]`. The compiler only lays out the registers correctly for a `uint64_t` type. You cannot use these specifiers for a `uint32_t` array. This is an important aspect of the answers. You may be able to duplicate this with a union. Compiling with `-marm` makes the stack aligned, which is another requirement. The layout for a `uint32_t` does not need to be 64-bit aligned. – artless noise May 08 '23 at 17:09

1 Answer


For such under-documented or even undocumented things, you can "peek under the hood" and see how GCC internally describes these instructions in config/arm/sync.md.

It turns out that binding a DImode (64-bit) operand is sufficient to get an even/odd register pair. In C, you can bind a uint64_t variable and use the H modifier to spell out the second register (the modifiers for Arm are not documented on the GCC side, but LLVM documents them):

uint64_t f(uint64_t *p)
{
    uint64_t r;
    asm volatile("ldaexd %0, %H0, [%1]"
                 : "=r"(r)
                 : "r"(p)
                 : "memory");
    return r;
}
amonakov
  • I'm sorry, I have checked that with Godbolt and it seems that the %x, %Hx constraints (also the %Qx and %Rx variant, which has been mentioned for the duplicate question) do not fulfill the requirement that the first register must be even numbered and the second one odd numbered ( https://godbolt.org/z/nK46WhroT , line 7: ldaexd r3, r4, [r1] ). Also for the ldrd/strd instructions, for which the duplicate has been answered, this requirement exists per the ARM documentation. So here, too, the inline asm constraints seem not to be fully sufficient. – S. Gleissner May 02 '23 at 16:48
  • @S.Gleissner that compiler is configured with `--with-mode=thumb`, and in Thumb mode using the even-numbered register is not required; if you add `-marm` to the gcc command line, it allocates an even/odd pair – amonakov May 02 '23 at 17:10
  • ... but there's no downside to enabling Thumb2, is there? Also note that your use of reinterpret_cast leads to incorrect code (aliasing). – amonakov May 02 '23 at 17:14
  • Yes, of course you are right. As soon as I use `-marm`, the registers are correct. I have no problems with Thumb/Thumb2, I just wanted to check the generated compiler result. I will also add a clarification to my question with beautified code. Many thanks! – S. Gleissner May 03 '23 at 13:43