32-bit ARM assembly provides several instructions for atomically loading and storing a pair of registers:
- ldaexd and stlexd (for 32-bit ARMv8, with acquire-release memory ordering) [https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDAEX-and-STLEX]
- ldrexd and strexd (for ARMv7, without implied barriers) [https://developer.arm.com/documentation/dui0802/b/A32-and-T32-Instructions/LDREX-and-STREX]
These 32-bit instructions place requirements on the choice of the transfer register pair (Rt and Rt2):
- "Rt must be an even numbered register, and not LR"
- "Rt2 must be R(t+1)"
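To make the two quoted rules concrete, here is a minimal sketch that checks a candidate register pair against them. The helper name is_valid_exclusive_pair and the constant kLinkRegister are hypothetical and exist only for illustration; the sketch assumes the usual numbering R0..R15 with LR = R14.

```cpp
#include <cstdint>

// Assumption: registers are numbered R0..R15 and LR is R14.
constexpr unsigned kLinkRegister = 14;

// Hypothetical helper: true when (rt, rt2) satisfies the quoted rules,
// i.e. Rt is even, Rt is not LR, and Rt2 is R(t+1).
constexpr bool is_valid_exclusive_pair(unsigned rt, unsigned rt2) {
    return rt < 16 && (rt % 2 == 0) && rt != kLinkRegister && rt2 == rt + 1;
}

static_assert(is_valid_exclusive_pair(0, 1), "r0/r1 is an even/odd pair");
static_assert(!is_valid_exclusive_pair(1, 2), "Rt must be even");
static_assert(!is_valid_exclusive_pair(2, 4), "Rt2 must be R(t+1)");
static_assert(!is_valid_exclusive_pair(14, 15), "Rt must not be LR");
```

So a pair like r2/r3 is acceptable, while r3/r4 or r2/r5 is not.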
Below is some example GCC inline assembly code (C/C++); the problem described here is the same for all four instructions. This code does not guarantee the required register numbering, because the compiler allocates the two output registers independently.
inline static void atomic_exclusive_load_pair_aquire(uint32_t atomic[2], uint32_t target[2])
{
asm volatile("ldaexd %0, %1, [%2]" // load-acquire exclusive register pair
: "=r"(target[0]), // first transfer register
"=r"(target[1]) // second transfer register
: "r"(&atomic[0]) // atomic base register
: "memory"); // "memory" acts as compiler r/w barrier
}
I would expect GCC's ARM inline assembly constraints to offer some way of describing dependent register pairs, so that registers are assigned automatically when a single instruction requires it.
My question: how can the requirements for the two transfer registers be expressed as GCC inline assembly constraints so that the compiler automatically chooses correct register numbers? Is this possible at all? Could "multiple alternative constraints" be a solution ([https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html])?
Solution:
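As amonakov and others wrote, the solution is to use uint64_t as the transfer type, which occupies a register pair on 32-bit ARM. As long as the code is compiled with Thumb disabled (ARM mode), the register pair will be an even/odd pair. There are also more or less undocumented inline assembler operand modifiers for accessing the individual registers of the pair: %Q selects the register holding the least significant half and %R the register holding the most significant half.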
inline static void atomic_exclusive_load_pair_aquire(uint32_t atomic[2], uint32_t transfer[2])
{
uint64_t pair;
asm volatile("ldaexd %Q[pair], %R[pair], [%[addr]]" // load-acquire exclusive register pair
: [pair] "=r"(pair) // transfer register pair
: [addr] "r"(&atomic[0]) // atomic base register
: "memory"); // "memory" acts as compiler r/w barrier
transfer[0] = static_cast<uint32_t>(pair);
transfer[1] = static_cast<uint32_t>(pair >> 32);
}
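The same idea should carry over to the store side. The following is only a sketch under assumptions, not a tested implementation: the names pack_pair and atomic_exclusive_store_pair_release are hypothetical, a little-endian ARMv8 target compiled in ARM mode is assumed, and the status register uses an early-clobber constraint because the instruction does not allow it to overlap the other operands.

```cpp
#include <cstdint>

// Hypothetical helper: word 0 becomes the low half (the register printed by %Q),
// word 1 becomes the high half (the register printed by %R).
inline uint64_t pack_pair(const uint32_t words[2])
{
    return static_cast<uint64_t>(words[0]) | (static_cast<uint64_t>(words[1]) << 32);
}

#if defined(__arm__) && !defined(__thumb__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 8)
// Store-release exclusive of a register pair (sketch, assumes little-endian).
// Returns true when the exclusive store succeeded (status register is 0).
inline bool atomic_exclusive_store_pair_release(uint32_t atomic[2], const uint32_t source[2])
{
    uint64_t pair = pack_pair(source);
    uint32_t failed;
    asm volatile("stlexd %[failed], %Q[pair], %R[pair], [%[addr]]"
                 : [failed] "=&r"(failed)    // status register, must not overlap inputs
                 : [pair] "r"(pair),         // transfer register pair
                   [addr] "r"(&atomic[0])    // atomic base register
                 : "memory");                // "memory" acts as compiler r/w barrier
    return failed == 0;
}
#endif
```

An exclusive store only succeeds after a matching exclusive load of the same address, so a caller would normally wrap the load and this store in a retry loop.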
Please see a full solution with assembly code on godbolt.