Do atomics in C++11 cache repeatable reads in a register or are they only atomic?

Question

Do atomics cache repeated reads in a register? Or are they only atomic, i.e. a read is never split into multiple parts?

MSVC++, clang++ / clang-cl and g++ don't cache atomic reads, even with memory_order_relaxed (i.e. without any ordering constraints):

#include <atomic>

using namespace std;

int x( atomic_int const &ai )
{
    int
        a = ai.load( memory_order_relaxed ),
        b = ai.load( memory_order_relaxed );
    return a + b;
}

g++:

movl    (%rdi), %edx
movl    (%rdi), %eax
addl    %edx, %eax
ret

clang-cl:

mov eax, dword ptr [rcx]
add eax, dword ptr [rcx]
ret

cl:

mov edx, DWORD PTR [rcx]
mov eax, DWORD PTR [rcx]
add eax, edx
ret 0

The point of an atomic read is to read a value that might have been updated by another thread. Omitting the second read would defeat that purpose. — Richard Critten, May 27 '23 at 17:33
@RichardCritten actually, in this situation it is permitted to optimize out the second load. But compilers tend to not optimize atomic operations. — ALX23z, May 27 '23 at 18:36
Check out this video. It might answer your question https://youtu.be/IB57wIf9W1k — ALX23z, May 27 '23 at 18:45
@ALX23z: It's a good idea to include titles of videos when linking them, e.g. that one is *CppCon 2016: JF Bastien “No Sane Compiler Would Optimize Atomics"*. There's also a C++ committee document by JF, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html with the same title. — Peter Cordes, May 28 '23 at 05:17
The phrasing sounds odd to me, like both sides of the "or" would allow the optimization that compilers don't do. I'm reading it as "*or* are they just atomic (when they happen at all)". Combining multiple loads into one would be the same as keeping repeatable loads in a register. The question you're trying to ask is whether they're like `volatile atomic`, and if not, why do current compilers insist on doing the same load or store multiple times back to back? (Which is basically a duplicate of [Why don't compilers merge redundant std::atomic writes?](https://stackoverflow.com/q/45960387)) — Peter Cordes, May 28 '23 at 05:22
@RichardCritten When would it be useful to reload an atomic object twice in succession, with no other atomic operation, or CPU consuming operation, in between? — curiousguy, May 28 '23 at 17:50
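Peter Cordes' comment contrasts plain atomics with `volatile atomic`. Below is a minimal sketch of the difference, using the hypothetical function names `sum_twice` and `sum_twice_volatile`. In the plain version the standard would allow the two relaxed loads to be merged. Accessing the object through a volatile-qualified reference selects the `volatile` overloads of `load()`, and those accesses are meant to be performed exactly as written. Both functions return the same value in a single-threaded test. The difference only shows up in the generated code.

```cpp
#include <atomic>

// Plain atomic: the standard would permit merging these two relaxed loads
// into one, even though current compilers emit both.
int sum_twice(std::atomic<int> const &ai)
{
    int a = ai.load(std::memory_order_relaxed);
    int b = ai.load(std::memory_order_relaxed);
    return a + b;
}

// volatile atomic: the volatile-qualified load() overloads are intended to
// behave like volatile accesses, so both loads are expected to be emitted.
int sum_twice_volatile(std::atomic<int> const volatile &ai)
{
    int a = ai.load(std::memory_order_relaxed);
    int b = ai.load(std::memory_order_relaxed);
    return a + b;
}
```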

user17732522 · Accepted Answer · 2023-05-27T20:26:30.020

I think no compiler actually does that specific optimization at the moment. In many situations it is unlikely that the programmer would want this behavior because they expect stores from other threads to become visible as quickly as possible.

I think this is part of the reason why compilers are very conservative in applying optimizations to atomics, especially ones that would eliminate loads/stores, although I don't see any problem with applying the optimization in your specific example.

However, from the perspective of the ISO C++ standard, the compiler would be allowed to cache the read in a register and reuse it in general.
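To make the permitted optimization concrete, here is a sketch of what the question's `x()` could legally be compiled into under the as-if rule. `x_merged` is a hypothetical name, and this is not a claim that any current compiler produces it.

```cpp
#include <atomic>

// Hypothetical: a legal rewrite of the question's x().
// A single relaxed load whose value is reused is a valid execution of two
// back-to-back relaxed loads. Nothing forces the second load to observe a
// newer store than the first, so both may yield the same value.
int x_merged(std::atomic_int const &ai)
{
    int a = ai.load(std::memory_order_relaxed);
    return a + a;
}
```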

The compiler only has to make sure that a write to the atomic variable from another thread becomes visible to this thread in a finite time, i.e. an infinite loop reading the atomic should not keep reading a cached register value.
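The sketch below shows the case where caching is not acceptable. It uses a hypothetical `wait_for_value` helper and a small driver, `run_spin_demo`. The load sits in a loop that has no other way to terminate, so it has to be re-executed on every iteration rather than replaced by a register copy.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: spin until another thread stores a non-zero value.
// The standard asks that the writer's store become visible in finite time,
// so the load must not be hoisted out of the loop and replaced by a
// register copy of its first result.
int wait_for_value(std::atomic<int> const &flag)
{
    int v;
    while ((v = flag.load(std::memory_order_relaxed)) == 0)
        std::this_thread::yield();  // the load is repeated on each iteration
    return v;
}

// Hypothetical driver: one thread writes, the calling thread waits.
int run_spin_demo()
{
    std::atomic<int> flag{0};
    std::thread writer([&flag] { flag.store(7, std::memory_order_relaxed); });
    int seen = wait_for_value(flag);
    writer.join();
    return seen;
}
```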

It should also make them visible within "a reasonable amount of time" ([atomics.order]/11). How much such caching is acceptable then depends on how exactly you interpret "reasonable".

Also note that atomics guarantee much more than reads and writes being atomic for all threads. Even if only std::memory_order_relaxed (the weakest ordering) is used, all threads agree on a single modification order for the writes to that atomic. This order is consistent with the sequencing within each individual thread, and read-modify-write operations happen atomically and consistently with it. None of this is guaranteed for non-atomic objects.
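Here is a minimal sketch of the read-modify-write guarantee when only `memory_order_relaxed` is used. `count_with_threads` is a hypothetical helper. No increment is lost, because each `fetch_add` operates on the latest value in the counter's modification order.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one counter with relaxed
// fetch_add. Relaxed gives no ordering for *other* memory, but each
// read-modify-write still acts on the latest value in the counter's single
// modification order, so no increment is lost.
long count_with_threads(int num_threads, int increments_per_thread)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto &w : workers)
        w.join();
    // join() synchronizes with each worker's completion, so this load
    // observes every increment.
    return counter.load(std::memory_order_relaxed);
}
```

With a plain `long` and `++counter`, the same code would be a data race, i.e. undefined behavior, and in practice it usually loses increments.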

And with the stronger std::memory_order_* options (std::memory_order_seq_cst is the default), the operations also provide ordering guarantees for objects other than the atomic object itself.
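The classic release/acquire message-passing pattern illustrates an ordering guarantee that covers another object. The sketch below uses a hypothetical `message_passing_demo` function in which an atomic flag publishes an ordinary `int`.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example of message passing. The release store on `ready`
// orders the earlier plain write to `payload` before it. The acquire load
// that reads `true` synchronizes with that store, so reading `payload`
// afterwards is race-free and observes 42. With memory_order_relaxed on
// both sides, the read of `payload` would be a data race.
int message_passing_demo()
{
    int payload = 0;                  // ordinary, non-atomic object
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });

    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();
    int result = payload;             // happens after the write of 42

    producer.join();
    return result;
}
```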

Consistent atomic reads and writes are especially important on a 32-bit platform when you use something like std::atomic_uint64_t. My first thought was that this would be done on Windows with two loads followed by a CMPXCHG8B loop until the compare succeeds. But MSVC simply used an SSE load, which is atomic by itself. — Edison von Myosotis, May 30 '23 at 15:28
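The sketch below relates to this comment. It uses a hypothetical `read_whole` helper and assumes C++17 (for `is_always_lock_free`). `std::atomic<std::uint64_t>` guarantees an untorn load even where the natural word size is 32 bits. Which instructions provide that guarantee is up to the implementation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: load a 64-bit value another thread may be writing.
// std::atomic guarantees the load is never torn, even on a 32-bit target
// where a plain uint64_t might be read as two 32-bit halves. Whether this
// uses an SSE/x87 load, a CMPXCHG8B loop, or a lock is implementation-defined.
std::uint64_t read_whole(std::atomic<std::uint64_t> const &v)
{
    return v.load(std::memory_order_relaxed);
}

// True if 64-bit atomics need no lock on this target (C++17).
// Whether it is true depends on the compiler and target.
constexpr bool u64_atomics_lock_free =
    std::atomic<std::uint64_t>::is_always_lock_free;
```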
