Can I load a value into a register without stalling until it is fetched from memory?

Question

I'd like to kick off a main-memory access, but I'm not fussy about exactly when the new value arrives, as long as it arrives at some point. In the meantime, I'd like to continue using the value that's currently in the register, rather than stalling until the memory fetch completes.

Here is a motivating toy example in C. I have a worker loop that continually does some work and accumulates the results into some storage. Every so often, I'd like to switch the location the values are accumulated into (analogous to log rotation on a Linux server). As long as each computed value gets accumulated into exactly one storage location, I'm happy.

As written, the CPU will stall while the memory fetch happens, because I haven't expressed that I want to keep using the pre-existing, "wrong/stale" value in the register until the memory fetch completes.

How can I make that clear to the compiler/CPU? It's fine if it requires writing assembly directly. I'm interested in answers that work on any/all architectures.

// Does unspecified work and returns an int; assume the compiler can inline it.
int do_work(void);

// where our main thread accumulates its work
int * accumulator;

// assume that some other thread is concurrently updating this.
// Every so often, we want to switch our accumulator to be a
// newer value taken from here, but we're not fussy about exactly
// when that switch happens.
int * volatile * current_accumulator;


void work_loop() {
    // Here, I want to block once until `*current_accumulator`
    // is loaded into a register.
    int * accumulator = *current_accumulator;
    while (1) {
        for (int i = 0; i < 100; ++i) {
            // This has a dependency on the memory load, and so
            // it will stall until the new value is loaded into
            // the register. However I don't want that. I want the
            // register to update eventually, but I want to always
            // be using the value that happens to currently be there.
            *accumulator += do_work();
        }
        accumulator = *current_accumulator;
    }
}

I find this interesting, but from an academic point of view only. Do you have a real-world example where this presumed stall is causing measurable slowdown despite your CPU's pipelining? — Thomas, Nov 07 '18 at 19:06
Take a step back, explain what your *actual* problem is, because this is almost certainly an XY problem. — EOF, Nov 07 '18 at 19:08
I don't think this is possible as this breaks the transparency of caching. — fuz, Nov 07 '18 at 19:11
Your register won't magically be updated at some later time for obvious reasons. What you can do is use prefetching but if that doesn't provide the data by the time you actually need it, it will stall. — Jester, Nov 07 '18 at 19:12
Wouldn't it be faster to accumulate in a register and then periodically decide where to write it? — stark, Nov 07 '18 at 19:15
On the Mill you can (to an extent): https://www.youtube.com/watch?v=8E4qs2irmpc&t=1604 — Swordfish, Nov 07 '18 at 19:30
@stark: according to C's "as-if" optimization rules, this code will already optimize the `*accumulator = do_work();` into a register, if `do_work()` can inline and can be proven not to read from whatever `accumulator` points to. (alias analysis). `int *accumulator` is not `volatile`, so actual memory access is not a side-effect that needs to be preserved, only correctness of the final result. — Peter Cordes, Nov 07 '18 at 23:53
@ajp: out-of-order execution attempts to solve the problem of hiding cache-miss latency in a much more general way, without changing correctness. Most CPU architectures don't have anything like this where the results are timing-dependent. — Peter Cordes, Nov 07 '18 at 23:56
Related: [Is it possible to “abort” when loading a register from memory rather the triggering a page fault?](https://stackoverflow.com/q/52221575) (not very efficiently, but there is kind of a hack you can use on Intel x86 CPUs with transactional memory). — Peter Cordes, Nov 08 '18 at 02:44
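
A minimal sketch combining two of the suggestions in the comments above: accumulate each batch in a local variable (stark) and issue a prefetch hint for the pointer slot before the batch starts (Jester). It assumes GCC or Clang for `__builtin_prefetch`; `work_rounds`, `slot`, `rounds`, `batch` and the stub body of `do_work` are hypothetical names chosen for illustration. It does not give the "register updates itself later" behavior asked for: if the cache line has not arrived by the time the pointer is read, the load can still stall. It also reads the pointer at the end of each batch rather than at the start, and keeps the question's `volatile` rather than adding any real cross-thread synchronization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the question's do_work(). */
static int do_work(void) { return 1; }

/* Runs `rounds` batches of `batch` work items. Each batch is summed in a
   local variable, which the compiler is free to keep in a register, and is
   then added to whichever accumulator `*slot` points at when the batch ends.
   Every computed value therefore lands in exactly one accumulator. */
static void work_rounds(int *volatile *slot, int rounds, int batch) {
    assert(slot != NULL);
    for (int r = 0; r < rounds; ++r) {
        /* Hint that the slot will be read soon (GCC/Clang builtin).
           This is only a hint: it may be ignored by the hardware. */
        __builtin_prefetch((const void *)slot, 0, 3);

        int local = 0;
        for (int i = 0; i < batch; ++i)
            local += do_work();

        /* The real load. It is cheap if the prefetch finished during the
           batch, and it can still stall if it did not. */
        int *target = *slot;
        *target += local;
    }
}
```

Called as `work_rounds(&cur, 2, 100)` with `int *cur = &a;`, this adds 200 to `a` with the stub `do_work` above. The design choice is to move the dependent load off the inner loop's critical path, so any remaining stall is paid at most once per batch instead of once per work item.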

0 Answers