Loading data into a register only when the data is in the CPU cache?

Question

I am writing code to “walk a graph” and don’t care the order I visit the nodes in. Therefore it will clearly be of benefits to order the graph walk to take advantage of what is already loaded into the CPU cache.

But how can I make a load instruction “abort” if the data is not already in the CPU cache?

You can't and this doesn't sound like a good idea to me. If there are a lot of loads that miss in the cache, then you could be wasting a lot of time just aborting loads, doing no actual work. If there are only few loads like that, then would not constitute a perf problem. Software prefetching can be very useful when walking graphs, but only if the processing time of each graph node is not trivial so that you can prefetch the next node before you start processing the current one. — Hadi Brais, Sep 07 '18 at 17:33
Near duplicate of [Is it possible to “abort” when loading a register from memory rather the triggering a page fault?](https://stackoverflow.com/q/52221575). @Hadi: The garbage collection use-case is plausible for pagefault, but prob. not cache. As I commented on Margaret's answer, it would be cheap for most uarches to provide try-load that sets flags according to success/abort, allowing you to defer misses. No actual ISA do, though. The same could work here, but would require support from other levels of cache for hit-only requests, not just the load port (except for L1d hits). — Peter Cordes, Sep 07 '18 at 23:56
@PeterCordes I don't see how a programmer might benefit from such probing instructions whether for the TLB or data cache. So if the target page table entry or line was not in the cache, then what? Check next the target line? Then? This is not the right approach. *There needs to be an algorithm and the graph needs to be allocated in an intelligent fashion and can be reorganized in memory dynamically as needed*. This can be done by having the GC track object allocation and access patterns. People have researched these idea for many years. Software TLB prefetching is an active... — Hadi Brais, Sep 08 '18 at 00:54
...area of research and partially supported in x86 (through `PREFETCHh`). From uarch design perspective, supporting TLB probing requires changes to the frontend to support new instructions and changes to the load buffer design to hold a new type of loads and changes to the TLB and wires between the TLB and the load buffer for the result. I wouldn't consider these changes to be major. So it mainly depends on whether there is a solid use case, especially a case where probing is measurably better than prefetching. — Hadi Brais, Sep 08 '18 at 00:55
@HadiBrais: Then what? go down the other side of the tree first, in case *one* side is in cache. The OP suggested maybe adding "cold" pointers to a CheckLater set while crawling a garbage-collection graph. Another use-case is probing the page table to decide whether to make a system call like `madvise(MADV_WILLNEED)` or Windows `PrefetchVirtualMemory`. System calls are expensive, a cheap check to avoid doing that even on hot pages that are already wired, and thus don't need to be brought in from disk would be great. — Peter Cordes, Sep 08 '18 at 01:19
@HadiBrais: I wrote an answer on [Is it possible to “abort” when loading a register from memory rather the triggering a page fault?](https://stackoverflow.com/a/52349667) that basically covers cache misses as well as TLB/page misses. The first half of the answer kind of says what I said in comments here, about avoiding system calls being far more valuable than avoiding L3 misses. — Peter Cordes, Sep 15 '18 at 23:22
Closing as a dup because there isn't any current way to do it, and my answer on your other question discusses the merits of this idea as well. — Peter Cordes, Sep 15 '18 at 23:23

Loading data into a register only when the data is in the CPU cache?

0 Answers0