
The Extended Asm manual https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html says the following about the "memory" clobber:

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

I am confused about the decision to flush to memory. Before the asm code, how would GCC know whether a register serves as a cache for a memory location and therefore needs to be flushed to memory? And is this part of cache coherency (I thought cache coherency was a hardware behavior)? After the asm code, how does GCC recognize that a register is acting as a cache and, the next time the value is needed, decide to read it from memory instead because the cached copy may be stale?

Carlos Vazquez
  • It's not talking about the hardware cache. It just means that the compiler itself used a register to cache a variable. Of course the compiler knows whether it has done that, so it knows whether the register needs to be flushed. – Jester Jan 26 '19 at 16:30

1 Answer


Before the asm code, how would GCC know if a register serves as a cache for a memory location, and thus needs to be flushed to memory?

Because GCC is the one who generates this code.

Generally, from GCC's perspective:

[C code to compile]
[your inline asm with clobber]
[C code to compile]

GCC generates the assembly instructions before and after your inline asm, so it knows everything that happens on either side of it. Since the memory clobber acts as a software (compiler-only) memory barrier, the following applies:

[GCC generated asm]
[compiler memory barrier]
[GCC generated asm]

So GCC generates the assembly on both sides of the barrier, and it knows that none of its memory accesses may cross it. From GCC's point of view there is code to compile, then the memory barrier, then more code to compile, and that's it. The only restriction the barrier imposes is that the code GCC generates must not move memory accesses across it.
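A minimal sketch of such a barrier, assuming GCC or Clang. The macro name `COMPILER_BARRIER` and the global `shared_value` are hypothetical, chosen only for illustration. The empty template emits no instructions, so the only effect is on how GCC is allowed to arrange the code around it.

```c
#include <assert.h>

/* Hypothetical macro name. The empty template emits no machine instructions;
 * the "memory" clobber tells GCC the asm may read or write any memory, so
 * values GCC is holding in registers must be stored before it and reloaded
 * after it. This constrains the compiler only; it is NOT a CPU fence. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int shared_value; /* hypothetical global, for illustration only */

int publish_and_reread(int v)
{
    shared_value = v;    /* the store must be emitted before the barrier */
    COMPILER_BARRIER();
    return shared_value; /* must be reloaded from memory, not reused from a register */
}
```

In a single thread, `publish_and_reread(42)` simply returns 42; the barrier changes what GCC may optimize, not the result. It also does nothing for hardware ordering between CPUs, which is the job of real fences such as C11 `atomic_thread_fence`.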

So if, for example, GCC loads a value from memory into a register, changes it, and stores it back to memory, neither the load nor the store may cross the barrier. Depending on the code, they must reside before or after the barrier (or be done twice, once on each side).
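To make the load/modify/store case concrete, here is a sketch under the same assumptions (GCC or Clang; `counter`, `bump_n_times` and `COMPILER_BARRIER` are hypothetical names). Without the barrier, GCC is free to keep `counter` in a register for the whole loop and store it only once at the end. With the barrier, each iteration's store has to happen before the asm and the next load has to happen after it.

```c
#include <assert.h>

/* Compiler-only barrier: empty template plus "memory" clobber. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int counter; /* hypothetical global */

void bump_n_times(int n)
{
    for (int i = 0; i < n; i++) {
        /* Load counter, add 1, store it back. The updated value must be
         * stored before the barrier, because the asm may read counter. */
        counter = counter + 1;
        /* After the barrier GCC must assume counter may have changed, so the
         * next iteration reloads it instead of reusing a register copy. */
        COMPILER_BARRIER();
    }
}
```

Comparing the `gcc -O2 -S` output with and without the `COMPILER_BARRIER()` line should show the difference: one store after the loop versus a store and reload in every iteration.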

I recommend reading this related SO thread.

Ped7g
izac89
  • Thank you! In this example (from the [Extended Asm](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html)) `asm ("sumsq %0, %1, %2" : "+f" (result) : "r" (x), "r" (y), "m" (*x), "m" (*y));`, there is no memory clobber, so why do you need operands %3 and %4? [This](https://stackoverflow.com/questions/49067312/gcc-extended-asm-understanding-clobbers-and-scratch-registers-usage?rq=1) thread explains that they tell GCC to flush any x/y registers it was using. But shouldn't GCC know to do this anyway, since it knows that %1 and %2 are read from memory? – Carlos Vazquez Jan 26 '19 at 17:13
  • In this example, `x` and `y` are pointers. If you eliminate %3 and %4, then GCC is only aware that you use the addresses; it does not know that your instructions access the memory they point to, because GCC does not know what your instructions do. So why would it flush the values they point to if it thinks your instructions won't access that memory? (given the elimination of %3 and %4) – izac89 Jan 26 '19 at 17:29
  • @CarlosVazquez GCC doesn't parse or understand the asm template, so it is not aware that there is a `sumsq` instruction and can't deduce the memory usage from such knowledge. If the source code used an intrinsic function for `sumsq` instead of inline assembly, the compiler could get this information from the intrinsic's definition and optimize "around" the `sumsq`. But inline assembly is a black box for the compiler: only the operand and clobber info tells it what the assembly affects. If you are not precise with that extended info (inputs, outputs, clobbers), the compiler may generate code based on wrong assumptions. – Ped7g Jan 26 '19 at 18:54
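To illustrate the comments above, here is a sketch of what `"m"` operands buy you, using empty templates so it compiles on any GCC or Clang target. The helper names are hypothetical, not taken from the manual. GCC never looks inside the template and relies only on the constraints: an `"m"(*p)` input says the asm reads the pointee, whereas passing only `p` with `"r"` would say nothing about the memory it points to.

```c
#include <assert.h>

/* Hypothetical helper: the "m" input tells GCC the asm reads the int at *p,
 * so any pending store to *p must be completed before the asm. With only
 * "r"(p), GCC would know the address is used, not that *p is read. */
static inline void uses_pointee(const int *p)
{
    __asm__ __volatile__("" : : "m"(*p));
}

/* Hypothetical helper: "+m" says the asm may read AND modify *p, so *p must
 * be stored before the asm and reloaded after it. Unlike a "memory" clobber,
 * this constrains only *p, not every memory location. */
static inline void modifies_pointee(int *p)
{
    __asm__ __volatile__("" : "+m"(*p));
}

int demo(void)
{
    int x = 2;
    uses_pointee(&x);     /* x must actually be in memory holding 2 here */
    modifies_pointee(&x); /* GCC must assume x may have changed */
    return x;             /* so x is reloaded from memory */
}
```

The same reasoning applies to the `sumsq` example: `"m" (*x)` and `"m" (*y)` are what tell GCC that the values behind the pointers must be up to date in memory before the asm runs.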