I want to move four xmm registers into one zmm register, perform some computations using AVX512 instructions, and then get the results back into the four xmm registers. What is the most efficient way to do this without going through memory?
-
"SSE registers" are just the lower 128 bits of AVX512 registers, so you don't have to "move them", unless you want to join (and later split) four 128-bit registers into one 512-bit register. Could you elaborate on what you want to do? And are you using pure assembler or C/C++ intrinsics? – chtz Aug 22 '20 at 18:33
-
2x `vinsertf128` + 1x `vinsertf32x8` https://www.felixcloutier.com/x86/vinsertf128:vinsertf32x4:vinsertf64x2:vinsertf32x8:vinsertf64x4. So 3 shuffle instructions total. Of course you don't want to actually use legacy *SSE* instruction encodings mixed with AVX512, because that could lead to transition stalls. Just use 128-bit VEX- or EVEX-encoded instructions like `vpshufb xmm0, xmm1, xmm2`. – Peter Cordes Aug 22 '20 at 20:28
-
Thanks Peter. Do you have any suggestions on how to go from one zmm back to four xmm registers? – its Aug 23 '20 at 01:20
-
To extract, use the corresponding `vextractf32x8`/`vextractf128` in reverse order: https://www.felixcloutier.com/x86/vextractf128:vextractf32x4:vextractf64x2:vextractf32x8:vextractf64x4 And note that you don't need to explicitly extract the lower 128 bits of a ymm or zmm register. – chtz Aug 23 '20 at 11:07