I have a question about SSE.

As far as I understand, SSE consists of XMM registers. And (as far as I understand) if I need to move a value from one part of SSE to another part, I have to use assembly language, for example `movdqa xmm7, xmm6`. But it doesn't work.

I need to move a value to another cell: [image]
What should I do?

Johan
  • That's a shuffle, and if you are using intrinsics, you don't need asm for that. See [_mm_shuffle_epi32](https://msdn.microsoft.com/en-us/library/56f67xbk(v=vs.100).aspx) – Jester Jul 31 '17 at 11:32
  • @Jester Thank you very much. It works. –  Jul 31 '17 at 11:38
  • Note that @Jester's comment is not limited to the shuffle operation; you don't need asm to use any of SSE, really. The intrinsics header also defines the types (`__m128`, `__m128i`, and `__m128d`) that correspond to what an XMM register can hold, as well as the intrinsics operating on them, corresponding to SSE instructions. – Nominal Animal Jul 31 '17 at 16:01
  • Note that there are separate shuffle instructions for shuffling integer values and for shuffling floating point values. On some processors, there's a small time penalty for switching between integer and floating point operations. – rcgldr Aug 02 '17 at 01:45

1 Answer

When you write in C++, you use intrinsics, and it is up to the compiler to allocate registers, the same as with usual non-vector code.

The values you see in the debugger watch window are a representation of a single XMM value, which may be in a register or in memory. Even if it is in a register, the elements are parts of one XMM register, not two different ones, so `movdqa xmm7, xmm6` is irrelevant.

The fact that you see it in a variable named `tt` indicates that it is in memory; most of the time the Visual Studio debugger cannot show a variable that is held in a register. (This may be due to the debug build, whereas in a release build it would be in a register.)

The `__m128i` type is defined as a union of arrays, because different operations may treat it as a vector with a different element size. You are looking at a vector of 32-bit values. To rearrange its components there is the 32-bit shuffle `_mm_shuffle_epi32`, as pointed out in the comment.
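As a minimal sketch of that shuffle, the snippet below swaps the two lowest 32-bit elements of a vector and leaves the other two in place. The helper names `to_array` and `swap_low_two_lanes` are made up for illustration; only the intrinsics and the `_MM_SHUFFLE` macro come from the SSE2 headers. It assumes an x86 target with SSE2 enabled.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_shuffle_epi32, _mm_storeu_si128
#include <array>
#include <cstdint>

// Hypothetical helper: copy the four 32-bit elements to an array for inspection.
std::array<std::int32_t, 4> to_array(__m128i v) {
    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}

// Hypothetical helper: swap elements 0 and 1, keep elements 2 and 3.
__m128i swap_low_two_lanes(__m128i v) {
    // _MM_SHUFFLE(d, c, b, a) builds the immediate: result element 0 takes
    // source element a, element 1 takes b, element 2 takes c, element 3 takes d.
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 0, 1));
}
```

For instance, calling `swap_low_two_lanes(_mm_setr_epi32(10, 20, 30, 40))` should yield the elements 20, 10, 30, 40. The compiler decides which XMM register (or memory location) holds each value, so no explicit `movdqa` is needed.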

The same shuffle can be used to swap 64-bit values. For smaller element sizes (less than 32 bits) there is the 8-bit shuffle `_mm_shuffle_epi8`, but it requires SSSE3. If you want to keep SSE2 as a baseline, the goal can be achieved with several intrinsics: there are `_mm_extract_epi16`/`_mm_insert_epi16`, and some shuffles can be built from a sequence of `_mm_unpack*` operations and/or shifts.
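The two SSE2-only options mentioned above might look roughly like this. The function names are hypothetical and chosen only for this sketch; it assumes an x86 target with SSE2 enabled and does not cover the SSSE3 `_mm_shuffle_epi8` route.

```cpp
#include <emmintrin.h>  // SSE2: _mm_shuffle_epi32, _mm_extract_epi16, _mm_insert_epi16
#include <array>
#include <cstdint>

// Hypothetical helper: view the vector as four 32-bit elements.
std::array<std::int32_t, 4> to_dwords(__m128i v) {
    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}

// Hypothetical helper: view the vector as eight 16-bit elements.
std::array<std::uint16_t, 8> to_words(__m128i v) {
    std::array<std::uint16_t, 8> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}

// Swap the two 64-bit halves by moving 32-bit elements in pairs:
// result elements 0,1 come from source 2,3 and result 2,3 from source 0,1.
__m128i swap_64bit_halves(__m128i v) {
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
}

// Swap 16-bit elements 0 and 1 using only SSE2 extract/insert.
// The element index must be a compile-time constant.
__m128i swap_first_two_words(__m128i v) {
    int w0 = _mm_extract_epi16(v, 0);
    int w1 = _mm_extract_epi16(v, 1);
    v = _mm_insert_epi16(v, w1, 0);
    v = _mm_insert_epi16(v, w0, 1);
    return v;
}
```

The extract/insert pair goes through general-purpose registers, so it is likely slower than a single shuffle; it is shown here only because it works on plain SSE2.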

Alex Guteniev