When you write in C++, you use intrinsics, and it is up to the compiler to allocate registers, just as with ordinary non-vector code.
The values you see in the debugger watch window are different representations of a single xmm value, which may be in a register or in memory. If it is in a register, it is one register, not two different xmm registers, so movdqa xmm7, xmm6 is irrelevant.
The fact that you see it in the named variable tt indicates that it is in memory; the Visual Studio debugger usually cannot show a variable that lives in a register. (This may be because of the debug build; in a release build it would likely be in a register.)
The __m128i type is defined as a union of arrays because different operations may treat it as a vector with a different element size. You are looking at a vector of 32-bit values. To rearrange its components there is the 32-bit shuffle _mm_shuffle_epi32, as pointed out in the comment.
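As a minimal sketch of the 32-bit shuffle, assuming the goal is to reverse the four 32-bit elements (the question's exact permutation may differ), something like the following should work. The helper names reverse_epi32 and to_array are made up for illustration and are not part of any library.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, also provides _MM_SHUFFLE
#include <array>
#include <cstdint>

// Hypothetical helper: reverse the order of the four 32-bit elements.
// _MM_SHUFFLE(3, 2, 1, 0) would be the identity permutation;
// _MM_SHUFFLE(0, 1, 2, 3) puts source element 3 into element 0, and so on.
inline __m128i reverse_epi32(__m128i v) {
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
}

// Hypothetical helper: copy the vector to memory so the elements can be
// inspected portably (the union members of __m128i are MSVC-specific).
inline std::array<std::int32_t, 4> to_array(__m128i v) {
    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

For example, to_array(reverse_epi32(_mm_setr_epi32(1, 2, 3, 4))) gives the elements 4, 3, 2, 1 from lowest to highest.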
The same shuffle can be used to swap 64-bit values. For smaller element sizes (less than 32 bits) there is the 8-bit shuffle _mm_shuffle_epi8, but it requires SSSE3. If you want to keep SSE2 as a baseline, the goal can be achieved with several intrinsics: there are _mm_extract_epi16/_mm_insert_epi16, and some shuffles can be built from a sequence of _mm_unpacklo_*/_mm_unpackhi_* operations and/or shifts.
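A rough illustration of two of these options, using SSE2 only. The function names are hypothetical, and the byte swap is just one example of a permutation that shifts can express; other patterns would need different sequences.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, also provides _MM_SHUFFLE

// Hypothetical helper: swap the two 64-bit halves using the 32-bit shuffle.
// Elements 2 and 3 move to positions 0 and 1, elements 0 and 1 to 2 and 3.
inline __m128i swap_halves_epi64(__m128i v) {
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
}

// Hypothetical helper: swap the two bytes inside every 16-bit element
// without SSSE3, by combining a left shift and a right shift by 8 bits.
inline __m128i swap_bytes_epi16(__m128i v) {
    return _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
}
```

With SSSE3 available, the second function could be replaced by a single _mm_shuffle_epi8 with a suitable index vector.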