I'm looking at clang's output to see what this C expression compiles to:
(mask==0xffff ? one : zero)
Here, one is defined like this:
const __m128i one = _mm_set_epi64x(0, 1);
And the assembly output:
4e0: 66 0f d7 c0 pmovmskb eax, xmm0
4e4: 3d ff ff 00 00 cmp eax, 65535
4e9: 66 0f ef c0 pxor xmm0, xmm0
4ed: 74 06 je 6 <_vm_run+0x465>
4ef: 66 0f ef c9 pxor xmm1, xmm1
4f3: eb 0a jmp 10 <_vm_run+0x46F>
4f5: b8 01 00 00 00 mov eax, 1
4fa: 66 48 0f 6e c8 movq xmm1, rax
My question is: why doesn't clang keep one in a register? (There are some unused ones.) Is it a matter of calling convention?
It would have saved quite a few bytes (a move between xmm registers is only 4 or 5 bytes).
EDIT:
Here's a reproducible example: https://godbolt.org/z/qfTZqY
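In case the link goes stale, here is a minimal sketch of the kind of function involved. It is not necessarily the exact code behind the link: the function name select_one and the byte-wise compare that produces mask are assumptions made for illustration, chosen only because the listing starts with pmovmskb.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical reproducer (name and inputs are assumptions):
   returns a vector whose low 64-bit lane is 1 when all 16 bytes
   of a and b are equal, and an all-zero vector otherwise. */
__m128i select_one(__m128i a, __m128i b)
{
    const __m128i one  = _mm_set_epi64x(0, 1);
    const __m128i zero = _mm_setzero_si128();

    /* pcmpeqb + pmovmskb: one mask bit per byte; 0xffff means all bytes matched */
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(a, b));

    return (mask == 0xffff) ? one : zero;
}
```

Compiling something like this with optimizations on and reading the disassembly should show whether one is materialized with mov/movq on the taken path, as in the listing above, or kept in an xmm register.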