I have a bunch of packed floats inside an XMM register (using SSE intrinsics):

__m128 xmm = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);

I'd like to convert all of these to integers in one go. I found an intrinsic that does what I want (_mm_cvtps_pi16()), but it yields four 16-bit shorts instead of full-blown ints. An intrinsic called _mm_cvtps_pi32() yields ints, but only for the two lower values in xmm. I can use it, extract the values, move things around and use it again, but is there a simpler way? Why wouldn't there be a straightforward packed 32-bit float -> 32-bit integer instruction? Surely both fit in the same space of an XMM register?

EDIT: Okay, I see now that _mm_cvtps_pi32() returns __m64 instead of __m128, which means it operates on an MMX-style MM... register. That would explain why it returns just two ints, but now I'm wondering:

  • Will I have trouble when compiling for x64? Reportedly, __m64 isn't supported there...
  • Why didn't they extend this instruction when SSE rolled out?

Thanks!

neuviemeporte

2 Answers

According to this documentation: __m128d _mm_cvtps_epi32(__m128d a) generates a cvtps2dq instruction, which does what you want.
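As a minimal sketch of how this intrinsic might be used (the helper names `floats_to_ints` and `floats_to_ints_trunc` are made up for illustration and are not part of any library): the signature in the Intel Intrinsics Guide is `__m128i _mm_cvtps_epi32(__m128 a)`, declared in `<emmintrin.h>` (SSE2). It works entirely on XMM registers and does not involve `__m64`, so the MMX concern for x64 builds should not apply. The conversion rounds according to the current MXCSR rounding mode (round-to-nearest-even by default); `_mm_cvttps_epi32` (`cvttps2dq`) truncates toward zero instead, which matches a plain C cast.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */

/* Hypothetical helper: converts four floats to four 32-bit ints,
   rounding per the current MXCSR mode (nearest-even unless changed). */
static void floats_to_ints(const float in[4], int out[4])
{
    __m128  f = _mm_loadu_ps(in);          /* unaligned load of 4 floats */
    __m128i i = _mm_cvtps_epi32(f);        /* cvtps2dq */
    _mm_storeu_si128((__m128i *)out, i);   /* unaligned store of 4 ints */
}

/* Hypothetical helper: same conversion, but truncating toward zero,
   like a C-style (int) cast on each element. */
static void floats_to_ints_trunc(const float in[4], int out[4])
{
    __m128  f = _mm_loadu_ps(in);
    __m128i i = _mm_cvttps_epi32(f);       /* cvttps2dq */
    _mm_storeu_si128((__m128i *)out, i);
}
```

With the register from the question, `_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f)`, the lowest element is 1.0f, so storing the converted result to memory would give the ints 1, 2, 3, 4 in that order.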

Mats Petersson
  • It's worth taking a moment to understand the suffixes. In this case the question's `pi32` leads directly to this answer's `epi32` -- the `e` for extended. Extended, packed, integer of 32 bits. – Ben Jackson Sep 17 '13 at 23:43
  • I used to think `__m128d` was used for storing two 64-bit floats, that's why I didn't look at this intrinsic more carefully. Any idea why this return type? – neuviemeporte Sep 17 '13 at 23:44
  • Okay, it looks as if we were both wrong: the return type is actually `__m128i`, and all is right now. The intrinsic is documented in the `__m128d` section of the SSE2 docs on MSDN, though, for a reason I don't understand. – neuviemeporte Sep 17 '13 at 23:47
  • I can't vouch for the documentation (I didn't write it, I just searched for the instruction I wanted), but it seems like the other answer also suggests `_mm_cvtps_epi32`, so it may be worth trying that one. – Mats Petersson Sep 17 '13 at 23:48
  • Thank you very much. It's just that I find these docs very confusing. Accepting now. – neuviemeporte Sep 17 '13 at 23:50
  • Yeah, I look up the instruction in my books first, then search for it with Google... – Mats Petersson Sep 17 '13 at 23:54
  • It looks like there are mistakes on the page you link to - the intrinsic should be: `__m128i _mm_cvtps_epi32 (__m128 a)` - I recommend using the [Intel Intrinsics Guide](http://software.intel.com/en-us/articles/intel-intrinsics-guide) for looking this stuff up, rather than Google or Microsoft. – Paul R Sep 18 '13 at 07:26

Use the documentation (_mm_cvtps_epi32):

Magic documentation.

  • I guess it's my bad for sticking with the MSDN docs. I figured it was the way to go, since I'm writing in Visual C++ on Windows. – neuviemeporte Sep 18 '13 at 00:03
  • Sometimes you need to search deeper: [MSDN documentation](http://msdn.microsoft.com/en-us/library/xdc42k5e(v=vs.90).aspx) – Jakub Świerk Sep 18 '13 at 00:24
  • A more useful reference is the [Intel Intrinsics Guide](http://software.intel.com/en-us/articles/intel-intrinsics-guide) - it's a documentation tool for Linux/Windows/OS X and it's much more comprehensive/accurate and quicker/easier to use than MSDN. – Paul R Sep 18 '13 at 07:31