I have a value in a `uint16x8_t` (a Q register). In assembly I would simply add the two halves of the register: for Q0, the result I need is effectively `vadd_u16(d0, d1)`. The problem is that I don't see how to get that using NEON intrinsics, since there is no conversion from `uint16x8_t` to `uint16x4x2_t` that would let me pass the low and high parts to `vadd_u16`.

There are lots of `vreinterpret_x_y` macros, but not a single one converts from `uint16x8_t` to `uint16x4x2_t`. Am I missing something? How should such an operation be done with ARM NEON intrinsics?

Pavel P

1 Answer

You can use `vget_low_u16` and `vget_high_u16` to extract the low and high halves of the Q register as `uint16x4_t` values.
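As a minimal sketch of that suggestion, assuming `<arm_neon.h>` is available on the target: the helper names `add_halves_u16` and `fold_halves` are hypothetical and exist only for illustration, and the scalar fallback is there solely so the snippet also builds on non-NEON machines.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: add the low half (Dn) of a Q register
 * to its high half (Dn+1), lane by lane. */
static inline uint16x4_t add_halves_u16(uint16x8_t q)
{
    return vadd_u16(vget_low_u16(q), vget_high_u16(q));
}

/* Hypothetical wrapper: load 8 lanes, fold the halves, store 4 lanes. */
static inline void fold_halves(const uint16_t in[8], uint16_t out[4])
{
    vst1_u16(out, add_halves_u16(vld1q_u16(in)));
}
#else
/* Scalar fallback (assumption: only here so the sketch builds off-ARM). */
static inline void fold_halves(const uint16_t in[8], uint16_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint16_t)(in[i] + in[i + 4]);
}
#endif
```

Calling `fold_halves` on `{1, 2, 3, 4, 10, 20, 30, 40}` should leave `{11, 22, 33, 44}` in the output. Whether the compiler turns the two `vget_*` calls into plain D-register references or into extra `vmov` instructions depends on the toolchain, so it is worth checking the generated assembly.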

The problem, however, is that the compiler may make a total mess of it, resulting in a terrible performance hit.

The Clang bundled with Android Studio is especially bad at dealing with these, and so are GCC versions older than 6.x.

Your only options are updating the toolchain to the most recent one, or sticking to assembly.

Jake 'Alquimista' LEE
  • That's strange: these are listed as `VMOV` instructions, while referencing Dn, Dn+1 from a Q register obviously doesn't need any opcodes. I have always written NEON asm, never used NEON intrinsics, and couldn't figure out how to get around that. – Pavel P Mar 27 '18 at 21:39
  • @Pavel I noticed that GCC 6.x or later can deal with them the way you expect, but anything older generates FUBAR machine code with lots of `vmov`. It's especially bad with the built-in `Clang` in Android Studio. – Jake 'Alquimista' LEE Mar 28 '18 at 08:19
  • @Pavel I even tried resorting to a `union`, but then it gets even worse (memory loads/stores). It's truly "intrinsux". Stay with your assembly. – Jake 'Alquimista' LEE Mar 28 '18 at 08:20
  • I use intrinsics for testing/validation of my code. It seems that the latest clang/gcc actually produce decent code for my messy intrinsics. I'm curious what MS ARM or armcc would produce. – Pavel P Mar 28 '18 at 09:58