Is there a built-in facility in Accelerate or elsewhere for summing an array of UInt32 using accelerated vector operations?

davecom
  • As I understand it from http://stackoverflow.com/questions/5567517/sum-array-of-unsigned-8-bit-integers-using-the-accelerate-framework, there is no such function which operates directly on integer arrays. – Martin R Dec 21 '16 at 08:03

1 Answer

I suppose that you want to accelerate a function such as

func scalarsum (_ test_array: [UInt32]) -> UInt32 {
   var result : UInt32 = 0
   for x in test_array {
     result = result &+ x
   }
   return result
}

So maybe you can write something complicated such as this...

import simd // provides uint4

func simdsum (_ test_array: [UInt32]) -> UInt32 {
   var tmpvector=uint4(0)
   // assume test_array.count is divisible by four
   let limit = test_array.count/4
   for i in 0..<limit {
     let thisvector = uint4(test_array[4*i],test_array[4*i+1],test_array[4*i+2],test_array[4*i+3])
     tmpvector = tmpvector &+ thisvector
   }
   // wrapping adds here too, so that overflow behaves as in scalarsum
   return tmpvector[0] &+ tmpvector[1] &+ tmpvector[2] &+ tmpvector[3]
}
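If the array length is not a multiple of four, the leftover elements have to be added separately. Here is a minimal sketch of that, assuming Swift 5 or later, where the standard library's SIMD4<UInt32> type can be used without importing simd. The name simdsumWithTail is made up for illustration.

```swift
// Hypothetical variant of simdsum that accepts any array length.
// Uses the standard library's SIMD4<UInt32> (an assumption: Swift 5+).
func simdsumWithTail(_ values: [UInt32]) -> UInt32 {
    var accumulator = SIMD4<UInt32>(repeating: 0)
    let vectorCount = values.count / 4
    for i in 0..<vectorCount {
        let lanes = SIMD4<UInt32>(values[4*i], values[4*i+1], values[4*i+2], values[4*i+3])
        accumulator &+= lanes // lane-wise wrapping add
    }
    // Horizontal sum of the four lanes, wrapping on overflow.
    var result = accumulator[0] &+ accumulator[1] &+ accumulator[2] &+ accumulator[3]
    // Scalar loop over the zero to three elements that did not fill a vector.
    for x in values[(4 * vectorCount)...] {
        result &+= x
    }
    return result
}
```

Because wrapping addition is associative and commutative, this returns the same value as the scalar loop for any input.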

However, let us look at the assembly that Swift produces for the first function, scalarsum...

simdsum[0x100001070] <+448>: movdqu 0x20(%rcx,%rdi,4), %xmm2
simdsum[0x100001076] <+454>: movdqu 0x30(%rcx,%rdi,4), %xmm3
(...)
simdsum[0x10000107c] <+460>: paddd %xmm2, %xmm0
simdsum[0x100001080] <+464>: paddd %xmm3, %xmm1

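Ah! Ah! Swift is smart enough to vectorize the sum: movdqu loads 128 bits at a time into an xmm register, and each paddd adds four packed 32-bit integers at once.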

So the short answer is that if you are trying to manually design a sum function using SIMD instructions in Swift, you are probably wasting your time... the compiler will do the work for you automagically.
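If you stay with the plain loop, keep in mind that &+ is Swift's wrapping addition, so the sum is computed modulo 2^32 instead of trapping on overflow. A small check, with made-up inputs:

```swift
// The plain loop from above, repeated so the snippet runs on its own.
func scalarsum(_ test_array: [UInt32]) -> UInt32 {
    var result: UInt32 = 0
    for x in test_array {
        result = result &+ x // wrapping add: never traps
    }
    return result
}

let small: [UInt32] = [1, 2, 3, 4]
let overflowing: [UInt32] = [UInt32.max, 2]

print(scalarsum(small))       // prints 10
print(scalarsum(overflowing)) // prints 1 (wrapped around modulo 2^32)
```

With the ordinary + operator the second call would crash at runtime instead of wrapping.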

See further code at https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/simdsum

Daniel Lemire
  • The compiler will also vectorize `array.reduce(0, &+)`; I don't think there's really much reason to even write a function. (That said, it is possible to handle alignment and edging better than the compiler does; this would yield significant speedups if your array is usually small, at the cost of writing explicit vector code. For typical lengths > 128 or so, the difference will be minimal). – Stephen Canon Dec 11 '17 at 17:41
  • "I don't think there's really much reason to even write a function." -> Point taken but, of course, this does not contradict what I stated. – Daniel Lemire Dec 13 '17 at 00:49