Sum array of unsigned 8-bit integers using the Accelerate framework

Question

Can I use the Accelerate framework to sum an array of unsigned 8-bit integers without first converting it to an array of floats?

My current approach is:

vDSP_vfltu8(intArray, 1, floatArray, 1, size);
vDSP_sve(floatArray, 1, &result, size);

But vDSP_vfltu8 is quite slow.

I wanted to do the same thing myself a while ago, but I didn't find an obvious way to do this using the vDSP functions. Vector-vector integer addition is one of the operations missing from vDSP, perhaps because of overflow concerns. Maybe some of the Neon intrinsics talked about here could help on iOS: http://stackoverflow.com/questions/3675538/arm-neon-how-to-load-8bit-uint8-t-as-uint32-t , or SSE intrinsics on the Mac. — Brad Larson, Apr 06 '11 at 15:41
I was also thinking about overflow concerns. Thank you for the suggestion about the Neon intrinsics (interesting stuff), but I would prefer to write processor-agnostic code... and anyway, by exploring this road I think I would just discover the reason why it is not implemented in vDSP :-) I guess it is not a trivial problem that could be solved by some sort of workaround. — Fabio, Apr 06 '11 at 18:28

score 1 · Answer 1 · answered Apr 12 '11 at 04:55

If it is important to you that vDSP_vfltu8( ) be fast, please file a bug report. If there's any question, file a bug report. Inadequate performance is a bug, and will be treated as such if you report it. Library writers use this sort of feedback to determine how to prioritize their work; your bug report is the difference between a function being at the front of the queue for optimization and it being #1937 in the queue.
As has been hinted, integer accumulation is complicated by overflow concerns, but if it would be useful to have an optimized function for a specific case provided by the vDSP library, please file a bug report to request such a function (noticing a pattern?). Library writers are not psychic, and do not write functions that are not requested. Be sure to explain how you would use such a function--given this information, they may come up with a slightly different function that is even more useful to you.
If you decide to write some NEON code yourself, you will want to make use of the vaddw_u8( ) intrinsic.
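As a rough illustration of the NEON route mentioned above, here is a minimal sketch, not a vDSP function. The name sum_u8 and the block size of 256 vectors are assumptions made for this example. It widens bytes into 16-bit lanes with vaddw_u8, flushes those lanes into a wider total before they can overflow, and falls back to a plain scalar loop when NEON is not available, so the same source still builds on other processors.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of vDSP): sums n bytes into a 64-bit total. */
uint64_t sum_u8(const uint8_t *data, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;

#if defined(__ARM_NEON)
    while (n - i >= 8) {
        /* A 16-bit lane holds at most 65535, so add no more than
           256 bytes (256 * 255 = 65280) per lane before flushing. */
        size_t vectors = (n - i) / 8;
        if (vectors > 256) {
            vectors = 256;
        }

        uint16x8_t acc16 = vdupq_n_u16(0);
        for (size_t v = 0; v < vectors; ++v, i += 8) {
            /* Widening add: each of the 8 bytes is added to a 16-bit lane. */
            acc16 = vaddw_u8(acc16, vld1_u8(data + i));
        }

        /* Pairwise-widen to four 32-bit lanes, then add the lanes together. */
        uint32x4_t acc32 = vpaddlq_u16(acc16);
        total += (uint64_t)vgetq_lane_u32(acc32, 0)
               + vgetq_lane_u32(acc32, 1)
               + vgetq_lane_u32(acc32, 2)
               + vgetq_lane_u32(acc32, 3);
    }
#endif

    /* Scalar tail, and the whole array on non-NEON targets. */
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The periodic flush is what addresses the overflow concern raised in the comments: the narrow accumulator is only ever trusted for a bounded number of additions. Whether this beats the vDSP_vfltu8 plus vDSP_sve pair on a given device would need to be measured.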

Stephen Canon

I don't know that a conversion from integer to floating point will ever be fast. From what I can see, they're using platform-specific accelerated intrinsics for this anyway, so I doubt it can be made much faster. The others are valid points for filing enhancement requests, though. – Brad Larson Apr 14 '11 at 16:48
@Brad Larson: whether or not it *can* be made fast, it doesn't hurt to ask. If it's already as fast as possible, the bug will simply be closed, no harm done. If further improvement is possible, someone will take a look at it, and your code will get faster. Potential gain, no downside? File the bug. Always file the bug. – Stephen Canon Apr 14 '11 at 16:52
For what it's worth, looking at the disassembly of the function in question, it can definitely be made *dramatically* faster. So really: file the bug. – Stephen Canon Apr 14 '11 at 16:54
Excellent points, will do. Good idea on checking the output assembly. – Brad Larson Apr 14 '11 at 17:00
Any word on getting any changes made to the vDSP library? Apple typically ignores bug reports in my experience, so I'm guessing no. It would be very helpful to be able to add vectors of 8-bit integers (very, very common in iOS audio). – jeremywhuff Feb 23 '15 at 23:29
