Count number of bits set to 0 in a register

Question

What is the most efficient way to look at the contents of a register and count the number of bits that are set to 0 and then save that count in a different register?

Obviously a loop is necessary along with LSR, but I'm not sure how to implement that along with the AND instruction as well as EOR.

Possible duplicate of [Fastest way to count number of 1s in a register, ARM assembly](http://stackoverflow.com/questions/15736602/fastest-way-to-count-number-of-1s-in-a-register-arm-assembly) — Notlikethat, Oct 02 '16 at 22:45

score 0 · Answer 1 · answered Oct 02 '16 at 23:58

There is no single answer here. Some processors have an instruction that gives the number of set bits (it's a pretty useless instruction for general purpose programming, but good for error detection). Assuming you don't have such an instruction, zero is often overwhelmingly the most likely value for a register to hold, so test for that specially. After that you have to resort to counting out the bits.

The basic algorithm is: AND with one, add the result to the accumulator, shift right, and repeat until you have been through all the bits. Since you want the zero bits, XOR the masked bit with 1 before adding it.

But we can likely speed it up. You could take 8 bits at a time and do a table lookup, but would that be faster or slower than clocking out 8 bits one by one? It depends on the particular instruction set, the memory cache, and so on. If we have a "register file" in which registers are identified by index number, we can maybe set up register 0 with 4, register 1 with 3, register 2 with 3, register 3 with 2 and so on (16 registers holding the counts of zero bits for each 4-bit value), clock out 4 bits at a time, then use those bits to index the register file. You would need to count several values to justify the setup overhead.
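The shift-and-mask loop and the 4-bit lookup described above can be sketched in C as follows. This is only an illustration under the assumption of a 32-bit register: the function and table names are made up for the example, and an ordinary array in memory stands in for the register file.

```c
#include <stdint.h>

/* Basic loop: mask the low bit, flip it so a clear bit counts as 1,
 * add it to the accumulator, shift right, repeat for all 32 bits. */
unsigned count_zero_bits_loop(uint32_t x)
{
    unsigned zeros = 0;
    for (int i = 0; i < 32; i++) {
        zeros += (x & 1u) ^ 1u;   /* AND with 1, then XOR with 1 */
        x >>= 1;                  /* logical shift right (LSR)   */
    }
    return zeros;
}

/* Lookup table: entry n holds the number of zero bits in the
 * 4-bit value n (0 -> 4, 1 -> 3, 2 -> 3, 3 -> 2, ...). */
static const uint8_t zeros_in_nibble[16] = {
    4, 3, 3, 2, 3, 2, 2, 1,
    3, 2, 2, 1, 2, 1, 1, 0
};

/* Clock out 4 bits at a time and use them as the table index. */
unsigned count_zero_bits_nibble(uint32_t x)
{
    unsigned zeros = 0;
    for (int i = 0; i < 8; i++) {
        zeros += zeros_in_nibble[x & 0xFu];
        x >>= 4;
    }
    return zeros;
}
```

Both functions always run a fixed number of iterations; whether the table version wins depends on how cheap the lookup is on the target.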

Another issue is whether looping or unrolling will be faster. Again that is highly architecture dependent.

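Another possible trick: if the MSB is set, the value is negative when treated as a signed number, so you can shift left and test the sign instead of shifting right and ANDing. Is a test for a negative number faster than the AND? Quite likely. Similarly, multiplying by two or adding the register to itself will, on most architectures with a carry flag, move the MSB into the carry, and an add-zero-with-carry into the accumulator might be faster than masking and then adding a register.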
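C has no carry flag, so the MSB idea can only be modelled, not reproduced exactly. The sketch below assumes a 32-bit value and uses a hypothetical function name: taking the top bit stands in for the carry out of adding the register to itself, and adding it to the accumulator stands in for an add-with-carry of zero.

```c
#include <stdint.h>

/* Count zero bits by repeatedly moving the MSB "out the top".
 * On ARM the loop body would roughly correspond to
 *   ADDS r0, r0, r0   ; r0 + r0, old MSB goes to the carry flag
 *   ADC  r1, r1, #0   ; accumulator += carry
 * This counts the 1 bits; the zero count is 32 minus that. */
unsigned count_zero_bits_msb(uint32_t x)
{
    unsigned ones = 0;
    for (int i = 0; i < 32; i++) {
        ones += x >> 31;   /* the bit that would land in the carry */
        x <<= 1;           /* same result as x + x                 */
    }
    return 32u - ones;
}
```

Whether this beats the AND-based loop is something only a measurement on the actual core can settle.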

There are lots of possible little stratagems.

score 0 · Answer 2 · answered Aug 31 '23 at 06:30

Given all of the varied hardware bits involved, I’d be looking very carefully at __builtin_popcount() and friends if your compiler is GCC compatible. That is the most sensible answer for the garden variety programmer, because the compiler is doing the instruction selection for you.

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

As for algorithms, Hacker's Delight covers bit hacks for scalars, and there are lookup-table-driven methods, including ones that use the SIMD shuffle units to perform nibble-wise LUTs on vectors. Some vector and scalar ISAs also include popcnt instructions, though these may be optional extensions.
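As one example of the scalar bit hacks mentioned above, here is the widely known parallel (SWAR) population count for a 32-bit word, of the kind that Hacker's Delight discusses. It is a sketch with made-up function names, not a quotation from the book.

```c
#include <stdint.h>

/* Parallel bit count: sum adjacent bits into 2-bit fields, then
 * 4-bit fields, then bytes, then add the four bytes together. */
unsigned popcount32_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* byte sums  */
    return (x * 0x01010101u) >> 24;                   /* total in the top byte */
}

/* Zero bits are whatever is left of the 32-bit width. */
unsigned count_zero_bits_swar(uint32_t x)
{
    return 32u - popcount32_swar(x);
}
```

This needs no loop and no table, at the cost of a handful of constants and one multiply.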

The count of 0 bits is then the width of the type in bits, 8*sizeof(type) on the usual machines with 8-bit bytes, minus the count of 1 bits, which is what popcount delivers.
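With a GCC-compatible compiler, that subtraction might look like the sketch below. The wrapper name is hypothetical, and CHAR_BIT from <limits.h> is used in place of a hard-coded 8.

```c
#include <limits.h>

/* Zero bits = width of the type in bits minus the number of 1 bits.
 * __builtin_popcount takes an unsigned int; the compiler picks the
 * instruction sequence (a popcount instruction where one is available). */
unsigned count_zero_bits_builtin(unsigned int x)
{
    return (unsigned)(CHAR_BIT * sizeof x) - (unsigned)__builtin_popcount(x);
}
```

Calling __builtin_popcount(~x) gives the same count directly for a full-width unsigned int. For wider types, GCC also provides __builtin_popcountl and __builtin_popcountll.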
