No real answer here. Some processors have instructions that give the number of set bits (it's a pretty useless instruction for general purpose programming, but good for error detection). Assuming you don't have such an instruction, often zero is overwhelmingly the most likely value for a register to hold, and you should test that specially. Then you have to resort to counting out the bits. The basic algorithm is AND with one, add the result to the accumulator, shift right, AND with one, and repeat until you have all the bits. Or since you want zero bits, XOR with 1. But we can likely speed it up. You could take 8 bits and do a lookup. but would that be faster or slower than clocking out 8? It just depends on the particular instruction set, memory cache, and so on. If we have a "register file" such that registers are identified by index number, we can maybe set up register 0 with 4, register 1 with 3, register 2 with 3, register 3 with 2 and so on (16 registers with counts of zero bits), clock out 4 bits, then use the result to index the register file. You'll need to do several to justify the overhead.
Another issue is whether looping or unrolling will be faster. Again that is highly architecture dependent.
Then another possible trick is that if the MSB is set, the number is negative. Is a test for negative numbers faster than the AND? Quite likely. Another is that multiplication by two or addition to itself will likely set the carry flag, and add zero with carry might be faster than add register.
There are lots of possible little stratagems.