If you insist on the reversed order of bytes (for whatever weird reason), you can still simply scan for the first bit set to 1, divide by 8, and add 1 to get the number of bytes.
GetReversedShiftedUtf8BytesCount:
; eax = UTF8 code in reversed order, shifted in byte by byte from the LSB
; 'É' (c3 89) => eax = 0x0000c389
bsr ecx,eax   ; ecx = index of the highest set bit
cmovz ecx,eax ; needed only for eax = 0 (bsr sets ZF and leaves ecx undefined there)
; ^ if eax is never 0 on input, this "cmovz" can be removed
shr ecx,3     ; bit index -> byte index ("div 8")
inc ecx       ; +1 => number of bytes
ret
As you put the first byte of the char into the most significant non-zero byte, bsr will produce bit number 15, 23 or 31 for multi-byte chars, and anything from 0 to 6 for 7-bit ASCII. The "div 8" maps all of those to the right byte count either way; it doesn't care which case it is.
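For example, tracing the arithmetic on a few inputs (the '€' and emoji values are my own illustrations, not from the original routine):

; eax = 0x0000c389  ('É', 2 bytes)          -> bsr = 15 -> 15 shr 3 = 1 -> +1 = 2
; eax = 0x00e282ac  ('€', 3 bytes)          -> bsr = 23 -> 23 shr 3 = 2 -> +1 = 3
; eax = 0xf09f9882  (a 4-byte emoji)        -> bsr = 31 -> 31 shr 3 = 3 -> +1 = 4
; eax = 0x00000041  ('A', ASCII)            -> bsr =  6 ->  6 shr 3 = 0 -> +1 = 1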
This routine should actually work with valid normal-order UTF8 codes too, because the last byte of a multi-byte char is a continuation byte with its top bit set, so bsr still lands in the highest used byte.
For an invalid UTF8 code cut short by a zero byte it will return a wrong number of bytes (the zero ones are not counted).
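A couple of traced inputs to illustrate both points (the concrete values, and the assumption that the caller stops shifting bytes in when it hits a zero byte, are mine):

; 'É' in normal order (plain little-endian load of c3 89):
;   eax = 0x000089c3 -> bsr = 15 -> 2   ; still correct, the continuation byte 0x89 has its top bit set
; invalid c3 00 (leading byte promises 2 bytes, but the string ends):
;   eax = 0x000000c3 -> bsr = 7  -> 1   ; the zero byte is not counted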
There is of course, as always, also a LUT solution possible:
movzx ecx,al ; al = leading (first) byte of the sequence
shr ecx,3    ; keep the upper 5 bits as the LUT index
movzx ecx,byte [utf8lengthLUT + ecx] ; +rcx for 64b
; ecx = number of bytes or 0 for invalid leading byte value
...
utf8lengthLUT: ; 32B look-up table for upper 5b of 1st byte
db 1, 1, 1, 1, 1, 1, 1, 1 ; 00000 - 00111 ; single byte
db 1, 1, 1, 1, 1, 1, 1, 1 ; 01000 - 01111 ; single byte
db 0, 0, 0, 0, 0, 0, 0, 0 ; 10000 - 10111 ; not valid leading byte
db 2, 2, 2, 2 ; 11000 - 11011 ; two bytes code point
db 3, 3 ; 11100 - 11101 ; three bytes code point
db 4 ; 11110 ; four bytes code point
db 0 ; 11111 ; not valid leading byte
I didn't debug it, just ran it through nasm as a syntax check. Nor did I profile it, of course. :) Looking at how short that bsr variant is, I doubt this one will be seriously faster even on CPUs where bsr hurts.
But this one treats invalid UTF8 sequences differently: instead of detecting the highest non-zero byte and returning its position + 1 (which doesn't care what the leading byte contains), it decodes the leading byte properly and returns 0 when the leading bits are wrong. On the other hand, correct leading bits followed by an incorrect 2nd+ byte (like c3 00) will still return 2, while the first variant returns 1 in such a case.
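For completeness, a minimal sketch of the LUT snippet wrapped into a callable routine (the label GetUtf8BytesCountLUT and the choice of passing the leading byte in al are my own, not from the original):

GetUtf8BytesCountLUT:
; al = leading (first) byte of the sequence
movzx ecx,al
shr ecx,3                            ; upper 5 bits -> LUT index
movzx ecx,byte [utf8lengthLUT + ecx] ; ecx = 1..4, or 0 for an invalid leading byte
ret
; e.g. al = 0xc3 -> index 24 -> 2, al = 0x41 -> index 8 -> 1, al = 0x80 -> index 16 -> 0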
(It's possible to go with only a 16B LUT, indexed by the upper 4 bits, if you don't care about detecting the invalid 11111 leading byte and will simply take it as a 4-byte code point; see the sketch below.)
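A possible sketch of that 16B variant (my own; it indexes by the upper 4 bits via shr ecx,4, and the label utf8lengthLUT16 is made up):

movzx ecx,al
shr ecx,4                              ; upper 4 bits -> LUT index
movzx ecx,byte [utf8lengthLUT16 + ecx] ; ecx = 1..4, or 0 for a continuation byte
...
utf8lengthLUT16: ; 16B look-up table for upper 4b of 1st byte
db 1, 1, 1, 1, 1, 1, 1, 1 ; 0000 - 0111 ; single byte
db 0, 0, 0, 0             ; 1000 - 1011 ; not valid leading byte
db 2, 2                   ; 1100 - 1101 ; two bytes code point
db 3                      ; 1110        ; three bytes code point
db 4                      ; 1111        ; four bytes (the invalid 11111 leads report 4 here too)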
BTW, there are some open-source i18n libraries doing all those things like validating UTF8 input, fixing invalid sequences, counting chars, etc... Some of them have been around for more than a decade... and they are still getting bug reports and fixes. Which is sort of a subtle hint at how difficult it is to write this stuff correctly (without exposing the app to some input-data exploit). :)
(Plus consider how many (fixing) edits these two answers have received... :) )
And one more piece of off-topic advice: if you ever try to write something in PHP that is supposed to process UTF8 input data (not just from an untrusted source, but even from a trusted one), especially if that input comes from GET/POST requests... just don't do it on your own. Never. Get some framework for that. :)