For 4-QAM, checking sign of real and imaginary parts is enough, but in my opinion this approach does not scale and using it for higher order modulations would require checking against multiple different thresholds. This simply seems wasteful.
Well, there's hardly a different way to do it! You first decide the sign of the real and imaginary part (and that typically gives you the first two bits: Gray coding!).
Then, you know in which quadrant you are. You add / subtract a complex constant so that the quadrant lies centered.
Then you again decide the real and imaginary part.
Repeat.
Or, you just go through a series of if / else if / else statements.
In a software decider, the iterative approach is "natural", in a hardware decider (i.e. digital logic circuit), the second might be faster, because you can basically do as many comparisons as you want in parallel.
On the other hand, I could not come up with any alternatives except checking a distance between a received symbol and all possible symbols (essentially ML detector). This also does not seem like a good idea, especially in case of large constellations, like 1024-QAM.
Since the ML decision for a rectangular QAM is exactly what you get with threshold decisions, well, that wouldn't be any better.
Let me lose a few words on:
Let us assume hard detection for simplicity.
vs
how higher order QAM modulations are demodulated in practice
In practice, you use larger QAM (like, large, as in 1024 and up) because you need to get close to channel capacity at a high SNR.
So, using a large QAM and then hard decision is basically a waste. Should have used a smaller QAM and less forward error correction redundancy instead, if the rate doesn't matter! In some cases you don't do the soft decoding for complexity reasons, but in general, in modern systems using large QAMs, the channel codes used allow for soft decoding. That's by design.
So, you do use soft decision. It's basically the same, you do it first for the first bit (project onto the real axis, calculate the Log-Likelihood ratio (LLR) directly from that value), then for the second value (project onto imaginary axis, calculate the LLR value), and then you shift the whole constellation, and repeat. (there's simplifications/approximations done here.)
You feed the the thus won LLRs (aka. softbits!) into your soft decoder.