Why doesn't the Windows x64 calling convention use XMM registers to pass more than 4 integer args?

Question

The (Microsoft) x64 calling convention states:

The arguments are passed in registers RCX, RDX, R8, and R9. If the arguments are float/double, they are passed in XMM0L, XMM1L, XMM2L, and XMM3L.

That's great, but why just floats/doubles? Why aren't integers (and maybe pointers) also passed via XMM registers?
Seems a little like a waste of available space, doesn't it?

I wrote a more complete answer to this on a later question: [Why not store function parameters in float registers?](//stackoverflow.com/a/33707435). Would you mind editing the tags to include `x86-64` and/or `assembly` so I can dup-hammer this? — Peter Cordes, Feb 24 '19 at 15:54
@PeterCordes: But this one came 4 years earlier...?! Also, in my question I wasn't really intending to masquerade pointers as non-pointers or vice-versa. The other poster intended to actually masquerade them as different types so it's not quite the same question... I could imagine that e.g. inhibiting optimizations in cases where this one wouldn't. — user541686, Feb 24 '19 at 20:18
A later canonical can be a good dup target to clean up earlier scattered questions of the same problem. That's basically what happened here, but I didn't intentionally write it with the aim of being a canonical, just answering that question. I don't see anything in the other question about type-punning to float/double in C to make this happen, just asking (like you are) why XMM regs aren't used for passing integer/pointer arg types. (They say "float registers" instead of XMM registers, but it's still a calling-convention-design question, not hacking C to work around it.) — Peter Cordes, Feb 24 '19 at 20:45
Whether you agree or not about closing as a dup, this question is still missing an x86-64 tag. (I'd rather not edit myself in case I *am* able to convince you that this is a duplicate.) — Peter Cordes, Feb 24 '19 at 20:48
@PeterCordes: Not "in C". But *"using float registers in order to store the next parameters, even when the parameters are not single/double precision variables"* is exactly proposing the equivalent of type-punning. Again, I don't think that was my intention in the question (and it certainly isn't just from the text), so I don't agree either is a dupe of the other. One could disagree on the proposal in that question but not this one. (For the tag itself, you'll probably save more time than we've both wasted arguing if you just go ahead and add it. It's not like I'd disagree this is x86-64.) — user541686, Feb 24 '19 at 21:00
But that's *exactly* what you're proposing. On x86-64, "float regs" are XMM regs. (XMM regs aren't *only* float regs, they're also integer-vector regs, but they're not GP integer so like the answer here says, they can't be used directly as pointers, or with other GP regs). Would it help if we retitled that question? — Peter Cordes, Feb 24 '19 at 21:21
I edited the other question, because that's an improvement to that question. (Not *just* to bend it into a better duplicate of this one). i.e. I edited the question to ask what I answered in my answer, because that's probably the most useful thing for everyone at this point. — Peter Cordes, Feb 24 '19 at 21:27
@PeterCordes: No, I don't, and no, it's not. I wrote this question. I'm telling you I didn't/don't see "float" as a synonym for XMM, and the intention of my question was not to shove integers into registers that hold floats. Whereas the other question clearly intended to do that. You keep rejecting what I write and then admitting in parentheses that I'm correct. This is really frustrating. If you're going to argue and dupe-hammer no matter how much I tell you I wasn't asking the same thing as that question then just save me the time and frustration and do it in the beginning. — user541686, Feb 24 '19 at 21:28
I can't dup-hammer unless someone *else* edits this question to add a tag that I have a gold badge in. `x86-64` or `assembly` would both fit. Dup-hammer doesn't apply to tags you add yourself in an edit. But anyway, you've convinced me it's not an *exact* duplicate. That Q&A wants to just avoid store/reload by copying between registers. You're picturing that integer args could actually be *used* in XMM regs with `paddd` and so on, where [Why not store function parameters in XMM vector registers?](//stackoverflow.com/q/33707228) maybe wasn't. — Peter Cordes, Feb 24 '19 at 21:35
My answer there answers most of both questions, though, but Windows x64 is different. There is actually more to say about this. — Peter Cordes, Feb 24 '19 at 21:36

score 7 · Accepted Answer · answered Jun 08 '11 at 06:26


Because most operations on non-FP values (i.e. integers and addresses) are designed to use general purpose registers.

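There are integer SSE operations, but they are packed arithmetic and logic only; a value in an XMM register can't be used directly as an address or to set the flags a branch tests.

So if the calling convention passed integers and addresses in SSE registers, it would almost always be necessary to copy the value to a general-purpose register first.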

elder_george


But isn't it still better than potentially spilling onto the stack? – user541686 Jun 08 '11 at 06:28
IIRC, there're no operations for moving data between GP register and SSE register. This means that passing data via SSE register requires (in the worst case) copying value to memory (preferably aligned) *twice*. That's much worse than using stack. BTW, some CPU vendors map stack on register files, so stack access becomes less expensive. Don't know if Intel or AMD do this, though. – elder_george Jun 08 '11 at 06:41
+1 I totally wasn't aware of that^ fact, that explains a lot. :) Thanks! – user541686 Jun 08 '11 at 06:44
@Mehrdad: no, x86 is really good at store/reload, with efficient store forwarding (like 5c latency). There is `movq xmm, r64` and `movq r64, xmm`, but that has worse throughput than loads. Better latency, though. And besides, with an arg safely in memory you don't have to spill it again if you want to call another function before using it. – Peter Cordes Feb 24 '19 at 15:56

score 3 · Answer 2 · answered Feb 24 '19 at 21:51

Functions often want to use integer args with pointers (as indices, or to calculate an end-pointer as a loop bound), with other integer args in GP registers, or with other integers loaded from memory that they want to work with in GP registers.

You can't efficiently use an integer in an XMM reg as a loop counter or bound, because there's no packed-integer compare that sets integer flags for branch instructions. (`pcmpgtd` creates a mask of 0/-1 elements.)
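To make the loop-counter point concrete, here is a rough sketch using SSE2 intrinsics (x86 / x86-64 only). The function names `low_lane_less_than` and `sum_below_xmm` are made up for illustration, and the instruction names in the comments are what the intrinsics normally map to, not a guarantee of what a given compiler emits. The point is that the compare only yields a mask, so the result has to be copied to a GP register before anything can branch on it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 / x86-64 only */

/* Hypothetical helper: nonzero if the low 32-bit lane of `counter`
   is less than the low lane of `bound`. */
int low_lane_less_than(__m128i counter, __m128i bound)
{
    /* pcmpgtd: each 32-bit lane becomes all-ones where bound > counter,
       otherwise 0. No integer flags are written. */
    __m128i mask = _mm_cmpgt_epi32(bound, counter);

    /* movd: the mask has to reach a GP register before it can be tested. */
    return _mm_cvtsi128_si32(mask) != 0;
}

/* Sums 0..n-1 while keeping the loop counter in an XMM register. */
int sum_below_xmm(int n)
{
    __m128i i     = _mm_setzero_si128();
    __m128i one   = _mm_set1_epi32(1);
    __m128i bound = _mm_set1_epi32(n);
    int sum = 0;

    while (low_lane_less_than(i, bound)) {
        sum += _mm_cvtsi128_si32(i);   /* another XMM -> GP copy to use the value */
        i = _mm_add_epi32(i, one);     /* paddd */
    }
    return sum;
}
```

With the counter in a GP register the loop condition is just a `cmp` plus a conditional jump; here every iteration needs extra work to get the mask and the counter value out of the vector register.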

See also [Why not store function parameters in XMM vector registers?](//stackoverflow.com/q/33707228) and the other answer here for more.

But even beyond that, this design idea is not even an option for Windows x64 fastcall / vectorcall.

Windows x64 chooses to waste space on purpose to simplify variadic functions. The register args can be dumped into the 32-byte "shadow space" / "home space" above the return address, to form an array of args.

This is why (for example) Windows x64 passes the 3rd arg in R8 or XMM2, regardless of the types of the earlier args. It is also why calls to variadic functions require FP args to also be copied to the corresponding integer register, so the function prologue can dump the arg regs without figuring out which variadic args were FP and which were integer.
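A toy model of that arg array, in portable C, may help. It is only a sketch: `ArgArray`, `slot_from_int`, `slot_from_double`, `next_int` and `next_double` are hypothetical names, not anything from a real ABI header. Slots 0 to 3 stand for the home space filled from RCX, RDX, R8 and R9, and the later slots stand for stack args. Because a variadic caller also puts the bit pattern of an FP arg in the matching integer register, the callee can walk one flat array of 8-byte slots.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: slots[0..3] = home space (dumped from RCX, RDX, R8, R9),
   slots[4..] = args the caller placed on the stack. */
typedef struct {
    uint64_t slots[8];
    int next;           /* index of the next slot to read */
} ArgArray;

/* What the caller puts in a slot for an integer arg. */
uint64_t slot_from_int(int64_t v)
{
    uint64_t u;
    memcpy(&u, &v, sizeof u);
    return u;
}

/* For a variadic call, an FP arg's bit pattern also goes in the integer register. */
uint64_t slot_from_double(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* The callee reads slots in order, choosing the type only at read time. */
int64_t next_int(ArgArray *a)
{
    int64_t v;
    memcpy(&v, &a->slots[a->next++], sizeof v);
    return v;
}

double next_double(ArgArray *a)
{
    double d;
    memcpy(&d, &a->slots[a->next++], sizeof d);
    return d;
}
```

In this model the "prologue" never needs to know which slots hold FP values; only the code that reads an arg decides how to interpret the 8 bytes.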

To make the arg-array thing work, only 4 total args can be passed in registers, regardless of whether you have a mix of integer and FP args. There are enough GP integer regs to hold the max number of register args already, even if they're all integer.

(Unlike x86-64 System V, where the first up-to-8 FP args are passed in xmm0..7 regardless of how many integer/pointer arg-passing registers are used.)
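The difference between the two schemes can be sketched as a small lookup in C. This is a simplified model that only covers plain scalar integer/pointer and float/double args (no structs, no variadic calls), and `ArgKind`, `win64_location` and `sysv_location` are names invented for the example.

```c
#include <stddef.h>

typedef enum { ARG_INT, ARG_FP } ArgKind;

/* Windows x64: the arg's position picks the slot; integer and FP args
   share the same 4 slots. */
const char *win64_location(const ArgKind *kinds, size_t index)
{
    static const char *const gp[4]  = { "RCX", "RDX", "R8", "R9" };
    static const char *const xmm[4] = { "XMM0", "XMM1", "XMM2", "XMM3" };

    if (index >= 4)
        return "stack";
    return kinds[index] == ARG_FP ? xmm[index] : gp[index];
}

/* x86-64 System V: integer and FP args are counted separately, so up to
   6 GP and 8 XMM registers can be in use at once. */
const char *sysv_location(const ArgKind *kinds, size_t index)
{
    static const char *const gp[6]  = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };
    static const char *const xmm[8] = { "XMM0", "XMM1", "XMM2", "XMM3",
                                        "XMM4", "XMM5", "XMM6", "XMM7" };
    size_t n_gp = 0, n_fp = 0;

    /* Count how many registers of each class the earlier args consumed. */
    for (size_t i = 0; i < index; i++) {
        if (kinds[i] == ARG_FP)
            n_fp++;
        else
            n_gp++;
    }
    if (kinds[index] == ARG_FP)
        return n_fp < 8 ? xmm[n_fp] : "stack";
    return n_gp < 6 ? gp[n_gp] : "stack";
}
```

For a signature like `(int, double, int, double, int)`, the Windows model gives RCX, XMM1, R8, XMM3, stack, while the System V model gives RDI, XMM0, RSI, XMM1, RDX.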

Why does Windows64 use a different calling convention from all other OSes on x86-64?
