I'm looking for a way to have the first field of my structure returned from a function in a CPU register instead of being allocated on the stack. Assume we have a structure and functions returning it:
#include <string>

template <typename T>
struct Pair {
    int code;
    T value;
};
Pair<int> FuncInt(bool const flag) {
    return {flag ? 0 : 1, 42};
}
Pair<std::string> FuncString(bool const flag) {
    return {flag ? 0 : 1, "DEADBEEF"};
}
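auto UseInt(bool const flag) {
    if (auto const rv = FuncInt(flag); !rv.code) {
        return rv.value;
    }
    return 0;  // fallback so control never flows off the end (undefined behavior)
}
auto UseString(bool const flag) {
    if (auto const rv = FuncString(flag); !rv.code) {
        return rv.value;
    }
    return std::string{};  // same fallback; must match the deduced std::string
}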
I want code to be returned in a CPU register (ideally always); the second member, value, may use the stack. This works for FuncInt but not for FuncString. My compiler (gcc-7.2) persistently generates this:
call FuncString[abi:cxx11](bool)
cmp DWORD PTR [rsp], 0
which compares the code stored on the stack. I want to avoid this redundant memory read and use a register instead, as with FuncInt:
call FuncInt(bool)
test eax, eax
Is there a way to tell gcc (or clang) that instances of Pair should (or at least may) be split into two separate variables, each returned in the most efficient way?
I believe this cannot work at all for exported functions spread across multiple object files, although FuncInt manages with registers alone. I'm still curious about the single-object-file case.
Here is a sample: https://godbolt.org/g/ZXYn8z
In other words, I want Pair<std::string> FuncString(bool const flag) to behave like int UseString(bool const flag, std::string &value), while keeping the expressiveness of C++17, including structured bindings.
As I understand it, this problem is very similar to "scalar replacement of aggregates," but playing around with gcc's optimizer options gave me neither results nor insight.
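To make the desired calling shape concrete, here is a minimal hand-written sketch of the out-parameter form. The names FuncStringOut and UseStringOut are hypothetical and only illustrate the idea: the status code is the function's sole return value, so a non-inlined call hands it back in a register, while the string lives in storage owned by the caller.

```cpp
#include <cassert>
#include <string>

// Hypothetical out-parameter variant of FuncString: the caller owns the
// string, and only the int status code is returned by value.
int FuncStringOut(bool const flag, std::string &value) {
    value = "DEADBEEF";   // written directly into the caller's object
    return flag ? 0 : 1;  // same code convention as FuncString
}

// Hypothetical caller mirroring UseString: it tests the returned code
// without first loading it from a struct in memory.
std::string UseStringOut(bool const flag) {
    std::string value;
    if (int const code = FuncStringOut(flag, value); !code) {
        return value;
    }
    return {};  // empty string on failure
}
```

This gives up the Pair return type and structured bindings, which is exactly the trade-off I would like the compiler to make for me.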
UPDATED:
I don't have any specific requirements regarding the target OS and architecture, besides that it must be x86. If this can be done in a generic way, I'll be happy to know; if it works only in a very specific environment, I'm still glad to know. As was mentioned in the comments here, a compiler should not hold back on an optimization when the user asks for it.
I went through the ABI specs, and it seems such a trick can work only with small structs like std::pair<std::uint32_t, std::uint32_t>; it certainly does in the UseInt function in my sample above.
The thing is that my second member is larger than a register. Still, I'm hoping to get the compiler to use copy elision: allocate the value object (std::string in my sample) in the caller and pass it by reference, which should fit in a register.
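My reading of the ABI rule can be approximated at compile time. The sketch below is a rough proxy, not the ABI's exact wording: it assumes the x86-64 System V convention, where a class may come back in registers only if it fits in 16 bytes and has trivial copy and destruction. The name kMayReturnInRegisters is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <typename T>
struct Pair {
    int code;
    T value;
};

// Hypothetical helper: an approximation of "small and trivial enough to be
// returned in registers" (assumed x86-64 System V: at most two eightbytes).
// A true result means "may", not "will"; the ABI has further conditions.
template <typename T>
constexpr bool kMayReturnInRegisters =
    std::is_trivially_copyable_v<T> && sizeof(T) <= 16;

// Pair<int> is 8 bytes and trivially copyable, so it passes the check.
static_assert(kMayReturnInRegisters<Pair<int>>);
// Pair<std::string> has a non-trivial copy constructor and destructor,
// so it is returned through memory supplied by the caller.
static_assert(!kMayReturnInRegisters<Pair<std::string>>);
```

So the whole Pair<std::string> goes through memory because of its std::string member, and code is carried along with it.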