
I'm looking for a way to have the first field of a structure returned from a function in a CPU register, without it being allocated on the stack. Assume we have the following structure and functions returning it:

#include <string>

template <typename T>
struct Pair {
    int code;
    T   value;
};

Pair<int> FuncInt(bool const flag) {
    return {flag ? 0 : 1, 42};
}

Pair<std::string> FuncString(bool const flag) {
    return {flag ? 0 : 1, "DEADBEEF"};
}

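auto UseInt(bool const flag) {
    if (auto const rv = FuncInt(flag); !rv.code) {
        return rv.value;
    }
    return 0;  // fallback so control never flows off the end (undefined behavior)
}

auto UseString(bool const flag) {
    if (auto const rv = FuncString(flag); !rv.code) {
        return rv.value;
    }
    return std::string{};  // fallback; keeps the deduced return type std::string
}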

I want the code member to be returned through a CPU register (ideally always); the second member, value, may use the stack. This works for FuncInt, but not for FuncString. My compiler (gcc-7.2) persistently generates this:

call FuncString[abi:cxx11](bool)
cmp DWORD PTR [rsp], 0

which compares the code member stored on the stack. I want to avoid this redundant memory read and use a register instead, as with FuncInt:

call FuncInt(bool)
test eax, eax

Is there a way to tell gcc (or clang) that instances of Pair should (or at least can) be split into two separate variables and returned in the most efficient way?

I believe this can't work at all for exported functions spanning multiple object files, even though FuncInt already uses only registers. But I'm still curious about the single-object-file case.

Here is a sample: https://godbolt.org/g/ZXYn8z

In other words, I want Pair<std::string> FuncString(bool const flag) to behave like int UseString(bool const flag, std::string &value), while keeping the expressiveness of C++17, including structured bindings.
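To make the comparison concrete, here is a minimal sketch of the two shapes side by side. FuncStringOut, FuncStringPair, UseStringOr, and CheckEquivalence are hypothetical names introduced only for this illustration. The out-parameter form returns a plain int, which the usual x86 calling conventions return in eax; whether the inlined Pair wrapper ends up equally cheap is up to the optimizer and is not guaranteed.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <typename T>
struct Pair {
    int code;
    T   value;
};

// Out-parameter form: the status code is an ordinary int return value and the
// caller owns the storage for the string.
inline int FuncStringOut(bool const flag, std::string &value) {
    value = "DEADBEEF";
    return flag ? 0 : 1;
}

// Hypothetical wrapper that restores the Pair-returning interface on top of
// the out-parameter form. It only helps if the compiler inlines it.
inline Pair<std::string> FuncStringPair(bool const flag) {
    Pair<std::string> rv{};
    rv.code = FuncStringOut(flag, rv.value);
    return rv;
}

// C++17 structured bindings keep the call site expressive.
inline std::string UseStringOr(bool const flag, std::string fallback) {
    auto [code, value] = FuncStringPair(flag);
    return code == 0 ? std::move(value) : std::move(fallback);
}

// Checks that both forms agree on the observable result.
inline void CheckEquivalence() {
    std::string out;
    assert(FuncStringOut(true, out) == FuncStringPair(true).code);
    assert(out == FuncStringPair(true).value);
}
```

This only shows the interface I'm after; it does not by itself change how the compiler returns Pair<std::string>.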

As I understand it, this problem is very similar to "scalar replacement of aggregates," but playing around with gcc's optimizer options gave me neither results nor insight.

UPDATED:

I don't have any specific requirements regarding the target OS and architecture, other than that it must be x86. If this can be done in a generic way, I'll be happy to know; if it works only in a very specific environment, I'm still glad to know. As mentioned in the comments here, a compiler should not stop itself from optimizing when the user wants it.

I went through the ABI specs, and it seems this trick works only with small structs like std::pair<std::uint32_t, std::uint32_t>, as it does in the UseInt function in my sample above. The problem is that my second member is larger than a register. Still, I'm hoping to get the compiler to use copy elision: allocate the value object (std::string in my sample) in the caller and pass it by reference, which does fit in a register.
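As far as I can tell from the x86-64 System V ABI combined with the Itanium C++ ABI, a class type can be returned in registers only if it is at most 16 bytes and has trivial copy/move constructors and a trivial destructor; otherwise the caller passes a hidden pointer to the return slot. The sketch below encodes that as a rough compile-time heuristic. kMayReturnInRegisters is a hypothetical helper, not a compiler or library facility, and it ignores the finer points of the ABI's classification rules.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

template <typename T>
struct Pair {
    int code;
    T   value;
};

// Rough heuristic, assuming the x86-64 System V and Itanium C++ ABIs:
// small and trivial for the purposes of calls.
template <typename T>
constexpr bool kMayReturnInRegisters =
    std::is_trivially_copy_constructible_v<T> &&
    std::is_trivially_move_constructible_v<T> &&
    std::is_trivially_destructible_v<T> &&
    sizeof(T) <= 16;

// Pair<int> is small and trivial, which matches the register return
// observed for FuncInt.
static_assert(kMayReturnInRegisters<Pair<int>>);
static_assert(kMayReturnInRegisters<Pair<std::uint32_t>>);

// std::string has a non-trivial copy constructor and destructor, so
// Pair<std::string> fails the check regardless of its size.
static_assert(!kMayReturnInRegisters<Pair<std::string>>);
```

That is consistent with what I see: the whole Pair<std::string>, including code, lands in the memory return slot.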

Dení
  • Besides your compiler, which you identified, this is also highly dependent on your specific implementation (operating system), which you failed to specify. The answer will depend also on your OS (Linux, BSD, MS-Windows), and hinges on the implementation's ABI options. Something like this changes the C++ ABI. If there is such a compiler option, the compiled code will be incompatible with code not compiled with this option, as such it will require an ABI change. I don't know for certain, but I expect it highly unlikely that gcc will offer such an option on any implementation. – Sam Varshavchik Jan 21 '18 at 02:15
  • @SamVarshavchik The compiler does not need to limit itself to ABI if it can see that the function is not exposed outside of the compilation unit (which can be whole program in case of LTO). In that case it can use any calling convention as optimization. – michalsrb Jan 21 '18 at 02:17
  • If you want to guarantee it you have to write the call in assembly for a given architecture. You're assuming here of course that you know that your CPU actually has registers :P – Ahmed Masud Jan 21 '18 at 02:24
  • yes, [many compilers will do that](https://stackoverflow.com/q/46901697/995714). [How to optimize function return values in C and C++ on x86-64?](https://stackoverflow.com/q/25381736/995714), [About returning more than one value in C/C++/Assembly, Return a struct from a function](https://stackoverflow.com/q/31497152/995714) – phuclv Jan 21 '18 at 04:40
  • @SamVarshavchik I don't have any specifics regarding target OS and architecture except x86. If it is possible to get done generically: I'll be happy to know. If it works only in the very specific environment: still glad to know. As michalsrb mentioned, a compiler should not limit itself to do optimizations if the user wants it. – Dení Jan 22 '18 at 02:41
  • Other than LTO, you can also use `static` or an anonymous namespace, to let the compiler know that the function is private to this translation unit and thus doesn't need to follow the platform ABI. – Marc Glisse Jan 22 '18 at 06:51
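A minimal sketch of the last suggestion, assuming a single translation unit: placing Pair and FuncString in an anonymous namespace gives them internal linkage, so the compiler is free to inline the call or pick a private calling convention. Whether gcc or clang actually keeps code in a register here is not guaranteed and needs to be checked in the generated assembly. SanityCheck is a hypothetical helper added only to exercise the code.

```cpp
#include <cassert>
#include <string>

namespace {  // everything in here has internal linkage

template <typename T>
struct Pair {
    int code;
    T   value;
};

// Not visible outside this translation unit, so the platform ABI does not
// have to be honored for calls to it.
Pair<std::string> FuncString(bool const flag) {
    return {flag ? 0 : 1, "DEADBEEF"};
}

}  // namespace

// Externally visible caller; returns an empty string on a non-zero code.
std::string UseString(bool const flag) {
    if (auto [code, value] = FuncString(flag); code == 0) {
        return value;
    }
    return {};
}

// Exercises both branches of UseString.
inline void SanityCheck() {
    assert(UseString(true) == "DEADBEEF");
    assert(UseString(false).empty());
}
```

Marking FuncString as static instead of using the anonymous namespace should have the same effect on linkage.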

0 Answers