5

Consider the code below. A __uint128_t variable is held in two 64-bit registers (assume an x64 processor). I need to store the low 64 bits in one unsigned long variable and the high 64 bits in another.

__uint128_t a = SOMEVALUE;
unsigned long b = a&0xffffffffffffffff;
unsigned long c = a>>64;

Here, b stores the low 64 bits and c the high 64 bits. Is there a simpler way to access the two registers separately, instead of performing & and >> operations? I ask because in my project this section of code will be executed more than a trillion times, so I'd like to settle this first.

Anything with assembly code I can fool around with?

Sep Roland
Knm

3 Answers

10

What you have written is probably best, although truncation by casting is easier to read than the long constant. As a rule of thumb, writing code that's obvious and clear usually makes it easiest for your compiler to see your intent and optimise appropriately.

On Compiler Explorer, I supplied this function:

#include <stdint.h>

void decompose(__uint128_t num, uint64_t *a, uint64_t *b) {
    *a = (uint64_t)(num >> 64);
    *b = (uint64_t)num;
}

When compiled for x64 with gcc -O3, it produces exactly the code you want:

decompose:
        mov     QWORD PTR [rdx], rsi
        mov     QWORD PTR [rcx], rdi
        ret
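
To check that the cast-based split loses nothing, a sketch such as the following can recombine the halves and compare. It assumes GCC or clang on a target that provides __uint128_t; recompose and check_round_trip are hypothetical helpers added for illustration, not part of the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Same split as decompose() above: high half into *a, low half into *b. */
static void decompose(__uint128_t num, uint64_t *a, uint64_t *b) {
    *a = (uint64_t)(num >> 64);
    *b = (uint64_t)num;
}

/* Hypothetical inverse: rebuild the 128-bit value from its halves. */
static __uint128_t recompose(uint64_t hi, uint64_t lo) {
    return ((__uint128_t)hi << 64) | lo;
}

/* Hypothetical self-check: split, rebuild, compare. */
static void check_round_trip(void) {
    __uint128_t x = recompose(0x0123456789abcdefULL, 0xfedcba9876543210ULL);
    uint64_t hi, lo;
    decompose(x, &hi, &lo);
    assert(hi == 0x0123456789abcdefULL);
    assert(lo == 0xfedcba9876543210ULL);
    assert(recompose(hi, lo) == x);
    /* The mask from the question gives the same low half as the cast. */
    assert((uint64_t)(x & 0xffffffffffffffffULL) == lo);
}
```

Calling check_round_trip() once at startup is enough; the asserts cost nothing in a build with NDEBUG defined.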
Toby Speight
  • 1
    @TedLyngmo: Not just MSVC. *Any* mainstream compiler targeting Windows x64 has to use 32-bit `unsigned long` to be ABI-compatible with existing DLLs unless they provide alternate headers, which is impossible for every 3rd-party library. (Although to be fair, apparently the default for Windows GCC is that `long double` is 10-byte, vs. MSVC making it 8-byte, so GCC isn't strictly ABI-compatible with MSVC. But `unsigned long` is a vastly more widely used type than `long double`, especially across ABI boundaries (function args and structs)) – Peter Cordes Jul 29 '23 at 09:40
  • @PeterCordes Very true again :-) I just realized that MSVC doesn't have a 128 bit integer type. At least I couldn't find it. It has 128 bit types for simd operations though. – Ted Lyngmo Jul 29 '23 at 09:48
  • @TedLyngmo: Oh right :P I'm not aware of MSVC having a 128-bit type. It has a `_umul128` intrinsic for 64x64 => 128-bit multiplication ([How can I multiply 64 bit operands and get 128 bit result portably?](https://stackoverflow.com/q/25095741)), and I think intrinsics for ADC, but I'm not sure if they're implemented efficiently. – Peter Cordes Jul 29 '23 at 10:03
  • 1
    @PeterCordes I guess in C23 one can hope for `unsigned _BitInt(128)` to work in MSVC. [clang supports it already](https://godbolt.org/z/MrGcx8MTr) – Ted Lyngmo Jul 29 '23 at 10:13
  • 1
    @TedLyngmo: Oh right, I was going to mention `_BitInt(128)` in my answer. – Peter Cordes Jul 29 '23 at 10:17
  • 1
    @PeterCordes: Re “C guarantees modulo-reduction from wider *integral* types to narrower”: Only for unsigned destination types. For signed destination types, the conversion is implementation-defined. – Eric Postpischil Jul 29 '23 at 11:29
  • 1
    @chux that's simply because I compiled without any of the usual warnings. Sorry about that. – Toby Speight Jul 29 '23 at 16:29
6

Shift/mask or a union are the way to go. Especially if you just want to read the parts of an __int128, bit-manipulation is clear and will reliably compile efficiently.

If you were replacing the upper or lower 64 bits, a union would probably make it easier for the compiler to see that than bitwise mask / shift / OR. I wouldn't be surprised if both ways compile efficiently, but a union is probably good for human readability.

Note that the ordering of the halves in a union will depend on endianness, whereas bit-shifts don't.
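
As a rough sketch of the two ways to replace the upper half, the following compares mask / shift / OR with a union. It assumes GCC or clang (for unsigned __int128 and the predefined __BYTE_ORDER__ macro) and the GNU C behaviour for union type-punning; the names u128_parts, HI, LO, set_hi_bitwise and set_hi_union are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Index of each 64-bit half inside the union's array.
   Assumes GCC/clang, which predefine __BYTE_ORDER__. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
enum { HI = 1, LO = 0 };
#else
enum { HI = 0, LO = 1 };
#endif

union u128_parts {   /* hypothetical name */
    u128 whole;
    uint64_t half[2];
};

/* Replace the high half with shift / OR: independent of endianness. */
static u128 set_hi_bitwise(u128 x, uint64_t hi) {
    return ((u128)hi << 64) | (uint64_t)x;
}

/* Replace the high half through the union: needs the HI index above. */
static u128 set_hi_union(u128 x, uint64_t hi) {
    union u128_parts p = { .whole = x };
    p.half[HI] = hi;
    return p.whole;
}

/* Hypothetical self-check: both versions must agree. */
static void check_set_hi(void) {
    u128 x    = ((u128)0x1111111111111111ULL << 64) | 0x2222222222222222ULL;
    u128 want = ((u128)0xaaaaaaaaaaaaaaaaULL << 64) | 0x2222222222222222ULL;
    assert(set_hi_bitwise(x, 0xaaaaaaaaaaaaaaaaULL) == want);
    assert(set_hi_union(x, 0xaaaaaaaaaaaaaaaaULL) == want);
}
```

The bitwise version needs no knowledge of byte order, which is the portability advantage mentioned above; whether either one compiles to fewer instructions is something to confirm in the asm for the loop you care about.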


I'd recommend uint64_t or unsigned long long instead of unsigned long, since Windows x64 uses 32-bit long. Most other 64-bit ABIs are LP64, but long is also 32-bit in ILP32 ABIs for 64-bit CPUs, such as AArch64 ILP32 and the x32 ABI, where sizeof(void*) = 4 but __int128 is still supported.
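
A small sketch of why the type matters, assuming a GCC or clang target with unsigned __int128; hi_fixed, hi_long and check_widths are hypothetical names. The uint64_t version always returns the full high half, while the unsigned long version keeps only as many bits as long has on the target ABI.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* High half as uint64_t: always the full 64 bits. */
static uint64_t hi_fixed(unsigned __int128 a) {
    return (uint64_t)(a >> 64);
}

/* High half as unsigned long: silently drops the top 32 bits
   on ABIs where long is 32-bit (Windows x64, x32, AArch64 ILP32). */
static unsigned long hi_long(unsigned __int128 a) {
    return (unsigned long)(a >> 64);
}

/* Hypothetical self-check that holds whatever the width of long is. */
static void check_widths(void) {
    unsigned __int128 a = (unsigned __int128)0xdeadbeefcafef00dULL << 64;
    assert(hi_fixed(a) == 0xdeadbeefcafef00dULL);
    /* The conversion reduces the value modulo ULONG_MAX + 1. */
    assert(hi_long(a) == (0xdeadbeefcafef00dULL & ULONG_MAX));
}
```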


I'd use a cast to truncate __int128 to 64-bit, instead of having to type the right number of fs in 0xffffffffffffffff. To me, (uint64_t)a follows Toby's guideline of "obvious and clear" even better. Making the cast explicit instead of just by assigning to a narrower variable is good for human readers. C guarantees modulo-reduction from wider integral types to narrower unsigned types, which means bitwise truncation from source types that are unsigned or 2's complement signed. (Signed integers in GCC are always 2's complement.)

a>>64 is totally fine. Even for signed __int128, an arithmetic right shift and then assignment to a 64-bit type would discard the high 64 sign bits which might be all-ones or all-zeros, and GCC will still optimize that.

#include <stdint.h>
uint64_t foo_signed (__int128 num) {
    return (num >> 64) + (uint64_t)num;
    // Intentionally sloppy in the abstract machine to see what happens:
    // (u64)num is promoted back to 128-bit for + (with zero-extension because it's unsigned)
    // then the + result truncated to uint64_t for return.
    // GCC still avoids actually generating the high half of the signed shift result.
}

uint64_t foo_unsigned (unsigned __int128 num) {
    return (num >> 64) + (uint64_t)num;
}

Both of these compile to lea rax, [rdi + rsi] / ret for x86-64. (Godbolt).


Type name for 128-bit integers

In modern GNU C, the manual currently only mentions (unsigned) __int128, not __uint128_t.

AFAIK, it's not wrong to keep using legacy __uint128_t; there's no reason for GCC devs to want to remove that name for the same type. See "Is there a 128 bit integer in gcc?": __int128 has been around since GCC 4.6, which is plenty old at this point. So unless you care about ancient GCC versions, I'd recommend unsigned __int128 for new code, like in my example above.

In ISO C23, unsigned _BitInt(128) will be standardized, so you might prefer that. But last I checked, only clang supported it (although, unlike __int128 / __uint128_t, it isn't limited to 64-bit targets).

In new code, probably best to use a typedef

This lets you change to portable _BitInt as needed, and saves typing.

#include <limits.h>   // C23 defines BITINT_MAXWIDTH here when _BitInt is available

#if defined(__SIZEOF_INT128__)
typedef  unsigned __int128   u128;
  // or __uint128_t for compat with even older GCC which doesn't define __SIZEOF_INT128__
#elif defined(BITINT_MAXWIDTH) && BITINT_MAXWIDTH >= 128   // C23 _BitInt wide enough for 128 bits
typedef  unsigned _BitInt(128)  u128;
#else
#error   no 128-bit integer type available
#endif

// then use   u128  in later code.

You could write helper-functions or macros if you find the shifting and/or casting is adding noise to your code.

static inline uint64_t hi64(u128 a) { return a >> 64; }
static inline uint64_t lo64(u128 a) { return (uint64_t)a; }

Then you can simply use hi64(x) and/or lo64(x).
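
As a usage sketch, the helpers can split a full 64x64 => 128-bit product. This assumes GCC or clang on a 64-bit target; mul_64x64 and check_mul are hypothetical names, and the typedef and helpers are repeated so the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* assumes GCC or clang on a 64-bit target */

static inline uint64_t hi64(u128 a) { return (uint64_t)(a >> 64); }
static inline uint64_t lo64(u128 a) { return (uint64_t)a; }

/* Hypothetical helper: full 64x64 => 128-bit product, returned as halves. */
static void mul_64x64(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo) {
    u128 p = x * (u128)y;   /* cast one operand so the multiply is 128-bit */
    *hi = hi64(p);
    *lo = lo64(p);
}

/* Hypothetical self-check. */
static void check_mul(void) {
    uint64_t hi, lo;
    /* (2^64 - 1)^2 = 2^128 - 2^65 + 1, so hi = 2^64 - 2 and lo = 1 */
    mul_64x64(UINT64_MAX, UINT64_MAX, &hi, &lo);
    assert(hi == UINT64_MAX - 1);
    assert(lo == 1);
}
```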

Peter Cordes
  • @Knm: Did you leave out an operator, like `a*b` vs. `a * (__uint128_t)b`? I like to put the cast ahead of the second operand in binary operations, to remind readers that (in the abstract machine) it applies *before* the binary operator, so the other side gets promoted as well, rather than the truncated result. (Casting has higher precedence than `*` or `+` so `(__uint128_t)a * b` is equivalent, but the other way saves readers some mental effort to verify that.) – Peter Cordes Jul 29 '23 at 18:37
0

Variables are not stored in registers. They are stored in memory and processed in registers.

The C language supplies the union construct to map data in several ways, like

union MyUnion
{
    __uint128_t a;
    unsigned long long b[2];
} u;

Now you can refer to u.a, u.b[0] and u.b[1] at will, and the compiler is expected to produce efficient code for the given processor.


Note that your construction with a mask and a shift will never be implemented as such, because processors cannot handle 128-bit data in a single go. Instead, your a will always be processed as two 64-bit numbers. In fact, the masking and the shift will never be performed.

Yves Daoust
  • 3
    *Variables are not stored in registers.* True in the C abstract machine, but the "as-if" rule allows optimizing variables into registers entirely in some cases, like the examples in Toby's and my answers (where an `__int128` function arg is passed in a pair of registers, and the function never stores it to memory.) The question is tagged [assembly], so clearly they're concerned about cases where the compiler already has a value in a pair of registers. I hope they realize that no asm instructions are needed for the splitting, and that the question is just how to express it to the compiler. – Peter Cordes Jul 29 '23 at 09:07
  • 1
    I thought the question was about optimization ("execute trillions of times..."), and was trying to *avoid* confusing them. – Peter Cordes Jul 29 '23 at 09:13
  • 3
    Some data never gets stored in memory at all. e.g. a `__int128` return value from a function (in RDX:RAX) could get split up into separate C variables for the halves without ever being stored and reloaded. This costs zero asm instructions. (If you then assign those C variables to things that do get stored in memory, was it the original __int128 data getting stored after all? Debating semantics isn't helpful here, what does matter is looking at the asm for a loop you actually care about.) – Peter Cordes Jul 29 '23 at 09:21
  • @YvesDaoust No that's not what your answer says at all. – fuz Jul 29 '23 at 09:42
  • 4
    Note that the union is less portable than the code in the question, because the ordering of high and low parts is processor-dependent. – Toby Speight Jul 29 '23 at 10:38
  • 2
    Note that while accessing members at will in C is fine, it is undefined behavior to read from a different member than what was stored in C++. – orlp Jul 29 '23 at 11:20
  • @orlp: Yes, worth mentioning in general to avoid learning non-portable tricks. But many C++ compilers do define the behaviour of union type-punning. Including in the GNU dialect of C++, the same one that provides the `__uint128_t` extension this code also requires. See [Unions and type-punning](https://stackoverflow.com/q/25664848) / https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning – Peter Cordes Jul 30 '23 at 02:43