Shift/mask or a union are the way to go. Especially if you just want to read the parts of an __int128, bit-manipulation is clear and will reliably compile efficiently.
If you were replacing the upper or lower 64 bits, a union would probably make it easier for the compiler to see that than bitwise mask / shift / OR. I wouldn't be surprised if both ways compile efficiently, but a union is probably good for human readability.
Note that the ordering of the halves in a union depends on endianness, whereas bit-shifts don't.
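To make the endianness point concrete, here is a minimal sketch of replacing the upper 64 bits both ways. The names u128_parts, LO_IDX, HI_IDX, set_hi_union and set_hi_shift are made up for illustration, and the byte-order test assumes the __BYTE_ORDER__ macros that GCC and clang predefine.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical union for illustration: the same 16 bytes viewed two ways.
typedef union {
    unsigned __int128 whole;
    uint64_t half[2];   // which index holds the low half depends on endianness
} u128_parts;

// Pick the array index of each half using GCC/clang's predefined macros.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
enum { LO_IDX = 1, HI_IDX = 0 };
#else
enum { LO_IDX = 0, HI_IDX = 1 };
#endif

// Replace the upper 64 bits through the union (needs the index fix-up above).
static inline unsigned __int128 set_hi_union(unsigned __int128 x, uint64_t hi) {
    u128_parts p = { .whole = x };
    p.half[HI_IDX] = hi;
    return p.whole;
}

// Same operation with a truncating cast, shift and OR: no endianness dependence.
static inline unsigned __int128 set_hi_shift(unsigned __int128 x, uint64_t hi) {
    return ((unsigned __int128)hi << 64) | (uint64_t)x;
}

// Small self-check that both versions agree.
static inline void check_set_hi(void) {
    unsigned __int128 x = ((unsigned __int128)0x1111 << 64) | 0x2222;
    assert(set_hi_union(x, 0xAAAA) == set_hi_shift(x, 0xAAAA));
}
```

The union version only matches the shift version because of the HI_IDX selection; hard-coding half[1] would silently pick the wrong half on a big-endian target.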
I'd recommend uint64_t or unsigned long long instead of unsigned long, since Windows x64 uses 32-bit long. Most other 64-bit ABIs are LP64, but ILP32 ABIs for 64-bit CPUs, like AArch64 ILP32 and the x32 ABI, also have 32-bit long: there, sizeof(void*) == 4 but __int128 is still supported.
I'd use a cast to truncate __int128 to 64-bit, instead of having to type the right number of fs in 0xffffffffffffffff. To me, (uint64_t)a follows Toby's guideline of "obvious and clear" even better. Making the cast explicit instead of just by assigning to a narrower variable is good for human readers. C guarantees modulo-reduction from wider integral types to narrower unsigned types, which means bitwise truncation from source types that are unsigned or 2's complement signed. (Signed integers in GCC are always 2's complement.)
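As a quick illustration of the cast versus the explicit mask, something like the following should behave identically for signed inputs too. The function names low_by_cast, low_by_mask and check_truncation are hypothetical, chosen just for this sketch.

```c
#include <assert.h>
#include <stdint.h>

// Truncate by explicit cast: conversion to an unsigned type is modulo 2^64.
static inline uint64_t low_by_cast(__int128 a) { return (uint64_t)a; }

// Truncate by mask: equivalent, but you have to count sixteen f's correctly.
static inline uint64_t low_by_mask(__int128 a) { return a & 0xffffffffffffffff; }

// Small self-check, including a negative input.
static inline void check_truncation(void) {
    __int128 minus_one = -1;
    assert(low_by_cast(minus_one) == UINT64_MAX);   // all-ones low half
    assert(low_by_cast(minus_one) == low_by_mask(minus_one));

    __int128 big = ((__int128)5 << 64) + 7;
    assert(low_by_cast(big) == 7);                  // high half discarded
}
```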
a>>64 is totally fine. Even for signed __int128, an arithmetic right shift and then assignment to a 64-bit type would discard the high 64 sign bits which might be all-ones or all-zeros, and GCC will still optimize that.
#include <stdint.h>

uint64_t foo_signed (__int128 num) {
    return (num >> 64) + (uint64_t)num;
    // Intentionally sloppy in the abstract machine to see what happens:
    // (u64)num is promoted back to 128-bit for + (with zero-extension because it's unsigned)
    // then the + result truncated to uint64_t for return.
    // GCC still avoids actually generating the high half of the signed shift result.
}

uint64_t foo_unsigned (unsigned __int128 num) {
    return (num >> 64) + (uint64_t)num;
}
Both of these compile to lea rax, [rdi + rsi] / ret for x86-64. (Godbolt).
Type name for 128-bit integers
In modern GNU C, the manual currently only mentions (unsigned) __int128, not __uint128_t.
AFAIK, it's not wrong to keep using the legacy __uint128_t; there's no reason for GCC devs to want to remove that name for the same type. See "Is there a 128 bit integer in gcc?": __int128 has been around since GCC 4.6, which is plenty old at this point. So unless you care about ancient GCC versions, I'd recommend unsigned __int128 for new code, like in my example above.
ISO C23 standardizes _BitInt, so you might prefer unsigned _BitInt(128) where the implementation supports that width. But last I checked, only clang supported it (though it's not limited to 64-bit targets the way __int128 / __uint128_t are).
In new code, probably best to use a typedef
This lets you switch to the portable _BitInt as needed, and saves typing.
#if defined(__SIZEOF_INT128__)
typedef unsigned __int128 u128;
// or __uint128_t for compat with even older GCC which doesn't define __SIZEOF_INT128__
#elif ??? // feature-test macro for this C23 feature?
typedef unsigned _BitInt(128) u128;
#else
#error no 128-bit integer type available
#endif
// then use u128 in later code.
You could write helper-functions or macros if you find the shifting and/or casting is adding noise to your code.
static inline uint64_t hi64(u128 a) { return a >> 64; }
static inline uint64_t lo64(u128 a) { return (uint64_t)a; }
Then you can simply use hi64(x) and/or lo64(x).
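For example, a self-contained sketch using those helpers might look like this. make_u128, mulhi64 and check_helpers are hypothetical additions for illustration, and the typedef is repeated (assuming a compiler with __int128) so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   // assumes GCC/clang on a 64-bit target

static inline uint64_t hi64(u128 a) { return a >> 64; }
static inline uint64_t lo64(u128 a) { return (uint64_t)a; }

// Hypothetical inverse helper: build a u128 from two halves.
static inline u128 make_u128(uint64_t hi, uint64_t lo) {
    return ((u128)hi << 64) | lo;
}

// Hypothetical use case: high half of a full 64x64 => 128-bit multiply.
static inline uint64_t mulhi64(uint64_t a, uint64_t b) {
    return hi64((u128)a * b);
}

// Small self-check: splitting and rebuilding round-trips.
static inline void check_helpers(void) {
    u128 x = make_u128(0x0123456789abcdef, 0xfedcba9876543210);
    assert(hi64(x) == 0x0123456789abcdef);
    assert(lo64(x) == 0xfedcba9876543210);
    assert(make_u128(hi64(x), lo64(x)) == x);
}
```

The cast to u128 before the multiply matters: without it, a * b would be computed in 64 bits and the high half would already be lost.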