4

I have an Intel processor in my PC, running in 64-bit mode (x86_64), where the registers are 64 bits wide. If I use the register keyword, or enable optimization flags, the variable tends to be placed in a register. But if I assign a value that needs more than 32 bits, the compiler complains, i.e. my int is not 64 bits wide. Why does this happen? Shouldn't it be 64 bits if the variable is placed in a register and I can't even take its memory address? That is, it isn't even placed on the stack.

#include <stdio.h>

int main(void) {

    register int x = 4294967296;

    return 0;
}

Compile:

gcc example.c -o example -Wall -Wextra

Output:

warning: overflow in implicit constant conversion [-Woverflow]
     register int x = 4294967296;

Peter Cordes
Yuri Albuquerque
  • If you use MS VC, `int` is 32 bits even on x64. I'm not sure why Microsoft did it this way, but I've seen this in other compilers (for other CPUs) before. If you need an integral type guaranteed to have 64 bits, use e.g. `int64_t` instead. – Scheff's Cat Mar 11 '19 at 11:04
  • Please provide the value of `sizeof(int)` on your machine, maybe `sizeof(long int)`. – Yunnosch Mar 11 '19 at 11:04
  • @Scheff: because nobody wants an array of `int` to become 64-bit on a 64-bit machine, having twice the cache footprint. Every other ABI designer made the same choice as Microsoft, leaving `int` at 32 bit (because that's "big enough"), and only making `long` or sometimes only `long long` 64-bit in ABIs with 64-bit pointers. (I'm not aware of any C implementations where `int` = `int64_t`, even outside of x86-64. Even DEC Alpha (which was aggressively 64-bit and designed from scratch for 64-bit) I think still used 32-bit `int`.) – Peter Cordes Mar 11 '19 at 11:06
  • @PeterCordes As you mention `long`: I once was somehow surprised as `sizeof (long)` with VC (x64) has 32 bits, but with gcc (in cygwin64) 64 bits. Something, I found worth to remember... ;-) – Scheff's Cat Mar 11 '19 at 11:08
  • If I remember right `register` is deprecated. [cppreference: register](https://en.cppreference.com/w/cpp/keyword/register) (...and seems to be "banned" starting from C++17.) – Scheff's Cat Mar 11 '19 at 11:09
  • Well, I know the type int64_t. I'm just starting to learn assembly, and I wonder why I can't use the full size of a register. If a register has 64 bits, why should the convention of int be 32 bits? Does that make sense? Thanks. – Yuri Albuquerque Mar 11 '19 at 11:10
  • Please see the above comment from Peter Cordes for a convincing reason. ;-) – Scheff's Cat Mar 11 '19 at 11:11
  • @Scheff: yep, you could argue either way which choice is better. ISO C requires `long` to be at least a 32-bit type, so some code uses `long` but really only needs a 32-bit type, not 64. Within a single ABI like x86-64 System V, it's semi-convenient to always have `long` be the same width as a pointer, but since portable code always needs to use `unsigned long long` or `uint64_t` or `uint_least64_t` or `uintptr_t` depending on the use-case, it might be a mistake for x86-64 System V to have chosen 64-bit `long`. Wider types for locals can sometimes save sign-extending insns when indexing... – Peter Cordes Mar 11 '19 at 11:12
  • Well, sorry for the confusion; I don't have much experience with C. I ran the test here: I compiled it, generated the binary, and looked at it with objdump. I saw that the variable x is in fact held in a 32-bit register; in this case the compiler chose esi. – Yuri Albuquerque Mar 11 '19 at 11:21
  • Have you tried this as well without the `register` keyword? I believe the compiler will in any way try to hold as much as possible in registers. If I understood it right, with `register` the variable has no linkage. Hence, address operator is explicitly forbidden even if compiler has to ignore the `register` hint. There is a nice explanation on cppreference: [Storage-class specifiers (C)](https://en.cppreference.com/w/c/language/storage_duration) – Scheff's Cat Mar 11 '19 at 11:25
  • @Scheff: turned my comments into an answer, now that the OP has clarified what they hoped would happen. – Peter Cordes Mar 11 '19 at 11:32
  • @PeterCordes I came to the conclusion that OP had the impression `register` in C/C++ would mean something like "allocate register" what's of course not what it does mean. Hence, the expectation to get a 64 bit register in x64. (Btw. I've noticed your answer and you got my upvote.) ;-) – Scheff's Cat Mar 11 '19 at 11:36
  • Thanks a lot for the answers, guys; now I understand. As you may have noticed, I'm still just starting out with C. I will study the x86-64 System V ABI. Again, thank you for the help. :) – Yuri Albuquerque Mar 11 '19 at 11:36

3 Answers

11

The C register keyword does nothing to override the width of the C type you chose. int is a 32-bit type in all 32 and 64-bit x86 ABIs, including x86-64 System V and Windows x64. (long is also 32-bit on Windows x64, but 64-bit on Linux / Mac / everything else x86-64; see Footnote 1.)

A register int is still an int, subject to all the limits of INT_MAX, INT_MIN, and signed overflow being undefined behaviour. It doesn't turn your C source into a portable assembly language.

Using register just tells the compiler to stop you from taking the variable's address, so even in debug mode (with minimal optimization) a very naive compiler can keep the variable in (the low half of) a register without running into any surprises later in the function. (Of course modern compilers don't normally need this help, but for some compilers register actually does have an effect in debug mode.)


"If a register has 64 bits, why should the convention of int be 32 bits?"

int is 32-bit because nobody wants an array of int to become 64-bit on a 64-bit machine, having twice the cache footprint.

A very few C implementations have int = int64_t (for example on some Cray machines, I think I've read), but it's extremely rare even outside of x86-64 where 32-bit is the "natural" and most efficient operand-size for machine code. Even DEC Alpha (which was aggressively 64-bit and designed from scratch for 64-bit) I think still used 32-bit int.

Making int 32-bit when growing from 16 to 32-bit machines made sense, because 16-bit is "too small" sometimes. (But remember that ISO C only guarantees that int is at least 16 bits. If you need more than that, in a truly portable program you'd better use long or int_least32_t.)

But 32 bits is "big enough" for most programs, and 64-bit machines always have fast 32-bit integers, so int stayed 32-bit when moving from 32 to 64-bit machines.

On some machines, 16-bit integers aren't very well supported. e.g. implementing the wrapping to 16 bits with uint16_t on MIPS would require extra AND-immediate instructions. So making int a 16-bit type would have been a poor choice there.

On x86 you could just use 16-bit operand size, and use movzx instead of mov when copying, but it's "normal" for int to be 32-bit on 32-bit machines so x86 32-bit ABIs all chose that.

When ISAs were extended from 32 to 64-bit, there was zero performance reason to make int wider, unlike the 16->32 case. (Also in that case, short stayed 16-bit so there was a typename for both 16 and 32-bit integers, even before C99 stdint.h existed).

On x86-64, the default operand-size is still 32-bit; mov rax, rcx takes an extra prefix byte (REX.W) vs. mov eax, ecx, so 32-bit is slightly more efficient. Also, 64-bit multiply was slightly slower on some CPUs, and 64-bit division is significantly slower than 32-bit even on current Intel CPUs. See also: The advantages of using 32bit registers/instructions in x86-64.


Also, compilers need a primitive type for int32_t, if they want to provide the optional int32_t at all. (The fixed-width 2's complement types are optional, unlike int_least32_t and so on, which are required but aren't guaranteed to be 2's complement or free of padding.)

Compilers with 16-bit short and 64-bit int could have an implementation-specific type name like __int32 that they use as the typedef for int32_t / uint32_t, so this argument isn't a total showstopper. But it would be weird.

When growing from 16 to 32, it made sense to change int to be wider than the ISO C minimum, because you still have short as a name for 16-bit. (This argument isn't super great because you do have long as a name for 32-bit integers on 32-bit systems.)

But when growing to 64-bit, you want some type to be a 32-bit integer type. (And long can't be narrower than int). char / short / int / long (or long long) covers all 4 possible operand-sizes. int32_t isn't guaranteed to be available on all systems, so expecting everyone to use that if they want 32-bit signed integers is not a viable option for portable code.


Footnote 1:

You could argue either way whether it's better for long to be a 32 or 64-bit type. Microsoft's choice of keeping it 32-bit meant that struct layouts using long might not change between 32 and 64-bit code (but they would if they included pointers).

ISO C requires long to be at least a 32-bit type (actually they define it in terms of the min and max value that can be represented, but elsewhere they do require that integer types are binary integers with optional padding).

Anyway, some code uses long because it needs a 32-bit type, but it doesn't need 64; in many cases more bits isn't better, they're just not needed.

Within a single ABI like x86-64 System V, it's semi-convenient to always have long be the same width as a pointer, but since portable code always needs to use unsigned long long or uint64_t or uint_least64_t or uintptr_t depending on the use-case, it might be a mistake for x86-64 System V to have chosen 64-bit long.

OTOH, wider types for locals can sometimes save instructions by avoiding sign-extending when indexing a pointer, but the fact that signed overflow is undefined behaviour often lets compilers widen int in the asm when convenient.

Peter Cordes
  • Nice, your response was very complete, grateful. :) – Yuri Albuquerque Mar 11 '19 at 11:38
  • Re: "free to keep the variable in ... a register" -- it's actually sort of the other way around: `register` is a hint that a variable will be used often, so putting it in a register could improve performance. That's pretty much a no-op these days: compilers are much better at register optimization than most programmers, so they ignore `register`. – Pete Becker Mar 11 '19 at 12:36
  • @PeteBecker: I phrased it that way because the only effect in modern compilers is at `-O0`, when normal variables are stored / reloaded between every C statement (for consistent debugging: [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](//stackoverflow.com/q/53366394)), but `register` variables are excluded from this. Of course with any level of optimization enabled, modern compilers do dataflow analysis and will keep everything in registers except when they need to spill it (e.g. across a function call, or if they run out of registers). – Peter Cordes Mar 11 '19 at 12:39
  • The compiler is **always** free to keep variables in registers unless they're marked `volatile`. Your comment is about **compiler-specific** behavior, not about language requirements. – Pete Becker Mar 11 '19 at 12:44
  • @PeteBecker: Fair point. The `-O0` behaviour of modern compilers is self-imposed (and quite similar to treating everything as `volatile`). I will attempt to weasel out of that by noting that this is a `[gcc]` question. :P And that it's a useful (over)simplification of what `register` does on most real C implementations anyone's likely to use these days. – Peter Cordes Mar 11 '19 at 12:47
  • And, of course, we've both ignored the **real** side effect of `register` -- you can't take the address of a variable marked `register`. – Pete Becker Mar 11 '19 at 12:49
2

The register keyword is irrelevant here; the int data type remains 32-bit on the platform in question. Use a 64-bit type instead:

#include <stdio.h>
#include <stdint.h>

int main(void) 
{
    int64_t x = 4294967296;
    return 0;
}

The register keyword is also irrelevant because it will almost certainly be ignored. The compiler will use register storage when it is able to and it is advantageous, regardless of the explicit directive, and equally it may not.

Clifford
1

On your platform int is probably 32-bit.

You are asking the compiler to place a value (4294967296 = 0x100000000) that cannot fit in 32 bits into a register.

register int x = 4294967296;

But anyway, even if you remove the register keyword, the compiler would still complain for the same reason.

Jabberwocky