Why can the address-of operator ('&') be used with objects declared with the register storage class specifier in C++?

Question

In the C programming language, we are not allowed to use the address-of operator (&) with variables declared with the register storage class specifier.

It gives error: address of register variable ‘var_name’ requested

But if we write a C++ program and do the same thing (i.e. use & with a register variable), it doesn't give us any error.

For example:

#include <iostream>
using namespace std;
int main()
{
    register int a;
    int * ptr;
    a = 5;
    ptr = &a;
    cout << ptr << endl;
    return 0;
}

Output:

0x7ffcfed93624

Presumably this is an extra feature of C++, but my question is: what is the difference between the register storage class in C and in C++?

`register` is a hint in C++, so the compiler is authorized to ignore it. If you take the variable's address, the compiler ignores the hint. — Jean-Baptiste Yunès, Dec 06 '15 at 14:42
You are not making a reference to a register variable; you are making a pointer to a register variable. On many platforms, registers do not belong to the addressing space and thus don't have addresses, in other words, you can't point to a register. When making a pointer to a register variable, compilers may put the variable onto the stack or assign the variable to a temporary memory location, so that a pointer can be created. — Thomas Matthews, Dec 06 '15 at 20:08

Alan Stokes · Accepted Answer · 2015-12-06T15:02:58.653

21

The restriction on taking the address was deliberately removed in C++ - there was no benefit to it, and it made the language more complicated. (E.g. what would happen if you bound a reference to a register variable?)

The register keyword hasn't been of much use for many years; compilers are very good at figuring out what to put in registers by themselves. Indeed, in C++ the keyword has been deprecated since C++11 and is removed as a storage class specifier in C++17 (the word itself stays reserved).
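As a rough illustration of the point about references, here is a minimal sketch of what C++ permits on a register-qualified local. The macro `REGISTER_HINT` and the function name are hypothetical, introduced only so the snippet also compiles under C++17 and later, where `register` can no longer be used as a storage class specifier.

```cpp
// Hypothetical helper macro (not from the question): expands to `register`
// only for pre-C++17 compilers, and to nothing otherwise.
#if __cplusplus >= 201703L
#define REGISTER_HINT
#else
#define REGISTER_HINT register
#endif

// Takes the address of a register-qualified local and also binds a
// reference to it, then modifies the variable through both.
int modify_through_pointer_and_reference() {
    REGISTER_HINT int a = 5;
    int* ptr = &a;  // accepted in C++; a C compiler rejects this line
    int& ref = a;   // binding a reference is accepted as well
    *ptr += 1;      // a is now 6
    ref *= 2;       // a is now 12
    return a;
}
```

Calling `modify_through_pointer_and_reference()` returns 12. Whether `a` ever lives in a machine register is up to the compiler; the language only requires that the pointer and the reference behave as if `a` had an address.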

It's not just deprecated, it has no effect. – HolyBlackCat Dec 06 '15 at 15:21
@HolyBlackCat It almost certainly has no effect on any current compiler, but I'm unconvinced that compilers are actually barred from taking any notice of it. – Alan Stokes Dec 06 '15 at 15:26
I can't comment on whether that was the standard's reasoning, but the fact remains that binding the address invalidates the idea that the data is remaining in a register. "What would happen" is nothing - there is no address, you can't have it. The decision was to *silently* ignore the erroneous hint just as C would if you declared more `register` variables than the architecture actually had registers. To `&` a register, though, *is* an error in that it logically cannot work. – sqykly Dec 06 '15 at 15:31
Analogy - you can't have the address of an `inline` function. It doesn't make sense for a function that is truly `inline` to have an address. These designations are both optimization hints; is the compiler free to drop the optimization without error just so you can have an address for an `inline`? – sqykly Dec 06 '15 at 15:35
See this: http://en.cppreference.com/w/cpp/keyword/register Since C++17 it has no effect. Before C++17 it was a way to explicitly say that the variable needs automatic storage duration. It is always redundant, so, in fact, it has no effect either. – HolyBlackCat Dec 06 '15 at 15:36
@sqykly But you can take the address of an `inline` function in C++. All it means is that there must be at least one out of line definition, even if all call sites are inlined. Similarly taking the address of a register variable doesn't necessarily prevent the compiler putting it in register, as long as the required semantics are preserved. – Alan Stokes Dec 06 '15 at 15:39
@alanstokes registers do not have addresses. In order to get an address, the compiler needs to put it in memory somewhere. In order to make that mean anything, it also has to update that value in memory at any point the receiver of said address can access it and read it back afterwards. This sounds a lot like just ignoring the hint, no? – sqykly Dec 06 '15 at 15:46
You're right about `inline` though. Bad analogy. What I am trying to get at is that keyword's meaning changed in C++ to being meaningless and deprecated. If it was still a relevant thing, the restriction would still have made sense, which is not the impression I would get from the first paragraph here. – sqykly Dec 06 '15 at 15:55
@sqykly The restriction was removed long before the deprecation; it remained a hint to the compiler, which the compiler was free to ignore. – Alan Stokes Dec 06 '15 at 15:57
Taking the address of a `register` is every bit as illogical and contradictory as assigning to a `const`. The restriction is lifted because the qualifier made little sense, not because the restriction itself made little sense. Please fix the first paragraph of the answer to reflect this, because it sounds a lot like "there is no reason for this restriction at all" at present. – sqykly Dec 06 '15 at 16:17
@sqykly Please feel free to add your own answer. I disagree with your argument. – Alan Stokes Dec 06 '15 at 16:21
I shall play devil's advocate: some implementations (embedded ones, mostly) treat `register` as an actual command, not just a hint. This is because you often need to have control over registers; programmers _know_ what they're doing. I asked a question whether that was allowed and I'll shamelessly give a [link to it](http://stackoverflow.com/questions/28928674/can-an-implementation-consider-hints-as-actual-statements). – edmz Dec 06 '15 at 19:15
@black Interesting. What happens then if you take the address? (Your question refers to C rather than C++.) – Alan Stokes Dec 06 '15 at 19:18
@AlanStokes Good point. We'd need to know how the implementation handles that specifically. I guess it could ignore it and issue a warning, refuse to compile it altogether (even though it shouldn't), or behave in a completely different way. Strict standard conformance is hard to achieve, often because it goes against everyday needs, which are just more important than a piece of paper. – edmz Dec 06 '15 at 19:29
On the IAR Embedded Workbench compiler, with optimizations off, the compiler actually listens to the `register` keyword and uses registers for the variables. Without the `register` keyword, the compiler will only use registers as it sees fit with higher optimization levels. – Thomas Matthews Dec 06 '15 at 20:02
@Thomas What happens if the address of such a variable is taken? – Alan Stokes Dec 06 '15 at 20:04
If the address of a register variable is taken, the compiler must allocate memory for the variable and copy the value into memory (or treat the variable as if it didn't have the `register` prefix). In many platforms, registers don't exist in the memory map, so they don't have an address and you can't point to them. Thus the `register` keyword is ignored. – Thomas Matthews Dec 06 '15 at 20:18
@sqykly Note that `inline` keyword and inlining optimization are not directly related, and `inline` keyword is not an optimization hint. The keyword means "this function may be defined in several compilation units", and without it linker would throw a fit when seeing duplicate symbols. – hyde Dec 06 '15 at 21:41
The register keyword in C states that you can't take the address of the variable, which in turn implies that the variable will not be aliased by any pointer. This may allow the compiler to perform optimizations that it would not otherwise be able to do, *even if* it doesn't put the variable in a register. (Yes, modern compilers are very smart, but providing them with more information allows them to be *even smarter*. The programmer *does* have information they lack.) Of course, if C++ *does* allow you to take the address of the variable, then register is less useful in that language. – Ray Dec 07 '15 at 00:53
@Ray The same assumption can be made about any local variable whose address is never taken, a static property which it is trivial for the compiler to check. – Alan Stokes Dec 07 '15 at 07:38
@sqykly "*registers do not have addresses*" Are you saying there has never ever existed a CPU whose registers had addresses? (If so, you are [wrong](http://skana.tripod.com/electronics/8051_sfrs.htm).) Or do you mean that registers do not *necessarily* have addresses? – David Schwartz Dec 07 '15 at 18:49
@davidschwartz I admit that I have never seen another system that memory maps *its own* GPRs in *its own* address space. A `register` variable (if the hint is honored) in one of those (unfathomably rare) registers *still* can't be meaningful. You would have to somehow guarantee that the variable continues to reside in that register over the entire lifetime of the pointer to it, or the pointer would, at some point, magically (to its user) come to point to some other variable. To avoid this event is beyond the scope of C's defined memory management, deterministically producing UDB. – sqykly Dec 07 '15 at 21:26
@davidschwartz and come on. That feature is useless and obscure. I will have to see more docs on this, but my gut is that they are designed to simplify address modes and the MMU doesn't even know about it. Even without that, though, you can't possibly manage that pointer in C. – sqykly Dec 07 '15 at 21:45
@davidschwartz looks like you really can have a pointer to register on 8051 as long as the compiler never uses the 16 bit `DPTR` or tests every pointer for its pointee's address space at runtime (`DPTR` only points external RAM) or requires near and far to be specified for every pointer (same as a compiler-specific pointer to register type, to which C standard is not applicable). You still can't *use* it; among other reasons, dereference requires putting the pointer *in a register* as is typical, i.e. potentially overwriting the pointee. Further argument does not belong on *this* answer. – sqykly Dec 07 '15 at 23:14
@DavidSchwartz See also the TMS9900. – Alan Stokes Dec 07 '15 at 23:26
TMS has its general purpose registers in external memory, so maybe our disagreement is based on a difference in our definitions of a register. I am inclined to say that the TMS *has no* general purpose registers. The `register` optimization would certainly have no effect regardless of what we call them because using them is the same cost as using external RAM. The TMS *has* registers obviously in its ALU and such, but they aren't exposed through the instruction set. You can't manually load/store, you do it implicitly for every operand. Not what a C `register` intends. – sqykly Dec 12 '15 at 05:33

sqykly · Answer 2 · 2015-12-07T21:55:37.847

The register storage class originally hinted to the compiler that the variable so qualified was to be used so frequently that keeping its value in memory would be a performance drawback. The vast majority of CPU architectures (maybe not SPARC? Not even certain there's a counterexample) cannot perform any operation between two variables without first loading one or both from memory into its registers. Loading variables from memory into registers and writing them back to memory once operated upon takes many times more CPU cycles than the operations themselves. Thus, if a variable is used frequently, one can achieve a performance gain by setting aside a register for it and not bothering with memory at all.

Doing so, however, has a variety of requirements. Many are different for every CPU architecture:

- All processors have a fixed number of registers, but each processor model has a different number. In the 80s you might have had 4 that could reasonably be used for a register variable.
- Most processors do not support the use of every register for every instruction. In the 80s it was not uncommon to have only one register that you could use for addition and subtraction, and you probably couldn't use that same register as a pointer.
- Calling conventions dictated differing sets of registers that could be expected to be overwritten by subroutines, i.e. function calls.
- The size of a register differs between processors, so there are cases where a register variable will not fit in a register.

Because C is intended to be independent of platform, these restrictions could not be enforced by the standard. In other words, while it may be impossible to compile a procedure with 20 register variables for a system that only had 4 machine registers, the C program itself should not be "wrong", as there is no logical reason a machine cannot have 20 registers. Thus, the register storage class was always just a hint that the compiler could ignore if the specific target platform would not support it.
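To make the "20 register variables" point concrete, here is a small sketch of a function that hints twenty locals at once. The macro `REGISTER_HINT` and the function name are hypothetical, and the macro exists only so the snippet still compiles under C++17 and later, where `register` is no longer a storage class specifier. On a target with fewer free registers the compiler is expected to ignore some or all of the hints rather than reject the program.

```cpp
// Hypothetical helper macro: `register` before C++17, nothing afterwards.
#if __cplusplus >= 201703L
#define REGISTER_HINT
#else
#define REGISTER_HINT register
#endif

// Declares twenty hinted locals, more than many machines could ever keep
// in registers at the same time, and returns their sum.
int sum_twenty_hinted_locals(int base) {
    REGISTER_HINT int v1 = base + 1, v2 = base + 2, v3 = base + 3,
                      v4 = base + 4, v5 = base + 5, v6 = base + 6,
                      v7 = base + 7, v8 = base + 8, v9 = base + 9,
                      v10 = base + 10, v11 = base + 11, v12 = base + 12,
                      v13 = base + 13, v14 = base + 14, v15 = base + 15,
                      v16 = base + 16, v17 = base + 17, v18 = base + 18,
                      v19 = base + 19, v20 = base + 20;
    return v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8 + v9 + v10 +
           v11 + v12 + v13 + v14 + v15 + v16 + v17 + v18 + v19 + v20;
}
```

For example, `sum_twenty_hinted_locals(0)` returns 210 regardless of how many of the twenty variables actually end up in registers.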

The inability to reference a register is different. A register is specifically not kept updated in memory and not kept current if changes are made to memory; that's the whole point of the storage class. Since they are not intended to have a guaranteed representation in memory, they cannot logically have an address in memory that will be meaningful to external code that may obtain the pointer. Registers have no address to their own CPU, and they almost never have an address accessible to any coprocessor. Therefore, any attempt to obtain a reference to a register is always a mistake. The C standard could comfortably enforce this rule.

As computing evolved, however, some trends developed that weakened the purpose of the register storage class itself:

- Processors came with greater numbers of registers. Today you probably have at least 16, and they can probably all be used interchangeably for most purposes.
- Multi-core processors and distributed code execution have become very common; only one core has access to any one register and they never share without involving memory anyway.
- Algorithms for allocating registers to variables became very effective.

Indeed, compilers are now so good at allocating variables to registers that they will usually do a better job at optimization than any human. They certainly know which ones you are using most frequently without you telling them. It would be more complicated for the compiler (i.e. not for the standard or for the programmer) to produce these optimizations if they were required to honor your manual register hints. It became increasingly common for compilers to categorically ignore them. By the time C++ existed, it was obsolete. It is included in the standard for backward compatibility, to keep C++ as close as possible to a proper superset of C. The requirements of a compiler to honor the hint and thus the requirements to enforce the conditions under which the hint could be honored were weakened accordingly. Today, the storage class itself is deprecated.

Therefore, even though it is still the case today (and will be until computers don't even have registers) that you cannot logically have a reference to a CPU register, the expectation that the register storage class will be honored is so long gone that it is unreasonable for the standard to require compilers to require you to be logical in your use of it.

Nit: I think you mean C++ is a superset of C (which technically it isn't, although a lot of work has gone into minimising incompatibilities) — Alan Stokes, Dec 06 '15 at 19:10
x86 processors can do string operations on two memory operands: see `CMPS` and `MOVS` instructions (opcodes 0xA4 through 0xA7). — Ruslan, Dec 06 '15 at 19:44
@ruslan `movs` and `cmps` are rarely used by compilers now, but they do involve multiple registers: `ds`, `es`, `ecx`/`rcx`, `esi`/`rsi`, and `edi`/`rdi`. `cmps` involves `eflags` as well. Typically `*cx`, `*di` and `*si` are going to need to be loaded from memory before and sometimes after use. I guess the immediate-to-memory address mode is actually a good counterexample, though. Will fix. 65x has block move instructions like `movs` too, memory to memory with 3 implicit registers. — sqykly, Dec 06 '15 at 23:48
@ruslan scratch imm to mem; that's not an operation *between two variables* as stated. — sqykly, Dec 06 '15 at 23:58
@sqykly I wouldn't say it's rarely used. When you do a `memcpy()`, gcc uses `movsd` on my 32-bit system, and it's not a call to libc, it's inlined. — Ruslan, Dec 07 '15 at 06:07

rcgldr · Answer 3 · 2015-12-07T18:37:08.377

A referenced register would be the register itself. If the calling function passed ESI as a referenced parameter, then the called function would use ESI as the parameter. As pointed out by Alan Stokes, the issue is if another function also calls the same function, but this time with EDI as the same referenced parameter.

In order for this to work, two overload-like instances of the called function would need to be created, one taking ESI as a parameter and one taking EDI as a parameter. I don't know if any actual C++ compiler implements such an optimization in general, but that is how this could be done.

One example of a register passed by reference is the way std::swap() gets optimized (both parameters are references), which often ends up as inlined code. Sometimes no swap takes place at all: for std::swap(a, b), the compiler may instead swap the sense of a and b in the code that follows (references to what was a become references to b, and vice versa).

Otherwise, a reference parameter will force the variable to be located in memory instead of a register.
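The two cases above can be sketched as follows. The function names are hypothetical, and whether any variable actually stays in a register depends on the compiler and optimization level, which this snippet does not verify; it only shows the source-level shape of each case.

```cpp
#include <utility>

// Case 1: std::swap takes both arguments by reference. When the call is
// inlined, an optimizer may perform no data movement and simply treat
// `a` as `b` (and vice versa) in the code that follows.
int swap_then_subtract(int a, int b) {
    std::swap(a, b);
    return a - b;  // computed from the swapped values
}

// Case 2: a reference parameter of a function that is not inlined is
// normally passed as an address, so the caller's variable needs a
// memory location for the duration of the call.
void add_one(int& value) {
    ++value;
}

int increment_via_reference(int start) {
    int local = start;
    add_one(local);  // `local` must be addressable here unless inlined
    return local;
}
```

Here `swap_then_subtract(1, 4)` returns 3 and `increment_via_reference(41)` returns 42; comparing the generated assembly with and without optimization is one way to see which of the two behaviours a given compiler chooses.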

How would that work if a different calling function was using EDI? Or are you thinking of inlined calls? — Alan Stokes, Dec 06 '15 at 19:12
