How to safely access memory-mapped hardware registers at the C or C++ language level?

Question

In C and C++ I usually access memory-mapped hardware registers with the well-known pattern:

#include <stdint.h>  /* standard uint32_t instead of a homemade typedef */
*((volatile uint32_t*)0xABCDEDCB) = value;

As far as I know, the only thing guaranteed by the C or C++ standard is that accesses to volatile variables are evaluated strictly according to the rules of the abstract machine.

How can I be sure that the compiler will not generate torn stores for this access on a 32-bit processor? For example, the compiler is allowed to emit two 16-bit stores instead of one 32-bit store, isn't it?
Does GCC make any guarantees in this area?
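Besides the raw-pointer pattern, a common way to describe memory-mapped registers (and what vendor device headers typically do) is a struct overlay with `volatile` members. The following is a minimal sketch; the peripheral `TIMER0`, its base address, the register layout and the bit name are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer peripheral: layout and base address are made up for
   illustration; a real vendor header defines these. */
typedef struct {
    volatile uint32_t CTRL;    /* offset 0x00: control register */
    volatile uint32_t STATUS;  /* offset 0x04: status register */
    volatile uint32_t LOAD;    /* offset 0x08: reload value */
} timer_regs_t;

/* Catch layout mistakes at compile time (C11). */
_Static_assert(offsetof(timer_regs_t, LOAD) == 0x08, "unexpected register layout");
_Static_assert(sizeof(timer_regs_t) == 0x0C, "unexpected register block size");

#define TIMER0_BASE 0x40010000u                 /* hypothetical address */
#define TIMER0 ((timer_regs_t *)TIMER0_BASE)

#define TIMER_CTRL_ENABLE (1u << 0)             /* hypothetical enable bit */

/* Taking the register block as a parameter lets the same code run against
   TIMER0 on the target or against an ordinary struct in a host test. */
static inline void timer_start(timer_regs_t *t, uint32_t reload)
{
    t->LOAD = reload;              /* one volatile 32-bit store */
    t->CTRL = TIMER_CTRL_ENABLE;   /* a second one, in program order */
}
```

On the target this would be called as `timer_start(TIMER0, 1000u);`. The struct only names the registers; whether each assignment becomes a single 32-bit store is still up to the compiler, which is exactly what the question asks about.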

If the processor is 16-bit, there is no reason why you cannot have a 32-bit variable, but the processor will be *unable* to make a 32-bit write. — Weather Vane, Dec 16 '16 at 18:22
@Weather Vane Thanks for the remark; I forgot to add that I am speaking of a 32-bit processor. — mrn, Dec 16 '16 at 18:30
There are some similar interesting questions around, such as [this](http://stackoverflow.com/questions/54188/are-c-reads-and-writes-of-an-int-atomic) and [this](http://stackoverflow.com/questions/9399026/arm-is-writing-reading-from-int-atomic). — Weather Vane, Dec 16 '16 at 18:42
Creating a pointer from the physical address only works if your CPU is in real mode. — stark, Dec 16 '16 at 19:55
@stark: "real mode" is only *a thing* on x86 devices, and even then your comment is only true when a GPOS such as Linux or Windows handles memory management. Embedded systems are commonly *not* x86, often have no MMU and may or may not use an OS. When they do make use of an MMU, it is typically more deterministic than when using a GPOS, with greater control available at the application level. Moreover, in any case an MMU will often be configured such that physical addr = virtual addr for the I/O register area. — Clifford, Dec 17 '16 at 09:43
What is `uint32`? Don't use homebrew names if the standard provides a corresponding name. Use `stdint.h` types. — too honest for this site, Dec 17 '16 at 16:14
And don't use magic numbers. Your vendor should provide ready-made headers suitable for your compiler (for ARM and gcc this is (almost) always true). — too honest for this site, Dec 17 '16 at 16:34
@Olaf: For simple reads and writes of volatile objects, C++14 does guarantee atomicity. Conversely, compound assignments of volatile objects will never be atomic. — Ben Voigt, Dec 19 '16 at 02:50
@BenVoigt: 1) This is not true for C. 2) I don't think so. Please provide a reference to where the standard specifies it. It would break code which uses types that cannot be handled by a single CPU load/store, and would thus make the validity of `volatile` implementation-defined. It would also interfere with some usages of atomics. — too honest for this site, Dec 19 '16 at 15:21
@Olaf: The quote I gave comes from section 1.9 (Program execution). It changed back in C++11, the old text was "At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred." and the new is "Access to volatile objects are evaluated strictly according to the rules of the abstract machine." As you can see, this is a much much stronger requirement. — Ben Voigt, Dec 19 '16 at 15:32
@BenVoigt: Neither guarantees atomicity or a specific sequence of access to the bytes of such an object. Are you sure you aren't confusing atomicity and completeness? — too honest for this site, Dec 19 '16 at 15:43
@Olaf: "strictly according to the rules of the abstract machine" means that one write in the abstract machine is one write on the physical memory bus. Your interpretation is not strict. — Ben Voigt, Dec 19 '16 at 15:50
I have had GCC and other compilers fail to produce the desired (in this case 32-bit) store with this pattern. It was C, most likely in the GCC 3.x days; it caused a bit of a rewrite, and I had been warned by a mentor years ago not to do this, so I simply stopped doing it. It was then, and probably is now, very hard to get the compiler to mess this up, but I don't see anything there that ensures the compiler will generate the exact instruction you want. Just use the exact instruction you want with inline or real assembly language. — old_timer, Dec 19 '16 at 19:28
Having these accesses go through an abstraction (a function) anyway lets you quickly switch between a volatile pointer that gets inlined, inline assembly or real assembly functions; port to an operating system with system calls; or, when running on or against a simulator, have a place to put the simulated accesses, etc. Worth the cost of using a function, IMO. (This whole list, and others not listed, is also why I don't use volatile pointers.) — old_timer, Dec 19 '16 at 19:31
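old_timer's two suggestions can be combined: hide every register access behind a function, and let that function use a hand-picked store instruction on the target or a simulated register file on a host. This is a minimal sketch using GCC/Clang extended-asm syntax; the architecture list, the `TARGET_HW` macro, the `SIM_BASE` address and all function names are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Store a 32-bit value with one hand-picked store instruction (GCC/Clang
   extended asm). The "memory" clobber stops the compiler from caching or
   reordering memory accesses across the statement. */
static inline void store32_exact(volatile uint32_t *addr, uint32_t value)
{
#if defined(__aarch64__)
    __asm__ volatile ("str %w1, [%0]" : : "r"(addr), "r"(value) : "memory");
#elif defined(__arm__)
    __asm__ volatile ("str %1, [%0]" : : "r"(addr), "r"(value) : "memory");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("movl %1, (%0)" : : "r"(addr), "r"(value) : "memory");
#else
    *addr = value;  /* unknown target: fall back to a plain volatile store */
#endif
}

/* Hypothetical access layer: drivers only call reg_write32()/reg_read32().
   Build with -DTARGET_HW for the device; otherwise accesses go to a small
   simulated register file and writes are logged. */
#if defined(TARGET_HW)
static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    store32_exact((volatile uint32_t *)addr, value);
}

static inline uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
#else
#define SIM_BASE  0x40000000u   /* hypothetical peripheral base address */
#define SIM_NREGS 16u           /* hypothetical number of 32-bit registers */

static uint32_t sim_regs[SIM_NREGS];

static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    uintptr_t idx = (addr - SIM_BASE) / 4u;  /* wraps to a huge index if addr < SIM_BASE */
    printf("write 0x%08lx <- 0x%08lx\n", (unsigned long)addr, (unsigned long)value);
    if (idx < SIM_NREGS) {
        sim_regs[idx] = value;
    }
}

static inline uint32_t reg_read32(uintptr_t addr)
{
    uintptr_t idx = (addr - SIM_BASE) / 4u;
    return idx < SIM_NREGS ? sim_regs[idx] : 0u;
}
#endif
```

Driver code then calls e.g. `reg_write32(0x40000004u, 1u)` and stays unchanged when the backend changes.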

score 0 · Answer 1 · answered Dec 16 '16 at 18:19

When speaking about MCUs, as far as I know there are no such guarantees. Moreover, each case of accessing HW registers may be device-specific and may have its own sequence, rules and/or set of assembler instructions. It also depends on the compiler implementation. The only thing that works for me here is reading the datasheets for the concrete devices/compilers and following their examples.

Fedorov7890

I mean memory mapped hardware registers of course. Question corrected. – mrn Dec 16 '16 at 18:20
Datasheets and reference manuals don't contain information about compilers, etc. The only thing you can do is use them as a basis for Occam's razor. – too honest for this site Dec 19 '16 at 15:59

rcgldr · Answer 2 · 2016-12-18T00:06:13.213

Microsoft's comment about ISO-compliant usage of volatile:

"The volatile keyword in C++11 ISO Standard code is to be used only for hardware access"

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

At least in the case of Microsoft C++ (going back to Visual Studio 2005), an example of a pointer to volatile type is shown:

http://msdn.microsoft.com/en-us/library/145yc477.aspx

Another reference, in this case for C, also includes examples of pointers to volatile types:

"static volatile objects model memory-mapped I/O ports, and static const volatile objects model memory-mapped input ports"

http://en.cppreference.com/w/c/language/volatile
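Applied to a read-only status register, the `const volatile` idiom from that quote might look like the following minimal sketch. The bit name and position are hypothetical, and the register is passed in by pointer so the functions can also be exercised against an ordinary variable.

```c
#include <stdbool.h>
#include <stdint.h>

#define UART_SR_RXNE (1u << 5)   /* hypothetical "receive data ready" bit */

/* The driver only ever reads the status register, so it is accessed through
   a pointer to const volatile: the compiler must re-read it on every access
   (volatile), and an accidental write through it fails to compile (const). */
static inline bool uart_rx_ready(const volatile uint32_t *status)
{
    return (*status & UART_SR_RXNE) != 0u;
}

/* Busy-wait until the bit is set. Without volatile, the compiler could hoist
   the load out of the loop and spin forever on a stale value. */
static inline void uart_wait_rx(const volatile uint32_t *status)
{
    while (!uart_rx_ready(status)) {
        /* spin */
    }
}
```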

Operations on volatile types are not allowed to be reordered by the compiler or hardware, a requirement for memory-mapped hardware access. However, when volatile and non-volatile types are combined, the operations on the non-volatile types may still be reordered, making the code not thread-safe (all variables shared between threads would have to be volatile for that). Even if two threads only share volatile types, there's still a data race issue (one thread reads just before the other thread writes).
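For comparison, the portable C11 way to hand data from one thread to another does not rely on `volatile` at all. The sketch below swaps in a different technique (C11 `<stdatomic.h>` release/acquire ordering), assumes a single producer and a single one-shot consumer, and is meant for ordinary memory shared between threads, not for hardware registers. The names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Ordinary, non-volatile payload shared between threads. */
static int shared_data;
static atomic_bool data_ready = false;

/* Producer: write the payload, then publish with release semantics so the
   payload write cannot be reordered after the flag store. */
void publish(int value)
{
    shared_data = value;
    atomic_store_explicit(&data_ready, true, memory_order_release);
}

/* Consumer: an acquire load that sees the flag set also sees the payload.
   The flag is never cleared, so this is a one-shot handoff. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&data_ready, memory_order_acquire)) {
        return false;
    }
    *out = shared_data;
    return true;
}
```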

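Microsoft compilers have an extension to volatile, not portable to other compilers, that makes it usable for sharing data between threads: with /volatile:ms (Microsoft-specific, the default except on ARM targets), volatile reads have acquire semantics and volatile writes have release semantics.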

Back to the original question: in the case of GCC, you can have the compiler generate assembly code (for example with `-S`) to verify that the operation is safe.
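A minimal translation unit for that check might look like this; the file name and command lines are only suggestions. The resulting assembly tells you what that particular compiler version does with those flags, not what the language guarantees.

```c
/* reg_store.c: compile with e.g.
       gcc -O2 -S reg_store.c
       arm-none-eabi-gcc -O2 -mcpu=cortex-m4 -S reg_store.c
   and look in reg_store.s for a single 32-bit store (movl on x86, str on ARM). */
#include <stdint.h>

void reg_store(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}
```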

Because "use via a pointer to a volatile type" doesn't trigger volatile semantics. — Ben Voigt, Dec 16 '16 at 20:18
That MSDN page is most definitely C++-only (it appears in a documentation area named "C++ Language Reference"), it doesn't actually tell you the behavior of dereferencing a pointer-to-`volatile`, only when a "name" is declared `volatile` (which is nonsensical), and it doesn't describe portable guarantees, only how one particular compiler works, which isn't the same compiler as the one the question is interested in. — Ben Voigt, Dec 16 '16 at 20:31
@BenVoigt - The MSDN page is C++, while the en.cppreference page is C, and both pages include examples of pointers to volatile types. The intention of such usage is clear, and "mm" points out that this issue with the standard has been identified. In this case, GCC can output assembly code which could be used to verify safe usage. — rcgldr, Dec 17 '16 at 21:25

score 0 · Answer 3 · answered Dec 16 '16 at 19:50

If you are really worried, use inline assembler. A single assembler instruction will not return until completed.

Also, you must ensure that the memory page you are writing to is not cached; otherwise the write may not go all the way through. On ARM, memory barriers may be necessary as well.

Volatile is just an instruction that tells the compiler to make no assumptions about the contents of the memory, since the value may be changed outside one's program, but it has no effect on read/write ordering. Use memory barriers or atomics if this is an issue.
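A sketch of the barrier approach, assuming GCC/Clang inline-asm syntax and an ARMv7-or-later or AArch64 target for the `dmb` path; other targets fall back to a C11 fence. Putting the barrier before a register write and after a register read loosely follows the Linux `writel`/`readl` idea mentioned in the comments below. The function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Full barrier: on ARM emit a DMB instruction; elsewhere use a C11
   sequentially consistent fence. The "memory" clobber also stops the
   compiler from moving memory accesses across the barrier. */
static inline void io_barrier(void)
{
#if defined(__aarch64__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    __asm__ volatile ("dmb sy" : : : "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

static inline void reg_write32_ordered(volatile uint32_t *reg, uint32_t value)
{
    io_barrier();   /* earlier memory accesses are ordered before the register write */
    *reg = value;
}

static inline uint32_t reg_read32_ordered(const volatile uint32_t *reg)
{
    uint32_t value = *reg;
    io_barrier();   /* later memory accesses are ordered after the register read */
    return value;
}
```

As the comments point out, many MCU register regions are already strongly ordered, so these barriers may be unnecessary overhead there.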

doron

Why on ARM is a memory barrier needed in addition to volatile access? – mrn Dec 16 '16 at 20:00
Volatile means reads and writes to an address always translate into an ldr or str; on ARM, though, ldr and str may be performed out of order. The dmb ensures the ordering is as expected. – doron Dec 17 '16 at 01:25
Peripheral hardware registers are _non-cacheable_ and _strictly ordered_, at least on any reasonable platform. A normal assembly load/store instruction alone will not guarantee this either. Typically no barriers are needed, unless you use DMA or change registers which influence program flow. ARM provides a document for ARMv7-M describing where barriers are required. – too honest for this site Dec 17 '16 at 16:19
And `volatile` is not an instruction, but a type qualifier. And atomics are very problematic because, e.g. on ARM, they can perform the access multiple times. For hardware registers this is an absolute **don't**! – too honest for this site Dec 17 '16 at 16:26
Whether memory is cached or not depends on how the page tables are set up (and yes, for peripheral hardware the memory should not be cached). Out-of-order reads and writes are normally a product of caching; nonetheless, the ARM version of the kernel function readl adds a dmb to the ldr. – doron Dec 17 '16 at 16:26
The vast majority of embedded systems don't have a PMMU, not even memory protection, thus your point about page-tables is pointless. Similar for out-of-order accesses, although modern ARM MCUs do support them. They are **not** bound to a cache necessarily; a simple write-buffer as ARMv7M provides can be sufficient. Do I have to add that there typically is no kernel (whichever that would be) either? Typically one targets for speed and should not add unnecessary barriers. – too honest for this site Dec 18 '16 at 19:00
I guess there is embedded and then there is embedded. – doron Dec 18 '16 at 22:18

Ben Voigt · Accepted Answer · score -1 · 2016-12-16T20:25:00.080

How can I be sure that the compiler will not generate torn stores for the access for a 32-bit processor? For example the compiler is allowed to emit two 16-bit stores instead of a one 32-bit store, isn't it?

Normally, the compiler can combine or split memory accesses under the as-if rule, as long as the observable behavior of the program is unchanged, since the observable behavior of access to ordinary objects is the effect on the object's value, and not the memory access itself.

However, accesses to volatile objects are part of the observable behavior of a program. Therefore the compiler can no longer combine or split memory transactions. In the section where the C++ Standard defines "observable behavior" it specifically says that "Access to volatile objects are evaluated strictly according to the rules of the abstract machine."
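To see the difference the answer describes, compare what an optimizing compiler emits for these two functions (e.g. with `gcc -O2 -S`). Both leave the same final value in memory; only the stores the compiler is permitted to emit differ. This minimal sketch uses access through a pointer-to-volatile, which is exactly the case the next paragraph says the C++ wording does not cover; in practice GCC treats such accesses as volatile. The function names are illustrative.

```c
#include <stdint.h>

/* Ordinary object: under the as-if rule the first store is dead, so an
   optimizing compiler may emit a single store of 2. */
void plain_twice(uint32_t *p)
{
    *p = 1u;
    *p = 2u;
}

/* Volatile access: both stores are observable behavior, so both must be
   emitted, in this order. */
void volatile_twice(volatile uint32_t *p)
{
    *p = 1u;
    *p = 2u;
}
```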

Please note that the code shown is still non-portable C++, because the C++ Standard only cares about whether the object accessed is volatile, and not about modifiers on the pointer used to form an lvalue for said access. You'd need to do something crazy like this example of placement-new, to force the existence of a volatile object:

 // placement new needs <new>; std::uint32_t comes from <cstdint>
 *(::new (reinterpret_cast<void *>(0xABCDEDCB)) volatile std::uint32_t) = value;

answered Dec 16 '16 at 20:17

Ben Voigt

You are right about the problem of accessing objects via lvalue of volatile-qualified type, but this has already been identified and hopefully will be solved in the next C standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1956.htm . – mrn Dec 16 '16 at 23:01
Can you provide a reference to the standard or the defect report where they state `volatile` guarantees atomicity? AFAICS, it does not. But any reasonable implementation will use the widest possible access (or provide intrinsics for that). – too honest for this site Dec 17 '16 at 16:24
@Olaf: For access to `volatile` objects, order of operations must be preserved. Two half accesses in sequence are not the same order as a single combined access. Neither would split access satisfy the requirement that "access to volatile objects are evaluated strictly according to the rules of the abstract machine". For most operations, there's a gap between the abstract machine and the machine instructions chosen to implement it, but for volatile object access, none is allowed. – Ben Voigt Dec 19 '16 at 02:39
@mrn: Unfortunately, that proposal is full of errors such as "the text in the C++ standard avoids referring to volatile objects". That wasn't true in C++03 and the phrase "volatile object" appears even more often in the current draft Standard. – Ben Voigt Dec 19 '16 at 02:48
@BenVoigt: The order must only be preserved between all `volatile` objects, not between `volatile` and other objects. But that is not the point! There is no guarantee a `volatile` object will be accessed **atomically**. That has nothing to do with the order! How would an 8-bit CPU guarantee e.g. `volatile long` to use atomic accesses? That's exactly the reason the standard provides atomics (which are optional, because not every platform can support them in hardware). If you disagree, please provide a reference to where the standard requires `volatile` to imply atomicity. – too honest for this site Dec 19 '16 at 15:09
A note about memory-mapped hardware peripheral registers: those are typically placed in memory such that they can be written with a single load/store operation **for a single register**. These accesses typically guarantee atomicity by design. In C one has to use a type matching the word width, and typically **the compiler** will guarantee using a single access to load/store a value to/from a CPU register, **but not for RMW(!)** (unless the CPU supports them, as for compound assignment on some CPUs). Nevertheless, this is not guaranteed by the language standard. – too honest for this site Dec 19 '16 at 15:19
@Olaf which standard are you talking about? C++ didn't always guarantee the nature of access to volatile objects but it does now. – Ben Voigt Dec 19 '16 at 15:22
@Olaf: A direct and obvious consequence of "Access to volatile objects are evaluated strictly according to the rules of the abstract machine." is that the abstract machine cannot support operations which do not exist on the physical machine. So an 8-bit CPU cannot have a C++11 (or later) compiler with `volatile int`. – Ben Voigt Dec 19 '16 at 15:35
The question is tagged C and C++. There is only one standard for each. And if the AM could not support operations "which do not exist on the physical machine", that would make every 32-bit CPU unusable for a C or C++ implementation, as they could not comply for e.g. `long long`. That is obviously nonsense; you seem to misinterpret what the AM does. – too honest for this site Dec 19 '16 at 15:44
@Olaf: Non-volatile objects are not accessed strictly, so under the as-if rule the compiler can implement a `long long` access as a series of smaller-than-64-bit memory operations. Only `volatile` objects are so restricted, and there will be no `volatile long long` on a CPU with 32-bit memory bus (please note that "32-bit CPU" refers to the width of the arithmetic, not the external data bus. In many cases they match but not all. For the cases I assume you are considering to be counterexamples -- Intel 586 and 686 -- they don't match.) – Ben Voigt Dec 19 '16 at 15:53
@BenVoigt: I have yet to see proof for C++. And for C (which I'm more familiar with) this is wrong. Anyway, this would make certain parts of `stdatomic` useless. – too honest for this site Dec 19 '16 at 15:57
@Olaf: As I've already pointed out (but you disbelieved me) one consequence of the rule is that RMW instructions cannot be used for `volatile` objects, the abstract machine requires implementing `volatile int v; v |= 8;` as a read followed by write. To access an atomic bitset instruction, or interlocked increment, or interlocked compare-exchange, you need those atomic types. Volatile gives you atomic read, and atomic write, but nothing more. – Ben Voigt Dec 19 '16 at 16:00
As I wrote: 1) I cannot follow your conclusion. 2) This is definitely not true for C, and the question is tagged for both (unfortunately). 3) I wrote exactly that about RMW. Nevertheless, the AM can very well use RMW instructions if those are done as a read followed by a write (which is exactly what they do). There is absolutely no need to use separate instructions; doing so would defy optimisations. Note that on certain platforms the compiler even guarantees using these instructions for certain constructs. They don't support atomics, so it is the only way to do atomic register bit-set/etc. – too honest for this site Dec 19 '16 at 16:20
@Olaf: The abstract machine has no RMW operations. **`volatile` DOES defy optimizations.** The compiler can define all the non-portable constructs (typically called "intrinsic functions") that it wants to provide access to those operations. It cannot remap `volatile` object access to anything except the exact semantics specified in the Standard to which the compiler conforms. And yes, in my answer I indicated that I am talking about the requirements made by C++. Readers should look to other information for the various versions of C. – Ben Voigt Dec 19 '16 at 16:23
@BenVoigt: What difference do e.g. (68K assembly) `ADDQ #2, i` and `MOVE i,D0 ; ADDQ #2, D0 ; MOVE D0, i` make for the abstract machine? The standards don't disallow certain machine operations, and a good compiler will still use the fewest and fastest **possible**. There is **no** difference in the observable behaviour! On ARMv7M it can even use bit-band memory, although this is not directly implemented in the CPU. The AM does not place **any** restrictions on the machine instructions to use; how could it, as it is an **abstract** machine? – too honest for this site Dec 19 '16 at 17:21
And there is only one standard version of C! As there is only one for C++. C99, C90, etc. are not standard C. – too honest for this site Dec 19 '16 at 17:22
@Olaf If volatile does not guarantee atomic access (not visibility for other threads!), then what about sig_atomic_t - a type that the platform must provide, which combined with volatile can be used by signal handlers? – mrn Dec 19 '16 at 23:12
@mm Your logic is flawed. How is that reflexive? Just because the standard guarantees atomicity for that specific type does not imply the reverse is true! `sig_atomic_t` can be implemented by special measures of an implementation. It only allows certain operations on that type, btw., and the support for `sig_atomic_t` is optional. The `volatile` qualifier is not. – too honest for this site Dec 20 '16 at 04:12
@Olaf I don't think my logic is flawed; note that I asked a question and did not state anything. And what about your answer here: http://stackoverflow.com/questions/32286078/global-variables-modified-by-main-and-accessed-by-isr?rq=1 ? – mrn Dec 20 '16 at 09:10
@mm: If you read that answer **carefully**, you might notice that I explicitly warn that `volatile` does **not** guarantee atomicity. So I don't see how this is opposed to what I write here. And I was clearly referring to your former comment when I wrote "Your logic is flawed." Please read all comments carefully and think about them. You also want to consult the C (and C++) standard. Note that they are different languages and you cannot assume they behave the same even for identical syntax. – too honest for this site Dec 20 '16 at 16:13
@Olaf You don't warn in the answer, I read it carefully. Maybe you do in comments. In one of the comments you state: 8 bit accesses are atomic on any platform with CHAR_BIT ==8. Where is this guaranteed in the standard? – mrn Dec 20 '16 at 20:40
@mrn: How else would you interpret the Note at the end: "Due to popular demand, you should check your target if it actually performs the writes atomic. "? And the sentence before the note which clearly states: "This because the AVR is an 8 bit machine, thus the update and reads are not atomic."? And which machine with `CHAR_BIT == 8` do you know which uses more than one read or write for an 8-bit value? Maybe you should do some research; we seem to talk at different levels. – too honest for this site Dec 20 '16 at 23:24
@Olaf Let's finish this discussion, as it's definitely unproductive. – mrn Dec 21 '16 at 23:38
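The RMW and `sig_atomic_t` points debated in these comments can be sketched in C as follows. This is a minimal illustration assuming C11 `<stdatomic.h>`; the function names are hypothetical. Following the warning earlier in the thread, the atomic version is meant for ordinary shared memory, not hardware registers, and the `sig_atomic_t` guarantee is specific to that type rather than a general statement about `volatile`.

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>

#define FLAG_BIT 8u   /* the bit from the `v |= 8` example in the comments */

/* volatile RMW: a separate load and store; an interrupt or another thread
   that changes *v between them has its update lost. */
void set_flag_volatile(volatile uint32_t *v)
{
    *v |= FLAG_BIT;
}

/* C11 atomic RMW: one indivisible operation; returns the previous value. */
uint32_t set_flag_atomic(_Atomic uint32_t *a)
{
    return atomic_fetch_or(a, FLAG_BIT);
}

/* The signal-handler pattern the C standard does allow: assigning to a
   static object of type volatile sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Installs on_signal for sig; returns 0 on success, -1 on failure. */
int install_handler(int sig)
{
    return signal(sig, on_signal) == SIG_ERR ? -1 : 0;
}
```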
