4

On an ARM processor (HT32F1655), a specific section of registers requires word accesses. From the user manual:

Note that all peripheral registers in the AHB bus support only word access.

But gcc is generating some ldrb (load byte) and strb (store byte) instructions on packed structs. The structs look something like this:

typedef union {
    struct {
        uint32_t CKOUTSRC    : 3;    //!< CKOUT Clock Source Selection
        uint32_t             : 5;
        uint32_t PLLSRC      : 1;    //!< PLL Clock Source Selection
        uint32_t             : 2;
        uint32_t CKREFPRE    : 5;    //!< CK_REF Clock Prescaler Selection
        uint32_t             : 4;
        uint32_t URPRE       : 2;    //!< USART Clock Prescaler Selection
        uint32_t USBPRE      : 2;    //!< USB Clock Prescaler Selection
        uint32_t             : 5;
        uint32_t LPMOD       : 3;    //!< Lower Power Mode Status
    } __attribute__((packed)) __attribute__ ((aligned(4)));
    uint32_t word;
} reg;

Example usage:

(*(volatile uint32_t*)0x40088000)->CKOUTSRC = 1;

Produces something similar to:

 ldrb r2, [r1]
 orr r2, r2, #1
 strb r2, [r1]

When I need:

 ldr r2, [r1]
 orr r2, r2, #1
 str r2, [r1]

Is there any way to force gcc to only generate instructions that access the whole word? Some options (-mno-unaligned-access) make gcc generate word accesses, but only when the byte is not 4-word aligned.

There is a -mslow-bytes which should do the right thing, however it seems that option does not exist for arm-none-eabi-gcc.

Ideally, there would be a way to force this only on the affected structs.

Please, no "don't use bitfields" answers. I know the drawbacks, but I have the ability here to control the compiler(s) used, so I am not worried about portability.

Chaos
  • 335
  • 1
  • 6
Shade
  • 775
  • 3
  • 16
  • Postingt the code that does "make a packed bitmap to overlay onto a device register" would make this better and more clear. – chux - Reinstate Monica Feb 11 '17 at 02:16
  • Just decide on a good set of functions to provide the access you need and write the code using inline assembly where needed. – David Schwartz Feb 11 '17 at 02:34
  • @DavidSchwartz There is a non-trivial amount of these registers to map. The mapped structs work nicely, except for this issue, so I deem it worth putting a little effort in to trying to get this to completely work before moving to a different solution. – Shade Feb 11 '17 at 02:40
  • @Shade Just curious, what happens if you drop the "packed" attribute? – Koorosh Hajiani Feb 11 '17 at 04:42
  • https://godbolt.org/g/Yz85co is an example that doesn't exhibit your reported behaviour. It would be good to have the command line to see if `-fstrict-volatile-bitfields` provides a solution to your cases code generation issue. – artless noise Feb 13 '17 at 15:08
  • Actually bitfields are not even necessary to trigger this problem, e.g. `u32 lowbyte(u32 *p) { return *p & 0xff; }` becomes an `ldrb` instruction in recent compilers. Normally volatile would inhibit that though, which is what bitfields decide to just ignore (unless you use `-fstrict-volatile-bitfields` as others have mentioned). – Matthijs Mar 11 '17 at 02:41
  • You don't have to write `attribute` twice. You can shorten it with `__attribute__((packed, aligned(4)))`. – robsn May 04 '20 at 14:24

2 Answers2

7

What you're looking for is GCC's -fstrict-volatile-bitfields option:

This option should be used if accesses to volatile bit-fields (or other structure fields, although the compiler usually honors those types anyway) should use a single access of the width of the field's type, aligned to a natural alignment if possible. For example, targets with memory-mapped peripheral registers might require all such accesses to be 16 bits wide; with this flag you can declare all peripheral bit-fields as unsigned short (assuming short is 16 bits on these targets) to force GCC to use 16-bit accesses instead of, perhaps, a more efficient 32-bit access.

If this option is disabled, the compiler uses the most efficient instruction. In the previous example, that might be a 32-bit load instruction, even though that accesses bytes that do not contain any portion of the bit-field, or memory-mapped registers unrelated to the one being updated.

In some cases, such as when the packed attribute is applied to a structure field, it may not be possible to access the field with a single read or write that is correctly aligned for the target machine. In this case GCC falls back to generating multiple accesses rather than code that will fault or truncate the result at run time.

Note: Due to restrictions of the C/C++11 memory model, write accesses are not allowed to touch non bit-field members. It is therefore recommended to define all bits of the field's type as bit-field members.

The default value of this option is determined by the application binary interface for the target processor.

along with use of the volatile keyword. See: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • This works for bitfield structs. I had the same issue with non-bitfield structs to hw registers. I had to define an inline asm macro to solve the problem: `#define WRITE32(_reg, _val) { asm("str %0, [%1]" : : "r" (_val), "r" (&_reg)); }` – robsn May 04 '20 at 09:33
  • @robsn: then you weren't using volatile, which is absolutely mandatory for memory mapped hardwareregisters. – R.. GitHub STOP HELPING ICE May 04 '20 at 11:50
  • I did. It's a `typedef volatile struct __attribute__((packed)) { uint32_t registers here } stuff;` and a `static stuff * regs;` which I accessed with a macro `#define WRITE32(_reg, _val) (*(volatile uint32_t *)&_reg = _val)`. That did't work as it unfolded to a bunch of `strb`s. But only in some places. The inline asm works always. I don't use bitfields. – robsn May 04 '20 at 13:15
  • 1
    @robsn: Probably the packed broke it. Packed is almost always wrong. Also why the (nop, but dangerously wrong if it weren't a nop) pointer type cast in the macro? – R.. GitHub STOP HELPING ICE May 04 '20 at 13:21
  • That's interesting. I used packed because there are unused reserved spaces between registers that would break the address if optimized away. The pointer cast is one of my desperate trys to get single word accesses. By the way the OP uses packed too. I saw this in different tutorials but I have to admit that I'm new to this. – robsn May 04 '20 at 13:32
  • Got it. My struct was missing the aligned. It nicely works without any macros when using `typedef volatile struct __attribute__((packed, aligned(4)))`. I find this confusing because I assign the pointer to that struct to a perfectly 4 byte aligned address and packed should keep the internal structure which consists of purely uint32_t fields. /edit Now when I think about it: I assign the address at runtime from a register read which means that the compiler can not possibly know that the address will be 4 byte aligned. Guess it would work without the aligned if the address came from a const. – robsn May 04 '20 at 14:32
  • 1
    @robsn: Packed+aligned will probably fix the problem but I suspect you're mistaken in using packed to begin with, and could/should just get rid of all the attributes. Packed is almost always a mistake, and the fact that you need to access the members as aligned whole-word units is an even stronger suggestion that it's a mistake. – R.. GitHub STOP HELPING ICE May 04 '20 at 16:19
  • You're right, thank you. I removed all attributes and there is no difference in the resulting binary. – robsn May 04 '20 at 17:07
0

This is exactly what the volatile keyword was designed for.

o11c
  • 15,265
  • 4
  • 50
  • 75