ARM: Why do I need to push/pop two registers at function calls?

Question

I understand that I need to push the Link Register at the beginning of a function call, and pop that value into the Program Counter before returning, so that execution can carry on from where it was before the function call.

What I don't understand is why most people do this by adding an extra register to the push/pop. For instance:

push {ip, lr}
...
pop {ip, pc}

For instance, here's a Hello World in ARM, provided by the official ARM blog:

.syntax unified

    @ --------------------------------
    .global main
main:
    @ Stack the return address (lr) in addition to a dummy register (ip) to
    @ keep the stack 8-byte aligned.
    push    {ip, lr}

    @ Load the argument and perform the call. This is like 'printf("...")' in C.
    ldr     r0, =message
    bl      printf

    @ Exit from 'main'. This is like 'return 0' in C.
    mov     r0, #0      @ Return 0.
    @ Pop the dummy ip to reverse our alignment fix, and pop the original lr
    @ value directly into pc — the Program Counter — to return.
    pop     {ip, pc}

    @ --------------------------------
    @ Data for the printf calls. The GNU assembler's ".asciz" directive
    @ automatically adds a NULL character termination.
message:
    .asciz  "Hello, world.\n"

Question 1: what's the reason for the "dummy register" as they call it? Why not simply push{lr} and pop{pc}? They say it's to keep the stack 8-byte aligned, but ain't the stack 4-byte aligned?

Question 2: what register is "ip" (i.e., r7 or what?)

I linked to an ARM blog post where they recommend this two-register pattern. Please check it out, there's some code there. — Daniel Scocco, Apr 20 '13 at 12:15
using links is discouraged on SO, because the link may not last as long as the question (and/or they simply remove the question because it uses links rather than have the discussion here). — old_timer, Apr 20 '13 at 12:16
ahh, so the link answers your question. You are allowed to post that answer yourself. and close out this question. — old_timer, Apr 20 '13 at 12:17
in addition to a dummy register (ip) to keep the stack 8-byte aligned — old_timer, Apr 20 '13 at 12:19
Right, but how does that work? As far as I know the stack has a 4-byte alignment. In fact when I don't use the dummy register it works fine. So my question is still open. — Daniel Scocco, Apr 20 '13 at 12:19
see Mike's answer below; it has to do with 64-bit buses. If you keep the alignment, then even though you are moving 32 more bits back and forth it is the same speed or faster; it takes 2 or three extra memory transactions if you are not aligned. A 64-bit aligned push or pop (2 registers) is one memory transaction; a 64-bit unaligned push or pop is two memory transactions. A 128-bit aligned pop is 1 memory transaction (with a length of 2); a 128-bit unaligned pop is 3 memory transactions: one 32-bit, one 64-bit and one 32-bit. The desire is for the compiler to always align (and hope the bootstrap does as well). — old_timer, Apr 20 '13 at 14:10
If the bus is 32 bits wide rather than 64, then the extra register adds an extra clock to the transaction, which is not that bad; it is less of a penalty than non-64-bit-aligned transfers are on a 64-bit bus. I imagine there is a command-line switch, or perhaps if you select ARMv4 as the target instead of the default it doesn't do this. — old_timer, Apr 20 '13 at 14:12
even simpler answer, than others below have already pointed out "because arm said so". the arm eabi states 8 byte stack alignment, so the compilers now generate code to maintain this alignment (well sorta, I have seen at least one problem). — old_timer, Apr 22 '13 at 14:21
2 more recent duplicates of this: [Why ARM gcc push register r3 and lr into stack at the beginning of a function?](https://stackoverflow.com/q/32622762) and [Why is the stack pointer moved down 4 bytes greater than the stack frame size when compiling with arm-linux-gnueabi-gcc?](https://stackoverflow.com/q/22279911) I think the answers here cover everything sufficiently, though. — Peter Cordes, Mar 03 '22 at 06:32

auselen · Answer 1 · 2013-04-22T09:18:49.363

8

8-byte alignment is a requirement for interoperability between objects conforming to the AAPCS.

ARM has an advisory note on this subject:

ABI for the ARM® Architecture Advisory Note – SP must be 8-byte aligned on entry to AAPCS-conforming functions

The article gives two reasons for 8-byte alignment:

- Alignment fault or UNPREDICTABLE behavior. (Hardware/architecture reasons: LDRD/STRD could cause an Alignment Fault or show UNPREDICTABLE behavior on architectures other than ARMv7.)
- Application failure. (Differences between compiler and runtime assumptions; they give va_start and va_arg as an example.)

Of course, this is all about public interfaces; if you are building a static executable with no additional linking, you can align the stack at 4 bytes.
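
The "SP mod 8 = 0" rule comes down to simple arithmetic, which the following Python sketch models (it is not ARM code). It assumes a full-descending stack where each pushed 32-bit register takes 4 bytes; the starting stack pointer value and the function names are hypothetical and chosen only for illustration.

```python
WORD_BYTES = 4  # each core register pushed on 32-bit ARM occupies 4 bytes


def sp_after_push(sp, registers):
    """Return the stack pointer after a full-descending push of `registers`."""
    return sp - WORD_BYTES * len(registers)


def is_aapcs_aligned(sp):
    """AAPCS rule at public interfaces: SP mod 8 == 0."""
    return sp % 8 == 0


entry_sp = 0x7FFFF000  # hypothetical 8-byte-aligned SP on entry to main

# push {lr} moves SP by 4 bytes, so SP is only 4-byte aligned at the call.
print(is_aapcs_aligned(sp_after_push(entry_sp, ["lr"])))        # False

# push {ip, lr} moves SP by 8 bytes, so SP stays 8-byte aligned at the call.
print(is_aapcs_aligned(sp_after_push(entry_sp, ["ip", "lr"])))  # True
```

Any even number of pushed registers keeps the alignment, which is why a scratch register such as ip is a convenient padding choice.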

edited Apr 22 '13 at 09:18

answered Apr 22 '13 at 09:09

auselen


Worth mentioning: the store 2 registers use case is so common that in armv8, which dropped `push` and `pop`, there are dedicated push pair and pop pair instructions `stp` and `ldp`: http://stackoverflow.com/questions/27941220/push-lr-and-pop-lr-in-arm-arch64 – Ciro Santilli OurBigBook.com Oct 14 '16 at 10:41
1

@CiroSantilli新疆再教育营六四事件法轮功郝海东: More like ARM64 dropped the CISCish push-variable-number-of-regs which used a bitmask to encode which registers to push. And kept + broadened the existing ldrd / strd (2 consecutive registers in ARM32, only one reg-number in the machine encoding) into `ldp` / `stp` (any 2 registers in either order). So you can still fairly-efficiently push multiple registers, and maintain 16-byte stack alignment while doing it, but without providing an instruction that does a variable amount of work (which might need microcode or something similar). – Peter Cordes Mar 03 '22 at 06:36

Mike Seymour · Accepted Answer · 2013-04-22T11:40:45.117

7

what's the reason for the "dummy register" as they call it? Why not simply push{lr} and pop{pc}? They say it's to keep the stack 8-byte aligned, but ain't the stack 4-byte aligned?

~~The stack only requires 4-byte alignment; but~~ if the data bus is 64 bits wide (as it is on many modern ARMs), it's more efficient to keep it at an 8-byte alignment. Then, for example, if you call a function that needs to stack two registers, that can be done in a single 64-bit write rather than two 32-bit writes.
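
As a rough illustration of the efficiency point, this Python sketch counts how many 8-byte-aligned windows a store of consecutive 32-bit registers touches. It assumes a 64-bit data bus and treats each touched window as one bus beat; real memory systems (caches, write buffers, burst transfers) behave differently, so the model and its function name are assumptions rather than a description of any particular core.

```python
BUS_BYTES = 8  # assumed 64-bit data bus


def bus_beats(address, num_registers, word_bytes=4):
    """Count the 8-byte-aligned windows touched by storing `num_registers` words."""
    first_window = address // BUS_BYTES
    last_window = (address + num_registers * word_bytes - 1) // BUS_BYTES
    return last_window - first_window + 1


print(bus_beats(0x1000, 2))  # 1: an aligned pair fits in one 64-bit window
print(bus_beats(0x1004, 2))  # 2: the same pair straddles two windows
print(bus_beats(0x1004, 4))  # 3: four unaligned words touch three windows
```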

UPDATE: Apparently it's not just for efficiency; it's a requirement of the official procedure call standard, as noted in the comments.

If you're targeting older 32-bit ARMs, then the extra stacked register might degrade performance slightly.

what register is "ip" (i.e., r7 or what?)

r12. See, for example, here for the full set of register aliases used by the procedure call standard.

edited Apr 22 '13 at 11:40

answered Apr 20 '13 at 12:32

Mike Seymour


2

This answer is misleading and dangerous. 8-byte alignment IS a requirement for all EABI compliant code, and not maintaining it on all external boundaries can lead to runtime failures - even worse, it can lead to runtime failures when built on certain versions of compilers executing on certain processors. – unixsmurf Apr 22 '13 at 07:28
3

Just echoing @unixsmurf's response. 5.2.1.2 of the AAPCS states "SP mod 8 = 0. The stack must be double-word aligned." for public interfaces. You really want to follow that all the time, unless you know what you're doing. ARM has a knowledge article on [8-byte stack alignment as well](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4127.html). – John Szakmeister Apr 22 '13 at 09:49
2

@unixsmurf: Sorry, my knowledge of the procedure call standard is a bit out of date; I didn't realise that 8-byte alignment was a requirement these days. I guess I'd better stop trying to answer questions about ARM. I've updated the answer to reflect that; hopefully it's acceptable now, but unfortunately I can't delete it as long as it's accepted. – Mike Seymour Apr 22 '13 at 11:28
At one point long ago (OABI), this was not required. With ARMv4 and `strd`, there was a requirement to have things 8 bytes aligned. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43518 It was not **always** required that a stack was 8 byte aligned. However, any modern system will have this. If you are coding in **only** assembler, then you do not have to follow an ABI (unless other things in your system require it). Mark the function as untraceable. – artless noise Jun 15 '22 at 20:32

score 4 · Answer 3 · edited May 23 '17 at 11:53

4

Because you want to save them on entry and restore them after your function has executed. On function entry (the prologue) it saves the ip and lr registers. On exit (the epilogue) it restores both:

pc <- lr

ip <- old_ip

EDIT

Register r12 is also referred to as IP, and is used as an intra-procedure call scratch register, see also.

The convention is that a callee may change ip and r0-r3, so whether you need to save and restore them depends on the calling convention.

EDIT2: Why we might want the stack to be 8-byte aligned on ARM

If the stack is not eight-byte aligned, the use of LDRD and STRD (load and store doubleword) might cause an alignment fault, depending on the target and configuration used.

Note that x86 has a similar issue, and Mac OS requires 16-byte stack alignment.
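
To show how a misaligned caller can break a callee, here is a small Python model (not ARM code). It assumes a hypothetical callee that trusts SP to be 8-byte aligned at the call, reserves a 16-byte frame, and accesses a doubleword slot at offset 8 with LDRD/STRD; the frame layout, addresses and names are invented for illustration.

```python
def callee_slot_address(sp_at_call, frame_bytes=16, slot_offset=8):
    """Address of a doubleword slot in a hypothetical callee frame.

    The callee assumes SP was 8-byte aligned at the call, so it only uses
    frame sizes and offsets that are multiples of 8.
    """
    return sp_at_call - frame_bytes + slot_offset


def is_doubleword_aligned(address):
    """True if an LDRD/STRD at `address` is 8-byte aligned."""
    return address % 8 == 0


entry_sp = 0x7FFFF000             # hypothetical aligned SP on entry to main
after_push_lr = entry_sp - 4      # caller did: push {lr}
after_push_ip_lr = entry_sp - 8   # caller did: push {ip, lr}

print(is_doubleword_aligned(callee_slot_address(after_push_lr)))     # False
print(is_doubleword_aligned(callee_slot_address(after_push_ip_lr)))  # True
```

Whether the misaligned access actually faults depends on the target and configuration, as noted above.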

edited May 23 '17 at 11:53

Community


answered Apr 20 '13 at 12:06

0x90


I know it does that. My question is why do most people use two registers at push/pop. Why not push {lr} and pop {pc} simply? – Daniel Scocco Apr 20 '13 at 12:07
since the language enables you push {list of registers}, and is one assembly instruction, assuming you want to store `r0-r15` you can do it in 32 bit code length or 15*32bit code length, what is better ? http://en.wikipedia.org/wiki/KISS_principle – 0x90 Apr 20 '13 at 12:09
You didn't understand my question. I re-edited it, check it out. – Daniel Scocco Apr 20 '13 at 12:14
Register "r12" is also referred to as "IP", and is used as an intra-procedure call scratch register. http://forums.arm.com/index.php?/topic/12986-about-r12/; http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf – 0x90 Apr 20 '13 at 12:17
the convention is that the callee function can change `ip,r0-r3` – 0x90 Apr 20 '13 at 12:19
Thanks, I didn't know it was r12. I still don't get why the ARM guys recommend passing a dummy register on push/pop though. Check my edited question again please. – Daniel Scocco Apr 20 '13 at 12:20
7

@DanielS: The reason is that the ARM EABI specifies that the stack stays 64-bit aligned; otherwise ldrd/strd could not be used on the stack. Also, most implementations I've seen so far can do 64-bit-wide memory accesses in the same time as 32-bit ones if the addresses are 64-bit aligned. Adding ip (or any other register) in that case just saves the code from having to do the alignment explicitly (via add and sub). If the code only pushed/popped lr/pc, the stack for printf would no longer be aligned and it might crash when executing ldrd. – Nico Erfurth Apr 20 '13 at 18:07
2

@Masta79: why not add your comment as an answer? It is the correct explanation and none of the existing ones is complete. – unixsmurf Apr 22 '13 at 07:32
