29

I am looking at some old code from a school project, and in trying to compile it on my laptop I ran into some problems. It was originally written for an old 32 bit version of gcc. Anyway I was trying to convert some of the assembly over to 64 bit compatible code and hit a few snags.

Here is the original code:

pusha
pushl   %ds
pushl   %es
pushl   %fs
pushl   %gs
pushl   %ss

pusha is not valid in 64 bit mode. So what would be the proper way to do this in x86_64 assembly while in 64 bit mode?

There has got to be a reason why pusha is not valid in 64 bit mode, so I have a feeling manually pushing all the registers may not be a good idea.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Mr. Shickadance
  • 5,283
  • 9
  • 45
  • 61

5 Answers5

26

AMD needed some room to add new opcodes for REX prefixes and some other new instructions when they developed the 64-bit x86 extensions. They changed the meaning of some of the opcodes to those new instructions.

Several of the instructions were simply short-forms of existing instructions or were otherwise not necessary. PUSHA was one of the victims. It's not clear why they banned PUSHA though, it doesn't seem to overlap any new instruction opcodes. Perhaps they are reserved the PUSHA and POPA opcodes for future use, since they are completely redundant and won't be any faster and won't occur frequently enough in code to matter.

The order of PUSHA was the order of the instruction encoding: eax, ecx, edx, ebx, esp, ebp, esi, edi. Note that it redundantly pushed esp! You need to know esp to find the data it pushed!

If you are converting code from 64-bit the PUSHA code is no good anyway, you need to update it to push the new registers r8 thru r15. You also need to save and restore a much larger SSE state, xmm8 thru xmm15. Assuming you are going to clobber them.

If the interrupt handler code is simply a stub that forwards to C code, you don't need to save all of the registers. You can assume that the C compiler will generate code that will be preserving rbx, rbp, rsi, rdi, and r12 thru r15. You should only need to save and restore rax, rcx, rdx, and r8 thru r11. (Note: on Linux or other System V ABI platforms, the compiler will be preserving rbx, rbp, r12-r15, you can expect rsi and rdi clobbered).

The segment registers hold no value in long mode (if the interrupted thread is running in 32-bit compatibility mode you must preserve the segment registers, thanks ughoavgfhw). Actually, they got rid of most of the segmentation in long mode, but FS is still reserved for operating systems to use as a base address for thread local data. The register value itself doesn't matter, the base of FS and GS are set through MSRs 0xC0000100 and 0xC0000101. Assuming you won't be using FS you don't need to worry about it, just remember that any thread local data accessed by the C code could be using any random thread's TLS. Be careful of that because C runtime libraries use TLS for some functionality (example: strtok typically uses TLS).

Loading a value into FS or GS (even in user mode) will overwrite the FSBASE or GSBASE MSR. Since some operating systems use GS as "processor local" storage (they need a way to have a pointer to a structure for each CPU), they need to keep it somewhere that won't get clobbered by loading GS in user mode. To solve this problem, there are two MSRs reserved for the GSBASE register: one active one and one hidden one. In kernel mode, the kernel's GSBASE is held in the usual GSBASE MSR and the user mode base is in the other (hidden) GSBASE MSR. When context switching from kernel mode to a user mode context, and when saving a user mode context and entering kernel mode, the context switch code must execute the SWAPGS instruction, which swaps the values of the visible and hidden GSBASE MSR. Since the kernel's GSBASE is safely hidden in the other MSR in user mode, the user mode code can't clobber the kernel's GSBASE by loading a value into GS. When the CPU reenters kernel mode, the context save code will execute SWAPGS and restore the kernel's GSBASE.

doug65536
  • 6,562
  • 3
  • 43
  • 53
  • 1
    I'd guess AMD didn't want to write new microcode for a 64-bit `pusha` that saves twice as many registers that are each twice as wide. Interestingly though, AMD k7 and k8's `pusha` in 32-bit mode is twice as fast as 8 back-to-back `push` instructions (one per 4 clock throughput, according to Agner Fog's testing: https://agner.org/optimize/), so apparently it stores pairs of 32-bit registers at once to get around the 1 store per clock bottleneck. On Intel it's not worth using for performance, just code-size (8c throughput, 18 uops on Merom, Intel's first 64-bit P6 uarch.) – Peter Cordes Dec 13 '18 at 16:14
  • 1
    RIP POPA... (1978-2000) – Dima Rich Jan 16 '21 at 22:08
11

Learn from existing code that does this kind of thing. For example:

In fact, "manually pushing" the regs is the only way on AMD64 since PUSHA doesn't exist there. AMD64 isn't unique in this aspect - most non-x86 CPUs do require register-by-register saves/restores as well at some point.

But if you inspect the referenced sourcecode closely you'll find that not all interrupt handlers require to save/restore the entire register set, so there is room for optimizations.

Drew McGowen
  • 11,471
  • 1
  • 31
  • 57
FrankH.
  • 17,675
  • 3
  • 44
  • 63
9

pusha is not valid in 64-bit mode because it is redundant. Pushing each register individually is exactly the thing to do.

user200783
  • 13,722
  • 12
  • 69
  • 135
  • 4
    Also, you can't push segment registers in 64 bit mode. You need to copy them to another register first. `mov %ds,%eax; push %rax`. – ughoavgfhw Jul 26 '11 at 22:38
  • @ughoavgfhw The segment registers don't hold any meaningful values in long mode. All segments have zero base and no limit, except FS and GS, whose base addresses are control by MSR 0xC0000100 and 0xC0000101, intended for use as convenient thread local storage pointers. – doug65536 Jan 23 '13 at 19:59
  • 2
    @doug65536 It is still sometimes necessary to preserve them. For example, a 64-bit OS running 32-bit programs needs to save the segment registers when it gets an interrupt, since segmentation is used for that program. – ughoavgfhw Jan 23 '13 at 23:05
  • 1
    @ughoavgfhw Good point. I should have said, if you're running pure 64-bit code you can ignore the segment registers. – doug65536 Jan 23 '13 at 23:55
  • 1
    @doug65536 with the introduction of wrgsbase and wrfsbase instructions(which are accessible from user mode), it might be necessary to save gs and fs base, if the operating system wants to allow the application to change the segments (for some other use). – Ajay Brahmakshatriya Aug 14 '17 at 06:49
6

Hi it might not be the correct way to do it but one can create macros like

.macro pushaq
    push %rax
    push %rcx
    push %rdx
    push %rbx
    push %rbp
    push %rsi
    push %rdi
.endm # pushaq

and

.macro popaq
    pop %rdi
    pop %rsi
    pop %rbp
    pop %rbx
    pop %rdx
    pop %rcx
    pop %rax
.endm # popaq

and eventually add the other r8-15 registers if one needs to

somerandomdev49
  • 177
  • 1
  • 3
  • 11
baz
  • 1,317
  • 15
  • 10
  • If you're calling code generated by a C compiler, it will already preserve RBP and RBX (and R12-R15), assuming an x86-64 System V calling convention. If you're going to omit some registers from your save/restore, r8-r15 is not a good choice! This does sort of emulate `pusha`/`popa`, though. (Real `pusha` stores ESP, and `popa` skips over a gap. But that's rarely useful so makes sense to leave out.) – Peter Cordes May 29 '22 at 18:45
0

Take for example a short program where I was testing this today I wanted to do the same thing back up all the registers before I started to do syscalls that we just learned about. So I first attempted pusha and popa something I found in a old IA-32 Intel Architecture Software Developer's Manual. However it did not work. I have tested this manually and that works however.

#Author: Jonathan Lee
#Professor: Devin Cook
#CSC35
#4-27-23
#practicesyscall.asm

.intel_syntax noprefix

.data

Message:
        .asciz "Learning about system calls.\n"

.text

.global _start

_start:
        pusha   #facilitates saving the current general purpose registers only 32 bit processor
        mov rax, 1
        mov rdi, 1
        lea rsi, Message
        mov rdx, 30
        syscall
        popa    #facilitates restoring the registers only 32 bit processor
        mov rax, 60
        mov rdi, 0
        syscall

This is the result when it is compiled with x64:

practicesyscall.asm: Assembler messages:
practicesyscall.asm:19: Error: `pusha' is not supported in 64-bit mode
practicesyscall.asm:25: Error: `popa' is not supported in 64-bit mode

Without the pusha and popa mnemonics it works and this is the result:

Learning about system calls.

This will work for x32 mode: Intel Ref Document

However it does work manually if you wanted to try this method:

#Jonathan Lee
#CSC35
#Professor Cook
#3-31-23
#Practice NON credit assignment
.intel_syntax noprefix
.data

Intro:
        .ascii "Hello world my name is Jonathan Lee\n\n                                     SAC STATE STINGERS UP\n\n"
        .ascii "This program is to aid in learning more about stacks inside of assembly language.\n"
        .ascii "Register RSP is used to point to the top of the stack. If you use (push) it will load the stack and move the pointer (RSP) automatically.\n"
        .ascii "The stack counts down not up in assembly language.\n"
        .ascii "This is the current RSP point prior to loading a stack or program run:--------------------->  \0"

NewLine:
        .ascii "\n\0"

StackLoad:
    .ascii "Program will now load 1-14 into registers and push to the stack after it will write the register values and current RSP pointer value again.\n"
        .ascii "This will not use registers RSP/RDI they are in use for stack pointer and class subroutines.\n\0"

ZeroLoad:
    .ascii "Program will now load 0 into all registers and write register values after to show values are now stored in memory not registers.\n\0"

PopLoad:
        .ascii "Note RSP is still using the same value. The program will now pop the stack and reverse load the values into the registers, after will write register values.\n"
        .ascii "Last In First Out.\n\0"

RSPAddress:
    .ascii "This is the current next available RSP Pointer Memory Address on top of stack:------------->  \0"

PostStack:
    .ascii "This is the the current next available RSP Pointer Memory Address after stack is popped:--->  \0"

PreExit:
        .ascii "\nNote that the RSP is now pointing to the same memory address value as when the program started.\n\n\0"

.text
.global _start

_start:
        call ClearScreen
        lea rdi, NewLine
        call WriteString
        lea rdi, Intro
        call WriteString
        mov rdi, rsp
        call WriteHex
        lea rdi, NewLine
        call WriteString

        mov rax, 1
        mov rbx, 2
        mov rcx, 3
        mov rdx, 4
        mov rsi, 5
        mov rdi, 6
        mov rbp, 7
        #rsp index use
        mov r8, 8
        mov r9, 9
        mov r10, 10
        mov r11, 11
        mov r12, 12
        mov r13, 13
        mov r14, 14
        mov r15, 15

        lea rdi, StackLoad
        call WriteString
        push rax
        push rbx
        push rcx
        push rdx
        push rsi
        push rbp
        push r8
        push r9
        push r10
        push r11
        push r12
        push r13
        push r14
        push r15

        call WriteRegisters
        lea rdi, RSPAddress
        call WriteString
        mov rdi, rsp
        call WriteHex
        lea rdi, NewLine
        call WriteString
        lea rdi, ZeroLoad
        call WriteString

        mov rax, 0
        mov rbx, 0
        mov rcx, 0
        mov rdx, 0
        mov rsi, 0
        mov rbp, 0
        mov r8, 0
        mov r9, 0
        mov r10, 0
        mov r11, 0
        mov r12, 0
        mov r13, 0
        mov r14, 0
        mov r15, 0

        call WriteRegisterslea rdi, RSPAddress
        call WriteString
        mov rdi, rsp
        call WriteHex
        lea rdi, NewLine
        call WriteString

        lea rdi, PopLoad
        call WriteString


        pop rax
        pop rbx #Last In First Out
        pop rcx
        pop rdx
        pop rsi
        pop rbp
        pop r8
        pop r9
        pop r10
        pop r11
        pop r12
        pop r13
        pop r14#flip stack
        pop r15

        call WriteRegisters
        lea rdi, PostStack
        call WriteString
        mov rdi, rsp
        call WriteHex
        lea rdi, NewLine
        call WriteString
        lea rdi, PreExit
        call WriteString

        call Exit

And the result would be:

Hello world my name is Jonathan Lee

                                     SAC STATE STINGERS UP

This program is to aid in learning more about stacks inside of assembly language.
Register RSP is used to point to the top of the stack. If you use (push) it will load the stack and move the pointer (RSP) automatically.
The stack counts down not up in assembly language.
This is the current RSP point prior to loading a stack or program run:--------------------->  00007FFEC3675420
Program will now load 1-14 into registers and push to the stack after it will write the register values and current RSP pointer value again.
This will not use registers RSP/RDI they are in use for stack pointer and class subroutines.
 
RAX : 0000000000000001    R8  : 0000000000000008 
RBX : 0000000000000002    R9  : 0000000000000009 
RCX : 0000000000000003    R10 : 000000000000000A 
RDX : 0000000000000004    R11 : 000000000000000B 
RDI : 0000000000600AA5    R12 : 000000000000000C 
RSI : 0000000000000005    R13 : 000000000000000D 
RBP : 0000000000000007    R14 : 000000000000000E 
RSP : 00007FFEC36753A0    R15 : 000000000000000F

This is the current next available RSP Pointer Memory Address on top of stack:------------->  00007FFEC36753B0
Program will now load 0 into all registers and write register values after to show values are now stored in memory not registers.
 
RAX : 0000000000000000    R8  : 0000000000000000 
RBX : 0000000000000000    R9  : 0000000000000000 
RCX : 0000000000000000    R10 : 0000000000000000 
RDX : 0000000000000000    R11 : 0000000000000000 
RDI : 0000000000600B90    R12 : 0000000000000000 
RSI : 0000000000000000    R13 : 0000000000000000 
RBP : 0000000000000000    R14 : 0000000000000000 
RSP : 00007FFEC36753A0    R15 : 0000000000000000

This is the current next available RSP Pointer Memory Address on top of stack:------------->  00007FFEC36753B0
Note RSP is still using the same value. The program will now pop the stack and reverse load the values into the registers, after will write register values.
Last In First Out.
 
RAX : 000000000000000F    R8  : 0000000000000009 
RBX : 000000000000000E    R9  : 0000000000000008 
RCX : 000000000000000D    R10 : 0000000000000007 
RDX : 000000000000000C    R11 : 0000000000000005 
RDI : 0000000000600C13    R12 : 0000000000000004 
RSI : 000000000000000B    R13 : 0000000000000003 
RBP : 000000000000000A    R14 : 0000000000000002 
RSP : 00007FFEC3675410    R15 : 0000000000000001

This is the the current next available RSP Pointer Memory Address after stack is popped:--->  00007FFEC3675420

Note that the RSP is now pointing to the same memory address value as when the program started.

Long story short, you can just load the registers manually one by one into the stack and pop them after to restore it if so required.