I've just started to learn assembly in school, and we're starting to dive into registers and how to use them. One point I can't seem to understand is how the instruction pointer gets the address of the next instruction. For instance, take the following code:

    nop
    pushl   %ebp
    movl    %esp, %ebp
    subl    $4, %esp

In the code above, the instruction pointer gets incremented after each line, and I'd like to know how it knows which instruction to do next (mov, sub, push, etc.). Are all of these instructions loaded into RAM when we first run the program, with the address of the first instruction (nop in this case) automatically loaded into eip, which then just goes over them one by one? Or am I missing something?

Any help is appreciated.

Matthias Braun
GamefanA
  • A `jmp` happens to some starting point (e.g., the entry point of your program). From there execution proceeds through successive addresses until/unless another `jmp` instruction is executed. – Jerry Coffin Sep 18 '14 at 23:11
  • So was I right when I said that when we first click on the program to run it, all the instructions are loaded into RAM? – GamefanA Sep 18 '14 at 23:18
  • No, not necessarily. Windows (like most OSes) uses demand-paged virtual memory, so it's not necessarily loaded into RAM until something refers to it. See: http://stackoverflow.com/a/9762088/179910 – Jerry Coffin Sep 18 '14 at 23:24
  • So it goes like this: exe file that contains machine instructions -> virtual memory -> RAM (when an instruction is needed by the CPU) – GamefanA Sep 18 '14 at 23:31
  • Yup, pretty much (at least for most normal programs--it's *possible* to modify the normal behavior to some degree when needed). – Jerry Coffin Sep 18 '14 at 23:34
  • So does this mean that on certain occasions, the eip will in fact be pointing to an address in virtual memory? – GamefanA Sep 18 '14 at 23:37
  • Yes, but not for very long--as soon as EIP contains that address, the OS will start working on paging that code into memory (and the program will be blocked until it does). – Jerry Coffin Sep 18 '14 at 23:40

1 Answer

EIP is updated by the microcode (firmware) in the CPU itself each time an instruction is fetched and decoded for execution: it advances by the length of that instruction, so it always points at the next one. I don't believe you can even access it in the usual sense. However, it can be modified with a `jmp` instruction, which is functionally (ignoring pipeline issues and so forth) the same as a hypothetical `mov $address, %eip`. It is also updated by conditional jumps, `call`, and `ret` instructions.
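
As a rough illustration of that update rule, here is a minimal Python sketch of a toy fetch/execute loop. It is not how a real CPU is built: the addresses in `PROGRAM` are made up, the `run` helper is hypothetical, and the instruction lengths are hard-coded (they match the usual 32-bit encodings of the instructions in the question) rather than decoded from bytes.

```python
# Byte lengths of the usual 32-bit encodings of the instructions used here.
# A real CPU works the length out while decoding; this sketch hard-codes it.
LENGTHS = {"nop": 1, "pushl": 1, "movl": 2, "subl": 3, "jmp": 2}

# Hypothetical layout: address -> (mnemonic, jump target or None).
PROGRAM = {
    0x100: ("nop", None),
    0x101: ("pushl", None),
    0x102: ("movl", None),
    0x104: ("subl", None),
    0x107: ("jmp", 0x10A),
    0x109: ("nop", None),   # never reached: the jmp skips over it
    0x10A: ("nop", None),
}


def run(program, entry, max_steps=32):
    """Return the list of instruction-pointer values visited, in order."""
    ip = entry
    trace = []
    for _ in range(max_steps):
        if ip not in program:
            break                      # ran off the end of the toy program
        trace.append(ip)
        mnemonic, target = program[ip]
        ip += LENGTHS[mnemonic]        # default: point at the next instruction
        if mnemonic == "jmp":
            ip = target                # a jump simply overwrites the pointer
    return trace


if __name__ == "__main__":
    print([hex(address) for address in run(PROGRAM, 0x100)])
```

The only place the pointer changes is the two assignments at the bottom of the loop: one adds the instruction length, and the other replaces the value outright when the instruction is a jump.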

Once your program is loaded into memory (during this process you can think of your program as simply data, like any other file), the OS (or some other loader program) performs a jmp to the start of your program. Of course, the code you show as an example is not the real start of the program but simply a function that main has called.
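
The "program is just data until something jumps to it" idea can be sketched in Python as well. This is a simplified model, not a real OS loader: `load`, `base`, and `entry_offset` are hypothetical names, memory is a plain `bytearray`, and the image bytes are the usual 32-bit encodings of the four instructions from the question.

```python
def load(memory, image, base, entry_offset=0):
    """Copy `image` into `memory` at `base` and return the initial IP.

    The copy treats the program as ordinary bytes; only the returned
    address says where execution should begin.
    """
    if base < 0 or base + len(image) > len(memory):
        raise ValueError("image does not fit in memory at this base")
    memory[base:base + len(image)] = image
    return base + entry_offset


# nop; pushl %ebp; movl %esp, %ebp; subl $4, %esp
IMAGE = bytes([0x90, 0x55, 0x89, 0xE5, 0x83, 0xEC, 0x04])

if __name__ == "__main__":
    memory = bytearray(64)             # toy address space, all zeroes
    ip = load(memory, IMAGE, base=16)  # the "jmp to X" step: IP = X
    print(ip, hex(memory[ip]))
```

The `entry_offset` parameter reflects the point that the first byte of a loaded image need not be the first instruction executed.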

Mikhail
Dwayne Towell
  • OK, that explains how it gets updated. Another question I had: when we first click on the program to run it, are all the instructions loaded into RAM, then the address of the first instruction gets loaded into eip, and then it just increments, working its way through each of the instructions? – GamefanA Sep 18 '14 at 23:26
  • Yes. Actually, what happens is the program is loaded into memory at location X (for some X), and the loading mechanism simply does a JMP to X. – Ira Baxter Sep 18 '14 at 23:40
  • In x86-64 you can read `rip` into a general-purpose register eg. with `lea rax,[rip]`. In x86 you need to `call mylabel` and then in `mylabel:` you can eg. `pop ax` in 16-bit code or `pop eax` in 32-bit code. – nrz Sep 19 '14 at 00:16
  • @user3769877 For an example of `microcode` for a very simple 8-bit processor, take a look at http://stackoverflow.com/a/20961380/2626313 and how the value of the `pc` register changes. pc == program counter == instruction pointer – xmojmr Sep 19 '14 at 04:14
  • @nrz: Too bad the assembler doesn't understand `call +0` to mean relative addressing. – Joshua Nov 10 '20 at 00:17
  • @Dwayne: The parts of the CPU that handle instruction-length decoding and advancing to the next instruction are pretty much fixed-function logic, not "microcode" in the usual sense. True hardware, not firmware; you probably couldn't change how the hardware decodes instruction lengths with a microcode update even if you wanted to. Maybe 386 was old enough that most of its internals were implemented via microcode, but definitely not for modern CPUs. – Peter Cordes Nov 10 '20 at 04:38
  • @PeterCordes Yeah, I have never claimed that RIP is a general-purpose register, only that "In x86-64 **you can read `rip` into a general-purpose register** eg. with `lea rax,[rip]`". – nrz Nov 12 '20 at 20:48
  • @nrz: Ugh, apparently I need to read more carefully before writing a long comment in reply, not *just* skim when I'm sleepy. Sorry about that. – Peter Cordes Nov 12 '20 at 21:04