
Concerning the following small piece of code, which was shown in another post about the size of structures and the different ways to align data correctly:

struct
{
    char Data1;
    short Data2;
    int Data3;
    char Data4;
} x;

unsigned fun ( void )
{
    x.Data1=1;
    x.Data2=2;
    x.Data3=3;
    x.Data4=4;
    return(sizeof(x));
}

I get the following disassembly (64-bit build):

0000000000000000 <fun>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # b <fun+0xb>
   b:   66 c7 05 00 00 00 00    movw   $0x2,0x0(%rip)        # 14 <fun+0x14>
  12:   02 00 
  14:   c7 05 00 00 00 00 03    movl   $0x3,0x0(%rip)        # 1e <fun+0x1e>
  1b:   00 00 00 
  1e:   c6 05 00 00 00 00 04    movb   $0x4,0x0(%rip)        # 25 <fun+0x25>
  25:   b8 0c 00 00 00          mov    $0xc,%eax
  2a:   5d                      pop    %rbp
  2b:   c3                      retq   

I don't know how to calculate the terms on the right, which seem to be the addresses of the local variables used. Moreover, I don't know how to calculate them from the %rip register.

Could you give an example that shows the link between %rip and %rsp or %rbp, especially in how addresses are computed when I use mov instructions?

Peter Cordes
  • There is no such relation; `rip` is the instruction pointer (hence the name). You can't address locals relative to it. Note that `x` is not a local. Also note that you used objdump on an intermediate object file, hence you did not get the correct offsets. You might want to run it on a linked executable and/or use the `-r` option to see relocation entries. – Jester Feb 13 '17 at 23:05
  • No, it's incremented by however many bytes the instruction is. It points to the next instruction. – Jester Feb 13 '17 at 23:23
  • Why didn't you ask this in that question? Compare the object disassembly with the linked disassembly to see what happened. The linker filled in the rest of the instruction, the offset relative to rip. There I was showing the -m32 vs -m64 instructions being generated; when that didn't tell the whole story, the linked version did. – old_timer Feb 14 '17 at 02:15
  • @youpilat13 it's `RIP`, not `RPI` ("instruction pointer", not "pointer instruction"). Also, in 32-bit mode the 32-bit variant `eip` is used, and in 16-bit mode the 16-bit `ip` part. `rip` has no 8-bit aliases (like `rax` has `al`). – Ped7g Feb 14 '17 at 11:40
  • @Ped7g thanks for your correction. It seems that every line with a `mov` instruction uses `%rip`, whereas in the 32-bit version (compiled with gcc -m32) it doesn't. For example, what's the difference between a plain `movb $0x4,0x0` instruction and `movb $0x4,0x0(%rip)`? –  Feb 14 '17 at 11:48
  • `movb $0x4,0x0` will store the byte value `4` into memory at absolute address `0`. `movb $0x4,0x0(%rip)` will store the byte value `4` into memory at address `rip + 0`, i.e. at RIP-relative offset `0`. It's the same as using other registers for addressing, like `movb $4,0(%edi)`. The difference is that, at the time of evaluation, `rip` points to the beginning of the next instruction. So using `rip` for relative addressing allows the compiler to produce PIC (Position Independent Code). The OS then needs to load the data and code together so that they keep their positions relative to each other. – Ped7g Feb 14 '17 at 11:51
  • OK, and in the 32-bit version, is it normal that %rip is not used (at least in my small code above)? –  Feb 14 '17 at 11:55
  • Without `rip`, PIC code would be unable to tell where its data is located; you would have to read the instruction pointer anyway to find out where the code is located, and adjust your addressing by that. Letting the compiler and linker recalculate all the offsets automatically, by using operands like `variable_x(%rip)`, makes it easier for the programmer to make the code PIC-compatible. It's common to compile code for x86_64 targets in a PIC-compatible way (on some OSes, like OS X, it is mandatory), while 32-bit x86 targets usually used absolute addressing, expecting the code to sit at a particular position in memory. – Ped7g Feb 14 '17 at 11:55
  • If you force the compiler to produce PIC code even for a 32-bit target, it will probably use `rip` too (well, the 32-bit `eip` variant of course; 64-bit `rip` is not available in 32-bit mode). – Ped7g Feb 14 '17 at 11:56
  • I tried to force PIC code with `gcc -m32 -fPIC main.c`, but the corresponding assembly file doesn't contain `%eip`. That's weird. –  Feb 14 '17 at 12:03
  • Yeah, I forgot the basics... in 32-bit mode you can't address relative to `eip`, so the code uses a `call` to a local function to read the code position (the `eip` value at the time the `call` executes) from the stack (the return address), and then uses this value to address data relative to the code position. That's the nice thing about compilers: they don't forget the basics... ;) :D – Ped7g Feb 14 '17 at 12:08
  • You mean that `%eip` is modified by each `call` instruction (transparently, or internally) and becomes equal to the address of the next instruction after the `call`, i.e. after the local function has finished executing? –  Feb 14 '17 at 12:16
  • No, `rip/eip` *IS* the instruction pointer, so it *ALWAYS* points at the next instruction to execute. Before `call helper_fn`, it points at the first byte of the `call` instruction (let's say it's at address `0x1000`). After the `call` instruction is decoded, but before it is executed, it is internally increased to point to the next instruction after the `call`, at address `0x1005` (when the 5-byte `call` encoding is used). Then the CPU executes the `call` itself, which means the value `0x1005` is pushed onto the stack, and `rip/eip` is loaded with the value `helper_fn` = the next instruction to execute. – Ped7g Feb 14 '17 at 13:40



RIP-relative addressing is always relative to the RIP (64-bit instruction pointer) register, so it can be used only for global (static) variables, not for locals on the stack. An offset of 0 refers to the address of the instruction that follows the RIP-relative instruction. For example:

   mov  al,[rip+2]     ; 6 bytes, al = 53
   jmp  short next     ; 2 bytes
db 53
next:
   mov  bl,[rip-7]     ; 6 bytes, bl = 53

You wouldn't normally mix data right in with your code, except as an immediate, but this shows what would happen if you actually ran code with very small offsets.

In your code you cannot see and check the offsets (you see four zero bytes) because you disassembled a .o file. They will be filled in by the linker when you link this object into an executable. Use objdump -drwC to show symbol names and relocations when disassembling.


Example of accessing locals relative to `rbp`:

push rbp           ; save the caller's rbp
mov  rbp, rsp      ; [rbp] = saved rbp, [rbp+8] = return address
sub  rsp, 64       ; reserve 64 bytes for local variables
mov  rax, [rbp+16] ; rax = first stack-passed qword argument, if any (System V x86-64)
mov  rdx, [rbp+8]  ; rdx = return address
mov  rcx, [rbp-8]  ; rcx = first qword local variable (still uninitialized here)
mov  r8,  [rbp-16] ; r8  = second qword local variable (still uninitialized here)
; ...
mov  rsp, rbp      ; free the locals
pop  rbp           ; restore the caller's rbp
ret
Peter Cordes
toncsi
  • Actually they will be filled in at link time; it looks like the OP disassembled a `.o` rather than a linked executable. Position-independent code doesn't need runtime fixups every time it's loaded; this is one of the big advantages of RIP-relative addressing. – Peter Cordes May 18 '18 at 21:02