Omission of base register in x86 AT&T addressing mode

Question

I am trying to decode the syntax of the Intel IA-32 (x86) cmp instruction.

The command in question is

cmp 0x804a38(,%ebx,4), %eax

I have a rough idea of what's going on: (the contents of ebx * 4) + 0x0804a38 is subtracted from eax, and the condition codes are then set.

However, I know I am wrong because the jump instruction after that isn't executed. It's a je.

What am I doing wrong here? Is it because the cmp is missing an argument?

Welcome to SO. Please have a look here to learn how to improve your questions (formatting, proofreading, providing code etc.): https://stackoverflow.com/help/how-to-ask — petezurich, Apr 22 '18 at 09:25
It is the contents of (0x804a38 + ebx * 4) that you are comparing to eax, right? Your statement of it is ambiguous. — mevets, Apr 22 '18 at 12:14
From the wording of your question, it sounds like you completely missed the fact that the first argument is a memory reference, i.e. `0x80...+ebx*4` is calculated as a memory address, then a 4-byte value is loaded from that address and compared with `eax`. In Intel syntax the instruction would look like this: `cmp eax,[ebx*4+0x804a38]` (in Intel syntax a memory reference is in square brackets, not ordinary parentheses, and the syntax is relaxed, i.e. `[0x804a38+ebx*4] == [ebx*4+0x804a38]`; most assemblers will even evaluate a simple expression for you, like `[ebx*4+label+15]`). — Ped7g, Apr 22 '18 at 12:58

score 3 · Answer 1 · answered Apr 22 '18 at 09:46

The SIB (scale-index-base) encoding of x86 machine instructions allows the base register to be omitted.

From the Intel SDM, volume 2, table 2-3:

The highlighted row, column, value and the comment below the table designate the corresponding SIB byte value.
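As a rough illustration of that encoding, here is a minimal Python sketch that hand-assembles the instruction from the question for 32-bit mode. The helper names (`modrm`, `sib`, `encode_cmp_reg_index_disp32`) are made up for this example; the field layout and the opcode `3B /r` (CMP r32, r/m32) are the standard ones, but treat this as a sketch and not as a replacement for an assembler.

```python
import struct

# Register numbers as used in the ModRM and SIB fields (32-bit names).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}


def modrm(mod, reg, rm):
    """Pack the mod (2 bits), reg (3 bits) and rm (3 bits) fields."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale, index, base):
    """Pack the scale (2 bits), index (3 bits) and base (3 bits) fields."""
    return (SCALE_BITS[scale] << 6) | (index << 3) | base


def encode_cmp_reg_index_disp32(reg, index, scale, disp):
    """Encode CMP reg, [index*scale + disp32] with no base register."""
    if index == "esp":
        raise ValueError("esp cannot be encoded as an index register")
    opcode = 0x3B  # CMP r32, r/m32
    # mod=00 with rm=100 means "a SIB byte follows".
    modrm_byte = modrm(0b00, REG32[reg], 0b100)
    # base=101 with mod=00 means "no base register, a disp32 follows".
    sib_byte = sib(scale, REG32[index], 0b101)
    return bytes([opcode, modrm_byte, sib_byte]) + struct.pack("<I", disp)


# cmp 0x804a38(,%ebx,4), %eax
encoded = encode_cmp_reg_index_disp32("eax", "ebx", 4, 0x804A38)
print(encoded.hex(" "))  # 3b 04 9d 38 4a 80 00
```

The SIB byte comes out as 0x9d: scale bits 10 (x4), index 011 (ebx), base 101 (none, disp32 only).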

"However, I know I am wrong because the jump command after that isn't executed"

The cmp instruction itself is encoded correctly. Something else is wrong: either its memory operand causes a fault, or the problem is in the following je instruction (which you did not show), or you misinterpreted the results of running your code.

Peter Cordes · Answer 2 · 2018-04-22T10:41:55.783

2

Do you mean the je doesn't jump, and falls through? That means it executed but found the condition was false. That happens when the 4 bytes in memory don't match the 4 bytes in EAX, so cmp won't set ZF.
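To make the "executed but not taken" case concrete, here is a small Python model of the flags involved. The function names `cmp32` and `je_taken` are hypothetical, and only ZF, CF and SF are modeled; real hardware sets more flags than this.

```python
MASK32 = 0xFFFFFFFF


def cmp32(dst, src):
    """Model the flags of `cmp src, dst` (AT&T operand order): dst - src."""
    result = (dst - src) & MASK32  # the result itself is discarded by cmp
    return {
        "ZF": result == 0,                      # operands were equal
        "CF": (dst & MASK32) < (src & MASK32),  # unsigned borrow
        "SF": bool(result >> 31),               # sign bit of the result
    }


def je_taken(flags):
    """je (jump if equal) is taken exactly when ZF is set."""
    return flags["ZF"]


equal = cmp32(0x1234, 0x1234)
different = cmp32(0x1234, 0x1235)
print(je_taken(equal), je_taken(different))  # True False
```

In the second case je still executes; it just falls through to the next instruction because ZF is clear.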

The only way for je to not execute would be if cmp caused a segmentation fault or something, so the program died before reaching the instruction after cmp.

And yes, you are decoding the AT&T addressing-mode syntax correctly: it's a scaled index with no base register, just a disp32. cmp isn't missing an argument; the addressing mode is missing a base (which is totally normal). %ebx is being used as a scaled index into a static array of dwords.
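The "scaled index into a static array of dwords" can be sketched in Python like this. The table contents and the value of ebx are invented for illustration, and `effective_address` / `load_dword` are hypothetical helpers; only the displacement 0x804a38 comes from the question.

```python
import struct

TABLE_BASE = 0x804A38  # the disp32 from the question


def effective_address(disp, index, scale, base=0):
    """disp(base,index,scale) in AT&T syntax: base + index*scale + disp."""
    return (base + index * scale + disp) & 0xFFFFFFFF


def load_dword(memory, address):
    """Load a little-endian 4-byte value; `memory` starts at TABLE_BASE."""
    return struct.unpack_from("<I", memory, address - TABLE_BASE)[0]


# Hypothetical contents of the static array located at 0x804a38.
table = struct.pack("<4I", 10, 20, 30, 40)

ebx = 2
addr = effective_address(TABLE_BASE, ebx, 4)
value = load_dword(table, addr)
print(hex(addr), value)  # 0x804a40 30
```

With these made-up values, cmp compares eax against 30 (the dword stored at 0x804a40), not against the address 0x804a40 itself, so je would only be taken if eax held 30.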

See also: Referencing the contents of a memory location. (x86 addressing modes). (I'm not sure if there's a good link for AT&T syntax addressing modes, but what the machine can encode is fixed; AT&T and Intel syntax can both express every addressing mode the machine can do.)

edited Apr 22 '18 at 10:41

answered Apr 22 '18 at 09:45

Peter Cordes


I remember that at least some time ago (~2009) GNU's toolchain used `eiz` in place of a missing base: https://stackoverflow.com/questions/2553517/what-is-register-eiz – Grigory Rechistov Apr 22 '18 at 09:49
@GrigoryRechistov: In my experience, it only uses that as a placeholder for a missing *index* in an addressing mode that does use a SIB byte, but not for `base=%esp` with no index where a SIB byte is still mandatory. i.e. to tell you that the addressing-mode is padded when it didn't need to be, like as part of a long NOP. – Peter Cordes Apr 22 '18 at 09:53
oops, indeed, not *base* but *index*! Thanks – Grigory Rechistov Apr 22 '18 at 09:55
@GrigoryRechistov: also, I just checked and GAS doesn't accept `mov (%ebx, %eiz, 4), %eax` as *input*. The OP's code is probably disassembly output anyway (given the numeric displacement), but I thought it was interesting that `%eiz` (/ `%riz`?) was an output-only thing for binutils. – Peter Cordes Apr 22 '18 at 09:58
@PeterCordes This whole `eiz` thing is super weird, too. No idea why they did it. – fuz Apr 22 '18 at 10:05
@fuz: I think so the disassembler can be more "accurate" and show you when there's a SIB, even in Intel syntax, e.g. `db 0x8b, 0x4, 0x21` -> nasm -> `objdump -drwC -Mintel` -> `mov eax,DWORD PTR [rcx+riz*1]`. Normally only happens in long-NOPs, where it's nice to see what the padding is. They also include an explicit zero when there is one (like for `base=ebp`), you get `0+` or `0(%rbp)`. – Peter Cordes Apr 22 '18 at 10:16
@PeterCordes Yeah, but `(%eax,,1)` is already explicit enough. No idea why an extra `%eiz` is needed. – fuz Apr 22 '18 at 10:17
@fuz: For Intel syntax, where there's no way to indicate an empty index otherwise. They could have made that an Intel-only thing, but maybe it was easier not to. – Peter Cordes Apr 22 '18 at 10:43
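For readers following the `eiz`/`riz` discussion in these comments, here is a small Python sketch of how a SIB byte splits into its fields. `decode_sib` is a hypothetical helper; it ignores REX prefixes and the disp8/disp32 that may follow, so it only covers the simple cases mentioned in this thread.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_sib(sib_byte, mod, regs=REG32):
    """Return (base, index, scale); None marks an absent register."""
    scale = 1 << (sib_byte >> 6)
    index = (sib_byte >> 3) & 0b111
    base = sib_byte & 0b111
    # index=100 means "no index" (what objdump prints as eiz/riz).
    index_name = None if index == 0b100 else regs[index]
    # base=101 with mod=00 means "no base, disp32 only".
    base_name = None if (base == 0b101 and mod == 0b00) else regs[base]
    return base_name, index_name, scale


# SIB byte from `db 0x8b, 0x4, 0x21` in 64-bit mode: base rcx, no index.
print(decode_sib(0x21, 0b00, REG64))  # ('rcx', None, 1)
# SIB byte for 0x804a38(,%ebx,4): no base, index ebx, scale 4.
print(decode_sib(0x9D, 0b00, REG32))  # (None, 'ebx', 4)
```

The first case is the one where the disassembler prints `riz` to show that a SIB byte is present even though no index is used.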
