
As a learning exercise, I've been handwriting assembly. I can't figure out how to load the value stored at an address into a register.

Semantically, I want to do the following:

_start:
        # read(0, buffer, 1)
        mov     $3, %eax            # System call 3 is read
        mov     $0, %ebx            # File handle 0 is stdin
        mov     $buffer, %ecx       # Buffer to write to
        mov     $1, %edx            # Length of buffer
        int     $0x80               # Invoke system call

        lea     (%ecx, %ecx), %edi  # Pull the value at address into %edi
        cmp     $97, %edi           # Compare to 'a'
        je      done

I've written a higher-level implementation in C:

#include <unistd.h>   /* declares read() */

char buffer[1];
int main()
{
    read(0, buffer, 1);          /* read one byte from stdin */
    char a = buffer[0];
    return (a == 'a') ? 1 : 0;
}

But compiling with gcc -S produces assembly that doesn't port well into my implementation above.

I think lea is the right instruction for loading the value at the address stored in %ecx into %edi, but when I inspect it in gdb, %edi contains a garbage value after the instruction executes. Is this approach correct?

JFMR
James Taylor
  • 3
    No, `lea` will give you the effective address, i.e. `ecx + ecx`. It looks like you want `mov`, and probably the variant that only reads one byte and zero-extends it into a double-word since you're dealing with ASCII characters. – Michael Jan 29 '18 at 07:04

1 Answer


Instead of the lea instruction, what you need is:

movzbl  (%ecx), %edi        

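This reads the byte at the memory address held in ecx and zero-extends it into the edi register. By contrast, lea only computes an address (here ecx + ecx) and never reads memory, which is why edi ended up holding what looked like garbage.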

_start:
        # read(0, buffer, 1)
        mov     $3, %eax            # System call 3 is read
        mov     $0, %ebx            # File handle 0 is stdin
        mov     $buffer, %ecx       # Buffer to write to
        mov     $1, %edx            # Length of buffer
        int     $0x80               # Invoke system call

        movzbl  (%ecx), %edi        # Pull the value at address ecx into edi
        cmp     $97, %edi           # Compare to 'a'
        je      done
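
To make the difference concrete, here is a rough C model of the two instructions. It is only a sketch: the function names `lea_model` and `movzbl_model` are made up for illustration, and registers are modeled as 32-bit unsigned integers.

```c
#include <stdint.h>

/* Hypothetical model of `lea (%ecx,%ecx), %edi`:
   pure address arithmetic (base + index); memory is never read. */
uint32_t lea_model(uint32_t ecx)
{
    return ecx + ecx;
}

/* Hypothetical model of `movzbl (%ecx), %edi`:
   read one byte from memory and zero-extend it to 32 bits. */
uint32_t movzbl_model(const void *ecx)
{
    return *(const unsigned char *)ecx;
}
```

With a buffer holding 'a', `movzbl_model(buffer)` yields the byte's value, while `lea_model` only ever returns twice whatever number it was given, regardless of what is stored there.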

Some advice

  • You don't really need the movzbl instruction or any other separate load, since cmp can compare the byte in memory pointed to by ecx directly:

    cmpb $97, (%ecx)
    
  • You may want to specify the character to be compared against (i.e., 'a') as $'a' instead of $97 in order to improve readability:

    cmpb $'a', (%ecx)
    
  • Avoiding conditional branches is usually a good idea. Immediately after performing the system call, you could use the following code that uses cmov for determining the return value, which is stored in eax, instead of performing a conditional jump (i.e., the je instruction):

    xor     %eax, %eax     # set eax to zero
    cmpb    $'a', (%ecx)   # compare to 'a'
    cmovz   %edx, %eax     # conditionally move edx(=1) into eax
    ret                    # eax is either 0 or 1 at this point
    

    edx was set to 1 prior to the system call. The approach above therefore relies on edx being preserved across the system call (i.e., the int $0x80 instruction).

  • Even better, you could use sete on al after the comparison instead of the cmov:

    xor     %eax, %eax     # set eax to zero
    cmpb    $'a', (%ecx)   # compare to 'a'
    sete    %al            # conditionally set al
    ret                    # eax is either 0 or 1 at this point
    

    The register al, which was zeroed by xor %eax, %eax, will be set to 1 if the cmp set ZF (i.e., if the byte pointed to by ecx is 'a'). With this approach you don't need to worry about whether the syscall preserves edx, since the outcome doesn't depend on edx.
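
For comparison with the C version in the question, the xor / cmpb / sete sequence computes what a plain C comparison expression already yields: an int that is either 0 or 1. A minimal sketch, where `first_byte_is_a` is a hypothetical helper name and the exact instructions a compiler emits for it depend on the compiler and its flags:

```c
/* Hypothetical helper: returns 1 if the first byte of buf is 'a', else 0.
   The comparison itself produces the 0/1 value, so no branch or
   ?: operator is needed in the source. */
int first_byte_is_a(const char *buf)
{
    return buf[0] == 'a';
}
```

In the question's program, `return first_byte_is_a(buffer);` could stand in for the final two lines of main.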

JFMR
  • 1
    You missed updating the 2nd example to movzx. And in that example, I think you want `setz %al`, because you didn't include an instruction to set `%edx` to 1. (And you don't want to have to do that). Anyway, `xor`-zero eax / `cmpb $'a', (%ecx)` / `setz %al` should do the trick. (You can even leave out the xor-zeroing if you only return an 8-bit `bool`, or take advantage of the fact that only the low byte of main's `int` return value is visible as the process's exit status). – Peter Cordes Jan 29 '18 at 14:14
  • @PeterCordes Thanks for the comment. The second code snippet comes *after* performing the system call (i.e., the `int $0x80` instruction). The register `edx` is set to 1 *before* the syscall. I thought that [`edx`'s value is preserved across the syscall](https://stackoverflow.com/a/2538212/8012646). The code also assumes that `ecx`'s value is preserved across the syscall. Am I missing something here? – JFMR Apr 02 '18 at 16:05
  • 1
    Oh, I think I didn't realize this was still *immediately* after the system call, before any other code that might have used `%edx`. So it's not a stand-alone fragment. Yes, `int $0x80` preserves all registers except the return value in EAX. But still prefer `setz` instead of `cmov`, with xor-zeroing before the compare. The system call dwarfs the cost here, of course, but `cmov` is 2 uops on Intel before Broadwell. If you needed to produce a 0/1 but it didn't matter what register, `cmov` on EBX and EDX would work because you already have a 0 and 1, and be better on AMD and recent Intel. – Peter Cordes Apr 03 '18 at 19:28
  • 1
    BTW, `sete` has the correct semantic meaning for this case: set if the last compare was `e`qual. It's a synonym for `setz`, of course, but it helps indicate to human readers what it means for ZF to be set there. – Peter Cordes Apr 05 '18 at 20:17