
I'm using the latest official "Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4" as a reference to understand the machine-level encoding of the x86-64 ISA.

The documentation for the ModR/M and SIB bytes in Volume 2, Section 2.1.3 gives the exact encodings for referring to the 8-bit, 16-bit and 32-bit registers (Tables 2-1, 2-2 and 2-3).

However, I'm not able to find a similar table that specifies how the REX.X, REX.B and REX.R bits in the REX prefix combine with ModR/M to specify the extended registers. I'm specifically looking for the explicit binary encoding of each extended register. As far as I can tell, the documentation for the REX prefix only says that the reg and r/m fields are extended by 1 bit at the MSB using the corresponding bits in REX, but it doesn't actually give the explicit mapping for the bit combinations.

Does the Intel documentation explicitly state these mappings anywhere in the SDM? Or is it just assumed that R8-R15 follow the obvious/natural mapping, with REX.B/X/R set to 1 and R8 encoded as 000, R9 as 001 ... R15 as 111?

Peter Cordes
John Adam
  • Section 2.1.3 "Not all instructions require a REX prefix in 64-bit mode. A prefix is necessary only if an instruction references one of the extended registers or uses a 64-bit operand". So REX is basically defined per instruction. But yes, my understanding is that if it extends a register number, then the logical mapping applies (R15 is REX set to 1 and the ModRM is 111) – wxz Aug 19 '21 at 19:29
  • IDK if Intel bothered to make a big table anywhere, but yes as you found it follows straight-forward binary numbering using the REX bit as the leading bit and the ModRM bits as the low 3 bits to encode the register number. 8 = 1000 in binary, so that's the encoding for R8. https://wiki.osdev.org/X86-64_Instruction_Encoding#Encoding describes it nicely, and did even make a table (https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers). I guess you know this, and are just wondering about the state of Intel's own documentation? – Peter Cordes Aug 19 '21 at 19:31
  • @PeterCordes - yep, I'm a big fan of the OSDev wiki, but recently I've been trying to limit myself to using just the official documentation as an exercise. After trying to parse the SDM a couple of times and failing to find the mappings listed, I wasn't sure if Intel's documentation was missing that information, or if my understanding was incorrect and it was in fact stated in the documentation. – John Adam Aug 19 '21 at 20:24
  • Just as a general comment, there are some areas where I've found that the AMD documentation is a lot clearer than Intel's. – sj95126 Aug 19 '21 at 22:10

1 Answer


Yes, as you found it follows straightforward binary numbering using the REX bit as the leading bit and the ModRM bits as the low 3 bits to encode the register number. 8 = 1000 in binary, so that's the encoding for R8.
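As a rough illustration of that concatenation, here is a minimal Python sketch. The helper names (`reg_number`, `decode_direct_modrm`) and the `GPR64` list are made up for this example and are not from the SDM; the sketch only handles the register-direct form (mod = 11b) and always prints 64-bit names, ignoring operand size.

```python
# 64-bit GPR names as assemblers conventionally spell them, indexed by the
# 4-bit register number (index 8 is r8, index 15 is r15).
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def reg_number(rex_bit, field3):
    """Concatenate one REX bit (R, X or B) with a 3-bit ModRM/SIB field."""
    if rex_bit not in (0, 1) or not 0 <= field3 <= 7:
        raise ValueError("rex_bit must be 0/1 and field3 must fit in 3 bits")
    return (rex_bit << 3) | field3


def decode_direct_modrm(rex, modrm):
    """Return (reg, rm) register names for a register-direct ModRM byte.

    `rex` is the REX prefix byte (0x40..0x4F), or 0 when no prefix is
    present, which leaves every REX bit at its default of 0.
    """
    if modrm >> 6 != 0b11:
        raise ValueError("only the register-direct form (mod == 11b) is handled")
    rex_r = (rex >> 2) & 1   # REX.R extends ModRM.reg
    rex_b = rex & 1          # REX.B extends ModRM.rm
    reg = reg_number(rex_r, (modrm >> 3) & 7)
    rm = reg_number(rex_b, modrm & 7)
    return GPR64[reg], GPR64[rm]


# 4D 89 F0 is `mov r8, r14`: opcode 89 /r stores ModRM.reg into ModRM.rm.
print(decode_direct_modrm(0x4D, 0xF0))   # ('r14', 'r8')
```

Without the REX prefix, the same ModRM byte `F0` selects registers 6 and 0 (`rsi` and `rax` in this naming), which is the whole effect of the extra bit.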

https://wiki.osdev.org/X86-64_Instruction_Encoding#Encoding explains nicely, and https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers even has a table.


I searched Intel's vol.2 PDF for r14 (which is not "special" for anything, and will probably only show up in tables). There are some tables in vol.2, but not one for simple ModRM itself. (The combined PDF is too huge to want to work with).

Vol.2 does clearly describe how REX fields combine with ModRM fields to make 4-bit register numbers. (e.g. Figure 2-4 showing the concatenation of REX.B and ModRM.rm, and REX.R with ModRM.r). I didn't check vol.1 - I wouldn't be surprised if some statement about register names (used by assemblers) matching binary register numbers could be found there. Names are only meaningful to assemblers, not in machine code, and that is clearly documented in vol.2.


However, the info is there in vol.2:

It does have Table 3-1. Register Codes Associated With +rb, +rw, +rd, +ro (Contd.) for instructions with no ModRM where the low 3 bits of the opcode byte are the low 3 bits of the register number. (Like the short encodings of push/pop r64).

| Reg  | REX.B | Reg field |
|------|-------|-----------|
| R13B | Yes   | 5         |
| R14B | Yes   | 6         |

And so on with rows for every register, and 3 more sets of columns for word, dword, and qword sizes for R14W, R14D, R14. So if you were in doubt about the fact that the binary numbers map to register names, that table makes it clear. (It would be insane to assume that register numbers work differently here than in other contexts.)
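To make the no-ModRM case concrete, here is a small Python sketch of the short `push r64` form (opcode `50+rd`), where the low 3 bits of the register number go into the opcode byte and the high bit goes into REX.B. The function name `encode_push_r64` is hypothetical, and the sketch assumes a bare `0x41` REX prefix (only REX.B set) is all that is needed.

```python
def encode_push_r64(regnum):
    """Encode `push r64` using the short 50+rd form (no ModRM byte)."""
    if not 0 <= regnum <= 15:
        raise ValueError("register number must be in 0..15")
    low3 = regnum & 7        # goes into the low 3 bits of the opcode byte
    rex_b = regnum >> 3      # high bit of the register number, carried by REX.B
    prefix = b"\x41" if rex_b else b""   # 0x41 = REX with only the B bit set
    return prefix + bytes([0x50 + low3])


print(encode_push_r64(6).hex())    # 56    (push rsi)
print(encode_push_r64(14).hex())   # 4156  (push r14)
```

Both encodings end in the same opcode byte, matching the Reg field value of 6 in the table rows for the R14 family.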

There's also the opcode map for one-byte opcodes, where push rSI/r14 share an entry (0x56). The same goes for xchg-with-(e)ax, mov-immediate to byte-reg, pop, and mov-immediate to word/dword/qword-reg, and for bswap in the 2-byte opcode map. Again, it would be insane for these register numbers to work differently than register numbers in other places.

There is a full table Table 2-8. VEX.vvvv to register name mapping, with xmm/ymm0..15 and RAX/EAX .. R15/R15D. (VEX.vvvv can encode integer registers for BMI instructions like andn, and yes they are only documented for dword or qword, not overrideable to word operand size with a 66 prefix.)
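One wrinkle when reading that table: the SDM's VEX description says the VEX fields are stored in 1's complement (inverted) form, so register 0 is encoded as 1111b and register 15 as 0000b. A minimal Python sketch of that inversion, with hypothetical helper names `encode_vvvv` / `decode_vvvv`:

```python
def encode_vvvv(regnum):
    """Return the 4-bit VEX.vvvv field for a register number (bits inverted)."""
    if not 0 <= regnum <= 15:
        raise ValueError("register number must be in 0..15")
    return ~regnum & 0xF


def decode_vvvv(vvvv):
    """Recover the register number from a 4-bit VEX.vvvv field."""
    if not 0 <= vvvv <= 15:
        raise ValueError("vvvv must fit in 4 bits")
    return ~vvvv & 0xF


print(format(encode_vvvv(0), "04b"))    # 1111
print(format(encode_vvvv(15), "04b"))   # 0000
```

After undoing the inversion, the register number is the same 0..15 value used everywhere else, which is why this table is a reasonable stand-in for an explicit name-to-number mapping.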

Table 2-13. 32-Bit VSIB Addressing Forms of the SIB Byte is also relevant, showing columns like ESI/R14D. (In 64-bit mode, you normally wouldn't use a 67 address-size prefix with vpgatherdd or whatever, but you can. There isn't a separate table for 64-bit address-size.) The table doesn't explicitly mention how VEX.B selects between the two registers for a given value of bits 2:0, but that should be obvious from other cases.

Peter Cordes
  • The wiki specifies the encoding of the first 16 registers only. AVX512 brought the VSIB concept, where XMM16..XMM31 can be used in memory addressing: their lower 3 bits are in ModRM or in SIB, and the higher 2 bits are encoded in EVEX, see the table [Operands encoding in EVEX prefix](https://euroassembler.eu/easource/ii.htm#IiAssembleAVX) – vitsoft Aug 19 '21 at 21:38
  • @vitsoft: No, VSIB was new with AVX2, for `vgatherps` / `vpgatherdd` / etc. AVX512 also uses it for gather/scatter, but VSIB wasn't new with AVX-512. What was new was EVEX providing an extra *2* bits (instead of just 1 in VEX) for vector register numbers. – Peter Cordes Aug 19 '21 at 23:21
  • Thank you very much for the detailed answer, @PeterCordes. `Vol.2 does clearly describe how REX fields combine with ModRM fields to make 4-bit register numbers` It does clearly describe which REX bits contribute the leading bit for which ModR/M & SIB fields, but it still doesn't state the encodings for R8-R15 explicitly. `I wouldn't be surprised if some statement about register names (used by assemblers) matching binary register numbers` I did read vol. 1 before posting the question, but I don't remember such a statement. I might be mistaken though. – John Adam Aug 20 '21 at 10:01
  • Apologies for the poor formatting - not sure what's the best way to quote in a reply comment. I've just used backticks to wrap the quoted parts from the parent comment. – John Adam Aug 20 '21 at 10:06
  • `There is a full table Table 2-8. VEX.vvvv to register name mapping, with xmm/ymm0..15 and RAX/EAX .. R15/R15D.` This is what I too found as closest to an explicit encoding map, given that VEX documentation states: `Full REX prefix functionality is provided in the three-byte form of VEX prefix. However the VEX bit fields providing REX functionality are encoded using 1’s complement form, i.e. XMM0/YMM0/R0 is encoded as 1111B, XMM15/YMM15/R15 is encoded as 0000B` – John Adam Aug 20 '21 at 10:07
  • Based on your findings and @wxz's comment about how REX is defined "per-instruction", it does seem to me that the SDM doesn't have any explicit mapping for REX + ModR/M (like the one in the OSDev Wiki), and this key assumption is required: it would be insane for these register numbers to work differently than register numbers in other places. – John Adam Aug 20 '21 at 10:08
  • @JohnAdam: Re: your first comment: correct. I'm making a distinction between register numbers (0000 .. 1111 in the machine encoding) vs. register *names* (R8 .. R15 in asm source). It *does* clearly show how to make 4-bit register numbers, and it's fairly clear that, when talking purely about ModRM instructions, the same register number refers to the same register whether it's used as a base or as the `/r` operand. – Peter Cordes Aug 20 '21 at 12:09
  • @JohnAdam: I don't really agree that REX is "*defined* per-instruction". Its fields always mean the same thing for any instruction (except for .W which is ignored for instructions like `push`/`pop` whose operand-size is fixed at 64-bit). The REX fields all default to 0 if not present, so you often can avoid one, but I wouldn't say that means it's defined differently for different instructions. I think that was just poor choice of phrasing by wxz, and doesn't really go with their quote from the SDM. – Peter Cordes Aug 20 '21 at 12:13