3

Recently I started to learn x86 assembly language and CPU architecture. I noticed that x86 has 8 integer registers in total, but x86-64 has 16.

Why? There must be some explanation.

Bo Persson
kirugan
  • 1
    x86 is really old... Back in the days, it was better to keep the hardware simple. – Mysticial Apr 27 '13 at 04:30
  • There's probably a more concrete answer, but having more registers is generally faster. You don't need to write to/read from the stack as much because you can store all of the values in the registers. I'm sure there's some downside to it (like the `PUSHA` instruction might be slower somehow?) but it's (probably) outweighed by the advantages. – rliu Apr 27 '13 at 04:32
  • @Mysticial but still, x86 plays a big role – kirugan Apr 27 '13 at 04:33
  • 2
    @kirugan Backwards compatibility made it stay pretty much the same all these years. Since 64-bit necessitated a new ISA anyway, they took the opportunity to add those much needed extra registers. – Mysticial Apr 27 '13 at 04:35
  • 1
    It just happens to be so; I'm not sure what inner meaning you are trying to find. It is also quite old ([the 8086 was introduced in 1978](http://en.wikipedia.org/wiki/X86)), so you may need to go to some archives to find the real reason. – Alexei Levenkov Apr 27 '13 at 04:38
  • You may be interested in reading [The 8088/8086 Primer: An Introduction to Their Architecture, System Design and Programming](http://archive.org/details/The8086Primer), by Stephen P. Morse (the designer of 8088/8086). – nrz Apr 27 '13 at 04:43
  • @nrz thanks! 400 pages to find the truth :) – kirugan Apr 27 '13 at 04:49
  • 1
    It's important to understand that there are actually 100-200 registers on a modern x86 core. What there are only 8/16 of are register *names*. This has significant impact on software optimization. – Stephen Canon Apr 27 '13 at 13:32
  • 2
    "Closed as not constructive"? The question IMHO is precise, and the answer is technically accurate (even if it is mine). What more could one want? – Ira Baxter Apr 27 '13 at 20:30

3 Answers

8

The x86 architecture has evolved from its earliest incarnation as an 8008 back in the early 1970s. At the time, memory bytes and therefore opcode space were extremely precious; only 3 bits were set aside to select among the (at the time) A, B, C, D, E, H and L registers, all 8 bits wide, plus (IIRC) an M code meaning the memory byte addressed by H and L. (I remember how painfully hard those machines were to program, and how slow! You had to load H and L with a memory address before a memory read or write!)

Since then, Intel has evolved the instruction set through the 8080, 8086, 80186, 80286, 80386 and 80486 architectures (the last of these in the late 1980s), extending the registers to 16 and 32 bits, but staying with the same 3 bits to select a register.

It wasn't until AMD designed a 64 bit version of the x86 architecture that a 4th register bit was added, by virtue of adding (since now memory and therefore opcode bytes are cheap) an instruction prefix byte, the REX prefix. This prefix byte in essence adds "8" to the register number selected by those same 3 legacy register bits; this means the "register number" is spread out across the instruction, which makes for an ugly decoder, but transistors are now cheap, too.
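How the extra bit combines with the legacy 3-bit field can be sketched in a few lines of Python. This is only an illustration: the function name `decode_reg` is made up, and the sketch looks at just one of the prefix bits (REX.R, which extends the `reg` field of the ModRM byte) and ignores the W, X and B bits.

```python
# 64-bit register names in encoding order (0-7 are the legacy eight).
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_reg(rex, modrm):
    """Return the register named by ModRM.reg, extended by REX.R.

    rex is the REX prefix byte (0x40-0x4F), or None if there is no prefix.
    decode_reg is a hypothetical helper, not part of any real decoder.
    """
    low3 = (modrm >> 3) & 0b111            # the legacy 3-bit register field
    rex_r = 0
    if rex is not None:
        if rex & 0xF0 != 0x40:
            raise ValueError("not a REX prefix")
        rex_r = (rex >> 2) & 1             # REX.R is bit 2 of the prefix
    return REG_NAMES[(rex_r << 3) | low3]  # a set REX.R adds 8 to the number

print(decode_reg(None, 0xC8))   # no prefix, reg field = 001
print(decode_reg(0x4C, 0xC8))   # same ModRM byte, but REX.R is set
```

The same ModRM byte selects `rcx` without the prefix and `r9` with REX.R set, which is the "adds 8" behaviour described above.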

The excuse for 16 registers is "register pressure". The ideal CPU will do all necessary arithmetic in its registers, always having enough so it doesn't have to sometimes spill (store and reload later) a register to memory to make space for another computation. Measurements (and experience) have shown that 8 registers were not really quite enough to avoid such spills, and since spills touch memory, they slow down the processor considerably. I think 32 is considered (carefully measured) to be more than enough registers, but that would have required 2 more bits per register field, and 16 is close enough to ideal to be very practical. And AMD for a while was able to use their 64 bit offering, and 16 registers rather than a mere 8, as effective high-tech marketing features.
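To make "register pressure" a bit more concrete, here is a minimal sketch that counts how many values are live at the same time and how many of them cannot fit in a given number of registers. The live intervals and the helper names (`peak_pressure`, `values_spilled`) are invented for illustration; a real compiler's register allocator is far more involved than this.

```python
def peak_pressure(intervals):
    """Largest number of values live at the same time.

    intervals is a list of (start, end) pairs with end exclusive.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))    # value becomes live
        events.append((end, -1))     # value dies
    events.sort()                    # at equal times, deaths sort before births
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak

def values_spilled(intervals, num_regs):
    """Lower bound on the values that must sit in memory at the peak."""
    return max(0, peak_pressure(intervals) - num_regs)

# Hypothetical inner loop that keeps 12 values live at once.
loop_values = [(0, 10)] * 12
print(values_spilled(loop_values, 8))    # with 8 registers
print(values_spilled(loop_values, 16))   # with 16 registers
```

With these made-up intervals, 8 registers leave 4 values without a home, while 16 registers hold all of them.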

Intel, discovering they were losing the 64 bit processor war to AMD, tried to produce their own 64 bit extension of the x86, but Microsoft said they were supporting the AMD instruction set, and would not support 2 different x86 64 bit instruction sets. Intel folded, and now has essentially the same basic 64 bit instruction set that AMD offered.

You'll find that extremely modern versions of these CPUs have vector register sets of 16 and (I think) 32 registers; opcode bits are much cheaper now and instruction fetch rates are incredible.

Bo Persson
Ira Baxter
  • 2
    RISC CPUs tended to have 16 registers for the register pressure reason. After all, the idea behind RISC was to engineer the instruction set simply but covering what was really needed, not what seemed cool to the CPU designers (overly CISC instructions are what really killed the VAX). What you will discover if you dig into x86 architecture these days is that the chip internals are effectively RISC machines with a really complex decoder to map what you think of as x86 instructions into these simpler micro-ops. – Ira Baxter Apr 27 '13 at 04:49
  • 1
    @AlexeiLevenkov: Cleanup? – Ira Baxter Feb 29 '16 at 20:02
  • some more history: https://blogs.msdn.microsoft.com/oldnewthing/20040105-00/?p=41203 – phuclv Mar 05 '18 at 11:04
1

You can easily work this out yourself. To address 8 registers you need just 3 bits, which is what the assembly language instruction format has available. Since a 64-bit instruction is twice as big, it has 6 bits. See more in "What is the size of each asm instruction?"

So if your shortest instruction is 1 byte long, you have just 8 bits. If you need an instruction that moves from one register to another, those 8 bits have to hold the following:

  1. opcode

  2. source address

  3. destination address

Since the source and destination addresses take 3 bits each, 6 bits are consumed and you have only 2 bits left for the actual operation: four operations in total, such as move, add, sub and swap.
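The bit budget above can be sketched as a toy one-byte instruction format. This is a hypothetical layout chosen only to match the description (2 opcode bits, then 3 source bits, then 3 destination bits); it is not the real x86 encoding, and the names `OPCODES`, `encode` and `decode` are made up.

```python
# Hypothetical 2-bit opcode table: only four operations fit.
OPCODES = {"move": 0b00, "add": 0b01, "sub": 0b10, "swap": 0b11}

def encode(op, src, dst):
    """Pack opcode (2 bits), source (3 bits) and destination (3 bits) into one byte."""
    if not (0 <= src < 8 and 0 <= dst < 8):
        raise ValueError("a 3-bit field can only name registers 0-7")
    return (OPCODES[op] << 6) | (src << 3) | dst

def decode(byte):
    """Unpack a byte (0-255) produced by encode()."""
    names = {value: name for name, value in OPCODES.items()}
    return names[byte >> 6], (byte >> 3) & 0b111, byte & 0b111

word = encode("add", 2, 5)
print(hex(word), decode(word))
```

Asking for register 8 or above raises an error, because there is simply no bit left in the byte to name it.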

Community
Singagirl
1

The original 8086 from 1978 had, as I recall, a few tens of thousands of transistors and, because of this, all instructions were micro-coded. This means that there were compromises galore to shoehorn a 16-bit processor's functionality into that transistor budget. The most successful 16-bit processor to date had been Digital Equipment's PDP-11, which also had eight general purpose registers. The 8086 bettered this with eight byte registers in four of the 16-bit registers.

Even the address translation was micro-coded. The average number of cycles required for an instruction was 17. The 286 lowered this to 7 thanks to many more transistors (and a not entirely micro-coded address translation unit), the 386 to 4.4 and the 486 to 1.8. During that time the clock speed increased from 5 to 100 MHz (in the 486 DX4).

The 8086 was intended for real-time applications, and fewer registers mean faster interrupt handling and task switches. It should be seen in its historical context and not compared with today's 32-bit, 32-register RISC processors.

The 386, the first 32-bit x86 processor, arrived seven years after the 8086. It still had only eight general purpose registers; each has a 16-bit subregister, and four of those in turn have 8-bit subregisters.
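The subregister layout can be illustrated with plain bit masks. The sketch below uses EAX as the example (with its usual AX, AH and AL views); the helper name `subregisters` is made up, and it only models how the views overlap, not how the hardware stores them.

```python
def subregisters(eax):
    """Split a 32-bit EAX value into the narrower views that alias it."""
    eax &= 0xFFFFFFFF
    return {
        "eax": eax,                 # full 32 bits
        "ax": eax & 0xFFFF,         # low 16 bits
        "ah": (eax >> 8) & 0xFF,    # bits 8-15
        "al": eax & 0xFF,           # bits 0-7
    }

for name, value in subregisters(0x12345678).items():
    print(name, hex(value))
```

All four names refer to the same storage, so the subregisters do not add to the count of eight.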

There came a point where the 8-register, 32-bit architecture couldn't be made faster by throwing transistors at it. Or rather, the choice being weighed was between improving the 8-register, 32-bit solution using more transistors and extending to 16 registers of 64 bits each with a revised instruction set. The latter won.

Olof Forshell
  • Can you cite a reference showing the original 8086 was microcoded? I don't think it was, and usually a dearth of transistors means you hardwire the essential stuff and you try not to have anything else. – Ira Baxter Apr 27 '13 at 22:44
  • Several sites state that the 8086 instruction set was micro-coded. When you say "hardwire the essential stuff" I say fine but the 8086 had a rich instruction set and all of it couldn't have been "essential stuff." A 16-bit multiply could require over one hundred cycles to execute which comes out to 6+ cycles per bit (as in a shift, test and add multiplication primitive) - hardly hardwired territory. The 80286 had a partly hardwired effective address calculation: "offset[bx]" (one addition) had no cycle penalty but "offset[bx+si]" (two additions) had one. Divisions are still micro-coded today. – Olof Forshell Apr 28 '13 at 12:27
  • I agree complex instructions are hard to hardwire. You asserted "all instructions were microcoded", which is where I have trouble. You use lots of clocks as a justification. The Intel chips used IIRC a 3 phase clock, compared to many other processors' single phase clock, and Intel used to bogusly market this "higher clock rate" when in fact the throughput was essentially the same. That 3 phase clock for your multiply would then translate to 3 phases for add, and three phases for shift, i.e., essentially one clock for an add and one for a shift, which is what you would expect of a simple implementation. – Ira Baxter Apr 28 '13 at 15:00
  • How many transistors are required to implement a hardwired adder that will do subtracts, adds, effective address calculations and segment+offset to 20-bit-address calculations in one cycle? I guess it would have used up a sizeable portion of the 8086 transistor budget and left little for demanding multiply/divide, not to mention the mass of other, simpler instructions the 8086 was able to perform. Gradually more and more instructions have been "implemented in hardware" but it's an ongoing process over processor generations - an endless string of compromises if you will. – Olof Forshell Apr 28 '13 at 17:50
  • @IraBaxter: While instruction timing charts for the 8088 and 8086 were often not accurate at predicting how long code would take to execute since they assumed the memory bus would keep up with the processor (in practice, it usually came close on the 8086 and was nowhere near close on the 8088) effective address calculations were slow on the 8088 because they were microcoded. – supercat Nov 08 '16 at 07:32