
Why are all modern Intel processors of the x86 family said to descend from the Intel 8086 and not the Intel 8080? From the Wikipedia article on the Intel 8086:

The 8086 gave rise to the x86 architecture, which eventually became Intel's most successful line of processors.

But why start at the 8086, when the 8086 was source compatible with the 8080, which already had some 16-bit operations? What is the defining feature that set the two chips so far apart that the 8086 is said to start the architecture?

Evan Carroll
  • The 8086 should be compatible to the 8080? – Martin Rosenau Oct 16 '19 at 15:54
  • @MartinRosenau Yes. Source compatible, meaning an assembly language program may work on both 8086 and 8080. But they are not binary compatible. – Omar and Lorraine Oct 16 '19 at 15:56
  • @Wilson As far as I know, the conditional return (e.g. RPE) and conditional call (e.g. CNS) instructions were commonly used in 8080 and Z80 programs. And as far as I know, no x86 CPU ever supported comparable instructions. Some other 8080 instructions (like XTHL) are only supported in 32-bit code starting with the 80386. I doubt that the 8086 was really source-code compatible with the 8080. Maybe there were assemblers that could replace one 8080 instruction like CNS by multiple 8086 instructions. ... – Martin Rosenau Oct 16 '19 at 16:15
  • @Wilson ... However, if you argue based on the existence of such assemblers, you would also have to argue that PowerPC and ARM CPUs are x86 compatible, because there are JIT translators that convert PowerPC and ARM binary code to x86 code when you run an executable file. – Martin Rosenau Oct 16 '19 at 16:16
  • And why not 8008? Or 4004? (-: – hippietrail Apr 27 '20 at 08:39
  • @MartinRosenau Indeed Intel had such a tool. I forgot what it's called though. – fuz Mar 21 '21 at 11:52
  • The DEC Alpha was highly source-compatible with the VAX, because a compiler was written that understood MACRO-32 (the VAX assembly language) and emitted Alpha object code. But that doesn't mean there's any relationship between the two architectures. – dave May 21 '23 at 14:41

2 Answers


8086 was designed to make asm source porting from 8080 easy (not the other direction). It is not binary compatible with 8080, and not source-compatible either. 8080 is not an x86 CPU. 8080 is a more distant ancestor that had some influence on the design of 8086, but it's not the same architecture. As an analogy, all x86 CPUs are the same genus but different species, while 8080 is a different genus.

8080 itself has some ancestors like 8008, so if you're considering more-distant ancestors and not strict binary compat, then you definitely don't stop at 8080 as the earliest ancestor.


Modern x86 CPUs are binary compatible with 8086. You can literally run 8086 binaries on a modern PC, in real mode. (The species analogy is a stretch here, but works if you look at forward compat instead of backwards compat: old x86 chips can't run AVX / AVX2 / FMA / AVX512 code, so you could look at each ISA extension as a speciation event.)

The 86 in x86 comes from 8086 / 80186 / 80286 / ..., Intel's official CPU model numbers until they switched to names like Pentium (because you can't trademark a number).

Modern PC firmware usually still supports booting in legacy BIOS mode, providing the int 0x10 / int 0x13 etc. "BIOS" system calls for keyboard/screen input/output, reading disks, and so on. This is a PC software thing, going beyond 8086 binary compatibility, but it does mean you can still boot an 8086 kernel / bootloader on a modern PC.
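As a sketch of how little that takes (NASM syntax; assemble with nasm -f bin and legacy BIOS firmware will run it from the first sector of a disk):

bits 16
org 0x7c00                ; legacy BIOS loads the boot sector here

        mov  ah, 0x0e     ; int 0x10 / AH=0x0E: teletype output
        mov  al, 'X'
        int  0x10         ; print one character via the BIOS

hang:   hlt
        jmp  hang

        times 510-($-$$) db 0
        dw   0xaa55       ; boot-sector signature the BIOS checks for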


8080 machine code is completely different from 8086: reg,reg instructions are 1-byte long (8080 opcode map), vs. most 8086 instructions specifying operands in a ModR/M byte. In 8080, the destination register is always part of the opcode (and ALU ops are mostly only available with A (the accumulator) as the destination).
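A sketch of the encoding difference, with the byte values as an illustration:

        mov  cl, dl   ; 8086: 8Ah CAh -- opcode 8A (MOV r8,r/m8) plus ModR/M CA
; the 8080 equivalent MOV C,E is the single byte 4Bh: the pattern
; 01 DDD SSS with DDD=001 (C) and SSS=011 (E) baked into the opcode.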

8080 asm source is also not the same as 8086 asm source: the register names are different, and so are many of the mnemonics, e.g. ADI 123 for an add-immediate (implicitly to the accumulator, I think) or ORA E to do A |= E.
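A short side-by-side sketch (8080 source in the comments, a hand translation under each line, assuming the conventional A -> AL, E -> DL register mapping):

; 8080:  ADI 123          ; add immediate to the accumulator
        add  al, 123
; 8080:  ORA E            ; A |= E
        or   al, dl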

8086 has segmentation but 8080 doesn't (just a flat 16-bit address space).
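A minimal real-mode sketch of what that means in practice:

; 8086: physical address = segment*16 + offset (20 bits), so different
; segment:offset pairs can name the same byte, e.g.
;   0x1234:0x0005 and 0x1230:0x0045 both -> physical 0x12345.
        mov  ax, 0x1234
        mov  ds, ax        ; segment registers load via a GP register
        mov  al, [0x0005]  ; reads physical address 0x12345
; the 8080 just has a single flat space: 0x0000-0xFFFF.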

You can write a program to mechanically translate from 8080 to 8086 asm source, but you can't just rebuild the same asm source for a different architecture. It's not even close to really being the same.


MichaelPetch says there were assemblers that could read 8080 source and output 8086 machine code (i.e. with the translation built in to the assembler, presumably with some fixed mapping between 8080 byte registers and 8086 AL/AH/BL/BH/...). IDK if they would ever have to emit multiple 8086 instructions for any 8080 mnemonics.

The manual for one such translator is XLT86™ 8080 to 8086 Assembly Language Translator USER'S GUIDE, from Digital Research, Inc.

This is not what I'd call "assembly-language compatible". It's close enough to enable translating single instructions separately (I think), but that's about it. You have to realize that by programming an 8086 using pure 8080 asm source, you're missing out on the power of 16-bit operations, and any 8086-specific optimizations.
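A sketch of the kind of optimization a mechanical translation leaves on the table (usual HL -> BX, DE -> DX mapping):

; 8080 source and its mechanical translation:
;   LXI D, 1234    ->   mov dx, 1234
;   DAD D          ->   add bx, dx    ; HL += DE
; idiomatic 8086: one instruction, and no scratch register clobbered:
        add  bx, 1234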


Fun fact: Patrick Schlüter comments about chips that were binary compatible, not just source compatible, something Intel's own x86 chips never offered:

NEC V20/V30 were 80186-compatible CPUs that could explicitly execute 8080 binaries. They had 2 instructions that allowed calling into and trapping into 8080 functions.

That's a similar idea to modern CPUs that support multiple machine-code formats, like ARM with Thumb vs. ARM, or early Itanium (IA-64) with hardware support for x86 machine code, including rules for mapping IA-32 register state onto the IA-64 registers. Or x86 protected (and long) mode with far calls between 16-bit and 32-bit (and 64-bit) code segments. (But not real mode: although real-mode code decodes the same as 16-bit protected mode, segment registers mean different things, so real-mode code usually has to run inside vm86 mode under a 32-bit kernel, or under hardware virtualization.)

Peter Cordes
  • @HadiBrais: 8080 machine-code is completely different (mostly 1-byte instructions: http://pastraiser.com/cpu/i8080/i8080_opcodes.html), and it has different register names (see the link in my answer). – Peter Cordes May 15 '18 at 19:33
  • Did you see "but not binary-compatible"? No, you could not take machine code from 8080 and have it decoded properly on the 8086. At the time there were assemblers that could take 8080 assembly code and assemble it into 8086 machine code. But you needed an actual assembler for that. – May 15 '18 at 19:40
  • @HadiBrais: updated my answer. Unless they're implicitly talking about a special translating assembler, "assembly-language compatible" is a serious overstatement in that wiki article. Ah, I see MichaelPetch says there were such assemblers. – Peter Cordes May 15 '18 at 19:45
  • @PeterCordes: yes, I've found myself on occasion modifying Wikipedia articles based on SO questions that involved info there. Wikipedia may be a good resource in general, but many articles are not always technically accurate. – May 15 '18 at 19:48
  • @PeterCordes: Yep, I have had to use one decades ago on a project. The one I had experience with was Digital Research's xlt86 product. Was useful for going from CP/M on 8080 to CP/M on 8086. – May 15 '18 at 19:57
  • I would say that besides binary compatibility, the major difference is segmentation. If the 8086 supported disabling segmentation, it would really operate very much like the 8080. The 8086 is also the first to support masking interrupts. – May 15 '18 at 21:34
  • I couldn't post this earlier since I was on my phone. If anyone is curious, this is the product we had used previously: http://www.s100computers.com/Software%20Folder/Assembler%20Collection/Digital%20Research%20XLT86%20Manual.pdf – May 15 '18 at 22:51
  • "You have to realize that by programming an 8086 using pure 8080 asm source, you're missing out on the power of 16-bit operations, and any 8086-specific optimizations." That's the one thing I did realize, even though the Wikipedia article does say the 8080 had some 16-bit operators. By the same token, would you call x64 "assembly-language compatible"? – Evan Carroll May 16 '18 at 02:32
  • As an aside: if assembly compatibility were the test, isn't the 8080 a non-binary-compatible descendant of the 8008? Which is itself an IC version of the Datapoint 2200? I don't think there's a consistent heritage test which ends up moving from the 8086 to the 8080 but stops there. – Tommy May 16 '18 at 20:14
  • @Tommy: Thanks, great observation that 8080 wouldn't be the start of the chain for a weaker criterion. – Peter Cordes May 16 '18 at 20:20
  • @HadiBrais "If 8086 supported disabling segmentation, it would really operate very much like 8080" You could come close by setting all segment registers the same (the "Tiny" memory model). – TripeHound May 17 '18 at 09:20
  • Actually it is int 13h for reading disks. – Rui F Ribeiro May 17 '18 at 12:22
  • IIRC, 8008 is also not machine-code compatible with 8080. – rackandboneman May 17 '18 at 16:24
  • @TripeHound: Some things would still be awkward to port. For example, the 8080 has three pairs of 8-bit registers that can be treated as general-purpose 16-bit address registers. The 8086 has four pairs of 8-bit registers, of which one can be treated as a 16-bit address register, along with two general-purpose 16-bit address registers that cannot be treated as two 8-bit halves. One could usually work around such limitations by e.g. using si for some of the things one would have used bc for, and ch/cl for some of the others, but doing so efficiently may take some work. – supercat May 17 '18 at 23:02
  • @RuiFRibeiro: thanks, fixed, and linked to the BIOS category of Ralf Brown's interrupt list :P – Peter Cordes May 17 '18 at 23:15
  • @rackandboneman: Right, I'm saying that 8008 counts as an ancestor of 8080 if we aren't requiring binary compatibility for something to count as an ancestor. Is source porting from 8008 to 8080 about as easy as source porting from 8080 to 8086? (Also, feel free to edit to improve the wording if my phrasing didn't imply what I intended, but I think my 2nd paragraph says what I mean.) – Peter Cordes May 17 '18 at 23:18
  • @supercat True; Jules's answer covers a lot of those things very well. I was really only talking about the segment registers. – TripeHound May 18 '18 at 00:03
  • @TripeHound: Ah, I'd missed that you were responding to a comment. As you note, segmentation on the 8086 doesn't really break 8080 compatibility (unlike the segment-descriptor design, which throws out the window the best approach that has ever been implemented for a 16-bit processor to access 1MB of address space). – supercat May 18 '18 at 16:29
  • Maybe mention the NEC V20/V30, which were 80186-compatible CPUs that could explicitly execute 8080 binaries. They had 2 instructions that allowed calling into and trapping into 8080 functions. – Patrick Schlüter Oct 16 '19 at 08:18
  • @PatrickSchlüter: Thanks, that makes an interesting contrast with Intel CPUs. Added a section at the bottom about CPUs that understand multiple machine-code formats. – Peter Cordes Oct 16 '19 at 15:51
  • Interestingly, Intel seems to be considering abandoning some backward compatibility: https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html (See also the Real World Technologies thread.) – May 21 '23 at 16:23

To supplement @PeterCordes's excellent answer, I thought it would be worth going into the details of exactly how close to source-code compatible the two processors are -- for example, how easy it would be to use textual substitutions (e.g. macros) to automatically translate 8080 code to 8086 code, and what the limitations would be.

The first point is to examine how the registers of one architecture can be mapped onto the other. Fortunately, the 8086 registers are effectively a superset of the 8080 registers, so we can map A to AL, BC to CX, DE to DX, and HL to BX. (This ends up with the registers in a non-intuitive order: HL maps to BX because HL can be used for indirect memory addressing, which on the 8086 is better supported by BX than by the other general-purpose registers. Note that this unusual ordering is actually reflected in the machine-code encoding of the registers, suggesting that while the register mnemonics weren't named with 8080 compatibility in mind, the design of the instruction set was.) Clearly SP and IP must map to the registers serving the same purposes, as must the flag register, which conveniently keeps bits with the same meanings in the same positions when it is stored elsewhere. But here we hit the first incompatibility: the 8080 groups the A register with the flags register (referring to the combination as the 'processor status word') and handles them together as a unit (for example when pushing to and popping from the stack), whereas the 8086 widens both to a full 16 bits and handles them individually.

This means that the following 8080 instructions have no single-instruction 8086 equivalent:

PUSH PSW    ; "push af" for those who prefer Z80 syntax
POP PSW     ; "pop af"

To emulate these operations on the 8086 you'd need multiple instructions:

LAHF        ; Load AH from low-order 8-bits of flags
PUSH AX

POP AX
SAHF        ; Store AH in low-order 8-bits of flags
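; note: this pairing round-trips correctly, but the two bytes land on
; the stack in the opposite order from the 8080's PUSH PSW (A and flags
; swapped), so code that inspects the saved word in memory needs an
; extra XCHG AH,AL around the PUSH/POP -- see Raffzahn's comment below.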

Again, I have a suspicion that the LAHF and SAHF instructions were specifically designed to allow this translation -- they're a pretty unusual operation to support in most respects -- and the choice of AH (when AL would be the more usual target for such an operation) seems strongly to indicate that these instructions were added to make 8080 translation easier.

Looking through the table of instructions supported by the 8080, few others stand out as not being easy one-to-one translations, although as Peter Cordes points out, many 1-byte instructions become 2-byte instructions on the 8086: e.g. MOV C,M (Z80 equivalent ld c,(hl)), which is 4Eh on the 8080, converts to MOV CL,[BX], i.e. 8Ah 0Fh, on the 8086; and PCHL (load the program counter with HL) is equivalent to the 8086's JMP BX (FFh E3h). Also slightly tricky are the RST n instructions, which could plausibly convert to INT nn, although there are subtle differences on the receiving end of the call ... but that would usually be system software, and I believe the intent was to allow easy application compatibility while expecting a complete rewrite of system software. Another group of instructions that aren't supported are the conditional calls and returns (e.g. CNZ addr / call nz, addr), which would need to be emulated on the 8086 by a conditional jump with the opposite condition skipping over an unconditional call or return.
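For instance, a translator might plausibly expand a conditional call along these lines (a sketch; target and skip are placeholder labels):

; 8080:  CNZ target       ; call target only if the Z flag is clear
; 8086 has no conditional call, so jump over one on the opposite condition:
        jz   skip
        call target
skip: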

There are two issues caused by the expansion of single-byte instructions into two bytes:

  • Self-modifying code would end up hopelessly broken (although it would need substantial modification to work anyway)
  • Code that only just fits into the 64K memory space of the 8080 may struggle to fit into a 64K segment on the 8086. In the majority of cases this can be mitigated by moving from the "tiny" memory model (i.e. code and data in the same segment) to the "small" memory model (i.e. one segment for code and one for data); a sketch of that change follows this list.
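A minimal sketch of the tiny-to-small change (data_seg is a hypothetical label for wherever the loader placed the data segment):

; tiny model: CS == DS, code and data share one 64K segment (8080-like).
; small model: point DS at a second 64K segment during startup:
        mov  ax, data_seg
        mov  ds, ax       ; data references now go to their own segment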

It therefore seems reasonably simple to perform an automated one-to-one translation of code from 8080 to 8086; a macro assembler may well have been able to handle the translation, even if dedicated packages (as mentioned above) weren't available. It wouldn't work for all programs, but with a small increase in memory required, it should be possible to make most programs work successfully.

Another interesting question is to what extent the extended variants of the 8080 -- that is, the 8085 and the Z80 -- are also compatible.

The relevant 8085 extensions are:

  • RIM (read interrupt masks) and SIM (set interrupt masks) - the operations here are entirely unsupported by the 8086; machines using the 8086 typically use an external programmable interrupt controller that provides the same features.
  • DSUB, ARHL, and RDEL are undocumented 16-bit arithmetic instructions (16-bit subtract, arithmetic shift right, and rotate left through carry, respectively) whose operations are obviously well supported on the 8086.
  • LDHI, interestingly, is an (undocumented) equivalent of the 8086's LEA DX, [BX+n] instruction, which I'd previously thought to be entirely unique to the 8086 (see the sketch after this list).
  • LDSI is a parallel to LDHI but using SP as its source register: the 8086 equivalent would be LEA DX, [SP+n], except that the 8086 doesn't support SP-based addressing ... you'd have to wait for the 80386 to get support for an equivalent instruction, and then only with 32-bit registers. You'd probably encode it as MOV DI, SP followed by LEA DX, [DI+n] instead -- which is the first time I've seen a two-byte instruction expand into a five-byte, two-instruction sequence...
  • SHLX and LHLX are 16-bit (undocumented) indirect memory operations, of a kind quite natural on the 8086.
  • This leaves the remaining set of undocumented instructions: jumps based on the equally undocumented X5 and overflow flags. The X5-based jumps have no 8086 equivalent at all; the overflow-based ones map only loosely onto the 8086's JO/JNO, since the flags aren't set under identical conditions.
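As promised above, a sketch of the LDHI parallel (conventional HL -> BX, DE -> DX mapping; the offset 12 is an arbitrary example value):

; 8085 (undocumented):  LDHI 12    ; DE = HL + 12
        lea  dx, [bx+12]           ; DX = BX + 12: the same operation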

The Z80 is harder still: it extends the 8080's register set with not just a new pair of index registers (IX and IY, which could be mapped to DI and SI ... although the undocumented I[X/Y]H and I[X/Y]L operations would have no direct equivalent there, as SI and DI have no addressable 8-bit halves) but an entire duplicate set of registers (which cannot be mapped to anything, because we've run out of registers by now). Any application using the Z80's exx or ex af,af' instructions would be difficult to translate automatically. Other Z80 instructions are easier to deal with: djnz has no exact equivalent (the 8086's LOOP, which implicitly decrements CX, is a 16-bit counterpart, but there is no 8-bit version), and instructions like ldi, ldd, etc. are broadly (although not precisely) equivalent to the 8086's string-processing instructions (e.g. MOVSB) and repeat prefix (REP MOVSB being roughly equivalent to ldir) -- although the precise details differ, meaning some register remapping may be necessary to make them work (sketched below). On the whole, the possibility of automatically translating Z80 programs is a whole lot less convincing.
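A sketch of the ldir case, which shows the remapping problem (the 8086 string instructions insist on SI/DI, which the conventional 8080 mapping doesn't use):

; Z80:  ldir              ; copy BC bytes from (HL) to (DE), incrementing both
; 8086 near-equivalent, shuffling out of the conventional mapping first
; (HL -> BX, DE -> DX, BC -> CX):
        mov  si, bx       ; source pointer (was HL)
        mov  di, dx       ; destination pointer (was DE)
        cld               ; make MOVSB increment SI and DI
        rep  movsb        ; repeat CX times: ES:[DI++] = DS:[SI++]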

Jules
  • Interesting. So that explains the stupid layout of 8086's FLAGS where OF is outside the low 8, and thus not saved/restored by SAHF/LAHF. IDK why they didn't use one of the low bits 8080 didn't use (1, 3, or 5), which are still "reserved" in modern x86's EFLAGS/RFLAGS. BTW, pushf / popf were introduced in 186, so 186 could push/pop flags directly (but the whole FLAGS, not combined with AL). – Peter Cordes May 17 '18 at 02:39
  • Another fun fact about SAHF and flags layout: x87 was designed to line up with it, so fnstsw ax / sahf / ja float_compared_above works. http://www.ray.masmcode.com/tutorial/fpuchap7.htm#fcom. This is why we use unsigned JCC / SETCC / CMOVCC conditions after FP compares. (Because P6 fcomi sets EFLAGS directly the same way, and so do SSE/AVX scalar compares into EFLAGS like comisd.) Although using CF instead of OF is a good choice anyway to enable branchless code for i+=(a<b), because CF is special: adc, sbb, (undocumented) salc (set AL from carry), and rcl use CF, not OF. – Peter Cordes May 17 '18 at 02:46
  • LOOP CL: nope, 8086's loop instruction implicitly uses CX, not CL. It's 2 bytes: opcode + rel8. In modern x86, the register it uses is determined by the address size (not operand-size like you'd expect), e.g. 386 can use a 67 prefix to loop on ecx instead of cx in real mode (or vice versa in 32-bit mode). But there's no 8-bit CL version. How exactly does the x86 LOOP instruction work? (IDK if the size-override weirdness is related to it being slow on everything except Bulldozer/Ryzen.) – Peter Cordes May 17 '18 at 02:58
  • @PeterCordes "IDK why they didn't use one of the low bits 8080 didn't use (1, 3, or 5)" ... which is particularly bizarre when you consider that on the 8085 bit 1 actually was an undocumented overflow flag... – Jules May 17 '18 at 09:37
  • Interesting. I wonder how much thought the designer of 8086's ISA gave to all this. Apparently it was almost totally done by one guy, Stephen Morse. I wonder if he has any regrets in hindsight now that x86 has taken over the world. Most of the things I wish in hindsight (like not spending so much opcode space on 1-byte xchg) were (probably) smart for 8086 at the time, but this might be a case of being overcautious for not-yet-written translators. – Peter Cordes May 17 '18 at 09:45
  • @PeterCordes: I've done my share of griping about the 8088 back in the day, but I now recognize that the fundamental design was a work of genius. The thing I'd be most curious about is what Stephen Morse thinks of his segmentation design. IMHO, the 8086 really needed one more segment register, and needed a few more instructions to operate upon them [e.g. load ss,immed and maybe add ss,immed / add ss,rm], but the overall design is still better than anything I've seen since. – supercat May 18 '18 at 16:41
  • @Jules: I think the Z80's "ld a,(bc)", "ld a,(de)", "ld (bc),a", and "ld (de),a" were all 8080 instructions, though I don't know the names. I can't think of any way to emulate any of those in less than four bytes. Otherwise, you may be interested to know that the manual for the NEC V20 (an 8086 clone) uses the names IX and IY to refer to the SI and DI registers (a fact which confused me for a while while trying to understand what the bit-field-related instructions were for). – supercat May 18 '18 at 16:46
  • @supercat - good point. Yes, they are 8080 instructions - mnemonics are LDAX B, LDAX D, STAX B and STAX D. Probably the easiest way is to copy into SI or DI, which as you suggest ends up as 4 bytes (e.g. MOV DI, CX; MOV AL, [DI] -> 89 cf 8a 05 ... and I've just discovered that the Mac I'm using has NASM installed by default, which makes me unreasonably happy). – Jules May 18 '18 at 21:00
  • @Jules: I'd actually been thinking of using xchg with bx, but I just realized that only the ax version is a single byte. The use of a 5-bit rm field to select a register or one of 24 addressing modes is rather clever, though I'm curious how the hardware cost would have been affected if the eight "short offset" forms had e.g. been replaced by four forms that use the MSB of the next byte in the address-mode selection. That would necessitate the use of 16-bit offsets in cases where the offset was beyond the range e.g. +/- 64, but free up four bit patterns for other addressing modes. – supercat May 18 '18 at 21:16
  • @PeterCordes: "BTW, pushf / popf were introduced in 186" -- actually that reference is wrong; these instructions are part of the original 8086 instruction set. In my revision of the NASM instruction set reference this is correctly listed as 8086 level, though I did not have to make that edit. https://ulukai.org/ecm/insref.htm#insPOPF -- It appears the edit was made in https://hg.ulukai.org/ecm/insref/rev/5c1b4598d4ed#l1.3020 on 2002-05-13, also at https://repo.or.cz/nasm.git/commitdiff/ac6fb42b4bfdfbcd4856f45d971ced31782e14e4 – ecm Nov 30 '19 at 18:00
  • @ecm: ok, that makes more sense. That ancient version of the NASM appendix had multiple other errors, like when imul forms were introduced. Updated https://stackoverflow.com/tags/x86/info to link to the current NASM appendix, https://nasm.us/doc/nasmdocb.html (which documents instruction forms in a less readable format, using stuff like "sbyteword" instead of "r/m16" or "r/m8" to save lines, I guess). – Peter Cordes Nov 30 '19 at 18:43
  • @Peter Cordes: The new appendix is created from the internal table that is used as input to the assembler's compilation. That is why I forked the 2.05-based reference. It was removed because it (almost) completely lacks AMD64 extensions and other new instructions, and wasn't maintained any longer. But I still found it useful for [013]86 programming. – ecm Nov 30 '19 at 18:58
  • @ecm: Ok, so https://ulukai.org/ecm/insref.htm is a (hopefully) correct version of the old appendix for most of the useful instructions? Yeah, that's definitely more human-friendly. I'll link that instead of the old posix.nl copy as a human-friendly version in the tag wiki. – Peter Cordes Nov 30 '19 at 19:20
  • The 8086 Primer, page 43, says "the LAHF and SAHF instructions exist mainly to permit programs written for the 8080 to be translated into efficient 8086 programs". – Single Malt Apr 11 '20 at 14:38
  • Tiny correction: PUSH/POP PSW needs more than LAHF/SAHF, as the flags need to end up at the lower stack address, so PUSH PSW ends up being 4 instructions: LAHF / XCHG AH,AL / PUSH AX / XCHG AH,AL, while POP PSW is three: POP AX / XCHG AH,AL / SAHF. There are also more opcodes without a direct counterpart, needing 'helpers'. For example DAD/DCX/INX need to dance around the flags using SAHF/LAHF, as they do not change them. There might be others I do not recall ATM. – Raffzahn Oct 25 '22 at 07:59