55

The 8086 used a segmented memory architecture where the linear address was computed from a 16-bit segment number and a 16-bit offset. This greatly complicated things from a programming perspective. The Motorola MC68000, designed at about the same time, used a flat 32-bit linear address space and was much easier to program.

I understand that source-level compatibility with the 8080 was a consideration but surely the CPU could have started up in, say, 8080 mode where the 16 MSBs of the address registers would be forced to zero then switch to full 32-bit (or 20-bit) addressing via a mode-switch instruction.

What exactly were the reasons the designers of the 8086 chose a segmented memory architecture instead of a flat, linear one?

Alex Hajnal
  • 8
    The 68000 ran in something like your 8080 mode: it had a 24-bit address space, ignoring the top eight bits of addresses. This caused problems when the 68020 switched to a full 32-bit address space, as explained here. – Stephen Kitt Jul 08 '18 at 07:22
  • @StephenKitt Yea, I know about that (learned assembly on the 68k). Funny, at the time I didn't realize that it had a 16-bit ALU (just knew that some instructions took longer than others and I avoided MUL and DIV like the plague). Still, the 68k was a clean break from the 6800, at least internally. – Alex Hajnal Jul 08 '18 at 07:26
  • @AlexHajnal Not an exact duplicate, but if you haven't already read it, take a look at 8086 pinout and address space limit – manassehkatz-Moving 2 Codidact Jul 08 '18 at 14:07
  • 6
    Wrong question: There's nothing wrong with base registers (which really, is what the 8086 "segment" registers actually are.) The real question you should be asking is, "Why did they think anybody would be happy with 16-bit pointers on a machine that was capable of addressing 20-bits worth of physical memory?" If the 8086 only had A1-A16 address lines, then nobody ever would have complained about the segment registers; but also, Either IBM would have chosen a different processor for the IBM PC, or else the IBM PC would have gone the way of the Apple ][ and the Commodore 64. – Solomon Slow Jul 09 '18 at 02:34
  • 2
    @jameslarge The 68k was considered for the IBM PC; preferred in fact but unfortunately the 68k wasn't fully debugged at the time the decision of what CPU to use in the 5150 was made. At decision time they had decided to go with a 16-bit (or better) processor in the PC and the 808[68] seemed like the safest bet. – Alex Hajnal Jul 09 '18 at 02:49
  • 8
    I think this quote from DTACK Grounded #10 sort of sums it up: In fairness to Intel, the 8086 was not INTENDED to be the best. It was intended to be FIRST, with downward software compatibility. – tofro Jul 09 '18 at 07:50
  • 2
    The real question is not “Why did the 8086 use …”, but “why did anyone use 8086”. It was so out of date the day that someone at IBM suggested the creation of the PC. – ctrl-alt-delor Jul 11 '18 at 13:49
  • See also PAE of x86 https://en.wikipedia.org/wiki/Physical_Address_Extension Allowing more than 4GB (2 or 3GB ram) of virtual memory on 32 bit x86. – ctrl-alt-delor Jul 11 '18 at 13:54
  • Some comments to go along with the answers provided below. Note that Intel had a program to convert 8080 / 8085 code into 8088/8086 code. This is part of the reason for the 8088/8086 PUSH AF and POP AF instructions. I used this program to assist with a conversion of a CP/M system to a CP/M-86 system. – rcgldr Jul 13 '18 at 03:32
  • The Mac OS used a "segment" like model for the 68000 because the early Macs (1984) only had 128KB of memory. Worse yet, allocated memory was handled via pointers to pointers called handles, normally located in the first 32k "segment", and many of the system calls did garbage collection, requiring programs to update local pointers based on the handles. Then due to backwards compatibility issues, the Mac OS continued with this segmented model long after it was no longer needed. There was a Unix like OS called AIX for later Macs. – rcgldr Jul 13 '18 at 03:34
  • The Atari ST, released about a year after the first Macs in 1985 came with a minimum of 512KB of RAM and the OS in ROM, used a flat address space and was essentially DRI's port of MSDOS called GEM to 68000. The Atari ST could read / write PC 3.5 inch floppy disks (double density) and even hard drives if the hard drives used the FAT12 system. Still keep in mind that this was 4 years after the initial release of the IBM PC. – rcgldr Jul 13 '18 at 03:39
  • @rcgldr A few minor corrections regarding the ST: GEMDOS was actually DRI's DOS work-alike. GEM was the GUI (Graphics Environment Manager). BTW, there was also a 260ST (with 256kB RAM) but few (if any) of these made it onto the market. Hard drives were FAT16 or FAT32 with FAT12 for floppies; later OSes used VFAT. – Alex Hajnal Jul 13 '18 at 03:59
  • @AlexHajnal - PC XT and MSDOS at the time used FAT12 for hard drives, up to 16MB per partition (32MB was the limit on hard drive size, due to a 16 bit sector count for a hard drive). Later versions of MSDOS still allowed FAT12 to be used if partition size was <= 16MB. You're correct about the naming convention of GEMDOS for the MSDOS "clone", and GEM for the GUI. Atari ST didn't have a hard drive interface, just a partial parallel interface that an adapter could convert into a SCSI interface. Still I recall someone making an interface to allow an Atari ST read/write a PC compatible hard drive. – rcgldr Jul 13 '18 at 04:26
  • @rcgldr I could be mistaken about FAT16; I'd have to physically dig out my old kit to be sure. IIRC, the 20MB hard-drive I used had 1 partition formatted as FAT16; the later 50MB and 320MB drives had 1 and 4 partitions respectively formatted as FAT32 (I think the ~80MB partition size limit was a TOS thing). Using SCSI disks through ACSI (using an adapter) was never an issue and the TT had a true SCSI port. – Alex Hajnal Jul 13 '18 at 04:50
  • @rcgldr Also, I presume the drives were PC compatible. They were standard 50-pin SCSI-1 drives with MBRs and (V)FAT partitions. I've only ever accessed them from Atari or Linux systems so I'm not sure e.g. DOS can read them. According to fsck.vfat(8) TOS does things a bit differently than MSDOS at the low level. In practice I never encountered problems transferring files on (floppy) disk between Atari and DOS/Windows/Mac. – Alex Hajnal Jul 13 '18 at 05:05
  • 1
    DTACK Grounded #10 link: http://www.easy68k.com/paulrsm/dg/dg10.htm – Thorbjørn Ravn Andersen Feb 16 '19 at 15:39
  • @Alex Hajnal 260ST had 512K of RAM (256KiB is an impossible hardware configuration for a 16 bit wide bus with standard memory chips 256Kbit x 16) but had the TOS to load from floppies leaving only around 256K for the user. A 128K machine was planned at the beginning but abandoned quickly. – Patrick Schlüter Jun 28 '22 at 06:03
  • @rcgldr you're wrong, the ST had the ACSI interface from day one. ACSI was based on a preliminary SCSI draft before the norm was finished. ACSI is very close to SCSI, it just has inverted logical levels and has a very similar line protocol. The first harddisks for the ST (Megafiles, SH205) had ST-506 adapter to be able to use the then mainstream MFM and RLL harddisks. SCSI adapter came later or from third parties (Vortex). – Patrick Schlüter Jun 28 '22 at 06:08
  • @PatrickSchlüter - the parallel interface I mentioned was ACSI; an interface board was needed to use an ST with a SCSI hard drive (or tape drive - I wrote a driver for one of the interface boards, doing an image backup, but treating the tape drive like a read-only hard drive for restore, caching the FAT table in RAM to reduce random access). – rcgldr Jun 28 '22 at 06:32

8 Answers

100

For once, I do have a direct source for a "Why didn't they ...?" question. Eric Isaacson, back in the late '80s and '90s, wrote a commercial assembler for the 8086, called A86. (His homepage still has a section offering it for sale for $50, $52 outside North America, and explaining why it's the best assembler on the market for DOS. You can even download the extra patch supporting 80386 instructions for free.) What's of interest to us is this story from chapter 10 of the manual. The date on this copy is 1999, but the earliest versions of the document were written in 1986. I've bolded one part of this.

The (grotesquely ornate) level of support for segmentation was dictated by Intel, when it specified (and IBM and the compiler makers accepted) the format that .OBJ files will have. I attended the fateful meeting at Intel, in which the crucial design decisions were made. I regret to say that I sat quietly, while engineers more senior than I applied their fertile imaginations to construct fanciful scenarios which they felt had to be supported by LINK. Let's now review the resulting segmentation model.

[...] The scenario is as follows: suppose you have a program that occupies about 100K bytes of memory. The program contains a core of 20K bytes of utility routines that every part of the program calls. You'd like every part of the program to be able to call these routines, using the NEAR form to save memory. By gum, you can do it! You simply(!) slice the program into three fragments: the utility routines will go into fragment U, and the rest of the program will be split into equal-sized 40K-byte fragments A and B. Now you arrange the fragments in 8086 memory in the order A,U,B. The fragments A and U form a 60K-byte block, addressed by a segment register value G1, that points to the beginning of A. The fragments U and B form another 60K-byte block addressed by a segment register value G2, that points to the beginning of U. If you set the CS register to G1 when A is executing, and G2 when B is executing, the U fragment is accessible at all times. Since all direct JMPs and CALLs are encoded as relative offsets, the U-code will execute direct jumps correctly whether addressed by G1 with a huge offset, or G2 with a small offset. Of course, if U contains any absolute pointers referring to itself (such as an indirect near JMP or CALL), you're in trouble.

It's now been over a decade since the fateful design meeting took place, and I can report that the above scenario has never taken place in the real world. And I can state with some authority that it never will. The reason is that the only programs that exceed 64K bytes in size are coded in high level language, not assembly language. High-level-language compilers follow a very, very restricted segmentation model-- no existing model comes remotely close to supporting the scheme suggested by the scenario. But the 86 assembly language can support it [...]. The LINK program is supposed to sort things out according to the scenario; but I can't say (and I have my doubts) if it actually succeeds in doing so.

Note that this is discussing the software support for segmentation in the MS-DOS linker, rather than the hardware support.
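
To make the manual's A/U/B scenario concrete, here is a small C sketch of the segment arithmetic involved. The fragment sizes come from the quote above, but the segment values and the routine offset are illustrative assumptions of mine, not anything from the manual:

    #include <stdio.h>
    #include <stdint.h>

    /* 8086 real-mode address calculation: linear = segment * 16 + offset. */
    static uint32_t linear(uint16_t seg, uint16_t off)
    {
        return ((uint32_t)seg << 4) + off;
    }

    int main(void)
    {
        /* Fragments laid out as A (40K), U (20K), B (40K), with A loaded at an
           arbitrary paragraph-aligned address. */
        uint16_t g1 = 0x1000;                /* segment value for the start of A        */
        uint16_t g2 = g1 + (40 * 1024) / 16; /* start of U, 40K further on, i.e. 0x1A00 */

        /* A routine 0x100 bytes into U is reachable through both segment values: */
        printf("%05X\n", (unsigned)linear(g1, 40 * 1024 + 0x100)); /* 1A100, "huge" offset */
        printf("%05X\n", (unsigned)linear(g2, 0x100));             /* 1A100, small offset  */
        return 0;
    }

Both calls print the same 20-bit address: one copy of U, addressable under either CS value, which is the whole point of the scenario.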

Another reason that we have clear evidence for: Intel's previous chip, the 8080, supported a 16-bit, 64KiB, memory space. The dominant operating system for it was CP/M. Intel and Microsoft both made a serious effort to make the new environment as source-compatible with CP/M code as possible.

I will give examples from MS-DOS 1.0, because that was the most important and best-documented OS that took advantage of this feature. Back when the ISA was being developed, IBM had not yet chosen the 8088 for its Personal Computer Model 5150, there was a large library of 8-bit CP/M software, and memory was even more expensive, so all the considerations I am about to mention were even more crucial.

The segmentation scheme allowed an OS for the 8088/8086 to emulate an 8080 running CP/M with minimal hardware resources. Every MS-DOS program was initialized with a Program Segment Prefix, which, just like it says on the tin, was loaded at the start of the program segment. This was designed to emulate the Zero Page of CP/M. In particular, the 8080 instruction to make a system call in CP/M was CALL 5. If you use that instruction in an MS-DOS program, it will still work. The Program Segment Prefix will be loaded into CS, and CS:0005h contains a jump to the system-call handler of MS-DOS.
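
As a rough illustration of that "emulated zero page" (this is not actual DOS loader code; only the byte values at offsets 0 and 5 are the documented ones, everything else is simplified away):

    #include <stdint.h>
    #include <string.h>

    /* Toy model of the first bytes of a DOS Program Segment Prefix (PSP). */
    void build_psp(uint8_t psp[256])
    {
        memset(psp, 0, 256);

        /* Offset 0: INT 20h (terminate), mirroring CP/M's "jump to location 0" exit. */
        psp[0] = 0xCD;
        psp[1] = 0x20;

        /* Offset 5: a far CALL (opcode 9Ah) into the DOS function dispatcher, so an
           8080-style "CALL 5" system call keeps working in translated code.
           Bytes 6..9, the call target, are filled in by DOS at load time. */
        psp[5] = 0x9A;
    }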

Segmentation effectively gave every legacy program its own 16-bit memory space, allowing it to use 16-bit pointers as it always had. This both saved memory and saved 8-bit code from needing to be extensively rewritten. (Recall that most software today runs on 64-bit CPUs, but ships as 32-bit programs because smaller pointers are more efficient; this was even more crucial on an original IBM PC model 5150.) A .COM program was based on the executable format of CP/M, and got by default a single segment for code, data and stack, so a program ported from CP/M could treat the 8086 as a weird 8080 where it had the whole 64KiB of memory to itself and the registers had different names. MS-DOS 2.0 let programs start with separate segments for their code, stack, data and extra data, still using 16-bit pointers and offsets. Of course, a program aware of the full 20-bit address space could request more memory from the OS. In MS-DOS, it could request memory in 16-byte "paragraphs" (bigger than a word, smaller than a page) and would get them in a new relocatable segment, whose exact value it did not need to care about or waste a precious general-purpose register to store. (Unlike shared libraries for the x86 and x86_64 today!)

But why not shift segment registers 16 bits to the left, instead of four, for even less complexity, and let programs use only the lower-order 8 or 16 bits of a 32-bit address space, for offsets within the current page of memory? The advantage of the four-bit shift Intel chose is that it allowed segments to start on any 16-byte boundary, instead of a 65,536-byte boundary. A typical program back then needed far less than 64KiB of memory, and exceptionally few computers even shipped with the 256KiB that the CS, DS, ES and SS registers could address. The OS, the foreground program and every Terminate-and-Stay-Resident program could not all have gotten their own 16-bit address spaces, much less separate ones for their code, data, and stacks, if every segment had needed to start on a 64KiB boundary. But, with the memory model Intel used, programs could use 16-bit pointers with much smaller memory blocks.
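
To put rough numbers on that alignment argument (the component sizes below are made up purely for illustration):

    #include <stdio.h>

    /* Round a size up to the next multiple of 'align' bytes. */
    static unsigned long align_up(unsigned long size, unsigned long align)
    {
        return (size + align - 1) / align * align;
    }

    int main(void)
    {
        /* Hypothetical resident OS, foreground program and TSR. */
        unsigned long sizes[] = { 12 * 1024UL, 50 * 1024UL, 6 * 1024UL };
        unsigned long para = 0, full = 0;

        for (int i = 0; i < 3; i++) {
            para += align_up(sizes[i], 16);          /* 4-bit shift: 16-byte paragraphs */
            full += align_up(sizes[i], 64 * 1024UL); /* 16-bit shift: 64 KiB boundaries */
        }
        printf("paragraph-aligned: %lu KiB\n", para / 1024); /* 68 KiB  */
        printf("64 KiB-aligned:    %lu KiB\n", full / 1024); /* 192 KiB */
        return 0;
    }

Three modest components already claim 192 KiB of address space under 64 KiB alignment, but only 68 KiB with 16-byte paragraphs, which is why the finer granularity mattered on machines with 64-256 KiB of RAM.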

Finally, remember that gigabytes of memory was a preposterous figure in 1976. Intel correctly realized that they'd have plenty of time before it was ever worth worrying about needing 32 bits of address space. Even their 80286, generations down the line, only supported a 16-MiB address space and allowed 16-bit near pointers. What they didn't foresee was that their 8086 ISA would ever become as dominant as it did—and that they'd be stuck with it, in a market that demanded 100% compatibility.

Davislor
  • 3
    Now that's an interesting find. Bookmarked. Thanks a lot. Nonetheless, I find it a bit strange to argue with the 1981 IBM-PC and its software constraints to explain the decisions made in 1976. Isn't it? – Raffzahn Jul 08 '18 at 20:36
  • 1
    @Raffzahn Good point. That was by far the most popular OS for the 8088/8086, but the need for backward-compatibility with the 8080 was even more pressing before IBM chose the 8088 for its Model 5150. – Davislor Jul 08 '18 at 20:44
  • 3
    @Raffzahn I do not know when that story was added to the A86 manual, other than that it was between 1986 and 1997 (the earliest copy on archive.org). From the reference to IBM, he was probably writing in the early '90s about a meeting circa 1980. – Davislor Jul 08 '18 at 21:34
  • 13
    why [A86 is] the best assembler on the market for DOS ... random offtopic remark: the somewhat expensive licensing for it (from my perspective at the time as an 18-year old student) was one of the primary reasons that the actual best assembler on the market for DOS (i.e. NASM) ended up being written. :) – Jules Jul 09 '18 at 04:33
  • 2
    @Jules I've used it! Thank you. If I were motivated to complain about a86 today, I would probably point out that being able to compile a huge number of simple statements per second is not useful if, as he says, asm programs larger than 64K do not exist. And TASM is worse because it does support more complex syntax? But, it's charming that that page from the 20th century, which was already retro when it went up, is still there. – Davislor Jul 09 '18 at 05:10
  • "I can report that the above scenario has never taken place in the real world"?? – tofro Jul 09 '18 at 07:06
  • I don't understand the point about non-65,536 byte boundaries saving space. Modern operating systems use virtual memory, and only allocate a backing physical memory page for the virtual memory page when the virtual page is written to by the program. Did old operating systems not do that? – Buge Jul 09 '18 at 07:41
  • 8
    @Buge No, they didn't. Not for at least 10 years to come when the 8086 was designed. – tofro Jul 09 '18 at 08:42
  • @Buge PCs back then did not even have fixed disks. (At least not commonly.) – Davislor Jul 09 '18 at 15:59
  • 1
    @Buge the architectures in question didn't offer a protected memory model until 80386, when the first chip-supported memory fence was integrated. Meaning you could write an app that cheerfully read and wrote the memory of other running programs. – Mark McKenna Jul 09 '18 at 17:24
  • @Buge, et.al., RE: Although virtual memory existed in HW and OS's since the '60s, the industry and market was generally hostile to it. The introduction of the VAX and its operating system VMS in 1997 by DEC was probably the watershed moment for it (I know it was an incredible epiphany for me). PC architectures would start implementing in the early 80s. – RBarryYoung Jul 09 '18 at 17:50
  • @RBarryYoung I don't think of it as "hostile", so much as "can't afford the clock cycles and hardware to support it." VM was well established much earlier on mainframes and Windows 95 already had some form of virtual memory. Definitely well established long before 1997. Maybe you mean 1977 - the introduction year of the VAX 11/780? – manassehkatz-Moving 2 Codidact Jul 09 '18 at 19:52
  • 2
    @manassehkatz Yeah, 1977, sorry. – RBarryYoung Jul 09 '18 at 20:44
  • 1
    As ugly of a hack as it is, x86 memory segmentation actually makes for some really neat hacks in modern processors. – forest Jul 10 '18 at 02:02
  • 3
    btw, some 80186 models support 16MB address space in real mode by shifting segment value by 8 bits instead of 4 – Igor Skochinsky Jul 10 '18 at 22:07
  • @RBarryYoung+ PDP-11's other than the cheapest models had 'memory management' from about 1972 (Unix from almost its earliest days required MM) and IBM S/370's had 'virtual storage' from 1970 which IBM promoted as the wave of the future and was from my perspective widely and mostly enthusiastically accepted. – dave_thompson_085 Jul 17 '18 at 23:09
  • 1
    @dave_thompson_085 “memory management” (which was really just a more general form of segment mapping) was a very different thing from “virtual memory”. I did System’s programming on both the pdp-11 and the VAX when it came out and the difference could not have been more profound. – RBarryYoung Jul 18 '18 at 19:07
  • 1
    This answer is getting a wave of upvotes recently. Thank you to everyone! But I’m curious: did it get linked somewhere? – Davislor Feb 14 '19 at 21:52
  • @IgorSkochinsky: I wonder if it would have been practical to design an x86 where segments in the range 0xF010 to 0xFFEF would shift the bottom 12 bits left by 12, and segments below 0xF000 would use the normal shift by 4? I think that would have retained compatibility with most software, while allowing a simple means by which code that doesn't need to perform segment computations on pointers could access up to 16MiB of address space. – supercat Sep 18 '20 at 21:17
  • I am not sure that the meeting referred to discussed the design of the CPU - it sounds more like it was the design of the output files of the assembler, so all scenarios were covered. – Thorbjørn Ravn Andersen May 25 '22 at 21:45
  • 1
    There were a few assembly language programs that went over 64KB. I worked on one for the Apple II, the Bitstik CAD system, which ran in a heavily bank-switched environment, provided by a Saturn 128KB memory card plus a 64KB Apple II+ or IIe. The link scripts were quite complex, and would have made the A-U-B model seem plausible. Later on, some of the same programmers used complicated tree-structured overlays to fit RoboCAD, written in C, into MS-DOS' 640KB with space for some data. Life really is simpler these days. – John Dallman Jun 25 '22 at 17:58
  • @JohnDallman Very interesting! By the way, I’m suddenly getting lots of upvotes on this little old answer of mine. Did someone by any chance link to it recently? – Davislor Jun 25 '22 at 23:22
  • @Davislor Occasionally user- or system-generated edits will bump a question back up to the top of the front page where it will attract attention. In the case of your Jun 25 comment, the bump was from a Jun 24 edit to fix a broken link. – cjs Dec 12 '22 at 03:55
  • As far as I can tell, the first half of this answer with the long quote from the A86 manual is not related in any way to the question, and should be deleted. – benrg Nov 18 '23 at 23:21
  • The Turbo Pascal model prior to 4.0 allowed initialized data to be stored in the code segment, which allowed some constructs to be much more efficient than would otherwise be possible. A CPU flag that would make SS be the default for all accesses in the absence of a DS qualifier could have also been extremely helpful for programs whose stack and static data would together total 64K or less, by allowing DS to be used much like ES. – supercat Feb 16 '24 at 18:06
27

The 8086 used a segmented memory architecture where the linear address was computed from a 16-bit segment number and a 16-bit offset. This greatly complicated things from a programming perspective.

I beg to differ. Using segments doesn't 'complicate' things in any way. Sure, it may require a different style of structuring the data used, but there are very few cases in real-world applications where pointer arithmetic is needed.

The whole issue is a bit like a NYC cabby crawling bumper to bumper on the FDR and complaining about the speed limit on highways and how great the German Autobahn is. While right in theory, there is no practical implication, as his job will never let him use this advantage - even if he moves to Europe and becomes a taxi driver.

The Motorola MC68000, designed at about the same time, used a flat 32-bit linear address space and was much easier to program.

Not so sure. Mind you, not only was Unix originally designed around segmented memory (and C still carries traces of this), but so was the original MacOS. There a user application had to make OS calls for dereferencing all the time to allow the OS's memory management to work. The flat 68k address space is maybe nice for embedded programs, or cases where the simple assumption of one programmer, one program and one machine works out, but not in a more sophisticated environment.

I understand that source-level compatibility with the 8080 was a consideration but surely the CPU could have started up in, say, 8080 mode where the 16 MSBs of the address registers would be forced to zero then switch to full 32-bit (or 20-bit) addressing via a mode-switch instruction.

That would have restricted the machine to an emulation mode where converted software could only use a single 64 KiB address space, leaving the rest of memory dormant. And only one such program could run at a time - effectively removing every reason to switch away from an existing 8080/85 system.

Unless one invents some kind of relocation mechanism, that is. Like having a per-process base pointer to give separate 64 KiB address spaces to each program, or the ability to address additional memory beyond 64 KiB with some banking scheme or such... oh, wait, that's exactly what segments do.

Keep in mind that 8085 source compatibility is not a feature in itself, but a requirement to satisfy customer needs. And one of the biggest needs was to allow more code to be handled (and maybe a bit more data *1). The issue of source code compatibility is often seen as a rather easy thing: just convert each instruction into its equivalent and that's it. But software also, and perhaps foremost, means data structures - a part that often needs more than a few key presses to be adapted. The 8086 instruction set had a functional equivalent for (next to) every 8080 instruction. While the code representation (binary) might change not only in content but also in size, data structures and their interaction could be replicated without any change.

Adding segment registers offered an easy way to port applications to the 8086 with only a few added lines, enabling the use of 64 KiB of code plus 64 KiB of data (*2). Depending on the application's call structure, this could be extended to several hundred KiB of code.

Just as important is the complexity of the CPU itself. The 8086 is a very clean 16-bit CPU. There are no 32-bit operations (*3) except to optimize far pointer loading - unlike the 68k, with its outright bloated code requirements for handling both 16- and 32-bit data types. Not the least reason why the 8086 outperformed the 68k in real applications at comparable clock rates (*4).

So why spend more than double the transistors (*5) on a more complex CPU design that yields less performance? For a feature that is hidden by compilers anyway?

What exactly were the reasons the designers of the 8086 chose a segmented memory architecture instead of a flat, linear one?

It's easy to count several good reasons:

(A non-exhaustive list of what just came to mind; there might be many more to be found by spending more time thinking about it.)

  1. First and most important, it's a clean 16-bit CPU. There is no need to handle any other data type (besides bytes for memory access).

  2. KISS.

  3. Easy porting of existing 8080/85 software, due to an effective maximum data type of 16 bits for pointers.

  4. Extension of available code space to full 64 KiB without modification

  5. Easy extension of available code space (with a minimal level of modularization) for existing (ported) software.

  6. Extension of available data space to full 64 KiB without modification

  7. Speed, by reducing the majority of pointer operations to 16 bits instead of the 32 needed on a 68k (*6)

  8. Speed, by reducing code size thanks to the restriction of all code and data pointers to 16 bits. After all, more compact code needs less bus bandwidth, leaving more for real work.

  9. Simple memory management for multi tasking

  10. Simple memory management for multi programming

  11. Support existing complex OSes like Unix

and quite important (and to make the dozen full):

  12. (Well-behaved) 8086 applications can run seamlessly in a (future) virtual memory environment without any modification.

It might be useful to keep in mind that in 1977 mini systems were still built (and bought) with 64 KiB RAM or less. Even a small-memory-model application (64 KiB for each segment) couldn't have been handled by such a machine. Here the 8086 was quite at the height of its time. Segmented memory was the way to go. At that time next to all classic 'flat' architectures had reached their EOL - except perhaps the /370, but it might not count as really flat due to its general base+offset addressing scheme.

Long story short, the 8086's segmented scheme was not only at the height of its time but also a very efficient and forward-looking design.


Addendum: There might be many details of Intel's design decisions to criticize in hindsight, like supercat (*7) favouring an 8-bit offset (instead of 4), and some may look quite appealing (*8). Still, the 8086 did include a remarkable lot of features in a very small and limited design, making it back then a huge success outside the (later) PC market. x86 Unix systems outsold 68k-based ones by an order of magnitude, especially due to their early ability to handle a large physical address space via segmentation.


*1 - The main market for 8080/85 systems was not desktop computers, but embedded systems. Desktop users in 1977 were not only a real minority, but also still happy to have even some memory, with a full 64 KiB being a dream for the wealthy. Complex embedded systems, in contrast, were already scratching the 64 KiB limit back then.

*2 - The additional stack segment doesn't really count, as such applications would only need a rather meagre amount of stack.

*3 - For the nitpickers: there is multiplication and division, as 16x16 and 32x16.

*4 - At the same clock rate an 80286 outperforms a 68000 by about 20%.

*5 - The 8086 used ~29,000 transistor functions, while the 68k is said to have 68,000.

*6 - Not to mention that the 68k here always had to shovel a 33% overhead

*7 - Part of this chat. mschaef made a similar comment, as Alex Hajnal reminded me. Alex is a serious data archaeologist :)

*8 - When thinking about the 8086 and its memory management I'm usually rather satisfied with what was made possible, as the segmented approach is quite useful for multiprogramming (and multitasking as well). I can only blame the designer(s) for not having thought a tiny step ahead by adding support for a software-managed MMU providing memory protection and swapping. All the hardware needed would have been 4 segment size registers (SSR), 4 16-bit-wide OR gates and one comparator active during Effective Address (EA) generation.

Whenever an EA is calculated and the appropriate SSR register is non-zero (checked via the OR gate), the resulting EA gets compared to its SSR. When lower or equal, processing continues; when higher, a 'Segment Violation' exception happens - much like INT 0Dh later on the 286. Now some (OS) handler can decide what to do.

Similarly, all instructions loading any segment register (or SSR) would (when segment checking is active) issue a 'segment check' exception - again like the 'Segment Not Present' (INT 0Bh) on a 286 (or the whole 0Ah/0Bh/0Ch group). Here as well, an OS handler could check whether the new segment value is one assigned to the program.

Whenever an exception happens, the checking is disabled to allow the handler to act as needed (which would also make disabling it quite easy for an OS without memory protection: just add a NOP handler to kill any checking that got activated by 'accident'), and it must be switched on again before returning.

With this, everything needed for a protected-mode OS would have been present. Sure, segment switching would be rather slow compared with a hardware solution like the 286's, but still better than nothing and at minimal additional cost. Not to mention that if DOS had used this feature, many ugly programs, and even uglier programming styles, would have been prevented :))
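
For illustration, the limit check sketched in this addendum can be modelled in a few lines of C. Everything here (the SSR array, the enable flag, the exception behaviour) belongs to the hypothetical design, not to anything the real 8086 has:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical software-managed limit check: one SSR per segment register. */
    enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS };

    struct limit_unit {
        uint16_t ssr[4]; /* segment size registers; 0 means "no checking" (the OR gate)   */
        bool enabled;    /* cleared on any exception so the handler itself runs unchecked */
    };

    /* Returns true if the access may proceed, false if a 'Segment Violation'
       exception would be raised for an OS handler to sort out. */
    bool check_ea(struct limit_unit *u, enum seg_reg s, uint16_t ea)
    {
        if (!u->enabled || u->ssr[s] == 0)
            return true;
        if (ea <= u->ssr[s])  /* the comparator active during EA generation */
            return true;
        u->enabled = false;   /* stays off until the handler switches it back on */
        return false;
    }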

Raffzahn
  • 8
    Don't forget space. An architecture where application code was expected to keep track of 32-bit addresses by default would mean that pointer-intensive programs suddenly need twice as many (expensive-at-the-time) kilobytes to store their data in. – hmakholm left over Monica Jul 08 '18 at 14:15
  • Comments are not for extended discussion; this conversation has been moved to chat. – Chenmunka Jul 09 '18 at 09:37
  • 9
    I really do not know how you can argue that the segmented memory is simpler than a flat address space. It's simply false. The argument about support for Unix is also wrong. Unix likes a flat address space, not that the 8086 could support Unix with no virtual memory management or memory protection. That simply wouldn't have been a concern for the 8086 designers. – JeremyP Jul 09 '18 at 11:03
  • @JeremyP Then, how was it possible that for example 8086-based Unix systems like the PC-MX were sold in large numbers? Also, memory protection may be helpful, but is no requirement for Unix. The same goes for virtual memory. In contrast, segmentation is a great foundation for Unix. It provides the ability of easy relocation, and swapping as well. – Raffzahn Jul 09 '18 at 11:20
  • You say "in 1977 mini systems were still built (and bought) with 64 KiB RAM or less"; if you meant *micro* systems, then this is true. However, the *mini* category was marked by the introduction of the VAX that year, and AFAIK none of them were ever sold with less than 128k memory, and from then on very few others were sold with that little memory. – RBarryYoung Jul 09 '18 at 17:57
  • 9
    Having started my professional career on the IBM 370 in 1974, I assure you that segmentation did complicate things for me. A flat address space (like on the CDC 6600) was easier to work with. – David Thornley Jul 09 '18 at 18:05
  • 1
    @RBarryYoung Seriously? You mean, with the introduction of the VAX every other mini instantly vanished? Someone should have told that to DEC, as they were selling many machines with less than 64 KiB well into the '80s. Maybe not the VAX-11/780, but many 11s with as little as 8 KiB (11/03 @ 2100 USD in 1980 :)) – Raffzahn Jul 09 '18 at 19:14
  • @DavidThornley Having worked most of my professional career on /370(ish) systems, I never felt a problem with the 8086's segmentation. In fact, having 64 KiB per base register was quite a luxury compared to 4 KiB on a /370 :) – Raffzahn Jul 09 '18 at 19:17
  • 1
    @Raffzahn I think your *7 is here – Alex Hajnal Jul 09 '18 at 20:10
  • 1
    @RBarryYoung Nope. The PDP-11 is classed as a mini computer, as is the Data General Nova and many other pre-1977 designs. – JeremyP Jul 10 '18 at 08:52
  • @JeremyP Were they? Glad all the users of such multiuser systems never knew. Memory protection is not a prerequisite - neither for multiprogramming nor for multiuser. Just for crappy programmers. Also, swapping does not require recovery from a memory fault. That's only needed when memory is requested on use. Different issue, and again not necessary to have a Unix system. – Raffzahn Jul 10 '18 at 08:55
  • 3
    @Raffzahn Most programmers are crappy. Even the non crappy ones make mistakes sometimes. Without memory protection, writing data to an uninitialised pointer (a common issue) has the chance of bringing down the entire operating system. It might have been just about OK on an 8086 based system, but nobody ever seriously used an 8086 based system for multi-user. – JeremyP Jul 10 '18 at 08:59
  • 1
    @Raffzahn Unix programmers write their code in the expectation that they have a flat memory map and memory swapping is transparent to them. i.e. if they read or write to a piece of memory that is actually on disk, their program will be suspended automatically and the OS will pull in the memory from disk. That is not possible without some form of memory protection (so the OS can detect such writes) and a means of restarting the offending instruction. You can do swapping on an 8086 but only with the programmer managing the memory manually. – JeremyP Jul 10 '18 at 09:02
  • @JeremyP One might think that companies like Intel, Altos, Siemens or Tandberg should have known that before selling 8086-based Unix systems, like the PC-MX as a small (~6 terminals) multiuser system. And no, programmers don't assume a flat memory space, at least not educated ones. But you seem to mix different concepts here. 8086 Unixes are based on swapping; what you describe is memory allocation on demand, which (usually, unless there is very specific hardware) is restricted to systems with paged memory. These are different memory systems. Also, for monologues you might want to switch to chat. – Raffzahn Jul 10 '18 at 09:07
  • 2
    @JeremyP It depends on the point of view if segmented or linear memory is simpler: DOS ".COM" files would definitely not be possible in linear memory; Atari's ".TOS" format would be the simplest working format I know. Next thing is that the 8086 was not planned as 16/32 bit chip (unlike the 68k). I know from the STM8 how more than 64k of linear memory is used in a Chip not having registers with more than 16 bits ... It's not so easy for the programmer! – Martin Rosenau Jul 10 '18 at 20:34
  • 2
    @Raffzahn I know what swapping is. The 8086 does not support it without programmer intervention. I also did not say programmers expect a flat memory model, I said Unix programmers expect a flat memory model. They expected pointers and sufficiently wide ints to be interchangeable. – JeremyP Jul 11 '18 at 13:43
  • Altos was a fairly successful seller of 8086-based Xenix systems in the 1980s. (I don't recall whether they added additional hardware for memory protection.) On Unixy systems the various memory models available on the 8086 and 80286 always caused pain, and on some of the earlier ones (Coherent?) without a really sophisticated compiler, larger programs, such as Perl, simply couldn't be built, IIRC. – cjs Jul 09 '19 at 00:54
  • 1
    I just noticed this answer. The only things I really fault Intel for on the 8086 was the shortage of segment registers. Using a shift of 4 rather than 8 was a reasonable choice for the 8086; later chips should have split segment selectors into a descriptor-selector and an offset. The optimal architecture for something like .NET would be to have some segments use a scale factor of 16, while others use scale factors of 256 or 4096. That would have made it possible for .NET to use 32-bit segment values as object references to access many terabytes of storage. – supercat Jul 16 '19 at 17:21
  • 2
    I really never had any issue with the segment register setup. I found it convenient even. Instead of needing offsets on all pointer you can just point to a paragraph boundary and address from zero. On random access to large data structures it did incur a little overhead to select the correct segment, but really it was no big deal in most cases (exception being one large data sorting program I wrote which ran for so long that the extra overhead became noticeable). – Brian Knoblauch Jul 16 '19 at 18:11
  • @BrianKnoblauch: The effectiveness of the segment register setup could have been enhanced if some of the short-displacement modes had used a bit or two from the displacement byte to select sub-modes (limiting range to -64..+63 instead of -128..+127), thus allowing a few more addressing modes such as direct short (for accessing the first 256 bytes of a segment) or a mode which would treat BX as a segment selector. Programs which use 16-byte-aligned objects could have then used 16-bit segment selectors to identify them rather than 32-bit pointers. – supercat Sep 18 '20 at 17:20
  • 1
    "Unix was originally designed segmented (and C still carries this over)" <-- I don't think it so. There are only pointers in C. In DOS compilers, there were 32-bit far pointers (segment + offset) and near pointers (only offset), but it was a DOS-specific extension. Possibly on any real-mode OSes it was the same, or similar, and also in other languages using pointers (Pascal). But, the C standard itself has nothing about segmentation. It only defines pointers but it does not say anything about them. Although it does not even forbid them. – peterh Sep 19 '20 at 16:21
  • @peterh-ReinstateMonica You're denying that early Unix was segmented? Also, operating in a segmented environment doesn't mean that a language has to handle segments and/or different kinds of pointers. – Raffzahn Sep 19 '20 at 18:55
  • @Raffzahn I deny that that "C still carries it over", the C language standard is entirely unrelated to the segmentation. I know that early Unix was segmented. – peterh Sep 20 '20 at 09:31
  • 1
    @peterh-ReinstateMonica: The C Standard essentially assumes that addressing hardware has a carry chain that behaves linearly for any addressing region that will hold a single object or array, but does not require the carry chain to be usable beyond that (while an implementation could emulate an object-sized carry chain on platforms where that wouldn't be the case, and x86 compilers actually do that in "huge" mode, doing so imposes such a large performance hit that the language can't really said to be designed for such usage. – supercat Nov 23 '20 at 19:06
  • @supercat I never heard about carry chains, but I now googled them. I think it is about pointer addition, which is generally unrelated to segmentation. I could dig out the C standard somewhere, and I am sure, that nothing related segments is mentioned in it. – peterh Nov 24 '20 at 10:11
  • 1
    @peterh-ReinstateMonica: Basically, the point is that there are some platforms where pointers take e.g. 24 or 32 bits to store, but addition either only affects e.g. the bottom 16 bits, or behaves weirdly beyond that. For example, if one wanted build a 6809-based system (which uses 16-bit addressing) with 256K of RAM adapt a C compiler to support it, one could fairly cheaply build hardware so that the top two bits of the address select one of four 4-bit bank selectors, and addresses are formed by concatenating the bottom 14 bits of the CPU address with four bits from the bank-select register. – supercat Nov 24 '20 at 18:06
  • @peterh-ReinstateMonica: One could then arrange things so that 16K of commonly used code and data had bits 14-15 of the address both 00, other code pointers had those bits 01, and other data pointers had them 10. The compiler would need to be tweaked so that code wishing to perform a data address using a pointer would copy the top byte of the pointer to bank selector #2 and perform the access. Since bits 14-15 would of the pointer would be 10, the access would then be performed using the bank controlled by the bank selector. To allow for good performance, the compiler should support... – supercat Nov 24 '20 at 18:10
  • ...a qualifier to indicate that certain objects must be placed in the common 16K area, and that certain pointers would only be two bytes and could only access things in that area [IMHO, the Standard should have recognized a concept of 'near' and 'far' pointers even if most implementations would have treated them as synonymous, since code which copies bulk data into a common region, accesses it there using simpler pointers, and then copies it back can be much more efficient than code that uses general-purpose pointers for everything]. – supercat Nov 24 '20 at 18:12
  • @peterh-ReinstateMonica: On an implementation such as I described, the CPU would only perform pointer arithmetic on the bottom 16 bits of an address, and pointer arithmetic would behave very oddly if it crossed a boundary between 16K regions of address space. Nonetheless, such an implementation could access 256K of RAM in very practical fashion on a 16-bit CPU if it made suitable use of "near" pointers. Loops that access more than one non-near object per iteration would be slow, but many tasks could be written in such a way as to avoid that. – supercat Nov 24 '20 at 18:19
15

I am pretty sure the Intel engineers just weren't there yet. And they were pressed by the market to push out a 16-bit CPU before all the others did, to hold on to the market share they were already losing big time to little Zilog. (I am pretty sure that the design of the 8086 was much more driven by marketing pressure, time to market and compatibility constraints than by engineering creativity.)

The x86 CPU was (as opposed to the Motorola 68k) a backward-looking architecture, designed to re-use as much as possible from the 8-bit 8080 world: both in terms of hardware, like peripheral chips, and in terms of software, like the bulk of CP/M software that was available for the 8080 - and, though this is only an assumption, as much as possible from chip-level building blocks as well. Such chip-level re-use is much more simply achieved with a segmented approach than by starting a flat 32-bit model from scratch like the 68k did. Note that Zilog's 16-bit CPUs chose a very similar approach that was even less radical.

I think the best indication for how desperate people were looking to hold on to the CP/M ecosystem at that time is actually a non-Intel product: The NEC V20 that even 3 years after the 8086 came with a re-engineered 8086 core and full 8080 emulation.

Your claim that the original 68000 was a "non-segmented" architecture is only partially true. If you wanted to use one of the main features of the 68k - fully position-independent code, which was very important for systems not employing an (at that time) expensive external MMU (or waiting for the 68020, which lifted the limitation to 16-bit index registers) - you deliberately decided to segment your memory into 64k chunks that could be reached by register-relative addressing. MacOS did that initially, and other systems like the Amiga and the Sinclair QL did as well.

tofro
  • 7
    Right — Stephen P. Morse himself said that the 8086 was a stop-gap design, intended to give Intel a general-purpose 16-bit CPU with a decent migration path from the 8080 to address market requirements while the real long-term CPU was being worked on (iAPX 432). – Stephen Kitt Jul 08 '18 at 13:34
  • Tofro, you might want to reformat this and rephrase it into information about the 8086 to make it an answer to the original question. As of now it's an argumentative rant ("I'm pretty sure...") and maybe a reply to my answer. But that's what comments are for. – Raffzahn Jul 08 '18 at 16:14
  • @StephenKitt being a stop-gap measure doesn't explain anything about the reasons behind a segmented design - or does it? – Raffzahn Jul 08 '18 at 16:16
  • 2
    @Raffzahn I don't think so - While your answer is looking into technical aspects (which is fine), it completely ignores the situation on the semiconductor market in the early eighties - which I am trying to address. It's obviously hard to come up with rock-solid facts in retrospective on what drove Intel to produce such a design, but it still needs to be looked into. I'm trying to ignore the "argumentative rant" bit here, as I find it a bit on the offensive edge. – tofro Jul 08 '18 at 16:22
  • @tofro Sorry, but I can't see any part of your 'Answer' referring to why decisions in the 8086 design were taken to make it segmented. It just contains assumptions without any reference, and personal opinions. Basic building blocks for a rant, but not any objective approach to the question asked. It's plain off topic, unless you can make this about the 8086, that is. – Raffzahn Jul 08 '18 at 16:33
  • 2
    @Raffzahn it doesn’t explain anything about the segmented design, no, but it does explain why there wasn’t much thought put into long term evolution of the design (at least, on that part of the design). – Stephen Kitt Jul 08 '18 at 17:26
  • @StephenKitt I'm not so sure. I still think the idea of using a segmented design was great. And the solution of avoiding all the hassles (and especially transistor cost) of 'real' memory management by taking a shortcut via a fixed offset was a great one as well. After all, it did lay a foundation to make the system upward extendable, as happened with the 286. Segmentation was state of the art for 'large' (well, mini) systems in the mid/late 70s. And the shortcut did offer all the benefits at low cost - that it got ridiculed by user programs doing segment calculation is a different story. – Raffzahn Jul 08 '18 at 17:39
  • Minor correction: the 68020 did still rely on an external MMU – poncho Jul 08 '18 at 18:28
  • 3
    @Raffzahn I agree that the segmented design had its benefits, but I do think that it would have been implemented differently if the designers had realised it would survive for 30-odd years — for one, they might have allowed the multiplier to be adjusted... (Although really I think any argument around the x86 and its design needs to focus more on the longevity of DOS and its 16-bitness rather than perceived failings of the x86 itself — if we’d switched to 32-bit OSs in the late 80s with the 386 we’d have been much better off.) – Stephen Kitt Jul 08 '18 at 18:31
  • @poncho I didn't intend to say something different - The limitation that was lifted with the 68020 was the limit to use 16-bit indices only. – tofro Jul 08 '18 at 18:32
  • (Before anyone argues about the 30-odd year thing, note that segmentation is pretty much gone in 64-bit long mode.) – Stephen Kitt Jul 08 '18 at 18:32
  • @StephenKitt There's actually not much wrong with segmentation as such, IMHO - What's the main limit is the (relatively) small size of segments in the x86 architecture, which should have been avoided with a bit more forethought and wasn't much of an improvement over typical 8-bit CPUs. – tofro Jul 08 '18 at 18:35
  • @tofro we’re in violent agreement (I hope that came across in my next-to-last comment). – Stephen Kitt Jul 08 '18 at 18:45
  • @StephenKitt Yes, DOS - or rather its weird applications - is the root of all evil :)) An adjustable offset wouldn't have made much sense, as the address size was defined by the pins available. Also, a larger granularity can just as well be reached by selecting the right segment numbers. In contrast, support for basic memory management would have made a real difference. That would have been rather easy and opened quite some possibilities. I guess I'll add that as a footnote to the above answer. – Raffzahn Jul 08 '18 at 18:46
  • @Raffzahn an adjustable offset would have meant that larger address spaces, made available by having more address pins on later CPUs, could have been supported without having to introduce new operating modes. Think “preserving the same ISA and operating mode across multiple generations of hardware“, not “allowing software to request stuff that the hardware can’t support”. – Stephen Kitt Jul 08 '18 at 18:48
  • @StephenKitt I did understand it that way, but the original 8086 didn't have more pins, thus an adjustable factor wouldn't have made sense to implement and the next generation (286) already got 'real' segments - and more address pins. – Raffzahn Jul 08 '18 at 19:40
  • @Raffzahn but the 286 didn’t get “real” segments in a backwards-compatible way, they were only usable in protected mode; that’s my entire point — if the x86 ISA had been designed for longevity from the get-go, segments would have been different from the get-go (whether descriptors à la 286, or with adjustable offsets). Even on the 8086, adjustable offsets would have had their use, e.g. when dealing with data structures larger than 64 KiB; but once you get into that sort of territory it makes more sense just to go with descriptors instead. – Stephen Kitt Jul 08 '18 at 21:09
  • @StephenKitt I got the feeling we're talking about the same,just have a different understanding of wording. So lets mark this for a Beer at some time in the future. – Raffzahn Jul 08 '18 at 21:18
  • @Raffzhan sounds good to me! ;-) – Stephen Kitt Jul 08 '18 at 21:23
  • 3
    The Amiga did nowhere divide the memory into 64k chunks. The decision to use smaller segment sizes was just a compiler option. You can write 68000 position independent code with larger segments, even without a dedicated addressing mode, it just implies a bit more complicated code. But Amiga applications were never required to consist of position independent code anyway, as the executable format allowed to post-fix absolute addresses after loading. – Holger Jul 09 '18 at 09:31
  • @Holger It's not of much relevance here, that is why I was not going into details, but all library references on the Amiga have to be made with a6 as a base register. This effectively limits at least the entry points into the library to +-32k from there, which is what I was referencing to. Of course, with "a bit more complicated code" you can also work around the 64k segment size on an x86. – tofro Jul 09 '18 at 09:45
  • 1
    @tofro Amiga libraries have a jump table below their base address, consisting of jmp <absolute address> instructions, each taking 6 bytes, so this just means that a library can only have up to 32k/6 functions, but I never heard of any library having a problem with that 5460 function limit. Since the jump instructions use absolute addresses, they could be placed anywhere in the entire 32 bit address space (which allows patching ROM libraries with code in RAM, far away from it). The data space above the library usually contained pointers/absolute addresses of dynamically allocated structures… – Holger Jul 09 '18 at 10:00
  • @StephenKitt: What made the 8086 design better than its successors was that there was no need to have a descriptor for every segment. The proper way to extend the design to the 80386 would have been to have 32-bit segment selectors which combine a small few bits of selector (e.g. 4 bits or so) with a scaled offset (using the remaining ~28 bits). One could then have a small-object segment that could hold up to four GiB worth of objects with 16-byte granularity, a medium-object segment that could hold up to 64 GiB worth of objects with 256-byte granularity, and a large-object segment... – supercat Jul 10 '18 at 08:22
  • ...that could hold up to 1TiB worth of objects with 4096-byte granularity, and have 13 segments left for whatever other purposes one needs, all while being able to use 32-bit object references. – supercat Jul 10 '18 at 08:24
  • "re-use...the bulkload of CP/M software that was available for the 8080..." As well as that relatively small number of programs and users, there was also the far larger number of programs in the embedded systems market, which throughout the '70s was vastly larger than the PC market. When a new version of an embedded device using an 8080 blows its 64K address space limit, swapping in an 8088 and using Intel's translating assembler is a pretty attractive solution. – cjs Dec 12 '22 at 04:03
9

The 8086/8088 was designed to be a 16-bit CPU, which means that its registers are all 16 bits wide. You can address 64kB with a 16-bit pointer, but the designers wanted to address more. So what are the options?

You can add special registers that are larger than 16 bits. This would have complicated a lot of things: increasing all registers would have been expensive and would have hindered porting of existing code (easy porting from the 8080 was an explicit design goal), and mixing 16-bit and larger registers would have been cumbersome (and likely also expensive and a hindrance to portability). Further, a more complicated design would have delayed the project, and Intel wanted the 8086 to get to market fast (the 8086 was seen as a temporary solution by Intel at the time, as far as I can remember).

Or you can divide pointers into two parts. This is what Intel chose: it was a simple solution to get 20-bit pointers. There's a dedicated adder in the BIU (Bus Interface Unit) which simply shifts one 16-bit value left by 4 and then adds another 16-bit value. This was very simple to implement and also fast. As mentioned in other, more detailed answers, it also made it easy to port 8080 code. It's a "cheap" solution to address more than 64kB on a 16-bit CPU and was a good fit for the design goals of the 8086.
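
For concreteness, the BIU's calculation can be modelled in a couple of lines of C. Note that the sum is only kept to 20 bits, so addresses wrap around at 1 MiB on a real 8086 (the example values below are arbitrary):

    #include <stdio.h>
    #include <stdint.h>

    /* Model of the 8086 BIU address adder: segment shifted left by 4, plus the
       offset, truncated to the chip's 20 address lines. */
    static uint32_t phys_addr(uint16_t seg, uint16_t off)
    {
        return (((uint32_t)seg << 4) + off) & 0xFFFFF;
    }

    int main(void)
    {
        printf("%05X\n", (unsigned)phys_addr(0x1234, 0x0010)); /* 12350                   */
        printf("%05X\n", (unsigned)phys_addr(0xFFFF, 0x0010)); /* 00000: wraps past 1 MiB */
        return 0;
    }

The second line shows the wrap-around that real-mode software came to rely on, and that later forced the A20 gate onto the PC/AT.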

DarkDust
  • Having registers that are slightly bigger than 16 bits is hardly impossible--it's perhaps the second best approach to addressing ~1MiB of address space--loading and storing such registers is often a lot more expensive than dealing with 16-bit registers. Unless an individual object exceeds 65,520 bytes, adding a displacement to a pointer to part of it will require a 16-bit read-modify-write cycle. Doing a read-modify-write on a linear 20-bit pointer would require reading and writing 3 bytes or two 16-bit words--much more expensive. – supercat Jul 10 '18 at 19:26
  • @supercat E.g. TI's MSP430X does it that way. – fuz May 13 '20 at 23:14
6

Source code compatibility (via assembly language translation) with the 8080/8085, as mentioned in the question, was a major design consideration with the 8086. To bootstrap the usefulness of the processor and get it into real systems as quickly as possible, allowing producers of existing software (especially CP/M) to get that software to work on the 8086 almost effortlessly was critical.

However, as we've discussed here before, the conversion process needs to replace some single byte instructions on the 8080 with multiple byte instructions on the 8086. Therefore, a program running on CP/M-80 that used the full 64KB of RAM couldn't run inside a single CP/M-86 16-bit segment, as its code may well expand to a large enough size that there's no longer enough space for its data. In order to allow easy conversion, there needed to be a way of separating the code and data segments to allow more memory than 64KB to be used by a single process, without needing to change data formats by using pointers longer than 16 bits (which would disrupt source compatibility for programs written in assembly language, which many of the big CP/M programs were).

The segmented model allows a program to set its CS and DS/SS registers to different values and therefore trivially expand from a 64KB maximum to a 128KB maximum. Doing so requires only a very short piece of code at process startup, so it can be added without disrupting the existing code. Using a 32-bit offset register or something similar could have worked too, but would have been more complex and wouldn't necessarily have made the processor any more useful, at least in the near term. And because nobody expected a microprocessor design to last as long as the 8086 eventually did (nothing even remotely as long-lived had ever been designed before it), the near term was all that was considered.
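
As a rough sketch of why that works (the segment values 0x1000 and 0x2000 below are made up for illustration): with CS and DS loaded 0x1000 paragraphs (64KB) apart, code and data each get their own full, non-overlapping 64KB region, while every pointer the program manipulates stays 16 bits wide:

```c
#include <stdint.h>
#include <stdio.h>

/* Same seg:off arithmetic as before; CS and DS 0x1000 paragraphs apart
 * yield two disjoint 64KB windows, 128KB in total. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

int main(void)
{
    uint16_t cs = 0x1000, ds = 0x2000;   /* hypothetical segment values */

    printf("code: %05lX..%05lX\n",
           (unsigned long)linear(cs, 0), (unsigned long)linear(cs, 0xFFFF)); /* 10000..1FFFF */
    printf("data: %05lX..%05lX\n",
           (unsigned long)linear(ds, 0), (unsigned long)linear(ds, 0xFFFF)); /* 20000..2FFFF */
    return 0;
}
```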

Jules
  • 12,898
  • 2
  • 42
  • 65
  • Actually, even if you set aside the conversion process alone making the converted 8080 program bigger, I'd imagine that they also designed for easy porting of systems that were bursting at the seams of their 64 KB address space (because, say, the data they needed to store had grown larger) and needed to expand. I discuss that in further detail near the end of this answer. – cjs Sep 18 '19 at 11:01
6

There's another point I haven't seen anybody mention.

When Intel released the 8086, they were already working on the iAPX 432.

Intel's intent was that the iAPX 432 would be the CPU that would become popular in the desktop market. They were putting an immense amount of time and effort into the design.

At least from what I've heard from a few former Intel designers, the 8086 design was a direct result of that: the 432 was taking quite a while, and Zilog was doing well with the Z80, so Intel thought they needed a follow-on to the 8085. At the same time, they wanted to guard against the 8086 (and its successors) dominating the market to the point that the 432 would never be able to gain any significant market share.

To that end, they designed the 8086 specifically for higher-end embedded applications. Typical embedded applications rarely did the sorts of things that were clumsy and difficult with segments, and using segments usually allowed somewhat denser code, which was quite important for embedded use.

So, to some extent, they failed by succeeding, so to speak. Despite their attempt at crippling the 8086, when the 432 became available it had exactly the problem they'd feared: the 8086 (and its successors) already dominated the market to the point that most people have never even heard of the 432, let alone used it. Worse, the 432 was late to market, did poorly in benchmarks, and the early iterations were fairly buggy to boot. Intel didn't officially give up on selling the 432 until around the 386 time frame, but there's no real room for question that it was an absolute flop, for exactly the reason they'd feared and tried to plan against.

Jerry Coffin
  • 4,842
  • 16
  • 24
  • 1
    The 8086 is inefficient at trying to work with monolithic objects greater than about 65,520 bytes, but its segmentation design is, by a considerable margin, the most efficient 16-bit architecture I know of for working with objects smaller than that which are placed arbitrarily, or better yet on arbitrary 16-byte boundaries, within a larger-than-64K address space. – supercat Aug 08 '18 at 21:54
  • @supercat: Yup--or to consider what that ends up meaning: it works well for embedded applications, but poorly for desktops, exactly as it was designed to. – Jerry Coffin Aug 08 '18 at 22:08
  • 1
    There weren't really a whole lot of desktop applications for the PC that used monolithic objects bigger than 64K which wouldn't have benefited from being processed in smaller chunks. A text editor that limited individual lines to 65,535 bytes, for example, and required them to be aligned to a segment boundary, could keep a list of N lines' start addresses in 2N bytes, and be much faster than one which operated on the entire buffer as a monolithic blob, even on a processor that could support the latter. – supercat Aug 08 '18 at 22:18
  • @supercat: I haven't written such an editor, so I hesitate to opine on it, but at least offhand, it seems like it would depend heavily upon what you were doing. For example, inserting a number of empty lines in the middle of such an editor at least seems like it would be substantially slower than doing the same in a split-buffer editor. – Jerry Coffin Aug 08 '18 at 22:31
  • 1
    The most serious problems with the 8086 segmentation design were the lack of a third general-purpose segment register, the inability of many languages to really support it efficiently, and the failure to implement a model that could designate a range of segments to use a different scaling factor. The 8086 design could have been extended to a 16MiB address space in a fashion compatible with most existing code, and the 80386 design would have benefited from a mode where the lower portion (maybe 28 bits) of 32-bit segment descriptors would be shifted by an amount controlled by the upper portion. – supercat Aug 08 '18 at 22:36
  • @supercat: there were all sorts of things that could have been done, some of which would have been quite useful--but none of it changes the real reason Intel did it in the first place, which really was exactly as this answer points out: to keep it from cannibalizing the market for the 432 (in which regard, they clearly failed). – Jerry Coffin Aug 08 '18 at 22:40
  • Split buffer designs are certainly better than designs that have to move everything past the cursor on every keystroke, but moving from the end of the document to the start and then inserting a character, or vice versa, would require moving everything in the buffer. On a 4.77MHz xt, moving 256K worth of text would take about half a second. Further, working with a monolith would require 32-bit math all over the place. By contrast, subdividing into lines would mean it's almost never necessary to move more than 64K at once--cutting the worst-case time substantially. – supercat Aug 08 '18 at 22:41
  • If Intel's goal was to produce a bad design, they failed miserably. – supercat Aug 08 '18 at 22:41
  • @Supercat: Spoken like somebody who either didn't do a lot of real-mode development, or whose memory is equipped with some pretty seriously rose-colored glasses. The reality is that segments made all sorts of tasks more difficult than they would otherwise have been. As far as split buffers go, they certainly don't cure all ills--but they work reasonably well for quite a few purposes in practice. The design you expounded might work reasonably also, but I don't recall having seen source to a good editor that worked that way either. – Jerry Coffin Aug 08 '18 at 22:43
  • The biggest difficulty with segments was the shortage of general-purpose segment registers--only two of them to go with three index registers. For awhile I fought with segments because I didn't understand how to use them effectively, but my last time getting deep into 8086 programming was for an embedded system with a 16-bit NEC V40 clone, sometime around 2005, and after I'd already worked with the 68000, 8051, PIC, and TI DSPs. Given a 32-bit bus and an adequate amount of RAM, using linear 32-bit pointers is nicer than using segments, but the 68000-based Macintosh... – supercat Aug 09 '18 at 04:52
  • ...had a lot of 32,767-byte size limits for things which on the 8088 could easily have gone up to 65,520 bytes. – supercat Aug 09 '18 at 04:53
  • I never got into 80286 segment programming, and from what I can tell the designers of the 80286 architecture completely missed what made the 8086 segment design so nice. – supercat Aug 09 '18 at 04:55
4

The 8086 was a 16-bit processor, so a single 16-bit register could only address 64K of RAM. IIRC, the 8086 had 20 memory address lines, so it could address up to 1M of RAM. To address that memory directly, they would have needed at least one 20-bit register. I'm assuming that was too expensive or difficult to fit into their architecture, so they decided to use the segmented memory scheme.

For the time, it wasn't so bad. There were 4 segment registers (Code, Stack, Data, and Extra) each addressing 64K of RAM.

Somewhat unrelated to the question: as a student I couldn't afford a 'real' compiler, but the student edition of Turbo C++ was free. The catch was that you could only use the Tiny memory model (which someone mentioned previously was for .COM files and CP/M compatibility), which meant you could only address 64K for code, data, and stack combined. Not knowing any better, I learned many good optimization techniques to make games run in that memory space. Imagine my joy and wonder when I finally bought a full compiler and discovered I could access an entire megabyte! :-)
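
For anyone curious what escaping a single 64K segment looked like in C, here's a rough sketch in the style of those DOS-era compilers. It assumes a 16-bit real-mode compiler such as Turbo C, where (as far as I recall) <dos.h> provides the MK_FP macro and the nonstandard far keyword; it won't build with a modern compiler:

```c
#include <dos.h>   /* Turbo C / Borland: MK_FP() and far pointers (nonstandard) */

void far_pointer_demo(void)
{
    /* In the Tiny model, plain (near) pointers are just 16-bit offsets into
     * the single 64K segment shared by code, data and stack. */
    char near_buf[16];

    /* A far pointer carries its own segment, so it can reach anywhere in the
     * 1 MB address space -- here the CGA text buffer at segment 0xB800. */
    char far *video = (char far *)MK_FP(0xB800, 0x0000);

    *video = 'A';          /* character cell at the top-left of the screen */
    near_buf[0] = *video;  /* mixing near and far data in one function */
}
```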

TimF
  • 41
  • 3
  • I think the reason for the segmentation was not the 20-bit thing, but to increase the 64K address space without breaking compatibility with pre-XT (original PC) apps. – peterh Sep 18 '20 at 16:33
  • 1
    @peterh-ReinstateMonica The XT and original PC both had an 8088. – RETRAC Sep 18 '20 at 16:47
1

It is convenient to be able to address data separately from code. It is also convenient to be able to set aside space for the stack so that you can guarantee it won't overrun anything else.

This arrangement means that the same 16-bit pointer operations (save, load, 16-bit increment) can still be used as on the i8088, while the address space is nonetheless increased.
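
A tiny C sketch of that point (the values are arbitrary): the offset arithmetic stays plain 16-bit arithmetic, wrapping within the 64K window selected by the untouched segment register:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t ds  = 0x2000;   /* segment register: untouched by pointer math */
    uint16_t off = 0xFFFF;   /* an ordinary 16-bit pointer/offset */

    off++;                   /* wraps to 0x0000 -- the same cheap 16-bit
                                increment 8-bit-era code already used */

    printf("offset: %04X\n", off);                               /* 0000  */
    printf("linear: %05lX\n", ((unsigned long)ds << 4) + off);   /* 20000 */
    return 0;
}
```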

Omar and Lorraine
  • 38,883
  • 14
  • 134
  • 274
  • 1
    Your second paragraph seems to imply to me you assume the 8086 was a successor of the 8088 - It's actually the other way round. – tofro Jul 08 '18 at 16:33
  • I can't really see why you think there were any compatibility considerations in the design of the 8086 towards the 8088, which wouldn't exist until about a year later. – tofro Jul 08 '18 at 16:56
  • If they were intent on releasing a range of CPUs, why not. – Omar and Lorraine Jul 08 '18 at 17:07
  • 1
  • Look at the 68k family with the 68008 (which is pretty much the same thing in the 68k world as the 8088 is to the 8086) - it doesn't use segmented addressing; rather, it is a fairly exact copy of "the real thing". Internal register layout doesn't have a lot to do with what the bus looks like. – tofro Jul 08 '18 at 17:11
  • 4
    The major difference between the 8086 and the 8088 is the data bus width; 16 bits in the 8086, 8 bits in the 8088. (That and, I think, instruction cache size; I seem to recall that was 8 bytes in the 8086 but only 5 bytes in the 8088, but could be wrong on the specific numbers.) For the programmer, even one working in assembly, the 8086 and 8088 are identical, irrespective of segmentation or memory model in use; the only difference from a software point of view is their speed in practice (even when both are clocked at the same rate). Particularly here, both have a 20-bit address bus. – user Jul 08 '18 at 17:32
  • 1
  • @MichaelKjörling There wasn't a cache in either, but a 5-byte (8088) or 6-byte (8086) prefetch queue in the BIU. When there is an unused bus cycle (or the CPU needs an instruction and none is ready - which amounts to the same), the BIU fetches ahead from CS:IP. The size is defined as 5 to hold the longest possible instruction (which is 4 on a classic 8086) plus 1. Prefixes except for REP don't count, as they are handled by the BIU. Since the 8086 data bus is 16 bits, its queue must be one byte longer to hold a possible overflow. – Raffzahn Jul 08 '18 at 21:16
  • @Raffzahn True about prefetch queue vs cache, but about the same thing in this case, and only peripherally related to this answer. 5 vs 8 bytes of cache vs prefetch isn't visible to the programmer, except insofar as it impacts execution performance. – user Jul 08 '18 at 21:19
  • @MichaelKjörling You missed one point: it's 6 bytes on the 8086, not 8; the larger size is due to the word-wise access. Also, there is a huge difference between the two. Even a 6-byte cache could hold a tiny loop and speed execution up, while a prefetch queue doesn't. The 8086 always fetches all the instructions in a loop over and over, even when the loop is shorter than the buffer size. (And yes, there are useful loops shorter than 6 bytes, like LODSB; XLAT; JNE.) – Raffzahn Jul 08 '18 at 21:34
  • @Raffzahn: The only way a 6-byte cache could hold a loop would be if the earlier items didn't get displaced by a prefetch by the time the branch occurred. If I recall, the 68340's "loop mode", given e.g. a move instruction followed by a decrement-and-branch, would end up displacing the move instruction from its buffer by the time the branch had executed the first time, but then the act of branching back two bytes would disable prefetching, so the instruction preceding the branch would effectively get locked in the cache until the instruction after the branch was needed. – supercat Feb 13 '19 at 15:50
  • @supercat That's why it would have to be a cache and not a prefetch buffer. A prefetch buffer is filled ahead of time, while a cache holds already-used data for potential reuse; that's the fundamental difference between the two. There are three basic uses of buffers on the memory interface: a prefetch buffer (reading ahead of execution), a line buffer (converting between memory word size and CPU access width), and a cache. Each of them has its own workings and purpose. And while some features can be alike (like a line buffer holding an aligned loop), they are not the same. – Raffzahn Feb 13 '19 at 17:01
  • @Raffzahn: The concepts of caching and prefetching are not mutually exclusive. Many caching systems will also prefetch data if there is reason to believe it will be needed and fetching it immediately would be cheaper than fetching it later. If the processor's memory bus is idle while an instruction is being executed, and the next instruction hasn't yet been fetched, initiating the fetch immediately will save at least a cycle if the fetched value ends up getting used. – supercat Feb 13 '19 at 17:16