41

It's been a while, but I once read in a system programming book that you could switch your Intel 80286 CPU from its normal real mode to a more powerful protected mode. I clearly remember it saying that it was impossible to switch back, though, unless you fully restarted the computer.

Why was this (made?) impossible? Was it a bug, an intentional decision by the DOS developers, or a hardware restriction of the CPU?

Chenmunka
Byte Commander
    Exactly what happens when CPUs are created by programmers (you remember, the 8086 was architected by a pure programmer; the 80286 was probably still done the same way). :) – lvd Apr 08 '18 at 15:56

3 Answers

29

My guess is that it was merely a design decision based on the assumption that once a protected mode OS is started, there is no need to go back. Most microprocessors at that time already booted in their most privileged mode and had at least two levels of protection. The 80286 had to boot in real mode to keep compatibility with DOS, and I think they expected DOS to reduce itself to just a minimal procedure to boot the main OS.

It seems that Intel's engineers didn't realize that DOS was going to live for about ten years after the 286 launched, or that software engineers, along with motherboard manufacturers, would figure out a way to switch the CPU back to real mode in order to call DOS services from a protected mode program (who decided that the keyboard controller was a good place to put a register with one bit to reset the CPU and another to enable CPU addresses beyond 1 MB?).
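For the curious, here is roughly what driving those two keyboard-controller bits looked like from software. This is a minimal sketch using the standard AT port numbers; real code has to poll the 8042 status port more defensively, and the labels are just for illustration:

    wait1:  in      al, 0x64        ; 8042 status port
            test    al, 2           ; bit 1 = input buffer still full
            jnz     wait1
            mov     al, 0xD1        ; 8042 command: "write output port"
            out     0x64, al
    wait2:  in      al, 0x64
            test    al, 2
            jnz     wait2
            mov     al, 0xDF        ; output port value: bit 1 = 1 opens the A20 gate
            out     0x60, al        ; (addresses beyond 1 MB); bit 0 is the CPU reset
                                    ; line and is kept high here - command 0xFE sent to
                                    ; port 0x64 pulses it low to reset the CPU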

By the time the 80386 was produced, Intel added the ability to switch the CPU back to real mode, and the dirty trick used with the 80286 (resetting it after storing a magic number at a certain memory location, so the BIOS could read it and jump to some predefined code to resume operation) was not needed any more.

Then the undocumented LOADALL instruction was revealed on Usenet. It allowed not only switching to and from real mode, but also non-standard CPU states, such as the so-called "unreal" mode, which let a real mode DOS program access memory beyond the 1 MB barrier.

This article from the OS/2 Museum discusses the use of LOADALL to switch back from protected mode, and how Microsoft used it in HIMEM.SYS to allow fast copying of memory above 1 MB without having to leave real mode.

user1354557
mcleod_ideafix
    See: https://en.wikipedia.org/wiki/Real_mode#Switching_to_real_mode One of the main drivers for these hacks was getting access to int 13h BIOS calls, which were real mode only. –  Apr 22 '16 at 14:53
  • A real protected mode OS should not need the int 13h BIOS, but should provide its own device drivers to access the hard disk. – mcleod_ideafix Apr 22 '16 at 14:56
    "Should", but often didn't. –  Apr 22 '16 at 15:03
    IBM's PS/2 systems had an ABIOS, which was a protected-mode BIOS designed to be used with OS/2, and allowed the system to stay in protected mode (except for the DOS "penalty box"). – Stephen Kitt Apr 27 '16 at 19:16
    @mcleod_ideafix It's a nice theory, but given the fact that hardware manufacturers at the time didn't really see much point in supporting any operating system other than DOS and that the interfaces to their hardware were often entirely undocumented, the fact that cards often had BIOS extension ROMs onboard to make them work with DOS was about as good as you were going to get, so if you wanted to make another OS work with them, you needed that OS to be able to interface with the BIOS. – Jules Jun 02 '16 at 19:32
    Someone had to start departing MS DOS, and Intel tried to support other OSes. In fact, the very datasheet of the 80286 starts with "The 80286 is an advanced, high-performance microprocessor with specially optimized capabilities for multiple users and multi-tasking systems...". Intel assured that the 80286 was compatible with existing legacy real mode programs, but looked towards MS DOS and saw XENIX, UNIX, etc, and realized that some sort of hardware assisted protection was needed to support them. – mcleod_ideafix Jun 02 '16 at 22:46
    As these operating systems had nothing in common with MS DOS, turning back from protected mode to real mode didn't seem necessary. Real mode would be just a trampoline to set up the CPU prior to jumping into protected mode. – mcleod_ideafix Jun 02 '16 at 22:46
    Of course, everybody knows that industry and customers seemed happy with MS DOS, to the point that Microsoft considered adding multitask capabilities to MS DOS and set up a poll to find out what multitasking features people demanded from a future version of MS DOS. It turned out that the most demanded feature was to be able to print a document in the background, so they added the PRINT command to queue print jobs and send them to the printer in the background. – mcleod_ideafix Jun 02 '16 at 22:47
  • So people wanted MS DOS, but also wanted more memory and more computing power, and so the 80386 was born, with its Virtual 8086 mode, its ability to switch back from protected mode to real mode, and its removal of the LOADALL back door (which AFAIR is another way to switch back from protected mode to real mode without having to reset the CPU) – mcleod_ideafix Jun 02 '16 at 22:47
  • @jdv Ah, yes, the infamous triple fault trick. – user Jun 03 '16 at 11:34
  • Perhaps a better LOADALL link would be this question – wizzwizz4 Jun 24 '16 at 17:12
  • Is there any way to make code in protected mode use multiple regions of memory as efficiently as would be possible in real mode with some external banking circuitry? If code can work effectively with 64K of very fast RAM, another 512K of sorta-fast RAM, and a large amount of RAM that is slower to bank-switch, I would think the faster LES and MOV ES,xx instructions of real mode could easily outweigh the increased cost of using bank-switching when needed. – supercat Jul 04 '16 at 21:28
    "I think [Intel engineers] thought DOS would reduce itself to just a minimal procedure to boot the main OS." I doubt the engineer thought about MS-DOS one way or the other. MS-DOS was released August 1981. The Intel 802086 was released a half year later, February 1982. I doubt they gave any thought to MS-DOS. – Shannon Severance Aug 15 '17 at 19:03
28

This was intentional, so that the CPU would support secure operating systems. In a secure operating system with rigorous memory access protections, you could not allow any software - user code, kernel extension or driver - to switch back to the full freedom of real mode.

They had a lot of interesting memory management hardware on the '286: rings and call gates - swiped nearly directly from the most secure hardware/software platform of the time: Multics. And they had new extensions too: task segments and task gates for multitasking.

These features never got used as intended. One major reason was compatibility with all the peripherals: disk controllers and graphics hardware were nowhere near sufficiently standardized in those days, so an OS (or any other software) that needed to be independent of the computer's model and manufacturer had to use the BIOS to access peripherals, the BIOS was real mode, and the hacked-up switch back to real mode (as discussed in other answers) was slow. Another major reason was that calling through call gates, and using hardware tasks, was much, much slower than just implementing normal procedure calls and concurrent threads. Hardware task switching in particular was several times slower than just saving/restoring register context with normal instructions. So no OS software got written for these special modes.
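To make the call gate mechanism concrete: a 286 call gate was just an 8-byte descriptor in a descriptor table, and a far CALL through its selector from a less privileged ring lands at a fixed entry point in ring 0. A sketch with made-up names, not code from any real OS:

    ring0_gate:
        dw  kernel_entry        ; offset of the entry point in the target code segment
        dw  KERNEL_CODE_SEL     ; selector of the ring-0 code segment
        db  2                   ; word count: parameter words copied to the ring-0 stack
        db  11100100b           ; access byte: present, DPL=3 (callable from ring 3),
                                ; type = 80286 call gate
        dw  0                   ; reserved on the 80286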

These capabilities were also available when 32-bit processors were made - starting with the '386 (and they're still there in the x86 architecture to this day) - but with 32-bit addressing it turned out to be much easier, and much faster, to use the paging hardware (address translation tables and so on) to get OS security. No 32-bit OS used the stuff: Windows, OS/2, and all the Unixes used paging for process isolation and security.

I was actually saddened by this back in the day: I loved Multics, and the Multics architecture. The Intel designers' hearts were in the right place, and it was brilliant work sticking all that stuff into a commodity microprocessor, but it turned out to be the wrong solution for the problem. Operating system engineering had progressed past the Multics days, and all that special hardware just did not lead to an economic solution.

davidbak
    Unfortunately, I think the designers of the 80286 and 80386 failed to recognize what was good about the 8086 segmented architecture: it allowed a 16-bit CPU to access objects up to 65536 bytes which were located on 16-byte boundaries, without having to add per-object overhead. Use of the 8086 architecture was a bit clunky because it had one too few general-purpose segment registers and couldn't do arithmetic on segment registers or even use load-immediate with them [were it not for that, exe files could easily use DS for "whatever" and cheaply reload with a program's main segment], but... – supercat Apr 09 '18 at 00:45
    ...code which needed to use more than 64K but less than 640K could run more efficiently using real mode than in protected mode. Many kinds of code would have been well-served by a mode in which the top two bits of a segment number selected a segment group, each of which would have a base address, size, and scale factor, and the bottom 14 bits were scaled by the indicated factor. Loads of segment registers could be processed at the same speed as other register loads [no need to load descriptors], but bumping the scale factor to 256 would allow a segment descriptor to control up to 4MB. – supercat Apr 09 '18 at 00:54
    The ring system IS used by modern OSes... not all of them though. – rackandboneman Apr 12 '18 at 22:34
  • @rackandboneman - which ones? I'm very curious to know. It doesn't exist in x64 AFAIK, so these would be x86 OSes? – davidbak Apr 13 '18 at 03:38
    Ring 0 and 3, sure, but calling that "using the ring system" is a stretch. The intention was for the OS to be multi-layered. As used, the rings are just an implementation detail of how OSs set up a wall between user and privileged modes in x86/x64. – Euro Micelli Apr 13 '18 at 04:28
  • Even Multics used only 3 of the 7 rings it had available ... – davidbak Apr 13 '18 at 04:44
    I don't understand why there is any security problem if the kernel (i.e. ring 0) is able to switch back to real mode... surely at that level it has access to everything anyway, so why would being able to do so in real mode be an issue? – Jules Apr 13 '18 at 07:30
    OS/2 is notorious for causing virtualization software developers headaches due to its additional use of ring 2, not just 0 and 3. – rackandboneman Apr 13 '18 at 07:49
    @rackandboneman - I did not know that but Google tells me they put device drivers in ring 2. Good for them! Sorry it didn't work out ... – davidbak Apr 13 '18 at 14:56
    @Jules - I believe it is sort of a "defense in depth" argument. But it could be more and I'm just not aware of it. – davidbak Apr 13 '18 at 17:43
    Can you explain why allowing the most privileged part of the OS to switch to real mode is insecure? – cjs Aug 17 '19 at 06:25
    @davidbak: Conceptually there's really no good argument for putting device drivers in ring 2 rather than ring 3 and letting the OS mediate whatever additional access they need, beyond a performance argument that becomes irrelevant once all IO with performance needs is going over DMA... – R.. GitHub STOP HELPING ICE Feb 23 '23 at 14:37
  • @R..GitHubSTOPHELPINGICE - conceptually, yes, and of course that's the way the L3/L4 series of operating systems worked: Only the teeny tiny message passing kernel was root and everything else ran protected - with different services mediating access to devices and other resources for their peers. – davidbak Feb 23 '23 at 15:30
7

Just speculating here, but it might have been a product decision to encourage writing code for protected mode. It's also possible it was a combination of technical difficulties and product priorities.

The CPU is put into protected mode by setting the PE bit in the MSW using the LMSW or LOADALL instructions. Clearing the PE bit has no effect with either of those instructions, so it is not possible to switch back to real mode. The fact that LOADALL allows undocumented behaviours like "unreal mode" (by loading values into the descriptor caches to access memory beyond 1 MB), and even lets you put the CPU into an unusable state by loading nonsensical values, yet still disallows clearing the PE bit, makes it likely this was a deliberate product decision. OTOH, LOADALL is undocumented, so maybe there really was a technical problem.

Just like switching from real to protected mode, going back to real mode is a carefully choreographed dance that needs to make sure all memory accesses go to defined memory locations (especially instruction fetches). The documentation has a strange passage that says "After executing LMSW instruction to set PE, the 80286 must immediately execute an intra-segment JMP instruction to clear the instruction queue of instructions decoded in real address mode", hinting that there is more to it than just setting or clearing a bit somewhere. It probably wasn't completely trivial to implement (not very hard either, but it definitely would have had a cost if you're counting transistors or microcode ops: the 80286 had ~134,000 transistors, and spending a couple hundred of them on an unnecessary feature would probably have meant dropping other features deemed more important). Since they didn't anticipate a need for it, maybe they weren't willing to pay the "silicon tax".
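For reference, the one-way switch itself is short. A rough sketch (the labels and the selector value are placeholders, and a real routine would also set up an IDT and deal with interrupts and the NMI first):

        cli                     ; no interrupts while the tables are half set up
        lgdt    [gdt_ptr]       ; gdt_ptr -> 16-bit limit + 24-bit base of the GDT
        smsw    ax              ; read the Machine Status Word
        or      ax, 1           ; set PE - the one bit LMSW can set but never clear
        lmsw    ax              ; the CPU is now in protected mode...
        jmp     short flush     ; ...and the intra-segment JMP flushes the prefetch queue
    flush:
        mov     ax, DATA_SEL    ; from here on, segment registers hold selectors,
        mov     ds, ax          ; not paragraph addresses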

There are still ways to get back to real mode by resetting the CPU without restarting the computer. On the PC, this could be done by putting a magic value into a special memory location to tell the BIOS not to reinitialize the computer after a reset, and then causing a reset either through the i8042 keyboard controller, which would externally reset the CPU, or by generating a triple fault (a fault while handling a double fault, usually provoked by invalidating the IDTR and then causing an interrupt).

Using the triple fault is usually much faster: going through the keyboard controller can take almost a millisecond, while the triple fault takes only a couple hundred microseconds. Even with the triple fault, the reset circuitry is still external, but the request doesn't have to be processed by the very slow i8042 (some PCs of the time "short-circuited" the reset bit in the 8042 port to reset the CPU directly without going through the i8042; in that case, there's no noticeable difference).
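Put together, the reset path looked roughly like this on the AT. The ports, the CMOS shutdown byte and the 0040:0067 resume vector are the conventional ones, but treat the details as an illustrative sketch rather than production code:

        mov     word [0x0467], resume   ; 0040h:0067h holds the far pointer the BIOS
        mov     word [0x0469], REAL_SEG ; jumps to after a reset with this shutdown code
        mov     al, 0x8F                ; select CMOS register 0Fh (NMI stays disabled)
        out     0x70, al
        mov     al, 0x0A                ; shutdown code 0Ah: skip POST, jump via 40:67
        out     0x71, al

        mov     al, 0xFE                ; slow path: ask the 8042 to pulse the CPU reset line
        out     0x64, al

        ; fast path: triple fault instead of the 8042
        ; lidt  [bogus_idt]             ; IDT with limit 0
        ; int   3                       ; fault -> double fault -> shutdown, which the
        ;                               ; AT motherboard logic turns into a CPU reset
    resume:                             ; the BIOS lands here, back in real mode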

The reasons to switch back to real mode are all due to technical debt; you wouldn't need it in a clean system designed for protected mode from the ground up. In reality, people kept writing software for real mode and were unwilling (or unable) to quickly port things like device drivers to protected mode, and the two don't mix. Maybe Intel was trying to "encourage" people to port their drivers to protected mode by making it necessary, but as we know, this didn't work out.

Intel didn't make that mistake again, and the 80386 fixed the glitch both by allowing mode switches in all directions, and by adding a virtual 8086 mode that was essentially a protected mode that looked like real mode to the application.

Rico Pajarola
  • It would have had a cost... or not. The 80286 is a microprogrammed architecture, and there was already a means of altering the MSW using LMSW; it just didn't allow clearing the PE bit. In addition, they implemented LOADALL, which is far more sophisticated than LMSW (and I believe it could put the micro back into real mode). – mcleod_ideafix Jun 24 '16 at 13:57
  • I'm not sure there was a technical reason for not allowing the switch back to real mode, it might really have been a product strategy thing. Editing the answer to mention the microcode, MSW and LOADALL – Rico Pajarola Jun 25 '16 at 14:34
  • @RicoPajarola: Wouldn't the fact that real mode can switch segments faster than protected mode be an argument in favor of using the former except in places where code needed the features of the latter? – supercat Apr 10 '18 at 18:58
    Don't know where else to stick this story so I'll stick it here: In those days one use of RAM "above 640K" was for a RAM disk (RAM cache). E.g., IBM VDISK.SYS, though several manufacturers shipping DOS had variants. HP had one. They'd frequently use protected mode to access that memory - switch DOS to protected mode, do the ramdisk stuff, then use the 8042 hack to switch back. Alsys' Ada compiler for the x86 - which I worked on at Alsys - was a DOS compiler which ran in protected mode (and shipped with a 4Mb card!); it also used the 8042 hack. But, in a horrible brain fart ... – davidbak Apr 13 '18 at 03:46
    ... the HP engineers, on their version of the IBM PC AT, made the BIOS buzz the PC speaker each and every time the switch back to real mode was done! Arrggghhhh! We had to clip the wires to the PC speaker on our test lab HP PC and had to hope none of our customers were buying HP! (The BIOS, of course, was in ROM.) (Just one minor - but annoying! - example of the incompatibilities between various PC BIOSs at the time.) – davidbak Apr 13 '18 at 03:48
  • @supercat I don't know anything about segment switching in protected mode being slower (there certainly isn't any reason for that). What's slow is switching between real and protected mode. – Rico Pajarola Apr 13 '18 at 16:58
  • @RicoPajarola: In real mode, loading a segment register behaves just like loading any other register. In protected mode, loading a segment register causes the processor to load an 8-byte segment descriptor for the associated segment. – supercat Apr 16 '18 at 14:57
  • @supercat that's not true, even in real mode, the 80286 stores the base address of each segment in a hidden descriptor-cache register. Each time a segment register is loaded, the base address, size limit, and access attributes are loaded into these hidden registers. http://www.rcollins.org/ddj/Aug98/Aug98.html – Rico Pajarola Apr 17 '18 at 18:13
  • @RicoPajarola: According to that page, every load of a segment register in protected mode populates the "descriptor cache" from the descriptor table. – supercat Apr 17 '18 at 19:03
  • @supercat but it does the same in real mode. The only difference is that in real mode, most of the values are fixed. Still, most of the work performed is the same. – Rico Pajarola Apr 19 '18 at 08:43
  • @RicoPajarola: The descriptor table is stored in main memory, which means that in protected mode every segment switch in protected mode requires four 16-bit reads (3 cycles each) in addition to the time required to fetch the instruction. In real mode, there is no descriptor table in memory, so mov es,ax would just require the three cycles to fetch the instruction; in protected mode, the time would be fifteen cycles. – supercat Apr 19 '18 at 19:09
  • @supercat yeah, you're right. I finally found a place that calls out the timings in real mode vs. protected mode (still can't find it in an official manual). It's 2 cycles in real mode vs 17 in protected mode (which would mean it's 5x3 cycles for 5 16bit accesses?). – Rico Pajarola Apr 21 '18 at 10:47
  • @RicoPajarola: Now do you see why programmers might favor using real mode for most of their program? In real mode, like for(int i=0; i<size; i++) arr1[i] = arr2[i] + arr3[i]; would only take about 72 cycles/loop even if all three arrays were in different segments and the only optimization was to keep "i" and "size" in registers. If hand-optimized, the code could end up being about 36 cycles/loop. The minimum protected mode penalty for this code would be 15 cycles/loop if hand-optimized with alternating iterations fetching arr2 then arr3, and arr3 then arr2. More likely... – supercat Apr 21 '18 at 20:46
  • ...the penalty would be 30 or 45 cycles. If the 80286 had added the FS: and GS: registers which were later added on the 80386, protected mode might have been more usable, but adding an extra 15 cycles on every segment load when there are only two general-purpose segment registers can be a major performance drain. – supercat Apr 21 '18 at 20:49
  • @supercat I still contend that this was not the reason people didn't use protected mode. 30 cycles on a 286 is not that much: your example loop that only adds 2 numbers is already 72 cycles per iteration in real mode, and could be restructured to be faster in protected mode (copy first, then modify in place, use string instructions etc.). Code is not usually dominated by segment loads. And once you need more than ~500kB of RAM, protected mode is orders of magnitude faster than using EMS/XMS, but people still used real mode because that's what the tools and the existing code supported. – Rico Pajarola Apr 21 '18 at 23:38
  • @RicoPajarola: If a program can be structured to use under 64K of high-use storage and a bunch of general-purpose storage, protected mode is fine. Loading data from other storage into the 64K of fast storage would be faster in protected mode than using real/protected-mode switching, but if most of what code does is operate within the 64K the performance of accessing outside storage won't matter too much. Using real mode is better if the working set is over 64K but still small enough to fit in real-mode storage. – supercat Apr 22 '18 at 17:52
  • @RicoPajarola: Another factor to consider is that real-mode allows objects up to 65,520 bytes to be allocated anywhere in a 1MB address space without having to worry about crossing segment boundaries. I suppose one could probably allow objects up to 63.75K to be allocated likewise in protected mode if one created a bunch of overlapping segments that start at 256-byte intervals, but if one were going to do that one would be better off just having a mode which simply shifted the lower bits of the segment register by 8 rather than 4. – supercat Apr 22 '18 at 17:59
  • @supercat You're right, for working sets >64KB and <500KB real mode is faster, and in protected mode index overflows become even more problematic because you can't "normalize" a far pointer. Proper tool support could have fixed most of that. But this is all hypothetical because none of the generally available tools supported protected mode, and if they did, they did it in a completely braindead way, so for most developers this was not a choice to be made. – Rico Pajarola Apr 23 '18 at 22:21
  • @supercat In hindsight, the whole 16bit segmented architecture was a bad idea and an evolutionary dead end, and they would have fared much better by investing the extra transistors into true 32bit (or anything >16) offsets instead of this overly complicated protected mode. Heck even a larger shift (why not 16?) for the segment register bits would have done the trick. But they didn't... and to come back to the original question: they didn't anticipate the need for backwards compatibility, which was a mistake. – Rico Pajarola Apr 23 '18 at 22:22
  • @RicoPajarola: To the contrary, the 16-bit segmented architecture was a good idea, though with some slight missteps in implementation. Unfortunately, languages of the day supported it poorly. If 80386 had a mode which behaved like 8086 real mode, but with 32-bit segment values, systems like the Java Virtual Machine could easily and cheaply use 32-bit object references to access up to 64GiB of virtual address space. With a few slight tweaks to the addressing logic, the amount of storage that could be addressed with 32-bit references could be pushed well beyond that. – supercat Apr 23 '18 at 22:42
  • @RicoPajarola: While some might argue that using 64-bit pointers is "easier", they take up twice as much space in cache as 32-bit references would, and thus code which uses a lot of references would end up being less cache efficient using 64-bit pointers than using 32-bit pointers. Some versions of the JVM exploit the 80386 Index*8 address scaling feature to access 32GiB with 32-bit addresses, but that precludes use of that feature for other purposes. – supercat Apr 23 '18 at 22:44
  • @supercat Now you're just plain making up things. The 16-bit segmented architecture was incredibly wasteful, throwing away 12 bits of address space for nothing. And 32-bit indirect references are not more cache efficient than 64-bit pointers, unless you hand optimize your allocations. – Rico Pajarola Apr 25 '18 at 07:44
  • @RicoPajarola: The 8086 had sixteen times the address space of its predecessors, and could locate objects up to 65,520 bytes anywhere within that address space while using 16-bit offsets. Programs for the 68000 which used 16-bit offsets (common on e.g. the Macintosh) were generally limited to objects of 32,767 bytes, and those that used 32-bit offsets had to spend twice as much space storing them. – supercat Apr 25 '18 at 14:33
  • @supercat there are obviously tradeoffs, but I don't think they were good ones unless your only goal was to port 8080 assembly to the new architecture. The 8086 essentially used 32bit address to address 20bit of address space. Sure it was awesome to be able to address almost 64KB large objects anywhere in this address space with just 16bit offsets, but it came at the price of fixing the address space in size, and I don't think it matters all that much in practice. Just compare what software on the mac/amiga/atari looked like versus a PC of the day... – Rico Pajarola Apr 26 '18 at 17:41
  • @RicoPajarola: I programmed the Macintosh back in the day, and it imposed a lot of 32,767-byte limits in places the PC imposed 65,520-byte limits. There are a number of ways the 16-bit x86 architecture could have gone beyond 20 bits of address space while still allowing code to access 768K of storage reasonably fluidly. For example, have a mode where loading a segment value whose upper two bits are not both set would behave like the 8088/8086, and loading a segment value whose upper two bits are set would shift the bottom 14 bits left by 10 rather than four. – supercat Apr 26 '18 at 19:42
  • @RicoPajarola: Code would then be able to have up to 768K worth of objects that could be placed at 16-byte granularity, and 16MiB of objects that could be placed at 1024-byte granularity. Alternatively, it could say that loading a lower segment value would treat it as in real mode [perhaps adding in a configurable segment base and bounds-checking the resulting addresses], but higher values would load descriptors as in protected mode. In any case, a point many people seem to forget is that the 8086 isn't a 32-bit architecture. Sure 32-bit architectures can... – supercat Apr 26 '18 at 19:57
  • ...do some things more nicely than 16-bit architectures, which in turn are nicer than 8-bit architectures, but 16-bit and 8-bit architectures are cheaper. The only architectures I've seen which can access more than 128KiB of storage more nicely than the 8086 are all based upon 32-bit register sets; despite some flaws in implementation, the 8086 segment-mode design is nicer than any other approach I've seen for a 16-bit architecture to accommodate 1MiB of address space. – supercat Apr 26 '18 at 20:01
  • @supercat The 8086 was using 32bit of register real estate for addresses. Of course there were a number of more graceful ways it could have been extended to go beyond 1MB, but that's not what they did. They could even have simply extended the segment register to be 32 bits to address 256MB without breaking existing code that didn't use the upper bits. They didn't do that either (the 32bit index registers on the 386 were a lot more intrusive for existing code). Both the M68000 and the Z8000 were 16bit architectures but did this a lot better. – Rico Pajarola Apr 28 '18 at 10:50
  • @RicoPajarola: The M68000 was a 32-bit architecture; I've not looked at the Z8000. The 8088 has a primary 16-bit bus which has just about everything sitting on it, plus a segment bus which is fed only from segment registers and feeds only to a 16+12 adder. – supercat Apr 29 '18 at 15:32
  • @supercat I get it, it was a tradeoff. I just don't think it was a good one. The 8086 sacrificed the future for being a better 8080, and yet somehow, it "won" despite being the worst design around at the time. – Rico Pajarola Apr 30 '18 at 19:40
  • @RicoPajarola: What gripes do you have about the 8088 other than (1) it doesn't have enough general-purpose segment registers (adding even one more would help a lot), and (2) the shift amount is fixed at 4? The latter issue could have been addressed with a mode that uses the upper 3 bits of the segment register to select one of eight internally-stored descriptors, each with a programmable scaling factor for the other 13. The silicon cost of eight descriptors would be a small fraction of the cost of the Z8010's 64 descriptors, and the descriptors could have been set up... – supercat Apr 30 '18 at 21:02
  • ...so that the first 640K would work like normal, the next 128K would be similar except shifted to an upper address for the display hardware, and then segments 0xC000-0xDFFF could be configured to scale by ten bits rather than four, allowing 8MiB of address space for objects that are large enough that padding to the next multiple of 1024 bytes wouldn't be a huge loss. – supercat Apr 30 '18 at 21:04
  • @RicoPajarola: By contrast, when using the Z8000, how would you set up memory to accommodate a bunch of objects that might vary in size arbitrarily from 1 byte up to e.g. 60,000 bytes? If you configure an MMU with segments at 4Kbyte intervals, one could place objects anywhere within a 512Kbyte address space, smaller than the 8088's. If one doesn't configure segments that way, large objects could only be placed at certain addresses. – supercat Apr 30 '18 at 23:00
  • @supercat What gripes me is that the limitations are deeply ingrained in the architecture, every piece of code that doesn't want to live within 64KB (or 128KB if you do split I/D) has to treat every pointer as 32-bit, while only getting 20-bit of address space. And it is encouraged (required) to perform arithmetic on segment values. I agree that it could have been extended in sensible ways as you describe, but again, that's not what happened. The Z8000 differs in that segments were opaque selectors, but you could also just treat them as linear 23bit addresses. – Rico Pajarola May 05 '18 at 20:03
  • ... and address objects up to a size of 8MB (precomputed 23bit addresses) anywhere within an 8MB address space, leaving the option to transparently add more bits to both the segmented and unsegmented mode in the future. The only code that had to concern itself with the gory details of how segments work was the code that set up the segments, and not ~every piece of code that used a pointer. Also, every register register pair could be used as a pointer. – Rico Pajarola May 05 '18 at 20:16
  • ...the 8086 is only "good" if you see it as an improved 8080 and accept that you can't extend the address space beyond 1MB without breaking compatibility while still wasting 32bit on a pointer. Being able to address 64KB large objects anywhere in address space, pretending it's only a 16bit address, is nice, but once you're past that it pales compared to the pain of indirectly addressing extended memory in 16KB chunks. In contrast, the Z8000 could have trivially added a linear 23bit or even 32bit address mode, that would only have broken code that relied on address wraparound. – Rico Pajarola May 05 '18 at 20:25
  • @RicoPajarola: On the 8088, code to index into an object would use LES BX,whatever to load the address into ES:BX, and could then do pointer arithmetic on BX without having to worry about crossing a 64K boundary. On the Z8000, not only did code have to worry about crossing hard 64K boundaries, but it would need to increase the top half of the address by 256 rather than by 1. – supercat May 05 '18 at 22:45
  • @supercat you seem to be making a lot of assumptions. LES BX does not normalize the pointer. You still have to worry about crossing 64K boundaries, but, and that's the big mistake, allowing user code to make assumptions on the segment to offset relation makes it impossible to ever change it without breaking all existing code (as history proves with the 286s 16bit protected mode which completely failed to catch on). I agree it was a trade-off. It did make some code easier, but it made everything else much harder and made it impossible to extend the address space in a backwards compatible way. – Rico Pajarola May 07 '18 at 00:14
  • there is no way you can extend the address space in the x86 architecture in a backwards compatible way without either a) adding more registers b) changing the shift by 4 relation between segment and offset c) adding more bits to the registers Either of these options break existing code which won't be able to process "new style" pointers. In contrast, the 68k started with clean 32bit pointers from the get go, and the Z8000 had a clean path to transparent 32bit pointers (going 32bit would have been as easy as respecting the upper 8bits and disabling segment wraparound). – Rico Pajarola May 07 '18 at 00:27
  • @RicoPajarola: LES doesn't normalize the pointer, but normalization is never necessary for objects up to 65,520 bytes in length. That's what makes the segmentation design cool. The only parts of the code that needs to care about the x16 scaling factor are those that deal with chunks of memory over 65,520 bytes in size. Chunks over 65,520 bytes can be split into smaller chunks that can then be used without dealing with the relationship between segments and offsets. – supercat May 07 '18 at 14:48
  • @RicoPajarola: Relatively little code needs to know or care about the segment scaling factor. If different parts of the address space could be scaled differently, code which uses pointers only to access objects which are either aligned to a multiple of the scaling factor or are smaller than 65536-scaling_factor bytes wouldn't need to know or care about the scaling factor of the region in which those objects are located. The problem with the 80286 design is that it's necessary to know, before creating a segment descriptor, how much space... – supercat May 07 '18 at 15:51
  • ...is going to be needed within that segment. If segments were cheap enough that one could be created for every allocated object (as was the case on 8086), that might not be a problem, but protected-mode segments are sufficiently costly that it's necessary to combine objects into segments, which in turn requires that an allocator guess at what size segments to create. – supercat May 07 '18 at 16:01
  • @supercat The 286 protected mode was obviously not very useful and a technological dead end, we don't need to argue about that. But it's simply not true that code that made assumptions about the scaling factor is rare. Compilers at the time routinely normalized every pointer they came across. That's necessary to avoid edge-cases when processing pointers to arrays elements (using arrays is how you get such a large object in the first place). – Rico Pajarola May 08 '18 at 21:46
  • There is a cool aspect of this model: it allows code to use "near" (16bit) or "far" (32bit) pointers without changing the semantics of the addresses, and it saves a few register bits by allowing you to access 1MB of memory with only a 16bit base register, instead of 32bit (or 20bit or 24bit or whatever). But you still need 32bits to store a pointer in memory, and limit your address space to 1MB in a way that's not extensible. – Rico Pajarola May 08 '18 at 22:00
  • You might argue that other contemporary CPUs were cheating by combining two 16bit registers, but so did most 8bit CPUs, and there is no prize for being a "pure" 16bit design. And even so, the 8086 used 2 16bit registers to form a 20bit address, and it saved nothing except for a few bits in the adder, and possibly, a 16bit load when accessing a pointer (I say possibly, because for every example I can come up with, I find a way to avoid the extra load in other ways). – Rico Pajarola May 08 '18 at 22:13
  • @RicoPajarola: There's a difference between far and huge pointers, even though both use the same representation. The latter require that code normalize them at just about every opportunity, while the former forbid that. Compilers often had a mode to treat pointers as huge by default, but programmers wanting any kind of decent performance would avoid that like the plague if there was any way they could limit the sizes of individual objects to 65,520 bytes or less. Given static char far *p, huge *q;, the code for p+=123; would simply be add [word p],123, while... – supercat May 08 '18 at 22:33
  • ...the code for q += 123; would be some horrible mess. If you've only ever used huge pointers, you may not appreciate the 8086, but that's simply because "you've been using it wrong". In large mode, if I do char *p = malloc(49152); the segment part of p will be in the range 0..16384. If I then do p += 8192; the segment part will be 8192..24575. No sequence of additions and subtractions that stays within my 49152-byte object will require any change to the segment value. Unless code deals with individual objects greater than 65,520 bytes, there's no reason... – supercat May 08 '18 at 22:45
  • ...it should need to do any kind of math with segment registers or check for carry when performing offset calculations. – supercat May 08 '18 at 22:45
  • @RicoPajarola: When I first started programming the 8088 many decades ago, I found the segmented architecture frustrating because I was always wanting to normalize pointers. After I came to realize that normalizing pointers was very seldom necessary, I started to appreciate the benefits of the 8086 design. – supercat May 11 '18 at 18:50
  • @supercat I get what you're saying. I still disagree. What you are saying is what I would call Stockholm syndrome... it makes sense only if you assume that there is some inherent limitation that prevents them from doing better. Far/Near pointers are a C-centric concept (there are other languages out there, you know). They require the programmer to make a choice between two not so great options. Far pointers are easy to use, but limited to 65k while still wasting 32 bits on a pointer. Huge pointers are clumsy and slow. – Rico Pajarola May 12 '18 at 14:38
  • ... there's no inherent reason huge pointers couldn't be handled in hardware to provide a flat address space like the 68k did. The Z8000 made a different tradeoff, but at least it left the window open for future expansion (IMHO the tradeoff in the Z8000 wasn't a good one either, but at least it left the possibility for future expansion). – Rico Pajarola May 12 '18 at 14:43
  • @supercat to reiterate, my beef with the x86 is that it makes future expansion impossible, while still making it hard to use (and requiring the user to make choices like near vs far vs huge pointers), for the sole benefit of being able to address 65520 bytes with a pseudo 16 bit pointer, while still wasting 32 bits of space both in memory and registers. A flat address mode with register pairs as pointers could have used a 20 bit adder to address 1MB of memory, not require those silly near/far/huge tradeoffs and be expanded in the future simply by adding more bits to the adder. – Rico Pajarola May 12 '18 at 14:51
  • @RicoPajarola: The architecture could have been extended while retaining compatibility with a lot of code by allowing different ranges of segments to use different bases and scaling factors. Even something as simple as having a mode that treats segments whose upper two bits are set as having a hard-coded scaling factor of 1024 rather than 16 would have extended the addressing range to 16 megabytes, maintained compatibility with code that uses the bottom 768K in whatever fashion, and allowed code that is scaling-factor agnostic (typical of most large-mode code) to... – supercat May 12 '18 at 17:14
  • ...access objects anywhere in the 16MiB addressing range without modification. So I don't think "lack of expandability" is really a fair complaint. Deciding whether to use "huge" pointers isn't hard if one remembers a simple rule: avoid them if at all possible. Given long far *arr, the code for arr[i]++ would be mov ax,[_i] / add ax,ax / add ax,ax / les bx,[_arr] / add bx,ax / add word [bx],1 / adc word [bx+2],0. Under the scheme you describe, how would you avoid having to do something like: – supercat May 12 '18 at 17:35
  • mov ax,[_i] / lptr bx,[_arr] / signextend ax / add ax,ax / adc axh,axh / add ax,ax / add axh,axh / add bx,ax / adc bxh,axh / inc byte [bx] / jnz done / add bx,1 / adc bxh,0 / inc byte [bx] / jnz done / add bx,1 / adc bxh,0 / inc byte [bx] / jnz done / add bx,1 / adc bxh,0 / inc byte [bx] / done:? Imposing a 32-bit alignment requirement on "long" values would make things less horrible, but still nowhere near as nice as what the 8086's "far" pointers can do relatively effortlessly. – supercat May 12 '18 at 17:45
  • but just for entertainment: long far *arr; arr[i]++ would be mov di,[_i] / shl di, 1 / shl di, 1 / les bx,[_arr] / inc word [es:bx+di] / adc word [es:bx+di+2], 0 What I mean with flat 32bit pointers is that "es:bx" is treated a 32bit entity (exactly like the 8080's HL register), with indexed addressing modes that "just work" and appropriate instructions to manipulate them as a single entity (we already have lds and les, why not have something to add/subtract from them with proper overflow handling?). Code that doesn't make assumptions about what the bits mean can stay unchanged. – Rico Pajarola May 19 '18 at 17:16
  • @RicoPajarola: Programs that need to access more than a few megs of address space will often need to do enough 32-bit bit math that they should be run on 32-bit architectures. The fact that a 16-bit architecture would peg out at 16MiB really isn't a limitation. The performance of something like Java or .NET on a platform that extended segment registers to 32 bits and scaled them using two sizes of region, similar to what I described earlier, could be better than performance in 64-bit mode (since object references would only need to take up 4 bytes in cache rather than 8) while being able... – supercat May 19 '18 at 21:50
  • ...to access 64GiB of "small" objects aligned on 16-byte boundaries and 1024GiB of "large" objects aligned on 1024-byte boundaries. Some Java implementations can use the 8x-scaling effective address mode to handle 32GiB of address space with 32-bit references, but using segment registers as object IDs would be more efficient. – supercat May 19 '18 at 21:53
  • @supercat sure, 16MB in 16bit mode is not really a limitation. I still think having fixed translation schemes that wastes bits is a bad idea. A non-uniform schema is worse. This applies to 32bit architectures even more, since it's not 1976 anymore and you don't need to save a few bits on an adder anymore: if you're already wasting 8 bytes on a pointer, might as well use all the bits. As for java: why on earth would using segment registers as object ids be more efficient than scaled offsets in a flat address space? It just makes everything really complicated and probably ruins your cache. – Rico Pajarola May 21 '18 at 08:21
  • @RicoPajarola: In a well-designed segmented architecture, allocations may be identified using just the segment part of a pointer, and multiple locations which are known to be within the same allocation can be identified using one copy of the segment along with one offset for each distinct location. Some pointers would need to be stored as 64-bit values, but when using a language and architecture that are designed to work well together, such pointers would be in the minority. – supercat Sep 07 '23 at 16:02