
I was reading through the PCI and PCIe configuration access mechanisms in Chapter 3 (page 96) of PCIe System Architecture (MindShare series). To prevent locking (in the case of multiple threads) caused by the two CPU I/O-space accesses needed for a single Configuration Space access, and to expand the configuration space (256 B to 4 KiB), there has been a shift to using memory address space instead of I/O address space.
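To be concrete, the legacy mechanism (PCI Configuration Mechanism #1, ports CF8h/CFCh) needs two I/O accesses per configuration read, roughly like this sketch in C; the pio_write32/pio_read32 names stand in for OUT/IN wrappers and are assumed helpers, not a real API:

    #include <stdint.h>

    #define PCI_CONFIG_ADDRESS 0xCF8
    #define PCI_CONFIG_DATA    0xCFC

    void     pio_write32(uint16_t port, uint32_t v);  /* assumed OUT wrapper */
    uint32_t pio_read32(uint16_t port);               /* assumed IN wrapper  */

    uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
    {
        uint32_t addr = (1u << 31)                    /* enable bit          */
                      | ((uint32_t)bus << 16)
                      | ((uint32_t)(dev & 0x1F) << 11)
                      | ((uint32_t)(fn  & 0x07) << 8)
                      | (reg & 0xFC);                 /* dword-aligned reg   */
        pio_write32(PCI_CONFIG_ADDRESS, addr);        /* I/O access #1       */
        return pio_read32(PCI_CONFIG_DATA);           /* I/O access #2       */
    }

If another thread writes port CF8h between the two accesses, the read hits the wrong register - that's the locking problem; the memory-mapped mechanism (ECAM) instead reaches each register with a single ordinary memory access.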

Why wasn't this done in the first place, and what would have motivated having two different address spaces in the past?
(With x86 having special in and out instructions to access the I/O address space.)

I tried to find the motivation for a separate address space on the web, but couldn't. All the resources I could find mention the presence of two address spaces (I/O address space and memory address space), but they do not mention why or why not a single memory address space could be used from the beginning.

analogkp
  • This was posted on Stack Overflow. Adding @MargaretBloom's comment from there:

    "Maybe Retrocomputing can answer this better. My guess is that it comes from a time when there were really two separate buses for memory and I/O, and the memory bus was just wires dedicated to the RAM. With time we realized we can unify the two, at least in the initial segment."

    – analogkp Oct 30 '22 at 14:40
  • 3
    crossposting is not considered a great idea. you should have the question migrated over here. – Raffzahn Oct 30 '22 at 14:52
  • "I/O space" in the x86 sense is really just device numbering. A port number denotes a device (or part of a device). Once you've decided that you're having specific instructions to do I/O, you're pretty much got yourself an I/O space. – dave Oct 30 '22 at 15:31
  • 8
    In my oppinion (it's an oppinion, that's why its just a comment), the main point of a separate I/O space is that the whole memory space can be used for memory (as the name says, that's its point!). Today, x86-64 provides us with 48 bits of physical memory space, but no system provides 256TB of RAM. In the old days, 64KB of RAM was not that uncommon in system like the C64, as the memory space was just 64KB, Instead of bank switching between being able to access the SID or VIC registers or RAM at an adress aliasing to it, having a separate I/O space is actually a good idea. – Michael Karcher Oct 30 '22 at 21:58
  • 2
    @Raffzahn: Thank You. I've deleted it from stackoverflow to avoid duplication. – analogkp Oct 31 '22 at 05:29
  • 2
    In 1961, DEC introduced the PDP-1. It had two busses and two adress spaces. By 1971, DEC had introduced the PDP-11. It featured one bus, the Unibus, that could address both memory locations and device controllers. Both of these predate microcomputers. – Walter Mitty Oct 31 '22 at 10:32
  • In the really old days, it was much more complicated: https://en.wikipedia.org/wiki/IBM_System/360_architecture#Input/Output – John Doty Oct 31 '22 at 17:33
  • @JohnDoty Not much, as the CUA was essentially a 16-bit port number, much like with an 8086 or Z80. The big difference was that the I/O instructions were much more sophisticated - essentially macros to be executed by a DMA processor. Something that could easily be done with a micro as well. In fact, the 8089 I/O processor would have allowed operation quite like on a /360. Seriously, the PC would have been a really nice machine if IBM had used the 8089 instead of the cheap 8-bit DMA with crappy bank registers. – Raffzahn Oct 31 '22 at 22:19
  • @Raffzahn Yes, the big difference is a big difference. – John Doty Oct 31 '22 at 22:22

2 Answers


"... do not mention why or why not a single memory address space could be used from the beginning."

Simply because a dedicated I/O space simplifies system design.


It may be assumed that you're asking mainly about the way it is done on x86 machines. As 8080 descendants, they signal I/O access by a dedicated addressing cycle, using a dedicated address space but the same address lines. These are not two separate buses - the lines are shared to reduce pin count.
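To make that concrete, here is a minimal sketch (C with GCC-style inline assembly - an assumption about the toolchain) contrasting the two access styles: a port access must go through the dedicated IN/OUT instructions, while a memory-mapped register is reached by an ordinary load.

    #include <stdint.h>

    /* Port-mapped I/O: the port number lives in its own 64 Ki space,
       reachable only through the dedicated IN/OUT instructions. */
    static inline uint8_t port_read8(uint16_t port)
    {
        uint8_t v;
        __asm__ volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
        return v;
    }

    static inline void port_write8(uint16_t port, uint8_t v)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(v), "Nd"(port));
    }

    /* Memory-mapped I/O: the device register is just an address, so any
       ordinary load/store instruction (a plain MOV) reaches it. */
    static inline uint8_t mmio_read8(volatile uint8_t *reg)
    {
        return *reg;
    }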

Having an I/O Space Has Advantages:

  • Memory decoding does not need to care about I/O peculiarities, like slower access times.
  • I/O address decoding and memory decoding can be designed independently of each other
    • Decoding for I/O chips did not need to cover the whole address space, only the much smaller I/O space, as a dedicated I/O signal does the rest.
    • Different approaches for incomplete decoding can save chips
  • The full primary address space is available for code/data
    • 64 KiB (8080) isn't that much to start with, especially with ROM and buffers included; keeping I/O out of it relieves that (a bit)
    • But even with the 1 MiB address space of an 8086, having an additional 64 KiB for I/O is just as helpful (*1).
  • The full 256 addresses (8080) or later the full 64 Ki (Z80, 8086) can be used for I/O
    • The latter is quite handy for taking, for example, video and/or disk buffers out of main memory
  • By separating I/O instructions from memory instructions, no stray memory access can initiate an unwanted or even dangerous I/O process.
  • Last but not least, a dedicated I/O space and dedicated I/O instructions ease the task of handling I/O privilege and I/O virtualization

It's a Matter of Heritage:

  • The i8086 inherited that concept from the i8080 (*2)
  • The i8080's implementation is a generalized version of the way the i8008 handled I/O
  • The i8008 is in turn just a single-chip implementation of the Datapoint 2200 CPU.
  • The Datapoint 2200 was a discrete TTL design featuring about 100 chips. Having dedicated I/O instructions removed the need for address decoding at all - quite useful to keep things simple.

It Wasn't Just Intel's Thing

Other early CPUs followed the same or similar concepts:

  • The Valvo/Signetics 2650 had an 8-bit I/O address space, much like the 8080, and in addition a 1-bit space.
  • TI's 9900 supported an additional 12-bit address space for bitwise I/O, which could transfer 1 to 16 bits starting at any bit address.
  • The Fairchild F8 in turn had no address bus at all, but featured two I/O ports that could transfer addresses to an external unit containing the PC (3851) or generating an address bus (3852) - but these two ports could just as well be used for direct I/O (a 1-bit address space). They were part of a 4-bit address space accessed by dedicated instructions.

So there is (well, was) way more out there, and the 64 Ki 8086 I/O space is in the end just the simplest and most generic implementation of that idea.


*1 - That IBM did nonetheless put I/O into memory is a design decision - not the best, but that's a common theme with the original PC, isn't it?

*2 - After all, it was THE main requirement of the 8086 design to be bus- and instruction-compatible, to allow low-effort redesign of systems and mostly automated software conversion.

– Raffzahn
  • 3
    x86 IN AL,DX takes a 16-bit IO port number in DX. IO address space has 16-bit addresses. Are you talking about 8080 when you say there are only 256 IO addresses, or did IBM-PC only wire up / support the low 8 bits? (Some things that use fixed port numbers in x86 use a number in the low 8 bits anyway, because the immediate form of in al, imm8 only allows an 8-bit port number.) – Peter Cordes Oct 30 '22 at 20:19
  • 4
    When you say 8086 was "instruction compatible" with 8080, important to clarify that it's not binary compatible; a translating assembler was needed. But yes 8086 was designed to make mechanical translation possible. (I know you know this, but I worried your phrasing might have misled some readers.) See The start of x86: Intel 8080 vs Intel 8086? and other Q&As for more details. – Peter Cordes Oct 30 '22 at 20:23
  • @PeterCordes a) 256 vs. 64 Ki - please read the whole thing before commenting, as that is noted; not to mention that the basic (compatible) IN&OUT (E4h..E7h) is likewise limited to an 8-bit I/O address. b) It's a footnote to add background, to hint at the reasoning. No need to add implementation details about what level of compatibility has been reached. Compatible is a wide range; if you want to learn more, a search on RC.SE will help a lot. – Raffzahn Oct 30 '22 at 21:02
  • 1
    I did read the whole answer. It doesn't clearly state whether that first bullet list is talking about 8080 or x86. It mentions that 8086 is an 8080 descendant without specifically saying you're now talking about 8080. It's only from prior knowledge and guesswork that I can infer you might be talking about 8080 having an 8-bit IO address space, because I know it's not true for x86 and I assume wasn't true for first-gen 8086. – Peter Cordes Oct 30 '22 at 21:27
  • @PeterCordes read on and you may as well reach "Full 256 (8080) or later full 64 Ki (Z80, 8086)" :)) – Raffzahn Oct 30 '22 at 21:36
  • 5
    Another aspect that you tangentially touch on - a separate IO space means the processor can assume that all of memory behaves like memory. (Reads return what was written, speculative accesses are safe, accesses can be restarted, access sizes are unimportant, etc.) Modern processors already have significant amounts of machinery to do quick translation table lookups and so can 'just' encode that info into the translation table (ARM MAIR indexes, etc); this would be far more expensive (comparatively speaking) on a smaller cpu that doesn't already have all that machinery. – TLW Oct 31 '22 at 03:01
  • 1
    @TLW Not really, as Memory can still contain ROM which does never return what was written (or only under special circumstances in case of FLASH/...). – Raffzahn Oct 31 '22 at 08:36
  • 1
    @TLW: From a hardware perspective, the 6502's approach of treating everything as a unified address space simplifies things both inside and outside the CPU, for systems that had commonplace amounts of memory. If a system has e.g. a pair of 1Kx4 RAM chips, two 2Kx8 ROM chips, and five I/O devices, using a 74138 to decode bits address bits 11-13 would be simpler than having to use a three -way decoder to handle RAM and ROM and a five-way decoder for I/O. For CPUs which had a reason to care about whether regions reliably acted as RAM, having separate RAM/IO areas was useful, but... – supercat Oct 31 '22 at 21:59
  • ...for CPUs which process every read operation in a manner that's agnostic to any reads or writes to the same address that might have preceded it, memory-based I/O can be just as efficient if not more so than dedicated-instruction I/O. – supercat Oct 31 '22 at 22:00
  • @Raffzahn: Even if the PC had been designed to decode the top half of the I/O address space for display-buffer use, the performance of code using I/O instructions for display updates would have been much worse than the performance of code using a memory-mapped display buffer. – supercat Oct 31 '22 at 22:01
  • @supercat Not at all. A MOV [BX],AX needs 18 clocks on an 8088 (buffer address prepared in BX), while OUT DX,AX is just 12 clocks (same buffer address, only in DX). Even better if a system of address latches had been used, eliminating all need for segment setting and segment overrides. – Raffzahn Oct 31 '22 at 22:11
  • 1
    @Raffzahn: What if you compare REP MOVSW versus LODSW / OUT DX,AX / LOOP? Or, XOR [BX],AX with IN DX,AX / XOR AX,BX / OUT DX,AX? Or, for that matter, a screen scroll performed via IN AX,DX / SUB DX,SI / OUT AX,DX / ADD DX,SI / LOOP? – supercat Oct 31 '22 at 22:22
  • @supercat Look at the BIOS (or various program sources) and you'll see that the overwhelming number of access cycles are single-character modifications, called between lots of other code. Long story short, memory-mapped offers no real-world advantage. Such will only come with dedicated hardware (bitblit etc.), which can be implemented either way. – Raffzahn Oct 31 '22 at 22:28
  • 3
    @Raffzahn: BIOS text routines are horribly inefficient, but many programs for the CGA and MDA use other means of performing screen I/O. Graphics code benefits from being able to use instructions like AND, OR, and XOR, as well as the ability to perform sequential addresses to e.g. SI, SI+80, SI+160, etc. without having to use an AND instruction between each pair of addresses. – supercat Oct 31 '22 at 22:48
  • 4
    @PeterCordes In the original (8088) PC, it was a 16-bit I/O address space but there was a design rule that only the low ten bits were significant to peripherals. See https://retrocomputing.stackexchange.com/questions/12370/did-the-ibm-game-control-adapter-have-i-o-port-aliases – smitelli Nov 01 '22 at 13:09
  • @smitelli: Was there a rule that peripherals should ignore the upper six bits, or merely that they should not rely upon any other peripherals to decode upper address bits to avoid collisions? – supercat Nov 01 '22 at 17:46
  • I don't know anything about the 2650, but what you say about the sizes of its address spaces seems like a mistake. "An 8-bit address space, [...], and in addition a 1-bit space" seems absolutely tiny, doesn't it? – Omar and Lorraine Dec 07 '22 at 14:04
  • @OmarL Well, it's the mechanics expressed in terms of address space. There is a set of IN/OUT instructions for transferring 8-bit data using an 8-bit address, and then another set of IN/OUT of 8-bit data using a single address bit. Those addresses do not overlap, so they are separate address spaces. The second set has the advantage of being single-byte instructions, so 4 ports (two in, two out) can be accessed in the shortest and fastest way possible. Very useful considering that many (embedded) applications only have a few ports at all, and usually only one or two high-priority ones to be polled all the time. – Raffzahn Dec 07 '22 at 15:02
  • 1
    @supercat I never heard of a rule that the upper 6 bits must be ignored, just that the upper 6 bits may be ignored by any card. Later cards did indeed use the top 6 bits as further address bits on that card (think 8514/A for example, or the EMU8000 on the Soundblaster AWE32). EISA specified that on-board components (000..0FF) need to decode 16 bits, and got all the mirror areas of the on-board components (400..4FF, 800..8FF up to FC00..FCFF) as free extra I/O space - I never heard of serious compatibility complaints. PCI then decided to also just use this space. – Michael Karcher Feb 07 '24 at 23:00

This answer should not be seen as a rival to @Raffzahn's, but as a complement.

There were indeed many microprocessors that did not use a dedicated I/O space. Computers using the Motorola 6800 and derivatives (I'm using the term "derivatives" in the broad sense) usually had restrictions on where the ROM had to be placed, and this had some unpleasant effects on how they were programmed.

First of all, the ROM area had to be placed in the high-address area (the 6800's reset and interrupt vectors sit at the very top of the memory map) - a fact that physically limits the code and makes it unexpandable unless jumps are made (if you've heard the term "spaghetti code", that's exactly it). The I/O area was by convention mapped at FExxh, inside the ROM space, which means there were cells in the ROM that had to be sacrificed for the sake of I/O. Having the I/O in the ROM space also meant more decoding had to be done.
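An emulator-style sketch in C may make that "hole" concrete; the FExxh window follows the convention described above, while the memory sizes and names are illustrative assumptions:

    #include <stdint.h>

    static uint8_t ram[0x8000];   /* 0000h-7FFFh: RAM (assumed size)          */
    static uint8_t rom[0x8000];   /* 8000h-FFFFh: ROM, vectors at the top     */

    /* Stub device model, assumed for illustration. */
    static uint8_t io_read(uint8_t reg) { (void)reg; return 0xFF; }

    uint8_t bus_read(uint16_t addr)
    {
        if (addr < 0x8000)
            return ram[addr];                /* plain RAM                     */
        if ((addr & 0xFF00) == 0xFE00)
            return io_read(addr & 0xFF);     /* I/O hole punched into the ROM */
        return rom[addr - 0x8000];           /* ROM; cells under FExxh are    */
    }                                        /* never reachable               */

The 256 ROM cells shadowed by the I/O window can never be read - that is the sacrifice referred to above.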

Additionally, there are even more problems, especially when expanding the memory space:

  • When designing a microcomputer with bank-based memory expansion for most 8-bitters not derived from Intel, this means either that, in every bank, the page corresponding to the ROM position must sacrifice the cells in the I/O area - which turns out to be extremely inefficient memory-wise - or that the page containing the ROM is not banked at all (see the sketch after this list).

  • When designing a microcomputer with linear memory that is expected to be code-compatible with a previous-generation machine built around a microprocessor with a smaller memory map, things can also go awry very easily. Remember that those microprocessors expected the ROM area at the end of the memory map... but the legacy I/O and ROM may no longer be at the end once the map grows, and must stay put to ensure compatibility. This means ROM and RAM may not each get a single big space, but multiple smaller ones scattered across the memory map. The same happens with the I/O: you may end up with not one but multiple scattered I/O spaces across the map. There are also other hazards related to this implementation, but they are outside the scope of this answer.
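To illustrate the first bullet, here is a small C sketch of a banked read path where the memory-mapped I/O window always wins over the selected bank; the page size, bank count, and names are illustrative assumptions, not any specific machine:

    #include <stdint.h>

    #define PAGE_SIZE 0x2000u                /* 8 KiB pages (assumed)         */
    #define IO_BASE   0xFE00u                /* conventional I/O window       */

    static uint8_t  banks[4][0x10000];       /* four 64 KiB banks (assumed)   */
    static unsigned page_bank[8];            /* selected bank per 8 KiB page  */

    /* Stub device model, assumed for illustration. */
    static uint8_t io_read(uint8_t reg) { (void)reg; return 0xFF; }

    uint8_t bus_read(uint16_t addr)
    {
        /* The I/O window is decoded first, regardless of bank selection, so
           the cells under it are lost in *every* bank - unless the top page
           is simply excluded from banking altogether. */
        if ((addr & 0xFF00) == IO_BASE)
            return io_read(addr & 0xFF);
        return banks[page_bank[addr / PAGE_SIZE]][addr];
    }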

To sum it up, a separate I/O space:

  1. Lets designers keep the ROM and RAM areas compact, without having to sacrifice cell space.
  2. Enables easy addition of more memory in subsequent models of a microcomputer line, in case the microprocessor gets replaced with a derivative (now in the strict sense) with more addressing space.
  3. Allows easier backwards compatibility among models, because the I/O is always located at the same positions.
  4. Allows simple decoding circuits, with equally simple modifications in case the I/O space is increased by newer microprocessors offering more area for this purpose.
  5. Eases bank-switching mechanisms and makes clean implementations possible.
  6. Nothing forbids a microprocessor with separate I/O from also having memory-mapped I/O. In fact, back in the day most 8-bitters using the 8080 and derivatives (again in the broad sense) had a hybrid I/O system, using the dedicated I/O area for most purposes but keeping the video memory in the CPU memory map - only systems such as the MSX, Tatung Einstein, Sega 8-bits, and NES/Famicom, which had separate video memory managed by a dedicated VDP IC or similar, did otherwise; everything else used this doctrine.
– Borg Drone