do not mention why or why not a single memory address space could be used from the beginning.
Simply because a dedicated I/O space simplifies system design.
It may be assumed that you're asking mainly about the way it is done on x86 machines. As 8080 descendants, they signal I/O access with a dedicated bus cycle and a dedicated address space, but use the same address lines for both. These are not two separate buses - sharing the lines reduces pin count.
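To illustrate the point about shared address lines, here is a minimal Python sketch of such a bus. The `ToyBus` class and its `is_io` flag are made up for illustration; `is_io` stands in for whatever signal a given CPU uses to mark a cycle as I/O rather than memory, and the sizes merely mirror the 8080's 64 Ki memory space and 256 ports.

```python
class ToyBus:
    """Toy model: one set of address lines, two address spaces."""

    def __init__(self, mem_size=0x10000, io_size=0x100):
        self.memory = bytearray(mem_size)  # responds when is_io is False
        self.ports = bytearray(io_size)    # responds when is_io is True

    def write(self, address, value, is_io):
        # The same address value goes out on the same lines either way;
        # only the is_io qualifier decides which space responds.
        target = self.ports if is_io else self.memory
        # The modulo models that each space only looks at as many
        # address bits as it has locations.
        target[address % len(target)] = value & 0xFF

    def read(self, address, is_io):
        target = self.ports if is_io else self.memory
        return target[address % len(target)]


bus = ToyBus()
bus.write(0x42, 0xAA, is_io=False)  # memory cycle, like a store instruction
bus.write(0x42, 0x55, is_io=True)   # I/O cycle, like an OUT instruction
```

Both writes use address 0x42, yet each lands in its own space, so reading it back as memory and as a port gives two different values.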
Having an I/O Space Has Advantages:
- Memory decoding does not need to care for I/O specialities, like slower access times.
- I/O address decoding and memory decoding can be designed independently of each other
- I/O chips do not need to decode the whole address space, only the much smaller I/O space, as a dedicated I/O signal does the rest.
- Different approaches for incomplete decoding can save chips
- The full primary address space is available for code/data
  - 64 Ki (8080) isn't that much to start with, especially once ROM and buffers are included; keeping I/O out of it relieves that (a bit)
  - But even with the 1 Mi address space of an 8086, having an additional 64 Ki for I/O is just as helpful (*1).
- The full 256 (8080) or, later, the full 64 Ki (Z80, 8086) addresses can be used for I/O
  - The latter is quite handy, for example, to take video and/or disk buffers out of main memory
- By separating I/O instructions from memory instructions, no random memory access can initiate an unwanted or even dangerous I/O process.
- Last but not least, a dedicated I/O space and dedicated I/O instructions ease the task of handling I/O privilege and I/O virtualization
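The decoding points in this list can be sketched in a few lines of Python. This is only an illustration under assumptions: `chip_select`, the `is_io` flag (standing in for the CPU's dedicated I/O signal), the base address 0x40 and the mask 0xF0 are all hypothetical and not taken from any real board.

```python
def chip_select(address, is_io, base=0x40, mask=0xF0):
    """Return True when a hypothetical peripheral should respond."""
    # The dedicated I/O signal does most of the work: memory cycles
    # never select the chip, whatever the address is.
    if not is_io:
        return False
    # Incomplete decoding: compare only the address bits set in mask
    # and ignore all the others.
    return (address & mask) == (base & mask)


# The chip answers on sixteen consecutive port addresses, 0x40 to 0x4F.
selected_ports = [a for a in range(0x100) if chip_select(a, is_io=True)]
```

With this mask the decoder only has to look at four address bits. The price is that the chip is mirrored wherever those four bits match, for example at 0x1240 on a CPU with a 16 bit I/O address, which is usually acceptable in a small system.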
It's a Matter of Heritage:
- The i8086 inherited that concept from the i8080 (*2)
- The i8080's implementation is a generalized version of the way the i8008 handled I/O
- The i8008 is in turn just a single chip implementation of the Datapoint 2200 CPU.
- The Datapoint 2200 was a discrete TTL design featuring about 100 chips. Having dedicated I/O instructions removed the need for any address decoding. Quite useful to keep it simple.
It Wasn't Just Intel's Thing
Other early CPUs followed the same or similar concepts:
- The Valvo/Signetics 2650 had an 8 bit I/O address space, much like the 8080, and in addition a 1 bit space.
- TI's 9900 supported an additional 12 bit address space for bitwise I/O, which could transfer 1 to 16 bits starting at any address.
- The Fairchild F8 in turn had no address bus at all, but featured two I/O ports that could transfer addresses to an external unit containing the PC (3851) or generating an address bus (3852) - but these two ports could also be used for direct I/O (1 bit address space). They were part of a 4 bit address space accessed by dedicated instructions.
So there is (well, was) way more out there, and the 64 Ki 8086 I/O space is in the end just the simplest and most generic implementation of that idea.
*1 - That IBM nonetheless put I/O into memory is a design decision - not the best, but that's a common theme with the original PC, isn't it?
*2 - After all, it was THE main requirement of the 8086 design to be bus and instruction compatible with the 8080, to allow low effort redesign of systems and mostly automated software conversion.
"Maybe Retrocomputing can answer this better. My guess is that comes from a time when there were really two separate buses for memory and IO and the memory bus was just wires dedicated to the RAM. With time we realized we can unify the two at least in the initial segment."
– analogkp Oct 30 '22 at 14:40