Why were there no 32-bit versions of 65xx CPUs, or 64-bit versions of m68k CPUs?

Question

I don't understand why Western Design Center made the 65816 a 16-bit upgrade to the 6502 but Commodore Semiconductor Group/MOS Technology didn't make their own variant & why neither company made 32-bit or 64-bit versions of the architecture.

Also I don't understand why Motorola switched over to PowerPC architecture instead of developing a 64-bit variant of the 68000 architecture or why they never made more powerful 32-bit processors after the 68040.

Things were very competitive and RISC processors were starting to win. It took intel a tremendous effort ($$$) to keep its processors competitive; investment that I suspect others simply couldn't justify. — Erik Eidt, Oct 28 '20 at 23:36
Regarding the 68040: That was by no means the last of the 68k CPUs - The 68k series was a true 32-bit CPU starting from the 68020. The series also contained the 68060, which had an internal architecture very close to the Intel Pentium. — tofro, Oct 29 '20 at 06:53
Regarding the 6502/65816, we've had this before: https://retrocomputing.stackexchange.com/questions/14864/what-happened-to-the-65832/ — Michael Graf, Oct 29 '20 at 07:03
Note that even Intel didn't make a 64-bit version of the x86 architecture. They went with the Itanium instead. It was AMD that extended the x86 architecture to 64-bits. — , Oct 29 '20 at 14:57
@TEMLIB: From what I understand, the ColdFire makes no effort to be binary-compatible with most programs for the 68000. Functions of the 68000 which were troublesome to implement, and wouldn't be too hard to do without, were eliminated, but many such functions were in fact used by programmers and compilers who were targeting the 68000. — supercat, Nov 07 '20 at 20:50
@supercat. Yes, ColdFires didn't support all of the original MC68k familiy instructions. Targeted as embedded CPUs, compatibility was less important (like some low end ARM cores with reduced features). There is still projets like http://firebee.org which implements an ATARI ST-like computer using a 260MHz ColdFire CPU. — Grabul, Nov 07 '20 at 22:21
@TEMLIB: I'm loath to regard subset architectures as belonging to the same family as their predecessors unless the omitted functions are genuinely obscure; I don't think that's really true of the Coldfire. — supercat, Nov 07 '20 at 22:23
@supercat The FireBee project claims "With an additional lightweight software layer, the ColdFire CPU can be made compatible with existing 680x0 programs.". Probably some "trap and emulate" method for unimplemented instructions. Motorola did that before when they discarded all complex MC68881/2 instructions in the MC68040 FPU. Motorola decided to restrict the 68k family for embedded devices and favoured higher-end applications to the failed MC88000, then PowerPC. — Grabul, Nov 07 '20 at 22:28

Liam Proven · Answer 1 · 2020-10-31T22:25:46.313

The premise in the question is incorrect. There were such chips. The question also fails to allow for the way that the silicon-chip industry developed.

Moore's Law basically said that every 18 months, it was possible to build chips with twice as many transistors for the same amount of money.

The 6502 (1975) is a mid-1970s design. In the '70s it cost a lot to use even thousands of transistors; the 6502 succeeded partly because it was very small and simple and didn't use many, compared to more complex rivals such as the Z80 and 6809.

The 68000 (1979) was also from the same decade. It became affordable in the early 1980s (e.g. Apple Lisa) and slightly more so by 1984 (Apple Macintosh). However, note that Motorola also offered a version with an 8-bit external bus, the 68008, as used in the Sinclair QL. This reduced performance, but it was worth it for cheaper computers because it was so expensive to have a 16-bit chipset and 16-bit memory.

Note that just 4 years separates the 6502 and 68000. That's how much progress was being made then.

The 65C816 was a (partially) 16-bit successor to the 6502. Note that WDC also designed a 32-bit successor, the 65C832. Here is a datasheet: https://downloads.reactivemicro.com/Electronics/CPU/WDC%2065C832%20Datasheet.pdf

However, this was never produced. As a 16-bit extension to an 8-bit design, the 65C816 was compromised and slower than pure 16-bit designs. A 32-bit design would have been even more compromised.

Note, this is also why Acorn succeeded with the ARM processor: its clean 32-bit-only design was more efficient than Motorola's combination 16/32-bit design, which was partly inspired by the DEC PDP-11 minicomputer. Acorn evaluated the 68000, 65C816 (which it used in the rare Acorn Communicator), NatSemi 32016, Intel 80186 and other chips and found them wanting. Part of the brilliance of the Acorn design was that it used slow DRAM effectively and did not need elaborate caching or expensive high-speed RAM, resulting in affordable home computers that were nearly 10x faster than rival 68000 machines. (The best layman's explanation of this I've seen is the Ultimate Acorn Archimedes Talk at the Chaos Computer Congress 36C3.)

The 68000 was 16-bit externally but 32-bit internally: that is why the Atari machine that used it was called the ST, short for "sixteen/thirty-two".

The first fully-32 bit 680x0 chip was the 68020 (1984). It was faster but did not offer a lot of new capabilities, and its successor the 68030 was more successful, partly because it integrated a memory management unit. Compare with the Intel 80386DX (1985), which did much the same: 32-bit bus, integral MMU.

The 80386DX struggled in the market because of the expense of making 32-bit motherboards with 32-bit wide RAM, so was succeeded by the 80386SX (1988), the same 32-bit core but with a half-width (16-bit) external bus. This is the same design principle as the 68008. Motorola's equivalent was the 68EC020, as used in the Amiga 1200.

The reason was that around the end of the 1980s, when these devices came out, 16MB of memory was a huge amount and very expensive. There was no need for mass-market chips to address 4GB of RAM — that would have cost hundreds of thousands of £/$ at the time. Their 32-bit cores were for performance, not capacity.

The 68030 was followed by the 68040 (1990), just as the 80386 was followed by the 80486 (1989). Both also integrated floating-point coprocessors into the main CPU die. The progress of Moore's Law had now made this affordable.

The line ended with the 68060 (1994), but still 32-bit — but again like Intel's 80586 family, now called "Pentium" because they could't trademark numbers — both have Level 1 cache on the CPU die.

The reason was because at this time, fabricating large chips with millions of transistors was still expensive, and these chips could still address more RAM than was remotely affordable to fit into a personal computer.

So the priority at the time was to find way to spend a limited transistor budget on making faster chips: 8-bit → 16-bit → 32-bit → integrate MMU → integrate FPU → integrate L1 cache → integrate L2 cache

This line of development somewhat ran out of steam by the mid-1990s. This is why there was no successor to the 68060.

Most of the industry switched to the path Acorn had started a decade earlier: dispensing with backwards compatibility with now-compromised 1970s designs and starting afresh with a stripped-down, simpler, reduced design — Reduced Instruction Set Computing (RISC).

ARM chips supported several OSes: RISC OS, Unix, Psion EPOC (later renamed Symbian), Apple NewtonOS, etc. Motorola's supported more: LisaOS, classic MacOS, Xenix, ST TOS, AmigaDOS, multiple Unixes, etc.

No single one was dominant.

Intel was constrained by the success of Microsoft's MS-DOS/Windows family, which sold far more than all the other x86 OSes put together. So backwards-compatibility was more important for Intel than for Acorn or Motorola.

Intel had tried several other CPU architectures: iAPX-432, i860, i960 and later Itanium. All failed in the general-purpose market.

Thus, Intel was forced to to find a way to make x86 quicker. It did this by breaking down x86 instructions into RISC-like "micro operations", re-sequencing them for faster execution, running them on a RISC-like core, and then reassembling the results into x86 afterwards. First on the Pentium Pro, which only did this efficiently for x86-32 instructions, when many people were still running Windows 95/98, an OS composed of a lot of x86-16 code and which ran a lot of x86-16 apps. The Pentium Pro also had 8KB of onboard L1.

Then with the Pentium II, an improved Pentium Pro with L2 cache (first on a separate die, later integrated) and improved x86-16 optimisation — but also around the time that the PC market moved to Windows XP, a fully x86-32 OS.

In other words, even by the turn of the century, the software was still moving to 32-bit and the limits of 32-bit operation (chiefly, 4GB RAM) were still largely theoretical. So, the effort went into making faster chips with the existing transistor budget.

Only by the middle of the first decade of the 21st century did 4GB become a bottleneck, leading to the conditions for AMD to create a 64-bit extension to x86.

The reasons that 64-bit happened did not apply in the 1990s (EDIT: OK, except in high-end RISC chips for workstations -- between 1991 - 1994, all the main RISC processors went 64-bit: MIPS, SPARC and finally IBM POWER. DEC's Alpha chips were 64-bit from the start in 1992 and were arguably the only fully-native 64-bit mass-market CPU.)

From the 1970s to about 2005, 32 bits were more than enough, and CPU makers worked on spending the transistor budgets on integrating more go-faster parts into CPUs. Eventually, this strategy ran out, when CPUs included the integer core, a floating-point core, a memory management unit, a tiny amount of L1 cache and a larger amount of slower L2 cache.

Then, there was only 1 way to go: integrate a second CPU onto the chip. Firstly as a separate CPU die, then as dual-core dies. Luckily, by this time, NT had replaced Win9x, and NT and Unix could both support symmetrical multiprocessing.

So, dual-core chips, then quadruple-core chips. After that, a single user on a desktop or laptop gets little more benefit. There are many CPUs with more cores but they are almost exclusively used in servers.

Secondly, the CPU industry was now reaching limits of how fast silicon chips can run, and how much heat they emit when doing so. The megahertz race ended.

So the emphases changed, to two new ones, as the limiting factors became:

the amount of system memory
the amount of cooling they required
the amount of electricity they used to operate

These last two things are two sides of the same coin, which is why I said two not three.

Koomey's Law has replaced Moore's Law.

BTW I would be extremely interested if anyone managed to implement a 65C832 on an FPGA based on that datasheet, and I am sure that many other people would be too. :-) — Liam Proven, Oct 29 '20 at 18:05
I've been working on 65816 emulation recently and trawling through comp.sys.apple2 circa 1990; if you believe the scuttlebutt then the 65816 was hand drawn long after that was sensible, and had both bugs and terrible timing characteristics. Tony Fadell (later of the iPod) may or may not have reimplemented it as a gate array during his student days in the late '80s using borrowed equipment and time, producing a faster more robust version; it definitely never shipped in volume but some claimed to have samples. Allegedly the 65832 was never fabricated just because no customers ever wanted it. — Tommy, Oct 29 '20 at 18:13
"The reasons that 64-bit happened did not apply in the 1990s." - someone forgot to tell DEC, who introduced the DEC Alpha c. 1992 :p — Alnitak, Oct 30 '20 at 09:42
@Alnitak Or the Cray-1 in 1976. It is fair to say that these were extremely niche applications. — J..., Oct 30 '20 at 10:20
I wouldn't have called the Alpha "niche". It certainly wasn't a consumer machine, but they were very popular as workstations at the Uni I was working in back then. — Alnitak, Oct 30 '20 at 11:45
Yes, my uni bought an Alpha-based machine - the size of a large desktop unit - to replace an entire wall full of VAX hardware. — Will Crawford, Oct 30 '20 at 15:26
Yes, it's a fair cop. I didn't think of the early 64-bit RISC chips, but I submit that they weren't in the same market as the 680x0 family. I'd give as an example that IBM was happily selling POWER workstations for years before the AIM alliance devised the smaller, cheaper, single-chip PowerPC.
I think Alpha was 1992 and SPARC v9 1994. (I can't quickly find when MIPS went 64-bit.) It took a while after that for them to become affordable. They weren't competing in the same market at all. — Liam Proven, Oct 30 '20 at 21:19
"The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991" — Martin Schröder, Oct 31 '20 at 20:12
BTW, from discussions of this post in other places, I've learned that there multiple 32-bit 6502 designs -- but AFAIK, none have ever been implemented.
http://wilsonminesco.com/links.html

There are more here, too: http://forum.6502.org/viewtopic.php?f=1&t=4216

I even got a comment from Bill Mensch himself! — Liam Proven, Oct 31 '20 at 22:01
@Alnitak Nobody ever said 64-bit "happened" with DEC. That's how you know it's niche. Of course it doesn't feel niche to you because you spent a lot of time with them. This only exposes the biases in our own subjective experience. — J..., Nov 01 '20 at 10:42
" the 68020 (1984). It was faster but did not offer a lot of new capabilities," strongly disagree. It added a lot of addressing modes (which was imo the big error that doomed the family later), added instruction cache, pipelining, unaligned memory access, coprocessor interface and fixed up a lot of the small error of the 68000. The 68030 was much less inovative, it was only a 68020 with integrated 68885 and data cache. — Patrick Schlüter, Nov 03 '20 at 12:55
68EC020 had a 32 bit data bus. It was crippled in the address bus width which was only 24 bit wide. 68020 and EC020 could handle 8, 16, 24 and 32 bit wide data buses. Amiga 1200 used a 32 bit wide data bus. — Patrick Schlüter, Nov 03 '20 at 13:14
The timeline concerning level 1 cache is also wrong. 68020/030 had already cache (thus Level1 cache), even if it was really small (256 Icache for 020 and 256*2 for 030). As or the x86 side, some 386 concurrent had cache (IBM 386SLC afaicr) it was 80486 that introduced it with 8K, not Pentium (16K). — Patrick Schlüter, Nov 03 '20 at 13:21
Pentium Pro had Level 2 cache 256K or even 512K of 2nd level cache at full CPU speed on package. Pentium II had the Level II at only half the clock speed. — Patrick Schlüter, Nov 03 '20 at 13:24
This answer uses the wrong value for Moore's Law. From the link: The doubling period is often misquoted as 18 months because of a prediction by Moore's colleague, Intel executive David House. In 1975, House noted that Moore's revised law of doubling transistor count every 2 years in turn implied that computer chip performance would roughly double every 18 months — Barrington, Sep 13 '22 at 04:48

score 24 · Answer 2 · answered Oct 28 '20 at 23:49

The 65816 was close to the bare minimum of a 16 bit processor. It was primarily used where compatibility with existing 6502 code was needed, such as with the Apple IIgs. It was also used where the designers of a new 16-bit system were already familiar with 6502. This is probably why the SNES has the 65816, given the NES had the 6502.

By the time the 32 bit era came into its own, the 8-bit 6502 codebase was very obsolete. Apple never developed a successor to the IIgs and there were few if any other consumer computers using the 65816. Given that most programming had shifted from assembly to high level languages, designers would have also felt more free to switch architectures. I suspect there was no real target market for this processor, so it was not developed.

Regarding the 68000, that is part of a much larger industry switch away from the CISC (complex instruction set computer) architectures of the 70s and 80s to RISC (reduced instruction set computers) which offered superior performance.

In my view, it's more of the exception that the PC family did not migrate to RISC as well. The need to maintain compatibility with an extraordinarily diverse set of hardware and the need for binary compatibility (with multiple operating systems) created a unique pressure to maintain the architecture. For vendors like Apple, with a closed system and control over the software stack and hardware, something like the PowerPC transition was much easier to pull off. Same for the UNIX world, where most software was in C and readily recompiled.

So once RISC chips offered better performance, many of the vendors using the 68000 started to abandon it. Motorola could probably have pursued the same CISC-to-RISC translation approach used in the Pentium Pro, but they already had their own RISC designs that were faster than any 68000, and with a shrinking 68000 market segment. They probably saw no market for a "68080".

Well, internally, the PC family x86 processors did migrate to RISC (by implementing the x86 iSA on top of a RISC architecture), and more important, laid the foundations for migrating to full out-of-order processing, although they only reached that fully after abandoning the Pentium II, III and IV architectures, switching to the Pentium Pro architecture. — chthon, Oct 29 '20 at 07:34
@chthon: Pentium II and III are P6 microarchitecture, basically PPro + MMX and + SSE, with very similar internals to PPro. Pentium IV (Netburst microarchitecture) was also fully out-of-order execution with a modern-style back-end but weird front-end (trace cache). All Intel decode-to-uops microarchitectures have been out-of-order execution. P5 Pentium had a dual-issue superscalar in-order pipeline but didn't crack complex instructions into multiple uops, so optimizing for it often meant using a RISCy subset of the x86 ISA, like avoiding memory-destination instructions. — Peter Cordes, Oct 29 '20 at 18:42

score 13 · Answer 3 · edited Jan 26 '23 at 05:58

I don't understand why western design centre made the 65816 a 16bit upgrade to the 6502 but commodore semiconductor group/MOS technology didn't make their own variant

For one, the 65816 is only a 16 bit CPU in a very restricted way. All external transfers are still 8 bit wide and address expansion is rather clumsy. The main improvement wider architectures offer is plain management of large address space. As a result the 65816's performance isn't much higher than a plain 6502. Improved performance comes mainly from increased clock speed.

& why neither company made 32bit or 64bit versions of the architecture.

32 or 64 bit do not increase much performance on their own - and not much without new software. Performance gain comes from wider buses and most importantly a larger usable address space. The 8088 is a great example. Performance wise, a 4.77 MHz 8088 doesn't deliver notably more processing punch than a 1 MHz 6502 - but the ability to address up to 1 MiB without much hassle (*1) made a huge difference.

Any 65xx extension to 32 bit would have been essentially a complete new CPU, maybe offering some emulation mode, which wouldn't carry over the many advantages of the 6502, being dead simple and cheap to produce. Chip technology had advanced by several magnitudes between the mid 1970s when the 6502 was done and mid 1980s, making more complex but also more powerful CPUs taking up that niche.

Also I don't understand why Motorola switched over to powerPC architecture instead of developing a 64bit variant of the 68000 or architecture or why they never made more powerful 32bit processors after the 68040

For one, there was of course the 68060, which was at the time comparable to a Pentium, delivering up to 3 times the throughput of an 68040.

More importantly, in the mid 1990s (the '060 came in 1994) the usage of non-x86 CPUs in (consumer) desktops was history. Amiga and Atari were gone, Sun had already switched to SPARC. There was simply no way that Motorola as a single designer could compete with close to a dozen different companies designing x86 CPUs and pushing performance limits at an unimaginable speed.

At the time (early 1990s), Motorola investing in two different and incompatible CPU lines, 68k and PowerPC, for the same market (high performance) doesn't make much business sense, so joining forces with IBM and focusing on PowerPC (*2) was the sensible way to go.

*1 - No, segment registers aren't a hassle - in fact, they are a huge performance boost, comparable to the gain ZP addressing has on the 6502. Especially not when considering what effort 8-bit machines, such as the Apple II, required to manage larger data sets.

*2 - In turn Motorola not only reduced the 68k development effort, but also scrapped their beautiful new 88k RISC line.

Comments are not for extended discussion; this conversation has been moved to chat. — Chenmunka, Nov 01 '20 at 19:00

score 9 · Answer 4 · answered Oct 29 '20 at 14:22

9

Because binary compatibility in most cases is overrated and not worth compromising a design to maintain.

The only reason to extend a chip family like that is to maintain binary compatibility. In contrast, "reinventing the wheel" with a new design empowers the designers embrace everything the field has gained over time.

If anything, Intel is the exception to that, putting a lot of work in to ensure its chips are compatible, but nobody else was in the position Intel to really need to maintain that. (That said, Intel certainly innovated the underlying processor design while still maintaining compatibility.)

Apple felt they needed that when they worked with WDC for the 65816 in order to leverage the Apple II market, but that ended up being a dead end compared to the path of the Macintosh, which is a great example of how binary compatibility isn't necessary for a successful platform.

Apple switched over to the PPC because it was a better chip in terms of power and performance than what the 68K line was, important for their laptop series, and clearly IBM/Motorola felt that they could do better (for assorted values of "better") investing in the Power architecture than sticking with the 68K family.

Similarly, they switched to Intel because PPC was not advancing more on the power/performance spectrum since PPC was being more deigned to the server market.

The IBM PC helped establish that the industry was able to easily move from one architecture to another by establishing a "CP/M like" environment that was ostensibly "source code" compatible with the legacy CP/M base. Software vendors readily adopted the PC. The early PC was, essentially, a "better" CP/M platform with better, standard hardware, a better OS (MS/PC-DOS), and more memory. This made legacy code easy to port, even at machine language.

But by them, the modern machines were powerful enough to be efficiently coded in high level languages which were more readily ported. The UNIX market demonstrated that hardware manufacturers that adopted UNIX could rapidly see vendors support their platform, regardless of the underlying architecture. 68000, PPC, PA-RISC, 88000, x86, SPARC, single processors, multi processors, etc. etc. etc. The UNIX server and workstation market was incredibly diverse, yet the overlying UNIX OS allowed vendors to quickly move their software from platform to platform.

This diversity and rapid expansion allowed companies to truly innovate at all levels, rather than be trapped with 15 year old design decisions of 5 generations of technology.

answered Oct 29 '20 at 14:22

Will Hartung

12,276
1
27
53

1

Re: binary compatibility, I also wonder whether there would have been any in practice with 65816 follow-ups to most of the 6502 machines since they're all so tightly coupled to a particular set of support hardware that usually also doesn't look to have been designed to scale. Compare and contrast with the PC where the fast influx of clones promoted not binding yourself too rigorously to specific timing or other undocumented behaviour. – Tommy Oct 29 '20 at 18:17
@Tommy the story was that Apple and the development of the '816 were closely entwined, so the goal was, notably, Apple ][ backward compatibility. But there were also C64 accelerators based on the '816, so the capability was able to be leveraged across platforms, even though the software itself not necessarily so. – Will Hartung Oct 29 '20 at 18:22
1

@WillHartung: Code which wrote to the Apple II floppy drive required absolutely precise timing, moreso than most code for other 6502 platforms. If a Disk II controller were used with a 6502 clone which processed STA abs,X without doing a read of the address formed by combining the specified high byte of the address with a computed low byte, before performing a write to the correct address, attempts to write the disk woudl store gibberish (I think the newer IWM-based controllers base their timing on when the write occurs, rather than the preceding read, but I'm not sure about that). – supercat Oct 29 '20 at 19:20
I think the Apple IIgs is almost a cautionary tale in what happens if you try to drag an 8-bit platform onwards; there’s special hardware optionally to drag the processor down to 1Mhz while the disk motor is on as per @supercat’s observation on the Disk II but the biggest issue is having all writes to video similarly throttled to 1Mhz when you’re talking about a 32kb frame buffer with no acceleration. Name another 256kb machine that feels the need to offer you the option of using large chunks of that only as a write-through cache because the other part is slow for compatibility reasons. – Tommy Oct 29 '20 at 19:29
1

@Tommy: Many games on the PC use main RAM to hold a copy of display contents, because doing a read-modify-write in main RAM followed by a write to display memory is faster than doing a read-modify-write with display memory. That's done at the hardware rather than software level, but it was common for accesses to video memory to be slower than accesses to main RAM, except on platforms where CPU accesses to all RAM were never done at more than half of the RAM's maximum speed. – supercat Oct 29 '20 at 19:41
@supercat I’m aware of that, and it follows naturally on the PC, an architecture from 1981 where the slots remained stubbornly slow for a prolonged period. The GS is a 1986 machine with video on the motherboard and a ‘G’ in the name to emphasise graphics. So appropriate comparisons are the ST and Amiga, both doing a much better job of video, earlier. – Tommy Oct 29 '20 at 20:10
... and architecturally, video is just one example; is there a good reason why the language card shows up in the memory map anywhere beyond bank $00? That’s completely orthogonal to backwards compatibility, and not useful so one assumes it’s an acceptable side effect of other decisions, not a deliberate act. – Tommy Oct 29 '20 at 20:13
@Tommy: Accesses to the first 512K of Amiga memory are often much slower than accesses elsewhere. I don't know how the ST is implemented. What matters for performance is not whether memory is on a card or on the motherboard, but rather whether the CPU is free to access it any time it wants without contention from anything else like video display refresh. I've never looked at the GS design in much detail to see where display data is fetched from in the new video modes, but if such operations are limited to low memory its' probably using a design similar to the Apple II, where... – supercat Oct 29 '20 at 20:40
...half of the potential memory bandwidth is dedicated to video refresh. – supercat Oct 29 '20 at 20:41
@supercat the card thing is relevant re: the ISA bus, the bottleneck for most 1986 PCs — in the the video isn't in a slot so I took that off the table. The ST offers around twice the bandwidth of the GS, as does the Amiga in comparable modes. In the GS the memory for all video modes, even the new one, is locked behind the Mega II which provides the IIe emulation and that gives the 1Mhz write limit. I don't actually know how the physical chips are finagled; there's a chain-4-esque bit to select linear video for the new mode, but it's undocumented what happens if you don't select that. – Tommy Oct 29 '20 at 21:19
@Tommy: The slowness of video memory compared to main RAM was an issue even on machines where the ISA bus and main CPU ran at the same speed, and relatively few 80386-era display cards could handle zero-wait-state accesses at an 8MHz bus speed. As for the GS, a system with a 16-bit memory bus, showing 320x200x16-color graphics, would need to clock out roughly two million 16-bit words per second. To accommodate zero-delay accesses from a 2.5MHz CPU, it would have to be able to handle 4.5 million transactions/second. That would have been a rather tall order in 1987. – supercat Oct 29 '20 at 21:34
@Tommy: Limiting the CPU speed to 1MHz when performing such operations would reduce the required transaction rate from 4.5MHz to 3MHz, increasing the available cycle time from 222ns to 333ns, which would allow the use of significantly cheaper memory. – supercat Oct 29 '20 at 21:37
@supercat the 65816 uses an 8-bit data bus, as does the RAM-connected bus heading out of the video controller. Otherwise, it's clear that we just disagree. The machine is substantially worse at one of the things in its name than its older competitors, and its whole architecture is made awkward by the desire for backwards compatibility — from the instruction set through the memory map and all the way to the throttled buses. I'm glad other companies didn't try the same route forwards. – Tommy Oct 29 '20 at 23:08
@Tommy: The Apple //e, when in 80-column or double-hi-res mode, uses a 16-bit bus for fetching video data; CPU cycles can act upon either the upper or lower half, but video accesses fetch both simultaneously. Video refresh gobbles up memory bandwidth, and the bandwidth requirement for the IIgs is twice that of the Apple //e. Modern display cards use dual-port VRAM chips, but those would have been prohibitively expensive in the //gs era. – supercat Oct 30 '20 at 04:49
@supercat I think I've waded us into semantics; from what I can see the functional diagram from tech note #2 is accurate — a latch holds 8-bits of internal data while 8-bits of auxiliary video data flows. Once an auxiliary byte has been received by the video circuits, the latched data is forwarded. So two banks of 8-bit RAM are accessed at once but then that's funnelled down onto a single 8-bit channel on which data just arrives rapidly. – Tommy Oct 30 '20 at 14:43
@Tommy: Memory alternates between serving the CPU and the video. When it's serving the CPU, both banks effectively behave as a single 8-bit bus. When serving video, all 16 bits are latched separately. behaving as a 16-bit bus. The data from the two banks are then fed sequentially into the video circuitry, but what's important is that a total of 24 bits of useful information are fetched using only two DRAM access cycles, which wouldn't be possible with a single 8-bit bus. My key point is that during the displayed part of each frame, the display circuitry needs to be fed... – supercat Oct 30 '20 at 15:02
...four million bytes of data per second, and thus any memory system attached to the display will need to be able to process enough access requests each second to serve up that data in addition to any accesses by the CPU. So a key question would be how many access cycles the DRAM can perform each second. If 3.06 million, that would be enough for two 16-bit video fetches and one CPU access per 1.02MHz "classic" cycle. Allowing two CPU accesses per classic cycle would require a memory bandwidth of 4.08 million accesses/second. – supercat Oct 30 '20 at 15:13
You're not saying anything that you haven't already said, and I have nothing to say that I haven't already said. Thanks for the discussion, I think there's nothing left to add. – Tommy Oct 30 '20 at 15:34
I would disagree that the Apple //gs and Macintosh are "a great example of how binary compatibility isn't necessary for a successful platform". The Macintosh won out over the Apple ][ because Apple was determined to kill the Apple ][, no matter what. For several years the Apple ][ series including the //gs outsold the Macintosh, and the best-selling software on the //gs was good old 8-bit Appleworks. In the end, the only thing that saved the Macintosh was Microsoft, which needed antitrust protection (and whose technology was always highly backward compatible). – fluffysheap Oct 30 '20 at 22:03
The Macintosh has and is succeeding even now that it is going through its 4th significant architecture change. – Will Hartung Oct 30 '20 at 22:30

John Dallman · Answer 5 · 2022-09-11T22:51:54.363

8

To add to the other answers, the 65xx family design, with just a few on-chip registers, made sense when transistors were expensive, and memory accesses were cheap. That allowed using the zero page as, essentially, a large and flexible register set. A quick look at the WDC 65C832 datasheet reveals that it sticks to that philosophy, presumably because the instruction encoding doesn't have room for addressing a lot more registers.

However, this approach stopped working as clock speeds advanced, because memory access times didn't increase as fast as clock speeds. That makes more registers, capable of being used flexibly, very valuable for saving memory accesses, and the 65xx approach of a few registers, all with specific jobs, became a major handicap.

The x86 family had more registers than the 65xx, and managed to maintain competitive performance via complex and hard-to-design cache systems. Until quite recently, Intel were the world experts at running processor fabrication plants. They've lost that position to TSMC at present, but their abilities in manufacturing were important to keeping x86 competitive.

WDC has always been a small operation, did not have the manpower to design high-end cache systems, and could not rely on Intel-level manufacturing.

edited Sep 11 '22 at 22:51

answered Oct 29 '20 at 15:25

John Dallman

13,177
3
46
58

1

The division between address and data registers made it possible to select among 16 usable registers even though many instructions only had a 3-bit register-select field. I think the 68000 does a better job of managing opcode space than the ARM Thumb, which has 13 general-purpose registers, but the use of anything past the first eight rather awkward. The only way in which I would see the division as problematic would be when trying to design an efficient calling convention that allow C functions without prototypes to be invoked in a manner compatible with those that have prototypes, and... – supercat Oct 29 '20 at 15:55
...even that could be accommodated by having compilers generate two entry points--one for an efficient prototyped function, and one for a compatibility wrapper. – supercat Oct 29 '20 at 15:57
I agree with @supercat but also find Motorola's stated reasoning kind of weird: that it's useful to have some storage dedicated to addresses because obviously you want to be able to manipulate that without bothering flags, etc. I suspect that wasn't the real original motivation. – Tommy Oct 29 '20 at 18:15
1

@Tommy: Having instructions that can perform address calculations without affecting flags is useful; x86 handles that via "LEA", and by having DEC affect Z but not carry. I also like the separation of the X and C flags on the 68000 to allow CMP to be used for loop control without interfering with carry propagation in an ADC sequence. I really don't see any downside to splitting the registers other than the problems with C argument passing, which I view as a problem with the evolution of the C language rather than the CPU design. – supercat Oct 29 '20 at 19:15
Having separate A regs meant that some could be dedicated as stack, frame, global pointers and still leave all eight general purpose data regs for the programmer. Agree it's messy, but they had to fit two addresses into a 16 bit instruction. – Hugh Fisher Oct 29 '20 at 22:56
@JeremyP: I think the biggest issue is that using 8-bit opcodes makes sense on a platform with an 8-bit bus, but doesn't really make sense on a platform with a 16-bit or 32-bit bus. The 8-bit instruction format was designed around the limited number of registers, but would make it difficult to expand the number of registers. – supercat Nov 07 '20 at 20:55
@HughFisher: IMHO, the ARM Thumb instruction set could have benefited from having many instructions use R0-R7, but having addressing modes be able to chose from among R0-R3, R8-R11, in a pattern rather like the 68000's partitioning of address and data registers. Since the ABI's argument passing convention uses R0-R3 interchangeably for passing addresses and data, four registers would have to be common to both sets, but otherwise most code would have a relatively clear partitioning between the two usages. – supercat Jan 25 '23 at 18:05

score 1 · Answer 6 · answered Jan 25 '23 at 14:29

I found this quotation of an RISC-based supercharged 6502 interesting and may clarify to some extent this question and its comments on the relation of RISC and 6502.

A RISC based implementation of the Apple II 6502 Processor: In mid ’85 I performed an analysis that showed a simple RISC style implementation of a 16‐bit binary compatible superset of the 8‐bit microprocessor used in the Apple II 6502, along with some judicious use of on‐chip caching, could substantially improve performance – to the point of potentially outperforming the 68000 used in the Mac, and given the simplicity of the 6502 the implementation was “doable” by a small team. This was a more direct approach than emulating 6502 compiled binaries by a different processor as was done some four years later in the Mobius project in the Advanced Technology Group (ATG). I set about completing a feasibility study that went through several revisions (Turbo‐I and Turbo‐II), which included a complete micro‐architecture design of the processor along with resource usage diagrams for every clock phase of every instruction. When the design seemed solid and I was ready to move on to an implementation, I sought the counsel and the support of my mentors in the IC Technology group (to whom I owe a huge debt of gratitude), Bob Bailey and Walt Peschke. As usual, when they felt it was time to impart some wisdom upon me, they said, “Pete, lets go for a walk”. As we walked around the local residential neighborhood in Cupertino they explained to me that marketing/sales/biz dev would have no idea what to do (how to position, etc) with such a thing and I would just end up with a black eye. Of course they were right and I stopped working on it. Their warnings were prescient, as four years later Jean-Louis Gassee was to shut down a similar project called Mobius in the Advanced Technology Group (ATG) where the ARM microprocessor was used to emulate another architecture. .

Source: https://www.byrdsight.com/apple-macintosh/
YCombinator:
https://news.ycombinator.com/item?id=18807490
https://news.ycombinator.com/item?id=27843189

One thing I've pondered lately as a design concept for a 6502 variation with 16-bit index registers would replace the eight addressing modes with six that didn't involve address arithmetic beyond forcing the address LSB high or low, and have the bit patterns that would identify the other addressing modes instead behave as a "set effective address" instruction. During a cycle where the processor would need to compute the address high byte, it would fetch an opcode saying what to do with the effective address. This would have made instructions using indexed addressing modes bigger, but... — supercat, Jan 25 '23 at 15:57
...allow the address MSB calculation cycle to be used to fetch something useful. When processing something like "ADC (zp),y", the CPU wouldn't need to know or care during the effective address calculation whether the instruction was an add, subtract, load, or whatever, and once the processor was fetching data at the computed effective address the processor would no longer need to care how it was computed, so even if instructions were expanded to two bytes, the CPU wouldn't have to hold two entire instructions. — supercat, Jan 25 '23 at 16:01

score -1 · Answer 7 · answered Sep 11 '22 at 22:56

-1

Some dedicated Amiga fans have produced a new-generation 68000-family processor, in FPGA form. The Apollo Core 68080 seems to be considerably faster than the 68060, and has some 64-bit instructions, although it is limited to 32-bit addressing.

answered Sep 11 '22 at 22:56

John Dallman

13,177
3
46
58

1

While that's interesting, this doesn't answer the question of why the original supplier didn't do so. – Toby Speight Jan 26 '23 at 05:54

Why were there no 32-bit versions of 65xx CPUs, or 64-bit versions of m68k CPUs?

7 Answers7