How do accelerators and CPU cards work on the Apple II?

Question

An Amiga 1200 exposes the entire CPU bus on the expansion port, so that an accelerator only needs to assert BR which causes the onboard CPU to stop all computation and electrically disconnect from the bus entirely, and then wait for BG which should mean that the CPU on the accelerator is ready to consider itself the Master Of The Bus.

Or something.

And in the case of the Commodore 64, the VIC-II asserts the BA so that it can completely take over the bus.

The thing is, I don't see anything like BA, BR or BG on the Apple II slot. So what's unclear to me is, when a CPU is added to the Apple II, how it takes over the control. I suppose it would be problematic if the 6502 was still running. Take the Z-80 Softcard. It's got a CPU obviously, and a handful of TTL for glue logic. But it does not have any RAM. It just uses the same RAM which is already installed in the computer. This means that if some application is running on the Z80, its state might get clobbered by whatever the 6502 is doing.

TL;DR how do CPU cards work on the Apple II if there's no way to take the bus over?

I can see that PHI1 and PHI2 are available on the slot, but these are presumably outputs, not inputs. — Omar and Lorraine, Oct 29 '18 at 10:24
Isn't that exactly what the DMA signals are for, as in https://retrocomputing.stackexchange.com/questions/5633/apple-ii-bus-irq-and-dma-priority ? Asserting DMA is supposed to halt and isolate the CPU such that the card now gets ownership of RAM. — Tommy, Oct 29 '18 at 11:53

Raffzahn · Accepted Answer · 2018-10-29T16:56:05.983

24

how do CPU cards work on the Apple II if there's no way to take the bus over?

That's what /DMA (pin 22) is good for. It halts the CPU and tristates the bus. Now any card can take over.

Unlike its daddy, the 6800 (and many other CPUs as well), the 6502 can be halted at any clock state by pulling /RDY. It will extend the current cycle (*1). This doesn't tristate the bus or anything else, and is primarily meant to allow single-stepping and to support slow memory, but it can be used for DMA as well. Not very sophisticated, but it does the job.

On the Apple II, Woz added a bunch of buffers to tristate the address bus when needed. When DMA gets pulled, the CPU will get its /RDY pulled and the buffers will tristate the address/data lines.

Video update and RAM refresh will continue as before. The card may only access the bus during Phi0, even with DMA pulled continuously, as video and refresh logic will access the RAM during Phi1 (*2).

But there are Apples fitted with an NMOS 6502 (*3). Due to its dynamic nature (*4), it needs a regular workout to keep the register contents intact. In general, one successful cycle every 10 cycles will do it (*5, 6). Each card needs an appropriate solution.

This can be a counter halting the card every 4th, 8th or 16th cycle and handing one cycle to the 6502, or there may be times when the card's CPU doesn't need the bus, which can always be handed over to the 6502 (*7).

The Z80 card is doing the latter, as the CPU needs an internal cycle after each opcode fetch (signaled by M1), where no bus activity happens. Relinquishing that cycle to the 6502 makes it happen often enough (*8) without throttling the Z80 at all.
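To make the counter variant a bit more tangible, here is a minimal sketch in Python. It is only a toy model, not taken from any real card: `schedule` and `longest_stall` are hypothetical names, and the model simply assumes the card releases one slot out of every N to the 6502.

```python
def schedule(period, total_slots):
    """Return the bus owner for each slot (toy model, not real hardware).

    The hypothetical card keeps DMA asserted and releases it for one
    slot out of every `period`, so the 6502 can complete one cycle.
    """
    return ["6502" if i % period == period - 1 else "card"
            for i in range(total_slots)]


def longest_stall(owners):
    """Longest run of consecutive slots in which the 6502 stays halted."""
    longest = run = 0
    for owner in owners:
        run = run + 1 if owner == "card" else 0
        longest = max(longest, run)
    return longest


for period in (4, 8, 16):
    owners = schedule(period, 64)
    share = owners.count("6502") / len(owners)
    print(f"1 in {period}: longest stall {longest_stall(owners)} cycles, "
          f"6502 gets {share:.1%} of the slots")
```

The longest stall is what has to stay below whatever the fitted 6502 tolerates (see *5); the share is the CPU time the 6502 still gets (see *6).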

Apple has published detailed information about DMA handling in their Technical Note Nr. 2 for the Apple IIe. While there are some differences with the Apple II, and even more with the IIgs, it gives a good base for understanding the workings without tracing schematics.

*1 - Well, it shouldn't be pulled within a write cycle, as the CPU will not repeat such. Then again, when pulled during Phi1, it will always stop.

*2 - So no, there is no chance to access the RAM at 2 MHz from an I/O card. Would have been nice, wouldn't it?

*3 - So if the card only needs to work with a CMOS 6502, as in an enhanced IIe, it will be possible to use all cycles.

*4 - The registers are basically dynamic memory and need refresh.

*5 - It's said that a MOS/Rockwell chip can do 10-17 cycles without, while a Synertek is guaranteed to do 40.

*6 - yes, this means the 6502 is active running at maybe 5-10% speed. So here's your chance for real dual CPU action.

*7 - If these holes are common, the 6502 may get even more usable CPU time.

*8 - Well, going by the books there are some instructions (most notably all the modifying ones with index register + displacement addressing, like INC or RES) which need 23 cycles. Since the Z80 runs at double the 6502 clock (*9), this ends up being 12 cycles of the 6502, thus still on the good side.

*9 - Well, it's way more complicated than that, but also a great example of clever hardware design. The Z80 runs from the Apple's 7 MHz clock, divided by two, but only during Phi1. This results in a full and a half cycle in the first 6/7th of Phi1, plus the other half of the second cycle stretching over the rest of Phi1 and all of Phi0. Effective clock speed is 2.04 MHz, with an asymmetric clock (*10). Some instructions are extended by one of the 'short' cycles to synchronize for memory access, making it a little less than 2 MHz on average. Not much.

*10 - Needing a 3.5 MHz Z80 to do so.
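The numbers in *8 to *10 can be checked with a bit of arithmetic. The sketch below is not from the original answer: it assumes the usual NTSC Apple II timing (14.31818 MHz master oscillator, 64 CPU cycles of 14 master periods plus one stretched cycle of 16 per video line), and all variable names are made up for illustration.

```python
import math

CRYSTAL_HZ = 14_318_180            # assumed NTSC master oscillator (14M)
CLOCK_7M_HZ = CRYSTAL_HZ / 2       # the "7 MHz" clock the card divides by two

# Assumed line timing: 64 normal CPU cycles (14 master periods each)
# plus one stretched cycle (16 master periods) per video line.
MASTER_PERIODS_PER_LINE = 64 * 14 + 16
CPU_CYCLES_PER_LINE = 65

cpu_hz = CRYSTAL_HZ * CPU_CYCLES_PER_LINE / MASTER_PERIODS_PER_LINE
z80_effective_hz = 2 * cpu_hz      # two Z80 clocks per 6502 cycle (*9)
z80_short_cycle_hz = CLOCK_7M_HZ / 2   # rate of the 'short' cycles (*10)

worst_z80_cycles = 23              # longest instruction according to *8
stall_cycles = math.ceil(worst_z80_cycles / 2)   # in 6502 cycles
stall_us = stall_cycles / cpu_hz * 1e6

print(f"6502 clock:        {cpu_hz / 1e6:.3f} MHz")
print(f"Z80 effective:     {z80_effective_hz / 1e6:.3f} MHz")
print(f"Z80 short cycles:  {z80_short_cycle_hz / 1e6:.3f} MHz")
print(f"worst-case stall:  {stall_cycles} cycles, about {stall_us:.1f} us")
```

Under these assumptions the effective Z80 clock comes out at roughly 2.04 MHz, the short cycles at roughly 3.58 MHz, and the longest Z80 instruction keeps the 6502 waiting for about 12 of its own cycles.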

edited Oct 29 '18 at 16:56

answered Oct 29 '18 at 11:52

Raffzahn


1

Ah, DMA is maybe a misnomer then. Does the CPU card need to implement some DRAM refresh while DMA is asserted? – Omar and Lorraine Oct 29 '18 at 11:57
3

It's not technically a misnomer; you want direct access to memory, you get direct access to memory. Though I guess that the fact of the 6502 being disabled is only an implication? – Tommy Oct 29 '18 at 12:13
So you're limited to accessing the system's RAM at around 1 MHz? Perhaps, some cards had their own memory onboard to access it at a faster speed, like some Amiga accelerators do. That would have been right dandy for the Microsoft Z80 one if there was enough space/money. – Omar and Lorraine Oct 29 '18 at 12:51
1

Access always has to be synchronized with the rest of the system - nothing different on the Apple. And yes, such cards were available. Just way more expensive than basic cards. While in the early 80s a Z80 clone card could be bought for less than 50 USD, one with its own RAM was rarely available as a clone and priced at 400+ USD. Similar for 6809 cards. The great part about the Apple was that you could get next to any CPU on a rather cheap card, so much variety - and anyone who really needed speed would rather buy a native Z80 (etc.) computer. – Raffzahn Oct 29 '18 at 13:13
@wilson - note that a z80's memory access is slower than its clock rate - it issues memory instructions in one cycle and expects them to be finished by the end of the next cycle - the memory gets 1.5 cycles to respond in M1 cycles or 2 cycles to respond in execution cycles – Jules Oct 29 '18 at 14:19
1

The 6502 cannot be halted in all clock states. If /RDY is received coincident after the data-read portion of a read-modify-write instruction or the low-address read of a JSR, it will not take effect until the third following cycle (the next instruction fetch, or high-address read). If received during the second cycle of a BRK or interrupt-handling event, it will not take effect until the fourth following cycle (the low-address read). – supercat Oct 29 '18 at 16:08
@supercat Not entirely true, as what you describe is not tied to these instruction, but pending write cycles. And is only an issue when RDY is pulled during such cycle. – Raffzahn Oct 29 '18 at 16:55
1

@Raffzahn: The 6502 has a relatively small number of situations where it performs write cycles. If interrupts are disabled, the worst-case delay on responding to RDY will depend upon what kind of instructions might be performed; a device that will need the bus at a certain time may assert RDY 1, 2, or 3 cycles before that. Looking at the data sheet, it appears RDY is sampled on the rising edge of phi2. I suppose it's not clear whether a transition on RDY that occurs in response to the rising edge of phi2 would simply be delayed a cycle versus... – supercat Oct 29 '18 at 17:34
1

...throwing execution off the rails entirely, but I think it is clear that the 6502 will be committed to performing an unimpeded sequence of writes a half cycle before it actually starts the first write. – supercat Oct 29 '18 at 17:37
The Acorn Electron stops its 6502 entirely to display a scan line in some screen modes. It's said that is why it uses the Synertek chip, although there's pictures of the motherboard about with the Rockwell (NMOS) 6502 and BBC Micros seemed to use SY6502 anyway (with R6502 replacements suggested if having problems with dodgy memory expansions). – Tom Hawtin - tackline Jun 10 '19 at 17:53
@TomHawtin-tackline 64us would be way beyond the 10us the NMOS CPU is specified to work without being clocked. – Raffzahn Jun 10 '19 at 19:52
@Raffzahn Seems to be 40us. http://archive.6502.org/datasheets/synertek_sy6500_microprocessors_apr_1979.pdf That's the time necessary to display 640px at 16 MHz, so presumably just 0.5-1.0us out of spec (the time for the actual access). – Tom Hawtin - tackline Jun 10 '19 at 21:25
Standard NMOS (MOS, Rockwell) guarantees 10 us. So 64 for a line (or some 40+ for a portion of a line) is quite outside, isn't it? – Raffzahn Jun 10 '19 at 21:41
@Raffzahn: A lot of 6502-based designs push things substantially beyond what's specified. I would expect that a part whose dynamic latches are engineered to handle a 10us stall at 70C would probably work reliably if stalled for 40us at a lower temperature like 40C. – supercat Dec 08 '20 at 18:18
