
The Atari 2600 used a cut-down version of the 6502 called the 6507. The 6507 was cost reduced by not supporting interrupts and (more importantly) by having fewer address lines, so it could only address 8 KB. While this had its limitations, it otherwise worked just like a regular 6502.

Somewhat famously, the Atari 2600 had a ridiculously small amount of RAM: 128 bytes. This was provided by the 6532 RIOT chip and was sufficient for the primitive games that the VCS was originally designed for (basically a programmable Pong/Tank machine). Most of the address space for the 6507 would ultimately be provided by ROM cartridges which would be plugged into the Atari.

My question comes from the fact that the 6502 treats two pages of RAM in a special way:

  • 0x0000-0x00FF: The zero page. Most instructions that access memory have a zero page mode which not only saves an operand byte (only 8 bits of a 16-bit address are needed) but also saves a processor cycle because of the simpler address handling (see the short comparison after this list).
  • 0x0100-0x01FF: The stack. Push/pull and subroutine operations use this page, and the stack pointer is held in one of the 6502's registers.
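
To make the saving concrete, here is a minimal comparison using the standard 6502 byte and cycle counts (the addresses themselves are arbitrary examples):

    LDA $80      ; zero page read:  2-byte instruction, 3 cycles
    LDA $0280    ; absolute read:   3-byte instruction, 4 cycles
    STA $81      ; zero page write: 2-byte instruction, 3 cycles
    STA $0281    ; absolute write:  3-byte instruction, 4 cycles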

(I presume that these locations are hard-coded into the workings of the 6502)

I'm curious how the Atari 2600 mapped its RAM given these two special memory pages. You need a stack, and I believe that would start at 0x01FF and work downwards with every push, but then you wouldn't get the benefit of the zero page. If you mapped the RAM into the zero page, the stack operations probably wouldn't work because I'd guess an address line would be tripped up. If you overlapped the top of the ZP into the bottom of the stack page (e.g. 0x0080-0x017F), you'd have the problem of the stack pointer needing to be defaulted to 0x7F instead of 0xFF.

I suppose you could give up the zero page and run everything out of the top half of the stack page, limiting your stack usage to maximize the available RAM, which would start at 0x0180. Or was there something fancier going on in the hardware?

So how did this work?

bjb
1 Answer


I presume that these locations [Zero Page and Stack] are hard-coded into the workings of the 6502

Yup, they are.

So how did this work?

Simply by partial address decoding. Address bit A8 was not decoded when RAM was accessed, and A7 was used as chip select. This mirrored the 128 ($80) bytes of RAM into both the ZP and the stack page (at $0080 and at $0180) (*1).
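
To make the mirroring concrete, here is a minimal sketch (the value $42 is just an arbitrary example):

    LDA #$42
    STA $80      ; write through the zero page address...
    LDA $0180    ; ...and read back through the stack page address:
                 ; A8 is ignored by the decode, so this is the very same
                 ; RAM cell and A holds $42 again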

So all 128 bytes could be used both as stack and as ZP at the same time. Of course, the SP does crawl downwards and could overwrite other data - but in a real-life game the stack is only used in small and quite controlled portions.
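
For illustration, a minimal start-up sketch along those lines - parking the SP at the top of the shared 128 bytes and clearing the RAM; the exact sequence is just a common convention, not something dictated by the hardware:

            CLD             ; decimal mode off
            LDX #$FF
            TXS             ; SP = $FF: the stack grows down from $01FF,
                            ; which is the same cell as $00FF
            LDA #0
            LDX #$7F
    clear:  STA $80,X       ; zero all 128 bytes of RIOT RAM ($80..$FF)
            DEX
            BPL clear

After that, JSR/RTS and the occasional PHA only nibble a few bytes off the top, leaving the rest of the 128 bytes free for zero page variables.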

In fact, this addressing scheme allowed several neat tricks. For example, a background routine prepares screen data in ZP to be displayed later. While ZP addressing is already fast, it would still limit what can be displayed during a line. A loop using zp,X addressing needs 6 clocks per byte (4 for the LDA plus 2 to increment X), but by setting the SP to the bottom of the display data and popping the data, the same fetch is reduced to just 4 clocks (*2).
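
A minimal sketch of the two fetch sequences being compared - only the data fetch is shown, since the store to a TIA register and the loop overhead are the same either way, and the SP would first have been pointed just below the data with an LDX/TXS pair:

    ; fetching via the zero page, indexed:
    LDA $80,X        ; 4 cycles
    INX              ; 2 cycles  -> 6 cycles per byte

    ; fetching via the stack pointer:
    PLA              ; 4 cycles, and SP increments by itself -> 4 cycles per byte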


The 6507 was cost reduced by not supporting interrupts [...]

I can't help but do some nitpicking here, as that's not entirely true. There's still reset. Sure, a bit hard to handle, but still possible. The VCF badge is a great example showing how to use reset as an interrupt to read a serial input signal :))


*1 - Well, due to the fact that the whole decoding for RAM just goes by A7=1 and A12=0, the RAM is mirrored at $80..$FF in every page of the first 4 KiB. Much like the TIA occupies the other half of each page, itself mirrored several times within every half page.

*2 - Of course, unrolling the loop would reduce it to 3 clocks per byte (fixed-address LDA zp), but it would also double the code length. So there's always a trade-off.
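
The unrolled variant from *2 might look like this minimal sketch - every byte sits at a fixed, known ZP address, so a plain 3-cycle LDA does each fetch, at the price of one instruction per byte:

    LDA $80          ; 3 cycles
    ; ...store/use the byte...
    LDA $81          ; 3 cycles
    ; ...store/use the byte...
    LDA $82          ; 3 cycles, and so on for every byte of the run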

Raffzahn
  • I've seen BRK documented as a soft interrupt too, but I guess that's semantics? The cost reduction is the pins after all, not the logic. – Tommy Apr 19 '18 at 18:21
  • Jup, BRK is not only a soft interrupt, but the real one. Internally the 6502 feeds a BRK into decoding whenever an interrupt is detected - just to be redirected later on according to which vector is to be taken. – Raffzahn Apr 19 '18 at 18:24
  • Oh, yeah, I'd forgotten about that! With the race condition that an exactly on-cue NMI will change which vector is used while processing a BRK, thereby having the same effect as if the BRK had been skipped. I'll bet there are a few protection schemes predicated on that. – Tommy Apr 19 '18 at 18:32
  • Using PLA instead of LDA nn,X would seem unlikely to save more than two cycles per scan line, and that only if one could use something like a zero-terminated list. – supercat Apr 20 '18 at 01:12
  • I think it's no accident that bit 1 is used in the TIA registers $1F, $1E and $1D to enable the balls and the missiles. This is the same bit as the Zero flag in the 6507 status register. So a very simple technique to handle single or double-line missiles and balls is to load up A, X and Y with the vertical position of the missiles and ball and set a zero page line counter to 0. Each kernel line can do "INC line; LDS #$1D; CMP line; PHP; CPX line; PHP; CPY line; PHP". I think I saw this in Combat but can't remember. Not super sophisticated but clearly the designers had it in mind. – George Phillips Apr 20 '18 at 08:31
  • @GeorgePhillips Not sure if it was on purpose, but yes, that's another one of the Stack = ZP = Registers tricks. – Raffzahn Apr 20 '18 at 09:08
  • @GeorgePhillips: The game Combat, which was programmed by the same Joe Decuir that invented the TIA, exploits that. The design is clever, but having to use the X register to load the stack pointer means that the actual benefit is rather limited. – supercat Apr 20 '18 at 18:46
  • @Tommy: I wonder if there's any particular reason the 6502 didn't put a vector at $FFF8? Using the same vector for both IRQ and BRK greatly increases the cost of IRQ handling in any code that needs to handle both. – supercat Apr 24 '18 at 15:43
  • @supercat It just saves the extra circuitry to generate a different vector. Internally all interrupts are BRKs. Clearing the B-bit instead comes for (almost) free, as it's just the raised interrupt flag input. – Raffzahn Apr 24 '18 at 19:07
  • @Raffzahn: The processor has to keep track of what kind of interrupt it's doing in any case. If every opcode with a certain six bits of the opcode clear was a "transfer to interrupt vector" instruction, then IRQ, NMI, and Reset could jam one of those into the instruction register, while leaving open a software-interrupt instruction. – supercat Apr 24 '18 at 19:18
  • @supercat Jup, and with a little more it could have been a 16 Bit processor with virtual memory, right? There is no gain in complaining about better design decisions later on. – Raffzahn Apr 24 '18 at 19:54
  • @Raffzahn: My point was that since the CPU has to keep track of what is causing an interrupt in any case, and since it needs to feed two bits to the vectoring logic, having four different kinds of interrupt yield different combinations of those two bits would seem like a natural way of doing things, and I find it curious that things weren't done that way. – supercat Apr 24 '18 at 20:12