pdp11 grace stack, when does grace time end

Question

Some models (the 11/45, for example) have a stack grace area for use after a stack overflow (a Stack Limit Violation). Quote from the handbook:

When instructions cause a stack address to exceed (go lower than) a limit set by the programmable Stack Limit Register, a Stack Violation occurs. There is a Yellow Zone (grace area) of 16 words below the Stack Limit which provides a warning to the program so that corrective steps can be taken. Operations that cause a Yellow Zone Violation are completed, then a bus error trap is effected. The error trap, which itself uses the stack, executes without causing an additional violation, unless the stack has entered the Red Zone.

So this says that at some point it's OK for code to use those addresses (346-400), i.e. after an initial < 400 overflow. But at what point does it become illegal again? For example:

hit 376 -> trap 4
now in yellow zone
stack
stack
stackety
stack
all in YZ so no more trap 4
.... something happens 
la
la
la
...
r6 < 400 ->trap 4 again

What is the 'something happens'? An RTS? A priority level shift down?

EDIT: Answer from simh mailing list

The grace space is just to make sure that everything can be pushed onto the stack (PC, PS, FP regs, ...). The code that runs the trap is on its own: it has no stack, or it has to find somewhere else to save things.

My interpretation is that the CPU completes the current instruction and then throws the trap. So it can't trap partway through an instruction. — user253751, Jul 09 '20 at 18:05
my read was that the 16 words can be used by the stack overflow trap handler to do its work to try to recover from the stack overflow. — pm100, Jul 09 '20 at 18:59
I see you're on the simh list; you might ask there. Supnik will probably know, or can consult the ucode. — dave, Jul 10 '20 at 13:41

dave · Accepted Answer · 2020-07-10T02:36:22.727

I have difficulty imagining what could be done to rectify the program such that it can be resumed, except simply killing the offending thread so the program (operating system) as a whole can continue.

The stack boundary is a kernel-mode mechanism. Its intent, I believe, is to protect the interrupt vectors from corruption. Vector corruption is very bad; it means a wild jump to somewhere is probably going to happen at some point in the future.

As for recovery: this is the kernel. It probably hasn't any mechanism to abort a "thread" of execution and it probably has only a single kernel stack anyway. The systems I am used to had non-reentrant kernels (rescheduling took place only on exit from kernel mode) so one k-stack was all you needed.

You could conceivably forcibly empty the stack (reload SP with the stack bottom) and then exit (to user mode or the null loop), but you basically aborted kernel processing at some random point, so who knows what state the world is in. It's no more recoverable than most other trap 4s in k-mode.

I therefore suppose that the only way to recover from stack overflow is to completely reinitialize the kernel. Maybe you disable interrupts, reset the stack, and reload the core image from disk.

Remember that process control was a considerable part of the PDP-11 target base. If your system is so borked that it just got a stack violation, maybe the best way to avoid disaster is to restart ASAP. It's a lot cleaner than random jumps through corrupted interrupt vectors.

The specific question of when "it's ok to use the yellow zone" ends is a good one. I have no authoritative answer. I suspect it might be a consequence of reloading SP. But that's very hand-wavy.

P.S. You figured the yellow zone as 346-400. I make it 340 to 400. It's 16 words, or 32 bytes, or 40 in god's own radix.
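A quick check of that arithmetic, as a small Python sketch. The constant names are illustrative only, and the 2-bytes-per-word figure is the usual PDP-11 word size:

```python
# Yellow zone bounds for the hardwired stack limit of octal 400.
STACK_LIMIT = 0o400      # limit quoted in the handbook
YELLOW_WORDS = 16        # size of the grace area, in words
BYTES_PER_WORD = 2       # PDP-11 words are 16 bits

yellow_bytes = YELLOW_WORDS * BYTES_PER_WORD
yellow_bottom = STACK_LIMIT - yellow_bytes

# Word addresses inside the yellow zone, lowest first.
yellow_words = list(range(yellow_bottom, STACK_LIMIT, 2))

print(oct(yellow_bytes))      # 0o40
print(oct(yellow_bottom))     # 0o340
print(oct(yellow_words[-1]))  # 0o376
```

So the 16 words occupy 340 through 376, with 400 itself being the first address above the zone.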

I have an hypothesis, completely untested. Here it is:

The yellow zone is a spacewise construction. Note that the description says you only get a trap by a reference of the form -(SP) or @-(SP).

Therefore (I guess), you get a "yellow trap" on an instruction that actually crosses the limit; for a conventional push, like MOV R0,-(SP), it would be the transition from 400 to 376; for something like the useless MOV -(SP),-(SP) it would be a transition from 400 to 374. The cue is the before-value equaling the limit.

Once the SP is less than 400, it's ok to reference through it until it goes below 340, at which point you get the "red trap".

According to this hypothesis, if you get a yellow trap on MOV R0,-(SP), and the trap service routine immediately executes RTI, then you're still in the yellow zone.

An interesting experiment might be to transport yourself into the yellow zone without passing through the limit: MOV #370,SP; CLR -(SP). Trap or no trap?
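To make the two possible answers concrete, here is a toy Python model. It is only a sketch of the hypothesis above next to one alternative reading, not a description of what the hardware or microcode actually does. The function name `push` and the model names "crossing" and "level" are invented for illustration; the red-zone abort (which per the handbook also resets SP) is reduced to a return value.

```python
LIMIT = 0o400          # hardwired stack limit
RED = LIMIT - 0o40     # bottom of the 16-word yellow zone (octal 340)

def push(sp, decrements=1, model="crossing"):
    """Model one instruction that makes `decrements` -(SP) references.

    Returns (new_sp, trap), where trap is None, "yellow" or "red".
    model="crossing": yellow only when a decrement moves SP from at or
                      above LIMIT to below it (the hypothesis above).
    model="level":    yellow on any -(SP) reference below LIMIT
                      (an alternative reading, also an assumption).
    """
    trap = None
    for _ in range(decrements):
        before = sp
        sp -= 2
        if sp < RED:
            # Red zone: the instruction is aborted; SP reset is not modelled.
            return sp, "red"
        if model == "crossing":
            hit = before >= LIMIT and sp < LIMIT
        else:
            hit = sp < LIMIT
        if hit:
            trap = "yellow"   # instruction still completes, trap follows
    return sp, trap

# The proposed experiment: MOV #370,SP ; CLR -(SP)
for model in ("crossing", "level"):
    sp, trap = push(0o370, model=model)
    print(model, oct(sp), trap)
# crossing 0o366 None
# level 0o366 yellow
```

The two models agree on a push from 400 and disagree on the experiment, which is what would make the experiment informative on real hardware or in simh.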

Erik Eidt · Answer 2 · 2020-07-09T19:20:44.743

Operations that cause a Yellow Zone Violation are completed, then a bus error trap is effected.

When it says "Operations" it means the one instruction, and the plural there means it could be any one of many instructions that increase stack space.

2.3.3 Stack Register (with Memory Management option)
All PDP-11's have a Stack Overflow Boundary at location 400₈. The Kernel Stack Boundary, in the PDP-11/40 is a variable boundary set through the Stack Limit Register found at location 777774. Once the Kernel stack exceeds its boundary, the Processor will complete the current instruction and then trap to location 4 (Yellow or Warning Stack Violation). If, for some reason, the program persists beyond the 16-word limit, the processor will abort the offending instruction, set the stack pointer (R6) to 4 and trap to location 4 (Red or Fatal Stack Violation).

Stack Overflow Trap
Stack Overflow Trap is a processor trap through the vector at address 4. It is caused by referencing addresses below 400₈ through the processor stack pointer R6 (SP) in autodecrement or autodecrement deferred addressing. The instruction causing the overflow is completed before the trap is made.

There isn't a lot said but the trap handler is expected to rectify the situation. I believe that if it doesn't rectify the situation, and resumes the program, the program may get more traps.

I have difficulty imagining what could be done to rectify the program such that it can be resumed, except simply killing the offending thread so the program (operating system) as a whole can continue.

Some older systems have a region of address space that is shared between a heap that grows up from the bottom and a stack that grows down from the top. If a system has a hardware stack range trap, it may be useful to have it start with the stack limit set rather high in memory, and have traps check how much space remains between the heap high-water mark and the pointer, and expand the space available to the stack (reducing the space available for the heap) if so. An implementation that does that could then resume execution. – supercat Jul 09 '20 at 20:32
@supercat, thanks. I take it from the earlier PDP-11's where the stack limit is hard coded to 0400 that the stack is expected to be in low mem (not sure why they chose that but they're probably thinking of a fairly small stack). So, to use what you're saying you'd (re)locate the stack from low mem to high mem, and set the limit register appropriately high as well, then lower the stack limit each time the stack hits it, whenever possible. – Erik Eidt Jul 09 '20 at 21:59
If pushed items are stored at decreasing addresses, that would be the design. The classic Macintosh had stack and heap grow from opposite ends, and would check some top-of-heap "canaries" whenever it grew the heap, as well as in a 60Hz periodic interrupt, but couldn't reliably detect stack/heap collisions that occurred while processing any other interrupts in time to prevent code from accessing corrupted objects on the heap. A stack-bottom register such as you describe for the PDP-11 could have made things more robust. – supercat Jul 09 '20 at 22:05
Note that the stack boundary is only applicable in k-mode or when the MMU is off. Its purpose is to protect (some of) the interrupt vectors. It's not that the stack is "expected" to be in low mem, it's that that is where the valuables live. Overwrite vectors => wild jumps. I'm not sure why the hardwired limit is 400 and not 1000, since vectors are 0-777. Maybe that dates from older systems. Stack expansion was never a thing in any 11 OS I used; there's never a spare 4K page in your 32K (words) address space just hanging around in case you need more stack. – dave Jul 09 '20 at 22:59
For stack expansion, I think you would not use the stack boundary. Instead you'd have an 'expansion downward' page and an initial small size. On an MMU length violation you could then extend the stack (maybe physically relocating the page if necessary). This does not allow "heap at one end, stack at the other", but we're talking about a 16-bit address space here. – dave Jul 09 '20 at 23:17
@another-dave: The "heap at one end; stack at the other" is most useful on systems with a small address space. Using a guard page to trap stack overflow would require wasting an entire page of an already small address space. If there were a configurable function to trap when the stack pointer gets too low, one could allow the heap to reach all the way up to the low-water mark of the stack or allow the stack to reach all the way to the high-water mark of the heap. – supercat Jul 14 '20 at 16:47
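The "heap at one end, stack at the other" scheme discussed in these comments can be sketched roughly as follows. This is a Python toy under stated assumptions, not code from any PDP-11 OS: the `Arena` class, its method names, the `guard` gap and the `step` size are all hypothetical, and a real handler would run in the trap context with a programmable limit register.

```python
from dataclasses import dataclass

@dataclass
class Arena:
    heap_top: int       # heap high-water mark; the heap grows upward
    stack_limit: int    # trap threshold; the stack grows down toward it
    guard: int = 0o40   # hypothetical gap always kept between heap and stack

    def on_stack_trap(self, step: int = 0o100) -> bool:
        """Handle a stack-limit trap by lowering the limit by up to `step` bytes.

        Returns True if the limit moved (execution can resume), False if the
        heap is already as close as the guard allows.
        """
        floor = self.heap_top + self.guard
        new_limit = max(self.stack_limit - step, floor)
        if new_limit >= self.stack_limit:
            return False
        self.stack_limit = new_limit
        return True

    def grow_heap(self, nbytes: int) -> bool:
        """Grow the heap only while it stays below the current stack limit."""
        if self.heap_top + nbytes + self.guard > self.stack_limit:
            return False
        self.heap_top += nbytes
        return True

arena = Arena(heap_top=0o1000, stack_limit=0o2000)
print(arena.on_stack_trap(), oct(arena.stack_limit))  # True 0o1700
```

Each trap gives the stack a little more room at the heap's expense, and the handler reports failure once the two regions are within the guard gap of each other.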
