Why did instruction sets since the late 1970s seemingly stop including an "execute" instruction?

Question

Many mainframe instruction set architectures (ISAs) in the 1960s included an Execute instruction, which would treat data as an instruction.

I haven't found an architecture designed after 1976 which includes such an instruction. The Wikipedia article (which I wrote) speculates that this is because the Execute instruction interferes with various CPU optimizations like pipelining, but provides no source. Note, though, that 1976 predates RISC architectures.

So:

Were there any ISA designs after 1976 that included Execute instructions?
Is there any published discussion of the design decision to not include them?

The Wikipedia article has many footnotes, but none resolve these questions.

If a processor has a "load data with post-increment" instruction, the simplest way of handling instruction fetching may be to have an instruction that performs "load instruction register using PC with post-increment", and to arrange for the instruction register to have that instruction's bit pattern forced into it after executing any other instruction. This would avoid the need for dedicated circuitry to increment the program counter. If one wants to allow an instruction to be fetched while the previous instruction executes, however, one will need dedicated circuitry... – supercat, Aug 17 '21 at 16:52
...to handle the instruction fetching, and supporting an execute instruction would make it necessary to support two entirely different means of loading the instruction register. – supercat, Aug 17 '21 at 16:54
Isn't it a tiny bit misleading to cram these very different instructions into one entry and try to describe them all at once? Also, a few points are worded less than perfectly. For example, instructions must be halfword aligned on a 360 in all cases, whether executed in line or not. And why call it 'emulating self modifying' when it's about assigning variable parameters? (And BTW, if you wrote the article, then how on earth should it contain footnotes helping you find information?) – Raffzahn, Aug 17 '21 at 17:17
@Raffzahn What are the "very different instructions" you're referring to? – Stavros Macrakis, Aug 17 '21 at 18:42
@Raffzahn Re article footnotes -- I didn't want people wasting time reading the cited articles, since I've already read them. – Stavros Macrakis, Aug 17 '21 at 18:43
@StavrosMacrakis Footnote: Isn't that a given? Instructions: I'm talking about the vastly different execute instructions of the mentioned machines. They have different intentions and different usage - just compare your knowledge about usage on the PDP-10 with the missing knowledge about a rather different situation on a /360. (Also, please try to combine your comments. It's also possible to edit a comment to add a sentence or two.) – Raffzahn, Aug 17 '21 at 20:01
While not in any sense an answer, it's interesting that in some architectures, trap dispatching has the nature of an execute instruction: the trap forces the execution of an instruction stored at some known address. I believe UUO dispatch in PDP-10 CPUs is like this. – dave, Aug 18 '21 at 01:32
@Raffzahn re "please try to combine your comments" -- I'm new to StackExchange and haven't figured out the conventions. In the absence of comment threading, I've tried to keep to one topic per comment. It would seem strange to me to (e.g.) start discussing the 360 EX instruction in this comment. – Stavros Macrakis, Aug 18 '21 at 19:09
@StavrosMacrakis Of course it's not about mixing comments from different locations. But you have put up sequential comments here within the same minute. Same at the answer. As you already noted, the RC.SE system is not a threaded system, as it is not meant to support discussions. That's what forums are for. SE is about asking and answering. Comments are meant to hold remarks intended to improve an answer and directed to have the related question/answer edited for improvement. After all, comments can and will be deleted (by moderation) at any time. They should not hold any long term information. – Raffzahn, Aug 18 '21 at 20:43
Arguably the PDP-6/10 get XCT "for free" since it's already there in the interrupt mechanism. I don't know if similar reasoning can be applied to other computers. Presumably this style of handling interrupts was not carried forward to future architectures, so that's one less reason to have an "execute". EDIT: What @another-dave said. – Lars Brinkhoff, Aug 19 '21 at 18:21
@Raffzahn Re "But you have put up sequential comments here within the same minute." -- That is because ENTER ended the comment, and SHIFT-ENTER was ignored.
I am surprised that comments are not supposed to contain long-term information. I've often learned a great deal from comments. – Stavros Macrakis, Aug 19 '21 at 21:10
@StavrosMacrakis - StackExchange has a bible and we who find discussion useful are but sinners. – dave, Aug 19 '21 at 22:30
@StavrosMacrakis Yeah, that is something one simply has to learn. Hitting RETURN by accident means clicking edit next :) Regarding deleted comments, you may have noticed the 'Comments are not for discussion' comments, haven't you? Well, that's usually when the comments have been deleted. And yes, they may be useful, but don't you think having relevant information worked into answers is even more useful? It's one of the advantages of the SE format over some discussion board that Answers (and Questions) are edited for content to improve them, sometimes many years later. – Raffzahn, Aug 19 '21 at 22:31
@another-dave Well, I confess being a sinner at heart - striving to be better, but can't help being one. – Raffzahn, Aug 19 '21 at 22:32
Since this is retrocomputing, I should perhaps mention that a similar remark was made by Barry Mailloux to the Algol 68-R implementors, apropos of deviations from the original Report: "It is a question of morality. We have a Bible and you are sinning!" – dave, Aug 19 '21 at 22:46
The TMS-9900 came out in 1976 and several of its derivatives much later. It had the X instruction, which would execute the instruction whose opcode is in the given register. Not a difficult thing for the CPU, as the registers were placed in memory. – Patrick Schlüter, Oct 11 '21 at 07:37

score 13 · Answer 1 · answered Oct 10 '21 at 21:27

The HP-3000, first introduced in 1972, was a 16-bit stack-based architecture that included an XEQ instruction. XEQ would treat a word on the stack (anywhere from TOS to 7 words below it, as selected in the instruction) as a regular instruction and execute it.

This was utilized in some calling conventions where you needed to execute a different version of the EXIT procedure return instruction based on how the function was called. The instruction EXIT n would return and pop n words off the top of the stack in the process, and the programmer would do something like (in SPL):

     TOS := %031400 + N; << Create appropriate EXIT instruction >>
     ASSEMBLE(XEQ 0);    << Return from function >>
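
The arithmetic in that SPL fragment can be mimicked in a few lines of Python. This is only a sketch of the word construction, not an HP-3000 emulator: it reads %031400 as an octal constant, and it assumes (this is not stated above) that N occupies the low 8 bits of the word. The names `EXIT_BASE` and `make_exit_word` are made up for illustration.

```python
EXIT_BASE = 0o031400  # the %031400 constant from the SPL line, read as octal


def make_exit_word(n: int) -> int:
    """Return the 16-bit word that the SPL line pushes before XEQ runs it."""
    # Assumption: n sits in the low 8 bits of the word, so it must be 0..255.
    if not 0 <= n <= 0xFF:
        raise ValueError("n is assumed to fit in 8 bits")
    return EXIT_BASE + n


if __name__ == "__main__":
    # Show the word that would be left on the stack for a few values of N.
    for n in (0, 2, 5):
        print(f"EXIT {n}: %{make_exit_word(n):06o}")
```

The point of the idiom is that the number of words to pop is only known at run time, so the return instruction is built as data and then handed to XEQ.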

Patrick Schlüter · Answer 2 · 2021-10-13T07:02:59.033

score 9

TL;DR Too complicated for not much benefit.

You ask why not include an execute instruction. The reason is quite simple. Since around the time you observed the absence of this kind of instruction, CPUs have become more and more optimized in their memory access. Code access was separated from data access, as the access patterns are quite different.

Executing a data word as an instruction would require placing that datum in front of the decoder, which requires complicated bypass networks, buses and buffers. The instruction must then be inserted at the right place, but how many instructions have already entered the pipeline by the time the EXEC instruction executes? The CPU would need to cancel all the instructions already partially executed, and so on, which is extremely costly. It is simpler to just write the new instruction into memory and jump to that address. That requires no additional plumbing beyond what is already necessary for cache coherence.

For a TMS-9900, which already keeps its registers in normal memory, it is extremely simple to add the X instruction, which fetches the instruction contained in the given register (a simple memory fetch). For other CPUs it is more involved.
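
To make the "simple memory fetch" concrete, here is a minimal Python sketch of where X Rn gets its instruction word from. It uses the WP+2*n rule spelled out in the comments under this answer; the function names, the flat 64 KiB `bytearray` memory and the placeholder word 0x1234 are assumptions for illustration, not a model of the real chip.

```python
def register_address(wp: int, n: int) -> int:
    """Memory address of workspace register Rn, given the workspace pointer."""
    if not 0 <= n <= 15:
        raise ValueError("the workspace holds R0..R15")
    return (wp + 2 * n) & 0xFFFF  # registers are 16-bit words, 2 bytes apart


def read_word(memory: bytearray, address: int) -> int:
    """Read a 16-bit big-endian word from memory."""
    return int.from_bytes(memory[address:address + 2], "big")


def fetch_for_x(memory: bytearray, wp: int, n: int) -> int:
    """Word that X Rn would decode: same fetch path, address WP+2*n, not PC."""
    return read_word(memory, register_address(wp, n))


if __name__ == "__main__":
    memory = bytearray(0x10000)
    wp = 0x8300                              # arbitrary workspace location
    addr = register_address(wp, 5)
    memory[addr:addr + 2] = (0x1234).to_bytes(2, "big")  # placeholder word
    print(hex(addr), hex(fetch_for_x(memory, wp, 5)))
```

In this picture the only difference between a normal fetch and X is which address goes out on the bus, which is why the instruction costs so little on this design.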

edited Oct 13 '21 at 07:02

answered Oct 12 '21 at 16:32

Patrick Schlüter


I'm not sure I understand why it's easier on the TMS-9900, than on, say, the 6502 or the Z80 or whatever. Of course, variable-width instructions also come into play here as well. – Omar and Lorraine Oct 13 '21 at 08:50
The TMS-9900 has only 3 real CPU registers: PC (program counter), WP (workspace pointer) and a status register. The WP register holds the address in memory where registers R0 to R15 are stored. This means that the instruction X Rn, which executes the instruction in register n, fetches the instruction from memory at the address WP+2*n instead of PC. A Z80 or a 6502 would have to store the content of the register holding the instruction somewhere in memory and use a call/jsr/jump instruction. – Patrick Schlüter Oct 13 '21 at 14:06
The simplicity in the TMS-9900 comes from the very thing that made it a dead end: the registers live in RAM, which makes it quite slow (see the TI-99/4A, for example, where a simple ADD of 2 registers takes between 14 and 22 cycles). – Patrick Schlüter Oct 13 '21 at 14:06
Are you saying that an instruction fetch from memory is easier to do for hardware than move from general purpose register to an instruction register? – Omar and Lorraine Oct 13 '21 at 14:56
On the PDP-10, the registers were part of the address space and were executable. And yet, the XCT instruction was useful. ... .... Also, on the cheapest original models of the PDP-10, the registers were not part of the CPU, but fetched from core every time (!!). – Stavros Macrakis Oct 13 '21 at 17:07
@OmarL Yes, it is easier simply because it is what the CPU has to do in any case (i.e. fetch from memory). Moving from one internal register to another requires that these registers are somehow connected (via the internal bus or via a specific connection). At the least, the IR must be dual ported, which it generally does not need to be. In any case, it slightly complicates the circuit for not much gain. As said, old CPUs like the 6502 or Z80 can easily work with self-modifying code, and even the 8086/8088 had no problem doing that, even if the prefetch queue had to be dealt with. – Patrick Schlüter Oct 14 '21 at 06:44

Raffzahn · Answer 3 · 2021-10-10T22:23:45.013

score 7

I haven't found an architecture designed after 1976 which includes such an instruction.

TL;DR: Almost no later GP architecture has a use case for it.

The Long Story:

Because almost all general purpose (GP) architectures developed since then follow a rather simplified structure that holds all parameters that can/may be modified in registers (or similar structures) (*1). There is no need to create synthetic (modified) instructions if all parameters of an instruction can be set dynamically anyway.

When it comes to dynamic instruction adjustment, there are usually two main parameters that need to be adjusted: memory addresses and data length (*2).

Let's take, for example, a look at the instruction parameters of what is probably the most widely used architecture with an execute instruction (and one where its use was rather common), the IBM /360:

Memory addressing could easily be made dynamic, as each and every memory reference was register relative (base register + offset). Using an offset of zero essentially 'modifies' an instruction to use that register's content as the address.

Data length for string operations (character or BCD), on the other hand, was hard-coded into the second byte of such instructions.

Changing that, for example to move a variable-length input string or pack a variable-length number, was not possible. Sure, one could have used self-modifying code, but while it was clear to everyone that this is a really bad idea (*3), it is also simply impossible in a read-only setup. So code executed from ROM needed a way to vary the length field (*4).

The EX instruction was created to solve this by allowing an arbitrary instruction to be executed from program memory (*5) while at the same time ORing the lower 8 bits of a register with the second byte of the instruction to be executed.

For example, EXINST MVC 0(1,R4),0(R5) executed via EX R3,EXINST would essentially transfer as many bytes as R3 holds from the address pointed to by R5 to the one pointed to by R4.
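
The OR step can be modelled in a few lines of Python. This is a hedged sketch of the effect described above, not an emulator: the function names are made up, the instruction is represented by its six raw bytes (opcode 0xD2 for MVC, then the length byte, then the two base/displacement pairs), and it relies on two details not stated in this answer, namely that the MVC length byte holds the byte count minus one and that naming register 0 in EX suppresses the OR.

```python
def ex_effective_instruction(target: bytes, r1: int, r1_value: int) -> bytes:
    """Instruction that EX R1,target runs; the copy in storage is not changed."""
    if r1 == 0:
        return bytes(target)           # register 0 means "no modification"
    modified = bytearray(target)
    modified[1] |= r1_value & 0xFF     # only the low 8 bits of R1 take part
    return bytes(modified)


def mvc_byte_count(instruction: bytes) -> int:
    """Number of bytes an MVC moves: the length byte encodes count minus one."""
    if instruction[0] != 0xD2:
        raise ValueError("not an MVC instruction")
    return instruction[1] + 1


if __name__ == "__main__":
    exinst = bytes.fromhex("D20040005000")   # MVC 0(1,R4),0(R5)
    effective = ex_effective_instruction(exinst, r1=3, r1_value=9)
    print(effective.hex().upper(), mvc_byte_count(effective))
```

Under those assumptions a length of 1 assembles to a length byte of 0, so the low byte of R3 passes through the OR unchanged and the move covers one byte more than that value, which is why a program would typically put the length minus one in the register first.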

*1 - In fact, one could even note that most post-1970 ISAs are based on instructions so primitive that the only parts that may need modification are addresses, which are held in registers anyway, removing any need for synthetic instructions.

*2 - A third may be some constant (like the character in a CLI), but that's usually a rare case, as any constant can usually be replaced by a memory reference.

*3 - This was already clear in the 1960s, well before any considerations of caching or performance. Self-modifying code adds countless pitfalls in usability, especially in stateless and/or multitasking situations, calling for synchronisation and so on.

*4 - No, no one in his right mind wants to write a multi-instruction loop to handle strings, even less so if the machine is able to do this in a single instruction, letting the microcode perform at maximum speed.

*5 - No, not data memory. The instruction to be executed is not data, but an instruction that has to follow all the necessary rules for instructions, like being aligned to a halfword, being in instruction memory, being in an executable section, addressing only reachable memory and so on.

edited Oct 10 '21 at 22:23

answered Aug 17 '21 at 17:17

Raffzahn


What do you mean by GP architectures, and how does that relate? – knol Aug 17 '21 at 17:53
@knol GP -> General Purpose, in contrast to special purpose architectures like DSP or controllers. – Raffzahn Aug 17 '21 at 18:04
In your final example of EXINST the amount of data moved would only be the value in the last byte of the register and not 'transfer as many bytes as R3 holds' which would be 32 bits ... just the low order 8 bits. – Hogstrom Aug 17 '21 at 18:16
@Hogstrom As explained in the sentence right before it. For moving more than 256 bytes the /370 introduced MVCL, which in turn has all parameters in registers, including a fill character (Yup, I've done my share of /360 :)) – Raffzahn Aug 17 '21 at 18:29
On the PDP-10 (which I'm most familiar with), XCT was rarely used to execute dynamically constructed instructions -- which were not necessary, as it had index registers. The most common use was to support what we'd now call virtual methods which were normally very short. – Stavros Macrakis Aug 17 '21 at 18:35
For example, the process scheduler in ITS would XCT RR(P), where P is the location of the process information block and RR is the offset of the "runnable" instruction. If the instruction skipped, the process was runnable. So a runnable process would have SKIPA (skip always) in RR(P). A process which was waiting for some counter to go down to zero would have SKIPZ . A non-runnable process would have a NOP.
(How do you enter multi-paragraph comments?)
– Stavros Macrakis Aug 17 '21 at 18:39
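
The dispatch pattern in that comment can be illustrated with a loose Python analogy, in which each process block stores a "runnable" test and the scheduler simply runs whatever is stored there. This is not PDP-10 code: the three functions stand in for the skip-always, skip-when-the-counter-is-zero and no-op instructions mentioned above, and every name here (`Process`, `runnable_test`, `is_runnable`) is invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Process:
    name: str
    runnable_test: Callable[["Process"], bool]  # plays the role of the RR slot
    counter: int = 0


def skip_always(p: Process) -> bool:
    return True                  # stands in for the always-skipping instruction


def skip_if_counter_zero(p: Process) -> bool:
    return p.counter == 0        # skips only once the awaited counter hits zero


def never_skip(p: Process) -> bool:
    return False                 # stands in for the NOP of a blocked process


def is_runnable(p: Process) -> bool:
    """Analogue of XCT RR(P): execute the stored test, report whether it skipped."""
    return p.runnable_test(p)


if __name__ == "__main__":
    procs = [
        Process("ready", skip_always),
        Process("waiting", skip_if_counter_zero, counter=2),
        Process("stopped", never_skip),
    ]
    print([p.name for p in procs if is_runnable(p)])
```

The scheduler never inspects what kind of process it has; it only executes the one stored instruction, which is the sense in which the comment compares the idiom to virtual methods.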
@Raffzahn I don't know why you're mentioning string operations. Are you talking about the REPEAT instruction (https://en.wikipedia.org/wiki/Repeat_instruction) found in some ISAs? – Stavros Macrakis Aug 17 '21 at 18:47
@Raffzahn Re "not data memory" -- Please mention what ISA you're talking about. On most architectures with Execute instructions, the instruction to be executed does not have to be in Instruction memory. – Stavros Macrakis Aug 17 '21 at 18:49
@StavrosMacrakis There are no multi-paragraph comments. The example is, as stated, about the /360(ish) ISA. Here EX is a common, often-used instruction. As mentioned, it is usually used for length modification of character/BCD string instructions (MVC, CLC, ... PACK, UNPK, ED, AP, ...), which all carry one or two length indicators. You might really want to look into the /360, where EX was not only common, but which is also a mainframe architecture quite different from a rather micro-like PDP. – Raffzahn Aug 17 '21 at 19:54
ICL 1900 used OBEY for general parameter access. The subroutine call would be followed by a 1-word instruction for each actual-argument, which left the result in a register agreed by convention. The called routine would OBEY the relevant instruction to load/store the argument. I suppose thunk-style arguments were automatically possible. (Side note: OBEY, while authoritarian, is not as final as EXECUTE). – dave Aug 17 '21 at 20:48
@another-dave +2 for the side note :)) OBEY might not only be less final, but also way past what an EXecute instruction is. I would see it as a more complex linkage than, for example, the x86 ENTER/LEAVE/RET n sequence. – Raffzahn Aug 17 '21 at 20:54
@Raffzahn - Hm? 1900 OBEY just executes the instruction that is located at the effective address of the OBEY instruction, and then execution continues after the OBEY instruction. That's just like the execute instructions in other architectures (it's just the local British computer terminology). – dave Aug 17 '21 at 22:41
@another-dave Thanks for the info. Where can I find an ICL 1900 machine language reference? and info on subroutine linkage conventions? – Stavros Macrakis Aug 18 '21 at 19:25
Try manual TP4037 on this page. PLAN is the main assembler for the 1900 series. You'll need a DjVu viewer. Not sure where I read about the calling sequence, I'll look around. – dave Aug 18 '21 at 22:26
Page with 1900 calling sequence info – dave Aug 18 '21 at 22:34
@Raffzahn The PDP-10 is certainly not a "microcomputer", nor even a "minicomputer". I certainly agree that the 360 was a hugely important ISA, but it was not the only one. – Stavros Macrakis Aug 19 '21 at 21:08
@StavrosMacrakis Please do not read anything that has not been written. I never said 'microcomputer', but 'micro like' and it doesn't take much to see the similarities. – Raffzahn Aug 19 '21 at 22:25
@Raffzahn, I think your "micro" comment begs for an extended explanation. I might see what you're getting at but I'm curious what your take is. Of course, this comment section is not the right place... but I'm open to breaking the rules here. – Lars Brinkhoff Aug 20 '21 at 06:05
@LarsBrinkhoff :)) To me, it's the way an ISA is built. Maybe the most notable micro-like feature of the PDP-10 is the huge number of instructions (variants) needed to work around limitations. Think of all the left and right register variants, best visible with Hsd(m(t)), but coming up elsewhere as well. Heck, ever had a look at a byte pointer? And then there is I/O. IMHO a very basic issue for something being a mainframe is abstraction. This can be seen well when looking at the two most successful ones, /360 for commercial and CDC for scientific. (Half way through and the comment space is used up :)) – Raffzahn Aug 20 '21 at 14:11
