
When programming was done in machine code, there was no need for object modules; the idea just wouldn't have made sense. People would write the whole program they needed to run in "zeroes and ones", put it on a storage medium, and feed it into the computer.

It is natural to assume that when assembly languages started to appear, their initial purpose was simply to let programmers write machine code using opcode mnemonics and symbolic addresses, preserving the familiar development process.

The notion of separate compilation, which would require linking, allow libraries of pre-compiled modules, and necessitate a binary format combining relocatable code with a table of external symbols, would likely have come somewhat later, but, one would think, sometime between the first assembler (SOAP for the IBM 650, 1955) and the prominently mentioned IBM System/360 linkage editor (1964).

Unfortunately, none of the linked wiki pages focuses on that aspect of the history. When was the first relocatable object file format invented, then, and by what company, if not IBM?

Leo B.
  • AFAIK, Grace Hopper invented the linking loader in 1951 for the Univac, though the idea was very much entwined with the idea of a compiler, using a more natural way to express programs. And the necessity of relocation depends a lot on the underlying CPU: an ISA featuring relative addressing doesn't need it, and neither does an ISA geared toward high-level languages like the Burroughs B5000 (1961). – dirkt Mar 31 '17 at 04:52
  • @dirkt Thank you, but I think that's giving her too much credit. A program specified as a sequence of subroutines and arguments looks more like threaded code to me. I don't see why the article compares a converter of that code to machine code with a loader or a linker. No mention of object files either. "Relocatable" doesn't mean "needing relocation information"; it means "capable of being relocated". If it comes naturally with the ISA, all the better. – Leo B. Mar 31 '17 at 06:39
  • Even if you enter your program into the computer directly as binary, there would still be a notion of "loaders" and "modules". Entering a program via console switches is time-consuming, which means you will want to use a "loader" to load/save sections of memory to cards, tapes, or disks. Once you have a loader, you will quickly identify common routines that you will want to load separately from your program (e.g. library code), which introduces the idea of "modules". It then becomes natural to want your libraries to be relocatable (which requires a more sophisticated "linking loader"). – Ken Gober Mar 31 '17 at 13:20

1 Answer


Grace Hopper invented a kind of linking loader in 1951 for the Univac, as part of the A-0 "compiler" (not a compiler as we understand the term today).

One must keep in mind that memory was extremely limited on those very early computers, so what today would be a single object module with many routines was then the equivalent of handling one subroutine at a time. Also, the linked program couldn't be held in memory (again because there was not enough of it), so it would be written to tape in a first pass, and then that tape would be loaded and executed in a second pass.

In the paper Automatic Coding for Digital Computers, Grace Hopper describes it this way:

Compiling routines [i.e., a linker/loader in modern terms], which also read a pseudo-code, but which withdraw subroutines from a library and operate upon them, finally linking the pieces together to deliver, as output, a complete specific program for future running: [For example] UNIVAC A-compilers, BIOR, and the NYU Compiler System.

So just like a relocatable object module loader would operate on an executable, read in pre-compiled modules from a library, resolve relocations, and load the whole thing into memory, Grace Hopper's compiling routines would take a pre-compiled subroutine in a specific format, make changes to it (e.g. substitute fixed parameters for formal arguments where appropriate, and probably also adjust "call/return" jumps, similar to relocation), and then "compile" (arrange) the modified subroutines together with the main program.
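
In modern terms, the heart of that operation is the relocation fixup: copy a module assembled as if it started at address zero to wherever it actually lands, then adjust every word holding an absolute address by the load base. Here is a minimal sketch in C of that idea; the module format and all names are invented for illustration, and are not A-0's (or any historical) format:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical relocatable module: code words assembled as if
     * loaded at address 0, plus a list of offsets of the words that
     * contain absolute addresses and therefore need adjusting. */
    struct module {
        const uint16_t *code;
        size_t          code_len;
        const size_t   *fixups;
        size_t          fixup_count;
    };

    /* Copy the module to its final location, then patch every
     * address word by the load base -- the essence of what a
     * linking loader does. */
    void load_and_relocate(uint16_t *memory, size_t base,
                           const struct module *m)
    {
        for (size_t i = 0; i < m->code_len; i++)
            memory[base + i] = m->code[i];
        for (size_t i = 0; i < m->fixup_count; i++)
            memory[base + m->fixups[i]] += (uint16_t)base;
    }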

A similar idea seems to have appeared even a bit earlier, in David Wheeler's Programme Organization and Initial Orders for the EDSAC (1950), but I can't find an accessible version of that paper.

Edit

I wouldn't say that the development of the idea of a linker preceded symbolic coding. There's a fascinating collection of sessions presented at the MIT Summer Session 1954 called Advanced Coding Techniques. Among other things, it contains a bit more detail about the A-2 "linker" (p. 30), and various forms of "Automatic Coding Systems" (p. 55) like the IBM 701 Speedcoding System (a sort of interpreted, more complex ISA) by John Backus (p. 41).

But earlier on in the collection, it becomes quite clear that various forms of symbolic coding were already in widespread use, and practice was far removed from "zeroes and ones".

Page 7 (Whirlwind I was designed in 1947):

  • Mnemonic coding: it is convenient for humans to use suggestive letter pairs or triples to denote the operation sections of instructions, or to identify subroutines. Machines usually ultimately require a digital representation. This is a simple change of type 1b. Whirlwind I has always performed this change on the operation sections of instructions.

  • Identification of words: the control unit of a machine can operate on words only if it is given the absolute addresses. Humans find it more convenient when coding to use relative or symbolic addresses. The necessary change is of type 2 and can be performed by the machine, usually before execution of the routine.

  • Representation of numbers: these may originate as, say, .023, 47·10^-5, or 7/45. The conversion to the standard machine form can be performed by the machine, though the routine required to do this is often lengthy. Type 1b.

And page 21:

Stanley Gill described a technique used on the Illiac which makes it unnecessary to list constants separately and refer to them in the program; instead, the constants may be written directly in the instructions which use them; thus, instead of writing

ccf al
al, +123

one might write simply

ccf +123

And:

Donn Combelic gave, as an example of mnemonic coding, the way in which input and output editing is requested in the M.I.T. Comprehensive System. Three letters specify respectively the medium, whether input or output, and the notational form; this may be followed by a sample number.

There's also a webpage which claims that the "Initial Orders" system I mentioned above should be called the first assembler. That would have been 1951, way before SOAP.

Edit:

And here is Grace Hopper talking about how subroutines were "invented" by borrowing notebooks during her work on the Aiken Mark 1 in World War 2, and she already hints at the necessity of "relocating" them ("when you integrate a piece of code into another program, you frequently have to add addresses, and programmers can't add").

dirkt
  • The development of the idea was very gradual, then, and effectively pre-dates symbolic coding. That's interesting. – Leo B. Mar 31 '17 at 22:13
  • If "Initial Orders" is the first assembler, so close in time to the first subroutine library, then a theoretical question still remains: is it symbolic coding that brings about the idea of modularity, or an implementation of a linker using numerical references to subroutines provides groundwork for symbolic coding? – Leo B. Apr 01 '17 at 22:18
  • I really suggest you have a look at the MIT Summer Session paper. The impression I get from it is that a lot of ideas were floating around and developing gradually; there was never really a phase where people actually coded in "zeroes and ones" only, but always used some kind of (human-made) translation from a more human-friendly representation, and the question really was "how can we use the computer itself to make that easier?". – dirkt Apr 02 '17 at 06:26
  • I would think that in many cases instead of using a linker as such, it might be more useful to load each module at a fixed region of address space that could be used to hold variables once a program starts execution, and then have each module be a piece of code that would copy itself to a particular address while applying any necessary "fix-ups". Do you know if that was ever done? – supercat Feb 28 '20 at 20:00
  • @supercat: I really don't see why this would be useful, given a computer with very limited memory. In this case you want to remove any additional "housekeeping" code. Having a single linking loader makes removing this code after the work is done easy; having copying code inside each module makes that hard. So I wouldn't expect anyone to actually have used this technique. – dirkt Mar 03 '20 at 04:24
  • @dirkt: Most programs need to use a certain amount of RAM for data once they start up, which would imply that size of the code would be smaller than the total amount of RAM the program would need. If e.g. one would have a program with 30K of code that would need 16K of RAM available after loading, up to 16K of fixups could be included with the program and jettisoned after relocation is finished. – supercat Mar 03 '20 at 05:14
  • @supercat: We are talking on the order of 1K-4K words for those early computers, and tape storage for any intermediate results, not 16K. If you think it's a useful technique, it would certainly be interesting to see proof-of-concept code (and you'll also feel the pain if you try to actually implement it and use it for a program that does something useful). And I still think it won't fly. – dirkt Mar 03 '20 at 05:56
  • @dirkt: For whatever size, if the program is supposed to have any space left over for data once it starts, the same principle would apply. I'm not sure what sort of platform would be best to test such a concept. Maybe the 6502? – supercat Mar 03 '20 at 06:32
  • @dirkt: Upon further consideration, 6502 would be complicated by the fact that most patch-points would involve single bytes. I would think given something like a Z80, tape reader, and tape writer, it would be practical to design a build system that would allow a program for a language like Pascal to be built with a single tape pass, starting by loading the compiler/loader, writing the loader to the destination tape, then reading in a file containing all exports, and then reading all the source files. At the end, one would have a tape that was ready to load. If one had more tape drives... – supercat Mar 03 '20 at 16:18
  • ...and were willing to do more passes with the tape, one could handle slightly larger programs, but that would only be helpful if the programs were large relative to the amount of RAM they would need for zero-initialized or uninitialized data. Even if some programs would require using more passes, or it would be worth running more passes for programs that would be used frequently, I would think computer utilization could be maximized by using a single-pass approach for the jobs where it would work. – supercat Mar 03 '20 at 16:21
  • @dirkt: To minimize storage requirements while building, every time the compiler would write code that would require an external or forward reference, it would write the number of bytes since the last unavailable reference for the same segment and add an entry to its fixup-list. Then after writing out all the code (which the loader would be instructed to place at the lowest unused address), it would write out for each segment the relative offsets that needed to be filled into earlier instructions accessing that segment (this blob would load downward from the highest address). – supercat Mar 03 '20 at 17:12
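
The chained-fixup list supercat describes in the last few comments is essentially the classic single-pass back-patching technique: each unresolved reference records the distance back to the previous unresolved reference to the same symbol, so the chain is threaded through the emitted code itself and no separate fixup table has to be held in memory. A minimal sketch in C of that idea (all names, the sentinel, and the one-word link format are hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    #define CODE_SIZE    256
    #define END_OF_CHAIN 0xFFFF  /* link value marking the first reference */

    static uint16_t code[CODE_SIZE];
    static size_t   emit_pos;
    static size_t   chain_head = (size_t)-1;  /* most recent unresolved ref */

    /* Emit an operand whose address isn't known yet: store the distance
     * back to the previous unresolved reference in the operand word. */
    void emit_forward_ref(void)
    {
        code[emit_pos] = (chain_head == (size_t)-1)
                       ? END_OF_CHAIN
                       : (uint16_t)(emit_pos - chain_head);
        chain_head = emit_pos++;
    }

    /* Once the address is known, walk the chain backwards through the
     * code, replacing each link with the resolved address. */
    void resolve(uint16_t address)
    {
        size_t pos = chain_head;
        while (pos != (size_t)-1) {
            uint16_t link = code[pos];
            code[pos] = address;
            pos = (link == END_OF_CHAIN) ? (size_t)-1 : pos - link;
        }
        chain_head = (size_t)-1;
    }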