44

On microcomputers, the 8.3 file names famously laughed at (by Amiga and Mac users!) were used by PC/MS-DOS, which in turn inherited this limit from CP/M. But why did Gary Kildall use this scheme of eight plus three for CP/M?

  • Did Kildall impose the limit in accordance with a previous system?
  • Did he think that it was adequate?
  • Was it a technical limit?
Peter Mortensen
Krackout

3 Answers

50

It was because the fairly common 6.3 scheme of the time was considered too small :-)

Seriously: 8.3 wasn't a particularly onerous restriction at the time. Many PDP-11 operating systems had a 6 character name, 3 character extension (all upper-case) limit. The TOPS-10 system, on PDP-10, similarly had 6.3 as a limit. Some PDP-8 systems had a 6.2 limit.

These are technical limits:

  • PDP-11, 16-bit words: so 2 words name, 1 word extension (in RAD50)

  • PDP-10, 36-bit words: so 1 word name, a halfword extension (in SIXBIT)

  • PDP-8, 12-bit words: so 3 words name, 1 word extension (I suppose a 6-bit code)
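A minimal sketch of how the first two packings in the list above work, to make the word counts concrete. The character tables are written from memory of the usual RAD50 and SIXBIT definitions and should be treated as assumptions; the function names are hypothetical and not taken from any DEC software.

```python
# Sketch of the RAD50 and SIXBIT packings (character tables are assumptions).
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"  # 40 symbols; '%' stands for the unused code


def rad50_word(three_chars: str) -> int:
    """Pack exactly three characters into one 16-bit word (base 40)."""
    if len(three_chars) != 3:
        raise ValueError("RAD50 packs three characters per word")
    value = 0
    for ch in three_chars:
        value = value * 40 + RAD50_CHARS.index(ch)
    return value  # at most 39*1600 + 39*40 + 39 = 63999, which fits in 16 bits


def rad50_name(name: str, ext: str) -> list[int]:
    """6.3 name -> three 16-bit words (two for the name, one for the extension)."""
    padded = name.upper().ljust(6)[:6] + ext.upper().ljust(3)[:3]
    return [rad50_word(padded[i:i + 3]) for i in range(0, 9, 3)]


def sixbit_word(name: str) -> int:
    """Pack up to six characters into one 36-bit word (ASCII code minus 32)."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        code = ord(ch) - 32
        if not 0 <= code < 64:
            raise ValueError(f"{ch!r} is outside the SIXBIT range")
        word = (word << 6) | code
    return word
```

Because 40 cubed is 64000, three characters just fit under 65536, which is why a 6.3 name takes exactly three PDP-11 words. On the PDP-10, six 6-bit codes fill one 36-bit word, and the three-character extension fills a halfword.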

I list DEC systems since they are the ones I was familiar with, but other contemporary systems would have been similar.

Early UNIX on PDP-11 was a little more restricted: 8 char filenames, with no special provision for an extension (so any dot would have been part of the stored name, unlike DEC systems). See DIRECTORY (V) in the 1971 manual.

CP/M and DOS show a certain amount of DEC familiarity in their structures, in any case. Microcomputer OS implementors with experience on other systems would have applied that experience to their own design.

Going back a little further, CTSS (on IBM 7090/7094, possibly the first interactive system) named files with two "names", which you can regard as equivalent to name and extension. Each was 6 characters, or one 36-bit word.

In general, many early system designers opted for fixed-length structures. They are easier to deal with, and since storage was not plentiful, smaller fixed-length structures were preferred. Just enough to get the job done was sufficient.

So, I can't say exactly why 8.3 and not some other small size, but small was certainly adequate by most contemporary standards. A name didn't need to be written out as an English sentence; it was simply required to be reasonably mnemonic. Given, for example, that all RSX-11M drivers had names like xxDRV.SYS, it was perfectly obvious what each driver was for. We didn't even need to use the full 6 characters :-)

cjs
dave
  • I wonder if contemporary users felt that 6.3 or 8.3 file names were "reasonably mnemonic". That was not my opinion of the names of routines in the NAG Numerical Library. (They were 6 characters max.) – Rosie F Feb 18 '22 at 17:10
  • 1
    @RosieF - Well, as a user, yes I did. And ditto the 6-character symbols in MACRO-11. – dave Feb 18 '22 at 17:32
  • 1
    @RosieF Sure, it was 6 (or in my case 8) characters. That's a lot. I did several decades of assembly programming with a max label size of 8 characters. When that got lifted, I could not see any advantage of longer names - besides them being harder to memorize and distinguish between. – Raffzahn Feb 18 '22 at 17:41
  • 7
    I was horrified when VMS, after years of system services named like $DCLAST (declare asynchronous system trap, obvs) started adding names like $CLEAR_SYSTEM_EVENT. Kids today, huh? – dave Feb 18 '22 at 20:33
  • Probably worth mentioning somewhere that the restricted character choice allows packing into 6 bits each (it's fairly obvious, but...). Incidentally, that's the reason that C language identifiers can only contain English letters, digits and _, and are only guaranteed distinct if they differ somewhere in the first six characters - to the linker, each external symbol was stored in a single machine word. – Toby Speight Feb 19 '22 at 18:05
  • @TobySpeight - but which linker on which system? One machine word sounds like a 36-bit system, which was not the first target. Was the 6-char limit present in ur-C? The peculiar treatment of external variables came from Fortran common semantics, which I think I recall was also a limitation of the early target linker. – dave Feb 19 '22 at 19:08
  • 1
    Someone should write up another question for this particular tangent! – dave Feb 19 '22 at 19:09
  • Sorry I don't know the details, just repeating something I heard either here or over on [so]. – Toby Speight Feb 19 '22 at 20:27
  • Things were sometimes even smaller in earlier days. Filenames on the IBM 1130 were limited to five characters. That's why the FORTH language isn't called FOURTH: it was first implemented on the 1130. The name encoding packed five six bit characters (EBCDIC with the two MSBs stripped) into a 32 bit doubleword. The extra two bits identified the file type. There were three file types: linkable object, executable, and data. – John Doty Feb 20 '22 at 23:54
34

[Part of what is described here can be found on Herb Johnson's great site about CP/M history, the other is experience of 30+ years in mainframe procedure]

CP/M was, as Dave explains, heavily influenced by DEC's TOPS-10 operating system, one Kildall was quite familiar with. Its 6.3 structure nicely fits the 36-bit nature of the PDP-10 (as well as other DEC machines).

But to get to CP/M, the story took several intervening turns.

Kildall wrote 8008 emulators and a PL/M compiler for Intel (later used in the upcoming ISIS operating system). The compiler was developed on a PDP-10 in Fortran under TOPS-10 - Intel used PDP-10 before the MDS-800 (the first 8080-based development system, running ISIS) was ready.

At that time, Kildall had already started a project to make PL/M a self-hosted system on an 8008 and later 8080. He tried to sell this to Intel, but Intel refused, as they believed then that all development would stay on minicomputer systems (notably the PDP-10). Systems like the Intellec 8 were only intended for debugging and as a flexible environment for prototypes.

This soon changed, and Intel opted to develop the Multibus and the MDS-800 as a stand-alone 8080-based development system, with ISIS as its designated operating system. However, Intel chose to develop it in-house and did not take up Kildall's offer.

Being, for the most part, a down-port of the tools used on the PDP-10, it came naturally that ISIS used many of the conventions of TOPS-10, including a file system with a 6.3 name convention.

Kildall in turn acquired one of the first MDS-800 systems (running ISIS) to finish his CP/M project for generic 8080 machines. The very first known version of CP/M has a BIOS fitting the MDS-800.

While proven to work fine in many DEC installations, the 6.3 scheme had a huge shortcoming when transferring/accessing IBM files, as IBM used a basic 8-character file name for tape files and in turn also for (floppy) disk (*1). Transferring an IBM file to TOPS-10 more often than not ended up crippling the name - especially if the extension had to be spared for TOPS-10 usage. It's easy to imagine how this resulted in endless problems when exchanging more than one file at a time.

Increasing the file name size to 8.3 added legroom to reduce such issues.

Equally importantly, the CP/M catalogue entry simply had room to do so. One catalogue entry is 32 bytes; of these, the first 16 are used to store the metadata, while the second 16 store up to 16 block numbers the file uses.

3 bytes were needed for organisation, which would leave 13 for a file name. Instead of using all, two were kept as reserved (*2), resulting in the 8.3 structure we know (*3).

So in the end it's simply what was possible without spending everything.
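The byte budget described above can be sketched as a parser for one 32-byte entry. The field order and names (status, extent, record count) are assumptions based on the commonly documented CP/M layout, not something stated in this answer, and `parse_entry` is a hypothetical helper.

```python
import struct

# Assumed layout of one 32-byte CP/M directory entry (names are illustrative):
#   status (1) | name (8) | type (3) | extent (1) | reserved (2) | record count (1)
#   followed by 16 one-byte block numbers
ENTRY = struct.Struct("<B8s3sB2sB16s")


def parse_entry(raw: bytes) -> dict:
    """Split a raw 32-byte entry into its metadata fields and block list."""
    status, name, ftype, extent, reserved, records, blocks = ENTRY.unpack(raw)

    def clean(field: bytes) -> str:
        # Characters are 7-bit, so mask off the high bit before decoding.
        return bytes(c & 0x7F for c in field).decode("ascii").rstrip()

    return {
        "status": status,
        "name": clean(name),
        "type": clean(ftype),
        "extent": extent,
        "reserved": reserved,
        "records": records,
        "blocks": [b for b in blocks if b],  # assumed: 0 marks an unused slot
    }
```

Counting the metadata half: 1 + 1 + 1 organisational bytes, 2 reserved bytes, and 8 + 3 bytes of name add up to exactly 16.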


*1 - Well, no, it's not 8, but 17 in a volume label, or 44 in a catalog entry, which are again made up of one or more groups of 1..8 characters joined by a '.', giving something we would today call sub-directories - except they weren't really ones. Confusing? Well, not really, but an explanation would need quite some room to describe each environment. In the end and for all practical purposes, exchange files were usually kept to 8 character names.

*2 - Quite a good idea as one of these was later (CP/M 2.x) used to allow files larger than 512 KiB.

*3 - It's maybe important to keep in mind that these were NOT 8-bit characters, but 7-bit. This fact was used in CP/M 2.x to store flags (Read Only and Hidden) in the high bits of the file type, and in CP/M 3.x to add a backup bit, while the second 'reserved' byte now held an 'incomplete record' counter, which finally allowed an exact file length to be managed. Before that, files were always multiples of 128 bytes and CTRL-Z was used as an EOF marker ... if possible.
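A small sketch of how flags can ride in the spare high bits of the three file type bytes. Which byte carries which flag is an assumption here (the footnote only names the flags), and `split_type_flags` is a hypothetical helper.

```python
def split_type_flags(ftype: bytes) -> tuple[str, dict]:
    """Separate the 7-bit extension text from the flags in the high bits."""
    if len(ftype) != 3:
        raise ValueError("the file type field is exactly three bytes")
    # The low 7 bits of each byte are the visible character.
    text = bytes(b & 0x7F for b in ftype).decode("ascii").rstrip()
    flags = {
        "read_only": bool(ftype[0] & 0x80),  # assumed: first type byte
        "hidden": bool(ftype[1] & 0x80),     # assumed: second type byte
        "backup": bool(ftype[2] & 0x80),     # assumed: third type byte (CP/M 3.x)
    }
    return text, flags
```

For example, a type field whose first byte is `'C' | 0x80` still reads as "COM" once masked, while reporting the file as read-only.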

Toby Speight
Raffzahn
  • 2
    I agree - it was seen as desirable to increase it and the space was available in a CP/M file system record. – davidbak Feb 18 '22 at 02:32
  • 1
    If you were coming from a DEC system to a CP/M system, things felt pretty familiar. It wasn't just files with extensions. In particular, PIP (the Peripheral Interchange Program) worked nearly the same way: https://en.wikipedia.org/wiki/Peripheral_Interchange_Program. – Flydog57 Feb 18 '22 at 23:21
  • Another likely influence: Kildall used IBM CP/CMS (the predecessor of VM/CMS) at Naval Postgraduate School; it uses 8 character max file names with an 8 character “file type” (essentially an extension, but with a space instead of a dot). He could have decided, based on that experience, that 8 character file names were better than 6; OTOH, 8 characters for the extension may have seemed overly wasteful. Also, your comment about IBM mainframe 44 character names should note that’s true for OSes that use the VTOC file system, like MVS and VSE, but not for others that don’t, e.g. the CMS minidisk file system – Simon Kissane Feb 16 '23 at 07:34
5

If you think of memory restrictions, small file names make more sense. You would want a directory structure allocating maybe 32 bytes to each directory entry. Those bytes would include the file name, a pointer to the file allocation block, attributes, date/time, etc. As Another-Dave pointed out, the genesis of DOS was CP/M, which was preceded by even earlier systems. Those original systems were sometimes lucky to have 8 kwords (16 kbytes) of RAM total. There was no room to waste on long file names.

RichF
  • 5
    Also, the number of files would be smaller so fewer characters would be needed to differentiate them, the same with extensions as there were fewer file types. – Tim Locke Feb 18 '22 at 00:26
  • 1
    CP/M could run in 16 Kb. That does not explain the 8+3 limit though, as there are spare bytes both in the File Control Block and page zero (even with two FCB's), and in the directory blocks (4 to a 128 byte sector). – Thorbjørn Ravn Andersen Feb 18 '22 at 16:28
  • 1
    @ThorbjørnRavnAndersen: The "spare" zeros appear to come from the multi-user subsystem in CP/M; under DOS, you were user 0. – Joshua Feb 18 '22 at 22:35
  • @TimLocke: That's also why microcomputer file systems tended not to have a hierarchical directory/folder structure. If you had enough files to require an “organization” system, you'd just move some onto a second floppy disk. – dan04 Feb 18 '22 at 22:55
  • @Joshua if I recall correctly the user was just a single 4 bit value. – Thorbjørn Ravn Andersen Feb 18 '22 at 23:52
  • @ThorbjørnRavnAndersen: It is, but several more bytes were used for access control and password check bits and password checksum. DOS just stored them all as zero (bit set was deny/password check). – Joshua Feb 19 '22 at 01:56