
Windows and MS-DOS use the control characters CR+LF (carriage return, ASCII 13, followed by line feed, ASCII 10) for new lines, while Unix uses just LF.
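For concreteness, here is a small, purely illustrative C sketch (not taken from either system) that dumps the bytes the same two lines occupy under each convention:

    #include <stdio.h>

    int main(void)
    {
        /* The same two lines, encoded under each convention. */
        const unsigned char dos_text[]  = "one\r\ntwo\r\n"; /* CR+LF = 0x0D 0x0A */
        const unsigned char unix_text[] = "one\ntwo\n";     /* LF only = 0x0A    */

        for (const unsigned char *p = dos_text; *p; p++)
            printf("%02X ", *p);
        putchar('\n');

        for (const unsigned char *p = unix_text; *p; p++)
            printf("%02X ", *p);
        putchar('\n');
        return 0;
    }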

As far as I know, CR+LF made sense for systems controlling a real teletypewriter, which has an actual carriage. LF alone may make sense for a teletypewriter with automatic carriage return, or simply as a simplification on systems which no longer need the physical interpretation of these characters.

Now I wonder why MS-DOS, being a rather recent OS, uses CR+LF, while Unix, which was one of the OSes actually operated from teletypewriters, uses only LF. It seems like it should be the other way around.

Thorbjørn Ravn Andersen
allo

4 Answers


This is covered largely in the history section of Wikipedia’s entry on newlines. Basically there are two primary lineages of operating systems leading to modern-day desktop usage: Windows on the one hand, and Unix-like systems on the other.

Windows descends from MS-DOS (because initially it was implemented on top of DOS), which itself inherits much of its behaviour from CP/M. CP/M inherited its line-endings from DEC systems, which used CR+LF because that’s the character sequence required to move the cursor to the start of the next line on ASR-33 teletypes (among others), which were common teletypes used with DEC systems.

On most teletypes, CR and LF do just what their names imply: carriage return returns the carriage (carrying the paper) to the right, so the hammers or type head are above the left of the page (or equivalently, it returns the type head to the left of the page, depending on which part of the assembly is mobile), and line feed feeds the page one line up.

The order was important: carriage return takes some time to execute, so starting it first meant that the line feed would happen in parallel to the carriage return, and by the time the line feed was processed, the carriage had a decent chance of having finished, so the next character could be processed safely (otherwise it ended up smeared across part of the page as the carriage finished flying back).
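To make the timing concern concrete, here is a hypothetical C sketch of the kind of padding an output routine of that era might add after a line ending; the write_newline helper and the pad count are invented for illustration and are not taken from any real driver:

    #include <string.h>
    #include <unistd.h>

    /* Send CR first (so the carriage starts moving as early as possible),
       then LF, then a few NUL fill characters: they print nothing, but the
       time they take on the wire gives the carriage a chance to arrive. */
    static void write_newline(int fd, int pad_nuls)
    {
        char buf[2 + 8] = "\r\n";       /* rest of the buffer is already NUL */

        if (pad_nuls < 0) pad_nuls = 0;
        if (pad_nuls > 8) pad_nuls = 8;
        (void)write(fd, buf, 2 + (size_t)pad_nuls);
    }

    int main(void)
    {
        (void)write(STDOUT_FILENO, "hello", 5);
        write_newline(STDOUT_FILENO, 4);   /* e.g. four fill characters */
        return 0;
    }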

Unix was inspired by Multics, whose developers chose LF as the line-ending character, relying on device drivers to translate that to whatever character sequence was required on actual devices. LF is defined as

New Line. Move carriage to left edge of next line.

and its developers wrote, in relation to control characters,

The objective of typewriter device independence also has some implications for control characters. The Multics strategy here is to choose a small subset of the possible control characters, give them precise meanings, and attempt to honor those meanings on every device, by interpretation if necessary.

(I recall discussions on this topic where the idea was floated that the Multics developers did this in order to save disk space, which seems incorrect given the above. Another possible consideration is that relying on the device driver to handle this meant that each driver could adjust the timing as necessary, without the system having to care about it — CR+LF in particular was chosen partly for timing reasons. That ended up not being sufficient, and Unix stty allows users to choose one of several CR durations.)
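That division of labour is still visible in the POSIX termios interface (which is what stty configures): the application writes a bare LF, and the driver can be asked to expand it to CR+LF and, where the XSI delay flags are still supported, to insert one of the selectable CR delays. A minimal sketch, assuming a POSIX system; CRDLY/CR2 may simply not exist on some platforms, hence the guard:

    #include <termios.h>
    #include <unistd.h>

    /* Ask the tty driver to post-process output and map NL to CR-NL,
       optionally with a carriage-return delay (stty: onlcr, cr0..cr3). */
    static int enable_output_translation(int fd)
    {
        struct termios t;

        if (tcgetattr(fd, &t) == -1)
            return -1;

        t.c_oflag |= OPOST | ONLCR;
    #if defined(CRDLY) && defined(CR2)
        t.c_oflag = (t.c_oflag & ~CRDLY) | CR2;
    #endif
        return tcsetattr(fd, TCSADRAIN, &t);
    }

    int main(void)
    {
        if (isatty(STDOUT_FILENO))
            return enable_output_translation(STDOUT_FILENO) == 0 ? 0 : 1;
        return 0;
    }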

Some systems used other conventions (see this table); many systems, including all of Apple’s computers before OS X, used a single CR, and obviously non-ASCII systems had their own line-ending characters (this includes IBM mainframes and 8-bit Atari home computers).
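A program that wants to read all of these ASCII-based conventions can simply treat CR, LF and CR+LF as equivalent line endings. A small illustrative C sketch (not taken from any particular system's source):

    #include <stdio.h>

    /* Copy input to output, normalising CR, LF and CR+LF to a single '\n'. */
    static void normalise(FILE *in, FILE *out)
    {
        int c, prev = 0;

        while ((c = getc(in)) != EOF) {
            if (c == '\r')
                putc('\n', out);
            else if (c == '\n') {
                if (prev != '\r')       /* the CR already produced a '\n' */
                    putc('\n', out);
            } else
                putc(c, out);
            prev = c;
        }
    }

    int main(void)
    {
        normalise(stdin, stdout);       /* e.g. ./normalise < any_file > unix_file */
        return 0;
    }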

Stephen Kitt
  • Maybe I'm just a punk kid, but the only printing terminals I ever saw attached to DEC systems were LA36 DecWriters. – Solomon Slow May 03 '18 at 20:25
  • Re, "...for timing reasons." Anybody remember seeing the smear on the paper where the type head on a Teletype machine struck the first character of the next line while the head was still in motion, "returning" from the previous line? – Solomon Slow May 03 '18 at 20:30
  • "Unix-style systems descend from Multics" - this is not true. Unix was inspired by Multics. Unix was actually shipped before Multics and was developed because Multics took too long to develop – slebetman May 04 '18 at 00:33
  • @jameslarge the LA-36 was introduced too late to influence the CR+LF/LF design decisions (1975). DEC routinely sold Teletype hardware, including the ASR-33, and featured them prominently in their brochures (at least until they started building their own teletypes). – Stephen Kitt May 04 '18 at 09:40
  • @slebetman - Unix is a philosophical descendant of Multics - it borrowed ideas, not code. – Michael Kohne May 04 '18 at 15:47
  • @jameslarge I remember when we replaced our ASR 33's with Decwriter LA-36's. I was really impressed at the speed of the LA-36. I remember telling a friend, "This thing is so fast, it can print FASTER THAN YOU CAN READ IT!" – Jay May 04 '18 at 21:35
  • @jameslarge I saw this, and when I did, I realized why that order was chosen. – gbarry May 05 '18 at 21:28
  • Some StackExchange people misread a Multics paper that I talked about at https://unix.stackexchange.com/questions/411811/why-does-linux-use-lf-as-the-newline-character/411830#comment738295_411830 where the paper actually explained that it was done for device independence. – JdeBP May 08 '18 at 22:26
  • As communication speeds increased, first to 300 baud then 1200 baud, the time taken for LF became not enough for the head to finish moving to the left. I remember one system that could be configured to insert extra NUL characters after the LF. And there was at least one model of Decwriter that had a small buffer that would hold characters until the head reached the left, then it would run at double speed until it caught up. – Mark Ransom Dec 23 '20 at 03:08
  • @MarkRansom I may be misremembering, but I think Multics tried to calculate the CR delay dynamically based on the carriage column. – Barmar May 11 '21 at 15:52
  • @Barmar that's cool! I never had the pleasure of using Multics. The system I'm remembering required you to set the number manually, nothing was calculated. – Mark Ransom May 11 '21 at 16:26
  • It is worth noting that the earliest CP/M systems used hardcopy terminals, so CRLF was the simplest solution; it avoided the need to write some LF=>CRLF translation code like you find in Unix tty drivers. CP/M was arguably just following the KISS principle, while Multics/Unix were adding a little bit of extra complexity to gain greater device independence. – Simon Kissane May 12 '23 at 01:42
  • I still wonder whether there would have been any difficulty with assigning codes for CR, LF, CR+LF, and carriage-delay padding, such that processing all four codes would merely require having the CR trip mechanism ignore one bit and the LF mechanism ignore another bit, and then having teletype equipment include "next line" and "carriage return" which would, in addition to sending a CR+LF or CR code, trip a mechanism that would automatically press the "send delay byte" key as soon as the first byte was sent. – supercat May 18 '23 at 17:28
  • @supercat, Re, "codes for...carriage-delay padding." You mean, codes sent to the printer? Sometimes we sent NUL bytes for that. Is that what you mean? Bytes that take up time during the transmission, but which have no effect on the hardware? – Solomon Slow May 18 '23 at 18:16
  • @SolomonSlow: When sending or reproducing a tape, depending upon the purpose for which it was punched, it may often be useful to eliminate null bytes. Bytes which are sent for purposes of delay, however, should generally be retained. If e.g. codes 4-7 had been used for the described purpose, a teleprinter could respond to any 7-bit code of the form 00001x1, where "x" is ignored, by resetting the carriage, and any code of the form 000011x by advancing paper. The code 0000100 would match neither pattern, while 0000111 would match both. – supercat May 18 '23 at 18:31

At the time the PC came out, there were at least five common approaches used by ASCII-based devices and systems:

  1. Devices receiving a CR would advance to the start of the next line, and lines were delineated with just a CR. An LF might behave identically, or might advance to the same spot on the next line, but it wouldn't usually matter because LF codes weren't used much. This approach allowed arbitrary binary graphic data to be included within files to be printed.

  2. Devices receiving an LF would advance to the start of the next line, while receipt of a CR would reset them to the start of the current line. Lines were delineated with LF; CR would generally only be used if necessary to overprint the current line. This approach allowed arbitrary binary graphic data to be included within files to be printed.

  3. Devices receiving a CR would reset to the start of the current line, and devices receiving an LF would either advance to the start of the next line or to the current position on the next line. Lines were delineated with CR+LF--a mode of behavior which was inherently compatible with equipment of types #2 or #3. This approach allowed arbitrary binary graphic data to be included within files to be printed.

  4. Lines were delineated with just CR, but devices of types #2 and #3 would be accommodated by replacing any instances of CR with CR+LF. This approach would be prone to malfunction when printing files containing binary graphics data.

  5. Lines were delineated with just LF, but devices of types #2 and #3 would be accommodated by replacing any instances of LF with CR+LF. This approach would be prone to malfunction when printing files containing binary graphics data.

Approach #4 was used by the Apple II among others; approach #5 was used by Unix. When the PC came out, however, many popular printers, including the Epson MX-80, were configurable to process CR and LF using approach #1 or #3, but not #2, and they also handled bitmap graphics with a command that would take a specified number of bytes as binary pixel data that needed to be sent verbatim even if it contained the bit patterns 00001101 or 00001010. The fact that printers would have problems with #2, #4, or #5 meant that if MS-DOS wanted to be suitable for use with such printers, it would need to adopt approach #1 or #3. Of those choices, #1 is slightly more efficient but #3 offers more efficient overprinting.
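To make the failure mode concrete, here is a hedged C sketch of the translation used in approach #5 applied to a stream containing an Epson-style bit-image command (ESC 'K' n1 n2 followed by n1 + 256*n2 bytes of raw pixel data); the pixel values are invented, and the point is only that a 0x0A inside the graphics data picks up an unwanted 0x0D:

    #include <stdio.h>

    int main(void)
    {
        const unsigned char job[] = {
            0x1B, 'K', 0x04, 0x00,      /* ESC K: next 4 bytes are pixel data   */
            0xFF, 0x0A, 0x81, 0x18,     /* this 0x0A is graphics, not a newline */
            '\n'                        /* the real end of line                 */
        };

        /* Naive text-mode output: expand every LF to CR+LF, corrupting
           the 0x0A that belongs to the graphics command. */
        for (size_t i = 0; i < sizeof job; i++) {
            if (job[i] == '\n')
                putchar('\r');
            putchar(job[i]);
        }
        return 0;
    }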

Ken Gober
supercat
  • This implies MS-DOS was designed around certain printers - any reference for that? Surely the MS-DOS design follows CP/M and so predates those printers – mmmmmm May 04 '18 at 22:12
  • @Mark: Almost any common printers which could handle ASCII would work usably when files delimited with CR+LF were sent to them verbatim. At absolute worst, they'd produce double-spaced output. Sending both CR+LF wasn't necessarily the most useful format, but it was the closest thing to a universal one. BTW, Adobe PostScript uses the closest thing to a universal receiver: treat either CR or LF as a newline, except that an LF will be ignored if preceded by a non-ignored CR, and a CR will be ignored if preceded by a non-ignored LF. – supercat May 04 '18 at 22:52
  • ... and Unix ended up using pretty much only PS to talk to printers (as the application-level printing language). That’s not a factor here of course since PS came later. The interesting angle in the DOS/Unix comparison wrt printers is that Unix was specifically built to support a typesetting system... – Stephen Kitt May 05 '18 at 13:16
  • @StephenKitt: In Unix, routing all printer output through a printing utility program made sense. The printing program could run concurrently with other programs, and could ensure that different users' print jobs didn't collide with each other. It does, however, limit the range of usable printer features to those the program knows about. When MS-DOS came out, the personal printer market was very much in a state of flux, and the only cheap and practical way for MS-DOS to treat the printer was as a pipe to which bytes are sent. – supercat May 05 '18 at 16:59
  • @supercat I agree, the small printer market was too much in a state of flux; I just thought the comparison was amusing. It’s also interesting that one of the first TSRs for DOS was a print spooler, PRINT. – Stephen Kitt May 05 '18 at 17:19
  • @StephenKitt: That is indeed interesting. It might have been possible for Microsoft to publish a simple spec for files containing graphic data that must be sent verbatim, and might also be able to allow a limited form of compression, thus allowing NL->CR+LF translation to be performed safely without corrupting 0x0A bytes that happen to occur within graphics data, but they never published such a format. – supercat May 05 '18 at 17:43

While the other answers put quite a good emphasis on the issues real hardware has with CR/LF or LF, the main point seems to be missing here:

While CR/LF is tied to hardware operation, Unix's LF is a logical line ending, much like EBCDIC's NL (X'15') on mainframes. Its only function is to inform the device driver of a line ending, leaving the exact handling to that driver, in keeping with the layered model Unix is based on.
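A tiny sketch of that abstraction: the byte values are as stated above (ASCII LF, CR+LF, EBCDIC NL = X'15'), while the little API around them is invented purely for illustration:

    #include <stdio.h>

    enum eol_style { EOL_LF, EOL_CRLF, EOL_EBCDIC_NL };

    /* The program signals "end of line"; only this layer decides the bytes. */
    static void end_line(FILE *out, enum eol_style style)
    {
        switch (style) {
        case EOL_LF:        fputc(0x0A, out);                   break;
        case EOL_CRLF:      fputc(0x0D, out); fputc(0x0A, out); break;
        case EOL_EBCDIC_NL: fputc(0x15, out);                   break;
        }
    }

    int main(void)
    {
        fputs("hello", stdout);
        end_line(stdout, EOL_LF);       /* what a Unix program would emit */
        return 0;
    }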

Raffzahn
  • A key difference between DOS and Unix with respect to layering is that software which is communicating with a printer *that is shared among many people who may have no idea who else is using it* must have a much deeper understanding of printer state than one which is communicating with a printer that is used by one person at a time, and can be manually "managed" (e.g. advancing to start of next page) when switching jobs. A program which acts as a "dumb pipe" will be simpler, but capable of doing a much wider range of things, than one which needs to "understand" the byte stream it's... – supercat Nov 08 '22 at 19:46
  • ...dealing with, but if one were to run such a program on two different print jobs, without manually adjusting the perforation alignment between them, the second job would likely end up mis-paginated. – supercat Nov 08 '22 at 19:47
  • That’s pretty much what I meant by “chose LF as the line-ending character, relying on device drivers to translate that to whatever character sequence was required on actual devices”, so I don’t think that point was missing, was it? – Stephen Kitt Nov 08 '22 at 22:31
  • @StephenKitt No, not missing, but maybe presented under value. The answer seems to focus quite a bit on CR vs. LF in relation to hardware behaviour, which is not wrong, but misses IMHO the basic point about LF not being a hardware control but a logical line ending - it just happened that they used the LF character. Using a logical line end is an active, intentional design decision to abstract behaviour from implementation, while 'leaving it to a driver' sounds to me more like laziness waiting for an afterwards hack to make it work. I guess it just hit a point I'm sensitive about. – Raffzahn Nov 08 '22 at 23:25

In the beginning, QDOS, a.k.a. MS-DOS, was trying to keep some CP/M compatibility (File Control Blocks, 8.3 names, ^Z handling, etc.), so CR+LF was used.

Polluks