In P.J. Plauger's "The Standard C Library" (1992), why are for loops used so frequently instead of while loops in the implementations?

Question

I think I'm getting into such a specific question that there may be no answer, but it seems curious to me. This is a retrocomputing question, I promise, see the last paragraph to see how.

In Plauger's 1992 book, "The Standard C Library", Plauger gives reference implementations that work but are not meant to be the world's most efficient implementation. I'm implementing my own libc for kicks so I'm using this book a lot to understand it. I am also not going to try for super efficiency, for the record, and the reason I'm using Plauger's book is because I'm interested in a simpler C library than the ones that exist today.

One thing that stands out to me is that he uses for loops quite a bit where, to me anyway, a while loop seems more natural. Here is an example: his implementation of strcpy:

char* (strcpy)(char* s1, const char* s2)
{
    char *s = s1;

    for (s = s1; (*s++ = *s2++) != '\0'; )
        ;

    return (s1);
}

My own implementation, made before I referenced his, was almost identical but used a while loop:

char* strcpy(char* s1, char*s2)
{
    char* su1 = s1;
    const char* su2 = s2;

    while(*su1++ = *su2++);
    return s1;
}

So here's the retrocomputing part: Plauger was writing in 1992. Is there a specific reason that compilers at the time did better with for loops rather than while loops, or is this just a personal preference of Plauger?

Well, your solution is presented in Harbison and Steele's third edition from 1991, so it was pretty well known. So I'd vote for personal preference. Certainly nothing in the language was an issue. — Jon Custer, Jun 22 '22 at 13:30
It’s even the approach used in the K&R book, so anyone who’d read that at the time would have known it at some point. — Stephen Kitt, Jun 22 '22 at 13:31
note that your prototype should remain char* strcpy(char* s1, const char*s2) or that will break some compilations when a constant pointer is passed as second argument, and you could skip the copy of the second argument completely then — Jean-François Fabre, Jun 22 '22 at 14:06
While-loop is a limited, special case of for-loop, that only exists for brevity/style/readability. — Brian H, Jun 22 '22 at 14:49
@BrianH Interesting. I would have thought that the for loop was built on the while loop instead — Michael Stachowsky, Jun 22 '22 at 15:16
Plauger's coding is sloppy. In the example, s1 is assigned to s twice: in the variable initialization, and in the for-loop initialization. — Leo B., Jun 22 '22 at 15:41
When running these 2 implementations through the Godbolt compiler explorer, I could see no significant difference in the output asm that would make any performance delta, at least for gcc 4.8.5 on Power architecture and gcc 4.5.4 on ARM. However, 6502 cc65 results in a bit shorter code with the while loop, do not know if it is quicker. — Glen Yates, Jun 22 '22 at 15:59
@GlenYates a fascinating question (well, probably not fascinating, but to me it is) is: which versions of the compilers listed in the book (Borland, "Project GNU", and VAX ULTRIX) was he using, and were there any differences for those versions? — Michael Stachowsky, Jun 22 '22 at 16:03
for(s = s1; (*s++ = *s2++)!= '\0';) would never pass a code review nowdays. Spend some more lines of code, make it readable. It will compile to the same thing. — Kingsley, Jun 23 '22 at 00:04
@Kingsley It annoys me to see sentiment like this. I am fairly sure it comes from a place where people don't want to learn how C for loops work, and only ever want to pattern match all code against for (int x = (…); x < (…); x++) so that they can pretend they understand what it does. — user3840170, Jun 23 '22 at 05:05
@user3840170 - For me it comes from reading thousands of lines of code every day. In these sorts of complex operator-precedence situations I have to stop, tease apart what's going on, decide if this is a source of a problem, then continue on to the next line. IMMHO code needs to be written for the least capable person who will read it, not the most. Expecting every person reading the code to be a language-lawyer (or stop and look it up) is (again IMMHO) bad programming. Extra lines of simpler code is much quicker to read & debug than less lines of more complex code. — Kingsley, Jun 23 '22 at 05:12
I am in the same situation as you -- writing a C library, and using Plauger as an important reference on fine print that would probably have escaped my attention otherwise. I have to second Leo's sentiment: The code presented in the book exhibits all kinds of questionable habits, and could IMHO have been presented in a more legible way. It is definitely not a book on clean coding practices. That being said, it could also have been much worse, and is at least correct AFAICT. — DevSolar, Jun 23 '22 at 08:01
Sidenote, your own strcpy implementation has some issues: 1) s2 should be declared const char *, not char *. 2) Since C99, both arguments should be declared restrict. 3) Declaring su2 is unnecessary, as s2 should already be const. 4) You, too, are assigning s twice. And as a personal matter of taste, I despise for / while / if not followed by {, and prefer making empty loops very much explicit, for clarity and robustness when code is added later (when a trailing semicolon might be overlooked). — DevSolar, Jun 23 '22 at 08:10
You can write any type of loop as a for loop. Thus, for many C programmers, for becomes the idiomatic way of doing it. I certainly find it much easier to standardize on for than having to think about all different kinds of loop structures/syntax. There are always multiple ways of writing the same code. Pick the one you find easiest to read and/or easiest to write without bugs. — Cody Gray - on strike, Jun 23 '22 at 08:15
This reminds me of something I read decades ago. I don't remember who wrote it, but it's one of the folks who were writing books about programming back in the days of Plauger and Steele. To the best of my memory, this is the quote: "As time goes on, computer time gets cheaper, and man-hours get more expensive." In other words, write easy-to-understand code, even if it uses more statements and instructions, because a short, complex construct meant to save a few CPU cycles will be more and more expensive to understand and debug as time goes on, while the run resources will continue to decrease. — HippoMan, Jun 23 '22 at 15:32
Way back when (early 80s, Z-80, Aztec C compiler), someone put this in one of our standard .h files: #define EVER ;;. As a result, we had a too-cute local (/in-house) standard of saying for (EVER) instead of while (true) — Flydog57, Jun 23 '22 at 16:09

score 23 · Accepted Answer · edited Jul 05 '22 at 08:07

One thing that stands out to me is that he uses for loops quite a bit where, to me anyway, a while loop seems more natural.

That part is already discussed in chapter 3.5 Loops While and For on p. 56 of the very first edition of The C Programming Language:

We have already encountered the while and for loops. In
while (expression)
    statement
the expression is evaluated. If it is non-zero, statement is executed and expression is re-evaluated. This cycle continues until expression becomes zero, at which point execution resumes after statement.

The for statement
for (expr1 ; expr2 ; expr3)
    statement
is equivalent to
expr1;
while (expr2) {
    statement
    expr3;
}
Grammatically, the three components of a for are expressions. Most commonly, expr1 and expr3 are assignments or function calls and expr2 is a relational expression.
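To make the equivalence K&R describe concrete, here is a minimal sketch in which the same summing loop is written once as a for and once as the expanded while. The function names sum_for and sum_while are made up for this illustration and appear in neither book.

```c
#include <stddef.h>

/* Hypothetical helper: sum the first n elements with a for loop. */
int sum_for(const int *a, size_t n)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];

    return total;
}

/* The same loop, rewritten following the expansion quoted above. */
int sum_while(const int *a, size_t n)
{
    int total = 0;
    size_t i;

    i = 0;                  /* expr1 */
    while (i < n) {         /* expr2 */
        total += a[i];      /* statement */
        i++;                /* expr3 */
    }

    return total;
}
```

Both functions visit the same elements in the same order. One caveat: the two forms are only equivalent as long as the body contains no continue, because in a for loop continue still runs expr3, while in the expanded while it would skip it.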

As to your question:

So here's the retrocomputing part: Plauger was writing in 1992. Is there a specific reason that compilers at the time did better with for loops rather than while loops, or is this just a personal preference of Plauger?

Yes, it's only a matter of style, as K&R note on p. 57 of their book:

Whether to use while or for is largely a matter of taste. For example, in
while ((c = getchar()) == ' ' || c == '\n' || c == '\t')
     ; /* skip white space characters */
there is no initialization or re-initialization, so the while seems most natural.

The for is clearly superior when there is a simple initialization and re-initialization(*1), since it keeps the loop control statements close together and visible at the top of the loop. This is most obvious in
for (i = 0; i < N; i++)
which is the C idiom for processing the first N elements of an array, the analog of the Fortran or PL/I DO loop.

So Plauger may simply have been used to the preference for for loops shown by the base material and the many, many examples derived from it.

Take Away #1: Code writing is always a matter of style preference.

The book goes on a bit about further savings in source lines, showing K&R's clear preference for writing tight source code, almost as readable as APL (but... see below). Of course there could be other points of view.

The discussion about the merits of either version is already there in the very first book about C, more than a decade before Plauger's writing.

Take Away #2: Reading the basics first is a great idea before switching to secondary literature :))

But there's also a 'but' part, concerning compiler shortfalls: some of these line-saving techniques were also hints for the compiler, or rather ways to make less 'intelligent' compilers generate the best code. A line like *s1++ = *s2++ can be turned into an assignment with auto-increment operations without much lookahead or reshuffling. Even better, on a CPU which sets the flags according to the last item moved, it spares an additional test for a trailing zero byte (*2).

So even the dumbest compiler, targeting a CPU with auto-increment and flags set by moves, will choose the best possible code - and that's what a PDP-11 was. It's one of the artefacts showing that while C may be seen as CPU independent, it was in all practical use constructed to best support a CPU with certain features.
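As a rough illustration of what the compact expression packs into one line, the sketch below puts it next to a version with every load, store and increment written out. The names copy_compact and copy_spelled_out are invented for this example; neither is Plauger's code.

```c
/* Compact form: copy, advance both pointers and test in one expression. */
char *copy_compact(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')
        ;

    return dst;
}

/* The same work spelled out one step at a time. */
char *copy_spelled_out(char *dst, const char *src)
{
    char *d = dst;
    char c;

    for (;;) {
        c = *src;       /* load one character */
        src++;          /* advance the source pointer */
        *d = c;         /* store the character */
        d++;            /* advance the destination pointer */
        if (c == '\0')  /* stop once the terminator has been copied */
            break;
    }

    return dst;
}
```

A simple compiler for a machine with auto-increment addressing can translate the compact form almost token by token, whereas the spelled-out form needs some analysis before the separate steps can be merged again. A modern optimizer would be expected to treat both alike.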

This kind of compiler support became useless, if not superfluous, the moment C was ported to different CPUs (*3), as well as with more sophisticated compilers.

Take Away #3: There is a bit of compiler hacking involved in using such constructs, even though it was already way outdated when Plauger wrote his book.

A different POV about Style, readability and meaning.

In the mind of K&R for is kind of a combination of COBOL and APL: the readability of COBOL by putting all items necessary for (basic) loops into a defined structure, so no need to look around for initialization, while keeping it compact like some APL code.

But when looking at it in a more fundamental way I would consider both implementations presented in the question as bad implementations and note it in any code review.

To start with, using while with anything but a test expression is using side effects. By definition (check K&R) while is a repeating loop, one that is executed ZERO or more times. No execution of whatever it contains will ever happen if the condition is not true at the beginning of each iteration. Using it as shown goes against that basic principle.

After all, the task of copying a (zero) delimited string needs to transfer at least the trailing delimiter. So the task is to transfer 1..n elements, not 0..n.

Similarly, for does not fully fit the task, at least not as used. for defines an iteration as initialization, condition and reinitialization. Again none of the three should be manipulating anything outside of loop control. Except, when taking the items apart, using a for no longer satisfies the need for a 1..n loop - at least not without a lot of temporary variables/constructs.

The grammatically correct construct to describe this is a do/until as in do/while, so it should look rather like this:

  do
     st = *s1++ = *s2++;
  while (st != '\0');

Yes, I know, unfamiliar to classic C programmers drilled to save on lines at all cost.
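For completeness, here is a sketch of how that fragment might look as a whole function. The name copy_do_while is made up to avoid clashing with the real strcpy, and the working pointer s is an addition so that s1 can still be returned unchanged.

```c
/* Hypothetical name, chosen to avoid clashing with the library strcpy. */
char *copy_do_while(char *s1, const char *s2)
{
    char *s = s1;   /* working pointer; s1 is kept for the return value */
    char st;

    do {
        st = *s++ = *s2++;      /* copy one character and remember it */
    } while (st != '\0');       /* stop after the terminator was copied */

    return s1;
}
```

The body always runs at least once, so even an empty source string gets its terminating zero copied, which matches the 1..n nature of the task described above.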

*1 - Note the word re-initialization got changed to increment in the second edition.

*2 - This is BTW why Plauger writes (*s++ = *s2++)!= '\0' as that forces an explicit test for a zero value transferred, independent of compiler and CPU type. One step toward real portable code.

*3 - Including x86, which does not set flags according to moves.

The explicit comparison with zero has nothing to do with portability, beyond the fact that some compilers will issue a warning if an assignment operator is the outermost node of a condition-control expression, and some programmers configure compilers to treat warnings as errors. Many compilers will squawk at if (x=y) but accept if ((x=y) != 0) without complaint. — supercat, Jun 22 '22 at 17:24
@supercat, or if ((x=y)); IIRC at least gcc accepts and even suggest using that if you really mean it when seeing the if (x=y). Then the next question is if compilers warned about that in the early 90's already? I can't remember. — ilkkachu, Jun 23 '22 at 07:34
for loops were common to almost all programming languages at the time. I'm not so sure about while -- in some languages you probably had to use explicit goto for this. — Barmar, Jun 23 '22 at 14:15
It's easier for a simple compiler optimizer to make a guess as to how to parallelize a for loop than a while loop, since at least one of the iteration dependencies is specified at the top of the loop. — hotpaw2, Jun 24 '22 at 00:45
"analog of the Fortran" - did FORTRAN re-calc the limits on loops? I thought it calced only on loop entry? — Maury Markowitz, Jun 28 '22 at 12:34
@MauryMarkowitz Well, I would as well say so. You might have to ask the Messrs. Kernighan and. Ritchie what exactly they mean. Personally I would think that the usage of 'analog' indicates that it's about the primary function of a counting loop, not any deeper similarities or tricks that can be played in either language (or not). — Raffzahn, Jun 28 '22 at 12:59
