29

I'm asking specifically about C, not about other contemporary languages, but if the reason is "that's how B did it" or something please assume I'm talking about "in the lineage of C". I'm also aware that many if not all languages also adhere to the rule that only a single thing can be returned from a function, but again I'm interested mainly in why C does it.

A C function returns only a single thing, whether it be a fundamental data type or a pointer. I don't think there is any theoretical reason why it couldn't return more than one thing, and perhaps put all return values on the stack. So I'm wondering if there was a technological reason that the single return value rule was instated? If not, what was it?

**EDIT:** by "single thing" I mean either a single variable of a fundamental data type or a pointer to something else. This is as opposed to returning a tuple, say:

(int x,char* y) = func(int a,double b);
Michael Stachowsky
  • 14
    I suppose it's important to understand what you mean by "single thing" here. A pointer to an array or struct may point to something that contains many things but is still a "single thing". – jwh20 Feb 01 '23 at 19:05
  • @jwh20 good point. I've edited the question to hopefully be more clear – Michael Stachowsky Feb 01 '23 at 19:06
  • 7
    Maybe "complex type" would be a more accurate term, as K&R C did not allow you to return e.g. structs or unions either. – Jonathan Potter Feb 01 '23 at 19:12
  • 2
    Fortran functions only return a single output value. That meant that the function could be used in things like an if() statement without the compiler checking whether it returned nothing, one thing, or many things. – Jon Custer Feb 01 '23 at 19:15
  • @JonCuster indeed, as do many other languages. I'm specifically interested in C, although I suppose if the answer is "because that's how some ancient language did it" then so be it. – Michael Stachowsky Feb 01 '23 at 19:18
  • 6
    @JonathanPotter K&R C did allow to return structs. – Leo B. Feb 01 '23 at 20:42
  • 2
    Everything computer related was so much simpler back then. Languages, hardware, and most of all: the user/programmer conception of what you could do given the world at the time (I mean day-to-day limitations that today are not even seen except in the teeny-tiniest of embedded environments.) – davidbak Feb 01 '23 at 20:43
  • 8
    Plus the developers of C weren't programming language mavens, even given the understanding of programming languages and models at the time. They were trying to develop a useful computer system so that they could do their other interesting research. They needed an OS (which became Unix) and they needed a language higher than assembly to code it and their real applications in (which became C). But those weren't the end goal, and there's only so much time and so many people on your research team to work on stuff ... – davidbak Feb 01 '23 at 20:45
  • 1
    @LeoB. Nope, pretty sure it didn't. Maybe some compilers did but "pure" K&R did not. – Jonathan Potter Feb 01 '23 at 21:00
  • 1
    @JonathanPotter "No true Scotsman" fallacy detected. – Leo B. Feb 01 '23 at 21:02
  • 2
    @LeoB. I knew I kept my first edition of The C Programming Language around for a reason :) https://imgur.com/a/zoq2vWF – Jonathan Potter Feb 01 '23 at 23:58
  • @JonathanPotter Do you see the parenthetical after the highlighted section? That clearly tells that the design was to allow it, and the restriction was merely temporary for implementation reasons. – Leo B. Feb 02 '23 at 00:04
  • @LeoB. shifting the goalposts there somewhat :) – Jonathan Potter Feb 02 '23 at 00:08
  • @JonathanPotter Just following the spirit of the original question. Also, the fact that some early K&R C compilers did not even allow assigning structs to one another, let alone returning them from functions, does not make those compilers more "pure" than later K&R C compilers, delivering on that promise. – Leo B. Feb 02 '23 at 00:17
  • 12
    "This is as opposed to returning a tuple." It's interesting that you used a singular noun to describe what is being returned here. Even when discussing multiple values, it feels natural to group them into a singular item (effectively "the collection of returned values") for the purposes of communication. – Flater Feb 02 '23 at 04:53
  • 2
    Regardless of what early C implementations did or did not support, this question speaks about C in the present tense, and every version of standard C since the 1989 standardization has supported returning structures and unions. These, especially structures, are not "a single thing" as the question defines that. – John Bollinger Feb 02 '23 at 18:02
  • What makes you think the designers of C had other reasons (to not allow this) than the designers of other languages? I think it would have been better to ask why this is not allowed in most languages. It's hard to say if it's good that C lacks this feature. But I think the benefit of such a feature is limited and it's not a game changer. – zomega Feb 02 '23 at 21:22
  • 2
    @Flater And yet, arguments are listed individually, without the need to use a struct. I think it's reasonable to wonder why arguments have that special treatment, but not return values. – Invizio Feb 03 '23 at 02:23
  • @Invizio On that note - sometimes arguments are return values themselves. – Brilliand Feb 04 '23 at 19:17
  • @Invizio: My point was more than that if it's (subconsciously) more intuitive to use a singular in communication on the topic, which you've sort of proven; it stands to reason that the language designers were also (subconsciously) finding it more intuitive to reason about a singular return value (even if said value would be a collection of values) – Flater Feb 05 '23 at 15:44
  • 2
    I believe the reason it's intuitive to all of us is specifically because most languages already allow one return value only (would it be as intuitive if the languages we're used to allowed multiple return values?), but that would be more of a psychological reason than a technological reason, as the question asks. – Invizio Feb 19 '23 at 16:36

10 Answers

55

The premise of the question is incorrect.

In the first edition of the C Programming Language book (thanks to @JonathanPotter), the authors mention the intent to support passing structs to functions and returning them from functions, along with assigning them to one another:

6.2 Structures and Functions

There are a number of restrictions on C structures. The essential rules are that the only operations that you can perform on a structure are take its address with &, and access one of its members. This implies that structures may not be assigned to or copied as a unit, and that they can not be passed to or returned from functions. (These restrictions will be removed in forthcoming versions.)

(note the parenthetical). This means that C was designed to return values of composite types from functions, but it was not considered essential to support them in the early releases of the compiler.

Indeed, while the C compiler in UNIX V5 does not compile the program below due to a bug (that is, it fails with an internal error message rather than a syntax error or a semantic error), and the V6 one says Unimplemented structure operation, the one available in BSD 2.9 happily compiles it correctly.

struct s { int a, b; };
struct s foo() {
    struct s r; 
    r.a = 5;
    r.b = 25;
    return r;
}
main() {
    struct s a;
    a = foo();
    printf("%d %d\n", a.a, a.b);
}

The program prints 5 25 as intended. Moreover, this does not happen by chance: the assembly code contains an explicit copy of the function result into the memory location of the variable a, after which its fields are pushed onto the stack to be printed:

jsr     pc,_foo
mov     (r0),-12(r5)
mov     +2(r0),-10(r5)
mov     -10(r5),(sp)
mov     -12(r5),-(sp)

That is, the underlying C language mechanisms are sufficient to return arbitrarily complex structures. Writing functions returning ad-hoc tuples like

(int x,char* y) = func(int a,double b);

rather than values of predeclared types, would be an example of syntactic sugar.
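As a minimal sketch in modern C of what that sugar would stand for, the hypothetical tuple can be spelled as a predeclared struct. The type name `func_result` and the body of `func` are illustrative assumptions; the question does not say what `func` computes.

```c
#include <stdio.h>

/* Hypothetical stand-in for the ad-hoc tuple (int x, char *y). */
struct func_result {
    int x;
    char *y;
};

/* Illustrative body only: it doubles a and formats b as text. */
struct func_result func(int a, double b)
{
    static char label[32];      /* static, so the pointer outlives the call */
    struct func_result r;

    r.x = a * 2;
    snprintf(label, sizeof label, "%.1f", b);
    r.y = label;
    return r;                   /* both values come back as one struct */
}
```

A caller would write `struct func_result r = func(21, 1.5);` and then read `r.x` and `r.y`, which is the tuple assignment with one extra name.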

Stephen Kitt
Leo B.
  • How is the value in R0 computed? What storage is it returning? – supercat Feb 01 '23 at 20:44
  • @supercat The caller of a function which returns a struct reserves a space for it, and passes the pointer to that space as the shadow first argument. – Leo B. Feb 01 '23 at 20:46
  • 2
    Is the ability to return a struct just syntactic sugar for having a function accept a pointer to a struct as an argument, and maybe return that pointer, or are there cases where it would offer some performance advantage? – supercat Feb 01 '23 at 20:51
  • @supercat if anything it feels like a potential disadvantage, as it requires extra work if the caller wants the results to end up in a struct that is on the heap? – Tommy Feb 01 '23 at 20:53
  • @supercat That ability provides an opportunity for an optimizing compiler to elide some copies, in the caller as well as in the callee. Whether a compiler does it or not, is up to a compiler writer. – Leo B. Feb 01 '23 at 20:57
  • 1
    ... and, separately, disjointly, digressively, noting that C++ has semi-recently introduced structured binding that ends up looking quite a bit like I imagine a call to the ad hoc tuple function hypothesised would look. – Tommy Feb 01 '23 at 20:57
  • @LeoB.: How often would a compiler not have to go through a fair bit of work to simply not end up worse than with a manually-passed structure pointer? An ordinary function that receives a structure pointer can use it to modify the passed structure "in place", but a function that returns a struct would, by contrast, generally have to build the struct in an automatic-duration object and then copy it into the space supplied by the caller, unless the ABI specified that calling code struct foo x = func(); would have to pass the address of a temporary buffer instead of passing x's address. – supercat Feb 01 '23 at 21:22
  • @supercat True, optimizing it in terms of instructions would require quite a lot of work for the compiler. IMO, the main intention of the feature was to allow stack reuse. In any case, we're digressing from the OP question, which is counterfactual. – Leo B. Feb 01 '23 at 21:28
  • 2
    @LeoB. From the 1974 C Reference Manual: "Not all the possibilities allowed by the syntax above are actually permitted. The restrictions are as follows: functions may not return arrays, structures or functions, although they may return pointers to such things;" I'm not sure what "stack reuse" would be facilitated by struct-returning functions as compared with having "user code" declare an automatic-duration object and pass its address if a temporary object is really necessary, or pass the address of whatever structure should be populated by a function. – supercat Feb 01 '23 at 21:32
  • @supercat Unix V6 C compiler fails with "unimplemented" rather than "not permitted". That means to me that there was an intention to implement the feature, which has been realized soon after. – Leo B. Feb 01 '23 at 21:50
  • @LeoB.: I would have liked to see specifications for a "bootstrapping" version of the language which was pared down to the minimum, with the intention that writing a compiler for "Bootstrap C" in assembly language for a typical new platform would be fairly simple, and once one had that working, one could use that to process a full-featured C compiler that was written in "Bootstrap C" to produce a full-featured C compiler that could run on the new platform. Features like struct returns and even any kind of floating-point math would be sensible omissions. – supercat Feb 01 '23 at 21:54
  • @supercat Sure enough, there are many things not necessary for a "bootstrapping" version of the C compiler, to wit, but that is beside the point. – Leo B. Feb 01 '23 at 21:58
  • @LeoB.: From what I understand, Ritchie first designed a version of C that could bootstrap itself, and then proceeded to write more sophisticated compilers in earlier versions of C. While the earliest versions of the language are lost to time, the design constraints would have been somewhat similar to those of someone trying to produce a self-bootstrapping version of the language today. – supercat Feb 02 '23 at 15:53
  • 1
    I know that OP didn't want to be told that it was because predecessor languages also had that restriction, but I think it's worth suggesting that C was somewhat more heap-oriented than older languages and as such the developers might have been far more relaxed about returning pointers to newly-allocated storage. If only they'd initiated a robust culture of /always/ documenting whose responsibility it was to free memory once it was no longer needed... – Mark Morgan Lloyd Feb 02 '23 at 21:05
  • 1
    "The OP didn't try to verify the premise of the question before asking." I tried compiling OP's code, and I can confirm it doesn't work. I suppose that the obvious answer is to just use a struct, but I notice that one can write many parameters in a function declaration without needing to use a struct. I interpreted OP's question as: Why was C designed to allow parameters to be listed inline, but not the return values? – Invizio Feb 03 '23 at 02:39
  • 3
    @Invizio The premise is "A C function returns only a single thing, whether it be a fundamental data type or a pointer." That is plainly wrong. – Leo B. Feb 03 '23 at 03:18
  • I stand corrected, I missed that. Thanks for pointing it out. – Invizio Feb 03 '23 at 05:05
16

There is obviously no technical reason why C could not have returned a "complex" type. Using your example, (int x,char* y) could have been returned.

But at what cost? Please remember the state of the art in 1972: the computer that ran the first C compiler was the PDP-11. The language was designed to be simple to implement and performant. Every feature you add to a language adds complexity and impacts performance. I'm sure they believed that having complex return types was not worth the cost.

As a C developer is it really so much more difficult to return a pointer to a struct or other complex data vs. the struct or data itself? Not really in my opinion.
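A minimal sketch of that pointer-based style, assuming the caller supplies the storage. The names `divmod` and `divide` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Two results that a tuple-returning function would hand back together. */
struct divmod {
    int quotient;
    int remainder;
};

/* Fills caller-provided storage and returns a pointer to it.
   Returns NULL if out is NULL or the divisor is zero. */
struct divmod *divide(struct divmod *out, int num, int den)
{
    if (out == NULL || den == 0)
        return NULL;
    out->quotient = num / den;
    out->remainder = num % den;
    return out;
}
```

The caller declares `struct divmod d;` and calls `divide(&d, 17, 5)`. The returned pointer doubles as a success flag, which is a common C idiom rather than anything the language requires.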

Other languages today do return complex data, which shows that C could have done so as well. The developers of the language either decided not to support it or (more likely, in my opinion) didn't even consider it because it didn't fit their vision for the language.

I think we sometimes assume that C was carefully designed from a detailed specification but that is certainly not the case. It was a quick-and-dirty project that was done to meet an immediate need for the research these guys were involved with.

jwh20
  • 2
    C compilers for PDP-11 are perfectly able to handle functions returning structs. – Leo B. Feb 01 '23 at 20:37
  • 2
    @LeoB.: They gained that ability some time after the 1974 C Reference Manual was written. – supercat Feb 01 '23 at 21:35
  • 1
    @supercat Who cares about 1974? From Wikipedia: In 1978, Brian Kernighan and Dennis Ritchie published the first edition of The C Programming Language. This book, known to C programmers as K&R, served for many years as an informal specification of the language. – Leo B. Feb 01 '23 at 21:55
  • 2
    @LeoB.: The fact that later C compilers added the ability to process struct return types as syntactic sugar doesn't imply that C wasn't designed initially to return a single non-aggregate object, nor that the PDP-11 didn't handle such returns much more efficiently than it could handle returns of aggregates. – supercat Feb 01 '23 at 23:21
  • 1
    @supercat Trying to deduce after the fact how C was "designed", is fruitless. Was C specifically "designed" to be ambiguous in its syntax of a=-b? Was it "designed" not to allow unsigned types? Declaring functions returning structs was not diagnosed as a prima facie user error even in 1974; thus I conclude that the intention to allow it was there from the beginning, but the implementation came a couple of years late. – Leo B. Feb 01 '23 at 23:46
  • 1
    @LeoB.: The return-value-passing convention used on the PDP-11 only returns 1 thing. I don't know by whom and why it was decided that the additional complexity necessary to process the syntactic sugar associated with returning aggregates was worth the cost, given how many other constructs could have offered far more value, far more cheaply – supercat Feb 02 '23 at 00:06
  • @supercat Apparently it was decided by K&R themselves, as can be seen by the parenthetical in https://imgur.com/a/zoq2vWF Note that it lists the highlighted restriction along with assigning structures as a unit, the utility of which is not disputed. – Leo B. Feb 02 '23 at 00:09
  • PDP-7, no? And because of that, freaking zero-terminated strings... – Maury Markowitz Feb 02 '23 at 01:57
  • @supercat re The return-value-passing convention used on the PDP-11 only returns 1 thing True de facto, but as implementors of the OS, the language, and the complete tool chain, they were free to design any return-value convention they wanted, so I'm unclear on what point is being made. – dave Feb 02 '23 at 02:33
  • @LeoB.: Interesting that struct assignment and struct return types were lumped together like that, given that support for struct assignment will--if suitably exploited by programmers--make it easier for compilers to generate efficient code, but the same isn't really true of struct return values. – supercat Feb 02 '23 at 15:44
  • @another-dave: Bearing in mind the C was used to bootstrap itself, I think Ritchie designed a primitive version of C for that purpose, that version would have been designed to have functions return a single value, and such a design would have been motivated by the fact that it could be accommodated more efficiently than a design that allows returning aggregates. – supercat Feb 02 '23 at 15:50
13

On most platforms, expression evaluation would require having someplace reserved to put the result; simple compilers would always use the same CPU register or combination of registers for computations whose result is used for any purpose other than direct assignment to a register-qualified object.

If evaluation of an int expression always leaves the result in register 0, and the platform's process of handling function returns wouldn't disturb that register, then all the code generation for return X; would have to do, within an int function, would be to output code to compute the value of the associated expression in usual fashion (which would leave the value in R0) and follow that with a return instruction. An expression which calls a function that returns int would be generated by outputting a function call instruction and then assuming that the value of the function call expression had just been generated.

An essential thing to note about this approach for returning values is that the register used to hold the return value generally wouldn't be able to usefully hold anything else if it weren't used for that purpose, so returning a value in this fashion is "free".

While compilers fairly quickly added support for returning structures, this was often syntactic sugar for having callers create a structure on the stack and pass its address as an extra "hidden" argument, and then having a return statement in the called function copy data from whatever structure was used in the return statement to the buffer supplied by the caller. Unlike simple-value returns, this approach would require allocating stack storage that otherwise wouldn't be allocated, and thus isn't "free".
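The hidden-argument scheme can be sketched in C itself. This is an illustration of the idea, not the output of any particular compiler; the names `pair`, `make_pair` and `make_pair_lowered` are hypothetical.

```c
struct pair {
    int a;
    int b;
};

/* What the programmer writes: a function returning a struct by value. */
struct pair make_pair(int a, int b)
{
    struct pair r;
    r.a = a;
    r.b = b;
    return r;
}

/* Roughly what a simple compiler might treat it as: the caller passes
   the address of a buffer as an extra first argument, and the return
   statement becomes a copy into that buffer. */
void make_pair_lowered(struct pair *hidden_result, int a, int b)
{
    struct pair r;          /* the object named in the return statement */
    r.a = a;
    r.b = b;
    *hidden_result = r;     /* copy into the caller's storage */
}
```

Under this view, `x = make_pair(5, 25);` and `make_pair_lowered(&x, 5, 25);` leave the same contents in `x`; the second form just makes the extra storage and the copy visible.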

supercat
  • how many registers did the pdp-6 have? the -11 already had several registers that could have been used for return values, if needed. (at the time I was using pdp-8s and similar machines that had 1 or 2 registers (A and B) at the most. – davidbak Feb 01 '23 at 21:15
  • @davidbak: Registers that were specified for returning values would only be usable for register-qualified objects if their values were stacked across function calls. – supercat Feb 01 '23 at 21:25
  • Didn't early versions store arguments in specific memory locations, and forbid recursion? I don't see why they couldn't also store return values in specific locations. – user253751 Feb 01 '23 at 23:52
  • 1
    Good answer, finally one that distinguishes returning a single struct object vs. multiple primitive types where the caller might only want to use one. e.g. memcmp would be more useful if it returned the mismatch position in another register, as well as the comparison. A struct would usually get returned by hidden pointer, adding latency, unless a C ABI allows returning structs of 2 objects. (I guess if standard library functions did that, ABI designers would be motivated to support it more efficiently.) – Peter Cordes Feb 02 '23 at 16:13
  • 1
    So to some degree this is an ABI design issue, but it still seems clunky to need to declare a struct type for every function that wants to return 2 things. (Supporting arbitrary amounts of return objects would be extra ABI complexity, and a limit of 2 would be arbitrary.) Historically the evolution of C with active use in its primitive forms led to bad designs of some library functions sticking. (Especially string functions like strcpy.) memcmp not returning a position is more understandable, but unfortunate these days when SIMD searching can go 32x faster than scalar. – Peter Cordes Feb 02 '23 at 16:18
  • @PeterCordes: An alternative was and is to pass the address of an array, or a portion thereof, and have a function populate that. For example, a function computeSineAndCosine could accept as arguments double angle and a double*results, placing the sine of angle into results[0] and the cosine into results[1]. Depending upon what one is doing, having such a function accept two independent pointers might be more or less convenient than using one, and might lead to code which is more or less efficient than using one. Predicting which approach would be more efficient would often... – supercat Feb 02 '23 at 19:38
  • ...not be possible looking just at the code and specifications for the function, but would require knowing how it is going to be used. If one wants to use both return values from a function, one will generally need to declare an automatic-duration object to hold them, and once such an object is needed, someFunc(&thing, whatever) isn't really much less convenient than thing = someFunc(whatever), but would likely be if anything more efficient. – supercat Feb 02 '23 at 19:44
  • Something like memcmp() could have been best handled by having multiple functions, especially since many SIMD approaches have a non-trivial setup time which could not be recouped when comparing objects which will usually have few if any bytes in common. A more problematic function is realloc, which to be used most effectively would need to be able to distinguish cases where the request was satisfied without moving anything, the request was satisfied by copying everything to a new location that might overlap the original, the request was satisfied by copying everything to a new location... – supercat Feb 02 '23 at 20:01
  • @PeterCordes: [see above]...disjoint from the original, the request was not processed and the original object is still valid, or a zero-size request was processed, the original object [if any] is gone, and the implementation extends the semantics of null pointers to allow them to be treated as identifying zero-sized objects (so e.g. adding 0 to a null pointer will yield a null pointer, memcpy(any,any,0) will be a no-op even if one or both pointers are null, etc.). Plus there should have been a way to indicate that some of those alternatives would be unacceptable, and the function should... – supercat Feb 02 '23 at 20:06
  • ...indicate if it couldn't succeed in acceptable fashion, or force an abnormal program termination if it couldn't succeed in acceptable fashion and it wasn't allowed to report failure either. If client code isn't going to be prepared to do anything useful in case of an allocation failure, having the request force an abnormal program termination would be less bad than having the program erroneously thinking an allocation is bigger than it actually is. – supercat Feb 02 '23 at 20:08
  • Seems unlikely to me that foo(&out, x) would be more efficient than out = foo(x) on a system that can return in a register. Having the caller store to out shouldn't be worse in most cases than having it pass a pointer for the callee to use. Unless it doesn't need that pointer at all anymore, and doesn't need a call-preserved register to keep it across the function. Like possibly for GNU C sincos(double, double*, double*), if you aren't looping over an array of outputs and don't need to use the results for further computation. – Peter Cordes Feb 02 '23 at 20:15
  • I suspect its API was designed more for regularity, not performance; it would probably be faster to return the sin as a return value (in a register) and take an output pointer for the cos, assuming a calling convention where you couldn't return both in registers. But if C had been designed for multiple return values, C calling conventions would have had more efficient multi-value returns. – Peter Cordes Feb 02 '23 at 20:17
  • With memcmp, yes if you just need greater less or equal, you might not identify the exact byte that differs if you just find the mismatch chunk and do a big-endian integer compare of multiple bytes. Modern glibc memcmp for x86-64 does pretty well at not branching much for small inputs: 32B compare (if that won't cross a page) into a bitmask and zeroing a number of high bits dependent on the length. But on unequal it does have to compute pointers to the mismatching bytes to zext and sub them. – Peter Cordes Feb 02 '23 at 20:24
  • @PeterCordes: If out is of a type that can be passed in a register, then out = foo(x) would often be more efficient than foo(&out, x);. On most platforms, if the type can't be passed in a register, and a caller can't guarantee that there's no way for the called function to know the address of out, it will be necessary either for the caller to pass the address of a temporary object and copy its contents to out after the function returns, or for the called function's code to refrain from using the passed pointer until after the return expression is evaluated, and copy... – supercat Feb 02 '23 at 21:20
  • ...the data that should be returned to the passed storage at that time. If user code passes a pointer to a globally-accessible object to a function which is then supposed to store data into the storage identified by the pointer, the programmer rather than the compiler would be responsible for ensuring that it only writes parts of the object whose contents will no longer be needed. – supercat Feb 02 '23 at 21:27
  • @PeterCordes: As for memory comparison functions, if the byte at *src1 will only match the byte at *src2 about 1% of the time (which would be expected in some usage cases), a test for non-zero length followed by a comparison of the bytes at just those addresses would be faster, 99% of the time, than just about anything else, even if one were only interested in a "pass/fail" comparison. If two blocks will match 99% of the time, however, time spent on individual-byte comparisons will be essentially wasted. – supercat Feb 02 '23 at 21:40
  • Yes, but if you can easily have multiple output args, you don't need a single struct return value, you'd just have multiple scalars. That's the case I was considering. Agreed that for larger objects, manually passing an output pointer can gain efficiency. As you say, the caller needs a temporary object; it can't pass a pointer to an object that the callee could conceivably access some other way (SO Q&A), unless you define the ABI to allow such aliasing, in which case the callee can't write to that object until after its last read of non-private mem. – Peter Cordes Feb 02 '23 at 22:15
12

Yes, prior to ANSI C, there was a technological reason. Valid code could corrupt the stack.

Prior to ANSI C, function declarations were optional. According to Bell Labs, if you called a function that had not yet been declared, the caller should assume that the return type was int. In their C Programmer's Handbook, p. 27, they warned that this could result in nonsensical results when the actual function definition had a different return type:

C Programmer's Handbook page 27

Examples --

  • Correct
extern double linfunc();
float y;
y = linfunc(3.05, 4.0, 1e-3)

The value of the function call is properly converted from double to float by the assignment to y.

  • Incorrect
float x;
float y;
x = 3.05;
y = linfunc(x, 4, 1e-3)

This is wrong because types of arguments do not match declarations in definition, that is x is float not double, 4 is int not double. The result is that arguments passed on the stack are the wrong size and format, so that arguments taken off the stack by the quadfunc [sic, should read linfunc] are nonsense. There is no predictable return value. Also, unless the type of linfunc is declared, it is presumed to be type int. Thus even if the linfunc() did return a meaningful double, the value of the function call expression would be a nonsensical int value (e.g., upper half of a double).


The "single things" (technical name: scalars) that a function can return have the nice property that they can fit entirely within the register space of most processors. You don't need the stack; you just put the return value in one or more registers.

A compound object such as a struct or array might not fit within a processor's register space, requiring them to be placed on the stack. For this to properly work, the following must happen:

  1. The caller must reserve space on the stack for the compound return value (in addition to the arguments also placed on the stack).
  2. Control then transfers to the called function, which then reads the arguments and writes the compound return result to the stack.
  3. Upon return, the caller then reads the return result from the stack, then discards the memory allocated to arguments and return value.

Getting back to pre-ANSI C, what happens if you call an undeclared function that intends to return a compound object?

  1. The pre-ANSI behavior is for the caller to assume that int is being returned, thus no space (or the improper amount of space) is allocated on the stack for the return value.
  2. Control is transferred to the called function. It tries to read arguments from the stack, which as noted above, might produce nonsense values. However, the called function will also try to store the compound result on the stack, which was not allocated by the caller. This will result in a corrupted stack.
  3. With a corrupted stack, control may not return to the caller or its predecessors, crashing the program.

The easiest way to avoid this problem was to not allow compound objects as return values. Later compilers required function declarations, which is a better way of solving the problem.
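A minimal sketch of the declaration-based fix in modern C, where a declaration must be visible before the call (C99 removed implicit function declarations). The names `point`, `origin_shifted` and `manhattan_length` are hypothetical.

```c
struct point {
    int x;
    int y;
};

/* Declaration visible before any call, so the caller knows that a
   struct point comes back and can reserve storage of the right size. */
struct point origin_shifted(int dx, int dy);

int manhattan_length(int dx, int dy)
{
    struct point p = origin_shifted(dx, dy);    /* caller owns p */
    int ax = p.x < 0 ? -p.x : p.x;
    int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}

/* The definition may appear later, or in another source file. */
struct point origin_shifted(int dx, int dy)
{
    struct point p = { dx, dy };
    return p;
}
```

With the declaration in place, caller and callee agree on the return type at the call site, so the mismatch described above cannot arise silently.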

DrSheldon
  • 3
    All this is true yet I think the first sentence, last paragraph is off the mark: "The easiest way to avoid this problem was to not allow compound objects as return values." Go back in time to that period - I think the easiest way to avoid the problem was to not think of it as an issue to be solved at all! Functions in all programming languages at that time - common ones anyway, not LISPy ones - returned one value via a function return (in a register) and used out parameters for all other results. That's just the way it was, everybody thought that way. – davidbak Feb 03 '23 at 05:08
  • 1
    The obvious solution was to define those functions using ‘extern’. – RonJohn Feb 03 '23 at 05:29
  • Still, struct returns were a) promised and b) implemented way before ANSI C. – Leo B. Feb 03 '23 at 09:06
  • @RonJohn: Especially because the implicit int doesn't work very well. It didn't work on DOS (most of the time) and it doesn't work now on x64. – Joshua Feb 03 '23 at 19:48
  • Calling an undeclared function that returns a type other than int has always been erroneous. That erroneous programs can produce all manner of ill effects continues to this day to be an explicitly acknowledged characteristic of C. That's not a technological reason for a function to return only "a single thing", especially when we accept functions returning a double as their single thing. A double doesn't necessarily fit in a CPU register, and usually does not fit in space the size of an int. – John Bollinger Feb 04 '23 at 17:39
  • @JohnBollinger: On many platforms, a function which chained to a return someFunctionPtr(whatever) could be processed in a manner that was agnostic to the return-type (if any) of the passed function if it was anything other than a struct or union being returned by value, and code which would call a function but ignore the return value could likewise be agnostic with regard to the return type. Code which would use a function's return value would need to know what the type was, but if a caller's machine code wouldn't care about a function's return type, the compiler wouldn't either. – supercat Feb 07 '23 at 21:53
6

Preface: There is no single, hard reason, but rather a series of good decisions.

Was there any technological reason that C was designed to return only a single thing from a function?

Sure. C is intended as a simple language. Simple does not mean easy, but simple to implement, with as few language constructs as possible.

For result passing, the simplest thing to do is to restrict it to a single value. This is what nearly all computers can do, what nearly all languages provide as a minimum consensus and, maybe most important, what was already standard on the machines C was developed on: returning a function's result in R0.


I don't think there is any theoretical reason why it couldn't return more than one thing,

You mean besides complexity? You mentioned the lineage of C. Arguably the most important aspect of that evolution was dropping everything that is not absolutely required. The important distinction between a function and a (sub) procedure is that a function returns at least one value. The ability to return more than one value is not simple and not strictly required, so it was left out.

Likewise, any syntax for multiple return values (as shown in your pseudo-code) would have added complexity to the compiler. This complexity would not really bring any benefit that couldn't be gained by other means, such as passing pointers to variables or, even better, using a structure and exchanging pointers.
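
As a minimal sketch of the "passing pointers to variables" alternative: the function below hands back a quotient as its ordinary return value and a remainder through a pointer. The name `divide_with_remainder` is made up for illustration.

```c
/* Returns the quotient; stores the remainder through the out parameter.
 * Assumes divisor != 0 and that remainder points to a valid int. */
int divide_with_remainder(int dividend, int divisor, int *remainder)
{
    *remainder = dividend % divisor;
    return dividend / divisor;
}
```

A call such as `q = divide_with_remainder(17, 5, &r);` leaves the quotient in `q` and the remainder in `r`, with no new syntax needed in the language.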

and perhaps put all return values on the stack.

Three reasons:

  1. Compatibility

    At that point it's important to remember that basic calling on the PDP-11 (*1) used to return any result value in R0. Sure, C could use parameter return on the stack. Except that would mean C programs could only call C functions/procedures. Any external function would need an assembly wrapper - or a compiler/linker extension to change parameter passing depending on the type of function called.

    C was not developed to write Unix, but to rewrite it. So linking to existing code was mandatory.

  2. Runtime Memory

    Here lies another important simplification: procedures are just functions that don't return anything or, better, whose return value is ignored. If return values were handled using the stack, each call would have to allocate memory for that return value, even when the result is never used.

  3. Execution Speed

    Each and every memory operation slows execution. Returning via the stack means that any result value needs to be moved into that location first and read back from it after returning. Storing a value in memory at the end of a function, only for it to most likely be loaded again right after returning, means two superfluous instructions executed without any gain.

  4. Code Bloat

    Of course additional instructions also need additional code space. So returning a value via the stack means 2 instructions or (at least) 4 additional bytes.

All of this has to be seen in context of a PDP-11/20 with a main memory of only 64 KiB (256 KiB with MX11 - not sure if the one used by Ritchie had that).


*1 - The way PDP-11 subroutine calling is implemented is further complicated by how C implemented it.

Toby Speight
Raffzahn
  • 1
    On a CPU with more than 2 registers, two return values could easily be returned in registers. It's just a matter of designing a calling convention to support it. The "primary" return value can still go in R0, and a secondary return value (like a mismatch position from memcmp or strcmp) can go in another call-clobbered register; functions like memcmp will naturally end with that value in a register anyway. Obviously handling this would take more complexity for the compiler, so might have been problematic at the time. – Peter Cordes Feb 02 '23 at 16:24
  • 1
    @PeterCordes yes, a lot can be done (PDP-11 has 8, but 3 are already taken by default). Just look at some really nice languages. Except, C is not about being nice, but providing the necessary minimum. – Raffzahn Feb 02 '23 at 16:43
  • 1
    Oh, I see all 4 of your three (cough) reasons are in a section replying to the OP asking about returning on the stack, rather than as primary reasons why it's a problem to return 2 registers. I had been going to say those only apply with a crappy ABI / compiler incapable of returning two values in registers that are call-clobbered anyway in the asm calling convention. But that's the simplicity argument you make in the first half of the answer, which makes sense for early compilers on cramped systems. – Peter Cordes Feb 02 '23 at 16:56
  • @PeterCordes yes, in the end all of that can be derived once one accepts the main reason of C being as minimalist as possible while still covering all basic cases. Spelling them out does make understanding easier for anyone not investing as much time to think about it. – Raffzahn Feb 02 '23 at 16:59
4

Is simplicity a technological reason?

Usually, C function parameters are pushed onto the stack before the call and then removed afterwards by the caller (PASCAL is different: the callee pops its own parameters). The stack is one-way.

If possible, the return value is passed in a register, "accumulator", R0, EAX or their floating-point equivalents.

Grabul
  • It may be sufficient, yes. I am wondering if there is anything specific happening on the computers that C was developed for that necessitated it, or it really was just a choice for simplicity. As one of the comments pointed out, a function that returns a single thing can easily be used in an if statement. However, I am looking for a firm source rather than a well-reasoned idea :) – Michael Stachowsky Feb 01 '23 at 19:27
  • but they wouldn't have chosen to pass parameters and return values in this way if there could be more of them. The stack is not one-way. – user253751 Feb 01 '23 at 23:53
  • Modern calling conventions return structs as large as 2 registers in two registers, not memory. https://godbolt.org/z/14efTbn53 shows x86-64 returning in RDX:RAX and AArch64 returning in x1:x0, for a pointer + integer return value. (Like memcmp or strcmp could have used, to not throw away the mismatch-position information.) If you only need one of the return values, if (foo().first < 1). Or if C had a notion of a "primary" default return value if you just do foo() < 0, and an optional secondary return value you could also grab. – Peter Cordes Feb 02 '23 at 16:33
  • 2
    If C had been designed with multiple return values, C calling conventions / ABIs would have evolved to allow returning in multiple registers, similar to how calling conventions pass args in multiple registers. – Peter Cordes Feb 02 '23 at 16:33
  • @user253751 C was developed on a pretty low-spec system. That’s reason enough for a “high level assembly language” to keep things simple. Mathematical functions return one value, that’s what they knew functions to be, so that’s what they did. – RonJohn Feb 03 '23 at 05:34
4

C does support returning multiple values - using out parameters.

int arg_a;
int out_r;
f(arg_a, &out_r);

It seems obvious nowadays that a function consists of an argument list and a result, which then poses the question why there is no "result list".

The latest iterations of mainstream languages like C# do indeed have a result list, with accompanying syntax to define the result list in the function header, and to assign the results to multiple variables at the call site.

int arg_a = 1;
int arg_b = 2;
(int out_r, int out_s) = f(arg_a, arg_b);

However, back when C was first designed, it might not have seemed obvious why a function should even have a result as well as an argument list, when the argument list alone is capable of meeting all relevant needs.

In fact, a single pointer argument, passing the address of a struct, is capable of shuttling as many inputs and outputs as the programmer may find necessary.

struct f_args {
    int arg_a;
    int arg_b;
    int out_r;
    int out_s;
};

...

struct f_args fa;

fa.arg_a = 1;

...

f(&fa);
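
A complete version of that pattern might look like the following. What `f` actually computes is not stated, so the sum and difference used here are purely assumptions for illustration.

```c
/* One struct carries both the inputs and the outputs. */
struct f_args {
    int arg_a;
    int arg_b;
    int out_r;
    int out_s;
};

/* Assumed behaviour, for illustration only: r = a + b, s = a - b.
 * The callee unpacks its inputs from the struct and writes its
 * outputs back into the same struct. */
void f(struct f_args *args)
{
    args->out_r = args->arg_a + args->arg_b;
    args->out_s = args->arg_a - args->arg_b;
}
```

The caller fills in `arg_a` and `arg_b`, calls `f(&fa)`, and then reads `fa.out_r` and `fa.out_s`; every input and output travels through memory rather than registers.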

The answer to the question is really found in considering why a single pointer argument is not generally considered enough.

Computational efficiency

The first issue is computational efficiency. Passing multiple values in registers is more efficient than just passing a pointer in a register (or on the stack) and letting the callee unpack the contents.

One of the benefits of C at the time of its conception, compared to assembly, was that the compiler was capable of selecting and using registers appropriately for the purpose of shuttling values into and out of functions - using the full complement of registers furnished by whatever hardware was in use - without the programmer being hassled by the task of register selection as they would be when writing assembly.

Therefore, passing values in as individual items (rather than as a pointer to a struct) has a real bearing on efficiency.

Typical balance of arguments and resultants/composition into expressions

It's also common in practice that functions take multiple arguments but return a single value. All basic arithmetical operators take two arguments and return one value - dyadic and monovalent. Standard mathematical expressions also rely on all evaluations being monovalent.

Modelling this traditional mathematical style (without attempting to innovate it), and the de facto reality that many functions have multiple inputs but only one output, is therefore what strongly justifies the specific configuration of a multi-value argument list, and a single return - polyadic and monovalent.

Again to compare with assembly, expressions were a big jump in functionality - and there must be at least one return value (as distinct from merely an out parameter) to allow composition of functions as part of expressions.
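
To make the composition point concrete, here is a small sketch comparing the two styles. All function names are hypothetical.

```c
/* Value-returning style: calls nest directly inside an expression. */
int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Out-parameter-only style: every intermediate result needs a named
 * temporary, and the calls cannot appear inside an expression. */
void square_out(int x, int *result)
{
    *result = x * x;
}

void sum_of_squares_out(int a, int b, int *result)
{
    int a2, b2;
    square_out(a, &a2);
    square_out(b, &b2);
    *result = a2 + b2;
}
```

Both versions compute the same thing, but only the first lets a call stand where a value is expected.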

Syntactic ease

The practice of defining the argument list inline with the header (as opposed to defining a struct), and of composing values together inline as part of the call (as opposed to assigning values into a struct as a preparatory step before a call), has been found to be a good usability feature.

Although the inline composition has always been present in C, the ANSI C syntax for actually defining a function (and its arguments and argument types), does not correspond with the original style in K&R C.

A style of function definition that nowadays seems standard across many programming languages, was not a settled question at the time C was originally designed.

Mathematics uses functions, but it had no corresponding practice or syntax for defining type constraints (or even any explicit concept of "type" as programmers know it), so there was additional design work required on the syntax.

Conclusion

It is at least for the above reasons that C came to have a polyadic-monovalent style of function, as opposed to either the bare bones of what is necessary (which is one argument and no return value - monadic and nonvalent), or some other style.

The existence of the return value at all is primarily driven by its compelling use as part of expressions. Multiple return values do not exist because how to integrate them usefully into expressions is not even clear today, and certainly wasn't on the agenda in 1972.

Certainly, modern functional languages have operators which can select individual values from amongst multiple results of a previous stage, but there's often a massive loss of explicitness and obviousness.

Meanwhile, I suspect that in C# the predominant use of multiple return values is not within expressions - in other words, multiple returns are used for completely different (and far less important) purposes than single returns typically are.

And the long-time existence of a multi-value argument list (as opposed to just a single value), besides aligning with the existing mathematical practice, is primarily because it saves the hassle of always defining and assigning a separate structure, and because it is capable of catering to the need for multiple output parameters as well as multiple inputs.

Steve
3

There's also nomenclature: C calls these functions, like the mathematical concept of a function. Mathematical functions return a single type of value even if that's a tuple or something more complicated. So functions returning a single thing matches the metaphor of what a computer operator would have expected out of the f(x) notation used. If they were 'subroutines' maybe they would have had no return code at all and only in-parameters and out-parameters.

davolfman
3

DrSheldon's answer is right in that implicit function declarations require a canonicalized way to return a fixed number of values, which was set to one. And this is a technological reason, albeit on the language level. He has my upvote for it.

As a matter of fact, it is just as easy to define an ABI that allows for arbitrary return values as it is to allow for arbitrary parameters. The available mechanisms are exactly the same: First you can define some registers to hold the first few values, and just as larger values are passed on the stack, the caller can reserve stack space to return larger values.

As such, I believe that the real reason for not allowing more than one return value is the same as for modern languages: Virtually all languages use the notion that a function call has a value just like a variable name or some expression does. C is no exception to this. If you have a function that returns multiple values, you are only able to use a single one of these values implicitly, or the tuple of return values as a whole.

Today, some languages have adopted tuple unpacking in an assignment to work around this problem. But the problem persists: Where you can use a single return value right in the middle of any odd expression, you must first assign multiple return values to something in a tuple assignment. Needless to say, C does not have tuples. It has struct, and you can indeed define a struct for a function return. However, you need to give that struct a name, both for the type and the variable to which you assign the function result. It is doable, but requires some boilerplate code.
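
The standard library itself uses this struct workaround: `div` from `<stdlib.h>` returns a `div_t` holding both a quotient and a remainder. The two wrapper functions below are hypothetical; the sketch shows the boilerplate of naming a variable for the whole result, and that a single member can still be selected in place when only one of the values is wanted.

```c
#include <stdlib.h>

/* Both results are needed, so the struct has to be named and stored.
 * Swaps the two digits of n; assumes 0 <= n <= 99. */
int swap_two_digits(int n)
{
    div_t d = div(n, 10);   /* d.quot = tens digit, d.rem = ones digit */
    return d.rem * 10 + d.quot;
}

/* Only one result is needed, so a member is selected directly from
 * the returned struct inside the expression. */
int is_even(int n)
{
    return div(n, 2).rem == 0;
}
```

There is no way to bind `quot` and `rem` to two separate variables in one step; that is the tuple unpacking C lacks.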

So, the real technical reason to not implement multiple return values has nothing to do with the hardware. It is a problem of language design that persists today, and even the most modern languages can only ever work around it. These workarounds introduce complexities into the compiler that must have looked quite unnecessary, and most certainly not simple, to the people who developed C.

  • On multiple results, F# for example is capable of piping the results list of one call as the argument list of the next, without unpacking. It also has operators which select individual items from the results (and discard others). In my opinion, the problem with this approach is not a language design difficulty, the problem is that all the intermediary names are purged from the code, and everything becomes far too implicit. The design is a victim of its own success. – Steve Feb 05 '23 at 09:51
  • @Steve Well, piping the results into a function call as arguments is using the tuple of return values as a whole. Of course, this can be done. And I share your unease about dropping all intermediary names. However, as a former assembler programmer, I've frequently been annoyed by C's inability to return two values without fuss or pointer argument overhead. It's extremely straightforward to return a handful of results in assembler without touching the stack. – cmaster - reinstate monica Feb 05 '23 at 10:11
  • I'm not an assembly programmer myself, but I wonder whether the problem you're alluding to requires a careful fitting-together of how both the caller and callee use the registers? In other words, the use of registers has to be analysed and designed as a matching combination by the assembly programmer, an approach which would then make the C compiler's register-use analysis potentially a global problem for the entire program, rather than local to each function. Also, I gather your grievance is more about the computational inefficiency of passing a pointer, than about the syntactical weight? – Steve Feb 05 '23 at 12:39
  • @Steve Indeed, my grievance was about performance. As an assembler programmer, you have all the liberties that a compiler does. And if you say: This function accepts these three inputs in these three registers, and produces these two outputs in these two registers, that's how it is. All five in-/outputs behave exactly the same. Not so as a C programmer: The compiler can pass inputs in registers, but not several outputs. You are forced to accept at least two superfluous memory accesses plus an address taking instruction. And memory accesses are expensive even when they are cached. – cmaster - reinstate monica Feb 05 '23 at 18:25
-2

It is worth noting that C is a low level procedural language and does not really support Object Oriented programming.

This seems odd seeing that all the children languages of big-daddy-C brought OOP to the masses.

C was designed to deal specifically with hardware at a low level. It does this exceptionally well, but a lot of the niceties we have grown used to are absent in its implementation.

Neil Meyer
  • C has a syntax that is concise and consistent, and there are a lot of source-code related tools for that syntax. No need to reinvent the wheel when developing a new language, just add the new features you want, which is exactly what the children languages did. – DrSheldon Feb 07 '23 at 21:29
  • It's far from clear that this answers the question that was asked. – Toby Speight Feb 05 '24 at 19:55