
Why does the following code work?

#include <stdio.h>
#define LEN 12

typedef struct
{
    char buffer[LEN];
} string;

int main()
{
    char buffer1[LEN] = "Hello World";
    char buffer2[LEN];

    *(string*)buffer2 = *(string*)buffer1;

    printf("%s",buffer2);
    return 0;
}

As far as I understand, I cannot assign one array to another.

John Bollinger
cheroky
  • It "works" because you assign structures, which is allowed and effectively equivalent to the `memcpy` operation. The last sentence is lost in translation. – Eugene Sh. Jun 22 '16 at 15:51
  • An expected output can be the result of undefined behavior. Simply put, your code doesn't 'work'. – 2501 Jun 22 '16 at 15:52
  • Also note that It "works" because the struct has only one field of type char array. Try to add another field in the structure. – terence hill Jun 22 '16 at 15:56
  • @Eugene Sh: the assignment is equivalent to memcpy? – cheroky Jun 22 '16 at 15:59
  • @2501 - where is the UB? – pm100 Jun 22 '16 at 16:00
  • @cheroky The assignment of a *structure* is equivalent to `memcpy` with the `sizeof` of that structure. – Eugene Sh. Jun 22 '16 at 16:01
  • @2501 Actually I am failing to point at the concrete UB cause here. I think the aliasing is legal here.. – Eugene Sh. Jun 22 '16 at 16:02
  • @EugeneSh. As it has been said many times, char can alias any type but not vice-versa. (I think I had this conversation with you before, if I'm not mistaken.) – 2501 Jun 22 '16 at 16:03
  • @pm100 `string` cannot alias `char[LEN]`. The types are not compatible => ub. – 2501 Jun 22 '16 at 16:05
  • @EugeneSh. "The assignment of a `structure` is equivalent to `memcpy` with the `sizeof` of that structure." No, it is not. Per **6.2.6.1 General**, paragraph 6 of the [C Standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf): *When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.* The footnote to that - footnote 51 - specifically states: *Thus, for example, structure assignment need not copy any padding bits.* – Andrew Henle Jun 22 '16 at 16:12
  • @2501 Yeah, that was me. But I must admit I am still getting confused with it :) – Eugene Sh. Jun 22 '16 at 16:13
  • @2501: Sort of. If the `char[LEN]` originated from a `string`, then it's safe to cast it back (it didn't here, but just saying). Whether that is an aliasing problem or some other problem is debatable, however. – Tim Čas Jun 22 '16 at 16:17
  • @TimČas Yes that would be fine, but that is a different case. They might be technically allowed to alias, but I think this is a defect and wasn't intended behavior. There are other problems, padding, alignment that make this ub. See: https://stackoverflow.com/questions/17384918/can-a-struct-alias-its-own-initial-and-only-member – 2501 Jun 22 '16 at 16:25
  • @2501: Not saying you're wrong generally, but it's definitely not padding & alignment that would make it UB (since the struct would have the same alignment as its first datamember, and the same padding as its only datamember; without that guarantee, you couldn't cast from `string` to `char[LEN]` in the first place, and that *is* ***explicitly*** allowed!). – Tim Čas Jun 22 '16 at 23:38
  • @TimČas C Standard doesn't guarantee that there is no padding in the struct, and it doesn't guarantee identical alignment for different types. The struct in this case may have different alignment requirements and may have padding. – 2501 Jun 23 '16 at 05:04
  • @2501: Sure, but there is *never* padding at the beginning. And because casting `string` to `char[LEN]` *is* allowed, it also needs to have compatible alignment requirements. See C99 (or C11) 6.5p7 and 6.7.2p13. – Tim Čas Jun 23 '16 at 10:15
  • @TimČas Padding may be at the end. 6.5p7 doesn't prohibit padding. Assuming that it does is not correct. 6.7.2p13 doesn't exist. You probably meant 6.7.2.1p13? The struct may have stricter alignment requirements. Anyway see the link I posted why you shouldn't do this in real code. If you're interested from a language perspective I have stated my opinion two comments up. – 2501 Jun 24 '16 at 06:40
  • @2501: Padding at the end does not influence accesses to the first (in this case only, but that's irrelevant) member of the struct. Only padding at the start of the struct matters (which is always `0`). – Tim Čas Jun 24 '16 at 11:23
  • @TimČas Padding may be copied, since the array doesn't have padding, but is copied as a struct, copying reads and writes bytes out of bounds. – 2501 Jun 24 '16 at 11:29

3 Answers


C permits assigning one struct to another of the same type, and the semantics of doing so are defined in terms of the struct representation, as opposed to a member-by-member copy. That a struct representation encompasses the representation of an array does not mean that assigning the value of a struct that contains an array to another struct violates C's prohibition against assigning one array to another.

C furthermore guarantees that the address of the first member of a struct is the same as the address of the struct itself, and it permits object pointers to be cast among different pointer types. The result of such a conversion, however, is not guaranteed to be correctly aligned for the target type, and if it is not, then the behavior is undefined.

On the other hand, the compiler is free to include trailing padding in struct representations, which is often done for alignment purposes. If your compiler does that for your struct -- which it likely will if it applies a 64-bit or greater alignment requirement to it, since 12 is not a multiple of 8 -- then sizeof(string) exceeds LEN, and your assignment produces undefined behavior because it may read and write bytes beyond the ends of the 12-byte arrays. In that case, if it appears to work, that's because you got lucky.

If, however, it turns out that neither of the above sources of undefined behavior applies, then it is indeed reasonable to expect the code to work as expected. Inasmuch as it is tricky to predict whether that will be the case, however, you would be well advised to avoid code like this.

A better question might be why C disallows array copying. There are likely several reasons, but I think the deepest one is simply for consistency. In almost all contexts, when an expression or sub-expression evaluates to an array, that value decays to a pointer to the first array element. That includes the subexpressions constituting the operands to an = operator. So in normal C expressions, an array assignment would actually be a pointer assignment, and one that did not have the intended semantics. One would have had to carefully craft an appropriate special case for that situation in order to allow for array assignment.

John Bollinger

As pointed out in the comments, you are not assigning one array to another; by means of the cast, you are assigning one struct to another. Assigning a struct results in copying all of its members (in the case of a simple struct; see this post). The code, however, is really bad, and you can break it simply by adding another field to the struct.

terence hill

From your comment "As far as I understand, I cannot assign one array to another." it seems that you created the struct to circumvent the inability to copy one string to another by casting your buffers to the struct in assignment.

This happened to work this time, but it won't always work; structs often have padding bytes between or after their members for alignment.

If you want to copy one string to another in C, use the strcpy or strncpy functions (or, in the case of wchar_t type strings, wcscpy and wcsncpy).

If you want to copy one array of an arbitrary type to another, use a for loop where you copy each index one by one, or the memcpy function.

Govind Parmar
  • `(you can't even be sure that there won't be padding before the first variable in a struct)` yes, you can be sure, there are no padding bytes before the first member of a `struct` ( not the downvoter) – David Ranieri Jun 22 '16 at 16:09
  • I'd be inclined to recommend `memcpy()` over an element-by-element `for` loop. – John Bollinger Jun 22 '16 at 16:13
  • @JohnBollinger ha that slipped my mind... edited my post – Govind Parmar Jun 22 '16 at 16:15
  • I beg to differ, I think this particular piece of code will always print "Hello World" because you are laying down the bits after a cast to `(* string)`, which will always copy the 12 bytes to the address pointed by `buffer2`. – babon Jun 22 '16 at 16:15
  • @babon Yes but John's answer goes into detail about why this is a bad practice in general; using the standard C library functions is better – Govind Parmar Jun 22 '16 at 16:18
  • Yes, agreed. Writing code like this simply makes life harder for everyone. – babon Jun 22 '16 at 16:21
  • @babon, in practice, I think most C systems would produce a binary that prints "Hello World". You cannot rely on that, however, because by definition, you cannot reason about undefined behavior. – John Bollinger Jun 22 '16 at 16:34