-1

I am a beginner and I am curious about char in C. I know that a char holds only one character. If I initialize a char inline, as in the code below:

#include <stdio.h>

int main() {
    char c2 = 'ab';
    printf("c2: %c\n", c2);

    return 0;
}

It outputs b. My first question: why b and not a?

Next, if I take input from the console, as in the code below:

#include <stdio.h>

int main() {
    char c3;
    scanf("%c", &c3);
    printf("c3: %c\n", c3);
    
    return 0;
}

Now if I input ab, it outputs a. Why doesn't it output b, as in the first case (inline initialization)?

Could someone please explain why the behaviour differs?

  • 2
    A `char` is a single character. `'ab'` is two characters. – AKX Aug 31 '22 at 06:11
  • 1
    `char` only holds 1 character. Why do you think you can assign `'ab'` to a single character? – Barmar Aug 31 '22 at 06:11
  • 2
    2 characters causes undefined behavior. That means anything can happen, so why not `b`? – Barmar Aug 31 '22 at 06:11
  • Did your compiler show a warning about using a "multi character integer constant"? – Gerhardh Aug 31 '22 at 06:12
  • 3
    @Barmar isn't it implementation-defined behaviour how multi-character integer constants are evaluated? – Gerhardh Aug 31 '22 at 06:13
  • 1
    @Mahbubul Hasan: turn your compiler's warning level up to 11 ! – Mitch Wheat Aug 31 '22 at 06:14
  • 2
    @Gerhardh Maybe, I can't always remember what's undefined and what's implementation-defined. Either way, there's nothing in the standard that says it can't be `b`. – Barmar Aug 31 '22 at 06:14
  • 1
    With your call `scanf("%c", &c3)` you will read one character sequentially. It will simply take the very first character in the input buffer. If there are two (or more) characters in the input buffer, the remaining will be left for the future. – Some programmer dude Aug 31 '22 at 06:15
  • Read https://stackoverflow.com/questions/31335472/assigning-more-than-one-character-in-char – Support Ukraine Aug 31 '22 at 06:28
  • 1
    It may be semantically dubious code, but it is nonetheless compilable and reproducible, and deliberate, so I cannot agree with the reason for closure. IMO the question should stand. – Clifford Aug 31 '22 at 18:21

3 Answers

3

You can't fit 'ab' in a char, and trying to assign 'ab' to a char truncates it to the lower byte, 'b'.

Please, please turn your compiler warnings on and heed them; you can see these warnings on Godbolt:

<source>: In function 'main':
<source>:4:15: warning: multi-character character constant [-Wmultichar]
    4 |     char c2 = 'ab';
      |               ^~~~
<source>:4:15: warning: overflow in conversion from 'int' to 'char' changes value from '24930' to '98' [-Woverflow]

98 is the ASCII value for b:

>>> chr(98)
'b'

For your second program, if you input 'ab', scanf with %c consumes 1 character, 'a'. (Other input remains in the input buffer and would be consumed by subsequent similar scanf calls.)

AKX
2

The first example may compile with warnings, e.g. in GCC:

main.c: In function ‘main’:
main.c:4:15: warning: multi-character character constant [-Wmultichar]
    4 |     char c2 = 'ab';
      |               ^~~~
main.c:4:15: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘24930’ to ‘98’ [-Woverflow]

This is because, per the C standard:

The value of an integer character constant containing more than one character (e.g., 'ab'), [...] is implementation-defined.

The warnings also suggest the implementation's behaviour: 24930 = 0x6162, where 0x61 is ASCII 'a' and 0x62 is ASCII 'b'. So the implementation's behaviour is a matter of byte order, followed by the conversion of an int to a char, which here keeps the least significant byte.

The behaviour of the second example is no surprise - input is not assignment or initialisation, and it is not a C language behaviour, but a system I/O behaviour. Input is buffered and requesting a single character input will take the first character from the first-in-first-out (FIFO) input queue. The first character when you enter 'a' followed by 'b' is of course 'a'.

A second input request would retrieve the 'b', whereas in the initialisation example the 'b' is all that survives the assignment and the 'a' is simply discarded. I/O behaviour and C language assignment behaviour are not at all comparable. The input is a FIFO queue of single characters, whereas 'ab' is an int (in this implementation).

Clifford
  • 1
    I don't think the implementation-defined part has anything to do with the CPU byte order, but simply how the compiler decided to implement it. When I'm trying out the OP's code on gcc/PowerPC it results in `li 4,98`, 98 being `'b'`. Same result as for gcc/x86. – Lundin Aug 31 '22 at 06:52
  • @Lundin Sure, that is not what I think I said. I was referring to the byte order the implementation packs the multibyte constant, not the architecture's byte order. The implementation _in this case_ placed the second character in the LSB. One could imagine that the obvious implementation is to place the first character in the lower address and the second in the higher - like a byte array, then on a big-endian machine, the second is in the LSB. But an implementation might do something else, if for example it wants the behaviour to be identical regardless of the natural machine byte order. – Clifford Aug 31 '22 at 10:19
  • Not only is the value of a character constant containing more than one character implementation-defined, but so is the result of converting an `int` to `char`, if `char` is signed and the source value is not representable. – Eric Postpischil Aug 31 '22 at 12:11
  • The fact that when the user inputs “ab”, “a” is read first is a C language behavior. Per the C standard, a stream is an ordered sequence of characters. – Eric Postpischil Aug 31 '22 at 12:13
  • @EricPostpischil, sure, the standard specifies that as required behaviour, but the underlying I/O system is independent of any language, and it would be an insane I/O system that presented data out of order. The point is the standard _requires_ that behaviour of the system, it is not the source or cause of that behaviour, other than it may not arbitrarily re-order the stream. It is a somewhat pedantic distinction nonetheless. It is also a requirement of the standard library rather than the language, C does not have an intrinsic I/O system. Also a pedantic distinction perhaps. – Clifford Aug 31 '22 at 18:13
2

This is compiler-specific, so-called implementation-defined behavior.

From the C standard 6.4.4.4/2:

An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.

Then 6.4.4.4/10, which happens to use the very same example as you:

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.


In the second example, the first character you place in stdin is the one read, and the other is left unread in the input buffer. This has nothing to do with character constants but is simply how scanf works.

Lundin