-6

Here is some simple C code for a class quiz:

#include <stdio.h>

int main() {
  float a = 2.3;
  printf("%d\n", a);
  return 0;
}

Compiled and run on:

Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.5.0

The behavior of this code is undefined. I am trying to predict the output by inspecting the memory near a with the debugger (the x command in gdb). For example, when the address of a is 0x7fff5fbffb98, the memory near &a looks like this:

0x7fff5fbffb98: 1075000115
0x7fff5fbffb9c: 0
0x7fff5fbffba0: 1606417336
0x7fff5fbffba4: 32767
0x7fff5fbffba8: -1754266167
0x7fff5fbffbac: 32767
0x7fff5fbffbb0: -1754266167
0x7fff5fbffbb4: 32767

Then the output of printf is 1606417352. I know the output when using an incorrect specifier is undefined. Out of curiosity, I expected the output of this undefined behavior to be related to some memory from the running stack or registers, but I have not figured out how to correlate it.

So which address or register is used to set the output of this printf? In other words, given the state of the running stack, and all values from all registers, can we predict (and if so how) the output of this undefined behavior?

notbad
  • What architecture are you compiling this on? Is it amd64? – fuz Mar 08 '16 at 07:57
  • 1
    Yes. it is a 64-bits system – notbad Mar 08 '16 at 07:58
  • 3
    Using the wrong format specifier is undefined behaviour. – Jabberwocky Mar 08 '16 at 08:00
  • 4
    Why are you asking about UB? Do you understand that UB is, well, undefined? – David Heffernan Mar 08 '16 at 08:12
  • 6
    [what-happens-when-i-use-the-wrong-format-specifier](http://stackoverflow.com/questions/16864552/what-happens-when-i-use-the-wrong-format-specifier) – Frodo Mar 08 '16 at 08:29
  • It's UB - if you need to know what is going on in YOUR machine/environment when this happens, YOU should disassemble the call and/or trace through it with YOUR debugger. Asking us to explain UB is pointless. – Martin James Mar 08 '16 at 08:44
  • Anyway, have a down and close vote for 'I did a wrong thing, I know I did a wrong thing, wrong things happened as a result and I want someone else to tell me why'. – Martin James Mar 08 '16 at 08:46
  • @Frodo That's not a duplicate. OP's question is not about what happens when you use a wrong formatting specifier. Stop being so fixated on undefined behaviour. – fuz Mar 08 '16 at 10:04
  • @MartinJames That's often a very interesting question to ask. Why do you think trying to learn about the failure modes of a C program deserves a downvote? – fuz Mar 08 '16 at 10:06
  • @FUZxxl Since this is UB, how can anything be said about the behaviour, since it is undefined? – David Heffernan Mar 08 '16 at 10:06
  • 1
    @DavidHeffernan See my other answer to you. You can predict pretty well what happens when you call `printf` with incorrectly typed arguments and understanding what happens in this case is very useful for learning how C is implemented. – fuz Mar 08 '16 at 10:09
  • 2
    "How C is implemented?" You mean how this specific compiler implements things today? Could be different in the next version. Or in a different compiler. Or on a different platform. – David Heffernan Mar 08 '16 at 10:10
  • @DavidHeffernan You make a good point. Other platforms have different calling conventions where OP's assumptions hold. But this one (amd64) doesn't, so it's interesting to ask and understand why. – fuz Mar 08 '16 at 10:17
  • This question has been [posted on meta](http://meta.stackoverflow.com/q/318526/417501). – fuz Mar 08 '16 at 10:46
  • @zhiwenf you make wrong assumptions altogether about automatic variables. An automatic variable does not even **need** to have an address, **unless you take its address**. Thus it is the `&` operation that changes the code *altogether* already. – Antti Haapala -- Слава Україні Mar 08 '16 at 11:42
  • @DavidHeffernan actually I disagree with setting *that very* question as a duplicate, because all of the `printf`s there need to be **completely well-defined under any conforming implementations**, if `x` is of type whose default promotion is to an `int`; and all would have undefined behaviour if `x` is not promoted to `int`, and the type of `x` is not even visible in the question. – Antti Haapala -- Слава Україні Mar 08 '16 at 12:45
  • @AnttiHaapala, I use gdb X command to view the memory, it shows &a having float value 2.3. So &a should have been allocated. I expected the output of this UB related to some address in the running stack or registers, but I have not found any clue. So I asked this:) – notbad Mar 08 '16 at 18:49
  • 1
    @zhiwenf you do realize that `a` is here smaller than the Planck's constant, you cannot debug it without changing it! – Antti Haapala -- Слава Україні Mar 08 '16 at 18:53
  • 1
    @FUZxxl You asked the community for its opinions on meta and had an overwhelming response which you appear to have decided to ignore. If the response had been in your favour would you have ignored it? – David Heffernan Mar 08 '16 at 19:30
  • @David Heffernan, I know you may know "undefined" well. But don't forget that many buffer overflow attacks are based on such undefined behaviors. "I personally don't think there's a lot to be gained from looking at how UB manifests in one specific compiler at one point in time": I cannot believe someone with rich coding experience and such a high Stack Overflow score would say this. – notbad Mar 09 '16 at 05:03
  • @zhiwenf That's one valid reason. You should have stated it in the question from the off. Bear in mind that we see hundreds of questions each week from people who don't know about UB but even after being told their code is UB, still want to reason about it. – David Heffernan Mar 09 '16 at 07:03
  • @zhiwenf Also read your original question. No mention of security research. No mention of the specific architecture. The question looked just like one of the hundreds a week from UB naïfs. What you should take away from this is the importance of writing a clear question. – David Heffernan Mar 09 '16 at 07:24

2 Answers

9

You are using %d with a float:

The d specifier is for a signed decimal integer.

The f specifier is for a decimal floating-point number.

Using the wrong specifier leads to undefined behavior.

You relied on the address of an automatic variable:

I try to predict the output by viewing the memory near a

a is an automatic variable; its address can change each time you compile the code, so the memory near a can change as well.

So "viewing the memory near a" also relies on undefined behavior.

Solution:

There is nothing to gain from undefined behavior in this case, so just forget about it; it will save you time and make your life easier.

Van Tr
  • I don't think OP's question is about what formatting specifier to use. Why are you fixating so much on the wrong formatting specifier? While technically undefined behaviour, what happens when you pass a wrong formatting specifier is pretty well-defined due to the way `printf` is implemented. I don't think this answer is helpful to OP at all. – fuz Mar 08 '16 at 10:03
  • 5
    @FUZxxl "well-defined due to the way printf is implemented"? The only thing that is pretty well defined is that the behaviour is very **undefined**. – Antti Haapala -- Слава Україні Mar 08 '16 at 10:19
  • Your answer should also include the sizes of arguments (int and double, and those types whose default promotion would be to int / double) – Antti Haapala -- Слава Україні Mar 08 '16 at 10:21
  • @AnttiHaapala According to the standard, behaviour is undefined, yes. However, you can in practice predict pretty well what is going to happen in this case. OP tries to do that. I don't think all questions that try to look behind the curtains of the C machinery should be discouraged. How are people supposed to learn about the innards of the C language when everything that is not 100% well-defined is met with a “this is undefined behaviour so STFU” sledgehammer? – fuz Mar 08 '16 at 10:21
  • @FUZxxl they tagged it 'c' and 'gcc'. If they want to look behind the curtains of the C machinery, then fine, they can go ahead and trace through the function call with their debugger on their box using code from their compiler and in their environment. Asking such a question on SO is too broad. – Martin James Mar 08 '16 at 10:28
  • @MartinJames How is this question too broad? It is well-constrained. It gives a specific piece of code, a specific architecture and asks a specific question. Just because you feel uncomfortable around things that are not part of ISO 9899 doesn't mean you should discourage others from taking a peek. If we were that strict, we would have to forbid all questions about dynamic linking as well, as `dlsym()` must be used in technically undefined ways. – fuz Mar 08 '16 at 10:30
  • I didn't consider that OP doesn't know what specifier to use with float. My answer tried to show him that using a specifier other than `f` for a float causes things that are undefined. I also pointed out that the address of an automatic variable shouldn't be relied on. – Van Tr Mar 08 '16 at 10:45
  • I used GDB (x command) to view the memory, so you can see the right context each time regardless of how &a changes. I know that this is undefined as far as C is concerned, but I don't think it is undefined for a specific compiler or even for the compiler developer. – notbad Mar 08 '16 at 17:07
  • @zhiwenf I can only say that "the right context each time regardless the changing of &a" is UB, and it's the same for any C compiler, even a C compiler for an embedded system (which may be what is used in your case). – Van Tr Mar 08 '16 at 17:11
  • @zhiwenf yes, the x command examines memory, but that does not make the automatic variable's memory (which is stack memory) defined. – Van Tr Mar 08 '16 at 17:20
  • @IlDivinCodino, I know the value of the address is undefined. But the meaning of each value is defined. I mean, when you know the start of the stack, the call back address and the parameter's address (maybe some register) could be determined for a specific compiler. Or if you are given the address of a, you may know what &a+1 is for. – notbad Mar 08 '16 at 18:01
  • Sure, the compiler usually does know what it did put in certain address, and so does the linker. But the C standard does not define its behaviour. It specifically is written so that an automatic variable whose address is never taken with `&` may not be in memory. It can be in a register. It also does not say that `&a + 1` is a valid address - on the contrary - the **behaviour** is undefined if you access `&a + 1`. There need not be any memory mapped there. When we talk about undefined behaviour it does not mean that "the value of the address is not known". That is "unspecified". – Antti Haapala -- Слава Україні Mar 08 '16 at 19:08
  • @zhiwenf I wrote an answer that explains exactly what you want to know, but it got deleted out of ignorance and because I tried to answer your question instead of correcting you. – fuz Mar 08 '16 at 19:08
  • @FUZxxl I will be glad to see your answer again. – notbad Mar 08 '16 at 19:16
  • @zhiwenf You'll have to wait. I'm currently trying to get it undeleted, but that's unlikely to happen. I'm not going to forsake proper procedures and post the answer again, after all, I don't want to risk getting banned. You could ask the same question on [/r/C_Programming](http://reddit.com/r/C_programming) and I will provide my answer there again. – fuz Mar 08 '16 at 19:18
  • 2
    @FUZxxl Your answer was deleted and downvoted because you did not mention UB. Had you done so it would have been upvoted and not deleted. Ignorance wasn't the issue and it is rude of you to suggest otherwise. There was no need to visit meta when the comments here had explained everything. Perhaps a little listening and understanding is needed? – David Heffernan Mar 08 '16 at 19:37
  • @DavidHeffernan Why should I point out the undefined behaviour when OP is clearly aware of it and (by now) states in the question that he is aware of the undefined behaviour? That's just condescending and useless as it adds nothing to the answer. – fuz Mar 08 '16 at 20:26
  • 1
    We already explained this many times. I'm not going to repeat what I and so many others have said. Let us respectfully agree to disagree. – David Heffernan Mar 08 '16 at 20:28
8

On AMD64 with the SysV calling convention (used by nearly every system but Windows), the first few arguments to a function are passed in registers. That's why you don't see them on the stack: They aren't passed on the stack.

Specifically, the first few integer or pointer arguments are passed in rdi, rsi, rdx, and so on, whereas the first few floating point arguments are passed in xmm0, xmm1, xmm2, and so on. The pointer to the format string occupies rdi, so printf reads the value for %d from rsi, the next integer register. Since a is passed in xmm0 (after promotion to double) but printf attempts to read a number from rsi, you won't see any correlation between the number you supplied and what is printed out.


For future readers: Please note that what OP attempts to do is undefined behavior. ISO 9899:2011 specifies that an int must be passed for %d, but OP is passing a double (a float after default argument promotions). For that, OP should use %f instead. Using the wrong formatting specifier is undefined behaviour. Please do not assume that the observations OP makes hold on your system or anywhere else, and don't write this kind of code.

fuz
  • 1
    Windows x64 ABI calling convention passes params in registers. I don't understand why you omit to mention that the code exhibits undefined behaviour. – David Heffernan Mar 08 '16 at 08:11
  • @DavidHeffernan the point is that different registers are used for integer than floating, which explains the nature of OP's observations – M.M Mar 08 '16 at 09:21
  • 2
    @M.M I understand all of that, I was just pointing out that the Windows x64 ABI also uses a register calling convention similar to the SysV calling convention. One might read the first paragraph and think that the Windows calling convention passed all parameters on the stack. – David Heffernan Mar 08 '16 at 09:24
  • @DavidHeffernan OP's question is not about undefined behaviour. OP seems to be trying to use the `printf` function to read out the stack. For experimentation, this is perfectly fine. Your fixation on undefined behaviour is completely missing the point of OP's question. It is not about using the right or wrong formatting specifier. – fuz Mar 08 '16 at 10:02
  • 4
    I wouldn't say it was a fixation. I personally don't think there's a lot to be gained from looking at how UB manifests in one specific compiler at one point in time. – David Heffernan Mar 08 '16 at 10:05
  • @DavidHeffernan If you don't think there is much gained, why don't you go and answer a different question where you think knowledge is gained? Trying to do the wrong thing and then understanding how things break is often a very good way to learn how things work. Why does that deserve down- and close votes? – fuz Mar 08 '16 at 10:07
  • 7
    I can't help you with the reasons why other people voted, not having voted here. My understanding is that undefined behaviour is, well, undefined. – David Heffernan Mar 08 '16 at 10:09
  • @DavidHeffernan Undefined by the standard. That doesn't mean that you can never predict what's going to happen. And as I said above, it's interesting to understand why your predictions (as is the case with OP) are wrong. – fuz Mar 08 '16 at 10:10
  • 3
    I still don't understand why you didn't make it explicit, front and centre, that this is UB. You can then go on to explain what is happening with this compiler and this ABI. Surely you need to point out to the asker that this is UB. It's not at all clear that asker even realises that. – David Heffernan Mar 08 '16 at 10:12
  • @DavidHeffernan I'm pretty sure OP knows that this is undefined behaviour and it's pretty clear that OP knows that: “I try to predict the output by viewing the memory near a.” He wouldn't write that if he expected this construct to be well-defined. My job is not to be some sort of undefined behaviour police. While it is acceptable to remind beginners of undefined behaviour, I expect proficient C programmers to understand what undefined behaviour comprises. There is no need to point that out every time. – fuz Mar 08 '16 at 10:15
  • 3
    So please do state it at the top of your answer that the **behaviour is undefined**, and that it must not be relied on; however we can use this particular implementation/ABI as an example **why** the behaviour is undefined and what benefits can be accomplished by not assuming anything about the behaviour of the implementation. – Antti Haapala -- Слава Україні Mar 08 '16 at 10:26
  • @AnttiHaapala See my previous comment. I'm pretty sure OP knows exactly that this is undefined behaviour, this is clear from how his question is worded. I don't think I have to do the this-is-undefined-behaviour-so-do-not-make-any-assumptions spiel every time, especially not when OP seems to be somewhat experienced in C. – fuz Mar 08 '16 at 10:28
  • I am just trying to point out to you how to make this answer an exemplary answer, and get score greater than the other answer – Antti Haapala -- Слава Україні Mar 08 '16 at 10:33
  • @AnttiHaapala I refuse to add that remark. I don't think it is needed. – fuz Mar 08 '16 at 10:33
  • 6
    @FUZxxl why would you refuse? Do keep in mind that this answer might be read by many a beginning C coder, who would misunderstand your wording to mean that you actually can perhaps somehow rely on said behaviour. In its current form this answer is a bit dangerous even, though correct. – Ilja Everilä Mar 08 '16 at 10:36
  • 1
    @Ilja said it right. SO isn't just about talking to the one person that asked the question. Questions get read by many other readers. In this case, the downvotes mean it probably won't see much future readership, but the principle is there. – David Heffernan Mar 08 '16 at 10:38
  • This question has been [posted on meta](http://meta.stackoverflow.com/q/318526/417501). – fuz Mar 08 '16 at 10:47
  • Just FYI, this is what I think a good answer looks like. After editing it thus, I have cast the final vote to undelete and converted my downvote into an upvote. I will admit that I'm kind of putting words in your mouth, but this is honestly how I understand your position from the Meta discussion, the comments here, and elsewhere. If you are deadset against mentioning that this is UB, then feel free to roll back my edits. But please consider carefully whether you want to stand on principles here, or whether you want the answer to be maximally helpful to all parties. – Cody Gray - on strike Mar 09 '16 at 01:46
  • @CodyGray Thank you for adding that remark. I edited it a little and put it below the answer as I consider it to be not useful for OP but I see the value of this remark for future readers who don't read the full question. I hope this is an amicable resolution. – fuz Mar 09 '16 at 02:32
  • 1
    When I check the register, it indeed reads from register rsi. Thanks a lot:) – notbad Mar 09 '16 at 05:53
  • 1
    Nice that we got here in the end. – David Heffernan Mar 09 '16 at 07:05