Floating point numbers, real numbers and machine precision

Question

My math book states that when a real number $x$ is replaced by a floating-point number $F(x)$, the error between the two satisfies $|\text{error}|=|x-F(x)|\le\epsilon |x|$.

Now the book asks:

Consider two different, but mathematically equivalent expressions, having the value $C$ after evaluation. If we suspect that the computer satisfactorily evaluates the expressions for many input values within an interval, all to within machine-precision, why might we expect the difference of these expressions on a computer to have an error contained within an interval $[-\epsilon C,\epsilon C]$?

In the answer it is claimed that the computer will generate two numbers for both expressions: $C(1+\alpha)$ and $C(1+\beta)$ where $\alpha$ and $\beta$ are both positive and less than $\epsilon$.

Therefore indeed: $C(1+\alpha)-C(1+\beta)=C(\alpha-\beta)$ where $|\alpha-\beta|<\epsilon$

QUESTION: I don't see why $\alpha$ and $\beta$ have to be positive. This suggests that the computer would always generate a number that is higher than the real number. Is this true? Indeed the statement $|\alpha-\beta|<\epsilon$ is only true when both are positive and less than $\epsilon$. So therefore again: is it true that $\alpha$ and $\beta$ have to be positive? If not, does the statement that the error has to be contained within an interval $[-\epsilon C,\epsilon C]$ still hold true, possibly because of another reason?

This is a badly worded question, but your observation seems right. Let $C=1$ and $\epsilon=0.0001$. Then expressions $e_1$ and $e_2$, both mathematically equivalent to $1$, could be evaluated by the computer as $0.9999$ and $1.0001$, each within machine precision. The computer's evaluation of the difference (maybe $\pm 0.0002$?) then differs from the mathematical difference of $0$ by an amount outside $[-0.0001,+0.0001]$. (I think that's "the difference" you are supposed to be considering, but I'm not sure, and I don't know what the assumption about "many input values within an interval" means.) - Steve Kass, Dec 06 '16 at 16:22
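
The numbers in this comment can be turned into a general bound. A short sketch, assuming only the book's inequality $|x-F(x)|\le\epsilon|x|$, so that $|\alpha|\le\epsilon$ and $|\beta|\le\epsilon$ with no constraint on sign:

$$|C(1+\alpha)-C(1+\beta)| = |C|\,|\alpha-\beta| \le |C|\,\bigl(|\alpha|+|\beta|\bigr) \le 2\epsilon|C|.$$

With $C=1$, $\alpha=-\epsilon$ and $\beta=+\epsilon$ the bound is attained, which gives the magnitude $0.0002$ in the example. The narrower interval $[-\epsilon C,\epsilon C]$ follows only under an extra assumption, for instance that $\alpha$ and $\beta$ share a sign.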

score 1 · Answer 1 · answered Dec 08 '16 at 16:43

It is certainly true that the computed value could be less than the true value, so $\alpha$ and/or $\beta$ could be negative. It is not clear what "satisfactorily evaluates" means. Does it just mean executing the code properly? Does it mean that the result is within $[C(1-\epsilon),C(1+\epsilon)]$? You can have subtractive cancellation that makes the relative error much larger than $\epsilon$. Suppose I want to compute $10^{-50}$. I can just type that into the computer and probably get an answer with fractional error (about) $\epsilon$. I could also ask for $1-(1-10^{-50})$. These expressions are mathematically equivalent, but I suspect the numeric error on the second will be much higher: standard 64-bit arithmetic does not have enough precision to distinguish $1-10^{-50}$ from $1$.
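
Both claims in this answer can be checked numerically. The following is a minimal sketch, not part of the original answer, and it assumes Python floats are IEEE 754 binary64 with round-to-nearest (the usual case for CPython). The helper name `relative_rounding_error` and the sample values $1/10$ and $3/10$ are chosen here purely for illustration. `Fraction` is used because it converts a float to its exact rational value, so the sign of $\alpha$ in $F(x)=x(1+\alpha)$ can be read off without further rounding.

```python
from fractions import Fraction
import sys


def relative_rounding_error(numerator: int, denominator: int) -> Fraction:
    """Exact signed alpha in F(x) = x * (1 + alpha) for x = numerator / denominator."""
    exact = Fraction(numerator, denominator)     # the real number x
    stored = Fraction(numerator / denominator)   # exact value of the double F(x)
    return (stored - exact) / exact


# Assumption: unit roundoff for binary64 with round-to-nearest is 2**-53.
unit_roundoff = Fraction(sys.float_info.epsilon) / 2

alpha_tenth = relative_rounding_error(1, 10)         # F(0.1) lies above 1/10
alpha_three_tenths = relative_rounding_error(3, 10)  # F(0.3) lies below 3/10

print("alpha for 1/10 is positive:", alpha_tenth > 0)
print("alpha for 3/10 is negative:", alpha_three_tenths < 0)
print("both within unit roundoff:",
      abs(alpha_tenth) <= unit_roundoff and abs(alpha_three_tenths) <= unit_roundoff)

# Subtractive cancellation: two mathematically equivalent ways to get 10**-50.
direct = 1e-50
cancelled = 1.0 - (1.0 - 1e-50)  # 1.0 - 1e-50 rounds to 1.0, so this is 0.0

print("direct:   ", direct)
print("cancelled:", cancelled)
```

Under these assumptions the rounding error takes either sign, and the second expression returns exactly `0.0`, a relative error of 100% rather than something of order $\epsilon$.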
