Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.
Questions tagged [floating-point]
465 questions
58
votes
1 answer
Show that floating point $\sqrt{x \cdot x} \geq x$ for all long $x$.
I verified experimentally that in Java the equality
Math.sqrt(x*x) = x
holds for all long x such that x*x doesn't overflow. Here, Java long is a $64$-bit signed type and double is an IEEE binary floating-point type with at least a $53$-bit mantissa…
maaartinus
- 1,401
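The experiment described in this question can be approximated outside Java. What follows is a minimal Python sketch, assuming that Python's `float` is an IEEE binary64 double and that `float(x * x)` rounds the same way Java's long-to-double conversion does. The helper name `sqrt_roundtrip_holds` and the sample values are illustrative and do not come from the original post.

```python
import math

# Largest non-negative x whose square still fits in a signed 64-bit long.
MAX_X = math.isqrt(2**63 - 1)

def sqrt_roundtrip_holds(x: int) -> bool:
    """Return True if the square root of the double nearest to x*x equals x."""
    square = float(x * x)  # exact integer product, then rounded to a double
    return math.sqrt(square) == x

# A handful of spot checks, including values near the overflow boundary.
samples = [0, 1, 2, 3, 10**9 + 7, 2**31 - 1, 2**31 + 1, MAX_X]
results = {x: sqrt_roundtrip_holds(x) for x in samples}
print(results)
```

This only spot-checks a few values; it is not a proof and does not replace the exhaustive experiment mentioned in the question.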
8
votes
4 answers
Why is $3 \times 0.3 = 0.8999999999999999$ in floating point?
Can anyone please help me explain this fact? I tried to read some articles on the web about floating point but it is always a hard topic for me to understand.
This is what I get from Python 3.3.0
A brief to medium explanation is enough. It is…
user71346
- 4,171
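The effect in this question can be reproduced by looking at the exact value that is stored for 0.3. This is a small Python sketch, assuming IEEE binary64 doubles; the variable names are illustrative.

```python
from decimal import Decimal

stored = Decimal(0.3)   # exact value of the double nearest to 0.3
product = 3 * 0.3       # result of the floating-point multiplication

print(stored)           # slightly below 0.3
print(repr(product))    # shortest decimal string that round-trips
print(product == 0.9)   # compares against the double nearest to 0.9
```

Because the stored value is already slightly below 0.3, the rounded product can land on the double just below the one nearest to 0.9.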
6
votes
3 answers
What is the maximum difference between two successive real numbers in the given floating point representation?
The following is a scheme for floating point number representation using 16 bits.
Sign :- Bit 15
Exponent :- Bit 14-9
Mantissa :- Bit 8-0
Let $s, e,$ and $m$ be the numbers represented in binary in the sign,
exponent, and mantissa fields,…
PleaseHelp
- 761
- 8
- 29
6
votes
1 answer
What is the most significant digit?
What is the most significant digit of
$$0.00234$$
I am having trouble figuring out whether it is $0$ or $2$.
Zonik
- 183
5
votes
1 answer
How to calculate floating point numbers?
Here are two locations in memory:
0110 | 1111 1110 1101 0011
0111 | 0000 0110 1101 1001
Interpret locations 6 (0110) and 7 (0111) as an IEEE floating point number.
Location 6 contains bits [15:0] and location 7 contains bits…
Patrick Stevens
- 121
5
votes
2 answers
Complex formula that is equivalent to $f(x) = x$
In a computer you can't store any real number that you want to because in the $[0.0;1.0]$ interval there are infinitely many numbers and computer memory is finite. I want to show it in examples, which is why I need some formulas that do some complex…
TIKSN
- 153
5
votes
1 answer
Associativity in floating point arithmetic failing by two values
Assume all numbers and operations below are in floating-point arithmetic with finite precision, bounded exponent, and rounding to the nearest integer.
Are there $x,y$ positive such that $$\begin{align}(x+y)-x&>y\\(x+y)-s(x)&>y\end{align}$$
where…
EEE
- 111
4
votes
1 answer
Why does $\left(1 + \frac{1}{n}\right)^n$ give vastly different relative errors when $n=252257928$ and $n = 215450934$?
This expression $\left(1 + \frac{1}{n}\right)^n$ approximates $e^1$.
When $n = 252257928$, the relative error, $(e - \text{result})/e$, is $1.740557727387924\mathrm{e-}12$
When $n = 215450934$, the relative error is…
Joshua Leung
- 555
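One way to explore this question numerically is to separate the rounding error in the base $1 + \frac{1}{n}$ from the final result. The Python sketch below assumes IEEE binary64 doubles; the helper names `relative_error` and `base_rounding_error` are hypothetical, and the printed values may differ slightly from those in the question depending on the platform's `pow` implementation.

```python
import math
from fractions import Fraction

def relative_error(n: int) -> float:
    """Relative error of the floating-point value of (1 + 1/n)^n against e."""
    approx = (1.0 + 1.0 / n) ** n
    return (math.e - approx) / math.e

def base_rounding_error(n: int) -> Fraction:
    """Exact difference between the stored base and the true value 1 + 1/n."""
    stored = Fraction(1.0 + 1.0 / n)   # exact rational value of the double
    return stored - (1 + Fraction(1, n))

for n in (252257928, 215450934):
    # n * (base error) roughly estimates how much the base error is amplified
    # by raising to the n-th power.
    print(n, relative_error(n), float(n * base_rounding_error(n)))
```

Comparing the two printed columns gives a rough idea of how much of the observed error could come from rounding the base; this is a heuristic, not a full error analysis.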
4
votes
2 answers
What is the correct way to round 331.449999 to 1 decimal place?
Should 331.449999 be 331.4 or 331.5?
I can see an issue with a programming framework I am using. I think I am getting erroneous results in some cases and wanted to make sure I am using the right math results, before I raise a bug for it.
In the…
Akamad007
- 149
- 3
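The rounding in this question can be checked with exact decimal arithmetic. The Python sketch below uses the standard `decimal` module. The second half shows double rounding (first to two places, then to one), which is only an assumed explanation for how a framework might produce 331.5; the original post does not say what the framework does.

```python
from decimal import Decimal, ROUND_HALF_UP

value = Decimal("331.449999")

# Rounding once, directly to one decimal place.
rounded = value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Rounding in two steps (assumed failure mode, not confirmed by the post).
step1 = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
step2 = step1.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(rounded)  # 331.4
print(step2)    # 331.5
```

Since 331.449999 is strictly below the midpoint 331.45, a single rounding step gives 331.4 under any round-to-nearest rule.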
4
votes
2 answers
The upper and lower limits of IEEE-754 standard
So there's something I just can't understand about IEEE-754.
The specific questions are:
Which range of numbers can be represented by IEEE-754 standard using base 2 in single (double) precision?
Which range of numbers can be represented by…
Koy
- 877
- 1
- 6
- 13
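For the normal (non-subnormal) range asked about here, the limits can be computed from the precision and exponent bounds. This is a Python sketch assuming the usual binary32 and binary64 parameters (24 and 53 significand bits, maximum exponents 127 and 1023); the helper name `largest_finite` is illustrative.

```python
import struct
import sys

def largest_finite(precision_bits: int, max_exponent: int) -> float:
    """Largest finite value: (2 - 2^(1-p)) * 2^emax."""
    return (2.0 - 2.0 ** (1 - precision_bits)) * 2.0 ** max_exponent

single_max = largest_finite(24, 127)
double_max = largest_finite(53, 1023)

# Smallest positive normal values: 2^emin.
single_min_normal = 2.0 ** -126
double_min_normal = 2.0 ** -1022

print(single_max, single_min_normal)
print(double_max, double_min_normal)
```

Subnormal numbers extend the range closer to zero; they are left out of this sketch.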
3
votes
2 answers
How to determine a number closest to a given number in floating point.
I am learning to program and came across the interesting byproduct of computer number representation shown by $$0.1+0.2= 0.30000000000000004$$
In trying to understand why this occurs, one has to understand how 0.1 is stored on the computer. I…
floater
- 33
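The stored value of 0.1 and its neighbours can be inspected exactly with rational arithmetic. The following Python sketch assumes IEEE binary64 doubles and Python 3.9 or later (for `math.nextafter`); the variable names are illustrative.

```python
import math
from fractions import Fraction

target = Fraction(1, 10)   # the real number 0.1
x = 0.1                    # the double the literal 0.1 is converted to

# The double just below, the stored double, and the double just above.
neighbors = [math.nextafter(x, -math.inf), x, math.nextafter(x, math.inf)]

# Exact distance of each candidate from the real number 0.1.
errors = {f: abs(Fraction(f) - target) for f in neighbors}
closest = min(errors, key=errors.get)

total = 0.1 + 0.2
print(closest == x)        # the stored value is the nearest double
print(repr(total))         # 0.30000000000000004
```

`Fraction(f)` converts a double to its exact rational value, so the comparison itself involves no rounding.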
2
votes
2 answers
Representing $-2.5$ as a floating point number
I am trying to understand the following:
For example the number $-2.5 = -1*1.25*2^1$ is stored as:
$S = 1$, Exponent $= 1+127 = 128$, Mantissa = $0.25 $
This got me. How do you connect all these numbers to yield $-2.5$ ?
oz123
- 151
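The three fields quoted in this question can be extracted from the bit pattern and recombined. This is a minimal Python sketch for normal binary32 values only; the function name `decode_binary32` is hypothetical.

```python
import struct

def decode_binary32(value: float):
    """Split a normal binary32 value into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # 1 bit
    exponent = (bits >> 23) & 0xFF         # 8 bits, biased by 127
    fraction = (bits & 0x7FFFFF) / 2**23   # 23 stored bits as a value in [0, 1)
    return sign, exponent, fraction

sign, exponent, fraction = decode_binary32(-2.5)

# Recombine: (-1)^S * (1 + fraction) * 2^(exponent - 127)
reconstructed = (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)
print(sign, exponent, fraction, reconstructed)
```

The leading 1 of the significand is implicit, which is why the stored mantissa is 0.25 rather than 1.25.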
2
votes
2 answers
How can I simplify this lerp arithmetic to avoid floating point precision errors?
I am making a "cinematic camera" that pans around a scene. Each step has a position for the camera and a point that the camera should be looking at. For instance:
1. position = (3, 2, 1), facing = (0, 0, 0)
2. position = (6, 5, 4), facing = (9, 9,…
Misys
- 43
2
votes
1 answer
Floating-point systems: Is the mantissa the whole thing or just the "fraction" part after the decimal?
In the context of floating-point systems, our numerical analysis book defines the terms mantissa and fraction as follows:
I am unable to find any consistent definition of the terms "mantissa" and "fraction" online. Wolfram defines a mantissa to…
Aleksandr Hovhannisyan
- 2,983
- 4
- 34
- 59
2
votes
1 answer
How do 24 significant bits give from 6 to 9 significant decimal digits?
I was reading about the IEEE 754 single-precision binary floating-point format (binary32) when I ran into
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This…
user87870
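The "6 to 9" figure can be reproduced from two commonly quoted bounds for a $p$-bit binary significand: $\lfloor (p-1)\log_{10} 2 \rfloor$ decimal digits always survive a decimal to binary to decimal round trip, and $\lceil 1 + p\log_{10} 2 \rceil$ decimal digits are enough to round-trip any binary value. A short Python sketch, with illustrative variable names:

```python
import math

p = 24  # significand precision of binary32, in bits

# Decimal digits guaranteed to survive decimal -> binary32 -> decimal.
guaranteed_digits = math.floor((p - 1) * math.log10(2))

# Decimal digits sufficient for binary32 -> decimal -> binary32.
needed_digits = math.ceil(1 + p * math.log10(2))

print(guaranteed_digits, needed_digits)  # 6 9
```

The value $24 \log_{10} 2 \approx 7.22$ lies between the two bounds, which is why binary32 is often described as having "about 7" decimal digits.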