Questions tagged [floating-point]

Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.

465 questions
58
votes
1 answer

Show that floating point $\sqrt{x \cdot x} \geq x$ for all long $x$.

I verified experimentally that in Java the equality Math.sqrt(x*x) = x holds for all long x such that x*x doesn't overflow. Here, Java long is a $64$-bit signed type and double is an IEEE binary floating-point type with at least a $53$-bit mantissa…
maaartinus
  • 1,401
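A minimal way to rerun the asker's experiment outside Java, assuming only that Python's float is the same IEEE binary64 type as Java's double (the sample size and the bound are illustrative, not the asker's):

```python
import math
import random

# Python's float is an IEEE binary64, like Java's double, so the experiment
# can be sketched here: check whether sqrt(x*x) comes back as exactly x for
# integers whose square still fits in a signed 64-bit long.
LIMIT = 3037000499  # largest x with x*x <= 2**63 - 1

random.seed(0)
for _ in range(100_000):
    x = random.randint(0, LIMIT)
    if math.sqrt(x * x) != float(x):
        print("counterexample:", x)
        break
else:
    print("no counterexample found in this sample")
```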
8
votes
4 answers

Why is $3 \times 0.3 = 0.8999999999999999$ in floating point?

Can anyone please help me understand this fact? I have tried to read some articles on the web about floating point, but it is always a hard topic for me to understand. This is what I get from Python 3.3.0. A brief-to-medium explanation is enough. It is…
user71346
  • 4,171
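A quick way to see the same effect from a Python prompt, using the decimal and fractions modules to expose the value that is actually stored for the literal 0.3:

```python
from decimal import Decimal
from fractions import Fraction

# 0.3 has no exact binary representation; the stored double is slightly
# below 3/10, so tripling it lands just under 0.9.
print(3 * 0.3)        # 0.8999999999999999
print(Decimal(0.3))   # the exact value stored for 0.3
print(3 * Fraction(0.3) < Fraction(9, 10))  # True: the exact triple is below 9/10
```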
6
votes
3 answers

What is the maximum difference between two successive real numbers in the given floating point representation?

The following is a scheme for floating-point number representation using 16 bits: Sign: bit 15; Exponent: bits 14–9; Mantissa: bits 8–0. Let $s, e,$ and $m$ be the numbers represented in binary in the sign, exponent, and mantissa fields,…
PleaseHelp
  • 761
  • 8
  • 29
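Assuming the usual normalized reading of such a format, i.e. a stored value $(-1)^s \times 1.m \times 2^{e-\text{bias}}$ (the excerpt is cut off before fixing the bias and normalization), consecutive representable numbers sharing an exponent $e$ differ by $2^{-9} \times 2^{e-\text{bias}}$, since the 9-bit mantissa steps in units of $2^{-9}$; the maximum gap between successive representable numbers therefore occurs at the largest usable exponent.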
6
votes
1 answer

What is the most significant digit?

What is the most significant digit of $$0.00234$$? I have a problem figuring out whether it is $0$ or $2$.
Zonik
  • 183
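For orientation: writing $0.00234 = 2.34 \times 10^{-3}$ makes the answer visible, since leading zeros only fix the scale; the most significant digit is the leftmost nonzero digit, here $2$.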
5
votes
1 answer

How to calculate floating point numbers?

Here are two locations in memory: 0110 | 1111 1110 1101 0011 and 0111 | 0000 0110 1101 1001. Interpret locations 6 (0110) and 7 (0111) as an IEEE floating-point number. Location 6 contains bits [15:0] and location 7 contains bits…
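A hedged Python sketch of the decoding step, assuming (since the excerpt is cut off) that location 7 supplies bits [31:16] of a binary32 value and that the two words are exactly the bit strings shown:

```python
import struct

low  = int("1111111011010011", 2)  # location 6: bits [15:0]
high = int("0000011011011001", 2)  # location 7: assumed to hold bits [31:16]
word = (high << 16) | low          # assemble the 32-bit pattern

# Reinterpret the 32-bit pattern as an IEEE binary32 (single-precision) value.
value = struct.unpack(">f", word.to_bytes(4, "big"))[0]
print(f"{word:032b} -> {value}")
```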
5
votes
2 answers

Complex formula that is equivalent to $f(x) = x$

In a computer you can't store every real number you want, because the interval $[0.0, 1.0]$ contains infinitely many numbers and computer memory is finite. I want to show this with examples, which is why I need some formulas that do some complex…
TIKSN
  • 153
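In the spirit of what is being asked for, two Python formulas that are the identity over the reals but not in double precision (the constants are illustrative, not taken from any answer):

```python
import math

def f1(x):
    # Adding and then removing a huge constant: the addition rounds away the
    # low-order bits of x, so the result is usually not x for small x.
    return (x + 1e16) - 1e16

def f2(x):
    # Squaring a rounded square root need not restore x exactly.
    return math.sqrt(x) ** 2

for x in (0.1, 3.0, 7.0):
    print(x, f1(x), f2(x))
```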
5
votes
1 answer

Associativity in floating point arithmetic failing by two values

Assume all numbers and operations below are in floating-point arithmetic with finite precision, a bounded exponent, and round-to-nearest. Are there positive $x,y$ such that $$\begin{align}(x+y)-x&>y\\(x+y)-s(x)&>y\end{align}$$ where…
EEE
  • 111
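One concrete instance of the first inequality, assuming IEEE binary64 and round-to-nearest (the $s(x)$ variant is cut off in the excerpt, so it is not addressed here):

```python
# Near 1e16 the spacing between consecutive doubles is 2, and the sum
# 1e16 + 3 falls exactly halfway between two representable values; the
# ties-to-even rule picks the upper one, 1e16 + 4.
x = 1e16
y = 3.0
print((x + y) - x)      # 4.0
print((x + y) - x > y)  # True
```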
4
votes
1 answer

Why does $\left(1 + \frac{1}{n}\right)^n$ give vastly different relative errors when $n=252257928$ and $n = 215450934$?

This expression $\left(1 + \frac{1}{n}\right)^n$ approximates $e^1$. When $n = 252257928$, the relative error, $(e - \text{result})/e$, is $1.740557727387924 \times 10^{-12}$. When $n = 215450934$, the relative error is…
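A short Python reproduction of the comparison, using the two $n$ values quoted in the question; the comment gives the usual error-amplification explanation, not necessarily the accepted answer:

```python
import math

# How close 1/n and then 1 + 1/n land to representable doubles varies with n,
# and raising to the power n amplifies that tiny representation error by a
# factor of roughly n in the final result.
for n in (252257928, 215450934):
    approx = (1.0 + 1.0 / n) ** n
    print(n, approx, (math.e - approx) / math.e)
```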
4
votes
2 answers

What is the correct way to round 331.449999 to 1 decimal place?

Should 331.449999 be 331.4 or 331.5? I can see an issue with a programming framework I am using. I think I am getting erroneous results in some cases and want to make sure I am using the right mathematical result before I raise a bug for it. In the…
Akamad007
  • 149
  • 3
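Whether the asker's framework is double rounding is a separate question, but the single-step answer can be checked directly in Python:

```python
from decimal import Decimal, ROUND_HALF_UP

# 331.449999 is strictly less than 331.45, so one rounding step to a single
# decimal place gives 331.4 under any standard rule; getting 331.5 requires
# rounding twice (first to 331.45, then up), i.e. double rounding.
print(round(331.449999, 1))                                                    # 331.4
print(Decimal("331.449999").quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))  # 331.4
```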
4
votes
2 answers

The upper and lower limits of the IEEE-754 standard

So there's something I just can't understand about IEEE-754. The specific questions are: Which range of numbers can be represented by the IEEE-754 standard using base 2 in single (double) precision? Which range of numbers can be represented by…
Koy
  • 877
  • 1
  • 6
  • 13
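For the double-precision half of the question, the limits can be read straight off sys.float_info in Python, whose float is IEEE binary64; the single-precision figures are noted in a comment, since the standard library only exposes the double type:

```python
import sys

print(sys.float_info.max)  # largest finite double, about 1.7976931348623157e308
print(sys.float_info.min)  # smallest positive normalized double, about 2.225e-308
print(sys.float_info.min * sys.float_info.epsilon)  # smallest subnormal, 5e-324

# For binary32 (single precision) the corresponding limits are roughly
# 3.4028235e38 (largest finite) and 1.1754944e-38 (smallest positive normal).
```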
3
votes
2 answers

How to determine the number closest to a given number in floating point?

I am learning to program and came across an interesting byproduct of computer number representation, shown by $$0.1+0.2 = 0.30000000000000004$$ In trying to understand why this occurs, one has to understand how 0.1 is stored on the computer. I…
floater
  • 33
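A hedged Python sketch of the "how 0.1 is stored" part (math.nextafter and math.ulp need Python 3.9 or later):

```python
import math
from decimal import Decimal

x = 0.1
print(Decimal(x))              # the exact double chosen for 0.1, slightly above 1/10
print(math.nextafter(x, 0.0))  # the representable double just below it
print(math.nextafter(x, 1.0))  # the representable double just above it
print(math.ulp(x))             # the spacing between consecutive doubles at this magnitude
```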
2
votes
2 answers

Representing $-2.5$ as a floating point number

I am trying to understand the following: for example, the number $-2.5 = -1 \times 1.25 \times 2^1$ is stored as $S = 1$, Exponent $= 1+127 = 128$, Mantissa $= 0.25$. This confused me. How do you connect all these numbers to yield $-2.5$?
oz123
  • 151
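Plugging the three quoted fields back into the standard binary32 formula, and cross-checking against the bit pattern Python actually stores for a single-precision −2.5, may make the connection clearer (a sketch, not the book's derivation):

```python
import struct

# value = (-1)**sign * (1 + fraction) * 2**(stored_exponent - 127)
sign, stored_exponent, fraction = 1, 128, 0.25
print((-1) ** sign * (1 + fraction) * 2 ** (stored_exponent - 127))  # -2.5

# The stored bit pattern, split as sign | exponent | fraction, is
# 1 | 10000000 | 01000000000000000000000.
bits = int.from_bytes(struct.pack(">f", -2.5), "big")
print(f"{bits:032b}")
```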
2
votes
2 answers

How can I simplify this lerp arithmetic to avoid floating point precision errors?

I am making a "cinematic camera" that pans around a scene. Each step has a position for the camera and a point that the camera should be looking at. For instance: 1. position = (3, 2, 1), facing = (0, 0, 0) 2. position = (6, 5, 4), facing = (9, 9,…
Misys
  • 43
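Not the asker's code, but the usual precision trade-off between the two standard lerp formulations, sketched in Python with deliberately mismatched endpoints:

```python
def lerp_fast(a, b, t):
    # One rounding of (b - a) can lose b entirely when a dominates.
    return a + t * (b - a)

def lerp_stable(a, b, t):
    # Hits both endpoints exactly at t = 0 and t = 1 for finite a, b.
    return (1.0 - t) * a + t * b

a, b = 1e30, 1.0
print(lerp_fast(a, b, 1.0))    # 0.0 -- the endpoint b has been lost
print(lerp_stable(a, b, 1.0))  # 1.0
```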
2
votes
1 answer

Floating-point systems: Is the mantissa the whole thing or just the "fraction" part after the decimal?

In the context of floating-point systems, our numerical analysis book defines the terms mantissa and fraction as follows: I am unable to find any consistent definition of the terms "mantissa" and "fraction" online. Wolfram defines a mantissa to…
Aleksandr Hovhannisyan
  • 2,983
  • 4
  • 34
  • 59
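The two competing usages can be seen side by side in Python, which is one way to make the distinction concrete (not the book's own notation):

```python
import math

x = 6.5
m, e = math.frexp(x)  # "mantissa" in the 0.5 <= m < 1 sense: x == m * 2**e
print(m, e)           # 0.8125 3
print(x.hex())        # 0x1.a000000000000p+2: the IEEE "1.fraction" view, where
                      # only the part after the point (the fraction) is stored
```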
2
votes
1 answer

How do 24 significant bits give from 6 to 9 significant decimal digits?

I was reading about the IEEE 754 single-precision binary floating-point format (binary32) when I ran into the following: The IEEE 754 standard specifies a binary32 as having: Sign bit: 1 bit; Exponent width: 8 bits; Significand precision: 24 bits (23 explicitly stored). This…
user87870
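Both the 6 and the 9 come from $\log_{10} 2 \approx 0.30103$; a small Python check of the standard formulas:

```python
import math

print(24 * math.log10(2))              # ~7.22 "equivalent" decimal digits
print(math.floor(23 * math.log10(2)))  # 6: decimal digits guaranteed to survive
                                       #    a round trip through binary32
print(math.ceil(1 + 24 * math.log10(2)))  # 9: decimal digits needed to write any
                                          #    binary32 so it converts back exactly
```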