The ELI5 explanation of floating point: floats give you approximately the same accuracy (in terms of bits) independently of the scale. Whether your number is much below 1, around 1, or much above 1, you can expect the same precision in the leading bits.
This is the key property, but internalizing it is difficult.
So between 1/2 and 1 there are the same number of numbers as between 1024 and 2048. If you have 1024 numbers between each power of 2, then each interval is 1/2048 in the first case and 1 in the second case.
In reality there are usually:

bfloat16: 2^7 = 128 numbers between each power of 2
float16: 2^10 = 1024 numbers between each power of 2
float32: 2^23 (~8.4 million) numbers between each power of 2
float64: 2^52 (~4.5 quadrillion) numbers between each power of 2
(keep in mind there is a subnormal range where the mantissa has an implicit leading 0. instead of the usual implicit 1.)
To reiterate: increasing the exponent by 1 doubles the window size. The exponent describes how many times the window size was doubled, while the number of mantissa bits describes how many times you can do the reverse and halve it; hence the relation between exponent and mantissa bits.
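A quick way to see this spacing directly (a small C sketch I put together; the values assume IEEE 754 float32 with its 23 mantissa bits):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        // Gap to the next representable float, just above three different scales:
        printf("%g\n", nextafterf(0.5f, 1.0f) - 0.5f);          // 5.96046e-08 (2^-24)
        printf("%g\n", nextafterf(1.0f, 2.0f) - 1.0f);          // 1.19209e-07 (2^-23)
        printf("%g\n", nextafterf(1024.0f, 2048.0f) - 1024.0f); // 0.00012207  (2^-13)
        return 0;
    }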
For example, if you use single precision floats, then you need up to 9 digits of decimal precision to uniquely identify a float. So you would need to use a printf pattern like %.9g to print it. But then 0.1 would be output as 0.100000001, which is ugly. So a common approach is to round to 6 decimal digits: If you use %.6g, you are guaranteed that any decimal input up to 6 significant digits will be printed just like you stored it.
But you would no longer be round-trip safe when the number is the result of a calculation. This is important when you do exact comparisons between floats (eg. to check if data has changed).
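To make that failure mode concrete, here's a minimal check (my own example value, not from the discussion above): a computed float like 1.0f/3.0f doesn't survive a 6-digit round trip, but always survives a 9-digit one.

    #include <stdio.h>

    int main(void) {
        float x = 1.0f / 3.0f;  // a computed value, not a short decimal input
        char buf[32];
        float y;

        snprintf(buf, sizeof buf, "%.6g", x);
        sscanf(buf, "%g", &y);
        printf("%s round-trips: %s\n", buf, x == y ? "yes" : "no");  // no

        snprintf(buf, sizeof buf, "%.9g", x);
        sscanf(buf, "%g", &y);
        printf("%s round-trips: %s\n", buf, x == y ? "yes" : "no");  // yes
        return 0;
    }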
So one idea I had was to try printing the float with 6 digits, then scanning it and seeing if it resulted in the same binary representation. If not, try using 7 digits, and so on, up to 9 digits. Then I would have the shortest decimal representation of a float.
This is my algorithm:
    // Find the shortest %g precision (6..9 digits) that round-trips floatValue.
    int out_length;
    char buffer[32];
    for (int prec = 6; prec <= 9; prec++) {
        out_length = sprintf(buffer, "%.*g", prec, floatValue);
        if (prec == 9) {
            break;  // 9 significant digits always round-trip a float
        }
        float checked_number;
        sscanf(buffer, "%g", &checked_number);
        if (checked_number == floatValue) {
            break;  // parsing gave back the exact same float
        }
    }
I wonder if there is a more efficient way to determine that shortest representation than running printf/scanf in a loop?

I am surprised how complex the issue seems to be. I assumed there might be an elegant solution, but the problem seems to be a lot harder than I thought.
[1] Some (e.g. Windows CRT) do use the shortest representation as a basis, in which case you can actually extract it with large enough precision (where all subsequent digits will be zeros). But many libcs print the exact representation instead (e.g. 3.140000000000000124344978758017532527446746826171875 for `printf("%.51f", 3.14)`), so they are useless for our purpose.
printf("%f\n", 3.14); // 3.140000
printf("%g\n", 3.14); // 3.14> g, G: A double argument representing a floating-point number is converted in style f or e (or in style F or E in the case of a G conversion specifier), depending on the value converted and the precision. Let P equal the precision if nonzero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a conversion with style E would have an exponent of X: if P > X ≥ −4, the conversion is with style f (or F) and precision P − (X + 1). otherwise, the conversion is with style e (or E) and precision P − 1.
Note that it doesn't say anything about, say, the inherent precision of the number. It is a simple remapping to %f or %e depending on the precision value.
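You can watch the remapping happen right at the X ≥ −4 boundary (my own example values, assuming the default precision P = 6):

    #include <stdio.h>

    int main(void) {
        printf("%g\n", 0.0001);    // X = -4, so P > X >= -4 holds: style f, "0.0001"
        printf("%g\n", 0.00001);   // X = -5: falls through to style e, "1e-05"
        printf("%g\n", 123456.0);  // X = 5 < P: style f, "123456"
        printf("%g\n", 1234567.0); // X = 6 >= P: style e, "1.23457e+06"
        return 0;
    }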
I once wanted to find a vector that the Euler rotation (5°, 5°, 0) maps to itself. I just ran a loop of a million iterations or so: take a starting vector, perturb it slightly (add a small random vector), and check whether the new vector ends up closer to itself after rotation than the previous one did; if not, discard the change, otherwise keep it. The script ran for a couple of seconds in Python, and with a perturbation that shrinks with the iteration number I got a perfect result (up to the limited float precision of the vector). 2s would be terribly slow in a library but completely satisfying for my needs :D
I read a bit more about the topic, and it seems that the issue with my approach is that the decimal representation might end up exactly halfway between two floats, and then the result of parsing it depends on the rounding mode that the parser uses. (By default scanf should use round-to-even, but I'm not sure all implementations do so)
In the PostgreSQL docs I found a curious fact: They use an algorithm that makes sure the printed decimal is never exactly half way between two representable floats, so the result of scanning the decimal representation doesn't depend on the rounding mode.
I just tested it:
    from bpy import context as C
    from mathutils import Vector, Euler
    from math import radians as rad

    r = Euler((rad(5), rad(5), 0))
    ob = C.object
    ob.rotation_euler = r
    ob.rotation_mode = 'AXIS_ANGLE'  # convert the rotation to axis-angle form
    a, x, y, z = ob.rotation_axis_angle
    v = Vector((x, y, z))  # the rotation axis is the fixed vector
    print(v)
    v.rotate(r)
    print(v)  # same vector, up to float precision
    print("--")
Can be done without using an object:

    from mathutils import Vector, Euler
    from math import radians as rad

    r = Euler((rad(5), rad(5), 0))
    v = Vector(r.to_quaternion().axis)  # rotation axis straight from the quaternion
    print(v)
    v.rotate(r)
    print(v)
    print("--")

Yes, just `printf("%f", ...);` will get you that.
The actual algorithms to do the float->string conversion are quite complicated. Here is a recent pretty good one: https://github.com/ulfjack/ryu
I think there's been an even more recent one that is even more efficient than Ryu but I don't remember the name.
https://en.cppreference.com/w/cpp/types/numeric_limits/max_d...
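For C, the analogous constants have been in <float.h> since C11 (on IEEE 754 systems they come out to 9 digits for float and 17 for double):

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        // Decimal digits needed for a lossless binary -> decimal -> binary round trip
        printf("float: %d, double: %d\n", FLT_DECIMAL_DIG, DBL_DECIMAL_DIG);  // 9, 17
        return 0;
    }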
This is totally pointless when serialization to and from an unsigned integer that's binary equivalent would be perfectly reversible and therefore wouldn't lose any information.
    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>

    double f = 0.0/0.0; // might need some compiler flags to make this a soft error.
    double g;
    uint64_t bits;
    char s[21];  // up to 20 decimal digits for a 64-bit value, plus NUL
    _Static_assert(sizeof(double) == sizeof(uint64_t), "64-bit double expected");
    memcpy(&bits, &f, sizeof bits);  // type-pun without breaking aliasing rules
    snprintf(s, sizeof s, "%" PRIu64, bits);
    sscanf(s, "%" SCNu64, &bits);    // plain sscanf; snscanf isn't standard
    memcpy(&g, &bits, sizeof g);
If you want something shorter, apply some sort of heuristic that doesn't sacrifice faithful reproduction of the original representation, e.g., idempotency.

    union {
        float f;
        int i;
    } foo;
    foo.f = 3.14;
    printf("%x", foo.i);

The problem is that the compiler can think the assignment to foo.f isn't used anywhere, and thus can choose not to do it.

In C++, you have to use memmove (compilers can and often do recognize that idiom).
The implication is that the next biggest float is (almost) always what you get when you reinterpret its bits as an integer, and add one. For example, start with the zero float: all bits zero. Add one using integer arithmetic. In int-speak it's just one; in float-speak it's a tiny-mantissa denormal. But that's the next float; and `nextafter` is implemented using integer arithmetic.
Learning that floats are ordered according to integer comparisons makes it feel way more natural. But of course there's the usual asterisks: this fails with NaNs, infinities, and negative zero. We get a few nice things, but only a few.
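A sketch of that in C (assuming IEEE 754 float32; memcpy does the reinterpretation without aliasing trouble):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = 0.0f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits += 1;                               // add one to the bit pattern...
        memcpy(&f, &bits, sizeof f);
        printf("%g\n", f);                       // ...smallest subnormal, ~1.4e-45
        printf("%g\n", nextafterf(0.0f, 1.0f));  // the library call agrees
        return 0;
    }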
A more correct version of the statement would be that comparison is the same as on sign-magnitude integers. Of course, this still has the caveats you already mentioned.
    let mut left = self.to_bits() as i32;
    let mut right = other.to_bits() as i32;

    // In case of negatives, flip all the bits except the sign
    // to achieve a similar layout as two's complement integers
    left ^= (((left >> 31) as u32) >> 1) as i32;
    right ^= (((right >> 31) as u32) >> 1) as i32;

    left.cmp(&right)
[1] https://doc.rust-lang.org/src/core/num/f32.rs.html#1348

The killer app was not Lotus 1-2-3 v2, but Turbo Pascal w/ 8087 support. It screamed through tensors and 3D spaces, places we only saw on plotters.
It was not until I got a G3 and used Graphing Calculator that I could explore sombrero functions of increasing frequency.
Floating point math is essential, not an option.
It's like the paranormal trope of an expedition encountering things being disconcertingly "off" at first, and then eventually the laws of nature start breaking down as well. All because of float precision.
For example, the Assassin's Creed series.
From the wiki, the Far Lands ("spongy walls of terrain") aren't caused by precision loss but by integer overflow in the terrain generation.
Donald Knuth has all of that covered in one of his "The Art of Computer Programming" books, with estimations of the error introduced, basic facts like a + (b + c) != (a + b) + c for floats, and similar things.
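The non-associativity is easy to trigger with the usual suspects (a quick check of my own, not from the book):

    #include <stdio.h>

    int main(void) {
        double a = 0.1, b = 0.2, c = 0.3;
        printf("%d\n", a + (b + c) == (a + b) + c);  // 0: they differ
        // prints 0.59999999999999998 vs 0.60000000000000009
        printf("%.17g vs %.17g\n", a + (b + c), (a + b) + c);
        return 0;
    }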
And believe it or not, there have been real-world issues coming out of that corner. I remember Patriot missile systems requiring a restart because they did time accounting with floats, and one part of the software didn't handle the corrections properly, so the missiles went more and more off-target the longer the system was running. They had to restart them every 24 hours or so to keep the error within limits until the issue was fixed (and the systems updated). There have also been massive constructions breaking apart due to float issues (like material thicknesses calculated too thin), etc.
Really though, games are theater tech, not science. Double precision will be more than enough for anything but the most exotic use case.
The most important thing is just to remember not to add very-big and very-small numbers together.
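For instance (my own numbers, but any sufficiently mismatched pair works), the small addend just vanishes:

    #include <stdio.h>

    int main(void) {
        double big = 1e16;
        printf("%d\n", big + 1.0 == big);     // 1: spacing at 1e16 is 2.0, the 1.0 is lost
        float bigf = 1e8f;
        printf("%d\n", bigf + 1.0f == bigf);  // 1: same effect much sooner in 32 bits
        return 0;
    }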
Imagine if integer arithmetic gave wrong answers in certain conditions, lol. Why did we choose the current compromise?
It is odd to me that every major CPU instruction set has ALU condition flags to indicate when these conditions have occurred, and yet many programming languages ignore them entirely or make it hard to access them. Rust at least has the quartet of saturating, wrapping, checked, and unchecked arithmetic operations.
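In C you can at least reach those flags through compiler builtins (a sketch; __builtin_add_overflow is a GCC/Clang extension, not standard C):

    #include <stdio.h>

    int main(void) {
        int sum;
        if (__builtin_add_overflow(2000000000, 2000000000, &sum))
            puts("overflow detected");  // taken: 4e9 doesn't fit in a 32-bit int
        else
            printf("%d\n", sum);
        return 0;
    }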
You get surprising things with commonplace problems.
Signed Integer Overflow OTOH is Undefined Behavior, so it's worse.
There are other solutions they had to find. The game has two "orbital mechanics engines". One runs a physics simulation and the other is just ellipse math, making the rocket "move on rails". That's why you can't time warp under a certain altitude (that's the real physics simulation).
There are also things in rendering, how they faked the large-scale graphics of Kerbin, involving problems with the depth buffer I believe.
https://www.h-schmidt.net/FloatConverter/IEEE754.html
This one has the extra feature of showing the conversion error, but it doesn't support double precision.
However, the one in the OP has an amazing graph that intuitively explains the partitioning of the numeric space: the vertical axis is logarithmic, and the horizontal axis is linear within each row, with rows normalized to fit the range between the logarithmic values on the vertical axis. I guess it's obvious once you're comfortable with floats, but it could use some explanation for those still learning.
These types of visualizations are super useful.
[1] - https://cidr.xyz
Just learned something. Thanks.
> .EXPOSED will be utilized by registrants seeking new avenues for expression on the Internet. There is a deep history of progressivity and societal advancement resulting from the online free expressions of criticism. Individuals and groups will register names in .EXPOSED when interested in editorializing, providing input, revealing new facts or views, interacting with other communities, and publishing commentary.
Like what? A risque lingerie shop at balls.exposed or something? And new TLDs don't in any way facilitate "better search", you know, nor "information sharing".
> Along with the other TLDs in the Donuts family
Sorry, the what family?
> online identities and expression that do not currently exist.
What does this phrase even mean?
> the TLD will introduce significant consumer choice and competition to the Internet namespace – the very purpose of ICANN’s new TLD program.
"Considered harmful" etc.
> Individuals and groups will register names in .EXPOSED when interested in editorializing, providing input, revealing new facts or views, interacting with other communities, and publishing commentary.
Still not sure how "provision of legitimate goods" fits into this. Or the floating point formats, for that matter.
On the other hand linux.rocks and windows.rocks are taken (no website), vi.rocks is 200 USD/year and emacs.rocks is just 14 USD/year.
microsoft.sucks redirects to microsoft.com, but microsoft.rocks is just taken :thinking:
On that note, I've been trying to see if GoDaddy will buy a domain and resell it at a higher price by searching for some plausibly nice domain names on their site. They haven't taken the "bait" yet.
It's sometimes fun to have these kinds of edge cases up your sleeve when testing things.
For other curious readers, these are one beyond the largest integer values that can be represented exactly. In other words, the next representable value away from zero after ±16,777,216.0 is ±16,777,218.0 in 32 bits; the value ±16,777,217.0 cannot be represented and gets rounded (with the default round-to-nearest-even mode it rounds to ±16,777,216.0).
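Easy to verify (assuming IEEE 754 float32):

    #include <stdio.h>

    int main(void) {
        float f = 16777217.0f;  // 2^24 + 1: the first integer float32 can't hold
        printf("%.1f\n", f);    // 16777216.0: the literal was rounded at compile time
        printf("%d\n", 16777216.0f == 16777217.0f);  // 1: both map to the same float
        return 0;
    }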
Precision rounding is one of those things that people often overlook.