> A 16 bit posit in a near-ideal situation matches an 18 bit IEEE float Unsure w...

Dylan16807 · 2026-05-26T14:28:00

> A posit16 has up to 11 bits of precision.

Is this excluding the implied bit?

In that case a half-precision float (fp16) has 10, but if you're messing with formats anyway you can staple on an extra bit of precision and an extra bit of exponent, for 18 bits total.
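To make the implied-bit question concrete, here is a small sketch that counts explicit fraction bits for values in [1, 2). It assumes "posit16" means es = 2, as in the 2022 Posit Standard; many older papers use es = 1, which gives one more fraction bit there. `posit_fraction_bits` is an illustrative helper, not a library function, and the 1/6/11 row is the hypothetical 18-bit layout (fp16 plus one exponent bit and one mantissa bit).

```python
def posit_fraction_bits(n: int, es: int, scale: int) -> int:
    """Explicit fraction bits (implied bit excluded) that a posit<n, es>
    has for values in [2**scale, 2**(scale + 1))."""
    k = scale >> es  # regime value: floor(scale / 2**es)
    # Regime: k+1 ones then a terminating zero for k >= 0,
    # -k zeros then a terminating one for k < 0.
    regime_len = k + 2 if k >= 0 else 1 - k
    # Whatever is left after the sign, regime and exponent bits is fraction.
    return max(0, n - 1 - regime_len - es)


# (label, total width, explicit fraction bits for values in [1, 2))
layouts = [
    ("fp16 1/5/10", 16, 10),
    ("posit16, es=2", 16, posit_fraction_bits(16, 2, 0)),
    ("posit16, es=1", 16, posit_fraction_bits(16, 1, 0)),
    ("18-bit 1/6/11", 18, 11),
]
for label, width, frac in layouts:
    # +1 for the implied leading bit, which both families have
    print(f"{label:14s} {width} bits: {frac} fraction bits, {frac + 1} with the implied bit")
```

Under the es = 2 assumption, "11 bits" only works out if the implied bit is excluded, which puts posit16 one bit ahead of fp16 near 1 and level with the 18-bit layout.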

> There's no such thing as an 18 bit IEEE float.

There are a lot of custom sizes out there. But as long as you keep following IEEE rules, no special circuitry is needed, just a small scaling factor.
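As a rough software illustration of "same rules, different widths": the decoder below handles any IEEE 754-style binary format given only its exponent and mantissa widths, and the bias, subnormals, infinities and NaN all fall out of the same formulas. `decode_ieee_style` is an illustrative name, and the 1/6/11 layout is a hypothetical 18-bit format (fp16 plus one exponent bit and one mantissa bit), not a standard one.

```python
import math


def decode_ieee_style(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a 1/exp_bits/man_bits bit pattern using IEEE 754 binary rules."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1

    if exp == (1 << exp_bits) - 1:  # all-ones exponent: infinity or NaN
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:  # subnormal or zero: no implied bit, exponent fixed at 1 - bias
        return sign * math.ldexp(man, 1 - bias - man_bits)
    # Normal number: prepend the implied leading 1 to the mantissa.
    return sign * math.ldexp((1 << man_bits) | man, exp - bias - man_bits)


for name, e, m in [("fp16", 5, 10), ("bf16", 8, 7), ("hypothetical 1/6/11", 6, 11)]:
    # Largest finite value: exponent field all ones except the last bit, mantissa all ones.
    max_pattern = ((1 << e) - 2) << m | ((1 << m) - 1)
    print(f"{name:20s} max={decode_ieee_style(max_pattern, e, m):.6g}"
          f" min_subnormal={decode_ieee_style(1, e, m):.3g}")
```

Only the two width parameters change between formats, which is the point being made about hardware: the same logic, slightly wider.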

NVIDIA also laid out a 19-bit format, TF32 (1 sign bit, 8 exponent bits, 10 mantissa bits), that's a superset of both fp16 and bf16.
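A quick way to check the superset claim is to enumerate every finite fp16 and bf16 value and confirm each one is exactly representable in TF32. This sketch assumes TF32 can be modeled as an fp32 value whose low 13 of 23 mantissa bits are zero (fp32's 8 exponent bits, 10 mantissa bits), and it ignores how any particular hardware treats subnormals. It uses only the standard `struct` module, whose `'e'` format code is IEEE half precision; `fits_tf32` is an illustrative name.

```python
import math
import struct


def fp32_bits(x: float) -> int:
    """Raw 32-bit pattern of x rounded to fp32."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def fits_tf32(x: float) -> bool:
    """True if x is exactly an fp32 value whose low 13 mantissa bits are zero."""
    bits = fp32_bits(x)
    exact_fp32 = struct.unpack(">f", struct.pack(">I", bits))[0] == x
    return exact_fp32 and bits & 0x1FFF == 0


# Every fp16 value: struct's 'e' code decodes IEEE binary16.
fp16_values = [struct.unpack(">e", i.to_bytes(2, "big"))[0] for i in range(1 << 16)]
# Every bf16 value: bf16 is the top half of an fp32 bit pattern.
bf16_values = [struct.unpack(">f", i.to_bytes(2, "big") + b"\x00\x00")[0] for i in range(1 << 16)]

fp16_ok = all(fits_tf32(v) for v in fp16_values if math.isfinite(v))
bf16_ok = all(fits_tf32(v) for v in bf16_values if math.isfinite(v))
print("every finite fp16 value fits in TF32:", fp16_ok)
print("every finite bf16 value fits in TF32:", bf16_ok)
```

fp16 contributes the 10 mantissa bits and bf16 the 8 exponent bits, so a value like 1 + 2**-10 fits while 1 + 2**-11 does not.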

> Many papers have compared neural networks at 16 bits or 8 bits, and posits beat the hell out of floats and it's not even close.

Can you link a paper that shows posits beating floats at different sizes?

I found a 2021 paper that compares various posit configurations to 32-bit floats and finds that model quality is close for some of them. It doesn't compare against any smaller floats.

> Which is very much expected. As they're particularly suited to this task.

Posits show their value when you need a huge exponent range and your numbers cluster very closely around 1. How well do neural nets fit that pattern?

And how often is their advantage better than 1 or 2 bits?

If you can keep your weights within a range of about 9 orders of magnitude (fp16's normal range runs from about 6.1e-5 to 65504), I expect fp16 to do just fine, since compared to a posit16 it loses a bit on some numbers and gains a bit or two on others.
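A rough sketch of that trade-off, again assuming posit16 uses es = 2: for each power of two in fp16's normal range (2**-14 up to 65504), it computes fp16's explicit fraction bits minus the posit's. Under that assumption the posit wins by one bit in [2**-4, 2**4), the two tie in the next band out, and fp16 wins by one or two bits toward the ends of its range. A different es shifts these bands and margins, so treat the numbers as illustrative. `posit_fraction_bits` and `compare` are made-up helper names.

```python
def posit_fraction_bits(n: int, es: int, scale: int) -> int:
    """Explicit fraction bits of a posit<n, es> for values in [2**scale, 2**(scale + 1))."""
    k = scale >> es  # regime value: floor(scale / 2**es)
    regime_len = k + 2 if k >= 0 else 1 - k  # run of identical bits plus its terminator
    return max(0, n - 1 - regime_len - es)


FP16_MAN_BITS = 10
FP16_MIN_SCALE, FP16_MAX_SCALE = -14, 15  # normal range: 2**-14 .. 65504


def compare(es: int = 2) -> dict[int, int]:
    """fp16 fraction bits minus posit16 fraction bits per power of two
    (positive: fp16 is more precise, negative: posit16 is)."""
    return {
        s: FP16_MAN_BITS - posit_fraction_bits(16, es, s)
        for s in range(FP16_MIN_SCALE, FP16_MAX_SCALE + 1)
    }


for scale, diff in compare().items():
    if diff == 0:
        label = "tie"
    else:
        label = f"{'fp16' if diff > 0 else 'posit16'} +{abs(diff)}"
    print(f"[2**{scale:>3}, 2**{scale + 1:>3}): {label}")
```

Outside this window the es = 2 posit keeps going, down to 2**-56 and up to 2**56, which is where its exponent range pays off, but with few or no fraction bits left.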

> But also in other domains, like numerical weather simulations, where tests have shown 16-bit posits can replace 32-bit floats.

Can you link this too? I found a 2019 paper that shows them beating fp16 and falling short of fp64, but it has no fp32 comparison. They also noted that posit(16,0) and bf16 did badly.

They did conclude that 16-bit posits were probably good enough to beat out measurement error and be suitable for the bulk of simulation, but that same chart showed that fp16 was almost good enough. So again I wonder how many bits you'd actually need, since if you're considering rebuilding your FPUs, it would be silly to exclude "float sizes that aren't powers of two".
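One way to frame "how many bits you'd actually need" without restricting yourself to powers of two: given the range of magnitudes a workload has to represent and the relative rounding error it can tolerate, find the narrowest IEEE-style layout that keeps those values in the normal range. This is a hypothetical helper (`smallest_ieee_layout` is a made-up name). It assumes round-to-nearest, so a single rounding has relative error at most 2**-(man_bits + 1), and it ignores subnormals and error accumulation across operations, which a real simulation would have to measure.

```python
import math


def smallest_ieee_layout(min_abs: float, max_abs: float,
                         max_rel_err: float) -> tuple[int, int, int]:
    """Return (exp_bits, man_bits, total_bits) of the narrowest IEEE-style binary
    format whose normal range covers [min_abs, max_abs] and whose unit roundoff
    2**-(man_bits + 1) does not exceed max_rel_err."""
    if not (0 < min_abs <= max_abs and 0 < max_rel_err):
        raise ValueError("need 0 < min_abs <= max_abs and max_rel_err > 0")
    # Round-to-nearest with man_bits fraction bits has unit roundoff 2**-(man_bits + 1).
    man_bits = max(0, math.ceil(-math.log2(max_rel_err)) - 1)
    # Stop at 11 exponent bits: beyond that the bounds no longer fit in a Python float.
    for exp_bits in range(2, 12):
        bias = (1 << (exp_bits - 1)) - 1
        min_normal = math.ldexp(1.0, 1 - bias)
        max_finite = math.ldexp(2.0 - math.ldexp(1.0, -man_bits), bias)
        if min_normal <= min_abs and max_abs <= max_finite:
            return exp_bits, man_bits, 1 + exp_bits + man_bits
    raise ValueError("range too wide for this float64-based sketch")


# fp16's own normal range and unit roundoff should give back fp16 itself.
print(smallest_ieee_layout(2 ** -14, 65504, 2 ** -11))
```

Nothing in the search favors 16 or 32; if a workload's measured error budget lands between two standard formats, the answer is an odd width.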