For 50 years, since Kernighan and Ritchie published the first edition of their C language book, it has been known that the single-precision “float” type is 32 bits wide and the double-precision type is 64 bits. There is also an 80-bit “long double” type with extended precision, and these types covered almost all the needs of floating-point data processing. However, over the past few years, the advent of large neural network models has forced developers to move to the other end of the spectrum and shrink floating point types as much as possible.
Honestly, I was surprised when I found out that a 4-bit floating point format exists. How the hell can that be possible? The best way to find out is to try it ourselves. In this article, we will explore the most popular floating point formats, create a simple neural network, and see how it works.
Let us begin.
A “standard” 32-bit floating point
Before diving into “extreme” formats, let’s recall the standard one. The IEEE 754 standard for floating point arithmetic was established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). A typical number of the float32 type is laid out like this:

[ sign (1 bit) | exponent (8 bits) | mantissa (23 bits) ]
Here, the first bit is the sign, the next 8 bits represent the exponent, and the last 23 bits represent the mantissa. The final value is calculated using the formula:

value = (−1)^sign × 2^(exponent − 127) × 1.mantissa

where 127 is the exponent bias and 1.mantissa denotes the implicit leading 1 followed by the mantissa bits.
This simple helper function allows us to print a floating point value in binary form:
import struct

def print_float32(val: float):
    """ Return the binary form of a Float32 value as a string """
    m = struct.unpack('I', struct.pack('f', val))[0]
    return format(m, 'b').zfill(32)

print(print_float32(0.15625))
# > 00111110001000000000000000000000
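As a quick sanity check, we can decode these bits by hand. Splitting them into fields gives a sign of 0, an exponent field of 01111100 (124 in decimal), and a mantissa field of 01 followed by 21 zeros, which is 0.01 in binary, or 0.25. Plugging these into the formula above (a small sketch, with the field values hard-coded for this one example):

# Decode 0 01111100 01000000000000000000000 by hand
sign = 0
exponent_raw = 0b01111100   # 124; the exponent bias for float32 is 127
mantissa_fraction = 0.25    # mantissa bits 01000... = 0.01 in binary
value = (-1) ** sign * 2 ** (exponent_raw - 127) * (1 + mantissa_fraction)
print(value)
# > 0.15625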
Let’s also create another helper for the backward conversion, which will be useful later:
def ieee_754_conversion(sign, exponent_raw, mantissa, exp_len=8, mant_len=23):
    """ Convert raw sign, exponent, and mantissa fields into a floating point value """
    sign_mult = -1 if sign == 1 else 1
    # Remove the exponent bias (127 for float32)
    exponent = exponent_raw - (2 ** (exp_len - 1) - 1)
    # Reconstruct the mantissa: implicit leading 1 plus the fractional bits
    mant_mult = 1
    for b in range(mant_len - 1, -1, -1):
        if mantissa & (2 ** b):
            mant_mult += 1 / (2 ** (mant_len - b))
    return sign_mult * (2 ** exponent) * mant_mult
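With this helper in place, we can rebuild 0.15625 from the fields we printed above; this round trip is a sketch that assumes the completed function body shown here:

# Rebuild 0.15625 from sign=0, exponent=0b01111100, mantissa=0b01 << 21
print(ieee_754_conversion(0, 0b01111100, 0b01000000000000000000000))
# > 0.15625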