There follows a description of IEEE and IBM 370 floating point formats. This information was more relevant in the days of the Hitachi S3600 at the HPCF, and is not even guaranteed to be correct, although corrections are welcome.
Floating point numbers are stored in a very similar fashion to the "scientific" or "exponential" notation in routine usage. There is a sign, a mantissa, and an exponent.
    - 5.1276 * 10^3 = -5127.6
    | |           |
    | |           exponent
    | mantissa
    sign

Differences arise in the choice of base (here 10), the range of the mantissa (here 1 <= m < 10), and the number of figures given to the mantissa and exponent. Note that in binary the choice of mantissa 1 <= m < 10 (the 10 being binary, i.e. 1 <= m < 2 in decimal) guarantees that the first digit of the mantissa is 1. Therefore this 1 is often suppressed when storing the mantissa.
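The same decomposition can be tried in binary. What follows is only a minimal Python sketch using math.frexp from the standard library; the function name decompose is purely illustrative and not part of any format definition.

```python
import math

def decompose(x):
    """Split a non-zero float into (sign, mantissa, exponent), 1 <= mantissa < 2."""
    m, e = math.frexp(abs(x))   # frexp returns 0.5 <= m < 1 with abs(x) == m * 2**e
    sign = -1 if x < 0 else 1
    return sign, m * 2, e - 1   # rescale so that 1 <= mantissa < 2

sign, mantissa, exponent = decompose(-5127.6)
print(sign, mantissa, exponent)
```

Multiplying the three parts back together, sign * mantissa * 2**exponent, recovers the original number exactly, since only powers of two are involved.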
The IEEE format is used by babbage, and by (almost) every other current computer. The exceptions are IBM mainframes, VAXes, the Hitachi S3600, Crays (i.e. the XMP, C90 etc.) and Intel native mode floating point.
                               Sign  exponent  mantissa  base  exp offset
    32 bit single precision    0     1-8       9-31      2     127
    64 bit double precision    0     1-11      12-63     2     1023

The leading 1 of the mantissa is suppressed.
Various "special" numbers are also representable as IEEE. These include:
              double              single
    +INF      7FF0000000000000    7F800000
    -INF      FFF0000000000000    FF800000
    NaN       7FF0000000000001    7F800001
                     to              to
              7FFFFFFFFFFFFFFF    7FFFFFFF
                    and             and
              FFF0000000000001    FF800001
                     to              to
              FFFFFFFFFFFFFFFF    FFFFFFFF
    +OVER     7FEFFFFFFFFFFFFF    7F7FFFFF
    -OVER     FFEFFFFFFFFFFFFF    FF7FFFFF
    +UNDER    0010000000000000    00800000
    -UNDER    8010000000000000    80800000

Example (single precision):
    3 (base 10) = 1.5 * 2^1 (base 10) = 1.1 * 10^1 (base 2)

    Store:  sign bit as  0 (+)
            exponent as  1000 0000 (=128, as offset of 127 added)
            mantissa as  100 0000 0000 0000 0000 0000 (suppress leading 1)

    Giving: 0100 0000 0100 0000 0000 0000 0000 0000
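The worked example can be checked mechanically. The following is a small Python sketch, assuming only the standard struct module; the helper name ieee_single_fields is illustrative.

```python
import struct

def ieee_single_fields(x):
    """Return (sign, stored exponent, stored mantissa) of x as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes
    sign = bits >> 31                 # bit 0 (most significant)
    exponent = (bits >> 23) & 0xFF    # bits 1-8, still including the offset of 127
    mantissa = bits & 0x7FFFFF        # bits 9-31, leading 1 suppressed
    return sign, exponent, mantissa

sign, exponent, mantissa = ieee_single_fields(3.0)
pattern = struct.pack(">f", 3.0).hex()
print(pattern, sign, exponent, format(mantissa, "023b"))
```

For 3.0 the pattern is 40400000 in hexadecimal, which is the 32 bit string given in the example.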
The single precision range is 1.2e-38 (2^-126) to 3.4e38 (just under 2^128), and the double precision range is 2.2e-308 (2^-1022) to 1.8e308 (just under 2^1024).
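As a rough check on the special values and range limits, the following Python sketch reinterprets a few of the hexadecimal patterns using the standard struct module. The helper names are illustrative, and the sketch assumes that the interpreter's own float is an IEEE double, as it is on essentially all current platforms.

```python
import math
import struct
import sys

def single_from_hex(h):
    """Interpret 8 hex digits as a big-endian IEEE single."""
    return struct.unpack(">f", bytes.fromhex(h))[0]

def double_from_hex(h):
    """Interpret 16 hex digits as a big-endian IEEE double."""
    return struct.unpack(">d", bytes.fromhex(h))[0]

plus_inf = single_from_hex("7F800000")
minus_inf = double_from_hex("FFF0000000000000")
a_nan = single_from_hex("7FFFFFFF")
plus_over = double_from_hex("7FEFFFFFFFFFFFFF")    # largest finite double
plus_under = single_from_hex("00800000")           # smallest normalised single

print(plus_inf, minus_inf, a_nan)
print(plus_over == sys.float_info.max, plus_under == math.ldexp(1.0, -126))
```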
The smallest stored exponent is one, representing -126 in single precision, or -1022 in double precision. Numbers with a stored exponent of zero are said to be denormalised. The exponent is considered to be -126 / -1022, and the mantissa is interpreted with a suppressed leading zero (alternatively, the exponent is considered to be -127 / -1023, and the mantissa is fully stored).
Some machines flush denormals to zero, others calculate with them, but note that precision will be lost, as (many) leading digits of the mantissa will be zero.
The minimum denormals are 2^-149 (single precision, approx 1.4e-45) and 2^-1074 (double precision, approx 5e-324).
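The boundary between normalised and denormalised numbers can be inspected directly. This is a sketch only, again using the standard struct module with illustrative helper names.

```python
import math
import struct

def single_from_bits(bits):
    """Interpret a 32 bit integer as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_normal = single_from_bits(0x00800000)    # stored exponent 1, mantissa 0
largest_denormal = single_from_bits(0x007FFFFF)   # stored exponent 0, mantissa all ones
smallest_denormal = single_from_bits(0x00000001)  # stored exponent 0, lowest bit only

# The same idea in double precision: only the lowest mantissa bit is set.
smallest_double_denormal = struct.unpack(">d", struct.pack(">Q", 1))[0]

print(smallest_denormal == math.ldexp(1.0, -149))
print(smallest_double_denormal == math.ldexp(1.0, -1074))
```

The smallest denormal has only one significant bit left, which is the loss of precision referred to above.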
The IBM 370 format is used by turing (Hitachi S3600), and, of course, by the IBM 370.
                               Sign  exponent  mantissa  base  exp offset
    32 bit single precision    0     1-7       8-31      16    64
    64 bit double precision    0     1-7       8-63      16    64

The mantissa may have up to 3 leading zero bits, even when normalised, because normalisation in base 16 only requires the leading hexadecimal digit to be non-zero. Denormalised values are valid. Example (single precision):
    3 (base 10) = 0.1875 * 16^1 (base 10) = 0.0011 * 10000^1 (base 2, exponent base 16)

    Store:  sign bit  0 (+)
            exponent  100 0001 (=65, as offset of 64 added)
            mantissa  0011 0000 0000 0000 0000 0000

    Giving: 0100 0001 0011 0000 0000 0000 0000 0000
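The IBM layout can be sketched in Python as well. The two functions below are hypothetical helpers written for this note, not part of any library. They use exact rational arithmetic, and the encoder truncates surplus mantissa bits and makes no attempt to produce denormalised values.

```python
from fractions import Fraction

def ibm_single_decode(bits):
    """Decode a 32 bit IBM 370 single precision pattern to an exact Fraction."""
    sign = -1 if bits >> 31 else 1
    exponent = ((bits >> 24) & 0x7F) - 64           # power of 16, offset of 64
    mantissa = Fraction(bits & 0xFFFFFF, 1 << 24)   # 0 <= mantissa < 1, nothing suppressed
    return sign * mantissa * Fraction(16) ** exponent

def ibm_single_encode(x):
    """Encode a number as an IBM 370 single, truncating surplus mantissa bits."""
    x = Fraction(x)
    if x == 0:
        return 0
    sign = 1 if x < 0 else 0
    x = abs(x)
    exponent = 0
    while x >= 1:                  # normalise so that 1/16 <= x < 1
        x /= 16
        exponent += 1
    while x < Fraction(1, 16):
        x *= 16
        exponent -= 1
    stored = exponent + 64
    if not 0 <= stored <= 127:
        raise OverflowError("exponent out of range for this sketch")
    mantissa = int(x * (1 << 24))  # int() truncates
    return (sign << 31) | (stored << 24) | mantissa

print(format(ibm_single_encode(3), "08X"))
print(ibm_single_decode(0x41300000))
```

Encoding 3 gives 41300000 in hexadecimal, which matches the bit string of the example, and decoding that pattern returns 3.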
Approximate ranges and precisions are as follows:

                   Range     Precision
                             (binary digits)   (decimal digits)
    IEEE 32 bit    10^38     23                7
    IEEE 64 bit    10^308    52                15
    IBM 32 bit     10^75     21-24             6-7
    IBM 64 bit     10^75     53-56             15-16

The IEEE figures count the stored mantissa bits; the suppressed leading 1 provides one further bit of precision.

Please note also that the IBM arithmetic "rounds" by truncation, whereas IEEE normally rounds to the closest value. Errors can build up rapidly under the former scheme.
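The effect of truncation can be illustrated with a toy model. The sketch below is a simplification and not a simulation of real IBM arithmetic: it keeps 24 binary digits after every operation (ignoring the base 16 normalisation) and repeatedly adds one tenth, once truncating and once rounding to nearest. All function names are illustrative.

```python
from fractions import Fraction
import math

def quantise(x, bits, truncate):
    """Keep `bits` significant binary digits of a positive Fraction x."""
    e = 0
    while x >= Fraction(2) ** e:        # find e with 2**(e-1) <= x < 2**e
        e += 1
    while x < Fraction(2) ** (e - 1):
        e -= 1
    scale = Fraction(2) ** (bits - e)   # x * scale lies in [2**(bits-1), 2**bits)
    scaled = x * scale
    n = math.floor(scaled) if truncate else round(scaled)
    return Fraction(n) / scale

def repeated_sum(term, count, bits, truncate):
    """Add `term` to itself `count` times, quantising after every step."""
    t = quantise(term, bits, truncate)
    total = Fraction(0)
    for _ in range(count):
        total = quantise(total + t, bits, truncate)
    return total

exact = Fraction(1, 10) * 1000
trunc_error = exact - repeated_sum(Fraction(1, 10), 1000, 24, truncate=True)
round_error = exact - repeated_sum(Fraction(1, 10), 1000, 24, truncate=False)
print(float(trunc_error), float(round_error))
```

With truncation every step errs in the same direction, so the computed sum can only fall below the exact answer; with rounding to nearest the individual errors can have either sign.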
Intel native mode floating point is mostly IEEE, but is 80 bits wide, with a 64 bit mantissa with an explicit leading one, and a 15 bit exponent offset by 16383. Denormals have an explicit leading zero as well as a stored exponent of zero, and the stored exponent cannot be zero with an explicit leading one in the mantissa(?!).
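A decoder for this layout might look like the following Python sketch. It is written for this note rather than taken from any library, takes the 16 bit sign and exponent field and the 64 bit mantissa as two separate integers, and does not handle infinities or NaNs. It assumes that a stored exponent of zero is treated as an exponent of -16382, by analogy with the IEEE denormals described earlier.

```python
from fractions import Fraction

def extended_decode(sign_exp, mantissa):
    """Decode an 80 bit value given as (16 bit sign+exponent, 64 bit mantissa)."""
    sign = -1 if sign_exp >> 15 else 1
    stored = sign_exp & 0x7FFF
    # Assumption: a stored exponent of zero acts like a stored exponent of one.
    exponent = (stored if stored else 1) - 16383
    # The leading digit is explicit, so the mantissa is simply scaled by 2**-63.
    return sign * Fraction(mantissa, 1 << 63) * Fraction(2) ** exponent

three = extended_decode(0x4000, 0xC000000000000000)
one = extended_decode(0x3FFF, 0x8000000000000000)
print(three, one)
```

Here 3 is stored as 1.1 (binary) with an exponent of 1, so the stored exponent is 16384 and the mantissa begins with the bits 11.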