 4-2  FLOATING-POINT NUMBERS - CONCRETE EXAMPLE
 **********************************************

 IEEE/REAL*4
 -----------
 To make things more concrete, let's look at a typical floating-point
 representation for a REAL (SINGLE PRECISION): the single-precision
 unextended IEEE format (ANSI/IEEE Std 754-1985), which became a
 de facto standard on workstations.

 The '*4' is a non-standard notation that says that 4 bytes are
 allocated for the representation.

 A schematic description of the representation follows. The 4 bytes
 contain 32 bits that are partitioned into 3 parts (the letter 'S' in
 the left part is short for 'Sign'):

      +-+--------+-----------------------+
      |S|  exp   |       fraction        |
      +-+--------+-----------------------+       Direction of
       ^                                ^   <--- increasing addresses
       Bit31                            Bit0     (See discussion below)

 A formula that gives the value of this float is:

      Value = (-1)**S  X  1.fffffffffffffffffffffff  X  2**(exp - 127)

 The most significant bit (MSB) is the sign bit: it is 0 for a positive
 number and 1 for a negative number.

 The next 8 bits hold the exponent, which is BIASED by 127 (see the
 formula above). The stored values 0 to 255 therefore correspond to
 unbiased exponents in the range [-127, 128].

 The remaining 23 bits are taken as the binary digits of a binary
 fraction that has a "whole part" = 1 (see the formula above). This
 condition is just the normalization condition.

 An IEEE normalized mantissa always has a leading '1' bit, so that bit
 is redundant and can always be omitted (an old 'trick', apparently
 first published by I. Bennett Goldberg in 1967). This 'saves' one bit
 that can be used to improve the precision.

 The following program may help you examine the structure of REAL on
 your machine. It is based on the plausible assumption that integers
 are represented in two's complement format. Of course we could use
 the Z edit descriptor, but it is not standard FORTRAN 77, and so may
 not be implemented by all compilers.

      PROGRAM RELREP
C     ------------------------------------------------------------------
      REAL
     *              X
C     ------------------------------------------------------------------
      WRITE(*,*) ' Enter a REAL number: '
      READ(*,*) X
C     X is REAL but BINREP declares its argument INTEGER. The mismatch
C     is deliberate: BINREP sees the raw bits of X (this assumes that
C     REAL and INTEGER occupy the same 32 bits)
      CALL BINREP(X)
C     ------------------------------------------------------------------
      END


      SUBROUTINE BINREP(INT)
C     ------------------------------------------------------------------
      INTEGER
     *              I,
     *              INT
C     ------------------------------------------------------------------
      CHARACTER
     *              B*32
C     ------------------------------------------------------------------
C     Note: INT is modified, so the caller's variable is destroyed
      IF (INT .GE. 0) THEN
C       Non-negative: peel off the bits from right to left
        B(1:1) = '0'
        DO I = 32, 2, -1
          IF (MOD(INT,2) .EQ. 0) THEN
            B(I:I) = '0'
          ELSE
            B(I:I) = '1'
          ENDIF
          INT = INT / 2
        ENDDO
      ELSE
C       Negative (two's complement): the lower 31 bits are the
C       complement of the bits of ABS(INT + 1)
        B(1:1) = '1'
        INT = ABS(INT + 1)
        DO I = 32, 2, -1
          IF (MOD(INT,2) .EQ. 0) THEN
            B(I:I) = '1'
          ELSE
            B(I:I) = '0'
          ENDIF
          INT = INT / 2
        ENDDO
      ENDIF
C     ------------------------------------------------------------------
C     The last three lines label the bits 31..0 as in the diagram
      WRITE(*,*) ' ', B(1:8),' ', B(9:16),' ', B(17:24),' ', B(25:32)
      WRITE(*,*) ' ........ ........ ........ ........ '
      WRITE(*,*) ' 10987654 32109876 54321098 76543210 '
      WRITE(*,*) '  3          2          1            '
      WRITE(*,*) ' '
C     ------------------------------------------------------------------
      RETURN
      END


 Special numbers
 ---------------
 Using normalized mantissas raises a little problem: how do we
 represent zero when the mantissa is not allowed to have a zero value?

 The IEEE solution is to represent the number zero by a zero fraction
 and a zero exponent. No condition is imposed on the SIGN BIT, so we
 have two 'zeros', +0 and -0!

 Remember that the exponent is biased by 127, so a zero exponent
 really means that the binary fraction is 'multiplied' by
 (2 ** (-127)). In other words, the minimal exponent is reserved to
 represent zero.

 There is also an internal representation for 'INFINITY'. It consists
 of the maximal exponent = 255 (128 after debiasing) and all fraction
 bits = 0. So we also have two 'infinities', one positive and one
 negative.

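 The value formula and the two special bit patterns described so far
 can be tried out with integer arithmetic alone. The following is only
 a sketch: the program name FIELDS, the constant TWO23 (= 2 ** 23) and
 the four sample patterns are illustrative choices, and it assumes
 that INTEGER holds at least 32 bits. It handles only patterns with
 sign bit 0, and treats every pattern that is not zero or infinity as
 a normalized number, which is true for the samples used here.

```fortran
      PROGRAM FIELDS
C     ------------------------------------------------------------------
C     Sketch only: splits a bit pattern with sign bit 0, given as a
C     non-negative INTEGER, into biased exponent and fraction.
C     ASSUMES INTEGER holds at least 32 bits. The names and the four
C     sample patterns are illustrative choices.
C     ------------------------------------------------------------------
      INTEGER
     *              N(4), I, IEXP, IFRAC, TWO23
      REAL
     *              V
      PARAMETER (TWO23 = 8388608)
      DATA N / 0, 1065353216, 1078530011, 2139095040 /
C     ------------------------------------------------------------------
      DO 10 I = 1, 4
C       Bits 30..23 are the exponent, bits 22..0 the fraction
        IEXP  = N(I) / TWO23
        IFRAC = MOD(N(I), TWO23)
        IF ((IEXP .EQ. 0) .AND. (IFRAC .EQ. 0)) THEN
          WRITE(*,*) N(I), ' is zero'
        ELSE IF ((IEXP .EQ. 255) .AND. (IFRAC .EQ. 0)) THEN
          WRITE(*,*) N(I), ' is infinity'
        ELSE
C         Normalized number: restore the hidden bit, remove the bias
          V = (1.0 + REAL(IFRAC) / REAL(TWO23)) * 2.0 ** (IEXP - 127)
          WRITE(*,*) N(I), ' has the value ', V
        ENDIF
   10 CONTINUE
C     ------------------------------------------------------------------
      END
```

 The second sample has exponent field 127 and a zero fraction, so the
 formula gives 1.0 X 2**0. The third sample is the REAL closest to pi.
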
 An even stranger phenomenon is the class of bit patterns called NaNs.
 A NaN has exponent = 255 (128 after debiasing) and fraction bits that
 are not all 0. NaN is short for 'Not A Number'.

 The special numbers (except zero) were invented in order to implement
 NON-STOP ARITHMETIC: instead of aborting the program when an
 intermediate calculation gives a bad result, the result is replaced
 by the appropriate special number and the computation continues.

 IEEE arithmetic implements an extension of the real number system.
 The quantities +INFINITY, -INFINITY and the NaNs are added to the
 real numbers, and arithmetic operations involving them are defined
 in a plausible way.

 Many users find this extension confusing and not very useful.


 The 'representation density' of IEEE/REAL*4
 -------------------------------------------
 What is the spacing between two consecutive floating-point numbers
 (FPNs)?

 Positive FPNs are the product of a 'normalized' binary fraction with
 23 binary digits, and (2 ** e), where e is in [-126,127]. Remember
 that the exponents -127 and +128 are reserved to represent zero and
 infinity respectively.

 The 'normal' FPNs can be partitioned into 254 disjoint sets, one for
 each possible exponent. Each set contains (2 ** 23) numbers, one for
 each possible binary fraction of length 23.

 The spacing between consecutive numbers belonging to the same set
 is constant, and equals:

      (2 ** (-23)) * (2 ** e)  =  2 ** (e - 23)

 It is clear that the spacing increases when e (and the magnitude of
 the number) increases.

 The minimal positive FPN is (+1.0) * (2 ** (-126)) = 2 ** (-126),
 and the spacing in that region is (2 ** (-126 - 23)) = 2 ** (-149).

 We see that the minimal positive FPN is MUCH LARGER than the local
 spacing: it is (2 ** 23) times larger.

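 The growth of the spacing with the exponent can be checked with a
 few lines of code. The following is only a sketch: the program name
 SPACNG and the sample exponents are illustrative choices, and it
 assumes that the default REAL is IEEE single precision, that results
 are rounded to the nearest REAL, and that the compiler does not keep
 intermediate results in wider registers. For each exponent E the
 spacing near 2 ** E is taken as 2 ** (E - 23). Adding a full spacing
 should reach the next number, while adding half a spacing should be
 rounded back to the original number.

```fortran
      PROGRAM SPACNG
C     ------------------------------------------------------------------
C     Sketch only: spacing of REAL numbers near 2**E, ASSUMING the
C     default REAL is IEEE single precision with round-to-nearest
C     arithmetic. Names and sample exponents are illustrative.
C     ------------------------------------------------------------------
      INTEGER
     *              E
      REAL
     *              X, GAP, UP, HALF
C     ------------------------------------------------------------------
      DO 10 E = -100, 100, 50
C       X is the smallest number with exponent E
        X    = 2.0 ** E
C       GAP is the spacing inside the set with exponent E
        GAP  = 2.0 ** (E - 23)
C       A full GAP reaches the next number, half a GAP is lost
        UP   = X + GAP
        HALF = X + GAP / 2.0
        WRITE(*,*) E, X, GAP, UP .GT. X, HALF .EQ. X
   10 CONTINUE
C     ------------------------------------------------------------------
      END
```

 Under the stated assumptions both logical values printed on each
 line should be true, and the printed spacing should grow by a factor
 of (2 ** 50) from one line to the next.
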
 The number space of IEEE/REAL*4
 -------------------------------
 If we translate the binary data from the previous sections to
 decimal, we find that the range of numbers that can be represented
 by IEEE REAL*4 is:

      (-3.4 X 10**+38, +3.4 X 10**+38)

 Because the minimal FPN is so much larger than the nearby spacing,
 it is more instructive to look at that range as the union of three
 separate parts:

      (-3.4 X 10**+38, -1.2 X 10**-38)
      (0.0)
      (+1.2 X 10**-38, +3.4 X 10**+38)

 In this floating-point representation we have a finite number of
 numbers filling the three parts, two of them with variable 'density'.


 +---------------------------------------------------------------------+
 |     SUMMARY                                                         |
 |     =======                                                         |
 |     1) IEEE/REAL*4 = 1 Sign bit, 8 exponent bits, 23 mantissa bits  |
 |     2) There are all kinds of 'strange numbers'                     |
 |     3) The number space is discrete, made of three parts, and has   |
 |        maximal 'density' near zero                                  |
 +---------------------------------------------------------------------+