1-8 FORTRAN COMMON PITFALLS **************************** (Thanks to Sergio Gelato for the important contributions, to Dan Pop for the enlightening comments and suggestions, and to Craig Burley for the very important comments) This chapter explains the less-obvious pitfalls that appear in the 'common pitfalls list' of the chapter on debugging. Contents -------- 1) Passing constants to a subprogram and modifying them 2) Using floating-point comparisons tests 3) Loops with a REAL control variable 4) Using the MOD function with REAL arguments 5) Assigning a real to an integer 6) Assuming intrinsic conversion functions take care of result type 7) Uncareful use of automatic type promotions 8) Letting array indexes go out of bounds 9) Common blocks losing their values while the program runs 10) Aliasing of dummy arguments and common block variables 11) Depending on assumptions about the computing order of subexpressions 12) Assuming Short-circuit evaluation of expressions 13) Using trigonometric functions with large arguments on some machines 14) Incompatible argument lists in CALL and the routine definition Passing constants to a subprogram --------------------------------- In FORTRAN, when a variable is passed as a actual argument to a procedure, and the procedure modifies the corresponding dummy argument, the variable itself gets modified, unlike C or the default argument-passing mechanism of Pascal. In other words, changing the value of a dummy argument inside a procedure changes the actual/formal variable (corresponding entity used in the CALL statement) that invoked it. FORTRAN constants are declared with the PARAMETER statement, Constants are useful when we want to make sure that some quantity wouldn't change its value no matter what we do, it's a safety measure against a certain kind of programming errors. Compilers take a variety of approaches to implementing constants, often largely influenced by the underlying OS and CPU, depending on performance vs. debugging considerations, and so on. Constants can be implemented in various ways: 1) Replacement during compilation of every occurrence of the constant with its value. If the constant is passed to another procedure, it is no longer protected. 2) Storing them in memory like ordinary variables, but making the associated memory area 'read only'. This method insures that the protection against modification is "propagated" to called procedures. Passing a constant to a procedure may be done: 1) Just like ordinary variables with no extra protection against modification, this may be bad practice if a "non-propagating" method is used. 2) Using passing by value, then at least the calling procedure is protected. If you pass a constant, and modify it, either you get a segmentation fault (access violation error), or something gets modified. If your compiler pools and reuses constants, you might find that other instances of the constant got modified as well! If modification of constants is possible, it defeats the original purpose of a programming safety measure, and turns constants into a programming trap. Operating systems like DOS work directly with physical addresses, and can't protect memory, so you will not get a warning from the system if you access entities supposed to be 'read only'. Operating systems with Virtual memory (VMS, UNIX) can protect specific memory areas. Using floating-point comparisons tests -------------------------------------- When comparing two floating-point values, one has to keep in mind that .EQ. and .NE. test for strict equality of the model numbers being compared. Often one needs to allow for small round-off errors in the results of floating-point calculations, so that ABS(A-B) .LE. EPS, for some suitable tolerance EPS, is more appropriate than A .EQ. B . EPS may be a constant, or it may depend on A and B, e.g. It could be a constant times MAX(ABS(A), ABS(B)). The best way to choose EPS is by studying the numerical properties of the algorithms used to compute A and B; there is no single best answer for all applications. Remember: in Fortran, and most other languages, floating-point values are always _approximations_ of mathematical values (See the chapters on floating-point numbers). In simple language, using the relational operators .EQ. or .NE. with floating-point numbers is dangerous, because the value of numbers may be different from the expected mathematical value, due to radix conversion and roundoff errors. If you can't avoid FP tests, use the following statement-functions: FPEQ(X,Y) = ABS(X - Y) .LT. EPS FPEQ(X,Y) = ABS(X - Y) .LT. EPS * MAX(ABS(X), ABS(Y)) with EPS larger than the machine precision. Loops with a REAL or DOUBLE PRECISION control variable ------------------------------------------------------ Another version of the floating-point comparisons problem, with a REAL or DOUBLE PRECISION loop control variable, you may get different number of loop iterations than what you expected. A DO loop: DO I = E1, E2, E3 Will be executed: ( N times if N > 0 = ( ( 0 times if N <= 0 where N = INT((E2 - E1 + E3) / E3) The INT intrinsic function is very sensitive to small errors in the quotient, if the quotient changes from 1.00001 to 0.99999 the result will be quite different (zero iterations instead of one). DO loops with floating-point control variables are still allowed in Fortran 90, but programmers are strongly advised against using them, you can stop using them now. Using the MOD function with REAL arguments ------------------------------------------ This is yet another version of the floating-point comparisons problem. A small example program will make it clear: PROGRAM RNDOFF C ------------------------------------------------------------------ REAL * R1, R2, * RESULT C ------------------------------------------------------------------ R1 = 1.0 R2 = 0.2 RESULT = MOD(R1,R2) WRITE(*,'(1X,F3.1)') RESULT C ------------------------------------------------------------------ END The MOD function really computes: MOD = R1 - (R2 * INT(R1 / R2)) Upon converting the constant 0.2 to the internal binary representation a small roundoff error is introduced, on some machines the internal value will be rounded UP a little. Because of the rounding up, R1/R2 will be slightly smaller than 5.0, and the INT function will give the value 4 instead of the correct value 5. The program then will proceed to write the WRONG result 0.2. On other machines you may get some very small value. Assigning a REAL/DOUBLE PRECISION to an INTEGER ----------------------------------------------- In a similar way roundoff errors are amplified by a conversion to integer. An assignment of a REAL or DOUBLE PRECISION to an INTEGER is really an implicit conversion from a floating-point to integer, and the roundoff errors just wait for such moments. A likely situation to fall into this trap is when you compute array indexes from floating-point expressions. Assuming intrinsic conversion functions take care of result type ---------------------------------------------------------------- Because intrinsic routines are generic, some people assume that calling an intrinsic routine takes care of all data typing problems. A possible trap is: INTEGER I DOUBLE PRECISION X ..................... I = 123456789 X = REAL(I) WRITE(*,*) X The result of REAL() is just a REAL, it is followed by an implicit conversion (by assignment) to DOUBLE PRECISION. Instead of directly converting the integer to DOUBLE we are 'chaining' conversions, this may cause precision loss. It is correct in general that DBLE(I) will never be less accurate than DBLE(REAL(I)), and may be more accurate in many situations. The REAL(I) call may cause precision loss if the number of significant bits of the value of I exceeds the number of bits in the mantissa of a REAL, or radix conversion errors are introduced. Uncareful use of automatic type promotions ------------------------------------------ Automatic type promotions (by assignment) sweeps data-type problems under the rug, all problems seem to take care of themselves with that feature. The danger is that we will 'chain' conversions and lose precision. In the following example we convert a decimal floating-point number to DOUBLE PRECISION, directly and via an intermediary REAL and compare the results: PROGRAM CONV C ------------------------------------------------------------------ REAL y C ------------------------------------------------------------------ DOUBLE PRECISION xx, yy C ------------------------------------------------------------------ CHARACTER*40 str C ------------------------------------------------------------------ 100 CONTINUE WRITE(*,'(1X,A)') 'Enter a FP number: ' READ(*,'(A)') str READ(UNIT=STR, FMT=*) xx READ(UNIT=STR, FMT=*) y yy = y WRITE (*,'(1X,3E20.10)') xx, yy, xx - yy GOTO 100 C ------------------------------------------------------------------ END Explicitly specified numeric constants (by assignment or PARAMETER) may create similar problems (see the chapter on constants). Letting array indexes go out of bounds -------------------------------------- By default, array indexes are not checked before being used. The memory protection mechanism discovers out of bound references only when they get out of allocated memory. The best way is to compile the program (until you are sure it works o.k.) with: $ FORTRAN/CHECK=BOUNDS (VMS) f77 -C (SUN) f77 -C (IRIX) f77 -C (OSF/1) f77 -C (ULTRIX) xlf -C (AIX) In order to reap full benefits from the compiler's bounds checking option, formal arguments that are arrays should not be declared as assumed-size (with their last dimension given as *). They should also not be declared with a size different from that of the actual argument; in particular, declaring them of size (1) is usually not a good idea. Fortran-66 legacy code may contain such formal arguments with size 1. At the very least they should be replaced with assumed-size (*) arrays. Even better, the actual size should be passed as a separate argument and the formal array should be declared with that size. Example: SUBROUTINE S(A,N) INTEGER N REAL A(N) is better than SUBROUTINE S(A) REAL A(*) which in turn is better than SUBROUTINE S(A) REAL A(1) The last choice will cause the -C option to report spurious errors; the assumed-size version will prevent -C from detecting anything at all. Fortran 90 adds an even more convenient option (but only when the subprogram interface is explicit at the point of call): SUBROUTINE S(A) REAL A(:) Be sure to declare S in a module or interface block in this case. Common blocks losing their values while the program runs -------------------------------------------------------- The FORTRAN standard DOESN'T guarantee that named common blocks will keep their data values. When none of the procedures that declared the common block are activated, the common block may lose the data. Remember that an _unnamed_ (blank) common block does _not_ lose its data. See the chapter on common blocks for more information on common blocks. Static code checkers like FTNCHEK can warn you if your program is in such danger. Aliasing of dummy arguments and common block variables ------------------------------------------------------ The FORTRAN standard prohibits using the same variable (or array element) more than once in the same CALL statement, if one of the associated formal arguments (i.e. associated with the aliased actual arguments) is assigned a value during the subprogram call. An example of such aliasing: CALL SUB1(ARRAY, ARRAY(1)) Passing the same variable (or array element) at the same time with a CALL statement and in a common block is also prohibited. What happens if you alias arguments by mistake? Eager to optimize compilers may assume the no-aliasing condition, because it makes possible some more code optimizations, so if you do use the illegal dummy aliases, the results of your computations may be wrong. The important point is that in the presence of dummy aliasing, the results of a calculation may be different whether the compiler assumes dummy aliasing or not. You can check your compiler with the following program: PROGRAM ALIAS C ------------------------------------------------------------------ INTEGER * I, A(5) C ------------------------------------------------------------------ DO I=1,5 A(I) = 2 ENDDO C ------------------------------------------------------------------ WRITE(*,*) 'Before: ', A CALL SUB(A, A(1)) WRITE(*,*) 'After: ', A C ------------------------------------------------------------------ END SUBROUTINE SUB(X, Y) C ------------------------------------------------------------------ INTEGER * I, X(5), Y C ------------------------------------------------------------------ DO I=1,5 X(I) = Y * X(I) ENDDO C ------------------------------------------------------------------ RETURN END Our point can be easily checked on VMS, DEC Fortran provides a compiler option that toggles on and off the aliasing assumption, our example program will give different results when compiled on VMS with: Deafult option: /ASSUME=NODUMMY_ALIASES (wrong in our case) ==================================================================== Before: 2 2 2 2 2 After: 4 4 4 4 4 Option: /ASSUME=DUMMY_ALIASES (slow, assures correct results) ==================================================================== Before: 2 2 2 2 2 After: 4 8 8 8 8 The no-aliasing assumption may be implemented by having a local copy of Y that is not affected by writes to array X, we get then the all 4s result. Don't try to use these effects in a program, in our small example it seems simple enough, but with more complex programs it may give unpredictable results. Moreover, different Fortran implementations may display different behaviour. To make it more clear why the second result is the correct one, let's inline the subroutine and unroll all loops (substituting Y = X(1)). A basic rule is that subroutines should give the same results as the inlined code. loop in main program: X(1) = 2 X = (2, *, *, *, *) X(2) = 2 X = (2, 2, *, *, *) X(3) = 2 X = (2, 2, 2, *, *) X(4) = 2 X = (2, 2, 2, 2, *) X(5) = 2 X = (2, 2, 2, 2, 2) loop in subroutine: X(1) = X(1) * X(1) X = (4, 2, 2, 2, 2) X(2) = X(1) * X(2) X = (4, 8, 2, 2, 2) X(3) = X(1) * X(3) X = (4, 8, 8, 2, 2) X(4) = X(1) * X(4) X = (4, 8, 8, 8, 2) X(5) = X(1) * X(5) X = (4, 8, 8, 8, 8) Our rule implies that (4,8,8,8,8) is the right answer. +---------------------------+ | AVOID VARIABLE ALIASING | +---------------------------+ Depending on assumptions about the computing order of subexpressions -------------------------------------------------------------------- The FORTRAN standard says nothing about the order in which the compiler will compute subexpressions when computing an arithmetical expression, except that the compiler will obey nested parentheses. Moreover, the standard explicitly requires that your program shouldn't depend on the computing order. A small program can check the computing order on your machine: PROGRAM ARGORD INTEGER I, F1, F2, F3, F4 EXTERNAL F1, F2, F3, F4 WRITE (*,*) 'Evaluation order: ' I = F1(I) + F2(I) + F3(I) + F4(I) END INTEGER FUNCTION F1() WRITE (*,*) ' ARG #1 ' F1 = 1 RETURN END INTEGER FUNCTION F2() WRITE (*,*) ' ARG #2 ' F2 = 1 RETURN END INTEGER FUNCTION F3() WRITE (*,*) ' ARG #3 ' F3 = 1 RETURN END INTEGER FUNCTION F4() WRITE (*,*) ' ARG #4 ' F4 = 1 RETURN END All evaluation sequences are possible outputs of this program, a common behaviour is (#1,#2,#3,#4), but exceptions like (#2,#3,#4,#1) are known. Assuming Short-circuit evaluation of expressions ------------------------------------------------ Programmers accustomed to other languages (e.g. C) may try coding tricks based on the assumption that logical expressions are computed only up to the point the value is known, for example: INTEGER I REAL ARRAY(100) ........................... IF ((I .GE. 1) .AND. (I .LE. 100) .AND. ARRAY(I) .GT. 0) THEN WRITE (*,*) 'Array element is positive ' ENDIF REAL X .............................................. IF ((X .GT. 0.0) .AND. (LOG(X) .LE. 1.23) THEN The idea behind these IF conditions is to avoid the testing of the last part of the condition, if the range-checking conditions are false. In these examples testing the last condition with incorrect value of I or X is an error and may abort the program. Fortran doesn't guarantee short-circuiting evaluation of the .AND. operator, so the compiler is free to evaluate the last condition even if the range-checking conditions were evaluated first and the result was .FALSE. Furthermore, if the last condition contains a function-call, the compiler is forced to evaluate the expression, even if it already "knows" that the result of the whole expression will be .FALSE. because the function might have side effects. Programs using such a 'trick' may work when using one compiler and abort with an error message on another. The FORTRAN standard allows short-circuiting evaluation, and many compilers indeed provide it, you can check that by looking at the assembly listings of several test programs. A VAX/VMS example: SUBROUTINE SHRTCT(I, J, K) C ------------------------------------------------------------------ INTEGER * I, J, K C ------------------------------------------------------------------ IF ((I .GT. 0) .AND. (J .GT. 0)) THEN K = 1 ELSE K = 2 ENDIF C ------------------------------------------------------------------ RETURN END .TITLE SHRTCT .IDENT 01 0000 .PSECT $CODE SUBROUTINE SHRTCT(I, J, K) 0000 SHRTCT:: 0000 .WORD ^MReturn to contents page0002 MOVAL @K(AP), R0 IF ((I .GT. 0) .AND. (J .GT. 0)) THEN 0006 TSTL @I(AP) 0009 BLEQ L$1 000B TSTL @J(AP) 000E BLEQ L$1 K = 1 0010 MOVL #1, (R0) 0013 BRB L$2 0015 NOP 0016 NOP 0017 NOP 0018 L$1: K = 2 0018 MOVL #2, (R0) 001B L$2: RETURN 001B RET .END Using trigonometric functions with large arguments on some machines ------------------------------------------------------------------- Computing trigonometric functions with large arguments is difficult because a very exact reduction of the argument to the primary range (say between -Pi and Pi) must be performed. In other words, the reminder upon dividing the argument by the transcendental number Pi must be found very accurately. Good algorithms were found around 1982, but some compilers didn't implement them, so they return garbage results, or 0.0 with an error message. Beware especially of PC and old UNIX compilers. Some standards (AT&T system V) says that the result of computing the trigonometric functions in such cases should be 0.0 with an error message. Incompatible argument lists in CALL and the routine definition -------------------------------------------------------------- At run-time all variables reside in memory without any data type information, and no type checking is done, variables are just accessed by pre-computed addresses (starting address of variable). When you pass variables in an argument list (or common block), and they are typed differently in the called procedure you will get wrong results. If you are familiar with the internal representations, you can perform special (and usually non-portable) processing tricks this way, it's just like using EQUIVALENCE (in a wild way). In short, if you don't know enough about the internal representation of data types don't use incompatible declarations.