Using Arrays Efficiently

Many of the array access efficiency techniques described in this section are applied automatically by the Intel Fortran loop transformations optimizations. Several aspects of array use can improve run-time performance:

• The fastest array access occurs when contiguous access to the whole array or most of an array occurs. Perform one or a few array operations that access all of the array or major parts of an array instead of numerous operations on scattered array elements. Rather than use explicit loops for array access, use elemental array operations, such as the following line that increments all elements of array variable a:

a = a + 1

When reading or writing an array, use the array name and not a DO loop or an implied DO-loop that specifies each element number. Fortran 95/90 array syntax allows you to reference a whole array by using its name in an expression. For example:

real :: a(100,100)
a = 0.0
a = a + 1       ! Increment all elements
!  of a by 1
.
.
.
write (8) a     ! Fast whole array use

Similarly, you can use derived-type array structure components, such as:

type x
integer a(5)
end type x
.
.
.
type (x) z
write (8)z%a    ! Fast array structure
!
component use

• Make sure multidimensional arrays are referenced using proper array syntax and are traversed in the natural ascending storage order, which is column-major order for Fortran. With column-major order, the leftmost subscript varies most rapidly with a stride of one. Whole array access uses column-major order.
Avoid row-major order, as is done by C, where the rightmost subscript varies most rapidly.
For example, consider the nested do loops that access a two-dimension array with the j loop as the innermost loop:

integer x(3,5), y(3,5), i, j
y = 0
do i=1,3               ! I outer loop varies slowest
do j=1,5               ! J inner loop varies fastest
x (i,j) = y(i,j) + 1   ! Inefficient row-major storage order
end do                 !  (rightmost subscript varies fastest)
end do
.
.
.
end program

Since j varies the fastest and is the second array subscript in the expression x (i,j), the array is accessed in row-major order.
To make the array accessed in natural column-major order, examine the array algorithm and data being modified. Using arrays x and y, the array can be accessed in natural column-major order by changing the nesting order of the do loops so the innermost loop variable corresponds to the leftmost array dimension:

integer x(3,5), y(3,5), i, j
y = 0
do j=1,5               ! J outer loop varies slowest
do i=1,3               ! I inner loop varies fastest
x (i,j) = y(i,j) + 1   ! Efficient column-major storage order
end do                 !  (leftmost subscript varies fastest)
end do
.
.
.
end program

The Intel Fortran whole array access (x = y + 1) uses efficient column major order. However, if the application requires that J vary the fastest or if you cannot modify the loop order without changing the results, consider modifying the application program to use a rearranged order of array dimensions. Program modifications include rearranging the order of:

• Dimensions in the declaration of the arrays x(5,3) and y(5,3)

• The assignment of x(j,i) and y(j,i) within the do loops

• All other references to arrays x and y

In this case, the original DO loop nesting is used where J is the innermost loop:

integer x(3,5), y(3,5), i, j
y = 0
do i=1,3              ! I outer loop varies slowest
do j=1,5              ! J inner loop varies fastest
x (j,i) = y(j,i) + 1  ! Efficient column-major storage order
end do                !  (leftmost subscript varies fastest)
end do
.
.
.
end program

Code written to access multidimensional arrays in row-major order (like C) or random order can often make use of the CPU memory cache less efficient. For more information on using natural storage order during record, see Improving I/O Performance.

• Use the available Fortran 95/90 array intrinsic procedures rather than create your own.

Whenever possible, use Fortran 95/90 array intrinsic procedures instead of creating your own routines to accomplish the same task. Fortran 95/90 array intrinsic procedures are designed for efficient use with the various Intel Fortran run-time components.

Using the standard-conforming array intrinsics can also make your program more portable.

• With multidimensional arrays where access to array elements will be noncontiguous, avoid leftmost array dimensions that are a power of two (such as 256, 512).

Since the cache sizes are a power of 2, array dimensions that are also a power of 2 may make less efficient use of cache when array access is noncontiguous. If the cache size is an exact multiple of the leftmost dimension, your program will probably make inefficient use of the cache. This does not apply to contiguous sequential access or whole array access.

One work-around is to increase the dimension to allow some unused elements, making the leftmost dimension larger than actually needed. For example, increasing the leftmost dimension of A from 512 to 520 would make better use of cache:

real a(512, 100)
do i= 2,511
do j = 2,99
a(i,j)=(a(i+1,j-1) + a(i-1, j+1)) * 0.5
end do
end do

In this code, array a has a leftmost dimension of 512, a power of two. The innermost loop accesses the rightmost dimension (row major), causing inefficient access. Increasing the leftmost dimension of a to 520 (real a (520,100)) allows the loop to provide better performance, but at the expense of some unused elements.

Because loop index variables I and J are used in the calculation, changing the nesting order of the do loops changes the results.

For more information on arrays and their data declaration statements, see the Intel® Fortran Language Reference.

Passing Array Arguments Efficiently

In Fortran, there are two general types of array arguments:

• Explicit-shape arrays used with Fortran 77.

These arrays have a fixed rank and extent that is known at compile time. Other dummy argument (receiving) arrays that are not deferred-shape (such as assumed-size arrays) can be grouped with explicit-shape array arguments.

• Deferred-shape arrays introduced with Fortran 95/90.

Types of deferred-shape arrays include array pointers and allocatable arrays. Assumed-shape array arguments generally follow the rules about passing deferred-shape array arguments.

When passing arrays as arguments, either the starting (base) address of the array or the address of an array descriptor is passed:

• When using explicit-shape (or assumed-size) arrays to receive an array, the starting address of the array is passed.

• When using deferred-shape or assumed-shape arrays to receive an array, the address of the array descriptor is passed (the compiler creates the array descriptor).

Passing an assumed-shape array or array pointer to an explicit-shape array can slow run-time performance. This is because the compiler needs to create an array temporary for the entire array. The array temporary is created because the passed array may not be contiguous and the receiving (explicit-shape) array requires a contiguous array. When an array temporary is created, the size of the passed array determines whether the impact on slowing run-time performance is slight or severe.

The following table summarizes what happens with the various combinations of array types. The amount of run-time performance inefficiency depends on the size of the array.

 Actual Argument Array Types (choose one:) Dummy Argument Array Types (choose one:) Explicit-Shape Arrays Deferred-Shape and Assumed-Shape Arrays Explicit-shape arrays Result when using this combination: Very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional. Result when using this combination: Efficient. Only allowed for assumed-shape arrays (not deferred-shape arrays). Does not use an array temporary. Passes an array descriptor. Requires an interface block. Deferred-shape and assumed-shape arrays Result when using this combination: When passing an allocatable array, very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional. When not passing an allocatable array, not efficient. Instead use allocatable arrays whenever possible. Uses an array temporary. Does not pass an array descriptor. Interface block optional. Result when using this combination: Efficient. Requires an assumed-shape or array pointer as dummy argument. Does not use an array temporary. Passes an array descriptor. Requires an interface block.