Vectorization Support

The directives discussed in this topic support vectorization.

IVDEP Directive

The IVDEP directive instructs the compiler to ignore assumed vector dependences. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This directive overrides that decision. Use IVDEP only when you know that the assumed loop dependences are safe to ignore.

For example, if the expression j >= 0 is always true in the code fragment below, the IVDEP directive can communicate this information to the compiler. The directive tells the compiler that the conservatively assumed loop-carried flow dependences for values j < 0 can be safely ignored:

!DEC$ IVDEP
do i = 1, 100
   a(i) = a(i+j)
enddo
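The fragment above can be turned into a compilable unit by wrapping it in a routine that enforces the j >= 0 condition before the loop runs. This is a minimal free-form sketch. The module and routine names are hypothetical, and compilers other than the Intel® compiler treat the !DEC$ line as an ordinary comment.

```fortran
module ivdep_demo
  implicit none
contains
  ! Hypothetical helper: a(i) = a(i+j) for i = 1..n.
  ! Assumption: j >= 0, so every a(i+j) is read before any iteration
  ! overwrites it, and the assumed flow dependence never occurs.
  subroutine shift_left(a, n, j)
    integer, intent(in) :: n, j
    real, intent(inout) :: a(n + j)
    integer :: i

    ! Stop instead of running a loop whose IVDEP assertion is false.
    if (j < 0) error stop 'shift_left: IVDEP assumption j >= 0 violated'

!DEC$ IVDEP
    do i = 1, n
      a(i) = a(i + j)
    end do
  end subroutine shift_left
end module ivdep_demo
```

With a = [1., 2., ..., 8.], calling shift_left(a, 5, 3) leaves a(1:5) equal to [4., 5., 6., 7., 8.] and a(6:8) unchanged.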

Note
Proven dependences that prevent vectorization are not ignored; only assumed dependences are ignored.

How the directive is interpreted depends on the form of the loop.

Loop 1

do i
   ... = a(*) + 1     ! USE of a
   a(*) = ...         ! DEF of a
enddo

Loop 2

do i
   a(*) = ...         ! DEF of a
   ... = a(*) + 1     ! USE of a
enddo

For loops of form 1, IVDEP directs the compiler to use the old values of a and to assume that there are no loop-carried flow dependences from DEF to USE.

For loops of form 2, IVDEP directs the compiler to use the new values of a and to assume that there are no loop-carried anti-dependences from USE to DEF.

In both cases, it is valid to distribute the loop, and there is no loop-carried output dependence.
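A concrete instance of each form shows why distribution is valid once the IVDEP assumption holds. In the free-form sketch below, form1 and form2 follow the two schematic loops under the assumption k >= 0, and the hypothetical _split variants are their distributed versions. For any k >= 0, each pair produces identical results.

```fortran
module loop_forms
  implicit none
contains
  ! Form 1 (USE before DEF). Assumption: k >= 0, so a(i+k) is never written
  ! by an earlier iteration and every USE sees the old value of a.
  subroutine form1(a, b, c, n, k)
    integer, intent(in) :: n, k
    real, intent(inout) :: a(n + k)
    real, intent(out) :: b(n)
    real, intent(in) :: c(n)
    integer :: i
!DEC$ IVDEP
    do i = 1, n
      b(i) = a(i + k) + 1.0   ! USE of a
      a(i) = c(i)             ! DEF of a
    end do
  end subroutine form1

  ! Form 1 distributed: every USE runs before any DEF.
  subroutine form1_split(a, b, c, n, k)
    integer, intent(in) :: n, k
    real, intent(inout) :: a(n + k)
    real, intent(out) :: b(n)
    real, intent(in) :: c(n)
    integer :: i
    do i = 1, n
      b(i) = a(i + k) + 1.0
    end do
    do i = 1, n
      a(i) = c(i)
    end do
  end subroutine form1_split

  ! Form 2 (DEF before USE). Assumption: k >= 0, so a(i-k) was defined by
  ! this or an earlier iteration (or lies below index 1 and is never
  ! written), and every USE sees the new value of a.
  subroutine form2(a, b, c, n, k)
    integer, intent(in) :: n, k
    real, intent(inout) :: a(1 - k:n)
    real, intent(out) :: b(n)
    real, intent(in) :: c(n)
    integer :: i
!DEC$ IVDEP
    do i = 1, n
      a(i) = c(i)             ! DEF of a
      b(i) = a(i - k) + 1.0   ! USE of a
    end do
  end subroutine form2

  ! Form 2 distributed: every DEF runs before any USE.
  subroutine form2_split(a, b, c, n, k)
    integer, intent(in) :: n, k
    real, intent(inout) :: a(1 - k:n)
    real, intent(out) :: b(n)
    real, intent(in) :: c(n)
    integer :: i
    do i = 1, n
      a(i) = c(i)
    end do
    do i = 1, n
      b(i) = a(i - k) + 1.0
    end do
  end subroutine form2_split
end module loop_forms
```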

Example 1

CDEC$ IVDEP
      do j = 1, n
        a(j) = a(j+m) + 1
      enddo

Example 2

CDEC$ IVDEP
      do j = 1, n
        a(j) = b(j) + 1
        b(j) = a(j+m) + 1
      enddo

In Example 1, IVDEP lets the compiler ignore the possible backward dependences, which enables the loop to be software pipelined.

Example 2 has possible forward and backward dependences involving array a, which together create a dependence cycle. With IVDEP, the backward dependences are ignored.
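One way to see what ignoring the backward dependences of Example 1 means is to compare the loop's sequential result with the same computation written as a Fortran array assignment, which evaluates the whole right-hand side from old values before storing anything. This is a simplified model of the vectorized loop that ignores the actual vector length. In the sketch below (hypothetical names; explicit bounds lo:hi let both positive and negative m fit), the two agree when m >= 0 but diverge when m < 0, which is the case in which IVDEP would be unsafe.

```fortran
module ivdep_semantics
  implicit none
contains
  ! Example 1 executed one iteration at a time: the semantics the
  ! compiler must preserve when IVDEP is absent.
  subroutine example1_serial(a, lo, hi, n, m)
    integer, intent(in) :: lo, hi, n, m
    real, intent(inout) :: a(lo:hi)
    integer :: j
    do j = 1, n
      a(j) = a(j + m) + 1.0
    end do
  end subroutine example1_serial

  ! Simplified model of the loop with its backward dependences ignored:
  ! the array assignment reads every a(j+m) before writing any a(j).
  subroutine example1_ignored(a, lo, hi, n, m)
    integer, intent(in) :: lo, hi, n, m
    real, intent(inout) :: a(lo:hi)
    a(1:n) = a(1 + m:n + m) + 1.0
  end subroutine example1_ignored
end module ivdep_semantics
```

With a(k) = k*k on bounds 0:10 and n = 5, m = -1 gives a(1:5) = [1., 2., 3., 4., 5.] sequentially but [1., 2., 5., 10., 17.] in the array-assignment model.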

IVDEP has two options, IVDEP:LOOP and IVDEP:BACK. The IVDEP:LOOP option implies that there are no loop-carried dependences; the IVDEP:BACK option implies that there are no backward dependences.
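As an illustration of IVDEP:LOOP, the hypothetical routine below scatters values through an index array. If the index array is a permutation, no two iterations touch the same element of a, so the loop has no loop-carried dependences at all. The sketch checks that assumption at run time before entering the loop; the check is an illustrative safeguard, not something the directive itself performs.

```fortran
module ivdep_loop_demo
  implicit none
contains
  ! Hypothetical check: true if idx(1:n) contains each of 1..n exactly once.
  logical function is_permutation(idx, n)
    integer, intent(in) :: n
    integer, intent(in) :: idx(n)
    logical :: seen(n)
    integer :: i
    seen = .false.
    is_permutation = .false.
    do i = 1, n
      if (idx(i) < 1 .or. idx(i) > n) return
      if (seen(idx(i))) return
      seen(idx(i)) = .true.
    end do
    is_permutation = .true.
  end function is_permutation

  ! Scatter-add through idx. When idx is a permutation, each iteration
  ! updates a distinct element, which is what IVDEP:LOOP asserts.
  subroutine scatter_add(a, b, idx, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n)
    integer, intent(in) :: idx(n)
    integer :: i
    if (.not. is_permutation(idx, n)) error stop 'scatter_add: idx must be a permutation'
!DEC$ IVDEP:LOOP
    do i = 1, n
      a(idx(i)) = a(idx(i)) + b(i)
    end do
  end subroutine scatter_add
end module ivdep_loop_demo
```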

The IVDEP directive is also used with the -ivdep_parallel option for Itanium®-based applications.

For more details on these directives, see "Directive Enhanced Compilation", section "General Directives", in the Intel® Fortran Language Reference.

Overriding Vectorizer's Efficiency Heuristics

In addition to the IVDEP directive, the following directives can be used to override the efficiency heuristics of the vectorizer:

VECTOR ALWAYS
NOVECTOR
VECTOR ALIGNED
VECTOR UNALIGNED
VECTOR NONTEMPORAL

The VECTOR directives control the vectorization of the loop that immediately follows them; the compiler does not apply them to nested loops. Each nested loop needs its own directive preceding it. You must place a VECTOR directive before the loop control statement (the DO statement) of the loop it applies to.
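The placement rule can be seen in a two-level loop nest. In the hypothetical free-form routine below, only the inner loop has stride-2 references, so the VECTOR ALWAYS directive is placed immediately before the inner DO statement. Placed before the outer DO statement instead, it would apply only to the outer loop.

```fortran
module nested_vector
  implicit none
contains
  ! Hypothetical example: copy the odd-numbered rows of a into b.
  subroutine copy_odd_rows(a, b, n, m)
    integer, intent(in) :: n, m
    real, intent(in) :: a(n, m)
    real, intent(inout) :: b(n, m)
    integer :: i, j
    ! A VECTOR directive placed here would affect only the j loop.
    do j = 1, m
      ! The stride-2 inner loop therefore carries its own directive,
      ! immediately before its DO statement.
!DEC$ VECTOR ALWAYS
      do i = 1, n, 2
        b(i, j) = a(i, j)
      end do
    end do
  end subroutine copy_odd_rows
end module nested_vector
```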

For more details on these directives, see "Directive Enhanced Compilation", section "General Directives", in the Intel® Fortran Language Reference.

The VECTOR ALWAYS and NOVECTOR Directives

The VECTOR ALWAYS directive overrides the efficiency heuristics of the vectorizer, but it takes effect only if the loop can actually be vectorized. If assumed dependences prevent vectorization, use IVDEP to ignore them.

The VECTOR ALWAYS directive can be used to override the default behavior of the compiler in the following situation. Vectorization of non-unit stride references usually does not exhibit any speedup, so the compiler defaults to not vectorizing loops that have a large number of non-unit stride references (compared to the number of unit stride references). The following loop has two references with stride 2. Vectorization would be disabled by default, but the directive overrides this behavior.

!DEC$ VECTOR ALWAYS
do i = 1, 100, 2
   a(i) = b(i)
enddo
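When a strided loop also has an assumed dependence, VECTOR ALWAYS alone is not enough, because the loop must first be vectorizable. The following is a minimal free-form sketch, assuming the offset k is never negative: IVDEP removes the assumed dependence on the unknown k, and VECTOR ALWAYS then overrides the stride-2 heuristic. The module and routine names are hypothetical.

```fortran
module strided_ivdep
  implicit none
contains
  ! Hypothetical example: a(i) = a(i+k) for odd i in 1..n.
  ! Assumption: k >= 0, so each a(i+k) is read before it could be overwritten.
  subroutine shift_odd(a, n, k)
    integer, intent(in) :: n, k
    real, intent(inout) :: a(n + k)
    integer :: i
    if (k < 0) error stop 'shift_odd: requires k >= 0'
    ! IVDEP makes the loop vectorizable; VECTOR ALWAYS overrides the
    ! heuristic that would otherwise reject the stride-2 references.
!DEC$ IVDEP
!DEC$ VECTOR ALWAYS
    do i = 1, n, 2
      a(i) = a(i + k)
    end do
  end subroutine shift_odd
end module strided_ivdep
```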

Conversely, if vectorizing a loop is undesirable (for example, because vectorization causes a performance regression rather than an improvement), use the NOVECTOR directive in the source text to disable vectorization of that loop. For instance, the Intel® Compiler vectorizes the following loop by default; if this behavior is not appropriate, use NOVECTOR as shown below.

!DEC$ NOVECTOR
do i = 1, 100
   a(i) = b(i) + c(i)
enddo
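Because the case for NOVECTOR rests on a measured regression, one way to decide is to keep both versions of a loop and time them on the target system. The sketch below (hypothetical names, free-form source) uses the standard SYSTEM_CLOCK intrinsic. Timings vary with the compiler, options, array size, and processor, so no particular outcome is implied.

```fortran
module novector_timing
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Loop left to the compiler's default vectorization heuristics.
  subroutine add_default(a, b, c, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: b(n), c(n)
    integer :: i
    do i = 1, n
      a(i) = b(i) + c(i)
    end do
  end subroutine add_default

  ! The same loop with vectorization disabled.
  subroutine add_novector(a, b, c, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: b(n), c(n)
    integer :: i
!DEC$ NOVECTOR
    do i = 1, n
      a(i) = b(i) + c(i)
    end do
  end subroutine add_novector

  ! Wall-clock seconds spent in reps calls of one variant.
  subroutine time_variant(novector, a, b, c, n, reps, seconds)
    logical, intent(in) :: novector
    integer, intent(in) :: n, reps
    real, intent(out) :: a(n)
    real, intent(in) :: b(n), c(n)
    real, intent(out) :: seconds
    integer(int64) :: t0, t1, rate
    integer :: r
    call system_clock(t0, rate)
    do r = 1, reps
      if (novector) then
        call add_novector(a, b, c, n)
      else
        call add_default(a, b, c, n)
      end if
    end do
    call system_clock(t1)
    ! A zero rate means no clock is available.
    if (rate == 0) then
      seconds = 0.0
    else
      seconds = real(t1 - t0) / real(rate)
    end if
  end subroutine time_variant
end module novector_timing
```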

The VECTOR ALIGNED and UNALIGNED Directives

Like VECTOR ALWAYS, these directives also override the efficiency heuristics. The difference is that the qualifiers UNALIGNED and ALIGNED instruct the compiler to use, respectively, unaligned and aligned data movement instructions for all array references. This disables all the advanced alignment optimizations of the compiler, such as determining alignment properties from the program context or using dynamic loop peeling to make references aligned.

Note
The VECTOR ALWAYS, VECTOR UNALIGNED, and VECTOR ALIGNED directives should be used with care. Override the compiler's efficiency heuristics only if you are certain that vectorization will improve performance. Furthermore, instructing the compiler to implement all array references with aligned data movement instructions causes a run-time exception if any of the access patterns are actually unaligned.
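Since VECTOR ALIGNED fails at run time on unaligned data, one defensive pattern is to test the alignment of the first element and dispatch to an ALIGNED or an UNALIGNED version of the loop. The sketch below assumes 16-byte alignment, which is what SSE aligned moves require; wider vector units may need a larger value. It obtains the address through the standard ISO_C_BINDING module. All routine names are hypothetical.

```fortran
module align_dispatch
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
contains
  ! True if x starts at an address that is a multiple of bytes.
  logical function is_aligned(x, bytes)
    real, intent(in), target :: x
    integer, intent(in) :: bytes
    integer(c_intptr_t) :: addr
    addr = transfer(c_loc(x), addr)
    is_aligned = mod(addr, int(bytes, c_intptr_t)) == 0
  end function is_aligned

  subroutine fill_aligned(a, n, v)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: v
    integer :: i
!DEC$ VECTOR ALIGNED
    do i = 1, n
      a(i) = v
    end do
  end subroutine fill_aligned

  subroutine fill_unaligned(a, n, v)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: v
    integer :: i
!DEC$ VECTOR UNALIGNED
    do i = 1, n
      a(i) = v
    end do
  end subroutine fill_unaligned

  ! Uses the ALIGNED version only after checking that a(1) really is
  ! 16-byte aligned (assumed requirement; adjust for the target ISA).
  subroutine fill(a, n, v)
    integer, intent(in) :: n
    real, intent(inout), target :: a(n)
    real, intent(in) :: v
    if (n < 1) return
    if (is_aligned(a(1), 16)) then
      call fill_aligned(a, n, v)
    else
      call fill_unaligned(a, n, v)
    end if
  end subroutine fill
end module align_dispatch
```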

The VECTOR NONTEMPORAL Directive

The VECTOR NONTEMPORAL directive results in streaming stores on Pentium® 4 based systems. For the floating-point loop in the example below, large values of n yield significant performance improvements on Pentium 4 systems compared with a non-streaming implementation.

The following example illustrates the use of the VECTOR NONTEMPORAL directive:

      subroutine set(a,n)
      integer i,n
      real a(n)
!DEC$ VECTOR NONTEMPORAL
!DEC$ VECTOR ALIGNED
      do i = 1, n
        a(i) = 1
      enddo
      end

      program setit
      parameter(n=1024*1204)
      real a(n)
      integer i
      do i = 1, n
        a(i) = 0
      enddo
      call set(a,n)
      do i = 1, n
        if (a(i).ne.1) then
          print *, 'failed nontemp.f', a(i), i
          stop
        endif
      enddo
      print *, 'passed nontemp.f'
      end
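Streaming stores help mainly when the array is too large to stay in cache and is not reread soon, which is why the example above uses a large n. A hypothetical variant might choose between a streaming and an ordinary loop by size. The cutoff nt_threshold below is an assumed tuning value, not a documented one, and the streaming loop omits VECTOR ALIGNED so that it makes no alignment assumption.

```fortran
module streaming_fill
  implicit none
  ! Hypothetical cutoff, in elements, above which the array is assumed
  ! not to fit in cache; tune it for the target machine.
  integer, parameter :: nt_threshold = 262144
contains
  subroutine set_cached(a, n, v)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: v
    integer :: i
    do i = 1, n
      a(i) = v
    end do
  end subroutine set_cached

  subroutine set_streaming(a, n, v)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: v
    integer :: i
!DEC$ VECTOR NONTEMPORAL
    do i = 1, n
      a(i) = v
    end do
  end subroutine set_streaming

  ! Picks streaming stores only for large arrays, where keeping the
  ! written data out of cache is most likely to pay off.
  subroutine set_value(a, n, v)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in) :: v
    if (n >= nt_threshold) then
      call set_streaming(a, n, v)
    else
      call set_cached(a, n, v)
    end if
  end subroutine set_value
end module streaming_fill
```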
