# Vectorization Examples

This section contains simple examples of some common issues in vector programming.

## Argument Aliasing: A Vector Copy

The loop in the example of a vector copy operation does not vectorize because the compiler cannot prove that DEST(A(I)) and DEST(B(I)) are distinct.

Example of Unvectorizable Copy Due to Unproven Distinction:

SUBROUTINE VEC_COPY(DEST,A,B,LEN)
DIMENSION DEST(*)
INTEGER A(*), B(*)
INTEGER LEN, I
DO I=1,LEN
DEST(A(I)) = DEST(B(I))
END DO
RETURN
END

## Data Alignment

A 16-byte or greater data structure or array should be aligned so that the beginning of each structure or array element is aligned in a way that its base address is a multiple of 16.

The Misaligned Data Crossing 16-Byte Boundary figure shows the effect of a data cache unit (DCU) split due to misaligned data. The code loads the misaligned data across a 16-byte boundary, which results in an additional memory access causing a six- to twelve-cycle stall. You can avoid the stalls if you know that the data is aligned and you specify to assume alignment

 Misaligned Data Crossing 16-Byte Boundary

After vectorization, the loop is executed as shown in figure below.

 Vector and Scalar Clean-up Iterations

Both the vector iterations A(1:4) = B(1:4); and A(5:8) = B(5:8); can be implemented with aligned moves if both the elements A(1) and B(1) are 16-byte aligned.

Caution
If you specify the vectorizer with incorrect alignment options, the compiler will generate code with unexpected behavior. Specifically, using aligned moves on unaligned data, will result in an illegal instruction exception!

## Alignment Strategy

The compiler has at its disposal several alignment strategies in case the alignment of data structures is not known at compile-time. A simple example is shown below (several other strategies are supported as well). If in the loop shown below the alignment of A is unknown, the compiler will generate a prelude loop that iterates until the array reference, that occurs the most, hits an aligned address. This makes the alignment properties of A known, and the vector loop is optimized accordingly. In this case, the vectorizer applies dynamic loop peeling, a specific Intel® Fortran feature.

Example of Data Alignment:

Original loop:

SUBROUTINE DOIT(A)
REAL A(100)       ! alignment of argument A is unknown
DO I = 1, 100
A(I) = A(I) + 1.0
ENDDO
END SUBROUTINE

Aligning Data:

! The vectorizer will apply dynamic loop peeling as follows:
SUBROUTINE DOIT(A)
REAL A(100)
! let P be (A%16)where A is address of A(1)
IF (P .NE. 0) THEN
P = (16 - P) /4    ! determine run-time peeling
! factor
DO I = 1, P
A(I) = A(I) + 1.0
ENDDO
ENDIF
! Now this loop starts at a 16-byte boundary,
! and will be vectorized accordingly
DO I = P + 1, 100
A(I) = A(I) + 1.0
ENDDO
END SUBROUTINE