Language Support and Directives

This topic describes language features that help the compiler vectorize code. The __declspec(align(n)) declaration lets you overcome hardware alignment constraints. The restrict qualifier and the pragmas address stylistic issues that arise from lexical scope, data dependence, and ambiguity resolution.

Language Support

Feature: Description
__declspec(align(n)): Directs the compiler to align the variable to an n-byte boundary. The address of the variable satisfies address mod n = 0.
__declspec(align(n,off)): Directs the compiler to align the variable to an n-byte boundary with offset off within each n-byte boundary. The address of the variable satisfies address mod n = off.
restrict: Permits the disambiguator flexibility in alias assumptions, which enables more vectorization.
__assume_aligned(a,n): Instructs the compiler to assume that array a is aligned on an n-byte boundary; used in cases where the compiler has failed to obtain alignment information.
#pragma ivdep: Instructs the compiler to ignore assumed vector dependences.
#pragma vector {aligned | unaligned | always}: Specifies how to vectorize the loop and indicates that efficiency heuristics should be ignored.
#pragma novector: Specifies that the loop should never be vectorized.
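
As a rough illustration of the alignment and restrict entries, the sketch below uses standard C rather than the compiler-specific spellings: C11 _Alignas stands in for __declspec(align(16)), and the C99 restrict qualifier is used directly. The names table, address_mod, and saxpy are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: C11 _Alignas is used as a portable stand-in for
   __declspec(align(16)); both request a 16-byte boundary. */
_Alignas(16) float table[8];

/* Returns address mod n, the quantity the table above refers to. */
unsigned address_mod(const void *p, unsigned n)
{
    assert(n != 0);
    return (unsigned)((uintptr_t)p % n);
}

/* restrict promises that dst and src never overlap, so the compiler
   does not have to assume a dependence between the two arrays. */
void saxpy(float *restrict dst, const float *restrict src, float c, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = dst[i] + c * src[i];
}
```

Calling saxpy with overlapping arrays would break the promise made by restrict, so the qualifier belongs only on pointers that are known to be distinct.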

Multi-version Code

Multi-version code is generated by the compiler in cases where data dependence analysis fails to prove independence for a loop due to the occurrence of pointers with unknown values. This functionality is referred to as dynamic dependence testing.

Pragma Scope

These pragmas control the vectorization of only the loop that immediately follows them; the compiler does not apply them to any loops nested inside it. Each nested loop needs its own pragma preceding it for the pragma to be applied. Place each pragma immediately before the loop control statement.
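
A minimal sketch of this scoping rule on a loop nest follows. The function scale_rows and its parameters are hypothetical. Compilers that do not recognize these pragmas ignore them (possibly with a warning), so the computed result is the same either way.

```c
#include <assert.h>

/* Multiplies every element of a rows x cols matrix, stored row by row, by c. */
void scale_rows(float *m, int rows, int cols, float c)
{
    int i, j;
    assert(rows >= 0 && cols >= 0);

    /* Applies to the i loop only; it says nothing about the j loop. */
    #pragma novector
    for (i = 0; i < rows; i++)
    {
        /* The nested loop needs its own pragma, placed directly
           before its loop control statement. */
        #pragma vector always
        for (j = 0; j < cols; j++)
        {
            m[i * cols + j] = m[i * cols + j] * c;
        }
    }
}
```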

#pragma vector always

Syntax: #pragma vector always

Definition: This pragma instructs the compiler to override any efficiency heuristic when it decides whether to vectorize a loop. With #pragma vector always, the compiler vectorizes loops that have non-unit strides or very unaligned memory accesses.

Example:

#pragma vector always

for(i = 0; i <= N; i++)

{

   a[32*i] = b[99*i];

}

#pragma ivdep

Syntax: #pragma ivdep

Definition: This pragma instructs the compiler to ignore assumed vector dependences. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Only use this when you know that the assumed loop dependences are safe to ignore.

The loop in this example will not vectorize without the ivdep pragma, since the value of k is not known (vectorization would be illegal if k < 0).

Example:

#pragma ivdep

for (i = 0; i < m; i++)

{

   a[i] = a[i + k] * c;

}
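
To see why a negative k matters, the following sketch compares the loop run one iteration at a time with a simplified model of a 4-wide vector loop, in which each group of four iterations loads all of its operands before storing any result. The model and all function names are assumptions made for illustration; they are not what the compiler actually emits.

```c
#include <assert.h>
#include <string.h>

/* Reference semantics: iterations run one at a time, in order. */
void shift_scale_scalar(float *a, int m, int k, float c)
{
    int i;
    for (i = 0; i < m; i++)
        a[i] = a[i + k] * c;
}

/* Hypothetical model of a 4-wide vector loop: every group of up to
   four iterations performs all of its loads before any of its stores. */
void shift_scale_chunked(float *a, int m, int k, float c)
{
    int i, j;
    for (i = 0; i < m; i += 4)
    {
        float tmp[4];
        int n = (m - i < 4) ? m - i : 4;
        for (j = 0; j < n; j++)
            tmp[j] = a[i + j + k] * c;
        for (j = 0; j < n; j++)
            a[i + j] = tmp[j];
    }
}

/* Returns 1 if both execution orders give identical arrays for this k.
   The 16-element buffers leave 4 elements of slack on each side. */
int same_result(int k)
{
    float x[16], y[16];
    int i;
    assert(k >= -4 && k <= 4);
    for (i = 0; i < 16; i++)
        x[i] = y[i] = (float)(i + 1);
    shift_scale_scalar(x + 4, 8, k, 2.0f);
    shift_scale_chunked(y + 4, 8, k, 2.0f);
    return memcmp(x, y, sizeof x) == 0;
}
```

With these definitions, same_result(1) holds while same_result(-1) does not: for k = -1 each iteration needs the value stored by the previous iteration, and the grouped loads read it too early.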

#pragma vector

Syntax: #pragma vector {aligned | unaligned}

Definition: The vector loop pragma means the loop should be vectorized, if it is legal to do so, ignoring normal heuristic decisions about profitability. When the aligned (or unaligned) qualifier is used with this pragma, the loop should be vectorized using aligned (or unaligned) operations. Specify one and only one of aligned or unaligned.

Caution

If you specify aligned as an argument, you must be absolutely sure that every memory access in the loop can be vectorized using aligned instructions. Otherwise, the compiler will generate incorrect code.

The loop in the following example uses the aligned qualifier to request that the loop be vectorized with aligned instructions, because the array is declared in such a way that the compiler could not normally prove that doing so is safe.

Example:

void foo (float *a, int m, float c)

{

   int i;

   #pragma vector aligned

   for (i = 0; i < m; i++)

   {

      a[i] = a[i] * c;

   }

}
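
One way to back up the promise made by the aligned qualifier is to allocate the data on a known boundary and check the address before the loop runs. The sketch below assumes 16-byte vectors and uses C11 aligned_alloc; is_aligned, alloc_floats_16, and scale_aligned are hypothetical helpers, not part of the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p lies on an n-byte boundary (n a power of two). */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Allocates count floats on a 16-byte boundary. The size is rounded up
   to a multiple of 16, as C11 aligned_alloc expects. Release with free(). */
float *alloc_floats_16(size_t count)
{
    size_t bytes = (count * sizeof(float) + 15) & ~(size_t)15;
    return aligned_alloc(16, bytes);
}

void scale_aligned(float *a, int m, float c)
{
    int i;

    /* Check the claim the pragma makes on our behalf. */
    assert(is_aligned(a, 16));

    #pragma vector aligned
    for (i = 0; i < m; i++)
    {
        a[i] = a[i] * c;
    }
}
```

The assertion disappears when NDEBUG is defined, so it costs nothing in release builds while still catching a misaligned pointer during testing.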

The compiler has at its disposal several alignment strategies in case the alignment of data structures is not known at compile-time. A simple example is shown (but several other strategies are supported as well). If, in the loop, the alignment of a is unknown, the compiler will generate a prelude loop that iterates until the array reference that occurs the most hits an aligned address. This makes the alignment properties of a known, and the vector loop is optimized accordingly.

Alignment Strategies Example

float *a;

// alignment unknown

for (i = 0; i < 100; i++)

{

   a[i] = a[i] + 1.0f;

}

 

// dynamic loop peeling

p = (uintptr_t)a & 0x0f;   // byte offset of a within a 16-byte block (uintptr_t is from <stdint.h>)

if (p != 0)

{

   p = (16 - p) / 4;

   for (i = 0; i < p; i++)

   {

      a[i] = a[i] + 1.0f;

   }

}

 

// loop with a aligned (will be vectorized accordingly)

for (i = p; i < 100; i++)

{

   a[i] = a[i] + 1.0f;

}
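
The peeling logic above can be written as compilable C along the following lines. This is a sketch, assuming 4-byte floats, 16-byte vectors, and a pointer that is at least 4-byte aligned; peel_count and add_one_peeled are hypothetical names, and the code the compiler generates may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading elements to process before a + count reaches a
   16-byte boundary, capped at n so short arrays are handled. */
int peel_count(const float *a, int n)
{
    int p = (int)((uintptr_t)a & 0x0f);   /* byte offset within a 16-byte block */
    assert(n >= 0);
    if (p != 0)
        p = (16 - p) / 4;                 /* floats left until the next boundary */
    return p < n ? p : n;
}

void add_one_peeled(float *a, int n)
{
    int i;
    int p = peel_count(a, n);

    /* Prelude loop: runs until a[p] is aligned. */
    for (i = 0; i < p; i++)
        a[i] = a[i] + 1.0f;

    /* Main loop: a[p] is on a 16-byte boundary, so this loop
       can be vectorized with aligned accesses. */
    for (i = p; i < n; i++)
        a[i] = a[i] + 1.0f;
}
```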

#pragma novector

Syntax: #pragma novector

Definition: The novector loop pragma specifies that the loop should never be vectorized, even if it is legal to do so. In this example, suppose you know the trip count (ub - lb) is too low to make vectorization worthwhile. You can use #pragma novector to tell the compiler not to vectorize, even if the loop is considered vectorizable.

Example:

void foo (int lb, int ub)

{

   int j;   // a and b are assumed to be arrays declared at file scope

   #pragma novector

   for (j = lb; j < ub; j++)

   {

      a[j] = a[j] + b[j];

   }

}

#pragma vector nontemporal

Syntax: #pragma vector nontemporal

Definition: #pragma vector nontemporal results in streaming stores on Pentium® 4 based systems. The example shows a loop (float type) together with the generated assembly. For large N, significant performance improvements result on Pentium 4 systems over a non-streaming implementation.

Example:

#pragma vector nontemporal

for (i = 0; i < N; i++)

   a[i] = 1;

   .B1.2:

movntps XMMWORD PTR _a[eax], xmm0

movntps XMMWORD PTR _a[eax+16], xmm0

add eax, 32

cmp eax, 4096

jl .B1.2

Dynamic Dependence Testing Example

float *p, *q;

for (i = L; i <= U; i++)

{

   p[i] = q[i];

}

...

pL = p + 4*L;

pH = p + 4*U;

qL = q + 4*L;

qH = q + 4*U;

if (pH < qL || pL > qH)

{

   // loop without data dependence

   for (i = L; i <= U; i++)

   {

      p[i] = q[i];

   }

} else {

   // loop with possible data dependence

   for (i = L; i <= U; i++)

   {

      p[i] = q[i];

   }

}
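
The run-time test above treats p and q as byte addresses. A compilable version of the same idea might look like the sketch below, which converts the pointers to uintptr_t before comparing them. The names ranges_disjoint and copy_range are hypothetical, and the return value is added only so that the chosen path can be observed.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when elements lo..hi of p and elements lo..hi of q
   occupy non-overlapping address ranges. */
int ranges_disjoint(const float *p, const float *q, int lo, int hi)
{
    uintptr_t pL = (uintptr_t)(p + lo), pH = (uintptr_t)(p + hi);
    uintptr_t qL = (uintptr_t)(q + lo), qH = (uintptr_t)(q + hi);
    return pH < qL || pL > qH;
}

/* Copies q[lo..hi] to p[lo..hi]. Returns 1 if the version without
   data dependence was taken, 0 if the fallback version ran. */
int copy_range(float *p, const float *q, int lo, int hi)
{
    int i;
    assert(lo <= hi);

    if (ranges_disjoint(p, q, lo, hi))
    {
        /* loop without data dependence: safe to vectorize */
        for (i = lo; i <= hi; i++)
            p[i] = q[i];
        return 1;
    }

    /* loop with possible data dependence: keep the original order */
    for (i = lo; i <= hi; i++)
        p[i] = q[i];
    return 0;
}
```

Both branches contain the same source loop; the point of the two versions is that the compiler can optimize the first one freely because the test has already ruled out overlap.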