Loop Unrolling with -unroll[n]
The -unroll[n] option is used to specify
the maximum number of times you want to unroll a loop. The following example
unrolls a loop at most four times:
prompt>icpc -unroll4 a.cpp
To disable loop unrolling, specify n as
0. The following example disables loop unrolling:
prompt>icpc -unroll0 a.cpp
- -unroll (n omitted) lets the compiler decide whether to unroll a loop. This is
the default; the compiler applies its default heuristics to choose n.
- -unroll0 (n = 0) disables the loop unroller.
The Itanium® compiler currently recognizes only n = 0; any other value is ignored.
Benefits and Limitations of Loop Unrolling
The benefits of loop unrolling are as follows:
- Unrolling eliminates some branches and some of the loop-control code
(exit tests and counter updates).
- Unrolling enables you to aggressively schedule (or
pipeline) the loop to hide latencies if you have enough free registers
to keep variables live.
- The Intel® Pentium® 4 and Intel®
Xeon(TM) processors can correctly predict the exit branch for an inner
loop that has 16 or fewer iterations, if that number of iterations is
predictable and there are no conditional branches in the loop. Therefore,
if the loop body size is not excessive, and the probable number of iterations
is known, unroll inner loops for:
- Pentium 4 processors, until they have a maximum
of 16 iterations
- Pentium III or Pentium II processors, until they
have a maximum of 4 iterations
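The effect described above can be sketched by hand. The following is a minimal illustration, not the compiler's actual output: the function names sum_rolled, sum_unrolled4, and main_loop_trips are hypothetical, and the factor of 4 is chosen only to match the -unroll4 example. The unrolled version executes one exit test per four elements, keeps four partial sums live in separate variables, and uses a short remainder loop for leftover elements.

```cpp
#include <cassert>
#include <cstddef>

// Rolled form: one exit test and one counter update per element.
long sum_rolled(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}

// Hand-unrolled by 4 (illustrative only; a compiler's own result may differ).
long sum_unrolled4(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    // Four independent partial sums: these need free registers to stay live.
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    // Main loop: one exit test and one counter update per four elements.
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    long s = (s0 + s1) + (s2 + s3);
    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        s += a[i];
    }
    return s;
}

// Number of main-loop iterations left after unrolling by `factor`.
std::size_t main_loop_trips(std::size_t n, std::size_t factor) {
    assert(factor > 0);
    return n / factor;
}
```

Under these assumptions, a 64-element inner loop unrolled by 4 leaves 16 main-loop iterations, which is the upper bound the guideline above gives for Pentium 4 processors. The extra statements in the unrolled body also show where the code-size cost comes from.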
A potential limitation is that excessive unrolling, or unrolling of
very large loops, can lead to increased code size.
For more information on how to optimize with -unroll[n],
refer to the Intel® Pentium® 4 and Intel® Xeon(TM)
Processor Optimization Reference Manual.