Instruction Scheduling for the Intel® Itanium® 2 Processor
Instruction scheduling is improved on the Itanium® 2 processor.
On the Itanium® processor, an integer instruction (ALU, st, ld) must not use the output of an MM instruction (variable shifts, etc.) before that output is complete, or the pipeline is flushed, at a penalty of ten cycles. To avoid the flush, the compiler must insert blocks of nops with stop bits after shift operations. These blocks are needed because MM instructions have an average latency of 4 cycles, so the integer instructions that use their outputs are placed at least 4 cycles after the issue of the MM instructions.
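The padding rule described above can be illustrated with a small toy model. This is only a sketch and not the compiler's actual scheduler: it assumes one instruction group issues per cycle, and the names `pad_for_mm_latency`, `NOP`, and the `(unit, dest, sources)` tuple layout are hypothetical choices made for illustration.

```python
MM_LATENCY = 4  # cycles; the average MM latency quoted in the text

# Hypothetical instruction encoding: (unit, destination register, source registers)
NOP = ("nop", None, ())


def pad_for_mm_latency(groups, latency=MM_LATENCY):
    """Insert nop groups so that no integer instruction reads an MM result
    sooner than `latency` cycles after the MM instruction issued.

    `groups` is a list of instruction groups; the toy model assumes that
    group i issues in cycle i (one group per cycle).
    """
    scheduled = []
    ready_at = {}  # register -> earliest cycle an integer consumer may issue

    for group in groups:
        # Earliest cycle at which every integer consumer in this group is safe.
        needed = max(
            (ready_at.get(src, 0)
             for unit, _dest, srcs in group if unit == "I"
             for src in srcs),
            default=0,
        )
        # Pad with nop-only groups until that cycle is reached.
        while len(scheduled) < needed:
            scheduled.append([NOP])

        cycle = len(scheduled)
        for unit, dest, _srcs in group:
            if unit == "MM":
                ready_at[dest] = cycle + latency
            elif dest is not None:
                # A non-MM write replaces any pending MM result in this register.
                ready_at.pop(dest, None)
        scheduled.append(group)

    return scheduled


# A variable shift followed immediately by an add that consumes its result.
example = [
    [("MM", "r8", ("r9", "r10"))],   # shift: r8 = r9 << r10
    [("I", "r11", ("r8", "r12"))],   # add:   r11 = r8 + r12
]
padded = pad_for_mm_latency(example)
```

In this model the dependent add ends up in cycle 4, with nop-only groups filling cycles 1 through 3, while an integer instruction that does not read the shift result is left where it was.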
On the Itanium 2 processor, these operations are scoreboarded, removing the risk of flushing the pipeline. Therefore:
The example on the next page compares the assembly code generated with and without the -G2 option.