Efficient Compilation

Efficient compilation contributes to run-time performance. Before you analyze and tune your program, consider how you compile it. Based on the analysis of your application, you can decide which Intel Fortran Compiler optimizations and command-line options can improve its run-time performance.

Efficient Compilation Techniques

Efficient compilation techniques apply during both the earlier and the later stages of program development.

During the earlier stages of program development, you can use incremental compilation with minimal optimization. For example:

ifort -c -g -O0 sub2.f90 (generates object file of sub2)

ifort -c -g -O0 sub3.f90 (generates object file of sub3)

ifort -o main -g -O0 main.f90 sub2.o sub3.o

The above commands turn off all compiler default optimizations (for example, -O2) with -O0. You can use the -g option to generate symbolic debugging information and line numbers in the object code for all routines in the program for use by a source-level debugger. The main file created in the third command above contains symbolic debugging information as well.
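The incremental debug build described above can be scripted. The following is a minimal sketch, not part of the compiler: the helper name debug_build_cmds is hypothetical, and it only prints the ifort commands (using the same file names and options as the example above) so that you can review them before running them.

```shell
#!/bin/sh
# Sketch only: debug_build_cmds is a hypothetical helper, not an ifort feature.
# It prints one "compile only" command per subprogram source file, then the
# final command that compiles the main program and links the object files.
debug_build_cmds() {
    main_src=$1
    shift
    objs=""
    for src in "$@"; do
        echo "ifort -c -g -O0 $src"
        # sub2.f90 -> sub2.o
        objs="$objs ${src%.f90}.o"
    done
    echo "ifort -o main -g -O0 $main_src$objs"
}

debug_build_cmds main.f90 sub2.f90 sub3.f90
```

To execute the commands instead of printing them, the output can be piped to sh. After editing only sub2.f90, only its compile command and the final link command need to be rerun.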

During the later stages of program development, you should specify multiple source files together and use an optimization level of at least -O2 (default) to allow more optimizations to occur. For instance, the following command compiles all three source files together using the default level of optimization, -O2:

ifort -o main main.f90  sub2.f90  sub3.f90

Compiling multiple source files together lets the compiler examine more code for possible optimizations.

For very large programs, compiling all source files together may not be practical. In such instances, consider compiling source files containing related routines together using multiple ifort commands, rather than compiling source files individually.
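As an illustration of grouping related routines, here is a minimal sketch that prints one ifort command per group of sources and then a final link command. The helper names (compile_group_cmd, link_cmd) and all file names are hypothetical, and the grouping itself is an assumption about how a large program might be organized.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# All file names below are hypothetical.

# Print one ifort command that compiles a group of related sources to objects.
compile_group_cmd() {
    echo "ifort -c -O2 $*"
}

# Print the final command that compiles main.f90 and links the given objects.
link_cmd() {
    echo "ifort -o main main.f90 $*"
}

compile_group_cmd io_read.f90 io_write.f90
compile_group_cmd solve_setup.f90 solve_step.f90
link_cmd io_read.o io_write.o solve_setup.o solve_step.o
```

Each group is passed to the compiler in a single command, so routines that call each other are seen together, while a change in one group does not force the other groups to be recompiled.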

Options That Improve Run-Time Performance

The table below lists, in alphabetical order, the options that can directly improve run-time performance. Most of these options do not affect the accuracy of the results; others improve run-time performance but can change some numeric results. The Intel Fortran Compiler performs some optimizations by default unless you turn them off with the corresponding command-line options. Additional optimizations can be enabled or disabled using command-line options.

Option

Description

-align keyword

Analyzes and reorders memory layout for variables and arrays.
Controls whether padding bytes are added between data items within common blocks, derived-type data, and record structures to make the data items naturally aligned.

-ax{K|W|N|B|P} (IA-32 and Intel® Extended Memory 64 Technology (Intel® EM64T) systems only)

Optimizes your application's performance for specific processors. Whichever -ax suboption you choose, the application is optimized to take advantage of that processor, and the resulting binary file can still run on any Intel IA-32 processor.

-fast

Enables a collection of optimizations for run-time performance.

-O1

Optimizes to favor code size and code locality. See Setting Optimizations with -On Options.

-O2

Optimizes for code speed. Sets performance-related options. See Setting Optimizations with -On Options.

-O3

Activates loop transformation optimizations. See Setting Optimizations with -On Options.

-openmp

Enables the parallelizer to generate multithreaded code based on the OpenMP* directives.

-parallel

Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel.

-qp

Requests profiling information, which you can use to identify those parts of your program where improving source code efficiency would most likely improve run-time performance. After you modify the appropriate source code, recompile the program and test the run-time performance.

-tpp{n}

Optimizes your application's performance for specific Intel processors. See Targeting a Processor, -tpp{n}.

-unroll[n]

Specifies the number of times a loop is unrolled (n) when specified with optimization level -O3. If you omit n in -unroll, the optimizer determines how many times loops can be unrolled.
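The choice among the -On options in the table above can be captured in a small build script. The sketch below is illustrative only: the helper name opt_flag_for and the goal names (size, speed, loops) are hypothetical, and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: opt_flag_for and its goal names are hypothetical.
# Maps a build goal to one of the -On options described in the table above.
opt_flag_for() {
    case "$1" in
        size)  echo "-O1" ;;   # favors code size and code locality
        speed) echo "-O2" ;;   # optimizes for code speed (the default level)
        loops) echo "-O3" ;;   # adds loop transformation optimizations
        *)     echo "unknown goal: $1" >&2
               return 1 ;;
    esac
}

# Print the compile command for a chosen goal.
echo "ifort $(opt_flag_for loops) -o main main.f90 sub2.f90 sub3.f90"
```

Options that can change numeric results are deliberately left out of this sketch, so that switching goals changes only the optimization level.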

Options That Slow Down the Run-time Performance

The table below lists, in alphabetical order, options that can slow down run-time performance. Some applications that require floating-point exception handling or rounding might need to use the -fpen dynamic option. Other applications might need to use the -assume dummy_aliases or -vms options for compatibility reasons. The remaining options that can slow down run-time performance are primarily for troubleshooting or debugging purposes.

Option

Description

-assume dummy_aliases

Forces the compiler to assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. These program semantics slow performance, so you should specify -assume dummy_aliases only for the called subprograms that depend on such aliases.

The use of dummy aliases violates the Fortran 77 and Fortran 95/90 standards but occurs in some older programs.

-check bounds

Generates extra code for array bounds checking at run time.

-check overflow

Generates extra code to check integer calculations for arithmetic overflow at run time. Once the program is debugged, omit this option to reduce executable program size and slightly improve run-time performance.

-fpe 3

Enables certain types of floating-point exception handling, which can be expensive.

-g  

Generates extra symbol table information in the object file. Specifying this option also reduces the default level of optimization to -O0 (no optimization).

Note

The -g option slows your program down only when no optimization level is specified, in which case -g turns on -O0 and the unoptimized code runs more slowly. If both -g and -O2 are specified, the code runs at very nearly the same speed as if -g were not specified.

-O0

Turns off optimizations. Can be used during the early stages of program development or when you use the debugger.

-save

Forces local variables to retain their values from the previous invocation of the routine. This may change the output of your program for floating-point values because it forces operations to be carried out in memory rather than in registers, which in turn causes more frequent rounding of your results.

-vms

Controls certain VMS-related run-time defaults, including alignment. If you specify the -vms option, you may need to also specify the -align records option to obtain optimal run-time performance.
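Once debugging is finished, the debugging-oriented options from the table above can be dropped from a build's option list. The following is a minimal sketch under stated assumptions: the helper name release_flags is hypothetical, it handles only -O0 and the two-word -check keyword form shown in the table, and it adds an explicit -O2 when no other optimization level is given so that a remaining -g does not fall back to -O0.

```shell
#!/bin/sh
# Sketch only: release_flags is a hypothetical helper, not an ifort feature.
# Removes -O0 and "-check <keyword>" from an option list and makes sure an
# explicit optimization level is present.
release_flags() {
    out=""
    skip_next=0
    has_opt=0
    for flag in "$@"; do
        if [ "$skip_next" -eq 1 ]; then
            skip_next=0          # this word was the keyword after -check
            continue
        fi
        case "$flag" in
            -O0)     ;;                                   # drop: no optimization
            -check)  skip_next=1 ;;                       # drop -check and its keyword
            -O[123]) has_opt=1; out="$out${out:+ }$flag" ;;
            *)       out="$out${out:+ }$flag" ;;
        esac
    done
    if [ "$has_opt" -eq 1 ]; then
        echo "$out"
    else
        echo "-O2${out:+ }$out"
    fi
}

release_flags -g -O0 -check bounds -check overflow -align records
```

Options needed for compatibility, such as -assume dummy_aliases or -vms, pass through unchanged in this sketch, because removing them could change program behavior.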