The options used for basic profile-guided optimization (PGO) are:
-prof_gen to generate instrumented code
-prof_use to generate a profile-optimized executable
-prof_format_32 to produce 32-bit counters for .dyn and .dpi files
If your program's behavior differs greatly from one execution to the next, make sure that the benefit of the profile information is worth the effort of keeping the profiles up to date. Basic profile-guided optimization uses these options in its phases as follows:
The -prof_gen option instruments the program for profiling to get the execution count of each basic block. It is used in phase 1 of the PGO to instruct the compiler to produce instrumented code in your object files in preparation for instrumented execution. Parallel make is automatically supported for -prof_gen compilations.
The -prof_use option is used in phase 3 of the PGO. It instructs the compiler to produce a profile-optimized executable, and it merges the available dynamic-information (.dyn) files into a pgopti.dpi file.
The dynamic-information files are produced in phase 2 when you run the instrumented executable.
If you perform multiple executions of the instrumented program, -prof_use merges the dynamic-information files again and overwrites the previous pgopti.dpi file.
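The following Python sketch models the merge step described above. It is a simplified illustration, not the real .dyn or pgopti.dpi file format: each .dyn file is assumed to be a mapping from a hypothetical basic-block name to an execution count, and merging is assumed to sum the counts. Because the summary is rebuilt from all available .dyn files on every call, each result replaces the previous one, mirroring the way -prof_use overwrites pgopti.dpi.

```python
from collections import Counter
from typing import Dict, Iterable

# Simplified model (assumption): one .dyn file = mapping from a hypothetical
# basic-block name to the number of times that block ran in one execution.
DynProfile = Dict[str, int]


def merge_dyn_files(dyn_files: Iterable[DynProfile]) -> Dict[str, int]:
    """Build a fresh pgopti.dpi-style summary from every available .dyn profile.

    The summary is recomputed from all inputs on each call, so it replaces
    (rather than accumulates into) any summary produced earlier.
    """
    merged: Counter = Counter()
    for dyn in dyn_files:
        merged.update(dyn)  # add this execution's block counts
    return dict(merged)


# Phase 2: two executions of the instrumented program (hypothetical counts).
run1 = {"main:bb0": 1, "main:bb1": 1000, "solve:bb0": 1000}
run2 = {"main:bb0": 1, "main:bb1": 250, "solve:bb0": 250, "solve:bb1": 3}

# Phase 3 after the first execution: only one .dyn file exists.
dpi = merge_dyn_files([run1])
# After a second execution, the next merge rebuilds the whole summary.
dpi = merge_dyn_files([run1, run2])
print(dpi)
```

Recomputing from all inputs, rather than adding new counts to the old summary, keeps counts from being applied twice when the merge is repeated.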
By default, the Intel Fortran compiler produces profile data with 64-bit counters so that the .dyn and .dpi files can record large numbers of events. The -prof_format_32 option produces 32-bit counters for compatibility with earlier compiler versions. If the format of the .dyn and .dpi files is incompatible with the format used in the current compilation, the compiler issues the following message:
Error: xxx.dyn has old or incompatible file format - delete file and redo instrumentation compilation/execution.
The 64-bit format for counters and pointers in .dyn and .dpi files eliminates incompatibilities between platforms that arise from different pointer sizes.
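The following Python sketch shows why counter width matters. It assumes that a counter behaves like a fixed-width unsigned integer that wraps around on overflow; whether a real 32-bit PGO counter wraps or saturates is not stated here. ctypes.c_uint32 truncates a value modulo 2**32 without raising an error, so 5,000,000,000 events become 705,032,704 when stored in 32 bits but are preserved in 64 bits.

```python
import ctypes


def store_in_32bit(count: int) -> int:
    # c_uint32 performs no overflow check: the value is reduced modulo 2**32.
    return ctypes.c_uint32(count).value


def store_in_64bit(count: int) -> int:
    # c_uint64 holds any count below 2**64 unchanged.
    return ctypes.c_uint64(count).value


events = 5_000_000_000  # more executions than a 32-bit counter can represent
print(store_in_32bit(events))  # wrapped value
print(store_in_64bit(events))  # exact value
```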
-fnsplit- disables function splitting on Itanium®-based systems. Function splitting is enabled by -prof_use in phase 3 to improve code locality by splitting routines into different sections: one section to contain the cold or very infrequently executed code and one section to contain the rest of the code (hot code).
You can use -fnsplit- to disable function splitting for the following reasons:
Most importantly, to get improved debugging capability. In the debug symbol table, it is difficult to represent a split routine, that is, a routine with some of its code in the hot code section and some of its code in the cold code section. The -fnsplit- option disables the splitting within a routine but enables function grouping, an optimization in which entire routines are placed either in the cold code section or the hot code section. Function grouping does not degrade debugging capability.
A second reason is that the profile data may not represent the program's actual behavior; for example, a routine that the profile marks as infrequently executed may in fact be used frequently.
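To make the difference between function splitting and function grouping concrete, the following Python sketch partitions a hypothetical profile both ways. The cutoff COLD_THRESHOLD and the rule that a grouped routine is hot if any of its blocks is hot are assumptions; the compiler's actual heuristics are not described here. Note how splitting places the blocks of solve in two different sections, which is the situation that is hard to represent in the debug symbol table, while grouping keeps each routine whole.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: routine name -> {basic-block name -> execution count}.
Profile = Dict[str, Dict[str, int]]

COLD_THRESHOLD = 10  # hypothetical cutoff for "very infrequently executed"


def split_functions(profile: Profile) -> Tuple[List[str], List[str]]:
    """Function splitting: blocks of one routine may land in different sections."""
    hot: List[str] = []
    cold: List[str] = []
    for routine, blocks in profile.items():
        for block, count in blocks.items():
            section = hot if count >= COLD_THRESHOLD else cold
            section.append(f"{routine}:{block}")
    return hot, cold


def group_functions(profile: Profile) -> Tuple[List[str], List[str]]:
    """Function grouping: each routine is placed whole in one section,
    here decided by its most frequently executed block (assumption)."""
    hot: List[str] = []
    cold: List[str] = []
    for routine, blocks in profile.items():
        peak = max(blocks.values(), default=0)
        (hot if peak >= COLD_THRESHOLD else cold).append(routine)
    return hot, cold


profile = {
    "solve": {"loop": 5000, "error_path": 0},
    "report_failure": {"entry": 1},
}
print(split_functions(profile))
print(group_functions(profile))
```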
For Itanium®-based applications, if you intend to use the -prof_use option with optimizations at the -O3 level, the -O3 option must also be on when you generate the profile data. If you intend to use the -prof_use option with optimizations at the -O2 level or lower, you can generate the profile data with the default options.
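The following Python sketch pulls the phases together. It only builds and prints the command line for each phase; it does not run a compiler. The driver name ifort, the source files main.f90 and solve.f90, and the executable name app are assumptions for illustration. Following the Itanium guidance above, -O3 is passed to both compilations. -prof_format_32 is also passed to both, on the assumption that the instrumentation and optimization compilations must agree on the file format. -fnsplit- is added only to the phase 3 compilation, where -prof_use would otherwise enable function splitting.

```python
import shlex
from typing import List


def pgo_commands(sources: List[str], exe: str = "app", *, o3: bool = False,
                 no_fnsplit: bool = False, counters_32: bool = False,
                 compiler: str = "ifort") -> List[List[str]]:
    """Return the command lines for PGO phases 1-3 (names are hypothetical)."""
    common: List[str] = ["-O3"] if o3 else []
    if counters_32:
        common.append("-prof_format_32")  # keep .dyn/.dpi formats consistent
    # Phase 1: compile instrumented code.
    phase1 = [compiler, *common, "-prof_gen", *sources, "-o", exe]
    # Phase 2: run the instrumented executable to produce .dyn files.
    phase2 = [f"./{exe}"]
    # Phase 3: merge .dyn files into pgopti.dpi and build the optimized executable.
    extra = ["-fnsplit-"] if no_fnsplit else []
    phase3 = [compiler, *common, "-prof_use", *extra, *sources, "-o", exe]
    return [phase1, phase2, phase3]


for cmd in pgo_commands(["main.f90", "solve.f90"], o3=True, no_fnsplit=True):
    print(shlex.join(cmd))
```

Repeat the phase 2 command with representative inputs as many times as needed before running phase 3; -fnsplit- applies only to Itanium-based systems.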
See an example of using PGO.