Intel® Math Kernel Library 7.0.1 for Linux*
Technical User Notes

Contents

Purpose
Compiler Support
Directory Structure
Linking with the Intel® Math Kernel Library (Intel® MKL)
Linking with ScaLAPACK
Using MKL Parallelism
Memory Management
Performance
Configuration File
Obtaining Version Information
Custom Shared Object Builder
Technical Support

 
 

Purpose

The Intel® Math Kernel Library 7.0.1 for Linux* Technical User Notes describe how to compile, link, and run applications with Intel® MKL 7.0.1 for Linux*. Use them together with the Intel® MKL 7.0.1 for Linux* release notes when building Intel® MKL 7.0.1 for Linux* into your application.

The Technical User Notes for the Intel® Math Kernel Library 7.0.1 for Linux* apply to two products: the Intel® Math Kernel Library 7.0.1 for Linux* and the Intel® Cluster Math Kernel Library (Intel® Cluster MKL) 7.0.1 for Linux*. Some routines in this document, such as ScaLAPACK, are present only in the Intel® Cluster MKL 7.0.1 for Linux* version. In the Directory Structure section, ScaLAPACK and related libraries are marked with an asterisk (*) after the file name to indicate that they are available in the cluster package only.

Compiler Support

Intel supports the Intel® Math Kernel Library (Intel® MKL) for use only with compilers identified in the release notes. However, the library has been successfully used with other compilers as well.

When using the cblas interface, the header file mkl.h will simplify program development since it specifies enumerated values as well as prototypes for all the functions. The header determines if the program is being compiled with a C++ compiler and, if it is, the included file will be correct for use with C++ compilation.

Directory Structure

Intel® MKL provides separate versions of the library for IA-32, for Intel® Extended Memory 64 Technology (Intel® EM64T), and for the Intel® Itanium® 2 processor. The IA-32 versions are located in the lib/32 directory, the Intel® EM64T versions in the lib/em64t directory, and the Intel® Itanium® 2 processor versions in the lib/64 directory.

Intel® MKL consists of two parts: high level libraries (ScaLAPACK*, sparse solver, LAPACK) and processor specific kernels in libmkl_ia32.a, libmkl_em64t.a, and libmkl_ipf.a. The high level libraries are optimized without regard to processor and can be used effectively on processors from the Intel® Pentium® processor through the Intel® Pentium® 4 processor. The processor specific kernels, which contain the BLAS, cblas, FFT, DFT, VSL, and VML routines, are optimized for each specific processor. In addition, the threading software is supplied as a separate library: libguide.a for static linking and the shared object libguide.so for dynamic linking to Intel® MKL.

The information below indicates the library's directory structure.

lib/32                                Contains all libraries for 32-bit applications
  libmkl_ia32.a                       Optimized kernels for Intel® Pentium®, Pentium® III, and Pentium® 4 processors
  libmkl_lapack.a                     LAPACK routines and drivers
  libmkl_solver.a                     Sparse solver routines
  libmkl_scalapack.a*                 ScaLAPACK* routines
  libmkl_blacs.a*                     BLACS* routines
  libmkl_blacsF77init.a*              BLACS* initialization routines for Intel® Fortran compiler users
  libmkl_blacsF77init_gnu.a*          BLACS* initialization routines for GNU Fortran compiler users
  libmkl_blacsCinit.a*                BLACS* initialization routines for Intel® C/C++ compiler users
  libmkl_blacsCinit_gnu.a*            BLACS* initialization routines for GNU C compiler users
  libmkl_scalapacktesting_intel80.a*  ScaLAPACK* testing routines for Intel® Fortran 8.0 compiler users
  libmkl_scalapacktesting_intel.a*    ScaLAPACK* testing routines for Intel® Fortran 7.1 compiler users
  libmkl_scalapacktesting_gnu.a*      ScaLAPACK* testing routines for GNU Fortran compiler users
  libguide.a                          Threading library for static linking
  libmkl.so                           Library dispatcher for dynamic load of processor specific kernel
  libmkl_lapack32.so                  LAPACK routines and drivers, single precision data types
  libmkl_lapack64.so                  LAPACK routines and drivers, double precision data types
  libmkl_def.so                       Default kernel (Intel® Pentium®, Pentium® Pro, and Pentium® II processors)
  libmkl_p3.so                        Intel® Pentium® III processor kernel
  libmkl_p4.so                        Pentium® 4 processor kernel
  libmkl_p4p.so                       Kernel for Intel® Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3)
  libvml.so                           Library dispatcher for dynamic load of processor specific VML kernels
  libmkl_vml_def.so                   VML part of default kernel (Pentium®, Pentium® Pro, Pentium® II processors)
  libmkl_vml_p3.so                    VML part of Pentium® III processor kernel
  libmkl_vml_p4.so                    VML part of Pentium® 4 processor kernel
  libmkl_vml_p4p.so                   VML for Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3)
  libguide.so                         Threading library for dynamic linking
 
lib/em64t                             Contains all libraries for Intel® EM64T applications
  libmkl_em64t.a                      Optimized kernels for Intel® EM64T
  libmkl_lapack.a                     LAPACK routines and drivers
  libmkl_solver.a                     Sparse solver routines
  libmkl_scalapack.a*                 ScaLAPACK* routines
  libmkl_blacs.a*                     BLACS* routines
  libmkl_blacsF77init.a*              BLACS* initialization routines for Intel® Fortran compiler users
  libmkl_blacsF77init_gnu.a*          BLACS* initialization routines for GNU Fortran compiler users
  libmkl_blacsCinit.a*                BLACS* initialization routines for Intel® C/C++ compiler users
  libmkl_blacsCinit_gnu.a*            BLACS* initialization routines for GNU C compiler users
  libmkl_scalapacktesting_intel.a*    ScaLAPACK* testing routines for Intel® Fortran 8.1 compiler users
  libmkl_scalapacktesting_gnu.a*      ScaLAPACK* testing routines for GNU Fortran compiler users
  libguide.a                          Threading library for static linking
  libmkl.so                           Library dispatcher for dynamic load of processor specific kernel
  libmkl_lapack32.so                  LAPACK routines and drivers, single precision data types
  libmkl_lapack64.so                  LAPACK routines and drivers, double precision data types
  libmkl_def.so                       Default kernel
  libmkl_p4n.so                       Kernel for Intel® Xeon™ processor with Intel® EM64T
  libvml.so                           Library dispatcher for dynamic load of processor specific VML kernels
  libmkl_vml_def.so                   VML part of default kernel
  libmkl_vml_p4n.so                   VML for Intel® Xeon™ processor with Intel® EM64T
  libguide.so                         Threading library for dynamic linking
 
lib/64                                Contains all libraries for Itanium® 2-based applications
  libmkl_ipf.a                        Processor kernels for Intel® Itanium® 2 processor
  libmkl_lapack.a                     LAPACK routines and drivers
  libmkl_solver.a                     Sparse solver routines
  libmkl_scalapack.a*                 ScaLAPACK* routines
  libmkl_blacs.a*                     BLACS* routines
  libmkl_blacsF77init.a*              BLACS* initialization routines for Intel® Fortran compiler users
  libmkl_blacsF77init_gnu.a*          BLACS* initialization routines for GNU Fortran compiler users
  libmkl_blacsCinit.a*                BLACS* initialization routines for Intel® C/C++ compiler users
  libmkl_blacsCinit_gnu.a*            BLACS* initialization routines for GNU C compiler users
  libmkl_scalapacktesting_intel80.a*  ScaLAPACK* testing routines for Intel® Fortran 8.0 compiler users
  libmkl_scalapacktesting_intel.a*    ScaLAPACK* testing routines for Intel® Fortran 7.1 compiler users
  libmkl_scalapacktesting_gnu.a*      ScaLAPACK* testing routines for GNU Fortran compiler users
  libguide.a                          Threading library for static linking
  libmkl_lapack32.so                  LAPACK routines and drivers, single precision data types
  libmkl_lapack64.so                  LAPACK routines and drivers, double precision data types
  libguide.so                         Threading library for dynamic linking
  libmkl.so                           Library dispatcher for dynamic load of processor specific kernel
  libmkl_i2p.so                       Itanium® 2 processor kernel
  libmkl_vml_i2p.so                   Itanium® 2 processor VML kernel
  libvml.so                           Library dispatcher for dynamic load of processor specific VML kernel

Linking with Intel® MKL

To link to libraries in Intel® MKL 7.0.1, follow this general form:

<linker script> <files to link>
-L<MKL 7.0.1 path>
[-lmkl_solver]
{[-lmkl_lapack] -lmkl_{ia32, em64t, ipf},[-lmkl_lapack{32,64}] -lmkl}
-lguide -lpthread

To use LAPACK and BLAS software, you must link the following libraries: LAPACK, processor optimized kernels, threading library, and system library for threading support. If you use FFT/DFT, you may also need to add -lm to the link line. Some possible variants:

ld myprog.o -L$MKLPATH -lmkl_lapack -lmkl_ia32 -lguide -lpthread
IA-32 applications: static linking of LAPACK and kernels. The processor dispatcher calls the appropriate kernel for the system at runtime.
ld myprog.o -L$MKLPATH -lmkl_ia32 -lguide -lpthread -lm
IA-32 applications: static linking of BLAS and FFT/DFT. The processor dispatcher calls the appropriate kernel for the system at runtime.
ld myprog.o -L$MKLPATH -lmkl_lapack -lmkl_ipf -lguide -lpthread
Itanium®-based applications: static linking of LAPACK and kernels. The processor dispatcher calls the appropriate kernel for the system at runtime.
ld myprog.o -L$MKLPATH -lmkl_solver -lmkl_lapack -lmkl_ipf -lguide -lpthread
Itanium®-based applications: linking of the sparse solver and possibly other routines within MKL, including the kernels needed to support the sparse solver.

The libguide libraries have the same base name in the static and dynamic cases. In the previous examples -lguide demonstrated the dynamic case, in which the shared object libguide.so is used for linking. To force static linking of the threading library, use either the -static flag or the explicit static form:

ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ia32.a $MKLPATH/libguide.a -lpthread
IA-32 static linking, LAPACK library, IA-32 processor kernels. Processor dispatcher will call the appropriate kernel for the system at runtime.
ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ipf.a $MKLPATH/libguide.a -lpthread
Itanium®-based applications static linking of LAPACK and kernels. Processor dispatcher will call the appropriate kernel for the system at runtime.
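
The general form above can be sketched as a small shell helper that prints the Intel® MKL part of a link line. This is only an illustration: the function name mkl_link_flags and its feature keywords (lapack, solver, fft) are hypothetical and not part of Intel® MKL, and the helper assumes that MKLPATH already points at the library directory for the chosen architecture. Following the sparse solver example above, the solver keyword also adds -lmkl_lapack.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Intel MKL): print the MKL libraries
# for a link line in the order used by the general form in this section.
# usage: mkl_link_flags <ia32|em64t|ipf> [lapack] [solver] [fft]
mkl_link_flags() {
    if [ "$#" -lt 1 ]; then
        echo "usage: mkl_link_flags <ia32|em64t|ipf> [lapack] [solver] [fft]" >&2
        return 1
    fi
    _arch=$1
    shift
    case "$_arch" in
        ia32|em64t|ipf) ;;
        *) echo "unknown architecture: $_arch" >&2; return 1 ;;
    esac
    _solver=""
    _lapack=""
    _libm=""
    for _feature in "$@"; do
        case "$_feature" in
            # Assumption: the sparse solver is linked together with LAPACK,
            # as in the sparse solver example in this section.
            solver) _solver=" -lmkl_solver"; _lapack=" -lmkl_lapack" ;;
            lapack) _lapack=" -lmkl_lapack" ;;
            # FFT/DFT users may need the system math library.
            fft)    _libm=" -lm" ;;
            *) echo "unknown feature: $_feature" >&2; return 1 ;;
        esac
    done
    echo "-L${MKLPATH}${_solver}${_lapack} -lmkl_${_arch} -lguide -lpthread${_libm}"
}
```

With MKLPATH set, a link step could then be written as ld myprog.o $(mkl_link_flags ipf solver). The helper only builds the argument string; it does not check that the libraries exist.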

Linking with ScaLAPACK*

To link a program that calls ScaLAPACK, you first need to know how to link an MPICH application. Typically, this involves using the MPICH scripts mpicc or mpif77 (the C and Fortran 77 compiler scripts), which use the correct MPICH header files and so forth. If, for instance, MPICH is installed in /opt/mpich, then typically /opt/mpich/bin/mpicc and /opt/mpich/bin/mpif77 will be the compiler scripts and /opt/mpich/lib/libmpich.a will be the library used for that install.

To link to ScaLAPACK in Intel® MKL 7.0.1, follow this general form:

<mpich linker script> <files to link>                    \
       -L<Cluster MKL 7.0.1 path>                        \
       -lmkl_scalapack -lmkl_blacs{F77,C}init[_gnu]      \
       -lmkl_blacs -lmkl_blacs{F77,C}init[_gnu]          \
       <MKL LAPACK & BLAS libraries>

where F77 or C is chosen according to the programming language of the main module (Fortran or C/C++), the _gnu suffix is added when the application is compiled with the GNU compilers, and <MKL LAPACK & BLAS libraries> stands for LAPACK, the processor optimized kernels, the threading library, and the system library for threading support, linked as described above.

For instance, suppose you have MPICH 1.2.5 or later installed in /opt/mpich and Intel® Cluster MKL 7.0.1 installed in /opt/intel/mkl701cluster, you use the Intel compilers, and the main module is in C. To link with Intel® MKL for a cluster of IA-32 systems, you would use the following:

/opt/mpich/bin/mpicc <user files to link>                \
       -L/opt/intel/mkl701cluster/lib/32                  \
       -lmkl_scalapack -lmkl_blacsCinit                  \
       -lmkl_blacs -lmkl_blacsCinit                      \
       -lmkl_lapack -lmkl_ia32 -lguide                   \
       -lpthread

As another example, suppose you have MPICH 1.2.5 or later installed in /opt/mpich and Intel® Cluster MKL 7.0.1 installed in /opt/intel/mkl701cluster, you use the GNU compilers, and the main module is in Fortran. To link with Intel® MKL for a cluster of Intel® Itanium® processor family systems, you would use the following:

/opt/mpich/bin/mpif77 <user files to link>               \
       -L/opt/intel/mkl701cluster/lib/64                  \
       -lmkl_scalapack -lmkl_blacsF77init_gnu            \
       -lmkl_blacs -lmkl_blacsF77init_gnu                \
       -lmkl_lapack -lmkl_ipf -lguide                    \
       -lpthread

You may note that five BLACS libraries are included with Intel® Cluster MKL 7.0.1, but only two are used at any one time, and that the third BLACS entry on the link line repeats one of them. This is not a typographical error.
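
The choice of the BLACS initialization library, and its repetition on the link line, can be sketched as a shell function. The function name scalapack_link_flags and its two arguments are hypothetical and exist only for this illustration; the library names are the ones listed in the Directory Structure section.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Intel Cluster MKL): print the
# ScaLAPACK and BLACS part of the link line.
# usage: scalapack_link_flags <F77|C> <intel|gnu>
#   F77|C      language of the main module (Fortran or C/C++)
#   intel|gnu  compiler family used to build the application
scalapack_link_flags() {
    case "$1" in
        F77|C) ;;
        *) echo "main module language must be F77 or C" >&2; return 1 ;;
    esac
    case "$2" in
        intel) _suffix="" ;;
        gnu)   _suffix="_gnu" ;;
        *) echo "compiler must be intel or gnu" >&2; return 1 ;;
    esac
    _init="-lmkl_blacs${1}init${_suffix}"
    # The initialization library appears both before and after -lmkl_blacs,
    # exactly as in the general form; the repetition is intentional.
    echo "-lmkl_scalapack ${_init} -lmkl_blacs ${_init}"
}
```

For the second example above (GNU compilers, Fortran main module), scalapack_link_flags F77 gnu yields the same ScaLAPACK and BLACS entries as the hand-written link line; the LAPACK, kernel, and threading libraries still have to be appended.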

If you build NetLib ScaLAPACK tests, you may need to link with the NetLib testing support routines, which are provided in three separate libraries: libmkl_scalapacktesting_intel80.a, libmkl_scalapacktesting_intel.a, and libmkl_scalapacktesting_gnu.a (the first two are for Intel-compiled tests, the third is for GNU). In this case insert the matching -lmkl_scalapacktesting{_gnu, _intel, _intel80} entry before -lmkl_scalapack.

A ScaLAPACK binary is run just like any other MPICH application; consult the documentation that comes with MPICH. Typically the mpirun script is used, and the number of MPI processes is set by -np <number>.

Some final cautions and reminders:

Make certain that all nodes have the same, correct OMP_NUM_THREADS value. In Intel® Cluster MKL 7.0.1 this value is one (1) by default. In previous versions the default value was the number of CPUs detected, which is dangerous (but not necessarily bad) for MPICH. The best way to set this variable is in the login environment. Remember that mpirun starts a fresh default shell on all of the nodes, so changing this value on the head node and then doing the run (which works on an SMP system) will not change the variable as far as your program is concerned. In .bashrc, you could add a line at the top that looks like: OMP_NUM_THREADS=1; export OMP_NUM_THREADS

It is possible to run multiple CPUs per node, but MPICH must be built to allow it. Be aware that certain MPICH applications may not work perfectly in a threaded environment (see the Known Limitations section in the Release Notes). The safest approach for multiple CPUs, although not necessarily the fastest, is to run one MPI process per processor with OMP_NUM_THREADS set to one. Always verify that the combination with OMP_NUM_THREADS=1 works correctly.

All needed shared libraries must be visible on all the nodes at runtime. One way to accomplish this is to add the directory that contains these libraries to the LD_LIBRARY_PATH environment variable in the .bashrc file. Alternatively, if Intel® MKL is installed on only one node, users should link statically when building their Intel® MKL applications.

Either the Intel compilers or GNU compilers can be used to compile a program that uses Intel® MKL, but make certain that MPICH and compiler match up correctly.
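
The two environment-related reminders above could be combined in a .bashrc fragment such as the following sketch. The variable MKL_LIB_DIR is a hypothetical name used only here, and its value is the example install location from this section for IA-32; adjust both to your installation.

```shell
# Sketch of lines for the top of ~/.bashrc on every node.
# MKL_LIB_DIR is a hypothetical helper variable; the path is the example
# Intel Cluster MKL install location used in this section (IA-32 libraries).
MKL_LIB_DIR=/opt/intel/mkl701cluster/lib/32

# One thread per MPI process, as recommended for MPICH runs.
OMP_NUM_THREADS=1; export OMP_NUM_THREADS

# Make the Intel MKL shared objects visible at runtime. An existing
# LD_LIBRARY_PATH is kept, and no trailing colon is added when it is empty.
LD_LIBRARY_PATH="${MKL_LIB_DIR}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

Because mpirun starts a fresh shell on each node, placing these lines in the login environment rather than typing them on the head node is what makes them take effect for the MPI processes.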

Using Intel® MKL Parallelism

Intel® MKL is threaded in a number of places: sparse solver, LAPACK (*GETRF, *POTRF, *GBTRF, *GEQRF, *ORMQR routines), all Level 3 BLAS, all DFTs (except 1D transformations when DFTI_NUMBER_OF_TRANSFORMS=1 and sizes are not power-of-two), and all FFTs. The library uses OpenMP* threading software.

There are situations in which conflicts can exist in the execution environment that make the use of threads in Intel® MKL problematic. We list them here with recommendations for dealing with these. First, a brief discussion of why the problem exists is appropriate.

If the user threads the program using OpenMP directives and uses the Intel compilers to compile the program, Intel® MKL and the user program will both use the same threading library. Intel® MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads. But Intel® MKL can be aware that it is in a parallel region only if the threaded program and Intel® MKL are using the same threading library. If the user program is threaded by some other means, Intel® MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases with recommendations for the user:

  1. User threads the program using OS threads (pthreads on Linux*, Win32* threads on Windows*). If more than one thread calls the library, and the function being called is threaded, it is important that threading in Intel® MKL be turned off. Set OMP_NUM_THREADS=1 in the environment. This is the default in Intel® MKL 7.0.1, except for the sparse solver.
  2. User threads the program using OpenMP directives and/or pragmas and compiles the program using a compiler other than a compiler from Intel. This is more problematic in that setting OMP_NUM_THREADS in the environment affects both the compiler's threading library and the threading library with Intel® MKL. At this time, the safe approach is to set MKL_SERIAL=YES (or MKL_SERIAL=yes), which forces Intel® MKL to serial mode regardless of the OMP_NUM_THREADS value.
  3. There are multiple programs running on a multiple-CPU system, as in the case of a parallelized program that uses MPI for communication and treats each processor as a node. The threading software will see multiple processors on the system even though each processor has a separate process running on it. In this case OMP_NUM_THREADS should be set to 1.

Setting the number of threads: The OpenMP* software responds to the environment variable OMP_NUM_THREADS. To change the number of threads, enter the following in the command shell in which the program is going to run:

export OMP_NUM_THREADS=<number of threads to use>

To force the library to serial mode, environment variable MKL_SERIAL should be set to YES. It works regardless of OMP_NUM_THREADS value. MKL_SERIAL is not set by default.

If the variable OMP_NUM_THREADS is not set, Intel® MKL software will run on one thread. We recommend always setting OMP_NUM_THREADS to the number of processors you wish to use in your application.

Note. Currently the default number of threads for the sparse solver is the number of processors in the system.
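
As an illustration of the two environment variables described in this section, the following sketch defines two wrapper functions that set them for a single command only, leaving the calling shell unchanged. The function names run_mkl_serial and run_mkl_threads are hypothetical; only OMP_NUM_THREADS and MKL_SERIAL come from this document.

```shell
#!/bin/sh
# Hypothetical wrappers: apply the threading controls to one command only.

# Force Intel MKL to serial mode regardless of OMP_NUM_THREADS.
# usage: run_mkl_serial <command> [args...]
run_mkl_serial() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_mkl_serial <command> [args...]" >&2
        return 1
    fi
    env MKL_SERIAL=YES "$@"
}

# Run a command with an explicit number of OpenMP threads.
# usage: run_mkl_threads <count> <command> [args...]
run_mkl_threads() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_mkl_threads <count> <command> [args...]" >&2
        return 1
    fi
    _threads=$1
    shift
    env OMP_NUM_THREADS="$_threads" "$@"
}
```

For example, run_mkl_threads 2 ./myprog would start a program (here the placeholder ./myprog) with OMP_NUM_THREADS=2, while run_mkl_serial ./myprog corresponds to the safe approach recommended for programs threaded with a non-Intel OpenMP compiler.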

Memory Management

Intel® MKL has memory management software that controls memory buffers for use by the library functions. When a call is made to certain library functions (such as those in the Level 3 BLAS or DFTs), new buffers are allocated if no buffers marked as free are currently available. These buffers are not deallocated until the program ends. If at some point the user's program needs to free memory, it may do so with a call to MKL_FreeBuffers(). If another call is then made to a library function that needs a memory buffer, the memory manager will again allocate the buffers, and they will again remain allocated until either the end of the program or the program deallocates the memory.

This memory management software is turned on by default. To disable it, set the environment variable MKL_DISABLE_FAST_MM to any value, which will cause memory to be allocated and freed from call to call. Disabling this feature will negatively impact performance of routines such as the level 3 BLAS, especially for small problem sizes.

Memory management limits the number of buffers allocated in each thread. Currently this limit is 32. To avoid this restriction, disable memory management.

Performance

To obtain the best performance with Intel® MKL, make sure the following conditions are met: arrays must be aligned on a 16-byte boundary, and the leading dimension values (n*element_size) of two-dimensional arrays should be divisible by 16. There are additional conditions for the FFT functions: the addresses of the first elements of arrays and the leading dimension values (n*element_size) of two-dimensional arrays should be divisible by the cache line size (32 bytes for the Pentium® III processor and 64 bytes for the Pentium® 4 processor). Furthermore, for the C-style FFTs on the Pentium® 4 processor, the distance L between the arrays that represent the real and imaginary parts should not satisfy the following inequality:

k*2**16 <= L < k*2**16+64

These conditions are needed due to the use of Streaming SIMD Extensions (SSE).
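
The arithmetic behind these conditions can be checked with plain shell arithmetic, as in the following sketch. The function names are hypothetical. Two assumptions are made that the text above does not state explicitly: k is taken to be any non-negative integer, and L is a non-negative value in the same units as the inequality.

```shell
#!/bin/sh
# Smallest leading dimension (in elements) that is >= n and whose size in
# bytes (n * element_size) is divisible by the requested alignment.
# usage: pad_leading_dim <n> <element_size_bytes> <align_bytes>
pad_leading_dim() {
    _ld=$1
    while [ $(( ($_ld * $2) % $3 )) -ne 0 ]; do
        _ld=$(( $_ld + 1 ))
    done
    echo "$_ld"
}

# Succeeds (exit status 0) when L satisfies k*2**16 <= L < k*2**16+64 for
# some non-negative integer k, that is, when L should be avoided.
# usage: fft_distance_is_bad <L>
fft_distance_is_bad() {
    [ $(( $1 % 65536 )) -lt 64 ]
}
```

For instance, pad_leading_dim 1001 8 16 gives 1002, because 1002 double precision elements occupy 8016 bytes, which is divisible by 16. For the cache line condition of the Pentium® 4 processor, pad_leading_dim 1001 8 64 gives 1008.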

To obtain the best performance with the library for Itanium®-based applications, the following conditions are desirable.

For the C-style FFT, a sufficient condition is that the distance L between the arrays that represent the real and imaginary parts is not divisible by 64. The best case is L=k*64 + 16.

For DGEMM it is desirable that the leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16, but not divisible by 32.

For DFTs it is desirable that the leading dimension values (n*element_size) of two-dimensional arrays are not powers of two.
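
The DGEMM and C-style FFT conditions for Itanium®-based applications can likewise be expressed as simple arithmetic tests. The sketch below uses hypothetical function names and assumes double precision data (8 bytes per element) for DGEMM.

```shell
#!/bin/sh
# Succeeds when a DGEMM leading dimension (in double precision elements)
# gives a byte size divisible by 16 but not by 32.
# usage: ipf_dgemm_ld_ok <leading_dim_elements>
ipf_dgemm_ld_ok() {
    _bytes=$(( $1 * 8 ))
    [ $(( $_bytes % 16 )) -eq 0 ] && [ $(( $_bytes % 32 )) -ne 0 ]
}

# Smallest leading dimension >= n that passes ipf_dgemm_ld_ok.
# usage: ipf_dgemm_next_ld <n>
ipf_dgemm_next_ld() {
    _ld=$1
    until ipf_dgemm_ld_ok "$_ld"; do
        _ld=$(( $_ld + 1 ))
    done
    echo "$_ld"
}

# Succeeds when the distance L between the real and imaginary arrays is
# not divisible by 64 (the sufficient condition for the C-style FFT).
# usage: ipf_fft_distance_ok <L>
ipf_fft_distance_ok() {
    [ $(( $1 % 64 )) -ne 0 ]
}
```

As an example, a leading dimension of 1000 doubles occupies 8000 bytes, which is divisible by 32, so ipf_dgemm_next_ld 1000 moves to 1002 (8016 bytes, divisible by 16 but not by 32).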

Configuration File

The Intel® MKL configuration file lets you customize several features of the library, namely the names of the shared objects that are loaded, the serial or parallel operating mode, and input parameter checking.

By default the configuration file is named mkl.cfg. The file contains several variables that can be changed. Below is an example of the configuration file containing all possible variables with their default values:

//
// Default values for mkl.cfg file
//
// SO names for IA-32
MKL_X87so = mkl_def.so
MKL_SSE1so = mkl_p3.so
MKL_SSE2so = mkl_p4.so
MKL_SSE3so = mkl_p4p.so
MKL_VML_X87so = mkl_vml_def.so
MKL_VML_SSE1so = mkl_vml_p3.so
MKL_VML_SSE2so = mkl_vml_p4.so
MKL_VML_SSE3so = mkl_vml_p4p.so
// SO names for Intel(R) EM64T
MKL_EM64TDEFso = mkl_def.so
MKL_EM64TSSE3so = mkl_p4n.so
MKL_VML_EM64TDEFso = mkl_vml_def.so
MKL_VML_EM64TSSE3so = mkl_vml_p4n.so
// SO names for Intel(R) Itanium(R) processor family
MKL_I2Pso = mkl_i2p.so
MKL_VML_I2Pso = mkl_vml_i2p.so
// SO names for LAPACK libraries
MKL_LAPACK32so = mkl_lapack32.so
MKL_LAPACK64so = mkl_lapack64.so
// Serial or parallel mode
//     YES - single threaded
//     NO - multi threaded
//     OMP - control by OMP_NUM_THREADS
MKL_SERIAL = YES
// Input parameters check
//     ON - checkers are used (default)
//     OFF - checkers are not used
MKL_INPUT_CHECK = ON

The first time any Intel® MKL function is called, the library checks whether the configuration file exists and, if so, operates with the variables specified in it. The path to the configuration file is given by the environment variable MKL_CFG_FILE. If this variable is not defined, the current directory is searched first, and then the directories listed in the PATH environment variable. If the configuration file does not exist, the library operates with the default values of the variables (standard library names, checkers on, non-threaded operation mode).
If a variable is not specified in the configuration file, or is specified incorrectly, its default value is used.
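
The search order described above can be mimicked in a shell function to find out which configuration file would be picked up. This is only a sketch of the documented order, not the library's own lookup code; the function name find_mkl_cfg is hypothetical, and the sketch assumes that MKL_CFG_FILE, when set, is used as given without falling back to the other locations.

```shell
#!/bin/sh
# Hypothetical helper: print the configuration file that the documented
# search order would select, or fail if none is found.
find_mkl_cfg() {
    # 1. An explicit path in MKL_CFG_FILE takes precedence (assumption:
    #    no fallback to the other locations when it is set).
    if [ -n "${MKL_CFG_FILE:-}" ]; then
        echo "$MKL_CFG_FILE"
        return 0
    fi
    # 2. The current directory.
    if [ -f ./mkl.cfg ]; then
        echo "./mkl.cfg"
        return 0
    fi
    # 3. Each directory listed in PATH, in order.
    _old_ifs=$IFS
    IFS=:
    for _dir in $PATH; do
        if [ -f "${_dir:-.}/mkl.cfg" ]; then
            IFS=$_old_ifs
            echo "${_dir:-.}/mkl.cfg"
            return 0
        fi
    done
    IFS=$_old_ifs
    return 1
}
```

If the function fails, no configuration file was found in any of the three places, which corresponds to the case in which the library operates with its default values.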

Below is an example of the configuration file that only redefines the library names:

// SO redefinition
MKL_X87so = matlab_x87.so
MKL_SSE1so = matlab_sse1.so
MKL_SSE2so = matlab_sse2.so
MKL_SSE3so = matlab_sse2.so
MKL_ITPso = matlab_ipt.so
MKL_I2Pso = matlab_i2p.so

Obtaining Version Information

Intel® MKL provides a facility by which you can obtain information about the library (e.g., the version number). Two methods are provided for extracting this information. You may extract a version string using the function MKLGetVersionString. Alternatively, you can use the MKLGetVersion function to obtain an MKLVersion structure that contains the version information. Example programs for extracting this information are provided in the examples/versionquery directory. A makefile is also provided to automatically build the examples and output summary files containing the version information for the current library.

Custom Shared Object Builder

The custom shared object builder creates a dynamic library (shared object) that contains only selected functions. It is located in the tools/builder folder and consists of a makefile and a definition file with the list of functions. The makefile has three targets: "ia32", "ipf", and "em64t". The ia32 target is used for IA-32, ipf for the Intel® Itanium® processor family, and em64t for the Intel® Xeon™ processor with Intel® EM64T.
The makefile accepts several macros (parameters):

export = functions_list
determines the name of the file that contains the list of entry point functions to be included in the shared object. This file is used to create the definition file and then the export table. The default name is functions_list.
name = mkl_custom
specifies the name of the created library. By default the library mkl_custom.so is built.
xerbla = user_xerbla.o
specifies the name of the object file that contains the user’s error handler. This error handler is added to the library and used instead of the standard MKL error handler xerbla. By default, that is, when this parameter is not specified, the standard MKL xerbla is used.

None of the parameters is mandatory. In the simplest case, the command line could be make ia32, and the remaining parameters take their default values. As a result, the mkl_custom.so library for IA-32 is created, the functions list is taken from the functions_list file, and the standard MKL error handler xerbla is used.

A more complex example:
make ia32 export=my_func_list.txt name=mkl_small xerbla=my_xerbla.o
In this case the mkl_small.so library for IA-32 is created, the functions list is taken from the my_func_list.txt file, and the user’s error handler my_xerbla.o is used.

Entry points in the functions_list file should be written in the form used by the interface, for example:

dgemm_
ddot_
dgetrf_

If the selected functions have several processor specific versions, all of them are included in the custom library and managed by the dispatcher.
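
Preparing the functions list for the builder could look like the following sketch. The function name write_function_list is hypothetical. It assumes that every entry point takes the form shown in the example list above, that is, the routine name followed by an underscore; verify this against the interface you actually export.

```shell
#!/bin/sh
# Hypothetical helper: write a functions list file for the custom shared
# object builder, one entry point per line.
# usage: write_function_list <file> <routine>...
write_function_list() {
    if [ "$#" -lt 2 ]; then
        echo "usage: write_function_list <file> <routine>..." >&2
        return 1
    fi
    _list_file=$1
    shift
    : > "$_list_file"                       # start with an empty file
    for _routine in "$@"; do
        # Assumption: entry points carry a trailing underscore, as in the
        # dgemm_ / ddot_ / dgetrf_ example in this section.
        echo "${_routine}_" >> "$_list_file"
    done
}

# Possible use from the tools/builder folder (not executed here):
#   write_function_list my_func_list.txt dgemm ddot dgetrf
#   make ia32 export=my_func_list.txt name=mkl_small
```

The make invocation in the comment uses only the targets and parameters described in this section; the file name my_func_list.txt and the library name mkl_small are the ones from the example above.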

Technical Support

Please see the Intel® MKL support website at http://www.intel.com/support/performancetools/libraries/mkl/.

 

Celeron, Dialogic, i386, i486, iCOMP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetStructure, Intel Xeon, Intel XScale, Itanium, MMX, MMX logo, Pentium, Pentium II Xeon, Pentium III Xeon, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
 
* Other names and brands may be claimed as the property of others.
 
Copyright © 2000-2004 Intel Corporation.