Aligning Memory

The performance of Intel® IPP functions can differ significantly depending on whether they operate on aligned or misaligned data. Memory access is faster when pointers to the data are aligned, so Intel IPP functions perform better when they process data through aligned pointers. To obtain aligned pointers, allocate memory with the Intel IPP memory allocation function ippsMalloc(). The library provides several variants of this function that differ only in the data type they handle.

These functions allocate memory and return pointers aligned on a 32-byte boundary. The following results, measured with the Intel IPP copy function on an Intel® Pentium® 4 processor, show the effect of different source data alignments:

Source alignment in bytes        16 + 0    16 + 4    16 + 3
CPU cycles per element (CPE)     0.3       0.5       0.7

As the table shows, when the source data is not aligned, as in the two right-hand columns, the number of CPU cycles per element is roughly 1.7 to 2.3 times as high as in the aligned case.
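To make the alignment offsets concrete, here is a minimal, portable C sketch of how a pointer's offset from an alignment boundary can be checked. It does not call Intel IPP: the helpers misalignment(), alloc_aligned32(), and check_table_offsets() are hypothetical names introduced for illustration, and the C11 function aligned_alloc() stands in for the allocator. In an Intel IPP program, a variant such as ippsMalloc_8u() would be used instead, with the buffer released by ippsFree(). Reading the table's "16 + n" notation as "a 16-byte boundary plus n bytes" is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the offset of p from the previous `boundary`-byte boundary.
   0 means p is aligned. `boundary` must be a power of two. */
static size_t misalignment(const void *p, size_t boundary)
{
    return (size_t)((uintptr_t)p & (boundary - 1));
}

/* Hypothetical stand-in for an aligned allocator: returns a buffer of at
   least `len` bytes on a 32-byte boundary, to be released with free().
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the length is rounded up to the next multiple of 32. */
static void *alloc_aligned32(size_t len)
{
    size_t rounded = (len + 31u) & ~(size_t)31u;
    return aligned_alloc(32, rounded);
}

/* Builds the three source offsets from the table (assumed meaning:
   16-byte boundary + 0, + 4, and + 3 bytes) and checks each one. */
static void check_table_offsets(void)
{
    unsigned char *buf = (unsigned char *)alloc_aligned32(64);
    assert(buf != NULL);
    assert(misalignment(buf, 32) == 0);          /* base is 32-byte aligned */
    assert(misalignment(buf + 16 + 0, 16) == 0); /* aligned source          */
    assert(misalignment(buf + 16 + 4, 16) == 4); /* misaligned by 4 bytes   */
    assert(misalignment(buf + 16 + 3, 16) == 3); /* misaligned by 3 bytes   */
    free(buf);
}
```

Calling check_table_offsets() from a program's main() exercises the three cases. A check like misalignment(ptr, 32) == 0 can also be applied to any pointer before passing it to a performance-critical function, to confirm that the data starts on the expected boundary.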