Aligning Memory for Intel® Pentium® 4 Processor

This topic does not apply to the Intel® Pentium® 4 processor that supports Streaming SIMD Extensions 3 (SSE3).

If you develop applications for the Intel® Pentium® 4 processor, make sure that two pointers used by a function do not differ by a multiple of 64 KB. Otherwise the performance of the function decreases significantly. As an example, consider the core loop of the function ippsCplxToReal_32fc:

for (n = 0; n < length; n++) {
    pRe[n] = pSrc[n].re;
    pIm[n] = pSrc[n].im;
}

As you see from the following graphic, performance decreases sharply if the difference between the pRe and pIm pointers is 64 KB or a multiple of it. The slowdown can be as large as 16 times, from just five cycles per element to a prohibitive eighty.
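
One way to act on this rule is to check the distance between the two output pointers before calling the function, and to place both outputs in a single block so that the distance is under your control. The following is a minimal sketch under stated assumptions, not part of the library: the helper names differs_by_multiple_of_64k and second_buffer_offset are hypothetical, the padding size is an arbitrary choice, and the code assumes a flat address space and a 4-byte float.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIAS_STRIDE 65536u  /* 64 KB, the distance named in the text */

/* Hypothetical helper (not an IPP function): returns 1 if the two
   addresses differ by an exact multiple of 64 KB (including zero),
   0 otherwise. Assumes a flat address space. */
static int differs_by_multiple_of_64k(const void *a, const void *b)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    uintptr_t diff = (ua > ub) ? (ua - ub) : (ub - ua);
    return (diff % ALIAS_STRIDE) == 0;
}

/* Hypothetical helper: element offset of the second float output
   inside one shared block. The second output normally starts right
   after the first (offset == length); if that would put it a
   multiple of 64 KB away, it is pushed back by pad_elems elements. */
static size_t second_buffer_offset(size_t length, size_t pad_elems)
{
    size_t offset = length;
    if ((offset * sizeof(float)) % ALIAS_STRIDE == 0) {
        offset += pad_elems;
    }
    return offset;
}
```

To use it, allocate second_buffer_offset(length, pad) + length floats as one block, set pRe to the start of the block and pIm to the block start plus the returned offset. With a 4-byte float, a length of 16384 elements would place pIm exactly 64 KB after pRe, so the helper adds the padding. A length of 1000 is left unchanged.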