Using FFT

Fast Fourier Transform is a universal method to increase performance of data processing, especially in the field of digital signal processing where filtering is essential. 

The convolution theorem states that filtering of two signals in the spatial domain can be computed as point-wise multiplication in the frequency domain. The data transform to and from the frequency domain is usually fulfilled using the Fourier transform. You can apply the Finite Impulse Response (FIR) filter to the input signal using Intel IPP FFT functions, which are very fast on Intel® processors. You can also increase the data array length to the next greater power of two by padding the array with zeroes and then applying the FFT forward transform function to the input signal and the FIR filter coefficients. Fourier coefficients obtained in this way are  multiplied point-wise and the result can easily be transformed back to the spatial domain. The performance gain achieved by using FFT could be very significant. 

If the applied filter is the same for several processing iterations then the FFT transform of the filter coefficients need to be done only once. The twiddle tables and the bit reverse tables are created in the initialization function for the forward and for inverse transforms at the same time. The main operations in this kind of filtering are presented below:

ippsFFTInitAlloc_R_32f( &pFFTSpec, fftord, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone );
/// perform forward FFT to put source data x to freq-domain

ippsFFTFwd_RToPack_32f( xx, XX, pFFTSpec, 0 );
/// point-wise multiplication in freq-domain is convolution

ippsMulPack_32f_I( HH, XX, fftlen );
/// perform inverse FFT to get result in time-domain

ippsFFTInv_PackToR_32f( XX, yy, pFFTSpec, 0 );
/// free FFT tables

ippsFFTFree_R_32f( pFFTSpec ); 

Another way to significantly improve performance is by using FFT and multiplication for processing large size data. Note that the zeros in the example above could be pointers to the external memory, which is another way to increase performance. Note that the Intel IPP signal processing FIR filter is implemented using FFT and you do not need to create a special implementation of the FIR functions.