Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

MKL FFT inside OpenMP loop (MKL 2018)


I have an OpenMP loop:

#pragma omp parallel for
for (int i = 0; i < n; i++) {
    // routine that calls MKL FFT
}

The threading performance is abysmal: on an 8-core machine, just over one core is being used.

What is surprising is that Intel VTune Amplifier shows the time being spent in DftiCommitDescriptor, not in the actual computation:

Function / Call Stack   CPU Time   Module       Function (Full)        Source File   Start Address
DftiCommitDescriptor    83.7%      mkl_rt.dll   DftiCommitDescriptor   [Unknown]     0x180a45b68
.....
DftiComputeForward      0.5%       mkl_rt.dll   DftiComputeForward     [Unknown]     0x180a45f10

Are there any suggested best practices here? Typically the FFT function will be called with the same data length, say 10K-20K.
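Since the data length is the same on every call, one pattern worth trying is to commit the descriptor once, before the parallel loop, and only run the transform inside it. The sketch below shows the shape of that change without calling MKL at all: FakePlan is a hypothetical stand-in (not an MKL type), its commit() models the expensive setup that the profile attributes to DftiCommitDescriptor, and its compute() models the DftiComputeForward call. Whether a committed MKL descriptor may be shared by several threads depends on the MKL version and descriptor settings, so check the MKL documentation; one descriptor per thread is the conservative alternative.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an FFT descriptor. This is NOT the MKL API.
struct FakePlan {
    explicit FakePlan(std::size_t n) : length_(n) {}

    // Models the expensive one-time setup; counts how often it runs.
    void commit(std::atomic<int>& commits) {
        committed_ = true;
        ++commits;
    }

    // Models the transform. It is const, so concurrent calls on one
    // committed plan do not modify shared state in this stand-in.
    void compute(std::vector<double>& data) const {
        assert(committed_ && data.size() == length_);
        for (double& x : data) x *= 2.0;  // placeholder work, not an FFT
    }

private:
    std::size_t length_;
    bool committed_ = false;
};

// Pattern from the question: set up a plan inside every iteration.
int commits_when_committing_per_iteration(int n, std::size_t len) {
    std::atomic<int> commits{0};
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        std::vector<double> data(len, 1.0);
        FakePlan plan(len);
        plan.commit(commits);  // setup cost paid n times
        plan.compute(data);
    }
    return commits.load();
}

// Alternative: commit once outside the loop, reuse inside it.
int commits_when_sharing_one_plan(int n, std::size_t len) {
    std::atomic<int> commits{0};
    FakePlan plan(len);
    plan.commit(commits);      // setup cost paid once
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        std::vector<double> data(len, 1.0);
        plan.compute(data);    // each iteration works on its own buffer
    }
    return commits.load();
}
```

With n iterations the first function performs n commits and the second performs one, regardless of how many threads run the loop. If FFT lengths vary, the same idea extends to keeping one committed descriptor per distinct length.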

 

