Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Question: cycle count of the 2048-point MKL FFT DftiComputeForward


Hello There,

Recently I have been using the MKL FFT code to measure the cycle count of DftiComputeForward. According to the MKL documentation, DFTI_NUMBER_OF_USER_THREADS is no longer used in the latest MKL version, but I ran a test anyway.

The method is to add "status = DftiSetValue(FFT_desc, DFTI_NUMBER_OF_USER_THREADS, (1/2/3/4));" to my test code, with the value set to 1, 2, 3, or 4 in turn. The results are:

 
Cycle count of DftiComputeForward by FFT size and thread setting:

FFT size      No thread setting   1 thread   2 threads   3 threads   4 threads
128-point                   740        800         698         540         448
256-point                  1418        923         956         920         960
512-point                  3002       2263        1968        1984        1968
1024-point                 5848       5044        4130        4185        4113
2048-point                24262      21624        9782        9714        9825
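
To put the jump from 1024 to 2048 points in perspective, the measured ratios can be compared with an idealized scaling model. The sketch below assumes the work of a power-of-two FFT grows like N*log2(N); this is a textbook approximation, not a statement about how MKL implements the transform. The function names are hypothetical helpers, not part of MKL.

```c
#include <assert.h>

/* Integer log2 of a power-of-two FFT length n (n >= 2). */
unsigned ilog2_pow2(unsigned n)
{
    unsigned k = 0;
    assert(n >= 2 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Ratio of idealized work N*log2(N) between two power-of-two lengths.
 * Assumption: N*log2(N) scaling, which is only a rough model. */
double expected_ratio(unsigned n_big, unsigned n_small)
{
    return ((double)n_big * ilog2_pow2(n_big)) /
           ((double)n_small * ilog2_pow2(n_small));
}

/* Ratio of two measured cycle counts. */
double measured_ratio(double cycles_big, double cycles_small)
{
    assert(cycles_small > 0.0);
    return cycles_big / cycles_small;
}
```

With the cycle counts reported above, expected_ratio(2048, 1024) is 2.2, while measured_ratio(24262, 5848) for the run without a thread setting is about 4.15. For the 2-thread setting, measured_ratio(9782, 4130) is about 2.37, which is much closer to the idealized value.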

The test code is below:
        // DFTI_SINGLE selects single precision; DFTI_DOUBLE selects double precision
        status = DftiCreateDescriptor(&FFT_desc, DFTI_SINGLE, DFTI_COMPLEX, 1, FFTSize);
        // DFTI_INPLACE: the FFT output overwrites the input; DFTI_NOT_INPLACE: the output goes to a separate buffer
        status = DftiSetValue(FFT_desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        status = DftiSetValue(FFT_desc, DFTI_NUMBER_OF_USER_THREADS, 4);
        // commit (freeze) the FFT descriptor
        status = DftiCommitDescriptor(FFT_desc);

        j = 0;
        for (idxTimeLoop = 0; idxTimeLoop < taskCallsNumber / internalLoopCounter; idxTimeLoop++)
        {
            unsigned __int64 clockStart, clockEnd;
            clockStart = GetTickAndTime(&getStartTick, &getStartTime);

            for (idxLoop = 0; idxLoop < internalLoopCounter; idxLoop++)
            {
                // run the forward FFT
                status = DftiComputeForward(FFT_desc, FFT_in_singlePrecision, FFT_out_singlePrecision);
            }
            clockEnd = GetTickAndTime(&getEndTick, &getEndTime);
            clockNumArray[j] = getEndTick - getStartTick;
            timeDurationArray[j] = (getEndTime - getStartTime)*1000.0;
            j++;
        }
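
Each entry of clockNumArray in the loop above covers internalLoopCounter calls to DftiComputeForward, so a per-call figure needs a division by that count. The post does not show how the per-call numbers were derived, so the following is only a sketch of one way to do it, with hypothetical helper names. It reports both the mean and the minimum, since the minimum is less sensitive to interrupts and other one-off disturbances.

```c
#include <assert.h>
#include <stddef.h>

/* Mean cycles per transform. Each ticks[i] is assumed to cover
 * calls_per_sample back-to-back transform calls. */
double mean_cycles_per_call(const unsigned long long *ticks,
                            size_t n_samples, unsigned calls_per_sample)
{
    unsigned long long total = 0;
    size_t i;
    assert(ticks != NULL && n_samples > 0 && calls_per_sample > 0);
    for (i = 0; i < n_samples; ++i) {
        total += ticks[i];
    }
    return (double)total / ((double)n_samples * (double)calls_per_sample);
}

/* Minimum cycles per transform over all samples (same assumption). */
double min_cycles_per_call(const unsigned long long *ticks,
                           size_t n_samples, unsigned calls_per_sample)
{
    unsigned long long best;
    size_t i;
    assert(ticks != NULL && n_samples > 0 && calls_per_sample > 0);
    best = ticks[0];
    for (i = 1; i < n_samples; ++i) {
        if (ticks[i] < best) {
            best = ticks[i];
        }
    }
    return (double)best / (double)calls_per_sample;
}
```

A large gap between the mean and the minimum would suggest that a few samples were disturbed, for example by the first call after the descriptor is committed or by other activity on the machine.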

My MKL version information:
Major version:           11
Minor version:           2
Update version:          3
Product status:          Product
Build:                   20150413
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors

OS: Windows 7

Processor: i5-3320M 2.6GHz.

My question: why is the cycle count of the 2048-point MKL FFT DftiComputeForward about 4 times that of the 1024-point FFT? Is this caused by the data cache or something else? And why does setting DFTI_NUMBER_OF_USER_THREADS affect the performance of the 2048-point FFT DftiComputeForward? Please feel free to contact me if you need more information about my test code.
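
As a rough check on the data-cache hypothesis, the working set of one out-of-place transform can be computed from the descriptor settings in the test code (DFTI_SINGLE, DFTI_COMPLEX, DFTI_NOT_INPLACE). The sketch below uses hypothetical helper names and takes the cache size as a parameter. The 32 KiB L1 data cache figure mentioned afterwards is an assumption about this processor and should be checked against its data sheet.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one buffer of n single-precision complex points
 * (each point is two floats: real and imaginary parts). */
size_t complex_float_buffer_bytes(size_t n)
{
    return n * 2 * sizeof(float);
}

/* Bytes touched by one out-of-place transform: input plus output buffer.
 * Twiddle factors and any internal scratch space are NOT counted. */
size_t out_of_place_working_set_bytes(size_t n)
{
    return 2 * complex_float_buffer_bytes(n);
}

/* Returns 1 if the working set is strictly smaller than the cache,
 * i.e. it leaves at least some room for other data; 0 otherwise. */
int leaves_room_in_cache(size_t working_set_bytes, size_t cache_bytes)
{
    assert(cache_bytes > 0);
    return working_set_bytes < cache_bytes;
}
```

With 4-byte floats, the 1024-point case touches 16384 bytes (16 KiB) and the 2048-point case touches 32768 bytes (32 KiB). If the L1 data cache is 32 KiB per core, the 2048-point buffers alone would fill it completely, which makes a cache effect plausible. This arithmetic does not prove that the cache is the cause, and it does not by itself explain why the thread setting changes the result.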

Thanks a lot!

Lei Fu

 

