Question: cycle count of 2048-point MKL FFT DftiComputeForward

Hello There,

Recently I have been using the MKL FFT code to measure the cycle count of DftiComputeForward. According to the MKL documentation, DFTI_NUMBER_OF_USER_THREADS is no longer used in the latest MKL version, but I ran a test anyway.

I added "status = DftiSetValue(FFT_desc, DFTI_NUMBER_OF_USER_THREADS, n);" with n = 1, 2, 3 or 4 to my test code, and the results are:

Cycle count of DftiComputeForward by FFT size and thread setting:

FFT size      No thread setting   1 thread   2 threads   3 threads   4 threads
128-point     740                 800        698         540         448
256-point     1418                923        956         920         960
512-point     3002                2263       1968        1984        1968
1024-point    5848                5044       4130        4185        4113
2048-point    24262               21624      9782        9714        9825

The test code is below:

        // DFTI_SINGLE is single precision, DFTI_DOUBLE is double precision
        status = DftiCreateDescriptor(&FFT_desc, DFTI_SINGLE, DFTI_COMPLEX, 1, FFTSize);
        // DFTI_INPLACE: FFT output overwrites the input; DFTI_NOT_INPLACE: output goes to a separate buffer
        status = DftiSetValue(FFT_desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        // Thread setting under test (this line is omitted for the "no thread setting" case)
        status = DftiSetValue(FFT_desc, DFTI_NUMBER_OF_USER_THREADS, 4);
        // Commit (freeze) the FFT descriptor
        status = DftiCommitDescriptor(FFT_desc);

        j = 0;
        for (idxTimeLoop = 0; idxTimeLoop < taskCallsNumber / internalLoopCounter; idxTimeLoop++)
        {
            unsigned __int64 clockStart, clockEnd;
            clockStart = GetTickAndTime(&getStartTick, &getStartTime);

            for (idxLoop = 0; idxLoop < internalLoopCounter; idxLoop++)
            {
                // Run the forward FFT
                status = DftiComputeForward(FFT_desc, FFT_in_singlePrecision, FFT_out_singlePrecision);
            }
            clockEnd = GetTickAndTime(&getEndTick, &getEndTime);
            // Ticks and elapsed time (scaled by 1000) for internalLoopCounter calls
            clockNumArray[j] = getEndTick - getStartTick;
            timeDurationArray[j] = (getEndTime - getStartTime)*1000.0;
            j++;
        }

My MKL version information:
Major version:           11
Minor version:           2
Update version:          3
Product status:          Product
Build:                   20150413
Platform:                Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors

OS: Windows 7

Processor: i5-3320M 2.6GHz.

My questions: why is the cycle count of the 2048-point MKL FFT DftiComputeForward about 4 times that of the 1024-point one? Is this caused by the data cache or by something else? And why does setting DFTI_NUMBER_OF_USER_THREADS affect the performance of the 2048-point DftiComputeForward? Please feel free to contact me if you need more information about my test code.

Thanks a lot!

Lei Fu
