Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Poor scaling for real-to-real FFT with OpenMP


In the attached file I use MKL to compute a real-to-real FFT using OpenMP for multithreading.

The code is compiled with

icpc -o bench-fft -Wall -O3 -g -march=native -fopenmp bench-fft.cxx -mkl

The machine has 4 cores.

It seems that the code does not scale well with the number of threads.

When run with

OMP_NUM_THREADS=1 ./bench-fft 4194304

the total time taken is 0.1640 user, 0.0440 sys while with

OMP_NUM_THREADS=2 ./bench-fft 4194304

the total time taken is 0.3000 user, 0.0560 sys. Total CPU time (user + sys) goes from 0.208 s to 0.356 s, roughly 1.7 times as much, so there seems to be a large synchronization overhead.
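The comparison above is plain arithmetic on the reported user and sys times, and it can be written down as a small sketch. The type and function names here (`CpuTimes`, `total_cpu`, `cpu_ratio`) are made up for illustration and are not part of the attached benchmark. Keep in mind that user time is accumulated over all threads, so it reflects total work done rather than elapsed time.

```cpp
// Hypothetical helper type: CPU times in seconds, as reported per run.
struct CpuTimes {
    double user;
    double sys;
};

// Total CPU time consumed by the process (user + sys), summed over threads.
inline double total_cpu(const CpuTimes& t) {
    return t.user + t.sys;
}

// Ratio of total CPU time between two runs.
// A value above 1 means the second run burned more CPU time overall.
inline double cpu_ratio(const CpuTimes& base, const CpuTimes& other) {
    return total_cpu(other) / total_cpu(base);
}
```

With the figures from the two runs, `cpu_ratio(CpuTimes{0.1640, 0.0440}, CpuTimes{0.3000, 0.0560})` comes out at roughly 1.7. That says the two-thread run did more total work; by itself it does not say whether the elapsed time went up or down.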

Is this to be expected, or am I doing something wrong in my code?
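One way to narrow this down might be to record wall-clock time and CPU time separately around the transform, since scaling shows up in elapsed time while overhead shows up in CPU time. The following is a minimal sketch using only the C++ standard library; `Timing` and `time_region` are hypothetical names, and the sketch assumes that `std::clock()` reports CPU time for the whole process (on Linux with glibc this is typically summed over all threads, but that is platform dependent).

```cpp
#include <chrono>
#include <ctime>

// Hypothetical result type: elapsed wall-clock time and process CPU time.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Run `work` once and sample both clocks around it.
// Assumption: std::clock() measures CPU time of the whole process.
template <typename F>
Timing time_region(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}
```

The callable passed to `time_region` would wrap only the FFT call, so that plan creation and buffer allocation are excluded. If the wall time drops with two threads while the CPU time rises, the transform is scaling but paying a threading cost; if the wall time does not drop either, the problem lies elsewhere.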

Attachment: bench-fft.cxx (text/x-c++src, 1.93 KB)

Thread Topic: Question
