Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

mkl_dcsrmv slower than openMP implementation

Hi,

I'm trying to find the fastest way to do a multithreaded sparse matrix-vector multiply. I've written some benchmarking code that builds a large random sparse matrix in CSR format and then times three implementations of y = y + A*x: a serial implementation, an OpenMP implementation, and mkl_dcsrmv. For each one I compute the average and minimum time over a number of runs, say 10.
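For readers without the attachment, here is a minimal sketch of what the serial and OpenMP kernels for y = y + A*x on a CSR matrix might look like. It is not the code from rand_mat.c: the function names, the zero-based `int` indices, the `row_ptr`/`col_idx`/`val` array names, and the static schedule are all assumptions made for illustration.

```c
/* Serial CSR kernel: y = y + A*x.
   Assumes zero-based indexing; row_ptr has n_rows + 1 entries and
   col_idx/val hold the nonzeros row by row (hypothetical names). */
void spmv_csr_serial(int n_rows, const int *row_ptr, const int *col_idx,
                     const double *val, const double *x, double *y)
{
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] += sum;
    }
}

/* OpenMP variant: rows are independent, so the outer loop can be split
   across threads. Each thread writes only its own y[i], so no reduction
   or locking is needed. The static schedule is an assumption; it suits
   matrices whose rows have similar nonzero counts. */
void spmv_csr_omp(int n_rows, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] += sum;
    }
}
```

If the file is compiled without OpenMP enabled, the pragma is ignored and the second function behaves like the serial one.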

Strangely, though, the OpenMP implementation always beats MKL. For the matrix sizes in the code, the minimum time over 10 runs is 0.199272 seconds for OpenMP and 0.249399 seconds for MKL. This is for a matrix with about 256 million nonzeros.
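The average and minimum over a set of runs could be collected roughly as follows. This is only a sketch under assumptions: the `bench_stats` type and both helper names are hypothetical, and the timer uses C11 `timespec_get`, which may differ from whatever the attached code uses.

```c
#include <time.h>

/* Summary of a set of timed runs (hypothetical helper type). */
typedef struct {
    double avg;
    double min;
} bench_stats;

/* Wall-clock time in seconds, via C11 timespec_get. This reads the
   system clock, so it is not guaranteed to be monotonic. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Reduce per-run timings to their average and minimum.
   Returns zeros when n_runs is not positive. */
bench_stats summarize_runs(const double *times, int n_runs)
{
    bench_stats s = {0.0, 0.0};
    for (int r = 0; r < n_runs; r++) {
        s.avg += times[r];
        if (r == 0 || times[r] < s.min)
            s.min = times[r];
    }
    if (n_runs > 0)
        s.avg /= n_runs;
    return s;
}
```

Each run would be bracketed by two `wall_seconds()` calls, with the difference stored in the `times` array. Reporting the minimum as well as the average helps separate the kernel's best case from noise such as first-touch page faults or thread start-up in the first run.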

I'm running this on a machine with 32 cores. I've varied the number of threads and tried different settings of the KMP_AFFINITY environment variable. The OpenMP code is faster in every case.

Any idea why I'm getting these results? Perhaps I'm using MKL sub-optimally? Any help would be greatly appreciated.

I've attached the code I'm running. I compile it with "icc -mkl -openmp rand_mat.c".

Thanks,

AJ

Attachment: rand_mat.c (6.18 KB)
