Hi,
I am working on a matrix multiplication where A is 40 x 40 and B is 40 x 10k, using the MKL function "cblas_cgemm". It takes about 30 milliseconds,
even though I have also enabled MKL multithreading, and I believe that is too slow.
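To judge whether 30 ms is actually slow, one option is to convert the time into a FLOP rate and compare it with the CPU's peak. This is a minimal sketch that assumes "10k" means exactly 10,000 columns and counts 8 real flops (4 multiplies, 4 adds) per complex multiply-add. The helper names are hypothetical and not part of MKL:

```c
/* Approximate real flops for an M x K by K x N complex GEMM
   (as computed by cblas_cgemm). Each complex multiply-add costs
   4 real multiplies + 4 real adds = 8 flops. */
static double cgemm_flops(double m, double n, double k)
{
    return 8.0 * m * n * k;
}

/* Achieved throughput in GFLOP/s from a measured time in milliseconds. */
static double gflops_per_second(double flops, double elapsed_ms)
{
    return flops / (elapsed_ms * 1.0e-3) / 1.0e9;
}
```

With M = 40, K = 40 and N = 10,000, this gives 8 x 40 x 10,000 x 40 = 128,000,000 flops. At 30 ms, that works out to roughly 4.3 GFLOP/s, which you can compare against your machine's peak single-precision rate.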
I have read on the internet that "MKL functions are optimized for generic matrix multiplications".
Does anybody agree or disagree with me?
Thanks in advance.