Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

A'*B using mkl_dcscmm


I used mkl_dcscmm to compute both A*B and A'*B from a Matlab MEX file (64-bit Linux, Matlab 2013a and 2013b), similar to the code posted in
http://software.intel.com/en-us/forums/topic/472320

MKL is faster than Matlab's own implementation for A*B. Strangely, MKL is slower than Matlab for A'*B in most runs, and the results differ slightly.

(In each cpu pair, the first value is from Matlab's implementation and the second is from MKL.)
seed:  76080079, A*B: err 0.00e+00, cpu (0.91, 0.44), A'*B: err 1.43e-09, cpu (0.76, 0.71)
seed:  66432737, A*B: err 0.00e+00, cpu (0.91, 0.43), A'*B: err 1.43e-09, cpu (0.75, 0.79)
seed:  90643494, A*B: err 0.00e+00, cpu (0.92, 0.45), A'*B: err 1.43e-09, cpu (0.77, 0.88)
seed:  75317986, A*B: err 0.00e+00, cpu (0.94, 0.46), A'*B: err 1.45e-09, cpu (0.75, 0.82)
seed:  31023079, A*B: err 0.00e+00, cpu (0.92, 0.42), A'*B: err 1.43e-09, cpu (0.75, 0.80)
seed:  86467634, A*B: err 0.00e+00, cpu (0.94, 0.48), A'*B: err 1.44e-09, cpu (0.76, 0.86)
seed:  19834911, A*B: err 0.00e+00, cpu (0.93, 0.61), A'*B: err 1.42e-09, cpu (0.78, 0.76)
seed:  79273667, A*B: err 0.00e+00, cpu (0.93, 0.48), A'*B: err 1.43e-09, cpu (0.75, 0.82)
seed:  11976366, A*B: err 0.00e+00, cpu (0.93, 0.45), A'*B: err 1.42e-09, cpu (0.78, 0.89)
seed:  16420430, A*B: err 0.00e+00, cpu (0.92, 0.40), A'*B: err 1.43e-09, cpu (0.75, 0.80)
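
One possible source of the small A'*B discrepancy (an assumption, not something confirmed for either library) is that the two implementations add the same products in a different order. Floating-point addition is not associative, so a different order can change the last bits of the result. A minimal sketch with hypothetical helper names:

```cpp
#include <cstddef>
#include <vector>

// Sum the entries from first to last.
double sum_forward(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        s += v[i];
    }
    return s;
}

// Sum the same entries from last to first.
double sum_backward(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = v.size(); i > 0; --i) {
        s += v[i - 1];
    }
    return s;
}
```

For the input {0.1, 0.2, 0.3} in IEEE double precision, the two functions return values that differ in the last bit, even though they add exactly the same numbers.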

My code is attached. It can be compiled with
mex -O  -largeArrayDims  -output sfmult mkl-sfmult-v1.cpp 
A*B and A'*B can be computed as sfmult(A, B, 1) and sfmult(A, B, 2), respectively.

Although A'*B can also be computed as sfmult(A', B, 1) by transposing first, it is better to pass A itself and use the transpose flag inside mkl_dcscmm.
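
For reference, here is a plain sketch of what the two operations look like on compressed sparse column (CSC) storage. It is not MKL's or Matlab's implementation; the struct and function names are hypothetical, and B and C are assumed to be dense and column-major. It shows that A'*B can be computed directly from A's CSC arrays without forming the transpose: each entry of C is a dot product over one column of A, whereas A*B scatters updates into C.

```cpp
#include <cstddef>
#include <vector>

// CSC matrix with zero-based indices (hypothetical helper type).
struct CscMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> col_ptr;  // size cols + 1
    std::vector<std::size_t> row_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

// C = A * B, where B is (A.cols x n) and C is (A.rows x n), column-major.
std::vector<double> csc_mult(const CscMatrix& A,
                             const std::vector<double>& B,
                             std::size_t n) {
    std::vector<double> C(A.rows * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double b = B[k + j * A.cols];
            // Column k of A scaled by b is scattered into column j of C.
            for (std::size_t p = A.col_ptr[k]; p < A.col_ptr[k + 1]; ++p) {
                C[A.row_idx[p] + j * A.rows] += A.values[p] * b;
            }
        }
    }
    return C;
}

// C = A' * B, where B is (A.rows x n) and C is (A.cols x n), column-major.
std::vector<double> csc_mult_transposed(const CscMatrix& A,
                                        const std::vector<double>& B,
                                        std::size_t n) {
    std::vector<double> C(A.cols * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            // Entry (k, j) of C is the dot product of column k of A
            // with column j of B, gathered through row_idx.
            double sum = 0.0;
            for (std::size_t p = A.col_ptr[k]; p < A.col_ptr[k + 1]; ++p) {
                sum += A.values[p] * B[A.row_idx[p] + j * A.rows];
            }
            C[k + j * A.cols] = sum;
        }
    }
    return C;
}
```

The two loops touch memory differently (scattered writes for A*B, gathered reads for A'*B), so their timings need not match even though they perform the same number of multiply-adds.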

Any suggestions or comments are welcome. Thanks!

