Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

SVD speed of 'small' matrices in MKL 2018_0_124


I'm using SVD for least-squares fitting, typically operating on spectral data (1000-2000 data points) and fitting with very few parameters (2-5).

For this, I generally use a direct implementation of the SVD routines from "Numerical Recipes" (NR, single-threaded).

When I started needing SVDs in other areas (bigger matrices with a less extreme aspect ratio, typically ~ 10000 x 1000), I started using MKL Lapacke, currently version 2017_4_210, and there the MKL routines greatly outperform the NR routines.

So I also started using them for the fitting described above. However, when applied to the "extreme" case of only very few parameters (typical matrix size 2048 x 3), the Lapacke routines fall behind and the NR routines are simply faster.

Just as a "guideline": running the same (iterative) fit on a typical standard data set, my profiler tells me that about 4 sec are spent in the SVD routines with NR and about 7 sec with the MKL routines.

Now, when MKL 2018 was announced a month ago, I was quite excited to read in the Release Notes (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2018-release-notes):

LAPACK:

  • Added the following improvements and optimizations for small matrices (N<16):
  • Added ?gesvd, ?geqr/?gemqr, ?gelq/?gemlq optimizations for tall-and-skinny/short-and-wide matrices

So I gave it a try, but was quite disappointed. Not only did the NR routines still outperform the MKL routines, but for reasons not clear to me, the performance actually dropped significantly in MKL 2018_0_124 compared to version 2017_4_210.

The same data set as above, time spent in the SVD routines:
- NR routines: 4 sec
- MKL 2017: 7 sec
- MKL 2018: 14 sec

The only change I made between the two variants was to re-compile/link against the newer version and use the corresponding new DLLs.
Did I miss something? Or did I misunderstand the release notes? Does anybody have comparative data for running SVDs on matrices of size (2048 x 3) that would help me figure out whether this is a problem of the library or of how I am using it?

I ran my tests with all 8 logical cores enabled on a 4-core, hyper-threaded i7-4712HQ.
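For anyone who wants to post comparable numbers, here is a minimal sketch of how the SVD time for a 2048 x 3 matrix could be measured in isolation, outside the fitting loop. It is pure standard-library Python and only an illustration: the one-sided Jacobi routine is a stand-in, not the NR code and not the Lapacke call, and the names `jacobi_singular_values` and `time_svd` are made up for this example. The real routine under test would be passed in as `svd_fn`.

```python
import math
import random
import time


def jacobi_singular_values(a, tol=1e-12, max_sweeps=30):
    """Singular values of an m x n matrix (list of rows), one-sided Jacobi.

    Stand-in SVD for illustration only; returns values in descending order.
    """
    m, n = len(a), len(a[0])
    # Work on columns: rotations orthogonalise pairs of columns in place.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Columns already orthogonal to working precision: skip.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                cp, cq = cols[p], cols[q]
                cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        if not rotated:
            break
    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


def time_svd(svd_fn, m=2048, n=3, repeats=10, seed=0):
    """Average wall-clock seconds per call of svd_fn on a random m x n matrix."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    svd_fn(a)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        svd_fn(a)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    per_call = time_svd(jacobi_singular_values)
    print(f"stand-in SVD, 2048 x 3: {per_call * 1e3:.3f} ms per call")
```

Timing the SVD call alone on a fixed matrix, with a warm-up call excluded, should make it easier to tell whether a slowdown comes from the routine itself or from per-call overhead elsewhere in the fitting code.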

 

