SVD speed of 'small' matrices in MKL 2018_0

I'm using SVD during some least-square fitting, typically operating on spectral data (1000-2000 data points) and fitting with very few parameters (2-5).

For this, I'm generally using a direct implementaion of the SVD routines from the "numerical recipes" (single-threaded).

When I started needing SVDs in other areas (bigger matrices with a less extreme aspect ratio, typtically ~ 10000 x 1000) I started using MKL Lapacke, currenlty using version 2017_4_210 and here the routines greatly outperform the NR routines.

So I also started using them for the fitting as described above. However, when applying it to the "extreme" data of only very few parameters ( typical matrix size 2048 x 3 ), the Lapacke routines fell behind and the NR routines are just faster.

Just as a "guideline": Running the same (iterative) fitting on a typical standard data-set, my profile tells me I'm staying with the SVD-routines for about 4sec using NR routines and for about 7sec with the MKL routines)

Now, when MKL 2018 was announced a month ago, I was quite excited to read in the Release Notes (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2018-release-notes):

LAPACK:

Added the following improvements and optimizations for small matrices (N<16):
Added ?gesvd, ?geqr/?gemqr, ?gelq/?gemlq optimizations for tall-and-skinny/short-and-wide matrice

So I gave it a try, but was quite disappointed. Not only did the NR still outperfrom MKL routines, but for reasons not clear to me, the performance actually dropped significantly in the 2018_0_124 MKL compared to the 2017_4_210 version.

The same data for guideline:
- NR routines: 4sec
- MKL 2017: 7sec
- MKL 2018: 14sec

The only changes I did when comparing both variantes was to re-compile/link with the newer version and use the according new version DLLs.
Did I miss something? Or did I misunderstand the release notes? Does anybody have some other comparative data for running SVDs on matrices of size ( 2048 x 3 ) which will help me figure out whether it is problem of the lirbary or of my implementation of it?

I ran my tests on 8 cores enabled on a (4 core hyper-threaded i7-4712 HQ).

SVD speed of 'small' matrices in MKL 2018_0_124

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112