Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

QR algorithm scalability problems


Dear support team!

 

We have run into scalability problems with the QR algorithm, which we implemented with the MKL functions LAPACKE_sgehrd (to reduce our matrix to Hessenberg form) and LAPACKE_shseqr (to perform the QR iterations themselves).

 

Here is the code we ran on a 14-core Xeon E5 v3 processor:

 

    omp_set_num_threads(threads_count);
    cout << "threads count: " << omp_get_max_threads() << endl;

    // Reduce A (size x size, row-major) to upper Hessenberg form.
    double t1 = omp_get_wtime();
    LAPACKE_sgehrd(LAPACK_ROW_MAJOR, size, 1, size, A, size, tau);
    double t2 = omp_get_wtime();
    cout << "LAPACKE_sgehrd time: " << t2 - t1 << " sec" << endl;

    float *re = new float[size];  // real parts of the eigenvalues
    float *im = new float[size];  // imaginary parts of the eigenvalues
    float *z = NULL;              // not referenced when compz = 'N'

    // Compute eigenvalues only (job = 'E', compz = 'N') by QR iterations.
    t1 = omp_get_wtime();
    LAPACKE_shseqr(LAPACK_ROW_MAJOR, 'E', 'N', size, 1, size, A, size, re, im, z, size);
    t2 = omp_get_wtime();
    cout << "LAPACKE_shseqr time: " << t2 - t1 << " sec" << endl;

 

The compiler we used is icc (ICC) 15.0.3 20150407. Here are the results of runs with 1, 2, 3, 4 and 14 threads:

 

threads count: 1

LAPACKE_sgehrd time: 84.4017 sec

LAPACKE_shseqr time: 30.4593 sec

 

threads count: 2

LAPACKE_sgehrd time: 45.2026 sec

LAPACKE_shseqr time: 27.8578 sec

 

threads count: 3

LAPACKE_sgehrd time: 35.0818 sec

LAPACKE_shseqr time: 25.2905 sec

 

threads count: 4

LAPACKE_sgehrd time: 27.3022 sec

LAPACKE_shseqr time: 28.1272 sec

 

threads count: 14

LAPACKE_sgehrd time: 19.8118 sec

LAPACKE_shseqr time: 27.1131 sec

 

Clearly, LAPACKE_sgehrd scales poorly (about 4.3x faster on 14 threads than on 1), while LAPACKE_shseqr barely scales at all (about 30 sec on 1 thread versus 27 sec on 14). Is there any way to improve the scalability of both routines, or is this working as intended?

 

Sincerely,

Vladislav Shishvatov

 

