Hello all
I am using the LAPACK subroutine DGELSD to compute the linear least-squares solution of a system, i.e. the x that minimizes ||Ax - b||. For this I use the parallel Intel MKL library. When I run my code, only about 57% of the total CPU is used, and setting the number of MKL threads has no effect. To set the number of threads I used:
call mkl_set_num_threads( 32 )
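Two things may be worth checking here. First, `mkl_set_num_threads` only has an effect when the program is linked against MKL's threaded layer (for example `mkl_intel_thread`). With `mkl_sequential`, MKL always runs on a single thread. Second, MKL typically defaults to one thread per physical core. On a machine with 16 cores and 32 logical processors, that would show up as roughly half of the total CPU. The minimal sketch below requests a thread count, disables MKL's dynamic thread adjustment with `mkl_set_dynamic(0)`, and reads back the result with `mkl_get_max_threads`. It assumes the default LP64 Fortran interface, and the module and function names are hypothetical.

```fortran
! Hypothetical helper module: request a thread count from Intel MKL and
! report what MKL will actually use. Assumes the threaded MKL layer
! (e.g. mkl_intel_thread) is linked; with mkl_sequential MKL uses one thread.
module mkl_threads_util
  implicit none
  private
  public :: configure_mkl_threads
contains
  function configure_mkl_threads(nthreads) result(actual)
    integer, intent(in) :: nthreads
    integer :: actual
    integer, external :: mkl_get_max_threads

    call mkl_set_dynamic(0)            ! ask MKL not to lower the thread count at run time
    call mkl_set_num_threads(nthreads) ! same service routine as in the original code
    actual = mkl_get_max_threads()     ! maximum number of threads MKL will now use
  end function configure_mkl_threads
end module mkl_threads_util
```

Printing the returned value just before the DGELSD call shows whether the request was honoured. A result of 1 usually points to the sequential MKL library being linked.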
I am working on a workstation with the following specifications:
- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- Cores: 16
- Logical processors: 32
- OS: Windows 10 Pro, 64-bit operating system, x64-based processor
How can I make use of all the available processing capacity? At present my code takes a very long time to produce results, and most of that time is spent in the calls to DGELSD that compute the least-squares solution.
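DGELSD also takes a work array whose size can affect performance. With only the minimum LWORK, the underlying factorizations may fall back to unblocked code. Below is a minimal sketch of the usual LAPACK workspace query (LWORK = -1) followed by the real solve, for a single right-hand side. `lsq_dgelsd` and `solve_lsq` are hypothetical names, and only DGELSD itself is the LAPACK routine. The sketch should link against MKL or any other LAPACK.

```fortran
! Minimal sketch of a DGELSD call with the standard LAPACK workspace query.
! Module and routine names are hypothetical; only DGELSD itself is LAPACK.
module lsq_dgelsd
  implicit none
  private
  public :: solve_lsq
contains
  ! Minimises ||A x - b||_2 for one right-hand side. A (m x n) is overwritten;
  ! b needs at least max(m, n) entries and holds x in b(1:n) on return.
  subroutine solve_lsq(a, b, rank, info)
    double precision, intent(inout) :: a(:, :), b(:)
    integer, intent(out) :: rank, info
    integer :: m, n, lwork, liwork
    double precision :: rcond, wkopt(1)
    integer :: iwkopt(1)
    double precision, allocatable :: s(:), work(:)
    integer, allocatable :: iwork(:)

    m = size(a, 1)
    n = size(a, 2)
    rank = 0
    if (size(b) < max(1, m, n)) then
      info = -7                        ! LAPACK convention: argument 7 (LDB) is invalid
      return
    end if
    allocate(s(max(1, min(m, n))))     ! singular values of A
    rcond = -1.0d0                     ! negative: machine precision is used as the rank threshold

    ! Workspace query (LWORK = -1): optimal LWORK in wkopt(1), minimum LIWORK in iwkopt(1)
    call dgelsd(m, n, 1, a, m, b, size(b), s, rcond, rank, wkopt, -1, iwkopt, info)
    if (info /= 0) return
    lwork = int(wkopt(1))
    liwork = max(1, iwkopt(1))
    allocate(work(lwork), iwork(liwork))

    ! Actual solve with the workspace sizes returned by the query
    call dgelsd(m, n, 1, a, m, b, size(b), s, rcond, rank, work, lwork, iwork, info)
  end subroutine solve_lsq
end module lsq_dgelsd
```

When many systems of the same size are solved in a loop, the query only needs to be done once and the work arrays can be reused. Any MKL thread count set beforehand applies to the DGELSD calls inside `solve_lsq`.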
Thanks