Hi,
The program below inverts an autoregressive matrix by computing its Cholesky factorisation with potrf and then calling potri.
Program Test
  use blas95
  use lapack95
  USE IFPORT
  use mkl_service
  implicit none
  integer(kind=8) :: istat, n, c1, c2, ise
  integer(kind=4) :: dy
  character(len=200) :: msg
  Real(kind=8), allocatable :: A(:,:)
  real(kind=8) :: r1=0.0D0, r2=0.0D0
  outer: block
    dy=1
    write(*,*) "dynamic: ", dy
    call mkl_set_dynamic(dy)
    call mkl_set_num_threads(mkl_get_max_threads())
    n=10000
    write(*,"(*(g0"",""))") n
    r1=dclock()
    !!start building the matrix
    allocate(&
      &A(n,n),&
      &stat=istat,errmsg=msg)
    if(istat/=0) Then
      write(*,*) msg; exit outer
    end if
    !$OMP PARALLEL DO PRIVATE(c1)
    Do c1=1,size(A,2)
      Do c2=c1,size(A,1)
        A(c2,c1)=0.5**(c2-c1)
      end Do
    end Do
    !$OMP END PARALLEL DO
    ise=size(A,1)
    !$OMP PARALLEL DO PRIVATE(c1) FIRSTPRIVATE(ise)
    Do c1=1,ise-1
      A(c1,(c1+1):ise)=A((c1+1):ise,c1)
    End Do
    !$OMP END PARALLEL DO
    r2=Dclock()
    write(*,*) "alloc: ", r2-r1
    !!end building matrix
    r2=Dclock()
    call potrf(A=A,UPLO="U",INFO=istat)
    r1=dclock()
    write(*,*) "potrf: ",r1-r2
    call POTRI(A=A,Info=istat)
    r2=dclock()
    write(*,*) "potri: ",r2-r1
  End block outer
End Program Test
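For reference, the two loops above fill the lower triangle with 0.5**(c2-c1) and then mirror it into the upper triangle, so the matrix should be the symmetric AR(1)-type matrix written out below. The symbols \rho, i and j are my own notation and do not appear in the program. The closed-form inverse is the standard tridiagonal result for this kind of matrix; I include it only as a possible sanity check on what potri returns, not as something the program computes.

```latex
% Matrix built by the program (rho = 0.5, n = 10000):
A_{ij} = \rho^{\,|i-j|}, \qquad \rho = 0.5, \qquad 1 \le i,j \le n .

% Standard closed-form inverse of such a matrix (tridiagonal):
A^{-1} = \frac{1}{1-\rho^{2}}
\begin{pmatrix}
 1      & -\rho      &            &            &        \\
 -\rho  & 1+\rho^{2} & -\rho      &            &        \\
        & \ddots     & \ddots     & \ddots     &        \\
        &            & -\rho      & 1+\rho^{2} & -\rho  \\
        &            &            & -\rho      & 1
\end{pmatrix}
```

With \rho = 0.5 this gives diagonal entries 4/3 (first and last) and 5/3 (interior), and off-diagonal entries -2/3. Since potri is called with the default UPLO, I would expect only the upper triangle of A to hold the inverse on return.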
With MKL 17.08, setting mkl_dynamic to 0 or 1 makes hardly any difference in processing time:
mkl_dynamic=0:
potrf: 0.88 seconds, potri: 2.12 seconds
mkl_dynamic=1:
potrf: 0.58 seconds, potri: 2.09 seconds
However, with MKL 19.02 the difference is so large that mkl_dynamic=0 makes the program unusable:
mkl_dynamic=0:
potrf: 0.37 seconds, potri: 110.69 seconds
mkl_dynamic=1:
potrf: 0.37 seconds, potri: 1.11 seconds
Times were obtained on an Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz.
Environment variables were:
MKL_NUM_THREADS=36
KMP_AFFINITY=granularity=core,scatter
I noticed that potri in MKL 19.02 spends much of its run time using all 72 threads (including hyperthreading).
Is this a newly introduced bug, or am I doing something wrong?
Thanks