Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Detecting if Intel MKL is enabled in Visual Studio project's properties


Hello all,

I am working on a project where Intel MKL is nice to have but not available on all targeted platforms, so I have to check for its presence and behave accordingly.

I have enabled the Intel Performance Libraries in my Visual Studio project's properties, as explained in Compiling and Linking Intel® Math Kernel Library with Microsoft* Visual C++* and in Intel® Math Kernel Library (Intel® MKL) 2018 Getting Started, but I'm not getting any of the preprocessor definitions described in Using Predefined Preprocessor Symbols for Intel® MKL Version-Dependent Compilation; e.g., __INTEL_MKL__ is not defined.

Any ideas on how I can get these defined, or any other means of detecting Intel MKL?

Please note that I don't use any intermediary build tool such as CMake.
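For reference, here is the kind of check I am after - a minimal sketch, assuming only that mkl.h is reachable when the Performance Libraries are enabled (the __has_include guard keeps it building either way):

```c
/* Minimal sketch: if the MKL headers are visible, mkl.h pulls in
 * mkl_version.h, which defines __INTEL_MKL__. This only proves the
 * headers are on the include path at compile time; it says nothing
 * about which libraries actually get linked. */
#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#  endif
#endif

static const char *mkl_header_status(void)
{
#if defined(__INTEL_MKL__)
    return "mkl-headers-present";
#else
    return "mkl-headers-absent";
#endif
}
```

If __INTEL_MKL__ never shows up even like this, the compiler is presumably not being given the MKL include directory at all, which would match the symptom above.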

Thanks in advance.


Issue in cluster_sparse_solver examples


Hello,

I have an issue when using the Parallel Direct Sparse Solver for Clusters. I modified an example (cl_solver_sym_sp_0_based_c.c) from "mkl/example/examples_cluster_c.tgz", which solves a system of linear equations (AX=b) with a sparse symmetric matrix A.

When I run the original file with different numbers of cores, the code PASSES the test. But when I change just one member of the A matrix (replacing the diagonal element in the 2nd row, -4.0, with 0.0), the code fails whenever I use multiple cores; it works with 1 core. (I attached the modified file to this post.)

The original matrix is:

    float a[18] = { 7.0, /*0*/  1.0, /*0*/ /*0*/  2.0,  7.0, /*0*/
                         -4.0,  8.0, /*0*/  2.0, /*0*/ /*0*/ /*0*/
                                1.0, /*0*/ /*0*/ /*0*/ /*0*/  5.0,
                                      7.0, /*0*/ /*0*/  9.0, /*0*/
                                             5.0,  1.0,  5.0, /*0*/
                                                  -1.0, /*0*/  5.0,
                                                        11.0, /*0*/
                                                               5.0 };

The modified version:

    float a[18] = { 7.0, /*0*/  1.0, /*0*/ /*0*/  2.0,  7.0, /*0*/
                         -0.0,  8.0, /*0*/  2.0, /*0*/ /*0*/ /*0*/
                                1.0, /*0*/ /*0*/ /*0*/ /*0*/  5.0,
                                      7.0, /*0*/ /*0*/  9.0, /*0*/
                                             5.0,  1.0,  5.0, /*0*/
                                                  -1.0, /*0*/  5.0,
                                                        11.0, /*0*/
                                                               5.0 };

I have only been using these routines for a couple of days, so I may be missing something.

Thank you for your help.

Nima Mansouri

lapack function cpbtrs slower in mkl 18.0 vs 14.0

Hi,

I am experiencing a slowdown of the cpbtrs function in MKL 18.0 compared with MKL 14.0. My system is a Xeon E3-1240 v3.

The following (single-threaded) code seems to run more than 2x slower with 18.0:

         niter = 100000
         n   = 60
         nbd = 11
         ldb = 181

         allocate(a(nbd*n,niter))
         allocate(b(ldb,niter))

         a = cmplx(0.1,0.1)
         b = cmplx(0.5,0.5)

         do iter = 1 ,niter
            call CPBTRS('U', n,nbd -1, 1, a(:,iter), nbd, b(:,iter),ldb, status)
         enddo

The linking command that I used with ifort 18.0:

$INTEL_HOME/ifort -I$MKL_HOME/include/ cpbtrs.f90 \
  -Wl,--start-group -Wl,-Bstatic -L$MKL_HOME_LIB/lib \
  -lmkl_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack95_lp64 -liomp5 -Wl,--end-group

enabling MKL breaks existing openMP code


On Windows, using Visual Studio 2015, I have a project that extensively uses OpenMP to speed up for loops. It works great - we've been using it for a long time and have had nothing but good experiences with it.

However, I've discovered that (after installing MKL) going into the project properties and setting "Use Intel MKL" to "Parallel" completely breaks these loops, such that only a single core is now used (which in my case renders the program unusable), even though I have not yet added a single line of MKL code! Even worse, none of the suggested "workarounds" I have seen appear to fix it. Functions like omp_set_num_threads, mkl_set_num_threads and omp_set_dynamic appear to do absolutely nothing. If I instead set MKL to "Sequential" rather than "Parallel", I don't appear to have this problem; however, given how mysterious this behavior seems, I don't feel comfortable just setting it that way and leaving it alone without getting some answers first (not to mention that I may well run into a scenario later where I want to run an MKL routine in parallel).

Can someone please explain what is going on here and how to correctly link MKL into a program configured as parallel while still having said program be able to use all cores in omp parallel for loops? What is the actual logic for how Intel allocates threads to openMP and MKL processes, and how can the programmer step in and control this when Intel's default logic does not produce the desired outcome? Fundamentally, I don't at all understand why the enabling of MKL to potentially run parallel code should lead to any interference with OpenMP parallel for loops when there is no actual parallel MKL code being run. 

Thanks in advance.

thread safe random number generation


Hi all,

This code implements a random number generator class and a test class. An object of the test class contains a pointer to an object of the random number class, which can either be allocated individually or point to a global object. The number of objects created from the test class may be several thousand. The question is whether this code for random number generation is thread safe:

module Mod_Ran
  Use MKL_VSL_Type  !! assumed: the MKL VSL modules provide VSL_STREAM_STATE etc.
  Use MKL_VSL
  Implicit None
  Private
  Type, Public :: Ran
    Integer, Private :: Error=0
    Integer(kind=8), allocatable :: Seed
    Character(:), allocatable :: CSMSG
    Type(VSL_STREAM_STATE), allocatable :: TSS
  contains
    Private
    Procedure, Pass, Public :: Init => SubInit
    Procedure, Pass :: SubGetUniformVector
    Generic, Public :: GetUniform => SubGetUniformVector
  End type Ran
contains
  Subroutine SubInit(this, seed)
    Implicit None
    Class(Ran), Intent(InOut) :: this
    Integer(Kind=8), Intent(In), Optional :: seed
    Integer(Kind=8) :: brng
    brng = VSL_BRNG_MCG59 !1.20
    if (allocated(this%TSS)) Deallocate(this%TSS)
    Allocate(this%TSS)
    if (present(seed)) this%Seed = seed  !! allocates on assignment
    if (.not.allocated(this%Seed)) Then
      Allocate(this%Seed, source=12345_8)
    End if
    this%Error = vslnewstream(this%TSS, brng, this%Seed)
  End Subroutine SubInit
  Subroutine SubGetUniformvector(this,InOut,lb,rb)
    Implicit None
    CLass(Ran), Intent(InOut) :: this
    Real(Kind=8), Intent(In) :: rb, lb
    Real(Kind=8), Intent(InOut) :: InOut(:)
    this%error=vdrnguniform(&
      &method=VSL_RNG_METHOD_UNIFORM_STD_ACCURATE,&
      &stream=this%TSS,&
      &n=size(InOut,1),&
      &r=InOut,&
      &a=lb,&
      &b=rb)
  End Subroutine SubGetUniformVector
End module Mod_Ran
!!@@@@@@@@@@@@@@@@@@@@@@@@@@
!!@@@@@@@@@@@@@@@@@@@@@@@@@@
!!@@@@@@@@@@@@@@@@@@@@@@@@@@
Module Mod_Type
  use Mod_Ran
  private
  Type, Public :: TT
    integer(kind=8), allocatable :: seed
    Type(Ran), Pointer :: TSR=>Null()
    Real(kind=8), allocatable :: tmp(:)
  contains
    Procedure, Pass :: fill => subFill
  End type TT
contains
  Subroutine SubFill(this)
    Implicit None
    real(kind=8) :: rsr
    Class(TT), Intent(InOut) :: this
    if(.not.allocated(this%tmp)) Then
      allocate(this%tmp(100))
    End if
    !!@@@@@@@@@@@@@@
    !!check whether a stream exists, if not create one
    if(.not.associated(this%tsr)) Then
      if(.not.allocated(this%Seed)) Then
        call random_number(rsr)
        allocate(this%seed,source=int(rsr*100000.0D0,kind=8))
      End if
      Allocate(this%tsr); Allocate(this%tsr%seed, source=this%Seed)
      call this%tsr%init()
    End if
    call this%tsr%getuniform(inout=this%tmp,lb=0.0D0,rb=1.0D0)
  End Subroutine SubFill
End Module Mod_Type
!!@@@@@@@@@@@@@@@@@@@@@
!!@@@@@@@@@@@@@@@@@@@@@
!!@@@@@@@@@@@@@@@@@@@@@
Program Test
  use Mod_Type
  use Mod_Ran
  Implicit none
  Type(TT), allocatable :: TVT(:)
  Type(Ran), allocatable, Target :: TSR
  Integer :: i
  Allocate(TVT(10000))
  !!@@@@@@@@@@@@@@@@
  !!option 1: everybody gets its own stream
  !$OMP PARALLEL DO
  Do i=1,size(TVT)
    call tvt(i)%fill()
  End Do
  !$OMP END PARALLEL DO
  !!@@@@@@@@@@@@@@@@
  !!option 2: everybody uses the same stream
  Allocate(TSR); call TSR%init(800466_8)
  !$OMP PARALLEL DO SHARED(TSR)
  Do i=1,size(TVT)
    tvt(i)%TSR=>TSR
    call tvt(i)%fill()
  End Do
  !$OMP END PARALLEL DO
end Program Test

From what I understood from the MKL manual, option 1 should be thread safe and option 2 should not be (in terms of correlations). Is that right?
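To make sure I am asking the right question, here is the one-stream-per-object idea behind option 1, sketched language-neutrally in C (splitmix64 is a hypothetical stand-in for the VSL MCG59 stream, chosen only because it is tiny and has no shared state):

```c
#include <stdint.h>

/* Each object owns an independent generator state, so concurrent fills
 * never touch shared state; that is the whole thread-safety argument
 * of option 1. */
typedef struct { uint64_t state; } stream_t;

static uint64_t next_u64(stream_t *s)
{
    /* splitmix64 step: deterministic, all state in the struct */
    uint64_t z = (s->state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [lb, rb), one value per call. */
static double next_uniform(stream_t *s, double lb, double rb)
{
    return lb + (rb - lb) * ((next_u64(s) >> 11) * (1.0 / 9007199254740992.0));
}
```

Option 2 is the opposite: every object's pointer aliases one state, so concurrent calls mutate it unsynchronized, which is exactly the hazard I am asking about for the shared VSL stream.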

Thanks a lot

karl

undefined reference to `mkl_dnn_getTtl_F32'


I'm trying to run this script:

----- Compiling clang_lp64_parallel_intel64_lib ----- s_score_sample
clang -m64  -w -I"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/include" \
s_score_sample.c \
"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_intel_lp64.a" \
"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_intel_thread.a" \
"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a" \
-Wl,-rpath,/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin -Wl,-rpath,/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/../compiler/lib \
-L"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/../compiler/lib" -liomp5 -lpthread -lm  -o _results/clang_lp64_parallel_intel64_lib/s_score_sample.out

Does anyone know why I could possibly be getting the following error message?

/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conversion_s_avx512_mic.o): In function `mkl_dnn_avx512_mic_doConversionSimplest_F32':
conversion.c:(.text+0x4d5): undefined reference to `mkl_dnn_getTtl_F32'
conversion.c:(.text+0x4ea): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conversion_s_avx512_mic.o): In function `cvFltSimpleToBlkPclFwd':
conversion.c:(.text+0x1092): undefined reference to `mkl_dnn_getTtl_F32'
conversion.c:(.text+0x10ae): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conversion_s_avx512_mic.o): In function `cvFltBlkPclFwdToSimple':
conversion.c:(.text+0x1dcc): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conversion_s_avx512_mic.o):conversion.c:(.text+0x1de8): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512_mic.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xf93): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0x102b): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512_mic.o): In function `mkl_dnn_avx512_mic_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x1958): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1a7b): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512_mic.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x26bc): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512_mic.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x2c6f): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx512_mic.o): In function `mkl_dnn_avx512_mic_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x2495): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x24b1): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx512_mic.o): In function `mkl_dnn_avx512_mic_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x4142): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x41b5): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x41d1): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx512_mic.o):conv_pcl.c:(.text+0x4b85): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xe08): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0xe94): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512.o): In function `mkl_dnn_avx512_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x1645): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1769): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x221b): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx512.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x26ef): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx512.o): In function `mkl_dnn_avx512_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x1ac4): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x1ae0): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx512.o): In function `mkl_dnn_avx512_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x2e51): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x2ec3): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x2edf): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx512.o):conv_pcl.c:(.text+0x3835): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx2.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xc6c): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0xcf8): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx2.o): In function `mkl_dnn_avx2_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x13f5): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1519): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx2.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x1e4e): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx2.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x2301): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx2.o): In function `mkl_dnn_avx2_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x1ea4): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x1ec0): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx2.o): In function `mkl_dnn_avx2_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x3801): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3873): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x388f): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx2.o):conv_pcl.c:(.text+0x41a5): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xa92): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0xb1b): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx.o): In function `mkl_dnn_avx_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x11f5): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1319): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x1a75): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_avx.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x1ee3): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx.o): In function `mkl_dnn_avx_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x1fc4): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x1fe0): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx.o): In function `mkl_dnn_avx_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x3ad1): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3b43): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3b5f): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_avx.o):conv_pcl.c:(.text+0x4535): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc3.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xa7d): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0xb13): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc3.o): In function `mkl_dnn_mc3_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x1205): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1329): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc3.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x1a0d): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc3.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x1dff): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_mc3.o): In function `mkl_dnn_mc3_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x2094): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x20b0): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_mc3.o): In function `mkl_dnn_mc3_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x3d11): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3d83): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3d9f): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_mc3.o):conv_pcl.c:(.text+0x4675): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xa7d): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0xb13): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc.o): In function `mkl_dnn_mc_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x1205): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1329): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x1a0d): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_mc.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x1dff): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_mc.o): In function `mkl_dnn_mc_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x1ee4): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x1f00): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_mc.o): In function `mkl_dnn_mc_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x3db1): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3e23): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3e3f): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_mc.o):conv_pcl.c:(.text+0x4715): more undefined references to `mkl_dnn_getTtl_F32' follow
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_def.o): In function `doit_bwd_filt_par':
gemmConvolution.c:(.text+0xa7d): undefined reference to `mkl_blas_sgemm'
gemmConvolution.c:(.text+0xb13): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_def.o): In function `mkl_dnn_def_bkdGemmDirectConv_F32':
gemmConvolution.c:(.text+0x1205): undefined reference to `mkl_dnn_getTtl_F32'
gemmConvolution.c:(.text+0x1329): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_def.o): In function `doit_fwd_par':
gemmConvolution.c:(.text+0x1a0d): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(gemmConvolution_s_def.o): In function `doit_bwd_data_par':
gemmConvolution.c:(.text+0x1e00): undefined reference to `mkl_blas_sgemm'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_def.o): In function `mkl_dnn_def_doConversion_Simple_To_PCLData_F32':
conv_pcl.c:(.text+0x1ee4): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x1f00): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_def.o): In function `mkl_dnn_def_doConversion_PCLData_To_Simple_F32':
conv_pcl.c:(.text+0x3dd1): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3e43): undefined reference to `mkl_dnn_getTtl_F32'
conv_pcl.c:(.text+0x3e5f): undefined reference to `mkl_dnn_getTtl_F32'
/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a(conv_pcl_s_def.o):conv_pcl.c:(.text+0x4735): more undefined references to `mkl_dnn_getTtl_F32' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
makefile:206: recipe for target 's_score_sample' failed
make[1]: *** [s_score_sample] Error 1
makefile:199: recipe for target 'libintel64' failed
make: *** [libintel64] Error 2
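For reference, this error pattern (cross-references among the static MKL archives left unresolved) is the classic symptom of single-pass static linking; the usual remedy is to wrap the three archives in a single -Wl,--start-group ... -Wl,--end-group pair so the linker re-scans them. A sketch of the command above with only that change (paths exactly as in the original post, so still to be verified against the actual install):

```shell
clang -m64 -w -I"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/include" \
  s_score_sample.c \
  -Wl,--start-group \
    "/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_intel_lp64.a" \
    "/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_intel_thread.a" \
    "/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib//intel64_lin/libmkl_core.a" \
  -Wl,--end-group \
  -L"/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/../compiler/lib" -liomp5 -lpthread -lm -ldl \
  -o _results/clang_lp64_parallel_intel64_lib/s_score_sample.out
```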

 

MODERATOR, Please Delete!! C++, checking allocation of complex array with FFTW wrappers


Moderator, can you delete this topic?!

After some hours of debugging, I found that improper error checking in other parts of the code had led me to believe that an if() testing an FFTW-wrapped array was evaluating to true, when that was not the case.

Thanks.

=============

Good afternoon, everybody.

I have a 1D FFT implementation, originally written for FFTW, and am crash-testing the code to catch memory allocation problems before running the transform. In the code below I declare the complex array, condition the creation of the plan on the allocation of the array, and, if all goes fine, run the transform.

// using 32-bit precision, I don't need more than this
fftwf_complex   *fft_complex_1D = NULL;
fftwf_plan      plan_fwfft_1D = NULL;   // initialized so the later NULL test is well-defined

// Allocate the complex array
fft_complex_1D = (fftwf_complex*)fftwf_malloc(sizeof(fftwf_complex) * array_length_fft_1D);

// Create the plan only if the allocation worked
if (fft_complex_1D != NULL)
    plan_fwfft_1D = fftwf_plan_dft_r2c_1d(static_cast<int>(array_length_fft_1D), in_fft_float_1D, fft_complex_1D, FFTW_ESTIMATE);

// If the plan was created successfully, run it
if (plan_fwfft_1D != NULL)
    fftwf_execute(plan_fwfft_1D);

The variable array_length_fft_1D is a size_t provided by the user, and in_fft_float_1D is an array previously declared and initialized. With length values up to 2^29 it works fine, but when I test the memory allocation with a value that I know my machine can't allocate, such as 2^31, I would expect the line "if (fft_complex_1D != NULL)" to evaluate to false. Instead it evaluates to true.

Then, in the next test, the transform doesn't execute because "if (plan_fwfft_1D != NULL)" evaluates to false. Printing the value of fft_complex_1D gives 0000000000000000, so indeed the complex array of the specified length could not be allocated.

Do you see anything wrong here, or is there any consideration you want to make about the first "if" evaluating to true when, apparently, it shouldn't?
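As a belt-and-braces pattern for this kind of crash testing, the size computation itself can also be checked before any allocator is called; a generic C sketch (checked_alloc is a hypothetical helper of mine, not an FFTW API):

```c
#include <stdlib.h>
#include <stdint.h>

/* Overflow-checked allocation of n elements of elem bytes each.
 * Returns NULL both when n * elem would overflow size_t and when the
 * allocator itself fails, so one NULL test covers both failure modes. */
static void *checked_alloc(size_t n, size_t elem)
{
    if (elem != 0 && n > SIZE_MAX / elem)
        return NULL;            /* n * elem would wrap around */
    return malloc(n * elem);
}
```

With fftwf_malloc the same idea applies: validate sizeof(fftwf_complex) * array_length_fft_1D for overflow first, then a single NULL test on the result is trustworthy.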

Link errors for MKL function from mkl_scalapack.h with C++


Hello,
I hope you can help with this problem, please.
I am trying to compile a small C++ program that calls the MKL function "pdgesvd" from the header "mkl_scalapack.h".
The C++ code is:

#include "mkl_scalapack.h"
int main()
{
// defining the variables ...

    pdgesvd(jobu, jobvt, m, n,
            a, ia, ja, desca,
            s, u, iu, ju, descu,
            vt, ivt, jvt, descvt,
            work, lwork, info);

    return 0;
}

The link fails with the errors in the attached file "link_errors.pdf".

I have run the script:
source /intel/bin/compilervars.sh -arch intel64 -platform linux

I got the compilation/link options from the online "Intel® Math Kernel Library Link Line Advisor" (the options attached in the snapshot intel_advisor.jpg) and compiling as below:

g++ -O main.cpp \
-DMKL_ILP64 -m64 -I${MKLROOT}/include \
${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

I would appreciate any assistance,

Thank you.


No threading observed during factoring or solving with Pardiso on OSX/Clang


We are using Pardiso (from MKL 2017 update 2 with the latest Eigen wrappers) to factor and solve large sparse symmetric positive-definite matrices (e.g. 3Mx3M) and, although everything works fine, there is absolutely no threading observed on Activity Monitor. We use TBB only (no OpenMP). TBB is linked dynamically and MKL statically. We compile and link with Clang 3.9 on OSX 10.12. 

 

  1. My main question is whether Pardiso is expected to utilize multiple cores for either factoring (LDLT or LU) and/or solving. If so, what could I be missing?
  2. Also, we really need mkl_progress() to be called so that we can stop long computations, but that doesn't seem to happen.

On a side note, the only threading behaviour I've observed was when out-of-core (OOC) mode was enabled. Possibly related to that, my override of mkl_progress() was also only called in that mode, and only for very large matrices.

 

"could not initialize a memory descriptor" error using tensorflow on windows


I've built tensorflow (r1.9, using the CMake tools) linked with MKL (v2018 U3) and mkl-dnn (v0.15). I'm running Windows 10 (build 17134) on an Intel Core i7-7820HQ CPU.

I've built mkl-dnn from source and its tests are passing. However, a C++ project that loads a pre-trained tensorflow graph and passes an image for inference gives the following error when calling Session::run():

W d:\dev\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at mkl_conv_ops.cc:888 : Aborted: Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file d:\dev\tensorflow\tensorflow\core\kernels\mkl_conv_ops.cc:886
Aborted: Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file d:\dev\tensorflow\tensorflow\core\kernels\mkl_conv_ops.cc:886
         [[Node: conv1/BiasAdd = _MklConv2DWithBias[T=DT_FLOAT, _kernel="MklOp", data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv1_pad/Pad, conv1/kernel, conv1/bias, DMT/_0, DMT/_1, DMT/_2)]]

The same code (with the same pre-trained model) does work on a linux machine (Intel(R) Xeon(R) CPU E5-2673 v3, running on Microsoft Azure), also with TF built with mkl-dnn.

What am I doing wrong?

Thanks.

MKL Performance issue in threaded application


Hi

We are working on RNN kernel optimization, and we are trying to run two SGEMMs in parallel on a 2-socket SKX 6148 server (20 cores per socket).

The SGEMM size is M = 20, N = 2400, K = 800.

Our target is to map the first SGEMM to socket 0 and the other SGEMM to socket 1.

We measured the GFLOPS with this benchmark (https://github.com/xhzhao/GemmEfficiency/tree/tbb) and collected the performance data attached to this post.

I found that the performance of OMP+MKL or TBB+MKL is not as good as we expected, and I'm not sure whether I'm missing something about using MKL in a threaded application.

BTW, the pthread+MKL solution is not suitable for our real case, as it will double the threads and make the performance even worse.

Thanks in advance.

Setting up a FFT 2D, defining some parameters


Good evening, all.

I have successfully used the FFTW wrappers for 1D FFTs, but am writing the 2D portion of my program using the proper MKL interface. However, there are a few points I am in doubt about, as I couldn't get enough information from the Developer Reference or other threads.

The matrix I'm working on has 2^14 columns by 2^15 rows, or 536870912 elements. The descriptor for the forward R2C transform is as follows (omitting the declaration of the descriptor and error checking), following the required parameters as described on manual page 2127 of the 2018 reference:

DftiCreateDescriptor(&fwfft_2D_handle, DFTI_SINGLE, DFTI_REAL, 2, <size>);
DftiSetValue(fwfft_2D_handle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
DftiSetValue(fwfft_2D_handle, DFTI_PACKED_FORMAT, DFTI_CCE_FORMAT);
DftiSetValue(fwfft_2D_handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
DftiSetValue(fwfft_2D_handle, DFTI_INPUT_STRIDES, <stride>);
DftiSetValue(fwfft_2D_handle, DFTI_OUTPUT_STRIDES, <stride>);
DftiCommitDescriptor(fwfft_2D_handle);

Notice that I left <size> and <stride> undefined here as they are the subjects of my questions below:

- If I am looping through the rows, should <size> be the number of columns of the matrix? If so, how do I provide the length of the rows?
- Is <stride> an array of MKL_LONG holding the individual position of each element? If so, the program would need 2 float arrays (input, output), 1 complex array, and yet another array of MKL_LONG?!

If I have misunderstood something, I'd appreciate any guidance on how to set up these parameters for a 2D transform of, say, a matrix of 16384 columns by 32768 rows.

Thanks in advance.

zgemm crash with signal 11


Hi,

We have a huge matrix of about 100 GB, and zgemm crashed even though there is enough memory.

The stack trace shows the problematic routine is: mkl_blas_avx2_zgemm_zcopy_right6_ea

We used 12 CPUs with the multi-threaded lib.

What can we do to avoid this issue?

Thanks for the help.

where can I get old mkl version 11.3 for windows?


Can I get MKL version 11.3 for Windows?

PARDISO Error: -2


Hi all,

I have recently started using MKL. I'm trying to solve an Ax=b type equation with a sparse A matrix using PARDISO. It is a time-domain finite element problem, and I need to call the solver in a for loop: I solve Ax=b at each time step, using values from the previous step to update the A matrix.

For smaller systems my code runs fine; however, for bigger ones, depending on the matrix size, after some time in the loop I get error -2 during numerical factorization. I don't think the NNZ changes much.

I'm using Visual Studio on Windows, and when I get the error at least half of the RAM seems to be available, so I'm not sure why I get this error at all.

This a part from the message it prompts:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

*** Error in PARDISO  (     insufficient_memory) error_num= 8

*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 6861 bytes failed

total memory wanted here: 32291 kbyte

 

I'm open to try any suggestions.

Below is the part where I call PARDISO.

MKL_INT mtype = 11;   /* Real unsymmetric matrix */
MKL_INT nrhs = 1;     /* Number of right hand sides. */
void *pt[64];         /* Internal solver memory pointer */
MKL_INT iparm[64];    /* PARDISO control parameters */
MKL_INT maxfct, mnum, phase, error, msglvl;
MKL_INT idum;         /* Integer dummy */
double ddum;          /* Double dummy */
char *uplo;

for (i = 0; i < 64; i++)
{
    iparm[i] = 0;
}
iparm[0] = 1;         /* No solver default (with 0, PARDISO ignores the settings below) */
iparm[1] = 2;         /* Fill-in reordering from METIS */
iparm[2] = 2;
iparm[3] = 0;         /* No iterative-direct algorithm */
iparm[4] = 0;         /* No user fill-in reducing permutation */
iparm[5] = 0;         /* Write solution into x */
iparm[6] = 0;         /* Not in use */
iparm[7] = 2;         /* Max numbers of iterative refinement steps */
iparm[8] = 0;         /* Not in use */
iparm[9] = 13;        /* Perturb the pivot elements with 1E-13 */
iparm[10] = 1;        /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0;        /* Conjugate transposed/transpose solve */
iparm[12] = 1;        /* Maximum weighted matching algorithm is switched-on (default for non-symmetric) */
iparm[13] = 0;        /* Output: Number of perturbed pivots */
iparm[14] = 0;        /* Not in use */
iparm[15] = 0;        /* Not in use */
iparm[16] = 0;        /* Not in use */
iparm[17] = -1;       /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1;       /* Output: Mflops for LU factorization */
iparm[19] = 0;        /* Output: Numbers of CG Iterations */
maxfct = 1;           /* Maximum number of numerical factorizations. */
mnum = 1;             /* Which factorization to use. */
msglvl = 1;           /* Print statistical information in file */
error = 0;            /* Initialize error flag */
for (i = 0; i < 64; i++)
{
    pt[i] = 0;
}

phase = 11;           /* Analysis (symbolic factorization) */
PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
        &aRowN, vallsA, rowwsA, collsA, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
cout << iparm[16] << endl;
if (error != 0)
{
    printf("\nERROR during symbolic factorization: %d", error);
    cin.get();
    exit(1);
}

phase = 22;           /* Numerical factorization */
PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
        &aRowN, vallsA, rowwsA, collsA, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
if (error != 0)
{
    printf("\nERROR during numerical factorization: %d", error);
    cout << "Press any key to exit." << endl;
    cin.get();
    exit(2);
}

phase = 33;           /* Solve */
if (iparm[11] == 0)
    uplo = "non-transposed";

PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
        &aRowN, vallsA, rowwsA, collsA, &idum, &nrhs, iparm, &msglvl, matBB, solVals, &error);
if (error != 0)
{
    printf("\nERROR during solution: %d", error);
    cin.get();
    exit(3);
}

phase = -1;           /* Release internal memory. */
PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
        &aRowN, &ddum, rowwsA, collsA, &idum, &nrhs,
        iparm, &msglvl, &ddum, &ddum, &error);
	

 


MKL/Xeon Phi Offload Runtime Issue - 3120A


Hello there

I have set up my Xeon Phi 3120A on Windows 10 Pro, with MPSS 3.8.4 and Parallel Studio XE 2017 (initial release). I chose this version as it was the last to support the x100 series. I have installed the MKL version packaged with Parallel Studio XE 2017 (initial release).

What have I done / setup:

After setting up MPSS 3.8.4 and following steps such as flashing and pinging, I have checked that micctrl -s shows "mic0 ready" (with a Linux image containing the appropriate KNC name), miccheck produces all passes, and micinfo gives readings for all the key stats the coprocessor provides.

Hence to me it looks like the co-processor is certainly installed and being recognised by my computer. I can also see that mic0 is up and running in the micsmc gui.

I have then set up my environment variables to enable automatic offload, namely MKL_MIC_ENABLE=1, OFFLOAD_DEVICES=0, MKL_MIC_MAX_MEMORY=2GB, MIC_ENV_PREFIX=MIC, MIC_OMP_NUM_THREADS=228, MIC_KMP_AFFINITY=balanced.
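Collected in one place, that setup looks like this at a Windows command prompt. The first six lines restate the values already quoted; the last two are additions worth trying, since OFFLOAD_REPORT is the documented way to get per-call offload diagnostics and so shows directly whether the runtime sees the card:

```shell
:: Automatic Offload environment (Windows cmd syntax)
set MKL_MIC_ENABLE=1
set OFFLOAD_DEVICES=0
set MKL_MIC_MAX_MEMORY=2GB
set MIC_ENV_PREFIX=MIC
set MIC_OMP_NUM_THREADS=228
set MIC_KMP_AFFINITY=balanced
:: Fail loudly instead of silently falling back to the host:
set MKL_MIC_DISABLE_HOST_FALLBACK=1
:: Print per-call offload diagnostics (shows whether the runtime sees mic0):
set OFFLOAD_REPORT=2
```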

The Problem

When I run some simple code in R-3.4.3 (copied below, designed specifically for automatic offload), it keeps running the code on my host computer rather than on the Xeon Phi. Consistent with this, I cannot see any activity on the Xeon Phi when I look at the micsmc GUI.

The R code:

require(Matrix)
sink("output.txt")
N <- 16000
cat("Initialization...\n")
a <- matrix(runif(N*N), ncol=N, nrow=N);
b <- matrix(runif(N*N), ncol=N, nrow=N);
cat("Matrix-matrix multiplication of size ", N, "x", N, ":\n")
for (i in 1:5) {
  dt=system.time( c <- a %*% b )
  gflops = 2*N*N*N*1e-9/dt[3]
  cat("Trial: ", i, ", time: ", dt[3], " sec, performance: ", gflops, " GFLOP/s\n")
}

Other steps I have tried:

I then set the MKL_MIC_DISABLE_HOST_FALLBACK=1 environment variable, and as expected, when I ran the above code, R terminated.

In https://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf it says that if the HOST_FALLBACK flag is active and offload is attempted but fails (because the "offload runtime cannot find a coprocessor or cannot initialize it properly"), it will terminate the program, and indeed R terminates completely. For completeness, this problem also occurs on R-3.5.1, Microsoft R Open 3.5.0, and R-3.2.1.

So my questions are:

  1. What am I missing to make the R code run on the Xeon Phi? Can you please advise what I need to do to make this work?
  2. (Linked to 1) Is there a way to check whether the MKL offload runtime can see the Xeon Phi? Or that it is correctly set up, or what (if any) problem MKL is having initialising the Xeon Phi?

I will sincerely appreciate your help; I believe I am missing a fundamental/simple step and have been tearing my hair out trying to make this work.

Many thanks in advance,

Keyur

Query related to PDSYEV and PZHEEV


I have used the ScaLAPACK routines PDSYEV and PZHEEV for diagonalization of symmetric and Hermitian matrices, respectively. Using these routines to diagonalize a square matrix on a square process grid (e.g., 2 x 2, 4 x 4), I get correct eigenvalues and eigenvectors. The problem arises when I choose a rectangular process grid (e.g., 2 x 4, 4 x 12): I get incorrect eigenvalues and eigenvectors. I would like to know whether PDSYEV and PZHEEV can be applied to a rectangular process grid or not.

IMSL library call to Intel MKL


Hi,

I've recently re-compiled some Fortran code which uses the Intel MKL and IMSL libraries. The code compiled and ran correctly under earlier releases (18.0.124 and earlier) of the compiler and MKL. There is a call from c:\Program Files (x86)\VNI\imsl\fn701\Intel64\lib\imslmkl_dll.dll to a routine in MKL which is not being found. Has something changed in the new release (18.3.210) of MKL which makes it no longer compatible with IMSL 7.01?

The Error is:

"Entry Point Not Found

The procedure entry point mkl_lapack_ao_zunmqr could not be located in the dynamic link library C:\Program Files (x86)\VNI\imsl\fnl701\Intel64\lib\imslmkl_dll.dll" 

Note: I am not compiling for Intel Xeon Phi and there are no calls to zunmqr in my code...

I was using INCLUDE 'link_fnl_shared.h'

When I re-install version 18.0.124 of the Fortran compiler and re-compile, the code runs fine. When I install the latest update 3 (18.3.210) and re-compile, I get an "Entry Point Error" as above. If I re-install 18.0.124 and re-compile, all runs fine again. Also, if 18.0.124 is the most recently installed version, I can still select 18.3.210, re-compile, and all runs fine. If I re-install 18.3.210 the error re-occurs, even when running an executable that was compiled under 18.0.124. So it does not appear to be a compiler version issue, but a runtime issue between the IMSL library and whatever version of MKL is being accessed after the 18.3.210 Fortran install.

I've re-compiled using INCLUDE 'link_fnl_static.h' and the code appears to run fine; however, I have not re-installed ver 18.3.210... I will try this combination next...

Thanks for any assistance. I'd like to be able to keep the code up to date with the latest compiler/MKL and IMSL libs!

pardiso C# "not enough memory"


I'm trying to use PARDISO in C# to solve an unsymmetric matrix of complex numbers.

I started by executing an example I found on the internet, which works fine.

However, when I replace the data with my own, it fails at the first step (factorization) with a memory-related error: "-2: not enough memory".

This is obviously not possible, because the matrix is quite small.

I suppose it is related to some parameter, but there are many and it is possible that I've missed something.

Is there anyone who can help me?

I attach the example, the system has 24 equations.

 

Thank you
Gianluca

Attachment: TestPardiso.zip (11.41 KB)

pardiso forward substitution problems


Hi all,

I am having difficulty lining up results from a PARDISO forward substitution with results from explicit trials (e.g. in R or Python).

Let "C" be a positive definite symmetric squared matrix and "v" be a vector of 1 with length equal to the rows of "C".

When solving "Cb=v", the PARDISO results look good with phase 33, where "b" contains the row sums of "C^{-1}".

Now "C" maybe decomposed into "LL'" (cholesky factor). A forward substitution would solve "Lb=v", thus the result must be the sum over the rows of "L^{-1}".

However, when feeding the same matrix "C" into PARDISO but setting the phase to 331, the results don't seem to reflect the above. I guessed it might be due to permutation, but retrieving the permutation via iparm(5) and reshuffling the result vector did not change anything. Moreover, since the range of the PARDISO results differs from the explicit operation (taking the row sums of "L^{-1}"), the cause cannot be the permutation alone.

Did I miss anything? For instance, is PARDISO's LL' not the Cholesky factorization of C?

Any advice appreciated.

Thanks

NB: iparm(36) is zero
