Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Can't start MKL pardiso in Mac OSX!


Hi, here is the command line used to build the Intel example pardiso_sym_f90.f90:

ifort pardiso_sym_f90.f90 -I/opt/intel/mkl/include /opt/intel/mkl/lib/libmkl_core.a /opt/intel/mkl/lib/libmkl_intel_ilp64.a /opt/intel/mkl/lib/libmkl_intel_thread.a /opt/intel/lib/libiomp5.a

Compiling and linking were successful. However, the output of running the executable (a.out) was not so beautiful:

 Reordering completed ... 

 The following ERROR was detected:           -1

 The following ERROR on release stage was detected:           -1

 

Thank you for any help!

Malik.



Pardiso memory problem with Visual Fortran


I use mkl with Visual Fortran Compiler XE 14.0.0.103. 

I employ the pardiso function in a program that involves the solution of a system of linear equations: [A]{x} = {b}, where [A], {b} are given and we want to find the vector {x}. My code has to solve this system of equations repeatedly (for many iterations), for different values of [A], {b}. The values of [A], {b} at each iteration depend on the value of {x} from the previous iteration, so the algorithm is something like this:

Initialize {x}

do i = 1,Niter

[Find [A], {b}, given {x}]

[Solve [A]{x} = {b} and find updated {x}]

end do !i

My code has encountered a SERIOUS problem with memory management. Specifically, I see a continuous increase in the amount of memory used, until my computer runs out of memory and the program aborts. I do not have any dynamic memory allocation in my code (I do not use pointers), so I believe that the problem is due to mkl. 

I tried to include the line:

call mkl_free_buffers()

but this did not help in any way. I found some posts in this forum with similar comments, but I did not find anything helpful. Any help on this issue would be greatly appreciated!!
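For reference, a minimal sketch of the intended call pattern (C interface; an illustration only, not the original code). The key point is that a single handle pt is created before the loop and released exactly once with phase = -1 after it; if a fresh handle were initialized on every iteration without a phase = -1 release, the factorization memory of earlier iterations would never be freed, which would match the growth described above.

#include "mkl_pardiso.h"
#include "mkl_types.h"

/* Sketch: repeated solves with one PARDISO handle. ia/ja/a hold the matrix in CSR
   form with Fortran-style 1-based indices, b is the right-hand side, x the solution.
   The matrix type is an assumption here. */
void iterate(MKL_INT n, MKL_INT *ia, MKL_INT *ja, double *a, double *b, double *x,
             int niter)
{
    void   *pt[64] = {0};                       /* created once, reused every iteration */
    MKL_INT iparm[64] = {0};                    /* iparm[0] = 0: use solver defaults    */
    MKL_INT maxfct = 1, mnum = 1, mtype = 11;   /* 11 = real nonsymmetric (assumption)  */
    MKL_INT nrhs = 1, msglvl = 0, error = 0, idum = 0, phase;

    for (int it = 0; it < niter; it++) {
        /* ... update [A] and {b} here from the previous {x} ... */

        phase = 13;  /* analysis + factorization + solve in one call */
        PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                &idum, &nrhs, iparm, &msglvl, b, x, &error);
    }

    phase = -1;      /* release all internal PARDISO memory exactly once */
    PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, b, x, &error);
}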

zgetrf/zgetri cannot invert matrix


Using code that worked with previous versions of MKL (Intel 12.1, OpenMPI 1.4.4, MKL 10.2.5.035 on CentOS 6), zgetrf/zgetri cannot invert a matrix that it previously inverted.

Could the newer intel modules be changing how the arguments are accepted for lapack functions?

The pivot index is determined by the output of zgetrf( .. , ipvt , ..) which is then fed into zgetri as input similarly. I believe this is the way to do it.

I have just tested a sample program that only does the zgetrf and zgetri portions and I believe now that this is where the problem lies. I have tested two sample programs, one with a relevant 8x8 matrix that I calculate in my program, and one made-up example 8x8 matrix that I checked has an inverse.

Both programs run fine using the old modules, compiling with mpif90 etc. as I do now when I am taking measurements. I get no runtime errors or error outputs from the zgetrf and zgetri subroutines. However, when using the new modules (intel2018), I do get an error output from the zgetri and zgetrf subroutines for both sample matrices. This error output, labeled "INFO" by default, returns a value of 1. To save you some searching, INFO should return 0 if successful (zgetrf factorizes the matrix to be inverted, zgetri actually inverts it). However, if INFO = i > 0, then U(i,i) = 0 and the matrix is singular. This means that something I'm doing with the new modules is causing a verifiably invertible matrix to be considered non-invertible, or singular.

Now, how this causes a segfault in the program at large I'm not 100% sure; it doesn't do so in my small sample program. However, it seems very likely that, since I do not suppress the output of the subroutines when INFO returns something other than a successful exit value, the output of zgetri in this case is garbage that eventually causes a segfault when it gets multiplied by other matrices and passed to the zgetrs subroutine later on. I can verify from my small tests that the output of zgetri when INFO=1 is nonsense.

So to recap, I have found that using the new modules causes zgetri and zgetrf subroutines to return an error for the same, invertible matrices. This problem may also occur for the subroutine zgetrs, but I haven't tested it. In any case, the return value after the error then probably causes the segfault. The question remains, why is zgetrf/zgetri not working with the new modules?
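For reference, a minimal standalone C sketch of the INFO convention described above (an illustration only, not the program from this thread; note that MKL_INT must match the LP64/ILP64 interface selected at link time):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    MKL_INT n = 2, lda = 2, lwork = 2, info = 0;
    MKL_INT ipiv[2];
    MKL_Complex16 work[2];
    /* column-major 2x2 matrix [[1,2],[3,4]], which is invertible */
    MKL_Complex16 a[4] = { {1,0}, {3,0}, {2,0}, {4,0} };

    zgetrf(&n, &n, a, &lda, ipiv, &info);
    printf("zgetrf info = %lld\n", (long long)info);   /* 0 = success, i > 0 => U(i,i) = 0 */
    if (info == 0) {
        zgetri(&n, a, &lda, ipiv, work, &lwork, &info);
        printf("zgetri info = %lld\n", (long long)info);
    }
    return 0;
}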

 

Here is a sample program (pgrmcheck_cond_zgetri.f90) that reproduces the problem:

! placeholder
!mpiifort -o pgrmcheck.x -O3 pgrmcheck_cond_zgetrf.f90 -i8
!-I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include
!${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -L${MKLROOT}/lib/intel64
!-lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core
!-lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

        PROGRAM dyncond_check
        IMPLICIT NONE
        include 'mpif.h'

        complex,DIMENSION(8) :: rightvec,work
        integer,dimension(8) ::ipvt
        complex,DIMENSION(8,8) :: tmatrixN1
        integer:: error1,error2,i,j

        tmatrixN1=0

        do i=1,8
         do j=1,8

         if (j<(i+1)) tmatrixN1(i,j)=i+(j-1)*8
         enddo
        enddo

        call mkl_set_num_threads(1)

        call zgetrf(8,8,tmatrixN1,8,ipvt,error1)
        print *,error1
        call zgetri(8,tmatrixN1,8,ipvt,work,8,error2)
        print *,error2
        do i=1,8
         do j=1,8
!                print *,tmatrixN1(i,j)
         enddo
        enddo

end program

 

Compiled with:

### Command to compile for intel 2018 modules (Not functioning) ###

module load intel/cluster/2018
mpiifort -o pgrmcheck2.x -O3 pgrmcheck_cond_zgetri.f90 -i8 -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl
./pgrmcheck2.x

###

### Command to compile for (old) intel 12.1 modules (functioning) ###

module load intel/12.1 ompi/1.4.4/intel mkl/10.2.5.035
mpif90 -o pgrmcheck2.x -O3 -r8 pgrmcheck_cond_zgetri.f90 -L/soft/intel/mkl/10.2.1.017/lib/em64t -lmkl_lapack -lmkl_intel_thread -lmkl_core -lguide -lmkl_intel_lp64
./pgrmcheck2.x

###

Thanks!

 

NNZ coefficients in DSS/PARDISO LU factor


Hi,

I was wondering whether there is any way to obtain the number of non-zero coefficients of the matrix factor generated by the MKL function mkl_dss_real? I checked the MKL manual but there seems to be no way to get this number.

Background:

I want to multiply a vector v by a matrix K^-1, b = K^-1 v, where K is sparse and K^-1 cannot be constructed explicitly. A way to obtain b = K^-1 v is to solve the system Kb = v, for which I use the mkl_dss solver. In this particular setting K is of dimension 2.5 million x 2.5 million, is symmetric and positive definite, and has 14 million NNZ coefficients. I understand that the DSS solver uses an LU factorization followed by forward/backward substitution for solving. I also understand that the cost of forward/backward substitution is proportional to the number of non-zeros in the factor (roughly 2 operations per non-zero). Given the number of NNZ coefficients in K, I could make a rough approximation of the number of floating point operations in the routine "mkl_dss_solve" and the associated processing time. However, "mkl_dss_solve" needed much more (~100x) processing time. Currently the only explanation for this observation is that the number of NNZ coefficients in L/U must be much larger than in K.
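For reference, the PARDISO solver that underlies DSS reports the number of non-zeros in the factors after its reordering/symbolic phase via iparm[17]. A minimal C sketch (an illustration only), assuming the matrix is available in 1-based CSR arrays:

#include <stdio.h>
#include "mkl_pardiso.h"
#include "mkl_types.h"

/* Sketch: run only the reordering/symbolic phase (phase = 11) on the CSR matrix and
   read the reported number of non-zeros in the factors from iparm[17].
   ia/ja/a are assumed to be 1-based CSR arrays, n the matrix dimension. */
void report_factor_nnz(MKL_INT n, MKL_INT *ia, MKL_INT *ja, double *a)
{
    void   *pt[64] = {0};
    MKL_INT iparm[64] = {0};
    MKL_INT maxfct = 1, mnum = 1, mtype = 2;   /* 2 = real symmetric positive definite */
    MKL_INT phase = 11, nrhs = 1, msglvl = 0, error = 0, idum = 0;
    double  ddum = 0.0;

    iparm[0]  = 1;    /* do not use solver defaults   */
    iparm[1]  = 2;    /* METIS fill-in reordering     */
    iparm[17] = -1;   /* request nnz in the factors   */
    iparm[18] = -1;   /* request factorization Mflops */

    PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);

    printf("Number of non-zeros in factors = %lld\n", (long long)iparm[17]);

    phase = -1;       /* release internal memory */
    PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
}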

Any suggestions are welcomed.

Thanks

potential bug in mkl configuration in version 2018 update 2


I installed Intel® Parallel Studio XE with VS 2017 integration on a Win 10 platform.

The parallel MKL option in the Properties page works as expected. But it seems the sequential option sometimes fails to build with the MSVC compiler: linking errors about not finding certain libraries (for example, libmmt.lib or ifconsol.lib) are generated here and there.

I know these libraries are located under the windows/compiler/lib/intel64_win folder on my machine. Manually adding this path to the additional library paths is a workaround, but I am keen to know what makes the difference between the sequential and parallel configurations.

Then I investigated the toolset file Intel.Libs.MKL.v141.props. This file is located under this folder on my machine

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Platforms\x64\PlatformToolsets\v141\ImportAfter

Starting from line 44,

<MKLProductDir>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Intel\Suites\$(_MKLSubKey)\MKL', 'ProductDir', null, RegistryView.Registry32))</MKLProductDir>
    <MKLIncludeDir>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Intel\Suites\$(_MKLSubKey)\MKL\$(ICPlatform)', 'IncludeDir', null, RegistryView.Registry32))</MKLIncludeDir>
    <MKLLibDir>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Intel\Suites\$(_MKLSubKey)\MKL\$(ICPlatform)', 'LibDir', null, RegistryView.Registry32))</MKLLibDir>
    <OmpLibDir>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Intel\Suites\$(_MKLSubKey)\MKL\$(ICPlatform)', 'OMPLibDir', null, RegistryView.Registry32))</OmpLibDir>

    <_MKLCombinedPath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No'">$([System.IO.Path]::Combine($(MKLLibDir), ..\..\..\redist\$(IntelPlatform)\mkl))</_MKLCombinedPath>
    <MKLPath Condition="'$(_MKLCombinedPath)' !=''">$([System.IO.Path]::GetFullPath($(_MKLCombinedPath)));</MKLPath>

    <LibraryPath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No' AND '$(UseEnv)' != 'true'">$(MKLLibDir);$(LibraryPath)</LibraryPath>
    <LibraryPath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No' AND '$(UseIntelMKL)' != 'Sequential' AND '$(UseEnv)' != 'true'">$(OmpLibDir);$(LibraryPath)</LibraryPath>
    <IncludePath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No' AND !$(MKLIncludeDir.Contains('/I')) AND '$(UseEnv)' != 'true'">$(MKLIncludeDir);$(IncludePath)</IncludePath>

The paths are retrieved from the registry. I echoed the related variables and the results are:

echo $(MKLLibDir)
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.2.185\windows\mkl\lib\intel64_win

echo $(OmpLibDir)
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.2.185\windows\compiler\lib\intel64_win

So in the sequential configuration only MKLLibDir is added to LibraryPath, while the parallel option further adds OmpLibDir to it. This missing path causes the linking errors in the sequential configuration for some projects.

Now I can rewrite this condition as follows

<!--     <LibraryPath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No' AND '$(UseEnv)' != 'true'">$(MKLLibDir);$(LibraryPath)</LibraryPath>
    <LibraryPath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No' AND '$(UseIntelMKL)' != 'Sequential' AND '$(UseEnv)' != 'true'">$(OmpLibDir);$(LibraryPath)</LibraryPath> -->
    <LibraryPath Condition="'$(UseIntelMKL)' != '' AND '$(UseIntelMKL)' != 'No' AND '$(UseEnv)' != 'true'">$(OmpLibDir);$(MKLLibDir);$(LibraryPath)</LibraryPath>

to include all paths into LibraryPath as long as MKL is switched on.

No problem is found for the Intel compiler toolset: in that case all paths are added when combining ICIncludeDir with LibraryPath, which happens in the following file on my machine:

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Platforms\x64\PlatformToolsets\Intel C++ Compiler 18.0\Toolset.props

Problems

Although I can fix this by myself, I still think that the compiler lib path should be included in any case, configured in a fashion similar to what is done in the other toolsets.

In previous versions I did not run into a similar problem. Since this may break other people's projects, it should be fixed, am I right?

Bandwidth of matrix


Hi,

I'm using the DSS interface for solving a large linear set of equations. In order to do some analytical work on the performance of my implementation I would like to know the bandwidth of the system matrix. Is there a way to determine the bandwidth of the system matrix from the internal data structure? I also don't see how to use the indices of the non-zero entries to calculate the bandwidth, as these are the indices of the un-reordered system matrix.
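For reference, the half-bandwidth of the matrix in its original ordering can be computed directly from the CSR arrays; a minimal C sketch (an illustration only; note that this is the bandwidth before any fill-reducing reordering done internally by DSS, which is the part that remains unclear):

#include "mkl_types.h"

/* Half-bandwidth of an n x n matrix stored in zero-based CSR arrays (ia, ja):
   the largest distance of a non-zero entry from the diagonal. */
MKL_INT csr_half_bandwidth(MKL_INT n, const MKL_INT *ia, const MKL_INT *ja)
{
    MKL_INT bw = 0;
    for (MKL_INT i = 0; i < n; i++)
        for (MKL_INT k = ia[i]; k < ia[i + 1]; k++) {
            MKL_INT d = ja[k] > i ? ja[k] - i : i - ja[k];   /* |column - row| */
            if (d > bw) bw = d;
        }
    return bw;   /* full bandwidth = 2*bw + 1 for a symmetric pattern */
}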

Any help would be greatly appreciated. Many thanks in advance.

Regards,

Wouter

ScaLAPACK: pzheev error 706: (eigenvalue computation) needs MB=NB


UPDATE: sorry, my mistake, it IS in the documentation:

*  Alignment requirements
*  ======================
*
*  The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1)
*  must verify some alignment properties, namely the following
*  expressions should be true:
*
*  ( MB_A.EQ.NB_A.EQ.MB_Z .AND. IROFFA.EQ.IROFFZ .AND. IROFFA.EQ.0 .AND.
*    IAROW.EQ.IZROW )
*  where
*  IROFFA = MOD( IA-1, MB_A ) and ICOFFA = MOD( JA-1, NB_A ).

 

---------------------------------------

The following is missing in the documentation of pzheev and one has to look into the source code in order to understand the error:

The blocking factors of the matrix to be diagonalized must be equal; otherwise one obtains an error with INFO = -706 (an error in the 6th element of the DESCA array). This error originates from the following lines in pzheev.f (line 446 in the version that I have):

           ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN
              INFO = -( 700+NB_ )
           END IF

This message is posted to help others with the same problem and/or the people who maintain the documentation of pzheev.
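For reference, a tiny C sketch of the corresponding guard in user code (an illustration; desca is the zero-based array descriptor, so desca[4] and desca[5] are MB_A and NB_A, i.e. Fortran desca(5) and desca(6)):

#include "mkl_types.h"

/* pzheev requires equal row and column blocking factors; otherwise it returns
   INFO = -706 as quoted above. */
int pzheev_blocking_ok(const MKL_INT desca[9])
{
    return desca[4] == desca[5];   /* MB_A == NB_A */
}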

 

MKL Operations in TensorFlow


 

Hi all,

I run a TensorFlow model Inception with Intel MKL-DNN support. 

The execution time of the MKL operations (e.g. _MklConv2DBackpropFilter, _MklConv2D, _MklConv2DBackpropInput, etc.) does not change when I change the number of intra-op threads (i.e., the number of threads running one operation). The results are below. Does anyone know why the performance of these MKL operations does not change when the number of threads running an operation changes? Or can anyone explain how the MKL-DNN support in TensorFlow is implemented for this situation? Thank you!

Name                        Intra-threads    Time
_MklConv2DBackpropFilter          8           470
                                 17           466
                                 34           468
                                 68           467
_MklConv2DBackpropInput           8           354
                                 17           344
                                 34           347
                                 68           347
_MklConv2D                        8           300
                                 17           304
                                 34           311
                                 68           311

Kevin


Intel® MKL version 2018 Update 3 is now available


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance.

Intel MKL 2018 Update 3 packages are now ready for download.

Intel MKL is available as part of the Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

For what's new in Intel MKL 2018 and in MKL 2018 Update 3, follow this link: https://software.intel.com/en-us/articles/intel-math-kernel-library-release-notes-and-new-features

 

Intel ode solvers


Hello. Could you tell me why Intel has dropped support and further development of the Intel ODE Solvers library?

apt repository broken


Dear all,

I get the following errors in Ubuntu 18.04 when running

wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS...
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list
# The previous three work without problems, but the following line has the issue:
apt-get update && apt-get install -y --no-install-recommends intel-mkl-64bit-2018.3

 

Hit:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Get:4 https://apt.repos.intel.com/intelpython binary/ InRelease [1403 B]
Get:5 https://apt.repos.intel.com/mkl all InRelease [4422 B]
Hit:6 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Get:7 https://apt.repos.intel.com/ipp all InRelease [4416 B]
Get:8 https://apt.repos.intel.com/tbb all InRelease [4400 B]
Get:9 https://apt.repos.intel.com/daal all InRelease [4414 B]
Get:10 https://apt.repos.intel.com/mpi all InRelease [4358 B]
Get:11 https://apt.repos.intel.com/intelpython binary/ Packages [1853 B]
Get:12 https://apt.repos.intel.com/mkl all/main amd64 Packages [39.3 kB]
Err:12 https://apt.repos.intel.com/mkl all/main amd64 Packages
  File has unexpected size (45690 != 39339). Mirror sync in progress? [IP: 23.67.128.143 443]
  Hashes of expected file:
   - Filesize:39339 [weak]
   - SHA512:6c55f783460cb6b09e2006fd85ad9c10d9abb0226de4f96d2c2ee3558ce3b593ddaa470170dfde7187271bbeae2f843e5bacce4095f7cb2ac0adac853096f99b
   - SHA256:0ecd4a8ec38322c936789aefd423fd0c562aa2088a0baa20b0cabc7824f9032e
   - SHA1:aa20c4f4f9944676a1421c22b794cde84b054ee2 [weak]
   - MD5Sum:439ae5cfb5a4584eeaec01f13f2acc63 [weak]
  Release file created at: Thu, 15 Mar 2018 11:31:06 +0000
Get:13 https://apt.repos.intel.com/mkl all/main all Packages [13.0 kB]
Err:13 https://apt.repos.intel.com/mkl all/main all Packages

Get:14 https://apt.repos.intel.com/ipp all/main all Packages [6740 B]
Get:15 https://apt.repos.intel.com/ipp all/main amd64 Packages [18.0 kB]
Get:16 https://apt.repos.intel.com/tbb all/main all Packages [5967 B]
Get:17 https://apt.repos.intel.com/tbb all/main amd64 Packages [8200 B]
Get:18 https://apt.repos.intel.com/daal all/main all Packages [6028 B]
Get:19 https://apt.repos.intel.com/daal all/main amd64 Packages [10.5 kB]
Get:20 https://apt.repos.intel.com/mpi all/main all Packages [791 B]
Get:21 https://apt.repos.intel.com/mpi all/main amd64 Packages [1725 B]
Fetched 83.3 kB in 2s (53.7 kB/s)
Reading package lists... Done
W: Conflicting distribution: https://apt.repos.intel.com/intelpython binary/ InRelease (expected binary/ but got )
E: Failed to fetch https://apt.repos.intel.com/mkl/dists/all/main/binary-amd64/Packages.gz  File has unexpected size (45690 != 39339). Mirror sync in progress? [IP: 23.67.128.143 443]
   Hashes of expected file:
    - Filesize:39339 [weak]
    - SHA512:6c55f783460cb6b09e2006fd85ad9c10d9abb0226de4f96d2c2ee3558ce3b593ddaa470170dfde7187271bbeae2f843e5bacce4095f7cb2ac0adac853096f99b
    - SHA256:0ecd4a8ec38322c936789aefd423fd0c562aa2088a0baa20b0cabc7824f9032e
    - SHA1:aa20c4f4f9944676a1421c22b794cde84b054ee2 [weak]
    - MD5Sum:439ae5cfb5a4584eeaec01f13f2acc63 [weak]
   Release file created at: Thu, 15 Mar 2018 11:31:06 +0000
E: Failed to fetch https://apt.repos.intel.com/mkl/dists/all/main/binary-all/Packages.gz
E: Some index files failed to download. They have been ignored, or old ones used instead.

It worked multiple times on Sunday, but that was still 2018.2, and it worked once today for 2018.3, but failed about 5 times today already. It appears as if one mirror is ok and another is in a broken state. Can anyone confirm this? Any ideas how to resolve this?

Thanks and best regards,

Jonathan

possible bug in 18.0.2 mkl


Hi there,

the following code yields a segfault at run time:

Program Test
  use lapack95
  Implicit none
  Real*8, allocatable :: x(:,:)
  Integer*8 :: ISError
  if(allocated(x)) Then
    call DPOTRF_F95(A=x,UPLO="U",INFO=ISError)
  End if
End Program Test

Compiler flags were:

ifort -i8 -warn nounused -warn declarations -O0 -check all -warn interface -check noarg_temp_created -static -c -o NoOMP_MKLSEQ_ifort_4.16.12-1-ARCH/Test.o Test.f90 -I /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/include/intel64/ilp64

linker flags were:

ifort -i8 -warn nounused -warn declarations -O0 -check all -warn interface -check noarg_temp_created -static -o Test_NoOMP_MKLSEQ_4.16.12-1-ARCH NoOMP_MKLSEQ_ifort_4.16.12-1-ARCH/Test.o    /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_blas95_ilp64.a /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_lapack95_ilp64.a -Wl,--start-group /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group -lpthread -lm -ldl

Runtime output was:

user@linux:~/./Test_NoOMP_MKLSEQ_4.16.12-1-ARCH
Segmentation fault (core dumped)

ifort / Parallel Studio version: 18.0.2, Linux kernel: 4.16.12

It does run with 17.0 update 7.

Is this a bug, and if so, is it fixed in 18.0.3?

Thanks

Cheers

Karl

Pardiso - storing factorized array


Hello, I am using pardiso through a subroutine in a program (made with Intel Visual Fortran). I call pardiso with phase=12, then I call it with phase=33 (to obtain the solution). There are some cases where I have to use pardiso again while the coefficient array has not changed from the previous call. In such cases, I do not need to conduct the factorization from scratch, and I would like to use pardiso through my subroutine only once, for phase=33, as the factorization of the coefficient array has already been done in previous calls. I wanted to ask if this is doable, and, if so, how I can ensure that the factorization of my sparse coefficient array is kept stored for future calls to pardiso.
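For reference, a minimal sketch of the reuse pattern (C interface, as an illustration only). As long as the same handle pt is passed and phase = -1 has not been called, the factorization produced by an earlier phase = 12 call stays in PARDISO's internal memory, so later calls can go straight to phase = 33:

#include "mkl_pardiso.h"
#include "mkl_types.h"

/* pt and iparm are assumed to have been set up as usual before the first call;
   ia/ja/a hold the unchanged coefficient matrix in CSR form, b the right-hand side,
   x the solution. */
void factor_once_solve_many(void *pt[64], MKL_INT iparm[64], MKL_INT mtype,
                            MKL_INT n, double *a, MKL_INT *ia, MKL_INT *ja,
                            double *b, double *x, int nsolves)
{
    MKL_INT maxfct = 1, mnum = 1, nrhs = 1, msglvl = 0, error = 0, idum = 0;

    MKL_INT phase = 12;   /* analysis + numerical factorization, done once */
    PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
            &idum, &nrhs, iparm, &msglvl, b, x, &error);

    phase = 33;           /* as many solves as needed, no refactorization */
    for (int k = 0; k < nsolves; k++)
        PARDISO(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                &idum, &nrhs, iparm, &msglvl, b, x, &error);

    /* call phase = -1 (release) only when the factorization is no longer needed */
}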

Numpy and Scipy with MKL: installation troubles


Hi,

I am currently trying to build NumPy and SciPy with MKL on Ubuntu 18.04 following your tutorial (but not using the Intel compilers). It seems that everything works for NumPy. Here is the output of

np.__config__.show()

 

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/intel/compilers_and_libraries_2018/linux/mkl/include']

 

but when trying to build SciPy, my system cannot find multiarray, which seems to be because it cannot find

libmkl_rt.so

I don't know how to work around that. I have tried following this tutorial too https://www.elliottforney.com/blog/npspmkl/

 

Any hints would be appreciated!

 

Thanks in advance

Default value for mkl_set_threading_layer() in SDL?


 

I am using MKL with SDL (mkl_rt.lib). I know I need to use mkl_set_threading_layer() to set the threading layer in my code.

My question is, if I don't call mkl_set_threading_layer(), is there a default value for this function? In other words, is there a default library that MKL will pick at run time?

Through my testing (on a Windows machine), it seems that MKL will use MKL_THREADING_INTEL as the default if I don't call mkl_set_threading_layer(). I hope someone from Intel can confirm this.

The reason I am experimenting with this is that we have, in our programs, different components that load and use MKL independently. We wish to use different threading layers (sequential/TBB/Intel OpenMP) for different components. We hope to come up with a scheme where, within a component, if mkl_set_threading_layer() is not called, MKL uses the OpenMP-threaded library (MKL_THREADING_INTEL) by default, while in other components mkl_set_threading_layer() is called when a specific library is to be used.
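For what it's worth, a minimal sketch of setting the layer explicitly per component instead of relying on the run-time default (an illustration of the API only; whether the layer can really differ between components that share one process is exactly the open question above, and the MKL_THREADING_LAYER environment variable can also override whatever default is built in):

#include "mkl.h"

/* Each component calls its own initializer before its first MKL call. */
void init_component_sequential(void) {
    mkl_set_threading_layer(MKL_THREADING_SEQUENTIAL);  /* run MKL sequentially here */
}

void init_component_tbb(void) {
    mkl_set_threading_layer(MKL_THREADING_TBB);         /* TBB-threaded layer */
}

void init_component_openmp(void) {
    mkl_set_threading_layer(MKL_THREADING_INTEL);       /* OpenMP-threaded layer, set explicitly */
}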

Hope someone can help to confirm or provide some suggestions. Thank you.

Ling


Pardiso Low rank Update does not accelerate the decomposition


Hello, I have a problem: the PARDISO low-rank update does not accelerate the decomposition. The usual PARDISO decomposition is faster than the decomposition with the low-rank update.

I have a complex symmetric matrix with about 2 million non-zeros in the factor.

Only 400 elements of the matrix were changed; the structure of the matrix was not changed.

Initialization:

	/* -------------------------------------------------------------------- */
	/* .. Setup Pardiso control parameters. */
	/* -------------------------------------------------------------------- */
	mtype = 6; /* Complex symmetric matrix */
	for (i = 0; i < 64; i++)
	{
		iparm[i] = 0;
	}
	iparm[0] = 1; // No solver default */
				  //iparm[1] = 2; // Fill-in reordering from METIS
	iparm[1] = 2;   // Fill-in reordering from METIS (iparm[1] = 3 would select the parallel (OpenMP) nested dissection)

					// Numbers of processors, value of OMP_NUM_THREADS
	iparm[2] = 0;
	iparm[3] = 0; // No iterative-direct algorithm 
	iparm[4] = 0; // No user fill-in reducing permutation 
	iparm[5] = 0; // Write solution into x 
	iparm[6] = 0; // Not in use 
	iparm[7] = 0; // Max numbers of iterative refinement steps 
	iparm[8] = 0; // Not in use 
	iparm[9] = 8; // Perturb the pivot elements with 1E-8
	iparm[10] = 0; // Use nonsymmetric permutation and scaling MPS 
	iparm[11] = 0; // Not in use 
	iparm[12] = 0; // Maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy 
	iparm[13] = 0; // Output: Number of perturbed pivots 
	iparm[14] = 0; // Not in use 
	iparm[15] = 0; // Not in use 
	iparm[16] = 0; // Not in use 
	iparm[17] = -1; // Output: Number of nonzeros in the factor LU 
	iparm[18] = -1; // Output: Mflops for LU factorization 
	iparm[19] = 0; // Output: Numbers of CG Iterations 
	iparm[20] = 1;   // Apply 1x1 and 2x2 Bunch and Kaufman pivoting during the factorization process
	iparm[23] = 10;  // PARDISO uses new two - level factorization algorithm
	iparm[24] = 2; //Parallel forward/backward solve control. Intel MKL PARDISO uses a parallel algorithm for the solve step.
	iparm[26] = 1;
	iparm[30] = 0; // Partial solution
	iparm[34] = 1; //zero-based index


	maxfct = 1; // Maximum number of numerical factorizations. 
	mnum = 1; // Which factorization to use. 
	msglvl = 0; // Print statistical information in file 
	error = 0; // Initialize error flag 
			   /* -------------------------------------------------------------------- */
			   /* .. Initialize the internal solver memory pointer. This is only */
			   /* necessary for the FIRST call of the PARDISO solver. */
			   /* -------------------------------------------------------------------- */
	for (i = 0; i < 64; i++)
	{
		pt[i] = 0;
	}

	/* -------------------------------------------------------------------- */
	/* .. Reordering and Symbolic Factorization. This step also allocates */
	/* all memory that is necessary for the factorization. */
	/* -------------------------------------------------------------------- */

	phase = 11;
	PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
		&nRows, complexValues, rowIndex, columns, &idum, &nRhs,
		iparm, &msglvl, &ddum, &ddum, &error);

	printf("\nReordering completed ...\n");
	printf("Number of nonzeros in factors = %d\n", iparm[17]);
	printf("Number of factorization MFLOPS = %d\n\n", iparm[18]);

	if (error != 0)
	{
		printf("\nERROR during symbolic factorization: %d", error);
		exit(1);
	}

Then, decompose with original matrix:
 

	// -------------------------------------------------------------------- 
	// .. Numerical factorization. 
	// -------------------------------------------------------------------- 

	phase = 22;
	PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
		&nRows, complexValues, rowIndex, columns, &idum, &nRhs,
		iparm, &msglvl, &ddum, &ddum, &error);

	if (error != 0)
	{
		printf("\nERROR during numerical factorization: %d", error);
		exit(2);
	}

And after that, after solving the system, decompose again with the changed elements in the matrix:
 

	// -------------------------------------------------------------------- 
	// .. Numerical factorization. 
	// -------------------------------------------------------------------- 

	phase = 22;
	iparm[38] = 1;

	PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
		&nRows, complexValues, rowIndex, columns, perm, &nRhs,
		iparm, &msglvl, &ddum, &ddum, &error);

	if (error != 0)
	{
		printf("\nERROR during numerical factorization: %d", error);
		exit(2);
	}

	iparm[38] = 0;

The vector perm contains the row and column indices of the changed elements in the whole matrix. But since I have a complex symmetric matrix, should I build the vector perm only with the changed elements in the upper triangle?

FGMRES ILU, Diagonal and RCI iterative solvers


I am trying to put together an FGMRES iterative sparse solver routine that uses one of the examples given by Intel (I have tried several techniques: the ILU preconditioner, the diagonal preconditioner, and the plain RCI, Reverse Communication Interface; the code is written in C++).

I am having some trouble with the results of all the solver types above; the returned solution does not seem to contain the correct results, and I want to rule out a problem in how I am applying the preconditioner or the RCI.

My sparse matrix is in compressed row format with a column index array (ja), a row pointer array (ia), and values (ar), and the right-hand side is in the array r_rhs, as shown in the MKL documentation. The indices in ja and ia are in the Fortran format, so they start with 1. m_dResidualValue is the tolerance (1.0e-7 in the default case), n is the number of rows, and nz is the number of non-zeros.

Please find below the code used with the ILU preconditioner; most of the code is from the FGMRES with ILU preconditioner example, with a few small modifications. Does this code look like the correct way to apply a preconditioner during the FGMRES solve? I have included the entire iterative solve method I'm attempting for reference (again, most of the code here is directly from the FGMRES solver examples, so much of it should be correct). Obviously, portions of this code need to be cleaned up, but please bear with me as this is a first attempt at getting the solver to work.

 

int FGMRES_Iterative_Solver_ILU_Precond(CString path)
{

	int bSucceeded = 1;


	/*---------------------------------------------------------------------------
	/* Allocate storage for the ?par parameters and the solution/r_rhs/residual vectors
	/*---------------------------------------------------------------------------*/
	double *b, *computed_solution, *residual;
	double *trvec, *bilu0, *dpar, *tmp; //trvec[*n],bilu0[*nz];
	MKL_INT *ipar, i;
	// allocate memory

	b = (double *)calloc(n, sizeof(double));
	computed_solution = (double *)calloc(n, sizeof(double));
	residual = (double *)calloc(n, sizeof(double));

	trvec = (double *)calloc(n, sizeof(double));
	bilu0 = (double *)calloc(nz, sizeof(double));
	ipar = (MKL_INT *)calloc(128, sizeof(MKL_INT));

	unsigned int iSize = n;
	MKL_INT sizetmp = (n)*(2 * (n)+1) + ((n)*((n)+9)) / 2 + 1;
	printf("sizetmpi= %d, sizeof(ia)= %d\n", sizetmp, sizeof(ia));

	//double dpar[size];
	// double tmp[sizetmp];
	dpar = (double *)calloc(128, sizeof(double));
	tmp = (double *)calloc(sizetmp, sizeof(double));
	printf("n=%d, nz= %d\n", n, nz);
	double tol = 1.0e-6;
	MKL_INT matsize = nz, incx = 1;
	double nrm2;

	//for(i=0; i<=*n; i++) --ia[i];
	//for(i=0; i<*nz; i++) --ja[i];

	/*---------------------------------------------------------------------------
	/* Some additional variables to use with the (P)FGMRES solver
	/*---------------------------------------------------------------------------*/
	MKL_INT itercount, ierr = 0;
	MKL_INT RCI_request, ivar;
	printf("LinearSolver: Line 48\n");
	double dvar;
	char cvar, cvar1, cvar2;


	/*---------------------------------------------------------------------------
	/* Initialize variables and the right hand side through matrix-vector product
	/*---------------------------------------------------------------------------*/
	ivar = n;
	cvar = 'n';
	/*---------------------------------------------------------------------------
	/* Save the right-hand side in vector b for future use
	/*---------------------------------------------------------------------------*/
	i = 1;
	dcopy(&n, r_rhs, &i, b, &i);

	//clbas_dcopy(ivar, r_rhs, i, b, i);
	/*---------------------------------------------------------------------------
	/* Initialize the initial guess
	/*---------------------------------------------------------------------------*/
	for (i = 0; i < n; i++)
		computed_solution[i] = 0.0;
	//computed_solution[0] = 1.0;


	/*---------------------------------------------------------------------------
	/* Initialize the solver
	/*---------------------------------------------------------------------------*/
	printf("dfgmres_init\n");
	dfgmres_init(&ivar, computed_solution, r_rhs, &RCI_request, ipar, dpar, tmp);
	printf("dfgmres_end\n");
	if (RCI_request != 0)
		goto FAILED;

	/*---------------------------------------------------------------------------
	/* Calculate ILU0 preconditioner.
	/*                      !ATTENTION!
	/* DCSRILU0 routine uses some IPAR, DPAR set by DFGMRES_INIT routine.
	/* Important for DCSRILU0 default entries set by DFGMRES_INIT are
	/* ipar[1] = 6 - output of error messages to the screen,
	/* ipar[5] = 1 - allow output of errors,
	/* ipar[30]= 0 - abort DCSRILU0 calculations if routine meets zero diagonal element.
	/*
	/* If ILU0 is going to be used out of MKL FGMRES context, than the values
	/* of ipar[1], ipar[5], ipar[30], dpar[30], and dpar[31] should be user
	/* provided before the DCSRILU0 routine call.
	/*
	/* In this example, specific for DCSRILU0 entries are set in turn:
	/* ipar[30]= 1 - change small diagonal value to that given by dpar[31],
	/* dpar[30]= 1.E-20 instead of the default value set by DFGMRES_INIT.
	/*                  It is a small value to compare a diagonal entry with it.
	/* dpar[31]= 1.E-16 instead of the default value set by DFGMRES_INIT.
	/*                  It is the target value of the diagonal value if it is
	/*                  small as compared to dpar[30] and the routine should change
	/*                  it rather than abort DCSRILU0 calculations.
	/*---------------------------------------------------------------------------*/

	//ipar[30] = 1;
	//dpar[30] = 1.E-50;
	//dpar[31] = 1.E-50;

	printf("ilu begin\n");
	dcsrilu0(&n, &ar[0], &ia[0], &ja[0], &bilu0[0], &ipar[0], &dpar[0], &ierr);
	printf("ilu end\n");

	nrm2 = dnrm2(&matsize, bilu0, &incx);

	if (ierr != 0)
	{
		printf("Preconditioner dcsrilu0 has returned the ERROR code %d", ierr);
		goto FAILED1;
	}

	/*---------------------------------------------------------------------------
	/* Set the desired parameters:
	/* do the restart after 2 iterations
	/* LOGICAL parameters:
	/* do not do the stopping test for the maximal number of iterations
	/* do the Preconditioned iterations of FGMRES method
	/* Set parameter ipar[10] for preconditioner call. For this example,
	/* it reduces the number of iterations.
	/* DOUBLE PRECISION parameters
	/* set the relative tolerance to 1.0D-3 instead of default value 1.0D-6
	/* NOTE. Preconditioner may increase the number of iterations for an
	/* arbitrary case of the system and initial guess and even ruin the
	/* convergence. It is user's responsibility to use a suitable preconditioner
	/* and to apply it skillfully.
	/*---------------------------------------------------------------------------*/
	ipar[14] = 2;
	ipar[7] = 0;
	ipar[10] = 1;
	dpar[0] = m_dResidualValue;

	/*---------------------------------------------------------------------------
	/* Check the correctness and consistency of the newly set parameters
	/*---------------------------------------------------------------------------*/
	printf("dfgmres check begin\n");
	dfgmres_check(&ivar, computed_solution, r_rhs, &RCI_request, ipar, dpar, tmp);
	if (RCI_request != 0) goto FAILED;
	printf("dfgmres check end\n");

	/*---------------------------------------------------------------------------
	/* Compute the solution by RCI (P)FGMRES solver with preconditioning
	/* Reverse Communication starts here
	/*---------------------------------------------------------------------------*/
	printf("dfgmres begin to solve\n");
ONE:  dfgmres(&ivar, computed_solution, r_rhs, &RCI_request, ipar, dpar, tmp);

	printf("dfgmres end of solve\n");

	/*---------------------------------------------------------------------------
	/* If RCI_request=0, then the solution was found with the required precision
	/*---------------------------------------------------------------------------*/
	if (RCI_request == 0) goto COMPLETE;
	/*---------------------------------------------------------------------------
	/* If RCI_request=1, then compute the vector A*tmp[ipar[21]-1]
	/* and put the result in vector tmp[ipar[22]-1]
	/*---------------------------------------------------------------------------
	/* NOTE that ipar[21] and ipar[22] contain FORTRAN style addresses,
	/* therefore, in C code it is required to subtract 1 from them to get C style
	/* addresses
	/*---------------------------------------------------------------------------*/
	if (RCI_request == 1)
	{
		mkl_dcsrgemv(&cvar, &ivar, ar, ia, ja, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
		goto ONE;
	}
	/*---------------------------------------------------------------------------
	/* If RCI_request=2, then do the user-defined stopping test
	/* The residual stopping test for the computed solution is performed here
	/*---------------------------------------------------------------------------
	/* NOTE: from this point vector b[n] is no longer containing the right-hand
	/* side of the problem! It contains the current FGMRES approximation to the
	/* solution. If you need to keep the right-hand side, save it in some other
	/* vector before the call to dfgmres routine. Here we saved it in vector
	/* r_rhs[n]. The vector b is used instead of r_rhs to preserve the
	/* original right-hand side of the problem and guarantee the proper
	/* restart of FGMRES method. Vector b will be altered when computing the
	/* residual stopping criterion!
	/*---------------------------------------------------------------------------*/
	if (RCI_request == 2)
	{
		/* Request to the dfgmres_get routine to put the solution into b[n] via ipar[12]
		/*---------------------------------------------------------------------------
		/* WARNING: beware that the call to dfgmres_get routine with ipar[12]=0 at this stage may
		/* destroy the convergence of the FGMRES method, therefore, only advanced users should
		/* exploit this option with care */
		ipar[12] = 1;
		/* Get the current FGMRES solution in the vector b[n] */
		dfgmres_get(&ivar, computed_solution, b, &RCI_request, ipar, dpar, tmp, &itercount);
		/* Compute the current true residual via MKL (Sparse) BLAS routines */
		mkl_dcsrgemv(&cvar, &ivar, ar, ia, ja, b, residual);
		dvar = -1.0E0;
		i = 1;
		daxpy(&ivar, &dvar, r_rhs, &i, residual, &i);
		dvar = dnrm2(&ivar, residual, &i);
		if (dvar < m_dResidualValue)
			goto COMPLETE;

		else goto ONE;
	}
	/*---------------------------------------------------------------------------
	/* If RCI_request=3, then apply the preconditioner on the vector
	/* tmp[ipar[21]-1] and put the result in vector tmp[ipar[22]-1]
	/*---------------------------------------------------------------------------
	/* NOTE that ipar[21] and ipar[22] contain FORTRAN style addresses,
	/* therefore, in C code it is required to subtract 1 from them to get C style
	/* addresses
	/* Here is the recommended usage of the result produced by ILU0 routine
	/* via standard MKL Sparse Blas solver routine mkl_dcsrtrsv.
	/*---------------------------------------------------------------------------*/
	if (RCI_request == 3)
	{
		cvar1 = 'L';
		cvar = 'n';
		cvar2 = 'U';
		mkl_dcsrtrsv(&cvar1, &cvar, &cvar2, &ivar, bilu0, ia, ja, &tmp[ipar[21] - 1], trvec);
		cvar1 = 'U';
		cvar = 'n';
		cvar2 = 'n';
		mkl_dcsrtrsv(&cvar1, &cvar, &cvar2, &ivar, bilu0, ia, ja, trvec, &tmp[ipar[22] - 1]);
		goto ONE;
	}

	/*---------------------------------------------------------------------------
	/* If RCI_request=4, then check if the norm of the next generated vector is
	/* not zero up to rounding and computational errors. The norm is contained
	/* in dpar[6] parameter
	/*---------------------------------------------------------------------------*/
	if (RCI_request == 4)
	{
		if (dpar[6] < 1.0E-50)
			goto COMPLETE;
		else goto ONE;
	}
	/*---------------------------------------------------------------------------
	/* If RCI_request=anything else, then dfgmres subroutine failed
	/* to compute the solution vector: computed_solution[n]
	/*---------------------------------------------------------------------------*/
	else
	{
		goto FAILED;
	}
	/*---------------------------------------------------------------------------
	Reverse Communication ends here
	Get the current iteration number and the FGMRES solution (DO NOT FORGET to
	call dfgmres_get routine as computed_solution is still containing
	the initial guess!). Request to dfgmres_get to put the solution
	into vector computed_solution[n] via ipar[12]
	---------------------------------------------------------------------------*/
COMPLETE:   ipar[12] = 0;
	dfgmres_get(&ivar, computed_solution, r_rhs, &RCI_request, ipar, dpar, tmp, &itercount);
	/*---------------------------------------------------------------------------
	Print solution vector: computed_solution[n] and the number of iterations: itercount
	---------------------------------------------------------------------------*/
	printf("The system has been solved \n");
	/*for (i = 0; i < n; i++)
	r_rhs[i + 1] = computed_solution[i];*/
	writeSolution(path, computed_solution, 1, n);
	printf("\nNumber of iterations: %d\n", itercount);
	printf("\n");
	goto CLEANMEMORY;

FAILED:
	bSucceeded = 0;
	printf("The solver has returned the ERROR code %d \n", RCI_request);
FAILED1:
	bSucceeded = 0;
	printf("-------------------------------------------------------------------\n");
	printf("Unfortunately, FGMRES+ILU0 C example has FAILED\n");
	printf("-------------------------------------------------------------------\n");

CLEANMEMORY:
	free(b);
	free(computed_solution);
	free(residual);
	free(trvec);
	free(bilu0);
	free(tmp);
	free(dpar);
	free(ipar);

	return bSucceeded;
}

 

mkl_sparse_z_export_csr of Inspector-executor Sparse BLAS returns bizarre results.


Dear MKL experts,

I'm testing the function mkl_sparse_z_export_csr in the Inspector-executor Sparse BLAS to convert COO format into CSR format and output the CSR arrays. The function returns BIZARRE results, as shown below. I guess I did something wrong. I attach the simple code at the end. Please tell me the correct way to do it.

Dan

!   Sparse representation of the matrix A
!
!            |  1  -1   0  -3   0 |
!            | -2   5   0   0   0 |
!   A   =    |  0   0   4   6   4 |,
!            | -4   0   2   7   0 |
!            |  0   8   0   0  -5 |
!

   info = mkl_sparse_z_export_csr (csrA, i, nrow, ncol, rows_start,rows_end,col_indx,csrA_value)

 OUTPUT DATA FOR [A] in CSR Format =
 rows_start(1:M) =      8546432  1919251285  1631870067  1883331694  1952531568
 rows_end(1:M)  =      8546436  1952531568  1867275361  1550606691  1886217556
 col_indx(1:NNZ) =     8546688  1279350084  1313817944  1398079538  1095651909  1144866125
                                 1426091617  1347568979  1229344594  1128088908  1934974010  1551069797     7233860
 csr_AB_value(1:NNZ) =
 (2.099310184342793E+021,1.887537090390452E+219)
 (8.857835453724769E+247,8.490117977552768E+175)
 (1.178852010317225E-307,2.225055204183881E+252)
 (5.817520101125294E+252,1.035480486908856E+243)
 (2.646398786088117E+199,6.103525143045638E+257)
 (1.298808940059573E+219,1.565465600591419E-076)
 (3.015794287991875E+161,2.305365297438563E+108)
 (1.665757500898629E-071,9.805903047556869E-072)
 (1.107082334375070E+074,4.363619063791335E+242)
 (1.323192987846278E+199,1.674261708694803E-047)
 (1.099368561560521E+248,5.829277640098750E+257)
 (4.893813689845159E+199,3.173709719932153E+016)
 (2.410805593608415E+098,2.868330566549112E+093)
 ---------------------------------------------------
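For reference, a minimal C sketch of the same conversion path (create a COO handle, convert to CSR, export and print); this is an illustration only, not the attached code, using the 5x5 matrix shown above with zero-based indexing:

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    MKL_INT row_indx[13] = {0,0,0, 1,1, 2,2,2, 3,3,3, 4,4};
    MKL_INT col_indx[13] = {0,1,3, 0,1, 2,3,4, 0,2,3, 1,4};
    MKL_Complex16 values[13] = {
        {1,0},{-1,0},{-3,0}, {-2,0},{5,0}, {4,0},{6,0},{4,0},
        {-4,0},{2,0},{7,0}, {8,0},{-5,0} };

    sparse_matrix_t cooA = NULL, csrA = NULL;
    mkl_sparse_z_create_coo(&cooA, SPARSE_INDEX_BASE_ZERO, 5, 5, 13,
                            row_indx, col_indx, values);
    mkl_sparse_convert_csr(cooA, SPARSE_OPERATION_NON_TRANSPOSE, &csrA);

    sparse_index_base_t indexing;
    MKL_INT nrow, ncol, *rows_start, *rows_end, *cols;
    MKL_Complex16 *csr_values;
    mkl_sparse_z_export_csr(csrA, &indexing, &nrow, &ncol,
                            &rows_start, &rows_end, &cols, &csr_values);

    for (MKL_INT i = 0; i < nrow; i++)
        for (MKL_INT k = rows_start[i]; k < rows_end[i]; k++)
            printf("A(%lld,%lld) = %g + %gi\n", (long long)i, (long long)cols[k],
                   csr_values[k].real, csr_values[k].imag);

    mkl_sparse_destroy(cooA);
    mkl_sparse_destroy(csrA);
    return 0;
}

One detail that may be worth double-checking in the Fortran version (an assumption about the cause, since the attached code is not shown here): in the Fortran interface of mkl_sparse_z_export_csr the exported array arguments are C pointers (TYPE(C_PTR)) that need to be mapped with C_F_POINTER before printing; passing ordinary arrays there would produce garbage like the output above.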

 

 

convert dense matrix to sparse CSR form


Previously, I used mkl_sdnscsr to convert a dense matrix to a sparse matrix in CSR format. However, with the current update (2018 Update 3), this function is deprecated, and the user manual instructs me to find a replacement among the matrix-manipulation routines of the Inspector-executor Sparse BLAS. However, I cannot find the replacement function! Do you know which function I can use now to convert my dense matrix to sparse CSR format?
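For reference, one way to perform the conversion by hand; a minimal C sketch (an illustration only, assuming a row-major dense matrix and zero-based indexing), with the resulting arrays then wrapped in an Inspector-executor handle via mkl_sparse_d_create_csr:

#include <stdlib.h>
#include "mkl_spblas.h"

/* Sketch: scan a row-major dense m x n matrix, build zero-based CSR arrays, and hand
   them to the Inspector-executor interface. Error handling is omitted. */
sparse_matrix_t dense_to_csr(const double *dense, MKL_INT m, MKL_INT n)
{
    MKL_INT nnz = 0;
    for (MKL_INT i = 0; i < m * n; i++)
        if (dense[i] != 0.0) nnz++;

    MKL_INT *ia = malloc((m + 1) * sizeof(MKL_INT));   /* row pointers   */
    MKL_INT *ja = malloc(nnz * sizeof(MKL_INT));       /* column indices */
    double  *a  = malloc(nnz * sizeof(double));        /* non-zero values*/

    MKL_INT k = 0;
    for (MKL_INT i = 0; i < m; i++) {
        ia[i] = k;
        for (MKL_INT j = 0; j < n; j++) {
            double v = dense[i * n + j];
            if (v != 0.0) { ja[k] = j; a[k] = v; k++; }
        }
    }
    ia[m] = k;

    sparse_matrix_t A = NULL;
    /* 3-array CSR: rows_start = ia, rows_end = ia + 1 */
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, m, n, ia, ia + 1, ja, a);
    return A;   /* ia/ja/a must stay allocated while A is in use */
}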

MKL_DNN convolution has the wrong output order on Intel(R) Xeon(R) CPU E5-2650 v3 (Possible bug)


Hello all,

I recently implemented the convolution of the Intel MKL library as described in the example included with the library. Everything is fine and dandy on my laptop with an i5-3210M. However, when I tried to run the code on the big machine, with an Intel(R) Xeon(R) CPU E5-2650 v3, I ran into some bugs/problems.

For outputs that have a channel size that is a multiple of 8, the order of the output is wrong. This is either a mistake on my side (probably with the compile options) or, in the worst case, a bug in MKL. I wrote a short test program, similar to the example file, that implements a standard forward convolution.

#include <iostream>
#include "mkl_dnn.h"
#include <vector>
using namespace std;


#define dimension (4)
int main() {

	dnnPrimitiveAttributes_t attributes;
	dnnPrimitive_t conv_prim = NULL;


	float* resConv1[dnnResourceNumber] = {0};

	size_t batch_num = 1;


	bool use_bias = false;

	size_t xinp = 4,
		yinp = 4,
		xout = 4,
		yout = 4,
		inpchannels = 1,
		outchannels = 8,
		xfilt = 3,
		yfilt = 3;


    size_t outputSize[dimension] = { xout, yout, outchannels, batch_num };
    size_t outputStrides[dimension] = { 1, xout, xout * yout, xout * yout * outchannels };

    size_t inputSize[dimension] = { xinp, yinp, inpchannels, batch_num };
    size_t inputStrides[dimension] = { 1, xinp, xinp * yinp, xinp * yinp * inpchannels };

    size_t filterSize[dimension] = { xfilt, yfilt, inpchannels, outchannels };
    size_t filterStrides[dimension] = { 1, xfilt, xfilt * yfilt, xfilt * yfilt * inpchannels };

    size_t biasSize[1] = { outputSize[2] };
    size_t biasStrides[1] = { outputStrides[2] };

    size_t convolutionStride[dimension - 2] = { 1, 1 };
    int inputOffset[dimension - 2 ] = { - ( (outputSize[0]/2)) - filterSize[0]/2 + inputSize[0]/2, - ( (outputSize[0]/2)) - filterSize[0]/2 + inputSize[0]/2 };

    dnnLayout_t lt_conv1_input = NULL,
                lt_conv1_filt = NULL,
                lt_conv1_bias = NULL,
                lt_conv1_output = NULL;




	if( dnnPrimitiveAttributesCreate_F32(&attributes)!= E_SUCCESS){
		std::cout << "error"<< std::endl;
	}
	dnnError_t err;
	if( use_bias ){
		err= dnnConvolutionCreateForwardBias_F32(&conv_prim, attributes,
	                    dnnAlgorithmConvolutionDirect, dimension, inputSize,
	                    outputSize, filterSize, convolutionStride, inputOffset,
	                    dnnBorderZeros);
	}else{
		err = dnnConvolutionCreateForward_F32(&conv_prim, attributes,
						dnnAlgorithmConvolutionDirect, dimension, inputSize,
						outputSize, filterSize, convolutionStride, inputOffset,
						 dnnBorderZeros);
	}

	if( err != E_SUCCESS){
		switch (err){
		case E_INCORRECT_INPUT_PARAMETER:
				std::cout << "incorrect input parameter while creating the convolution"<< std::endl;break;
		default:
			std::cout << "error while creating convolution"<< std::endl;
		}

	}

    dnnLayoutCreateFromPrimitive_F32(&lt_conv1_input, conv_prim, dnnResourceSrc);
    dnnLayoutCreateFromPrimitive_F32(&lt_conv1_filt, conv_prim, dnnResourceFilter);
    if( use_bias){
    	dnnLayoutCreateFromPrimitive_F32(&lt_conv1_bias, conv_prim, dnnResourceBias);
    }
    dnnLayoutCreateFromPrimitive_F32(&lt_conv1_output,conv_prim, dnnResourceDst);


    std::vector<float> input(xinp*yinp*inpchannels,1.0);
    std::vector<float> output(xout*yout*outchannels,1.0);
    std::vector<float> filter(xfilt*yfilt*inpchannels*outchannels,1.0);
    std::vector<float> bias(outchannels,1.0);

    resConv1[dnnResourceSrc] = &(input[0]);
    resConv1[dnnResourceFilter] = &filter[0];
    if( use_bias)  resConv1[dnnResourceBias] = &bias[0];
    resConv1[dnnResourceDst]= &output[0];

    dnnError_t err_exe = dnnExecute_F32(conv_prim, (void**) resConv1);
    if( err_exe != E_SUCCESS){
    	std::cout << "Error while forward propagation in convolutional layer"<< std::endl;
    	if( err_exe== E_MEMORY_ERROR){
    		std::cout << "Memory Error"<< std::endl;
    	}
    	if( err_exe == E_UNIMPLEMENTED){
    		std::cout << "Unimplemented"<< std::endl;
    	}
    	if( err_exe == E_UNSUPPORTED_DIMENSION){
    		std::cout << "Unsupported dimension"<< std::endl;
    	}
    	if( err_exe == E_INCORRECT_INPUT_PARAMETER){
    		std::cout << "Incorrect input parameter"<< std::endl;
    	}
    }

    std::cout << "output"<<std::endl;
    for( int i=0; i < output.size(); i++){
    	std::cout << output[i] << " ";
    }
    std::cout << std::endl;
	return 0;
}

 

The desired output for a 4x4 image with 8 convolutions and an input of 1s and 3x3 filters of 1s is:

4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4 4 6 6 4 6 9 9 6 6 9 9 6 4 6 6 4

This is also what my mobile CPU gives me when I run the code. However, on the big PC I get

4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4

which is obviously somewhat right, but not in the right order. However, when I change the output channel count so it is not a multiple of 8, the code runs fine even on the Xeon CPU. This might be due to MKL switching to a slower, different algorithm, as explained in this post:

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...

Does anybody have an explanation or even a fix for this issue? Is this known behaviour on Xeon CPUs, or a bug in the software? I don't necessarily want to switch to the open-source implementation, since it would mean a week of re-implementing/testing.
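In case it helps narrow things down, here is a sketch of checking and converting the destination layout (this is my assumption about what might be happening, not a confirmed fix: on wider CPUs the primitive may choose a blocked layout for dnnResourceDst, so the result would need a conversion back to the plain layout described by outputSize/outputStrides). Variable names reuse those from the program above:

/* Compare the primitive's preferred output layout with the plain user layout and
   convert the result if they differ. */
dnnLayout_t lt_user_output = NULL;
dnnLayoutCreate_F32(&lt_user_output, dimension, outputSize, outputStrides);

if (!dnnLayoutCompare_F32(lt_conv1_output, lt_user_output)) {
    /* let the primitive write into a buffer with its preferred (possibly blocked) layout */
    void *internal_output = NULL;
    dnnAllocateBuffer_F32(&internal_output, lt_conv1_output);
    resConv1[dnnResourceDst] = (float*) internal_output;

    dnnExecute_F32(conv_prim, (void**) resConv1);

    /* convert the result into the plain layout expected by 'output' */
    dnnPrimitive_t cv_out = NULL;
    dnnConversionCreate_F32(&cv_out, lt_conv1_output, lt_user_output);
    dnnConversionExecute_F32(cv_out, internal_output, &output[0]);

    dnnDelete_F32(cv_out);
    dnnReleaseBuffer_F32(internal_output);
}
dnnLayoutDelete_F32(lt_user_output);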

For compilation I used the following link line for both systems:

 -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

 -I${MKLROOT}/include -I${MKLROOT}/../lib/intel64_lin

 

Any help would be appreciated.

