MKL threading seems using only one core

October 15, 2015, 1:48 pm

Latest and popular articles on Intel Technologies

When we run MKL PARDISO with multi-threading turned on, the PARDISO output does report multiple processors, but top, ksysguard, and the Windows Task Manager all show that only one core is exercised, regardless of the number of threads we choose.

Major version: 11
Minor version: 1
Update version: 1
Product status: Product
Build: n20131010
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

↧

Does MKL has any subroutines for Kronecker Product?

October 19, 2015, 4:08 am

As in the subject. Does anyone know of one?

↧

PARDISO_64 compiler assisted offload

October 19, 2015, 9:57 am

Hello, all.

I am trying to offload a call to `pardiso_64` to a MIC coprocessor via compiler-assisted offload.

`pardiso_64` returns error code `-12` if it is linked against 32-bit libraries. I am following the instructions from the Link Line Advisor, but I still get `-12`. Please see the attached makefile.

MKL version 11.2.3 (`composer_xe_2015.3.187`)

Thanks.

Attachment	Size
Download pardiso_offload_example.c	7.34 KB
Download makefile	596 bytes

↧

Installing NWchem using INTEL compilers + MKL libraries

October 20, 2015, 7:08 pm

Hello Intel Team,

I am trying to install NWChem on a local university cluster, but I am having problems compiling it. Here is the input script that I am using to compile NWChem:

export NWCHEM_TOP=/work/gb_lab/rahulsn/nwchem-6.6
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=OPENIB
export IB_HOME=/usr
export IB_INCLUDE=/usr/include/infiniband
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-libumad -libverbs -lpthread"
export USE_MPI=Y
export USE_MPIF=Y
export USE_MPIF4=Y
export MPI_LOC=/shared/intel/impi_5.0.3
export MPI_INCLUDE="-I/shared/intel/impi_5.0.3/intel64/include"
export MPI_LIB="/shared/intel/impi_5.0.3/intel64/lib/release -L/shared/intel/impi_5.0.3/intel64/lib"
export LIBMPI="-lmpifort -lmpi -lmpigi -ldl -lrt -lpthread"
export NWCHEM_MODULES="all"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE
export HAS_BLAS=yes
export USE_SCALAPACK=y
export MKLLIB=/opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64
export MKLINC=/opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/include
export BLASOPT="-L$MKLLIB -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
export LAPACK_LIBS="-L$MKLLIB -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
export SCALAPACK="-L$MKLLIB -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export SCALAPACK_SIZE=8
export BLAS_SIZE=8
export USE_64TO32=y
export FC=ifort
export CC=icc
make nwchem_config
make CC=icc FC=ifort FOPTIMIZE=-O3 -j4

This is part of the error shown at the end. I think it is not able to locate the ScaLAPACK libraries on Cyence.

scalapack.F:(.text+0xd19c): undefined reference to `numroc_'

scalapack.F:(.text+0xd1c9): undefined reference to `numroc_'

scalapack.F:(.text+0xd363): undefined reference to `pdsyevd_'

scalapack.F:(.text+0xd719): undefined reference to `numroc_'

scalapack.F:(.text+0xd743): undefined reference to `numroc_'

scalapack.F:(.text+0xd76e): undefined reference to `numroc_'

scalapack.F:(.text+0xd79d): undefined reference to `numroc_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `ga_pdsyevr_':

scalapack.F:(.text+0xdfa8): undefined reference to `blacs_gridinit_'

scalapack.F:(.text+0xdff9): undefined reference to `blacs_gridinfo_'

scalapack.F:(.text+0xe164): undefined reference to `descinit_'

scalapack.F:(.text+0xe1b5): undefined reference to `descinit_'

scalapack.F:(.text+0xe247): undefined reference to `numroc_'

scalapack.F:(.text+0xe26a): undefined reference to `numroc_'

scalapack.F:(.text+0xe3c7): undefined reference to `pdsyevr_'

scalapack.F:(.text+0xe5e8): undefined reference to `pdsyevr_'

scalapack.F:(.text+0xeb21): undefined reference to `numroc_'

scalapack.F:(.text+0xeb4c): undefined reference to `numroc_'

scalapack.F:(.text+0xeb76): undefined reference to `numroc_'

scalapack.F:(.text+0xeba0): undefined reference to `numroc_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `ga_pzheevd_':

scalapack.F:(.text+0xf26a): undefined reference to `blacs_gridinit_'

scalapack.F:(.text+0xf2bb): undefined reference to `blacs_gridinfo_'

scalapack.F:(.text+0xf41e): undefined reference to `descinit_'

scalapack.F:(.text+0xf470): undefined reference to `descinit_'

scalapack.F:(.text+0xf593): undefined reference to `pzheevd_'

scalapack.F:(.text+0xf7db): undefined reference to `pzheevd_'

scalapack.F:(.text+0xfa00): undefined reference to `numroc_'

scalapack.F:(.text+0xfa2c): undefined reference to `numroc_'

scalapack.F:(.text+0xfa58): undefined reference to `numroc_'

scalapack.F:(.text+0xfa84): undefined reference to `numroc_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `ga_pzheevr_':

scalapack.F:(.text+0x10e61): undefined reference to `blacs_gridinit_'

scalapack.F:(.text+0x10eb2): undefined reference to `blacs_gridinfo_'

scalapack.F:(.text+0x1102d): undefined reference to `descinit_'

scalapack.F:(.text+0x1107d): undefined reference to `descinit_'

scalapack.F:(.text+0x11242): undefined reference to `pzheevr_'

scalapack.F:(.text+0x1152d): undefined reference to `pzheevr_'

scalapack.F:(.text+0x11835): undefined reference to `numroc_'

scalapack.F:(.text+0x11864): undefined reference to `numroc_'

scalapack.F:(.text+0x11893): undefined reference to `numroc_'

scalapack.F:(.text+0x118c2): undefined reference to `numroc_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `pxerbla_':

scalapack.F:(.text+0x11a9d): undefined reference to `blacs_gridinfo_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slexit4_':

scalapack.F:(.text+0x11b6b): undefined reference to `blacs_gridexit_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slexit3_':

scalapack.F:(.text+0x11b9b): undefined reference to `blacs_gridexit_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slexit2_':

scalapack.F:(.text+0x11be8): undefined reference to `blacs_gridexit_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slexit_':

scalapack.F:(.text+0x11c0b): undefined reference to `blacs_gridexit_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slinit4_':

scalapack.F:(.text+0x11ddb): undefined reference to `blacs_gridinfo_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slinit3_':

scalapack.F:(.text+0x11ee1): undefined reference to `blacs_gridinfo_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slinit2_':

scalapack.F:(.text+0x120e3): undefined reference to `blacs_gridinit_'

scalapack.F:(.text+0x12134): undefined reference to `blacs_gridinfo_'

/work/gb_lab/rahulsn/nwchem-6.6/src/tools/install/lib/libga.a(scalapack.o): In function `slinit_':

scalapack.F:(.text+0x1221a): undefined reference to `blacs_gridinit_'

scalapack.F:(.text+0x12247): undefined reference to `blacs_gridinfo_'

It would be really helpful if somebody could help me prepare my input script. I think I am using the wrong location for the ScaLAPACK libraries, but the path set at the start of the script does contain all the libraries. Here is the listing of the $MKLLIB folder:

libmkl_avx.so libmkl_blacs_openmpi_lp64.a libmkl_gf_ilp64.so libmkl_lapack95_ilp64.a libmkl_vml_avx.so

libmkl_avx2.so libmkl_blacs_sgimpt_ilp64.a libmkl_gf_lp64.a libmkl_lapack95_lp64.a libmkl_vml_avx2.so

libmkl_avx512.so libmkl_blacs_sgimpt_lp64.a libmkl_gf_lp64.so libmkl_mc.so libmkl_vml_avx512.so

libmkl_avx512_mic.so libmkl_blas95_ilp64.a libmkl_gnu_thread.a libmkl_mc3.so libmkl_vml_avx512_mic.so

libmkl_blacs_ilp64.a libmkl_blas95_lp64.a libmkl_gnu_thread.so libmkl_rt.so libmkl_vml_cmpt.so

libmkl_blacs_intelmpi_ilp64.a libmkl_cdft_core.a libmkl_intel_ilp64.a libmkl_scalapack_ilp64.a libmkl_vml_def.so

libmkl_blacs_intelmpi_ilp64.so libmkl_cdft_core.so libmkl_intel_ilp64.so libmkl_scalapack_ilp64.so libmkl_vml_mc.so

libmkl_blacs_intelmpi_lp64.a libmkl_core.a libmkl_intel_lp64.a libmkl_scalapack_lp64.a libmkl_vml_mc2.so

libmkl_blacs_intelmpi_lp64.so libmkl_core.so libmkl_intel_lp64.so libmkl_scalapack_lp64.so libmkl_vml_mc3.so

libmkl_blacs_lp64.a libmkl_def.so libmkl_intel_thread.a libmkl_sequential.a locale

libmkl_blacs_openmpi_ilp64.a libmkl_gf_ilp64.a libmkl_intel_thread.so libmkl_sequential.so

Thanks for the help.

Rahul Singh

↧

Memory consumption of the symbolic factorization step (phase 11) in PARDISO

October 20, 2015, 7:32 pm

I am using the out-of-core `pardiso` with 3,267,775,116 non-zero entries.

Are there any estimates how much RAM phase 11 should take? I understand that this is the METIS symbolic factorization. Is there a way to get finer control for this stage?

I have tried to use a user-defined permutation (setting `iparm[4]=1`, with the permutation array `perm = (MKL_INT) 1..N`). This was to no avail.

Currently, I am getting 120GB of RAM used, which utterly defeats the purpose of the out-of-core part for the rest of the code.
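
For scale, the input matrix alone already occupies tens of gigabytes before any analysis starts. The sketch below only does that arithmetic. It assumes 8-byte (double precision) values and 8-byte column indices, neither of which is stated in the post, and it says nothing about what phase 11 allocates internally.

```python
# Back-of-the-envelope storage for the CSR input arrays only.
# Assumptions (not stated in the post): 8-byte values, 8-byte column indices.
NNZ = 3_267_775_116

value_bytes = NNZ * 8   # array of non-zero values
index_bytes = NNZ * 8   # array of column indices

# More non-zeros than a signed 32-bit integer can count.
needs_64bit_indices = NNZ > 2**31 - 1


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    print(f"values:         {gib(value_bytes):.1f} GiB")
    print(f"column indices: {gib(index_bytes):.1f} GiB")
    print(f"needs 64-bit indices: {needs_64bit_indices}")
```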

Thank you.

↧

error LNK2019: unresolved external symbol DGETRF_MKL95 referenced in function SUPORTBASE_mp_INVMAT

October 21, 2015, 1:51 am

I inherited a project written in Fortran 95. After upgrading from Intel Fortran Composer v2013 to v2016, I am getting errors during linking:

1>suportbase.obj : error LNK2019: unresolved external symbol DGETRF_MKL95 referenced in function SUPORTBASE_mp_INVMAT

1>suportbase.obj : error LNK2019: unresolved external symbol DGETRI_MKL95 referenced in function SUPORTBASE_mp_INVMAT

...fatal error LNK1120: 2 unresolved externals

The code that causes the problem:

pure function invmat(a) result(b)
USE usedlapck95_mod, ONLY: getrf,getri
USE, INTRINSIC :: IEEE_ARITHMETIC
implicit none


    real(8),intent(in),dimension(:,:) :: a
    real(8),dimension(size(a,1),size(a,1)) :: b


    integer(4),dimension(size(a,1)) :: ipiv
    integer(4) :: info

    b=a

    if (size(a,1)==0) return

    call getrf(b,ipiv,info)

    call getri(b,ipiv,info)

    if (info>0) b=IEEE_VALUE (b,IEEE_POSITIVE_INF)


end function

The code computes the inverse of a matrix using MKL library routines.

The module which is called is:

MODULE F95_PRECISION
    INTEGER, PARAMETER :: SP = KIND(1.0E0)
    INTEGER, PARAMETER :: DP = KIND(1.0D0)
END MODULE F95_PRECISION

MODULE usedlapck95_mod

INTERFACE GETRF
    PURE SUBROUTINE SGETRF_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! SGETRF(M,N,A,LDA,IPIV,INFO)
        USE F95_PRECISION, ONLY: WP => SP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        REAL(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(OUT), OPTIONAL, TARGET :: IPIV(:)
    END SUBROUTINE SGETRF_MKL95
    PURE SUBROUTINE DGETRF_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! DGETRF(M,N,A,LDA,IPIV,INFO)
        USE F95_PRECISION, ONLY: WP => DP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        REAL(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(OUT), OPTIONAL, TARGET :: IPIV(:)
    END SUBROUTINE DGETRF_MKL95
    PURE SUBROUTINE CGETRF_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! CGETRF(M,N,A,LDA,IPIV,INFO)
        USE F95_PRECISION, ONLY: WP => SP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        COMPLEX(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(OUT), OPTIONAL, TARGET :: IPIV(:)
    END SUBROUTINE CGETRF_MKL95
    PURE SUBROUTINE ZGETRF_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! ZGETRF(M,N,A,LDA,IPIV,INFO)
        USE F95_PRECISION, ONLY: WP => DP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        COMPLEX(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(OUT), OPTIONAL, TARGET :: IPIV(:)
    END SUBROUTINE ZGETRF_MKL95
END INTERFACE GETRF

INTERFACE GETRI
    PURE SUBROUTINE SGETRI_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! SGETRI(N,A,LDA,IPIV,WORK,LWORK,INFO)
        USE F95_PRECISION, ONLY: WP => SP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        REAL(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(IN) :: IPIV(:)
    END SUBROUTINE SGETRI_MKL95
    PURE SUBROUTINE DGETRI_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! DGETRI(N,A,LDA,IPIV,WORK,LWORK,INFO)
        USE F95_PRECISION, ONLY: WP => DP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        REAL(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(IN) :: IPIV(:)
    END SUBROUTINE DGETRI_MKL95
    PURE SUBROUTINE CGETRI_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! CGETRI(N,A,LDA,IPIV,WORK,LWORK,INFO)
        USE F95_PRECISION, ONLY: WP => SP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        COMPLEX(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(IN) :: IPIV(:)
    END SUBROUTINE CGETRI_MKL95
    PURE SUBROUTINE ZGETRI_MKL95(A,IPIV,INFO)
        ! MKL Fortran77 call:
        ! ZGETRI(N,A,LDA,IPIV,WORK,LWORK,INFO)
        USE F95_PRECISION, ONLY: WP => DP
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        COMPLEX(WP), INTENT(INOUT) :: A(:,:)
        INTEGER, INTENT(IN) :: IPIV(:)
    END SUBROUTINE ZGETRI_MKL95
END INTERFACE GETRI

INTERFACE SYEVD
        ! JOBZ='N','V'; default: 'N'
        ! UPLO='U','L'; default: 'U'
    PURE SUBROUTINE SSYEVD_F95(A,W,JOBZ,UPLO,INFO)
        ! Fortran77 call:
        ! SSYEVD(JOBZ,UPLO,N,A,LDA,W,WORK,LWORK,IWORK,LIWORK,INFO)
        USE F95_PRECISION, ONLY: WP => SP
        CHARACTER(LEN=1), INTENT(IN), OPTIONAL :: JOBZ
        CHARACTER(LEN=1), INTENT(IN), OPTIONAL :: UPLO
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        REAL(WP), INTENT(INOUT) :: A(:,:)
        REAL(WP), INTENT(OUT) :: W(:)
    END SUBROUTINE SSYEVD_F95
    PURE SUBROUTINE DSYEVD_F95(A,W,JOBZ,UPLO,INFO)
        ! Fortran77 call:
        ! DSYEVD(JOBZ,UPLO,N,A,LDA,W,WORK,LWORK,IWORK,LIWORK,INFO)
        USE F95_PRECISION, ONLY: WP => DP
        CHARACTER(LEN=1), INTENT(IN), OPTIONAL :: JOBZ
        CHARACTER(LEN=1), INTENT(IN), OPTIONAL :: UPLO
        INTEGER, INTENT(OUT), OPTIONAL :: INFO
        REAL(WP), INTENT(INOUT) :: A(:,:)
        REAL(WP), INTENT(OUT) :: W(:)
    END SUBROUTINE DSYEVD_F95
END INTERFACE SYEVD

end module

Does anybody have an idea how to fix this linker problem? I have only light experience with Fortran so far; my colleagues have more, but they don't have any idea either.

Thanks!

↧

LAPACKE_zgesv Array Limitations

October 21, 2015, 3:26 am

I'm using LAPACKE_zgesv (MKL 11.3.01) in C# with arrays of Complex.
I have problems when the arrays exceed about 12000x12000 elements (2.145GB).
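
The size quoted above sits right at the 2^31-byte mark, which can be seen with a little arithmetic. The sketch below assumes 16 bytes per Complex element (two 8-byte doubles). It only shows where a signed 32-bit byte count would overflow; it does not establish which layer (the .NET marshalling or the MKL interface) enforces the limit.

```python
import math

BYTES_PER_COMPLEX = 16        # assumption: two 8-byte doubles per element
INT32_MAX = 2**31 - 1         # largest value of a signed 32-bit integer


def matrix_bytes(n):
    """Bytes needed for a dense n x n matrix of Complex values."""
    return n * n * BYTES_PER_COMPLEX


# Largest square dimension whose byte size still fits in a signed 32-bit integer.
largest_n_under_limit = math.isqrt(INT32_MAX // BYTES_PER_COMPLEX)

if __name__ == "__main__":
    for n in (largest_n_under_limit, 12000, 12145):
        size = matrix_bytes(n)
        print(n, size, "fits" if size <= INT32_MAX else "exceeds 2^31 - 1")
```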

.NET has limitations, but I overcame them with the <gcAllowVeryLargeObjects enabled="true" /> directive.

However, the problem seems to be with the LAPACKE_zgesv() function itself.
This is the message: "The array size exceed the adresses limits"

Is there any array size limitation?

Thank you very much

Gianluca

↧

MKL DCT using fortran: scaling prefactors and complex-type arrays

October 24, 2015, 8:40 am

Hi,

I'm currently using the MKL to calculate some DCT on complex-type fortran arrays. My fortran module has a first initialization routine that looks like

   subroutine initialize(this,n, ni, nd)
     class(costf_odd_t) :: this
     !-- Input variables
     integer, intent(in) :: n
     integer, intent(in) :: ni
     integer, intent(in) :: nd
     !-- Local variables:
     integer :: stat
     real(cp) :: fac
     real(cp) :: work(n)
     this%nRad = n
     allocate(this%i_costf_init(128))
     allocate(this%d_costf_init(1:5*(n-1)/2+2))
     call d_init_trig_transform(n-1,MKL_COSINE_TRANSFORM,this%i_costf_init,this%d_costf_init,stat)
     call d_commit_trig_transform(work,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat)
     !fac = sqrt(half*real(this%nRad-1,cp))
     !stat = DftiSetValue(this%r2c_handle, DFTI_FORWARD_SCALE, fac)
     !stat = DftiCommitDescriptor(this%r2c_handle)
  end subroutine initialize

The actual DCT is then computed later on as follows

   subroutine costf1_complex(this,f,n_f_max,n_f_start,n_f_stop)
     class(costf_odd_t) :: this
     !-- Input variables:
     integer,  intent(in) :: n_f_start,n_f_stop ! columns to be transformed
     integer,  intent(in) :: n_f_max
     complex(cp), intent(inout) :: f(n_f_max,this%nRad)
     !-- Local variables:
     real(cp) :: work_real(this%nRad)
     real(cp) :: work_imag(this%nRad)
     integer :: stat, n_f

     do n_f=n_f_start,n_f_stop
        work_real(:) = real(f(n_f,:))
        work_imag(:) = aimag(f(n_f,:))
        call d_forward_trig_transform(work_real,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat)
        call d_forward_trig_transform(work_imag,this%r2c_handle,this%i_costf_init,this%d_costf_init,stat)
        f(n_f,:)=sqrt(half*real(this%nRad-1,cp))*cmplx(work_real,work_imag,kind=cp)
     end do
  end subroutine costf1_complex

It basically does what I want but I have several performance limitations since: (i) there is a pre-factor multiplication which is computed for each DCT, (ii) there is a memory copy due to the unsupported complex-type input arrays in the MKL trig transforms.
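
The real/imaginary split used in costf1_complex works because the cosine transform is linear, and the same linearity means a constant prefactor can be applied at any single point in the chain. The sketch below illustrates this with a naive, unnormalized textbook DCT-I. This is an assumption for illustration only: it is not MKL's implementation and does not reproduce MKL's normalization.

```python
import math


def dct1(x):
    """Naive unnormalized DCT-I of a sequence of length n+1 (textbook form)."""
    n = len(x) - 1
    out = []
    for k in range(n + 1):
        s = 0.5 * (x[0] + (-1) ** k * x[n])
        for j in range(1, n):
            s += x[j] * math.cos(math.pi * j * k / n)
        out.append(s)
    return out


def dct1_complex_split(f, scale):
    """Transform real and imaginary parts separately, then scale once."""
    re = dct1([z.real for z in f])
    im = dct1([z.imag for z in f])
    return [scale * complex(a, b) for a, b in zip(re, im)]


if __name__ == "__main__":
    f = [1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.75j, 2 + 0j]
    split = dct1_complex_split(f, 1.5)
    direct = [1.5 * z for z in dct1(f)]   # same transform on complex data
    print(max(abs(a - b) for a, b in zip(split, direct)))
```

Because the result is the same whether the factor multiplies the input, the output, or is folded into the transform, the per-call multiplication is a cost question rather than a correctness one.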

I tried fixing the first issue using the last three commented lines in the initialize subroutine above, but it didn't work. Any idea why? Concerning the second issue, is there a way to avoid the memory allocation of the work arrays in the costf1_complex subroutine (maybe using C-type pointers)?

Thanks!

↧

c# zgesv and Array Limitations with x64

October 27, 2015, 10:48 am

This post is a continuation of this: https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/597077

I'm using LAPACKE_zgesv (MKL 11.3.01) in C# with arrays of Complex.

I have problems when the arrays exceed about 12000x12000 elements.

.NET has limitations, but I overcame them with the gcAllowVeryLargeObjects enabled="true" directive.

However, the problem seems to be with LAPACKE_zgesv() or at least something linked with it.
This is the message: "The array size exceed the adresses limits"

For example, I have a matrix Complex[12145,12145] and a vector Complex[12145]. If I apply an LU decomposition (implemented by me) to solve that system, it works. It takes a lot of time, but it works.

If we pass these parameters to the LAPACKE_zgesv function, it doesn't work. Maybe it depends on the C# declaration, but with smaller systems it works.

I'm using a notebook with Windows 10 and 10GB of RAM.

I attach the simple example I used to test it.

If you want, I also have the data files, but they are very large (http://we.tl/ypYNEAk2ZU).

Thank you for any help you can give

Attachment	Size
Download MKLTest.zip	9.68 KB

↧

Cholesky factorization, traspose and inversion of sparse matrix.

October 27, 2015, 3:38 pm

Hello!

Let Sigma be a sparse matrix. I would like to compute the Cholesky factorization of Sigma (the upper (Lt) or lower (L) triangular factor), transpose it, and compute the following terms

w = inv(L)*mu;
m = inv(Lt)*w;
v = inv(Lt)*b;

where mu, b are known.
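
The three terms above never require an explicit inverse: each one is a triangular solve with L or Lt. The sketch below shows the steps on a small dense toy matrix. The helper names are hypothetical and this is not an MKL call sequence; a sparse solver would perform the same steps on sparse storage.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with a = L * L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b, i.e. x = inv(L) * b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_lower_transposed(L, b):
    """Back substitution: solve L^T x = b, i.e. x = inv(Lt) * b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


if __name__ == "__main__":
    sigma = [[4.0, 2.0], [2.0, 3.0]]   # toy stand-in for the sparse Sigma
    mu, b = [2.0, 1.0], [1.0, 1.0]
    L = cholesky_lower(sigma)
    w = solve_lower(L, mu)              # w = inv(L) * mu
    m = solve_lower_transposed(L, w)    # m = inv(Lt) * w
    v = solve_lower_transposed(L, b)    # v = inv(Lt) * b
    print(w, m, v)
```

Note that m = inv(Lt) * inv(L) * mu is the solution of Sigma * m = mu, so that term can also be obtained from a single factor-and-solve call.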

The problem I face is that I can't find the routines (and examples) when the matrix is sparse. Any help would be really appreciated.
Thank you very much.

↧

Getting different result when linking to mkl_intel_thread or mkl_sequential

October 28, 2015, 10:30 am

Hi, Guys,

When I use the Intel MKL library, I run into a strange problem.

Preparation:
I built library A (which calls the cblas_sgemm function), using libmkl_intel_thread.so.

The routine that gives the wrong result:
I built application B, which depends on library A. If I include libmkl_intel_thread.so as a dependency in the makefile and run application B, it does not get the right result.

The routine that gives the right result:
I built application B, which depends on library A. If I include libmkl_sequential.so as a dependency in the makefile instead, application B gets the right result. But this way I do not get the benefit of multi-threaded matrix computation.

Thanks~

↧

Strange sparse_matrix_checker() Error in MKL with Matlab MEX file

October 28, 2015, 6:47 pm

I am performing a simple test of a sparse matrix stored in CSR3 format with sparse_matrix_checker(). When I perform this test in the matrix_check.c MKL example code, it passes as expected. I compiled the examples with GCC.

However, when I perform an identical check in a Matlab C++ MEX function (also compiled with GCC), I get a strange error that doesn't make sense. The matrix is identical to the one given in the MKL documentation at https://software.intel.com/en-us/node/471374. I am using zero-based indexing. Therefore, my row, column and value arrays are:

/* Matrix data. */
    MKL_INT n = 5;
    MKL_INT ia[6] = {0,3,5,8,11,13};
    MKL_INT ja[13] =
      {0,1,3,0,1,2,3,4,0,2,3,1,4};
    double a[13] =
      {1,-1,-3,-2,5,4,6,4,-4,2,7,8,-5};

The parameters of my matrix check are:

    pt.n = n;
    pt.csr_ia = ia;
    pt.csr_ja = ja;
    pt.indexing         = MKL_ZERO_BASED;
    pt.matrix_structure = MKL_GENERAL_STRUCTURE;
    pt.print_style      = MKL_C_STYLE;
    pt.message_level    = MKL_PRINT;

Again, when I run this check in matrix_check.c compiled from the examples, it passes. But when I compile this in a Matlab MEX, I get the error:

Matrix check result code: 21 (MKL_SPARSE_CHECKER_NON_MONOTONIC)
Matrix check details: (1, 0, 0)

However, this error doesn't make sense because ia[1] and ia[2] are not zero. Instead, they are 3 and 5. Thus, it seems there is some memory read error by the MKL library.

Note, I am compiling the MEX file with the following compiler options using G++ 4.9.2:

LDFLAGS=-m64 -I/opt/intel/compilers_and_libraries_2016.0.109/linux/mkl/include -L/opt/intel/mkl/lib/intel64 -lmkl_core -lmkl_intel_ilp64 -lmkl_gnu_thread myTest.cpp

Also, I verified that the matrix is in fact stored correctly by reading it out in the Matlab MEX file:

0: 	val: 1	cols: 0	rows: 0
1: 	val: -1	cols: 1	rows: 3
2: 	val: -3	cols: 3	rows: 5
3: 	val: -2	cols: 0	rows: 8
4: 	val: 5	cols: 1	rows: 11
5: 	val: 4	cols: 2	rows: 13
6: 	val: 6	cols: 3
7: 	val: 4	cols: 4
8: 	val: -4	cols: 0
9: 	val: 2	cols: 2
10: 	val: 7	cols: 3
11: 	val: 8	cols: 1
12: 	val: -5	cols: 4

It seems to me that there is some issue caused by the way I am compiling this test that causes the MKL library to access memory differently than how it is being allocated. Does this make sense? Any thoughts or ideas for me to check?
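
One way the library could "access memory differently" is an integer-width mismatch. The link line above uses -lmkl_intel_ilp64, the interface layer built for 64-bit integers, while MKL_INT is a 32-bit int unless MKL_ILP64 is defined at compile time. Whether that is what happens in this MEX build is an assumption, not something the post confirms. The sketch below only shows what the posted ia array looks like when its 32-bit entries are read as 64-bit integers on a little-endian machine.

```python
import struct

# Row-pointer array from the post, stored as six 32-bit integers.
ia = [0, 3, 5, 8, 11, 13]
raw = struct.pack("<6i", *ia)          # 24 bytes, little-endian int32

# The same 24 bytes interpreted as three 64-bit integers.
as_int64 = struct.unpack("<3q", raw)

if __name__ == "__main__":
    print("as int32:", ia)
    print("as int64:", list(as_int64))
    # A reader expecting six 64-bit entries would also run past the end
    # of the 24-byte buffer after the third value.
```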

↧

Array slicing with MKL

October 28, 2015, 10:26 pm

Hi there,

I am currently working on a two-dimensional ADI Crank-Nicolson algorithm using the pardiso and cspblas packages. In order to use the Alternating-Direction-Implicit (ADI) method, I need to solve a linear system of equations Ax = By, where x and y are single columns and single rows from a 2D matrix. However, I am having difficulty thinking of a way to extract single rows and columns from a matrix.

Is there a fast way I can slice an array using the MKL package such that I do not have to iterate over my array every step?

In other languages this can be done by indexing, for example Ax[1,:] = By[1,:].
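
For a dense matrix stored contiguously, a row or column is just a start offset plus a stride, which is the same addressing that BLAS level-1 routines such as cblas_dcopy accept through their increment arguments. The sketch below illustrates the addressing for row-major storage. The helper names are hypothetical, and Python list slices copy the data, so this shows only the index arithmetic, not a zero-copy view.

```python
def row_of(flat, ncols, i):
    """Row i of a row-major matrix: contiguous, stride 1."""
    return flat[i * ncols:(i + 1) * ncols]


def column_of(flat, ncols, j):
    """Column j of a row-major matrix: start at j, stride ncols."""
    return flat[j::ncols]


if __name__ == "__main__":
    nrows, ncols = 3, 4
    flat = list(range(nrows * ncols))   # 3 x 4 matrix stored row by row
    print(row_of(flat, ncols, 1))
    print(column_of(flat, ncols, 2))
```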

Thanks,

Dylan

↧

LAPACKE_zgeev - Eigenvalue - Eigenvector

October 29, 2015, 6:30 am

Hello,

I am having some difficulty understanding the results of the "LAPACKE_zgeev" example.

The example states that A*v(j) = lambda(j)*v(j)

where A is the initial matrix, v(j) the right eigenvector, and lambda(j) the eigenvalue.

However, if I take the matrix from the example:

( -3.84,  2.25) ( -8.94, -4.75) (  8.95, -6.53) ( -9.87,  4.82)
( -0.66,  0.83) ( -4.40, -3.82) ( -3.50, -4.26) ( -3.15,  7.36)
( -3.99, -4.73) ( -5.88, -6.60) ( -3.36, -0.40) ( -0.75,  5.23)
(  7.74,  4.18) (  3.66, -7.53) (  2.58,  3.60) (  4.59,  5.41)

The eigenvalues from the example:

( -9.43,-12.98) ( -3.44, 12.69) ( 0.11, -3.40) ( 5.76, 7.13)

And the right eigenvectors from the example:

(  0.43,  0.33) (  0.83,  0.00) (  0.60,  0.00) ( -0.31,  0.03)
(  0.51, -0.03) (  0.08, -0.25) ( -0.40, -0.20) (  0.04,  0.34)
(  0.62,  0.00) ( -0.25,  0.28) ( -0.09, -0.48) (  0.36,  0.06)
( -0.23,  0.11) ( -0.10, -0.32) ( -0.43,  0.13) (  0.81,  0.00)

Then A*v(j) is not equal to lambda(j)*v(j), which should be the case.

Moreover, the results (eigenvalues and eigenvectors) do not match what Matlab gives me with [V,D] = eig(A) either.

Is there a subtlety that I am not understanding?
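
One way to check the relation numerically is to compute the residual A*v - lambda*v for the first eigenvalue under two readings of the printed eigenvector table: v(1) as the first column, or v(1) as the first row. The sketch below uses the two-decimal values quoted above; treating the eigenvectors as columns is an assumption about the example's output layout. Keep in mind that an eigenvector is only defined up to a scalar factor, so two programs can print different vectors for the same eigenvalue.

```python
A = [
    [complex(-3.84, 2.25), complex(-8.94, -4.75), complex(8.95, -6.53), complex(-9.87, 4.82)],
    [complex(-0.66, 0.83), complex(-4.40, -3.82), complex(-3.50, -4.26), complex(-3.15, 7.36)],
    [complex(-3.99, -4.73), complex(-5.88, -6.60), complex(-3.36, -0.40), complex(-0.75, 5.23)],
    [complex(7.74, 4.18), complex(3.66, -7.53), complex(2.58, 3.60), complex(4.59, 5.41)],
]
LAM = [complex(-9.43, -12.98), complex(-3.44, 12.69), complex(0.11, -3.40), complex(5.76, 7.13)]
V = [
    [complex(0.43, 0.33), complex(0.83, 0.00), complex(0.60, 0.00), complex(-0.31, 0.03)],
    [complex(0.51, -0.03), complex(0.08, -0.25), complex(-0.40, -0.20), complex(0.04, 0.34)],
    [complex(0.62, 0.00), complex(-0.25, 0.28), complex(-0.09, -0.48), complex(0.36, 0.06)],
    [complex(-0.23, 0.11), complex(-0.10, -0.32), complex(-0.43, 0.13), complex(0.81, 0.00)],
]


def residual(v, lam):
    """Largest entry of |A*v - lam*v|."""
    return max(
        abs(sum(A[i][k] * v[k] for k in range(4)) - lam * v[i])
        for i in range(4)
    )


def column(j):
    return [V[i][j] for i in range(4)]


def row(j):
    return list(V[j])


if __name__ == "__main__":
    print("first column as v(1):", residual(column(0), LAM[0]))
    print("first row as v(1):   ", residual(row(0), LAM[0]))
```

With the first column, the residual stays at the level expected from two-decimal rounding; with the first row, it does not.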

Thanks

MarcB

↧

FEAST 3.0 and MKL

October 30, 2015, 3:39 am

Good day!

The new FEAST solver 3.0 has been recently released. The MKL 11.3 includes the previous FEAST 2.1 version. Do you plan to include the FEAST 3.0 in the forthcoming MKL releases? When could we expect it?

There may be an alternative solution to this problem. Is it possible to use both MKL 11.3 and FEAST 3.0 within the same solution? I tried it recently but, predictably, the linking stage failed.

↧

Pardiso Returning Wrong Results

October 31, 2015, 5:45 pm

Hello dear friends,

I am trying to solve a problem using the finite volume method, and I am having some trouble configuring PARDISO to solve my linear system.

The system that I am trying to solve has 12 elements, and it is given by:

ia = (/ 1, 10, 19, 28, 40, 52, 64, 76, 88, 100, 109, 118, 127 /)

jac = (/ 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 4, 5, 6, &
4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, 1, 2, 3, &
1, 2, 3, 1, 2, 3, 4, 5, 6, 4, 5, 6, 4, 5, 6, &
7, 8, 9, 7, 8, 9, 7, 8, 9, 10, 11, 12, 10, 11, 12, &
10, 11, 12, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, &
4, 5, 6, 4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, &
10, 11, 12, 10, 11, 12, 10, 11, 12, 4, 5, 6, 4, 5, 6, &
4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, 10, 11, 12, &
10, 11, 12, 10, 11, 12 /)

a = (/ 4.19829745d-09, -4.66477495d-06, -1.04957436d-07, -7.85398163d-10, &
0.00000000d+00, 3.36766430d-05, 3.92499666d-05, 7.14315468d-18, &
-1.37394825d-03, 0.00000000d+00, 0.00000000d+00, 1.04957436d-07, &
7.85398163d-10, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 1.37394825d-03, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, -1.04957436d-07, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, -1.37394825d-03, &
4.19829745d-09, -4.66477495d-06, 6.29744618d-08, -7.85398163d-10, &
0.00000000d+00, 2.02059858d-05, 3.92499666d-05, 7.14315468d-18, &
8.24368947d-04, 0.00000000d+00, 0.00000000d+00, -0.00000000d+00, &
7.85398163d-10, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, -6.29744618d-08, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
-8.24368947d-04, 4.19829745d-09, -9.32954989d-07, 2.09914873d-08, &
-7.85398163d-10, 0.00000000d+00, 6.73532860d-06, 3.92499666d-05, &
1.78578867d-18, 2.74789649d-04, 0.00000000d+00, 0.00000000d+00, &
-0.00000000d+00, 7.85398163d-10, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, -2.09914873d-08, -7.85398163d-10, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
-2.74789649d-04, 4.19829745d-09, -9.32954989d-07, 2.09914873d-08, &
7.85398163d-10, 0.00000000d+00, 6.73532860d-06, 3.92499666d-05, &
1.78578867d-18, 2.74789649d-04 /)

b = (/ 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
-0.00031416d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00, &
0.00000000d+00, 0.00000000d+00, 0.00000000d+00, 0.00000000d+00 /)

I have analyzed and solved this system with Matlab, and the condition number is about 1E3; in other words, it is not so high that the system cannot be solved using direct solvers.
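
Independently of conditioning, it is worth checking the structure of the input. PARDISO's CSR format expects the column indices within each row to be in increasing order. The sketch below is a hypothetical helper (not part of MKL) that checks this for the first three rows of ia and jac exactly as posted above. Whether this is the cause of the mismatch with Matlab is not established by the post.

```python
# First four entries of ia (one-based row pointers for rows 1..3)
# and the matching first 27 entries of jac, copied from the post.
IA = [1, 10, 19, 28]
JAC = [
    1, 2, 3, 1, 2, 3, 1, 2, 3,
    4, 5, 6, 4, 5, 6, 4, 5, 6,
    7, 8, 9, 7, 8, 9, 7, 8, 9,
]


def rows_with_unsorted_columns(ia, ja):
    """Return one-based row numbers whose column indices are not strictly increasing."""
    bad = []
    for r in range(len(ia) - 1):
        cols = ja[ia[r] - 1:ia[r + 1] - 1]   # one-based pointers to a Python slice
        if any(nxt <= cur for cur, nxt in zip(cols, cols[1:])):
            bad.append(r + 1)
    return bad


if __name__ == "__main__":
    print("rows with unsorted or repeated columns:",
          rows_with_unsorted_columns(IA, JAC))
```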

I am setting up PARDISO with these parameters:

maxfct=1
mnum=1
mtype=11 ! real and nonsymmetric
msglvl=1 ! prints statistical information to the screen
perm = 1

iparm=0
iparm(1) = 1 ! no solver default
iparm(2) = 2 ! fill-in reordering from METIS
iparm(3) = 1 ! numbers of processors
iparm(4) = 0 ! no iterative-direct algorithm
iparm(5) = 0 ! no user fill-in reducing permutation
iparm(6) = 0 ! =0 solution on the first n components of x
iparm(7) = 0 ! not in use
iparm(8) = 9 ! numbers of iterative refinement steps
iparm(9) = 0 ! not in use
iparm(10) = 13 ! perturb the pivot elements with 1E-13
iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
iparm(12) = 0 ! not in use
iparm(13) = 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm(13) = 1 in case of inappropriate accuracy
iparm(14) = 0 ! Output: number of perturbed pivots
iparm(15) = 0 ! not in use
iparm(16) = 0 ! not in use
iparm(17) = 0 ! not in use
iparm(18) = -1 ! Output: number of nonzeros in the factor LU
iparm(19) = -1 ! Output: Mflops for LU factorization
iparm(20) = 0 ! Output: Numbers of CG Iterations
iparm(27) = 1

phase=13 ! Analysis, numerical factorization, solve, iterative refinement
call pardiso(pt, maxfct, mnum, mtype, phase, m, a, ia, jac, perm, 1, iparm, msglvl, b, x, error)

phase=-1
call pardiso(pt, maxfct, mnum, mtype, phase, m, A, rown, col_n, perm, 1, iparm, msglvl, b, x, error)
call mkl_free_buffers

The thing is that when I compare the Pardiso result with the Matlab one, the results do not match: Pardiso returns strange values.

Please help me; I cannot continue my research without solving this problem.

Thanks,

Wagner Barros

↧

Random numbers with vdRngGaussianMV.

November 2, 2015, 4:28 am

Hi!

I want to generate random numbers from a multivariate normal distribution with parameters mu = [3.0 5.0 2.0] and sigma = [ 16.0 8.0 4.0; 8.0 13.0 17.0; 4.0 17.0 62.0].

I wrote a MEX function using vdRngGaussianMV, and the results of the simulation are:

>> mean(out)
ans =
    3.4558    2.9934    3.3013
>> cov(out)
ans =
146.5081   -1.9068   -2.1787
   -1.9068 144.7461   -0.7771
   -2.1787   -0.7771 142.3955

Could you please tell me what I am doing wrong?

Thank you very much.

The code is the following:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
#include "mkl_vml.h"
#include "mex.h"
#include "matrix.h"
#include "mkl_vsl.h"
#include <time.h>

#define SEED time(NULL)
#define BRNG    VSL_BRNG_MCG31 // VSL basic generator to be used

double normalMVN(int npars, int N, double *cov, double *mean, double *out);


/* main function */
void mexFunction(int nlhs,  mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
   
    int npars,  N;
    double *cov, *mean, *out;
   
    /* make pointers to input data */
    npars = (int)mxGetScalar(prhs[0]);
    N = (int)mxGetScalar(prhs[1]);
    cov = mxGetPr(prhs[2]);
    mean = mxGetPr(prhs[3]);
   
    /* make pointers to output data */
    plhs[0] = mxCreateDoubleMatrix( N, npars, mxREAL);
    out = mxGetPr(plhs[0]);
   
   
    /* call */
    normalMVN(npars, N, cov, mean, out);
   
    return; /* mexFunction is void, so it must not return a value */
   
}


double normalMVN(int npars, int N, double *cov, double *mean, double *out)
{
   
    /* This we need it for distributions */
    VSLSSTaskPtr task;
    VSLStreamStatePtr stream;
   
    char   uplo;
    int i, j, n, lda,  info;
    double *T, fiori[3];
   
    T = (double *)mxMalloc( npars*npars*sizeof( double ) );
   
    uplo = 'L';
    n = npars;
    lda = npars;
   
    cblas_dcopy(npars*npars, cov, 1, T, 1);
   
    dpotrf( &uplo, &n, T, &lda, &info );
    if(info != 0){mexPrintf("c++ error: Cholesky failed\n\n");}
   
    vslNewStream( &stream, BRNG, SEED );
   
    vdRngGaussianMV( VSL_METHOD_DGAUSSIANMV_BOXMULLER2, stream, N, out, npars, VSL_MATRIX_STORAGE_FULL, mean, T );
  
    vslDeleteStream( &stream );
   
    /* Free memory */
    mxFree(T);
   
    return 0;
}

↧

Problem using mpirun

November 2, 2015, 11:39 am

Hello,
It's my first time posting in these forums and I hope that you can help me. I'm trying to run VASP (which uses FORTRAN 90) using Fedora 14 (Laughlin) on an Intel Xeon E5430 Processor and I run into the error message below:

[hamad@local 1_1_O_atom]$ mpirun /home2/hamad/VASPfiles/Vasp/vasp.5.3/vasp&
[1] 19852
[hamad@local 1_1_O_atom]$ vasp.5.3.3 18Dez12 (build Sep 24 2015 19:33:17) complex

POSCAR found : 1 types and 1 ions
/home2/hamad/VASPfiles/Vasp/vasp.5.3/vasp: symbol lookup error: /home2/hamad/VASPfiles/Vasp/vasp.5.3/vasp: undefined symbol: mkl_serv_set_progress_interface
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

Using mpirun also yields the same error message as above.
If it helps this is the Processor information I get from the command line:

[hamad@local ~]$ cat /proc/cpuinfo | grep vendor | uniq
vendor_id : GenuineIntel
[hamad@local ~]$ cat /proc/'model name' | uniq
cat: /proc/model name: No such file or directory
[hamad@local ~]$ cat /proc/cpuinfo | grep 'model name' | uniq
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
[hamad@local ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 8
Thread(s) per core: 1
Core(s) per socket: 4
CPU socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Stepping: 6
CPU MHz: 2659.612
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 6144K
NUMA node0 CPU(s): 0-7

Any help towards understanding the meaning of the error message and how to fix it will be greatly appreciated. Thank you in advance for all your help.

↧

distributed parallel sparse matrix matrix multiplication

November 5, 2015, 6:33 am

I'm searching for codes for distributed parallel sparse matrix matrix multiplication. I found that Intel MKL has shared memory parallel sparse matrix matrix multiplication and distributed sparse solver. So, I'm wondering whether there are functions for distributed parallel sparse matrix matrix multiplication in Intel MKL.

↧

3D DFT

November 5, 2015, 9:44 am

I am trying to do a 3D FFT in MKL 11.3, but it's not working for me... most likely I am doing something wrong, but I am at a loss.

I get an error when I attempt to call DftiCommitDescriptor. I tried two different cases and got two different errors. My code is shown below... Any help would be appreciated!

#include <mkl.h>
#include <stdio.h>

int main()

{

  DFTI_DESCRIPTOR_HANDLE dh;

  float *data;
  MKL_LONG sz[3];
  MKL_LONG dim = 3;
  int val;
  int error_code;

  sz[0] = 128;
  sz[1] = 128;
  sz[2] = 128;

  dim = 2;
  error_code = DftiCreateDescriptor(&dh, DFTI_SINGLE, DFTI_REAL, dim, sz);
  error_code = DftiCommitDescriptor(dh);
  printf("error_code: %d\n", error_code);

  dim = 3;
  error_code = DftiCreateDescriptor(&dh, DFTI_SINGLE, DFTI_REAL, dim, sz);
  error_code = DftiCommitDescriptor(dh);
  printf("error_code: %d\n", error_code);

  printf("%s\n", DftiErrorMessage(error_code));

  dim = 3;
  error_code = DftiCreateDescriptor(&dh, DFTI_SINGLE, DFTI_REAL, dim, sz);
  error_code = DftiSetValue(dh, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
  error_code = DftiCommitDescriptor(dh);
  printf("error_code: %d\n", error_code);

  printf("%s\n", DftiErrorMessage(error_code));

}

Note that the first call works (dim is set to 2).

The second call fails (only difference is that dim is 3), indicating that the functionality is not implemented.

I read in the documentation that I should use DFTI_CONJUGATE_EVEN_STORAGE <- DFTI_COMPLEX_COMPLEX, so I tried this as well (third call), but it failed with the error "Inconsistent configuration parameters".

The output I get is as follows:

error_code: 0
error_code: 6
Intel MKL DFTI ERROR: Functionality is not implemented
error_code: 3
Intel MKL DFTI ERROR: Inconsistent configuration parameters

↧