cgemm3m, cgemm_compact and cgemm give poor performance for a small 24*64 problem

July 8, 2019, 2:33 am

Latest and popular articles on Intel Technologies

Hi all,

I am using the sequential API and direct calls to multiply matrices.

C = 1*conj(A')*A

A is 64*24 and C is 24*24; both are complex matrices (complex8).
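For checking results (not for speed), a naive plain-C reference for C = conj(A')*A with row-major storage can be handy. This is a sketch under stated assumptions: `cplx8` is a local stand-in with the same {real, imag} layout idea as MKL_Complex8, not MKL's own type, and `conj_trans_a_times_a_ref` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a single-precision complex value (illustration only). */
typedef struct { float real, imag; } cplx8;

/* Naive reference for C = conj(A)^T * A.
   A is rows x cols, row-major (e.g. 64 x 24); C is cols x cols, row-major. */
void conj_trans_a_times_a_ref(const cplx8 *a, size_t rows, size_t cols, cplx8 *c) {
    assert(a != NULL && c != NULL);
    for (size_t i = 0; i < cols; i++) {
        for (size_t j = 0; j < cols; j++) {
            float re = 0.0f, im = 0.0f;
            for (size_t p = 0; p < rows; p++) {
                cplx8 x = a[p * cols + i];   /* A(p, i), conjugated below */
                cplx8 y = a[p * cols + j];   /* A(p, j) */
                /* conj(x) * y = (xr*yr + xi*yi) + i (xr*yi - xi*yr) */
                re += x.real * y.real + x.imag * y.imag;
                im += x.real * y.imag - x.imag * y.real;
            }
            c[i * cols + j].real = re;
            c[i * cols + j].imag = im;
        }
    }
}
```

The result is Hermitian (C(j,i) = conj(C(i,j))), which gives a quick sanity check on any library output for the 24*24 C.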

I have arrays of matrices: A_ARR (filled with random values) and C_ARR (filled with zeros); each array holds 1000 matrices.

My application is pinned to a single core and to the corresponding RAM by NUMA ID.

build cmd: icc -c -g -ipo -ipp -Ofast -DMKL_DIRECT_SEQ -xCORE-AVX2 *.c

Setup: Xeon E5-2699A v4, 64 GB RAM on each NUMA node.

I run cblas_cgemm, cblas_cgemm3m or mkl_cgemm_compact in a loop over A_ARR and C_ARR (only one function per run), and I get really poor results (I'm measuring only the matrix multiplication time).

I'm aware of the MKL "warm-up" issue, so I call cblas_cgemm once in advance, before measuring its time.

cblas_cgemm(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &alpha, &A_ARR[i], m, &A_ARR[i], n, &beta, &C_ARR[i], m)

Result: AVG 6.5 ms, MAX 8.6 ms, MIN 6.3 ms

cblas_cgemm3m(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &alpha, &A_ARR[i], m, &A_ARR[i], n, &beta, &C_ARR[i], m)

Result: AVG 7.5 ms, MAX 12 ms, MIN 7.3 ms

mkl_cgemm_compact(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &alpha, &a_arr_compact[i], m, &a_arr_compact[i], n, &beta, &c_arr_compact[i], m, COMPACT_FORMAT, 1)

Result: AVG 225 ms, MAX 231 ms, MIN 224 ms

Note: COMPACT_FORMAT is obtained from mkl_get_format_compact().
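To put these timings in perspective, they can be converted into a flop rate. This sketch assumes each reported time covers the whole loop over the 1000 matrices (the post does not say whether it is per loop or per call), that m = n = 24 and k = 64 as implied by the matrix shapes, and uses the usual nominal count of 8*m*n*k real flops for a complex GEMM; the function names are hypothetical.

```c
#include <assert.h>

/* Nominal real flop count of one complex GEMM C(m x n) += A(m x k) * B(k x n):
   each complex multiply-add is counted as 8 real flops. */
double cgemm_flops(int m, int n, int k) {
    assert(m > 0 && n > 0 && k > 0);
    return 8.0 * m * n * k;
}

/* GFLOP/s achieved when `count` such products take `seconds` in total. */
double gflops_per_second(int m, int n, int k, int count, double seconds) {
    assert(count > 0 && seconds > 0.0);
    return cgemm_flops(m, n, k) * count / seconds / 1e9;
}
```

Under those assumptions each product is 294,912 flops, so 6.5 ms for 1000 products is roughly 45 GFLOP/s, 7.5 ms roughly 39 GFLOP/s, and 225 ms roughly 1.3 GFLOP/s.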

Can anyone help me reduce the time this takes?

It is also not clear to me why the compact API, which should vectorize the matrix multiplications, gives the worst results.

Thanks

Elad

mkl_jit_create_cgemm on multi cores

July 10, 2019, 6:41 am

Hi all,

I want to use mkl_jit_create_cgemm on my setup, where each thread is pinned to a single core.

In each thread I'll run the cgemm with the created jitter.

Do I need to create a JIT kernel for each thread, or can I create just one and use it from all calling threads?

Elad

Some LAPACK subroutines in MKL give a segmentation fault for matrix dimensions larger than 1020

July 11, 2019, 1:28 am

OS: Ubuntu 18.04.2 server, kernel 4.15.0-50-generic. MKL provided by Intel Parallel Studio XE 2019.3.

The C code that reproduces the error:

#include <stdlib.h>
#include <stdio.h>
#include <mkl.h>

#define N 1021 

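int main(){
    printf("begin main\n");
    double m[N], a[N*N];
    lapack_int n1 = N, n2 = N;
    /* Zero matrix with a random diagonal: i % (N+1) == 0 selects a[k*N + k]. */
    for (size_t i = 0; i < N*N; i++){
        if (i % (N+1) == 0){
            a[i] = (double) rand() / RAND_MAX;
        }
        else { a[i] = 0; }
    }
    lapack_int info;
    printf("begin lapack\n");
    info = LAPACKE_dsyevd(LAPACK_ROW_MAJOR, 'V', 'U', n1, a, n2, m);
    /* Cast: lapack_int is 64-bit under ILP64, which %d does not match. */
    printf("%d: end lapack\n", (int) info);
    return 0;
}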

This code gives correct results for matrices up to 1020x1020 but crashes with a segmentation fault when N > 1020. The error persists with both icc and gcc, and with linking options as simple as -mkl for icc as well as the full compile and link options suggested by the Link Line Advisor. The dsteqr routine seems to have similar issues.

I am new to using the lower-level MKL routines directly, so there may also be problems in my code above, though it works well for smaller matrices.

Thanks in advance.
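One thing worth ruling out (the post does not confirm it): both arrays are automatic variables on main's stack. At N = 1021 they take 8,347,696 bytes, only about 40 KB below an 8 MiB stack limit, leaving very little room for anything else that runs on that stack. The 8 MiB figure is an assumption (a common Linux default for `ulimit -s`), and the helper names below are hypothetical. The sketch computes the stack footprint and builds the same test matrix on the heap instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed stack limit: 8 MiB (ulimit -s 8192). Not reported in the post. */
#define ASSUMED_STACK_LIMIT ((size_t)8192 * 1024)

/* Bytes of automatic storage used by the reproducer's double m[N] and double a[N*N]. */
size_t reproducer_stack_bytes(size_t n) {
    return (n + n * n) * sizeof(double);
}

/* Heap-allocated version of the reproducer's test matrix: zeros except for a
   random diagonal. The caller releases it with free(). */
double *make_diag_matrix(size_t n) {
    assert(n > 0);
    double *a = calloc(n * n, sizeof(double));
    if (a == NULL) return NULL;
    for (size_t i = 0; i < n; i++)
        a[i * n + i] = (double)rand() / RAND_MAX;
    return a;
}
```

The heap pointer can be passed to LAPACKE_dsyevd in place of the stack array (with the eigenvalue array allocated the same way), which takes the matrix out of the stack budget entirely.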

MKL FFT Error in Example Code in Linux

July 15, 2019, 8:43 am

Hello!
I was trying to learn about the MKL FFT libraries and wanted to check how the example programs run, but the output files report an error, status = 2, while trying to create the descriptor for a 1D double-precision array.

I'm attaching the example code (basic_dp_complex_dft_1d.f90) and my executable link script (link1.txt; you might have to chmod 700 it). I'm not sure whether I am linking the program properly.

I can't tell whether the link file (generated with the MKL Link Line Advisor) is wrong or MKL is not installed properly. Could you please help me figure out why I don't get the proper output from the example code?

Thanks!

Attachments: basic_dp_complex_dft_1d.f90 (4.67 KB), link1.txt (451 bytes)

MKL FFT Fortran in Test Code - Transformed Values are not correct

July 16, 2019, 10:41 am

Hello!

I was trying to learn about MKL's FFT functions and wrote a small 1D program that shows the forward-transformed values. From my understanding, the arguments to the forward function (noted in the comment in the program) should be right, and it should work.

When I run it, I get 0 for my transformed values (the array y1). I am confused about why that happens and what error in the code causes it.

Could someone help me understand the issue?

I am attaching the program file (test.f90), the executable link file (link1.txt; you might have to chmod 700 it) and the output text file (text.txt).

Thanks!

- Adhyanth

PROGRAM test
! IMPLICIT NONE
USE MKL_DFTI
REAL(KIND = DFTI_DPKP), DIMENSION(201) :: x,y,z
COMPLEX(KIND = DFTI_DPKP), DIMENSION(201) ::x1, y1
INTEGER :: i, status
TYPE(DFTI_DESCRIPTOR), POINTER :: plan
plan => null()
DO i=1,201
x(i) = REAL(i-101)/100
y(i)= 5*SIN(2*x(i))
END DO

OPEN(9, file='text.txt', form = 'FORMATTED')
WRITE(9,*) 'Y values'
WRITE(9,*) y
status = dfti_create_descriptor_1d(plan, DFTI_DOUBLE, DFTI_COMPLEX, 1, 200)
status = dfti_commit_descriptor_external(plan)
DO i=1,201
status = dfti_compute_forward_dz(plan, y(i), y1(i))
!Using the(desc,xreinout,ximinout) argument format
ENDDO
status = dfti_free_descriptor_external(plan)
WRITE(9,*) 'Y1 Values'
WRITE(9,*) y1
WRITE(9,*) 'Y values - After Transform'
WRITE(9,*) y

END PROGRAM test

Attachments: text.txt (19.22 KB), test.f90 (805 bytes), link1.txt (223 bytes)

Cannot use MKL F95 eigensolver routines

July 16, 2019, 5:50 pm

I am using Intel Parallel Studio XE with Visual Studio. I would like to use the heev()/heevr() routines. I have written a simple program to test them, and I have enabled the "Use MKL libraries" option in Project Properties:

include 'lapack.f90'
program heev_test
    use lapack95
    implicit none
    integer , parameter :: dp = kind(0.0d0)
    complex(dp) :: matrix(4,4)
    real(dp) :: eigs(4)
    integer :: i
    matrix = (1.0_dp,0.0_dp)
    call heevr(matrix, eigs, info=i)
    print*, i
    print*, eigs
    read(*,*)
    stop  
end program

This code produces the error:

error LNK2019: unresolved external symbol _ZHEEVR_F95 referenced in function _MAIN__		
fatal error LNK1120: 1 unresolved externals

So the linker is not finding the ZHEEVR_F95 subroutine, whose interface is provided in the lapack.f90 file. Replacing HEEVR with HEEV produces the same result. I should mention that I am also using the VSL and DFT libraries without any issues.

I noticed that the documentation for ?heev says to include the mkl.fi file (which I don't normally include for the VSL or DFT libraries), but doing so produces the following errors:

Error       error #6218: This statement is positioned incorrectly and/or has syntax errors.     C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\\mkl\include\lapack.f90    21  
Error       error #6790: This is an invalid statement; an END [PROGRAM]  statement is required.     C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\\mkl\include\lapack.f90    24  
Error       error #6785: This name does not match the unit name.   [F95_PRECISION]      C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\\mkl\include\lapack.f90    24  
Error       Compilation Aborted (code 1)        
Warning     warning #5427: Program may contain only one main entry routine

The line numbers are referring to these lines from the lapack.f90 file:

21  MODULE F95_PRECISION
22      INTEGER, PARAMETER :: SP = KIND(1.0E0)
23      INTEGER, PARAMETER :: DP = KIND(1.0D0)
24  END MODULE F95_PRECISION

which doesn't really tell me much.

I need to figure out why the linker can't find the F95 subroutines; any help is appreciated.

EDIT:

I tried using the F77 interface with the following code:

include 'lapack.f90'
program heev_test
    use lapack95
    implicit none
    integer , parameter :: dp = kind(0.0d0)
    complex(dp) :: matrix(4,4)
    real(dp) :: eigs(4)
    integer :: i
    matrix = (1.0_dp,0.0_dp)
    call eigenvalues(matrix,eigs)
    print*, eigs
    read(*,*)
    stop  
    
contains
    
    subroutine eigenvalues(H,eigs)
        complex(dp) :: H(:,:)
        real(dp)    :: eigs(:)
        integer   :: d
        integer   :: info
        integer   :: lwork, liwork
        real(dp), allocatable  :: work(:)
        integer , allocatable  :: iwork(:)
        
        d = size(eigs)
        
        lwork = 4*d
        liwork = 10
        allocate(work(lwork))
        allocate(iwork(liwork))
        
        call zheevr('N','U',d,H,d,eigs,work,-1,iwork,-1,info)
        
        lwork = work(1)
        liwork = iwork(1)
        deallocate(work,iwork)        
        allocate(work(lwork))
        allocate(iwork(liwork))
        
        call zheevr('N','U',d,H,d,eigs,work,lwork,iwork,liwork,info)
        
        deallocate(work,iwork)
        
        if (info /= 0) then
            print*, "diagonalization failed, info = ", info
            read(*,*)
            stop
        end if
                
    end subroutine
end program

but this causes a seg fault as soon as zheevr is called, and now I have absolutely no idea what is going on.

Different LU factorization result between libmkl_intel_ilp64.a and libmkl_intel_lp64.a

July 12, 2019, 7:03 am

I am a novice with the Intel Math Kernel Library. When I tried to use the LU factorization function from LAPACK and linked against different libraries, I got different results, as follows.

My C++ code is as simple as the following.

#include <mkl.h>
#include <iostream>
#include <ctime> // For time()
#include <cstdlib>

int main()
{
    srand(time(0));

    // Matrix order; const so the array sizes below are compile-time constants
    const int C = 21;
    double a[C * C];                 // 21 x 21 = 441 entries
    for (int i = 0; i < C * C; i++)
        a[i] = (double) rand() / RAND_MAX;

    lapack_int m1 = C;
    lapack_int n1 = C;
    lapack_int lda1 = C;
    lapack_int ipiv[C];
    // LU factorization of a (column-major), then inversion from the LU factors
    lapack_int info1 = LAPACKE_dgetrf(LAPACK_COL_MAJOR, m1, n1, a, lda1, ipiv);

    std::cout << "Info1 is " << info1 << std::endl;

    info1 = LAPACKE_dgetri(LAPACK_COL_MAJOR, m1, a, lda1, ipiv);

    std::cout << "Info1 is " << info1 << std::endl;
    return 0;
}

I get no error when linking against libmkl_intel_lp64.a, but I get "Segmentation fault (core dumped)" when linking against libmkl_intel_ilp64.a. I have checked the official Intel documentation that explains the difference between the two libraries.

It mentions two main differences:

1. Support for large data arrays (with more than 2^31-1 elements)

2. Enable compiling your Fortran code with the -i8 compiler option

I do not understand why I get an error with libmkl_intel_ilp64.a at all. What's more, I sometimes also get a "stack smashing detected" error when linking against libmkl_intel_lp64.a, while libmkl_intel_ilp64.a works. Could you please help me figure it out? Thank you so much!
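To make the integer-width difference concrete: under ILP64 the integer type of the interface (MKL_INT / lapack_int) is 64 bits, under LP64 it is 32 bits, and code linked against the ILP64 library is expected to be compiled with -DMKL_ILP64 so that the header agrees. The post does not show the compile flags, so this plain-C sketch only assumes a header/library mismatch and shows its arithmetic consequence for ipiv[21]. The typedefs and helper names are illustrative stand-ins, not MKL's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the interface integer type: 32-bit (LP64) vs 64-bit (ILP64). */
typedef int32_t int_lp64;
typedef int64_t int_ilp64;

/* Bytes reserved by `lapack_int ipiv[n]` when the header uses 32-bit integers. */
size_t ipiv_bytes_reserved_lp64(size_t n) { return n * sizeof(int_lp64); }

/* Bytes a library built for 64-bit integers writes when filling n pivots. */
size_t ipiv_bytes_written_ilp64(size_t n) { return n * sizeof(int_ilp64); }

/* What a routine expecting a 64-bit integer reads through a pointer to
   32-bit integers: 8 bytes, i.e. two adjacent 32-bit values. */
int64_t read_as_ilp64(const int_lp64 *p) {
    assert(p != NULL);
    int64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

If the header declares a 32-bit lapack_int while the library writes 64-bit pivots, ipiv[21] reserves 84 bytes but receives 168, overrunning the stack array; reading in the other direction produces values built from two neighbouring integers.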

Different BLAS performance between CMake dynamic linking and manual static linking

July 12, 2019, 7:22 am

I first compiled my program using CMake; my CMakeLists file is as follows.

cmake_minimum_required(VERSION 3.11)
project(LMMNET LANGUAGES CXX)

include(CheckCXXCompilerFlag)
CHECK_CXX_COMPILER_FLAG("-std=c++11" COMPILER_SUPPORTS_CXX11)

set(CMAKE_CXX_FLAGS "-O2 -msse -msse2")
#set(CMAKE_CXX_COMPILER "icpc")

set(Boost_USE_STATIC_LIBS ON)
set(Boost_USE_MULTITHREADED ON)

find_package(Boost 1.58.0 COMPONENTS program_options REQUIRED)
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

target_link_libraries(dataIO PUBLIC ${Boost_LIBRARIES})
target_link_libraries(DataMatrix PUBLIC DataUtils
${Boost_LIBRARIES}
${LAPACK_LIBRARIES})
target_link_libraries(LMMNET PUBLIC dataIO
DataMatrix)

I can link the Intel Math Kernel Library correctly and the program works well. However, I find it interesting that when I link the Intel Math Kernel Library following the official instructions, as

-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lgomp -lpthread -lm -ldl

the performance of my program is much better than before: it even runs four times faster (I only do matrix-matrix multiplication). Why is that?

How to get Cholesky diagonal?

July 18, 2019, 7:10 am

Hello

I need to extract the diagonal from a sparse Cholesky LDL^T factorization. Can I do this with Intel MKL?

I can use Pardiso to factor and solve a sparse linear system, but the Cholesky factor itself is not accessible to me.

Thanks
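For reference, the D in A = L D L^T is the diagonal matrix whose entries come out of the factorization one column at a time. A small dense, unpivoted LDL^T in plain C shows where they come from. It is an illustration only: `ldlt_dense` is a hypothetical helper, not an MKL routine, and a sparse solver such as Pardiso applies a fill-reducing reordering and possibly pivoting, so its D generally corresponds to a permuted matrix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Dense, unpivoted LDL^T of a symmetric n x n row-major matrix a.
   On success d[0..n-1] holds diag(D) and l (n x n, row-major) the unit
   lower-triangular L. Returns 0, or -1 on a zero pivot. */
int ldlt_dense(const double *a, size_t n, double *l, double *d) {
    assert(a != NULL && l != NULL && d != NULL);
    memset(l, 0, n * n * sizeof(double));
    for (size_t j = 0; j < n; j++) {
        /* d_j = a_jj - sum_{k<j} L_jk^2 d_k */
        double dj = a[j * n + j];
        for (size_t k = 0; k < j; k++)
            dj -= l[j * n + k] * l[j * n + k] * d[k];
        if (dj == 0.0) return -1;
        d[j] = dj;
        l[j * n + j] = 1.0;
        /* L_ij = (a_ij - sum_{k<j} L_ik L_jk d_k) / d_j for rows below j */
        for (size_t i = j + 1; i < n; i++) {
            double s = a[i * n + j];
            for (size_t k = 0; k < j; k++)
                s -= l[i * n + k] * l[j * n + k] * d[k];
            l[i * n + j] = s / dj;
        }
    }
    return 0;
}
```

For A = [[4, 2], [2, 3]] it gives D = diag(4, 2) and L(1,0) = 0.5.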

Bug in mkl_sparse_d_mm

July 18, 2019, 2:45 pm

Hi All,

I think the result of "mkl_sparse_d_mm" is incorrect, as the following code shows. The code multiplies the identity matrix, stored in CSR format, by a dense matrix and checks whether the result equals the input matrix. The same problem occurs with the ROW MAJOR layout.

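#include <cstdio>
#include "mkl.h"

// ALIGN is not defined in the posted snippet; 64 bytes is an assumed value.
#define ALIGN 64

int main() {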

   MKL_INT n = 5;
   MKL_INT rows_start[5] = {0,1,2,3,4};
   MKL_INT rows_end[5]   = {1,2,3,4,5};
   MKL_INT col_indx[5]   = {0,1,2,3,4};
   double values[5]     = {1,1,1,1,1};
   sparse_matrix_t       csrA = NULL;
   sparse_index_base_t    indexing;
   struct matrix_descr    descr_type_gen;
   descr_type_gen.type = SPARSE_MATRIX_TYPE_GENERAL;
   mkl_sparse_d_create_csr ( &csrA, SPARSE_INDEX_BASE_ZERO, n, n, rows_start, rows_end, col_indx, values);
   MKL_INT row, col;
   sparse_index_base_t indextype;
   MKL_INT * bi, *ei, *indx;
   double *rv;
   mkl_sparse_d_export_csr(csrA, &indextype, &row, &col, &bi, &ei, &indx, &rv);
   printf("Input csr matrix\n");
   for(long i=0; i < n; i++) {
        printf("%ld ----> ",i);
        for(long l = 0; l < n; l++) {
                bool flag = false;
                for(long j=bi[i]; j < ei[i]; j++) {
                        if(indx[j] == l) {
                                flag = true;
                                printf("%.0lf ",rv[j]);
                                break;
                        }
                }
                if(!flag)
                        printf("%d ",0);

        }       printf("\n");
    }
    double *matrix = (double *) mkl_malloc ( n*2*sizeof(double), ALIGN);
    double *result = (double *) mkl_calloc ( n*2, sizeof(double), ALIGN);
    for(long i=0; i < n; i++) {
        matrix[i] = i+1;
        matrix[i + n] = n-i;
    }
    printf("Input dense matrix\n");
    for(long i=0; i < n; i++) {
        for(long j=0 ; j < 2; j++) {
                printf("%.2lf ", matrix[i + j*n]);
        }
        printf("\n");
    }
    mkl_sparse_d_mm( SPARSE_OPERATION_NON_TRANSPOSE, 1, csrA, descr_type_gen, SPARSE_LAYOUT_COLUMN_MAJOR, matrix, 2, n, 0, result, n);
  printf("Output result matrix\n");
  for(long i=0; i < n; i++) {
        for(long j=0 ; j < 2; j++) {
                printf("%.2lf ", result[i + j*n]);
        }
        printf("\n");
  }
  mkl_free(matrix);
  mkl_free(result);

  return 0;
}

The output of the above code is :-

Input csr matrix
0 ----> 1 0 0 0 0
1 ----> 0 1 0 0 0
2 ----> 0 0 1 0 0
3 ----> 0 0 0 1 0
4 ----> 0 0 0 0 1
Input dense matrix
1.00 5.00
2.00 4.00
3.00 3.00
4.00 2.00
5.00 1.00
Output result matrix
1.00 5.00
0.00 0.00
1.00 5.00
0.00 0.00
2.00 4.00

I ran the above with the following options:

icc -I. -I/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/include -DMKL_ILP64 -Wall -O3 -qopenmp -qopenmp-simd -mkl=parallel -std=c++11 -Wno-attributes mkl_sparse_d_mm.cpp -o mkl_sparse_mm -L/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64_lin -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

and I also tried a different Intel MKL version, as follows:

icc -I. -I/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/include -DMKL_ILP64 -Wall -O3 -qopenmp -qopenmp-simd -mkl=parallel -std=c++11 -Wno-attributes mkl_sparse_d_mm.cpp -o mkl_sparse_mm -L/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64_lin -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
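A plain-C reference makes the expected result explicit. This sketch computes C = alpha*A*B + beta*C for a zero-based CSR matrix in the 4-array form (rows_start, rows_end) with column-major dense B and C and explicit leading dimensions, i.e. the same arguments passed to mkl_sparse_d_mm above. `long` stands in for MKL_INT under ILP64 (64-bit on Linux x86-64), and the function name is hypothetical; this is not MKL code.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C = alpha * A * B + beta * C.
   A: m x k, zero-based CSR with separate rows_start / rows_end arrays.
   B: k x cols, C: m x cols, both column-major with leading dims ldb, ldc. */
void csr_mm_colmajor_ref(size_t m, const long *rows_start, const long *rows_end,
                         const long *col_indx, const double *values,
                         const double *b, size_t cols, size_t ldb,
                         double alpha, double beta, double *c, size_t ldc) {
    assert(rows_start && rows_end && col_indx && values && b && c);
    for (size_t col = 0; col < cols; col++) {
        for (size_t row = 0; row < m; row++) {
            double acc = 0.0;
            for (long p = rows_start[row]; p < rows_end[row]; p++)
                acc += values[p] * b[(size_t)col_indx[p] + col * ldb];
            c[row + col * ldc] = alpha * acc + beta * c[row + col * ldc];
        }
    }
}
```

Run on the 5x5 identity and the 5x2 dense matrix from the post (alpha = 1, beta = 0, ldb = ldc = 5), it returns the dense matrix unchanged, which is the output the post expects.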

Possible issue in MKL LAPACKE interface when using LAPACK_ROW_MAJOR

July 21, 2019, 7:32 pm

MKL provided by Intel parallel studio xe 2019.3.

The minimal reproducing code is below:

#include <stdlib.h>
#include <stdio.h>
#include "mkl_lapacke.h"

#define N 2

int main(){
double m[N];
MKL_INT n1=N, n2=N;
MKL_INT info;
double a[N*N] = {0, 1,
                 1, 0};
info = LAPACKE_dsyevd(LAPACK_ROW_MAJOR, 'V','U', n1, a, n2, m);
printf("the eigenvalue is: %f,%f\n", m[0],m[1]);
printf("the first eigenvector is: %f,%f\n",a[0],a[2]);
printf("the second eigenvector is: %f,%f\n",a[1],a[3]);
return 0;
}

The returned eigenvectors are wrong when LAPACK_ROW_MAJOR is specified; however, everything is fine after changing to LAPACK_COL_MAJOR.

The stdout with MKL_VERBOSE=1 exported is shown below:

1. COL_MAJOR (correct results)

MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread
MKL_VERBOSE DSYEVD(V,U,2,0x7ffee9463680,2,0x7ffee94636a0,0x7ffee9463620,-1,0x7ffee9463628,-1,0) 27.85ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:28
MKL_VERBOSE DSYEVD(V,U,2,0x7ffee9463680,2,0x7ffee94636a0,0x22ce000,21,0x22cdf00,13,0) 243.59us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:28
the eigenvalue is: -1.000000,1.000000
the first eigenvector is: -0.707107,0.707107
the second eigenvector is: 0.707107,0.707107

2. ROW_MAJOR (wrong results)

MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread
MKL_VERBOSE DSYEVD(V,U,2,0x7ffd6a6a3000,2,0x7ffd6a6a3020,0x7ffd6a6a2fa0,-1,0x7ffd6a6a2fa8,-1,0) 27.70ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:28
MKL_VERBOSE DSYEVD(V,U,2,0xd26100,2,0x7ffd6a6a3020,0xd26000,21,0xd25f00,13,0) 247.73us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:28
the eigenvalue is: -1.000000,1.000000
the first eigenvector is: -0.707107,1.000000
the second eigenvector is: 0.707107,0.707107

Note the difference in the first eigenvector.
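With jobz = 'V' the eigenvectors are returned as the columns of a, so with LAPACK_ROW_MAJOR eigenvector k is a[0*n + k], a[1*n + k], which is what the program prints (a[0], a[2] and a[1], a[3]). A quick residual check in plain C shows which printed vector is actually an eigenvector of [[0, 1], [1, 0]]; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of an n x n matrix stored row-major vs column-major. */
double at_row_major(const double *a, size_t n, size_t i, size_t j) { return a[i * n + j]; }
double at_col_major(const double *a, size_t n, size_t i, size_t j) { return a[i + j * n]; }

/* Largest |(A v - lambda v)_i| for an n x n row-major matrix A:
   close to zero only if v is an eigenvector of A for eigenvalue lambda. */
double eig_residual(const double *a, size_t n, const double *v, double lambda) {
    assert(a != NULL && v != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = -lambda * v[i];
        for (size_t j = 0; j < n; j++)
            r += at_row_major(a, n, i, j) * v[j];
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```

Using the numbers from the runs above: (-0.707107, 0.707107) with eigenvalue -1 gives a residual of essentially zero, while (-0.707107, 1.000000) gives about 0.29.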

Discontinued Xeon Phi support

July 20, 2019, 5:36 am

Hello,

Will Intel support Xeon Phi in future versions of the compilers (+MKL)?

Malik.

Questions about iterative sparse solvers

July 22, 2019, 9:02 am

Hello,

I would like to use the iterative sparse solvers (ISS) from MKL to solve linear systems of equations that arise from discretizing the partial differential equations used in CFD.

I found that CG and FGMRES are available, but not everything is clear from the documentation:

Can ISS be threaded? I am using Pardiso, and it can be made parallel easily.
Is there any example besides those in \example\solverf\source that demonstrates how the matrix of the linear system is constructed and used by ISS? I might have overlooked information about this and about the matrix format (CSR?) in the documentation.
Is there any example that shows how preconditioning should be done? There is a general description in the documentation, but it does not feel sufficient to make sure the implementation is correct, e.g. for the step "apply the preconditioner C inverse to tmp".
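With the reverse-communication interface, the caller performs the matrix-vector product (and, when preconditioning is enabled, the preconditioner application) itself. This plain-C sketch shows those two user-side pieces for a zero-based 3-array CSR matrix. The Jacobi (diagonal) preconditioner is chosen only because it is the simplest to show, the helper names are hypothetical, and the exact request codes and the layout of the tmp work array must be taken from the MKL documentation.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for an n x n matrix in zero-based 3-array CSR
   (row_ptr has n + 1 entries). */
void csr_matvec(size_t n, const int *row_ptr, const int *col_ind,
                const double *val, const double *x, double *y) {
    assert(row_ptr && col_ind && val && x && y);
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++)
            acc += val[p] * x[col_ind[p]];
        y[i] = acc;
    }
}

/* Jacobi preconditioner: z = D^{-1} r with D = diag(A).
   Returns -1 if a diagonal entry is missing or zero. */
int jacobi_apply(size_t n, const int *row_ptr, const int *col_ind,
                 const double *val, const double *r, double *z) {
    for (size_t i = 0; i < n; i++) {
        double diag = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++)
            if ((size_t)col_ind[p] == i) diag = val[p];
        if (diag == 0.0) return -1;
        z[i] = r[i] / diag;
    }
    return 0;
}
```

For A = [[4, 1], [1, 3]] and x = (1, 2), csr_matvec gives y = (6, 7), and jacobi_apply maps r = (8, 9) to z = (2, 3).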

By the way, can Pardiso, as a direct solver, also work in an iterative "mode", meaning that it would not compute the full solution but only perform a user-defined number of iterations?

Thank you.

MKL Packed Storage scheme for matrix with many zeros

July 24, 2019, 2:24 am

I have a very large matrix in which many entries are zero, but it has no specific structure (it is not banded, etc.).

Is there any way to reduce the memory needed to store this matrix when using MKL for matrix operations?

Currently, even a not very large model needs more than 32 GB of RAM.
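A rough estimate shows why a general sparse format such as CSR can help even without band structure: it stores only the nonzeros plus one row pointer per row. This sketch assumes double values and 32-bit `int` indices; the sizes in the example are hypothetical, not taken from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for a dense n x n matrix of doubles. */
size_t dense_bytes(size_t n) { return n * n * sizeof(double); }

/* Bytes for the same matrix in 3-array CSR with nnz stored nonzeros:
   nnz values and nnz column indices, plus n + 1 row pointers.
   Assumes double values and int (typically 32-bit) indices. */
size_t csr_bytes(size_t n, size_t nnz) {
    assert(nnz <= n * n);
    return nnz * (sizeof(double) + sizeof(int)) + (n + 1) * sizeof(int);
}
```

For example, a hypothetical 60,000 x 60,000 matrix needs 28.8 GB dense, while the same matrix with 50 nonzeros per row (3,000,000 in total) needs about 36 MB in CSR. The saving depends entirely on how many entries are actually zero.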

Maintaining 15+ year old software package, MKL issue support

July 25, 2019, 6:20 am

Hello,

I've taken over an older software package to which my company wants to add some functionality. We don't have the funds or time to update every aspect of it, so the goal is to keep it as is and add new modules.

I have two interesting situations.

Running Windows 10

1) The software package runs perfectly fine, except that once it reaches an MKL call it aborts the entire program. Everything works up until a single button press that makes the first MKL call. This package uses an older version of MKL, from 2010 I believe.

2) To update MKL, I installed the latest release, integrated it into Visual Studio 2019, selected "Sequential" use of Intel MKL, and rebuilt the package. Now the application runs fine, and the button that previously aborted the program works. Here is the issue: it only works when I run it within the IDE. If I run the executable directly, outside the IDE, it shuts down at the same place as in the first situation.

Setup

1) In this situation I have the following configuration to reference MKL:

Environment variable PATH pointing to where the following DLLs are located
- mkl_core
- mkl_intel_thread
- mkl_sequential
visual studio configuration
- C/C++
  - General
    - Additional Libraries
      - pointing at the \mkl\include directory
- Linker
  - General
    - Additional Libraries
      - point at the \mkl\lib\intel64 directory
    - Additional Dependencies
      - mkl_intel_lp64_dll.lib
      - mkl_intel_thread_dll.lib
      - mkl_core_dll.lib

2) In this second situation, I have all of the above. I tried to peel away those settings, but it fails to compile without them. The only difference is

Configuration Properties
- Intel Performance Library
  - Use Intel MKL Sequential

I know this is probably very limited information, but I've been banging my head against a wall for some time now.

Sparse Complex Matrix Multiplication

July 25, 2019, 5:24 pm

Hello,

Let's say A and B are sparse matrices with complex values, stored in sparse handles (CSR format). I want to calculate D = A x conjugate(B), where conjugate(B) is the entrywise complex conjugate of B. Is there an efficient way to perform this multiplication? Right now I have to export the sparse handle containing B to the CSR 3-array format, replace the values with their complex conjugates, create the handle again, and then perform the multiplication. I would really appreciate a more efficient solution.
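The conjugation step described above only has to touch the values array: conj(B) has the same row pointers and column indices as B. A minimal plain-C sketch of that O(nnz) pass, assuming the values are available as an array of {real, imag} pairs; `cplx16` is a local stand-in (same layout idea as MKL_Complex16) and `conjugate_values` is a hypothetical helper. Whether the arrays obtained by exporting a handle may be modified in place is something to confirm in the MKL documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a double-precision complex value (illustration only). */
typedef struct { double real, imag; } cplx16;

/* Conjugate the nnz stored values of a CSR matrix in place.
   The sparsity pattern is unchanged, so only the values are touched. */
void conjugate_values(cplx16 *values, size_t nnz) {
    assert(values != NULL || nnz == 0);
    for (size_t p = 0; p < nnz; p++)
        values[p].imag = -values[p].imag;
}
```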

Thank you,

Afshin

Error in one LAPACK example

July 25, 2019, 5:39 am

Hi,

I've found an error in one of your examples of the FORTRAN 77 interface to the LAPACK Singular Value Decomposition routines (DGESVD). It is located at https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11...

The following two lines must be added: (1) in the variable declarations and (2) just before calling DGESVD.

(1)

DOUBLE PRECISION Asvd( size(A,1), size(A,2) )

(2)

Asvd = A

Best wishes,

Mohamed Ali Al-Badri

How to do an element-wise multiplication between two 3D-arrays?

July 26, 2019, 5:49 am

Hi, everyone.

I am trying to do an element-wise multiplication between two 3D arrays and then calculate the sum of all the resulting elements, something like this:

res = 0
do k = kstart,kend
do j = jstart, jend
do i = istart, iend
res = res + A(i, j, k) * B(i, j ,k)
end do 
end do
end do

However, the multiplication is not applied to all elements of A or B, only to the elements within the ranges of the loop variables. To make the calculation faster, I tried the ddot function in MKL like this:

n = (kend - kstart + 1) * (jend - jstart + 1) * (iend - istart + 1)
res = ddot(n, A(istart:iend, jstart:jend, kstart:kend), 1, B(istart:iend, jstart:jend, kstart:kend), 1)

But this way I don't get the same result as with the three do-loops.

Can anyone tell me where the problem is?

Thanks :)
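One thing to keep in mind when mapping the triple loop onto ddot is memory layout. In a Fortran (column-major) array, the elements of A(istart:iend, jstart:jend, kstart:kend) are generally not contiguous, while ddot with incx = incy = 1 reads n consecutive elements from the address it is given. The C sketch below (zero-based indices, hypothetical helper names) computes the box sum exactly like the triple loop and shows that n consecutive elements starting at the box's first element are a different set of values. A Fortran compiler normally passes a packed temporary copy when a non-contiguous section is passed to an external routine, so this illustrates the layout issue rather than diagnosing the mismatch in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based column-major (Fortran-order) offset of element (i, j, k)
   in an array whose first two dimensions are ni and nj. */
size_t idx3(size_t i, size_t j, size_t k, size_t ni, size_t nj) {
    return i + ni * (j + nj * k);
}

/* Sum of A(i,j,k) * B(i,j,k) over the inclusive box [i0,i1] x [j0,j1] x [k0,k1],
   the same computation as the triple do-loop. */
double box_dot(const double *a, const double *b, size_t ni, size_t nj,
               size_t i0, size_t i1, size_t j0, size_t j1, size_t k0, size_t k1) {
    assert(a && b && i0 <= i1 && j0 <= j1 && k0 <= k1);
    double res = 0.0;
    for (size_t k = k0; k <= k1; k++)
        for (size_t j = j0; j <= j1; j++)
            for (size_t i = i0; i <= i1; i++)
                res += a[idx3(i, j, k, ni, nj)] * b[idx3(i, j, k, ni, nj)];
    return res;
}

/* Unit-stride dot product of n consecutive elements. */
double flat_dot(const double *x, const double *y, size_t n) {
    double res = 0.0;
    for (size_t p = 0; p < n; p++) res += x[p] * y[p];
    return res;
}
```

For a 3x3x3 array A filled with 1, 2, ..., 27, B filled with ones, and a box covering indices 1..2 in each dimension, the box sum is 164, whereas the 8 consecutive elements starting at the box's first element sum to 140.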

MKL_SPARSE_?_QR invalid input error (Fortran)

July 30, 2019, 11:32 am

Hello,

I am experimenting with the Sparse QR library to solve a linear system of equations. When I run the attached file, I get the following output:

The last output "3" indicates "SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value."

Can anyone help?

Attachment: test_sparse_qr.f90 (1.2 KB)

Pardiso 32-bit vs 64-bit performance

July 31, 2019, 2:35 am

Using Pardiso in an iterative routine, where the Pardiso solver is called many times, I noticed that the 64-bit version is significantly slower than the 32-bit version. I'm not sure yet whether this is caused by Pardiso alone or by something else, so I was wondering whether such a performance comparison exists.

Is there any way to investigate whether the speed problem lies in the 64-bit version of Pardiso, and whether its performance can be improved?

Thanks in advance,

Daniele
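One way to start investigating is to time the solver calls themselves, separately for each build, after a few untimed warm-up calls. A minimal POSIX C sketch of such a harness; `time_calls` and `count_call` are hypothetical names, and the callable would in practice be a wrapper around whichever Pardiso phase is being compared.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Min / average / max wall-clock time of repeated calls, in seconds. */
typedef struct { double min, avg, max; } timing_stats;

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time `reps` calls of fn(ctx) after `warmup` untimed calls. */
timing_stats time_calls(void (*fn)(void *), void *ctx, int warmup, int reps) {
    assert(fn != NULL && reps > 0);
    for (int w = 0; w < warmup; w++) fn(ctx);
    timing_stats s = { 1e300, 0.0, 0.0 };
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        fn(ctx);
        double dt = now_seconds() - t0;
        if (dt < s.min) s.min = dt;
        if (dt > s.max) s.max = dt;
        s.avg += dt;
    }
    s.avg /= reps;
    return s;
}

/* Trivial example callable: counts how many times it was invoked. */
void count_call(void *ctx) { ++*(long *)ctx; }
```

Comparing min, average and max between the two builds separates a consistently slower solve from occasional outliers.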
