Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all 2652 articles
Browse latest View live

cgemm3m, cgemm_compact AND cgemm give poor results for small problem 24*64

Hi all,

I am using the sequential API with direct calls to multiply matrices:

 C = 1*conj(A')*A

A is 64*24 and C is 24*24; both are complex matrices (complex8).

I have arrays of matrices: A_ARR (filled with random values) and C_ARR (filled with zeros). Both arrays hold 1000 matrices.

My application is pinned to a single core and to the corresponding RAM by NUMA ID.

build cmd: icc -c -g -ipo -ipp -Ofast -DMKL_DIRECT_SEQ -xCORE-AVX2 *.c

Setup is a Xeon E5-2699A v4 with 64 GB of RAM on each NUMA node.

I run cblas_cgemm/cblas_cgemm3m/mkl_cgemm_compact in a loop over A_ARR and C_ARR (only one function per run) and I get really poor results (I'm measuring only the matrix multiplication time).

I'm aware of the MKL "warm-up" issue, so I run cblas_cgemm once in advance without measuring its time.

cblas_cgemm(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &alpha, &A_ARR[i], m, &A_ARR[i], n, &beta, &C_ARR[i], m)

Gives- AVG 6.5ms MAX 8.6ms MIN 6.3ms

cblas_cgemm3m(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &alpha, &A_ARR[i], m, &A_ARR[i], n, &beta, &C_ARR[i], m)

Gives- AVG 7.5ms MAX 12ms MIN 7.3ms

mkl_cgemm_compact(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &alpha, &a_arr_compact[i], m, &a_arr_compact[i], n, &beta, &c_arr_compact[i], m, COMPACT_FORMAT, 1)

Gives- AVG 225ms MAX 231ms MIN 224ms

Note COMPACT_FORMAT is from mkl_get_format_compact();

Can anyone help me reduce the time this takes?

It is also not clear to me why the compact API, which should vectorize the matrix multiplications best, gives the worst results.

 

Thanks

 

Elad
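
When comparing the three routines it can help to have a plain reference to check results against and to use as a timing baseline. The following is a minimal sketch in C, assuming row-major storage and a two-float complex type; c8 and gram_ref are hypothetical names introduced here and are not part of MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-precision complex value (two floats). */
typedef struct { float re, im; } c8;

/* Reference C = A^H * A for a row-major A with rows x cols entries.
   C is cols x cols, row-major. Plain loops, no blocking or vectorization. */
static void gram_ref(const c8 *A, c8 *C, size_t rows, size_t cols)
{
    assert(A && C && rows > 0 && cols > 0);
    for (size_t i = 0; i < cols; i++) {
        for (size_t j = 0; j < cols; j++) {
            float re = 0.0f, im = 0.0f;
            for (size_t k = 0; k < rows; k++) {
                c8 a = A[k * cols + i];   /* entry (k, i), conjugated below */
                c8 b = A[k * cols + j];   /* entry (k, j) */
                re += a.re * b.re + a.im * b.im;   /* Re(conj(a) * b) */
                im += a.re * b.im - a.im * b.re;   /* Im(conj(a) * b) */
            }
            C[i * cols + j].re = re;
            C[i * cols + j].im = im;
        }
    }
}
```

For the sizes in this post the call would be gram_ref(A, C, 64, 24). Each product is roughly 8*24*24*64, or about 295,000, real floating-point operations, which gives a scale for judging the measured times.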

 

 


mkl_jit_create_cgemm on multi cores

Hi all,

I want to use mkl_jit_create_cgemm on my setup, where each thread is pinned to a single core.

In each thread I'll do the cgemm with the created jitter.

Do I need to create a separate JIT kernel for each thread, or can I create just one and use it from all calling threads?

Elad

 

Some LAPACK subroutines in MKL give a segmentation fault for matrix dimensions larger than 1020

OS: Ubuntu 18.04.2 server, kernel 4.15.0-50-generic. MKL is provided by Intel Parallel Studio XE 2019.3.

The C code reproducing the error:

#include <stdlib.h>
#include <stdio.h>
#include <mkl.h>

#define N 1021 

int main(){
    printf("begin main\n");
    double m[N],a[N*N];
    lapack_int n1=N, n2=N;
    for(size_t i=0;i<N*N;i++){
        if(i%(N+1)==0){
            a[i]=(double) rand()/RAND_MAX;
        }
        else {a[i]=0;}
    }
    lapack_int info;
    printf("begin lapack\n");
    info = LAPACKE_dsyevd(LAPACK_ROW_MAJOR, 'V','U', n1, a, n2, m);
    printf("%d: end lapack\n", info);
    return 0;
}

This code gives correct results for matrices no larger than 1020*1020, but gives a segmentation fault when N>1020. The error persists with both icc and gcc, and with linking options as simple as -mkl for icc as well as with the full linking and compiling options suggested by the Link Line Advisor. The dsteqr routine seems to have similar issues.

I am new to using the lower-level routines in MKL directly, so there may also be problems in my code above, although it works well for smaller matrices.

Thanks in advance.
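
One possible explanation, not confirmed in this thread: m and a are automatic (stack) arrays, and for N = 1021 they need 8,347,696 bytes. That is within about 40 KB of the 8 MiB (8,388,608 bytes) stack limit that many Linux systems use by default, so any further stack use inside the call could push the program over the limit. A minimal sketch of the same setup with the matrix on the heap follows; stack_bytes and make_diag_matrix are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes taken by the automatic arrays m[n] and a[n*n] from the post. */
static size_t stack_bytes(size_t n)
{
    return (n + n * n) * sizeof(double);
}

/* Allocate an n x n matrix on the heap, zero everywhere except for random
   values on the diagonal, as in the post. Returns NULL on failure.
   The caller releases the memory with free(). */
static double *make_diag_matrix(size_t n)
{
    assert(n > 0);
    double *a = calloc(n * n, sizeof *a);
    if (!a)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i * (n + 1)] = (double)rand() / RAND_MAX;
    return a;
}
```

The pointer returned by make_diag_matrix can be passed to LAPACKE_dsyevd in place of the array a; the eigenvalue array m can be allocated the same way. Raising the limit with ulimit -s is another way to test whether the stack size is the cause.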

MKL FFT Error in Example Code in Linux

Hello!
I was trying to learn about the MKL FFT libraries and wanted to check how the example programs run. When I check the output files, they report an error (status = 2) while trying to create the descriptor for a 1D array in double precision.

I'm attaching the example code (basic_dp_complex_dft_1d.f90) and my executable link file (link1.txt; you might have to use chmod 700) for the example code. I am not sure whether I am linking the program properly.

I cannot figure out whether the problem is in the executable link file, which I took from the MKL Link Line Advisor, or whether MKL is not installed properly. Could you please help me find out why I am not getting the proper output for the example code?

Thanks!

MKL FFT Fortran in Test Code - Transformed Values are not correct

Hello!

I was trying to learn about MKL's FFT functions and wrote a small 1D program to show the forward-transformed values. From my understanding, the arguments to the forward function (mentioned in the comment in the program) should be right, and it should work.

When I run this, I get 0 for my transformed values (the array y1). I don't understand why that happens or what the error in the code is.

Could someone help me understand the issue?

I am attaching the program file (test.f90), the executable link file (link1.txt; you might have to use chmod 700) and the output text file (text.txt).

Thanks!

- Adhyanth

PROGRAM test
!   IMPLICIT NONE
   USE MKL_DFTI
   REAL(KIND = DFTI_DPKP), DIMENSION(201) :: x,y,z
   COMPLEX(KIND = DFTI_DPKP), DIMENSION(201) ::x1, y1
   INTEGER :: i, status
   TYPE(DFTI_DESCRIPTOR), POINTER :: plan
   plan => null()
DO i=1,201
   x(i) = REAL(i-101)/100
   y(i)= 5*SIN(2*x(i))
END DO

OPEN(9, file='text.txt', form = 'FORMATTED')
WRITE(9,*) 'Y values'
WRITE(9,*) y
status = dfti_create_descriptor_1d(plan, DFTI_DOUBLE, DFTI_COMPLEX, 1, 200)
status = dfti_commit_descriptor_external(plan)
DO i=1,201
status = dfti_compute_forward_dz(plan, y(i), y1(i))

!Using the(desc,xreinout,ximinout) argument format
ENDDO
status = dfti_free_descriptor_external(plan)
WRITE(9,*) 'Y1 Values'
WRITE(9,*) y1
WRITE(9,*) 'Y values - After Transform'
WRITE(9,*) y

END PROGRAM test

Attachments:
text.txt (text/plain, 19.22 KB)
test.f90 (application/octet-stream, 805 bytes)
link1.txt (text/plain, 223 bytes)

Cannot use MKL F95 eigensolver routines

I am using Intel Parallel Studio XE with Visual Studio. I would like to use the heev()/heevr() routines. I have written a simple code to test it, and I have enabled the "Use MKL libraries" option in Project Properties:

include 'lapack.f90'
program heev_test
    use lapack95
    implicit none
    integer , parameter :: dp = kind(0.0d0)
    complex(dp) :: matrix(4,4)
    real(dp) :: eigs(4)
    integer :: i
    matrix = (1.0_dp,0.0_dp)
    call heevr(matrix, eigs, info=i)
    print*, i
    print*, eigs
    read(*,*)
    stop  
end program

This code produces the error:

error LNK2019: unresolved external symbol _ZHEEVR_F95 referenced in function _MAIN__		
fatal error LNK1120: 1 unresolved externals		

So the linker is not finding the ZHEEVR_F95 subroutine whose interface is provided in the lapack.f90 file. Replacing HEEVR with HEEV produces the same results. I should say that I am also using the VSL and DFT libraries without any issue. 

I noticed in the documentation for ?heev it says to include the mkl.fi file (which I don't normally include for the VSL or DFT libraries), but doing so produces the following errors: 

Error       error #6218: This statement is positioned incorrectly and/or has syntax errors.     C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\\mkl\include\lapack.f90    21  
Error       error #6790: This is an invalid statement; an END [PROGRAM]  statement is required.     C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\\mkl\include\lapack.f90    24  
Error       error #6785: This name does not match the unit name.   [F95_PRECISION]      C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\\mkl\include\lapack.f90    24  
Error       Compilation Aborted (code 1)        
Warning     warning #5427: Program may contain only one main entry routine

The line numbers are referring to these lines from the lapack.f90 file:

21  MODULE F95_PRECISION
22      INTEGER, PARAMETER :: SP = KIND(1.0E0)
23      INTEGER, PARAMETER :: DP = KIND(1.0D0)
24  END MODULE F95_PRECISION

which doesn't really tell me much.

I need to figure out why the linker can't find the f95 subroutines, any help appreciated. 

EDIT:

I tried using the F77 interface with the following code:

 

include 'lapack.f90'
program heev_test
    use lapack95
    implicit none
    integer , parameter :: dp = kind(0.0d0)
    complex(dp) :: matrix(4,4)
    real(dp) :: eigs(4)
    integer :: i
    matrix = (1.0_dp,0.0_dp)
    call eigenvalues(matrix,eigs)
    print*, eigs
    read(*,*)
    stop  
    
contains
    
    subroutine eigenvalues(H,eigs)
        complex(dp) :: H(:,:)
        real(dp)    :: eigs(:)
        integer   :: d
        integer   :: info
        integer   :: lwork, liwork
        real(dp), allocatable  :: work(:)
        integer , allocatable  :: iwork(:)
        
        d = size(eigs)
        
        lwork = 4*d
        liwork = 10
        allocate(work(lwork))
        allocate(iwork(liwork))
        
        call zheevr('N','U',d,H,d,eigs,work,-1,iwork,-1,info)
        
        lwork = work(1)
        liwork = iwork(1)
        deallocate(work,iwork)        
        allocate(work(lwork))
        allocate(iwork(liwork))
        
        call zheevr('N','U',d,H,d,eigs,work,lwork,iwork,liwork,info)
        
        deallocate(work,iwork)
        
        if (info /= 0) then
            print*, "diagonalization failed, info = ", info
            read(*,*)
            stop
        end if
                
    end subroutine
end program

but this causes a seg fault as soon as zheevr is called, and now I have absolutely no idea what is going on. 

 

Different LU factorization result between libmkl_intel_ilp64.a and libmkl_intel_lp64.a

I am new to the Intel Math Kernel Library. When I use the LU factorization function from LAPACK and link against different libraries, I get different results, as described below.

My C++ code is as simple as the following.

 

#include <mkl.h>
#include <iostream>
#include <ctime>    // For time()
#include <cstdlib>

int main()
{

    srand(time(0));

    double a[441];
    for (int i = 0; i < 441; i++)
        a[i] = (double) rand() / RAND_MAX;
    int C = 21;

    lapack_int m1 = C;
    lapack_int n1 = C;
    lapack_int lda1 = C;
    lapack_int ipiv[C];
    lapack_int info1 = LAPACKE_dgetrf(LAPACK_COL_MAJOR, m1, n1, a, lda1, ipiv);

    std::cout << "Info1 is "<< info1 << std::endl;

    info1 = LAPACKE_dgetri(LAPACK_COL_MAJOR, m1, a, lda1, ipiv);

    std::cout << "Info1 is "<< info1 << std::endl;
    return 0;
}

I get no error when linking against "libmkl_intel_lp64.a", but I get a "Segmentation fault (core dumped)" when linking against "libmkl_intel_ilp64.a". I have checked the official Intel documentation, which explains the difference between the two libraries.

Two main differences are mentioned in that document:

     1. Support large data arrays (with more than 2^31-1 elements)

     2. Enable compiling your Fortran code with the -i8 compiler option

I do not understand why I get an error at all when using libmkl_intel_ilp64.a. What's more, sometimes I also get a "stack smashing detected" error when linking against "libmkl_intel_lp64.a", while "libmkl_intel_ilp64.a" is OK. Could you please help me figure this out? Thank you so much!

Different BLAS performance between CMake dynamic and manual static linking

I tried to compile my program by using CMake at first and my CMakeLists file is as the following.

 

cmake_minimum_required(VERSION 3.11)
project(LMMNET LANGUAGES CXX)

include(CheckCXXCompilerFlag)
CHECK_CXX_COMPILER_FLAG("-std=c++11" COMPILER_SUPPORTS_CXX11)

set(CMAKE_CXX_FLAGS "-O2 -msse -msse2")
#set(CMAKE_CXX_COMPILER "icpc")

set(Boost_USE_STATIC_LIBS ON)
set(Boost_USE_MULTITHREADED ON)

find_package(Boost 1.58.0 COMPONENTS program_options REQUIRED)
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

target_link_libraries(dataIO PUBLIC ${Boost_LIBRARIES})
target_link_libraries(DataMatrix PUBLIC DataUtils
                                        ${Boost_LIBRARIES}
                                        ${LAPACK_LIBRARIES})
target_link_libraries(LMMNET PUBLIC dataIO
                                    DataMatrix)

 

I can link the Intel Math Kernel Library correctly this way and the program works well. However, I find it interesting that when I link the Intel Math Kernel Library following the official instructions, as in

-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lgomp -lpthread -lm -ldl

the performance of my program is much better. My program even runs four times faster than before (I only do matrix-matrix multiplication). Why is that?


How to get Cholesky diagonal?

Hello

I need to extract the diagonal from a sparse Cholesky (LDLt) factorization. Can I do this with Intel MKL?

I can use Pardiso to factor and solve a sparse linear system, but the Cholesky factor matrix itself is not accessible to me.

Thanks

Bug in mkl_sparse_d_mm

Hi All,

I think that the result of "mkl_sparse_d_mm" is incorrect, as the following code shows. The code multiplies the identity matrix, stored in CSR format, by a dense matrix and checks whether the result and the input matrix are the same. The same happens with the ROW MAJOR layout.

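#include <stdio.h>
#include "mkl.h"

#define ALIGN 64   /* assumed alignment in bytes; the original post does not show this definition */

int main() {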

   MKL_INT n = 5;
   MKL_INT rows_start[5] = {0,1,2,3,4};
   MKL_INT rows_end[5]   = {1,2,3,4,5};
   MKL_INT col_indx[5]   = {0,1,2,3,4};
   double values[5]     = {1,1,1,1,1};
   sparse_matrix_t       csrA = NULL;
   sparse_index_base_t    indexing;
   struct matrix_descr    descr_type_gen;
   descr_type_gen.type = SPARSE_MATRIX_TYPE_GENERAL;
   mkl_sparse_d_create_csr ( &csrA, SPARSE_INDEX_BASE_ZERO, n, n, rows_start, rows_end, col_indx, values);
   MKL_INT row, col;
   sparse_index_base_t indextype;
   MKL_INT * bi, *ei, *indx;
   double *rv;
   mkl_sparse_d_export_csr(csrA, &indextype, &row, &col, &bi, &ei, &indx, &rv);
   printf("Input csr matrix\n");
   for(long i=0; i < n; i++) {
        printf("%ld ----> ",i);
        for(long l = 0; l < n; l++) {
                bool flag = false;
                for(long j=bi[i]; j < ei[i]; j++) {
                        if(indx[j] == l) {
                                flag = true;
                                printf("%.0lf ",rv[j]);
                                break;
                        }
                }
                if(!flag)
                        printf("%d ",0);

        }       printf("\n");
    }
    double *matrix = (double *) mkl_malloc ( n*2*sizeof(double), ALIGN);
    double *result = (double *) mkl_calloc ( n*2, sizeof(double), ALIGN);
    for(long i=0; i < n; i++) {
        matrix[i] = i+1;
        matrix[i + n] = n-i;
    }
    printf("Input dense matrix\n");
    for(long i=0; i < n; i++) {
        for(long j=0 ; j < 2; j++) {
                printf("%.2lf ", matrix[i + j*n]);
        }
        printf("\n");
    }
    mkl_sparse_d_mm( SPARSE_OPERATION_NON_TRANSPOSE, 1, csrA, descr_type_gen, SPARSE_LAYOUT_COLUMN_MAJOR, matrix, 2, n, 0, result, n);
  printf("Output result matrix\n");
  for(long i=0; i < n; i++) {
        for(long j=0 ; j < 2; j++) {
                printf("%.2lf ", result[i + j*n]);
        }
        printf("\n");
  }
  mkl_free(matrix);
  mkl_free(result);

  return 0;
}

The output of the above code is :-

Input csr matrix
0 ----> 1 0 0 0 0
1 ----> 0 1 0 0 0
2 ----> 0 0 1 0 0
3 ----> 0 0 0 1 0
4 ----> 0 0 0 0 1
Input dense matrix
1.00 5.00
2.00 4.00
3.00 3.00
4.00 2.00
5.00 1.00
Output result matrix
1.00 5.00
0.00 0.00
1.00 5.00
0.00 0.00
2.00 4.00

I run the above with the following options:

icc -I. -I/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/include -DMKL_ILP64 -Wall -O3 -qopenmp -qopenmp-simd -mkl=parallel  -std=c++11 -Wno-attributes mkl_sparse_d_mm.cpp -o mkl_sparse_mm -L/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64_lin  -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

and I also tried with a different intel mkl version as follows:

icc -I. -I/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/include -DMKL_ILP64 -Wall -O3 -qopenmp -qopenmp-simd -mkl=parallel  -std=c++11 -Wno-attributes mkl_sparse_d_mm.cpp -o mkl_sparse_mm -L/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64_lin  -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
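
A possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: it is exactly what a routine built for 32-bit integers would compute if it were handed 64-bit index arrays. With -DMKL_ILP64, MKL_INT is 64 bits wide; if the LP64 interface is the one actually resolved at link time (for instance through -mkl=parallel), the library would read rows_start, rows_end and col_indx as 32-bit values. The sketch below imitates that in plain C on a little-endian machine. csr_mm_i32 and mismatch_demo are hypothetical names and not MKL code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CSR (4-array form) times a column-major dense matrix, using 32-bit
   indices. Hypothetical stand-in for a routine built for LP64. */
static void csr_mm_i32(int n, int ncols,
                       const int32_t *rs, const int32_t *re, const int32_t *ci,
                       const double *val, const double *x, int ldx,
                       double *y, int ldy)
{
    assert(n > 0 && ncols > 0);
    for (int i = 0; i < n; i++) {
        for (int c = 0; c < ncols; c++) {
            double acc = 0.0;
            for (int32_t p = rs[i]; p < re[i]; p++)
                acc += val[p] * x[ci[p] + c * ldx];
            y[i + c * ldy] = acc;
        }
    }
}

/* Build the index arrays from the post as 64-bit integers, then let the
   32-bit routine read the same bytes. Assumes a little-endian machine.
   y receives the 5 x 2 column-major result. */
static void mismatch_demo(double y[10])
{
    const int64_t rs64[5] = {0, 1, 2, 3, 4};
    const int64_t re64[5] = {1, 2, 3, 4, 5};
    const int64_t ci64[5] = {0, 1, 2, 3, 4};
    const double val[5] = {1, 1, 1, 1, 1};
    const double x[10] = {1, 2, 3, 4, 5, 5, 4, 3, 2, 1};
    int32_t rs[10], re[10], ci[10];

    memcpy(rs, rs64, sizeof rs);   /* each 64-bit value becomes (value, 0) */
    memcpy(re, re64, sizeof re);
    memcpy(ci, ci64, sizeof ci);
    csr_mm_i32(5, 2, rs, re, ci, val, x, 5, y, 5);
}
```

Read row by row, y comes out as (1, 5), (0, 0), (1, 5), (0, 0), (2, 4), the same values as in the reported result matrix. If this is the cause, making the compile-time integer width and the linked interface library agree (both LP64 or both ILP64) would be the thing to try.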

 

 

Possible issue in MKL LAPACKE interface when using LAPACK_ROW_MAJOR

MKL provided by Intel parallel studio xe 2019.3.

The minimal reproducing code is below:

#include <stdlib.h>
#include <stdio.h>
#include "mkl_lapacke.h"

#define N 2

int main(){
double m[N];
MKL_INT n1=N, n2=N;
MKL_INT info;
double a[N*N] = {0, 1,
                 1, 0};
info = LAPACKE_dsyevd(LAPACK_ROW_MAJOR, 'V','U', n1, a, n2, m);
printf("the eigenvalue is: %f,%f\n", m[0],m[1]);
printf("the first eigenvector is: %f,%f\n",a[0],a[2]);
printf("the second eigenvector is: %f,%f\n",a[1],a[3]);
return 0;
}

The returned eigenvectors are wrong when LAPACK_ROW_MAJOR is specified; however, everything is fine after changing to LAPACK_COL_MAJOR.

The stdout with export MKL_VERBOSE=1 is shown below:

1. COL_MAJOR (correct results)

MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread
MKL_VERBOSE DSYEVD(V,U,2,0x7ffee9463680,2,0x7ffee94636a0,0x7ffee9463620,-1,0x7ffee9463628,-1,0) 27.85ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:28
MKL_VERBOSE DSYEVD(V,U,2,0x7ffee9463680,2,0x7ffee94636a0,0x22ce000,21,0x22cdf00,13,0) 243.59us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:28
the eigenvalue is: -1.000000,1.000000
the first eigenvector is: -0.707107,0.707107
the second eigenvector is: 0.707107,0.707107

2. ROW_MAJOR (wrong results)

MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread
MKL_VERBOSE DSYEVD(V,U,2,0x7ffd6a6a3000,2,0x7ffd6a6a3020,0x7ffd6a6a2fa0,-1,0x7ffd6a6a2fa8,-1,0) 27.70ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:28
MKL_VERBOSE DSYEVD(V,U,2,0xd26100,2,0x7ffd6a6a3020,0xd26000,21,0xd25f00,13,0) 247.73us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:28
the eigenvalue is: -1.000000,1.000000
the first eigenvector is: -0.707107,1.000000
the second eigenvector is: 0.707107,0.707107

Pay attention to the difference on the first eigenvector.
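
A quick way to confirm which output is wrong, independent of the storage layout, is to check the residual A*v - lambda*v for each returned pair. The sketch below is a minimal version for a symmetric matrix; eig_residual is a hypothetical helper name. Since dsyevd overwrites a when eigenvectors are requested, the check needs a copy of the original matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Largest |(A v - lambda v)_i| for a symmetric n x n matrix A.
   Because A is symmetric, row-major and column-major storage coincide. */
static double eig_residual(const double *a, size_t n,
                           const double *v, double lambda)
{
    assert(a && v && n > 0);
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = -lambda * v[i];
        for (size_t j = 0; j < n; j++)
            r += a[i * n + j] * v[j];
        if (r < 0.0)
            r = -r;
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

With the values printed above, the vector (-0.707107, 0.707107) gives a residual of zero for eigenvalue -1, while (-0.707107, 1.000000) gives about 0.29, so the second entry of the first ROW_MAJOR eigenvector is the one that is off.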

Discontinued Xeon Phi support

Hello,

Will Intel support Xeon Phi in future versions of the compilers (and MKL)?

Malik.

 

 

Questions about iterative sparse solvers

Hello,

I would like to use ISS from MKL for solving linear systems of equations, originating from discretization of partial differential equations used in CFD.

I found out that CG and FGMRES are available, but not everything is clear from the documentation:

  1. Can ISS be threaded? I am using Pardiso and it can be made parallel easily.
  2. Is there any example besides those in \example\solverf\source that would demonstrate how the matrix of the linear system is constructed and used by ISS? I might have overlooked information about this and the matrix format (CSR?) in the documentation.
  3. Is there any example that would show how preconditioning should be done? There is a general description of this in the documentation, but it does not feel sufficient to make sure the implementation is correct, e.g. in case of "apply the preconditioner C inverse to tmp".

By the way, can Pardiso as a direct solver work also in an iterative "mode", meaning that it would not calculate the solution, but just a user-defined number of iterations?

Thank you.
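
On the preconditioning question, it may help to see where the "apply the preconditioner C inverse" step sits in an ordinary preconditioned CG loop. The sketch below is plain C with a dense matrix and a Jacobi (diagonal) preconditioner chosen only for illustration; it is not MKL's RCI interface, and matvec, dot, apply_jacobi and pcg are hypothetical names. The preconditioner step takes the current residual r and returns z = C^{-1} r.

```c
#include <assert.h>
#include <stddef.h>

#define PCG_MAX_N 64   /* small fixed bound to keep the sketch allocation-free */

/* y = A * x for a dense row-major n x n matrix. */
static void matvec(const double *a, size_t n, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j] * x[j];
        y[i] = s;
    }
}

static double dot(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Jacobi preconditioner: z = C^{-1} r with C = diag(A). */
static void apply_jacobi(const double *a, size_t n, const double *r, double *z)
{
    for (size_t i = 0; i < n; i++)
        z[i] = r[i] / a[i * n + i];
}

/* Preconditioned CG for a symmetric positive definite A. x holds the initial
   guess on entry and the solution on exit. Returns the number of iterations
   used, or -1 if the residual norm is still above tol after max_iter. */
static int pcg(const double *a, const double *b, double *x, size_t n,
               int max_iter, double tol)
{
    double r[PCG_MAX_N], z[PCG_MAX_N], p[PCG_MAX_N], q[PCG_MAX_N];
    assert(n > 0 && n <= PCG_MAX_N);

    matvec(a, n, x, q);
    for (size_t i = 0; i < n; i++)
        r[i] = b[i] - q[i];
    apply_jacobi(a, n, r, z);            /* z = C^{-1} r */
    for (size_t i = 0; i < n; i++)
        p[i] = z[i];
    double rz = dot(r, z, n);

    for (int it = 0; it < max_iter; it++) {
        if (dot(r, r, n) <= tol * tol)
            return it;
        matvec(a, n, p, q);
        double alpha = rz / dot(p, q, n);
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        apply_jacobi(a, n, r, z);        /* the preconditioner step */
        double rz_new = dot(r, z, n);
        for (size_t i = 0; i < n; i++)
            p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
    return dot(r, r, n) <= tol * tol ? max_iter : -1;
}
```

In a reverse-communication solver the caller performs the matvec and preconditioner steps itself when the solver asks for them; replacing apply_jacobi with a sparse triangular solve would give an incomplete-factorization preconditioner in the same position.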

MKL Packed Storage scheme for matrix with many zeros

I have a very large matrix in which many of the entries are zero, but it does not have a specific structure such as being banded.

Is there any way to reduce the memory needed to store this matrix when using MKL for matrix operations?

Currently, for a model that is not very large, I need more than 32GB of RAM.
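
For a matrix with no particular structure, compressed sparse row (CSR) storage keeps only the nonzero values plus their column indices and one offset per row; MKL's sparse BLAS and Pardiso work on storage of this kind. The sketch below is a minimal illustration in plain C that converts a small dense matrix to CSR and reports the bytes used. csr_t, dense_to_csr, csr_bytes and csr_free are hypothetical names, and size_t indices are used here for simplicity where MKL would use MKL_INT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t rows, cols, nnz;
    size_t *row_ptr;   /* rows + 1 offsets into col_idx / val */
    size_t *col_idx;   /* column of each stored entry */
    double *val;       /* value of each stored entry */
} csr_t;

static void csr_free(csr_t *m)
{
    free(m->row_ptr);
    free(m->col_idx);
    free(m->val);
    m->row_ptr = m->col_idx = NULL;
    m->val = NULL;
}

/* Convert a row-major dense matrix to CSR, keeping only the nonzeros.
   Returns 0 on success and -1 on allocation failure. */
static int dense_to_csr(const double *a, size_t rows, size_t cols, csr_t *out)
{
    assert(a && out && rows > 0 && cols > 0);
    size_t nnz = 0;
    for (size_t i = 0; i < rows * cols; i++)
        if (a[i] != 0.0)
            nnz++;

    out->rows = rows;
    out->cols = cols;
    out->nnz = nnz;
    out->row_ptr = malloc((rows + 1) * sizeof *out->row_ptr);
    out->col_idx = malloc((nnz ? nnz : 1) * sizeof *out->col_idx);
    out->val = malloc((nnz ? nnz : 1) * sizeof *out->val);
    if (!out->row_ptr || !out->col_idx || !out->val) {
        csr_free(out);
        return -1;
    }

    size_t p = 0;
    for (size_t i = 0; i < rows; i++) {
        out->row_ptr[i] = p;             /* where row i starts */
        for (size_t j = 0; j < cols; j++) {
            if (a[i * cols + j] != 0.0) {
                out->col_idx[p] = j;
                out->val[p] = a[i * cols + j];
                p++;
            }
        }
    }
    out->row_ptr[rows] = p;              /* one past the last entry */
    return 0;
}

/* Bytes used by the three CSR arrays. */
static size_t csr_bytes(const csr_t *m)
{
    return (m->rows + 1) * sizeof *m->row_ptr
         + m->nnz * (sizeof *m->col_idx + sizeof *m->val);
}
```

Memory then grows with the number of nonzeros rather than with rows*cols. In a real application the CSR arrays would be assembled directly, without ever forming the dense matrix.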

Maintaining 15+ year old software package, MKL issue support

Hello, 

I've taken over an older software package my company wants to add some functionality to. We don't have funds or time to update all aspects of it, so as is, with new modules is the goal. 

I have two interesting situations. 

Running Windows 10

1) The software package runs perfectly fine, with the exception that once it gets to an MKL call it just aborts the entire program. All aspects of the software works, up until a single button press that calls its first MKL call. Now this package is running an older version of the MKL, from 2010 I believe. 

2) So, attempting to update the MKL, I installed the latest release. I integrated it into Visual Studio 2019, selected "Sequential" use of the Intel MKL, and finally rebuilt the package. Now when I run the application it runs just fine. I hit the button that previously would abort the program and it operates fine. Now, here is the issue: it runs fine only when I run it within the IDE. If I run the executable directly, outside of the IDE, then it shuts down at the same place as in the first situation.

setup

1) in this situation I have the following configuration to reference the MKL

  • Environment variable PATH pointing to where the following DLLs are located
    • mkl_core
    • mkl_intel_thread
    • mkl_sequential
  • visual studio configuration
    • C/C++
      • General
        • Additional Libraries
          • pointing at the \mkl\include directory
    • Linker
      • General
        • Additional Libraries
          • point at the \mkl\lib\intel64 directory
        • Additional Dependencies
          • mkl_intel_lp64_dll.lib
          • mkl_intel_thread_dll.lib
          • mkl_core_dll.lib

2) In this second situation, I have all of the above. I attempted to peel away those settings but it fails to compile without them. Only difference is

  • configuration Properties
    • Intel Performance Library
      • Use Intel MKL                 Sequential

I know this is probably very limited information, but I've been banging my head against a wall for some time now. 


Sparse Complex Matrix Multiplication

Hello,

 

Let's say A and B are sparse matrices with complex values, which are stored in sparse handles (CSR format). I want to calculate D = A x conjugate(B), where conjugate(B) is the element-wise complex conjugate of B. Is there an efficient way to perform this multiplication? Right now, I have to export the sparse handle that contains matrix B to the CSR 3-array format, change the values to their complex conjugates, create the handle again, and then perform the multiplication. I would really appreciate it if you could provide a more efficient solution.

 

Thank you,

Afshin

Error in one LAPACK example

Hi, 

I've located an error in one of your FORTRAN 77 interface examples for the LAPACK Singular Value Decomposition routine (DGESVD). It is located at https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11...

The following two lines must be added: (1) in the variable declarations and (2) just before the call to DGESVD.

(1)

DOUBLE PRECISION Asvd( size(A,1), size(A,2) )

(2)

Asvd = A

 

 

 

Best wishes,

Mohamed Ali Al-Badri

How to do an element-wise multiplication between two 3D-arrays?

Hi, everyone.

I am trying to do some element-wise multiplication between two 3D arrays/matrices and then calculate the sum of all the elements returned. Something like this:

res = 0
do k = kstart,kend
do j = jstart, jend
do i = istart, iend
res = res + A(i, j, k) * B(i, j ,k)
end do 
end do
end do

However, the multiplication is not applied to all the elements of A or B, only to the elements within the ranges of the loop variables. To make the calculation faster, I tried the ddot function in MKL like this:

n = (kend - kstart + 1) * (jend - jstart + 1) * (iend - istart + 1)
res = ddot(n, A(istart:iend, jstart:jend, kstart:kend), 1, B(istart:iend, jstart:jend, kstart:kend), 1)

But this way I couldn't get the same result as with the three do-loops.

Can anyone tell me where the problem is?

Thanks :)

 

MKL_SPARSE_?_QR invalid input error (Fortran)

Hello,

I am experimenting with the Sparse QR routines to solve a linear system of equations. When I run the attached file, I get the following output:

0

0

0

3

The last output, 3, corresponds to SPARSE_STATUS_INVALID_VALUE: "The input parameters contain an invalid value."

Can anyone help?

Attachments:
test_sparse_qr.f90 (application/octet-stream, 1.2 KB)

Pardiso 32-bit vs 64-bit performance

Using Pardiso in an iterative routine, where the Pardiso solver is called many times, I noticed that the 64-bit version is significantly slower than the 32-bit version. I'm not sure yet whether this is due to Pardiso (only) or to something else, so I was wondering whether such a performance comparison exists.

Is there any way to investigate if the speed problem could concern the 64-bit version of Pardiso and if it is possible to improve its performance?

Thanks in advance,

Daniele
