Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Deprecated Sparse BLAS Level 2 and Level 3 Routines


Hello,

I am currently using Intel MKL 2018 Update 3, and it seems that the Sparse BLAS Level 2 and Level 3 routines have been deprecated. I used to use the mkl_?dnscsr routine to convert a dense matrix to sparse format, but I cannot find a corresponding routine in the Intel MKL Inspector-executor Sparse BLAS mentioned in the reference manual. Can you please help? Thanks much.
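In case it helps to illustrate what I mean: if there really is no direct dnscsr counterpart in the Inspector-executor API, the workaround I can think of is to scan the dense matrix into CSR arrays by hand and then wrap them in a handle with mkl_sparse_d_create_csr. A rough C sketch (the helper name and the zero-based indexing are just illustrative, not taken from any MKL example):

#include <mkl.h>
#include <mkl_spblas.h>
#include <stdlib.h>

/* Wrap a row-major dense matrix in an Inspector-executor CSR handle.
   The dense-to-CSR scan is done by hand; only the handle creation uses the IE API. */
sparse_matrix_t dense_to_csr_handle(const double *dense, MKL_INT rows, MKL_INT cols)
{
    MKL_INT *ia = malloc((rows + 1) * sizeof(MKL_INT));   /* row pointers */
    MKL_INT nnz = 0;
    for (MKL_INT i = 0; i < rows * cols; ++i)
        if (dense[i] != 0.0) ++nnz;

    MKL_INT *ja  = malloc(nnz * sizeof(MKL_INT));          /* column indices */
    double  *val = malloc(nnz * sizeof(double));           /* nonzero values */

    MKL_INT k = 0;
    for (MKL_INT i = 0; i < rows; ++i) {
        ia[i] = k;
        for (MKL_INT j = 0; j < cols; ++j) {
            if (dense[i * cols + j] != 0.0) {
                ja[k]  = j;
                val[k] = dense[i * cols + j];
                ++k;
            }
        }
    }
    ia[rows] = k;

    sparse_matrix_t A = NULL;
    /* Three-array CSR: rows_start = ia, rows_end = ia + 1. */
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, rows, cols,
                            ia, ia + 1, ja, val);
    /* The handle references ia/ja/val, so do not free them while A is in use. */
    return A;
}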

Afshin


Iterative Sparse Solvers based on Reverse Communication Interface


Dear Sir/Madam,

I am grateful for the development of Intel Visual Fortran Compiler for Windows. It helps my work a lot.

I would like to ask about RCI ISS in the newest Intel MKL solver. I know that in Intel MKL, there is an RCI conjugate gradient solver.  

Hence, does it work well with the EBE-PCG (element-by-element preconditioned conjugate gradient) method, or can I build my own code for the EBE-PCG method using the Intel MKL solver?
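What I have in mind is something like the following rough C sketch of the dcg reverse-communication loop, where the operator is applied by a user routine (apply_A_ebe is a hypothetical stand-in for the element-by-element product; the ipar settings just follow the usual pattern):

#include <mkl.h>
#include <mkl_rci.h>

/* Hypothetical user routine: Ap := A*p, assembled element by element, so A itself is never stored. */
void apply_A_ebe(MKL_INT n, const double *p, double *Ap);

void solve_with_rci_cg(MKL_INT n, double *x, double *b)
{
    MKL_INT ipar[128], rci_request, itercount;
    double dpar[128];
    double *tmp = (double *)mkl_malloc(4 * n * sizeof(double), 64);

    dcg_init(&n, x, b, &rci_request, ipar, dpar, tmp);
    ipar[4] = 1000;    /* maximum number of iterations */
    ipar[8] = 1;       /* automatic residual stopping test */
    ipar[9] = 0;       /* no user-defined stopping test */
    dcg_check(&n, x, b, &rci_request, ipar, dpar, tmp);

    while (1) {
        dcg(&n, x, b, &rci_request, ipar, dpar, tmp);
        if (rci_request == 0) break;              /* converged */
        if (rci_request == 1)
            apply_A_ebe(n, &tmp[0], &tmp[n]);     /* solver asks for A*p: p is tmp[0..n-1], result goes to tmp[n..2n-1] */
        else
            break;                                /* error or unhandled request */
    }

    if (rci_request == 0)
        dcg_get(&n, x, b, &rci_request, ipar, dpar, tmp, &itercount);
    mkl_free(tmp);
}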

Thank you very much!

Tien-Dat

LinX with MKL builds after 11.2.2.010 has errors!


Hi)

LinX with MKL builds after 11.2.2.010 has errors!

w_lpk_p_11.2.2.010.zip
 
c:\test>linpack_xeon64.exe
Input data or print help ? Type [data]/help :
 
Number of equations to solve (problem size): 8135
Leading dimension of array: 8136
Number of trials to run: 10
Data alignment value (in Kbytes): 4
Current date/time: Sun Aug 05 19:59:41 2018
 
CPU frequency:    3.399 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 12
 
Parameters are set to:
 
Number of tests: 1
Number of equations to solve (problem size) : 8135
Leading dimension of array                  : 8136
Number of trials to run                     : 10
Data alignment value (in Kbytes)            : 4
 
Maximum memory requested that can be used=1066524512, at the size=8135
 
=================== Timing linear equation system solver ===================
 
Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
8135   8136   4    2.411      148.9028 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.378      150.9828 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.262      158.7518 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.329      154.1468 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.496      143.8229 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.258      159.0080 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.358      152.2624 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.323      154.5865 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.427      147.9225 4.503162e-011 2.387924e-002   pass
8135   8136   4    2.346      153.0549 4.503162e-011 2.387924e-002   pass
 
Performance Summary (GFlops)
 
Size   LDA    Align.  Average  Maximal
8135   8136   4     152.3442 159.0080
 
Residual checks PASSED
 
End of tests
  
c:\test>
w_mklb_p_11.1.3.005.zip
 
c:\test>linpack_xeon64.exe
Input data or print help ? Type [data]/help :
 
Number of equations to solve (problem size): 8135
Leading dimension of array: 8136
Number of trials to run: 10
Data alignment value (in Kbytes): 4
Current date/time: Sun Aug 05 19:56:41 2018
 
CPU frequency:    3.399 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6
 
Parameters are set to:
 
Number of tests: 1
Number of equations to solve (problem size) : 8135
Leading dimension of array                  : 8136
Number of trials to run                     : 10
Data alignment value (in Kbytes)            : 4
 
Maximum memory requested that can be used=529719136, at the size=8135
 
=================== Timing linear equation system solver ===================
 
Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
8135   8136   4     2.225      161.3400 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.094      171.4791 7.189260e-011 3.812300e-002   pass
8135   8136   4     1.935      185.5586 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.024      177.3962 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.230      160.9874 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.053      174.8977 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.233      160.7589 7.189260e-011 3.812300e-002   pass
8135   8136   4     1.717      209.0689 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.127      168.7899 7.189260e-011 3.812300e-002   pass
8135   8136   4     2.128      168.7352 7.189260e-011 3.812300e-002   pass
 
Performance Summary (GFlops)
 
Size   LDA    Align.  Average  Maximal
8135   8136   4      173.9012 209.0689
 
Residual checks PASSED
 
End of tests
 
c:\test>
w_mklb_p_2018.2.010.zip
 
c:\test>linpack_xeon64.exe
Input data or print help ? Type [data]/help :
 
Number of equations to solve (problem size): 8135
Leading dimension of array: 8136
Number of trials to run: 10
Data alignment value (in Kbytes): 4
Current date/time: Sun Aug 05 20:09:57 2018
 
CPU frequency:    3.339 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6
 
Parameters are set to:
 
Number of tests: 1
Number of equations to solve (problem size) : 8135
Leading dimension of array                  : 8136
Number of trials to run                     : 10
Data alignment value (in Kbytes)            : 4
 
Maximum memory requested that can be used=529719136, at the size=8135
 
=================== Timing linear equation system solver ===================
 
Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
8135   8136   4     2.231      160.9364 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.084      172.3152 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.117      169.6332 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.132      168.3802 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.069      173.5236 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.000      179.5242 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.109      170.2193 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.018      177.8865 6.790949e-011 3.601085e-002   pass
8135   8136   4     1.964      182.8301 6.790949e-011 3.601085e-002   pass
8135   8136   4     2.050      175.0993 6.790949e-011 3.601085e-002   pass
 
Performance Summary (GFlops)
 
Size   LDA    Align.  Average  Maximal
8135   8136   4      173.0348 182.8301
 
Residual checks PASSED
 
End of tests
 

If I run w_mklb_p_2018.2.010.zip with 12 threads, or with 5, 9, 10, 11, or 12 threads, in any shell the result is this (note that the residual now changes from trial to trial):

Intel(R) LINPACK 64-bit data - LinX 0.6.5

Current date/time: Sat Aug 18 00:36:31 2018

CPU frequency:    3.398 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 12

Parameters are set to:

Number of tests: 1

Number of equations to solve (problem size) : 8135
Leading dimension of array                  : 8136
Number of trials to run                     : 10  
Data alignment value (in Kbytes)            : 4   
Maximum memory requested that can be used=529657696, at the size=8135

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
8135   8136   4      1.956      183.5389 6.526518e-011 3.460863e-002   pass
8135   8136   4      2.187      164.2005 6.619277e-011 3.510051e-002   pass
8135   8136   4      2.872      125.0139 6.526518e-011 3.460863e-002   pass
8135   8136   4      2.100      170.9365 5.353940e-011 2.839072e-002   pass
8135   8136   4      2.231      160.9549 7.749093e-011 4.109167e-002   pass
8135   8136   4      2.702      132.8717 6.938300e-011 3.679222e-002   pass
8135   8136   4      2.145      167.3753 6.537959e-011 3.466930e-002   pass
8135   8136   4      2.661      134.9388 6.526518e-011 3.460863e-002   pass
8135   8136   4      2.043      175.7050 6.526518e-011 3.460863e-002   pass
8135   8136   4      2.353      152.5804 6.553075e-011 3.474946e-002   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
8135   8136   4       156.8116 183.5389

Residual checks PASSED

End of tests

Please, fix it!

At first I had randomly specified 524288 as the data alignment value.

How to get sparse array in CSR format from a matrix handle?


Hello,

I'm using Sparse BLAS routines in MKL.

For matrix-matrix multiplication using the mkl_sparse_spmm routine, I created matrix handles and then executed the routine.

But how can I get the result back in the original CSR format? mkl_sparse_spmm returns the result in the form of a matrix handle (pointer), so I need a further step to get the final result in CSR format.
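For concreteness, a minimal C sketch of what I am after: going from mkl_sparse_spmm to raw CSR arrays via mkl_sparse_d_export_csr (my understanding is that the exported pointers remain owned by the handle):

#include <mkl.h>
#include <mkl_spblas.h>
#include <stdio.h>

void print_spmm_result_csr(sparse_matrix_t A, sparse_matrix_t B)
{
    sparse_matrix_t C = NULL;
    mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, A, B, &C);

    sparse_index_base_t indexing;
    MKL_INT rows, cols;
    MKL_INT *rows_start, *rows_end, *col_indx;
    double *values;
    mkl_sparse_d_export_csr(C, &indexing, &rows, &cols,
                            &rows_start, &rows_end, &col_indx, &values);

    /* Four-array CSR: row i occupies positions [rows_start[i], rows_end[i]).
       The usual three-array ia would be ia[i] = rows_start[i], ia[rows] = rows_end[rows-1]. */
    MKL_INT shift = (indexing == SPARSE_INDEX_BASE_ONE) ? 1 : 0;
    for (MKL_INT i = 0; i < rows; ++i)
        for (MKL_INT k = rows_start[i]; k < rows_end[i]; ++k)
            printf("row %lld  col %lld  val %g\n",
                   (long long)i, (long long)(col_indx[k - shift] - shift), values[k - shift]);

    mkl_sparse_destroy(C);   /* the exported arrays are owned by C and freed with it */
}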

Thanks.

A runtime error in FEAST zfeast_hcsrgv


Hello,

 

I would like to report a runtime error in the FEAST zfeast_hcsrgv routine.

 

I have utilized the FEAST solver to solve generalized Hermitian eigenvalue problems using the zfeast_hcsrgv routine, but encountered an access violation while running the code.

 

I couldn't find any problem in my input data.

 

An example code is enclosed; both input files, "read_mat_info_1" and "read_mat_info_2", result in runtime errors when calling zfeast_hcsrgv.
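For reference, a skeleton of the zfeast_hcsrgv call sequence as I understand the interface (a sketch in C; CSR3 input with one-based index arrays and an m0 no smaller than the number of eigenvalues in [emin, emax] are assumptions here, not facts about my attached code):

#include <mkl.h>
#include <mkl_solvers_ee.h>
#include <stdio.h>

void run_feast(MKL_INT n,
               MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja,
               MKL_Complex16 *b, MKL_INT *ib, MKL_INT *jb,
               double emin, double emax, MKL_INT m0)
{
    MKL_INT fpm[128];
    feastinit(fpm);
    fpm[0] = 1;                       /* print runtime status */

    double *e   = (double *)mkl_malloc(m0 * sizeof(double), 64);
    double *res = (double *)mkl_malloc(m0 * sizeof(double), 64);
    MKL_Complex16 *x = (MKL_Complex16 *)mkl_malloc((size_t)n * m0 * sizeof(MKL_Complex16), 64);

    double epsout;
    MKL_INT loop, m, info;
    zfeast_hcsrgv("U", &n, a, ia, ja, b, ib, jb, fpm,
                  &epsout, &loop, &emin, &emax, &m0, e, x, &m, res, &info);

    if (info != 0)
        printf("zfeast_hcsrgv returned info = %lld\n", (long long)info);

    mkl_free(e); mkl_free(res); mkl_free(x);
}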

 

Thanks.

Problem while solving Ax=B with Pardiso


I have a problem solving a system of linear equations using Pardiso. I have an application in which one of the processes loads a DLL where the system of linear equations Ax=B is solved using Pardiso. There are two cases:

 

- First case - The application is started. Matrices A and B are populated with some values, and after that matrix A is factorized so that the equation Ax=B can be solved. After solving, I get the solution x and write it to a file. The values in x are very suspicious; many of them are larger than 10^4. They are used later for some calculations in iterations.

 

- Second case - After the run in the first case, I kill the process (in Task Manager) that loads the DLL (in which the equations are solved); the process is restarted and the same DLL is loaded. After executing my application again, the same equation is solved, but x is not the same (and the new values seem better). I don't get why the solution is not the same in both cases when everything about the matrix is the same.

 

I have logged all the matrices and vectors (and attached them). When I compare A, x, and B, there is a difference in x, while A and B are the same in both cases. How can the solver give two different solutions for the same inputs? I don't know what killing the process and reloading the DLL did, but after that x is not the same when the equation is solved. Maybe there is some problem with memory or something similar.

 

Matrix A is sparse and indefinite. The matrix type is -2, and the Pardiso call is as follows:

 

call pardiso(matrix%handle, 1, 1, & ! handle (pt), maxfct, mnum
   matrix%matrixType,             & ! matrix type (-2: real symmetric indefinite)
   33,                            & ! phase 33 = solve
   matrix%numRows,                & ! dimension
   matrix%values,                 & ! values
   matrix%firstInRow,             & ! row pointers (cumulative)
   matrix%columns,                & ! column indices
   [0],                           & ! permutation (none)
   1,                             & ! number of RHS vectors (1)
   opt,                           & ! options (default)
   0,                             & ! message level (none)
   rhs,                           & ! RHS
   solution,                      & ! solution
   err)

 

Does anyone have an idea what happened, or how I could try to debug the problem?

Attachment: Matrices.7z (13.88 KB)

ifort with mkl and function mkl_sparse_d_mv


 

I compile my program and it gives the following error:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

I isolated the problem, and it occurs at the call to mkl_sparse_d_mv in the following function:

function dVdt(self, t, V)
        class(MotorUnitPool), intent(inout) :: self
        real(wp), intent(in) :: t
        real(wp), intent(in) :: V(:)
        real(wp), dimension(self%totalNumberOfCompartments) :: dVdt
        integer :: i, j, stat
        real(wp), dimension(:), allocatable :: matInt

        allocate(matInt(self%totalNumberOfCompartments))
              
        do i = 1, self%MUnumber
            do j = 1, self%unit(i)%compNumber
                self%iIonic((i-1)*self%unit(i)%compNumber+j) = self%unit(i)%Compartments(j)%computeCurrent(t, V((i-1)*self%unit(i)%compNumber+j))
            end do
        end do       
        
        stat = mkl_sparse_d_mv(self%spOperation, &
                                                self%spAlpha, &
                                                self%GSp, &
                                                self%spDescr, &
                                                V, &
                                                self%spBeta, &
                                                matInt)
        
        dVdt = (self%iIonic + matInt + self%iInjected + self%EqCurrent_nA) * self%capacitanceInv       
end function
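For reference, the same call expressed in C with its return status checked (the Fortran interface in the mkl_spblas module mirrors this argument for argument; the check is only a debugging aid, not part of the program above):

#include <mkl_spblas.h>
#include <stdio.h>

sparse_status_t matvec(sparse_matrix_t G, struct matrix_descr descr,
                       double alpha, const double *V, double beta, double *y)
{
    /* y := alpha * G * V + beta * y */
    sparse_status_t stat =
        mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, alpha, G, descr, V, beta, y);
    if (stat != SPARSE_STATUS_SUCCESS)
        fprintf(stderr, "mkl_sparse_d_mv failed, status = %d\n", (int)stat);
    return stat;
}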

I compile with gfortran without any problems. With the ifort compiler I tried the -heap-arrays option, with no success.

This function is part of a bigger program with several files. If you want to see the whole software, you can find it at https://github.com/rnwatanabe/projectFR

 

Thanks in advance,

 

Renato Watanabe

 

 

 

Usage of MKL Scalapack


Hi all,

I would like to use PZGESVD from ScaLAPACK to solve my problem in a distributed way. Before using PZGESVD, the matrix has to be distributed to all the processes involved, so I want to test the usage of PDGEMR2D first.

However, I have a problem using PDGEMR2D. It generates an error whose cause I cannot figure out.

The error is:

??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
??MR2D:Bad submatrix:i=-1,j=-1,m=50,n=10,M=50,N=10
Assertion failed in c:\bt\479\private\mpich2\src\pm\smpd\smpd_handle_command.cpp(640): proc != 0
unable to read the cmd header on the left child context, Other MPI error, error stack:
ReadFailed(1298): An existing connection was forcibly closed by the remote host.  (errno 10054).

Aborting: mpiexec on TEMFPC1005 failed to communicate with smpd on TEMFPC1005
Other MPI error, error stack:
ReadFailed(1298): An existing connection was forcibly closed by the remote host.  (errno 10054)

 

 

The Code is:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <iostream>
#include <iomanip>
#include <string>
#include <fstream>
#include <sstream>
#include <complex>
#include <algorithm>
#include <vector>
//#define MKL_Complex8 std::complex<float>
//#define MKL_Complex16 std::complex<double>
#include <mkl_blacs.h>
#include <mkl_scalapack.h>
#include <mkl_pblas.h>
#include <mkl.h>
#include "petsc.h"

#define MAX(a,b)((a)<(b)?(b):(a))
#define MIN(a,b)((a)>(b)?(b):(a))

using namespace std;
static char help[] = "Test SCALAPACK";
int main(int argc, char **argv)
{

    PetscErrorCode            ierr;
    const MKL_INT            intmkl_negone = -1, intmkl_zero = 0;
    MKL_INT                    intmkl_rank, intmkl_size, intmkl_info, intmkl_ctxt, intmkl_nProcRows, intmkl_nProcCols, intmkl_myRow, intmkl_myCol;
    MKL_INT                    intmkl_MA, intmkl_NA, intmkl_MBA, intmkl_NBA, intmkl_lldA, intmkl_nRowProc, intmkl_nColProc;
    MKL_INT                    intmkl_MB, intmkl_NB, intmkl_MBB, intmkl_NBB, intmkl_lldB;
    MKL_INT                    descA[9], descB[9];
    PetscInt                int_size, int_rank, int_numLocalRowMatA, int_numGlobalRowMatA, int_numGlobalColMatA;
    Mat                        mat_A;
    double                    *doubleAr_A, *doubleAr_B;
    
    /* MPI initialization */
    ierr = PetscInitialize(&argc, &argv, (char*)0, help);                                                                                            CHKERRQ(ierr);
    MPI_Comm_size(PETSC_COMM_WORLD, &int_size);
    MPI_Comm_rank(PETSC_COMM_WORLD, &int_rank);

    /* Generate a random matrix */
    int_numLocalRowMatA = 5;
    int_numGlobalRowMatA = int_numLocalRowMatA*int_size;
    int_numGlobalColMatA = 10;

    /* initialize blacs */
    BLACS_PINFO(&intmkl_rank, &intmkl_size);
    intmkl_nProcRows = intmkl_size;                
    intmkl_nProcCols = 1;
    BLACS_GET(&intmkl_negone, &intmkl_zero, &intmkl_ctxt);
    BLACS_GRIDINIT(&intmkl_ctxt, "C", &intmkl_nProcRows, &intmkl_nProcCols);
    BLACS_GRIDINFO(&intmkl_ctxt, &intmkl_nProcRows, &intmkl_nProcCols, &intmkl_myRow, &intmkl_myCol);

    /* compute precise length of local pieces and allocate array on each process for parts of distributed matrices */
    intmkl_MA = (MKL_INT)int_numGlobalRowMatA;
    intmkl_NA = (MKL_INT)int_numGlobalColMatA;
    intmkl_MBA = (MKL_INT)int_numLocalRowMatA;
    intmkl_NBA = (MKL_INT)int_numGlobalColMatA;
    intmkl_nRowProc = NUMROC(&intmkl_MA, &intmkl_MBA, &intmkl_myRow, &intmkl_zero, &intmkl_nProcRows);
    intmkl_nColProc = NUMROC(&intmkl_NA, &intmkl_NBA, &intmkl_myCol, &intmkl_zero, &intmkl_nProcCols);
    intmkl_lldA = MAX(1, intmkl_nRowProc);
    DESCINIT(descA, &intmkl_MA, &intmkl_NA, &intmkl_MBA, &intmkl_NBA, &intmkl_zero, &intmkl_zero, &intmkl_ctxt, &intmkl_lldA, &intmkl_info);
    std::cout << "MA = "<< intmkl_MA << "; NA = "<< intmkl_NA << "; intmkl_nRowProc = "<< intmkl_nRowProc << "; intmkl_nColProc = "<< intmkl_nColProc << "; lldA = "<< intmkl_lldA << std::endl;
    doubleAr_A = (double*)mkl_calloc(intmkl_nRowProc*intmkl_nColProc, sizeof(double), 64);
    for (int int_cnt1 = 0; int_cnt1 < intmkl_nRowProc*intmkl_nColProc; int_cnt1++)  doubleAr_A[int_cnt1] = 1.0;
    

    /* compute precise length of local pieces and allocate array on each process for parts of distributed matrices */
    intmkl_MB = (MKL_INT)int_numGlobalRowMatA;
    intmkl_NB = (MKL_INT)int_numGlobalColMatA;
    intmkl_MBB = (MKL_INT)int_numLocalRowMatA;
    intmkl_NBB = (MKL_INT)int_numGlobalColMatA;
    intmkl_nRowProc = NUMROC(&intmkl_MB, &intmkl_MBB, &intmkl_myRow, &intmkl_zero, &intmkl_nProcRows);
    intmkl_nColProc = NUMROC(&intmkl_NB, &intmkl_NBB, &intmkl_myCol, &intmkl_zero, &intmkl_nProcCols);
    intmkl_lldB = MAX(1, intmkl_nRowProc);
    DESCINIT(descB, &intmkl_MB, &intmkl_NB, &intmkl_MBB, &intmkl_NBB, &intmkl_zero, &intmkl_zero, &intmkl_ctxt, &intmkl_lldB, &intmkl_info);
    std::cout << "MB = "<< intmkl_MB << "; NB = "<< intmkl_NB << "; intmkl_nRowProc = "<< intmkl_nRowProc << "; intmkl_nColProc = "<< intmkl_nColProc << "; lldB = "<< intmkl_lldB << std::endl;
    doubleAr_B = (double*)mkl_calloc(intmkl_nRowProc*intmkl_nColProc, sizeof(double), 64);

    /* copy value from matrix A to matrix B */
    PDGEMR2D(&intmkl_MA, &intmkl_NA, doubleAr_A, &intmkl_zero, &intmkl_zero, descA, doubleAr_B, &intmkl_zero, &intmkl_zero, descB, &intmkl_ctxt);

    /* destroy variables */
    mkl_free(doubleAr_A);
    mkl_free(doubleAr_B);
    /* finalize blacs */
    BLACS_GRIDEXIT(&intmkl_ctxt);
    /* finalize petsc*/
    PetscFinalize();

}
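One detail about the copy call that may matter when reading the code: as far as I know, the p?gemr2d submatrix origins ia, ja, ib, jb are global one-based indices. A fragment reusing the variables from the listing above (the added constant is the only new piece):

    const MKL_INT intmkl_one = 1;
    /* copy the whole of A into B; origins are (1,1) in both global matrices */
    PDGEMR2D(&intmkl_MA, &intmkl_NA,
             doubleAr_A, &intmkl_one, &intmkl_one, descA,
             doubleAr_B, &intmkl_one, &intmkl_one, descB,
             &intmkl_ctxt);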

 

I would appreciate any advice or comments on solving this problem.

Thanks a lot,

 

 


Big Performance Problem with PARDISO 2018 Update 3


Hello folks,

I've a strange performance problem with PARDISO on Windows. Before I open a support call I hope to get some feedback in this forum.

I'm using Intel® Parallel Studio XE 2018 Update 3 Composer Edition for Fortran Windows*, Version 18.0.0040.

I have noticed that parallel processing in PARDISO in MKL version 2018.0.3 does not work at all and processing with only one thread is significantly slower than in version 2016.

Attached is a small C++ test program and sample data that solve a small system multiple times.

When I run the program using the MKL DLLs from version 2018.0.3 I get following result:

>Release\pardiso.exe _data\mat.mm _data\b.mm
Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for 32-bit applications
Solving matrix file _data\mat.mm with vector data _data\b.mm.
Data: rows=445, cols=445, values=1339
MKL threads: 6
Performance: Loops=10000, Time=2.785514 sec

And now the funny stuff starts. The same program executed with MKL DLLs from version 2016 (11.3.3) create the following result:

>Release\pardiso.exe _data\mat.mm _data\b.mm
Intel(R) Math Kernel Library Version 11.3.3 Product Build 20160413 for 32-bit applications
Solving matrix file _data\mat.mm with vector data _data\b.mm.
Data: rows=445, cols=445, values=1339
MKL threads: 6
Performance: Loops=10000, Time=1.171534 sec

And it gets worse: the new PARDISO in version 2018.0.3 uses a large amount of CPU time across multiple threads, yet it is slower than running with only a single thread!

As far as I can tell, I have configured everything correctly. And as can be seen, with the old MKL from 2016 it works fine.
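To make the single-thread vs. multi-thread comparison explicit, I could pin the MKL thread count in the test program itself; a small sketch (solve_loop stands in for the timed loop in the attached sample and is not part of it):

#include <mkl.h>
#include <stdio.h>

void time_with_threads(int nthreads, void (*solve_loop)(void))
{
    mkl_set_num_threads(nthreads);
    printf("MKL max threads now: %d\n", mkl_get_max_threads());

    double t0 = dsecnd();        /* MKL wall-clock timer */
    solve_loop();                /* e.g. the 10000-iteration PARDISO loop */
    printf("%d thread(s): %.6f sec\n", nthreads, dsecnd() - t0);
}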

 

Attachment: sample.zip (28.61 KB)

How do I attach MKL routines from Visual Studio?


I was trying to use DGESV, an MKL routine, to solve a set of linear equations, but I am told it is an unknown entry point.

Steve mentioned using a linker option, but I don't see any way to do that from Visual Studio.

I looked through all the drop-down menus; if the option is there, it is deeply buried somewhere.

There should be no problem using the routine, if I can just link it in somehow.

 

Any clues?

Can't compute 1D discrete sine transforms of size >=INT_MAX


n=INT_MAX: Intel MKL TRIG TRANSFORMS ERROR: Fatal error (error message=Intel MKL DFTI ERROR: Inconsistent configuration parameters

n=INT_MAX-1: Finishes fine

n=INT_MAX+1: Segmentation fault

I am using ILP64. Tracking through the FFTW interface, I know the segfault occurs in s_init_trig_transform().

Compile line:

icc transform.cpp -std=c++11 -O3 -DMKL_ILP64 -xcore-avx2 -I${MKLROOT}/include/fftw -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -qopenmp

 

Minimal reproducible example:

#include <fftw3.h>
#include <fftw3_mkl.h>
#include <random>
#include <omp.h>

int main()
{
  MKL_INT n = 2147483647; //INT_MAX                                                                                    

  fftwf_init_threads();

  float* inout = (float*)fftwf_malloc(sizeof(float)*n);

  //Initialize inout array with random values                                                                          
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  #pragma omp parallel for
  for (size_t i=0; i<n; ++i) {
    inout[i] =  dist(gen);
  }

  //FFTW Guru parameters                                                                                               
  const fftwf_r2r_kind kind = FFTW_RODFT00;
  int rank=1;
  int howmany_rank=0; //Not used by MKL                                                                                
  fftwf_iodim64* howmany_dims = nullptr; //Not used by MKL
  fftwf_iodim64* dims = (fftwf_iodim64*)fftwf_malloc(rank*sizeof(fftwf_iodim64));
  dims[0].n=n;
  dims[0].is=1;
  dims[0].os=1;

  fftwf_plan_with_nthreads(omp_get_max_threads());

  //in-place real-to-real sine transform                                                                               
  fftwf_plan p = fftwf_plan_guru64_r2r(rank, dims, howmany_rank, howmany_dims, inout, inout, &kind, FFTW_MEASURE);

  fftwf_execute(p);

  fftwf_destroy_plan(p);
  fftwf_free(inout);
  fftwf_cleanup_threads();

  return 0;
}

Stack overflow error in Pardiso solver


Dear all,

I would like to ask for your help to solve this error when I used Pardiso for my Finite element analysis.

I am using Intel® Parallel Studio XE 2017 Update 6 for Windows with the Intel Fortran compiler integrated in Visual Studio 2015. My laptop has 8 GB RAM and 4 processors at 2.0 GHz.

I do not know why, when I solve the equation Ax=b where A has size 90,000 x 90,000 with the Pardiso solver settings shown in attached picture 3, there is a stack-overflow error as shown in attached pictures 1 and 2. While running, only 45% of the RAM was used.

Please help me find the reason and the way to fix this problem. 

Thank you very much.

 

Attachments: e.JPG (136.82 KB), error2.JPG (218.27 KB), error3.JPG (52.85 KB)

Mac High Sierra zdotu test not working


Hi,

I am struggling with recompiling R using the Intel MKL. I have narrowed the problem down to the following:

gfortran -c testf.f 
gcc -c test.c
gcc -L${MKL_LIB_PATH} -o test test.o testf.o -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential  

When running the program, I get a segmentation fault (signal 11).

I tried debugging with gdb but couldn't really pinpoint the problem. It seems all the right libraries are being used; otool -L points to the MKL libraries...

I am running out of options...
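One thing I am wondering about: zdotu returns a double-complex value, and as far as I know the return convention for complex-valued Fortran functions differs between compilers, which can crash C callers built with another compiler. The CBLAS variant returns the result through a pointer instead; a small self-contained sketch (this is a guess at the failure mode, not a diagnosis):

#include <mkl.h>
#include <stdio.h>

int main(void)
{
    MKL_Complex16 x[2] = { {1.0, 2.0}, {3.0, 4.0} };
    MKL_Complex16 y[2] = { {5.0, 6.0}, {7.0, 8.0} };
    MKL_Complex16 result;

    /* result = x(1)*y(1) + x(2)*y(2), unconjugated dot product */
    cblas_zdotu_sub(2, x, 1, y, 1, &result);
    printf("zdotu = %f + %fi\n", result.real, result.imag);
    return 0;
}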

 

thanks,

 

Bernd

Attachments: test.c (259 bytes), testf.f (520 bytes)

Still cannot attach MKL library from Visual Studio


I went back to an earlier version (VS 2010), which handles the Fortran source code routines fine.

 

But I still cannot make it attach an entry-point routine in the MKL library.

Steve mentioned an option in the linker section (project properties) to tell it to attach the MKL library, but for some reason that option is not available (at least not to me); I can't find it anywhere.

Are you saying I should give up on Visual Studio entirely?

 

I was fairly committed to using it for program development and debugging up till now.

 

 

Direct Sparse Solver for Clusters scaling with OpenMP threads


I am trying to determine whether the Intel Direct Sparse Solver for Clusters is a good parallel solver for our application. I have implemented the sparse solver in Fortran to solve a linear FEA problem. For the call to the sparse solver, I am seeing speed-up with an increasing number of MPI processes, but not with an increasing number of threads per MPI process.

In this case, the A matrix is generated from a finite-difference-type grid in distributed format (DCSR). Node ordering is such that distributing the matrix results in gaps in the sparse matrix storage of each part, similar to the example given here: https://software.intel.com/en-us/articles/intel-math-kernel-library-parallel-direct-sparse-solver-for-clusters

This case is a linear time-domain problem so we factorize the matrix once and then solve it many times with evolving boundary conditions. Should I expect to see good scaling over MPI processes and OpenMP threads in the solve phase of the direct sparse solver for clusters?

I have benchmarked a 10 million DOF model on a linux cluster with the number of MPI processes ranging from 2 to 128, with 1 process per hardware node, and the number of OpenMP threads per node ranging from 2 to 16. I see speed-up in increasing the number of MPI processes up to about 32 processes, but very little improvement in using more OpenMP threads. I see about the same speed-up on 8 or 16 threads as I see on 2 threads.

I am using the following iparm variables:

iparm(1)  = 1   !no default values
iparm(2)  = 2
iparm(10) = 8
iparm(18) = -1
iparm(27) = 0
iparm(28) = 1
iparm(40) = 2
iparm(41) = ibegin
iparm(42) = iend
iparm(60) = 1

 

Is the Direct Sparse Solver for Clusters suitable here and what issues should I look at to try to improve scaling with number of OpenMP threads? Thanks for your help.


Linking to MKL PARDISO on Windows using pgfortran


hello, 

I am new to computing on Windows operating systems. I have pgfortran 18.4 and MKL 2018.3.210 installed, including the mkl_pgi_thread library.

I am trying to statically link the libraries needed to use PARDISO_D_64 (mkl_intel_ilp64.lib, mkl_pgi_thread.lib, mkl_core.lib), but the best outcome I can get is:

pgfortran -i8 -pgf90libs -mp -o GEMINI GEMINI.obj MD.obj MR.obj M1.obj M2.obj MW.obj -I"C:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\include" -LC:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win\mkl_intel_ilp64.lib -LC:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win\mkl_pgi_thread.lib -LC:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win\mkl_core.lib -LC:\IntelMKL\compilers_and_libraries_2018.3.210\windows\compiler\lib\intel64_win
GEMINI.obj : error LNK2019: unresolved external symbol mkl_set_num_threads_ referenced in function MAIN_ 
GEMINI.obj : error LNK2019: unresolved external symbol mkl_set_dynamic_ referenced in function MAIN_
M1.obj : error LNK2019: unresolved external symbol pardiso_d_64_ referenced in function m1_s1_ 
GEMINI.exe : fatal error LNK1120: 3 unresolved externals 

The main problem, I hope, is that I just don't know how to link to multiple libraries using pgfortran on Windows. I previously compiled and executed this same code without issue on Linux Mint using gfortran.
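For what it's worth, my understanding is that -L names a directory to search rather than a library file, and that on Windows the .lib files can usually be passed to the driver directly. A sketch reusing the paths above (whether pgfortran forwards them unchanged to the linker is an assumption on my part):

pgfortran -i8 -pgf90libs -mp -o GEMINI GEMINI.obj MD.obj MR.obj M1.obj M2.obj MW.obj ^
  -I"C:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\include" ^
  "C:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win\mkl_intel_ilp64.lib" ^
  "C:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win\mkl_pgi_thread.lib" ^
  "C:\IntelMKL\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win\mkl_core.lib"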

The MKL Link Line Advisor also states that MKL support on Windows with PGI Fortran is "limited." What does that mean? Could it be that I just cannot use pgfortran to link to the PARDISO routines? Or could it be that I need to add the "MKL service" module to my code, which I found not to be necessary when I was working on Linux with gfortran?

Installing MKL on MUSA IS with no NIC (no MAC address)


I've generated and received a license file for my downloaded free version of the Math Kernel Library, and I understand I can install it on other machines for my own use.

I have a work requirement (actually the primary reason for obtaining the MKL) to install the software for development in a closed area, on a Windows 7 machine that is completely non-networked for security reasons. It does not have a Network Interface Card (NIC), and therefore has no MAC address ("Host ID").

When I attempt to install MKL I get the following error: "Package signature verification failed. Click Help for details."

Clicking "Help" attempts to open a web page, which of course cannot connect to anything.

Is it possible to do what I am trying to do?

Thanks

HPL Linpack with Intel MKL -- Single Node, Two Processors


Hi everyone,

 

I've compiled HPL with Intel's MPI and Intel MKL, but I'm confused about the best way to run it on a single node with two processors (12 cores each). Should I be using mpirun even though I only have a single node? If I try to set my PxQ grid to 24, I get an error saying that there are not enough processes; this goes away when I use mpirun. But I'm worried about communication overhead. Is it really necessary to use MPI when I've got a single node, albeit with two processors? Another thing I was confused about was which MKL option to choose when linking the MKL library (cluster vs. parallel). I've chosen parallel for the time being.

Any thoughts?

Thanks for Reading

 

 

Storing method for the coefficient matrix for the Preconditioned GMRES


Hi All;

Quick question: how can I use GMRES for a preconditioned, very large linear system without storing the entire coefficient matrix? As I understand it, GMRES asks for the RHS vector, the entire coefficient matrix, and the preconditioner, but because of memory limitations and computational cost I will not be able to assemble and store the entire coefficient matrix. Any thought is very appreciated!
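For context, the setup I have in mind is MKL's RCI FGMRES used matrix-free: the solver only hands back vectors to be multiplied, so the coefficient matrix is never assembled. A rough C sketch (apply_A and apply_M_inv are hypothetical user callbacks; the parameter settings follow the usual reference example):

#include <mkl.h>
#include <mkl_rci.h>

void apply_A(MKL_INT n, const double *v, double *Av);       /* hypothetical: Av := A*v computed on the fly */
void apply_M_inv(MKL_INT n, const double *v, double *Mv);   /* hypothetical: Mv := M^{-1}*v (preconditioner) */

void solve_matrix_free(MKL_INT n, double *x, double *b)
{
    MKL_INT ipar[128], rci_request, itercount;
    double dpar[128];
    MKL_INT k = 150;                                          /* restart length */
    MKL_INT tmp_len = (2 * k + 1) * n + k * (k + 9) / 2 + 1;  /* work array size per the FGMRES docs */
    double *tmp = (double *)mkl_malloc(tmp_len * sizeof(double), 64);

    dfgmres_init(&n, x, b, &rci_request, ipar, dpar, tmp);
    ipar[14] = k;        /* restart after k iterations */
    ipar[8]  = 1;        /* automatic residual stopping test */
    ipar[9]  = 0;        /* no user-defined stopping test */
    ipar[10] = 1;        /* use the (user-supplied) preconditioner */
    ipar[11] = 1;        /* automatic test for zero norm of the next vector */
    dpar[0]  = 1.0e-8;   /* relative tolerance */
    dfgmres_check(&n, x, b, &rci_request, ipar, dpar, tmp);

    while (1) {
        dfgmres(&n, x, b, &rci_request, ipar, dpar, tmp);
        if (rci_request == 0) break;                                 /* converged */
        if (rci_request == 1)                                        /* apply the operator */
            apply_A(n, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
        else if (rci_request == 3)                                   /* apply the preconditioner */
            apply_M_inv(n, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
        else
            break;                                                   /* error */
    }

    if (rci_request == 0)
        dfgmres_get(&n, x, b, &rci_request, ipar, dpar, tmp, &itercount);
    mkl_free(tmp);
}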

MKL for Deep Learning?


Hello together,

I am a PhD student researching in the area of parallel programming. In my next research paper, I aim to present some high-performance (OpenCL) implementations of the Basic Linear Algebra Subprograms (BLAS) -- especially the matrix multiplication routine GEMM -- for matrix sizes as used in the area of deep learning; my target hardware is Intel Xeon CPUs. To strengthen my evaluation, I want to compare against the fastest state-of-the-art BLAS implementation that targets Intel Xeon CPUs.

My question is: Which is the currently fastest BLAS implementation for Intel Xeon CPU on matrix sizes as used in deep learning -- the Intel Math Kernel Library (MKL)?

Many thanks in advance.

Best,
Richard
