Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

FEAST Mathematica


Hi,

I am running the FEAST algorithm in Mathematica 10.0. I have sparse matrices of dimension around 500,000 and am looking for around 50 eigenvalues. I have noticed that the runtime depends massively on what the entries are.

For instance, if I take two sparse matrices with precisely the same non-zero structure and modify the entries only slightly, the runtime can slow down 5x or more. The specified intervals are the same, and the runs return very nearly the same number of eigenvalues.

Is this a known issue with FEAST? 

Thanks,

Stephen


A strange result from sparse matrix addition with mkl_sparse_s_add


Hi all,

I have a new question about the sparse matrix addition routine mkl_sparse_s_add: it returns a strange result. When I use the double-precision routine mkl_sparse_d_add, everything is fine.

Following is the example program:

#include "mkl_spblas.h"
#include <stdio.h>

void print_csr_sparse_s(const sparse_matrix_t csrA)
{
    // Read Matrix Data and Print it
    int row, col;
    sparse_index_base_t indextype;
    int * bi, *ei;
    int * j;
    float *rv;
    sparse_status_t status = mkl_sparse_s_export_csr(csrA, &indextype, &row, &col, &bi, &ei, &j, &rv);
    if (status==SPARSE_STATUS_SUCCESS)
    {
        printf("SparseMatrix(%d x %d) [base:%d]\n", row, col, indextype);
        for (int r = 0; r<row; ++r)
        {
            for (int idx = bi[r]; idx<ei[r]; ++idx)
            {
                printf("<%d, %d> \t %f\n", r, j[idx], rv[idx]);
            }
        }
    }
    return;
}

void print_csr_sparse_d(const sparse_matrix_t csrA)
{
    // Read Matrix Data and Print it
    int row, col;
    sparse_index_base_t indextype;
    int * bi, *ei;
    int * j;
    double *rv;
    sparse_status_t status = mkl_sparse_d_export_csr(csrA, &indextype, &row, &col, &bi, &ei, &j, &rv);
    if (status==SPARSE_STATUS_SUCCESS)
    {
        printf("SparseMatrix(%d x %d) [base:%d]\n", row, col, indextype);
        for (int r = 0; r<row; ++r)
        {
            for (int idx = bi[r]; idx<ei[r]; ++idx)
            {
                printf("<%d, %d> \t %f\n", r, j[idx], rv[idx]);
            }
        }
    }
    return;
}

// test addition of sparse matrix
void test_add_s()
{
    // Define sparse-matrix M
    int mi[5] = {0, 2, 5, 8, 10};
    int mj[10] = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3};
    float mv[10] = {2.0f, 1.0f, 1.0f, 2.0f, 1.0f, 1.0f, 2.0f, 1.0f, 1.0f, 2.0f};
    sparse_matrix_t M;

    // Define sparse-matrix N
    int ni[5] = {0, 1, 2, 3, 4};
    int nj[4] = {0, 1, 2, 3};
    float nv[4] = {3.0f, 2.0f, 1.0f, -1.0f};
    sparse_matrix_t N;

    // create csr matrix
    mkl_sparse_s_create_csr(&M, SPARSE_INDEX_BASE_ZERO, 4, 4, mi, mi+1, mj, mv);
    mkl_sparse_s_create_csr(&N, SPARSE_INDEX_BASE_ZERO, 4, 4, ni, ni+1, nj, nv);
    // output matrix
    print_csr_sparse_s(M);
    print_csr_sparse_s(N);

    // do addition
    sparse_matrix_t C;
    mkl_sparse_s_add(SPARSE_OPERATION_NON_TRANSPOSE, M, 2.0f, N, &C);

    // output result
    print_csr_sparse_s(C);

    // free memory
    mkl_sparse_destroy(M);
    mkl_sparse_destroy(N);
    mkl_sparse_destroy(C);
}

void test_add_d()
{
    // Define sparse-matrix M
    int mi[5] = {0, 2, 5, 8, 10};
    int mj[10] = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3};
    double mv[10] = {2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0};
    sparse_matrix_t M;

    // Define sparse-matrix N
    int ni[5] = {0, 1, 2, 3, 4};
    int nj[4] = {0, 1, 2, 3};
    double nv[4] = {3.0, 2.0, 1.0, -1.0};
    sparse_matrix_t N;

    // create csr matrix
    mkl_sparse_d_create_csr(&M, SPARSE_INDEX_BASE_ZERO, 4, 4, mi, mi+1, mj, mv);
    mkl_sparse_d_create_csr(&N, SPARSE_INDEX_BASE_ZERO, 4, 4, ni, ni+1, nj, nv);
    // output matrix
    print_csr_sparse_d(M);
    print_csr_sparse_d(N);

    // do addition
    sparse_matrix_t C;
    mkl_sparse_d_add(SPARSE_OPERATION_NON_TRANSPOSE, M, 2.0, N, &C);

    // output result
    print_csr_sparse_d(C);

    // free memory
    mkl_sparse_destroy(M);
    mkl_sparse_destroy(N);
    mkl_sparse_destroy(C);
}

int main()
{
    test_add_d();
    test_add_s();
    return 0;
}

The program produces two sets of results; the first (double precision) is correct, and the second (single precision) is wrong.

SparseMatrix(4 x 4) [base:0]
  <0, 0>         2.000000
  <0, 1>         1.000000
  <1, 0>         1.000000
  <1, 1>         2.000000
  <1, 2>         1.000000
  <2, 1>         1.000000
  <2, 2>         2.000000
  <2, 3>         1.000000
  <3, 2>         1.000000
  <3, 3>         2.000000
SparseMatrix(4 x 4) [base:0]
  <0, 0>         3.000000
  <1, 1>         2.000000
  <2, 2>         1.000000
  <3, 3>         -1.000000
SparseMatrix(4 x 4) [base:0]
  <0, 0>         7.000000
  <0, 1>         2.000000
  <1, 0>         2.000000
  <1, 1>         6.000000
  <1, 2>         2.000000
  <2, 1>         2.000000
  <2, 2>         5.000000
  <2, 3>         2.000000
  <3, 2>         2.000000
  <3, 3>         3.000000
SparseMatrix(4 x 4) [base:0]
  <0, 0>         2.000000
  <0, 1>         1.000000
  <1, 0>         1.000000
  <1, 1>         2.000000
  <1, 2>         1.000000
  <2, 1>         1.000000
  <2, 2>         2.000000
  <2, 3>         1.000000
  <3, 2>         1.000000
  <3, 3>         2.000000
SparseMatrix(4 x 4) [base:0]
  <0, 0>         3.000000
  <1, 1>         2.000000
  <2, 2>         1.000000
  <3, 3>         -1.000000
SparseMatrix(4 x 4) [base:0]
  <0, 0>         3.000000
  <0, 1>         0.000000
  <1, 0>         0.000000
  <1, 1>         2.000000
  <1, 2>         0.000000
  <2, 1>         0.000000
  <2, 2>         1.000000
  <2, 3>         0.000000
  <3, 2>         0.000000
  <3, 3>         -1.000000
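For reference, a dense cross-check of what the addition should produce: mkl_sparse_s_add computes C = alpha*op(A) + B, so here C = 2*M + N. This is a standalone sketch written out by hand, not part of the program above.

#include <stdio.h>

/* Dense cross-check of C = 2*M + N for the 4x4 example above:
   M is tridiagonal with 2 on the diagonal and 1 off it, N = diag(3,2,1,-1). */
int main()
{
    float Md[4][4] = {{2,1,0,0},{1,2,1,0},{0,1,2,1},{0,0,1,2}};
    float Nd[4][4] = {{3,0,0,0},{0,2,0,0},{0,0,1,0},{0,0,0,-1}};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
        {
            float v = 2.0f*Md[r][c] + Nd[r][c];
            if (v != 0.0f)
                printf("<%d, %d> \t %f\n", r, c, v);
        }
    return 0;
}

Its output matches the first (double precision) result above, which is what makes the second (single precision) result look wrong.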

Best regards,

Tianxiong Lu

What is the meaning of struct matrix_descr in mkl_sparse_XXX routines?


Hi all,

    In many mkl_sparse_XXX routines (version 11.3), there is a parameter of type matrix_descr, like:

sparse_status_t mkl_sparse_s_trsv (sparse_operation_t operation, float alpha, const sparse_matrix_t A, struct matrix_descr descr, const float *x, float *y);

The explanation of this parameter in the manual is:

descr: Structure specifying sparse matrix properties.

But this is confusing. Following is the code snippet from the MKL example sparse_trsv.c:

   //*******************************************************************************
    //     Declaration and initialization of parameters for sparse representation of
    //     the matrix A in the compressed sparse row format:
    //*******************************************************************************
#define M 5
#define N 5
#define NNZ 13
    //*******************************************************************************
    //    Sparse representation of the matrix A
    //*******************************************************************************
    double csrVal[NNZ]    = { 1.0, -1.0,     -3.0,
                             -2.0,  5.0,
                                         4.0, 6.0, 4.0,
                             -4.0,       2.0, 7.0,
                                    8.0,          -5.0 };
    MKL_INT    csrColInd[NNZ] = { 0,      1,        3,
                              0,      1,
                                           2,   3,   4,
                              0,           2,   3,
                                      1,             4 };
    MKL_INT    csrRowPtr[M+1] = { 0, 3, 5, 8, 11, 13 };
    // Descriptor of main sparse matrix properties
    struct matrix_descr descrA;
    // Structure with sparse matrix stored in CSR format
    sparse_matrix_t       csrA;

......

    // Create matrix descriptor
    descrA.type = SPARSE_MATRIX_TYPE_TRIANGULAR;
    descrA.mode = SPARSE_FILL_MODE_LOWER;
    descrA.diag = SPARSE_DIAG_UNIT;

    // Compute y = alpha * A^{-1} * x
    mkl_sparse_d_trsv ( SPARSE_OPERATION_NON_TRANSPOSE,
                        alpha,
                        csrA,
                        descrA,
                        x,
                        y );

Obviously, the matrix csrA is not a triangular matrix, and the diagonal is not unit. Why set the properties of descrA to TRIANGULAR and DIAG_UNIT?
If I change the properties to other options, for example:

    descrA.type = SPARSE_MATRIX_TYPE_GENERAL;
    descrA.diag = SPARSE_DIAG_NON_UNIT;

then it no longer returns the correct result.

How do I set the properties of matrix_descr correctly?
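For comparison, a descriptor that works for me with a general (non-solving) routine such as mkl_sparse_d_mv; a sketch only, with alpha, beta, x, and y assumed to be declared elsewhere:

    struct matrix_descr descrG;
    descrG.type = SPARSE_MATRIX_TYPE_GENERAL;  /* use the whole matrix */

    /* y := alpha * A * x + beta * y */
    mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, alpha, csrA, descrG, x, beta, y);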

Best regards,

Tianxiong Lu

FEAST algorithm : feastinit input parameter setting problem


Hi,

I have a problem with setting the input parameters at MKL FEAST algorithm library.

(I currently use MKL 11.3 version and Intel parallel studio xe 2016 for c++ at Linux)

I was trying to use the dfeast_syev function, so I set the extended eigensolver input parameters by using the array fpm, as written in the reference manual.

When I modify the fpm values to change the options for the dfeast_syev function, I get errors about an illegal number being set in fpm.

But the function dfeast_syev runs well and gives the right eigenvalues and vectors with the default options (the array fpm initialized with feastinit(fpm);), so I checked the default fpm values that feastinit produces. The initialized values are the following:

fpm[0] = 0, fpm[1] = 12, fpm[2] = 0, fpm[3] = 5, fpm[4] = 0, fpm[5] = 1, fpm[6] = 0, ... (the remaining fpm[i] values are all zeros),

which is different from the default values written in the reference manual.

Please help me if you know what I am doing wrong.

Thanks.

 

Sincerely,

Chiho Yoon

Here's my code. 

----------

#include "mkl.h"
#include "mkl_solvers_ee.h"

void FEAST_eigen(int iN, double *A, double Emin, double Emax, double *E, double *X)
{
    /* Matrix A in dense format, size N by N*/
	/* Lower/upper bound of search interval [Emin,Emax] */

	char          UPLO = 'F'; /* Type of matrix: (F=full matrix, L/U - lower/upper triangular part of the matrix) */
    const MKL_INT N = (MKL_INT) iN;

    /* Declaration of FEAST variables */
    MKL_INT      fpm[128];      /* Array to pass parameters to Intel MKL Extended Eigensolvers */

    double       epsout;        /* Relative error on the trace */
    MKL_INT      loop;          /* Number of refinement loop */

    MKL_INT      M0 = N;            /* Initial guess for subspace dimension to be used */
    MKL_INT      M;             /* Total number of eigenvalues found in the interval */

    double       *res = (double *)  malloc (sizeof(double) * iN);       /* Residual */

    /* Declaration of local variables */
    MKL_INT      info;          /* Errors */
    double       *R = (double *) malloc (sizeof(double) * iN);         /* R = |E-Eig| */
    double       **Y = (double **)  malloc (sizeof(double*) * iN);     /* Y=(X')*X-I */

    MKL_INT      i, j;

    for (i=0; i<N*N; i++)
        X[i] = 0.0;

	for (i = 0; i < N; i++)
		Y[i] = (double *)  malloc (sizeof(double) * iN);

    /* Step 1. Call  FEASTINIT to define the default values for the input FEAST parameters */

	printf("ho!\n");
	feastinit(fpm);

//	fpm[0] = 1;
//	fpm[1] = 8;
//	fpm[2] = 15;
//	fpm[3] = 100;

///////////////////////////////////////////////////////////////////
//--------------> This code runs well, but once I set the numbers (fpm[0]~fpm[4]), I get these messages :
//Intel MKL Extended Eigensolvers: double precision driver
//Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
//Intel MKL Extended Eigensolvers: fpm(1)=1
//Intel MKL Extended Eigensolvers: fpm(2)=0
//Intel MKL Extended Eigensolvers: fpm(3)=8
//Intel MKL Extended Eigensolvers: fpm(4)=0
//Intel MKL Extended Eigensolvers: fpm(5)=15
//Intel MKL Extended Eigensolvers: fpm(7)=100
//Search interval [-1.000000000000000e+01;1.000000000000000e+01]
//Intel MKL Extended Eigensolvers ERROR: Problem with array parameters
//==>INFO code =: 102
//Routine dfeast_syev returns code of ERROR: 102
////////////////////////////////////////////////////////////////////

	for (i = 0; i < 64 ; i++)
	{
		printf("%i\n",fpm[i]);
	}

    /* Step 2. Solve the standard Ax = ex eigenvalue problem. */

	dfeast_syev(
        &UPLO,   /* IN: UPLO = 'F', stores the full matrix */
        &N,      /* IN: Size of the problem */
        A,       /* IN: dense matrix A */
        &N,      /* IN: The first dimension of the matrix A */
        fpm,     /* IN/OUT: Array used to pass parameters to Intel MKL Extended Eigensolvers */
        &epsout, /* OUT: Relative error on the trace */
        &loop,   /* OUT: Number of refinement loops executed */
        &Emin,   /* IN: Lower bound of search interval */
        &Emax,   /* IN: Upper bound of search interval */
        &M0,     /* IN: Initial guess for subspace dimension to be used */
        E,       /* OUT: The first M entries of Eigenvalues */
        X,       /* IN/OUT: The first M entries of Eigenvectors */
        &M,      /* OUT: The total number of eigenvalues found in the interval */
        res,     /* OUT: The first M components contain the relative residual vector */
        &info    /* OUT: Error code */
        );

    if ( (int)info != 0 )
    {
        printf("Routine dfeast_syev returns code of ERROR: %i\n", (int)info);
        return;
    }

}
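For completeness, the scratch buffers allocated above are never freed; a minimal cleanup sketch to run before the routine returns (not part of the original listing):

    /* free the scratch buffers (Y holds N separately allocated rows) */
    for (i = 0; i < N; i++)
        free(Y[i]);
    free(Y);
    free(R);
    free(res);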

 

Build R with Intel MKL shared library


I successfully built R 3.2.2 with Intel MKL and ICC. Now I am wondering whether the libraries are indeed linked correctly. Here is the output:

% R CMD ldd BUILD/R-3.2.2/bin/exec/R
	linux-vdso.so.1 (0x00007ffdbddba000)
	libR.so => /usr/lib64/R/lib/libR.so (0x00007fc76f767000)
	libRblas.so => not found
	libm.so.6 => /lib64/libm.so.6 (0x00007fc76f452000)
	libiomp5.so => /opt/intel/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libiomp5.so (0x00007fc76f111000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fc76eef9000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc76ecdd000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fc76e91d000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fc76e718000)
	libblas.so.3 => /lib64/libblas.so.3 (0x00007fc76c674000)
	libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007fc76c349000)
	libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007fc76c109000)
	libreadline.so.6 => /lib64/libreadline.so.6 (0x00007fc76bebf000)
	libtre.so.5 => /lib64/libtre.so.5 (0x00007fc76bcaf000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fc76ba3e000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fc76b818000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fc76b608000)
	libz.so.1 => /lib64/libz.so.1 (0x00007fc76b3f1000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fc76b1e9000)
	libicuuc.so.54 => /lib64/libicuuc.so.54 (0x00007fc76ae58000)
	libicui18n.so.54 => /lib64/libicui18n.so.54 (0x00007fc76aa00000)
	libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fc76a7de000)
	/lib64/ld-linux-x86-64.so.2 (0x000055986b173000)
	libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fc76a5b3000)
	libicudata.so.54 => /lib64/libicudata.so.54 (0x00007fc768b88000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fc768806000)

As you can see, libblas.so.3 is linked to /lib64/libblas.so.3 (0x00007fc76c674000), the library installed on my system. I was expecting something like the following (output from another machine with an already-built R package linked against MKL):

libmkl_gf_lp64.so => /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so (0x00007f3bdfef9000)
libmkl_core.so => /opt/intel/mkl/lib/intel64/libmkl_core.so (0x00007f3bde38c000)
libmkl_gnu_thread.so => /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so (0x00007f3bdd635000)

Here is part of my package build configuration:

%global _omp_lib /opt/intel/lib/intel64
%global _mkllibs " -fopenmp -Wl,--no-as-needed -L%{_mkllibpath} -L%{_omp_lib} -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread"
%global _mklroot /opt/intel/compilers_and_libraries_2016.0.109/linux/mkl


# Not sure it is needed as all libs are defined in ld.so.conf.d/
source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/bin/ifortvars.sh intel64

export CC="icc -std=c99"
export F77="ifort"
export CXX="icpc"
export FC="ifort"
export AR="xiar"
export LD="xild"

export CFLAGS="-ip -ipo -opt-mem-layout-trans=3 -xHost -mavx -fp-model precise -wd188 -DMKL_ILP64 -qopenmp -parallel -I%{_mklroot}/include"

%configure \
.......
         --with-blas=%{_mkllibs} \
         --with-lapack \
         --enable-R-shlib \
         --enable-memory-profiling \
         --enable-BLAS-shlib \
................

What am I doing wrong (if I am indeed doing something wrong), and why is this libblas.so.3 in my ldd output? Why is there no libmkl_* => /opt/intel/... line?

I have tried various settings, always with the same result.

Thank you for your help.

pardiso error=-1, input inconsistent, what does it mean?


I am feeding pardiso a diagonal matrix with several zeros on the main diagonal. It returns error=-1, "input inconsistent", when I was expecting -7, "diagonal matrix is singular".

What exactly does "input inconsistent" mean?
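One way to get more detail than the bare -1 (a sketch, assuming the C interface; the matrix checker is iparm(27) in the manual's 1-based numbering, i.e. iparm[26] in C):

    /* Enable PARDISO's input checker before the analysis phase so it
       reports where ia/ja are inconsistent instead of a bare error=-1. */
    void   *pt[64] = {0};
    MKL_INT mtype  = -2;            /* placeholder matrix type */
    MKL_INT iparm[64];
    pardisoinit(pt, &mtype, iparm); /* fill iparm with defaults for mtype */
    iparm[26] = 1;                  /* iparm(27): check ia and ja */
    /* ... then call pardiso() with phase=11 as usual ... */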

OpenMP MKL DGEMM Performance Issue


Hello,

I am doing development on a 24-core machine (E5-2697-v2).  When I launch a single DGEMM where the matrices are large (m=n=k=15,000), the performance improves as I increase the number of threads used, which is expected.  For reference, I get about 467 GFLOPs/sec using 24 cores.

Next, in an OpenMP parallel region, I have each thread launch an independent call to DGEMM where the matrices are large (m=n=k=15,000).  Each thread has its own matrices which are used in its DGEMM.  In this case, the overall performance improves as I increase the number of threads, up to a point.  With higher numbers of threads, the overall performance decreases.  What hardware limitation could be causing this?  For reference, here are the performance results I got:

#threads    Compute Speed Overall (GFLOP/sec)
    1        26.3
    2        52.6741
    3        76.6518
    4       102.413
    5       124.401
    6       148.394
    7       168.022
    8       190.557
    9       210.165
   10       232.156
   11       249.77
   12       271.149
   13       291.211
   14       313.747
   15       327.467
   16       349.917
   17       361.444
   18       377.498
   19       346.558
   20       368.453
   21       356.597
   22       319.446
   23       301.81
   24       277.273
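In case it helps, the launch pattern is essentially the following (a stripped-down sketch, not my benchmark code; run_concurrent_dgemm and the per-thread matrix arrays are placeholder names):

#include <mkl.h>
#include <omp.h>

/* One independent, single-threaded DGEMM per OpenMP thread.
   A[t], B[t], C[t] are the t-th thread's own m x m matrices. */
void run_concurrent_dgemm(double **A, double **B, double **C,
                          MKL_INT m, int nthreads)
{
    mkl_set_num_threads(1);   /* each DGEMM call uses a single core */
#pragma omp parallel num_threads(nthreads)
    {
        int t = omp_get_thread_num();
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    m, m, m, 1.0, A[t], m, B[t], m, 0.0, C[t], m);
    }
}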

 

Unexpected behavior of cluster_sparse_solver


Hello.

I am trying to solve a system with `cluster_sparse_solver`, and am getting unexpected results. https://gist.github.com/ivan-krukov/157372d9a55db244c4b4

In my case, given a sparse matrix `A`, we want to solve `Ax=e`, where `e` is the first column of the identity matrix of the appropriate size (a column vector whose first entry is 1 and the rest are zeros).
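In code, the right-hand side is simply (a sketch):

    /* e = first column of the identity: e[0] = 1, the rest 0 */
    double *e = (double *)calloc(n, sizeof(double));
    e[0] = 1.0;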

With small systems, there are no issues, but large systems end up returning unreasonable results. I suspect this might have to do with numerical stability of the solver depending on the number of zeros in the right hand side. For example, a 60 x 60 system returns the expected result, while a 70 x 70 system returns something that looks like overflow. I would also like to note that for different systems, where the right hand side has many non-zero entries, the solutions generated are quite reasonable.

Running the code linked above with: `mpirun -n 10 ./solver_bugs 60` gives the expected answer, while `mpirun -n 10 ./solver_bugs 70` provides something completely bogus.

In the long run, we were hoping to use this for _very_ large systems (100,000 x 100,000).

I am not certain how this problem can be diagnosed. While I hate to post "my code does not work" kind of questions, it seems like the only option here. Any help will be much appreciated.

The version of MKL in question is mkl 11.2u3, which was bundled with composer_xe 2015 3.187.

 


does pardiso work for a 1 element matrix?


I am finding that pardiso does not work for a 1-element matrix. Calling pardiso with phase=11 returns error=0, which means success, but the pt(:) pointer array is all zeros. So when I subsequently call pardiso with phase=23, it fails with error=-7.

Can someone please confirm whether pardiso does or does not work for a 1-element matrix?

I want to run my code for a one-element matrix because I am trying to run a validation case against sample calculations in a finite element textbook. The case is a transient integration of a single-DOF system, so the system matrix is a single nonzero element.
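For reference, the entire 1 x 1 system in CSR form is just the following (a sketch with 1-based indexing and a made-up matrix value):

    MKL_INT n = 1;
    MKL_INT ia[2] = {1, 2};  /* row 1 spans entries ia[0] .. ia[1]-1 */
    MKL_INT ja[1] = {1};     /* column index of the single nonzero */
    double  a[1]  = {2.0};   /* placeholder value */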

need help updating MKL software


The MKL version that I have has bugs, and I need to update it.  I need help getting it updated.

I am confused about which version of Intel Fortran I have installed.

The Help About box in Visual Studio says: Intel(R) Visual Fortran Composer XE 2013 Update 4 Integration for Microsoft Visual Studio* 2012, 13.0.3624.11

When compiling code, the output window says: Compiling with Intel(R) Visual Fortran Compiler XE 13.1.2.190

The MKL library that I have calls itself:   Intel(R) Math Kernel Library Version 11.0.4 Product Build 20130517

I downloaded the "community" MKL library 11.3.0.110 (w_mkl_11.3.0.110.exe, about 565 MB). I installed it and rebuilt my project, but my project still uses 11.0.4.

The folder where 11.3.0.110 was installed is nearly 2 GB and completely separate from where Composer XE 2013 is installed.

Help!!!

Will Intel give me MKL 11.0.5 to replace 11.0.4?


I have encountered a confirmed bug in routine mkl_zcsrcoo in 11.0.4. It is my understanding that this bug has been fixed in 11.0.5. Would it be possible for me to get 11.0.5?

Data fitting of vector-valued function


Hello,

I have a problem using Data Fitting for linear interpolation of a vector-valued function. It seems that the format hint rhint is ignored by the function dfdInterpolate1D.

#include <iostream>
#include "mkl.h"

#define ALIGNMENT 64

int main(int argc, char** argv){

  DFTaskPtr task;
  MKL_INT NX = 3;
  MKL_INT NY = 2;

  double* x = (double*)mkl_malloc(NX*sizeof(double), ALIGNMENT);
  double* y = (double*)mkl_malloc(NX*NY*sizeof(double), ALIGNMENT);

  x[0] = 0; x[1] = 1; x[2] = 2;

  /* y is 2-dimensional in col-major format (y(0)=(0;1), y(1)=(5;1), y(2)=(10;1)) */
  y[0] = 0; y[1] = 1;
  y[2] = 5; y[3] = 1;
  y[4] = 10; y[5] = 1;
  MKL_INT yHint = DF_MATRIX_STORAGE_COLS;

  int status = dfdNewTask1D( &task, NX, x, DF_NO_HINT, NY, y, yHint);
  std::cout << "Status after dfdNewTask1D: "<< status << std::endl;


  MKL_INT s_order = DF_PP_LINEAR;
  MKL_INT s_type = DF_PP_DEFAULT;

  MKL_INT ic_type = DF_NO_IC;
  double* ic = NULL;

  MKL_INT bc_type = DF_NO_BC;
  double* bc = NULL;

  double* scoeff = (double*)mkl_malloc(NY*(NX-1)* s_order * sizeof(double), ALIGNMENT);
  MKL_INT scoeffhint = DF_NO_HINT;

  status = dfdEditPPSpline1D( task, s_order, s_type, bc_type, bc, ic_type,
                              ic, scoeff, scoeffhint );
  std::cout << "Status after dfdEditPPSpline1D: "<< status << std::endl;


  status = dfdConstruct1D( task, DF_PP_SPLINE, DF_METHOD_STD );
  std::cout << "Status after dfdConstruct1D: "<< status << std::endl;

  MKL_INT NSITE = 2;
  double* site =  (double*)mkl_malloc(NSITE*sizeof(double), ALIGNMENT);
  site[0] = 0.5; site[1] = 1.5;
  MKL_INT sitehint = DF_NO_HINT;

  double* r = (double*)mkl_malloc(NSITE*NY*sizeof(double), ALIGNMENT);
  MKL_INT rhint = DF_MATRIX_STORAGE_COLS;

  MKL_INT ndorder = 1;
  MKL_INT dorder = 1;
  double* datahint = NULL;
  MKL_INT* cell = NULL;

  status = dfdInterpolate1D( task, DF_INTERP, DF_METHOD_PP, NSITE, site,
                             sitehint, ndorder, &dorder, datahint, r, rhint, cell );
  std::cout << "Status after dfdInterpolate1D: "<< status << std::endl;


  status = dfDeleteTask( &task );
  std::cout << "Status after dfDeleteTask: "<< status << std::endl;


  /* Print output */
  std::cout << "scoeff = ( ";
  for (int i = 0; i<NY*(NX-1)* s_order; i++){
    std::cout << scoeff[i] << " ";
  }
  std::cout << ")"<< std::endl;

  std::cout << "r = ( ";
  for (int i = 0; i<NSITE*NY; i++){
    std::cout << r[i] << " ";
  }
  std::cout << ")"<< std::endl;

  std::cout << "r_expected = ( 2.5 1 7.5 1 )"<< std::endl;

  mkl_free(x);
  mkl_free(y);
  mkl_free(scoeff);
  mkl_free(site);
  mkl_free(r);

  return 0;
}

Output (compiled on RedHat 64-bit with Parallel Studio 2016):

Status after dfdNewTask1D: 0

Status after dfdEditPPSpline1D: 0

Status after dfdConstruct1D: 0

Status after dfdInterpolate1D: 0

Status after dfDeleteTask: 0

scoeff = ( 0 5 5 5 1 0 1 0 )

r = ( 2.5 7.5 1 1 )

r_expected = ( 2.5 1 7.5 1 )

RUN FINISHED; exit value 0; real time: 30ms; user: 0ms; system: 0ms

 

The output is written to r in row-major order instead of the chosen col-major format requested via rhint. Is there a problem with my code, or is it not possible to get the expected format?
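In the meantime I repack the result myself, assuming the layout actually produced is the one shown above (coordinate-major: both sites for coordinate 0 first, then both sites for coordinate 1):

  /* Workaround sketch: repack r from the observed coordinate-major layout
     (r[j*NSITE + s] = coordinate j at site s) into the per-site layout I
     expected from DF_MATRIX_STORAGE_COLS (rc[s*NY + j]). */
  double* rc = (double*)mkl_malloc(NSITE*NY*sizeof(double), ALIGNMENT);
  for (int s = 0; s < NSITE; s++)
    for (int j = 0; j < NY; j++)
      rc[s*NY + j] = r[j*NSITE + s];
  /* rc now reads ( 2.5 1 7.5 1 ), matching r_expected */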

Thanks,

Mario

 

DSS Fortran Error


Hi all,

Part of my Ph.D. research involves solving coupled reaction-diffusion equations on arbitrary closed surfaces using the Finite Element Method. In the past I used the ?GESV subroutine to solve the resulting systems of equations, but due to the large resolution of my meshes this became impractical, especially since the matrix is sparse. I then discovered Intel's sparse solvers and have been attempting to use the DSS routines.

This is where my problem lies. For the matrix equation Ax=b, my A matrix is symmetric indefinite. I have no difficulty solving a symmetric system of equations like the one below:

          ...

          INTEGER, PARAMETER :: nNonZeros=9
          INTEGER, PARAMETER :: nRows=5,nCols=5
          DOUBLE PRECISION, DIMENSION(nNonZeros) :: rValues
          DOUBLE PRECISION, DIMENSION(nRows) :: rRhsValues
          DOUBLE PRECISION, DIMENSION(nRows) :: rSolValues
          INTEGER, DIMENSION(nRows+1) :: rowIndex
          INTEGER, DIMENSION(nNonZeros) :: columns
          INTEGER :: error
          INTEGER*8  :: fhandle

          rValues=(/9.,1.5,6.,0.75,3.,0.5,12.,0.625,16./)
          rRhsValues=(/1.,2.,3.,4.,5./)
          rowIndex=(/1,6,7,8,9,10/)
          columns=(/1,2,3,4,5,2,3,4,5/)

          error=DSS_CREATE(fhandle,MKL_DSS_DEFAULTS)
          
          error=DSS_DEFINE_STRUCTURE( fhandle, MKL_DSS_SYMMETRIC, rowIndex, nRows, nCols, columns, nNonZeros )
          
          error=DSS_REORDER(fhandle, MKL_DSS_DEFAULTS,0)

          error=DSS_FACTOR_REAL(fhandle,MKL_DSS_POSITIVE_DEFINITE, rValues)

          error=DSS_SOLVE_REAL(fhandle, MKL_DSS_DEFAULTS, rRhsValues, 1, rSolValues)

          WRITE(*,*) rSolValues

           ...

The solution correctly gives:

  -326.333333333331   982.999999999994   163.416666666666   397.999999999998   61.4999999999996

However, if I change one value in the matrix, say converting the 12 on the diagonal to -12, the matrix becomes symmetric indefinite. I also change the opt parameter of the DSS_FACTOR_REAL subroutine from MKL_DSS_POSITIVE_DEFINITE to MKL_DSS_INDEFINITE. The updated code is displayed below:

          ...

          INTEGER, PARAMETER :: nNonZeros=9
          INTEGER, PARAMETER :: nRows=5,nCols=5
          DOUBLE PRECISION, DIMENSION(nNonZeros) :: rValues
          DOUBLE PRECISION, DIMENSION(nRows) :: rRhsValues
          DOUBLE PRECISION, DIMENSION(nRows) :: rSolValues
          INTEGER, DIMENSION(nRows+1) :: rowIndex
          INTEGER, DIMENSION(nNonZeros) :: columns
          INTEGER :: error
          INTEGER*8  :: fhandle

          rValues=(/9.,1.5,6.,0.75,3.,0.5,-12.,0.625,16./)
          rRhsValues=(/1.,2.,3.,4.,5./)
          rowIndex=(/1,6,7,8,9,10/)
          columns=(/1,2,3,4,5,2,3,4,5/)

          error=DSS_CREATE(fhandle,MKL_DSS_DEFAULTS)
          
          error=DSS_DEFINE_STRUCTURE( fhandle, MKL_DSS_SYMMETRIC, rowIndex, nRows, nCols, columns, nNonZeros )
          
          error=DSS_REORDER(fhandle, MKL_DSS_DEFAULTS,0)

          error=DSS_FACTOR_REAL(fhandle,MKL_DSS_INDEFINITE, rValues)

          error=DSS_SOLVE_REAL(fhandle, MKL_DSS_DEFAULTS, rRhsValues, 1, rSolValues)

          WRITE(*,*) rSolValues

           ...

 

The program throws an "MKL-DSS-DSS-Error, Zero pivot detected" error during the DSS_FACTOR_REAL subroutine.

Why does it say this? The matrix does not have any zero pivots since in reduced row echelon form it is simply the identity matrix.

Any help would be appreciated. What am I doing wrong?

need help understanding error=-4 from pardiso


I am calling pardiso with mtype=11 (real nonsymmetric).

I first call it with phase=11, iparm(1)=0, and the result is error=0.  So all is good.

I then call it with phase=22, and error=0.  So all is still good.

I then call it multiple times with phase=33 and different right-hand sides. The first two calls return error=0, but the third call returns error=-4. What does this mean? Does -4 mean I have a bug somewhere, or can -4 be triggered by something it doesn't like about the RHS? The RHS for the third call looks normal to me (i.e., no outlier elements).
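A diagnostic check worth adding after the failing call (a sketch; my reading of the Reference Manual is that when error=-4, iparm(30), i.e. iparm[29] with C's 0-based indexing, holds the number of the equation where the zero pivot occurred; treat that as an assumption to verify against your manual):

    /* after the phase=33 call that returns error=-4 */
    if (error == -4)
        printf("zero pivot reported at equation %d\n", (int)iparm[29]);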

 

CNR mode reporting incorrect SIMD version via KVM


Hi

We had a problem with inconsistent results across some of our grid nodes, which I thought was worth sharing. After investigation we pinned this down to two different OS configurations returning different results:

  • Bare-metal Windows 2008
  • Virtual Windows 2008 running in KVM on RHEL

Both machines are identical in terms of hardware (Xeon E7-4870), which supports SSE4.1/4.2. At the time we were using MKL v11.1.2.

We use MKL’s CNR mode to force SIMD to use only SSE3 instructions, thus achieving numerical consistency across a range of hardware. What we discovered was that on the VM, the call to MKL_CBWR_Get_Auto_Branch was returning SSE3, and as a result we were not calling ::MKL_CBWR_Set(SSE3). Subsequently calculations on that machine were actually using SSE4 instructions, and this turned out to be the source of the numerical differences we were seeing.

The only numerical differences we saw between SSE3/SSE4 emanated from BLAS, although this may be circumstantial.

Although this was easily fixed (by always calling ::MKL_CBWR_Set(SSE3) regardless of what MKL_CBWR_Get_Auto_Branch returns), it took a great deal of investigation to pinpoint the problem.
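In code, the unconditional fix is simply the following (a sketch; handle_error is a placeholder):

    /* Force the SSE3 code path up front instead of trusting auto-detection;
       must be called before any other MKL function (declared in mkl.h). */
    int status = mkl_cbwr_set(MKL_CBWR_SSE3);
    if (status != MKL_CBWR_SUCCESS)
        handle_error(status);  /* placeholder error handler */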

Whether this issue stems from KVM rather than MKL itself I simply do not know, but thought it was worth sharing.

Thanks,

 


cluster_sparse_solver causes segmentation fault in MKL 11.3


Hi:

   My environment: linux64, mpicxx from MVAPICH2 2.0b, icpc 13.1.3 (gcc 4.7.0 compatibility). To avoid mixing it up with the MKL bundled with icpc 13.1.3, I put MKL 11.3 in /home/intel.

   I use the following command:

mpic++ cluster_sparse_solverc/source/cl_solver_unsym_c.c -Wl,-rpath=/home/intel/mkl/lib/intel64 -Wl,-rpath=/home/intel/compiler/lib/intel64 -L/home/intel/mkl/lib/intel64 -L/home/intel/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_lp64 -liomp5

to compile, and it causes a segmentation fault. With MKL 11.2.4 it works correctly. So is it a bug in MKL 11.3?

 

Pardiso hangs in phase 33 when called from an OMP critical region in IVF 16.0


Hi all,

I suspect this is a problem with my code, but I can't figure out what the issue is...

I've been using similar code for a while now, and its worked fine until I installed 16.0, however it now hangs in the call to Pardiso with phase 33. Basically I import a matrix, factorize then solve in an OMP parallel loop for many RHS. Because of the layout of my main program (I provide a much stripped down version here, as well as a data file), the same temporary array is used by each thread, and the solution vector overwrites the right hand side. I'm unsure if Pardiso actually uses this temporary array - but in case it does I place the call to Pardiso within an OMP critical section to prevent multiple threads accessing it at the same time. 

I compile with parallel OMP (/Qopenmp) and parallel MKL (/Qmkl:parallel) flags, otherwise everything is as Visual Studio sets it by default.

If OMP is set to one thread it runs fine. If I comment out the OMP parallel directive on the loop it also seems to run fine. However on my 8 (4 with hyper threading) core machine when running in parallel no joy.

If anyone has any suggestions/thoughts/solutions it would be appreciated,

Thanks,

Michael

include 'mkl_pardiso.f90'
    program TestPardiso2
    use OMP_LIB
    use MKL_PARDISO

    implicit none

    double precision, allocatable :: A(:), toSolve(:,:)
    integer, allocatable :: ia(:), ja(:)
    integer :: N

    integer, parameter :: numberToSolve = 20

    ! Read matrix from file
    call GetMatrixFromFile("Output.txt", N, ia, ja, A)

    ! Invent some data to solve
    allocate(toSolve(N,numberToSolve))
    call RANDOM_NUMBER(toSolve)

    ! Solve using Pardiso
    call SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)

    contains

    subroutine GetMatrixFromFile(name, N, ia, ja, A)
        character(len=*), intent(in) :: name
        double precision, intent(out), allocatable :: A(:)
        integer, intent(out), allocatable :: ia(:), ja(:)
        integer, intent(out) :: N

        character(len=255) :: buffer
        integer :: nnz, status, iVal, prevIval, cnt

        ! open file
        open(UNIT=21, FILE=name, STATUS="OLD", IOSTAT=status)

        ! Get matrix size from first lines of file
        read(UNIT=21, FMT='(A)') buffer ! Ignore title
        read(UNIT=21, FMT='(A5, I)') buffer, N
        read(UNIT=21, FMT='(A7, I)') buffer, nnz
        read(UNIT=21, FMT='(A)') buffer ! Ignore column heading
        read(UNIT=21, FMT='(A)') buffer ! Ignore space

        ! Allocate matrix - assume no errors!
        allocate(ia(N + 1), ja(nnz), A(nnz))

        ! Loop thru file. Assume only end of file error will occur
        status = 0
        cnt = 0
        prevIval = 0
        do while (status == 0)
            cnt = cnt + 1
            read(UNIT=21, FMT='(X, I, X, I, X, E)', IOSTAT=status) ival, ja(cnt), A(cnt)

            if ( ival /= prevIval ) then
                ia(ival) = cnt
                prevIval = ival
            end if
        end do

        ia(N + 1) = nnz + 1

    end subroutine GetMatrixFromFile


    subroutine SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
        integer, intent(in) :: N, numberToSolve
        integer, intent(in) :: ia(:), ja(:)
        double precision, intent(in) :: A(:)

        double precision, intent(inout) :: toSolve(:,:)

        type(MKL_PARDISO_HANDLE) :: pt(64)
        integer :: param(64), perm(N), error, i
        double precision:: tmpArray(N)

        ! Initialize Pardiso options
        CALL pardisoinit(pt, 2, param)
        param( 6) = 1 ! Solver stores the solution in the right-hand side b.
        param(27) = 1 ! Check input data

        ! call omp_set_num_threads(1) ! Uncommenting this line Works

        ! Solve
        call pardiso(pt, 1, 1, 2, 12, N, A, ia, ja, perm, 1, param, 1, toSolve(:, 1), tmpArray, error)

        !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i)
        DO i = 1, numberToSolve
            WRITE(*,*) (omp_get_thread_num() + 1), " : at critical"
            !$OMP CRITICAL (criticalPardisoSection2109)
            WRITE(*,*) (omp_get_thread_num() + 1), " : solving"

            ! Solve
            call pardiso(pt, 1, 1, 2, 33, N, A, ia, ja, perm, 1, param, 1, toSolve(:, i), tmpArray, error)

            WRITE(*,*) (omp_get_thread_num() + 1), " : complete"
            !$OMP END CRITICAL (criticalPardisoSection2109)
        END DO
        !$OMP END PARALLEL DO

    end subroutine SolveWithPardiso

    end program TestPardiso2

 

Attachment: Output.txt (8.32 MB)

Getting the content of *pt in Pardiso


Hi

Is there a way to see the content of the workspace for pardiso (mostly the void *PT[64] handle)? It seems that the permutation matrix of PAx = Py preserves the sparsity in LU or LL* quite well; in my app I need only L (for a Hermitian matrix). I would like to get L and P from pt.

Any pointer will be greatly appreciated!

Thanks

 

 

Inverse of very small matrix


Dear all, 

I have a piece of code in Fortran 90 in which I have to solve both a non-linear system (for which I have to invert the Jacobian matrix) and a linear system of equations. When I say very small, I mean n unknowns for both operations, with n <= 4. Unfortunately, n is not known a priori. What do you think is the fastest option? I thought of writing explicit formulas for the cases n = 1, 2 and using other methods for n = 3, 4 (e.g., some functions of the Intel MKL libraries), for the sake of performance. Is this sensible, or should I write explicit formulas for the inverse matrix also for n = 3, 4?

The code is going to be called very many times in a Finite Element Method analysis; for this reason I am looking for the fastest solution. I am also looking for a comparison chart of MKL matrix inversion routines for very small matrices versus explicit methods.
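For concreteness, the kind of explicit formula I mean for n = 2, sketched in C (the actual code is Fortran 90; solve2x2 is a made-up name):

    /* Solve the 2x2 system [a b; c d] x = r by Cramer's rule.
       Returns 0 on success, -1 when the determinant vanishes. */
    int solve2x2(double a, double b, double c, double d,
                 double r0, double r1, double x[2])
    {
        double det = a*d - b*c;
        if (det == 0.0)
            return -1;  /* singular matrix */
        x[0] = (r0*d - b*r1) / det;
        x[1] = (a*r1 - c*r0) / det;
        return 0;
    }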

Regards,

N

unexpected ?potrf subroutine failure


Hi,

I need to generate Gaussian correlated noise based on a covariance matrix in Fortran. I am using the spotrf subroutine to do the Cholesky decomposition. However, the decomposition always fails at a specific row of the input matrix. As the reference indicates, this means the leading minor of that order is not positive definite. But previous work with the same data using Matlab's chol function shows that the matrix is positive definite.

This problem is killing me. Can anyone help me get out of this swamp? Or is there simply a better way to generate correlated noise without getting involved with the covariance matrix? The code for generating the covariance matrix and the Cholesky decomposition is listed below. Thanks a lot!!

-------------------------------------------------------------------------------------------

Program DensityRealization
Implicit None
Real::lc
Integer::std,cases,Layer_Num,i,j,CholFlag
Real,Allocatable,Dimension(:,:)::c,sigma,rhostd
Real,Allocatable,Dimension(:)::z,Rnd
REAL,Parameter::p3=38.2273

Layer_Num=13827
std=20
cases=500
Allocate(z(Layer_Num),c(Layer_Num,Layer_Num),sigma(Layer_Num,Layer_Num),rhostd(Layer_Num,1),Rnd(cases*Layer_Num))

DO i=1,10000
   z(i)=(i-1)*0.01
END DO
DO i=10001,11799
   z(i)=100+(i-10000-1)*0.5
END DO
DO i=11800,Layer_Num
   z(i)=1000+(i-11799-1)
END DO

! covariance matrix
Do i=1,Layer_Num
  Do j=1,i
     c(i,j)=abs(z(i)-z(j))
     c(j,i)=c(i,j)
  End DO
End DO

! NOTE: lc (the correlation length) is used below but never assigned in this listing
c=-c/lc
c=exp(c)
rhostd(:,1)=std*exp(-z/p3)
sigma=matmul(rhostd,TRANSPOSE(rhostd))
sigma=sigma*c

! Cholesky decomposition
CALL SPOTRF('U',Layer_Num,sigma,Layer_Num,CholFlag)
print*,CholFlag
End
