Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library

Batched dgemm performance plateaus?


I have a problem where I need to compute many (1e4 to 1e6) small matrix-matrix and matrix-vector products (matrix dimensions roughly 15 to 35). This problem seems "embarrassingly parallel" to me, so I am confused by the following performance issue: on a Google Cloud compute server with 48 physical cores (96 logical cores), performance plateaus at 10-16 threads, and adding more threads does not reduce computation time. I have tried several different approaches: (1) cblas_dgemm_batch; (2) calling cblas_dgemm within a tbb::parallel_for, with both sequential and TBB-threaded MKL; (3) a JIT-compiled, problem-specific dgemm kernel (created with mkl_jit_create_dgemm) inside a parallel_for; (4) mkl_dgemm_compact (along with mkl_dgepack and mkl_dgeunpack).

All of these yield roughly comparable performance (except for the compact functions, where packing and unpacking time completely dominates the computation time), but none of them scales linearly with the number of threads I specify, as I would expect. The maximum performance I see is around 50 GFLOPS on a system capable of around 1-2 TFLOPS. (Indeed, multiplying two large matrices achieves performance in the teraflop range.) Is this the best I can expect? Why do I not see performance scaling linearly with thread count on this embarrassingly parallel problem?
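For reference, here is a stripped-down sketch of the batched variant I am timing (simplified, with illustrative sizes and dummy data; not the production code):

```c
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* One group of `count` identical 32x32 problems; sizes are illustrative. */
    const MKL_INT n = 32, count = 10000;
    const MKL_INT group_count = 1;
    MKL_INT group_size[1] = { count };

    CBLAS_TRANSPOSE transA[1] = { CblasNoTrans }, transB[1] = { CblasNoTrans };
    MKL_INT m_arr[1] = { n }, n_arr[1] = { n }, k_arr[1] = { n };
    MKL_INT lda[1] = { n }, ldb[1] = { n }, ldc[1] = { n };
    double alpha[1] = { 1.0 }, beta[1] = { 0.0 };

    /* Arrays of pointers to the individual (column-major) matrices. */
    const double **a = (const double **)mkl_malloc(count * sizeof(double *), 64);
    const double **b = (const double **)mkl_malloc(count * sizeof(double *), 64);
    double **c = (double **)mkl_malloc(count * sizeof(double *), 64);
    for (MKL_INT i = 0; i < count; ++i) {
        a[i] = (double *)mkl_calloc(n * n, sizeof(double), 64);
        b[i] = (double *)mkl_calloc(n * n, sizeof(double), 64);
        c[i] = (double *)mkl_calloc(n * n, sizeof(double), 64);
    }

    double t0 = dsecnd();
    cblas_dgemm_batch(CblasColMajor, transA, transB, m_arr, n_arr, k_arr,
                      alpha, a, lda, b, ldb, beta, c, ldc,
                      group_count, group_size);
    double t1 = dsecnd();
    printf("batched dgemm over %lld matrices: %.3f s\n", (long long)count, t1 - t0);

    for (MKL_INT i = 0; i < count; ++i) {
        mkl_free((void *)a[i]); mkl_free((void *)b[i]); mkl_free(c[i]);
    }
    mkl_free(a); mkl_free(b); mkl_free(c);
    return 0;
}
```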


Duplicated value in VSL error codes


mkl_vsl_defines.h (2019.1.144) contains two defines:

#define VSL_CC_ERROR_PRECISION (-2400)
#define VSL_CC_ERROR_METHOD (-2400)

with the same value. Is this an error?

Taking into account these defines:

#define VSL_CC_ERROR_TYPE (-2130)
#define VSL_CC_ERROR_EXTERNAL_PRECISION (-2141)
#define VSL_CC_ERROR_INTERNAL_PRECISION (-2142)

VSL_CC_ERROR_PRECISION should probably be equal to -2140.

cblas_dgemm crashing when thread is terminated


Hello,

I have a multithreaded code, and I am using the multithreaded MKL as well.

I have created a unit test that always crashes MKL.

Let me explain the test:

  • From the main thread (MT), I create another thread (T1) to multiply a 4096x4096 matrix.
  • T1 calls cblas_dgemm.
  • Since this is heavy processing, I let MT sleep for 1 second.
  • Then I forcibly terminate T1.
  • MKL has crashed in 100% of my attempts so far.

Would someone know how I can work around this, i.e. how to make cblas_dgemm safer against this kind of thread termination?
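Here is a stripped-down sketch of the test (pthreads used for illustration; the actual unit test is not written exactly like this, but the pattern is the same):

```c
#include <pthread.h>
#include <unistd.h>
#include "mkl.h"

#define N 4096

/* Worker thread T1: one large dgemm that takes a while to finish. */
static void *worker(void *arg)
{
    (void)arg;
    /* Allow immediate (asynchronous) cancellation, mirroring a forcible kill. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    double *a = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    double *b = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    double *c = (double *)mkl_malloc((size_t)N * N * sizeof(double), 64);
    for (size_t i = 0; i < (size_t)N * N; ++i) { a[i] = 1.0; b[i] = 1.0; }

    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, a, N, b, N, 0.0, c, N);

    mkl_free(a); mkl_free(b); mkl_free(c);
    return NULL;
}

int main(void)
{
    pthread_t t1;
    pthread_create(&t1, NULL, worker, NULL); /* MT spawns T1            */
    sleep(1);                                /* let the dgemm get going */
    pthread_cancel(t1);                      /* terminate T1 mid-dgemm  */
    pthread_join(t1, NULL);                  /* crash observed here     */
    return 0;
}
```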

Thanks.

Documentation - access denied

Problems with mkl_cluster_sparse_solver


Dear all,

Unfortunately, I again have some trouble with mkl_cluster_sparse_solver, as in my previous topic. I have taken one of the examples Intel provides in the MKL examples directory and modified it in two ways: on the one hand, the code can now read an arbitrary matrix stored in the file fort.110; on the other hand, I perform a loop over the solver routines, since I want to change the matrix between cycles later on. The first problem arises when treating large system sizes.

For this case you can find the matrix in fort1.zip. The program aborts with a segmentation fault after reaching 18%: forrtl: severe (174): SIGSEGV, segmentation fault occurred. This is hard to track down, but the issue must be inside the subroutine, since the subroutine does start. As I said, this happens for large matrices, and unfortunately I do not know how to get rid of the problem.

The next problem occurs for small matrices, as found in fort.zip. The problem seems to be the loop: in the first cycle everything works fine, but the second cycle aborts with an error message I have already seen in one of my previous topics:

Fatal error in PMPI_Reduce: Message truncated, error stack:
PMPI_Reduce(2334).................: MPI_Reduce(sbuf=0x7d7d7f8, rbuf=0x7f0b900, count=22912, MPI_DOUBLE, MPI_SUM, root=0, comm=0x84000004) failed
MPIR_Reduce_impl(1439)............: fail failed
I_MPIR_Reduce_intra(1533).........: Failure during collective
MPIR_Reduce_intra(1201)...........: fail failed
MPIR_Reduce_Shum_ring(833)........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 1 and tag 11 truncated; 14000 bytes received but buffer size is 1296

I have tried what I did the last time, namely providing all parameters (nrhs, msglvl, iparm, ...) on all ranks again, but it does not seem to fix the issue.

This is the program code (cl_solver_f90.f90):

program cluster_sparse_solver
use mkl_cluster_sparse_solver
implicit none
include 'mpif.h'
integer, parameter :: dp = kind(1.0D0)
!.. Internal solver memory pointer for 64-bit architectures
TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE)  :: pt(64)

integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, ik, l1, k1, idum(1), DimensionL, Nsparse
integer*4 mpi_stat, rank, num_procs
double precision :: ddum(1)

integer, allocatable :: IA( : ),  JA( : ), iparm( : )
double precision, allocatable :: VAL( : ), rhodot( : ), rho( : )

integer(4) MKL_COMM


MKL_COMM=MPI_COMM_WORLD
call mpi_init(mpi_stat)
call mpi_comm_rank(MKL_COMM, rank, mpi_stat)


do l1 = 1, 64
  pt(l1)%dummy = 0
end do

 error       = 0   ! initialize error flag
 msglvl      = 1   ! print statistical information
 mtype       = 11  ! real, non-symmetric
 nrhs        = 1
 maxfct      = 1
 mnum        = 1

allocate(iparm(64))
 
do l1 = 1, 64
 iparm(l1) = 0
end do

!Setup PARDISO control parameter
 iparm(1)  = 1   ! do not use default values
 iparm(2)  = 3   ! fill-in reordering from METIS
 iparm(8)  = 100 ! Max. number of iterative refinement steps on entry
 iparm(10) = 13  ! perturb the pivot elements with 1E-13
 iparm(11) = 1   ! use nonsymmetric permutation and scaling MPS
 iparm(13) = 1   ! Improved accuracy using nonsymmetric weighted matching
 iparm(27) = 1   ! checks whether column indices are sorted in increasing order within each row

read(110,*) DimensionL, Nsparse

allocate(VAL(Nsparse),JA(Nsparse),IA(DimensionL+1)) ! IA is the CSR row-pointer array and needs DimensionL+1 entries

if (rank.eq.0) then
do k1=1,Nsparse
read(110,*) VAL(k1)
end do
do k1=1,DimensionL+1
read(110,*) IA(k1)
end do
do k1=1,Nsparse
read(110,*) JA(k1)
end do
end if

allocate(rhodot(DimensionL), rho(DimensionL))

if (rank.eq.0) then
rhodot=0.0d0
rhodot(1) = 1.0d0
rho=0.0d0
end if

if (rank.eq.0) write(*,*) 'INIT PARDISO'

ik = 0
Pardisoloop: do

ik = ik + 1

phase = 12
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

phase = 33
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error

if (ik.ge.4) exit Pardisoloop

end do Pardisoloop


call MPI_BARRIER(MKL_COMM,mpi_stat)

phase = -1
call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error )
if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error


call mpi_finalize(mpi_stat)

end

I compile with

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90
mpiifort -i8 -I${MKLROOT}/include -c -o cl_solver_f90.o cl_solver_f90.f90
mpiifort mkl_cluster_sparse_solver.o cl_solver_f90.o -o MPI.out  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run the program with mpiexec -n 2 ./MPI.out. Our cluster has 16 cores per node and I request two nodes. RAM should not be the problem (64 GB), since the same case runs perfectly with the normal PARDISO on just one node. I set export MKL_NUM_THREADS=16. Am I right that the slave MPI process should automatically obtain parts of the LL^T factors, or do I have to use the distributed version in order for that to happen? The reason I ask is that I cannot observe any process running on the second node.

The versions are: MKL 2017.4.256, ifort 17.0.6.256, Intel MPI 2017.4.239; my colleague can also reproduce the issue on other versions and clusters.

Thanks in advance,

Horst

Attachments: fort1.zip (52.63 MB), fort.zip (356.13 KB)

Questions about sgels example


I have two questions about the sgels example:

(1) Why does the example need to call sgels twice?

    /* First call: workspace query. With lwork = -1, sgels does no
       computation and only returns the optimal workspace size in wkopt. */
    lwork = -1;
    sgels( "No transpose", &m, &n, &nrhs, a, &lda, b, &ldb, &wkopt, &lwork, &info );
    lwork = (MKL_INT)wkopt;
    work = (float*)malloc( lwork*sizeof(float) );
    /* Second call: actually solve the equations A*X = B using that workspace */
    sgels( "No transpose", &m, &n, &nrhs, a, &lda, b, &ldb, work, &lwork, &info );

(2) Where is it decided whether the matrices are row-major or column-major?
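My current understanding (please correct me if it is wrong) is that the plain sgels call above always treats the arrays as column-major; the row-major/column-major choice only exists in the C LAPACKE wrapper, which takes an explicit matrix_layout argument and also handles the workspace query internally. A minimal sketch with illustrative data:

```c
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Overdetermined 4x2 system stored row-major; values are illustrative. */
    lapack_int m = 4, n = 2, nrhs = 1;
    float a[] = { 1.0f, 1.0f,
                  1.0f, 2.0f,
                  1.0f, 3.0f,
                  1.0f, 4.0f };
    float b[] = { 6.0f, 5.0f, 7.0f, 10.0f };

    /* Row-major: lda = n and ldb = nrhs. The two-call workspace query is
       done internally by the LAPACKE wrapper, so one call is enough here. */
    lapack_int info = LAPACKE_sgels(LAPACK_ROW_MAJOR, 'N', m, n, nrhs,
                                    a, n, b, 1);

    printf("info = %d, solution = (%f, %f)\n", (int)info, b[0], b[1]);
    return 0;
}
```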

Thank you very much!

Antivirus reports trojan in MKL2019-4


When installing MKL for Windows (w_mkl_2019.4.245), my antivirus reports the following:

temcat.tcat is infected with Gen:Trojan.Heur.LP.rS8@ayyXnNek

This happens when the installer is almost finished, during the Visual Studio integration step, but I am not sure that step is related.

 

On a side note, a lot of the download links on the performance libraries page for Windows result in 404s (May 26th).

Intel® MKL version 2019 Update 4 is now available


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance.

Intel MKL 2019 Update 4 packages are now ready for download.

Intel MKL is available as part of the Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

Please see What's new in Intel MKL 2019 and in MKL 2019 Update 4 at this link: https://software.intel.com/en-us/articles/intel-math-kernel-library-rele...

Here is the link to the MKL 2019 bug fix list: https://software.intel.com/en-us/articles/intel-math-kernel-library-2019...


Intel MKL makes R lm() fail if the number of observations is high enough


I have installed Intel MKL on my Kubuntu 19.04. I have R 3.6.0.

Using Intel MKL, R's linear regression fails if the number of samples is fairly high (20k). Here is the code:

```
rm(list=ls())
N = 20000
xvar <- runif(N, -10, 10) 
e <- rnorm(N, mean=0, sd=1)
yvar <- 1 + 2*xvar + e
plot(xvar,yvar)
lmMod <- lm(yvar~xvar)
print(summary(lmMod))
```

The estimated coefficients are just random numbers and are not significant, and the R-squared is low. For lower N (like 2000) it works.

Simply uninstalling Intel MKL, and thus falling back to OpenBLAS, solved the problem completely.


Download link leads to error 404


Hi everyone,

I tried to register and download the MKL library from the links below; however, both links lead to a 404 (page not found) error.

Thanks in advance for any help.

 

Visual Studio Integration 2017 with 2015 toolset


I have a C++ project in Visual Studio 2017 which is using the VS2015 toolset.

I selected the VS2017 integration when I installed the MKL 2019.4.245, but I only see the integration options when the project is set to use the 2017 toolset.

 

To get the integration options in VS2017 for the 2015 toolset, I had to uninstall MKL, install VS2015 (which took an hour), and reinstall MKL with both the 2015 and 2017 integration options. I now see the MKL integration in VS2017 when using the 2015 toolset.

 

Can you fix the MKL installer so that the integration works with the 2015 toolset when that toolset is installed as part of Visual Studio 2017?


Problem when solving large system using Scalapack PDGESV


A parallel Fortran code that solves a set of linear simultaneous equations Ax = b using the ScaLAPACK routine PDGESV fails (exiting with a segmentation fault) when the number of equations, N, becomes large. I have not identified the exact value of N at which problems arise, but, for example, the code works for all the values I have tested up to N = 50000, yet fails at N = 94423.

In particular, the failure appears to occur during the call to the ScaLAPACK routine itself (i.e. not when allocating or deallocating memory); the code enters routine PDGESV but never leaves it.

I have prepared a simple small Fortran example code (see attachment below) that exhibits this problem. The code simply 1) allocates space for the matrix A and the vector b, 2) fills them with random entries, 3) calls PDGESV, and 4) deallocates the memory. It has been tested with a variety of matrix sizes (N x N) and various BLACS process grids without any errors until N becomes large.

The problem does not seem to be a lack of memory: on the machine where I execute the code, 192 GB is available, whereas the code only uses about 65 GB when N = 94423. I have tried the 'ulimit -s unlimited' command, but this did not resolve the problem. My feeling is that I am instead exceeding some default limit on the memory available to a single MPI process, i.e. perhaps I am simply missing some appropriate flags at compile or run time?

I am running the program on a Linux cluster using Red Hat Enterprise Linux Server release 7.3 (Maipo).

I compiled the following code with:

mpiifort -mcmodel=medium    -m64  -mkl=cluster  -o para.exe  solve_by_lu_parallelmpi_simple_light2.for

 

and run it using (for example when N= 9445)

mpiexec.hydra  -n 4 ./para.exe  9445 2 2 32

The command-line arguments here select N = 9445 and a 2x2 BLACS process grid with block size 32.

For this smaller matrix size the program runs without any problems, producing the output

WE ARE SOLVING A SYSTEM OF         9445  LINEAR EQUATIONS
 PROC:            0           0 HAS  MLOC, NLOC =        4736        4736
 PROC:            0           0  ALLOCATING SPACE ...
 PROC:            0           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1 HAS  MLOC, NLOC =        4736        4709
 PROC:            0           1  ALLOCATING SPACE ...
 PROC:            0           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           0 HAS  MLOC, NLOC =        4709        4736
 PROC:            1           0  ALLOCATING SPACE ...
 PROC:            1           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1 HAS  MLOC, NLOC =        4709        4709
 PROC:            1           1  ALLOCATING SPACE ...
 PROC:            1           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 
 INFO code returned by PDGESV =            0

So far so good. But when I try to solve the larger system using

mpiexec.hydra -n $NUM_PROCS ./para.exe  94423 2 2 32

the program crashes during the call to PDGESV with the output

WE ARE SOLVING A SYSTEM OF        94423  LINEAR EQUATIONS
 PROC:            0           0 HAS  MLOC, NLOC =       47223       47223
 PROC:            0           0  ALLOCATING SPACE ...
 PROC:            0           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1 HAS  MLOC, NLOC =       47223       47200
 PROC:            0           1  ALLOCATING SPACE ...
 PROC:            0           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           0 HAS  MLOC, NLOC =       47200       47223
 PROC:            1           0  ALLOCATING SPACE ...
 PROC:            1           1 HAS  MLOC, NLOC =       47200       47200
 PROC:            1           1  ALLOCATING SPACE ...
 PROC:            1           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..

forrtl: 致命的なエラー (154): 配列インデックスが境界外です。
Image              PC                Routine            Line        Source             
libifcore.so.5     00002B0D716C19AF  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B0D712335D0  Unknown               Unknown  Unknown
libmkl_avx512.so   00002B11A45E5A47  mkl_blas_avx512_x     Unknown  Unknown
libmkl_intel_lp64  00002B0D68E8BB55  dger_                 Unknown  Unknown
libmkl_scalapack_  00002B0D69F972AE  pdger_                Unknown  Unknown
libmkl_scalapack_  00002B0D69E53541  pdgetf3_              Unknown  Unknown
libmkl_scalapack_  00002B0D69E53688  pdgetf3_              Unknown  Unknown
libmkl_scalapack_  00002B0D69C2E13B  pdgetf2_              Unknown  Unknown
libmkl_scalapack_  00002B0D69C2E836  pdgetrf2_             Unknown  Unknown
libmkl_scalapack_  00002B0D6A014F6E  pdgetrf_              Unknown  Unknown
libmkl_scalapack_  00002B0D69C29C7D  pdgesv_               Unknown  Unknown
para.exe           0000000000401F8C  Unknown               Unknown  Unknown
para.exe           00000000004011BE  Unknown               Unknown  Unknown
libc-2.17.so       00002B0D73DFC3D5  __libc_start_main     Unknown  Unknown
para.exe           00000000004010C9  Unknown               Unknown  Unknown

the first error line beginning forrtl: can be translated as

forrtl: Fatal error (154): Array index out of bounds.

The problem seems to be occurring somewhere in the ScaLAPACK routines.
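For scale, here is a small standalone sketch (not part of the actual code) that reproduces the MLOC/NLOC values printed above using the standard block-cyclic distribution formula; note that the largest local block at N = 94423 holds 47223 x 47223 = 2,230,011,729 elements, which already exceeds the range of a signed 32-bit integer:

```c
#include <stdio.h>
#include <stdint.h>

/* Standard ScaLAPACK block-cyclic local-size formula (equivalent to NUMROC).
   All names here are illustrative, not taken from the original program. */
static long long numroc_ll(long long n, long long nb, long long iproc,
                           long long isrcproc, long long nprocs)
{
    long long mydist  = (nprocs + iproc - isrcproc) % nprocs;
    long long nblocks = n / nb;                  /* number of full blocks     */
    long long nrows   = (nblocks / nprocs) * nb; /* evenly distributed blocks */
    long long extra   = nblocks % nprocs;        /* leftover full blocks      */
    if (mydist < extra)
        nrows += nb;                             /* one extra full block      */
    else if (mydist == extra)
        nrows += n % nb;                         /* the final partial block   */
    return nrows;
}

int main(void)
{
    const long long N = 94423, NB = 32, NPROW = 2, NPCOL = 2;
    for (long long pr = 0; pr < NPROW; ++pr)
        for (long long pc = 0; pc < NPCOL; ++pc) {
            long long mloc = numroc_ll(N, NB, pr, 0, NPROW);
            long long nloc = numroc_ll(N, NB, pc, 0, NPCOL);
            printf("PROC (%lld,%lld): MLOC=%lld NLOC=%lld, local elements=%lld"
                   " (INT32_MAX=%d)\n",
                   pr, pc, mloc, nloc, mloc * nloc, INT32_MAX);
        }
    return 0;
}
```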

Does anyone have any recommendations or possible solutions?

 Any advice or pointers will be gratefully received,

     Many thanks,

             Dan.


Packed versus compact versus normal routines versus jit


Hi

I need to do tons of products of the form

C = A*B^T

where all matrices are n x n. n is typically 256 but could be smaller or bigger. Both A and B are used multiple times. Moreover, C is used in later multiplications, i.e. C replaces A or B.

NOTE I am only interested in the sequential case. I do not want MKL to parallelize anything.

It seems the matrices are too large for the compact type; in any case, the compact routines seem to be intended for many matrices at once, i.e. batched-like usage.

With the packed type, C is not produced in packed form, so I would have to pack it myself.

There are also the mkl_jit_create* routines.

Now my question is: which of the possible matrix multiplication methods should I go for?

PS: An interesting alternative is BLASFEO (https://github.com/giaf/blasfeo), which you of course cannot comment on, but it gives an idea of my use case.
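For concreteness, here is the kind of JIT-based loop I have in mind (only a sketch with dummy data; for n = 256 MKL may decline to generate a specialized kernel and return MKL_NO_JIT, in which case the returned function pointer simply falls back to the standard dgemm path):

```c
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 256;
    double *A = (double *)mkl_malloc(n * n * sizeof(double), 64);
    double *B = (double *)mkl_malloc(n * n * sizeof(double), 64);
    double *C = (double *)mkl_malloc(n * n * sizeof(double), 64);
    for (MKL_INT i = 0; i < n * n; ++i) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    /* Create a kernel specialized for C = 1.0*A*B^T + 0.0*C, column-major. */
    void *jitter = NULL;
    mkl_jit_status_t status = mkl_jit_create_dgemm(&jitter, MKL_COL_MAJOR,
                                                   MKL_NOTRANS, MKL_TRANS,
                                                   n, n, n, 1.0, n, n, 0.0, n);
    if (status == MKL_JIT_ERROR) {
        fprintf(stderr, "jitter creation failed\n");
        return 1;
    }
    /* MKL_NO_JIT means no specialized kernel was generated; the returned
       function pointer still works and calls the standard dgemm path. */
    dgemm_jit_kernel_t kernel = mkl_jit_get_dgemm_ptr(jitter);

    for (int rep = 0; rep < 1000; ++rep)  /* reuse the same kernel many times */
        kernel(jitter, A, B, C);

    printf("C[0] = %f\n", C[0]);
    mkl_jit_destroy(jitter);
    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}
```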


Is PARDISO for cluster available in composer edition?


Can I use the "Parallel Direct Sparse Solver for Clusters" in "Intel Parallel Studio XE Composer Edition"?

If not, in which edition is it available?

2017 version of MKL?


Is MKL 2017 still available?

I have a Phi 3120A and would like to try the automatic offload feature, which was removed in the MKL 2018 release.


dfeast_scsrgv and zfeast_hcsrgv Segmentation fault Error


Hello,

I am trying to solve a generalized eigenvalue problem Ax = λBx by means of dfeast_scsrgv, but I get a SIGSEGV error. I have tested the individual matrices A and B with dfeast_scsrev and everything works fine, so I do not think it is a problem with the data representation.

Attached is a code example for the problem described.

 

PS: The Hermitian version zfeast_hcsrgv has the same issue. Sorry for my English.
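For reference, a reduced toy version of the kind of call I am making: a 2x2 diagonal generalized problem instead of my real matrices (those are in the attached eigenvalues.cpp), with one-based CSR indexing, which is what these routines expect as far as I understand:

```c
#include <stdio.h>
#include "mkl.h"
#include "mkl_solvers_ee.h"

/* Toy generalized problem A x = lambda B x with A = diag(1, 2), B = I (2x2),
   stored in one-based 3-array CSR. Values chosen for illustration only. */
int main(void)
{
    MKL_INT n = 2;
    double  a[]  = { 1.0, 2.0 };
    MKL_INT ia[] = { 1, 2, 3 };
    MKL_INT ja[] = { 1, 2 };
    double  b[]  = { 1.0, 1.0 };
    MKL_INT ib[] = { 1, 2, 3 };
    MKL_INT jb[] = { 1, 2 };

    MKL_INT fpm[128];
    feastinit(fpm);          /* default FEAST parameters */
    fpm[0] = 1;              /* print runtime status     */

    double  emin = 0.0, emax = 3.0;   /* search interval containing 1 and 2    */
    MKL_INT m0 = 2;                   /* upper bound on number of eigenvalues  */
    double  e[2], x[4], res[2];       /* x has room for n*m0 entries           */
    double  epsout;
    MKL_INT loop, m, info;

    dfeast_scsrgv("U", &n, a, ia, ja, b, ib, jb, fpm, &epsout, &loop,
                  &emin, &emax, &m0, e, x, &m, res, &info);

    printf("info = %d, eigenvalues found = %d\n", (int)info, (int)m);
    for (MKL_INT i = 0; i < m; ++i)
        printf("lambda[%d] = %f\n", (int)i, e[i]);
    return 0;
}
```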

 

Attachment: eigenvalues.cpp (6.08 KB)

Is MKL 2017 available for free?

ScaLAPACK crash using different block sizes


Hi,

I would like to use the pzgesv routine to solve a system of linear equations, but it crashes if I use block size 64 with, e.g., two processes.

It does not crash if I run the program with mpirun -np 1 ./myapp, or if the block size is 4 with any number of processes.

Number of rows = 116, nrhs = 4; OpenMPI, mpicxx -v: gcc version 7.4.0.

Here is the backtrace from gdb:

process 1

Thread 1 "myapp" received signal SIGSEGV, Segmentation fault.
0x00007ffff67483b2 in PMPI_Comm_size ()
   from /usr/lib/x86_64-linux-gnu/libmpi.so.20
(gdb) bt 
#0  0x00007ffff67483b2 in PMPI_Comm_size ()
   from /usr/lib/x86_64-linux-gnu/libmpi.so.20
#1  0x000055555897967a in MKLMPI_Comm_size ()
#2  0x000055555568884c in PB_CpgemmMPI ()
#3  0x000055555564e916 in pzgemm_ ()
#4  0x00005555556355b0 in pzgetrf2_ ()
#5  0x0000555555634aaf in pzgetrf_ ()
#6  0x000055555562c60d in pzgesv_ ()
#7  0x00005555555ed944 in main (argc=1, argv=0x7fffffffd6e8)
    at Main.cpp:159

process 0

Thread 1 "myapp" received signal SIGSEGV, Segmentation fault.
0x00007ffff67483b2 in PMPI_Comm_size ()
   from /usr/lib/x86_64-linux-gnu/libmpi.so.20
(gdb) bt
#0  0x00007ffff67483b2 in PMPI_Comm_size ()
   from /usr/lib/x86_64-linux-gnu/libmpi.so.20
#1  0x000055555897967a in MKLMPI_Comm_size ()
#2  0x000055555568884c in PB_CpgemmMPI ()
#3  0x000055555564e916 in pzgemm_ ()
#4  0x00005555556355b0 in pzgetrf2_ ()
#5  0x0000555555634aaf in pzgetrf_ ()
#6  0x000055555562c60d in pzgesv_ ()
#7  0x00005555555ed944 in main (argc=1, argv=0x7fffffffd6e8)
    at Main.cpp:159

The descriptors and data look OK.

Any idea what is going on?

Regards,

sk

SVD weird performance issues


Hi,

I am facing performance issues with the function dgesvd when running in 64-bit with AVX2 (MKL_CBWR=AVX2).

For some matrix sizes, the SVD takes 25 times longer in 64-bit than in 32-bit!

You can reproduce this with the attached test. On my side I get these durations for one SVD of an m x n matrix:

  • 101x63 : 32bit = 2ms, 64bit = 1.4ms; 
  • 101x64 : 32bit = 2ms, 64bit = 20ms;
  • 102x64 : 32bit = 2ms, 64bit = 1.4ms;
  • 103x103 : 32bit = 4ms, 64bit = 100ms;

There is no problem with MKL_CBWR=AVX.
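For completeness, a simplified standalone version of the measurement (the attached svdPerfTest.c is the authoritative test; the sizes follow the table above):

```c
#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

/* Times one dgesvd call on an m x n matrix; sizes follow the table above. */
static double time_svd(int m, int n)
{
    double *a  = (double *)mkl_malloc((size_t)m * n * sizeof(double), 64);
    double *s  = (double *)mkl_malloc((size_t)n * sizeof(double), 64);
    double *u  = (double *)mkl_malloc((size_t)m * m * sizeof(double), 64);
    double *vt = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *superb = (double *)mkl_malloc((size_t)n * sizeof(double), 64);
    for (int i = 0; i < m * n; ++i) a[i] = (double)rand() / RAND_MAX;

    double t0 = dsecnd();
    LAPACKE_dgesvd(LAPACK_COL_MAJOR, 'A', 'A', m, n, a, m, s, u, m, vt, n, superb);
    double t1 = dsecnd();

    mkl_free(a); mkl_free(s); mkl_free(u); mkl_free(vt); mkl_free(superb);
    return (t1 - t0) * 1e3; /* milliseconds */
}

int main(void)
{
    int sizes[][2] = { {101, 63}, {101, 64}, {102, 64}, {103, 103} };
    for (int i = 0; i < 4; ++i)
        printf("%dx%d : %.2f ms\n", sizes[i][0], sizes[i][1],
               time_svd(sizes[i][0], sizes[i][1]));
    return 0;
}
```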

Could you please have a look?

My configuration:

  • Composer 2019 update 4 (same behaviour with 2018 up4)
  • BasePlatformToolSet : vc12
  • Win 10 Enterprise 64bit
  • CPU: i7-6820HQ

Regards,

Guillaume A.

 

Attachment: svdPerfTest.c (3.05 KB)

Understanding arguments of mkl_sparse_?_qr


Hello there

So I am trying to solve a sparse linear least-squares problem min ||Ax - b||, where the matrix A is sparse.

MKL 2019 introduced a sparse QR solver, with the documentation available at

https://software.intel.com/en-us/mkl-developer-reference-c-mkl-sparse-qr

I cannot get this function to work, and my guess is that I have not fully understood the parameters, especially ldx and ldb, since when I call the function either nothing happens or the program crashes!

Specifically, assuming:

  • Matrix A is m x n and specified in CSR format (since only CSR is supported at the moment)
  • b is an aligned array (so only 1 column for b) with length m
  • The solution array x is aligned (allocated) and has length n

I call (apologies for pseudo code!)

 

success_solve = mkl_sparse_s_qr(
    operation = SPARSE_OPERATION_NON_TRANSPOSE,
    A         = CSR description,
    descr     = SPARSE_MATRIX_TYPE_GENERAL,
    layout    = SPARSE_LAYOUT_ROW_MAJOR,
    columns   = 1,
    x,
    ldx = 1,    // I have tried both 0 and 1 and failed with both
    b,
    ldb = 1 );  // again tried both 0 and 1 and failed with both
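For comparison, here is what I believe a complete call sequence should look like in real code. This is only my reading of the reference page linked above: it builds the handle with mkl_sparse_s_create_csr and uses a column-major dense layout with ldx/ldb set to the row counts of x and b, which is an assumption on my part rather than a confirmed fix:

```c
#include <stdio.h>
#include "mkl.h"
#include "mkl_spblas.h"

/* Toy 3x2 overdetermined system; values are placeholders for illustration. */
int main(void)
{
    MKL_INT m = 3, n = 2;
    /* A = [1 0; 0 1; 1 1] in 4-array CSR (zero-based indexing) */
    MKL_INT rows_start[] = { 0, 1, 2 };
    MKL_INT rows_end[]   = { 1, 2, 4 };
    MKL_INT col_indx[]   = { 0, 1, 0, 1 };
    float   values[]     = { 1.0f, 1.0f, 1.0f, 1.0f };

    sparse_matrix_t A = NULL;
    mkl_sparse_s_create_csr(&A, SPARSE_INDEX_BASE_ZERO, m, n,
                            rows_start, rows_end, col_indx, values);

    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;

    float b[] = { 1.0f, 2.0f, 3.0f };  /* right-hand side, length m */
    float x[2];                        /* solution, length n        */

    /* One-shot QR solve of min ||Ax - b||; with column-major layout the
       leading dimensions are the row counts of x and b respectively. */
    sparse_status_t st = mkl_sparse_s_qr(SPARSE_OPERATION_NON_TRANSPOSE, A, descr,
                                         SPARSE_LAYOUT_COLUMN_MAJOR, 1,
                                         x, n, b, m);
    printf("status = %d, x = [%f, %f]\n", (int)st, x[0], x[1]);

    mkl_sparse_destroy(A);
    return 0;
}
```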

 

Appreciate any help && cheers
