Channel: Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Viewing all 2652 articles
Browse latest View live

Computing the Schur-complement with MKL_PARDISO


Hello,

We are currently integrating mkl_pardiso into our software and have some questions about mkl_pardiso and the computation of the Schur complement.
Given a real symmetric matrix, we want to compute its Schur complement of a given size and afterwards use the partial factorization from that computation to solve part of the original linear system. We use sparse matrices, and therefore iparm[35] = -2 or -1.

First, we modified the supplied pardiso_schur_c.c example, using the pseudocode found here as a guide. Our code is attached; the compile command is in a comment near the top of the C file.

First, with the following settings

    int iparm35 = -1;
    
    iparm[1-1] = 1;         /* No solver default */
    iparm[2-1] = 2;         /* Fill-in reordering from METIS */
    iparm[5-1] = 0;         /* No user-supplied fill-in reducing permutation */
    iparm[10-1] = 8;        /* Perturb the pivot elements with 1E-8 */
    iparm[11-1] = 0;        /* Scaling disabled (1 = nonsymmetric permutation and scaling MPS) */
    iparm[13-1] = 0;        /* Maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy */
    iparm[14-1] = 0;        /* Output: Number of perturbed pivots */
    iparm[18-1] = -1;       /* Output: Number of nonzeros in the factor LU */
    iparm[19-1] = -1;       /* Output: Mflops for LU factorization */
    iparm[24 - 1] = 10;     /* Improved two-level factorization algorithm */
    iparm[31 - 1] = 0;      /* Partial solve disabled */
    iparm[36 - 1] = iparm35;        /* Use Schur complement */

we ran into a segmentation fault in the pardiso_export function. We do not know why this happens, but setting iparm[23] = 1 instead of 10 resolved the issue. The documentation on iparm[23] only says that it cannot be 0 when iparm[35] is set to -1 or -2. Did we do something wrong here?
We also found that with iparm[35] = -2 and the above settings (i.e. with iparm[23] = 10), iparm[35] does not return the number of nonzeros as output but still contains -2 (even though error == 0).

After working around the above issue by setting iparm[23] = 1, we were able to compute the Schur complement and retrieve it in sparse format. The result was correct, but it surprised us that PARDISO returned the Schur complement with zero-based indexing even though all our input matrices use one-based indexing. Is there a parameter to control this?
We were also unsure how to use the perm parameter. If we set perm to something like
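Independent of whether PARDISO offers a switch for this, converting exported arrays back to one-based indexing is cheap. A minimal plain-C sketch, assuming the Schur complement is exported as CSR arrays (row pointers ia, column indices ja) with int indices, which matches MKL_INT only in an LP64 build; csr_zero_to_one_based is a hypothetical helper, not an MKL function.

```c
#include <assert.h>

/* Shift CSR arrays from zero-based to one-based indexing in place.
   ia has n + 1 entries (row pointers), ja has ia[n] - ia[0] entries
   (column indices). Returns 0 on success, -1 if ia[0] != 0, i.e. the
   arrays do not look zero-based (they are then left unchanged). */
static int csr_zero_to_one_based(int n, int *ia, int *ja)
{
    if (ia[0] != 0) return -1;
    int nnz = ia[n] - ia[0];          /* number of stored entries */
    for (int k = 0; k < nnz; ++k)
        ja[k] += 1;                   /* column indices */
    for (int i = 0; i <= n; ++i)
        ia[i] += 1;                   /* row pointers */
    return 0;
}
```

Calling it a second time on the same arrays returns -1, which guards against shifting twice.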

perm = {1, 0, 0, 0, 1} 

What exactly does "perm specifies elements for a Schur complement" in the documentation mean? Will PARDISO perform a Schur complement computation equivalent to one where we specify perm as

perm = {0, 0, 0, 1, 1} 

but with rows 1 and 4 swapped? If so, does PARDISO always sort the selected rows stably, i.e. keep their original relative order?

Our last question is somewhat more general:

Given a linear system

[A11 A12] [x1]   [b1]
[A21 A22] [x2] = [b2]

we want to do two things:

a) compute the Schur complement S = A22 - A21 A11^-1 A12

b) solve the linear system A11 x1 = b1

We assumed that, as with PARDISO from the pardiso-project, we could do this by computing the Schur complement with iparm[35] = -2 (so that the factorization is kept for the solve phase) and then using that partial factorization with phase = 33 to compute x1.
From our tests, however, PARDISO solves the complete system for x1 and x2 instead. Is that the expected behaviour?
If so, is there an efficient way to solve only A11 x1 = b1?
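To make tasks a) and b) concrete, here is a small dense sketch in plain C. It is not PARDISO and says nothing about how PARDISO works internally or orders rows: the 0/1 mask plays the role of perm in the question, and keeping the selected indices in ascending order is an assumption of this illustration. The function names (solve_dense, split_by_mask, schur_complement) are hypothetical, and the elimination does no pivoting, so it is only meant for small, well-conditioned (e.g. symmetric positive definite) blocks.

```c
#include <assert.h>

#define MAXN 8  /* illustration only: small dense matrices */

/* Solve M x = b for a k-by-k row-major M by Gaussian elimination without
   pivoting (fine for small SPD blocks only). M is overwritten; on return
   b holds x. Returns 0 on success, -1 on a zero pivot. */
static int solve_dense(double *M, int k, double *b)
{
    for (int p = 0; p < k; ++p) {
        if (M[p * k + p] == 0.0) return -1;
        for (int r = p + 1; r < k; ++r) {
            double f = M[r * k + p] / M[p * k + p];
            for (int c = p; c < k; ++c) M[r * k + c] -= f * M[p * k + c];
            b[r] -= f * b[p];
        }
    }
    for (int p = k - 1; p >= 0; --p) {           /* back substitution */
        double s = b[p];
        for (int c = p + 1; c < k; ++c) s -= M[p * k + c] * b[c];
        b[p] = s / M[p * k + p];
    }
    return 0;
}

/* Split 0..n-1 by a 0/1 mask, keeping ascending order in each group
   (an assumption of this sketch, not documented PARDISO behaviour). */
static void split_by_mask(const int *mask, int n,
                          int *inner, int *ni, int *schur, int *ns)
{
    *ni = *ns = 0;
    for (int i = 0; i < n; ++i) {
        if (mask[i]) schur[(*ns)++] = i;
        else         inner[(*ni)++] = i;
    }
}

/* S = A22 - A21 * inv(A11) * A12 for a dense row-major n-by-n A, where
   block "2" is the set of indices with mask[i] == 1. S is ns-by-ns,
   row-major. Returns ns, or -1 on failure. */
static int schur_complement(const double *A, int n, const int *mask, double *S)
{
    int inner[MAXN], schur[MAXN], ni, ns;
    if (n > MAXN) return -1;
    split_by_mask(mask, n, inner, &ni, schur, &ns);
    for (int j = 0; j < ns; ++j) {
        double A11[MAXN * MAXN], x[MAXN];   /* x = column j of inv(A11) A12 */
        for (int r = 0; r < ni; ++r) {
            for (int c = 0; c < ni; ++c) A11[r * ni + c] = A[inner[r] * n + inner[c]];
            x[r] = A[inner[r] * n + schur[j]];
        }
        if (solve_dense(A11, ni, x) != 0) return -1;
        for (int i = 0; i < ns; ++i) {
            double s = A[schur[i] * n + schur[j]];
            for (int r = 0; r < ni; ++r) s -= A[schur[i] * n + inner[r]] * x[r];
            S[i * ns + j] = s;
        }
    }
    return ns;
}
```

For A = [4 1 2; 1 3 0; 2 0 5] with mask {0, 0, 1}, this gives S = 5 - 2 * (6/11) = 43/11, and solving A11 x1 = b1 with b1 = (5, 4) gives x1 = (1, 1). Computing S only requires solves with A11, which is why a factorization of A11 is a natural by-product of the Schur complement computation.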

 

With best regards,

Nils

Attachment: pardiso_schur_c.c (text/x-csrc, 9.88 KB)

Floating Point Exception in MKL FFT from 18.0.4 onwards.


The code below raises a floating-point overflow, with the following backtrace:

Program received signal SIGFPE, Arithmetic exception.
0x0000000000d4a1e6 in mkl_dft_avx2_coDFTColTwid_Compact_Fwd_v_10_s ()
(gdb) backtrace
#0  0x0000000000d4a1e6 in mkl_dft_avx2_coDFTColTwid_Compact_Fwd_v_10_s ()
#1  0x00000000005e6e0d in compute_colbatch_fwd ()
#2  0x00000000004057dc in MAIN__ ()

 

The same code runs fine with an earlier version of MKL (11.1.1) or if the CNR mode is set to SSE4_2.

This seems to be specific to the AVX2 code path.

 

program mkl_test

   USE MKL_DFTI

  include  'mkl.fi'

  integer, parameter :: len_i = 1025
  integer, parameter :: len_j = 1920
  complex :: values_in(len_i * len_j)
  complex :: values_out(len_i * len_j)
  real :: temp_r, temp_i
  integer :: ieee_flags
  character*16 :: out
  integer :: i, j, unit,  status
  integer stride_in(2)
  integer stride_out(2)

  type(dfti_descriptor), pointer :: My_Desc1_Handle
!---------------------------------------------------------------------------------------------------  
  values_out(:) = cmplx(0,0)

  print*, "Started and reading in data..."
  unit = 10
  open(unit, file='data2_CFFT.txt')
  do j=1, len_j
    do i=1, len_i
      read(unit, '(2f15.8)') temp_r, temp_i
      values_in((j-1) * len_i + i) = cmplx(temp_r,temp_i)
    enddo
  enddo
  close(unit)
  print*, "Done reading data"

!  status = mkl_cbwr_set(MKL_CBWR_SSE4_2)
!  if(status .ne. MKL_CBWR_SUCCESS ) then
!     print *, 'unable to set the mkl environment'

!  endif

  i = ieee_flags('set', 'exception', 'overflow', out)

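  ! DFTI strides for a 1D transform: (1) = offset of the first element,
  ! (2) = distance between consecutive elements of one transform
  stride_in(1)=0;
  stride_in(2)=1025;
  stride_out(1)=0;
  stride_out(2)=1025;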

  status = DftiCreateDescriptor(My_Desc1_Handle,DFTI_SINGLE,DFTI_COMPLEX,1,1920)
  status = DftiSetValue(My_Desc1_Handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
  status = DftiSetValue(My_Desc1_Handle, DFTI_NUMBER_OF_TRANSFORMS, 1025);
  status = DftiSetValue(My_Desc1_Handle, DFTI_INPUT_DISTANCE, 1);
  status = DftiSetValue(My_Desc1_Handle, DFTI_OUTPUT_DISTANCE, 1);
  status = DftiSetValue(My_Desc1_Handle, DFTI_INPUT_STRIDES, stride_in);
  status = DftiSetValue(My_Desc1_Handle, DFTI_OUTPUT_STRIDES, stride_out);
  status = DftiCommitDescriptor(My_Desc1_Handle);

  status = DftiComputeForward( My_Desc1_Handle, values_in, values_out )

  print*, "Finished successfully."

end program mkl_test
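To sanity-check the batch layout configured above (1025 transforms of length 1920, distance 1, stride 1025), the sketch below computes where element k of transform m lands using offset + m * distance + k * stride, where for a one-dimensional DFTI transform the strides array holds the offset followed by the stride. Positions are 0-based as in C (the Fortran array index is one larger). batched_index and layout_is_bijective are hypothetical helper names, not MKL functions.

```c
#include <assert.h>
#include <stdlib.h>

/* 0-based position of element k of transform m in a batch of 1D
   transforms: offset + m * distance + k * stride. */
static long batched_index(long offset, long distance, long stride,
                          long m, long k)
{
    return offset + m * distance + k * stride;
}

/* Returns 1 if howmany transforms of length n (offset 0) touch every
   index 0..total-1 exactly once, 0 otherwise. */
static int layout_is_bijective(long howmany, long n, long distance,
                               long stride, long total)
{
    unsigned char *seen = calloc((size_t)total, 1);
    int ok = seen != NULL;
    for (long m = 0; ok && m < howmany; ++m)
        for (long k = 0; ok && k < n; ++k) {
            long p = batched_index(0, distance, stride, m, k);
            if (p < 0 || p >= total || seen[p]) ok = 0;   /* out of range or overlap */
            else seen[p] = 1;
        }
    free(seen);
    return ok && howmany * n == total;
}
```

With offset 0, distance 1 and stride 1025, the last element of the last transform lands at 1024 + 1919 * 1025 = 1,967,999, so the batch covers exactly the 1025 * 1920 = 1,968,000 elements of values_in without overlap.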

Compile as follows:

$INTEL_HOME/ifort -I$MKL_HOME/include/ cpbtrs.f90 -Wl,--start-group -Wl,-Bstatic -L$MKL_HOME_LIB/lib -lmkl_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack95_lp64 -liomp5 -Wl,--end-group

The input file data2_CFFT.txt, which is read and passed to the FFT function, is attached. The input all looks normal.

MKL 2019 u2 seems to have the same issue.

I am using Debian 9 Linux on an Intel(R) Xeon(R) CPU E3-1240 v3. Could someone please take a look?

Attachment: data2_CFFT.zip (application/zip, 5.24 MB)

Installation fails


I'm on a GCP Ubuntu 14.04 VM and I can't get apt-get to install MKL. These are the steps I took, following the instructions here:

> sudo sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'

> sudo sh -c 'echo deb https://apt.repos.intel.com/mpi all main > /etc/apt/sources.list.d/intel-mpi.list'

> sudo apt-get update

Error message:

W: Conflicting distribution: https://apt.repos.intel.com binary/ InRelease (expected binary but got )
W: Duplicate sources.list entry https://apt.repos.intel.com/mkl/ all/main amd64 Packages (/var/lib/apt/lists/apt.repos.intel.com_mkl_dists_all_main_binary-amd64_Packages)
W: Duplicate sources.list entry https://apt.repos.intel.com/mpi/ all/main amd64 Packages (/var/lib/apt/lists/apt.repos.intel.com_mpi_dists_all_main_binary-amd64_Packages)

Does anyone have instructions for installing it properly?

Segmentation fault in PARDISO phase 33 when changing parameters in iparm


Hello,

We are currently integrating PARDISO into our software and have run into some behaviour that seems strange to us.

First, the setup:

Given a sparse matrix of type -2, we start with phase 12, i.e. reordering and symbolic factorization followed by numerical factorization. After that, we want to use phase 33 with a sparse right-hand side, and therefore set iparm[30] = 1.

The documentation says that when using iparm[23] = 1 (which is what we wanted to do), one should:

Disable iparm[10] (scaling) and iparm[12] = 1 (matching) when using the two-level factorization algorithm. Otherwise Intel MKL PARDISO uses the classic factorization algorithm.

First of all, we are not sure what this means: does it say to set both iparm[10] and iparm[12] to zero, or to set iparm[10] = 0 and iparm[12] = 1?
In any case, we set both to 1 and additionally set iparm[23] = 1: we had not read the hint carefully at the time, and we expected this to behave like iparm[23] = 0 anyway.
We set iparm[10] and iparm[12] in the first place because our matrices do come from an interior point method.

Now the weird part.
During phase 12 we did not set iparm[30], since pardisoinit does not set it and we did not pass a right-hand side at that point; also, the documentation says that iparm[30] "controls the solve step of Intel MKL PARDISO".

For phase 33 we then set iparm[30] = 1, and PARDISO crashed with a segfault.
We found two ways to avoid the segfault, neither of which makes sense to us:
a) setting iparm[30] = 1 for both phase 12 and phase 33 avoids the segfault
b) setting iparm[23] = 0 or iparm[23] = 10 during the solve phase also avoids it

Frankly, we are not sure what is going on here; we assume we have misunderstood something.

We also noticed that during phase 12 PARDISO sets the parameter iparm[33] to -1. iparm[33] is documented as an input parameter, so should PARDISO be changing it at all? Or does this indicate some kind of error?

 

We hope you can help us.

With best regards,

Nils

 

PS:
I attached the example we were using; it is a modified version of the pardiso_sym_c.c example. The compile command is in a comment at the top of the file. You can also find the complete iparm settings we used during phase 12 and the ones we used during phase 33 (iparm2).

Attachment: pardiso_sym_c.c (text/x-csrc, 79.6 KB)

Pardiso low-rank update question


Hi,

I recently realized this functionality is available with pardiso - great!

I am testing it now and I have some questions. I have read the instructions in https://software.intel.com/en-us/mkl-developer-reference-c-intel-mkl-par....

In particular, I am using matching as I am solving a highly indefinite symmetric system: iparm[12]=1. With iparm[12]=1 the instructions for iparm say A must be filled with relevant values during phase 11. This makes me wonder whether iparm[12]=1 is compatible with the low-rank update functionality (i.e. it suggests phase 11 must be run whenever A is updated as matching is enabled). 

I also note that the improved two-level factorization algorithm must be used with the low rank update (iparm[23]=10). But it is unclear to me whether matching works together with (improved) two-level factorization.

Could you please clarify?

Best,

Jens

Parallel pardiso solve step

Bug in dsyev row-major


Hello, 

When I compile and run the MKL example `lapacke_dsyev_row.c`, I get the following output:

LAPACKE_dsyev (row-major, high-level) Example Program Results

 Eigenvalues
 -11.07  -6.23   0.86   8.87  16.09

 Eigenvectors (stored columnwise)
  -0.30  -0.61   0.40  -0.37   0.49
   0.00  -0.29  -0.41  -0.36  -0.61
   0.00   0.00  -0.66   0.50   0.40
   0.00   0.00   0.00   0.62  -0.46
   0.00   0.00   0.00   0.00   0.16

The lower part of the eigenvector matrix is missing (all zeros).

This looks like a bug. Is this the right place to report it?

The column-major version seems to work correctly.

This happens with version "2019.3.199" on both Linux and Windows.
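For reference when reading such output: with LAPACK_ROW_MAJOR, element (i, j) of a matrix with leading dimension lda is a[i*lda + j]; with LAPACK_COL_MAJOR it is a[i + j*lda]. "Stored columnwise" means eigenvector j is column j in either layout. A small plain-C sketch of the two index maps (no MKL calls; the function names are hypothetical):

```c
#include <assert.h>

/* Element (i, j) of a matrix with leading dimension ld. */
static double get_row_major(const double *a, int ld, int i, int j) { return a[i * ld + j]; }
static double get_col_major(const double *a, int ld, int i, int j) { return a[i + j * ld]; }

/* Copy an m-by-n column-major matrix (leading dimension lds) into
   row-major storage (leading dimension ldd). */
static void col_to_row_major(int m, int n, const double *src, int lds,
                             double *dst, int ldd)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            dst[i * ldd + j] = src[i + j * lds];
}
```

For the 2-by-3 matrix [1 2 3; 4 5 6], column-major storage is {1, 4, 2, 5, 3, 6} and row-major storage is {1, 2, 3, 4, 5, 6}; a full eigenvector matrix should have nonzero entries in both triangles in either layout.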

Sincerely, 

Marc Lasson. 

 

running "cl_solver_unsym_c.c"


Hi. 

I need to run the code in examples/cluster_sparse_solverc/source of the Intel installation directory with different input data. 
Just as in the example, I am using CSR sparse format. Arrays a, ja and ia are read from files that I generated, and they are as follows:

a : double array of 52 nonzero elements. 

ja : 1     2     3    12     1     2     3     9     1     2     3     4     5     4     5  6    12     4     5     6     8     2     4     5     6    11     7     8     9     1     7     8     9     2     7     8     9    11    12     5     6    10    11    12     1   10    11    12     8    10    11    12

ja[i] = column index of a[i], counting from 1. 

ia :  1     5     9    14    18    22    27    30    34    40    45    49    53

The full matrix is 12x12, and ia[i] points to the first element of the i-th row. ia[12] = 53, which is the number of elements in a and ja plus 1.

When I execute it, I get the following message in phase 22:

 

*** Error in PARDISO  (incorrect input matrix  ) error_num= 21

*** Input check: i=12, ia(i)=49, ia(i+1)=53 are incompatible

ERROR during symbolic factorization: -1

 

I don't see where the mistake is. Why are these values of ia incompatible? Apart from reading these files, the only thing I changed was n, which is now 12.
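One way to rule out the arrays themselves is to run structural checks on them before calling the solver. A minimal plain-C sketch, assuming one-based indexing, a square n-by-n matrix, and column indices sorted in increasing order within each row (as the CSR3 format expects); check_csr_one_based is a hypothetical name, not an MKL routine.

```c
#include <assert.h>

/* Structural checks on a one-based CSR matrix with n rows and columns:
   ia[0] == 1, ia[n] - 1 == nnz, ia non-decreasing, and the column
   indices of each row in 1..n and strictly increasing.
   Returns 0 if all checks pass, -1 for a global inconsistency, or the
   one-based number of the first row that fails. */
static int check_csr_one_based(int n, int nnz, const int *ia, const int *ja)
{
    if (ia[0] != 1 || ia[n] - 1 != nnz) return -1;
    for (int i = 0; i < n; ++i)               /* check ia before indexing ja */
        if (ia[i + 1] < ia[i]) return i + 1;
    for (int i = 0; i < n; ++i) {
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            if (ja[k] < 1 || ja[k] > n) return i + 1;
            if (k > ia[i] - 1 && ja[k] <= ja[k - 1]) return i + 1;
        }
    }
    return 0;
}
```

Run on the ia and ja arrays exactly as posted (n = 12, nnz = 52), every check passes, so the arrays look structurally consistent; it may therefore be worth verifying how they are read from file and which n and array sizes actually reach the solver at run time.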
Thank you in advance.

 


eigen value and vector for N=200x200


How do I find the eigenvalues and eigenvectors of a 200 x 200 matrix in Fortran?

[HPCG] "QuickPath" option always selected


Hello everyone,

 

A brief summary of the issue:

  • The Intel MKL and Intel MPI libraries are installed on the cluster I am using (each compute node has 2 sockets with E5-2620 v4 CPUs);
  • The AVX2 pre-built binary of HPCG delivered with the Intel MKL (xhpcg_avx2) executes successfully, both on a single node and on multiple nodes;
  • I tried to set a target execution time with the --rt=180 command-line option, with the hpcg.dat configuration file (which is in the same directory as the binary), and with both at the same time. However, the target execution time seems to be simply ignored: the QuickPath option is always used (confirmed by the YAML output file);
  • However, setting the size of the local 3D compute grid works perfectly, thanks to both the command-line options and the configuration file.

 

I could not find any information related to such an issue (however, there is a closed issue in the GitHub repository of HPCG concerning the fact that if the --rt command-line option was not specified, the QuickPath option was enforced).

 

Any idea/workaround? If you need some more information, or if you want me to test something, just ask.

 

Thank you in advance for your help,

--Mathieu.

Problems with mkl_sparse_convert_csr


I'm trying to use the Intel MKL Inspector/Executor Sparse BLAS library and I've been struggling with faulty memory use in the `mkl_sparse_convert_csr` subroutine. The simple program below can reproduce my problem:

program debug
use mkl_spblas
use omp_lib
use, intrinsic :: iso_c_binding, only: c_int, c_double
implicit none
integer, parameter :: DIM = 10000
integer :: stat, i
integer(kind = c_int), dimension(DIM) :: irn, jcn
real(kind = c_double), dimension(DIM) :: val
type(sparse_matrix_t) :: mat1, mat2

do i = 1, DIM
  irn(i) = i
  jcn(i) = i
  val(i) = 1.0d0
end do

call omp_set_num_threads(1)
stat = mkl_sparse_d_create_coo (A = mat1, indexing = SPARSE_INDEX_BASE_ONE, &
  rows = DIM, cols = DIM, nnz = DIM, row_indx = irn, col_indx = jcn, values = val)
if (stat /= 0) stop 'Error in mkl_sparse_d_create_coo'

stat = mkl_sparse_convert_csr (source = mat1, &
  operation = SPARSE_OPERATION_NON_TRANSPOSE, dest = mat2)
if (stat /= 0) stop 'Error in mkl_sparse_convert_csr'

stat = mkl_sparse_destroy (A = mat1)
if (stat /= 0) stop 'Error in mkl_sparse_destroy (mat1)'

stat = mkl_sparse_destroy (A = mat2)
if (stat /= 0) stop 'Error in mkl_sparse_destroy (mat2)'

call mkl_free_buffers
end program debug
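For context on what the conversion step does, here is a minimal plain-C sketch of a COO-to-CSR conversion (a counting sort by row). It is not MKL's implementation; coo_to_csr is a hypothetical name, indices are zero-based, and entries are neither sorted within a row nor merged if duplicated. Every buffer it returns is owned by the caller and released with free().

```c
#include <assert.h>
#include <stdlib.h>

/* Convert a zero-based COO matrix (nrows rows, nnz entries) to CSR.
   On success, *row_ptr (nrows + 1 entries), *col_ind and *vals (nnz
   entries each) are newly allocated; the caller frees them. Entries keep
   their COO order within each row. Returns 0 on success, -1 on failure. */
static int coo_to_csr(int nrows, int nnz, const int *ri, const int *ci,
                      const double *v, int **row_ptr, int **col_ind,
                      double **vals)
{
    size_t cap = (size_t)(nnz > 0 ? nnz : 1);
    int *rp = calloc((size_t)nrows + 1, sizeof *rp);
    int *cols = malloc(cap * sizeof *cols);
    double *val = malloc(cap * sizeof *val);
    int *next = malloc(((size_t)nrows + 1) * sizeof *next);
    if (!rp || !cols || !val || !next) {
        free(rp); free(cols); free(val); free(next);
        return -1;
    }
    for (int k = 0; k < nnz; ++k) rp[ri[k] + 1]++;       /* entries per row */
    for (int i = 0; i < nrows; ++i) rp[i + 1] += rp[i];   /* prefix sum */
    for (int i = 0; i <= nrows; ++i) next[i] = rp[i];     /* insert cursors */
    for (int k = 0; k < nnz; ++k) {                       /* scatter entries */
        int dst = next[ri[k]]++;
        cols[dst] = ci[k];
        val[dst] = v[k];
    }
    free(next);
    *row_ptr = rp;
    *col_ind = cols;
    *vals = val;
    return 0;
}
```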

When I check with Valgrind I get the following report of memory leaks:

==27267== Memcheck, a memory error detector
==27267== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27267== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27267== Command: ../bin/LINKS_debug
==27267== 
==27267== 
==27267== HEAP SUMMARY:
==27267==     in use at exit: 495 bytes in 6 blocks
==27267==   total heap usage: 47 allocs, 41 frees, 463,031 bytes allocated
==27267== 
==27267== 8 bytes in 1 blocks are still reachable in loss record 1 of 6
==27267==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27267==    by 0x504CA98: gomp_malloc (alloc.c:37)
==27267==    by 0x505BA56: gomp_init_num_threads (proc.c:91)
==27267==    by 0x504B06A: initialize_env (env.c:1244)
==27267==    by 0x4010732: call_init (dl-init.c:72)
==27267==    by 0x4010732: _dl_init (dl-init.c:119)
==27267==    by 0x40010C9: ??? (in /lib/x86_64-linux-gnu/ld-2.27.so)
==27267== 
==27267== 8 bytes in 1 blocks are still reachable in loss record 2 of 6
==27267==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27267==    by 0x152F22: mkl_serv_malloc (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x1261B4: mkl_sparse_d_create_coo_i4_avx2 (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x112AF8: MAIN__ (main.f90:49)
==27267==    by 0x112C07: main (main.f90:31)
==27267== 
==27267== 32 bytes in 1 blocks are still reachable in loss record 3 of 6
==27267==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27267==    by 0x590C7E4: _dlerror_run (dlerror.c:140)
==27267==    by 0x590C050: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==27267==    by 0x150F32: mkl_serv_inspector_suppress (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x150E8C: mkl_serv_lock (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x14EFA1: mkl_serv_cpu_detect (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x112EC4: mkl_sparse_d_create_coo_i4 (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x112AF8: MAIN__ (main.f90:49)
==27267==    by 0x112C07: main (main.f90:31)
==27267== 
==27267== 47 bytes in 1 blocks are still reachable in loss record 4 of 6
==27267==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27267==    by 0x4017880: _dl_exception_create (dl-exception.c:77)
==27267==    by 0x6996250: _dl_signal_error (dl-error-skeleton.c:117)
==27267==    by 0x4009812: _dl_map_object (dl-load.c:2384)
==27267==    by 0x4014EE3: dl_open_worker (dl-open.c:235)
==27267==    by 0x69962DE: _dl_catch_exception (dl-error-skeleton.c:196)
==27267==    by 0x40147C9: _dl_open (dl-open.c:605)
==27267==    by 0x590BF95: dlopen_doit (dlopen.c:66)
==27267==    by 0x69962DE: _dl_catch_exception (dl-error-skeleton.c:196)
==27267==    by 0x699636E: _dl_catch_error (dl-error-skeleton.c:215)
==27267==    by 0x590C734: _dlerror_run (dlerror.c:162)
==27267==    by 0x590C050: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==27267== 
==27267== 192 bytes in 1 blocks are still reachable in loss record 5 of 6
==27267==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27267==    by 0x504CA98: gomp_malloc (alloc.c:37)
==27267==    by 0x5059B65: gomp_get_thread_pool (pool.h:42)
==27267==    by 0x5059B65: get_last_team (team.c:146)
==27267==    by 0x5059B65: gomp_new_team (team.c:165)
==27267==    by 0x5050DDB: GOMP_parallel_start (parallel.c:126)
==27267==    by 0x17D0A4: mkl_sparse_d_coo_csr_new_omp_i4 (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x17D4A7: mkl_sparse_d_convert_coo_to_csr_i4 (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x17D554: mkl_sparse_d_export_csr_data_i4 (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x126E68: mkl_sparse_d_convert_csr_i4_avx2 (in /home/rcarvalho/repos/debug/bin/LINKS_debug)
==27267==    by 0x112B38: MAIN__ (main.f90:52)
==27267==    by 0x112C07: main (main.f90:31)
==27267== 
==27267== 208 bytes in 1 blocks are still reachable in loss record 6 of 6
==27267==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27267==    by 0x504CA98: gomp_malloc (alloc.c:37)
==27267==    by 0x505AFFA: gomp_new_icv (team.c:968)
==27267==    by 0x504CF24: omp_set_num_threads (libgomp.h:681)
==27267==    by 0x112AB3: MAIN__ (main.f90:47)
==27267==    by 0x112C07: main (main.f90:31)
==27267== 
==27267== LEAK SUMMARY:
==27267==    definitely lost: 0 bytes in 0 blocks
==27267==    indirectly lost: 0 bytes in 0 blocks
==27267==      possibly lost: 0 bytes in 0 blocks
==27267==    still reachable: 495 bytes in 6 blocks
==27267==         suppressed: 0 bytes in 0 blocks
==27267== 
==27267== For counts of detected and suppressed errors, rerun with: -v
==27267== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

This kind of problem seems to have been reported before, and as suggested in https://stackoverflow.com/questions/37395541/mkl-sparse-blas-segfault-wh..., I am already setting the number of threads to 1 and calling the `mkl_free_buffers` subroutine. However, the problem persists, and in a larger project of mine this memory leak leads to a program crash due to invalid writes. Any idea how to solve this?

core dumped when using DftiCreateDescriptor(desc,prec,domain,dim,sizes) with dim=2 and sizes more than {1000, 1000}


I'm trying to perform an FFT on Linux (specifically CentOS 7) with Intel MKL. After writing a code sample that runs successfully on Windows, I moved it to Linux and got a segmentation fault (core dumped). I checked the code carefully and found that the sizes parameter passed to DftiCreateDescriptor(desc, prec, domain, dim, sizes) triggers it: with dim = 2, any value in sizes larger than 1000 causes the crash. I tried different versions of MKL, but the problem remains.

Does anyone have any idea about this bug?

the compile arg: g++ comparison.cpp `pkg-config opencv --cflags --libs` -I/opt/intel/vtune/compilers_and_libraries_2018.3.222/linux/mkl/include/  -L/opt/intel/compilers_and_libraries_2019/linux/mkl/lib/intel64/ -lmkl_rt -g

Here is my code

#include <opencv2/core/core.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
#include <omp.h>

#include "mkl_dfti.h"


int main() {
	MKL_LONG len[2] = { 1080, 1920 }, status;
	float x_in[1080][1920];
	DFTI_DESCRIPTOR_HANDLE fft;
	status = DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_REAL, 2, len);
	status = DftiSetValue(fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
	status = DftiCommitDescriptor(fft);

	//float x[100* 100];
	float x_out[1080][1920];
	for (int i = 0; i < 10; i++) {
		double totalcputime = (double)cv::getTickCount();
		//std::cout << status << std::endl;
		status = DftiComputeForward(fft, x_in, x_out);
		//std::cout << status << std::endl;
		totalcputime = ((double)cv::getTickCount() - totalcputime) / cv::getTickFrequency();
		std::cout << "MKL-DFT Time: "<< totalcputime << std::endl;
	}
	cv::Mat sizedimage = cv::Mat::zeros(1080, 1920, CV_32FC1);
	cv::Mat opencvtransform = cv::Mat(1080, 1920 / 2 + 1, CV_32FC1);
	for (int i = 0; i < 10; i++) {
		double totalcputime = (double)cv::getTickCount();
		cv::dft(sizedimage, opencvtransform);
		totalcputime = ((double)cv::getTickCount() - totalcputime) / cv::getTickFrequency();
		std::cout << "opencv-DFT Time: "<< totalcputime << std::endl;
	}
	
	return 0;
}
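A quick arithmetic check on the array sizes, assuming a 4-byte float and an 8 MiB stack limit (a common Linux default; `ulimit -s` shows the actual value): local arrays such as `float x_in[1080][1920]` and `float x_out[1080][1920]` are placed on the stack and together need 16,588,800 bytes, about twice that limit, whereas two 1000 x 1000 arrays need 8,000,000 bytes, just under 8,388,608. The helper name and the limit constant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stack limit: 8 MiB, a common Linux default (check ulimit -s). */
#define ASSUMED_STACK_BYTES ((size_t)8 * 1024 * 1024)

/* Bytes needed by two rows-by-cols float arrays declared as locals. */
static size_t two_float_arrays_bytes(size_t rows, size_t cols)
{
    return 2 * rows * cols * sizeof(float);
}
```

If the stack is indeed the culprit, allocating the buffers on the heap (for example with std::vector or mkl_malloc) would avoid it; this check supports that hypothesis but does not prove it.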

 

bibtex for Intel MKL Developer Reference


A very basic question: how do I cite the developer reference for MKL, 2019 version? Thank you in advance.

How to transfer saved "scoeff" to "dfdInterpolate1D" to calculate interpolated results?


Dear all,

In curve fitting, suppose I already have "scoeff" from an earlier data fit, and the Data Fitting task has already been deleted.

How can I later pass the saved "scoeff" to "dfdInterpolate1D" to calculate interpolated results whenever necessary? Is there a simpler way that does not require supplying x and y again or recomputing "scoeff"?
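Independent of the Data Fitting API, stored coefficients of a piecewise cubic can be evaluated directly. A minimal plain-C sketch, assuming the coefficients are stored four per interval in the order c0, c1, c2, c3 of P_i(x) = c0 + c1 (x - x_i) + c2 (x - x_i)^2 + c3 (x - x_i)^3; please check this layout against the documentation for the spline type you constructed. eval_cubic_pp is a hypothetical name.

```c
#include <assert.h>

/* Evaluate a piecewise cubic at x from stored coefficients (assumed
   layout above). bp holds nbp >= 2 breakpoints in increasing order and
   c holds 4 * (nbp - 1) coefficients. Points outside [bp[0], bp[nbp-1]]
   are evaluated with the first or last polynomial piece. */
static double eval_cubic_pp(int nbp, const double *bp, const double *c, double x)
{
    int lo = 0, hi = nbp - 2;              /* candidate interval indices */
    while (lo < hi) {                      /* largest i with bp[i] <= x */
        int mid = (lo + hi + 1) / 2;
        if (x >= bp[mid]) lo = mid;
        else hi = mid - 1;
    }
    double t = x - bp[lo];                 /* local coordinate in interval */
    const double *ci = c + 4 * lo;
    return ci[0] + t * (ci[1] + t * (ci[2] + t * ci[3]));   /* Horner form */
}
```

Under this layout the breakpoints x are still needed, because each piece is expressed relative to the left end of its interval; the y values are not.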

Thank you very much!

 

 

Tensor multiplication using MKL


Hello,

Is it possible to perform a multidimensional array multiplication using Intel MKL? If it is, could you please provide the name of the function I should use or a simple example? 
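One common approach (not the only one) is to reshape the contraction into a matrix product. A minimal plain-C sketch, assuming row-major contiguous storage and a contraction over the last index of A and the first index of B: A[I][J][K] is then an (I*J)-by-K matrix, so C[i][j][l] = sum over k of A[i][j][k] * B[k][l] is a single (I*J)xK by KxL product. contract_last_index is a hypothetical name.

```c
#include <assert.h>

/* C[i][j][l] = sum_k A[i][j][k] * B[k][l], all arrays row-major and
   contiguous. Fusing i and j into one row index turns this into an
   (I*J)-by-K times K-by-L matrix product. */
static void contract_last_index(int I, int J, int K, int L,
                                const double *A, const double *B, double *C)
{
    int rows = I * J;                      /* fused (i, j) row index */
    for (int r = 0; r < rows; ++r)
        for (int l = 0; l < L; ++l) {
            double s = 0.0;
            for (int k = 0; k < K; ++k) s += A[r * K + k] * B[k * L + l];
            C[r * L + l] = s;
        }
}
```

With MKL, the loop nest could be replaced by one GEMM call with the same shapes, e.g. cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, I*J, L, K, 1.0, A, K, B, L, 0.0, C, L). Contractions over other index pairs need a transpose or index permutation first, or a loop of smaller GEMM calls.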

Thank you


Zero Matrix CSR Format


Hi,

To implement a generic algorithm, I need to define the zero matrix in CSR format. Is that possible?

I defined it with null arrays for the values, row pointers, etc., but when I call mkl_sparse_d_add, it fails with the error SPARSE_STATUS_NOT_INITIALIZED.

How can I define the zero matrix in CSR format?
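A zero matrix is still a valid CSR matrix: it has no stored entries, but its row-pointer array still needs m + 1 entries, all equal to the index base, so that nnz = row_ptr[m] - row_ptr[0] = 0. A minimal plain-C sketch (struct and function names are hypothetical). Giving col_ind and values a single unused element, so that no NULL pointer is passed on, is an assumption made here; whether null arrays are what triggers SPARSE_STATUS_NOT_INITIALIZED is not established by this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Arrays describing an m-by-n CSR matrix with index base 0 or 1. */
typedef struct {
    int m, n, base;
    int *row_ptr;     /* m + 1 entries */
    int *col_ind;     /* one unused element for the zero matrix */
    double *values;   /* one unused element for the zero matrix */
} csr_arrays;

/* Fill z with an m-by-n zero matrix: every row is empty, so all row
   pointers equal the base. Returns 0 on success, -1 on allocation failure. */
static int make_zero_csr(int m, int n, int base, csr_arrays *z)
{
    z->m = m; z->n = n; z->base = base;
    z->row_ptr = malloc(((size_t)m + 1) * sizeof *z->row_ptr);
    z->col_ind = malloc(sizeof *z->col_ind);
    z->values = malloc(sizeof *z->values);
    if (!z->row_ptr || !z->col_ind || !z->values) {
        free(z->row_ptr); free(z->col_ind); free(z->values);
        return -1;
    }
    for (int i = 0; i <= m; ++i) z->row_ptr[i] = base;
    return 0;
}

static void free_zero_csr(csr_arrays *z)
{
    free(z->row_ptr); free(z->col_ind); free(z->values);
}
```

With the four-array CSR interface, row_ptr and row_ptr + 1 would serve as rows_start and rows_end in a call such as mkl_sparse_d_create_csr.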

Thanks in advance,

Joaquin

CLUSTER_SPARSE_SOLVER segmentation fault


Hi all!

I am getting "forrtl: severe (174): SIGSEGV, segmentation fault occurred" during the factorization step of CPARDISO. It happens when I try to solve a nonsymmetric system of about 8.5 million equations, usually when the factorization is about 80% complete. Everything is fine with about 1 million equations.

My setup phase is

    NRHS = 1
    MAXFCT = 1
    MNUM = 1
    IPARM(1) = 1 ! NO SOLVER DEFAULT
    IPARM(2) = 3 ! PARALLEL (OPENMP) NESTED DISSECTION FILL-IN REORDERING
    IPARM(4) = 0 ! NO ITERATIVE-DIRECT ALGORITHM
    IPARM(6) = 0 ! =0 SOLUTION ON THE FIRST N COMPONENTS OF X
    IPARM(8) = 2 ! NUMBERS OF ITERATIVE REFINEMENT STEPS
    IPARM(10) = 13 ! PERTURB THE PIVOT ELEMENTS WITH 1E-13
    IPARM(11) = 1 ! USE NONSYMMETRIC PERMUTATION AND SCALING MPS
    IPARM(13) = 1 ! MAXIMUM WEIGHTED MATCHING ALGORITHM IS SWITCHED-ON (DEFAULT FOR NON-SYMMETRIC)
    IPARM(35) = 1 ! ZERO BASE INDEXING
    ERROR = 0 ! INITIALIZE ERROR FLAG
    MSGLVL = 0 ! NO STATISTICAL OUTPUT (1 = PRINT STATISTICAL INFORMATION)
    MTYPE = 11 ! REAL UNSYMMETRIC

I use

mpiifort for the Intel(R) MPI Library 2018 Update 1 for Linux*
Copyright(C) 2003-2017, Intel Corporation.  All rights reserved.
ifort version 18.0.1

My compilation line is

mpiifort -o mpi -O3 -I${MKLROOT}/include -qopenmp MPI_3D.f90  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

I tried to run it on a cluster with 8 nodes. Each node has 2x Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz and 768 GB of RAM.

Here is a link to my matrix in CSR3 format. Note: the unzipped files are larger than 2 GB.

https://drive.google.com/file/d/1HOeg-eF0iAlSx533AOi91pfCA6YMFHNd/view?u...

The ordinal 242 could not be located in the dynamic link library mkl_intel_thread.dll


Hi everyone

I have been writing Fortran programs with Intel MKL for several years, but this time I ran into the following runtime error:

The ordinal 242 could not be located in the dynamic link library C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\redist\intel64\mkl\mkl_intel_thread.dll 

when I was trying to test the sparse matrix function MKL_SPARSE_D_MV. The test code is exactly the one I downloaded from the official Intel site; for convenience, I paste it below. Line 66 is the call to MKL_SPARSE_D_MV; if I comment out this call, the program runs fine.

My environment is listed as follows

Windows 10 x64, Visual Studio 2017 + Intel® Parallel Studio XE 2019 Update 3, and compiled with x64 Release version.

PROGRAM SPMV
!   *****************************************************************************
!   Declaration and initialization of parameters for sparse representation of
!   the matrix A in CSR format:
!   *****************************************************************************
    USE MKL_SPBLAS
    IMPLICIT NONE

    INTEGER M, N, NNZ, i, info
!   *****************************************************************************
!   Sparse representation of the matrix A
!   *****************************************************************************
    INTEGER, ALLOCATABLE :: csrColInd(:), csrRowPtr(:)
    DOUBLE PRECISION, ALLOCATABLE :: csrVal(:)
!   Matrix descriptor
    TYPE(MATRIX_DESCR) descrA     ! Sparse matrix descriptor
!   CSR matrix representation 
    TYPE(SPARSE_MATRIX_T) csrA    ! Structure with sparse matrix
!   *****************************************************************************
!   Declaration of local variables:
!   *****************************************************************************
    DOUBLE PRECISION, ALLOCATABLE :: x(:), y(:)
    DOUBLE PRECISION alpha, beta

    M = 5
    N = 5
    NNZ = 13
    ALLOCATE(csrColInd(NNZ))
    ALLOCATE(csrRowPtr(M+1))
    ALLOCATE(csrVal(NNZ))
    ALLOCATE(x(M))
    ALLOCATE(y(M))
    csrVal = (/ 1.0,-1.0,-3.0,-2.0,5.0,4.0,6.0,4.0,-4.0,2.0,7.0,8.0,-5.0 /)
    csrColInd = (/ 0,1,3,0,1,2,3,4,0,2,3,1,4 /)
    csrRowPtr = (/ 0, 3, 5, 8, 11, 13 /)
    x = (/ 1.0, 5.0, 1.0, 4.0, 1.0 /)
    y = (/ 0.0, 0.0, 0.0, 0.0, 0.0 /)
    alpha = 1.0
    beta  = 0.0

    print*,'EXAMPLE PROGRAM FOR MKL_SPARSE_D_MV'
    print*,'---------------------------------------------------'
    print*,''
    print*,'INPUT DATA FOR MKL_SPARSE_D_MV'
    print*,'WITH GENERAL SPARSE MATRIX'
    print*,'ALPHA =',alpha,'BETA =',beta
    print*,'SPARSE_OPERATION_NON_TRANSPOSE'
    print*,'Input vector'
    do i = 1, M
        print*,x(i)
    enddo

!   Create CSR matrix
    i = MKL_SPARSE_D_CREATE_CSR(csrA,SPARSE_INDEX_BASE_ZERO,M,N,csrRowPtr,csrRowPtr(2),csrColInd,csrVal)

!   Create matrix descriptor
    descrA % TYPE = SPARSE_MATRIX_TYPE_GENERAL

!   Analyze sparse matrix; chose proper kernels and workload balancing strategy
    info = MKL_SPARSE_OPTIMIZE(csrA)

!   Compute y = alpha * A * x + beta * y
!! #############################################################

!! error from here, but when I comment following function, it runs well;
    info = MKL_SPARSE_D_MV(SPARSE_OPERATION_NON_TRANSPOSE,alpha,csrA,descrA,x,beta,y)
!! #############################################################

!   Release internal representation of CSR matrix
    info = MKL_SPARSE_DESTROY(csrA)

    print*,''
    print*,'OUTPUT DATA FOR sparseDcsrmv'
    do i = 1, M
        print*,y(i)
    enddo

    print*,'---------------------------------------------------'

END PROGRAM SPMV

 

mkl_solver.lib is missing


Hi everybody,

I have been given the task of reviewing working C++ code in Visual Studio 2010.

I get a linking error.

I need to have this library: mkl_solver.lib

I have searched for it for two days without any result.

I did manage to find these files (MKL 2018): mkl_core_dll.lib, mkl_intel_thread_dll.lib, mkl_intel_c_dll.lib

But I still can't find mkl_solver.lib, or even MKL 10.2.6.037, the version mentioned in the project properties (additional libraries).

 

Could someone please send me mkl_solver.lib or point me to where MKL 10.2.6.037 can be found on the web?

Thanks and best regards,

hussein

 

Feast with g++


Hi,

 

I am thinking of using FEAST to compute a small set of eigenvectors (around the 300 lowest) of a huge sparse matrix (more than 10 million basis functions) in my C++ code. As a first step, I tried to get the example code "dexample_sparse_c.c" working, and here is the issue I am facing:

When I am compiling the code with gcc everything is fine. I use the following command to compile it:

 gcc dexample_sparse_c.c -DMKL_ILP64 -m64 -I${MKLROOT}/include -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a  -Wl,--end-group -lpthread -lm -ldl

 

But when I try to compile it with g++, I get an error:

dexample_sparse_c.c: In function ‘int main()’:
dexample_sparse_c.c:239:42: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("FEAST OUTPUT INFO %d \n",info);
                                          ^
dexample_sparse_c.c:248:50: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("Number of eigenvalues found %d \n", M);
                                                  ^
dexample_sparse_c.c:280:9: error: ‘dgemm’ was not declared in this scope
         );
         ^
dexample_sparse_c.c:292:50: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("#mode found/subspace %d %d \n", M, M0);
                                                  ^
dexample_sparse_c.c:292:50: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long int’ [-Wformat=]
dexample_sparse_c.c:293:37: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("#iterations %d \n", loop);
                                     ^
dexample_sparse_c.c:304:55: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
         printf("   %d  %.15e %.15e \n",i, E[i], res[i]);
                                                       ^
dexample_sparse_c.c:356:43: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("FEAST OUTPUT INFO %d \n" ,info);
                                           ^
dexample_sparse_c.c:365:50: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("Number of eigenvalues found %d \n", M);
                                                  ^
dexample_sparse_c.c:418:49: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("# mode found/subspace %d %d \n",M,M0);
                                                 ^
dexample_sparse_c.c:418:49: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘long long int’ [-Wformat=]
dexample_sparse_c.c:419:37: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
     printf("# iterations %d \n",loop);
                                     ^
dexample_sparse_c.c:431:56: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
         printf("   %d  %.15e %.15e \n", i, E[i], res[i]);

 

I am using the following command to compile with g++, which is exactly the same as above but with g++ instead of gcc:

g++ dexample_sparse_c.c -DMKL_ILP64 -m64 -I${MKLROOT}/include -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a  -Wl,--end-group -lpthread -lm -ldl
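Most of the messages above are format warnings with a common cause: with -DMKL_ILP64, MKL_INT becomes a 64-bit integer (long long int, per the compiler messages), while the format strings use %d. A minimal C sketch of a portable way to print such values, with my_int as a hypothetical stand-in for MKL_INT: cast to long long and use %lld, which is correct for both LP64 and ILP64 builds. The single hard error, 'dgemm' was not declared, is a separate matter: C++ does not accept calls to undeclared functions, whereas the gcc used here evidently accepted it when compiling C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MKL_INT in an ILP64 build. */
typedef long long my_int;

/* Format the FEAST info code; the cast keeps %lld correct even if
   my_int were a 32-bit type. Returns the snprintf result. */
static int format_info(char *buf, size_t len, my_int info)
{
    return snprintf(buf, len, "FEAST OUTPUT INFO %lld \n", (long long)info);
}
```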

 

Also, as a completely separate question: we have been using the Lanczos algorithm until now for the same purpose (I didn't write that code; an earlier postdoc in my group did). If anyone has experience with this, what are the advantages of FEAST over Lanczos, or vice versa?

 

 

Attachment: dexample_sparse_c.c (text/x-csrc, 18.26 KB)