
Symmetric sparse matrix - dense matrix multiplication

I need to multiply a symmetric sparse matrix A with a dense matrix X (Y = A*X) using multiple threads/cores. The matrices I'm using are the adjacency matrices of graphs with a large number of nodes (up to 2 million nodes).

I have tried two approaches:

  1. mkl_dcsrmm() with matdescra[0] set to 's' (a call sketch follows the code below).
  2. mkl_dcsrsymv() in a for-loop, looping over the column vectors of X. Below is the code I used.
// Loop over the n columns of X; each call computes one column of Y = A*X.
// matdescra[1] holds 'l' or 'u' (which triangle of A is stored);
// X[i] and Y[i] point to the i-th column vectors of X and Y.
#pragma omp parallel for schedule(static)
for (int i = 0; i < n; i++) {
  mkl_dcsrsymv(&matdescra[1], &m, values, rowIndex, columns, X[i], Y[i]);
}
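
For reference, here is a minimal sketch of how the option-1 call could be set up. It assumes zero-based CSR arrays (values, columns, rowIndex), the lower triangle of A stored, and X, Y held row-major with leading dimension n; the wrapper name multiply_option1 and these storage choices are assumptions for illustration, not details taken from the post.

// Minimal sketch of the option-1 call (assumed setup, not the poster's exact code).
// Assumes zero-based CSR arrays and row-major X, Y with ldb = ldc = n;
// matdescra[3] = 'c' makes mkl_dcsrmm treat the dense matrices as row-major.
#include <mkl.h>

void multiply_option1(MKL_INT m, MKL_INT n,
                      const double *values, const MKL_INT *columns, const MKL_INT *rowIndex,
                      const double *X, double *Y)
{
  double alpha = 1.0, beta = 0.0;
  // 's' = symmetric, 'l' = lower triangle stored (assumed), 'n' = non-unit diagonal,
  // 'c' = zero-based indexing.
  const char matdescra[6] = {'s', 'l', 'n', 'c', ' ', ' '};
  mkl_dcsrmm("N", &m, &n, &m, &alpha, matdescra,
             values, columns, rowIndex, rowIndex + 1,
             X, &n, &beta, Y, &n);
}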

Initially, I thought that the first option (Sparse BLAS Level 3) would be faster than the second one, but I'm getting the opposite timing results.

Below are timings for a symmetric sparse matrix A with about 1.7M rows/columns and 42M non-zero entries, and a dense matrix X with the same number of rows and 100 columns, running with the number of threads set to 2, 4, and 8, respectively (a sketch of the thread setup follows the timings).

  • Option 1 (mkl_dcsrmm): 19.17 sec, 9.38 sec, 5.20 sec
  • Option 2 (mkl_dcsrsymv loop): 13.26 sec, 6.83 sec, 3.84 sec
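
For completeness, a rough sketch of how the thread count and timing could be controlled in such a comparison; run_option1() and run_option2() are hypothetical wrappers for the two variants above, not functions from the original post.

// Rough sketch of a benchmark loop (assumed setup, not from the original post).
#include <mkl.h>
#include <omp.h>
#include <cstdio>

void benchmark(int threads)
{
  mkl_set_num_threads(threads);  // threads used internally by mkl_dcsrmm()
  omp_set_num_threads(threads);  // threads used by the OpenMP loop around mkl_dcsrsymv()

  double t0 = omp_get_wtime();
  // run_option1();  // hypothetical wrapper around the mkl_dcsrmm() call
  double t1 = omp_get_wtime();
  // run_option2();  // hypothetical wrapper around the OpenMP + mkl_dcsrsymv() loop
  double t2 = omp_get_wtime();

  std::printf("%d threads: option 1 %.2f sec, option 2 %.2f sec\n",
              threads, t1 - t0, t2 - t1);
}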

Is there any particular reason for this, or am I missing something? It seems that mkl_dcsrmm() should be doing things more efficiently than my for-loop.


I compiled the code with the following command:

icpc -mkl=parallel -I$(MKLROOT)/include -O3 -openmp -o test test.cpp -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm


