Hi all,
I am using the sparse matrix-vector multiplication operation in Intel MKL.
I started with a CSR representation (the classical three arrays of the CSR format) and used mkl_sparse_d_create_csr() to create a "sparse_matrix_t" handle. Then I called mkl_sparse_optimize() on the handle, and finally mkl_sparse_d_mv() to perform the multiplication.
It works. So far so good. The answers I am getting are correct.
I am able to control the number of threads used in the solution by setting the environment variable "OMP_NUM_THREADS". This also works as expected.
My question is:
How is the sparse matrix distributed among the threads?
Is the distribution based on a similar number of rows per thread?
or
is it based on a similar number of non-zeros per thread?
or something else?
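To make the two alternatives concrete, here is a small sketch of what I mean by each one. This is only my own illustration and I am not claiming it is what MKL does internally. The helper names split_by_rows and split_by_nnz are hypothetical, and I assume a zero-based CSR row pointer array with nrows+1 entries.

```c
/* Hypothetical helper: give each thread (almost) the same number of rows.
   bounds has nthreads+1 entries; thread t owns rows [bounds[t], bounds[t+1]). */
static void split_by_rows(int nrows, int nthreads, int *bounds)
{
    for (int t = 0; t <= nthreads; ++t)
        bounds[t] = (int)((long long)nrows * t / nthreads);
}

/* Hypothetical helper: give each thread roughly the same number of non-zeros.
   row_ptr is the zero-based CSR row pointer array (nrows+1 entries), so
   row_ptr[nrows] is the total number of non-zeros. Whole rows are kept
   together, so the balance is only approximate. */
static void split_by_nnz(const int *row_ptr, int nrows, int nthreads, int *bounds)
{
    long long nnz = row_ptr[nrows];
    int row = 0;

    bounds[0] = 0;
    for (int t = 1; t < nthreads; ++t) {
        /* Ideal number of non-zeros that should precede thread t. */
        long long target = nnz * t / nthreads;
        /* Advance to the first row whose starting offset reaches the target. */
        while (row < nrows && row_ptr[row] < target)
            ++row;
        bounds[t] = row;
    }
    bounds[nthreads] = nrows;
}
```

For example, with row_ptr = {0, 6, 7, 8, 8} (4 rows holding 6, 1, 1 and 0 non-zeros) and 2 threads, split_by_rows gives rows [0,2) and [2,4), i.e. 7 and 1 non-zeros per thread, while split_by_nnz gives rows [0,1) and [1,4), i.e. 6 and 2 non-zeros per thread.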
One more question: Can the user manipulate the distribution?
Thanks