GESV and PARDISO give different solutions

June 30, 2018, 2:52 pm

Latest and popular articles on Intel Technologies

Hello!

I use

Windows 10

Intel(R) Core(TM) i5-3320M

Intel(R) Visual Fortran Compiler Professional Edition 11.1.072 Update 9 for Windows*

Intel(R) Math Kernel Library 10.2 Update 7 for Windows* OS

I am solving a functional equation, which involves repeatedly solving a large system of linear equations. GESV does the job reliably. The system is sparse, with a very small proportion of non-zero elements, so I wanted to try the sparse solver, expecting it to be much more efficient. This is my first experience with PARDISO, so I cannot be sure that I set everything up correctly. It is indeed much more efficient, but it gives an incorrect solution.

I tested both methods on a simpler problem, the coefficient matrix from the MKL manual, and both solvers give the same solution, which is a bit confusing. I did not dig into the computational details of PARDISO; I chose the default values for iparm.

I attach a zip file with the Visual Studio project. To illustrate the case, I give an example of an incorrect solution from PARDISO in the simplest case that I could construct (14 equations). The real problem that I was solving had 5,000 equations (but it can have many more, depending on the discretization of the domain). In this example, there is a significant difference in the seventh element of the solution vectors. The code also has a commented-out "Alternative simple example", based on the coefficient matrix from the MKL manual, for which both methods give the same solution.
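One way to tell which of two candidate solutions is wrong is to compute the residual A·x − b for each of them. The sketch below does this for a zero-based, three-array CSR matrix that stores the full matrix (not only one triangle). The 3x3 system and the names `csr_matvec` and `max_residual` are hypothetical and are not taken from the attached project.

```python
def csr_matvec(values, columns, row_ptr, x):
    """Multiply a zero-based, 3-array CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        total = 0.0
        # Entries of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[columns[k]]
        y.append(total)
    return y


def max_residual(values, columns, row_ptr, x, b):
    """Return the largest absolute component of A*x - b."""
    ax = csr_matvec(values, columns, row_ptr, x)
    return max(abs(ax_i - b_i) for ax_i, b_i in zip(ax, b))


# Hypothetical full 3x3 matrix:
#   [4 0 1]
#   [0 3 0]
#   [1 0 2]
values = [4.0, 1.0, 3.0, 1.0, 2.0]
columns = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
b = [7.0, 6.0, 7.0]

x_dense = [1.0, 2.0, 3.0]   # stands in for the GESV result
x_sparse = [1.0, 2.0, 2.5]  # stands in for a suspicious sparse result

print(max_residual(values, columns, row_ptr, x_dense, b))
print(max_residual(values, columns, row_ptr, x_sparse, b))
```

A large residual for only one of the two vectors points at that solver's input (for example the matrix type or the stored triangle) rather than at the problem itself.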

Thank you in advance for help.

Attachment	Size
Download Pardiso.zip	1.18 MB

↧

MKL DOCKER

July 3, 2018, 11:59 am

I would like to install MKL in a Docker image, specifically 2018 Update 3, using wget.

My Dockerfile is as follows:

RUN apt update && apt install -y git make cmake gcc g++ wget

# Install MKL
RUN cd /tmp && \
# Download MKL install package
wget -q http://registrationcenter-download.intel.com/akdlm/irc_nas/8374/l_mkl_20... && \
# Install MKL
tar -xzf l_mkl_2018.1.163.tgz && cd l_mkl_2018.1.163 && \
sed -i 's/ACCEPT_EULA=decline/ACCEPT_EULA=accept/g' silent.cfg && \
sed -i 's/ACTIVATION_TYPE=exist_lic/ACTIVATION_TYPE=trial_lic/g' silent.cfg && \
./install.sh -s silent.cfg && \
# Clean up
cd .. && rm -rf *

# Add to path
# ENV PATH ${CUDA_PATH}/bin:${PATH}
# Configure dynamic link
RUN echo "${MKL_PATH}/mkl/lib/intel64">> /etc/ld.so.conf.d/intel.conf && ldconfig && \
echo ". /opt/intel/bin/compilervars.sh intel64">> /etc/bash.bashrc

My question: what is the download link for 2018 Update 3? The link in my wget command is for the 2018.1 update.

↧

Tensorflow-MKL giving Errors on 3D data

July 5, 2018, 12:03 pm

I am training a model to perform volumetric segmentation (3D data). I am training on CPU (two Xeon E5 v4 2699) because the input data is too large to fit in VRAM. I am using an Anaconda environment with tensorflow-mkl and keras. When I train the model, I get an error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Value for attr 'data_format' of "NDHWC" is not in the list of allowed values: "NHWC", "NCHW"

However, Intel's GitHub page (https://github.com/intel/mkl-dnn) says it works for volumetric segmentation. How can I resolve this issue so that I can train my 3D U-Net with MKL?

↧

subutilization of processor resources by fgmres

July 5, 2018, 12:21 pm

Hi Everyone,

We are developing an application that uses the FGMRES function in the MKL library to solve systems of linear equations as part of Newton iterations. Recently we did a bit of benchmarking and found that, as the number of equations increases, the processor utilization goes down.

We instrumented the code and found that calls to dfgmres take a progressively larger share of the total solution time as the number of equations increases. Basically, we modified the "fgmres_full_fnct_c.c" file provided in the MKL examples directory and measured the elapsed time for different operations, such as the calls to dfgmres and the time spent servicing reverse communication requests such as RCI_request=1 (matrix-vector product), RCI_request=3 (application of the preconditioner), etc. Here are a few numbers:

number of equations   total solution time   (rci_request = 1)   (rci_request = 3)   calls to dfgmres
480k                  8.6 s                 0.7 s               2.2 s               4.9 s
950k                  27 s                  1.8 s               5.7 s               18 s
7,150k                820 s                 15 s                83 s                700 s
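The trend in the numbers above is easier to see as a share of the total solution time. The short script below only re-expresses the reported timings as percentages; the dictionary layout and the name `share` are illustrative choices, not part of the original benchmark code.

```python
# Timings in seconds, copied from the measurements reported above.
runs = {
    "480k":   {"total": 8.6,   "rci1": 0.7,  "rci3": 2.2,  "dfgmres": 4.9},
    "950k":   {"total": 27.0,  "rci1": 1.8,  "rci3": 5.7,  "dfgmres": 18.0},
    "7,150k": {"total": 820.0, "rci1": 15.0, "rci3": 83.0, "dfgmres": 700.0},
}


def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


dfgmres_share = {
    size: share(t["dfgmres"], t["total"]) for size, t in runs.items()
}

for size, pct in dfgmres_share.items():
    print(f"{size:>7} equations: dfgmres takes {pct:.0f}% of the solution time")
```

With these inputs the dfgmres share grows from roughly 57% to roughly 85% of the total, which is the effect described in the post.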

We also took pictures of the resource manager and noted that processor utilization is very low for large periods of time, as low as 4%, despite the fact that mkl correctly sets the maximum number of threads to the number of cores (16) in the system.

Does anybody have an idea of what is happening?

Sincerely,

Gonzalo

PS: We have several current licenses of Intel Parallel Studio, but Intel's support site is not letting me submit this question to priority support because I am not associated with the account that was used to register the product in our office.

↧

MKL part of code is not Parallelized

July 8, 2018, 1:21 am

Hi all,

I installed VS 2017 community with parallel_studio_xe_2018_update3_cluster_edition, student.

I use the Fortran compiler for my programming. I successfully link MKL with my code, and there is no problem in compiling and running. The OpenMP part of my code, built with the OpenMP flags, is parallelized with no problem. My problem is the MKL part of the code, which runs in a single thread. I have these options in the "configuration properties" of the project, as seen in the attached pictures:

In Capture 3, I added the MKL library and parent compiler library:

C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.3.210\windows\mkl\lib\intel64_win and C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.3.210\windows\compiler\lib\intel64_win

In Capture 4, I have these additional dependencies:

libiomp5md.lib mkl_intel_lp64.lib mkl_blas95_lp64.lib mkl_core.lib mkl_intel_lp64.lib mkl_lapack95_lp64.lib mkl_intel_thread.lib mkl_tbb_thread.lib

I experimented a lot with these options to see any difference, without success. I should also mention that I have no problem with MKL parallelization on my Linux machine.

Any help is appreciated.

Attachment	Size
Download Capture.JPG	91.34 KB
Download Capture2.JPG	87.91 KB
Download Capture3.JPG	91.09 KB
Download Capture4.JPG	83.26 KB

↧

vslsConvExecX performance

July 6, 2018, 12:21 pm

Using the function vslsConvExecX instead of the IPP function IppFilter, the performance is 10x slower. Does this seem correct?

↧

MKL's FFTW wrappers block FFTW on linux (at least)

July 7, 2018, 2:25 pm

Hi,

This problem popped up downstream in R (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17443) when it is linked with MKL on Linux (I don't know about Windows/OSX). R is an extensible program; one may write DLLs/shared libraries in C which are linked into R at runtime. Now, here is the problem. MKL contains wrappers for fftw, and some of them do not work, for instance multidimensional r2r transforms, or strided r2r transforms (which can easily be used to build multidimensional ones). I make a shared library for R which uses these parts of fftw3:

# gcc -shared -o foo.so foo.c -lfftw3

Everything seems fine. I start R, which is linked with MKL to have a nice optimized, parallel BLAS, say -lmkl_rt, and loads symbols from there. When I load (dlopen) my foo.so from R, the fftw references in it are resolved to MKL, which is already loaded in R, and they do not work. There is no easy way for my foo.so to get hold of the real fftw3 routines. The workarounds are a bit cumbersome: I can either link R with parts of MKL statically (like -Bstatic,-lmkl_gf_lp64,-Bdynamic -lmkl_gnu_threads ...), or link my foo.so with a static libfftw3.a, or preload R with LD_PRELOAD=libfftw3.so.

My question is, would it be possible to ship MKL's fftw-wrappers separate from the working parts of MKL so that it would be possible to avoid the poor fftw masquerade?

↧

Sparse BLAS CSR Matrix Storage Format

July 9, 2018, 8:59 am

After looking here, I can't understand the compressed row format.

Specifically, I don't understand pointerE (I am OK with the rest). It should be the index in the values array of the last non-zero element in each row. In the example given for zero-based indexing, the last non-zero element of the first row is -3, which is the third element in the values array, so in zero-based indexing it should be element #2. The last non-zero element of the second row is 5, which is the fifth element in values, so in zero-based indexing it should be #4. Therefore pointerE should be [2,4,7,10,12], but the example shows [3,5,8,11,13].

What am I missing here?

Once again, I am using zero-based indexing, so where does this offset come from?
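A small sketch may help with the offset. As far as I understand the four-array CSR variant, pointerE[i] is the index of the first element after row i, so row i occupies the half-open range values[pointerB[i]:pointerE[i]] and its last element sits at pointerE[i] - 1. The dense matrix below is an assumption, chosen so that it reproduces the values quoted in the question; `to_csr4` is a hypothetical helper, not an MKL routine.

```python
# Assumed 5x5 example matrix, consistent with the values quoted above.
dense = [
    [ 1, -1,  0, -3,  0],
    [-2,  5,  0,  0,  0],
    [ 0,  0,  4,  6,  4],
    [-4,  0,  2,  7,  0],
    [ 0,  8,  0,  0, -5],
]


def to_csr4(matrix):
    """Build zero-based 4-array CSR: values, columns, pointerB, pointerE."""
    values, columns, pointer_b, pointer_e = [], [], [], []
    for row in matrix:
        pointer_b.append(len(values))      # index of the row's first element
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                columns.append(j)
        pointer_e.append(len(values))      # one past the row's last element
    return values, columns, pointer_b, pointer_e


values, columns, pointer_b, pointer_e = to_csr4(dense)
last_index = [e - 1 for e in pointer_e]    # what the question expected

print("pointerB  :", pointer_b)
print("pointerE  :", pointer_e)
print("last index:", last_index)
```

Under this reading, pointerE[i] equals pointerB[i + 1] for every row but the last, and pointerE[i] - pointerB[i] is the number of non-zeros in row i, which is why the "one past the end" convention is convenient.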

↧

how to use libtbb.so in mkl with tensorflow

July 10, 2018, 2:51 am

I compiled TensorFlow with --config=mkl and I want to use TBB in MKL, but it only uses libiomp5.so. How can I make it use libtbb.so?

ldd tensorflow/bazel-tensorflow/external/mkl/lib/libmklml_intel.so
        linux-vdso.so.1 =>  (0x00007ffefb88d000)
        libiomp5.so => not found
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007ff1c0fe5000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00007ff1c0ce3000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00007ff1c0922000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff1ca13a000)

↧

mkl_sparse_z_syprd, fortran

July 10, 2018, 3:28 am

It seems that mkl_sparse_z_syprd requires real arguments even though they should be complex. Any advice? Thanks!

↧

VML function crashes on Windows 7 if linked through mkl_rt

July 10, 2018, 12:11 pm

Hello,

We get random crashes with VML functions, e.g., zsqrt(), if ALL of the following conditions are met:

1) The program runs on a Windows 7 machine
2) MKL is linked through the single dynamic library mkl_rt.lib
3) The VML function is called within an Intel TBB parallel section, e.g., parallel_for()
4) TBB is using more than one (1) thread

The crash occurs no matter which threading layer mkl_rt is told to use (Intel threading, sequential or TBB threading).

A piece of code that illustrates what we are doing is attached. In testing, we wrote a batch file to run the program repeatedly to catch the random crash. Normally we catch the crash within 100 runs.

We tried linking directly with mkl_intel_threading_dll.lib, mkl_sequential_dll.lib or mkl_tbb_threading_dll.lib; all are fine, with no crash. The problem seems to be with linking through mkl_rt alone.

We use the latest MKL, 2018.3.210.

Has anyone run into the same issue? Many thanks for any input/response.

Ling

Attachment	Size
Download TBBwVML1.cpp	1.84 KB

↧

mkl, matmult.py in windows w/ mkl_rt.dll

July 11, 2018, 12:49 pm

I am trying to give my students examples of calling MKL directly from Python (using the Intel/Anaconda Python 2.7 distribution) in a Windows 10 environment: simple examples first, then moving on to PARDISO, etc. (Yes, everything works fine in Python with automatic linking of MKL for standard scipy and scipy.sparse functions.) The students will need ctypes for specialty routines like PARDISO and for their own C++ code snippets.

MKL 2018.3 is installed and successfully accessed via C++ from VS2017, so mkl_rt.dll and the other DLLs are present in the appropriate redist directories. I am afraid I am not that familiar with ctypes in a Windows environment. In Python, I am trying to start by running the matmult.py example posted here: (https://software.intel.com/en-us/articles/using-intel-mkl-in-your-python...), but Python chokes on the cdll statement. How do I edit the cdll line

from ctypes import *
# Load the share library
mkl = cdll.LoadLibrary("./libmkl_rt.so")

for a Windows environment? So far, none of the following work:

mkl=cdll.LoadLibrary(".\mkl_rt.dll")  #with mkl_rt.dll in current directory
mkl=cdll.LoadLibrary("MKLPATH\mkl_rt.dll")

mkl=windll.LoadLibrary(".\mkl_rt.dll") #with mkl_rt.dll in current directory
mkl=windll.LoadLibrary("MKLPATH\mkl_rt.dll")
mkl=windll.LoadLibrary("MKLPATH\mkl_rt")

How do I edit the cdll line for a Windows environment?

Or is a new version of matmult.py needed that is specific to a Windows installation?

Given that we're using the Intel Python distribution, is there a better way to access things like PARDISO than using ctypes plus the independent MKL install?
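For reference, a minimal sketch of a more defensive way to load the DLL with ctypes. It builds an absolute path, checks that the file exists before loading, and uses a raw string so that backslashes stay literal. The directory shown is a placeholder, and `load_mkl_rt` is a hypothetical helper. Whether this resolves the failure above is not certain: a load can also fail because of typographic quotes pasted from a web page, a 32-bit/64-bit mismatch between Python and the DLL, or dependent DLLs that cannot be found.

```python
import ctypes
import os


def load_mkl_rt(directory, name="mkl_rt.dll"):
    """Load the library from an explicit directory, or return None if absent."""
    path = os.path.join(os.path.abspath(directory), name)
    if not os.path.isfile(path):
        return None
    # ctypes.CDLL uses the C calling convention, like cdll.LoadLibrary.
    return ctypes.CDLL(path)


# Placeholder location (assumption); the raw string keeps backslashes literal.
MKL_REDIST_DIR = r"C:\path\to\mkl\redist\intel64"

mkl = load_mkl_rt(MKL_REDIST_DIR)
if mkl is None:
    print("mkl_rt.dll not found in " + MKL_REDIST_DIR)
else:
    print("mkl_rt.dll loaded")
```

Checking for the file first separates "wrong path" from "file found but could not be loaded", which are otherwise hard to tell apart from the ctypes error alone.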

↧

Solving diffusion type equation using Poisson Solver

July 12, 2018, 9:19 am

Hi,

I am trying to solve a 3D diffusion-type equation with periodic boundary conditions in X and Y and a Neumann boundary condition in the Z direction using the MKL Poisson Solver, and I am facing a couple of problems.

First, if I use BCTYPE 'PPPPNN' and set

bd_az[i + j * (nx+1)]= 0.0
bd_bz[i + j * (nx+1)]= 0.0

on the Z boundaries, is the zero Neumann condition applied accurately at the boundaries?

The other question: if I write the diffusion equation in the form of a Poisson equation, then my RHS, 'f', will be time dependent because it contains du/dt. In that case, will there be any conflict with the time scheme (u_old, u_new)?

Or is there another way to use the MKL Poisson solver for time-dependent equations with the boundary conditions mentioned above?
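One common way to phrase the second question, sketched here under the assumption of a constant diffusion coefficient D and a backward Euler step of size Δt (both symbols are introduced for illustration and do not appear in the post): each time step becomes a Helmholtz problem of the form −Δu + q u = f with constant q ≥ 0, which is the form the MKL Poisson Library documents, so u_old only enters through the right-hand side.

```latex
\begin{align}
  \frac{\partial u}{\partial t} &= D\,\nabla^{2} u
    && \text{(diffusion equation, constant } D \text{ assumed)} \\
  \frac{u^{n+1} - u^{n}}{\Delta t} &= D\,\nabla^{2} u^{n+1}
    && \text{(backward Euler in time)} \\
  -\nabla^{2} u^{n+1} + q\,u^{n+1} &= f,
    \qquad q = \frac{1}{D\,\Delta t},
    \quad f = \frac{u^{n}}{D\,\Delta t}
    && \text{(Helmholtz form, } q \ge 0 \text{)}
\end{align}
```

Here u^n plays the role of u_old and u^{n+1} the role of u_new; the boundary condition arrays stay the same from step to step.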

Please explain.

Thanks,

Swagnik

↧

MKL linking with MinGW64, is it still impossible?

July 12, 2018, 10:41 am

Hi, all.

There are similar topics on this, such as https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/277796

That one is from 2012, but what is the situation now, in 2018? I am trying to replace FFTW3 with MKL for licensing reasons. The header has:

#include <mkl_dfti.h>
#include <fftw3_mkl.h>

This may be redundant, because fftw3_mkl.h includes most of what is needed. My compilation process is in a Makefile, and the MKL-related parts look like this:

MKL_DIR = "c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl"
MKL_INC = -I$(MKL_DIR)/include -I$(MKL_DIR)/include/fftw
MKL_LIBS = $(MKL_DIR)/lib/intel64_win
MKL_ALL = $(MKL_INC) -L$(MKL_LIBS) -lmkl_core -lmkl_intel_lp64 -lm

CXX = g++
CXXFLAGS = -Wall -pthread -mms-bitfields -m64

app:
    $(CXX) $(CXXFLAGS) -o $(BINDIR)/$(EXEC_WIN) $(HELPER_OBJS) $(APP_OBJ) $(MKL_ALL)

The objects are generated fine, as you probably know if you use MinGW64. But the app target, which does the linking, outputs the following errors for this configuration, which I borrowed from the Intel online linking guide:

==== BUNCH OF ERROR TEXT ====

c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(_free.obj):(.text[mkl_free]+0x4): undefined reference to `mkl_serv_free'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(_free.obj):(.text[MKL_free]+0x1): undefined reference to `mkl_serv_free'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(_malloc.obj):(.text[mkl_malloc]+0x6): undefined reference to `mkl_serv_malloc'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(_malloc.obj):(.text[MKL_malloc]+0x1): undefined reference to `mkl_serv_malloc'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(fftwf_plan_dft_r2c.obj):(.text[fftwf_plan_dft_r2c]+0x22c): undefined reference to `__security_check_cookie'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(fftwf_plan_dft_r2c.obj):(.xdata+0x10): undefined reference to `__GSHandlerCheck'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(fftwf_plan_guru64_dft_r2c.obj):(.text[fftwf_plan_guru64_dft_r2c]+0x207): undefined reference to `__security_check_cookie'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(fftwf_plan_guru64_dft_r2c.obj):(.xdata+0x14): undefined reference to `__GSHandlerCheck'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticommitdescriptor_lp64.obj):(.text[DftiCommitDescriptor]+0x28): undefined reference to `mkl_dft_dfti_verbose'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_1d_lp64.obj):(.text[DftiCreateDescriptor_s_1d]+0x61): undefined reference to `mkl_dft_dfti_create_sc1d'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_1d_lp64.obj):(.text[DftiCreateDescriptor_s_1d]+0x79): undefined reference to `mkl_dft_dfti_create_sr1d'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_1d_lp64.obj):(.text[DftiCreateDescriptor_s_1d]+0x8a): undefined reference to `mkl_dft_bless_node_omp'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_md_lp64.obj):(.text[DftiCreateDescriptor_s_md]+0x220): undefined reference to `mkl_dft_dfti_create_scmd'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_md_lp64.obj):(.text[DftiCreateDescriptor_s_md]+0x23f): undefined reference to `mkl_dft_dfti_create_srmd'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_md_lp64.obj):(.text[DftiCreateDescriptor_s_md]+0x24e): undefined reference to `mkl_dft_bless_node_omp'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_md_lp64.obj):(.text[DftiCreateDescriptor_s_md]+0x2a2): undefined reference to `__security_check_cookie'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dfticreatedescriptor_s_md_lp64.obj):(.xdata+0xc): undefined reference to `__GSHandlerCheck'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dftisetvalue_lp64.obj):(.text[DftiSetValue]+0x108): undefined reference to `mkl_serv_strnlen_s'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dftisetvalue_lp64.obj):(.text[DftiSetValue]+0x316): undefined reference to `__security_check_cookie'
c:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2018.3.210/windows/mkl/lib/intel64_win/mkl_intel_lp64.lib(dftisetvalue_lp64.obj):(.xdata+0xc): undefined reference to `__GSHandlerCheck'
collect2.exe: error: ld returned 1 exit status
make: *** [Makefile:75: windows] Error 1

My MinGW64 comes from an MSYS2 installation, version 7.3.0. Is this linking still a no-go?

↧

MKL_VERBOSE

July 13, 2018, 9:17 am

It would be useful to be able to limit the output of MKL_VERBOSE on a per-thread basis. For example, suppose you are on a KNL and running 1 process with 16 threads. You may want to limit the MKL_VERBOSE output to only 1 calling thread (though the MKL calls themselves may be using multiple threads).

↧

Why SYTRF/SYTRI is much slower than GETRF/GETRI to compute dense matrix inverse

July 13, 2018, 12:05 pm

Dear MKL experts,

My project requires inverting a complex symmetric dense matrix. I do it using three different pairs of subroutines, GETRF/GETRI, SYTRF/SYTRI, and SYTRF_ROOK/SYTRI_ROOK, in order to pick the best one. In theory, the subroutines for symmetric matrices should be faster than those for general matrices, as documented in the Intel programmer reference manual:

GETRF(...)
For real flavors, if m = n, the approximate number of floating-point operations is (2/3)n³.
The number of operations for complex flavors is four times greater, (8/3)n³.
GETRI(...)
The total number of floating-point operations is approximately (4/3)n³ for real flavors and (16/3)n³ for complex flavors.

SYTRF(...)
The total number of floating-point operations is approximately (1/3)n³ for real flavors or (4/3)n³ for complex flavors.
SYTRI(...)
The total number of floating-point operations is approximately (2/3)n³ for real flavors and (8/3)n³ for complex flavors.
SYTRF_ROOK(...)
[No information of floating-point operations]
SYTRI_ROOK(...)
The total number of floating-point operations is approximately (2/3)n³ for real flavors and (8/3)n³ for complex flavors.

In reality, GETRF/GETRI are much faster for large dense matrices. My test results are summarized below.
   Matrix Size     GETRF          SYTRF          SYTRF_ROOK
   1000x1000       0.015          0.016           0.047
   2000x2000       0.142          0.141           0.281
   5000x5000       1.486          1.406           2.595
10000x10000       9.907          9.283          16.282

   Matrix Size     GETRI          SYTRI          SYTRI_ROOK
   1000x1000       0.064          0.563           0.595
   2000x2000       0.312          4.908           4.938
   5000x5000       4.437         74.600          74.490
10000x10000      26.972        625.908         615.346

The factorization times of GETRF and SYTRF are comparable, but GETRI is roughly 9 to 23 times faster than SYTRI and SYTRI_ROOK.

Why? Please give me some advice. My test code is attached.
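To make the gap concrete, the script below compares the measured SYTRI/GETRI time ratio from the table above with the ratio one would expect from the quoted operation counts for complex flavors, (8/3)n³ versus (16/3)n³. The variable names are illustrative, and the time unit is whatever the table uses.

```python
# Inversion timings copied from the table above: n -> (GETRI, SYTRI, SYTRI_ROOK).
inverse_times = {
    1000:  (0.064, 0.563, 0.595),
    2000:  (0.312, 4.908, 4.938),
    5000:  (4.437, 74.600, 74.490),
    10000: (26.972, 625.908, 615.346),
}

# Expected ratio from the documented complex flop counts:
# SYTRI ~ (8/3) n^3, GETRI ~ (16/3) n^3, so SYTRI should need half the work.
expected_ratio = (8.0 / 3.0) / (16.0 / 3.0)

measured_ratio = {
    n: sytri / getri for n, (getri, sytri, _rook) in inverse_times.items()
}

print(f"expected SYTRI/GETRI ratio from flop counts: {expected_ratio:.1f}")
for n, ratio in measured_ratio.items():
    print(f"n = {n:>5}: measured SYTRI/GETRI ratio = {ratio:.1f}")
```

The flop counts predict a ratio of 0.5, while the measured ratios grow from about 9 to about 23, so the slowdown cannot be explained by operation counts alone.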

Thanks.

Attachment	Size
Download test_lapack_inverse.f90	10.6 KB

↧

Parallel two medium size GEMM?

July 14, 2018, 8:58 pm

Hi,

I have a special use case that needs to compute two independent GEMMs.

Each one has M, N, K in the range [20~4000]. On a Xeon Skylake 8180, they only reach 600~700 GFlops/sec.

At the algorithm level, the two GEMMs have no dependency, so they can be launched in parallel.

How can I run these two GEMMs in parallel? Say, one socket for each, perhaps. I suppose I can't use batch GEMM for this.

↧

Using iterative-direct solver in pardiso

July 14, 2018, 9:10 pm

I'm trying to use the iterative-direct solver in Pardiso by changing the default value of iparm[3]. However that didn't work.

I have a nonsymmetric matrix, so I set iparm[3] = 61. However, when I print out iparm[19], which contains the CG/CGS diagnostics information, it always says 0. That implies CG wasn't run at all. In my code, when I first solve the linear system, I set phase to be 13. After that, I set the phase to be 23 (so that it doesn't have to factorize every time). The input for my matrix type mtype is 11.

I then tried a symmetric positive definite matrix and set iparm[3] = 62. I set mtype to be 2. That didn't work either: iparm[19] returns 0.

Did I miss anything?
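For readers unfamiliar with the encoding, the PARDISO documentation describes iparm[3] (zero-based) as 10*L + K, where K selects the iteration (1 for CGS replacing LU, 2 for CG replacing Cholesky on symmetric positive definite matrices) and L sets the stopping tolerance to 10^-L. The sketch below only decodes the two values used above; `decode_iparm3` is a hypothetical helper, not an MKL function.

```python
def decode_iparm3(value):
    """Split iparm[3] = 10*L + K into its documented parts."""
    level, kind = divmod(value, 10)
    methods = {
        0: "direct factorization as requested by phase",
        1: "CGS iteration replaces LU",
        2: "CG iteration replaces Cholesky (symmetric positive definite)",
    }
    return {
        "K": kind,
        "L": level,
        "method": methods.get(kind, "unknown"),
        "tolerance_exponent": -level,   # stopping tolerance is 10**(-L)
    }


for setting in (61, 62):
    info = decode_iparm3(setting)
    print(f"iparm[3] = {setting}: K = {info['K']} ({info['method']}), "
          f"tolerance = 1e{info['tolerance_exponent']}")
```

So 61 and 62 both request a tolerance of 1e-6, with CGS and CG respectively. As documented, the iteration is preconditioned by a factorization computed in an earlier step, and iparm[19] reports the iteration count.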

↧

Redistributable packages, what to supply with the application

July 16, 2018, 12:27 am

Good morning, all.

I am writing an application with MKL that, upon launching, requires mkl_core.dll and mkl_intel_thread.dll. As my MKL redist directory is not in %PATH%, I copied these to the application directory. Then it asks for libiomp5md.dll, which comes in the redistribution packages here: https://software.intel.com/en-us/articles/redistributable-libraries-for-intel-c-and-fortran-2018-compilers-for-windows. When running, say, an FFT method, it asks for mkl_avx2.dll or mkl_def.dll, so I also have to copy one of these from the MKL redist directory to the application directory. So my questions are:

1) One of the dependencies, libiomp5md.dll, can be obtained from the download page above, so there is no need to ship it with the program. The other MKL DLLs are in a development package. Is there a download for such redistributable files, so that I can keep my shipping package small and spare the end user from downloading the MKL development package? Or is that just how it is: ship these DLLs and end of story?
2) The console error I get when calling an FFT command says that either mkl_avx2.dll or mkl_def.dll is missing. Copying the first to the application directory solves it, but will mkl_def.dll be needed at some point? Should I ship just one or both?

Any input will be appreciated. Please let me know if something is not clear so I can explain further.

Thanks a lot.

↧