Install Intel-optimized TensorFlow on Windows

April 27, 2019, 10:46 pm

Latest and popular articles on Intel Technologies

Hello

I checked the tensorflow-MKL packages in conda for tar files, but the file sizes are small, so I could not be sure they are the right ones. How can I install intel-tensorflow on Windows offline? Is the conda intel channel only for Linux?

↧

min2norm solution of under determined system with sparse qr

April 26, 2019, 7:17 pm

Does mkl_sparse_?_qr_solve() return the minimum 2-norm solution if given an underdetermined system? Additionally, is there a way to get the decomposition results (Q and R)? Thanks!

↧

Adding a diagonal matrix to the sytrf (Bunch-Kaufman factorization) function output

April 27, 2019, 11:44 am

I need to apply ?sytrf to a matrix A, then add a diagonal matrix λI to the output of ?sytrf, and after that use ?sytrs to solve a system of linear equations. However, I can't understand how the output of ?sytrf is stored, or what the right way is to add the diagonal matrix so as to obtain the Bunch-Kaufman factorization of the sum, sytrf(A + λI). Adding the diagonal matrix before calling ?sytrf is not possible in the algorithm that we are using.

sytrf ( A + λ I) = sytrf ( A ) + C

I can't find the right way to add C to sytrf(A).
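
To make the difficulty concrete, here is a minimal sketch in plain C. It is not MKL code: the helper `ldl2x2` is hypothetical, works only on 2x2 matrices, and ignores the pivoting and 2x2 diagonal blocks that ?sytrf uses. Under those assumptions it factors a symmetric matrix as L D L^T and checks whether shifting A by λI changes L itself.

```c
/* Unpivoted LDL^T factors of the symmetric 2x2 matrix [[a, b], [b, c]]:
   L = [[1, 0], [l21, 1]] and D = diag(d1, d2). Assumes a != 0. */
typedef struct { double l21, d1, d2; } Ldl2;

Ldl2 ldl2x2(double a, double b, double c)
{
    Ldl2 f;
    f.d1  = a;
    f.l21 = b / a;
    f.d2  = c - f.l21 * b;   /* Schur complement of a */
    return f;
}

/* Returns 1 if the unit lower-triangular factor of A + lambda*I differs
   from that of A, i.e. if the shift cannot be absorbed into D alone. */
int shift_changes_L(double a, double b, double c, double lambda)
{
    Ldl2 f0 = ldl2x2(a, b, c);
    Ldl2 f1 = ldl2x2(a + lambda, b, c + lambda);
    double diff = f1.l21 - f0.l21;
    if (diff < 0.0) diff = -diff;
    return diff > 1e-12;
}
```

For A = [[2, 1], [1, 3]] and λ = 1, the entry l21 moves from 1/2 to 1/3, so in this small case the factors of A + λI are not the factors of A plus a diagonal correction; L only stays the same when the off-diagonal entry b is zero.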

I would be very grateful if someone can help me with this.

Thank you again,

David

↧

Calling Pardiso failed after enabling OpenMP in Visual studio

April 28, 2019, 2:54 am

Hi,

Pardiso in my Fortran program works well if I do not use OpenMP.

However, after I enable OpenMP in the Visual Studio settings via:

Project > Configuration Properties > Fortran > Language > Process OpenMP Directives > Generate Parallel Code (/Qopenmp)

an error occurs at the call to Pardiso:

forrtl: severe (157): Program Exception - access violation

Does anyone know how to solve this problem?

Thanks a lot!

Yongli

↧

MKL 2019U3 seems to load 32bit-DLL from 64bit-App

April 28, 2019, 7:19 pm

Hello!

I'm porting a C++ application to x64, using MSVS 2017 / C++ on a 64-bit Windows 7 machine.

I've set the project settings to generate multithreaded code with static linking.

However, the application won't even start. It looks like the app is loading (and unloading) the 32-bit libiomp5md.dll before it exits with code 0xc000007b.

How can I ensure that the static 64-bit libraries are linked and that no other (possibly 32-bit) DLLs get loaded?

thanks for any hints!

↧

Extracting Q matrix using mkl_sparse_?_qr_qmult

April 30, 2019, 7:13 pm

Hi,

Recently I've been trying to use mkl_sparse_?_qr_qmult to extract the Q matrix by multiplying Q^-1 by an identity matrix. However, this function has been returning SPARSE_STATUS_NOT_SUPPORTED. I suspect this is because the matrix dimensions I set don't make sense. I tried to follow the documentation, but the page linked below is quite confusing to me.
https://software.intel.com/en-us/mkl-developer-reference-c-mkl-sparse-qr...

Basically, after factorizing A_{mn} where m > n, I want to multiply the factorization result Q_{mm}^{-1} by the identity matrix I_{mm} so that the x matrix contains Q^-1. However, the documentation says that, for row-major layout, the number of rows of x should be the number of columns of A, which is n.

Is what I want to do achievable? Could you help explain how mkl_sparse_?_qr_qmult works?

Thanks,

Yilong

↧

Convert CSR matrix to CSC one

May 1, 2019, 7:50 am

I used mkl_dcsrcsc to convert a CSR matrix to a CSC one. After updating MKL, this routine is marked deprecated. As far as I understand, I should now use the inspector-executor routines. However, mkl_sparse_?_export_csr exports a matrix only in the 4-array variation of the CSR/CSC format, whereas the original routine mkl_dcsrcsc exports into the 3-array variation, which is what I need. Although the relation rows_end = rows_start + 1 holds for the pointers returned by mkl_sparse_?_export_csr in all my simple tests, it seems there is no guarantee that I can use rows_start as a rowIndex array.

Is there a way to perform a conversion CSR <-> CSC with 3-arrays variation format using non-deprecated routines?

Thanks.
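
As a fallback that does not rely on any MKL guarantee, the conversion described above can be sketched in plain C. This is not an MKL routine: `csr4_to_row_ptr` and `csr_to_csc` are hypothetical helper names, and zero-based indexing with `int` indices and `double` values is assumed. The first helper builds a 3-array rowIndex only when the 4-array data is actually contiguous; the second performs a counting-sort style CSR to CSC conversion.

```c
/* Builds a 3-array row pointer (nrows + 1 entries) from 4-array CSR row
   bounds. Returns 1 on success, or 0 if some row does not end exactly
   where the next one starts, in which case row_ptr is not usable. */
int csr4_to_row_ptr(const int *rows_start, const int *rows_end,
                    int nrows, int *row_ptr)
{
    for (int i = 0; i < nrows; ++i) {
        if (i + 1 < nrows && rows_end[i] != rows_start[i + 1])
            return 0;
        row_ptr[i] = rows_start[i];
    }
    row_ptr[nrows] = (nrows > 0) ? rows_end[nrows - 1] : 0;
    return 1;
}

/* Converts zero-based 3-array CSR (row_ptr, col_idx, val) of an
   nrows x ncols matrix into 3-array CSC (col_ptr, row_idx, cval).
   col_ptr needs ncols + 1 entries; row_idx and cval need nnz entries. */
void csr_to_csc(int nrows, int ncols,
                const int *row_ptr, const int *col_idx, const double *val,
                int *col_ptr, int *row_idx, double *cval)
{
    int nnz = row_ptr[nrows];

    /* Count the entries in each column. */
    for (int j = 0; j <= ncols; ++j) col_ptr[j] = 0;
    for (int k = 0; k < nnz; ++k) col_ptr[col_idx[k] + 1]++;

    /* A prefix sum turns the counts into column start offsets. */
    for (int j = 0; j < ncols; ++j) col_ptr[j + 1] += col_ptr[j];

    /* Scatter each entry into its column, using col_ptr[j] as a cursor. */
    for (int i = 0; i < nrows; ++i) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            int dst = col_ptr[col_idx[k]]++;
            row_idx[dst] = i;
            cval[dst] = val[k];
        }
    }

    /* Each cursor now points at the end of its column; shift them back. */
    for (int j = ncols; j > 0; --j) col_ptr[j] = col_ptr[j - 1];
    col_ptr[0] = 0;
}
```

Because rows are visited in increasing order, the row indices inside each output column come out sorted. Swapping the roles of rows and columns gives the CSC to CSR direction.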

↧

Possible Issue with xORMLQ function

May 1, 2019, 9:16 am

Hi,

I've noticed that when I use the xORMLQ (as part of an LQ solution of an underdetermined system), the xORMLQ function is accessing parts of the right-hand-side matrix that I do not think it should be. I've attached a small test case (double precision, real) exhibiting the behavior.

The test case attempts to solve the system A*x = b where A is underdetermined. In the example, A is 30 by 36. After the LQ factorization (which seems correct), you solve the problem as x = Q^T * inv(L) * b, where b is of length 30 and x is of length 36. In the test case, x and b are the same array with total length 36. The step inv(L)*b is performed using a call to TRSM, and the result (of length 30) is correct. The application of Q^T is performed with a call to ORMLQ. In this case the input vector (called 'C' in the function) is of length 30 and the output is of length 36. However, ORMLQ actually depends on the input values of the array C(31:36). If you do not pre-zero these values, the total result is incorrect. If you do pre-zero these values, the result is correct. For the non-blocked code path, the actual issue is in a GEMV called by DLARF (called by ORMLQ). This GEMV multiply includes these extra values of the C array. In lines 100-101 of the test code, you can toggle between zeroing the extra entries and stuffing them with garbage values.

Nothing in the documentation indicates that you need to zero these unused values on input to the ORMLQ function. To me it seems like undesired behavior to require the user to pre-zero these values. I feel that LAPACK should zero them if necessary in preparation for the GEMV calls (which might actually need to access them as C is computed). Or, the documentation needs to change to indicate that the user is required to pre-zero these extra array entries.

I ran the test case on Windows 7 with MKL 2019.0.2 and compile line of 'ifort /Qmkl lqbug.f90'. Note that this issue also exists in the stock version of LAPACK available from netlib.
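
The role of the trailing entries can be illustrated with a hand-built toy problem in plain C. This is only a sketch, not the attached test case and not LAPACK code: the matrix A = [3 4], its factor L = 5, the orthogonal matrix Q, and the helper names are all chosen here for illustration. The extra entry passed alongside inv(L)*b is multiplied by the second row of Q, so it contributes to the output.

```c
/* Toy LQ factorization: A = [3 4] = [L 0] * Q with L = 5 and
   Q = [[0.6, 0.8], [-0.8, 0.6]]. apply_qt computes x = Q^T * c. */
void apply_qt(const double c[2], double x[2])
{
    x[0] = 0.6 * c[0] - 0.8 * c[1];
    x[1] = 0.8 * c[0] + 0.6 * c[1];
}

/* Solves 3*x0 + 4*x1 = b as x = Q^T * [b / L; tail].
   tail plays the role of the extra entries C(31:36) in the post. */
void lq_solve(double b, double tail, double x[2])
{
    double c[2];
    c[0] = b / 5.0;   /* inv(L) * b, the TRSM step */
    c[1] = tail;      /* trailing entry seen by the Q^T step */
    apply_qt(c, x);
}

/* Residual of the single equation and squared 2-norm of x. */
double residual(const double x[2], double b)
{
    return 3.0 * x[0] + 4.0 * x[1] - b;
}

double norm2_sq(const double x[2])
{
    return x[0] * x[0] + x[1] * x[1];
}
```

With b = 10, a zero tail gives x = (1.2, 1.6) with squared norm 4, while a tail of 1 gives x = (0.4, 2.2) with squared norm 5. Both satisfy the equation in this toy case, but only the zero tail yields the minimum-norm solution, because the tail entry adds a component from the null space of A.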

Thanks,

John

Attachment	Size
Download ormlq_testcase.zip	3.48 KB

↧

memory use on mkl in clusters

May 3, 2019, 8:56 am

Hello,

There are some very impressive plots of memory versus number of MPI processes in this excellent MKL presentation:

https://cerfacs.fr/wp-content/uploads/2016/03/Kalinkin.pdf

but it's a little confusing what the memory requirements are. Is the original matrix needed on each node? It sounds like it is, from:
https://software.intel.com/en-us/mkl-developer-reference-fortran-dss-distributed-symmetric-matrix-storage
"The algorithm ensures that the memory required to keep internal data on each MPI process is decreased when the number of MPI processes in a run increases. However, the solver requires that matrix A and some other internal arrays completely fit into the memory of each MPI process."

Any thoughts appreciated, thanks!

Don

↧

User-created threads and MKL internal threads

May 4, 2019, 5:56 am

Hi all,

Suppose I create an OpenMP region with, say, 2 threads, and somewhere in that region I have a call to MKL, say DGEMM. Is it possible to force this DGEMM call to use exactly my 2 threads? (Note: I want DGEMM to use more than 1 thread, but I don't want it to create threads of its own.) Are there directives/settings to do this?

My suspicion is that I can't, but I would be quite happy if I could.

If not, can TBB do this? If so, how much effort is it to switch from OpenMP to TBB?

Thanks

Paresh

↧

How does reordering affect locality in the cluster sparse solver?

May 5, 2019, 2:44 am

Hi,

I'm using the Intel MKL cluster sparse solver, version 2019.3, with Intel MPI library 2018.5 to solve the sparse symmetric systems occurring in a Levenberg-Marquardt algorithm for large non-linear least squares problems. I use the distributed matrix input form iparm[39]=3 and the distributed parallel nested dissection and symbolic factorization iparm[1]=10. Since the matrix to be factorized and solved becomes too large to fit on every computer, only those parts which are provided to the cluster sparse solver are computed and allocated locally. Currently, the order of the rows is not particularly controlled. However, since the distributed input to the cluster sparse solver needs to be in subranges of the rows per computer, the row ranges are determined with the goal of balancing the nnz of the matrix on each computer.

The question is: How does the reordering affect the locality of the rows on different computers? Or, in other words: Does an unlucky initial order of the rows, and hence of the local part of the distributed matrix on every computer, increase the need for communication and the memory consumption?

And if so, is it possible to use the permutation matrix from the analysis phase, or any other method, to permute the rows/columns such that locality is improved (that is, communication and memory consumption are reduced)?
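
For readers unfamiliar with the nnz-balanced row ranges mentioned above, one simple way to compute them is sketched below in plain C. This is an assumption about how such a partition might be built, not the poster's code and not part of MKL: the helper name `balance_rows_by_nnz` is hypothetical, and it assumes a zero-based CSR row pointer (or equivalently the cumulative per-row nonzero counts) is available in one place.

```c
/* Splits the rows of a zero-based CSR matrix into nparts contiguous
   ranges holding roughly equal numbers of nonzeros. On return, part p
   owns rows first_row[p] .. first_row[p + 1] - 1.
   first_row must have room for nparts + 1 entries. */
void balance_rows_by_nnz(int nrows, const int *row_ptr,
                         int nparts, int *first_row)
{
    long long nnz = row_ptr[nrows];
    int row = 0;

    first_row[0] = 0;
    for (int p = 1; p < nparts; ++p) {
        /* Cumulative nonzero count at which part p should begin. */
        long long target = nnz * p / nparts;
        while (row < nrows && row_ptr[row] < target)
            ++row;
        first_row[p] = row;
    }
    first_row[nparts] = nrows;
}
```

The ranges are contiguous and non-overlapping, which matches the row-subrange input described in the post. The balance is only approximate, since a single heavy row cannot be split across parts.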

↧

Pardiso mpi version with fatal error

May 8, 2019, 8:35 am

Hello,

I receive a fatal error when using the Intel MPI version of Intel Pardiso, and it would be nice if someone could help me with it. I compile the code with the following commands (from the Intel Link Line Advisor):

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90

mpiifort -i8 -I${MKLROOT}/include -c -o MPI.o MPI.f90
mpiifort mkl_cluster_sparse_solver.o MPI.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run it for instance on two nodes with

mpiexec -n 2 ./MPI.out

I use the 64-bit interface. The funny thing is that the reordering phase works perfectly; however, the factorisation and solve steps don't. The error message I get is the following:

Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2654)..................: MPI_Bcast(buf=0x7ffe63518210, count=1, MPI_LONG_LONG_INT, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804).............: fail failed
MPIR_Bcast(1832)..................: fail failed
I_MPIR_Bcast_intra(2057)..........: Failure during collective
MPIR_Bcast_intra(1599)............: fail failed
MPIR_Bcast_binomial(247)..........: fail failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 2 truncated; 1600 bytes received but buffer size is 8

So this seems to be a problem with the buffer size. At first I thought that my problem was too large; however, this is not an issue of the matrix size. I tried to fix it by setting

export I_MPI_SHM_LMT_BUFFER_SIZE=2000

but it did not change the problem. The Intel MPI manual also mentions I_MPI_SHM_LMT_BUFFER_NUM, and I tried setting that to a higher value as well. The following versions are used: MKL version 2017.4.256, ifort version 17.0.6.256, Intel MPI version 2017.4.239. I also tried newer versions, but that changed nothing. If I should post an example, please let me know. However, I hope that it can be fixed easily by setting the relevant buffer size (not I_MPI_SHM_LMT_BUFFER_SIZE) to a higher value.

Thanks in advance

↧

[Intel Visual Fortran, ps xe 2019.3] Again, I cannot link to intel mkl

May 8, 2019, 9:33 am

In the past, I have had several issues with automatically linking to MKL (that is, through the integration tool with the switch no/yes{sequential|...}) in C/C++ projects. First, it never worked. Second, rolling back did not restore the solution/project to the state it was in before automatic linking, which messed everything up. I therefore got used to linking to MKL manually, and that worked at both compile time and run time.

Now, I am working with Intel Visual Fortran (from PS XE 2019 Update 3) under the latest update of Visual Studio 2017, on the latest update of Windows 10 Pro.

I followed the (simple) instructions from https://software.intel.com/en-us/mkl-windows-developer-guide-automatically-linking-your-intel-visual-fortran-project-with-intel-mkl, compiled my file and had the following output in visual studio :

1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(21): error #5149: Illegal character in statement label field [M]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(21): error #5149: Illegal character in statement label field [O]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(21): error #5149: Illegal character in statement label field [D]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(21): error #5149: Illegal character in statement label field [U]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(21): error #5149: Illegal character in statement label field [L]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(22): error #5149: Illegal character in statement label field [I]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(23): error #5149: Illegal character in statement label field [I]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(24): error #5149: Illegal character in statement label field [E]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(24): error #5149: Illegal character in statement label field [N]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(24): error #5149: Illegal character in statement label field [D]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(24): error #5149: Illegal character in statement label field [M]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(26): error #5149: Illegal character in statement label field [M]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(26): error #5149: Illegal character in statement label field [O]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(26): error #5149: Illegal character in statement label field [D]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(26): error #5149: Illegal character in statement label field [U]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(26): error #5149: Illegal character in statement label field [L]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(28): error #5149: Illegal character in statement label field [I]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(28): error #5149: Illegal character in statement label field [N]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(28): error #5149: Illegal character in statement label field [T]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(28): error #5149: Illegal character in statement label field [E]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(28): error #5149: Illegal character in statement label field [R]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(29): error #5149: Illegal character in statement label field [P]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\mkl.fi(36): error #5082: Syntax error, found CHARACTER_CONSTANT 'mkl_solvers_ee.fi' when expecting one of: ( : % [ . = =>
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(22): error #5082: Syntax error, found '=' when expecting one of: :: ) ( , : * <END-OF-STATEMENT> ; . % (/ + - [ ] /) . ** > ...
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(23): error #5082: Syntax error, found IDENTIFIER 'TEGER' when expecting one of: :: ) ( , : * <END-OF-STATEMENT> ; . % (/ + - [ ] /) . = ' ** > ...
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(23): error #5082: Syntax error, found '=' when expecting one of: :: ) ( , : * <END-OF-STATEMENT> ; . % (/ + - [ ] /) . ** > ...
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(24): error #5082: Syntax error, found IDENTIFIER 'DULEF95_PRECISIONBLAS95ACEASUMREFUNCTIONSASUM_F95' when expecting one of: :: ) ( , : * <END-OF-STATEMENT> ; . % (/ + - [ ] /) . = ' ** > ...
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(29): error #5082: Syntax error, found END-OF-STATEMENT when expecting one of: ) ,
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(35): error #5149: Illegal character in statement label field [E]
1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include\blas.f90(36): error #5149: Illegal character in statement label field [P]
1>C:\Users\THEUSER~1\AppData\Local\Temp\316813.i(29742): catastrophic error: Too many errors, exiting

(I have no calls to MKL in my code for now; I just include mkl.fi and blas.f90.)

Again, using the automatic linking tool failed. Fine, I am used to this.

In the project properties I undid the "Use Intel MKL" setting, compiled the file, and got the same output!

Again, automatically linking to MKL and then automatically unlinking does not leave the project in the state it was in before linking.

So I linked to it manually, again. (Yes, I am really annoyed when there's no thrill, when I know the same problem will appear again and again, despite everyone having it, despite paying for the libraries, etc. But let's put this aside.)

By manually linking to it, I mean doing this : in Project --> Properties -->

Fortran --> General --> Additional Include Directories : I added C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\include
Linker --> General --> Additional Libraries Directories : I added C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.3.203\windows\mkl\lib\intel64_win
Linker --> Input --> Additional Dependencies : I added mkl_blas95_lp64.lib mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_sequential.lib

My question is: what should I do to make

the compiler provided in Intel PS XE 2019.3
the MKL provided in Intel PS XE 2019.3

work together? Is there a patch or something? According to the output shown above, your compiler clearly has issues with the code of the libraries (your libraries) that you provide.

↧

Wrong triangular part of matrix accessed in function LAPACKE_ssygvx

May 8, 2019, 4:29 am

Hi all,

with the help of valgrind I noticed that LAPACKE_ssygvx calls LAPACKE_sge_nancheck on the input matrix.

I told LAPACKE_ssygvx to use only the lower triangular part of the symmetric matrix (the other part is not even initialized), but LAPACKE_sge_nancheck also accesses the upper triangular part, and the algorithm returns an error if there is a NaN.

The attached program verifies this. I tested it with 2018.3.222 and also with 2019.3.199; both versions are affected.

I guessed MKLD-3999 (Fixed the issue LAPACKE_ssyevd fails when upper triangular part of the matrix is filled with random numbers) could be a fix but nope ...

Used compiler: g++ 8.1.0, Linux

valgrind output:

==28297== Conditional jump or move depends on uninitialised value(s)
==28297==    at 0x4023C7: LAPACKE_sge_nancheck (in mkl_bug)
==28297==    by 0x401FCA: LAPACKE_ssygvx (in mkl_bug)
==28297==    by 0x401AE1: main (in mkl_bug)
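
The difference between checking the whole matrix and checking only the referenced triangle can be sketched in plain C. These are simplified, hypothetical stand-ins (`ge_has_nan`, `lower_has_nan`) written for this illustration, not the actual LAPACKE implementations; row-major storage is assumed.

```c
/* Returns 1 if any entry of the full n x n row-major matrix is NaN.
   This reads the upper triangle too, even if it was never initialized. */
int ge_has_nan(const float *a, int n, int lda)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (a[i * lda + j] != a[i * lda + j])   /* true only for NaN */
                return 1;
    return 0;
}

/* Returns 1 if any entry on or below the diagonal is NaN.
   Entries above the diagonal are never read. */
int lower_has_nan(const float *a, int n, int lda)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j <= i; ++j)
            if (a[i * lda + j] != a[i * lda + j])
                return 1;
    return 0;
}
```

With a NaN stored only above the diagonal, the full check reports a problem while the lower-triangle check does not, which mirrors the behaviour reported above.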

Attachment	Size
Download mkl_bug.cpp	2.01 KB

↧

Multi-threaded PSSYEVD not threading actually

May 9, 2019, 3:05 am

Dear all,

When calling the SCALAPACK routine PSSYEVD with several threads, only the first MPI task is actually threading as expected.

I have encountered this issue both on my laptop and on a supercomputer, both running MKL 2019.2.

In practice, the "top" command on my laptop when running with 2 MPI tasks and 2 MKL threads shows that the first MPI task runs at 200%, whereas the second one runs at 100%.

Note that this problem disappears when I call the double precision routine PDSYEVD instead, or when I call the more regular PSSYEV.

Best,

Fabien

↧

2D Correlation fail

May 12, 2019, 1:11 am

Hi all,
I am trying to run the "vsldconv_2d_direct.c" example (code attached), but it fails on the call to vsldConvExec(task, x, NULL, y, NULL, z, NULL) with error VSL_CC_ERROR_XSHAPE (-2311).
The other variants, "vsldconv_2d_auto.c" and "vsldconv_2d_fft.c", fail the same way, which is no surprise.

Everything else works fine, including "vsldconv_1d_auto.c".

I am running on Win10, Visual Studio 2017 with Microsoft compiler.
MKL version:
Intel(R) Math Kernel Library Version 2019.0.3 Product Build 20190125 for Intel(R) 64 architecture applications

Any idea?
Thanks in advance.

Eli

Attachment	Size
Download vsldconv_2d_direct.c	2.78 KB

↧

MKL Rectangular matrix Inplace transpose performance issue

May 13, 2019, 2:04 am

I want an in-place transpose of a very large matrix. I am using mkl_simatcopy, but I am observing a performance issue while transposing in place. I am currently using an Intel(R) Xeon(R) CPU E7-8867 v4 @ 2.40GHz system with 72 physical cores, running Red Hat.

My observation is that, when I perform the transpose operation, only a single core is used rather than all cores. I have tried all the environment variables, such as MKL_NUM_THREADS, MKL_DYNAMIC="FALSE", etc. My compilation script is as follows:

gcc -std=c99 -m64 -I $MKLROOT/include transpose.c ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_tbb_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_openmpi_ilp64.a -Wl,--end-group -lstdc++ -lpthread -lm -ldl -o transpose.out

Timings obtained are as follows

Sno.   No. of Rows   No. of Cols   Time (in sec)
1      16384         8192          16
2      16384         32768         68
3      32768         65536         233

The data type is float. Please let me know if there is an efficient way to transpose in place, how we can make use of multiple cores, or how we can reduce this execution time.

Below is code snippet of transpose.c:

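#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mkl.h>

/* Defined elsewhere in transpose.c; the signatures here are assumed. */
double cpuSecond(void);
void initalizeData(float *data, unsigned long noOfScan, unsigned long noOfPix);

int main(int argc, char *argv[])
{
        if (argc != 3)
        {
                printf("Usage : exe NoofScan and NoofPix \n");
                exit(0);
        }
        unsigned long noOfScan = atol(argv[1]);
        unsigned long noOfPix = atol(argv[2]);
        /* %lu matches unsigned long; %d here was undefined behaviour. */
        printf("----->>>> noOfScan = %lu and noOfPix = %lu \n", noOfScan, noOfPix);
        size_t nEle = noOfScan * noOfPix;

        float *data = (float *)calloc(nEle, sizeof(float));
        if (data == NULL)
        {
                printf("Allocation failed \n");
                exit(1);
        }
        initalizeData(data, noOfScan, noOfPix);
        /* mkl_get_max_threads() returns int, so print it with %d. */
        int nt = mkl_get_max_threads();
        printf("No Of threads are = %d \n", nt);
        mkl_set_num_threads_local(nt);
        //mkl_set_num_threads(nt);
        double time1 = cpuSecond();
        /* Row-major in-place transpose: lda = noOfPix, ldb = noOfScan. */
        mkl_simatcopy('R', 'T', noOfScan, noOfPix, 1.0f, data, noOfPix, noOfScan);
        printf("Time elapsed is %lf \n", cpuSecond() - time1);
        memset(data, 0, nEle * sizeof(float));
        free(data);
        return 0;
}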
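
For comparison, an in-place transpose of a non-square matrix can be written by hand with cycle following. The sketch below is plain C and is not how mkl_simatcopy is implemented (that is not stated anywhere in this post); `transpose_inplace` is a hypothetical helper. It also spends one byte of bookkeeping per element, so it is not strictly memory-free, and it assumes the product k * rows fits in size_t.

```c
#include <stdlib.h>

/* Transposes a rows x cols row-major float matrix in place by following
   the cycles of the index permutation. Returns 0 on success, or -1 if
   the bookkeeping array cannot be allocated. */
int transpose_inplace(float *a, size_t rows, size_t cols)
{
    size_t n = rows * cols;
    if (n < 2) return 0;

    unsigned char *done = calloc(n, 1);
    if (!done) return -1;

    /* The first and last elements never move, so skip them. */
    for (size_t start = 1; start + 1 < n; ++start) {
        if (done[start]) continue;
        size_t k = start;
        float carried = a[start];
        do {
            /* The element at index k of the rows x cols layout belongs at
               index (k * rows) mod (n - 1) of the cols x rows layout. */
            size_t dest = (k * rows) % (n - 1);
            float tmp = a[dest];
            a[dest] = carried;
            carried = tmp;
            done[dest] = 1;
            k = dest;
        } while (k != start);
    }
    free(done);
    return 0;
}
```

Each cycle jumps across the array with a large stride and has to be followed sequentially, which may be part of why in-place transposes of rectangular matrices are hard to speed up with more threads. If memory allows, an out-of-place transpose into a second buffer is usually easier to parallelize.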

↧

issues on Visual Studio 2017

May 14, 2019, 2:08 pm

Hello,

This is my first post here.

I'm using MKL for matrix multiplication in my current project.

Sometimes, only in Release mode and with no pattern or way to reproduce the bug, my application either crashes or returns crazy numbers (far from what my unit tests expect). Again, it's sort of random, but I suspect the MKL parallelism.

And here is the strange part: the Intel Performance Libraries properties are set this way (see the attachments):

Debug - Use Intel MKL - Parallel
Release - Use Intel MKL - Sequential

It's never crashed in Debug mode.

To investigate, I forced MKL to run sequentially at run time by calling

mkl_set_num_threads(1);

every time my MKL routines need to be used, for both DEBUG and RELEASE.

Results?

It has not crashed since, nor has it given me crazy numbers as output.

It's much slower, though, obviously. I suspected that the setting above (Release - Use Intel MKL - Sequential) was not actually taking effect, because otherwise the behaviour should not differ.

Have you ever faced this kind of situation, or did I do something wrong here?

I want to go back to multi-threaded mode, but I don't feel comfortable with these random errors.

Thank you very much.

Attachment	Size
Download debug.png	62.43 KB
Download release.png	62.5 KB

↧

Smallest eigenvalue given by dsyevr

May 14, 2019, 11:44 pm

Hi, I have been trying out LAPACKE_dsyevr using the example from https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11...

This works perfectly fine for finding 3 eigenvalues.

Now when I modify the 'NSELECT' to be 5 (finding all 5 eigenvalues), the program gives me the wrong smallest eigenvalue, though the eigenvector corresponding to the smallest eigenvalue is correct.

I have also tried using 'A' as RANGE instead of 'I', i.e.

info = LAPACKE_dsyevr( LAPACK_ROW_MAJOR, 'V', 'A', 'U', n, a, lda, vl, vu, il, iu, abstol, &m, w, z, ldz, isuppz );

but this gives me the same output as before.

The only remedy so far is using 'V' as RANGE: when I set 'vl' to 0 and 'vu' to 10, I get the correct answer.

Am I doing something wrong? Any help is much appreciated, thanks!! :)

↧

C++/Fortran/MPI code MKL compile error:

May 16, 2019, 4:16 am

Hi all,
I have inherited a code that is written mainly in C++, with some Fortran and MPI (and some ScaLAPACK calls too). We have just upgraded to the Intel 2019 compilers (Linux cluster).

My CMake file has the flags
set(CMAKE_C_FLAGS_INIT -static-libgcc -lstdc++)
set(CMAKE_CXX_FLAGS_INIT -static-libgcc -ansi -lstdc++)
set(CMAKE_CXX_FLAGS "-static-libstdc++ -static-libgcc -static -mkl=cluster -static-intel -ansi -qopenmp -fp-model precise -fp-model source")

and CMake runs as expected (i.e., it successfully finds the Intel 2019 MKL, MPI libraries, etc.).

However, when I run "make", I get the following error

/opt/intel/composer_2019/compilers_and_libraries_2019.3.199/linux/mkl/include/mkl_scalapack.h(3516): error: more than one instance of overloaded function "descinit_" has "C" linkage
void descinit_(MKL_INT* desc, const MKL_INT* m, const MKL_INT* n,

which appears to be a problem with linking to MKL.

Does anyone know what the problem could be?

Thanks in advance.

↧