Pardiso does not scale at all, and possibly a memory leak

I had a problem with Pardiso in the past (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/...), and thanks to Alex we were able to come up with a solution in 2015.

Now I am running on an i7-8700K PC (3.70 GHz, 6 cores, 12 threads) with MKL 2017 Update 3, and we found that Pardiso does not scale at all. Note that the times below are for the solve phase only, since we factorize the matrix once and solve thousands of times.

The test matrix has 5811 DOF and 618378 non-zero elements (about 1.8% of the entries are non-zero). It was ordered with METIS.
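As a quick check of the matrix statistics, the fraction of non-zero entries and the average number of non-zeros per row follow directly from n and nnz. A minimal sketch; the function names are illustrative only:

```cpp
#include <cassert>

// Fraction of the entries of an n x n matrix that are stored non-zeros.
double density(long long n, long long nnz) {
    assert(n > 0 && nnz >= 0);
    const double nd = static_cast<double>(n);
    return static_cast<double>(nnz) / (nd * nd);
}

// Average number of stored non-zeros per row.
double nnz_per_row(long long n, long long nnz) {
    assert(n > 0 && nnz >= 0);
    return static_cast<double>(nnz) / static_cast<double>(n);
}
```

For this matrix, density(5811, 618378) is about 0.0183 and nnz_per_row(5811, 618378) is about 106.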

I set up the options, load the matrix, factorize it, and then solve with the same RHS 1000 times.
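The measurement (factorize once, then time many solves with the same RHS) can be sketched with a small timing helper. This is only an illustration: `time_repeated` is a hypothetical name, and the callable passed to it stands in for the actual Pardiso solve call (phase 33), which is not shown in the post.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `reps` times and returns the total wall-clock time in seconds.
// steady_clock is monotonic, so the result is unaffected by system clock changes.
template <typename Work>
double time_repeated(int reps, Work&& work) {
    assert(reps >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        work();  // e.g. one solve with the already-factorized matrix
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing only this loop keeps the one-time analysis and factorization cost out of the 1000-solve figure.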
Here are the problems.

1) When I set the MKL thread count to 6 (the number of physical cores), Pardiso always ran on 6 cores, no matter what value NT I passed to Domain_Set_Num_Threads(NT, MKL_DOMAIN_PARDISO).
The code looks like this (where NT is the number of threads: 1, 2, 4, 6):

GetMKL_Service()->Set_Num_Threads(6);
GetMKL_Service()->Domain_Set_Num_Threads(NT, 4); //4 stands for MKL_DOMAIN_PARDISO
SetOption(3, NT); // set Pardiso Option[3] to NT;
Here is the Task Manager screenshot:
https://drive.google.com/file/d/1-hNBA2a82qIyy4DiZ0WXSveRqGuWZ2yN/view?u...

Note: a) There is a lot of red (internal operation) activity while Pardiso is running. b) Memory usage keeps creeping up even though I called phase = -1 after the test for each thread count.

Here are the times:

1 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 8.6750000000 s to run on 1 threads.
2 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 8.1410000000 s to run on 2 threads.
4 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 8.1460000000 s to run on 4 threads.
6 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 7.9120000000 s to run on 6 threads.

Pardiso scaling 1.0000000220 1 threads
Pardiso scaling 1.0655939308 2 threads
Pardiso scaling 1.0649398712 4 threads
Pardiso scaling 1.0964358178 6 threads

So there is essentially no gain in solve time going from 1 thread to 6 threads (about 10% at best).
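The scaling numbers are the 1-thread solve time divided by the N-thread solve time. Dividing that speedup by N gives the parallel efficiency, which makes the lack of scaling more explicit. A small sketch; the helper names are illustrative:

```cpp
#include <cassert>

// Speedup relative to the single-thread run: T(1) / T(N).
double speedup(double t1, double tn) {
    assert(t1 > 0.0 && tn > 0.0);
    return t1 / tn;
}

// Parallel efficiency: speedup divided by the thread count (1.0 = ideal scaling).
double efficiency(double t1, double tn, int threads) {
    assert(threads > 0);
    return speedup(t1, tn) / threads;
}
```

With the solve times above, speedup(8.675, 7.912) is about 1.096 and efficiency(8.675, 7.912, 6) is about 0.18, i.e. six threads deliver roughly 18% of ideal linear scaling.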

2) When I set both MKL and Pardiso to use NT threads:
GetMKL_Service()->Set_Num_Threads(NT);
GetMKL_Service()->Domain_Set_Num_Threads(NT, 4); //4 stands for MKL_DOMAIN_PARDISO
SetOption(3, NT); // set Pardiso Option[3] to NT;

Here is the Task Manager screenshot:
https://drive.google.com/file/d/1H377rFFTYmEqWxTvuzGUnJhBMqD2aEZ1/view?u...

Note: 1) At least now Pardiso runs with the requested number of threads, consistent with Domain_Set_Num_Threads and Pardiso Option[3]. 2) There are still lots of red spinning threads. 3) Memory still crept up after each run.

Here are the times:
1 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 8.3920000000 s to run on 1 threads.
2 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 7.7710000000 s to run on 2 threads.
4 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 7.6470000000 s to run on 4 threads.
6 threads are running for Pardiso.solve 1000 times
Pardiso.solve takes: 8.0690000000 s to run on 6 threads.

Pardiso scaling 1.0000000236 1 threads
Pardiso scaling 1.0799125207 2 threads
Pardiso scaling 1.0974238523 4 threads
Pardiso scaling 1.0400297680 6 threads

We still use Pardiso the same way as in our earlier post https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/.... We want Pardiso to use NT threads, but the other MKL functions (GEMM, GESVD, etc.) to use 1 thread, because those calls are already parallelized through TBB. So far we have not been able to make that happen.

The test matrix can be downloaded here (it is also attached as a zip file):
https://drive.google.com/file/d/1wcl8cRaKq704-nFwlScgLTbTIdmhIWdd/view?u...

The file format is:
# of rows
IA
# of NNZ
JA, Valr, Vali (column index, real part, imaginary part for each non-zero)

You can read it like this:

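// Requires <fstream>. m_row_ptr, m_col_ind and m_val are std::vector members;
// complex is assumed to be the project's alias for std::complex<double>.
std::ifstream is("DSparseDebug.txt", std::ios_base::in);
if (is) {
    double val, valr, vali;  // double avoids the precision loss of float
    is >> val;
    m_nRows = static_cast<int>(val) - 1;
    m_nCols = m_nRows;
    m_row_ptr.resize(m_nRows + 1);
    for (int ir = 0; ir < m_nRows + 1; ++ir)
        is >> m_row_ptr[ir];                 // IA (row pointers)
    is >> val;
    m_nnz = static_cast<int>(val);
    m_col_ind.resize(m_nnz);
    m_val.resize(m_nnz);
    for (int iz = 0; iz < m_nnz; ++iz) {
        is >> val >> valr >> vali;           // JA entry, real part, imaginary part
        m_col_ind[iz] = static_cast<int>(val);
        m_val[iz] = complex(valr, vali);
    }
}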
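Because the reader derives the row count as the first value minus one, it may be worth checking that the arrays it fills form a consistent CSR structure before handing them to Pardiso. A minimal sketch, assuming `row_ptr` and `col_ind` hold the IA and JA arrays and that the index base (0 or 1) is given by row_ptr[0]; Pardiso uses one-based indexing by default and also expects the column indices within each row to be sorted in increasing order. `check_csr` is a hypothetical helper, not part of MKL:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if (row_ptr, col_ind) describe a valid n x n CSR structure.
// The index base is taken from row_ptr[0]; col_ind must use the same base.
bool check_csr(int n, const std::vector<int>& row_ptr, const std::vector<int>& col_ind) {
    assert(n >= 0);
    if (row_ptr.size() != static_cast<std::size_t>(n) + 1) return false;
    const int base = row_ptr[0];
    if (base != 0 && base != 1) return false;
    // Row pointers must be non-decreasing.
    for (int i = 0; i < n; ++i) {
        if (row_ptr[i + 1] < row_ptr[i]) return false;
    }
    // The last row pointer must account for every stored non-zero.
    if (static_cast<std::size_t>(row_ptr[n] - base) != col_ind.size()) return false;
    // Every column index must lie in [base, base + n).
    for (int c : col_ind) {
        if (c < base || c >= base + n) return false;
    }
    // Column indices must be strictly increasing within each row.
    for (int i = 0; i < n; ++i) {
        for (int k = row_ptr[i] - base + 1; k < row_ptr[i + 1] - base; ++k) {
            if (col_ind[k] <= col_ind[k - 1]) return false;
        }
    }
    return true;
}
```

For example, the one-based 2 x 2 lower-triangular pattern IA = {1, 2, 4}, JA = {1, 1, 2} passes, while swapping the last two JA entries fails the sorted-row check.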
In Pardiso's defense, I did see some speedup in the factorization step:
Factor Step:
Pardiso scaling 1.0000000019 1 threads
Pardiso scaling 1.5593220369 2 threads
Pardiso scaling 2.1904761947 4 threads
Pardiso scaling 2.4864864913 6 threads

However, in our application we use the factorized matrix as a preconditioner, so we solve with it thousands of times.

One of my coworkers pointed out this passage in the manual:

IPARM(3): Number of processors. Input. On entry: IPARM(3) must contain the number of processors that are available for parallel execution. The number must be equal to the OpenMP environment variable OMP_NUM_THREADS. Note: If the user has not explicitly set OMP_NUM_THREADS, then this value can be set by the operating system to the maximal numbers of processors on the system. It is therefore always recommended to control the parallel execution of the solver by explicitly setting OMP_NUM_THREADS. If fewer processors are available than specified, the execution may slow down instead of speeding up. There is no default value for IPARM(3).

I never set OMP_NUM_THREADS, so IPARM(3) cannot be equal to it. Will this cause a problem? I also found these two pages, which describe IPARM(3) quite differently. Which one should I follow?

https://software.intel.com/en-us/mkl-developer-reference-fortran-pardiso...

https://pardiso-project.org/manual/manual.pdf

The Intel manual says "iparm(3): Reserved. Set to zero...". So how do I make Pardiso run with a different number of threads than the number of physical cores?
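When comparing the two manuals with the code, it may help to keep in mind that both number iparm entries Fortran-style, starting at 1, so the manual's IPARM(3) is element [2] of a zero-based C/C++ array, while element [3] would be IPARM(4). Whether the `SetOption(3, NT)` wrapper uses 0- or 1-based numbering is not shown in the post. A hypothetical wrapper that makes the numbering explicit might look like this (64 entries, as in both manuals; `int` assumes the LP64 interface, where MKL_INT is `int`):

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: stores a 64-entry iparm array but is indexed with the
// 1-based numbering used in the manuals, so at(3) is IPARM(3) == raw()[2].
class IParm {
public:
    int& at(int manual_index) {
        assert(manual_index >= 1 && manual_index <= 64);
        return data_[manual_index - 1];
    }
    // Pointer to the underlying zero-based array, as passed to the solver.
    int* raw() { return data_.data(); }

private:
    std::array<int, 64> data_{};  // value-initialized: all entries start at 0
};
```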

Thanks

Attachment: DSparseDebug.zip (11.12 MB)
